Making RL tractable by learning more informative reward functions: example-based control, meta-learning, and normalized maximum likelihood
Diagram of MURAL, our method for learning uncertainty-aware rewards for RL. After the user provides a few examples of desired outcomes, MURAL automatically infers a reward function that takes into account these examples and the agent’s uncertainty for each state. Although reinforcement learning has shown success in domains such as robotics, chip placement and playing …