Reinforcement Learning

Last Time

  • What tools do we have to solve MDPs with continuous \(S\) and \(A\)?

Course Map

  • Outcome Uncertainty, Immediate vs Future Rewards (MDP)
  • Model Uncertainty (Reinforcement Learning)
  • State Uncertainty (POMDP)
  • Interaction Uncertainty (Game)

Guiding Questions

  • What is Reinforcement Learning?
  • What are the main challenges in Reinforcement Learning?
  • How do we categorize RL approaches?

Problem from HW2

Reinforcement Learning

Previously: \((S, A, T, R, \gamma)\)

r = act!(env, a)   # take action a; the environment advances one step and returns the reward
s = observe(env)   # observe the environment's new state

Note: this environment interface is different from a generative model, \(s', r = G(s, a)\), which can be queried at any state we choose.

In Python, typically

s, r = env.step(a)  # Gym-style interface; Gym's actual step also returns termination info
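
To make the distinction concrete, here is a minimal sketch with a made-up 1-D corridor problem; the function G, the Env class, and its method names are illustrative, not the course's simulator. A generative model is a function we may sample at any state, while an environment is a stateful object we can only act on from its current state:

import random

# Generative-model view: a pure function we may query at ANY state we choose.
def G(s, a):                       # hypothetical 1-D corridor dynamics
    sp = min(5, max(0, s + a)) if random.random() < 0.8 else s
    r = 1.0 if sp == 5 else 0.0
    return sp, r

# Environment view: a stateful object; we can only act from its CURRENT state.
class Env:
    def __init__(self):
        self.s = 0
    def observe(self):
        return self.s
    def act(self, a):
        self.s, r = G(self.s, a)   # may use the same dynamics internally,
        return r                   # but the agent never chooses the state

env = Env()
s = env.observe()                  # analogous to s = observe(env)
r = env.act(+1)                    # analogous to r = act!(env, a)
sp = env.observe()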

Now: Episodic Simulator
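
A hedged sketch of what an episodic simulator provides, again with an invented CorridorEnv (the names reset, step, and run_episode are illustrative): each episode starts from an initial state, experience tuples \((s, a, r, s')\) accumulate until a terminal state is reached, and the per-episode return is the raw data behind a learning curve.

import random

class CorridorEnv:                       # hypothetical episodic environment
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):                   # a in {-1, +1}
        self.s = min(5, max(0, self.s + a)) if random.random() < 0.8 else self.s
        done = self.s == 5
        return self.s, (1.0 if done else -0.1), done

def run_episode(env, policy, max_steps=100):
    """Roll out one episode; return the experience tuples and the total reward."""
    s, experience, total = env.reset(), [], 0.0
    for _ in range(max_steps):
        a = policy(s)
        sp, r, done = env.step(a)
        experience.append((s, a, r, sp))   # the data the learner gets to see
        total += r
        s = sp
        if done:
            break
    return experience, total

env = CorridorEnv()
returns = [run_episode(env, lambda s: random.choice([-1, +1]))[1]
           for _ in range(10)]            # per-episode returns (learning-curve data)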

Learning Curve

  • Plots the agent's performance (e.g., total reward per episode) against the amount of experience gathered (episodes or environment steps).

Break

Challenges

  1. Exploration vs Exploitation (see the ε-greedy sketch after this list)
  2. Credit Assignment
  3. Generalization
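
For the exploration-exploitation tradeoff specifically, the canonical illustration is ε-greedy action selection; this is a minimal sketch, and the toy Q table and eps value are placeholders:

import random

def epsilon_greedy(Q, s, actions, eps=0.1):
    """With probability eps try a random action (explore);
    otherwise take the action that currently looks best (exploit)."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

Q = {(s, a): 0.0 for s in range(6) for a in (-1, +1)}   # toy tabular Q estimate
a = epsilon_greedy(Q, 0, [-1, +1])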

Classifications

  • Model-Based: Attempt to learn \(T\) and \(R\), then find \(\pi^*\) by solving the MDP.
  • Model-Free: Attempt to find \(Q^*\) or \(\pi^*\) directly without estimating \(T\) or \(R\).
  • On-Policy: Learn only from experience generated with the current policy.
  • Off-Policy: Learn from experience generated by the current policy and previous policies.
  • Batch: Learn only from previously-generated experience.
  • Tabular: Keep track of learned values for each state or state-action pair in a table.
  • Deep: Use a neural network to approximate learned values.

Tabular Maximum Likelihood Model-Based RL
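
A minimal tabular sketch of the idea, assuming small finite state and action sets; the class and method names are illustrative, and unvisited state-action pairs simply default to value 0. Counts \(N(s,a,s')\) and cumulative rewards \(\rho(s,a)\) give maximum-likelihood estimates of \(T\) and \(R\), and value iteration on the estimated MDP yields a policy:

import random
from collections import defaultdict

class MaxLikelihoodModelRL:
    def __init__(self, states, actions, gamma=0.95):
        self.states, self.actions, self.gamma = states, actions, gamma
        self.N   = defaultdict(int)     # N[(s, a, s')]  transition counts
        self.Nsa = defaultdict(int)     # Nsa[(s, a)]    visit counts
        self.rho = defaultdict(float)   # rho[(s, a)]    cumulative reward
        self.V   = {s: 0.0 for s in states}

    def update(self, s, a, r, sp):      # record one experience tuple
        self.N[(s, a, sp)] += 1
        self.Nsa[(s, a)]   += 1
        self.rho[(s, a)]   += r

    def T(self, s, a, sp):              # maximum-likelihood transition estimate
        n = self.Nsa[(s, a)]
        return self.N[(s, a, sp)] / n if n else 0.0

    def R(self, s, a):                  # maximum-likelihood mean-reward estimate
        n = self.Nsa[(s, a)]
        return self.rho[(s, a)] / n if n else 0.0

    def Q(self, s, a):
        return self.R(s, a) + self.gamma * sum(
            self.T(s, a, sp) * self.V[sp] for sp in self.states)

    def plan(self, iterations=50):      # value iteration on the estimated MDP
        for _ in range(iterations):
            self.V = {s: max(self.Q(s, a) for a in self.actions)
                      for s in self.states}

    def policy(self, s, eps=0.1):       # epsilon-greedy w.r.t. the estimated Q
        if random.random() < eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q(s, a))

In use, each environment step produces an \((s, a, r, s')\) tuple for update; planning can be re-run periodically (or after every step), and actions are drawn from the ε-greedy policy so the agent keeps exploring.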

Guiding Questions

  • What is Reinforcement Learning?
  • What are the main challenges in Reinforcement Learning?
  • How do we categorize RL approaches?

090 Reinforcement Learning

By Zachary Sunberg
