Reinforcement Learning

Last Time

What tools do we have to solve MDPs with continuous \(S\) and \(A\)?

Course Map

Outcome Uncertainty, Immediate vs Future Rewards (MDP)
Model Uncertainty (Reinforcement Learning)
State Uncertainty (POMDP)
Interaction Uncertainty (Game)

Guiding Questions

What is Reinforcement Learning?
What are the main challenges in Reinforcement Learning?
How do we categorize RL approaches?

Problem from HW2

Reinforcement Learning

Previously: \((S, A, T, R, \gamma)\)

r = act!(env, a)

s = observe(env)

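Note: This is different from a generative model \(s', r = G(s, a)\), which can be queried from any state \(s\). Here the environment holds the state, and the agent can only act and observe.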

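In Python (e.g. Gym/Gymnasium), typically

s, r = env.step(a)

(simplified: the full call returns more values, e.g. s, r, terminated, truncated, info in Gymnasium)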

Now: Episodic Simulator
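Here is a minimal Python sketch of what "episodic simulator" means. It assumes a hypothetical toy environment, `ChainEnv`, whose methods `reset`, `observe`, `act`, and `terminated` loosely mirror the `act!`/`observe` interface above. The agent never sees \(T\) or \(R\). It only calls `act` and `observe`, and every episode starts from wherever the environment resets, not from a state the agent picks.

```python
class ChainEnv:
    """Hypothetical toy episodic environment (illustration only).

    States 0..n-1 lie on a line; action +1 moves right, -1 moves left.
    Reaching the last state ends the episode with reward 1; every other
    step gives reward 0. T and R are hidden inside the class.
    """

    def __init__(self, n=5):
        self.n = n
        self.reset()

    def reset(self):
        self.s = 0          # internal state, owned by the environment
        self.done = False

    def observe(self):
        return self.s

    def act(self, a):
        # The environment advances its own state and returns only the reward.
        self.s = min(max(self.s + a, 0), self.n - 1)
        self.done = self.s == self.n - 1
        return 1.0 if self.done else 0.0

    def terminated(self):
        return self.done


def run_episode(env, policy, max_steps=100, gamma=1.0):
    """Roll out one episode and return its discounted return."""
    env.reset()
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        if env.terminated():
            break
        s = env.observe()
        r = env.act(policy(s))
        total += discount * r
        discount *= gamma
    return total


def always_right(s):
    return 1


print(run_episode(ChainEnv(), always_right, gamma=0.9))
```

Plotting the return from `run_episode` against the number of training episodes gives a learning curve.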

Learning Curve

Break

Challenges

Exploration vs Exploitation
Credit Assignment
Generalization
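To make the exploration vs exploitation trade-off concrete, here is a sketch of epsilon-greedy action selection. This is one common strategy chosen for illustration, not necessarily the one used later in the course. The function name `epsilon_greedy` and its arguments are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index given current estimated action values.

    With probability epsilon, explore: choose an action uniformly at random.
    Otherwise, exploit: choose an action with the highest current estimate.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties randomly so equally valued actions all get tried.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(1000):
    counts[epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.1, rng=rng)] += 1
print(counts)  # mostly action 1, with a few exploratory picks of 0 and 2
```

If epsilon is too small, the agent may never discover that a rarely tried action is better. If it is too large, the agent wastes steps on actions it already believes are poor.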

Classifications

Model-Based: Attempt to learn \(T\) and \(R\), then find \(\pi^*\) by solving the resulting MDP
Model-Free: Attempt to find \(Q^*\) or \(\pi^*\) directly
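One standard model-free method is tabular Q-learning, which estimates \(Q^*\) directly from sampled transitions without ever building \(T\) or \(R\). The sketch below shows a single update step. It is a swapped-in example for illustration, and the name `q_learning_update` and its parameters are hypothetical.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95, terminal=False):
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a)).

    Model-free: only the sampled transition (s, a, r, s_next) is used;
    no estimate of T or R is ever stored.
    """
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in actions)
    target = r + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)  # table of Q(s, a) values, all initialized to 0
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[-1, 1], alpha=0.5)
print(Q[(0, 1)])  # 0.5
```

The target uses \(\max_{a'} Q(s', a')\) regardless of which action the agent actually takes next. This is why Q-learning is usually classified as off-policy.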

On-Policy: Learn only using experience generated with the current policy.
Off-Policy: Learn using experience generated by the current policy and/or other policies (e.g. previous policies).
Batch: Learn only from previously generated experience, with no new interaction with the environment.

Tabular: Keep track of learned values for each state (or state-action pair) in a table
Deep: Use a neural network to approximate learned values

Note: More precisely, On-Policy means the policy being learned and the policy generating experience are identical (exam 2)

Tabular Maximum Likelihood Model-Based RL
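A minimal sketch of tabular maximum likelihood model-based RL follows, under stated assumptions. It counts observed transitions \(N(s, a, s')\) and sums observed rewards \(\rho(s, a)\). It then uses the estimates \(\hat{T}(s' \mid s, a) = N(s, a, s') / N(s, a)\) and \(\hat{R}(s, a) = \rho(s, a) / N(s, a)\), and runs value iteration on the estimated MDP. The class `MaxLikelihoodModel`, its methods, and the choice to give unvisited \((s, a)\) pairs zero reward and zero transition probability are all hypothetical illustration choices.

```python
from collections import defaultdict


class MaxLikelihoodModel:
    """Tabular maximum likelihood estimate of an unknown MDP (a sketch)."""

    def __init__(self, states, actions, gamma=0.95):
        self.states, self.actions, self.gamma = states, actions, gamma
        self.N = defaultdict(int)      # N[(s, a, s2)]: transition counts
        self.rho = defaultdict(float)  # rho[(s, a)]: summed rewards

    def update(self, s, a, r, s2):
        """Record one observed transition from the environment."""
        self.N[(s, a, s2)] += 1
        self.rho[(s, a)] += r

    def n(self, s, a):
        return sum(self.N[(s, a, s2)] for s2 in self.states)

    def T(self, s2, s, a):
        """Estimated T(s2 | s, a); 0 for (s, a) pairs never tried (assumption)."""
        n = self.n(s, a)
        return self.N[(s, a, s2)] / n if n > 0 else 0.0

    def R(self, s, a):
        """Estimated R(s, a); 0 for (s, a) pairs never tried (assumption)."""
        n = self.n(s, a)
        return self.rho[(s, a)] / n if n > 0 else 0.0

    def solve(self, iters=200):
        """Value iteration on the estimated MDP; returns a greedy policy."""
        U = {s: 0.0 for s in self.states}

        def lookahead(s, a):
            return self.R(s, a) + self.gamma * sum(
                self.T(s2, s, a) * U[s2] for s2 in self.states)

        for _ in range(iters):
            U = {s: max(lookahead(s, a) for a in self.actions) for s in self.states}
        return {s: max(self.actions, key=lambda a: lookahead(s, a)) for s in self.states}


model = MaxLikelihoodModel(states=[0, 1], actions=["stay", "go"], gamma=0.9)
# Hypothetical experience: "go" switches states, and staying in state 1 pays 1.
for _ in range(3):
    model.update(0, "go", 0.0, 1)
    model.update(0, "stay", 0.0, 0)
    model.update(1, "stay", 1.0, 1)
    model.update(1, "go", 0.0, 0)
print(model.T(1, 0, "go"), model.R(1, "stay"))  # 1.0 1.0
print(model.solve())  # {0: 'go', 1: 'stay'}
```

In a full agent, the steps of acting in the environment (with some exploration), calling `update`, and re-solving would be repeated. Re-solving after every step is simple but expensive, so practical versions often re-plan less often or update only part of the value function.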
