Reinforcement Learning
Last Time
- What tools do we have to solve MDPs with continuous \(S\) and \(A\)?
Course Map
- Outcome Uncertainty, Immediate vs Future Rewards (MDP)
- Model Uncertainty (Reinforcement Learning)
- State Uncertainty (POMDP)
- Interaction Uncertainty (Game)
Guiding Questions
- What is Reinforcement Learning?
- What are the main challenges in Reinforcement Learning?
- How do we categorize RL approaches?
Problem from HW2
Reinforcement Learning
Previously: \((S, A, T, R, \gamma)\)
r = act!(env, a)
s = observe(env)
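Note: This differs from a generative model \(s', r = G(s, a)\): the environment holds its own state, so we cannot sample from an arbitrary \(s\).
In Python, typically: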
s, r = env.step(a)
Now: Episodic Simulator
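To make the episodic-simulator interface concrete, here is a minimal sketch of an interaction loop. `ToyChainEnv`, its `reset`, `step`, and `terminated` methods, and `run_episode` are hypothetical names made up for illustration. `step` returns `(s, r)` to match the slide; real libraries typically return additional values.

```python
import random


class ToyChainEnv:
    """Hypothetical 5-state chain: start in state 0, reward 1 on reaching the last state."""

    def __init__(self, n_states=5, seed=0):
        self.n_states = n_states
        self.rng = random.Random(seed)
        self.s = 0

    def reset(self):
        # Start a new episode; the agent cannot choose the start state.
        self.s = 0
        return self.s

    def step(self, a):
        # a is -1 (left) or +1 (right); the move is reversed with probability 0.1.
        if self.rng.random() < 0.1:
            a = -a
        self.s = min(max(self.s + a, 0), self.n_states - 1)
        r = 1.0 if self.s == self.n_states - 1 else 0.0
        return self.s, r

    def terminated(self):
        return self.s == self.n_states - 1


def run_episode(env, policy, max_steps=100):
    """Roll out one episode and return its (s, a, r, s') transitions."""
    s = env.reset()
    history = []
    for _ in range(max_steps):
        a = policy(s)
        sp, r = env.step(a)  # the only way to obtain experience
        history.append((s, a, r, sp))
        s = sp
        if env.terminated():
            break
    return history


episode = run_episode(ToyChainEnv(), policy=lambda s: +1)
```

The agent never sees \(T\) or \(R\) here; everything it learns has to come from the transitions collected in `history`.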
Learning Curve
Break
Challenges
- Exploration vs Exploitation
- Credit Assignment
- Generalization
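One common way to trade off exploration against exploitation is an epsilon-greedy rule. It is shown here only as an illustrative sketch, not necessarily the method used later in the course. The function name `epsilon_greedy` and the list-of-values representation are assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore (uniform random action), otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the action with the highest current value estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Example: with epsilon = 0 the rule is purely greedy.
greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

A larger `epsilon` gathers more information about poorly understood actions at the cost of reward collected during learning.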
Classifications
- Model Based: Attempt to learn \(T\) and \(R\), then find \(\pi^*\) by solving the resulting MDP
- Model Free: Attempt to find \(Q^*\) or \(\pi^*\) directly without estimating \(T\) or \(R\)
- On-Policy: Learn only using experience generated with the current policy.
- Off-Policy: Learn using experience generated from the current policy and previous policies.
- Batch: Learn only from previously-generated experience.
- Tabular: Keep track of learned values for each state in a table
- Deep: Use a neural network to approximate learned values
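To illustrate the on-policy versus off-policy distinction, the sketch below contrasts the update targets of two standard model-free, tabular algorithms, SARSA and Q-learning. These two algorithms are brought in as examples and are not named on the slide. `Q` is assumed to be a list of per-state lists of action values.

```python
def q_learning_target(Q, r, sp, gamma):
    """Off-policy target: bootstraps from the best action in s', whatever was actually taken."""
    return r + gamma * max(Q[sp])


def sarsa_target(Q, r, sp, ap, gamma):
    """On-policy target: bootstraps from the action a' the current policy actually chose in s'."""
    return r + gamma * Q[sp][ap]


def td_update(Q, s, a, target, alpha):
    """Move Q[s][a] a fraction alpha of the way toward the target."""
    Q[s][a] += alpha * (target - Q[s][a])


# Two states, two actions; values are arbitrary illustration numbers.
Q = [[0.0, 0.0], [1.0, 2.0]]
off_policy = q_learning_target(Q, r=1.0, sp=1, gamma=0.9)
on_policy = sarsa_target(Q, r=1.0, sp=1, ap=0, gamma=0.9)
```

Neither function estimates \(T\) or \(R\), which is what makes both model-free.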
Tabular Maximum Likelihood Model-Based RL
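A minimal sketch of the maximum likelihood model estimate, assuming discrete states and actions: count observed transitions and average observed rewards, so that \(\hat{T}(s' \mid s, a) = N(s, a, s') / N(s, a)\) and \(\hat{R}(s, a) = \rho(s, a) / N(s, a)\). The class name `MaxLikelihoodModel` and the symbol \(\rho\) for summed reward are choices made for this illustration. Unvisited pairs return 0 here, which is also an assumption.

```python
from collections import defaultdict


class MaxLikelihoodModel:
    """Tabular maximum likelihood estimates of T and R from (s, a, r, s') samples."""

    def __init__(self):
        self.n_sas = defaultdict(int)    # N(s, a, s'): transition counts
        self.n_sa = defaultdict(int)     # N(s, a): visit counts
        self.rho = defaultdict(float)    # rho(s, a): summed reward

    def update(self, s, a, r, sp):
        self.n_sas[(s, a, sp)] += 1
        self.n_sa[(s, a)] += 1
        self.rho[(s, a)] += r

    def T(self, s, a, sp):
        n = self.n_sa.get((s, a), 0)
        return self.n_sas.get((s, a, sp), 0) / n if n > 0 else 0.0

    def R(self, s, a):
        n = self.n_sa.get((s, a), 0)
        return self.rho.get((s, a), 0.0) / n if n > 0 else 0.0
```

The estimated \(\hat{T}\) and \(\hat{R}\) can then be passed to any tabular MDP solver, such as value iteration, to obtain a policy.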
Guiding Questions
- What is Reinforcement Learning?
- What are the main challenges in Reinforcement Learning?
- How do we categorize RL approaches?
090 Reinforcement Learning
By Zachary Sunberg