Trivia: When was a car first driven by a neural network?
1995: 2797/2849 miles (98.2%)
\[\underset{\theta}{\text{maximize}} \prod_{(s, a) \in D} \pi_\theta (a \mid s)\]
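This sketch assumes a tabular policy over discrete states and actions, with one free probability per action in each state. In that case the behavioral-cloning objective has a closed-form maximizer: the empirical action frequencies in each state. The state and action names are hypothetical. Maximizing the log of the product gives the same answer and is numerically safer.

```python
import math
from collections import Counter, defaultdict

def fit_tabular_bc(demos, actions):
    """Maximum-likelihood tabular policy: pi(a | s) = N(s, a) / N(s).

    demos is a list of (state, action) pairs, i.e. the dataset D.
    """
    counts = defaultdict(Counter)
    for s, a in demos:
        counts[s][a] += 1
    policy = {}
    for s, c in counts.items():
        total = sum(c.values())
        # Unseen actions get count 0 (Counter's default), hence probability 0
        policy[s] = {a: c[a] / total for a in actions}
    return policy

def log_likelihood(policy, demos):
    """log of prod_{(s,a) in D} pi(a | s); same maximizer as the product itself."""
    return sum(math.log(policy[s][a]) for s, a in demos)

# Hypothetical driving demonstrations
demos = [("left_lane", "keep"), ("left_lane", "keep"),
         ("left_lane", "merge"), ("right_lane", "keep")]
policy = fit_tabular_bc(demos, actions=["keep", "merge"])
uniform = {s: {"keep": 0.5, "merge": 0.5} for s in policy}
print(log_likelihood(policy, demos) > log_likelihood(uniform, demos))  # True
```

With a neural-network \(\pi_\theta\), no closed form exists. The same objective is maximized by gradient ascent on the summed log-probabilities instead of by counting.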
Problem: Cascading Errors
\((1-\beta)^k\)
GANs are frighteningly good at generating believable synthetic things
\(\pi_\theta\)
\(C_\phi\)
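The adversarial idea pairs the policy \(\pi_\theta\) (the generator) with a classifier \(C_\phi\) (the discriminator) that tries to tell expert state-action pairs from pairs the policy produces. Below is a minimal sketch of the discriminator step only. It makes several assumptions: \(C_\phi\) is a logistic regression on hypothetical feature vectors of \((s, a)\) pairs, and \(C_\phi \to 1\) means "looks like the expert". It also uses \(\log C_\phi\) as the policy's surrogate reward. Sign and label conventions differ between papers, so treat these choices as illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_discriminator(expert_feats, policy_feats, steps=500, lr=0.1):
    """Logistic-regression discriminator C_phi(x) = sigmoid(w . x + b), phi = (w, b).

    Assumed convention: C_phi -> 1 on expert (s, a) pairs and -> 0 on pairs
    generated by pi_theta. Trained by gradient ascent on
    mean log C_phi(expert) + mean log(1 - C_phi(policy)).
    """
    w = np.zeros(expert_feats.shape[1])
    b = 0.0
    for _ in range(steps):
        ce = sigmoid(expert_feats @ w + b)
        cp = sigmoid(policy_feats @ w + b)
        # d/dz log sigmoid(z) = 1 - sigmoid(z);  d/dz log(1 - sigmoid(z)) = -sigmoid(z)
        grad_w = expert_feats.T @ (1.0 - ce) / len(ce) - policy_feats.T @ cp / len(cp)
        grad_b = np.mean(1.0 - ce) - np.mean(cp)
        w += lr * grad_w
        b += lr * grad_b
    return w, b

rng = np.random.default_rng(0)
expert_feats = rng.normal(1.0, 0.5, size=(200, 2))   # hypothetical expert (s, a) features
policy_feats = rng.normal(-1.0, 0.5, size=(200, 2))  # hypothetical pi_theta (s, a) features
w, b = train_discriminator(expert_feats, policy_feats)
# Surrogate reward for pi_theta: higher when a pair looks expert-like
surrogate_reward = np.log(sigmoid(policy_feats @ w + b))
```

In a full loop, the policy would then be updated with any RL method to increase this surrogate reward, and the two steps would alternate.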
What if we know the dynamics, but not the reward?
Reinforcement Learning: input the environment \((S, A, T, R)\); output an optimal policy \(\pi^*\)
Inverse Reinforcement Learning: input \(S, A, T\) and demonstration trajectories \(\{\tau\}\); output a reward function \(R\)
What is the reward function?
\(\beta(s, a) \in \{0,1\}^n\)
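A binary feature vector \(\beta(s, a)\) is commonly paired with a reward that is linear in the features, \(R_\phi(s, a) = \phi^\top \beta(s, a)\), so IRL reduces to finding the weights \(\phi\). The sketch below assumes that linear form. It also computes the discounted feature expectations of a set of demonstrations, the statistic that feature-matching IRL methods try to reproduce. The gridworld, the two features, and the helper names are hypothetical.

```python
import numpy as np

def linear_reward(phi, beta, s, a):
    """Assumed linear reward R_phi(s, a) = phi . beta(s, a)."""
    return float(phi @ beta(s, a))

def feature_expectations(trajectories, beta, gamma):
    """Empirical discounted feature expectations:
    mu = mean over trajectories tau of sum_k gamma^k * beta(s_k, a_k)."""
    per_traj = [
        sum(gamma**k * beta(s, a) for k, (s, a) in enumerate(tau))
        for tau in trajectories
    ]
    return np.mean(per_traj, axis=0)

# Hypothetical 1-D gridworld: states 0..3, actions -1 (left) / +1 (right).
# Feature 0: "at the goal state 3"; feature 1: "moved right".
def beta(s, a):
    return np.array([float(s == 3), float(a == 1)])

demos = [[(1, 1), (2, 1), (3, -1)], [(2, 1), (3, 1)]]
mu = feature_expectations(demos, beta, gamma=0.9)      # approximately [0.855, 1.9]
r = linear_reward(np.array([1.0, -0.1]), beta, 3, -1)  # 1.0
```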
\(H(X) = -\sum_x P(x) \log P(x)\)
Least informative trajectory distribution
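The entropy formula can be evaluated directly. This sketch uses the natural log (the base only rescales \(H\)) and treats \(0 \log 0\) as 0. It shows why the maximum-entropy distribution is the least informative one: over \(n\) outcomes, the uniform distribution attains the maximum value \(\log n\), and a deterministic distribution has entropy 0.

```python
import math

def entropy(p):
    """H(X) = -sum_x P(x) log P(x), natural log; terms with P(x) = 0 contribute 0."""
    return -sum(px * math.log(px) for px in p if px > 0)

h_uniform = entropy([0.25, 0.25, 0.25, 0.25])  # log 4, about 1.386
h_skewed = entropy([0.7, 0.1, 0.1, 0.1])       # smaller than log 4
h_certain = entropy([1.0, 0.0, 0.0, 0.0])      # 0: no uncertainty at all
```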
Discounted visitation probability
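For a tabular model, a discounted visitation probability can be computed by pushing the state distribution forward one step at a time. The sketch assumes a finite horizon and the convention \(\rho(s) = \sum_{k=0}^{h-1} \gamma^k P(s_k = s)\); some texts normalize by \((1-\gamma)\) or start the exponent at 1. In maximum-entropy IRL, the learner's visitation is typically used to form the expected feature counts that are compared against the expert's. The array layout and the two-state example are hypothetical.

```python
import numpy as np

def discounted_visitation(b0, T, pi, gamma, horizon):
    """rho(s) = sum_{k=0}^{horizon-1} gamma^k * P(s_k = s) under policy pi.

    b0: initial state distribution, shape (S,)
    T:  T[s, a, s'] = P(s' | s, a), shape (S, A, S)
    pi: pi[s, a] = P(a | s), shape (S, A)
    """
    b = np.asarray(b0, dtype=float)
    rho = np.zeros_like(b)
    for k in range(horizon):
        rho += gamma**k * b
        # One-step propagation: b'(s') = sum_s sum_a b(s) pi(a|s) T(s'|s, a)
        b = np.einsum("s,sa,sat->t", b, pi, T)
    return rho

# Hypothetical 2-state example: the single action always swaps states
T = np.zeros((2, 1, 2))
T[0, 0, 1] = 1.0
T[1, 0, 0] = 1.0
pi = np.ones((2, 1))
rho = discounted_visitation([1.0, 0.0], T, pi, gamma=0.5, horizon=3)  # [1.25, 0.5]
```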
Optimal policy under \(R_\phi\)
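Because the dynamics \(T\) are known in this setting, one standard way to get a policy for the current reward \(R_\phi\) is value iteration. The sketch assumes tabular arrays and a reward table `R[s, a]` already evaluated from \(\phi\). It returns a deterministic greedy policy. Maximum-entropy IRL implementations often use a stochastic (softmax) policy at this step instead, so this is one possible choice, not the method. The 3-state chain is hypothetical.

```python
import numpy as np

def value_iteration(R, T, gamma, tol=1e-10, max_iters=10_000):
    """Greedy optimal policy for reward R[s, a] and transitions T[s, a, s'].

    Returns (U, policy) where policy[s] = argmax_a Q(s, a).
    """
    U = np.zeros(R.shape[0])
    for _ in range(max_iters):
        # T @ U has shape (S, A): sum over s' of T[s, a, s'] * U[s']
        Q = R + gamma * (T @ U)
        U_new = Q.max(axis=1)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    Q = R + gamma * (T @ U)
    return U, Q.argmax(axis=1)

# Hypothetical chain: states 0, 1, 2; action 0 = left, 1 = right (clamped at the ends)
S, A = 3, 2
T = np.zeros((S, A, S))
for s in range(S):
    T[s, 0, max(s - 1, 0)] = 1.0
    T[s, 1, min(s + 1, S - 1)] = 1.0
# Reward from a single feature "in state 2" with weight phi = 1
R = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
U, policy = value_iteration(R, T, gamma=0.9)  # policy: always move right
```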