# Imitation and Inverse Reinforcement Learning

### Last time:

• Turn-taking zero-sum games
• Markov Games
• Incomplete Information Games

### Today:

• What if you don't know the reward function and just want to act like an expert?
• Imitation Learning
• Inverse Reinforcement Learning

Trivia: When was the first car driven with a neural network?

1995: 2797 of 2849 miles (98.2%) driven autonomously in CMU's "No Hands Across America" trip

## Behavioral Cloning

Treat imitation as supervised learning: fit a policy that maximizes the likelihood of the expert's demonstrated state-action pairs $D$:

$$\underset{\theta}{\text{maximize}} \prod_{(s, a) \in D} \pi_\theta (a \mid s)$$
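The maximum-likelihood objective above can be sketched with a softmax policy over discrete states and actions; the toy dataset and dimensions here are made up for illustration.

```python
import numpy as np

# Behavioral cloning as maximum likelihood: a softmax policy with one
# logit per (state, action), trained by gradient ascent on
# sum over D of log pi_theta(a | s).
# Hypothetical setup: 3 states, 2 actions; the expert takes action 0
# in states 0 and 1, and action 1 in state 2.
n_states, n_actions = 3, 2
D = [(0, 0), (1, 0), (2, 1), (0, 0), (2, 1)]  # expert (s, a) pairs

theta = np.zeros((n_states, n_actions))  # policy logits

def policy(s):
    """pi_theta(. | s): softmax over the logits for state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

lr = 0.5
for _ in range(200):
    grad = np.zeros_like(theta)
    for s, a in D:
        grad[s] -= policy(s)  # gradient of the log-normalizer
        grad[s, a] += 1.0     # gradient of the chosen action's logit
    theta += lr * grad
```

After training, the policy concentrates its probability mass on the expert's action in each demonstrated state.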

## Stochastic Mixing Iterative Learning (SMILe)

At iteration $k$, act with a mixture policy that still queries the expert with probability $(1-\beta)^k$:

$$\pi_k = (1-\beta)^k \pi_E + \beta \sum_{i=1}^{k} (1-\beta)^{i-1} \hat{\pi}_i$$
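A minimal sketch of the mixture, assuming the $\pi_k$ form above; the stand-in policies and helper names are hypothetical.

```python
import numpy as np

# SMILe never fully trusts the newest learned policy: after k iterations,
# the expert is still queried with probability (1 - beta)^k, and learned
# policy pi_i is queried with probability beta * (1 - beta)^(i - 1).

def mixture_weights(k, beta):
    """Weights over [expert, pi_1, ..., pi_k] after k SMILe iterations."""
    return [(1 - beta) ** k] + \
           [beta * (1 - beta) ** (i - 1) for i in range(1, k + 1)]

def mixture_action(state, expert, learned, beta, rng):
    """Sample an action from the SMILe mixture (hypothetical helper)."""
    policies = [expert] + list(learned)
    weights = mixture_weights(len(learned), beta)
    return policies[rng.choice(len(policies), p=weights)](state)

# Example with two crude stand-in policies over a dummy state.
rng = np.random.default_rng(0)
expert = lambda s: 0
learned = [lambda s: 1, lambda s: 1]
a = mixture_action(None, expert, learned, beta=0.1, rng=rng)
```

The geometric weights sum to one, so the mixture is a valid policy at every iteration.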

## Generative Adversarial Imitation Learning (GAIL)

GANs are frighteningly good at generating believable synthetic things. GAIL applies the same idea to imitation: a discriminator is trained to distinguish expert state-action pairs from the policy's, and the discriminator's output is used as a reward signal for training the policy.
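A sketch of the discriminator half of that loop, assuming a logistic discriminator over hand-made 2-D features standing in for $(s,a)$ encodings; the full method additionally trains the policy with an RL algorithm on the surrogate reward.

```python
import numpy as np

# Train a discriminator to tell expert pairs (label 0) from policy
# pairs (label 1), then turn its output into a surrogate reward: one
# common convention is r(s, a) = -log D(s, a), where D is the estimated
# probability the pair came from the policy.
rng = np.random.default_rng(0)
expert = rng.normal(loc=1.0, size=(100, 2))   # stand-in expert features
policy = rng.normal(loc=-1.0, size=(100, 2))  # stand-in policy features

w, b = np.zeros(2), 0.0                        # logistic discriminator

def D(x):
    """Probability the (s, a) pair came from the policy."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# A few steps of logistic-regression training on the two batches.
lr = 0.1
X = np.vstack([expert, policy])
y = np.concatenate([np.zeros(100), np.ones(100)])
for _ in range(200):
    p = D(X)
    w -= lr * X.T @ (p - y) / len(y)
    b -= lr * np.mean(p - y)

# Surrogate reward handed to the policy-gradient step.
reward = -np.log(D(policy) + 1e-8)
```

The policy then improves against this reward, the discriminator is retrained, and the two-player game repeats.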

## Inverse Reinforcement Learning

What if we know the dynamics, but not the reward?

|        | Reinforcement Learning | Inverse Reinforcement Learning |
|--------|------------------------|--------------------------------|
| Input  | Environment $(S, A, T, R)$ | $S, A, T, \{\tau\}$ |
| Output | $\pi^*$ | $R$ |

## Exercise

What is the reward function?

## Maximum Margin Inverse Reinforcement Learning

Assume the reward is linear in binary features, $R(s, a) = w^\top \beta(s, a)$ with $\beta(s, a) \in \{0,1\}^n$, and find weights $w$ under which the expert's expected feature counts beat those of other policies by the largest possible margin.
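Under a linear reward, a policy is summarized by its expected discounted feature counts $\mu(\pi) = \mathbb{E}[\sum_t \gamma^t \beta(s_t, a_t)]$. A Monte-Carlo estimate of $\mu$ from demonstrations can be sketched as follows (the feature function and trajectories are made up):

```python
import numpy as np

def feature_expectations(trajectories, beta_fn, gamma, n_features):
    """Average discounted feature counts over (s, a) trajectories."""
    mu = np.zeros(n_features)
    for tau in trajectories:
        for t, (s, a) in enumerate(tau):
            mu += gamma ** t * beta_fn(s, a)
    return mu / len(trajectories)

# Hypothetical binary features: beta[0] fires in state 0, beta[1] in state 1.
beta_fn = lambda s, a: np.array([float(s == 0), float(s == 1)])
demos = [[(0, 0), (1, 1), (1, 0)], [(0, 1), (0, 0), (1, 1)]]
mu_E = feature_expectations(demos, beta_fn, gamma=0.9, n_features=2)
```

With these estimates, the margin problem compares $w^\top \mu_E$ against $w^\top \mu(\pi)$ for candidate policies.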

## Principle of Maximum Entropy

Among all distributions consistent with what is known, choose the one with the greatest entropy:

$$H(X) = -\sum_x P(x) \log P(x)$$
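A quick numerical illustration of the definition: with no constraints beyond the support, the uniform distribution maximizes $H$, while peaked (more informative) distributions score lower.

```python
import numpy as np

def entropy(p):
    """H(X) = -sum_x P(x) log P(x), with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

uniform = entropy([0.25] * 4)              # log 4, the maximum on 4 outcomes
peaked = entropy([0.97, 0.01, 0.01, 0.01])  # nearly deterministic: much lower
```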

## Maximum Entropy Inverse Reinforcement Learning

Model the expert with the least informative (maximum entropy) trajectory distribution consistent with the demonstrations, which takes the form $P(\tau) \propto \exp(R(\tau))$.
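A sketch of that model over a small enumerable set of trajectories with a linear reward; the feature counts, weights, and expert statistics here are invented for illustration.

```python
import numpy as np

# MaxEnt IRL weights each trajectory by the exponential of its return:
# P(tau) proportional to exp(R(tau)), with R(tau) = w . f(tau) for
# trajectory feature counts f(tau).
features = np.array([[1.0, 0.0],   # f(tau) for each candidate trajectory
                     [0.0, 1.0],
                     [1.0, 1.0]])
w = np.array([0.5, -0.2])          # current reward weights (assumed)

returns = features @ w             # R(tau) for each trajectory
p = np.exp(returns - returns.max())
p /= p.sum()                       # normalized P(tau)

# Gradient of the demonstrations' log-likelihood w.r.t. w:
# expert feature expectations minus the model's expected features.
mu_expert = np.array([0.8, 0.4])   # (assumed) empirical expert statistics
grad = mu_expert - p @ features
```

Ascending this gradient adjusts $w$ until the model's expected features match the expert's, which is exactly the feature-matching constraint the maximum-entropy distribution must satisfy.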

## Recap

• Behavioral cloning is supervised learning to match the actions of an expert
• A critical problem is cascading errors, which can be addressed by gathering more data with DAgger or SMILe
• Inverse reinforcement learning is the process of learning a reward function from trajectories in an MDP
• IRL is an underspecified problem: many reward functions are consistent with the same demonstrations
• Maximum entropy IRL resolves this ambiguity by choosing the reward function that maximizes the entropy of the trajectory distribution of the resulting policy

#### 260 Imitation and Inverse Reinforcement Learning

By Zachary Sunberg
