IDIL: Imitation Learning of Intent-Driven Expert Behavior
AAMAS 2024
Presented By: Himanshu Gupta
Date: 6/10/2024
Authors: Sangwon Seo, Vaibhav Unhelkar
Why this paper?
- Saw this at AAMAS 2024.
- I found it cool and wanted to read it.
- I care about close-proximity human-robot tasks.
- We need good human models for them.
- Amazing Paper
- Developed theory and validated it with an interesting application.
Motivation
Traditional IL
- \( \pi_E = \pi(a|s) \)
IDIL
- \( \pi_E = \pi(a|s,x) \)
- \( \zeta(x'|s,x) \)
Preliminaries
- MDP
- Given an MDP and a policy \( \pi \), we get a Markov chain
- The stationary distribution of this Markov chain is called the occupancy measure
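To make the MDP-to-occupancy-measure pipeline concrete, here is a minimal sketch in plain Python. The two-state MDP, the uniform policy, and the helper names (`induced_chain`, `stationary`, `occupancy`) are all made up for illustration. It follows the stationary-distribution description above; many IL formulations use a discounted visitation distribution instead.

```python
def induced_chain(T, pi):
    """T[s][a][s2] = P(s2 | s, a); pi[s][a] = pi(a | s). Returns P[s][s2]."""
    n = len(T)
    return [
        [sum(pi[s][a] * T[s][a][s2] for a in range(len(pi[s]))) for s2 in range(n)]
        for s in range(n)
    ]


def stationary(P, iters=1000):
    """Power iteration: repeatedly apply mu <- mu P, starting from uniform."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
    return mu


def occupancy(mu, pi):
    """State-action occupancy: rho(s, a) = mu(s) * pi(a | s)."""
    return [[mu[s] * pi[s][a] for a in range(len(pi[s]))] for s in range(len(mu))]


# Hypothetical two-state, two-action MDP (numbers chosen arbitrarily).
T = [
    [[0.9, 0.1], [0.2, 0.8]],  # from state 0: action 0, action 1
    [[0.7, 0.3], [0.1, 0.9]],  # from state 1: action 0, action 1
]
pi = [[0.5, 0.5], [0.5, 0.5]]  # uniform policy

P = induced_chain(T, pi)   # Markov chain over states
mu = stationary(P)         # stationary state distribution
rho = occupancy(mu, pi)    # occupancy measure over (s, a)
```

Power iteration is used only because it needs no linear-algebra library; it assumes the induced chain has a unique stationary distribution.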
Markov Chain
Preliminaries
- Traditional IL
- Goal: given an MDP \(M\) and a set of expert demonstrations \(\mathcal{D}\), find a policy \(\pi\) that matches \(\pi_E\)
- Equivalent to solving the occupancy measure matching problem
\( \pi_E = \pi(a|s) \)
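As a rough illustration of what "matching occupancy measures" means, the sketch below estimates empirical state-action frequencies from demonstrations and compares two of them. The toy trajectories are invented, and total variation is used only because it is easy to compute by hand; it is not the divergence the paper optimizes.

```python
from collections import Counter


def empirical_occupancy(demos):
    """demos: list of trajectories, each a list of (state, action) pairs."""
    counts = Counter(sa for traj in demos for sa in traj)
    total = sum(counts.values())
    return {sa: c / total for sa, c in counts.items()}


def total_variation(p, q):
    """0.5 * sum over (s, a) of |p(s, a) - q(s, a)|."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical demonstrations: the expert walks right and stops,
# while the learner wanders back and forth.
expert = [
    [(0, "right"), (1, "right"), (2, "stay")],
    [(0, "right"), (1, "right"), (2, "stay")],
]
learner = [[(0, "right"), (1, "left"), (0, "right")]]

rho_E = empirical_occupancy(expert)
rho_pi = empirical_occupancy(learner)
gap = total_variation(rho_E, rho_pi)  # 0 would mean a perfect match
```

A learner whose policy reproduces the expert's state-action frequencies drives this gap toward zero.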
Preliminaries
- Model the agent using an Agent Markov Model (AMM)
- Define the AMM using the tuple
- Given the MDP
- \(X\): set of latent states
- AMM model describing the expert \( \mathcal{N}_E = (\pi_E, \zeta_E) \)
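The following sketch shows how an AMM \( (\pi, \zeta) \) could generate behavior. It assumes the intent is updated from the current state and the previous intent, and the action is then drawn given the state and the new intent. The corridor environment, the `init_intent` distribution, and all function names are hypothetical.

```python
import random


def sample(dist, rng):
    """Draw a key from a dict mapping outcomes to probabilities."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def rollout(pi, zeta, init_intent, step, s0, horizon, seed=0):
    """Generate (state, intent, action) triples from an AMM (pi, zeta)."""
    rng = random.Random(seed)
    s, x_prev = s0, None
    traj = []
    for _ in range(horizon):
        # Intent update: x ~ zeta(. | s, x_prev); first step uses init_intent.
        x = sample(init_intent if x_prev is None else zeta[(s, x_prev)], rng)
        # Action selection: a ~ pi(. | s, x).
        a = sample(pi[(s, x)], rng)
        traj.append((s, x, a))
        s, x_prev = step(s, a), x
    return traj


# Hypothetical 5-cell corridor; the agent bounces between the two ends.
STATES = range(5)
INTENTS = ("go_left", "go_right")

pi = {(s, x): ({-1: 1.0} if x == "go_left" else {1: 1.0})
      for s in STATES for x in INTENTS}


def next_intent(s, x):
    if s == 0:
        return {"go_right": 1.0}  # reached the left end: turn around
    if s == 4:
        return {"go_left": 1.0}   # reached the right end: turn around
    return {x: 1.0}               # otherwise keep the current intent


zeta = {(s, x): next_intent(s, x) for s in STATES for x in INTENTS}


def step(s, a):
    return min(4, max(0, s + a))


traj = rollout(pi, zeta, {"go_right": 1.0}, step, s0=3, horizon=4)
```

In demonstrations only the states and actions of such a rollout are observed; the intent column is hidden, which is what makes the learning problem hard.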
IDIL
- Input
- Demonstration set \( \mathcal{D} \)
- Discrete set of intents \(X\)
- MDP \(M\)
- Goal: learn an AMM \( \mathcal{N} = (\pi, \zeta) \) that mimics the expert behavior
IDIL
- Objective for Traditional IL
- Why not do this for IDIL?
\( \mathcal{N} = (\pi, \zeta) \)
IDIL
- Objective for Traditional IL
- Do this for IDIL
\( \mathcal{N} = (\pi, \zeta) \)
Q: How do I get \(x\) and \(x^-\) though?
Q: How do I leverage the inherent factorization to get \(\pi\) and \(\zeta\)?
IDIL
First Theoretical Contribution
\( \mathcal{N} = (\pi, \zeta) \)
IDIL
Second Theoretical Contribution
IDIL
Objective
\( \mathcal{N} = (\pi, \zeta) \)
Using Viterbi
Algorithm
Viterbi Algorithm
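A minimal sketch of Viterbi decoding for intents, assuming the current estimates of \( \pi \) and \( \zeta \) are available as probability functions and that the intent at step t depends on the state at step t and the previous intent. The initial-intent distribution `init_intent`, the toy models, and the demonstration are hypothetical and not taken from the paper.

```python
import math


def viterbi_intents(traj, intents, pi, zeta, init_intent):
    """Most likely intent sequence for traj = [(s, a), ...].

    pi(s, x, a), zeta(s, x_prev, x) and init_intent(s, x) return probabilities.
    Scores are kept in log space to avoid underflow on long trajectories.
    """
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    s0, a0 = traj[0]
    score = {x: log(init_intent(s0, x)) + log(pi(s0, x, a0)) for x in intents}
    back = []  # back[t][x] = best previous intent when the intent at t+1 is x
    for s, a in traj[1:]:
        new_score, pointers = {}, {}
        for x in intents:
            best_prev = max(intents, key=lambda xp: score[xp] + log(zeta(s, xp, x)))
            new_score[x] = (score[best_prev] + log(zeta(s, best_prev, x))
                            + log(pi(s, x, a)))
            pointers[x] = best_prev
        back.append(pointers)
        score = new_score

    # Backtrack from the best final intent.
    x = max(intents, key=lambda k: score[k])
    path = [x]
    for pointers in reversed(back):
        x = pointers[x]
        path.append(x)
    return path[::-1]


# Hypothetical models: intents are sticky, actions mostly follow the intent.
INTENTS = ["go_left", "go_right"]


def pi(s, x, a):
    preferred = -1 if x == "go_left" else 1
    return 0.9 if a == preferred else 0.1


def zeta(s, x_prev, x):
    return 0.8 if x == x_prev else 0.2


def init_intent(s, x):
    return 0.5


demo = [(2, 1), (3, 1), (4, -1), (3, -1)]  # (state, action) pairs
path = viterbi_intents(demo, INTENTS, pi, zeta, init_intent)
```

The decoded intents can then be attached to the demonstrations, turning them into fully labeled \( (s, x, a) \) data for the next update of \( \pi \) and \( \zeta \).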
Experiments
MultiGoal-n
(continuous)
Movers
(discrete)
Results
Key Takeaway: IDIL outperforms other IL methods on problems where diverse human intents exist and can vary over time.
Key Takeaway: IDIL performs as well as (or better than) other IL methods on problems where diverse human intents don't exist.
Results
Key Takeaway: IDIL is better at identifying the hidden intent than other IL methods that also consider intent.
Questions?