IDIL: Imitation Learning of Intent-Driven Expert Behavior

AAMAS 2024

Presented By: Himanshu Gupta

Date: 6/10/2024

Authors: Sangwon Seo, Vaibhav Unhelkar

Why this paper?

Saw this at AAMAS 2024.
- I found it cool and wanted to read it.
I care about close-proximity human-robot tasks.
- We need good human models for them.
Amazing paper.
- It develops theory and validates it with an interesting application.

Motivation

Traditional IL

\( \pi_E = \pi(a|s) \)

IDIL

\( \pi_E = \pi(a|s,x) \)
\( \zeta(x'|s,x) \)

Preliminaries

Given an MDP and a policy \( \pi \), we get a Markov chain.

The stationary distribution of this Markov chain is called the occupancy measure.
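To make this concrete, here is a minimal sketch in plain Python. It builds the Markov chain induced by a policy on a toy MDP and estimates its stationary distribution by power iteration. The toy transition table `T`, the uniform policy `pi`, and all function names are illustrative assumptions, not taken from the paper. The sketch also assumes the chain is ergodic, so a unique stationary distribution exists.

```python
def induced_chain(T, pi):
    """Return P[s][s2], the state-to-state chain induced by policy pi.

    T[s][a][s2] is the MDP transition probability and pi[s][a] the action
    probability (both are hypothetical toy tables here).
    """
    n_s = len(T)
    return [[sum(pi[s][a] * T[s][a][s2] for a in range(len(pi[s])))
             for s2 in range(n_s)] for s in range(n_s)]


def stationary_distribution(P, iters=1000):
    """Power iteration: repeatedly apply d <- d P, starting from uniform."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        d = [sum(d[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
    return d


def occupancy_measure(T, pi):
    """State-action occupancy rho[s][a] = d(s) * pi(a|s)."""
    d = stationary_distribution(induced_chain(T, pi))
    return [[d[s] * pi[s][a] for a in range(len(pi[s]))]
            for s in range(len(pi))]


# Toy MDP with 2 states and 2 actions (assumed values for illustration).
T = [
    [[0.9, 0.1], [0.2, 0.8]],  # from state 0: action 0, action 1
    [[0.7, 0.3], [0.1, 0.9]],  # from state 1: action 0, action 1
]
pi = [[0.5, 0.5], [0.5, 0.5]]  # uniform random policy

rho = occupancy_measure(T, pi)
print(rho)
```

The entries of `rho` sum to 1, and each row sums to the stationary probability of that state.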

Markov Chain

Preliminaries

Traditional IL

Goal: Given the MDP \(M\) and a set of expert demonstrations \(\mathcal{D}\), find a policy \(\pi\) that matches \(\pi_E\).

This is the same as solving the occupancy measure matching problem.
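For reference, occupancy measure matching is commonly written as below. This is a generic sketch rather than the exact objective shown on the slide: \(\rho_\pi\) and \(\rho_E\) denote the occupancy measures of the learner and the expert, \(d_\pi\) is the stationary state distribution under \(\pi\), and the divergence \(D\) is an assumed placeholder, since different IL methods choose different divergences.

\[
\min_{\pi} \; D\left(\rho_\pi \,\|\, \rho_E\right),
\qquad
\rho_\pi(s,a) = d_\pi(s)\,\pi(a|s)
\]

Matching occupancy measures recovers the policy because the policy can be read back from its occupancy measure:

\[
\pi(a|s) = \frac{\rho_\pi(s,a)}{\sum_{a'} \rho_\pi(s,a')}
\]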

\( \pi_E = \pi(a|s) \)

Preliminaries

Model the agent using an Agent Markov Model (AMM)

Define the AMM using the tuple

Given the MDP

X: set of latent states (intents)

AMM describing the expert: \( \mathcal{N}_E = (\pi_E, \zeta_E) \)
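The following sketch shows how an AMM could generate behavior: at each step an intent is drawn from \(\zeta\), then an action is drawn from \(\pi\) conditioned on the state and that intent. The exact timing of the intent update (here, the new intent depends on the current state and the previous intent), the starting intent `x_prev`, and all names and toy probability tables are assumptions made for illustration.

```python
import random


def sample(dist, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def rollout_amm(T, pi, zeta, s0, x_prev, horizon, seed=0):
    """Sample a trajectory of (state, intent, action) triples from an AMM.

    T[s][a]         : distribution over next states (the MDP dynamics)
    pi[s][x]        : distribution over actions given state and intent
    zeta[s][x_prev] : distribution over the new intent (assumed indexing)
    """
    rng = random.Random(seed)
    s = s0
    traj = []
    for _ in range(horizon):
        x = sample(zeta[s][x_prev], rng)  # intent update
        a = sample(pi[s][x], rng)         # intent-conditioned action
        traj.append((s, x, a))
        s = sample(T[s][a], rng)          # environment step
        x_prev = x
    return traj


# Toy problem with 2 states, 2 intents, 2 actions (assumed values).
T = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
pi = [[[0.9, 0.1], [0.1, 0.9]],
      [[0.8, 0.2], [0.2, 0.8]]]
zeta = [[[0.95, 0.05], [0.05, 0.95]],
        [[0.90, 0.10], [0.10, 0.90]]]

traj = rollout_amm(T, pi, zeta, s0=0, x_prev=0, horizon=5)
print(traj)
```

In a demonstration set, only the states and actions of such a trajectory would be visible. The intent column is latent.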

IDIL

INPUT
- Demonstration Set \( \mathcal{D} \)
- Discrete set of intents X
- MDP M

Goal: Learn an AMM \( \mathcal{N} = (\pi, \zeta) \) that mimics the expert's behavior

IDIL

Objective for Traditional IL

Why not do this for IDIL?

\( \mathcal{N} = (\pi, \zeta) \)

IDIL

Objective for Traditional IL

Do this for IDIL

\( \mathcal{N} = (\pi, \zeta) \)

IDIL

Do this for IDIL

\( \mathcal{N} = (\pi, \zeta) \)

Q: How do I get the latent intents \(x\) and \(x^-\), though?

Q: How do I leverage the inherent factorization to get \(\pi\) and \(\zeta\)?

IDIL

First Theoretical Contribution

\( \mathcal{N} = (\pi, \zeta) \)

IDIL

Second Theoretical Contribution

IDIL
Objective

\( \mathcal{N} = (\pi, \zeta) \)

Using the Viterbi algorithm

Viterbi Algorithm
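As a rough illustration of how the Viterbi algorithm can recover intents, the sketch below finds the most likely intent sequence for one observed state-action trajectory, given a current estimate of \(\pi\) and \(\zeta\). It is a generic log-space Viterbi decoder, not the paper's implementation. The initial intent distribution `init`, the table indexing, and the toy numbers are assumptions.

```python
import math


def log(p):
    """Safe log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_intents(states, actions, pi, zeta, init):
    """Return the most likely intent sequence for one trajectory.

    pi[s][x][a]        : action probability given state and intent
    zeta[s][x_prev][x] : intent transition probability (assumed indexing)
    init[x]            : initial intent distribution (an assumption)
    """
    n_x = len(init)
    # Best log-probability of any intent path ending in intent x at t = 0.
    score = [log(init[x]) + log(pi[states[0]][x][actions[0]])
             for x in range(n_x)]
    back = []  # back[t-1][x] = best previous intent when ending in x at time t
    for t in range(1, len(states)):
        s, a = states[t], actions[t]
        new_score, ptr = [], []
        for x in range(n_x):
            cands = [score[xp] + log(zeta[s][xp][x]) for xp in range(n_x)]
            best_prev = max(range(n_x), key=lambda xp: cands[xp])
            new_score.append(cands[best_prev] + log(pi[s][x][a]))
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)
    # Backtrack from the best final intent.
    x = max(range(n_x), key=lambda i: score[i])
    path = [x]
    for ptr in reversed(back):
        x = ptr[x]
        path.append(x)
    return path[::-1]


# Toy example: 1 state, 2 "sticky" intents, each preferring a different action.
pi = [[[0.9, 0.1], [0.1, 0.9]]]
zeta = [[[0.9, 0.1], [0.1, 0.9]]]
init = [0.5, 0.5]

intents = viterbi_intents([0, 0, 0, 0, 0], [0, 0, 1, 1, 1], pi, zeta, init)
print(intents)  # [0, 0, 1, 1, 1]
```

In the toy example the decoded intent switches at the same step where the observed action changes, since a single switch is cheaper than explaining three unlikely actions.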

Experiments

MultiGoal-n (continuous)

Movers (discrete)

Results

Key Takeaway: IDIL outperforms other IL methods on problems where diverse human intents exist and can vary over time.

Key Takeaway: IDIL performs as well as (or better than) other IL methods on problems where diverse human intents don't exist.

Results

Key Takeaway: IDIL identifies the hidden intent more accurately than other IL methods that also consider intent.

Results

Questions?
