IDIL: Imitation Learning of Intent-Driven Expert Behavior

AAMAS 2024

Presented By: Himanshu Gupta

Date: 6/10/2024

Authors: Sangwon Seo, Vaibhav Unhelkar

Why this paper?

  • Saw this at AAMAS 2024.
    • I found it cool and wanted to read it.
       
  • I care about close-proximity human-robot tasks.
    • We need good human models for them.

  • Amazing paper
    • Develops theory and validates it with an interesting application.

Motivation

Traditional IL  

  • \( \pi_E = \pi(a|s) \)

IDIL

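  • \( \pi_E = \pi(a|s,x) \), where \(x\) is the expert's latent intent
  • \( \zeta(x|s,x^-) \): how the intent evolves, where \(x^-\) is the previous intent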

Preliminaries

  • MDP
  • Given an MDP and a policy \( \pi \), we get a Markov chain over states
  • The policy's discounted state-action visitation distribution along this chain is called the occupancy measure \( \rho_\pi(s,a) \)
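To make the occupancy measure concrete, here is a minimal Python sketch for a tabular MDP. It assumes plain dictionaries for the transition model `P`, the policy, and the initial state distribution `mu0` (all names are illustrative), uses the unnormalized discounted definition \( \rho_\pi(s,a) = \sum_t \gamma^t P(s_t = s, a_t = a) \), and truncates the infinite sum at a fixed horizon.

```python
from typing import Dict, Tuple

State = int
Action = int
Transitions = Dict[Tuple[State, Action], Dict[State, float]]  # P[(s, a)][s'] = P(s' | s, a)
Policy = Dict[State, Dict[Action, float]]                       # policy[s][a] = pi(a | s)


def occupancy_measure(
    P: Transitions,
    policy: Policy,
    mu0: Dict[State, float],
    gamma: float = 0.9,
    horizon: int = 500,
) -> Dict[Tuple[State, Action], float]:
    """Unnormalized discounted occupancy rho(s, a) = sum_t gamma^t * P(s_t = s, a_t = a).

    Propagates the state distribution through the Markov chain induced by `policy`
    for `horizon` steps, truncating the infinite sum. `policy` must cover every
    reachable state.
    """
    rho: Dict[Tuple[State, Action], float] = {}
    d = dict(mu0)  # distribution over s_t, starting from the initial distribution
    discount = 1.0
    for _ in range(horizon):
        next_d: Dict[State, float] = {}
        for s, p_s in d.items():
            for a, p_a in policy[s].items():
                w = p_s * p_a  # P(s_t = s, a_t = a)
                rho[(s, a)] = rho.get((s, a), 0.0) + discount * w
                for s_next, p_next in P[(s, a)].items():
                    next_d[s_next] = next_d.get(s_next, 0.0) + w * p_next
        d = next_d
        discount *= gamma
    return rho


# Toy 2-state MDP: action 0 = stay, action 1 = switch.
P = {(0, 0): {0: 1.0}, (0, 1): {1: 1.0}, (1, 0): {1: 1.0}, (1, 1): {0: 1.0}}
policy = {0: {1: 1.0}, 1: {0: 1.0}}  # switch out of state 0, then stay in state 1
rho = occupancy_measure(P, policy, mu0={0: 1.0})
print(rho)  # total mass is about 1 / (1 - gamma) = 10
```

Multiplying \( \rho_\pi \) by \( (1 - \gamma) \) turns it into a probability distribution over state-action pairs.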

Markov Chain

Preliminaries

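  • Traditional IL
  • Given M and a set of expert demonstrations \( \mathcal{D} \)
  • Goal: find a policy \(\pi\) that matches \(\pi_E\)
  • Equivalent to matching occupancy measures, \( \rho_\pi = \rho_{\pi_E} \), since policies and occupancy measures are in one-to-one correspondence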

  \( \pi_E = \pi(a|s) \)
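In practice the expert's occupancy measure is only available through \( \mathcal{D} \). The Python sketch below, with hypothetical helper names, estimates normalized discounted occupancies from trajectories and compares learner and expert with the total variation distance. Total variation is only an illustrative stand-in: practical IL methods typically optimize other divergences, often through a learned discriminator or critic.

```python
from collections import defaultdict
from typing import Any, Dict, Sequence, Tuple

Pair = Tuple[Any, Any]  # (state, action)


def empirical_occupancy(trajectories: Sequence[Sequence[Pair]], gamma: float = 0.9) -> Dict[Pair, float]:
    """Normalized discounted (s, a) visitation frequencies estimated from trajectories."""
    weights: Dict[Pair, float] = defaultdict(float)
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            weights[(s, a)] += gamma ** t
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def total_variation(p: Dict[Pair, float], q: Dict[Pair, float]) -> float:
    """0.5 * sum |p - q| over the union of supports; 0 means the occupancies match."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


expert_demos = [[("s0", "right"), ("s1", "right")]]
learner_rollouts = [[("s0", "right"), ("s1", "left")]]
rho_expert = empirical_occupancy(expert_demos)
rho_learner = empirical_occupancy(learner_rollouts)
gap = total_variation(rho_learner, rho_expert)  # > 0: the learner deviates at s1
```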

Preliminaries

  • Model the agent using an Agent Markov Model (AMM)
  • Define the AMM by a tuple, given the MDP

X - set of latent states (the agent's intents)

  • AMM describing the expert: \( \mathcal{N}_E = (\pi_E, \zeta_E) \)
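The AMM describes behavior as a two-stage generative process: at each step the agent first updates its intent with \( \zeta \), then acts with \( \pi \). A minimal Python sketch of that process follows; the `AMM` dataclass, the deterministic `step` function, and the placeholder initial intent `x_init` are illustrative assumptions, not the paper's interface.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Dist = Dict[Any, float]  # discrete distribution: outcome -> probability


@dataclass
class AMM:
    intents: List[Any]                # X: discrete set of latent intents
    pi: Callable[[Any, Any], Dist]    # pi(s, x) -> distribution over actions a
    zeta: Callable[[Any, Any], Dist]  # zeta(s, x_prev) -> distribution over the next intent x


def draw(dist: Dist, rng: random.Random) -> Any:
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def rollout(amm: AMM, step: Callable[[Any, Any], Any], s0: Any, x_init: Any,
            T: int, rng: random.Random) -> List[Tuple[Any, Any, Any]]:
    """Generate T steps of (s, x, a): first pick an intent with zeta, then an action with pi."""
    s, x_prev = s0, x_init
    traj: List[Tuple[Any, Any, Any]] = []
    for _ in range(T):
        x = draw(amm.zeta(s, x_prev), rng)  # intent may persist or switch
        a = draw(amm.pi(s, x), rng)         # action conditioned on state and intent
        traj.append((s, x, a))
        s, x_prev = step(s, a), x           # hypothetical deterministic transition
    return traj


# Toy corridor 0..3: intent "R" walks right, "L" walks left; the intent flips at the ends.
def corridor_zeta(s: int, x_prev: str) -> Dist:
    if s >= 3:
        return {"L": 1.0}
    if s <= 0:
        return {"R": 1.0}
    return {x_prev: 1.0}


def corridor_pi(s: int, x: str) -> Dist:
    return {"+1": 1.0} if x == "R" else {"-1": 1.0}


amm = AMM(intents=["L", "R"], pi=corridor_pi, zeta=corridor_zeta)
traj = rollout(amm, step=lambda s, a: s + (1 if a == "+1" else -1),
               s0=0, x_init="R", T=5, rng=random.Random(0))
```

The toy distributions are deterministic so the rollout is easy to trace; stochastic \( \pi \) and \( \zeta \) work the same way.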

IDIL

  • Input
    • Demonstration set \( \mathcal{D} \)
    • Discrete set of intents X
    • MDP M
  • Goal: learn an AMM \( \mathcal{N} = (\pi, \zeta) \) that mimics the expert's behavior

IDIL

  • Objective for Traditional IL
  • Why not do this for IDIL?

 \( \mathcal{N} = (\pi, \zeta) \)

IDIL

  • Objective for Traditional IL
  • Do this for IDIL

 \( \mathcal{N} = (\pi, \zeta) \)

IDIL

  • Do this for IDIL

 \( \mathcal{N} = (\pi, \zeta) \)

Q: The demonstrations contain no intent labels, so how do I get \(x\) and \(x^-\)?

Q: How do I leverage the inherent factorization to get \(\pi\) and \(\zeta\)?
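One way to read the factorization question: once every step of a demonstration carries an intent label, \( \pi \) and \( \zeta \) can each be treated as an ordinary policy, \( \pi \) over augmented states \( (s, x) \) with actions \( a \), and \( \zeta \) over augmented states \( (s, x^-) \) whose "actions" are intents. The Python sketch below only builds the two datasets; the function name and the placeholder `x_init` are hypothetical, and how each dataset is then imitated is left to the IL method.

```python
from typing import Any, List, Sequence, Tuple

Step = Tuple[Any, Any, Any]                  # (s, x, a): state, inferred intent, action
Sample = Tuple[Tuple[Any, Any], Any]         # (augmented state, action)


def factor_demonstrations(
    trajectories: Sequence[Sequence[Step]], x_init: Any
) -> Tuple[List[Sample], List[Sample]]:
    """Split intent-labeled demos into one dataset per AMM component.

    d_pi:   ((s, x), a)       -> imitate pi as a policy over augmented states (s, x)
    d_zeta: ((s, x_prev), x)  -> imitate zeta as a policy whose "actions" are intents
    """
    d_pi: List[Sample] = []
    d_zeta: List[Sample] = []
    for traj in trajectories:
        x_prev = x_init  # hypothetical placeholder meaning "no intent yet"
        for s, x, a in traj:
            d_pi.append(((s, x), a))
            d_zeta.append(((s, x_prev), x))
            x_prev = x
    return d_pi, d_zeta


demos = [[(0, "R", "+1"), (1, "R", "+1"), (3, "L", "-1")]]
d_pi, d_zeta = factor_demonstrations(demos, x_init="#")
```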

IDIL

First Theoretical Contribution

 \( \mathcal{N} = (\pi, \zeta) \)

IDIL

Second Theoretical Contribution

IDIL
Objective

 \( \mathcal{N} = (\pi, \zeta) \)

Using the Viterbi Algorithm

Viterbi Algorithm
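The Viterbi algorithm is a dynamic program that recovers the single most likely sequence of hidden intents for a demonstration, given fixed \( \pi \) and \( \zeta \). The Python sketch below assumes tabular \( \pi \) and \( \zeta \) passed as functions that return dictionaries, plus a hypothetical placeholder intent `x_init` before the first step; it scores a sequence by \( \prod_t \zeta(x_t|s_t,x_{t-1})\,\pi(a_t|s_t,x_t) \) in log space.

```python
import math
from typing import Any, Callable, Dict, List, Sequence, Tuple

Dist = Dict[Any, float]


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi_intents(
    traj: Sequence[Tuple[Any, Any]],    # [(s_0, a_0), ..., (s_{T-1}, a_{T-1})]
    intents: Sequence[Any],             # X
    pi: Callable[[Any, Any], Dist],     # pi(s, x) -> P(a | s, x)
    zeta: Callable[[Any, Any], Dist],   # zeta(s, x_prev) -> P(x | s, x_prev)
    x_init: Any,                        # hypothetical placeholder intent before t = 0
) -> Tuple[List[Any], float]:
    """Most likely intents: argmax over x_0..x_{T-1} of prod_t zeta(x_t|s_t,x_{t-1}) * pi(a_t|s_t,x_t)."""
    s0, a0 = traj[0]
    # score[x]: best log-probability of any intent sequence ending in x at the current step
    score = {x: _log(zeta(s0, x_init).get(x, 0.0)) + _log(pi(s0, x).get(a0, 0.0)) for x in intents}
    backptrs: List[Dict[Any, Any]] = []
    for s, a in traj[1:]:
        new_score: Dict[Any, float] = {}
        ptr: Dict[Any, Any] = {}
        for x in intents:
            # best predecessor intent for landing in x at this step
            prev = max(intents, key=lambda xp: score[xp] + _log(zeta(s, xp).get(x, 0.0)))
            new_score[x] = score[prev] + _log(zeta(s, prev).get(x, 0.0)) + _log(pi(s, x).get(a, 0.0))
            ptr[x] = prev
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final intent.
    best = max(intents, key=lambda x: score[x])
    path = [best]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[best]


def toy_pi(s: int, x: str) -> Dist:
    return {"+1": 0.9, "-1": 0.1} if x == "R" else {"+1": 0.1, "-1": 0.9}


def toy_zeta(s: int, x_prev: str) -> Dist:
    if x_prev == "#":                     # uniform choice of the first intent
        return {"L": 0.5, "R": 0.5}
    other = "L" if x_prev == "R" else "R"
    return {x_prev: 0.8, other: 0.2}      # intents tend to persist


demo = [(0, "+1"), (1, "+1"), (2, "-1"), (1, "-1")]
path, logp = viterbi_intents(demo, ["L", "R"], toy_pi, toy_zeta, x_init="#")
print(path)  # ['R', 'R', 'L', 'L']
```

Working in log space avoids underflow on long demonstrations, and zero probabilities map to \( -\infty \) so impossible intent sequences lose to any feasible one.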

Experiments

MultiGoal-n (continuous)

Movers (discrete)

Results

Key Takeaway: IDIL outperforms the other IL methods on problems where diverse human intents exist and can change over time.

Key Takeaway: IDIL performs as well as (or better than) the other IL methods on problems without diverse human intents.

Results

Key Takeaway: IDIL identifies the hidden intent more accurately than the other IL methods that also model intent.

Results

Questions?

