
arxiv: 2603.22713 · v2 · pith:ISUJIIQSnew · submitted 2026-03-24 · 💻 cs.LG

Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Value Flow Mechanism

classification 💻 cs.LG
keywords compounding · errors · imitation · demonstrations · dual · iq-learn · learning · non-adversarial

Adversarial imitation learning (AIL) achieves high-quality imitation by mitigating compounding errors inherent to behavioral cloning (BC), yet its adversarial optimization frequently leads to training instability. A class of non-adversarial Q-based imitation learning (IL) methods, exemplified by IQ-Learn, has emerged to address this instability and is widely believed to outperform BC by leveraging online environment interactions. In this paper, we revisit IQ-Learn and prove that it in fact reduces to BC: it admits an imitation gap lower bound with quadratic dependence on the horizon and therefore remains susceptible to compounding errors. Our theoretical analysis reveals why online interactions fail to help: IQ-Learn uniformly suppresses Q-values for all actions at states not covered by demonstrations, preventing generalization beyond demonstrations. To address this fundamental limitation, we introduce Dual Q-DM, a new Q-based IL method built on Bellman constraints. Crucially, Bellman constraints drive value flow: Q-values propagate from demonstrated to unvisited states through environment dynamics, enabling generalization beyond demonstrations. We prove that Dual Q-DM is equivalent to AIL and can recover expert actions at unvisited states, thereby mitigating compounding errors. To the best of our knowledge, Dual Q-DM is the first non-adversarial IL method that is theoretically guaranteed to eliminate compounding errors. Experimental results further corroborate our theoretical findings.
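The "value flow" idea in the abstract can be illustrated with a toy example. The sketch below is not Dual Q-DM or IQ-Learn; it is a plain tabular Bellman backup on a hypothetical 6-state chain, with an assumed reward of 1 on demonstrated state-action pairs and 0 elsewhere. All names (`step`, `bellman_q`, `demonstrated`) and the constants are illustrative assumptions. It only shows how Q-values defined on demonstrated states can propagate through the dynamics to states the demonstrations never visit.

```python
import numpy as np

N_STATES, LEFT, RIGHT = 6, 0, 1
GAMMA = 0.9  # assumed discount factor


def step(s, a):
    """Deterministic chain dynamics: move left or right, clipped at the ends."""
    return max(s - 1, 0) if a == LEFT else min(s + 1, N_STATES - 1)


# Hypothetical demonstrations: the expert moves right in states 3, 4, 5.
demonstrated = {(3, RIGHT), (4, RIGHT), (5, RIGHT)}

# Assumed reward: 1 on demonstrated (state, action) pairs, 0 elsewhere.
reward = np.zeros((N_STATES, 2))
for s, a in demonstrated:
    reward[s, a] = 1.0


def bellman_q(reward, n_iters=200):
    """Tabular Q-value iteration: Q(s,a) = r(s,a) + gamma * max_a' Q(s',a')."""
    q = np.zeros_like(reward)
    for _ in range(n_iters):
        v = q.max(axis=1)  # greedy state values under the current Q
        q = np.array([[reward[s, a] + GAMMA * v[step(s, a)]
                       for a in (LEFT, RIGHT)]
                      for s in range(N_STATES)])
    return q


# Without any backup, values stay where the demonstrations are:
# states 0..2 carry no signal and no action is preferred there.
q_local = reward.copy()

# With Bellman backups, value flows from demonstrated states (3..5)
# back to the unvisited states (0..2) through the dynamics.
q_flow = bellman_q(reward)
greedy_flow = q_flow.argmax(axis=1)

print("Q without backup:\n", q_local)
print("Q with backup:\n", np.round(q_flow, 2))
print("Greedy actions with backup:", greedy_flow)
```

In this toy setting the greedy policy under `q_flow` moves right in every state, including the three states that no demonstration covers, whereas `q_local` is identically zero there and expresses no preference.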
