From Action Labels to Sets: Rethinking Action Supervision for Imitation Learning from Corrective Feedback

2025 · cs.RO · arXiv 2502.07645

1 Pith paper cites this work. Polarity classification is still indexing.



abstract

Behavior cloning (BC) optimizes policies by treating human demonstrations as pointwise action labels. While effective with accurate action labels, this formulation is brittle in practice: when human-provided actions are imperfect, treating each label as an exact target can steer the policy away from the underlying desired behavior, particularly when expressive models are used (e.g., energy-based models). To address this, we propose a human-in-the-loop alternative that replaces pointwise supervision with set-valued action targets. We introduce Contrastive policy Learning from Interactive Corrections (CLIC). CLIC leverages human corrections to construct and refine sets of desired actions, and optimizes a policy to place probability mass over these sets rather than over a single action target. This formulation naturally accommodates both absolute and relative corrections and can represent complex multi-modal behaviors. Extensive simulation and real-robot experiments show that the proposed approach leads to effective policy learning across diverse settings: CLIC remains competitive with the state of the art under accurate data while being substantially more robust under noisy, relative, and partial feedback. Our implementation is publicly available at https://clic-webpage.github.io/.
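
To make the contrast between pointwise and set-valued supervision concrete, the sketch below compares the two on a toy 1-D action grid. It is an illustration under stated assumptions, not the CLIC implementation: the half-space rule in `desired_set_mask`, the discretized candidate grid, and all function names are hypothetical choices made for this example.

```python
import numpy as np

def desired_set_mask(candidates, robot_action, human_action):
    """Assumed set construction (hypothetical): keep candidate actions that lie
    closer to the human's corrected action than to the robot's original action."""
    d_human = np.linalg.norm(candidates - human_action, axis=1)
    d_robot = np.linalg.norm(candidates - robot_action, axis=1)
    return d_human < d_robot

def log_softmax(logits):
    """Numerically stable log-probabilities over the candidate actions."""
    return logits - np.logaddexp.reduce(logits)

def pointwise_nll(logits, candidates, label):
    """BC-style loss: negative log-probability of the single candidate nearest the label."""
    nearest = np.argmin(np.linalg.norm(candidates - label, axis=1))
    return -log_softmax(logits)[nearest]

def set_nll(logits, mask):
    """Set-valued loss: negative log of the total probability mass inside the set."""
    return -np.logaddexp.reduce(log_softmax(logits)[mask])

# Toy setup: 21 candidate actions on [-1, 1]; the human nudges the robot to the right.
candidates = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)
robot_action = np.array([0.0])
human_action = np.array([0.5])  # possibly imprecise in magnitude, correct in direction
mask = desired_set_mask(candidates, robot_action, human_action)

uniform_logits = np.zeros(len(candidates))
shifted_logits = np.where(mask, 2.0, -2.0)  # policy that favors the desired set

uniform_set_loss = set_nll(uniform_logits, mask)
shifted_set_loss = set_nll(shifted_logits, mask)
uniform_point_loss = pointwise_nll(uniform_logits, candidates, human_action)
```

In this toy setting the set-valued loss drops as soon as probability mass moves anywhere into the desired region, whereas the pointwise loss rewards only mass placed on the single labeled action, which is what makes it sensitive to an imprecise label.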

representative citing papers

Wavelet Policy: Imitation Learning in the Scale Domain with World Prior Memory

cs.RO · 2025-04-07 · unverdicted · novelty 6.0

Wavelet Policy combines world prior memory from background images with wavelet-domain multi-scale action modeling via a single-encoder multiple-decoder architecture to improve long-horizon robotic imitation learning.

citing papers explorer


Wavelet Policy: Imitation Learning in the Scale Domain with World Prior Memory

cs.RO · 2025-04-07 · unverdicted · none · ref 15 · internal anchor
