CLAM: Continuous Latent Action Models for Robot Learning from Unlabeled Demonstrations

Anthony Liang; Erdem Biyik; Matthew Hong; Pavel Czempin; Stephen Tu; Yutai Zhou

arXiv: 2505.04999 · v1 · pith:CD47YDHFnew · submitted 2025-05-08 · cs.RO · cs.AI · cs.LG

keywords: action, continuous, latent, clam, data, learning, robot, unlabeled

Abstract

Learning robot policies using imitation learning requires collecting large amounts of costly action-labeled expert demonstrations, which fundamentally limits the scale of training data. A promising approach to address this bottleneck is to harness the abundance of unlabeled observations (e.g., from video demonstrations) to learn latent action labels in an unsupervised way. However, we find that existing methods struggle when applied to complex robot tasks requiring fine-grained motions. We design continuous latent action models (CLAM), which incorporate two key ingredients we find necessary for learning to solve complex continuous control tasks from unlabeled observation data: (a) using continuous latent action labels instead of discrete representations, and (b) jointly training an action decoder to ensure that the latent action space can be easily grounded to real actions with relatively few labeled examples. Importantly, the labeled examples can be collected from non-optimal play data, enabling CLAM to learn performant policies without access to any action-labeled expert data. We demonstrate on continuous control benchmarks in DMControl (locomotion) and MetaWorld (manipulation), as well as on a real WidowX robot arm, that CLAM significantly outperforms prior state-of-the-art methods, with a 2-3x improvement in task success rate compared to the best baseline. Videos and code can be found at clamrobot.github.io.
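The two ingredients named in the abstract, a continuous latent action and an action decoder trained jointly with the latent action model, can be illustrated with a toy objective. This is a minimal sketch under stated assumptions, not the authors' implementation (their code is linked at clamrobot.github.io). It assumes linear inverse dynamics, forward dynamics, and decoder maps, squared-error losses, and a weighting parameter `action_weight`. All function names are hypothetical. Unlabeled transitions carry `None` in place of an action.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# (observation, next observation, action or None when the transition is unlabeled)
Transition = Tuple[Vector, Vector, Optional[Vector]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def infer_latent(w_idm: Matrix, obs: Vector, next_obs: Vector) -> Vector:
    # Hypothetical linear inverse dynamics model: z = W_idm (o_{t+1} - o_t).
    # z stays continuous: there is no codebook lookup or quantization step.
    delta = [b - a for a, b in zip(obs, next_obs)]
    return matvec(w_idm, delta)


def predict_next(w_fdm: Matrix, obs: Vector, z: Vector) -> Vector:
    # Hypothetical linear forward dynamics model: o_{t+1} is predicted as o_t + W_fdm z.
    return [o + d for o, d in zip(obs, matvec(w_fdm, z))]


def joint_loss(
    w_idm: Matrix,
    w_fdm: Matrix,
    w_dec: Matrix,
    transitions: Sequence[Transition],
    action_weight: float = 1.0,
) -> Tuple[float, float, float]:
    """Return (total, reconstruction, action) losses over a mixed batch."""
    recon_terms: List[float] = []
    action_terms: List[float] = []
    for obs, next_obs, action in transitions:
        z = infer_latent(w_idm, obs, next_obs)
        # Every transition, labeled or not, contributes to the reconstruction term.
        recon_terms.append(mse(predict_next(w_fdm, obs, z), next_obs))
        # Only labeled transitions train the action decoder, which predicts W_dec z.
        if action is not None:
            action_terms.append(mse(matvec(w_dec, z), action))
    recon = sum(recon_terms) / len(recon_terms)
    act = sum(action_terms) / len(action_terms) if action_terms else 0.0
    return recon + action_weight * act, recon, act


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    batch = [
        ([0.0, 0.0], [1.0, 2.0], [1.0, 2.0]),  # labeled transition
        ([1.0, 1.0], [1.0, 0.0], None),        # unlabeled transition
    ]
    print(joint_loss(eye, eye, zeros, batch))  # (2.5, 0.0, 2.5)
```

In this sketch the decoder reads the same latent `z` that the dynamics models use. Minimizing the combined loss therefore pressures `z` to keep information that maps onto real actions, even when only a small fraction of transitions is labeled. A real model would replace the linear maps with learned networks and minimize the combined loss by gradient descent.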

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RotVLA: Rotational Latent Action for Vision-Language-Action Model
cs.RO 2026-05 unverdicted novelty 7.0

RotVLA models latent actions as continuous SO(n) rotations with triplet-frame supervision and flow-matching to reach 98.2% success on LIBERO and 89.6%/88.5% on RoboTwin2.0 using a 1.7B-parameter model.
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
cs.RO 2026-02 unverdicted novelty 7.0

DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robo...
Imitation from Heterogeneous Demonstrations using Grounded Latent-Action World Models
cs.RO 2026-06 unverdicted novelty 6.0

GLAM learns a shared latent action space grounded in consistent future observation prediction across heterogeneous data sources to train improved behavioral cloning policies for robot manipulation tasks.
PoLAR: Factorizing Extent and Mode in Latent Actions for Robot Policy Learning
cs.RO 2026-06 unverdicted novelty 6.0

PoLAR imposes radial structure on latent actions in hyperbolic space to factorize extent and mode, improving robot policy performance over baselines.
LARA: Latent Action Representation Alignment for Vision-Language-Action Models
cs.CV 2026-06 unverdicted novelty 6.0

LARA jointly optimizes LAM and VLA models via representation alignment to improve robotic manipulation performance using human videos.
SCAR: Self-Supervised Continuous Action Representation Learning
cs.RO 2026-05 unverdicted novelty 6.0

SCAR proposes a joint inverse-forward dynamics framework to learn transferable continuous action representations across embodiments from visual data using regularization and adversarial invariance.
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
cs.RO 2025-07 unverdicted novelty 6.0

villa-X enhances latent action modeling in VLA models to support zero-shot action planning for unseen robot embodiments and open-vocabulary instructions, yielding better manipulation results in simulation and real-wor...
LARA: Latent Action Representation Alignment for Vision-Language-Action Models
cs.CV 2026-06 unverdicted novelty 5.0

LARA jointly optimizes LAM and VLA models via representation alignment, reporting average gains of ~10%, ~5%, and ~15% on simulation and real robotic manipulation tasks.