Learning Additively Compositional Latent Actions for Embodied AI

2026 · cs.CV · arXiv 2604.03340

2 Pith papers cite this work.

abstract

Latent action learning infers pseudo-action labels from visual transitions, offering a way to leverage internet-scale video for embodied AI. However, most methods learn latent actions without structural priors that encode the additive, compositional structure of physical motion. As a result, the learned latents often entangle true state changes with irrelevant scene details or information about future observations, and they miscalibrate motion magnitude. We introduce the Additively Compositional Latent Action Model (AC-LAM), which enforces scene-wise additive compositional structure on the latent action space over short horizons. These AC constraints encourage simple algebraic structure in the latent action space (identity, inverse, and cycle consistency) and suppress information that does not compose additively. Empirically, AC-LAM learns more structured, motion-specific, and displacement-calibrated latent actions and provides stronger supervision for downstream policy learning, outperforming state-of-the-art latent action models (LAMs) across simulated and real-world tabletop tasks.
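The algebraic properties named in the abstract can be written as residuals that should vanish when latent actions compose additively. The sketch below is a minimal illustration under stated assumptions, not AC-LAM's actual architecture or loss: it assumes a hypothetical encoder `encode(o, o_next)` that maps a pair of observations to a latent action vector, uses squared-error penalties, and treats observations as plain 2D positions. The names `ac_losses`, `displacement_encoder`, and `leaky_encoder` are all hypothetical.

```python
from typing import Callable, Dict, List

Vec = List[float]
# Hypothetical latent action encoder: (observation, next observation) -> latent action.
Encoder = Callable[[Vec, Vec], Vec]


def add(a: Vec, b: Vec) -> Vec:
    return [x + y for x, y in zip(a, b)]


def neg(a: Vec) -> Vec:
    return [-x for x in a]


def sq_norm(a: Vec) -> float:
    return sum(x * x for x in a)


def ac_losses(encode: Encoder, o0: Vec, o1: Vec, o2: Vec) -> Dict[str, float]:
    """Squared residuals of the additive-composition properties (illustrative only)."""
    z01, z12, z02 = encode(o0, o1), encode(o1, o2), encode(o0, o2)
    z10, z20, z00 = encode(o1, o0), encode(o2, o0), encode(o0, o0)
    return {
        # composition: z(o0, o2) should equal z(o0, o1) + z(o1, o2)
        "compose": sq_norm(add(z02, neg(add(z01, z12)))),
        # identity: no change between identical observations maps to zero
        "identity": sq_norm(z00),
        # inverse: reversing a transition negates its latent action
        "inverse": sq_norm(add(z10, z01)),
        # cycle: o0 -> o1 -> o2 -> o0 should sum to zero
        "cycle": sq_norm(add(add(z01, z12), z20)),
    }


def displacement_encoder(o: Vec, o_next: Vec) -> Vec:
    # Toy encoder that outputs only the displacement, so it composes exactly.
    return [b - a for a, b in zip(o, o_next)]


def leaky_encoder(o: Vec, o_next: Vec) -> Vec:
    # Toy encoder that also leaks the future observation itself into the latent,
    # which breaks additive composition.
    return [(b - a) + 0.5 * b for a, b in zip(o, o_next)]


if __name__ == "__main__":
    o0, o1, o2 = [1.0, 1.0], [2.0, 1.0], [2.0, 3.0]
    for name, enc in [("displacement", displacement_encoder), ("leaky", leaky_encoder)]:
        print(name, ac_losses(enc, o0, o1, o2))
```

With these toy inputs the displacement-only encoder gets zero on all four terms, while the encoder that mixes in the future observation is penalized on every term. This mirrors, in a deliberately simplified setting, the abstract's claim that additive constraints suppress information that does not compose additively. How the terms would be weighted or combined into a training objective is left open here, since the abstract does not specify it.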

representative citing papers

Latent Actions from Factorized Transition Effects under Agent Ambiguity

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

OTF decomposes transitions into reusable primitives to form action-like latents in OTF-LAM and OTF-LAM-Dino, enabling zero-shot transfer and competitive policy learning under visual ambiguity.

ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models

cs.RO · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

ALAM introduces algebraic consistency regularization on latent action transitions from videos, raising VLA success rates from 47.9% to 85.0% on MetaWorld MT50 and from 94.1% to 98.1% on LIBERO.
