arXiv preprint arXiv:2505.08787 · 2025

9 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models

cs.RO · 2026-02-23 · unverdicted · novelty 7.0

UniLACT improves VLA models by adding depth-aware unified latent action pretraining that outperforms RGB-only baselines on seen and unseen manipulation tasks.

Imitation from Heterogeneous Demonstrations using Grounded Latent-Action World Models

cs.RO · 2026-06-19 · unverdicted · novelty 6.0

GLAM learns a shared latent action space grounded in consistent future observation prediction across heterogeneous data sources to train improved behavioral cloning policies for robot manipulation tasks.

Efficient Skill Grounding via Code Refactoring with Small Language Models

cs.AI · 2026-06-06 · unverdicted · novelty 6.0

RECENT decouples skill semantics from embodiment-specific bindings via code refactoring to let small language models achieve skill grounding performance matching large language model baselines.

PHASOR: Phase-Anchored Universal Action Representations for Humanoid Embodiments

cs.RO · 2026-06-01 · unverdicted · novelty 6.0

PHASOR factorizes motion into an FFT-based phase manifold and pose branch with semantic distillation to produce a cross-embodiment, human-anchored action embedding space for humanoid robots.

HARP-VLA: Human-Robot Aligned Representation Learning for Vision-Language-Action Model

cs.RO · 2026-05-29 · unverdicted · novelty 6.0

HARP aligns human-robot visual and latent action representations via paired bridges and unpaired dynamics supervision to boost VLA policy performance on manipulation tasks.

LACE: Latent Visual Representation for Cross-Embodiment Learning

cs.RO · 2026-05-16 · unverdicted · novelty 6.0

LACE aligns human-robot visual features via semantic distribution matching on corresponding body parts plus Gram loss, yielding 65% better zero-shot policy transfer than baseline DINO.

DiLA: Disentangled Latent Action World Models

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

DiLA uses content-structure disentanglement driven by predictive bottlenecks to create semantically structured latent actions for high-fidelity video world models.

Bridging the Embodiment Gap: Disentangled Cross-Embodiment Video Editing

cs.RO · 2026-05-05 · unverdicted · novelty 6.0

A dual-contrastive disentanglement method factorizes videos into independent task and embodiment latents, then uses a parameter-efficient adapter on a frozen video diffusion model to synthesize robot executions from single human demonstrations without paired data.

CORE: Common Outcome Regularities from Action-Free Visual Demonstrations for Robot Manipulation

cs.RO · 2026-06-28 · unverdicted · novelty 5.0

CORE extracts visual goal prototypes from terminal embeddings in action-free demonstrations to condition robot policies, reporting success rate gains of up to 17 percentage points on manipulation benchmarks.

citing papers explorer

Showing 9 of 9 citing papers.

UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models

cs.RO · 2026-02-23 · unverdicted · none · ref 19

Imitation from Heterogeneous Demonstrations using Grounded Latent-Action World Models

cs.RO · 2026-06-19 · unverdicted · none · ref 48

Efficient Skill Grounding via Code Refactoring with Small Language Models

cs.AI · 2026-06-06 · unverdicted · none · ref 109

PHASOR: Phase-Anchored Universal Action Representations for Humanoid Embodiments

cs.RO · 2026-06-01 · unverdicted · none · ref 39

HARP-VLA: Human-Robot Aligned Representation Learning for Vision-Language-Action Model

cs.RO · 2026-05-29 · unverdicted · none · ref 3

LACE: Latent Visual Representation for Cross-Embodiment Learning

cs.RO · 2026-05-16 · unverdicted · none · ref 36

DiLA: Disentangled Latent Action World Models

cs.CV · 2026-05-15 · unverdicted · none · ref 15

Bridging the Embodiment Gap: Disentangled Cross-Embodiment Video Editing

cs.RO · 2026-05-05 · unverdicted · none · ref 20

CORE: Common Outcome Regularities from Action-Free Visual Demonstrations for Robot Manipulation

cs.RO · 2026-06-28 · unverdicted · none · ref 17
