LIBERO-Safety supplies a scalable benchmark, data-generation pipeline, and 19,664-demonstration dataset that exposes a generalization-safety tension in current VLA models where diverse training improves collision avoidance but task success stays limited by trajectory quality and semantic understandi
RynnVLA-001: Using human demonstrations to improve robot manipulation
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Reducing visual input to one token per frame in VLA world models maintains or improves long-horizon performance on MetaWorld, LIBERO, and real-robot tasks.
APT pretrains the action expert as a vision-action prior on frozen VLM features then adds language through gated fusion to improve OOD instruction generalization in continuous-action VLA policies.
ELAN4D introduces plug-and-play 4D keypoint track supervision from forward kinematics to enhance VLA policy generalization in robotic manipulation tasks.
Adaptive Action Chunking uses action entropy to dynamically adjust chunk sizes in VLA models, improving performance on simulated and real robotic manipulation tasks.
HiF-VLA improves long-horizon robotic manipulation by encoding past motion as hindsight priors and anticipating future motion through foresight reasoning inside a VLA framework.
GEM adds generative depth supervision to VLM pre-training and reports improved results on embodied benchmarks plus real-world robot execution.
citing papers explorer
-
LIBERO-Safety: A Comprehensive Benchmark for Physical and Semantic Safety in Vision-Language-Action Models
LIBERO-Safety supplies a scalable benchmark, data-generation pipeline, and 19,664-demonstration dataset that exposes a generalization-safety tension in current VLA models where diverse training improves collision avoidance but task success stays limited by trajectory quality and semantic understandi
-
APT: Action Expert Pretraining Improves Instruction Generalization of Vision-Language-Action Policies
APT pretrains the action expert as a vision-action prior on frozen VLM features then adds language through gated fusion to improve OOD instruction generalization in continuous-action VLA policies.
-
ELAN4D: Embodiment-Centric 4D Supervision for Vision-Language-Action Models via Plug-and-Play Adaptation
ELAN4D introduces plug-and-play 4D keypoint track supervision from forward kinematics to enhance VLA policy generalization in robotic manipulation tasks.
-
Adaptive Action Chunking at Inference-time for Vision-Language-Action Models
Adaptive Action Chunking uses action entropy to dynamically adjust chunk sizes in VLA models, improving performance on simulated and real robotic manipulation tasks.
-
HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models
HiF-VLA improves long-horizon robotic manipulation by encoding past motion as hindsight priors and anticipating future motion through foresight reasoning inside a VLA framework.
-
GEM: Generative Supervision Helps Embodied Intelligence
GEM adds generative depth supervision to VLM pre-training and reports improved results on embodied benchmarks plus real-world robot execution.