UniLACT improves VLA models by adding depth-aware unified latent action pretraining that outperforms RGB-only baselines on seen and unseen manipulation tasks.
hub
Vision-language-action model with open- world embodied reasoning from pretrained knowledge
10 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
fields
cs.RO 10roles
background 2polarities
background 2representative citing papers
VLA-ATTC equips VLA models with adaptive test-time compute via an uncertainty clutch and relative action critic, cutting failure rates by over 50% on LIBERO-LONG.
Interventional attribution via ISS and NMR diagnoses causal misalignment in VLA policies and predicts their generalization performance across manipulation tasks.
GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for consistent environments.
DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.
PhysBrain 1.0 extracts scene elements, spatial dynamics, actions and depth relations from human egocentric video to create QA supervision for VLMs, then transfers the resulting physical priors to VLA policies via capability-preserving adaptation.
This survey organizes large VLM-based VLA models for robotic manipulation into monolithic and hierarchical paradigms, reviews their integrations and datasets, and outlines future directions.
SafeVLA applies constrained reinforcement learning via CMDP min-max optimization to VLAs, cutting safety violation costs by 83.58% while preserving task success on long-horizon mobile manipulation tasks.
Systematic tests of VLM backbones, policy architectures, and cross-embodiment data yield RoboVLMs that set new SOTA on robot manipulation benchmarks while requiring few manual designs.
citing papers explorer
-
UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models
UniLACT improves VLA models by adding depth-aware unified latent action pretraining that outperforms RGB-only baselines on seen and unseen manipulation tasks.
-
VLA-ATTC: Adaptive Test-Time Compute for VLA Models with Relative Action Critic Model
VLA-ATTC equips VLA models with adaptive test-time compute via an uncertainty clutch and relative action critic, cutting failure rates by over 50% on LIBERO-LONG.
-
Embodied Interpretability: Linking Causal Understanding to Generalization in Vision-Language-Action Models
Interventional attribution via ISS and NMR diagnoses causal misalignment in VLA policies and predicts their generalization performance across manipulation tasks.
-
GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning
GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for consistent environments.
-
DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization
DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.
-
PhysBrain 1.0 Technical Report
PhysBrain 1.0 extracts scene elements, spatial dynamics, actions and depth relations from human egocentric video to create QA supervision for VLMs, then transfers the resulting physical priors to VLA policies via capability-preserving adaptation.
-
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey
This survey organizes large VLM-based VLA models for robotic manipulation into monolithic and hierarchical paradigms, reviews their integrations and datasets, and outlines future directions.
-
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
SafeVLA applies constrained reinforcement learning via CMDP min-max optimization to VLAs, cutting safety violation costs by 83.58% while preserving task success on long-horizon mobile manipulation tasks.
-
What Matters in Building Vision-Language-Action Models for Generalist Robots
Systematic tests of VLM backbones, policy architectures, and cross-embodiment data yield RoboVLMs that set new SOTA on robot manipulation benchmarks while requiring few manual designs.
- Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery