hub Canonical reference

GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data

Shengliang Deng, Mi Yan, Songlin Wei, Haixin Ma, Yuxin Yang, Jiayi Chen · 2025 · cs.RO · arXiv 2505.03233

Canonical reference. 90% of citing Pith papers cite this work as background.

18 Pith papers citing it

Background 90% of classified citations

open full Pith review browse 18 citing papers arXiv PDF

abstract

Embodied foundation models are gaining increasing attention for their zero-shot generalization, scalability, and adaptability to new tasks through few-shot post-training. However, existing models rely heavily on real-world data, which is costly and labor-intensive to collect. Synthetic data offers a cost-effective alternative, yet its potential remains largely underexplored. To bridge this gap, we explore the feasibility of training Vision-Language-Action models entirely with large-scale synthetic action data. We curate SynGrasp-1B, a billion-frame robotic grasping dataset generated in simulation with photorealistic rendering and extensive domain randomization. Building on this, we present GraspVLA, a VLA model pretrained on large-scale synthetic action data as a foundational model for grasping tasks. GraspVLA integrates autoregressive perception tasks and flow-matching-based action generation into a unified Chain-of-Thought process, enabling joint training on synthetic action data and Internet semantics data. This design helps mitigate sim-to-real gaps and facilitates the transfer of learned actions to a broader range of Internet-covered objects, achieving open-vocabulary generalization in grasping. Extensive evaluations across real-world and simulation benchmarks demonstrate GraspVLA's advanced zero-shot generalizability and few-shot adaptability to specific human preferences. We will release SynGrasp-1B dataset and pre-trained weights to benefit the community.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 8 baseline 1 dataset 1

citation-polarity summary

background 9 baseline 1

representative citing papers

FlowHijack: A Dynamics-Aware Backdoor Attack on Flow-Matching Vision-Language-Action Models

cs.CV · 2026-03-30 · unverdicted · novelty 8.0

FlowHijack is the first dynamics-aware backdoor attack on flow-matching VLAs that achieves high success rates with stealthy triggers while preserving benign performance and making malicious actions kinematically indistinguishable from normal ones.

Dexora: Open-source VLA for High-DoF Bimanual Dexterity

cs.RO · 2026-05-18 · unverdicted · novelty 7.0

Dexora is the first open-source VLA system for dual-arm dual-hand high-DoF manipulation, trained on 100K simulated and 10K real teleoperated trajectories with a discriminator-weighted diffusion policy, achieving 66.7% dexterous success versus 51.7% for baselines.

GazeVLA: Learning Human Intention for Robotic Manipulation

cs.RO · 2026-04-24 · unverdicted · novelty 6.0

GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.

CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors

cs.RO · 2026-04-23 · unverdicted · novelty 6.0

CorridorVLA improves VLA models by using predicted sparse anchors to impose explicit spatial corridors on action trajectories, yielding 3.4-12.4% success rate gains on LIBERO-Plus with GR00T-Corr reaching 83.21%.

Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models

cs.RO · 2026-04-14 · unverdicted · novelty 6.0

Vision-geometry backbones using pretrained 3D world models outperform vision-language and video models for robotic manipulation by enabling direct mapping from visual input to geometric actions.

SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds

cs.RO · 2026-04-09 · unverdicted · novelty 6.0

SIM1 converts sparse real demonstrations into high-fidelity synthetic data through physics-aligned simulation, yielding policies that match real-data performance at a 1:15 ratio with 90% zero-shot success on deformable manipulation.

Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation

cs.RO · 2026-02-22 · unverdicted · novelty 6.0

OptimusVLA augments hierarchical VLA models with Global Prior Memory for shorter generative paths and Local Consistency Memory for temporal coherence, yielding higher success rates and 2.9x faster inference on simulation and real-world robotic benchmarks.

DextER: Language-driven Dexterous Grasp Generation with Embodied Reasoning

cs.RO · 2026-01-22 · unverdicted · novelty 6.0

DextER uses contact-based embodied reasoning via autoregressive token generation to produce language-driven dexterous grasps, reaching 67.14% success on DexGYS with a 3.83 p.p. gain over prior methods and 96.4% better intention alignment.

PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation

cs.RO · 2026-01-11 · unverdicted · novelty 6.0

PALM improves long-horizon robotic manipulation success by distilling affordance representations for object interaction and predicting within-subtask progress in a VLA model.

Genie Sim 3.0 : A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot

cs.RO · 2026-01-05 · unverdicted · novelty 6.0

Genie Sim 3.0 introduces an LLM-powered scene generator, the first LLM-based automated evaluation benchmark, and a large open synthetic dataset that demonstrates zero-shot sim-to-real transfer for robotic manipulation policies.

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

cs.CV · 2025-07-06 · unverdicted · novelty 6.0

DreamVLA uses dynamic-region-guided world knowledge prediction, block-wise attention to disentangle information types, and a diffusion transformer for actions, reaching 76.7% success on real robot tasks and 4.44 average length on CALVIN ABC-D.

RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

cs.RO · 2025-06-22 · unverdicted · novelty 6.0

RoboTwin 2.0 automates diverse synthetic data creation for dual-arm robots via MLLMs and five-axis domain randomization, leading to 228-367% gains in manipulation success.

Goal2Skill: Long-Horizon Manipulation with Adaptive Planning and Reflection

cs.RO · 2026-04-15 · unverdicted · novelty 5.0

A dual VLM-VLA framework for long-horizon robot manipulation achieves 32.4% success on RMBench tasks versus 9.8% for the strongest baseline via structured memory and closed-loop adaptive replanning.

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

cs.RO · 2025-08-18 · unverdicted · novelty 5.0

This survey organizes large VLM-based VLA models for robotic manipulation into monolithic and hierarchical paradigms, reviews their integrations and datasets, and outlines future directions.

Towards Robotic Dexterous Hand Intelligence: A Survey

cs.RO · 2026-05-13 · unverdicted · novelty 4.0

A structured survey of dexterous robotic hand research that reviews hardware, control methods, data resources, and benchmarks while identifying major limitations and future directions.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

Vision-Language-Action in Robotics: A Survey of Datasets, Benchmarks, and Data Engines

cs.RO · 2026-04-24 · unverdicted · novelty 3.0

A survey of VLA robotics research identifies data infrastructure as the primary bottleneck and distills four open challenges in representation alignment, multimodal supervision, reasoning assessment, and scalable data generation.

3D Generation for Embodied AI and Robotic Simulation: A Survey

cs.RO · 2026-04-29 · unverdicted · novelty 2.0 · 3 refs

The paper surveys 3D generation techniques for embodied AI and robotics, categorizing them into data generation, simulation environments, and sim-to-real bridging while identifying bottlenecks in physical validity and transfer.

citing papers explorer

Showing 18 of 18 citing papers.

FlowHijack: A Dynamics-Aware Backdoor Attack on Flow-Matching Vision-Language-Action Models cs.CV · 2026-03-30 · unverdicted · none · ref 8 · internal anchor
FlowHijack is the first dynamics-aware backdoor attack on flow-matching VLAs that achieves high success rates with stealthy triggers while preserving benign performance and making malicious actions kinematically indistinguishable from normal ones.
Dexora: Open-source VLA for High-DoF Bimanual Dexterity cs.RO · 2026-05-18 · unverdicted · none · ref 34 · internal anchor
Dexora is the first open-source VLA system for dual-arm dual-hand high-DoF manipulation, trained on 100K simulated and 10K real teleoperated trajectories with a discriminator-weighted diffusion policy, achieving 66.7% dexterous success versus 51.7% for baselines.
GazeVLA: Learning Human Intention for Robotic Manipulation cs.RO · 2026-04-24 · unverdicted · none · ref 21 · internal anchor
GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.
CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors cs.RO · 2026-04-23 · unverdicted · none · ref 28 · internal anchor
CorridorVLA improves VLA models by using predicted sparse anchors to impose explicit spatial corridors on action trajectories, yielding 3.4-12.4% success rate gains on LIBERO-Plus with GR00T-Corr reaching 83.21%.
Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models cs.RO · 2026-04-14 · unverdicted · none · ref 16 · internal anchor
Vision-geometry backbones using pretrained 3D world models outperform vision-language and video models for robotic manipulation by enabling direct mapping from visual input to geometric actions.
SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds cs.RO · 2026-04-09 · unverdicted · none · ref 17 · internal anchor
SIM1 converts sparse real demonstrations into high-fidelity synthetic data through physics-aligned simulation, yielding policies that match real-data performance at a 1:15 ratio with 90% zero-shot success on deformable manipulation.
Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation cs.RO · 2026-02-22 · unverdicted · none · ref 7 · internal anchor
OptimusVLA augments hierarchical VLA models with Global Prior Memory for shorter generative paths and Local Consistency Memory for temporal coherence, yielding higher success rates and 2.9x faster inference on simulation and real-world robotic benchmarks.
DextER: Language-driven Dexterous Grasp Generation with Embodied Reasoning cs.RO · 2026-01-22 · unverdicted · none · ref 9 · internal anchor
DextER uses contact-based embodied reasoning via autoregressive token generation to produce language-driven dexterous grasps, reaching 67.14% success on DexGYS with a 3.83 p.p. gain over prior methods and 96.4% better intention alignment.
PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation cs.RO · 2026-01-11 · unverdicted · none · ref 21 · internal anchor
PALM improves long-horizon robotic manipulation success by distilling affordance representations for object interaction and predicting within-subtask progress in a VLA model.
Genie Sim 3.0 : A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot cs.RO · 2026-01-05 · unverdicted · none · ref 10 · internal anchor
Genie Sim 3.0 introduces an LLM-powered scene generator, the first LLM-based automated evaluation benchmark, and a large open synthetic dataset that demonstrates zero-shot sim-to-real transfer for robotic manipulation policies.
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge cs.CV · 2025-07-06 · unverdicted · none · ref 83 · internal anchor
DreamVLA uses dynamic-region-guided world knowledge prediction, block-wise attention to disentangle information types, and a diffusion transformer for actions, reaching 76.7% success on real robot tasks and 4.44 average length on CALVIN ABC-D.
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation cs.RO · 2025-06-22 · unverdicted · none · ref 11 · internal anchor
RoboTwin 2.0 automates diverse synthetic data creation for dual-arm robots via MLLMs and five-axis domain randomization, leading to 228-367% gains in manipulation success.
Goal2Skill: Long-Horizon Manipulation with Adaptive Planning and Reflection cs.RO · 2026-04-15 · unverdicted · none · ref 3 · internal anchor
A dual VLM-VLA framework for long-horizon robot manipulation achieves 32.4% success on RMBench tasks versus 9.8% for the strongest baseline via structured memory and closed-loop adaptive replanning.
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey cs.RO · 2025-08-18 · unverdicted · none · ref 143 · internal anchor
This survey organizes large VLM-based VLA models for robotic manipulation into monolithic and hierarchical paradigms, reviews their integrations and datasets, and outlines future directions.
Towards Robotic Dexterous Hand Intelligence: A Survey cs.RO · 2026-05-13 · unverdicted · none · ref 155 · internal anchor
A structured survey of dexterous robotic hand research that reviews hardware, control methods, data resources, and benchmarks while identifying major limitations and future directions.
World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 169 · internal anchor
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
Vision-Language-Action in Robotics: A Survey of Datasets, Benchmarks, and Data Engines cs.RO · 2026-04-24 · unverdicted · none · ref 6 · internal anchor
A survey of VLA robotics research identifies data infrastructure as the primary bottleneck and distills four open challenges in representation alignment, multimodal supervision, reasoning assessment, and scalable data generation.
3D Generation for Embodied AI and Robotic Simulation: A Survey cs.RO · 2026-04-29 · unverdicted · none · ref 190 · 3 links · internal anchor
The paper surveys 3D generation techniques for embodied AI and robotics, categorizing them into data generation, simulation environments, and sim-to-real bridging while identifying bottlenecks in physical validity and transfer.

GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer