Dexora is the first open-source VLA system for dual-arm dual-hand high-DoF manipulation, trained on 100K simulated and 10K real teleoperated trajectories with a discriminator-weighted diffusion policy, achieving 66.7% dexterous success versus 51.7% for baselines.
hub Canonical reference
Dexgraspvla: A vision-language-action framework towards general dexterous grasping
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 16roles
background 7polarities
background 7representative citing papers
GuardVLA embeds a stealthy backdoor watermark in VLAs via secret messages in visual data and uses a swap-and-detect mechanism for post-release ownership verification that preserves task performance.
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
BiDexGrasp supplies a 9.7-million-grasp bimanual dexterous dataset built via two-stage synthesis and a coordinated geometry-size-adaptive model that generates grasps for unseen objects.
VP-VLA decouples high-level reasoning from low-level control in VLA models by rendering spatial anchors as visual prompts directly in the RGB observation space, outperforming end-to-end baselines.
PhysGen uses video models to learn physics for robots, outperforming baselines by up to 13.8% on Libero and matching specialized models in real-world tasks.
UniSteer unifies human corrective actions and noise-space RL for VLA adaptation by inverting actions to noise targets, raising success rates from 20% to 90% in 66 minutes across four real-world manipulation tasks.
SpaceDex achieves 63% success grasping unseen objects in tiered workspaces via VLM spatial planning and arm-hand feature separation, beating a 39% tabletop baseline in 100 real trials.
HiVLA decouples VLM-based semantic planning with visual grounding from a cascaded cross-attention DiT action expert, outperforming end-to-end VLAs on long-horizon and fine-grained manipulation.
BLaDA grounds open-vocabulary language into functional dexterous manipulation via knowledge-guided parsing, triangular localization in 3DGS fields, and keypoint grasp execution.
A JAX-implemented flow-based equivariant model for multi-embodiment grasping that deduces kinematics from geometry to support variable-DoF grippers with a new dataset of 25k scenes and 20M grasps.
GD2P generates and learns dexterous hand poses for nonprehensile pushing and pulling by combining contact-guided sampling, physics-based filtering, and a geometry-conditioned diffusion model, demonstrated on Allegro and LEAP hands in real-world tests.
This survey organizes large VLM-based VLA models for robotic manipulation into monolithic and hierarchical paradigms, reviews their integrations and datasets, and outlines future directions.
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
SafeVLA applies constrained reinforcement learning via CMDP min-max optimization to VLAs, cutting safety violation costs by 83.58% while preserving task success on long-horizon mobile manipulation tasks.
A structured survey of dexterous robotic hand research that reviews hardware, control methods, data resources, and benchmarks while identifying major limitations and future directions.
citing papers explorer
-
Dexora: Open-source VLA for High-DoF Bimanual Dexterity
Dexora is the first open-source VLA system for dual-arm dual-hand high-DoF manipulation, trained on 100K simulated and 10K real teleoperated trajectories with a discriminator-weighted diffusion policy, achieving 66.7% dexterous success versus 51.7% for baselines.
-
Towards Backdoor-Based Ownership Verification for Vision-Language-Action Models
GuardVLA embeds a stealthy backdoor watermark in VLAs via secret messages in visual data and uses a swap-and-detect mechanism for post-release ownership verification that preserves task performance.
-
Being-H0.7: A Latent World-Action Model from Egocentric Videos
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
-
BiDexGrasp: Coordinated Bimanual Dexterous Grasps across Object Geometries and Sizes
BiDexGrasp supplies a 9.7-million-grasp bimanual dexterous dataset built via two-stage synthesis and a coordinated geometry-size-adaptive model that generates grasps for unseen objects.
-
VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
VP-VLA decouples high-level reasoning from low-level control in VLA models by rendering spatial anchors as visual prompts directly in the RGB observation space, outperforming end-to-end baselines.
-
Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation
PhysGen uses video models to learn physics for robots, outperforming baselines by up to 13.8% on Libero and matching specialized models in real-world tasks.
-
Unified Noise Steering for Efficient Human-Guided VLA Adaptation
UniSteer unifies human corrective actions and noise-space RL for VLA adaptation by inverting actions to noise targets, raising success rates from 20% to 90% in 66 minutes across four real-world manipulation tasks.
-
SpaceDex: Generalizable Dexterous Grasping in Tiered Workspaces
SpaceDex achieves 63% success grasping unseen objects in tiered workspaces via VLM spatial planning and arm-hand feature separation, beating a 39% tabletop baseline in 100 real trials.
-
HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
HiVLA decouples VLM-based semantic planning with visual grounding from a cascaded cross-attention DiT action expert, outperforming end-to-end VLAs on long-horizon and fine-grained manipulation.
-
BLaDA: Bridging Language to Functional Dexterous Actions within 3DGS Fields
BLaDA grounds open-vocabulary language into functional dexterous manipulation via knowledge-guided parsing, triangular localization in 3DGS fields, and keypoint grasp execution.
-
Towards a Multi-Embodied Grasping Agent
A JAX-implemented flow-based equivariant model for multi-embodiment grasping that deduces kinematics from geometry to support variable-DoF grippers with a new dataset of 25k scenes and 20M grasps.
-
Learning Geometry-Aware Nonprehensile Pushing and Pulling with Dexterous Hands
GD2P generates and learns dexterous hand poses for nonprehensile pushing and pulling by combining contact-guided sampling, physics-based filtering, and a geometry-conditioned diffusion model, demonstrated on Allegro and LEAP hands in real-world tests.
-
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey
This survey organizes large VLM-based VLA models for robotic manipulation into monolithic and hierarchical paradigms, reviews their integrations and datasets, and outlines future directions.
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
-
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
SafeVLA applies constrained reinforcement learning via CMDP min-max optimization to VLAs, cutting safety violation costs by 83.58% while preserving task success on long-horizon mobile manipulation tasks.
-
Towards Robotic Dexterous Hand Intelligence: A Survey
A structured survey of dexterous robotic hand research that reviews hardware, control methods, data resources, and benchmarks while identifying major limitations and future directions.