dVLA: Diffusion vision-language-action model with multimodal chain-of-thought. arXiv preprint arXiv:2509.25681

2025 · arXiv 2509.25681

13 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3 · baseline 1

citation-polarity summary

background 3 · baseline 1

representative citing papers

Thinking in Text and Images: Interleaved Vision-Language Reasoning Traces for Long-Horizon Robot Manipulation

cs.AI · 2026-05-01 · unverdicted · novelty 7.0

A multimodal transformer generates and caches interleaved text-image traces to guide closed-loop actions, achieving 92.4% success on LIBERO-Long and 95.5% average on LIBERO.

DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors

cs.RO · 2026-04-27 · unverdicted · novelty 7.0 · 2 refs

Discrete diffusion policies act as natural asynchronous executors for robotics by treating action generation as iterative unmasking, yielding higher success rates and lower computation than flow-matching real-time chunking in dynamic tasks.

PAMAE: Phase-Aware-MoE Action Experts Towards Reliable Flow-Matching Vision-Language-Action Policies

cs.RO · 2026-06-25 · unverdicted · novelty 6.0

PAMAE adds a phase-aware router and expert mixture to flow-matching VLA models, yielding up to 9.2% higher task success on multi-stage manipulation simulations via two-stage training.

dVLA-RL: Reinforcement Learning over Denoising Trajectories for Discrete Diffusion Vision-Language-Action Models

cs.RO · 2026-06-22 · unverdicted · novelty 6.0

dVLA-RL models denoising as an MDP to enable RL on dVLAs via trajectory probabilities, reporting 99.7% success on LIBERO and 30.6% gains over SFT on RoboTwin 2.0.

PolicyTrim: Boosting Intrinsic Policy Efficiency of Vision-Language-Action Models

cs.CV · 2026-06-21 · unverdicted · novelty 6.0

PolicyTrim is an RL post-training framework that improves VLA policy efficiency, raising chunk utilization 3x and cutting steps by 51.4%, for up to a 5.83x speedup.

Continuous Reasoning for Vision-Language-Action

cs.RO · 2026-05-29 · unverdicted · novelty 6.0

Continuous Reasoning for VLA introduces a shared Gaussian latent for continuous thoughts, trained with self-verification to improve action prediction on LIBERO-PRO and real robots.

Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

cs.CL · 2026-05-22 · unverdicted · novelty 6.0 · 2 refs

Fast-dDrive is a block-diffusion VLA that reports SOTA accuracy on WOD-E2E and nuScenes driving benchmarks together with 12x throughput over autoregressive baselines via section scaffolds and test-time averaging.

GuidedVLA: Specifying Task-Relevant Factors via Plug-and-Play Action Attention Specialization

cs.RO · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

GuidedVLA improves VLA generalization by supervising individual attention heads with manually defined auxiliary signals for three task-relevant factors.

RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models

cs.RO · 2026-05-10 · unverdicted · novelty 6.0

RePO-VLA raises average adversarial success rates in VLA manipulation from 20% to 75% by using recovery-aware initialization, a progress-aware semantic value function, and value-conditioned refinement on success and corrective trajectories.

dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

cs.RO · 2026-04-24 · unverdicted · novelty 6.0

A discrete diffusion model tokenizes multimodal robotic data and uses a progress token to predict future states and task completion for scalable policy evaluation.

Learning Native Continuation for Action Chunking Flow Policies

cs.RO · 2026-02-13 · unverdicted · novelty 6.0

Legato trains flow-based VLA policies with schedule-shaped action-noise mixtures and randomized conditions to achieve smoother trajectories and ~10% faster task completion than real-time chunking across five real-world manipulation tasks.

AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models

cs.RO · 2025-11-18 · unverdicted · novelty 6.0

AsyncVLA adds asynchronous flow matching and a confidence rater to VLA models so they can generate actions on flexible schedules and selectively refine low-confidence tokens before execution.

Scaling by Diversified Experience for Vision-Language-Action Models

cs.CV · 2026-06-08 · unverdicted · novelty 5.0

SyVLA uses Intention Decoupling and similar-sample guided RL on diversified experiences to improve VLA model task success and out-of-distribution generalization while keeping vision-language abilities.
