hub Canonical reference

Unified diffusion vla: Vision-language-action model via joint discrete denoising diffusion process

Jiayi Chen, Wenxuan Song, Pengxiang Ding, Ziyang Zhou, Han Zhao, Feilong Tang, Donglin Wang, Haoang Li · 2026 · arXiv 2511.01718

Canonical reference. 100% of citing Pith papers cite this work as background.

14 Pith papers citing it

Background 100% of classified citations

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

Back to the Familiar Future: Failure Recovery for VLA Policies via Pre-Imagined Milestone Selection

cs.RO · 2026-06-08 · unverdicted · novelty 7.0

B2FF pre-generates a milestone bank of familiar future states from the clean initial observation and uses a recoverability-aware selector to guide VLA policies back from deviations, raising average success rate from 56.3% to 74.0% on failure-injected LIBERO.

Point Tracking Improves World Action Models

cs.RO · 2026-05-22 · unverdicted · novelty 7.0

JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

Thinking in Text and Images: Interleaved Vision--Language Reasoning Traces for Long-Horizon Robot Manipulation

cs.AI · 2026-05-01 · unverdicted · novelty 7.0

A multimodal transformer generates and caches interleaved text-image traces to guide closed-loop actions, achieving 92.4% success on LIBERO-Long and 95.5% average on LIBERO.

DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors

cs.RO · 2026-04-27 · unverdicted · novelty 7.0 · 2 refs

Discrete diffusion policies act as natural asynchronous executors for robotics by treating action generation as iterative unmasking, yielding higher success rates and lower computation than flow-matching real-time chunking in dynamic tasks.

DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching

cs.RO · 2026-03-27 · conditional · novelty 7.0

DFM-VLA uses discrete flow matching to iteratively refine action tokens in VLA models, outperforming autoregressive and diffusion baselines with 4.44 average success length on CALVIN and 95.7% success on LIBERO.

ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?

cs.CV · 2026-06-17 · unverdicted · novelty 6.0

ImageWAM shows image editing models can replace video generation in world action models, delivering better performance with 6x lower FLOPs and 4x lower latency by using edit-derived KV caches as compact context.

GuidedVLA: Specifying Task-Relevant Factors via Plug-and-Play Action Attention Specialization

cs.RO · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

GuidedVLA improves VLA generalization by supervising individual attention heads with manually defined auxiliary signals for three task-relevant factors.

AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models

cs.RO · 2025-11-18 · unverdicted · novelty 6.0

AsyncVLA adds asynchronous flow matching and a confidence rater to VLA models so they can generate actions on flexible schedules and selectively refine low-confidence tokens before execution.

Kairos: A Native World Model Stack for Physical AI

cs.AI · 2026-06-15 · unverdicted · novelty 5.0

Kairos is a native world model stack using cross-embodiment pretraining, hybrid linear temporal attention with theoretical error bounds, and deployment-aware co-design, reporting top performance on embodied benchmarks.

AHA-WAM:Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

cs.RO · 2026-06-08 · unverdicted · novelty 5.0

AHA-WAM is a dual-DiT asynchronous world-action model with horizon-adaptive offset training and OVCR routing that reports 92.8% success on RoboTwin and 78.3% on real tasks at 24.17 Hz without robot pretraining.

Coarse-to-Control: Action-Token Planning for Vision-Language-Action Models

cs.RO · 2026-06-05 · unverdicted · novelty 5.0

Coarse-to-Control adds planning via coarse action tokens in the same vocabulary as control actions, improving VLA performance on long-horizon manipulation tasks.

Fast-dVLA: Accelerating Discrete Diffusion VLA to Real-Time Performance

cs.RO · 2026-03-26 · unverdicted · novelty 5.0

Parameter differences from two training runs on a small task set are treated as auxiliary capability vectors that are merged into a pretrained VLA model, yielding auxiliary-task gains at the cost of ordinary supervised finetuning plus a simple regularization term.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

World Action Models: A Survey

cs.RO · 2026-06-18 · unverdicted · novelty 3.0

A survey that clarifies boundaries and organizes World Action Models by generation requirements and predictive substrates, identifying a trend toward generating less of the future.

citing papers explorer

Showing 10 of 10 citing papers after filters.

Back to the Familiar Future: Failure Recovery for VLA Policies via Pre-Imagined Milestone Selection cs.RO · 2026-06-08 · unverdicted · none · ref 20
B2FF pre-generates a milestone bank of familiar future states from the clean initial observation and uses a recoverability-aware selector to guide VLA policies back from deviations, raising average success rate from 56.3% to 74.0% on failure-injected LIBERO.
Point Tracking Improves World Action Models cs.RO · 2026-05-22 · unverdicted · none · ref 17
JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.
DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors cs.RO · 2026-04-27 · unverdicted · none · ref 44 · 2 links
Discrete diffusion policies act as natural asynchronous executors for robotics by treating action generation as iterative unmasking, yielding higher success rates and lower computation than flow-matching real-time chunking in dynamic tasks.
DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching cs.RO · 2026-03-27 · conditional · none · ref 4
DFM-VLA uses discrete flow matching to iteratively refine action tokens in VLA models, outperforming autoregressive and diffusion baselines with 4.44 average success length on CALVIN and 95.7% success on LIBERO.
GuidedVLA: Specifying Task-Relevant Factors via Plug-and-Play Action Attention Specialization cs.RO · 2026-05-12 · unverdicted · none · ref 14 · 2 links
GuidedVLA improves VLA generalization by supervising individual attention heads with manually defined auxiliary signals for three task-relevant factors.
AHA-WAM:Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing cs.RO · 2026-06-08 · unverdicted · none · ref 15
AHA-WAM is a dual-DiT asynchronous world-action model with horizon-adaptive offset training and OVCR routing that reports 92.8% success on RoboTwin and 78.3% on real tasks at 24.17 Hz without robot pretraining.
Coarse-to-Control: Action-Token Planning for Vision-Language-Action Models cs.RO · 2026-06-05 · unverdicted · none · ref 34
Coarse-to-Control adds planning via coarse action tokens in the same vocabulary as control actions, improving VLA performance on long-horizon manipulation tasks.
Fast-dVLA: Accelerating Discrete Diffusion VLA to Real-Time Performance cs.RO · 2026-03-26 · unverdicted · none · ref 5
Parameter differences from two training runs on a small task set are treated as auxiliary capability vectors that are merged into a pretrained VLA model, yielding auxiliary-task gains at the cost of ordinary supervised finetuning plus a simple regularization term.
World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 116
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
World Action Models: A Survey cs.RO · 2026-06-18 · unverdicted · none · ref 19
A survey that clarifies boundaries and organizes World Action Models by generation requirements and predictive substrates, identifying a trend toward generating less of the future.

Unified diffusion vla: Vision-language-action model via joint discrete denoising diffusion process

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer