Canonical reference

VLA-RFT: Vision-language-action reinforcement fine-tuning with verified rewards in world simulators

Hengtao Li, Pengxiang Ding, Runze Suo, Yihao Wang, Zirui Ge, Dongyuan Zang, Kexian Yu, Mingyang Sun, Hongyin Zhang, Donglin Wang, et al. · 2025 · arXiv:2510.00406

80% of citing Pith papers cite this work as background.

9 Pith papers citing it

Background: 80% of classified citations


citation-role summary

background 5

citation-polarity summary

background 4 · unclear 1

representative citing papers

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

cs.RO · 2026-02-06 · unverdicted · novelty 7.0

DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.

Reinforcing VLAs in Task-Agnostic World Models

cs.AI · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

RAW-Dream disentangles world-model learning from task data by using a pre-trained task-agnostic world model and VLM rewards, with dual-noise filtering, to enable zero-shot VLA adaptation in simulation and real settings.

Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems

cs.RO · 2026-04-22 · unverdicted · novelty 6.0 · 2 refs

Waypoint-based bi-level planning with curriculum RLVR improves multi-robot task success rates in dense-obstacle benchmarks over motion-agnostic and VLA baselines.

RoboStereo: Dual-Tower 4D Embodied World Models for Unified Policy Optimization

cs.CV · 2026-03-13 · unverdicted · novelty 6.0

A dual-tower 4D embodied world model called RoboStereo reduces geometric hallucinations and delivers over 97% relative improvement on manipulation tasks via test-time augmentation, imitative learning, and open exploration.

Towards Long-Lived Robots: Continual Learning VLA Models via Reinforcement Fine-Tuning

cs.RO · 2026-02-11 · unverdicted · novelty 6.0

LifeLong-RFT applies chunking-level on-policy reinforcement learning with Quantized Action Consistency Reward, Continuous Trajectory Alignment Reward, and Format Compliance Reward to fine-tune VLA models, achieving a 22% average success rate gain over supervised fine-tuning on the LIBERO benchmark's

Learning Action Manifold with Multi-view Latent Priors for Robotic Manipulation

cs.RO · 2026-05-12 · unverdicted · novelty 5.0

The method uses multi-view diffusion priors and action manifold learning to resolve depth ambiguity and improve action prediction in VLA robotic manipulation models, reporting higher success rates than baselines on LIBERO, RoboTwin, and real-robot tasks.

Nautilus: From One Prompt to Plug-and-Play Robot Learning

cs.RO · 2026-05-12 · unverdicted · novelty 5.0

NAUTILUS is a prompt-driven harness that automates plug-and-play adapters, typed contracts, and validation for policies, benchmarks, and robots in learning research.

World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems

cs.RO · 2026-04-16 · unverdicted · novelty 5.0

The World-Value-Action model enables implicit planning for VLA systems by performing inference over a learned latent representation of high-value future trajectories instead of direct action prediction.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

citing papers explorer

Showing 9 of 9 citing papers.

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos · cs.RO · 2026-02-06 · unverdicted · none · ref 55
Reinforcing VLAs in Task-Agnostic World Models · cs.AI · 2026-05-12 · unverdicted · none · ref 18 · 2 links
Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems · cs.RO · 2026-04-22 · unverdicted · none · ref 18 · 2 links
RoboStereo: Dual-Tower 4D Embodied World Models for Unified Policy Optimization · cs.CV · 2026-03-13 · unverdicted · none · ref 18
Towards Long-Lived Robots: Continual Learning VLA Models via Reinforcement Fine-Tuning · cs.RO · 2026-02-11 · unverdicted · none · ref 35
Learning Action Manifold with Multi-view Latent Priors for Robotic Manipulation · cs.RO · 2026-05-12 · unverdicted · none · ref 58
Nautilus: From One Prompt to Plug-and-Play Robot Learning · cs.RO · 2026-05-12 · unverdicted · none · ref 61
World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems · cs.RO · 2026-04-16 · unverdicted · none · ref 17
World Action Models: The Next Frontier in Embodied AI · cs.RO · 2026-05-12 · unverdicted · none · ref 54
