{"total":16,"items":[{"citing_arxiv_id":"2606.26006","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FORCE: Efficient VLA Reinforcement Fine-Tuning via Value-Calibrated Warm-up and Self-Distillation","primary_cat":"cs.RO","submitted_at":"2026-06-24T16:23:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FORCE is a 3-stage RL fine-tuning method for VLA models that stabilizes Q-function via on-policy warm-up and filters high-value actions for updates, claiming 79% success rate gains and 32.5% faster training without human intervention.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12109","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Bridging the Morphology Gap: Adapting VLA Models to Dexterous Manipulation via Intent-Conditioned Fine-Tuning","primary_cat":"cs.RO","submitted_at":"2026-06-10T14:03:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"InDex adapts VLA models to high-DoF dexterous manipulation via intent-conditioned fine-tuning and a decoupled diffusion head, outperforming monolithic baselines in simulation tasks with minimal data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11743","ref_index":62,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TacCoRL: Integrating Tactile Feedback into VLA via Simulation","primary_cat":"cs.RO","submitted_at":"2026-06-10T07:20:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"TacCoRL integrates tactile feedback into VLA policies via real-aligned simulation co-training and RL, raising average success from 50% to 72.5% on four bimanual 
contact-rich tasks with direct real-robot transfer.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09630","ref_index":46,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ReCoVLA: VLM-Guided Reward Compilation for Failure Recovery in Vision-Language-Action Policies","primary_cat":"cs.RO","submitted_at":"2026-06-08T15:29:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ReCoVLA improves VLA policy reliability by using a VLM as a semantic reward selector to train residual recovery policies in simulation, raising average success from 36.7% to 66.7% in sim and achieving 61.7% in zero-shot sim-to-real physical tests.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08653","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FiberTune: Preserving Action-Fiber Visual Residuals in Vision-Language-Action Fine-Tuning","primary_cat":"cs.CV","submitted_at":"2026-06-07T14:41:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FiberTune is a new fine-tuning objective that preserves action-fiber visual residuals in VLA policies, yielding performance gains on simulation and physical robot tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08508","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ActProbe: Action-Space Probe for Early Failure Detection of Generative Robot Policies","primary_cat":"cs.RO","submitted_at":"2026-06-07T08:18:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ActProbe is an action-space detector that uses temporal 
consistency error and action chunk magnitude from policy outputs, mapped via LSTM-MLP, to predict failures earlier than baselines across policies and real-robot tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.03847","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Denoising Tells When to Replan: Denoising-Variance Adaptive Chunking for Flow-Based Robot Policies","primary_cat":"cs.RO","submitted_at":"2026-06-02T16:26:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DVAC uses denoising variance as an intrinsic signal to adaptively chunk actions in flow-based robot policies, improving success rates and cutting replans on LIBERO, RoboTwin, CALVIN, and real-world tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22446","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Pre-VLA: Preemptive Runtime Verification for Reliable Vision-Language-Action and World-Model Rollouts","primary_cat":"cs.CV","submitted_at":"2026-05-21T13:13:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Pre-VLA is a multimodal runtime verifier that predicts safety confidence and advantage scores for action chunks, raising closed-loop success rates on the LIBERO benchmark from 30.79% to 37.62%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13105","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"What to Ignore, What to React: Visually Robust RL Fine-Tuning of VLA 
Models","primary_cat":"cs.RO","submitted_at":"2026-05-13T07:15:37+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PAIR-VLA adds invariance and sensitivity objectives over paired visual variants during PPO fine-tuning of VLA models, yielding 9-16% average gains on ManiSkill3 under distractors, textures, poses, viewpoints, and lighting shifts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12334","ref_index":41,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Reinforcing VLAs in Task-Agnostic World Models","primary_cat":"cs.AI","submitted_at":"2026-05-12T16:16:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RAW-Dream disentangles world-model learning from task data by using a pre-trained task-agnostic world model and VLM rewards, with dual-noise filtering, to enable zero-shot VLA adaptation in simulation and real settings.","context_count":1,"top_context_role":"other","top_context_polarity":"unclear","context_text":"Rl token: Bootstrapping online rl with vision-language-action models. arXiv preprint arXiv:2604.23073, 2026. [39] Yang, J. et al. Rise: Self-improving robot policy with compositional world model. arXiv preprint arXiv:2602.11075, 2026. [40] Yin, T. et al. Playworld: Learning robot world models from autonomous play. arXiv preprint arXiv:2603.09030, 2026. [41] Yu, C. et al. Rlinf: Flexible and efficient large-scale reinforcement learning via macro-to-micro flow transformation. arXiv preprint arXiv:2509.15965, 2025. [42] Yu, T. et al. Mopo: Model-based offline policy optimization. Advances in neural information processing systems, 33:14129-14142, 2020. [43] Zhang, J. et al. 
Reinforcing action policies by prophesying."},{"citing_arxiv_id":"2605.07794","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"NoiseGate: Learning Per-Latent Timestep Schedules as Information Gating in World Action Models","primary_cat":"cs.RO","submitted_at":"2026-05-08T14:31:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"NoiseGate learns per-latent timestep schedules as an information-gating policy in diffusion-based world action models, yielding consistent gains on RoboTwin manipulation tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"on the fly: Diffusion time prediction for faster and better image generation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 23412-23422, 2025. [32] Hongzhi Zang, Mingjie Wei, Si Xu, Yongji Wu, Zhen Guo, Yuanqing Wang, Hao Lin, Liangzhi Shi, Yuqing Xie, Zhexuan Xu, et al. Rlinf-vla: A unified and efficient framework for vla+rl training. arXiv preprint arXiv:2510.06710, 2025. [33] Chao Yu, Yuanqing Wang, Zhen Guo, Hao Lin, Si Xu, Hongzhi Zang, Quanlu Zhang, Yongji Wu, Chunyang Zhu, Junhao Hu, et al. Rlinf: Flexible and efficient large-scale reinforcement learning via macro-to-micro flow transformation. arXiv preprint arXiv:2509.15965, 2025. 
[34] Zhong Guan, Haoran Sun, Yongjian Guo, Shuai Di, Xiaodong Bai, Jing Long, Tianyun Zhao,"},{"citing_arxiv_id":"2605.07288","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Sword: Style-Robust World Models as Simulators via Dynamic Latent Bootstrapping for VLA Policy Post-Training","primary_cat":"cs.CV","submitted_at":"2026-05-08T05:54:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Sword improves world model simulators for VLA policies by disentangling visual style from dynamics and bootstrapping latents for better consistency, outperforming baselines on LIBERO in generalization and RL post-training success.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.26256","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training","primary_cat":"cs.LG","submitted_at":"2026-04-29T03:25:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DORA's multi-version streaming rollout enables 2-3x higher throughput in asynchronous RL for LLMs while preserving convergence by maintaining policy consistency, data integrity, and bounded staleness.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24729","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SpecRLBench: A Benchmark for Generalization in Specification-Guided Reinforcement Learning","primary_cat":"cs.LG","submitted_at":"2026-04-27T17:40:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SpecRLBench is a new benchmark evaluating generalization of LTL-guided RL methods 
across navigation and manipulation domains with static/dynamic environments and varied robot dynamics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.23838","ref_index":62,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training","primary_cat":"cs.LG","submitted_at":"2026-04-26T18:45:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"JigsawRL achieves up to 1.85x higher throughput in LLM RL pipelines via pipeline multiplexing, sub-stage graphs, and look-ahead scheduling compared to prior systems.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"blocking. AReaL [13] improves pipeline efficiency by discarding overlong samples and recomputing them later to mitigate the long-tail effect. Laminar [44] proposes fully asynchronous rollout and trainer instances to break barriers between stages, leveraging relay buffers to support fine-grained weight updates and isolate long-tail samples. RLinf [62] enables more flexible data and stage partitioning at a finer granularity, achieving dynamic spatiotemporal scheduling within a single RL pipeline. However, asynchronous RL suffers from data staleness [66], which can degrade training stability and convergence. 
Such trade-offs are undesirable in many real-world deployments, where strict correctness and stability re-"},{"citing_arxiv_id":"2511.14148","ref_index":74,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models","primary_cat":"cs.RO","submitted_at":"2025-11-18T05:21:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AsyncVLA adds asynchronous flow matching and a confidence rater to VLA models so they can generate actions on flexible schedules and selectively refine low-confidence tokens before execution.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}