hub Canonical reference

πRL: Online RL fine-tuning for flow-based vision-language-action models. arXiv preprint arXiv:2510.25889

Canonical reference. 73% of citing Pith papers cite this work as background.

28 Pith papers citing it
Background 73% of classified citations

citation-role summary

background 8 · baseline 1 · method 1 · other 1

years

2026: 27 · 2025: 1

representative citing papers

Reinforcing VLAs in Task-Agnostic World Models

cs.AI · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

RAW-Dream disentangles world-model learning from task data by using a pre-trained task-agnostic world model and VLM rewards, with dual-noise filtering, to enable zero-shot VLA adaptation in simulation and real settings.

RISE: Self-Improving Robot Policy with Compositional World Model

cs.RO · 2026-02-11 · unverdicted · novelty 6.0

RISE combines a controllable dynamics model and progress value model into a closed-loop self-improving pipeline that updates robot policies entirely in imagination, reporting over 35% absolute gains on three real-world tasks.

Towards Long-Lived Robots: Continual Learning VLA Models via Reinforcement Fine-Tuning

cs.RO · 2026-02-11 · unverdicted · novelty 6.0

LifeLong-RFT applies chunking-level on-policy reinforcement learning with Quantized Action Consistency Reward, Continuous Trajectory Alignment Reward, and Format Compliance Reward to fine-tune VLA models, achieving a 22% average success rate gain over supervised fine-tuning on the LIBERO benchmark's

$\pi^{*}_{0.6}$: a VLA That Learns From Experience

cs.LG · 2025-11-18 · unverdicted · novelty 6.0

RECAP enables a generalist VLA to self-improve via advantage-conditioned RL on mixed real-world data, more than doubling throughput and halving failure rates on hard manipulation tasks.

TacCoRL: Integrating Tactile Feedback into VLA via Simulation

cs.RO · 2026-06-10 · unverdicted · novelty 5.0

TacCoRL integrates tactile feedback into VLA policies via real-aligned simulation co-training and RL, raising average success from 50% to 72.5% on four bimanual contact-rich tasks with direct real-robot transfer.

DexPIE: Stable Dexterous Policy Improvement from Real-World Experience

cs.RO · 2026-06-08 · unverdicted · novelty 5.0

DexPIE improves dexterous manipulation success rates by 37% over demo policies via real-world experience collection with adapted intervention, multi-stage DAgger, asynchronous relative-action inference, and optimality conditioning.
