Reinforcement learning for long-horizon interactive LLM agents

Kevin Chen, Marco Cusumano-Towner, Brody Huval, Aleksei Petrenko, Jackson Hamburger, Vladlen Koltun, Philipp Krähenbühl · 2025

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Interactive Post-Training for Vision-Language-Action Models

cs.LG · 2025-05-22 · unverdicted · novelty 6.0

RIPT-VLA applies RL with dynamic rollout sampling and leave-one-out advantage estimation to fine-tune VLA models, achieving success rates of up to 97.5% and improving from 4% to 97% success using one demonstration within 15 iterations.

citing papers explorer

Showing 1 of 1 citing paper.

Interactive Post-Training for Vision-Language-Action Models · cs.LG · 2025-05-22 · unverdicted · none · ref 4
