RoboGPT-R1: Enhancing Robot Task Planning with Reinforcement Learning

Bingyan Nie; Boyu Li; Haoran Li; Jinrui Liu; Shunsen He; Yaran Chen; Yuze Wang

arxiv: 2510.14828 · v3 · pith:UGQ7E5KGnew · submitted 2025-10-16 · 💻 cs.AI · cs.RO

Jinrui Liu, Bingyan Nie, Boyu Li, Yaran Chen, Yuze Wang, Shunsen He, Haoran Li

keywords: reasoning, tasks, planning, fine-tuning, language, model, models, supervised

Improving the reasoning capabilities of embodied agents is crucial for robots to successfully complete complex human instructions in long-horizon manipulation tasks. Although large language models and vision-language models trained with supervised fine-tuning (SFT) have succeeded in planning tasks, they still struggle with long-horizon manipulation in complex real-world environments because of their limited commonsense knowledge and reasoning ability. Because aligning general-purpose vision-language models to robotic planning through SFT suffers from poor generalization and insufficient physical understanding, we propose RoboGPT-R1, a two-stage fine-tuning framework for embodied planning. In the first stage, supervised training on expert sequences provides foundational knowledge; in the second, reinforcement learning (RL) addresses the model's shortcomings in visual-spatial understanding and reasoning. To achieve physical understanding and action-sequence consistency in multi-step reasoning tasks, we design a rule-based reward function that jointly accounts for long-horizon performance and the action constraints of the environment. The resulting reasoning model, trained on Qwen2.5-VL-3B, significantly outperforms the larger GPT-4o-mini by 21.33% and surpasses other work trained on Qwen2.5-VL-7B by 20.33% on the EmbodiedBench benchmark.
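
The abstract describes the rule-based reward only qualitatively, so the following is a hypothetical sketch of how a reward of that kind could combine an action-constraint term with a long-horizon consistency term. It is not the authors' reward function. The `<answer>` tag format, the semicolon-separated plan, the longest-common-subsequence (LCS) similarity against the expert sequence, the allowed-action set, and the weights `w_format`, `w_valid`, `w_seq` are all assumptions made for illustration.

```python
import re
from typing import List, Optional, Sequence, Set


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two action lists (standard O(n*m) DP)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def parse_plan(text: str) -> Optional[List[str]]:
    """Extract actions from a hypothetical '<answer>a1; a2; ...</answer>' span.

    Returns None when the tags are missing, i.e. the output is malformed.
    """
    match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    if match is None:
        return None
    return [step.strip() for step in match.group(1).split(";") if step.strip()]


def rule_based_reward(
    output: str,
    expert_plan: Sequence[str],
    valid_actions: Set[str],
    w_format: float = 0.1,  # hypothetical weights, not taken from the paper
    w_valid: float = 0.3,
    w_seq: float = 0.6,
) -> float:
    """Hypothetical rule-based reward: format + action constraints + sequence consistency."""
    plan = parse_plan(output)
    if plan is None:
        return 0.0  # malformed output earns nothing
    format_score = 1.0  # tags were present and parseable
    # Action-constraint term: fraction of steps that are legal in the environment.
    valid_score = sum(step in valid_actions for step in plan) / len(plan) if plan else 0.0
    # Long-horizon term: LCS overlap with the expert sequence, normalized by the longer plan.
    longest = max(len(plan), len(expert_plan))
    seq_score = lcs_length(plan, expert_plan) / longest if longest else 0.0
    return w_format * format_score + w_valid * valid_score + w_seq * seq_score


if __name__ == "__main__":
    expert = ["find apple", "pick up apple", "find fridge", "open fridge", "put down apple"]
    output = "<answer>find apple; pick up apple; open fridge; put down apple</answer>"
    print(round(rule_based_reward(output, expert, set(expert)), 2))  # → 0.88
```

In this toy case the predicted plan skips "find fridge", so the LCS term is 4/5 and the total reward is 0.88. A plan that used an action outside the allowed set would instead lose credit on the validity term. Scoring by subsequence overlap rather than exact match gives partial credit to long plans that are mostly correct.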

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data
cs.RO 2026-05 unverdicted novelty 6.0

A co-evolutionary VLM-VGM loop on 500 unlabeled images raises planner success by 30 points and simulator success by 48 percent while beating fully supervised baselines.

RoboAgent: Chaining Basic Capabilities for Embodied Task Planning
cs.RO 2026-04 unverdicted novelty 5.0

RoboAgent chains basic vision-language capabilities inside a single VLM via a scheduler and trains it in three stages (behavior cloning, DAgger, RL) to improve embodied task planning.