Robobrain: A unified brain model for robotic manipulation from abstract to concrete

Ji, Y · 2025 · arXiv 2502.21257

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching

cs.CV · 2026-06-02 · unverdicted · novelty 7.0

Authors create ReasonMatch-Bench and DCRL training to boost MLLM performance on wide-baseline matching, reporting gains over baselines while preserving general capabilities.

Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents

cs.AI · 2026-05-09 · unverdicted · novelty 7.0 · 3 refs

VIGIL decouples world-state completion from terminal commitment in embodied agents, exposing up to 19.7 pp gaps in benchmark success despite comparable execution across 20 models.

Exploring Spatial Intelligence from a Generative Perspective

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

Fine-tuning multimodal models on a new synthetic spatial benchmark improves generative spatial compliance on real and synthetic tasks and transfers to better spatial understanding.

Dense Reward for Multi-View 3D Reasoning with Global Maps and Local Views

cs.CV · 2026-06-22 · unverdicted · novelty 6.0

DR-MV3D decomposes MV3D-VQA into global map construction, question-conditioned view planning, and egocentric grounding, supervised by global consistency and local trajectory rewards optimized via GRPO.

Re$^2$MoGen: Open-Vocabulary Motion Generation via LLM Reasoning and Physics-Aware Refinement

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

Re²MoGen generates open-vocabulary motions via MCTS-enhanced LLM keyframe planning, pose-prior optimization with dynamic temporal matching fine-tuning, and physics-aware RL post-training, claiming SOTA performance.

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

cs.RO · 2025-08-19 · conditional · novelty 6.0

Embodied-R1 uses a pointing-centric representation and reinforced fine-tuning on a 200K dataset to achieve state-of-the-art results on embodied benchmarks plus 56.2% success in SIMPLEREnv and 87.5% on real XArm tasks without task-specific training.

MapNav: A Novel Memory Representation via Annotated Semantic Maps for Vision-and-Language Navigation

cs.RO · 2025-02-19 · unverdicted · novelty 6.0

MapNav uses annotated semantic maps as memory for VLN agents, claiming SOTA results in simulation and real-world tests while promising code and data release.

GEM: Generative Supervision Helps Embodied Intelligence

cs.CV · 2026-05-27 · unverdicted · novelty 5.0

GEM adds generative depth supervision to VLM pre-training and reports improved results on embodied benchmarks plus real-world robot execution.

ProcVLM: Learning Procedure-Grounded Progress Rewards for Robotic Manipulation

cs.RO · 2026-05-09 · unverdicted · novelty 5.0

ProcVLM learns procedure-grounded dense progress rewards for robotic manipulation via a reasoning-before-estimation VLM trained on a 60M-frame synthesized corpus from 30 embodied datasets.

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

cs.RO · 2025-07-02 · unverdicted · novelty 5.0

The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.

Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning

cs.CV · 2025-06-07 · unverdicted · novelty 4.0

Vision-EKIPL injects high-quality actions from external models into RL training to expand exploration and raise the reasoning ceiling of MLLMs, reporting up to 5% gains on the Reason-RFT-CoT benchmark.

citing papers explorer

Showing 10 of 10 citing papers after filters.

Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching cs.CV · 2026-06-02 · unverdicted · none · ref 22
Authors create ReasonMatch-Bench and DCRL training to boost MLLM performance on wide-baseline matching, reporting gains over baselines while preserving general capabilities.
Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents cs.AI · 2026-05-09 · unverdicted · none · ref 42 · 3 links
VIGIL decouples world-state completion from terminal commitment in embodied agents, exposing up to 19.7 pp gaps in benchmark success despite comparable execution across 20 models.
Exploring Spatial Intelligence from a Generative Perspective cs.CV · 2026-04-22 · unverdicted · none · ref 15
Fine-tuning multimodal models on a new synthetic spatial benchmark improves generative spatial compliance on real and synthetic tasks and transfers to better spatial understanding.
Dense Reward for Multi-View 3D Reasoning with Global Maps and Local Views cs.CV · 2026-06-22 · unverdicted · none · ref 22
DR-MV3D decomposes MV3D-VQA into global map construction, question-conditioned view planning, and egocentric grounding, supervised by global consistency and local trajectory rewards optimized via GRPO.
Re$^2$MoGen: Open-Vocabulary Motion Generation via LLM Reasoning and Physics-Aware Refinement cs.CV · 2026-04-20 · unverdicted · none · ref 18
Re²MoGen generates open-vocabulary motions via MCTS-enhanced LLM keyframe planning, pose-prior optimization with dynamic temporal matching fine-tuning, and physics-aware RL post-training, claiming SOTA performance.
MapNav: A Novel Memory Representation via Annotated Semantic Maps for Vision-and-Language Navigation cs.RO · 2025-02-19 · unverdicted · none · ref 15
MapNav uses annotated semantic maps as memory for VLN agents, claiming SOTA results in simulation and real-world tests while promising code and data release.
GEM: Generative Supervision Helps Embodied Intelligence cs.CV · 2026-05-27 · unverdicted · none · ref 28
GEM adds generative depth supervision to VLM pre-training and reports improved results on embodied benchmarks plus real-world robot execution.
ProcVLM: Learning Procedure-Grounded Progress Rewards for Robotic Manipulation cs.RO · 2026-05-09 · unverdicted · none · ref 29
ProcVLM learns procedure-grounded dense progress rewards for robotic manipulation via a reasoning-before-estimation VLM trained on a 60M-frame synthesized corpus from 30 embodied datasets.
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective cs.RO · 2025-07-02 · unverdicted · none · ref 154
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning cs.CV · 2025-06-07 · unverdicted · none · ref 29
Vision-EKIPL injects high-quality actions from external models into RL training to expand exploration and raise the reasoning ceiling of MLLMs, reporting up to 5% gains on the Reason-RFT-CoT benchmark.

Robobrain: A unified brain model for robotic manipulation from abstract to concrete

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer