pith.

hub

Visual embodied brain: Let multimodal large language models see, think, and control in spaces

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background (1)

citation-polarity summary: background (1)

years: 2026 (11), 2025 (3)

representative citing papers

Token Warping Helps MLLMs Look from Nearby Viewpoints

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

Backward token warping in ViT-based MLLMs enables reliable reasoning from nearby viewpoints by preserving semantic coherence better than pixel-wise warping or fine-tuning baselines.

RoboPIN: Grounded Embodied Reasoning via Pinned Chain-of-Thought

cs.AI · 2026-06-14 · unverdicted · novelty 6.0

Introduces the PinCoT paradigm with visual reasoning anchors, builds the PIN-170K dataset via an automated pipeline, and trains a 4B RoboPIN model with three-stage post-training that outperforms 7B baselines by 12% on embodied reasoning benchmarks.

GeoWorld-VLM: Geometry from World Models for Vision-Language Models

cs.CV · 2026-05-15 · unverdicted · novelty 6.0 · 2 refs

GeoWorld-VLM aligns VLM image features with intermediate representations from camera-conditioned world models by fine-tuning only the encoder and projector, yielding ~4% gains on the What'sUp and VSR spatial benchmarks across two VLM backbones.

MiMo-Embodied: X-Embodied Foundation Model Technical Report

cs.RO · 2025-11-20 · unverdicted · novelty 6.0

MiMo-Embodied is a single foundation model that achieves state-of-the-art results on 17 embodied AI benchmarks and 12 autonomous driving benchmarks through multi-stage learning, curated data, and CoT/RL fine-tuning that produces positive cross-domain transfer.

SPARC: Reliable Spatial Annotations from Robot Demonstrations at Scale

cs.RO · 2026-06-11 · unverdicted · novelty 5.0

SPARC generates reliable spatial annotations for robot demonstrations by leveraging spatio-temporal task structure, outperforming detection baselines on localization accuracy while retaining more samples and enabling competitive model performance without manual annotations.

citing papers explorer

  • RoboPIN: Grounded Embodied Reasoning via Pinned Chain-of-Thought cs.AI · 2026-06-14 · unverdicted · none · ref 11