hub Canonical reference

Image-of-thought prompting for visual reasoning refinement in multimodal large language models

Canonical reference. 80% of citing Pith papers cite this work as background.

11 Pith papers citing it
Background 80% of classified citations

citation-role summary

background: 5

citation-polarity summary

years

2026: 5 · 2025: 6

roles

background: 5

polarities

background: 4 · unclear: 1

representative citing papers

Mull-Tokens: Modality-Agnostic Latent Thinking

cs.CV · 2025-12-11 · unverdicted · novelty 6.0

Mull-Tokens are modality-agnostic latent tokens that enable free-form multimodal thinking and deliver up to 16% gains on spatial reasoning benchmarks.

Grounded Reinforcement Learning for Visual Reasoning

cs.CV · 2025-05-29 · unverdicted · novelty 6.0

ViGoRL introduces visually grounded RL that anchors reasoning steps to image coordinates and uses multi-turn zooming to outperform standard RL and supervised baselines on spatial and GUI reasoning benchmarks.

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

cs.CV · 2025-03-16 · unverdicted · novelty 2.0

The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.
