Tiger: Tool-integrated geometric rea- soning in vision-language models for robotics

Yi Han, Cheng Chi, Enshen Zhou, Shanyu Rong, Jingkun An, Pengwei Wang, Zhongyuan Wang, Lu Sheng, Shanghang Zhang · 2025 · arXiv 2510.07181

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 2 other 1

citation-polarity summary

background 2 unclear 1

representative citing papers

LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language Models

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

LAST augments MLLMs with a tool-abstraction sandbox and three-stage training to deliver around 20% gains on spatial reasoning tasks, outperforming closed-source models.

BOP-ASK: Object-Interaction Reasoning for Vision-Language Models

cs.CV · 2025-11-20 · unverdicted · novelty 6.0

BOP-ASK supplies 150k images and 33M QA pairs across six tasks to improve VLMs on precise 3D object interaction reasoning and spatial planning.

MAG-3D: Multi-Agent Grounded Reasoning for 3D Understanding

cs.CV · 2026-04-10 · unverdicted · novelty 5.0

MAG-3D is a training-free multi-agent framework that coordinates planning, grounding, and coding agents with off-the-shelf VLMs to achieve grounded 3D reasoning and state-of-the-art benchmark results.

AssemLM: A Spatial Reasoning Multimodal Large Language Model for Robotic Assembly

cs.RO · 2026-04-10

citing papers explorer

Showing 4 of 4 citing papers.

LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language Models cs.CV · 2026-04-08 · unverdicted · none · ref 12
LAST augments MLLMs with a tool-abstraction sandbox and three-stage training to deliver around 20% gains on spatial reasoning tasks, outperforming closed-source models.
BOP-ASK: Object-Interaction Reasoning for Vision-Language Models cs.CV · 2025-11-20 · unverdicted · none · ref 17
BOP-ASK supplies 150k images and 33M QA pairs across six tasks to improve VLMs on precise 3D object interaction reasoning and spatial planning.
MAG-3D: Multi-Agent Grounded Reasoning for 3D Understanding cs.CV · 2026-04-10 · unverdicted · none · ref 17
MAG-3D is a training-free multi-agent framework that coordinates planning, grounding, and coding agents with off-the-shelf VLMs to achieve grounded 3D reasoning and state-of-the-art benchmark results.
AssemLM: A Spatial Reasoning Multimodal Large Language Model for Robotic Assembly cs.RO · 2026-04-10 · unreviewed · ref 18

Tiger: Tool-integrated geometric rea- soning in vision-language models for robotics

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer