Orientation information is recoverable from MLLM visual encoder embeddings via linear regression, contradicting the hypothesis that failures originate in the encoders.
Euclid’s gift: En- hancing spatial perception and reasoning in vision-language models via geometric surrogate tasks
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
baseline 1
citation-polarity summary
years
2026 3roles
baseline 1polarities
baseline 1representative citing papers
Introduces PinCoT paradigm with visual reasoning anchors, builds PIN-170K dataset via automated pipeline, and trains 4B RoboPIN model via three-stage post-training to outperform 7B baselines by 12% on embodied reasoning benchmarks.
PVM adds a parallel branch to LVLMs that directly supplies visual embeddings to prevent attention decay over long generated sequences, yielding accuracy gains on reasoning tasks with minimal overhead.
citing papers explorer
-
Why MLLMs Struggle to Determine Object Orientations
Orientation information is recoverable from MLLM visual encoder embeddings via linear regression, contradicting the hypothesis that failures originate in the encoders.