M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning. arXiv preprint arXiv:2507.08306
3 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- GeoLaux: A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines
GeoLaux is a new benchmark of 2186 long-step geometry problems requiring auxiliary lines, used to evaluate 23 MLLMs and reveal major drops in performance on complex tasks.
- MHPR: Multidimensional Human Perception and Reasoning Benchmark for Large Vision-Language Models
MHPR is a multidimensional benchmark for LVLM human-centric perception-reasoning with C-RD, SFT-D, RL-D, T-D data tiers and ACVG pipeline, showing training gains on Qwen2.5-VL-7B to near-parity with larger models.
- Dual Tuning for Reasoning Efficacy-Driven Data Curation in Multimodal LLM Training
Dual Tuning is a data curation method that jointly scores training examples for benefit and for reasoning-gain to choose between reasoning and direct-answer post-training modes for multimodal LLMs.