Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background (1)
citation-polarity summary
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
SFT on divergent branch-heavy CoT from DeepSeek-R1 yields worse generalization than convergent CoT from gpt-oss despite lower loss, but filtering frequent branches improves average performance by 3.6% on five reasoning benchmarks.
citing papers explorer
- OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model
  OMIBench benchmark reveals that current LVLMs achieve at most 50% on Olympiad problems requiring reasoning across multiple images.
- On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning
  SFT on divergent branch-heavy CoT from DeepSeek-R1 yields worse generalization than convergent CoT from gpt-oss despite lower loss, but filtering frequent branches improves average performance by 3.6% on five reasoning benchmarks.