Semi-off-policy reinforcement learning for vision-language slow-thinking reasoning

Junhao Shen, Haiteng Zhao, Yuzhe Gu, Songyang Gao, Kuikun Liu, Haian Huang, Jianfei Gao, Dahua Lin, Wenwei Zhang, Kai Chen · 2025 · arXiv 2507.16814

1 Pith paper cites this work. Polarity classification is still indexing.

citation-role summary

background 1

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

OMIBench benchmark reveals that current LVLMs achieve at most 50% on Olympiad problems requiring reasoning across multiple images.

OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model · cs.CV · 2026-04-22 · unverdicted · none · ref 48