Semi-off-policy reinforcement learning for vision-language slow-thinking reasoning

Shen, J · 2025 · arXiv 2507.16814

2 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

OMIBench benchmark reveals that current LVLMs achieve at most 50% on Olympiad problems requiring reasoning across multiple images.

RSICCLLM: A Multimodal Large Language Model for Remote Sensing Image Change Captioning

cs.CV · 2026-06-26 · unverdicted · novelty 5.0

RSICCLLM introduces a post-training framework with the RSICI dataset, difference-aware supervised fine-tuning, and dual-negative preference optimization that claims to outperform much larger models on remote sensing image change captioning.

citing papers explorer


OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model

cs.CV · 2026-04-22 · unverdicted · none · ref 48

RSICCLLM: A Multimodal Large Language Model for Remote Sensing Image Change Captioning

cs.CV · 2026-06-26 · unverdicted · none · ref 39
