Deepvision-103k: A visually diverse, broad-coverage, and verifiable mathematical dataset for multimodal reasoning. CoRR, abs/2602.16742, 2026

Haoxiang Sun, Lizhen Xu, Bing Zhao, Wotao Yin, Wei Wang, Boyu Yang, Rui Wang, Hu Wei · 2026 · arXiv 2602.16742

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

Reinforcement Learning with Robust Rubric Rewards

cs.CV · 2026-05-28 · unverdicted · novelty 5.0

RLR³ extends RLVR to criterion-level rubric verification via dual execution paths, minimal exposure masking, hierarchical aggregation, and saturation mitigation, delivering 4.7-point gains over base on 15 benchmarks with Qwen3-VL-30B-A3B.

citing papers explorer

Showing 1 of 1 citing paper.

Reinforcement Learning with Robust Rubric Rewards cs.CV · 2026-05-28 · unverdicted · none · ref 29
