Lmm-r1: Empowering 3b lmms with strong reasoning abilities through two-stage rule-based rl

Yingzhe Peng, Gongrui Zhang, Miaosen Zhang, Zhiyuan You, Jie Liu, Qipeng Zhu, Kai Yang, Xingzhong Xu, Xin Geng, Xu Yang

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Improving Vision-language Models with Perception-centric Process Reward Models

cs.CV · 2026-04-27 · unverdicted · novelty 7.0

Perceval is a perception-centric PRM that detects token-level perceptual errors in VLMs, supporting token-advantage RL training and iterative test-time scaling for improved reasoning.

citing papers explorer

Showing 1 of 1 citing paper.

Improving Vision-language Models with Perception-centric Process Reward Models cs.CV · 2026-04-27 · unverdicted · none · ref 31
Perceval is a perception-centric PRM that detects token-level perceptual errors in VLMs, supporting token-advantage RL training and iterative test-time scaling for improved reasoning.

Lmm-r1: Empowering 3b lmms with strong reasoning abilities through two-stage rule-based rl

fields

years

verdicts

representative citing papers

citing papers explorer