Perceval is a perception-centric PRM that detects token-level perceptual errors in VLMs, supporting token-advantage RL training and iterative test-time scaling for improved reasoning.
Visual instruction tuning.Advances in neural information processing systems, 36:34892–34916
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
years
2026 3verdicts
UNVERDICTED 3representative citing papers
SFI-Bench shows current multimodal LLMs struggle to integrate spatial memory with functional reasoning and external knowledge in video tasks.
DT2IT-MRM proposes a debiased preference construction pipeline, T2I data reformulation, and iterative training to curate multimodal preference data, achieving SOTA on VL-RewardBench, Multimodal RewardBench, and MM-RLHF-RewardBench.
citing papers explorer
-
DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling
DT2IT-MRM proposes a debiased preference construction pipeline, T2I data reformulation, and iterative training to curate multimodal preference data, achieving SOTA on VL-RewardBench, Multimodal RewardBench, and MM-RLHF-RewardBench.