pith. machine review for the scientific record. sign in

arxiv: 2510.14738 · v2 · submitted 2025-10-16 · 💻 cs.CL

Recognition: unknown

AutoRubric: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning

Authors on Pith no claims yet
classification 💻 cs.CL
keywords reasoningrewardsautorubricmultimodalrubric-basedgenerativemodelsrlvr
0
0 comments X
read the original abstract

Multimodal large language models (MLLMs) have rapidly advanced from perception tasks to complex multi-step reasoning, yet reinforcement learning with verifiable rewards (RLVR) often leads to spurious reasoning since only the final-answer correctness is rewarded. To address this limitation, we propose AutoRubric, a framework that integrates RLVR with process-level supervision through automatically collected rubric-based generative rewards. Our key innovation lies in a scalable self-aggregation method that distills consistent reasoning checkpoints from successful trajectories, enabling problem-specific rubric construction without human annotation or stronger teacher models. By jointly leveraging rubric-based and outcome rewards, AutoRubric achieves state-of-the-art performance on six multimodal reasoning benchmarks and substantially improves reasoning faithfulness in dedicated evaluations.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification

    cs.CL 2026-05 unverdicted novelty 6.0

    DeltaRubric decomposes multimodal preference evaluation into self-generated planning and verification steps within a single model, producing large accuracy improvements on VL-RewardBench via multi-role reinforcement learning.