AutoRubric generates rubric-based process rewards from self-aggregated successful trajectories to improve faithful multimodal reasoning in MLLMs under RLVR without human annotation or teacher models.
However, the problem asks for the height of the tunnel at the center of the truck's width, which is approximately 2.88 meters.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1). Years: 2025 (1). Verdicts: CONDITIONAL (1).
Representative citing papers:
AutoRubric: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning