Mitigating hallucination through theory-consistent symmetric multimodal preference optimization
2 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background 1
Fields: cs.CV 2
Verdicts: UNVERDICTED 2
Roles: background 1
Polarities: background 1
Representative citing papers
VGA constructs precise visual grounding from token semantics to guide MLLM attention toward relevant regions, dynamically suppressing described areas in captioning, and achieves SOTA dehallucination with negligible overhead.
citing papers explorer
Visual Preference Optimization with Rubric Rewards
rDPO uses offline-built rubrics to generate on-policy preference data for DPO, raising benchmark scores in visual tasks over outcome-based filtering and style baselines.