VL-MDR dynamically selects and aggregates fine-grained dimensions for interpretable vision-language reward modeling using a visual-aware gate, backed by a 321k annotated preference dataset, and improves DPO alignment for hallucination mitigation.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Learning What Matters: Dynamic Dimension Selection and Aggregation for Interpretable Vision-Language Reward Modeling