RC-DPO adds a CoT-conditioned preference term to DPO and pairs it with MCTS-based positive CoT generation plus attention-guided pruning for negatives, yielding lower hallucination rates on multimodal benchmarks.
It contains challenging image-question pairs that require models to generate visually grounded answers rather than relying on language priors.
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.AI (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Reasoning Matters: Mitigate Hallucination in Multimodal Large Reasoning Models via Reasoning-Conditioned Preference Optimization
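To make the "CoT-conditioned preference term" in the summary concrete, here is a minimal sketch. The first function is the standard DPO loss for one preference pair. The second adds a DPO-style term scored on the chain of thought and weights it by `lam`. The exact form of the RC-DPO term is not given on this page, so `rc_dpo_loss_sketch`, its `lam` weight, and the way the two terms are combined are assumptions for illustration only, not the authors' objective. Inputs are summed token log-probabilities under the policy and a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single (chosen, rejected) pair.

    Each argument is the summed log-probability of a sequence under the
    policy or the reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # sequence over the rejected one, relative to the reference model.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def rc_dpo_loss_sketch(response_logps, cot_logps,
                       beta: float = 0.1, lam: float = 1.0) -> float:
    """Hypothetical combination: response-level DPO plus a CoT-level term.

    response_logps and cot_logps are 4-tuples ordered as
    (policy_chosen, policy_rejected, ref_chosen, ref_rejected).
    ASSUMPTION: an additive, lam-weighted second term; the paper's actual
    formulation may differ.
    """
    return (dpo_loss(*response_logps, beta=beta)
            + lam * dpo_loss(*cot_logps, beta=beta))


if __name__ == "__main__":
    response = (-12.0, -15.0, -13.0, -14.0)  # made-up log-probabilities
    cot = (-40.0, -44.0, -41.0, -42.0)       # made-up log-probabilities
    print(rc_dpo_loss_sketch(response, cot))
```

In this sketch the positive CoT would come from the MCTS-based generation and the negative from attention-guided pruning, as the summary describes; how those samples are produced is outside the scope of the example.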