ClaimDiff-RL introduces reference-conditioned atomic claim differences verified by a multimodal judge as the reward signal for fine-grained RL in long-form image captioning.
SimpleVQA: Multimodal factuality evaluation for multimodal large language models. 2025 IEEE/CVF International Conference on Computer Vision (ICCV), pages 4637–4646.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison