CaptionQA is a new benchmark of 33,027 questions spanning natural, document, e-commerce, and embodied AI domains. It measures how much of an image's utility a model-generated caption retains when an LLM uses the caption in place of the original image for downstream tasks.
Painting with words: Elevating detailed image captioning with benchmark and alignment learning
2 Pith papers cite this work. Polarity classification is still indexing.
ReShift: Aha-Moment-Driven Reasoning-Level Backdoor Attacks on Vision-Language Models
ReShift is a reasoning-level backdoor framework for VLMs. It combines poisoned data construction with joint optimization to shift chain-of-thought (CoT) trajectories when the trigger is present, while preserving surface coherence.