Painting with words: Elevating detailed image captioning with benchmark and alignment learning

Ye, Q · 2025 · arXiv 2503.07906

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

CaptionQA: Is Your Caption as Useful as the Image Itself?

cs.CV · 2025-11-26 · conditional · novelty 7.0

CaptionQA is a new benchmark with 33,027 questions across natural, document, e-commerce, and embodied AI domains that measures how much utility model-generated captions retain compared to original images when used by LLMs for downstream tasks.

ReShift: Aha-Moment-Driven Reasoning-Level Backdoor Attacks on Vision-Language Models

cs.CR · 2026-07-01 · unverdicted · novelty 6.0

ReShift is a reasoning-level backdoor framework for VLMs that uses poisoned data construction and joint optimization to shift CoT trajectories on trigger while preserving surface coherence.

citing papers explorer

ReShift: Aha-Moment-Driven Reasoning-Level Backdoor Attacks on Vision-Language Models · cs.CR · 2026-07-01 · unverdicted · none · ref 45
