CLIPScore uses a web-pretrained CLIP model to evaluate image captions without references and achieves higher correlation with human judgments than reference-based metrics such as CIDEr or SPICE.
NewsCLIPpings: Automatic generation of out-of-context multimodal media
5 Pith papers cite this work.
representative citing papers
The XNote dataset and LVLM benchmarks demonstrate that current models face significant challenges in generating accurate, grounded Community Notes for image-based contextual deception.
RW-Post is an auditable benchmark linking social media posts to evidence from human fact-check articles for evaluating multimodal AI fact-checking across different evidence regimes.
RW-Post is an auditable text-image benchmark for real-world multimodal fact-checking that links posts to evidence traces from human fact-check articles and includes the AgentFact baseline for evaluation.
CRAVE is a new framework that clusters retrieved text and image evidence into narratives and uses an LLM judge to produce explained fact-checking verdicts.
citing papers explorer
XNote: Benchmarking Automated Community Notes Generation for Image-based Contextual Deception