NewsCLIPpings: Automatic generation of out-of-context multimodal media
2 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- CLIPScore: A Reference-free Evaluation Metric for Image Captioning
  CLIPScore uses a web-pretrained CLIP model to evaluate image captions without references and achieves higher human correlation than CIDEr or SPICE.
- RW-Post: Auditable Evidence-Grounded Multimodal Fact-Checking in the Wild
  RW-Post is an auditable text-image benchmark for real-world multimodal fact-checking that links posts to evidence traces from human fact-check articles and includes the AgentFact baseline for evaluation.