Evaluating CLIP: Towards Characterization of Broader Capabilities and Downstream Implications. arXiv preprint arXiv:2108.02818

Sandhini Agarwal, Gretchen Krueger, Jack Clark, Alec Radford, Jong Wook Kim, Miles Brundage · 2021 · arXiv 2108.02818

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

CLIPScore: A Reference-free Evaluation Metric for Image Captioning

cs.CV · 2021-04-18 · conditional · novelty 8.0

CLIPScore uses a web-pretrained CLIP model to evaluate image captions without references and achieves higher human correlation than CIDEr or SPICE.

An Attribute-Based Measure of Video Complexity

cs.CV · 2026-05-30 · unverdicted · novelty 7.0

VideoABC estimates video-LLM failure probability via low-dimensional attribute projection, dual quantization (k-means plus lattice), and psychophysics-inspired synthetic data.

Bias at the End of the Score

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

Reward models used as quality scorers in text-to-image generation encode demographic biases that cause reward-guided training to sexualize female subjects, reinforce stereotypes, and reduce diversity.

ComMem: Complementary Memory Systems for Test-Time Adaptation of Vision-Language Models

cs.AI · 2026-06-27 · unverdicted · novelty 5.0

ComMem proposes complementary fast visual cache and slow textual prototype memories for test-time adaptation of VLMs, claiming superior performance on 15 benchmarks under distribution shifts.

citing papers explorer

Showing 4 of 4 citing papers.

CLIPScore: A Reference-free Evaluation Metric for Image Captioning

cs.CV · 2021-04-18 · conditional · none · ref 2

An Attribute-Based Measure of Video Complexity

cs.CV · 2026-05-30 · unverdicted · none · ref 2

Bias at the End of the Score

cs.CV · 2026-04-14 · unverdicted · none · ref 1

ComMem: Complementary Memory Systems for Test-Time Adaptation of Vision-Language Models

cs.AI · 2026-06-27 · unverdicted · none · ref 2
