CLIPScore uses a web-pretrained CLIP model to evaluate image captions without reference captions, and it correlates more strongly with human judgments than reference-based metrics such as CIDEr and SPICE.
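As a rough illustration of how a reference-free score of this kind can be computed, the sketch below applies the rescaled, clipped cosine similarity defined in the CLIPScore paper, w * max(cos(image, caption), 0) with w = 2.5. The embedding vectors are hypothetical placeholders, not real CLIP outputs; in practice they would come from CLIP's image and text encoders. The helper names `cosine_similarity` and `clip_score` are chosen for this example only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def clip_score(image_embedding, caption_embedding, w=2.5):
    """Reference-free caption score: w * max(cos(image, caption), 0).

    w = 2.5 is the rescaling constant from the CLIPScore paper.
    """
    cos = cosine_similarity(image_embedding, caption_embedding)
    return w * max(cos, 0.0)


# Hypothetical embeddings for illustration only (not real CLIP outputs).
image_vec = [0.2, 0.7, 0.1]
good_caption_vec = [0.25, 0.65, 0.05]
bad_caption_vec = [-0.6, 0.1, 0.9]

good_score = clip_score(image_vec, good_caption_vec)
bad_score = clip_score(image_vec, bad_caption_vec)
print(f"good caption: {good_score:.3f}, bad caption: {bad_score:.3f}")
```

No reference captions appear anywhere in the computation, which is what makes the metric reference-free: only the image and the candidate caption are embedded and compared.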
Evaluating CLIP: Towards characterization of broader capabilities and downstream implications. arXiv preprint arXiv:2108.02818
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: method 1
citation-polarity summary: use method 1
representative citing papers
VideoABC estimates video-LLM failure probability via low-dimensional attribute projection, dual quantization (k-means plus lattice), and psychophysics-inspired synthetic data.
Reward models used as quality scorers in text-to-image generation encode demographic biases that cause reward-guided training to sexualize female subjects, reinforce stereotypes, and reduce diversity.
ComMem proposes complementary fast visual cache and slow textual prototype memories for test-time adaptation of VLMs, claiming superior performance on 15 benchmarks under distribution shifts.