Investigating the Limitation of CLIP Models: The Worst-Performing Categories

Jiang-Xin Shi; Jie-Jing Shao; Lan-Zhe Guo; Xiao-Wen Yang; Yu-Feng Li

arxiv: 2310.03324 · v1 · pith:GCJ73LBAnew · submitted 2023-10-05 · 💻 cs.CV · cs.LG

keywords: categories, CLIP, performance, accuracy, models, overall, prompts, worst-performing

Contrastive Language-Image Pre-training (CLIP) provides a foundation model by integrating natural language into visual concepts, enabling zero-shot recognition on downstream tasks. It is usually expected that satisfactory overall accuracy can be achieved across numerous domains through well-designed textual prompts. However, we found that the performance of CLIP models on their worst categories is significantly inferior to their overall performance. For example, on ImageNet, there are a total of 10 categories with class-wise accuracy as low as 0%, even though the overall accuracy reaches 64.1%. This phenomenon reveals the potential risks associated with using CLIP models, particularly in risk-sensitive applications where specific categories hold significant importance. To address this issue, we investigate the alignment between the two modalities in the CLIP model and propose the Class-wise Matching Margin (CMM) to measure the inference confusion. CMM can effectively identify the worst-performing categories and estimate the potential performance of the candidate prompts. We further query large language models to enrich descriptions of worst-performing categories and build a weighted ensemble to highlight the efficient prompts. Experimental results clearly verify the effectiveness of our proposal, where the accuracy on the worst-10 categories on ImageNet is boosted to 5.2%, without manual prompt engineering, laborious optimization, or access to labeled validation data.
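The gap the abstract describes between overall accuracy and accuracy on the worst categories can be illustrated with a small sketch. It assumes that "worst-k accuracy" means the mean of the k lowest class-wise accuracies, which is one plausible reading and not confirmed by the abstract. The function names and toy data are hypothetical, and the sketch does not implement CMM, whose definition the abstract does not give.

```python
from collections import defaultdict


def per_class_accuracy(labels, preds):
    """Return {class: accuracy} for every class that appears in labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}


def overall_accuracy(labels, preds):
    """Fraction of all samples predicted correctly."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def worst_k_accuracy(labels, preds, k):
    """Assumed definition: mean of the k lowest class-wise accuracies."""
    accs = sorted(per_class_accuracy(labels, preds).values())
    return sum(accs[:k]) / k


# Hypothetical toy data: class 2 is rare and always misclassified.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
preds = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]

print(overall_accuracy(labels, preds))     # high overall accuracy
print(worst_k_accuracy(labels, preds, 1))  # accuracy on the single worst class
```

In the toy data the overall accuracy looks healthy while the worst class is never recognized, which is the same pattern the abstract reports for ImageNet at a much larger scale.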

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Silent Brush: Evaluating Artistic Style Leakage in AI Art Generation
cs.LG 2026-05 unverdicted novelty 7.0

Art Arena evaluates how artistic styles from training data leak into AI-generated images without explicit prompts, revealing asymmetric blending due to differences in representational strength and interaction dynamics...