pith. sign in

arxiv: 2310.03324 · v1 · pith:GCJ73LBAnew · submitted 2023-10-05 · 💻 cs.CV · cs.LG

Investigating the Limitation of CLIP Models: The Worst-Performing Categories

classification 💻 cs.CV cs.LG
keywords categoriesclipperformanceaccuracymodelsoverallpromptsworst-performing
0
0 comments X
read the original abstract

Contrastive Language-Image Pre-training (CLIP) provides a foundation model by integrating natural language into visual concepts, enabling zero-shot recognition on downstream tasks. It is usually expected that satisfactory overall accuracy can be achieved across numerous domains through well-designed textual prompts. However, we found that their performance in the worst categories is significantly inferior to the overall performance. For example, on ImageNet, there are a total of 10 categories with class-wise accuracy as low as 0\%, even though the overall performance has achieved 64.1\%. This phenomenon reveals the potential risks associated with using CLIP models, particularly in risk-sensitive applications where specific categories hold significant importance. To address this issue, we investigate the alignment between the two modalities in the CLIP model and propose the Class-wise Matching Margin (\cmm) to measure the inference confusion. \cmm\ can effectively identify the worst-performing categories and estimate the potential performance of the candidate prompts. We further query large language models to enrich descriptions of worst-performing categories and build a weighted ensemble to highlight the efficient prompts. Experimental results clearly verify the effectiveness of our proposal, where the accuracy on the worst-10 categories on ImageNet is boosted to 5.2\%, without manual prompt engineering, laborious optimization, or access to labeled validation data.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. The Silent Brush: Evaluating Artistic Style Leakage in AI Art Generation

    cs.LG 2026-05 unverdicted novelty 7.0

    Art Arena evaluates how artistic styles from training data leak into AI-generated images without explicit prompts, revealing asymmetric blending due to differences in representational strength and interaction dynamics...