Pith Number

pith:PREMJTZC

pith:2023:PREMJTZC4J7RNK4IHH6M6UBNEE

not attested · not anchored · not stored · refs resolved

Demystifying CLIP Data

Christoph Feichtenhofer, Gargi Ghosh, Hu Xu, Luke Zettlemoyer, Po-Yao Huang, Russell Howes, Saining Xie, Shang-Wen Li, Vasu Sharma, Xiaoqing Ellen Tan

MetaCLIP balances CommonCrawl image-text pairs using CLIP-derived metadata to exceed original CLIP performance on zero-shot benchmarks.

arxiv:2309.16671 v6 · 2023-09-28 · cs.CV · cs.CL

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{PREMJTZC4J7RNK4IHH6M6UBNEE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

MetaCLIP applied to CommonCrawl with 400M image-text data pairs outperforms CLIP's data on multiple standard benchmarks. In zero-shot ImageNet classification, MetaCLIP achieves 70.8% accuracy, surpassing CLIP's 68.3% on ViT-B models. Scaling to 1B data attains 72.4%.

C2 · weakest assumption

That metadata derived from CLIP's own concepts is sufficient to capture the key distributional properties that made CLIP data effective, and that explicit balancing over this metadata is the primary driver of the observed gains rather than other unmeasured factors in the raw pool.

C3 · one line summary

MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.

References

179 extracted · 179 resolved · 33 Pith anchors

[2] Coresets for nonparametric estimation - the case of DP-means 2015

[4] An image is worth 16x16 words: Transformers for image recognition at scale 2020

[5] Scalable training of mixture models via coresets 2011

[6] Datacomp: In search of the next generation of multimodal datasets 2023

[7] On coresets for k-means and k-median clustering 2004

Formal links

2 machine-checked theorem links

Cited by

27 papers in Pith

Rethinking the Global Knowledge of CLIP in Training-Free Open-Vocabulary Semantic Segmentation

LeakyCLIP: Extracting Training Data from CLIP

Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models

Adapting Vision-Language Foundation Model for Next Generation Medical Ultrasound Image Analysis

Page image classification for content-specific data processing

Receipt and verification

First computed	2026-05-17T23:38:48.378631Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

7c48c4cf22e27f16ab8839fccf502d21084bf52b5499072da4555157a99911e5

Aliases

arxiv: 2309.16671 · arxiv_version: 2309.16671v6 · doi: 10.48550/arxiv.2309.16671 · pith_short_12: PREMJTZC4J7R · pith_short_16: PREMJTZC4J7RNK4I · pith_short_8: PREMJTZC
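The short aliases listed above are prefixes of the full Pith Number. For this record, the full number also equals the unpadded Base32 encoding of the first 16 bytes of the canonical hash. That second relationship is an observation from the values on this page, not a rule stated here, so treat it as an assumption for any other record. The helper names `short_alias` and `base32_prefix` below are hypothetical, introduced only for illustration.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "7c48c4cf22e27f16ab8839fccf502d21084bf52b5499072da4555157a99911e5"
PITH_NUMBER = "PREMJTZC4J7RNK4IHH6M6UBNEE"


def short_alias(pith_number: str, length: int) -> str:
    """Return the first `length` characters of a Pith Number (hypothetical helper)."""
    return pith_number[:length]


def base32_prefix(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first `n_bytes` of a hex digest and strip '=' padding.

    Assumption: the 16-byte prefix length is inferred from this record only.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


if __name__ == "__main__":
    for n in (8, 12, 16):
        print(f"pith_short_{n}: {short_alias(PITH_NUMBER, n)}")
    # Compare the Base32 prefix of the hash with the published number.
    print("hash prefix matches number:", base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
```

Running the script prints the three short aliases and the result of the prefix comparison, which gives a quick offline consistency check between the hash and the identifier.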

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/PREMJTZC4J7RNK4IHH6M6UBNEE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7c48c4cf22e27f16ab8839fccf502d21084bf52b5499072da4555157a99911e5

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "ebf6224ad45b03c9907c6c2a98803e7c9116e2a50f09fa513efa5dcd022d1323",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-09-28T17:59:56Z",
    "title_canon_sha256": "77371d4b9df8c37f41b4553938b8a1d5762fff9a02903ac4b560ddeca04e5b06"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2309.16671",
    "kind": "arxiv",
    "version": 6
  }
}
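
The record above can also be hashed offline. The sketch below applies the same canonicalization as the curl one-liner (sorted keys, compact separators, no ASCII escaping, SHA-256). It assumes that the JSON printed on this page is byte-for-byte what the resolver returns under `canonical_record`; if it is not, the digest will differ. The function name `canonical_hash` is hypothetical.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 of the record serialized with sorted keys and compact separators."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()  # UTF-8 by default
    return hashlib.sha256(canonical).hexdigest()


# Canonical record as printed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "ebf6224ad45b03c9907c6c2a98803e7c9116e2a50f09fa513efa5dcd022d1323",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-09-28T17:59:56Z",
        "title_canon_sha256": "77371d4b9df8c37f41b4553938b8a1d5762fff9a02903ac4b560ddeca04e5b06",
    },
    "schema_version": "1.0",
    "source": {"id": "2309.16671", "kind": "arxiv", "version": 6},
}

# Published canonical hash from the "Canonical hash" section.
EXPECTED = "7c48c4cf22e27f16ab8839fccf502d21084bf52b5499072da4555157a99911e5"

if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("matches published hash:", digest == EXPECTED)
```

Because the serialization sorts keys, the order in which the fields are typed into `RECORD` does not affect the digest.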