Pith Number

pith:CFJHD4PY

pith:2024:CFJHD4PYK63OGQJLKSW2TFT7GC

not attested · not anchored · not stored · refs resolved

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Adithya Iyer, Ellis Brown, Jihan Yang, Manoj Middepogu, Penghao Wu, Rob Fergus, Sai Charitha Akula, Saining Xie, Sanghyun Woo, Shengbang Tong, Shusheng Yang, Xichen Pan, Yann LeCun, Ziteng Wang

Cambrian-1 shows that a vision-centric design, studied across more than 20 vision encoders and paired with new benchmarks, produces stronger sensory grounding in multimodal LLMs.

arxiv:2406.16860 v2 · 2024-06-24 · cs.CV


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{CFJHD4PYK63OGQJLKSW2TFT7GC}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Cambrian-1 not only achieves state-of-the-art performance but also serves as a comprehensive, open cookbook for instruction-tuned MLLMs.

C2 · weakest assumption

That existing MLLM benchmarks are insufficiently vision-centric and that the new CV-Bench plus SVA will provide more accurate measurement of sensory grounding without introducing their own selection or interpretation biases.

C3 · one-line summary

Cambrian-1 is a vision-centric multimodal LLM family. The work evaluates over 20 vision encoders, introduces CV-Bench and the Spatial Vision Aggregator, and releases open models, code, and data; the models achieve strong performance on visual grounding tasks.

References

163 extracted · 163 resolved · 22 Pith anchors

[1] TallyQA: Answering complex counting questions 2019

[2] Don’t just assume; look and answer: Overcoming priors for visual question answering 2018

[3] Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations 2021

[4] Llama 3 Model Card 2024

[5] arXiv preprint arXiv:2402.05128, 2024

Formal links

2 machine-checked theorem links

Cited by

23 papers in Pith

Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs

Receipt and verification

First computed	2026-05-17T23:38:46.167487Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

115271f1f857b6e3412b54ada9967f30b6a8d5075d42d20717a0ad84e9a593f3

Aliases

arxiv: 2406.16860 · arxiv_version: 2406.16860v2 · doi: 10.48550/arxiv.2406.16860 · pith_short_12: CFJHD4PYK63O · pith_short_16: CFJHD4PYK63OGQJL · pith_short_8: CFJHD4PY
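
The short aliases above are plain prefixes of the full identifier, and the identifier itself appears to be the Base32 (RFC 4648) encoding of the canonical hash, cut to 26 characters. This is an observation that holds for the values on this page, not a documented rule of the Pith schema. The function names below are hypothetical. A minimal Python sketch under those assumptions:

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "115271f1f857b6e3412b54ada9967f30b6a8d5075d42d20717a0ad84e9a593f3"
PITH_NUMBER = "CFJHD4PYK63OGQJLKSW2TFT7GC"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: Base32-encode the digest bytes, keep the first `length` chars."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith: str) -> dict[str, str]:
    """Assumed rule: pith_short_N is the first N characters of the full identifier."""
    return {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    # Compare the derived identifier with the one printed on the page.
    print(pith_from_hash(CANONICAL_HASH) == PITH_NUMBER)
    for name, value in short_aliases(PITH_NUMBER).items():
        print(name, value)
```

If the assumption is right, a client could check that an identifier and a canonical hash belong together without calling the resolver. Treat a mismatch on other records as a sign that the derivation differs from this guess.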

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/CFJHD4PYK63OGQJLKSW2TFT7GC \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 115271f1f857b6e3412b54ada9967f30b6a8d5075d42d20717a0ad84e9a593f3

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "e75c7a5db2e5a7c0cdc8282f2d5b275416c8812e7c4b61ef0c742e12d554aa97",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-06-24T17:59:42Z",
    "title_canon_sha256": "daabba9038aa88b85882b2e1edc2f3cac3d8d26c8aae3905f3290a834b08ea4b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2406.16860",
    "kind": "arxiv",
    "version": 2
  }
}
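
The verification command above comes down to one canonicalization step: re-serialize the record with sorted keys, compact separators, and no ASCII escaping, then take the SHA-256 of the UTF-8 bytes. The sketch below does the same offline in Python, using the record as printed on this page. The names `canonical_bytes` and `canonical_sha256` are hypothetical. Whether the digest equals the published hash depends on the printed record matching what the resolver serves, which has not been checked here.

```python
import hashlib
import json

# Canonical record as printed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "e75c7a5db2e5a7c0cdc8282f2d5b275416c8812e7c4b61ef0c742e12d554aa97",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-06-24T17:59:42Z",
    "title_canon_sha256": "daabba9038aa88b85882b2e1edc2f3cac3d8d26c8aae3905f3290a834b08ea4b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2406.16860",
    "kind": "arxiv",
    "version": 2
  }
}
"""

PUBLISHED_HASH = "115271f1f857b6e3412b54ada9967f30b6a8d5075d42d20717a0ad84e9a593f3"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, as the shell one-liner does."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(json.loads(RECORD_JSON))
    print("computed: ", digest)
    print("published:", PUBLISHED_HASH)
    print("match:    ", digest == PUBLISHED_HASH)
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the input, so any mirror that holds the same record should compute the same value.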