Pith Number

pith:CFJHD4PY

pith:2024:CFJHD4PYK63OGQJLKSW2TFT7GC
not attested · not anchored · not stored · refs resolved

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Adithya Iyer, Ellis Brown, Jihan Yang, Manoj Middepogu, Penghao Wu, Rob Fergus, Sai Charitha Akula, Saining Xie, Sanghyun Woo, Shengbang Tong, Shusheng Yang, Xichen Pan, Yann LeCun, Ziteng Wang

Cambrian-1 shows that a vision-centric design, studied across more than twenty vision encoders and evaluated on new benchmarks, produces stronger sensory grounding in multimodal LLMs.

arxiv:2406.16860 v2 · 2024-06-24 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{CFJHD4PYK63OGQJLKSW2TFT7GC}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv. Learn more · Embed verified badge

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Cambrian-1 not only achieves state-of-the-art performance but also serves as a comprehensive, open cookbook for instruction-tuned MLLMs.

C2 · weakest assumption

That existing MLLM benchmarks are insufficiently vision-centric and that the new CV-Bench plus SVA will provide more accurate measurement of sensory grounding without introducing their own selection or interpretation biases.

C3 · one-line summary

Cambrian-1 is a vision-centric multimodal LLM family. The work evaluates over 20 vision encoders, introduces CV-Bench and the Spatial Vision Aggregator, and releases open models, code, and data, with the models achieving strong performance on visual grounding tasks.

References

163 extracted · 163 resolved · 22 Pith anchors

[1] TallyQA: Answering complex counting questions 2019
[2] Don’t just assume; look and answer: Overcoming priors for visual question answering 2018
[3] Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations 2021
[4] Llama 3 Model Card 2024
[5] arXiv preprint arXiv:2402.05128, 2024

Formal links

2 machine-checked theorem links

Cited by

23 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.167487Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

115271f1f857b6e3412b54ada9967f30b6a8d5075d42d20717a0ad84e9a593f3

Aliases

arxiv: 2406.16860 · arxiv_version: 2406.16860v2 · doi: 10.48550/arxiv.2406.16860 · pith_short_12: CFJHD4PYK63O · pith_short_16: CFJHD4PYK63OGQJL · pith_short_8: CFJHD4PY
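The short aliases above are plain prefixes of the full Pith Number, and the full number itself appears to be the first 26 characters of the Base32 encoding of the canonical hash bytes. That second relationship is inferred from the values shown on this page, not from any statement by Pith, so treat the following as a minimal sketch under that assumption. The function name `pith_number_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "115271f1f857b6e3412b54ada9967f30b6a8d5075d42d20717a0ad84e9a593f3"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page relates to its
    canonical hash; it is not a documented Pith algorithm.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number_from_hash(CANONICAL_HASH)

# The pith_short_8 / _12 / _16 aliases are prefixes of the full number.
aliases = {n: full[:n] for n in (8, 12, 16)}

print(full)
print(aliases)
```

Comparing `full` and `aliases` against the values listed under Aliases is a quick consistency check on a copied identifier.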
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/CFJHD4PYK63OGQJLKSW2TFT7GC \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 115271f1f857b6e3412b54ada9967f30b6a8d5075d42d20717a0ad84e9a593f3
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "e75c7a5db2e5a7c0cdc8282f2d5b275416c8812e7c4b61ef0c742e12d554aa97",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-06-24T17:59:42Z",
    "title_canon_sha256": "daabba9038aa88b85882b2e1edc2f3cac3d8d26c8aae3905f3290a834b08ea4b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2406.16860",
    "kind": "arxiv",
    "version": 2
  }
}
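The shell pipeline under "Verify this Pith Number yourself" can be unpacked into a small offline Python sketch. It applies the same serialization settings as the one-liner (sorted keys, compact separators, non-ASCII characters left unescaped) and then hashes the UTF-8 bytes with SHA-256. The helper name `canonical_sha256` is hypothetical, and the record literal is copied from this page on the assumption that it matches the `canonical_record` field the API returns.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record deterministically and return its SHA-256 hex digest."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Record as shown on this page (assumed identical to the API's canonical_record).
record = {
    "metadata": {
        "abstract_canon_sha256": "e75c7a5db2e5a7c0cdc8282f2d5b275416c8812e7c4b61ef0c742e12d554aa97",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-06-24T17:59:42Z",
        "title_canon_sha256": "daabba9038aa88b85882b2e1edc2f3cac3d8d26c8aae3905f3290a834b08ea4b",
    },
    "schema_version": "1.0",
    "source": {"id": "2406.16860", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye against the canonical hash listed above
```

Because the keys are sorted before hashing, two mirrors that hold the same record with different key order should arrive at the same digest.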