Pith Number

pith:ZL63GJPR

pith:2023:ZL63GJPRLMPHJ5HHQZ2CXNA6EA
not attested · not anchored · not stored · refs resolved

The Falcon Series of Open Language Models

Abdulaziz Alshamsi, Alessandro Cappelli, Badreddine Noune, Baptiste Pannier, Daniele Mazzotta, Daniel Hesslow, Ebtesam Almazrouei, Étienne Goffinet, Guilherme Penedo, Hamza Alobeidli, Julien Launay, Mérouane Debbah, Quentin Malartic, Ruxandra Cojocaru

Falcon-180B, trained on 3.5 trillion tokens from web data, nears PaLM-2-Large performance at lower pretraining and inference cost.

arxiv:2311.16867 v2 · 2023-11-28 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ZL63GJPRLMPHJ5HHQZ2CXNA6EA}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Falcon-180B significantly outperforms models such as PaLM or Chinchilla, improves upon LLaMA 2 or Inflection-1, and nears the performance of PaLM-2-Large at a reduced pretraining and inference cost, making it one of the three best language models in the world along with GPT-4 and PaLM-2-Large.

C2 weakest assumption

The assumption that the reported benchmark results reflect genuine capability gains rather than differences in evaluation protocols, data contamination, or undisclosed advantages in testing conditions.

C3 one-line summary

Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.

References

269 extracted · 269 resolved · 66 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

33 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:48.300417Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

cafdb325f15b1e74f4e786742bb41e20181254537c6cae321499d2780ea3f22b

Aliases

arxiv: 2311.16867 · arxiv_version: 2311.16867v2 · doi: 10.48550/arxiv.2311.16867 · pith_short_12: ZL63GJPRLMPH · pith_short_16: ZL63GJPRLMPHJ5HH · pith_short_8: ZL63GJPR
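
The identifiers on this page appear to be related to the canonical hash: Base32-encoding the first 16 bytes of the SHA-256 digest (RFC 4648 alphabet, padding stripped) reproduces the 26-character Pith Number, and the pith_short_* aliases are its first 8, 12 and 16 characters. This is an observation drawn from the values shown here, not a documented rule of the schema. The function names below are hypothetical and used only for illustration.

```python
import base64


def pith_number_from_hash(sha256_hex: str) -> str:
    """Unpadded Base32 of the first 16 bytes of a hex-encoded SHA-256 digest.

    Assumption: this derivation is inferred from the values on this page;
    the page itself does not state it.
    """
    prefix = bytes.fromhex(sha256_hex)[:16]
    return base64.b32encode(prefix).decode("ascii").rstrip("=")


def short_alias(pith_number: str, length: int) -> str:
    """Return the leading `length` characters, as the pith_short_* aliases appear to do."""
    return pith_number[:length]


# Canonical hash as published on this page.
canonical_hash = "cafdb325f15b1e74f4e786742bb41e20181254537c6cae321499d2780ea3f22b"

number = pith_number_from_hash(canonical_hash)
print(number)
for n in (8, 12, 16):
    print(f"pith_short_{n}: {short_alias(number, n)}")
```

A check like this only confirms that the identifier and the hash on the page are consistent with each other. It says nothing about whether the hash matches the canonical record, which is what the verification command covers.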
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ZL63GJPRLMPHJ5HHQZ2CXNA6EA \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: cafdb325f15b1e74f4e786742bb41e20181254537c6cae321499d2780ea3f22b
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "342919fd21ebca51681577f339817650f22f777abcb6a8f6b53031456303dc3d",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-11-28T15:12:47Z",
    "title_canon_sha256": "f56e343e237f48a5d85c2195861ba0c78226381c90163fceb55741ea3be8ab05"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2311.16867",
    "kind": "arxiv",
    "version": 2
  }
}
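
The hashing step of the verification command can also be run offline against a saved copy of the record. The sketch below mirrors the serialization used in the one-liner (sorted keys, compact separators, non-ASCII characters left unescaped, UTF-8 encoding). It assumes the JSON shown on this page is the same object the API returns under canonical_record. If the two differ in any field, the comparison prints False. The function names are hypothetical.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and compact separators, then encode as UTF-8."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# Record copied from this page; assumed (not confirmed) to equal the API's
# canonical_record field.
record = {
    "metadata": {
        "abstract_canon_sha256": "342919fd21ebca51681577f339817650f22f777abcb6a8f6b53031456303dc3d",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2023-11-28T15:12:47Z",
        "title_canon_sha256": "f56e343e237f48a5d85c2195861ba0c78226381c90163fceb55741ea3be8ab05",
    },
    "schema_version": "1.0",
    "source": {"id": "2311.16867", "kind": "arxiv", "version": 2},
}

PUBLISHED_HASH = "cafdb325f15b1e74f4e786742bb41e20181254537c6cae321499d2780ea3f22b"

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == PUBLISHED_HASH)
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON happens to be formatted or ordered on disk.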