pith. machine review for the scientific record.
Pith Number

pith:AY7B3MSL

pith:2023:AY7B3MSLIICKEHJXAFZ7D62B7J
not attested · not anchored · not stored · refs pending

CMMLU: Measuring massive multitask language understanding in Chinese

Fajri Koto, Hai Zhao, Haonan Li, Nan Duan, Timothy Baldwin, Yeyun Gong, Yifei Yang, Yixuan Zhang

Most large language models score below 50 percent on a new Chinese multitask understanding benchmark.

arxiv:2306.09212 v2 · 2023-06-15 · cs.CL

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Most existing LLMs struggle to achieve an average accuracy of 50%, even when provided with in-context examples and chain-of-thought prompts, whereas the random baseline stands at 25%.

C2 weakest assumption

The questions in CMMLU accurately represent the knowledge and reasoning demands of real Chinese-language tasks across the covered subjects.

C3 one line summary

The CMMLU benchmark shows that most advanced LLMs score below 50% accuracy on Chinese multitask understanding. That is above the 25% random baseline but leaves major room for improvement.

Formal links

2 machine-checked theorem links

Cited by

18 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.587873Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

063e1db24b4204a21d370173f1fb41fa4a41edf019b497bd0f153385a2b3ba48

Aliases

arxiv: 2306.09212 · arxiv_version: 2306.09212v2 · doi: 10.48550/arxiv.2306.09212 · pith_short_12: AY7B3MSLIICK · pith_short_16: AY7B3MSLIICKEHJX · pith_short_8: AY7B3MSL
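
The Pith aliases on this page appear to be derived from the canonical hash: Base32-encoding the SHA-256 digest bytes and keeping the first 26 characters reproduces the identifier shown above, and the `pith_short_*` aliases are prefixes of it. The sketch below illustrates that relationship. It is an inference from the values on this page, not a published Pith specification, and the function names `pith_id_from_hash` and `short_alias` are hypothetical.

```python
import base64

# Values copied from this record page.
CANONICAL_HASH = "063e1db24b4204a21d370173f1fb41fa4a41edf019b497bd0f153385a2b3ba48"
PITH_ID = "AY7B3MSLIICKEHJXAFZ7D62B7J"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: this mapping is inferred from the values on this page,
    not taken from an official description of the scheme.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(pith_id: str, n: int) -> str:
    """Return the n-character prefix, as the pith_short_<n> aliases appear to do."""
    return pith_id[:n]


derived_id = pith_id_from_hash(CANONICAL_HASH)
aliases = {n: short_alias(derived_id, n) for n in (8, 12, 16)}

print(derived_id == PITH_ID)  # compares the derived value with the page's identifier
print(aliases)
```

If the assumption holds, any short alias can be checked against the canonical hash without contacting the server, which is useful when only a truncated identifier is cited.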
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/AY7B3MSLIICKEHJXAFZ7D62B7J \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 063e1db24b4204a21d370173f1fb41fa4a41edf019b497bd0f153385a2b3ba48
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d9c30447ea56c65bd128e617eaa5ad9b4e951a27c9baad2ab43a562a40a37796",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-06-15T15:49:51Z",
    "title_canon_sha256": "85638d27497b05a8b6e6cfc2333eaaca9948e66c407771973cfd620a8c90441e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2306.09212",
    "kind": "arxiv",
    "version": 2
  }
}
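
The verification one-liner above can also be run offline against the canonical record shown here. The following is a minimal sketch that mirrors the same steps (sort keys, compact separators, no ASCII escaping, UTF-8 encoding, SHA-256). The helper names `canonical_bytes` and `canonical_sha256` are hypothetical, and whether the digest matches the expected value depends on the server's record being byte-identical to the JSON displayed on this page.

```python
import hashlib
import json

# Canonical record as displayed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "d9c30447ea56c65bd128e617eaa5ad9b4e951a27c9baad2ab43a562a40a37796",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2023-06-15T15:49:51Z",
        "title_canon_sha256": "85638d27497b05a8b6e6cfc2333eaaca9948e66c407771973cfd620a8c90441e",
    },
    "schema_version": "1.0",
    "source": {"id": "2306.09212", "kind": "arxiv", "version": 2},
}

# Expected digest as stated on this page.
EXPECTED = "063e1db24b4204a21d370173f1fb41fa4a41edf019b497bd0f153385a2b3ba48"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and compact separators, as the one-liner does."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Sorting keys and fixing the separators makes the serialization independent of the key order in which the record was received, so two mirrors holding the same record should compute the same digest.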