Pith Number

pith:NZPDVXNX

pith:2026:NZPDVXNXRUX6HU5FEAADKKJ2BB
Status: not attested · not anchored · not stored · refs pending

MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI

Bohan Lyu, Chengshuai Shi, Chi Jin, Dapeng Jiang, Dawn Song, Huan-ang Gao, Huaqing Zhang, Jiantao Jiao, Jiaru Zhang, Junlin Yang, Kaicheng Yang, Kun Wang, Max Simchowitz, Qixin Xu, Runhan Huang, Shange Tang, Simon S. Du, Siqiao Huang, Wenhao Chai, Wentao Guo, Xinghan Li, Xinyang Han, Xinyue Ai, Yadi Cao, Yicheng Zhang, Yucheng Yang, Ziran Yang, Zitao Chen

AI agents cannot reliably invent ML methods that beat human designs on generalization and scaling tests.

arxiv:2605.08678 v2 · 2026-05-09 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{NZPDVXNXRUX6HU5FEAADKKJ2BB}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Current agents remain far from reliably surpassing human-designed methods, and engineering-style tuning is easier for them than genuine method invention.

C2 · weakest assumption

The 140 tasks and 12 domains sufficiently capture the core skills needed for inventing generalizable and scalable ML methods without missing key aspects of real research.

C3 · one-line summary

MLS-Bench shows that current AI agents fall short of reliably inventing generalizable ML methods, with engineering tuning easier than genuine invention.

Receipt and verification
First computed 2026-05-28T01:04:41.928394Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

6e5e3addb78d2fe3d3a5200035293a086f16fa3ee433f15379a80521a3e76351

Aliases

arxiv: 2605.08678 · arxiv_version: 2605.08678v2 · doi: 10.48550/arxiv.2605.08678 · pith_short_12: NZPDVXNXRUX6 · pith_short_16: NZPDVXNXRUX6HU5F · pith_short_8: NZPDVXNX
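The identifiers on this page appear to be related in a simple way. The full Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the raw SHA-256 canonical hash, and each pith_short_N alias is the first N characters of the full number. This is an observation from the values shown here, not a documented Pith specification, so the minimal Python sketch below is only a consistency check under that assumption. The function names pith_number_from_hash and short_aliases are hypothetical.

```python
import base64

# Values copied from this record page.
CANONICAL_HASH = "6e5e3addb78d2fe3d3a5200035293a086f16fa3ee433f15379a80521a3e76351"
PITH_NUMBER = "NZPDVXNXRUX6HU5FEAADKKJ2BB"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical derivation: base32-encode the digest bytes and keep a prefix.

    This mirrors the relationship observed between the two values above;
    it is an assumption, not taken from a published Pith specification.
    """
    raw = bytes.fromhex(hex_digest)
    # RFC 4648 base32 alphabet (A-Z, 2-7); '=' padding only appears at the
    # very end of the 56-character output, so a 26-character prefix never sees it.
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict:
    """The pith_short_N aliases listed above are N-character prefixes."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    derived = pith_number_from_hash(CANONICAL_HASH)
    print(derived, derived == PITH_NUMBER)
    print(short_aliases(derived))
```

With the values on this page the derived string equals the published Pith Number, and the three prefixes reproduce pith_short_8, pith_short_12 and pith_short_16 exactly.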
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/NZPDVXNXRUX6HU5FEAADKKJ2BB \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 6e5e3addb78d2fe3d3a5200035293a086f16fa3ee433f15379a80521a3e76351
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "27d4d2277f2bca2534f506576f1920452f62d34e2704ca2beaa2810cd04953c7",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-09T04:29:46Z",
    "title_canon_sha256": "6faf6078a5ee082d9603732aeb78f26559ebefbd7b5f8786355978f97a3d8060"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.08678",
    "kind": "arxiv",
    "version": 2
  }
}
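Where curl or jq is unavailable, the hashing step of the verification pipeline can be reproduced in plain Python. The sketch below assumes that the JSON shown above is exactly the canonical_record object the Agent API returns; the names RECORD, canonical_bytes and canonical_hash are illustrative. The script prints whether the digest matches rather than asserting it, since a mismatch would only mean the served record differs from this rendering.

```python
import hashlib
import json

# The canonical record as displayed on this page (assumed identical to the
# "canonical_record" field served by the Agent API endpoint).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "27d4d2277f2bca2534f506576f1920452f62d34e2704ca2beaa2810cd04953c7",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-09T04:29:46Z",
        "title_canon_sha256": "6faf6078a5ee082d9603732aeb78f26559ebefbd7b5f8786355978f97a3d8060",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.08678", "kind": "arxiv", "version": 2},
}
PUBLISHED_HASH = "6e5e3addb78d2fe3d3a5200035293a086f16fa3ee433f15379a80521a3e76351"


def canonical_bytes(obj) -> bytes:
    """Serialize as the shell one-liner does: sorted keys, no whitespace, raw UTF-8."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_hash(obj) -> str:
    """Hex-encoded SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("matches published hash:", digest == PUBLISHED_HASH)
```

Sorting keys and using compact separators remove key order and whitespace as sources of variation, so any mirror that serializes the same record this way should obtain the same digest.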