Pith Number

pith:CMLU2UTA

pith:2025:CMLU2UTAQ2DJ6XU265IJR3HLCF
not attested · not anchored · not stored · refs pending

Searching the Internet for Challenging Benchmarks at Scale

Daniel Deutsch, Mara Finkelstein, Markus Freitag, Parker Riley, Vilém Zouhar, Wenda Xu

An epsilon-greedy bandit search over web topics finds the hardest benchmarks after exploring only 6 percent of the space.

arxiv:2509.26619 v3 · 2025-09-30 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{CMLU2UTAQ2DJ6XU265IJR3HLCF}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Our epsilon-greedy strategy identifies the most challenging topics while exploring only 6% of the search space -- a 100 times cost reduction over exhaustive evaluation.

C2 weakest assumption

Topic difficulty revealed through sample-and-evaluate queries is robust across independent metrics (GEMBA-SQA and MetricX), languages, and models.

C3 one-line summary

An epsilon-greedy multi-armed bandit framework automatically discovers challenging internet topics for benchmarks in machine translation and QA, exploring only 6% of the space for 100x cost reduction.
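
The epsilon-greedy idea in the claims above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `epsilon_greedy_search`, the `evaluate` callback (assumed to return a difficulty score where higher means harder), the `budget`, and the default `epsilon` are all hypothetical choices made for illustration. Each round either explores a random topic or exploits the topic with the highest running mean difficulty.

```python
import random
from typing import Callable, Dict, List


def epsilon_greedy_search(
    topics: List[str],
    evaluate: Callable[[str], float],  # hypothetical scorer: higher = harder topic
    budget: int,
    epsilon: float = 0.1,              # assumed exploration rate, not from the paper
    seed: int = 0,
) -> Dict[str, float]:
    """Return the running mean difficulty of every topic that was sampled."""
    rng = random.Random(seed)
    counts: Dict[str, int] = {}
    means: Dict[str, float] = {}

    for _ in range(budget):
        # Explore with probability epsilon (or when nothing has been sampled yet);
        # otherwise exploit the topic that currently looks hardest.
        if not means or rng.random() < epsilon:
            topic = rng.choice(topics)
        else:
            topic = max(means, key=means.get)

        score = evaluate(topic)

        # Incremental update of the running mean for this topic.
        n = counts.get(topic, 0) + 1
        counts[topic] = n
        previous = means.get(topic, 0.0)
        means[topic] = previous + (score - previous) / n

    return means


# Toy usage with made-up difficulty scores.
toy_difficulty = {"sports": 0.1, "law": 0.9, "cooking": 0.5}
estimates = epsilon_greedy_search(
    topics=list(toy_difficulty),
    evaluate=lambda t: toy_difficulty[t],
    budget=200,
)
hardest = max(estimates, key=estimates.get)
```

Because only sampled topics appear in the result, the number of keys in `estimates` relative to `len(topics)` gives the fraction of the space that was actually explored.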

Formal links

1 machine-checked theorem link

Receipt and verification
First computed: 2026-05-27T01:04:51.448585Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

13174d526086869f5e9af75098eceb116510e657d9834072881b6d36d2eee454

Aliases

arxiv: 2509.26619 · arxiv_version: 2509.26619v3 · doi: 10.48550/arxiv.2509.26619 · pith_short_12: CMLU2UTAQ2DJ · pith_short_16: CMLU2UTAQ2DJ6XU2 · pith_short_8: CMLU2UTA
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/CMLU2UTAQ2DJ6XU265IJR3HLCF \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 13174d526086869f5e9af75098eceb116510e657d9834072881b6d36d2eee454
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5fc549e455c477fa9e0de55bb264b961c852ed41497e0edc8e038b015deba7fd",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-09-30T17:55:47Z",
    "title_canon_sha256": "3eaacefe7ae5dbdc157d21d0b60b7578559504beb12791b68f4a4dc74ee294bf"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2509.26619",
    "kind": "arxiv",
    "version": 3
  }
}
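
The same check as the shell one-liner above can be run offline against the record shown here. The sketch below mirrors the serialization settings from that command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding); the helper name `canonical_sha256` is an assumption made for illustration. The printed digest should be compared against the canonical hash listed on this page.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the page's shell one-liner."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from the JSON shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "5fc549e455c477fa9e0de55bb264b961c852ed41497e0edc8e038b015deba7fd",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-09-30T17:55:47Z",
        "title_canon_sha256": "3eaacefe7ae5dbdc157d21d0b60b7578559504beb12791b68f4a4dc74ee294bf",
    },
    "schema_version": "1.0",
    "source": {"id": "2509.26619", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed on this page
```

Sorting keys before hashing means a mirror that stores the record with a different key order still recomputes the same digest.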