Pith Number

pith:ZYBOZ7AD

pith:2026:ZYBOZ7ADA2AP6FTVJDMND5NFE7

not attested · not anchored · not stored · refs resolved

Test-Time Learning with an Evolving Library

Alessandro Sordoni, Chandan Singh, Jianfeng Gao, Michel Galley, Weijia Xu, Xingdi Yuan, Zelalem Gero

Large language models improve on complex reasoning by building and evolving a shared library of skills extracted from their own inference trajectories without any parameter updates or external supervision.

arxiv:2605.14477 v1 · 2026-05-14 · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{ZYBOZ7ADA2AP6FTVJDMND5NFE7}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
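One way to see why a mirror can recompute the same state: if events are placed in a total order that depends only on their content, folding them gives the same result no matter the order in which they arrived. The sketch below illustrates that general idea only. The event fields (`ts`, `id`, `field`, `value`) and the last-writer-wins rule are assumptions made for illustration; the actual Pith event schema and merge algorithm are not described on this page, and signature checking is omitted.

```python
from typing import Any, Dict, Iterable


def merge_events(record: Dict[str, Any], events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold events into a copy of the record in a content-derived order.

    Hypothetical event shape: {"id", "ts", "field", "value"}.
    Sorting by (ts, id) makes the result independent of arrival order.
    """
    state = dict(record)  # never mutate the canonical record itself
    for event in sorted(events, key=lambda e: (e["ts"], e["id"])):
        state[event["field"]] = event["value"]  # last writer wins (assumed rule)
    return state


if __name__ == "__main__":
    base = {"kind": "example"}
    events = [
        {"id": "e2", "ts": "2026-01-02T00:00:00Z", "field": "status", "value": "b"},
        {"id": "e1", "ts": "2026-01-01T00:00:00Z", "field": "status", "value": "a"},
    ]
    # Two mirrors that received the events in different orders agree on the state.
    print(merge_events(base, events) == merge_events(base, list(reversed(events))))
```

The sort key uses the event identifier as a tiebreaker so that two events with the same timestamp still have a fixed relative order.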

Claims

C1 · strongest claim

Across challenging benchmarks in mathematical reasoning, code generation, and multi-turn agentic environments, EvoLib improves substantially over the top test-time scaling and learning methods without ground-truth feedback.

C2 · weakest assumption

That modular skills and reflective insights automatically extracted from the model's own inference trajectories can be weighted and consolidated into increasingly general and reusable abstractions that deliver long-term value without any external supervision or ground-truth signals.

C3 · one-line summary

EvoLib enables LLMs to accumulate, reuse, and evolve knowledge abstractions from inference trajectories at test time, yielding substantial gains on math reasoning, code generation, and agentic benchmarks without parameter updates or supervision.

References

45 extracted · 45 resolved · 12 Pith anchors

[1] Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou 2023

[2] Large language models are better reasoners with self-verification 2023

[3] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters 2024 · arXiv:2408.03314

[4] Test-time recursive thinking: Self-improvement without external feedback 2026

[5] s1: Simple test-time scaling 2025

Receipt and verification

First computed	2026-05-17T23:39:06.589040Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

ce02ecfc030680ff167548d8d1f5a527d8c51f850c3d2a2d3540572e1bc4288d

Aliases

arxiv: 2605.14477 · arxiv_version: 2605.14477v1 · doi: 10.48550/arxiv.2605.14477 · pith_short_12: ZYBOZ7ADA2AP · pith_short_16: ZYBOZ7ADA2AP6FTV · pith_short_8: ZYBOZ7AD
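The short aliases listed here are prefixes of the full identifier, and on this record the identifier itself coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The sketch below checks both relationships using only values from this page. The Base32 relationship is an observation about this one record, not a documented Pith rule, and the helper names are hypothetical.

```python
import base64

# Values copied from this page.
PITH_NUMBER = "ZYBOZ7ADA2AP6FTVJDMND5NFE7"
CANONICAL_HASH = "ce02ecfc030680ff167548d8d1f5a527d8c51f850c3d2a2d3540572e1bc4288d"


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Return the pith_short_N aliases as plain prefixes of the identifier."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


if __name__ == "__main__":
    print(short_aliases(PITH_NUMBER))
    print(base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
```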

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/ZYBOZ7ADA2AP6FTVJDMND5NFE7 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ce02ecfc030680ff167548d8d1f5a527d8c51f850c3d2a2d3540572e1bc4288d

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "41bb27f962974400196d990a6a47b421f1e629498ca694ae56ecd50b45d58c8a",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T07:18:12Z",
    "title_canon_sha256": "c3cbb1f626339a3780999d67594f72d96efa060e1da9d5f0f43ec8ac47f0aba2"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14477",
    "kind": "arxiv",
    "version": 1
  }
}
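
The verification command earlier on this page re-serializes the canonical record with sorted keys and compact separators before hashing. A minimal Python sketch of that same step, run offline against the record as printed here, follows. The function name `canonical_sha256` is an illustrative choice and not part of any Pith tooling. The sketch reports whether the digest matches the published hash rather than assuming it does, since the served JSON could differ from the printed copy in ways not visible on this page.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "41bb27f962974400196d990a6a47b421f1e629498ca694ae56ecd50b45d58c8a",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-14T07:18:12Z",
        "title_canon_sha256": "c3cbb1f626339a3780999d67594f72d96efa060e1da9d5f0f43ec8ac47f0aba2",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14477", "kind": "arxiv", "version": 1},
}

PUBLISHED_HASH = "ce02ecfc030680ff167548d8d1f5a527d8c51f850c3d2a2d3540572e1bc4288d"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches published hash:", digest == PUBLISHED_HASH)
```

Because `sort_keys=True` also applies to nested objects, the digest does not depend on the key order in which the record was written or parsed.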