Pith Number

pith:2LM7DJDE

pith:2026:2LM7DJDEYTMMXBDHWCTPR2RQK2
not attested · not anchored · not stored · refs resolved

Learning to Discover at Test Time

Carlos Guestrin, Daniel Koceja, Federico Bianchi, James Zou, Jan Kautz, Jed McCaleb, Mert Yuksekgonul, Xiaolong Wang, Xinhao Li, Yejin Choi, Yu Sun

Reinforcement learning at test time on one problem lets an open LLM produce new state-of-the-art solutions for math, coding, and biology tasks.

arxiv:2601.16175 v2 · 2026-01-22 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{2LM7DJDEYTMMXBDHWCTPR2RQK2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
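
One way to get a merge that every mirror can reproduce is to make the result independent of the order in which events arrive: deduplicate the events, sort them by a fixed total-order key, then fold them into the state. The sketch below illustrates only that general idea. It is not Pith's actual merge algorithm, and the function name `merge_events`, the event fields (`id`, `ts`, `field`, `value`), the sample data, and the last-writer-wins rule are all assumptions made up for illustration. Signature checking is left out.

```python
from typing import Iterable


def merge_events(events: Iterable[dict]) -> dict:
    """Fold events into a state dict, independent of arrival order.

    Hypothetical event shape: {"id", "ts", "field", "value"}.
    Assumes an event id always refers to the same event content.
    """
    # Deduplicate by id so a replayed event has no extra effect.
    unique = {e["id"]: e for e in events}
    # Sort by (timestamp, id); the id breaks timestamp ties, giving a total order.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: dict = {}
    for e in ordered:
        state[e["field"]] = e["value"]  # last writer wins (assumed rule)
    return state


# Made-up sample events, including a timestamp tie between e2 and e3.
events = [
    {"id": "e1", "ts": 1, "field": "x", "value": "old"},
    {"id": "e2", "ts": 2, "field": "x", "value": "new"},
    {"id": "e3", "ts": 2, "field": "y", "value": "set"},
]

forward = merge_events(events)
backward = merge_events(list(reversed(events)))
print(forward == backward)
```

Because the fold always runs over the same sorted sequence, two mirrors that hold the same set of events reach the same state regardless of the order in which they received them.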

Claims

C1 strongest claim

TTT-Discover sets the new state of the art in almost all of the attempted problems: (i) Erdős' minimum overlap problem and an autocorrelation inequality; (ii) a GPUMode kernel competition (up to 2× faster than prior art); (iii) past AtCoder algorithm competitions; and (iv) a denoising problem in single-cell analysis.

C2 weakest assumption

That reinforcement learning performed at test time on experience specific to one problem will reliably produce a single superior solution rather than overfitting or failing to improve over frozen-model search.

C3 one line summary

TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.

References

102 extracted · 102 resolved · 17 Pith anchors

[1] gpt-oss-120b & gpt-oss-20b Model Card 2025 · arXiv:2508.10925
[2] The surprising effectiveness of test-time training for few-shot learning 2024 · arXiv:2411.07279
[3] AtCoder Inc. AtCoder 2025 · https://atcoder.jp
[4] Test-time Offline Reinforcement Learning on Goal-related Experience 2025 · arXiv:2507.18809
[5] Three convolution inequalities on the real line with connections to additive combinatorics 2020 · Journal of Number Theory, 207:42–55

Formal links

3 machine-checked theorem links

Cited by

22 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:48.999463Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

d2d9f1a464c4d8cb8467b0a6f8ea305696999db0620761966731b8ed3fa2b765

Aliases

arxiv: 2601.16175 · arxiv_version: 2601.16175v2 · doi: 10.48550/arxiv.2601.16175 · pith_short_12: 2LM7DJDEYTMM · pith_short_16: 2LM7DJDEYTMMXBDH · pith_short_8: 2LM7DJDE
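
The identifiers on this page appear to be related: Base32-encoding the canonical hash above (RFC 4648 alphabet) and keeping the first 26 characters reproduces the full Pith Number, and the short aliases are its 8-, 12-, and 16-character prefixes. This relationship is inferred from the values shown here, not from a published specification, so the sketch below is an observation rather than the official derivation. The function name `pith_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "d2d9f1a464c4d8cb8467b0a6f8ea305696999db0620761966731b8ed3fa2b765"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    The 26-character length is taken from the Pith Number shown on this
    page; it is an observed value, not a documented rule.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_from_hash(CANONICAL_HASH)
print(full)  # 2LM7DJDEYTMMXBDHWCTPR2RQK2

# The short aliases are plain prefixes of the full number.
for n in (8, 12, 16):
    print(f"pith_short_{n}: {full[:n]}")
```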
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/2LM7DJDEYTMMXBDHWCTPR2RQK2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d2d9f1a464c4d8cb8467b0a6f8ea305696999db0620761966731b8ed3fa2b765
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "018aacb15b77bd0cc7c39fd5e7744db849fd223296029b78d39b08bc85643f40",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-01-22T18:24:00Z",
    "title_canon_sha256": "7587786623f1ad2c01aed29c22cd254350e74707b5534f6ee5a0516e002af5dd"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.16175",
    "kind": "arxiv",
    "version": 2
  }
}
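
The canonicalization step in the shell pipeline above can also be written as a small Python function, which may be easier to read or to reuse offline. It mirrors the one-liner's `json.dumps` arguments (sorted keys, compact separators, `ensure_ascii=False`) and hashes the UTF-8 bytes with SHA-256. The function name `canonical_sha256` is hypothetical, and the sketch assumes the record has already been fetched; whether the record as printed here hashes to the value listed under "Canonical hash" depends on the service serializing it identically, which this sketch does not confirm.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 it."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "018aacb15b77bd0cc7c39fd5e7744db849fd223296029b78d39b08bc85643f40",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-01-22T18:24:00Z",
        "title_canon_sha256": "7587786623f1ad2c01aed29c22cd254350e74707b5534f6ee5a0516e002af5dd",
    },
    "schema_version": "1.0",
    "source": {"id": "2601.16175", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare against the value listed under "Canonical hash"
```

Sorting the keys is what makes the digest independent of the key order in the JSON a client happens to receive.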