{"paper":{"title":"Learning to Discover at Test Time","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Reinforcement learning at test time on one problem lets an open LLM produce new state-of-the-art solutions for math, coding, and biology tasks.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"Carlos Guestrin, Daniel Koceja, Federico Bianchi, James Zou, Jan Kautz, Jed McCaleb, Mert Yuksekgonul, Xiaolong Wang, Xinhao Li, Yejin Choi, Yu Sun","submitted_at":"2026-01-22T18:24:00Z","abstract_excerpt":"How can we use AI to discover a new state of the art for a scientific problem? Prior work in test-time scaling, such as AlphaEvolve, performs search by prompting a frozen LLM. We perform reinforcement learning at test time, so the LLM can continue to train, but now with experience specific to the test problem. This form of continual learning is quite special, because its goal is to produce one great solution rather than many good ones on average, and to solve this very problem rather than generalize to other problems. Therefore, our learning objective and search subroutine are designed to prio"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"TTT-Discover sets the new state of the art in almost all of them: (i) Erdős' minimum overlap problem and an autocorrelation inequality; (ii) a GPUMode kernel competition (up to 2× faster than prior art); (iii) past AtCoder algorithm competitions; and (iv) denoising problem in single-cell analysis.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That reinforcement learning performed at test time on experience specific to one problem will reliably produce a single superior solution rather than overfitting or failing to improve over frozen-model search.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Reinforcement learning at test time on one problem lets an open LLM produce new state-of-the-art solutions for math, coding, and biology tasks.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"ce9d57fb032374b186648ea895e4d4bfafd7e918695aa3cbe5a86d51e7d2b78a"},"source":{"id":"2601.16175","kind":"arxiv","version":2},"verdict":{"id":"1bb148d5-4a0d-490c-8a1a-80a299e5c884","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T05:11:47.328644Z","strongest_claim":"TTT-Discover sets the new state of the art in almost all of them: (i) Erdős' minimum overlap problem and an autocorrelation inequality; (ii) a GPUMode kernel competition (up to 2× faster than prior art); (iii) past AtCoder algorithm competitions; and (iv) denoising problem in single-cell analysis.","one_line_summary":"TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That reinforcement learning performed at test time on experience specific to one problem will reliably produce a single superior solution rather than overfitting or failing to improve over frozen-model search.","pith_extraction_headline":"Reinforcement learning at test time on one problem lets an open LLM produce new state-of-the-art solutions for math, coding, and biology tasks."},"references":{"count":102,"sample":[{"doi":"","year":2025,"title":"gpt-oss-120b & gpt-oss-20b Model Card","work_id":"178c1f7e-4f19-4392-a45d-45a6dfa88ead","ref_index":1,"cited_arxiv_id":"2508.10925","is_internal_anchor":true},{"doi":"","year":2024,"title":"The surprising effectiveness of test-time training for few-shot learning.arXiv preprint arXiv:2411.07279","work_id":"e3b25df0-6672-4538-9b16-9887c089a5ef","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"AtCoder Inc. AtCoder.https://atcoder.jp, 2025","work_id":"bc423b80-1605-4d2b-8727-1b71d88009b1","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Test-time Offline Reinforcement Learning on Goal-related Experience","work_id":"ddd653d1-bc8c-44c2-b0cb-051a7b495f0b","ref_index":4,"cited_arxiv_id":"2507.18809","is_internal_anchor":true},{"doi":"","year":2020,"title":"Three convolution inequalities on the real line with connections to additive combinatorics.Journal of Number Theory, 207:42–55, 2020","work_id":"9146008a-1aea-46ab-a01e-008cf03416aa","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":102,"snapshot_sha256":"3e996ce19bfd24e4f633c785897c287e223fb5637b68acd4c987191580a27fd0","internal_anchors":17},"formal_canon":{"evidence_count":3,"snapshot_sha256":"5b4573c7f2f0d66996bff9ae8c61996aab758770a3c980035c944f3e3d6af6bf"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}