{"paper":{"title":"ThetaEvolve: Test-time Learning on Open Problems","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A small open-source model learns to evolve programs at test time and sets new best-known bounds on open mathematical problems.","cross_cats":["cs.CL"],"primary_cat":"cs.LG","authors_text":"Baolin Peng, Eva Xu, Hao Cheng, Liliang Ren, Luyao Ma, Pengcheng He, Shao-Rong Su, Shuohang Wang, Simon Shaolei Du, Weizhu Chen, Xinyu Yang, Xuehai He, Yelong Shen, Yiping Wang, Zeyi Huang, Zhiyuan Zeng","submitted_at":"2025-11-28T18:58:14Z","abstract_excerpt":"Recent advances in large language models (LLMs) have enabled breakthroughs in mathematical discovery, exemplified by AlphaEvolve, a closed-source system that evolves programs to improve bounds on open problems. However, it relies on ensembles of frontier LLMs to achieve new bounds and is a pure inference system that models cannot internalize the evolving strategies. We introduce ThetaEvolve, an open-source framework that simplifies and extends AlphaEvolve to efficiently scale both in-context learning and Reinforcement Learning (RL) at test time, allowing models to continually learn from their "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"ThetaEvolve is the first evolving framework that enable a small open-source model, like DeepSeek-R1-0528-Qwen3-8B, to achieve new best-known bounds on open problems (circle packing and first auto-correlation inequality) mentioned in AlphaEvolve.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the observed improvements and cross-task transfer result from the model internalizing evolving strategies via RL rather than from increased total compute, specific hyperparameter choices, or the particular program database construction.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ThetaEvolve enables small open-source LLMs to achieve new best-known bounds on open problems such as circle packing by combining test-time RL with a large program database and lazy penalties.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A small open-source model learns to evolve programs at test time and sets new best-known bounds on open mathematical problems.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"3842f90d0d3319dabd070036d119feb2f1279915fcec9b7db0297f48bae72566"},"source":{"id":"2511.23473","kind":"arxiv","version":1},"verdict":{"id":"739918ff-16ab-4cae-aca2-323c4cf152de","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T13:11:44.260084Z","strongest_claim":"ThetaEvolve is the first evolving framework that enable a small open-source model, like DeepSeek-R1-0528-Qwen3-8B, to achieve new best-known bounds on open problems (circle packing and first auto-correlation inequality) mentioned in AlphaEvolve.","one_line_summary":"ThetaEvolve enables small open-source LLMs to achieve new best-known bounds on open problems such as circle packing by combining test-time RL with a large program database and lazy penalties.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the observed improvements and cross-task transfer result from the model internalizing evolving strategies via RL rather than from increased total compute, specific hyperparameter choices, or the particular program database construction.","pith_extraction_headline":"A small open-source model learns to evolve programs at test time and sets new best-known bounds on open mathematical problems."},"references":{"count":50,"sample":[{"doi":"","year":2024,"title":"Spurious Rewards: Rethinking Training Signals in RLVR","work_id":"8e05ef02-44f0-41ce-aea5-d954f72e9546","ref_index":1,"cited_arxiv_id":"2506.10947","is_internal_anchor":true},{"doi":"","year":null,"title":"The optimal arrangement likely involves variable-sized circles","work_id":"bff7b668-0166-4454-b04d-50daa54823e4","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"A pure hexagonal arrangement may not be optimal due to edge effects","work_id":"f5600522-95d6-48a0-8634-64fbdd6e50e7","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"The densest known circle packings often use a hybrid approach","work_id":"73a883d6-f05f-48ff-8c55-4d1e7a75029d","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"The optimization routine is critically important - simple physics-based models with carefully tuned parameters","work_id":"2c52ab91-8842-4317-b2be-327600b2b6ed","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":50,"snapshot_sha256":"215da317724ce1ea2f2d5c5c04839142318f0da85434e2feb05b9f6becc80ee8","internal_anchors":2},"formal_canon":{"evidence_count":2,"snapshot_sha256":"4e417a6da4d4f03e08a98973d726ddc4fbccee8c408d53a1a7d02269c7c2c5c6"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}