{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:Y63FS2OAGCOEFWLD44G4OWFLQ6","short_pith_number":"pith:Y63FS2OA","schema_version":"1.0","canonical_sha256":"c7b65969c0309c42d963e70dc758ab87abd9d741857088a373183e4422af9a11","source":{"kind":"arxiv","id":"2511.23473","version":1},"attestation_state":"computed","paper":{"title":"ThetaEvolve: Test-time Learning on Open Problems","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A small open-source model learns to evolve programs at test time and sets new best-known bounds on open mathematical problems.","cross_cats":["cs.CL"],"primary_cat":"cs.LG","authors_text":"Baolin Peng, Eva Xu, Hao Cheng, Liliang Ren, Luyao Ma, Pengcheng He, Shao-Rong Su, Shuohang Wang, Simon Shaolei Du, Weizhu Chen, Xinyu Yang, Xuehai He, Yelong Shen, Yiping Wang, Zeyi Huang, Zhiyuan Zeng","submitted_at":"2025-11-28T18:58:14Z","abstract_excerpt":"Recent advances in large language models (LLMs) have enabled breakthroughs in mathematical discovery, exemplified by AlphaEvolve, a closed-source system that evolves programs to improve bounds on open problems. However, it relies on ensembles of frontier LLMs to achieve new bounds and is a pure inference system that models cannot internalize the evolving strategies. We introduce ThetaEvolve, an open-source framework that simplifies and extends AlphaEvolve to efficiently scale both in-context learning and Reinforcement Learning (RL) at test time, allowing models to continually learn from their "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2511.23473","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.LG","submitted_at":"2025-11-28T18:58:14Z","cross_cats_sorted":["cs.CL"],"title_canon_sha256":"09102d32bfe37073f0ad52ba834a2f3d00a135de04df752a6a7e4896d5f8a30a","abstract_canon_sha256":"9ec9ccdeadbda093ba29308224755256300c3524026bfc72840d7b0924d8f806"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:47.762977Z","signature_b64":"L66J1D098dILH+pxmyg73KmXUvPzutNfexD9zC1glOhMAOByoUGo5gbqJ1TnbzJsEDRvgm3tFyPeQxAYPI2VDg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"c7b65969c0309c42d963e70dc758ab87abd9d741857088a373183e4422af9a11","last_reissued_at":"2026-05-17T23:38:47.762527Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:47.762527Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"ThetaEvolve: Test-time Learning on Open Problems","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A small open-source model learns to evolve programs at test time and sets new best-known bounds on open mathematical problems.","cross_cats":["cs.CL"],"primary_cat":"cs.LG","authors_text":"Baolin Peng, Eva Xu, Hao Cheng, Liliang Ren, Luyao Ma, Pengcheng He, Shao-Rong Su, Shuohang Wang, Simon Shaolei Du, Weizhu Chen, Xinyu Yang, Xuehai He, Yelong Shen, Yiping Wang, Zeyi Huang, Zhiyuan Zeng","submitted_at":"2025-11-28T18:58:14Z","abstract_excerpt":"Recent advances in large language models (LLMs) have enabled breakthroughs in mathematical discovery, exemplified by AlphaEvolve, a closed-source system that evolves programs to improve bounds on open problems. However, it relies on ensembles of frontier LLMs to achieve new bounds and is a pure inference system that models cannot internalize the evolving strategies. We introduce ThetaEvolve, an open-source framework that simplifies and extends AlphaEvolve to efficiently scale both in-context learning and Reinforcement Learning (RL) at test time, allowing models to continually learn from their "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"ThetaEvolve is the first evolving framework that enable a small open-source model, like DeepSeek-R1-0528-Qwen3-8B, to achieve new best-known bounds on open problems (circle packing and first auto-correlation inequality) mentioned in AlphaEvolve.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the observed improvements and cross-task transfer result from the model internalizing evolving strategies via RL rather than from increased total compute, specific hyperparameter choices, or the particular program database construction.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ThetaEvolve enables small open-source LLMs to achieve new best-known bounds on open problems such as circle packing by combining test-time RL with a large program database and lazy penalties.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A small open-source model learns to evolve programs at test time and sets new best-known bounds on open mathematical problems.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"3842f90d0d3319dabd070036d119feb2f1279915fcec9b7db0297f48bae72566"},"source":{"id":"2511.23473","kind":"arxiv","version":1},"verdict":{"id":"739918ff-16ab-4cae-aca2-323c4cf152de","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T13:11:44.260084Z","strongest_claim":"ThetaEvolve is the first evolving framework that enable a small open-source model, like DeepSeek-R1-0528-Qwen3-8B, to achieve new best-known bounds on open problems (circle packing and first auto-correlation inequality) mentioned in AlphaEvolve.","one_line_summary":"ThetaEvolve enables small open-source LLMs to achieve new best-known bounds on open problems such as circle packing by combining test-time RL with a large program database and lazy penalties.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the observed improvements and cross-task transfer result from the model internalizing evolving strategies via RL rather than from increased total compute, specific hyperparameter choices, or the particular program database construction.","pith_extraction_headline":"A small open-source model learns to evolve programs at test time and sets new best-known bounds on open mathematical problems."},"references":{"count":50,"sample":[{"doi":"","year":2024,"title":"Spurious Rewards: Rethinking Training Signals in RLVR","work_id":"8e05ef02-44f0-41ce-aea5-d954f72e9546","ref_index":1,"cited_arxiv_id":"2506.10947","is_internal_anchor":true},{"doi":"","year":null,"title":"The optimal arrangement likely involves variable-sized circles","work_id":"bff7b668-0166-4454-b04d-50daa54823e4","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"A pure hexagonal arrangement may not be optimal due to edge effects","work_id":"f5600522-95d6-48a0-8634-64fbdd6e50e7","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"The densest known circle packings often use a hybrid approach","work_id":"73a883d6-f05f-48ff-8c55-4d1e7a75029d","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"The optimization routine is critically important - simple physics-based models with carefully tuned parameters","work_id":"2c52ab91-8842-4317-b2be-327600b2b6ed","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":50,"snapshot_sha256":"215da317724ce1ea2f2d5c5c04839142318f0da85434e2feb05b9f6becc80ee8","internal_anchors":2},"formal_canon":{"evidence_count":2,"snapshot_sha256":"4e417a6da4d4f03e08a98973d726ddc4fbccee8c408d53a1a7d02269c7c2c5c6"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2511.23473","created_at":"2026-05-17T23:38:47.762602+00:00"},{"alias_kind":"arxiv_version","alias_value":"2511.23473v1","created_at":"2026-05-17T23:38:47.762602+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2511.23473","created_at":"2026-05-17T23:38:47.762602+00:00"},{"alias_kind":"pith_short_12","alias_value":"Y63FS2OAGCOE","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"Y63FS2OAGCOEFWLD","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"Y63FS2OA","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":21,"internal_anchor_count":21,"sample":[{"citing_arxiv_id":"2605.22613","citing_title":"Evolutionary Multi-Task Optimization for LLM-Guided Program Discovery","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22505","citing_title":"Towards Direct Evaluation of Harness Optimizers via Priority Ranking","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20086","citing_title":"What Do Evolutionary Coding Agents Evolve?","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2601.16175","citing_title":"Learning to Discover at Test Time","ref_index":78,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09018","citing_title":"Evolutionary Ensemble of Agents","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14477","citing_title":"Test-Time Learning with an Evolving Library","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08083","citing_title":"LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling","ref_index":45,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08678","citing_title":"MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI","ref_index":103,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10913","citing_title":"Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2605.03808","citing_title":"Agentic-imodels: Evolving agentic interpretability tools via autoresearch","ref_index":54,"is_internal_anchor":true},{"citing_arxiv_id":"2604.25083","citing_title":"Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization","ref_index":51,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05193","citing_title":"Grokability in five inequalities","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2604.19295","citing_title":"TEMPO: Scaling Test-time Training for Large Reasoning Models","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18607","citing_title":"TurboEvolve: Towards Fast and Robust LLM-Driven Program Evolution","ref_index":20,"is_internal_anchor":true},{"citing_arxiv_id":"2604.07240","citing_title":"$k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture","ref_index":42,"is_internal_anchor":true},{"citing_arxiv_id":"2604.06566","citing_title":"AI-Driven Research for Databases","ref_index":80,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07039","citing_title":"PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents","ref_index":44,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08083","citing_title":"LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling","ref_index":45,"is_internal_anchor":true},{"citing_arxiv_id":"2604.13552","citing_title":"Training-Free Test-Time Contrastive Learning for Large Language Models","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2604.17708","citing_title":"Co-evolving Agent Architectures and Interpretable Reasoning for Automated Optimization","ref_index":116,"is_internal_anchor":true},{"citing_arxiv_id":"2604.19341","citing_title":"Evaluation-driven Scaling for Scientific Discovery","ref_index":153,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/Y63FS2OAGCOEFWLD44G4OWFLQ6","json":"https://pith.science/pith/Y63FS2OAGCOEFWLD44G4OWFLQ6.json","graph_json":"https://pith.science/api/pith-number/Y63FS2OAGCOEFWLD44G4OWFLQ6/graph.json","events_json":"https://pith.science/api/pith-number/Y63FS2OAGCOEFWLD44G4OWFLQ6/events.json","paper":"https://pith.science/paper/Y63FS2OA"},"agent_actions":{"view_html":"https://pith.science/pith/Y63FS2OAGCOEFWLD44G4OWFLQ6","download_json":"https://pith.science/pith/Y63FS2OAGCOEFWLD44G4OWFLQ6.json","view_paper":"https://pith.science/paper/Y63FS2OA","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2511.23473&json=true","fetch_graph":"https://pith.science/api/pith-number/Y63FS2OAGCOEFWLD44G4OWFLQ6/graph.json","fetch_events":"https://pith.science/api/pith-number/Y63FS2OAGCOEFWLD44G4OWFLQ6/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/Y63FS2OAGCOEFWLD44G4OWFLQ6/action/timestamp_anchor","attest_storage":"https://pith.science/pith/Y63FS2OAGCOEFWLD44G4OWFLQ6/action/storage_attestation","attest_author":"https://pith.science/pith/Y63FS2OAGCOEFWLD44G4OWFLQ6/action/author_attestation","sign_citation":"https://pith.science/pith/Y63FS2OAGCOEFWLD44G4OWFLQ6/action/citation_signature","submit_replication":"https://pith.science/pith/Y63FS2OAGCOEFWLD44G4OWFLQ6/action/replication_record"}},"created_at":"2026-05-17T23:38:47.762602+00:00","updated_at":"2026-05-17T23:38:47.762602+00:00"}