{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:DMR7KHEBBJSJOQL6E7ISWMVEK4","short_pith_number":"pith:DMR7KHEB","schema_version":"1.0","canonical_sha256":"1b23f51c810a6497417e27d12b32a45714c44fd7a3a0067bb6c494ee1ee46572","source":{"kind":"arxiv","id":"2605.14445","version":1},"attestation_state":"computed","paper":{"title":"FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"An automated system evolves closed-ended competitive programming tasks into open-ended coding problems and uses the resulting data to train stronger LLM coders.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Alex Dimakis, Alvin Cheung, Bo Peng, Hanchen Li, Huanzhi Mao, Jingbo Shang, Joseph E. Gonzalez, Kaiyuan Liu, Lufeng Cheng, Qiuyang Mang, Qizheng Zhang, Runyuan He, Shang Zhou, Tianfu Fu, Wenhao Chai, Yichuan Wang, Zerui Li","submitted_at":"2026-05-14T06:39:42Z","abstract_excerpt":"Many real-world coding challenges are open-ended and admit no known optimal solution. Yet, recent progress in LLM coding has focused on well-defined tasks such as feature implementation, bug fixing, and competitive programming. Open-ended coding remains a weak spot for LLMs, largely because open-ended training problems are scarce and expensive to construct. Our goal is to synthesize open-ended coding problems at scale to train stronger LLM coders. We introduce FrontierSmith, an automated system for iteratively evolving open-ended problems from existing closed-ended coding tasks. 
Starting from "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2605.14445","kind":"arxiv","version":1},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.LG","submitted_at":"2026-05-14T06:39:42Z","cross_cats_sorted":[],"title_canon_sha256":"b21fd42d8f615d8f6a2477476abf9cf330659f87a65c72cf4be33b55f6263dfa","abstract_canon_sha256":"90cea9fd00b568405153a47deeda6843b2d03473e5db4c23554a02208472e9fb"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:39:06.966676Z","signature_b64":"L66CgnmwU9jBhiQnnF2ri5sKnncTaxW2UHH5ZtXP8EaL2TbWnszwkq5KF4Q1AuY1WjaBIom0U9xy5+pfbLlsCg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"1b23f51c810a6497417e27d12b32a45714c44fd7a3a0067bb6c494ee1ee46572","last_reissued_at":"2026-05-17T23:39:06.965948Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:39:06.965948Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"An automated system evolves closed-ended competitive programming tasks into open-ended coding problems and uses the resulting data to train stronger LLM coders.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Alex Dimakis, Alvin Cheung, Bo Peng, Hanchen Li, Huanzhi Mao, Jingbo Shang, Joseph E. 
Gonzalez, Kaiyuan Liu, Lufeng Cheng, Qiuyang Mang, Qizheng Zhang, Runyuan He, Shang Zhou, Tianfu Fu, Wenhao Chai, Yichuan Wang, Zerui Li","submitted_at":"2026-05-14T06:39:42Z","abstract_excerpt":"Many real-world coding challenges are open-ended and admit no known optimal solution. Yet, recent progress in LLM coding has focused on well-defined tasks such as feature implementation, bug fixing, and competitive programming. Open-ended coding remains a weak spot for LLMs, largely because open-ended training problems are scarce and expensive to construct. Our goal is to synthesize open-ended coding problems at scale to train stronger LLM coders. We introduce FrontierSmith, an automated system for iteratively evolving open-ended problems from existing closed-ended coding tasks. Starting from "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"training on our synthesized data yields substantial gains over the base models: Qwen3.5-9B improves by +8.82 score on FrontierCS and +306.36 (Elo-rating-based performance) on ALE-bench; Qwen3.5-27B improves by +12.12 and +309.12, respectively.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The quantitative idea divergence metric reliably selects problems that elicit genuinely diverse solution approaches from different solvers, and the automatically generated test cases and verifiers are sufficiently robust to support training.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"FrontierSmith automates synthesis of open-ended coding problems from closed-ended seeds and shows measurable gains on two open-ended LLM coding benchmarks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"An automated system evolves closed-ended competitive programming 
tasks into open-ended coding problems and uses the resulting data to train stronger LLM coders.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"67ec084d7893955fc6ce584256bd405a17fe87b5c26df6e213462df7db80d9aa"},"source":{"id":"2605.14445","kind":"arxiv","version":1},"verdict":{"id":"ad572810-6ab0-457f-b3fe-940f0093e04d","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T01:58:18.730852Z","strongest_claim":"training on our synthesized data yields substantial gains over the base models: Qwen3.5-9B improves by +8.82 score on FrontierCS and +306.36 (Elo-rating-based performance) on ALE-bench; Qwen3.5-27B improves by +12.12 and +309.12, respectively.","one_line_summary":"FrontierSmith automates synthesis of open-ended coding problems from closed-ended seeds and shows measurable gains on two open-ended LLM coding benchmarks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The quantitative idea divergence metric reliably selects problems that elicit genuinely diverse solution approaches from different solvers, and the automatically generated test cases and verifiers are sufficiently robust to support training.","pith_extraction_headline":"An automated system evolves closed-ended competitive programming tasks into open-ended coding problems and uses the resulting data to train stronger LLM coders."},"references":{"count":43,"sample":[{"doi":"","year":2026,"title":"Bengt Aspvall, Michael F Plass, and Robert Endre Tarjan","work_id":"40c0c872-ed4c-4695-95f5-896dc1a2b97b","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Swe-rebench: An automated pipeline for task collection and decontaminated evaluation of software engineering agents","work_id":"c819f5fe-32b6-41d1-aeb5-2d80c6fa8474","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Scaling Self-Play with 
Self-Guidance","work_id":"3c86499b-f604-41f2-be6f-e77ba71785c7","ref_index":3,"cited_arxiv_id":"2604.20209","is_internal_anchor":true},{"doi":"","year":null,"title":"K-search: Llm kernel generation via co-evolving intrinsic world model","work_id":"6312f515-a452-4853-801a-b99c6219533b","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Adaevolve: Adaptive llm driven zeroth-order optimization","work_id":"284cc2bd-522c-4674-99da-c07e54d2afff","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":43,"snapshot_sha256":"096a30fcad06e50f530935e81e392b3fc73897741b8c3a14090c73a2141ca6fa","internal_anchors":15},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.14445","created_at":"2026-05-17T23:39:06.966049+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.14445v1","created_at":"2026-05-17T23:39:06.966049+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.14445","created_at":"2026-05-17T23:39:06.966049+00:00"},{"alias_kind":"pith_short_12","alias_value":"DMR7KHEBBJSJ","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"DMR7KHEBBJSJOQL6","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"DMR7KHEB","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/DMR7KHEBBJSJOQL6E7ISWMVEK4","json":"https://pith.science/pith/DMR7KHEBBJSJO
QL6E7ISWMVEK4.json","graph_json":"https://pith.science/api/pith-number/DMR7KHEBBJSJOQL6E7ISWMVEK4/graph.json","events_json":"https://pith.science/api/pith-number/DMR7KHEBBJSJOQL6E7ISWMVEK4/events.json","paper":"https://pith.science/paper/DMR7KHEB"},"agent_actions":{"view_html":"https://pith.science/pith/DMR7KHEBBJSJOQL6E7ISWMVEK4","download_json":"https://pith.science/pith/DMR7KHEBBJSJOQL6E7ISWMVEK4.json","view_paper":"https://pith.science/paper/DMR7KHEB","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.14445&json=true","fetch_graph":"https://pith.science/api/pith-number/DMR7KHEBBJSJOQL6E7ISWMVEK4/graph.json","fetch_events":"https://pith.science/api/pith-number/DMR7KHEBBJSJOQL6E7ISWMVEK4/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/DMR7KHEBBJSJOQL6E7ISWMVEK4/action/timestamp_anchor","attest_storage":"https://pith.science/pith/DMR7KHEBBJSJOQL6E7ISWMVEK4/action/storage_attestation","attest_author":"https://pith.science/pith/DMR7KHEBBJSJOQL6E7ISWMVEK4/action/author_attestation","sign_citation":"https://pith.science/pith/DMR7KHEBBJSJOQL6E7ISWMVEK4/action/citation_signature","submit_replication":"https://pith.science/pith/DMR7KHEBBJSJOQL6E7ISWMVEK4/action/replication_record"}},"created_at":"2026-05-17T23:39:06.966049+00:00","updated_at":"2026-05-17T23:39:06.966049+00:00"}