{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2023:UCH5PWOF7QIHGFDOPYMVRW2FTO","short_pith_number":"pith:UCH5PWOF","schema_version":"1.0","canonical_sha256":"a08fd7d9c5fc1073146e7e1958db459bb4659d19bbbd7e45013838433479baa2","source":{"kind":"arxiv","id":"2312.13010","version":3},"attestation_state":"computed","paper":{"title":"AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A multi-agent system divides code generation among three agents to reach higher accuracy with lower token cost than single models.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Dong Huang, Heming Cui, Jie M.Zhang, Michael Luck, Qingwen Bu, Yuhao Qing","submitted_at":"2023-12-20T13:22:41Z","abstract_excerpt":"The advancement of natural language processing (NLP) has been significantly boosted by the development of transformer-based large language models (LLMs). These models have revolutionized NLP tasks, particularly in code generation, aiding developers in creating software with enhanced efficiency. Despite their advancements, challenges in balancing code snippet generation with effective test case generation and execution persist. To address these issues, this paper introduces Multi-Agent Assistant Code Generation (AgentCoder), a novel solution comprising a multi-agent framework with specialized a"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2312.13010","kind":"arxiv","version":3},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CL","submitted_at":"2023-12-20T13:22:41Z","cross_cats_sorted":[],"title_canon_sha256":"5a6bba019c25f78396c3860172ae82837bc1075766dc395859feffa808ad737f","abstract_canon_sha256":"f82e5695b3ec0d0a1c717f6d512cde914ec019994f33804d782a92a621d4e1c0"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:53.608635Z","signature_b64":"PQgk4bChuxIdfZBZx2aVGNTqQIfcnOJmMc/Jc7hOtP70YtynZS/xmtgP++EC1HelTzfUumh5F1ENiMGV4QCqBw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"a08fd7d9c5fc1073146e7e1958db459bb4659d19bbbd7e45013838433479baa2","last_reissued_at":"2026-05-17T23:38:53.608051Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:53.608051Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A multi-agent system divides code generation among three agents to reach higher accuracy with lower token cost than single models.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Dong Huang, Heming Cui, Jie M.Zhang, Michael Luck, Qingwen Bu, Yuhao Qing","submitted_at":"2023-12-20T13:22:41Z","abstract_excerpt":"The advancement of natural language processing (NLP) has been significantly boosted by the development of transformer-based large language models (LLMs). These models have revolutionized NLP tasks, particularly in code generation, aiding developers in creating software with enhanced efficiency. Despite their advancements, challenges in balancing code snippet generation with effective test case generation and execution persist. To address these issues, this paper introduces Multi-Agent Assistant Code Generation (AgentCoder), a novel solution comprising a multi-agent framework with specialized a"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"AgentCoder (GPT-4) achieves 96.3% and 91.8% pass@1 in HumanEval and MBPP datasets with an overall token overhead of 56.9K and 66.3K, while state-of-the-art obtains only 90.2% and 78.9% pass@1 with an overall token overhead of 138.2K and 206.5K.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That iterative feedback from the test executor agent reliably improves code quality without introducing new errors or causing the programmer agent to overfit to the generated tests.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A three-agent loop of code generation, test creation, and execution feedback lifts pass@1 to 96.3% on HumanEval and 91.8% on MBPP for GPT-4 while using roughly half the tokens of prior state-of-the-art.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A multi-agent system divides code generation among three agents to reach higher accuracy with lower token cost than single models.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"d7d147f6cd672ca4dadde590e6d51339b9f2d41662ec74d9b2ae391cf9b2e2ac"},"source":{"id":"2312.13010","kind":"arxiv","version":3},"verdict":{"id":"c49bfe8b-e8db-44f8-a272-b9990c4cec5d","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T03:53:52.500420Z","strongest_claim":"AgentCoder (GPT-4) achieves 96.3% and 91.8% pass@1 in HumanEval and MBPP datasets with an overall token overhead of 56.9K and 66.3K, while state-of-the-art obtains only 90.2% and 78.9% pass@1 with an overall token overhead of 138.2K and 206.5K.","one_line_summary":"A three-agent loop of code generation, test creation, and execution feedback lifts pass@1 to 96.3% on HumanEval and 91.8% on MBPP for GPT-4 while using roughly half the tokens of prior state-of-the-art.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That iterative feedback from the test executor agent reliably improves code quality without introducing new errors or causing the programmer agent to overfit to the generated tests.","pith_extraction_headline":"A multi-agent system divides code generation among three agents to reach higher accuracy with lower token cost than single models."},"references":{"count":54,"sample":[{"doi":"","year":2021,"title":"Unified pre-training for program understanding and generation.arXiv preprint","work_id":"92715367-2460-4833-ac02-f642b8a19c59","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Program Synthesis with Large Language Models","work_id":"fd241a05-03b9-4de2-9588-9d77ce176125","ref_index":2,"cited_arxiv_id":"2108.07732","is_internal_anchor":true},{"doi":"","year":2005,"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","ref_index":4,"cited_arxiv_id":"2005.14165","is_internal_anchor":true},{"doi":"","year":2023,"title":"arXiv preprint arXiv:2305.17126 , year=","work_id":"1447b78e-0a79-4af6-8cd4-93220e680d2b","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","ref_index":7,"cited_arxiv_id":"2107.03374","is_internal_anchor":true}],"resolved_work":54,"snapshot_sha256":"48be6ac9592ab27d7ac76ca1d4bded7b7e6ad7415ddccbb28045f6c495e83149","internal_anchors":17},"formal_canon":{"evidence_count":3,"snapshot_sha256":"6a16cf67c9bc1bf6a7e196afe950083b1cef270d27c523ea83dba8ecafc59db4"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2312.13010","created_at":"2026-05-17T23:38:53.608144+00:00"},{"alias_kind":"arxiv_version","alias_value":"2312.13010v3","created_at":"2026-05-17T23:38:53.608144+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2312.13010","created_at":"2026-05-17T23:38:53.608144+00:00"},{"alias_kind":"pith_short_12","alias_value":"UCH5PWOF7QIH","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"UCH5PWOF7QIHGFDO","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"UCH5PWOF","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":40,"internal_anchor_count":40,"sample":[{"citing_arxiv_id":"2605.23273","citing_title":"Self-Refining Topology Optimization via an LLM-Based Multi-Agent Framework","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2404.01535","citing_title":"Assessing, Exploiting, and Mitigating Syntactic Robustness Failures in LLM-Based Code Generation","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2410.07095","citing_title":"MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2412.04590","citing_title":"Specification-Driven Code Translation Powered by Large Language Models: How Far Are We?","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2510.03879","citing_title":"Adversarial Agent Collaboration for Correctness Improvements of C to Safe Rust Translation","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18747","citing_title":"Code as Agent Harness","ref_index":50,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19140","citing_title":"Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18073","citing_title":"A-ProS: Towards Reliable Autonomous Programming Through Multi-Model Feedback","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17535","citing_title":"AgentModernize: Preserving Business Logic in Legacy Modernization with Multi-Agent LLMs and Behavioral Specification Graphs","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15301","citing_title":"Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2506.02954","citing_title":"Mutation-Guided Unit Test Generation with a Large Language Model","ref_index":38,"is_internal_anchor":true},{"citing_arxiv_id":"2506.18315","citing_title":"Effective LLM Code Refinement via Property-Oriented and Structurally Minimal Feedback","ref_index":44,"is_internal_anchor":true},{"citing_arxiv_id":"2510.06452","citing_title":"Code Semantic Zooming","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2510.08804","citing_title":"MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2404.08144","citing_title":"LLM Agents can Autonomously Exploit One-day Vulnerabilities","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2409.02977","citing_title":"Large Language Model-Based Agents for Software Engineering: A Survey","ref_index":109,"is_internal_anchor":true},{"citing_arxiv_id":"2512.18470","citing_title":"SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16314","citing_title":"Software Self-Extension with SelfEvolve: an Agentic Architecture for Runtime Code Generation","ref_index":20,"is_internal_anchor":true},{"citing_arxiv_id":"2508.07407","citing_title":"A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems","ref_index":39,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16321","citing_title":"LLM-Based Multi-Agent Systems for Code Generation: A Multi-Vocal Literature Review","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14445","citing_title":"FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2507.21046","citing_title":"A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence","ref_index":163,"is_internal_anchor":true},{"citing_arxiv_id":"2603.28653","citing_title":"BACE: LLM-based Code Generation through Bayesian Anchored Co-Evolution of Code and Test Populations","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2406.00515","citing_title":"A Survey on Large Language Models for Code Generation","ref_index":106,"is_internal_anchor":true},{"citing_arxiv_id":"2604.27647","citing_title":"Tail-aware N-version Machine Learning Models for Reliable API Recommendation","ref_index":12,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":3,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/UCH5PWOF7QIHGFDOPYMVRW2FTO","json":"https://pith.science/pith/UCH5PWOF7QIHGFDOPYMVRW2FTO.json","graph_json":"https://pith.science/api/pith-number/UCH5PWOF7QIHGFDOPYMVRW2FTO/graph.json","events_json":"https://pith.science/api/pith-number/UCH5PWOF7QIHGFDOPYMVRW2FTO/events.json","paper":"https://pith.science/paper/UCH5PWOF"},"agent_actions":{"view_html":"https://pith.science/pith/UCH5PWOF7QIHGFDOPYMVRW2FTO","download_json":"https://pith.science/pith/UCH5PWOF7QIHGFDOPYMVRW2FTO.json","view_paper":"https://pith.science/paper/UCH5PWOF","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2312.13010&json=true","fetch_graph":"https://pith.science/api/pith-number/UCH5PWOF7QIHGFDOPYMVRW2FTO/graph.json","fetch_events":"https://pith.science/api/pith-number/UCH5PWOF7QIHGFDOPYMVRW2FTO/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/UCH5PWOF7QIHGFDOPYMVRW2FTO/action/timestamp_anchor","attest_storage":"https://pith.science/pith/UCH5PWOF7QIHGFDOPYMVRW2FTO/action/storage_attestation","attest_author":"https://pith.science/pith/UCH5PWOF7QIHGFDOPYMVRW2FTO/action/author_attestation","sign_citation":"https://pith.science/pith/UCH5PWOF7QIHGFDOPYMVRW2FTO/action/citation_signature","submit_replication":"https://pith.science/pith/UCH5PWOF7QIHGFDOPYMVRW2FTO/action/replication_record"}},"created_at":"2026-05-17T23:38:53.608144+00:00","updated_at":"2026-05-17T23:38:53.608144+00:00"}