{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2022:7IXCF2WFBBTORRCEUPMRCI32ES","short_pith_number":"pith:7IXCF2WF","schema_version":"1.0","canonical_sha256":"fa2e22eac50866e8c444a3d911237a24b1ee6dbc5789e025bddbf28d98cc43ad","source":{"kind":"arxiv","id":"2204.05999","version":3},"attestation_state":"computed","paper":{"title":"InCoder: A Generative Model for Code Infilling and Synthesis","license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","headline":"InCoder is a single generative model that performs both left-to-right code synthesis and zero-shot infilling of masked regions using bidirectional context.","cross_cats":["cs.CL","cs.LG"],"primary_cat":"cs.SE","authors_text":"Armen Aghajanyan, Daniel Fried, Eric Wallace, Freda Shi, Jessy Lin, Luke Zettlemoyer, Mike Lewis, Ruiqi Zhong, Sida Wang, Wen-tau Yih","submitted_at":"2022-04-12T16:25:26Z","abstract_excerpt":"Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on c"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2204.05999","kind":"arxiv","version":3},"metadata":{"license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","primary_cat":"cs.SE","submitted_at":"2022-04-12T16:25:26Z","cross_cats_sorted":["cs.CL","cs.LG"],"title_canon_sha256":"e0ee2b0588f2b16181eaf033db6715dd9f9f025488c5b7fc3381446517c0c296","abstract_canon_sha256":"2cc00f957c4140489f550860c56efe9b4b81d0daf5555f1e30c50f11d894d679"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:49.395599Z","signature_b64":"EEUUd/dGAhWs6GQLc+ar8hZLg5/8h4OdtiGlhCYYRbKxip6Vpdo5lMsBuFfPUDbqsRUCR3BdURfUzToz3fmlBA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"fa2e22eac50866e8c444a3d911237a24b1ee6dbc5789e025bddbf28d98cc43ad","last_reissued_at":"2026-05-17T23:38:49.395076Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:49.395076Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"InCoder: A Generative Model for Code Infilling and Synthesis","license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","headline":"InCoder is a single generative model that performs both left-to-right code synthesis and zero-shot infilling of masked regions using bidirectional context.","cross_cats":["cs.CL","cs.LG"],"primary_cat":"cs.SE","authors_text":"Armen Aghajanyan, Daniel Fried, Eric Wallace, Freda Shi, Jessy Lin, Luke Zettlemoyer, Mike Lewis, Ruiqi Zhong, Sida Wang, Wen-tau Yih","submitted_at":"2022-04-12T16:25:26Z","abstract_excerpt":"Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on c"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That randomly masking and appending code regions during training produces a model whose infilling behavior generalizes to realistic editing scenarios without task-specific fine-tuning or data leakage from the test distributions.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"InCoder is the first generative model to directly perform zero-shot code infilling via bidirectional context from a masked-then-appended training scheme, matching left-to-right models on synthesis while improving on type inference, comment generation, and variable renaming.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"InCoder is a single generative model that performs both left-to-right code synthesis and zero-shot infilling of masked regions using bidirectional context.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"8c9e6da3f40c2b0afa227c8bd9d0a38b30ad580b77630f6195dc345076429f36"},"source":{"id":"2204.05999","kind":"arxiv","version":3},"verdict":{"id":"347c4d27-e1e0-4631-8944-d7e428d093fb","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T02:16:09.357739Z","strongest_claim":"Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming.","one_line_summary":"InCoder is the first generative model to directly perform zero-shot code infilling via bidirectional context from a masked-then-appended training scheme, matching left-to-right models on synthesis while improving on type inference, comment generation, and variable renaming.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That randomly masking and appending code regions during training produces a model whose infilling behavior generalizes to realistic editing scenarios without task-specific fine-tuning or data leakage from the test distributions.","pith_extraction_headline":"InCoder is a single generative model that performs both left-to-right code synthesis and zero-shot infilling of masked regions using bidirectional context."},"references":{"count":38,"sample":[{"doi":"","year":null,"title":"Cm3: A causal masked multimodal model of the internet","work_id":"a4a6d3b6-13f5-437f-8081-765dd23198b9","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"V ., Du, J., Iyer, S., Pasunuru, R., et al","work_id":"711d98e0-b9e1-4350-b81d-a621de85a6bb","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Program Synthesis with Large Language Models","work_id":"fd241a05-03b9-4de2-9588-9d77ce176125","ref_index":3,"cited_arxiv_id":"2108.07732","is_internal_anchor":true},{"doi":"","year":null,"title":"Efficient training of language models to fill in the middle","work_id":"54afe4f8-4d93-4829-99ae-2a27143a9641","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"AutoPandas: neural- backed generators for program synthesis","work_id":"dcc366a2-4c1d-4c87-8bc3-e5a32f359e92","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":38,"snapshot_sha256":"fc8732cd91812d4311a43d791c5212a8ed60b6aa409d78d098cf8c654fa0d2b9","internal_anchors":13},"formal_canon":{"evidence_count":2,"snapshot_sha256":"b6f68c40206c31e639174b4e2befa8fb6023bb17c5bd43181cae598d9743d856"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2204.05999","created_at":"2026-05-17T23:38:49.395162+00:00"},{"alias_kind":"arxiv_version","alias_value":"2204.05999v3","created_at":"2026-05-17T23:38:49.395162+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2204.05999","created_at":"2026-05-17T23:38:49.395162+00:00"},{"alias_kind":"pith_short_12","alias_value":"7IXCF2WFBBTO","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_16","alias_value":"7IXCF2WFBBTORRCE","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_8","alias_value":"7IXCF2WF","created_at":"2026-05-18T12:33:33.725879+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":25,"internal_anchor_count":25,"sample":[{"citing_arxiv_id":"2402.01411","citing_title":"CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2503.02497","citing_title":"A PennyLane-Centric Dataset to Enhance LLM-based Quantum Code Generation using RAG","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21984","citing_title":"Echo: Learning from Experience Data via User-Driven Refinement","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19102","citing_title":"Prompt Optimization for LLM Code Generation via Reinforcement Learning","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2305.07922","citing_title":"CodeT5+: Open Code Large Language Models for Code Understanding and Generation","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2207.14255","citing_title":"Efficient Training of Language Models to Fill in the Middle","ref_index":111,"is_internal_anchor":true},{"citing_arxiv_id":"2207.14255","citing_title":"Efficient Training of Language Models to Fill in the Middle","ref_index":62,"is_internal_anchor":true},{"citing_arxiv_id":"2207.10397","citing_title":"CodeT: Code Generation with Generated Tests","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2306.03091","citing_title":"RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2304.01373","citing_title":"Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling","ref_index":176,"is_internal_anchor":true},{"citing_arxiv_id":"2312.13010","citing_title":"AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2604.02702","citing_title":"TypePro: Boosting LLM-Based Type Inference via Inter-Procedural Slicing","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2406.00515","citing_title":"A Survey on Large Language Models for Code Generation","ref_index":77,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08744","citing_title":"MeshFIM: Local Low-Poly Mesh Editing via Fill-in-the-Middle Autoregressive Generation","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2211.05100","citing_title":"BLOOM: A 176B-Parameter Open-Access Multilingual Language Model","ref_index":232,"is_internal_anchor":true},{"citing_arxiv_id":"2605.00413","citing_title":"ClozeMaster: Fuzzing Rust Compiler by Harnessing LLMs for Infilling Masked Real Programs","ref_index":36,"is_internal_anchor":true},{"citing_arxiv_id":"2605.02944","citing_title":"Exploring Pass-Rate Reward in Reinforcement Learning for Code Generation","ref_index":48,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18525","citing_title":"Towards Better Static Code Analysis Reports: Sentence Transformer-based Filtering of Non-Actionable Alerts","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2305.06161","citing_title":"StarCoder: may the source be with you!","ref_index":130,"is_internal_anchor":true},{"citing_arxiv_id":"2604.05753","citing_title":"An End-to-End Approach for Fixing Concurrency Bugs via SHB-Based Context Extractor","ref_index":40,"is_internal_anchor":true},{"citing_arxiv_id":"2604.04332","citing_title":"EcoAssist: Embedding Sustainability into AI-Assisted Frontend Development","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2403.07974","citing_title":"LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code","ref_index":252,"is_internal_anchor":true},{"citing_arxiv_id":"2401.14196","citing_title":"DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2604.15385","citing_title":"Prompt-Driven Code Summarization: A Systematic Literature Review","ref_index":97,"is_internal_anchor":true},{"citing_arxiv_id":"2604.17351","citing_title":"SOCIA-EVO: Automated Simulator Construction via Dual-Anchored Bi-Level Optimization","ref_index":74,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/7IXCF2WFBBTORRCEUPMRCI32ES","json":"https://pith.science/pith/7IXCF2WFBBTORRCEUPMRCI32ES.json","graph_json":"https://pith.science/api/pith-number/7IXCF2WFBBTORRCEUPMRCI32ES/graph.json","events_json":"https://pith.science/api/pith-number/7IXCF2WFBBTORRCEUPMRCI32ES/events.json","paper":"https://pith.science/paper/7IXCF2WF"},"agent_actions":{"view_html":"https://pith.science/pith/7IXCF2WFBBTORRCEUPMRCI32ES","download_json":"https://pith.science/pith/7IXCF2WFBBTORRCEUPMRCI32ES.json","view_paper":"https://pith.science/paper/7IXCF2WF","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2204.05999&json=true","fetch_graph":"https://pith.science/api/pith-number/7IXCF2WFBBTORRCEUPMRCI32ES/graph.json","fetch_events":"https://pith.science/api/pith-number/7IXCF2WFBBTORRCEUPMRCI32ES/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/7IXCF2WFBBTORRCEUPMRCI32ES/action/timestamp_anchor","attest_storage":"https://pith.science/pith/7IXCF2WFBBTORRCEUPMRCI32ES/action/storage_attestation","attest_author":"https://pith.science/pith/7IXCF2WFBBTORRCEUPMRCI32ES/action/author_attestation","sign_citation":"https://pith.science/pith/7IXCF2WFBBTORRCEUPMRCI32ES/action/citation_signature","submit_replication":"https://pith.science/pith/7IXCF2WFBBTORRCEUPMRCI32ES/action/replication_record"}},"created_at":"2026-05-17T23:38:49.395162+00:00","updated_at":"2026-05-17T23:38:49.395162+00:00"}