{"paper":{"title":"InCoder: A Generative Model for Code Infilling and Synthesis","license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","headline":"InCoder is a single generative model that performs both left-to-right code synthesis and zero-shot infilling of masked regions using bidirectional context.","cross_cats":["cs.CL","cs.LG"],"primary_cat":"cs.SE","authors_text":"Armen Aghajanyan, Daniel Fried, Eric Wallace, Freda Shi, Jessy Lin, Luke Zettlemoyer, Mike Lewis, Ruiqi Zhong, Sida Wang, Wen-tau Yih","submitted_at":"2022-04-12T16:25:26Z","abstract_excerpt":"Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. 
Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on c"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That randomly masking and appending code regions during training produces a model whose infilling behavior generalizes to realistic editing scenarios without task-specific fine-tuning or data leakage from the test distributions.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"InCoder is the first generative model to directly perform zero-shot code infilling via bidirectional context from a masked-then-appended training scheme, matching left-to-right models on synthesis while improving on type inference, comment generation, and variable renaming.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"InCoder is a single generative model that performs both left-to-right code synthesis and zero-shot infilling of masked regions using bidirectional context.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"8c9e6da3f40c2b0afa227c8bd9d0a38b30ad580b77630f6195dc345076429f36"},"source":{"id":"2204.05999","kind":"arxiv","version":3},"verdict":{"id":"347c4d27-e1e0-4631-8944-d7e428d093fb","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T02:16:09.357739Z","strongest_claim":"Our model is the first generative model that is able to directly perform zero-shot code infilling, which we 
evaluate on challenging tasks such as type inference, comment generation, and variable re-naming.","one_line_summary":"InCoder is the first generative model to directly perform zero-shot code infilling via bidirectional context from a masked-then-appended training scheme, matching left-to-right models on synthesis while improving on type inference, comment generation, and variable renaming.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That randomly masking and appending code regions during training produces a model whose infilling behavior generalizes to realistic editing scenarios without task-specific fine-tuning or data leakage from the test distributions.","pith_extraction_headline":"InCoder is a single generative model that performs both left-to-right code synthesis and zero-shot infilling of masked regions using bidirectional context."},"references":{"count":38,"sample":[{"doi":"","year":null,"title":"Cm3: A causal masked multimodal model of the internet","work_id":"a4a6d3b6-13f5-437f-8081-765dd23198b9","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"V ., Du, J., Iyer, S., Pasunuru, R., et al","work_id":"711d98e0-b9e1-4350-b81d-a621de85a6bb","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Program Synthesis with Large Language Models","work_id":"fd241a05-03b9-4de2-9588-9d77ce176125","ref_index":3,"cited_arxiv_id":"2108.07732","is_internal_anchor":true},{"doi":"","year":null,"title":"Efficient training of language models to fill in the middle","work_id":"54afe4f8-4d93-4829-99ae-2a27143a9641","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"AutoPandas: neural-backed generators for program 
synthesis","work_id":"dcc366a2-4c1d-4c87-8bc3-e5a32f359e92","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":38,"snapshot_sha256":"fc8732cd91812d4311a43d791c5212a8ed60b6aa409d78d098cf8c654fa0d2b9","internal_anchors":13},"formal_canon":{"evidence_count":2,"snapshot_sha256":"b6f68c40206c31e639174b4e2befa8fb6023bb17c5bd43181cae598d9743d856"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}