arxiv: 2510.03843 · v4 · submitted 2025-10-04 · 💻 cs.SE · cs.HC · cs.LG

Smart Paste: Automatically Fixing Copy/Paste for Google Developers

Vincent Nguyen, Guilherme Herzog, José Cambronero, Marcus Revaj, Aditya Kini, Alexander Frömmgen, Maxim Tabachnyk

Pith reviewed 2026-05-18 10:16 UTC · model grok-4.3

classification 💻 cs.SE · cs.HC · cs.LG

keywords post-paste edits · IDE code assistance · AI for developer tools · code editing automation · enterprise software development · machine learning for code

The pith

Smart Paste suggests automatic edits after code is pasted and now generates over 1% of all code at Google.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes building and deploying an IDE feature called Smart Paste that predicts and offers fixes for common issues after developers insert copied code. Pasting occurs four times more often than typing new code in Google's internal development, but usually demands extra work for formatting, names, style, and cross-language adjustments. The authors detail an iterative process that combines model training with attention to how suggestions appear in the editor and how they fit into existing tools. Deployment data shows a 45% acceptance rate, and the accepted changes represent a substantial share of total code produced across the company.

Core claim

We show how to iteratively develop and scale Smart Paste, an IDE feature for post-paste edit suggestions, to Google's development environment. Since deployment, Smart Paste has had overwhelmingly positive feedback with a 45% acceptance rate. At Google's enterprise scale, these accepted suggestions account substantially for over 1% of all code written company-wide.

What carries the argument

The deep learning model that predicts post-paste edits such as reformatting, variable renaming, and style adjustments, integrated into the IDE with user-facing suggestion handling.

Load-bearing premise

High acceptance rates and measured code volume directly indicate net productivity gains without hidden costs like suggestion fatigue or reduced code quality.

What would settle it

A measurement of total developer time spent on code tasks before and after the feature, or a check for changes in bug rates and maintenance effort in code that used the suggestions.

Figures

Figures reproduced from arXiv: 2510.03843 by Aditya Kini, Alexander Frömmgen, Guilherme Herzog, José Cambronero, Marcus Revaj, Maxim Tabachnyk, Vincent Nguyen.

**Figure 2.** A developer’s coding journey, reconstructed from File Snapshot and Edit Delta events. Our method identifies a Paste…

**Figure 3.** Task representation for the Smart Paste model using…

**Figure 4.** Our user interface renders suggestions as an inline…

**Figure 5.** The “auto-apply with hint” design automatically…

**Figure 9.** The inline “ghost text” pattern, adopted from code…

**Figure 7.** The side-by-side diff view that resulted from the…

**Figure 8.** The Peek View suggestion window. Although ca…

**Figure 11.** Developers employed chains of pastes and Smart…

read the original abstract

Manually editing pasted code is a long-standing developer pain point. In internal software development at Google, we observe that code is pasted 4 times more often than it is manually typed. These paste actions frequently require follow-up edits, ranging from simple reformatting and renaming to more complex style adjustments and cross-language translations. Prior work has shown deep learning can be used to predict these edits. In this work, we show how to iteratively develop and scale Smart Paste, an IDE feature for post-paste edit suggestions, to Google's development environment. This experience can serve as a guide for AI practitioners on a holistic approach to feature development, covering user experience, system integration, and model capabilities. Since deployment, Smart Paste has had overwhelmingly positive feedback with a 45% acceptance rate. At Google's enterprise scale, these accepted suggestions account substantially for over 1% of all code written company-wide.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Smart Paste reports a real Google deployment of post-paste edit suggestions with 45% acceptance and over 1% code volume impact, but the productivity claims rest on surface metrics without supporting controls.

read the letter

Smart Paste describes an IDE feature that suggests edits after developers paste code, built on deep learning models for edit prediction. The paper reports 45% acceptance since deployment and claims the accepted edits account for over 1% of all code written at Google. That deployment scale and the concrete numbers are the main things to take away here.

The iterative process they used to develop and integrate the feature into Google's environment stands out as the useful contribution. They walk through user experience choices, system integration steps, and how they scaled the model capabilities over time. This gives a practical view of moving from prior edit-prediction research to something that runs in production for thousands of developers. The enterprise metrics add some weight because they come from actual usage rather than controlled experiments.

The soft spots sit in the evaluation. Acceptance rate and code volume are straightforward to measure, but they do not automatically show net productivity gains. The paper gives no data on time spent reviewing suggestions, rejection reasons, suggestion fatigue, or whether the edits reduce or increase downstream bugs compared to manual fixes. Without baselines or controls for those factors, the positive feedback and volume numbers leave the overall benefit open to interpretation.

This paper is for practitioners working on AI coding tools who want to see how one such system reached internal adoption at large scale. Readers focused on deployment stories and integration details will get the most from it. The work shows clear thinking about the full pipeline from model to feature, so it deserves a serious referee even if the impact measurements need tightening. I would recommend sending it for peer review to get feedback on the metrics and to check whether the scaling narrative holds up under external scrutiny.

Referee Report

2 major / 2 minor

Summary. The manuscript describes the iterative development and deployment of Smart Paste, an IDE feature that applies deep learning to generate post-paste edit suggestions for code in Google's internal environment. It notes that pasting occurs four times more often than manual typing and reports that, after deployment, the feature received positive feedback with a 45% acceptance rate; accepted suggestions are claimed to account for over 1% of all code written company-wide. The work positions itself as a guide for AI practitioners on holistic feature development covering user experience, system integration, and model capabilities.

Significance. If the reported deployment metrics hold under scrutiny, the paper provides a useful large-scale case study on integrating predictive edit models into production developer tools. It demonstrates how such a feature can achieve measurable uptake at enterprise scale and offers practical lessons on scaling from research prototypes to widespread use. The emphasis on real-world outcomes rather than isolated model accuracy is a strength, though the absence of detailed evaluation methodology limits the ability to assess generalizability or net productivity impact.

major comments (2)

[Abstract] Abstract: The central claims of a 45% acceptance rate and >1% contribution to company-wide code volume are stated as observed results without any description of measurement methods, time window, definition of acceptance, controls for selection bias, or comparison to baseline paste/edit behavior. These metrics are load-bearing for the claim of substantial success and positive feedback.
[Abstract] The manuscript does not report data on rejection reasons, interaction overhead, downstream code quality effects, or controlled comparisons to manual editing, which are required to substantiate that the surface metrics reflect net productivity gains rather than hidden costs such as review fatigue or quality regressions.

minor comments (2)

Clarify the exact scope of 'all code written company-wide' (e.g., whether it includes only edited files or all commits) to avoid ambiguity in the 1% figure.
Provide a brief overview of the model architecture or training data sources in the main text, as the abstract mentions deep learning but leaves implementation details implicit.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive feedback, which helps clarify the presentation of our deployment results. We address each major comment below and indicate the revisions made to the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: The central claims of a 45% acceptance rate and >1% contribution to company-wide code volume are stated as observed results without any description of measurement methods, time window, definition of acceptance, controls for selection bias, or comparison to baseline paste/edit behavior. These metrics are load-bearing for the claim of substantial success and positive feedback.

Authors: We agree that greater transparency on metric collection strengthens the paper. The revised manuscript now includes an expanded methods description specifying the observation window (post-deployment data from March 2023 through June 2024), the operational definition of acceptance (explicit user acceptance or persistence of the edit in the final committed code), and the aggregation method used to compute the >1% contribution (total lines introduced by accepted suggestions divided by total lines committed company-wide). Phased rollout across teams was used to reduce selection effects, though we cannot release the full internal statistical controls or baseline paste/edit logs for privacy reasons. revision: partial
Referee: [Abstract] The manuscript does not report data on rejection reasons, interaction overhead, downstream code quality effects, or controlled comparisons to manual editing, which are required to substantiate that the surface metrics reflect net productivity gains rather than hidden costs such as review fatigue or quality regressions.

Authors: We have added a new subsection summarizing internal survey responses on rejection reasons (primarily irrelevance or over-conservatism) and qualitative observations that interaction overhead remains low because suggestions are presented inline. We also now explicitly list the absence of randomized controlled trials and downstream quality metrics as a limitation of the work. A formal A/B comparison to manual editing was not feasible within the production deployment constraints and ethical guidelines governing tool rollouts at this scale. revision: partial
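
The metric definitions given in the first response (acceptance as explicit acceptance or persistence in committed code, and code share as accepted lines divided by total committed lines) reduce to two ratios. The following is a minimal sketch of those ratios only. The `SuggestionEvent` record, its field names, and the toy numbers are illustrative assumptions and do not come from the paper or from Google's logging pipeline.

```python
from dataclasses import dataclass


@dataclass
class SuggestionEvent:
    # Hypothetical log record; the field names are assumptions for illustration.
    accepted: bool             # the user explicitly accepted the suggestion
    persisted_in_commit: bool  # the suggested edit survived into committed code
    lines_added: int           # lines the suggestion introduced


def counts_as_accepted(event: SuggestionEvent) -> bool:
    """Acceptance as described in the response: explicit accept OR persistence."""
    return event.accepted or event.persisted_in_commit


def acceptance_rate(events: list[SuggestionEvent]) -> float:
    """Fraction of shown suggestions that count as accepted."""
    if not events:
        return 0.0
    return sum(counts_as_accepted(e) for e in events) / len(events)


def code_share(events: list[SuggestionEvent], total_lines_committed: int) -> float:
    """Lines from accepted suggestions divided by all committed lines."""
    if total_lines_committed <= 0:
        raise ValueError("total_lines_committed must be positive")
    accepted_lines = sum(e.lines_added for e in events if counts_as_accepted(e))
    return accepted_lines / total_lines_committed


# Toy data, invented for this example.
events = [
    SuggestionEvent(accepted=True, persisted_in_commit=True, lines_added=3),
    SuggestionEvent(accepted=False, persisted_in_commit=True, lines_added=2),
    SuggestionEvent(accepted=False, persisted_in_commit=False, lines_added=5),
    SuggestionEvent(accepted=False, persisted_in_commit=False, lines_added=1),
]

print(f"acceptance rate: {acceptance_rate(events):.0%}")
print(f"code share: {code_share(events, total_lines_committed=500):.2%}")
```

Note that both ratios depend on how the denominator is scoped (all suggestions shown versus all paste events, all commits versus edited files only), which is the ambiguity the referee's first minor comment raises.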

standing simulated objections not resolved

Quantitative results from controlled experiments measuring net productivity impact or downstream code quality effects

Circularity Check

0 steps flagged

No significant circularity in empirical deployment report

full rationale

The paper is an experience report on iteratively developing and deploying the Smart Paste IDE feature. It reports direct observational data including that code is pasted 4 times more often than typed, a 45% acceptance rate after deployment, and that accepted suggestions account for over 1% of company-wide code. These are presented as measured post-deployment outcomes and user feedback rather than as model predictions, fitted parameters renamed as results, or quantities derived from equations that reduce to the inputs by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked to justify the central claims; the metrics function as independent external observations of the deployed system.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

This is an empirical systems and deployment report. It relies on standard assumptions from machine learning and user-experience practice but introduces no explicit free parameters, new entities, or non-standard axioms beyond the behavioral premise that users will accept useful suggestions.

axioms (1)

domain assumption User acceptance of suggestions and resulting code volume serve as valid proxies for feature value and productivity impact.
The central success metrics depend on this untested behavioral assumption about developer interaction with the tool.

pith-pipeline@v0.9.0 · 5703 in / 1379 out tokens · 42446 ms · 2026-05-18T10:16:03.078334+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Atomicity.lean · atomic_tick · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We developed a rule-based method to identify paste and fix sequences from raw edit logs... 72% of all paste events receive a local fix."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "The target output is a unidiff-style patch... multilingual training strategy successfully improved performance across all languages."
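
The second quoted passage says the model's target output is a unidiff-style patch. As a rough illustration of what such a target could look like for a post-paste rename, the sketch below builds a unified diff with Python's standard `difflib` module. The snippet contents, the `make_patch` helper, and the `snippet.py` path are hypothetical; the paper's exact patch format and context handling may differ.

```python
import difflib


def make_patch(pasted: str, fixed: str, path: str = "snippet.py") -> str:
    """Return a unified diff that turns the pasted text into the fixed text."""
    diff = difflib.unified_diff(
        pasted.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Hypothetical paste: the copied code refers to `x`, but the destination
# file calls the same value `user_count`, so the follow-up fix is a rename.
pasted = "total = compute(x)\nprint(total)\n"
fixed = "total = compute(user_count)\nprint(total)\n"

patch = make_patch(pasted, fixed)
print(patch)
```

Representing the fix as a patch rather than a full rewritten file keeps the target short when only a few pasted lines change, which is one plausible reason to prefer this output format.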

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
