pith. machine review for the scientific record.

arXiv: 2510.03843 · v4 · submitted 2025-10-04 · cs.SE · cs.HC · cs.LG

Smart Paste: Automatically Fixing Copy/Paste for Google Developers

Pith reviewed 2026-05-18 10:16 UTC · model grok-4.3

classification cs.SE · cs.HC · cs.LG
keywords post-paste edits · IDE code assistance · AI for developer tools · code editing automation · enterprise software development · machine learning for code

The pith

Smart Paste suggests automatic edits after code is pasted and now generates over 1% of all code at Google.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes building and deploying an IDE feature called Smart Paste that predicts and offers fixes for common issues after developers insert copied code. In Google's internal development, code is pasted four times more often than it is typed, but pasted code often demands follow-up work on formatting, names, style, and cross-language adjustments. The authors detail an iterative process that combines model training with attention to how suggestions appear in the editor and how they fit into existing tools. Deployment data shows a 45% acceptance rate, and the accepted suggestions account for over 1% of all code written company-wide.

Core claim

We show how to iteratively develop and scale Smart Paste, an IDE feature for post-paste edit suggestions, to Google's development environment. Since deployment, Smart Paste has had overwhelmingly positive feedback with a 45% acceptance rate. At Google's enterprise scale, these accepted suggestions account substantially for over 1% of all code written company-wide.

What carries the argument

The deep learning model that predicts post-paste edits such as reformatting, variable renaming, and style adjustments, integrated into the IDE with user-facing suggestion handling.
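
To make the term concrete, the following is a hand-written illustration of one kind of post-paste edit, a variable rename, rendered as a unified diff with Python's standard difflib module. The snippet contents, the names user_list and users, and the helper suggestion_diff are invented for illustration. The paper's model predicts such edits with deep learning, which this sketch does not attempt.

```python
import difflib

# Hypothetical pasted snippet: it still refers to `user_list`, a name from
# the file it was copied from.
pasted = [
    "total = 0",
    "for item in user_list:",
    "    total += item.cost",
]

# Hypothetical adapted snippet: the destination file calls the same
# collection `users`, so the suggested edit renames the variable.
adapted = [
    "total = 0",
    "for item in users:",
    "    total += item.cost",
]


def suggestion_diff(before, after):
    """Return the suggested edit as a list of unified-diff lines."""
    return list(
        difflib.unified_diff(
            before, after, fromfile="pasted", tofile="suggested", lineterm=""
        )
    )


if __name__ == "__main__":
    for line in suggestion_diff(pasted, adapted):
        print(line)
```

Running the script prints a diff in which the line using user_list is removed and the line using users is added, which is the kind of change a developer would otherwise make by hand after pasting.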

Load-bearing premise

High acceptance rates and measured code volume directly indicate net productivity gains without hidden costs like suggestion fatigue or reduced code quality.

What would settle it

A measurement of total developer time spent on code tasks before and after the feature, or a check for changes in bug rates and maintenance effort in code that used the suggestions.

Figures

Figures reproduced from arXiv: 2510.03843 by Aditya Kini, Alexander Frömmgen, Guilherme Herzog, José Cambronero, Marcus Revaj, Maxim Tabachnyk, Vincent Nguyen.

Figure 1: Smart Paste monitors a developer’s IDE activity, so…
Figure 2: A developer’s coding journey, reconstructed from File Snapshot and Edit Delta events. Our method identifies a Paste…
Figure 3: Task representation for the Smart Paste model using…
Figure 4: Our user interface renders suggestions as an inline…
Figure 5: The “auto-apply with hint” design automatically…
Figure 9: The inline “ghost text” pattern, adopted from code…
Figure 7: The side-by-side diff view that resulted from the…
Figure 8: The Peek View suggestion window. Although ca…
Figure 11: Developers employed chains of pastes and Smart…
Original abstract

Manually editing pasted code is a long-standing developer pain point. In internal software development at Google, we observe that code is pasted 4 times more often than it is manually typed. These paste actions frequently require follow-up edits, ranging from simple reformatting and renaming to more complex style adjustments and cross-language translations. Prior work has shown deep learning can be used to predict these edits. In this work, we show how to iteratively develop and scale Smart Paste, an IDE feature for post-paste edit suggestions, to Google's development environment. This experience can serve as a guide for AI practitioners on a holistic approach to feature development, covering user experience, system integration, and model capabilities. Since deployment, Smart Paste has had overwhelmingly positive feedback with a 45% acceptance rate. At Google's enterprise scale, these accepted suggestions account substantially for over 1% of all code written company-wide.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript describes the iterative development and deployment of Smart Paste, an IDE feature that applies deep learning to generate post-paste edit suggestions for code in Google's internal environment. It notes that pasting occurs four times more often than manual typing and reports that, after deployment, the feature received positive feedback with a 45% acceptance rate; accepted suggestions are claimed to account for over 1% of all code written company-wide. The work positions itself as a guide for AI practitioners on holistic feature development covering user experience, system integration, and model capabilities.

Significance. If the reported deployment metrics hold under scrutiny, the paper provides a useful large-scale case study on integrating predictive edit models into production developer tools. It demonstrates how such a feature can achieve measurable uptake at enterprise scale and offers practical lessons on scaling from research prototypes to widespread use. The emphasis on real-world outcomes rather than isolated model accuracy is a strength, though the absence of detailed evaluation methodology limits the ability to assess generalizability or net productivity impact.

major comments (2)
  1. [Abstract] The central claims of a 45% acceptance rate and >1% contribution to company-wide code volume are stated as observed results without any description of measurement methods, time window, definition of acceptance, controls for selection bias, or comparison to baseline paste/edit behavior. These metrics are load-bearing for the claim of substantial success and positive feedback.
  2. [Abstract] The manuscript does not report data on rejection reasons, interaction overhead, downstream code quality effects, or controlled comparisons to manual editing, which are required to substantiate that the surface metrics reflect net productivity gains rather than hidden costs such as review fatigue or quality regressions.
minor comments (2)
  1. Clarify the exact scope of 'all code written company-wide' (e.g., whether it includes only edited files or all commits) to avoid ambiguity in the 1% figure.
  2. Provide a brief overview of the model architecture or training data sources in the main text, as the abstract mentions deep learning but leaves implementation details implicit.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive feedback, which helps clarify the presentation of our deployment results. We address each major comment below and indicate the revisions made to the manuscript.

point-by-point responses
  1. Referee: [Abstract] The central claims of a 45% acceptance rate and >1% contribution to company-wide code volume are stated as observed results without any description of measurement methods, time window, definition of acceptance, controls for selection bias, or comparison to baseline paste/edit behavior. These metrics are load-bearing for the claim of substantial success and positive feedback.

    Authors: We agree that greater transparency on metric collection strengthens the paper. The revised manuscript now includes an expanded methods description specifying the observation window (post-deployment data from March 2023 through June 2024), the operational definition of acceptance (explicit user acceptance or persistence of the edit in the final committed code), and the aggregation method used to compute the >1% contribution (total lines introduced by accepted suggestions divided by total lines committed company-wide). Phased rollout across teams was used to reduce selection effects, though we cannot release the full internal statistical controls or baseline paste/edit logs for privacy reasons. revision: partial

  2. Referee: [Abstract] The manuscript does not report data on rejection reasons, interaction overhead, downstream code quality effects, or controlled comparisons to manual editing, which are required to substantiate that the surface metrics reflect net productivity gains rather than hidden costs such as review fatigue or quality regressions.

    Authors: We have added a new subsection summarizing internal survey responses on rejection reasons (primarily irrelevance or over-conservatism) and qualitative observations that interaction overhead remains low because suggestions are presented inline. We also now explicitly list the absence of randomized controlled trials and downstream quality metrics as a limitation of the work. A formal A/B comparison to manual editing was not feasible within the production deployment constraints and ethical guidelines governing tool rollouts at this scale. revision: partial
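
The metric definitions given in the first response (acceptance as explicit acceptance or persistence in committed code, and code share as accepted lines divided by total committed lines) can be sketched as follows. This is a minimal illustration under assumptions: the SuggestionEvent record, its field names, and the function names are hypothetical, not the paper's logging schema or measurement pipeline.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SuggestionEvent:
    """One shown suggestion (hypothetical log record)."""
    explicitly_accepted: bool  # the user accepted the suggestion in the IDE
    persisted_in_commit: bool  # the edit survived into the committed code
    lines_introduced: int      # lines of code the suggestion introduces


def is_accepted(event: SuggestionEvent) -> bool:
    # Acceptance as stated in the response: explicit acceptance OR
    # persistence of the edit in the final committed code.
    return event.explicitly_accepted or event.persisted_in_commit


def acceptance_rate(events: Sequence[SuggestionEvent]) -> float:
    """Fraction of shown suggestions that count as accepted."""
    if not events:
        raise ValueError("need at least one suggestion event")
    return sum(is_accepted(e) for e in events) / len(events)


def code_share(events: Sequence[SuggestionEvent],
               total_lines_committed: int) -> float:
    """Lines from accepted suggestions divided by all committed lines."""
    if total_lines_committed <= 0:
        raise ValueError("total_lines_committed must be positive")
    accepted_lines = sum(e.lines_introduced for e in events if is_accepted(e))
    return accepted_lines / total_lines_committed
```

Under these definitions the two headline numbers are sensitive to choices the referee asks about: counting persistence as acceptance raises the rate relative to explicit acceptance alone, and the denominator of code_share depends on what is counted as committed code.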

standing simulated objections not resolved
  • Quantitative results from controlled experiments measuring net productivity impact or downstream code quality effects

Circularity Check

0 steps flagged

No significant circularity in empirical deployment report

full rationale

The paper is an experience report on iteratively developing and deploying the Smart Paste IDE feature. It reports direct observational data including that code is pasted 4 times more often than typed, a 45% acceptance rate after deployment, and that accepted suggestions account for over 1% of company-wide code. These are presented as measured post-deployment outcomes and user feedback rather than as model predictions, fitted parameters renamed as results, or quantities derived from equations that reduce to the inputs by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked to justify the central claims; the metrics function as independent external observations of the deployed system.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is an empirical systems and deployment report. It relies on standard assumptions from machine learning and user-experience practice but introduces no explicit free parameters, new entities, or non-standard axioms beyond the behavioral premise that users will accept useful suggestions.

axioms (1)
  • domain assumption: User acceptance of suggestions and resulting code volume serve as valid proxies for feature value and productivity impact.
    The central success metrics depend on this untested behavioral assumption about developer interaction with the tool.



