
arXiv: 2604.08352 · v1 · submitted 2026-04-09 · 💻 cs.SE · cs.CR · cs.HC

Security Concerns in Generative AI Coding Assistants: Insights from Online Discussions on GitHub Copilot

Pith reviewed 2026-05-10 17:15 UTC · model grok-4.3

classification 💻 cs.SE · cs.CR · cs.HC
keywords generative AI · coding assistants · GitHub Copilot · security concerns · data leakage · adversarial attacks · online discussions · thematic analysis

The pith

Analysis of online developer discussions identifies four primary security concerns with generative AI coding assistants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors set out to understand the security worries that software developers have about tools like GitHub Copilot by looking at what people say in public online spaces. They gathered relevant posts and comments from three major platforms and used topic modeling to group them before identifying common themes through careful reading. This approach shows that users are particularly anxious about their data being exposed, legal issues with code ownership, ways attackers could manipulate the AI, and the chance that the generated code contains vulnerabilities. Knowing these specific concerns matters because it reveals practical challenges that technical tests alone might miss, helping point toward better safeguards in future versions of these assistants.

Core claim

Through the collection of discussion threads from Stack Overflow, Reddit, and Hacker News concerning security issues in GitHub Copilot, followed by BERTopic clustering and thematic analysis, four major areas of concern emerge: potential data leakage, code licensing, adversarial attacks such as prompt injection, and insecure code suggestions. These findings emphasize the limitations and trade-offs involved in applying generative AI to software engineering tasks.

What carries the argument

BERTopic clustering followed by thematic analysis of developer discussion threads on GitHub Copilot security issues.
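
To make the clustering step more concrete, here is a minimal sketch of the class-based TF-IDF weighting that BERTopic uses to describe clusters with their most characteristic terms. It is a simplified stand-in, not the authors' pipeline: it assumes cluster labels already exist (BERTopic derives them from sentence embeddings, dimensionality reduction, and density-based clustering, all skipped here), it uses raw rather than normalized term counts, and the sample posts and the function name `top_terms_per_cluster` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms_per_cluster(docs, labels, k=3):
    """Rank the k most characteristic terms of each cluster.

    Score of term t in cluster c: tf(t, c) * log(1 + A / f(t)), where
    A is the average number of words per cluster and f(t) is the
    frequency of t across all clusters.
    """
    # Merge all documents of a cluster into one bag of words.
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(tokenize(doc))

    # f(t): how often each term occurs across all clusters.
    total = Counter()
    for counts in term_counts.values():
        total.update(counts)

    # A: average number of words per cluster.
    avg_words = sum(total.values()) / len(term_counts)

    ranked = {}
    for label, counts in term_counts.items():
        scores = {
            term: tf * math.log(1 + avg_words / total[term])
            for term, tf in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked[label] = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return ranked


# Hypothetical posts, already assigned to two clusters.
docs = [
    "copilot leaked my api key in a suggestion",
    "worried the api key in my repo gets leaked",
    "copilot suggestion copies gpl license code",
    "is gpl license code in a copilot suggestion legal",
]
labels = [0, 0, 1, 1]
print(top_terms_per_cluster(docs, labels))
```

Terms that are frequent inside one cluster but rare elsewhere rise to the top, which is what gives an analyst a starting point for naming a theme. The thematic analysis itself remains a manual reading step.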

Load-bearing premise

The sample of discussions from Stack Overflow, Reddit, and Hacker News accurately reflects the security concerns of software developers in general.

What would settle it

A broad survey of professional developers reporting few or no security concerns with generative AI coding assistants would challenge whether the four areas represent widespread views.

Figures

Figures reproduced from arXiv: 2604.08352 by Monika Swetha Gurupathi, Nalin Arachchilage, Nicolás E. Díaz Ferreyra, Riccardo Scandariato, Zadia Codabux.

Figure 1: Study Design.
Figure 2: Provenance Distribution Across Clusters.
Figure 3: Sentiment Distribution Across Platforms.
Original abstract

Generative Artificial Intelligence (GenAI) has become a central component of many development tools (e.g., GitHub Copilot) that support software practitioners across multiple programming tasks, including code completion, documentation, and bug detection. However, current research has identified significant limitations and open issues in GenAI, including reliability, non-determinism, bias, and copyright infringement. While prior work has primarily focused on assessing the technical performance of these technologies for code generation, less attention has been paid to emerging concerns of software developers, particularly in the security realm. OBJECTIVE: This work explores security concerns regarding the use of GenAI-based coding assistants by analyzing challenges voiced by developers and software enthusiasts in public online forums. METHOD: We retrieved posts, comments, and discussion threads addressing security issues in GitHub Copilot from three popular platforms, namely Stack Overflow, Reddit, and Hacker News. These discussions were clustered using BERTopic and then synthesized using thematic analysis to identify distinct categories of security concerns. RESULTS: Four major concern areas were identified, including potential data leakage, code licensing, adversarial attacks (e.g., prompt injection), and insecure code suggestions, underscoring critical reflections on the limitations and trade-offs of GenAI in software engineering. IMPLICATIONS: Our findings contribute to a broader understanding of how developers perceive and engage with GenAI-based coding assistants, while highlighting key areas for improving their built-in security features.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper explores security concerns regarding generative AI coding assistants such as GitHub Copilot by retrieving and analyzing public discussions from Stack Overflow, Reddit, and Hacker News. It applies BERTopic clustering to the collected threads followed by thematic analysis to synthesize four major concern categories: potential data leakage, code licensing issues, adversarial attacks (e.g., prompt injection), and insecure code suggestions. The work positions these findings as insights into developer perceptions and trade-offs in GenAI adoption for software engineering tasks.

Significance. If the results hold, the study adds a developer-centered perspective to the literature on GenAI limitations in software engineering, complementing technical evaluations of reliability, bias, and performance. By drawing directly from forum discussions rather than controlled experiments, it highlights practical security worries that could inform tool improvements and future research on human-AI collaboration in coding.

major comments (2)
  1. [Methods] Methods section: The data collection description provides no specifics on search queries/keywords, time period covered, total volume of posts/comments/threads retrieved, or any inclusion/exclusion criteria applied across the three platforms. These details are load-bearing for evaluating whether the four identified categories are robustly supported or influenced by sampling choices.
  2. [Methods] Thematic analysis description (following BERTopic clustering): No information is given on the number of analysts involved, inter-rater reliability metrics, or the validation process used to derive and confirm the four concern categories from the clusters. This omission weakens the ability to assess the reliability of the synthesis step central to the results.
minor comments (2)
  1. [Abstract] Abstract: Adding approximate figures for the number of discussions analyzed would help readers gauge the scale of the evidence base supporting the four categories.
  2. [Results] Results: The presentation of the four categories would benefit from explicit mapping back to representative quotes or cluster examples to strengthen traceability.
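
The first major comment asks for explicit search terms and inclusion/exclusion criteria. As a rough illustration of what a documented, reproducible filter could look like, the sketch below keeps posts that mention the product together with at least one security term and drops duplicates. The keyword lists, the post structure, and the function names are assumptions made for this example; the paper's actual queries and criteria are not stated in the material reviewed here.

```python
import re

# Hypothetical search terms, chosen only for illustration.
PRODUCT_TERMS = ("copilot",)
SECURITY_TERMS = (
    "security",
    "data leak",
    "licens",  # matches "license", "licensing", "licence"
    "prompt injection",
    "insecure code",
)


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def select_posts(posts):
    """Apply inclusion criteria and drop duplicates after normalization."""
    seen = set()
    kept = []
    for post in posts:
        body = normalize(post["text"])
        if body in seen:
            continue  # exclusion: duplicate of an earlier post
        seen.add(body)
        mentions_product = any(term in body for term in PRODUCT_TERMS)
        mentions_security = any(term in body for term in SECURITY_TERMS)
        if mentions_product and mentions_security:
            kept.append(post)
    return kept


# Hypothetical sample records from the three platforms.
posts = [
    {"source": "reddit", "text": "Does GitHub Copilot leak secrets? Any security audit?"},
    {"source": "hackernews", "text": "does github copilot  leak secrets? any security audit?"},
    {"source": "stackoverflow", "text": "Copilot keybinding not working in VS Code"},
    {"source": "reddit", "text": "Prompt injection attacks on Copilot Chat"},
]
print(len(select_posts(posts)))
```

Reporting the counts before and after such a filter, per platform, would let readers judge how much the resulting categories depend on the keyword choice.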

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for minor revision. We agree that expanding the Methods section with additional details will strengthen the paper's transparency and allow readers to better assess the robustness of our findings.

  1. Referee: [Methods] Methods section: The data collection description provides no specifics on search queries/keywords, time period covered, total volume of posts/comments/threads retrieved, or any inclusion/exclusion criteria applied across the three platforms. These details are load-bearing for evaluating whether the four identified categories are robustly supported or influenced by sampling choices.

    Authors: We agree that these details are essential for reproducibility and evaluating sampling choices. In the revised manuscript, we will expand the data collection subsection to specify the search queries and keywords used on each platform (e.g., terms combining 'GitHub Copilot' with 'security', 'data leak', 'licensing', 'prompt injection', and 'insecure code'), the time period covered (from Copilot's public release in 2021 through the collection date), the total volume of posts/comments/threads retrieved before and after filtering, and the inclusion/exclusion criteria (e.g., English-language discussions directly addressing security concerns, exclusion of duplicates or unrelated threads). This will clarify how the four categories were derived from the sampled discussions. revision: yes

  2. Referee: [Methods] Thematic analysis description (following BERTopic clustering): No information is given on the number of analysts involved, inter-rater reliability metrics, or the validation process used to derive and confirm the four concern categories from the clusters. This omission weakens the ability to assess the reliability of the synthesis step central to the results.

    Authors: We acknowledge that the current description of the thematic analysis step is insufficiently detailed. In the revision, we will add a description of the process: the involvement of two authors in reviewing BERTopic clusters (via top terms, representative documents, and manual inspection), the iterative synthesis into the four categories through discussion and consensus-building, and the validation approach (cross-referencing with prior literature on GenAI security and checking for consistency across platforms). If inter-rater reliability was not formally computed, we will note the consensus process used instead. This will improve transparency without altering the reported categories. revision: yes
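
If two analysts label the same items independently, one common inter-rater reliability metric of the kind the referee asks about is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is illustrative only: the labels are hypothetical shorthand for the four concern categories, and nothing here indicates that the authors computed this statistic.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item set.")
    n = len(rater_a)

    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label shares.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels assigned by two analysts to six discussion clusters.
analyst_1 = ["leak", "leak", "license", "attack", "insecure", "leak"]
analyst_2 = ["leak", "license", "license", "attack", "insecure", "leak"]
print(round(cohens_kappa(analyst_1, analyst_2), 3))
```

A consensus-based process, as described in the rebuttal, is a legitimate alternative; in that case the write-up should state that no agreement statistic was computed.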

Circularity Check

0 steps flagged

No significant circularity


The paper performs an exploratory qualitative study: it retrieves external forum threads from Stack Overflow, Reddit, and Hacker News, applies BERTopic clustering, and conducts thematic analysis to surface four concern categories. No equations, fitted parameters, predictions, or derivations exist; the results are direct outputs of documented processing steps on independent data. No self-citation load-bearing steps or ansatz smuggling appear in the provided material. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard qualitative methods and one core domain assumption about the representativeness of public forum data; it introduces no free parameters, invented entities, or additional axioms.

axioms (1)
  • domain assumption: Discussions on public forums like Stack Overflow, Reddit, and Hacker News provide representative insights into developers' security concerns with GenAI coding assistants.
    The study bases its findings on these sources without addressing potential biases in who posts or what gets discussed.


