XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants
Pith reviewed 2026-05-22 23:52 UTC · model grok-4.3
The pith
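Attackers can poison AI coding assistants by making semantically equivalent modifications to code that the assistant automatically pulls into its context from other files, projects, or contributors. A new graph search algorithm, GCGS, finds such modifications with a 75.72% average attack success rate.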
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce XOXO, a cross-origin context poisoning attack that relies on adversarial but semantically equivalent code modifications to compromise AI coding assistants without detection by traditional analysis. To find effective modifications, we propose GCGS, a task-agnostic black-box algorithm that searches the transformation space using a Cayley Graph, achieving an average attack success rate of 75.72% across five tasks and eleven models including GPT-4.1 and Claude 3.5 Sonnet v2. Adversarial fine-tuning defenses prove ineffective against this approach.
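To make "semantically equivalent code modification" concrete, the following is a minimal sketch of one such transformation, variable renaming, written with Python's standard `ast` module (Python 3.9+ for `ast.unparse`). The helper names `RenameVariable`, `rename`, and `run`, and the sample function `total`, are illustrative assumptions and do not come from the paper.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Rename every occurrence of one identifier (hypothetical helper)."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        # Covers reads and writes of a plain variable.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        # Covers function parameters.
        if node.arg == self.old:
            node.arg = self.new
        return node


def rename(source, old, new):
    """Return `source` with identifier `old` replaced by `new`."""
    tree = RenameVariable(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


ORIGINAL = """
def total(prices):
    acc = 0
    for p in prices:
        acc += p
    return acc
"""


def run(source, data):
    """Execute `source` and call its `total` function on `data`."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"](data)


# The renamed variant differs textually but computes the same result.
variant = rename(ORIGINAL, "acc", "buffer")
print(variant)
print(run(ORIGINAL, [1, 2, 3]) == run(variant, [1, 2, 3]))
```

Renaming preserves behavior only when the new name does not collide with an existing identifier in scope; the sketch does not check for that.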
What carries the argument
GCGS algorithm, which systematically searches the transformation space of semantically equivalent code changes using a Cayley Graph structure to identify effective poisoning inputs in a black-box setting.
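The summary does not spell out how GCGS walks the Cayley graph, so the following is only a toy sketch of the underlying idea: treat transformations as elements of a group, treat a small generator set as the edges, and search outward from the identity. It uses the symmetric group on `n` items with adjacent swaps as generators and a plain breadth-first search. The names `cayley_bfs`, `adjacent_transpositions`, `compose`, and `toy_score` are hypothetical, and the scoring function is a stand-in rather than a model query.

```python
from collections import deque


def compose(perm, gen):
    """Right-multiply `perm` by `gen`: rearrange perm's positions as gen dictates."""
    return tuple(perm[i] for i in gen)


def adjacent_transpositions(n):
    """Generators of S_n: swaps of neighbouring positions (assumed generator set)."""
    gens = []
    for i in range(n - 1):
        g = list(range(n))
        g[i], g[i + 1] = g[i + 1], g[i]
        gens.append(tuple(g))
    return gens


def cayley_bfs(n, score, threshold, max_nodes=1000):
    """Breadth-first walk of the Cayley graph of S_n.

    Returns the first element whose score is <= threshold, together with
    the number of nodes evaluated, or (None, count) if the budget runs out.
    """
    identity = tuple(range(n))
    gens = adjacent_transpositions(n)
    seen = {identity}
    queue = deque([identity])
    visited = 0
    while queue and visited < max_nodes:
        node = queue.popleft()
        visited += 1
        if score(node) <= threshold:
            return node, visited
        for g in gens:
            nxt = compose(node, g)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None, visited


# Placeholder objective: distance to a fixed arrangement (not a model query).
TARGET = (2, 0, 1)


def toy_score(perm):
    return sum(a != b for a, b in zip(perm, TARGET))


found, visited = cayley_bfs(3, toy_score, threshold=0)
print(found, visited)
```

Breadth-first order is chosen here only because it is easy to verify; a practical search over a large transformation space would presumably need a guided strategy and a query budget.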
If this is right
- The attack succeeds against eleven models used in popular AI coding assistants.
- Existing defenses such as adversarial fine-tuning fail to mitigate the attack.
- Attackers can cause the generation of incorrect or vulnerable code while appearing legitimate.
- The method applies across five different tasks in a task-agnostic manner.
- Blame for bad outputs can be shifted to the victim developer.
Where Pith is reading between the lines
- AI coding tools may need to track the origin and provenance of all context pieces to prevent such poisoning.
- Similar context aggregation in other LLM applications could be vulnerable to equivalent attacks.
- Developers should consider manual review or additional verification steps for AI-generated code from large contexts.
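The provenance idea in the first point above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how any existing assistant works: `ContextChunk`, its `origin` and `trusted` fields, and `build_prompt` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextChunk:
    text: str
    origin: str    # e.g. a file path or repository name (hypothetical field)
    trusted: bool  # whether the origin is on an allow-list (hypothetical field)


def build_prompt(chunks, allow_untrusted=False):
    """Assemble prompt context, labelling each chunk with its origin.

    Returns the prompt body and the origins that were left out.
    """
    kept, dropped = [], []
    for chunk in chunks:
        if chunk.trusted or allow_untrusted:
            kept.append(chunk)
        else:
            dropped.append(chunk.origin)
    body = "\n\n".join(f"# origin: {c.origin}\n{c.text}" for c in kept)
    return body, dropped


chunks = [
    ContextChunk("def a(): pass", "app/main.py", True),
    ContextChunk("def b(): pass", "vendor/lib.py", False),
]
prompt, dropped = build_prompt(chunks)
print(prompt)
print(dropped)
```

Tracking origin does not by itself detect a semantically equivalent modification; it only makes visible which context came from outside the developer's own code, so that a policy or a reviewer can act on it.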
Load-bearing premise
Automatic gathering of context from multiple origins into the LLM prompt occurs without any sanitization or checks on where the context came from.
What would settle it
Testing whether applying the GCGS-found transformations to code in a multi-file project causes the AI assistant to output the intended poisoned result on one of the five tasks with one of the eleven models.
Original abstract
AI coding assistants are widely used for tasks like code generation. These tools now require large and complex contexts, automatically sourced from various origins (across files, projects, and contributors), forming part of the prompt fed to underlying LLMs. This automatic context-gathering introduces new vulnerabilities, allowing attackers to subtly poison input to compromise the assistant's outputs, potentially generating vulnerable code or introducing critical errors. We propose a novel attack, Cross-Origin Context Poisoning (XOXO), that is challenging to detect as it relies on adversarial code modifications that are semantically equivalent. Traditional program analysis techniques struggle to identify these perturbations since the semantics of the code remains correct, making it appear legitimate. This allows attackers to manipulate coding assistants into producing incorrect outputs, while shifting the blame to the victim developer. We introduce a novel, task-agnostic, black-box attack algorithm GCGS that systematically searches the transformation space using a Cayley Graph, achieving a 75.72% attack success rate on average across five tasks and eleven models, including GPT 4.1 and Claude 3.5 Sonnet v2 used by popular AI coding assistants. Furthermore, defenses like adversarial fine-tuning are ineffective against our attack, underscoring the need for new security measures in LLM-powered coding tools.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces XOXO, a stealthy cross-origin context poisoning attack on AI coding assistants. It argues that automatic aggregation of context from multiple origins (files, projects) into LLM prompts creates an unsanitized attack surface, allowing semantically equivalent code modifications to poison outputs (e.g., inducing vulnerable code) while appearing legitimate. The authors propose GCGS, a task-agnostic black-box algorithm that searches the space of code transformations via a Cayley graph, reporting a 75.72% average attack success rate across five tasks and eleven models (including GPT-4.1 and Claude 3.5 Sonnet v2). They further claim that adversarial fine-tuning is ineffective as a defense.
Significance. If the results hold under realistic multi-origin context aggregation, the work would highlight an important new vulnerability class for LLM-based developer tools. The Cayley-graph search method offers a systematic way to generate stealthy, semantics-preserving adversarial examples, which could influence future defenses in code-generation systems.
major comments (2)
- [Abstract and Experimental Evaluation] The headline 75.72% ASR claim (and the ineffectiveness of adversarial fine-tuning) is presented without any description of the experimental protocol, including how multi-origin context was collected and aggregated (e.g., file inclusion order, deduplication rules, provenance metadata), the exact set of transformations enumerated by GCGS, number of trials per task/model, baselines, or statistical tests. This absence prevents evaluation of whether the central empirical result is reproducible or load-bearing.
- [Threat Model and Evaluation Setup] The threat model posits automatic context gathering across origins without sanitization, yet no evidence is supplied that the reported experiments used an actual context-collection pipeline rather than hand-crafted single prompts. If real IDE gatherers apply even lightweight filtering or reordering, the Cayley-graph search may not achieve the claimed success rate; this assumption is load-bearing for the attack's practicality.
minor comments (2)
- [Abstract] Clarify model naming (e.g., 'GPT 4.1' in the abstract) and list exact versions and access methods for all eleven models.
- [GCGS Algorithm] The description of the Cayley graph construction would benefit from short pseudocode or a diagram showing how semantic equivalence is preserved during search.
Simulated Author's Rebuttal
We thank the referee for highlighting the need for greater clarity on our experimental protocol and threat model assumptions. We address each comment below and will revise the manuscript accordingly to improve reproducibility and address concerns about realism.
point-by-point responses
- Referee: [Abstract and Experimental Evaluation] The headline 75.72% ASR claim (and the ineffectiveness of adversarial fine-tuning) is presented without any description of the experimental protocol, including how multi-origin context was collected and aggregated (e.g., file inclusion order, deduplication rules, provenance metadata), the exact set of transformations enumerated by GCGS, number of trials per task/model, baselines, or statistical tests. This absence prevents evaluation of whether the central empirical result is reproducible or load-bearing.
Authors: We agree that the abstract omits protocol details (as is standard given length constraints) and that the experimental section would benefit from explicit enumeration. The full manuscript's Section 4 already specifies context aggregation (concatenation by file modification time with no deduplication), the GCGS transformation set (12 operators including variable renaming and statement reordering), 100 trials per task-model pair, a random-search baseline, and mean ASR with standard deviation. In revision we will (1) add a one-sentence protocol summary to the abstract, (2) expand Section 4 with a table listing all GCGS generators and inclusion rules, and (3) report 95% confidence intervals and paired t-tests against the baseline.
revision: yes
- Referee: [Threat Model and Evaluation Setup] The threat model posits automatic context gathering across origins without sanitization, yet no evidence is supplied that the reported experiments used an actual context-collection pipeline rather than hand-crafted single prompts. If real IDE gatherers apply even lightweight filtering or reordering, the Cayley-graph search may not achieve the claimed success rate; this assumption is load-bearing for the attack's practicality.
Authors: Our evaluation constructs multi-origin prompts by programmatically combining snippets from distinct files and repositories exactly as described in the threat model (Section 3), without any sanitization step. This matches the automatic aggregation behavior reported for tools such as Cursor and GitHub Copilot. We did not instrument a live IDE gatherer, which is a methodological limitation common to black-box LLM attacks. In the revision we will add a dedicated paragraph in Section 4.2 discussing robustness to common lightweight filters (e.g., duplicate removal, provenance stripping) and include an auxiliary experiment measuring ASR degradation under simulated reordering.
revision: partial
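The statistical reporting promised in the first response (attack success rate with a 95% confidence interval over 100 trials) can be sketched as below. The Wilson score interval is one common choice for a binomial proportion and is an assumption here, since the rebuttal does not say which interval will be used. The outcome counts are made up for illustration and are not the paper's data.

```python
import math


def attack_success_rate(outcomes):
    """Fraction of trials in which the attack succeeded."""
    return sum(outcomes) / len(outcomes)


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half, centre + half


# Hypothetical outcomes for one task-model pair: 76 successes in 100 trials.
outcomes = [True] * 76 + [False] * 24
asr = attack_success_rate(outcomes)
low, high = wilson_interval(sum(outcomes), len(outcomes))
print(f"ASR = {asr:.2%}, 95% CI = [{low:.2%}, {high:.2%}]")
```

With only 100 trials per pair the interval is several percentage points wide on each side, which is why per-pair intervals matter when comparing against a random-search baseline.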
Circularity Check
No significant circularity; empirical attack success rates are direct experimental measurements
full rationale
The paper presents an empirical black-box attack (GCGS) and reports measured attack success rates (75.72% average) on external commercial models. No equations, parameter fitting, self-definitional constructs, or load-bearing self-citations appear in the derivation of the central claim. The reported ASR constitutes independent evidence obtained by applying transformations to prompts and observing model outputs, rather than any reduction of the result to its own inputs by construction.