pith.

arxiv: 2509.11787 · v5 · submitted 2025-09-15 · 💻 cs.SE · cs.MA

CodeCureAgent: Automatic Classification and Repair of Static Analysis Warnings

Pith reviewed 2026-05-18 16:29 UTC · model grok-4.3

classification 💻 cs.SE cs.MA
keywords: static analysis warnings · automatic repair · LLM agents · code fixing · SonarQube · Java · agentic framework · false positive suppression

The pith

An LLM-based agent classifies and repairs static analysis warnings by iteratively gathering codebase information and editing code, achieving plausible fixes for 96.8% of cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CodeCureAgent as a way to automate the manual and often neglected task of resolving static analysis warnings, which otherwise accumulate and degrade code quality over time. It relies on an agentic framework in which an LLM agent repeatedly calls tools to explore the project and make targeted edits, classifying each warning as a false positive (to be suppressed) or a true positive (to be repaired). Tested on 1,000 SonarQube warnings across 106 Java projects and 291 rules, the method produces plausible fixes for 96.8% of the warnings and beats prior approaches by 29.2% to 34.0% in plausible-fix rate; manual inspection of 291 cases judged 86.3% of the fixes correct. A three-step check approves a patch only after a clean build, removal of the warning without new warnings, and a passing test suite. This setup could let teams keep codebases cleaner without constant human intervention.

Core claim

CodeCureAgent harnesses LLM-based agents to automatically analyze, classify, and repair static analysis warnings. Unlike fixed algorithms, the agentic framework iteratively invokes tools to gather additional information from the codebase and edits the code to resolve the warning. It detects and suppresses false positives while fixing true positives. Evaluated on 1,000 SonarQube warnings in 106 Java projects covering 291 distinct rules, the approach produces plausible fixes for 96.8% of the warnings, outperforming state-of-the-art baselines by 29.2%-34.0% in plausible-fix rate. Manual inspection of 291 cases shows an 86.3% correct-fix rate. Patches are approved through a three-step heuristic: a successful build, disappearance of the warning without new warnings, and a passing test suite.

What carries the argument

CodeCureAgent, an agentic LLM framework that iteratively invokes tools such as code search to explore the codebase and then edits code to resolve or suppress warnings, validated by a three-step build-and-test heuristic.
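The tool-calling loop described above can be sketched as follows. This is a minimal, hypothetical Python illustration, not CodeCureAgent's implementation: the tool names (`search_code`, `edit_file`), the terminal `finish` action, the in-memory "codebase", and the scripted policy standing in for the LLM are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentState:
    files: dict                 # path -> source text (a toy in-memory codebase)
    warning: str                # description of the warning being handled
    history: list = field(default_factory=list)  # (tool, observation) pairs
    verdict: Optional[str] = None                # "true_positive" or "false_positive"


def search_code(state: AgentState, query: str) -> str:
    """Hypothetical tool: list the files whose text contains `query`."""
    hits = sorted(path for path, src in state.files.items() if query in src)
    return ", ".join(hits) if hits else "no matches"


def edit_file(state: AgentState, arg: tuple) -> str:
    """Hypothetical tool: replace the first occurrence of `old` with `new` in `path`."""
    path, old, new = arg
    src = state.files.get(path, "")
    if old not in src:
        return f"edit failed: snippet not found in {path}"
    state.files[path] = src.replace(old, new, 1)
    return f"edited {path}"


TOOLS = {"search_code": search_code, "edit_file": edit_file}


def run_agent(state: AgentState, policy, max_steps: int = 10) -> AgentState:
    """Ask the policy for an action, run the tool, log the observation, and
    repeat until the policy emits 'finish' or the step budget is spent."""
    for _ in range(max_steps):
        action, arg = policy(state)
        if action == "finish":
            # Record the classification; any fix or suppression was already
            # applied through tool calls in earlier steps.
            state.verdict = arg
            break
        state.history.append((action, TOOLS[action](state, arg)))
    return state


def scripted_policy(state: AgentState):
    """Stand-in for the LLM: replays a fixed plan indexed by the step count."""
    plan = [
        ("search_code", "int unused"),
        ("edit_file", ("A.java", "    int unused = 0;\n", "")),
        ("finish", "true_positive"),
    ]
    return plan[len(state.history)]


codebase = {"A.java": "class A {\n  void f() {\n    int unused = 0;\n    run();\n  }\n}\n"}
result = run_agent(AgentState(codebase, "unused local variable in A.f"), scripted_policy)
print(result.verdict)  # true_positive
print(result.files["A.java"])
```

In a real agent the policy would send the warning and `state.history` to an LLM and parse its chosen tool call; the step budget guards against loops that never converge.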

If this is right

  • Developers can address static analysis warnings with far less manual effort.
  • Code quality improves by preventing the buildup of unresolved warnings over time.
  • The method can be embedded in CI/CD pipelines for ongoing automatic maintenance.
  • False-positive warnings can be suppressed without unnecessary code changes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The agentic pattern may extend to related tasks such as security patch generation or refactoring suggestions.
  • Performance could be tested on projects with sparse test suites to check reliance on the validation heuristic.
  • Integration with multiple static analysis tools beyond SonarQube would broaden applicability across languages.

Load-bearing premise

The three-step heuristic of successful build, warning disappearance without new warnings, and passing test suite is enough to confirm a patch is correct and introduces no undetected regressions.
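The approval gate can be sketched as a short-circuiting check. This is a minimal sketch under stated assumptions: `build`, `analyze`, and `run_tests` are hypothetical hooks into the project, warnings are matched by (rule key, file) pairs with line numbers ignored because edits shift them, and the rule keys are illustrative. The paper's actual before/after matching may differ. The sketch also makes the premise visible: the gate can only label a patch plausible, and a regression on an untested path passes all three steps.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

WarningKey = Tuple[str, str]  # (rule key, file path); line numbers ignored


def approve_patch(
    target: WarningKey,
    warnings_before: Iterable[WarningKey],
    build: Callable[[], bool],
    analyze: Callable[[], Iterable[WarningKey]],
    run_tests: Callable[[], bool],
) -> Tuple[bool, str]:
    """Apply the three checks in order, stopping at the first failure."""
    # Step 1: the patched project must compile.
    if not build():
        return False, "build failed"
    # Step 2: re-run the analyzer; the target must disappear and nothing new may appear.
    before, after = Counter(warnings_before), Counter(analyze())
    if after[target] >= before[target]:
        return False, "target warning still present"
    new_warnings = after - before  # Counter subtraction keeps only positive counts
    if new_warnings:
        return False, f"new warnings: {sorted(new_warnings)}"
    # Step 3: the existing test suite must still pass.
    if not run_tests():
        return False, "tests failed"
    return True, "plausible"  # plausible, not necessarily correct


# Usage with stubbed project hooks (illustrative rule keys):
before = [("java:S1481", "A.java"), ("java:S125", "B.java")]
ok, reason = approve_patch(
    ("java:S1481", "A.java"),
    before,
    build=lambda: True,
    analyze=lambda: [("java:S125", "B.java")],
    run_tests=lambda: True,
)
print(ok, reason)  # True plausible
```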

What would settle it

Running additional static analysis tools or long-term usage monitoring on the generated patches and finding that a substantial fraction introduce new bugs or regressions despite passing the three-step heuristic.

Figures

Figures reproduced from arXiv: 2509.11787 by Islem Bouzenia, Michael Pradel, Pascal Joos.

Figure 1: Overview of CodeCureAgent.
Figure 2: CodeCureAgent time and monetary cost distribution between fixed and unfixed warnings.
Figure 3: Absolute number of tool calls, comparing between fixed and unfixed warnings.
Original abstract

Static analysis tools are widely used to detect bugs, vulnerabilities, and code smells. Traditionally, developers must resolve these warnings manually. Because this process is tedious, developers sometimes ignore warnings, leading to an accumulation of warnings and a degradation of code quality. This paper presents CodeCureAgent, an approach that harnesses LLM-based agents to automatically analyze, classify, and repair static analysis warnings. Unlike previous work, our method does not follow a predetermined algorithm. Instead, we adopt an agentic framework that iteratively invokes tools to gather additional information from the codebase (e.g., via code search) and edit the codebase to resolve the warning. CodeCureAgent detects and suppresses false positives, while fixing true positives when identified. We equip CodeCureAgent with a three-step heuristic to approve patches: (1) build the project, (2) verify that the warning disappears without introducing new warnings, and (3) run the test suite. We evaluate CodeCureAgent on a dataset of 1,000 SonarQube warnings found in 106 Java projects and covering 291 distinct rules. Our approach produces plausible fixes for 96.8% of the warnings, outperforming state-of-the-art baseline approaches by 29.2%-34.0% in plausible-fix rate. Manual inspection of 291 cases reveals a correct-fix rate of 86.3%, showing that CodeCureAgent can reliably repair static analysis warnings. The approach incurs LLM costs of about 2.9 cents (USD) and an end-to-end processing time of about four minutes per warning. We envision CodeCureAgent helping to clean existing codebases and being integrated into CI/CD pipelines to prevent the accumulation of static analysis warnings.
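As a rough back-of-the-envelope extrapolation of the reported averages to the full 1,000-warning dataset, assuming the per-warning figures hold uniformly and warnings are processed one after another (parallel runs would shorten wall-clock time but not cost):

```latex
\begin{aligned}
\text{LLM cost} &\approx 1000 \times \$0.029 = \$29,\\
\text{processing time} &\approx 1000 \times 4\,\text{min} = 4000\,\text{min} \approx 66.7\,\text{h}.
\end{aligned}
```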

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents CodeCureAgent, an LLM-based agentic system for classifying and repairing SonarQube static analysis warnings in Java projects. Rather than following a fixed algorithm, it iteratively uses tools to gather codebase context and edit code, suppressing false positives and fixing true positives. On 1,000 warnings across 106 projects and 291 rules, it reports a 96.8% plausible-fix rate under a three-step heuristic (build success, warning disappearance without new warnings, test-suite passage), outperforming baselines by 29.2%-34.0%, and an 86.3% correct-fix rate from manual review of 291 cases. Per-warning cost is about 2.9 US cents and runtime about four minutes.

Significance. If the empirical results hold under stronger validation, the work would demonstrate practical value of agentic LLM frameworks for automated program repair in software maintenance. The scale of evaluation (106 real projects, 291 rules), low cost/time, and potential CI/CD integration are strengths that could reduce warning accumulation and improve code quality with minimal developer effort.

major comments (2)
  1. [Evaluation] Evaluation section: The three-step heuristic (successful build, warning disappearance without new warnings, test-suite passage) is presented as sufficient to confirm plausible and correct fixes, but this is load-bearing for the 96.8% and 86.3% rates. Incomplete test coverage in typical projects means the heuristic can accept superficial changes (e.g., refactoring that masks the issue or suppression) or regressions on untested paths; the manual review of 291 cases inherits the same limitation and does not fully compensate.
  2. [Evaluation] Evaluation section: Limited detail is given on re-implementation of the state-of-the-art baselines and on the protocol for manual correctness judgments (e.g., exact criteria for deeming a fix 'correct', access to warning context, and inter-rater reliability measures). This directly affects confidence in the reported 29.2%-34.0% improvement and 86.3% correct-fix rate.
minor comments (2)
  1. [Abstract] Abstract: The 291 cases manually inspected and the 291 distinct rules are both mentioned; clarify whether these sets overlap or if the inspected cases are a random sample of the 1,000 warnings.
  2. Ensure tables reporting fix rates include confidence intervals or statistical significance tests for the baseline comparisons.
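The requested confidence intervals can be computed from counts alone. A minimal sketch, assuming the Wilson score interval and a pooled two-proportion z-test; the counts 968/1,000 and 251/291 are inferred from the rounded rates (251 is the only count out of 291 that rounds to 86.3%), and the baseline count in the last line is hypothetical, since per-baseline counts are not given here. If CodeCureAgent and the baselines were evaluated on the same warnings, a paired test such as McNemar's would be the better choice when per-warning outcomes are available; the unpaired test is shown only for brevity.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal: P(|Z| > |z|).
    return z, math.erfc(abs(z) / math.sqrt(2))


# Counts inferred from the reported rates (96.8% of 1,000; 86.3% of 291).
plausible_ci = wilson_interval(968, 1000)
correct_ci = wilson_interval(251, 291)
print(f"plausible-fix rate 95% CI: {plausible_ci[0]:.3f}-{plausible_ci[1]:.3f}")
print(f"correct-fix rate 95% CI:   {correct_ci[0]:.3f}-{correct_ci[1]:.3f}")

# Hypothetical baseline count, for illustration only (not from the paper).
z, p_value = two_proportion_z(968, 1000, 700, 1000)
print(f"z = {z:.2f}, p = {p_value:.2e}")
```

The Wilson interval is used instead of the normal approximation because it stays inside [0, 1] and behaves better for rates near 100%, such as the 96.8% plausible-fix rate.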

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for highlighting important aspects of our evaluation that require further clarification. We respond to each major comment in turn and outline the revisions we will make to the manuscript.

  1. Referee: The three-step heuristic (successful build, warning disappearance without introducing new warnings, test-suite passage) is presented as sufficient to confirm plausible and correct fixes, but this is load-bearing for the 96.8% and 86.3% rates. Incomplete test coverage in typical projects means the heuristic can accept superficial changes (e.g., refactoring that masks the issue or suppression) or regressions on untested paths; the manual review of 291 cases inherits the same limitation and does not fully compensate.

    Authors: We recognize the validity of this concern, as test suites in real-world projects rarely achieve full coverage. Our heuristic follows common practices in automated program repair literature for assessing patch plausibility. The manual review of 291 randomly sampled cases was performed by the authors, who had complete access to the original SonarQube warnings, the modified code, and the project context to judge whether the fix correctly addressed the underlying issue. To address this point, we will include an expanded discussion of the heuristic's limitations and the role of manual validation in the revised evaluation section. revision: partial

  2. Referee: Limited detail is given on re-implementation of the state-of-the-art baselines and on the protocol for manual correctness judgments (e.g., exact criteria for deeming a fix 'correct', access to warning context, and inter-rater reliability measures). This directly affects confidence in the reported 29.2%-34.0% improvement and 86.3% correct-fix rate.

    Authors: We agree that more details on these aspects would enhance the paper's reproducibility and the reader's confidence in the results. In the revised manuscript, we will provide a detailed description of how the baseline approaches were re-implemented, including any necessary adaptations to work with our dataset of SonarQube warnings. Additionally, we will elaborate on the manual review protocol, specifying the criteria for correctness (such as whether the patch resolves the warning's root cause without side effects or behavioral changes), confirming that reviewers had full access to warning contexts and code, and noting that all judgments were reached through consensus among the authors. We will also report any measures of agreement used during the review process. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation on external dataset

The paper describes an LLM-agent approach for classifying and repairing static analysis warnings, evaluated directly on a held-out dataset of 1,000 SonarQube warnings across 106 Java projects. All reported metrics (96.8% plausible-fix rate, 86.3% correct-fix rate) are obtained via measurement against the three-step heuristic and manual review; none are derived from equations, fitted parameters, or self-referential definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes appear in the derivation chain. The heuristic itself is an explicit assumption about patch validity rather than a circular reduction of results to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on standard assumptions about LLM tool-use capabilities and the adequacy of existing build and test infrastructure; no new free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5848 in / 1153 out tokens · 44195 ms · 2026-05-18T16:29:48.692541+00:00 · methodology

