pith. machine review for the scientific record.

arxiv: 2605.06098 · v1 · submitted 2026-05-07 · 💻 cs.SE

Recognition: unknown

Exploring the Effectiveness of Abstract Syntax Tree Patterns for Algorithm Recognition

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 09:00 UTC · model grok-4.3

classification 💻 cs.SE
keywords algorithm recognition · abstract syntax tree · pattern matching · domain-specific language · software maintenance · code clone detection · BigCloneEval · program understanding

The pith

Abstract syntax tree patterns expressed in a domain-specific language can identify algorithm implementations in code with higher accuracy than large language models or code clone detection tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether patterns written on a program's abstract syntax tree can reliably detect which algorithm a piece of code implements. This capability would let tools automatically flag inefficient choices such as Bubble Sort and suggest better alternatives, or simply document the concerns present in a large code base. The authors build a small language that lets them write compact descriptions of key algorithmic features, derive patterns from publicly available reference implementations, and match those patterns against code. They evaluate the method on a standard benchmark containing common algorithms and report stronger results than both an LLM baseline and several existing clone-detection tools. If the method holds up, it offers a lightweight, explainable alternative to black-box approaches for routine program understanding tasks.

Core claim

The central claim is that a prototype built around a domain-specific language for abstract syntax tree patterns, a corresponding matching engine, and an initial catalog of patterns derived from web-searched reference implementations can recognize algorithms such as Fibonacci, Bubble Sort, and Binary Search. On a subset of the BigCloneEval benchmark the approach produces an average F1-score of 0.74 while the CodeLlama model reaches only 0.35; it also attains a recall of 0.62 against a best-case recall of 0.20 for the strongest code-clone detector used as baseline.

What carries the argument

A domain-specific language that lets users describe the essential structural features of an algorithm as a search pattern on the abstract syntax tree, paired with a matching algorithm that locates those features inside program code.
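The paper's DSL is embedded in Java and matched against the program's AST. As a language-agnostic illustration of the underlying idea, here is a minimal Python sketch using the standard-library `ast` module. The dict-based pattern syntax and the `ANY` wildcard are invented for this example; the paper's actual constructs (e.g. its `any()` primitive) differ.

```python
import ast

# Illustrative sketch only: this mimics the general idea of matching a
# structural pattern against an abstract syntax tree; it is not the
# paper's Java-embedded DSL. A pattern is a nested dict naming a node
# type plus constraints on its fields; the string "ANY" acts as a
# wildcard, loosely analogous to the paper's any() construct.

ANY = "ANY"

def matches(pattern, node):
    """Return True if the AST node satisfies the pattern."""
    if pattern == ANY:
        return True
    if isinstance(pattern, dict):
        node_type = pattern.get("type")
        if node_type and type(node).__name__ != node_type:
            return False
        for field, sub in pattern.items():
            if field == "type":
                continue
            if not matches(sub, getattr(node, field, None)):
                return False
        return True
    if isinstance(pattern, list):
        return (isinstance(node, list) and len(pattern) == len(node)
                and all(matches(p, n) for p, n in zip(pattern, node)))
    return pattern == node  # literal constraint, e.g. an operator name

def find(pattern, tree):
    """Yield every node in the tree that satisfies the pattern."""
    for node in ast.walk(tree):
        if matches(pattern, node):
            yield node

# A toy pattern: a while-loop guarded by any comparison, with any body --
# loosely the loop shape shared by Binary Search implementations.
loop_pattern = {"type": "While", "test": {"type": "Compare"}, "body": ANY}

code = ast.parse("while lo <= hi:\n    mid = (lo + hi) // 2")
hits = list(find(loop_pattern, code))
```

The wildcard is what gives such patterns their robustness to surface variation: the same `loop_pattern` matches regardless of variable names or loop-body details.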

If this is right

  • Software maintenance tools could automatically surface which algorithms are present in a code base and thereby document the concerns implemented by each module.
  • Quality-assessment systems could flag inefficient algorithm choices and propose library replacements without manual inspection.
  • The ready-to-use pattern catalog demonstrates that a modest number of hand-written descriptions suffices for practical recognition of common algorithms.
  • Because the patterns operate directly on the abstract syntax tree, the method remains language-specific yet does not require training data or model fine-tuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pattern language could be applied to detect domain-specific or composite algorithms once the catalog is extended beyond the initial set.
  • Integration with refactoring engines would allow automatic replacement suggestions whenever an inefficient pattern is matched.
  • Evaluating the patterns on large open-source repositories rather than the benchmark subset would test whether they survive real-world naming, formatting, and optimization differences.
  • Combining the structural patterns with lightweight semantic checks might improve precision on heavily refactored or obfuscated implementations.

Load-bearing premise

The patterns created from web-searched reference implementations are representative of real-world algorithm variants, and the chosen BigCloneEval subset provides an unbiased test of recognition accuracy without post-hoc pattern tuning.

What would settle it

Apply the same fixed set of patterns to a fresh collection of code snippets that contain the target algorithms but use substantially different control-flow or data-structure choices than the web references, then measure whether average F1-score drops below 0.5.
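That test turns on how F1 is computed from raw match counts. A minimal helper, with hypothetical counts chosen only for illustration (they are not taken from the paper):

```python
def f1_score(tp, fp, fn):
    """Standard F1: harmonic mean of precision and recall.
    Returns 0.0 when the score is undefined (no predictions or no positives)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for one algorithm's pattern on a fresh snippet set;
# the numbers are made up for illustration.
score = f1_score(tp=31, fp=9, fn=17)
passes = score >= 0.5   # the threshold proposed above
```

Note that F1 ignores true negatives, which is why the referee's request for subset composition matters: the same pattern can look very different depending on how many near-miss snippets the subset contains.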

Figures

Figures reproduced from arXiv: 2605.06098 by Denis Neumüller, Florian Sihler, Matthias Tichy, Raphael Straub.

Figure 1. Overview of our approach.
Figure 2. A search pattern for Prime Factors using only core language primitives.
Figure 3. Two semantically equivalent if statements: one uses an early return, the other an explicit else block.
Figure 4. Example search pattern on the left and the corresponding pattern tree on the right. The arrows indicate which part of the pattern tree is created by which part of the search pattern (obvious parts such as block() are omitted for brevity).
Figure 5. Example showcasing two possible successful matching states: either (a) i is bound to the variable x, or (b) to the variable y.
Figure 6. Example pattern tree for the prime factors search pattern from Listing 3 on the left and an example implementation which would satisfy the pattern tree on the right. The numbers specify the matching order and the arrows of selected pattern tree nodes indicate which code fragment is matched by them.
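Figure 5 describes matching states in which a pattern variable may bind to different code identifiers, with each consistent binding kept alive as a separate state. That concept can be sketched as a branch-and-prune over binding dictionaries; this is an illustrative reimplementation in Python, not the paper's engine, and `extend_states` is a name invented here.

```python
# Sketch of the "matching state" idea from Figure 5: a pattern variable
# can bind to several identifiers in the code, and the matcher keeps
# every consistent binding alive as a separate state, pruning states
# that later constraints contradict.

def extend_states(states, pattern_var, candidates):
    """For each state, branch on every candidate binding that does not
    contradict an earlier binding of the same pattern variable."""
    new_states = []
    for bindings in states:
        if pattern_var in bindings:
            # Already bound: the existing binding must still be a candidate.
            if bindings[pattern_var] in candidates:
                new_states.append(bindings)
        else:
            for name in candidates:
                branch = dict(bindings)
                branch[pattern_var] = name
                new_states.append(branch)
    return new_states

# Pattern variable i may bind to x or y (two successful states, as in
# Figure 5); a later constraint that i must be x prunes one of them.
states = extend_states([{}], "i", ["x", "y"])
states = extend_states(states, "i", ["x"])
```

A matching attempt fails only when the state set becomes empty, which is what lets the matcher tolerate renamed variables without committing to a binding too early.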
read the original abstract

The automated recognition of algorithm implementations can support many software maintenance and re-engineering activities by providing knowledge about the concerns present in the code base. Moreover, recognizing inefficient algorithms like Bubble Sort and suggesting superior alternatives from a library can help in assessing and improving the quality of a system. Approaches from related work suffer from usability as well as scalability issues and their accuracy is not evaluated. In this paper, we investigate how well our approach based on the abstract syntax tree of a program performs for automatic algorithm recognition. To this end, we have implemented a prototype consisting of: A domain-specific language designed to capture the key features of an algorithm and used to express a search pattern on the abstract syntax tree, a matching algorithm to find these features, and an initial catalog of "ready to use" patterns. To create our search patterns we performed a web search using the algorithm name and described key features of the found reference implementations with our domain-specific language. We evaluate our prototype on a subset of the BigCloneEval benchmark containing algorithms like Fibonacci, Bubble Sort, and Binary Search. We achieve an average F1-score of 0.74 outperforming the large language model Codellama which attains 0.35. Additionally, we use multiple code clone detection tools as a baseline for comparison, achieving a recall of 0.62 while the best-performing tool reaches 0.20.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims to demonstrate the effectiveness of using abstract syntax tree (AST) patterns defined in a custom domain-specific language (DSL) for recognizing algorithm implementations in code. Patterns are manually created based on web-searched reference implementations for algorithms such as Fibonacci, Bubble Sort, and Binary Search. On a subset of the BigCloneEval benchmark, the approach achieves an average F1-score of 0.74, outperforming CodeLlama (0.35) and code clone detection tools (recall 0.62 vs. 0.20).

Significance. Should the central performance claims be substantiated with more rigorous evaluation details, the work would represent a meaningful contribution to automated software analysis by providing an interpretable, pattern-based alternative to black-box LLM methods and less accurate clone detectors. The DSL and matching algorithm could be adopted for practical re-engineering tasks if shown to generalize.

major comments (3)
  1. [Abstract and Evaluation] The process for creating the AST patterns is described only at a high level: a web search using the algorithm name, followed by describing key features with the DSL. No details are provided on the number of patterns per algorithm, the iteration or revision process during creation, or explicit confirmation that no BigCloneEval examples were inspected before finalizing patterns. This is load-bearing for the 0.74 F1 claim, as it leaves open the possibility that patterns were tuned to common variants in the chosen subset rather than capturing general algorithmic essence.
  2. [Evaluation] The reported F1-score of 0.74 and recall of 0.62 lack supporting details on the exact size and composition of the BigCloneEval subset (e.g., number of true-positive snippets per algorithm, total examples), any cross-validation or hold-out procedures, and statistical significance testing. Without these, the superiority over CodeLlama and the clone detectors cannot be reliably assessed and risks being an artifact of the specific test selection.
  3. [Approach] The DSL for expressing AST patterns is introduced as capturing "key features", but the manuscript provides no grammar, formal syntax, or complete examples of pattern definitions. This omission hinders assessment of the matching algorithm's generality and the reproducibility of the catalog.
minor comments (2)
  1. The abstract should specify the exact algorithms and number of code snippets included in the BigCloneEval subset to allow readers to contextualize the results.
  2. Provide a link to the pattern catalog or make it available as supplementary material to support the claim of 'ready to use' patterns and enable replication.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and detailed comments, which highlight important areas for improving clarity and rigor. We address each major comment below and will incorporate revisions to strengthen the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract and Evaluation] The process for creating the AST patterns is described only at a high level: a web search using the algorithm name, followed by describing key features with the DSL. No details are provided on the number of patterns per algorithm, the iteration or revision process during creation, or explicit confirmation that no BigCloneEval examples were inspected before finalizing patterns. This is load-bearing for the 0.74 F1 claim, as it leaves open the possibility that patterns were tuned to common variants in the chosen subset rather than capturing general algorithmic essence.

    Authors: We agree that the pattern creation process merits more detail to support the performance claims. In the revised manuscript, we will expand the relevant sections to specify the number of patterns per algorithm, describe the iterative refinement process (initial patterns derived from web-sourced references were tested and adjusted against additional independent implementations), and explicitly state that no BigCloneEval snippets were examined during pattern development. This will demonstrate that patterns target core algorithmic structures identified from public references. revision: yes

  2. Referee: [Evaluation] The reported F1-score of 0.74 and recall of 0.62 lack supporting details on the exact size and composition of the BigCloneEval subset (e.g., number of true-positive snippets per algorithm, total examples), any cross-validation or hold-out procedures, and statistical significance testing. Without these, the superiority over CodeLlama and the clone detectors cannot be reliably assessed and risks being an artifact of the specific test selection.

    Authors: We acknowledge the value of greater transparency in the evaluation protocol. The revised Evaluation section will include the precise size and composition of the BigCloneEval subset (number of snippets per algorithm and total examples). As the method relies on static, manually authored patterns rather than learned models, traditional cross-validation does not apply; we will clarify this distinction and discuss implications for generalizability. We will also add statistical significance testing for the reported performance differences to allow more robust comparison with baselines. revision: yes

  3. Referee: [Approach] The DSL for expressing AST patterns is introduced as capturing "key features", but the manuscript provides no grammar, formal syntax, or complete examples of pattern definitions. This omission hinders assessment of the matching algorithm's generality and the reproducibility of the catalog.

    Authors: We concur that the absence of the DSL grammar and examples limits reproducibility. In the revised Approach section, we will include the formal grammar and syntax of the DSL along with at least one complete, self-contained example of a pattern definition (e.g., for Binary Search). This addition will enable readers to evaluate the language's expressiveness and the catalog's reproducibility. revision: yes

Circularity Check

0 steps flagged

No circularity: evaluation uses external benchmark and independent baselines

full rationale

The paper describes an empirical evaluation on a subset of the external BigCloneEval benchmark, with patterns created via web search on algorithm names and reference implementations. Standard F1 and recall metrics are reported against independent comparators (CodeLlama at 0.35 F1, clone detectors at 0.20 recall). No equations, fitted parameters, self-citations, or derivations reduce the reported results to the inputs by construction. The evaluation is grounded in external data and tools rather than in the method's own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that AST structural patterns suffice to distinguish algorithm implementations and on the standard assumption that the benchmark subset is representative.

axioms (1)
  • domain assumption Key features of algorithms can be captured by patterns on abstract syntax trees
    Invoked when defining the DSL and search patterns from reference implementations.

pith-pipeline@v0.9.0 · 5549 in / 1242 out tokens · 47750 ms · 2026-05-08T09:00:01.655542+00:00 · methodology

discussion (0)

