arxiv: 2506.01249 · v3 · submitted 2025-06-02 · 💻 cs.SE · cs.PF

SysLLMatic: Large Language Models are Software System Optimizers

Pith reviewed 2026-05-19 12:13 UTC · model grok-4.3

classification 💻 cs.SE cs.PF
keywords large language models · software optimization · performance profiling · compiler optimization · Java applications · latency improvement · energy efficiency · DaCapo benchmark

The pith

Large language models guided by profiling and a catalog of 43 optimization patterns can optimize large-scale software systems better than compilers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that large language models can optimize complex, large-scale software when given performance profiling information and a catalog of 43 optimization patterns. If true, this matters because it would let performance tuning of real applications be automated rather than left to limited compiler heuristics or manual work. The evaluation on the DaCapo suite reports gains over the compiler in both latency and energy. It also reports that the system works on smaller benchmarks and scales to large applications, where prior LLM methods did not.

Core claim

SysLLMatic integrates LLMs with performance diagnostics and a curated catalog of 43 optimization patterns to automatically optimize software systems. By leveraging profiling to identify performance hotspots, the approach enables LLMs to optimize real-world software beyond isolated code snippets, achieving average relative improvements of 1.54x in latency and 1.24x in energy on the DaCapo suite of large-scale Java applications, compared to 1.01x and 1.08x for the compiler.
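To make the "average relative improvement" figures concrete, here is a minimal sketch of how such a number could be computed. It assumes that relative improvement means the baseline measurement divided by the optimized one (so values above 1.0 are gains). The averaging method is not stated in the material above, so the sketch shows both a geometric and an arithmetic mean. All names and measurements are invented for illustration and are not data from the paper.

```python
from math import prod


def relative_improvement(baseline: float, optimized: float) -> float:
    """Return baseline / optimized; values above 1.0 mean the optimized run was better."""
    if baseline <= 0 or optimized <= 0:
        raise ValueError("measurements must be positive")
    return baseline / optimized


def geometric_mean(ratios: list[float]) -> float:
    """Geometric mean, a common way to average ratios across benchmarks."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return prod(ratios) ** (1.0 / len(ratios))


# Invented latencies in milliseconds: (baseline, optimized). Not data from the paper.
latencies_ms = {
    "app_a": (200.0, 100.0),  # 2.0x, a speedup
    "app_b": (300.0, 300.0),  # 1.0x, unchanged
    "app_c": (400.0, 800.0),  # 0.5x, a regression
}

ratios = [relative_improvement(b, o) for b, o in latencies_ms.values()]
geo = geometric_mean(ratios)
arith = sum(ratios) / len(ratios)
print(f"geometric mean: {geo:.2f}x, arithmetic mean: {arith:.2f}x")
```

On these invented numbers the geometric mean is 1.00x while the arithmetic mean is about 1.17x, which is one reason a reader might want the averaging method stated next to a headline figure such as 1.54x.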

What carries the argument

The integration of LLMs with performance profiling for hotspot identification and the fixed catalog of 43 optimization patterns that the LLM selects and applies to the code.
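The mechanism described above can be pictured with a small sketch: rank profiled methods by their share of total time, then hand the top hotspot and a pattern catalog to the model. This is an illustration under stated assumptions, not the paper's implementation. The `Hotspot` type, the 10% threshold, the two catalog entries, and the prompt wording are all hypothetical, and the profile numbers are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hotspot:
    method: str
    time_ms: float
    share: float  # fraction of total profiled time


def rank_hotspots(profile: dict[str, float], min_share: float = 0.10) -> list[Hotspot]:
    """Keep methods that account for at least min_share of total time, largest first."""
    total = sum(profile.values())
    if total <= 0:
        return []
    hot = [
        Hotspot(name, t, t / total)
        for name, t in profile.items()
        if t / total >= min_share
    ]
    return sorted(hot, key=lambda h: h.time_ms, reverse=True)


# Hypothetical catalog excerpt; the real catalog has 43 patterns that are not listed here.
PATTERN_CATALOG = {
    "loop-invariant code motion": "Hoist computations that do not change across iterations.",
    "memoization": "Cache the results of repeated calls with identical arguments.",
}


def build_prompt(hotspot: Hotspot, source: str, catalog: dict[str, str]) -> str:
    """Assemble a prompt that pairs one hotspot with the candidate patterns."""
    patterns = "\n".join(f"- {name}: {desc}" for name, desc in catalog.items())
    return (
        f"Method {hotspot.method} accounts for {hotspot.share:.0%} of profiled time.\n"
        f"Candidate optimization patterns:\n{patterns}\n"
        f"Source:\n{source}\n"
        "Pick one pattern and rewrite the method without changing its behavior."
    )


# Invented profile: method name -> time in milliseconds.
profile = {"Parser.parse": 620.0, "Cache.lookup": 300.0, "Logger.log": 80.0}
hotspots = rank_hotspots(profile)
prompt = build_prompt(hotspots[0], "/* method body here */", PATTERN_CATALOG)
print([h.method for h in hotspots])
```

Restricting the model's attention to a few ranked hotspots is what would keep the prompt small enough for a large codebase; the threshold is a tunable assumption rather than a value taken from the paper.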

If this is right

  • Large applications receive automatic performance improvements that exceed standard compiler results in latency and energy.
  • LLMs become practical for full production codebases when given structured performance guidance.
  • Metrics including throughput, memory usage, and CPU utilization improve alongside latency and energy in the evaluated suites.
  • The method applies across program sizes from small kernels to complete systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar LLM setups could be added to build pipelines for repeated automatic tuning during development.
  • The pattern catalog idea might transfer to other languages if the diagnostics and patterns are adapted.
  • Independent checks for semantic equivalence would be useful to confirm that applied changes preserve program behavior.

Load-bearing premise

The catalog of 43 optimization patterns is assumed to be both sufficient and safely applicable by the LLM to arbitrary real-world code without introducing semantic errors.

What would settle it

Applying SysLLMatic to a new large-scale Java application outside the DaCapo suite would settle it: smaller gains than the compiler, or functional errors introduced by the rewrites, would count against the approach working as claimed.

Figures

Figures reproduced from arXiv: 2506.01249 by Arjun Gupte, Chien-Chou Ho, George K. Thiruvathukal, Huiyun Peng, James C. Davis, Konstantin Läufer, Leo Deng, Nicholas John Eliopoulos, Rishi Mantri, Ryan Hasler.

Figure 1: Diagnosis Cycle: Performance hypotheses are tested and refined [24].
Figure 2: Construction of the Performance Optimization Pattern Catalog: We …
Figure 3: Excerpt of optimization patterns by theme. Symbols indicate four …
Figure 2: During instrumentation §V-D, the system collects …
Figure 4: Overview of SysLLMatic, our automated LLM-based optimization framework. The system integrates domain-specific performance knowledge and …
Figure 5: Second, we analyzed SysLLMatic’s optimizations …
Figure 5: Side-by-side comparisons of two optimizations. The top row presents original versions: a Jacobi linear equation solver that uses the Successive Over …
Figure 6: Distribution of optimizations applied to HumanEval …
Figure 7: Normalized performance gains across five metrics on three bench …
Figure 8: Performance improvements across 1–4 Evaluator feedback iterations on …
Figure 9: Comparison of function-level and class-level optimization on SciMark2 …
Original abstract

Automatic software system optimization can improve software speed, reduce operating costs, and save energy. Traditional approaches to optimization rely on manual tuning and compiler heuristics, limiting their ability to generalize across diverse codebases and system contexts. Recent methods using Large Language Models (LLMs) introduce automation on simple programs, but they do not scale effectively to the complexity and size of real-world software systems. We present SysLLMatic, a system that integrates LLMs with performance diagnostics and a curated catalog of 43 optimization patterns to automatically optimize software systems. By leveraging profiling to identify performance hotspots, our approach enables LLMs to optimize real-world software beyond isolated code snippets. We evaluate it on three benchmark suites: HumanEval_CPP (competitive programming in C++), SciMark2 (scientific kernels in Java), and DaCapo (large-scale software systems in Java). Results show that SysLLMatic can improve software system performance, including latency, throughput, energy efficiency, memory usage, and CPU utilization. It consistently outperforms state-of-the-art LLM baselines on microbenchmarks. On large-scale application codes, to which prior LLM approaches have not scaled, it surpasses compiler optimizations, achieving average relative improvements of 1.54x in latency (vs. 1.01x for the compiler) and 1.24x in energy (vs. 1.08x for the compiler). Our findings demonstrate that LLMs, guided by performance knowledge through the optimization pattern catalog and appropriate performance diagnostics, can serve as viable software system optimizers. We further identify limitations of our approach and the challenges involved in handling complex applications. This work provides a foundation for generating optimized code across various languages, benchmarks, and program sizes in a principled manner.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SysLLMatic, a system that integrates LLMs with performance profiling and a curated catalog of 43 optimization patterns to automatically optimize real-world software systems. Evaluations are reported on HumanEval_CPP (C++ microbenchmarks), SciMark2 (Java scientific kernels), and DaCapo (large-scale Java applications), with claims that the approach outperforms prior LLM baselines on microbenchmarks and surpasses compiler optimizations on DaCapo, yielding average relative improvements of 1.54x in latency (vs. 1.01x for the compiler) and 1.24x in energy (vs. 1.08x for the compiler).

Significance. If the empirical results are shown to be robust and the applied transformations preserve semantics, the work would represent a meaningful advance in automated software optimization by scaling LLM-based techniques to complex, large-scale codebases that prior methods have not addressed. The combination of profiling-driven hotspot identification with a fixed pattern catalog offers a concrete, reproducible pathway for LLM-guided optimization across languages and program sizes.

major comments (2)
  1. [DaCapo evaluation (Section 5, results for large-scale applications)] The central claim that SysLLMatic surpasses compiler optimizations with 1.54x latency and 1.24x energy gains on DaCapo requires that every LLM-selected and inserted pattern from the 43-pattern catalog produces functionally equivalent code. The manuscript contains no description of automated equivalence checking, differential testing, full test-suite execution on the modified binaries, or even manual inspection of changed sites. Without such verification, the reported speedups are not demonstrably comparable to the compiler baseline.
  2. [Abstract and quantitative results (Section 5)] The headline average relative improvements (1.54x latency, 1.24x energy) are presented without any information on the number of experimental runs, standard deviations, statistical significance tests, or criteria used to select or exclude optimization patterns. This omission directly affects the reliability of the cross-benchmark and cross-optimizer comparisons.
minor comments (2)
  1. [System description / pattern catalog] The description of how the 43 optimization patterns were curated and validated for safety across domains could be expanded to clarify their generality.
  2. [Results tables/figures] Tables or figures reporting speedups should include error bars or confidence intervals when multiple runs or multiple applications are aggregated.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of reproducibility and validity that we will address to strengthen the manuscript. We respond to each major comment below and indicate the planned revisions.

  1. Referee: [DaCapo evaluation (Section 5, results for large-scale applications)] The central claim that SysLLMatic surpasses compiler optimizations with 1.54x latency and 1.24x energy gains on DaCapo requires that every LLM-selected and inserted pattern from the 43-pattern catalog produces functionally equivalent code. The manuscript contains no description of automated equivalence checking, differential testing, full test-suite execution on the modified binaries, or even manual inspection of changed sites. Without such verification, the reported speedups are not demonstrably comparable to the compiler baseline.

    Authors: We agree that the manuscript must explicitly document equivalence verification to support the DaCapo claims. Although the DaCapo suite provides extensive built-in tests that were executed on all optimized binaries, and we performed spot-checks on changed code sites, these steps are not described. In the revised version we will add a dedicated subsection to Section 5 that details: (1) full execution of each application's DaCapo test suite on the modified binaries, (2) differential testing against the original on representative workloads, and (3) manual inspection of the LLM-proposed edits at profiled hotspots. This addition will make the comparison to the compiler baseline demonstrably valid. revision: yes

  2. Referee: [Abstract and quantitative results (Section 5)] The headline average relative improvements (1.54x latency, 1.24x energy) are presented without any information on the number of experimental runs, standard deviations, statistical significance tests, or criteria used to select or exclude optimization patterns. This omission directly affects the reliability of the cross-benchmark and cross-optimizer comparisons.

    Authors: We acknowledge the absence of these methodological details. Each reported configuration was executed 10 times under controlled conditions to mitigate measurement noise, with averages taken; pattern selection followed explicit rules based on hotspot profiling and catalog matching. Standard deviations, confidence intervals, and statistical significance tests (paired t-tests against baselines) were not included. In the revision we will add this information to Section 5, include variability measures in the figures and tables, report p-values for the key comparisons, and clarify the pattern-selection criteria. If space allows we will also update the abstract to reflect the added rigor. revision: yes
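The variability measures and paired t-tests promised in the second response could be computed along the following lines. This is a sketch using only the Python standard library. It assumes that run i of the baseline and run i of the optimized build were taken under matched conditions, which is what makes pairing meaningful. The ten timings per configuration are invented, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(samples: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) for one configuration."""
    return mean(samples), stdev(samples)


def paired_t(baseline: list[float], optimized: list[float]) -> tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom (n - 1)."""
    if len(baseline) != len(optimized) or len(baseline) < 2:
        raise ValueError("need two equally sized samples with at least 2 runs each")
    diffs = [b - o for b, o in zip(baseline, optimized)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs))), len(diffs) - 1


# Invented latencies in milliseconds for 10 runs each. Not data from the paper.
baseline_ms = [100.0, 102.0, 98.0, 101.0, 99.0, 103.0, 97.0, 100.0, 102.0, 98.0]
optimized_ms = [66.0, 65.0, 64.0, 67.0, 63.0, 68.0, 62.0, 66.0, 65.0, 64.0]

base_mean, base_sd = summarize(baseline_ms)
opt_mean, opt_sd = summarize(optimized_ms)
t_stat, dof = paired_t(baseline_ms, optimized_ms)
print(f"baseline {base_mean:.1f} ± {base_sd:.1f} ms, optimized {opt_mean:.1f} ± {opt_sd:.1f} ms")
print(f"paired t = {t_stat:.1f} with {dof} degrees of freedom")
```

Turning the t statistic into a p-value requires the t distribution with n - 1 degrees of freedom, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_rel`) reports the statistic and p-value together.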

Circularity Check

0 steps flagged

Empirical measurements on external benchmarks with no internal derivation chain

The paper describes an empirical system (SysLLMatic) that applies LLMs guided by a fixed catalog of 43 patterns and profiling to optimize code on HumanEval_CPP, SciMark2, and DaCapo benchmarks. Reported speedups (e.g., 1.54x latency vs. compiler) are direct measurements against external baselines rather than quantities computed from fitted parameters or reduced to self-citations. No equations, uniqueness theorems, or ansatzes are invoked that could collapse into the inputs by construction. The central claims rest on observable runtime results on fixed suites, making the evaluation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the existence and sufficiency of a hand-curated catalog of 43 optimization patterns whose selection and application are treated as reliable inputs to the LLM; no free parameters are explicitly fitted in the abstract, and no new physical or mathematical entities are postulated.

axioms (1)
  • domain assumption: LLMs can correctly apply the supplied optimization patterns to profiled code regions without altering program semantics.
    Invoked when the system feeds profiled hotspots and pattern descriptions to the model and accepts the generated rewrites as valid optimizations.
invented entities (1)
  • Catalog of 43 optimization patterns (no independent evidence)
    purpose: Provides explicit performance knowledge that guides the LLM beyond what it would generate from code alone.
    The catalog is presented as a curated, fixed resource that enables scaling to real systems; no independent evidence of its completeness is given in the abstract.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. EcoAssist: Embedding Sustainability into AI-Assisted Frontend Development

    cs.HC · 2026-04 · unverdicted · novelty 5.0

    EcoAssist embeds energy estimation and optimization into AI-assisted frontend coding, reducing website energy use by 13-16% in benchmarks while preserving developer productivity.

  2. Sustainable Code Generation Using Large Language Models: A Systematic Literature Review

    cs.SE · 2026-03 · unverdicted · novelty 3.0

    A systematic review finds research on the sustainability of LLM-generated code to be limited, fragmented, and without accepted frameworks for measurement or benchmarking.

Reference graph

Works this paper leans on

96 extracted references · 96 canonical work pages · cited by 2 Pith papers · 2 internal anchors

  1. [1]

    The Art of Computer Systems Performance Analysis: Techniques For Experimental Design, Measurement, Simulation, and Modeling

    R. Jain, The Art of Computer Systems Performance Analysis: Techniques For Experimental Design, Measurement, Simulation, and Modeling. Wiley-Interscience, Apr. 1991

  2. [2]

    Energy consumption and efficiency in mobile applications: A user feedback study,

    C. Wilke, S. Richly et al., “Energy consumption and efficiency in mobile applications: A user feedback study,” in International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical and Social Computing, 2013

  3. [3]

    Data center energy consumption modeling: A survey,

    M. Dayarathna, Y. Wen, and R. Fan, “Data center energy consumption modeling: A survey,” IEEE Communications Surveys & Tutorials, 2015

  4. [4]

    The cost of poor software quality in the US: A 2020 report,

    H. Krasner, “The cost of poor software quality in the US: A 2020 report,” Proc. Consortium Inf. Softw. Quality™ (CISQ™), vol. 2, p. 3, 2021

  5. [5]

    An analysis of failure-related energy waste in a large-scale cloud environment,

    P. Garraghan, I. S. Moreno, Townend et al., “An analysis of failure-related energy waste in a large-scale cloud environment,” IEEE Transactions on Emerging Topics in Computing, 2014

  6. [6]

    Towards holistic continuous software performance assessment,

    V. Ferme and C. Pautasso, “Towards holistic continuous software performance assessment,” in ACM/SPEC on International Conference on Performance Engineering Companion, ser. ICPE ’17 Companion. Association for Computing Machinery, 2017

  7. [7]

    Acquirer: A hybrid approach to detecting algorithmic complexity vulnerabilities,

    Y. Liu and W. Meng, “Acquirer: A hybrid approach to detecting algorithmic complexity vulnerabilities,” in ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’22. Association for Computing Machinery, 2022

  8. [8]

    CWE-1132: Inefficient Algorithmic Complexity,

    The MITRE Corporation, “CWE-1132: Inefficient Algorithmic Complexity,” https://cwe.mitre.org/data/definitions/1132.html, 2024, accessed: 2025-05-10

  9. [9]

    AI agents under threat: A survey of key security challenges and future pathways,

    Z. Deng, Y. Guo, Han et al., “AI agents under threat: A survey of key security challenges and future pathways,” ACM Comput. Surv., vol. 57, Feb. 2025

  10. [10]

    Powering intelligence: Analyzing artificial intelligence and data center energy consumption,

    E. P. R. Institute, “Powering intelligence: Analyzing artificial intelligence and data center energy consumption,” Electric Power Research Institute, Brochure Product ID 3002028905, 2024

  11. [11]

    Commission adopts EU-wide scheme for rating sustainability of data centres,

    European Commission, “Commission adopts EU-wide scheme for rating sustainability of data centres,” 2024

  12. [12]

    How much energy will AI really consume? the good, the bad and the unknown,

    S. Chen, “How much energy will AI really consume? the good, the bad and the unknown,” Nature, 2025

  13. [13]

    Criticality analysis process model,

    C. Paulsen, J. Boyens, Bartol et al., “Criticality analysis process model,” National Institute of Standards and Technology, NIST Interagency/Internal Report (NISTIR) NISTIR 8179, 2018

  14. [14]

    Building Green Software

    A. Currie, S. Hsu, and S. Bergman, Building Green Software. O’Reilly Media, Inc., 2024

  15. [15]

    S. S. Muchnick, Advanced compiler design and implementation. Morgan Kaufmann Publishers Inc., 1998

  16. [16]

    D. A. Bader, B. M. E. Moret, and P. Sanders, Algorithm Engineering for Parallel Computation. Springer Berlin Heidelberg, 2002

  17. [17]

    A survey on hardware-aware and heterogeneous computing on multicore processors and accelerators,

    R. Buchty, V. Heuveline, Karl et al., “A survey on hardware-aware and heterogeneous computing on multicore processors and accelerators,” Concurrency and Computation: Practice and Experience, 2012

  18. [18]

    Application of large language models to software engineering tasks: Opportunities, risks, and implications,

    I. Ozkaya, “Application of large language models to software engineering tasks: Opportunities, risks, and implications,” IEEE Software, 2023

  19. [19]

    Language models for code optimization: Survey, challenges and future directions,

    J. Gong, V. Voskanyan, P. Brookes et al., “Language models for code optimization: Survey, challenges and future directions,” 2025. [Online]. Available: https://arxiv.org/abs/2501.01277

  20. [20]

    Evaluating the energy-efficiency of the code generated by LLMs,

    M. A. Islam, D. V. Jonnala, R. Rekhi, P. Pokharel, S. Cilamkoti, A. Imran, T. Kosar, and B. Turkkan, “Evaluating the energy-efficiency of the code generated by LLMs,” arXiv:2505.20324, 2025. [Online]. Available: https://arxiv.org/abs/2505.20324

  21. [21]

    Evaluating large language models trained on code,

    M. Chen, J. Tworek, H. Jun et al., “Evaluating large language models trained on code,” 2021. [Online]. Available: https://arxiv.org/abs/2107.03374

  22. [22]

    SciMark 2: A Java benchmark for scientific and numerical computing

    R. Pozo and B. Miller. (2000) SciMark 2: A Java benchmark for scientific and numerical computing. Accessed: 2025-05-20. [Online]. Available: https://math.nist.gov/scimark2/

  23. [23]

    Rethinking Java performance analysis,

    S. M. Blackburn, Z. Cai, Chen et al., “Rethinking Java performance analysis,” in ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, ser. ASPLOS ’25, 2025

  24. [24]

    Systems Performance: Enterprise and the Cloud

    B. Gregg, Systems Performance: Enterprise and the Cloud. Addison-Wesley, 2021

  25. [25]

    Performance issues and optimizations in JavaScript: An empirical study,

    M. Selakovic and M. Pradel, “Performance issues and optimizations in JavaScript: An empirical study,” in 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), 2016

  26. [26]

    Evaluating and improving the performance and scheduling of HPC applications in cloud,

    A. Gupta, P. Faraboschi, Gioachin et al., “Evaluating and improving the performance and scheduling of HPC applications in cloud,” IEEE Transactions on Cloud Computing, 2016

  27. [27]

    Validity of the single processor approach to achieving large scale computing capabilities,

    G. M. Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” in Spring Joint Computer Conference, ser. AFIPS ’67 (Spring). Association for Computing Machinery, 1967, p. 483–485

  28. [28]

    N. J. Gunther, The Practical Performance Analyst: Performance-by-Design Techniques for Distributed Systems. McGraw-Hill, Inc., 1997

  29. [29]

    Differential profiling,

    P. E. McKenney, “Differential profiling,” Software: Practice and Experience, vol. 29, no. 3, pp. 219–234, 1999

  30. [30]

    An execution profiler for modular programs,

    S. L. Graham, P. B. Kessler, and M. K. McKusick, “An execution profiler for modular programs,” Software: Practice and Experience, 1983

  31. [31]

    Gprof: A call graph execution profiler,

    S. L. Graham, P. B. Kessler, and M. K. McKusick, “Gprof: A call graph execution profiler,” in SIGPLAN Symposium on Compiler Construction, ser. SIGPLAN ’82. Association for Computing Machinery, 1982

  32. [32]

    Profiling and tracing in Linux,

    S. Shende, “Profiling and tracing in Linux,” in Extreme Linux Workshop, vol. 2, 1999

  33. [33]

    The flame graph,

    B. Gregg, “The flame graph,” Communications of the ACM, vol. 59, no. 6, pp. 48–57, 2016

  34. [34]

    Measuring energy consumption for short code paths using RAPL,

    M. Hähnel, B. Döbel, Völp et al., “Measuring energy consumption for short code paths using RAPL,” SIGMETRICS Perform. Eval. Rev., p. 13–17, 2012

  35. [35]

    Energy measurement of encryption techniques using RAPL,

    C. Thorat and V. Inamdar, “Energy measurement of encryption techniques using RAPL,” in International Conference on Computing, Communication, Control and Automation, 2017

  36. [36]

    RAPL in action: Experiences in using RAPL for power measurements,

    K. N. Khan, M. Hirki, Niemi et al., “RAPL in action: Experiences in using RAPL for power measurements,” ACM Trans. Model. Perform. Eval. Comput. Syst., Mar. 2018

  37. [37]

    GPU debugging and profiling with NVIDIA Parallel Nsight,

    K. Iyer and J. Kiel, “GPU debugging and profiling with NVIDIA Parallel Nsight,” Game Development Tools, pp. 303–324, 2016

  38. [38]

    THAPI: Tracing heterogeneous APIs,

    S. Bekele, A. Vivas, T. Applencourt et al., “THAPI: Tracing heterogeneous APIs,” 2025. [Online]. Available: https://arxiv.org/abs/2504.03683

  39. [39]

    Global extensible open power manager: A vehicle for HPC community collaboration on co-designed energy management solutions,

    J. Eastep, S. Sylvester, Cantalupo et al., “Global extensible open power manager: A vehicle for HPC community collaboration on co-designed energy management solutions,” in High Performance Computing, J. M. Kunkel, R. Yokota, P. Balaji, and D. Keyes, Eds. Springer International Publishing, 2017

  40. [40]

    The future of software performance engineering,

    M. Woodside, G. Franks, and D. C. Petriu, “The future of software performance engineering,” in Future of Software Engineering (FOSE ’07), 2007, pp. 171–187

  41. [41]

    Performance engineering of software systems: a case study,

    C. U. Smith and J. C. Browne, “Performance engineering of software systems: a case study,” in National Computer Conference, 1982, p. 217–224

  42. [42]

    Compiler transformations for high-performance computing,

    D. F. Bacon, S. L. Graham, and O. J. Sharp, “Compiler transformations for high-performance computing,” ACM Computing Surveys (CSUR), vol. 26, no. 4, pp. 345–420, 1994

  43. [43]

    Global common subexpression elimination,

    J. Cocke, “Global common subexpression elimination,” SIGPLAN Not., p. 20–24, Jul. 1970

  44. [44]

    A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: principles, techniques, and tools. Addison-Wesley Longman Publishing Co., Inc., 1986

  45. [45]

    Register allocation via coloring,

    G. J. Chaitin, M. A. Auslander, A. K. Chandra et al., “Register allocation via coloring,” Computer Languages, vol. 6, no. 1, pp. 47–57, 1981. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0096055181900485

  46. [46]

    LLVM: a compilation framework for lifelong program analysis & transformation,

    C. Lattner and V. Adve, “LLVM: a compilation framework for lifelong program analysis & transformation,” in International Symposium on Code Generation and Optimization (CGO 2004), 2004, pp. 75–86

  47. [47]

    The cache performance and optimizations of blocked algorithms,

    M. D. Lam, E. E. Rothberg, and M. E. Wolf, “The cache performance and optimizations of blocked algorithms,” in International Conference on Architectural Support for Programming Languages and Operating Systems. Association for Computing Machinery, 1991

  48. [48]

    Towards a green ranking for programming languages,

    M. Couto et al., “Towards a green ranking for programming languages,” in the Brazilian Symposium on Programming Languages, 2017

  49. [49]

    Using selective memoization to defeat regular expression denial of service (ReDoS),

    J. C. Davis, F. Servant, and D. Lee, “Using selective memoization to defeat regular expression denial of service (ReDoS),” in 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 2021, pp. 1–17

  50. [50]

    Packrat parsing: simple, powerful, lazy, linear time, functional pearl,

    B. Ford, “Packrat parsing: simple, powerful, lazy, linear time, functional pearl,” ACM SIGPLAN Notices, vol. 37, no. 9, pp. 36–47, 2002

  51. [51]

    Evolutionary improvement of programs,

    D. R. White, A. Arcuri, and J. A. Clark, “Evolutionary improvement of programs,” IEEE Transactions on Evolutionary Computation, vol. 15, no. 4, pp. 515–538, 2011

  52. [52]

    OpenTuner: An extensible framework for program autotuning,

    J. Ansel, S. Kamil, Veeramachaneni et al., “OpenTuner: An extensible framework for program autotuning,” in 2014 23rd International Conference on Parallel Architecture and Compilation Techniques, 2014

  53. [53]

    An actionable performance profiler for optimizing the order of evaluations,

    M. Selakovic, T. Glaser, and M. Pradel, “An actionable performance profiler for optimizing the order of evaluations,” in ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2017, 2017

  54. [54]

    Structured chain-of-thought prompting for code generation,

    J. Li, G. Li, Y. Li, and Z. Jin, “Structured chain-of-thought prompting for code generation,” ACM Transactions on Software Engineering and Methodology, 2023

  55. [55]

    CodeGen: An open large language model for code with multi-turn program synthesis,

    E. Nijkamp, B. Pang, H. Hayashi et al., “CodeGen: An open large language model for code with multi-turn program synthesis,” in The Eleventh International Conference on Learning Representations, 2023. [Online]. Available: https://openreview.net/forum?id=iaYcJKpY2B

  56. [56]

    Program synthesis with large language models,

    J. Austin, A. Odena, M. Nye et al., “Program synthesis with large language models,” 2021. [Online]. Available: https://arxiv.org/abs/2108.07732

  57. [57]

    Software testing with large language models: Survey, landscape, and vision,

    J. Wang, Y. Huang, C. Chen, Z. Liu, S. Wang, and Q. Wang, “Software testing with large language models: Survey, landscape, and vision,” IEEE Transactions on Software Engineering, vol. 50, no. 4, 2024

  58. [58]

    Evaluating and improving ChatGPT for unit test generation,

    Z. Yuan, Y. Lou, M. Liu et al., “Evaluating and improving ChatGPT for unit test generation,” Proc. ACM Softw. Eng., no. FSE, Jul. 2024

  59. [59]

    Large language models in fault localisation,

    Y. Wu, Z. Li, J. M. Zhang et al., “Large language models in fault localisation,” 2023. [Online]. Available: https://arxiv.org/abs/2308.15276

  60. [60]

    Explainable automated debugging via large language model-driven scientific debugging,

    S. Kang, B. Chen, S. Yoo, and J.-G. Lou, “Explainable automated debugging via large language model-driven scientific debugging,” 2023. [Online]. Available: https://arxiv.org/abs/2304.02195

  61. [61]

    Teaching Large Language Models to Self-Debug

    X. Chen, M. Lin, N. Schärli, and D. Zhou, “Teaching large language models to self-debug,” 2023. [Online]. Available: https://arxiv.org/abs/2304.05128

  62. [62]

    A survey on automated program repair techniques,

    K. Huang, Z. Xu, S. Yang et al., “A survey on automated program repair techniques,” 2023. [Online]. Available: https://arxiv.org/abs/2303.18184

  63. [63]

    Learning performance-improving code edits,

    A. G. Shypula, A. Madaan, Y. Zeng et al., “Learning performance-improving code edits,” in International Conference on Learning Representations, 2024

  64. [64]

    Evaluating language models for efficient code generation,

    J. Liu, S. Xie, J. Wang et al., “Evaluating language models for efficient code generation,” arXiv:2408.06450, 2024. [Online]. Available: https://arxiv.org/abs/2408.06450

  65. [65]

    LLM compiler: Foundation language models for compiler optimization,

    C. Cummins, V. Seeker, D. Grubisic et al., “LLM compiler: Foundation language models for compiler optimization,” in ACM SIGPLAN International Conference on Compiler Construction, ser. CC ’25. New York, NY, USA: Association for Computing Machinery, 2025

  66. [66]

    Meta large language model compiler: Foundation models of compiler optimization,

    C. Cummins, V. Seeker, D. Grubisic et al., “Meta large language model compiler: Foundation models of compiler optimization,” 2024. [Online]. Available: https://arxiv.org/abs/2407.02524

  67. [67]

    LLM-vectorizer: LLM-based verified loop vectorizer,

    J. Taneja, A. Laird, C. Yan et al., “LLM-vectorizer: LLM-based verified loop vectorizer,” in ACM/IEEE International Symposium on Code Generation and Optimization, ser. CGO ’25. Association for Computing Machinery, 2025

  68. [68]

    Improving parallel program performance with LLM optimizers via agent-system interface,

    A. Wei, A. Nie, T. S. F. X. Teixeira et al., “Improving parallel program performance with LLM optimizers via agent-system interface,” 2025. [Online]. Available: https://arxiv.org/abs/2410.15625

  69. [69]

    Supersonic: Learning to Generate Source Code Optimizations in C/C++,

    Z. Chen, S. Fang, and M. Monperrus, “Supersonic: Learning to Generate Source Code Optimizations in C/C++,” IEEE Transactions on Software Engineering, vol. 50, no. 11, Nov. 2024

  70. [70]

    Search-Based LLMs for Code Optimization,

    S. Gao, C. Gao, Gu et al., “Search-Based LLMs for Code Optimization,” in 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE Computer Society, May 2025

  71. [71]

    DeepDev-PERF: a deep learning-based approach for improving software performance,

    S. Garg, R. Z. Moghaddam, Clement et al., “DeepDev-PERF: a deep learning-based approach for improving software performance,” in European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2022, 2022

  72. [72]

    Leveraging LLMs to Automate Energy-Aware Refactoring of Parallel Scientific Codes

    M. T. Dearing, Y. Tao, X. Wu et al., “Leveraging LLMs to automate energy-aware refactoring of parallel scientific codes,” 2025. [Online]. Available: https://arxiv.org/abs/2505.02184

  73. [73]

    Self-refine: iterative refinement with self-feedback,

    A. Madaan, N. Tandon, P. Gupta et al., “Self-refine: iterative refinement with self-feedback,” in Advances in Neural Information Processing Systems, ser. NIPS ’23. Curran Associates Inc., 2023

  74. [74]

    EffiLearner: Enhancing efficiency of generated code via self-optimization,

    D. Huang, J. Dai, H. Weng et al., “EffiLearner: Enhancing efficiency of generated code via self-optimization,” 2024. [Online]. Available: https://arxiv.org/abs/2405.15189

  75. [75]

    PerfCodeGen: Improving performance of LLM generated code with execution feedback,

    Y. Peng, A. D. Gotmare, M. Lyu et al., “PerfCodeGen: Improving performance of LLM generated code with execution feedback,” 2024. [Online]. Available: https://arxiv.org/abs/2412.03578

  76. [76]

    MARCO: A multi-agent system for optimizing HPC code generation using large language models,

    A. Rahman, V. Cvetkovic, K. Reece et al., “MARCO: A multi-agent system for optimizing HPC code generation using large language models,” 2025. [Online]. Available: https://arxiv.org/abs/2505.03906

  77. [77]

    RAPGen: An approach for fixing code inefficiencies in zero-shot,

    S. Garg, R. Z. Moghaddam, and N. Sundaresan, “RAPGen: An approach for fixing code inefficiencies in zero-shot,” 2025. [Online]. Available: https://arxiv.org/abs/2306.17077

  78. [78]

    Iterative refactoring of real-world open-source programs with large language models,

    J. Choi, G. An, and S. Yoo, “Iterative refactoring of real-world open-source programs with large language models,” in Search-Based Software Engineering, G. Jahangirova and F. Khomh, Eds. Springer Nature Switzerland, 2024, pp. 49–55

  79. [79]

    The Software Optimization Cookbook: High-performance Recipes for IA-32 Platforms, Second Edition

    R. Gerber, The Software Optimization Cookbook: High-performance Recipes for IA-32 Platforms, Second Edition. Books24x7.com, 2006

  80. [80]

    Power and Performance: Software Analysis and Optimization, 1st ed.

    J. Kukunas, Power and Performance: Software Analysis and Optimization, 1st ed. Morgan Kaufmann Publishers Inc., 2015

Showing first 80 references.