SysLLMatic: Large Language Models are Software System Optimizers
Pith reviewed 2026-05-19 12:13 UTC · model grok-4.3
The pith
Large language models guided by profiling and a catalog of 43 optimization patterns can optimize large-scale software systems better than compilers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SysLLMatic integrates LLMs with performance diagnostics and a curated catalog of 43 optimization patterns to automatically optimize software systems. By leveraging profiling to identify performance hotspots, the approach enables LLMs to optimize real-world software beyond isolated code snippets, achieving average relative improvements of 1.54x in latency and 1.24x in energy on the DaCapo suite of large-scale Java applications, compared to 1.01x and 1.08x for the compiler.
What carries the argument
The integration of LLMs with performance profiling for hotspot identification and the fixed catalog of 43 optimization patterns that the LLM selects and applies to the code.
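As a rough illustration of this mechanism, the sketch below ranks methods by their share of profiled time and assembles a prompt that offers the model a pattern catalog to choose from. Everything in it is an assumption made for illustration: the `Pattern` type, the three placeholder catalog entries, the 10% threshold, and the prompt wording do not come from the paper, and no LLM is actually called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """One catalog entry (hypothetical structure, not the paper's schema)."""
    name: str
    description: str


# Placeholder entries standing in for the paper's 43-pattern catalog.
CATALOG = (
    Pattern("hoist-invariant-work", "Move loop-invariant computation out of the loop."),
    Pattern("cache-repeated-results", "Memoize results recomputed with the same inputs."),
    Pattern("preallocate-buffers", "Size collections up front instead of growing them."),
)


def find_hotspots(profile: dict[str, float], threshold: float = 0.10) -> list[str]:
    """Return methods whose share of total profiled time is >= threshold, hottest first."""
    total = sum(profile.values())
    if total <= 0:
        return []
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    return [method for method, seconds in ranked if seconds / total >= threshold]


def build_prompt(method: str, source: str, catalog: tuple[Pattern, ...] = CATALOG) -> str:
    """Assemble a prompt asking the model to pick and apply one catalog pattern."""
    options = "\n".join(f"- {p.name}: {p.description}" for p in catalog)
    return (
        f"Profiling marks `{method}` as a hotspot.\n"
        "Choose one pattern from the catalog and apply it without changing behavior.\n"
        f"Catalog:\n{options}\n"
        f"Source:\n{source}\n"
    )


if __name__ == "__main__":
    # Made-up profile: seconds attributed to each method.
    profile = {"Parser.parse": 8.0, "Index.lookup": 1.5, "Logger.write": 0.5}
    for method in find_hotspots(profile):
        print(build_prompt(method, source="<method body goes here>"))
```

Restricting the model to a fixed catalog narrows the space of edits it can propose, which is the property the load-bearing premise depends on; the sketch says nothing about whether a chosen pattern is safe to apply.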
If this is right
- Large applications receive automatic performance improvements that exceed standard compiler results in latency and energy.
- LLMs become practical for full production codebases when given structured performance guidance.
- Metrics including throughput, memory usage, and CPU utilization improve alongside latency and energy in the evaluated suites.
- The method applies across program sizes from small kernels to complete systems.
Where Pith is reading between the lines
- Similar LLM setups could be added to build pipelines for repeated automatic tuning during development.
- The pattern catalog idea might transfer to other languages if the diagnostics and patterns are adapted.
- Independent checks for semantic equivalence would be useful to confirm that applied changes preserve program behavior.
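One inexpensive form of such an independent check is differential testing: run the original and the optimized code on the same inputs and report any disagreement. The sketch below is a minimal version under stated assumptions. The helper names and the toy sum-of-squares functions are hypothetical and unrelated to the paper's benchmarks, and agreement on sampled inputs is evidence of equivalence, not a proof of it.

```python
from typing import Any, Callable, Iterable


def run_safely(func: Callable[[Any], Any], arg: Any) -> tuple[str, Any]:
    """Run func(arg) and capture either its result or the exception type."""
    try:
        return ("ok", func(arg))
    except Exception as exc:  # failure modes are compared too, not only results
        return ("error", type(exc).__name__)


def differential_test(
    reference: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> list[Any]:
    """Return every input on which the two implementations disagree."""
    return [x for x in inputs if run_safely(reference, x) != run_safely(candidate, x)]


def sum_squares_reference(n: int) -> int:
    """Original code: straightforward loop."""
    return sum(i * i for i in range(n + 1))


def sum_squares_fast(n: int) -> int:
    """Optimized code: closed form, valid for n >= 0."""
    return n * (n + 1) * (2 * n + 1) // 6


def sum_squares_buggy(n: int) -> int:
    """Optimized code with an off-by-one error (sums only up to n - 1)."""
    return (n - 1) * n * (2 * n - 1) // 6


if __name__ == "__main__":
    print(differential_test(sum_squares_reference, sum_squares_fast, range(50)))  # []
    print(differential_test(sum_squares_reference, sum_squares_buggy, range(5)))  # [1, 2, 3, 4]
```

The buggy variant agrees with the reference at n = 0, which shows why input coverage matters: a single spot check there would have passed.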
Load-bearing premise
The catalog of 43 optimization patterns is assumed to be both sufficient and safely applicable by the LLM to arbitrary real-world code without introducing semantic errors.
What would settle it
Applying SysLLMatic to a new large-scale Java application outside the DaCapo suite and finding either smaller gains than the compiler or introduced functional errors would settle whether the approach works as claimed.
Original abstract
Automatic software system optimization can improve software speed, reduce operating costs, and save energy. Traditional approaches to optimization rely on manual tuning and compiler heuristics, limiting their ability to generalize across diverse codebases and system contexts. Recent methods using Large Language Models (LLMs) introduce automation on simple programs, but they do not scale effectively to the complexity and size of real-world software systems. We present SysLLMatic, a system that integrates LLMs with performance diagnostics and a curated catalog of 43 optimization patterns to automatically optimize software systems. By leveraging profiling to identify performance hotspots, our approach enables LLMs to optimize real-world software beyond isolated code snippets. We evaluate it on three benchmark suites: HumanEval_CPP (competitive programming in C++), SciMark2 (scientific kernels in Java), and DaCapo (large-scale software systems in Java). Results show that SysLLMatic can improve software system performance, including latency, throughput, energy efficiency, memory usage, and CPU utilization. It consistently outperforms state-of-the-art LLM baselines on microbenchmarks. On large-scale application codes, to which prior LLM approaches have not scaled, it surpasses compiler optimizations, achieving average relative improvements of 1.54x in latency (vs. 1.01x for the compiler) and 1.24x in energy (vs. 1.08x for the compiler). Our findings demonstrate that LLMs, guided by performance knowledge through the optimization pattern catalog and appropriate performance diagnostics, can serve as viable software system optimizers. We further identify limitations of our approach and the challenges involved in handling complex applications. This work provides a foundation for generating optimized code across various languages, benchmarks, and program sizes in a principled manner.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SysLLMatic, a system that integrates LLMs with performance profiling and a curated catalog of 43 optimization patterns to automatically optimize real-world software systems. Evaluations are reported on HumanEval_CPP (C++ microbenchmarks), SciMark2 (Java scientific kernels), and DaCapo (large-scale Java applications), with claims that the approach outperforms prior LLM baselines on microbenchmarks and surpasses compiler optimizations on DaCapo, yielding average relative improvements of 1.54x in latency (vs. 1.01x for the compiler) and 1.24x in energy (vs. 1.08x for the compiler).
Significance. If the empirical results are shown to be robust and the applied transformations preserve semantics, the work would represent a meaningful advance in automated software optimization by scaling LLM-based techniques to complex, large-scale codebases that prior methods have not addressed. The combination of profiling-driven hotspot identification with a fixed pattern catalog offers a concrete, reproducible pathway for LLM-guided optimization across languages and program sizes.
major comments (2)
- [DaCapo evaluation (Section 5, results for large-scale applications)] The central claim that SysLLMatic surpasses compiler optimizations with 1.54x latency and 1.24x energy gains on DaCapo requires that every LLM-selected and inserted pattern from the 43-pattern catalog produces functionally equivalent code. The manuscript contains no description of automated equivalence checking, differential testing, full test-suite execution on the modified binaries, or even manual inspection of changed sites. Without such verification, the reported speedups are not demonstrably comparable to the compiler baseline.
- [Abstract and quantitative results (Section 5)] The headline average relative improvements (1.54x latency, 1.24x energy) are presented without any information on the number of experimental runs, standard deviations, statistical significance tests, or criteria used to select or exclude optimization patterns. This omission directly affects the reliability of the cross-benchmark and cross-optimizer comparisons.
minor comments (2)
- [System description / pattern catalog] The description of how the 43 optimization patterns were curated and validated for safety across domains could be expanded to clarify their generality.
- [Results tables/figures] Tables or figures reporting speedups should include error bars or confidence intervals when multiple runs or multiple applications are aggregated.
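To make the request for confidence intervals concrete, the sketch below computes per-benchmark speedups and a percentile bootstrap interval around their geometric mean. It rests on several assumptions: the latencies are made-up numbers, speedup is taken to mean baseline time divided by optimized time, and the geometric mean is a choice made here, since the material above does not say which mean the paper uses for its average relative improvement.

```python
import random
import statistics


def speedups(baseline: list[float], optimized: list[float]) -> list[float]:
    """Per-benchmark speedup, defined here as baseline time / optimized time."""
    if len(baseline) != len(optimized):
        raise ValueError("need one optimized measurement per baseline measurement")
    return [b / o for b, o in zip(baseline, optimized)]


def bootstrap_ci(
    values: list[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap interval for the geometric mean of values."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    estimates = sorted(
        statistics.geometric_mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = estimates[int(n_resamples * alpha / 2)]
    upper = estimates[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical mean latencies in seconds for five applications.
    baseline = [12.0, 8.5, 20.0, 5.2, 9.9]
    optimized = [8.0, 8.1, 11.5, 4.9, 6.0]
    ratios = speedups(baseline, optimized)
    low, high = bootstrap_ci(ratios)
    print(f"geometric-mean speedup: {statistics.geometric_mean(ratios):.2f}x")
    print(f"95% bootstrap interval: [{low:.2f}x, {high:.2f}x]")
```

With only a handful of applications the interval will be wide, which is itself useful information next to a single headline number.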
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of reproducibility and validity that we will address to strengthen the manuscript. We respond to each major comment below and indicate the planned revisions.
Point-by-point responses
- Referee: [DaCapo evaluation (Section 5, results for large-scale applications)] The central claim that SysLLMatic surpasses compiler optimizations with 1.54x latency and 1.24x energy gains on DaCapo requires that every LLM-selected and inserted pattern from the 43-pattern catalog produces functionally equivalent code. The manuscript contains no description of automated equivalence checking, differential testing, full test-suite execution on the modified binaries, or even manual inspection of changed sites. Without such verification, the reported speedups are not demonstrably comparable to the compiler baseline.
  Authors: We agree that the manuscript must explicitly document equivalence verification to support the DaCapo claims. Although the DaCapo suite provides extensive built-in tests that were executed on all optimized binaries, and we performed spot-checks on changed code sites, these steps are not described. In the revised version we will add a dedicated subsection to Section 5 that details: (1) full execution of each application's DaCapo test suite on the modified binaries, (2) differential testing against the original on representative workloads, and (3) manual inspection of the LLM-proposed edits at profiled hotspots. This addition will make the comparison to the compiler baseline demonstrably valid.
  Revision: yes
- Referee: [Abstract and quantitative results (Section 5)] The headline average relative improvements (1.54x latency, 1.24x energy) are presented without any information on the number of experimental runs, standard deviations, statistical significance tests, or criteria used to select or exclude optimization patterns. This omission directly affects the reliability of the cross-benchmark and cross-optimizer comparisons.
  Authors: We acknowledge the absence of these methodological details. Each reported configuration was executed 10 times under controlled conditions to mitigate measurement noise, with averages taken; pattern selection followed explicit rules based on hotspot profiling and catalog matching. Standard deviations, confidence intervals, and statistical significance tests (paired t-tests against baselines) were not included. In the revision we will add this information to Section 5, include variability measures in the figures and tables, report p-values for the key comparisons, and clarify the pattern-selection criteria. If space allows we will also update the abstract to reflect the added rigor.
  Revision: yes
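The paired t-test mentioned in this response can be sketched with the standard library alone. The example assumes 10 runs per configuration, as the rebuttal states, and that run i of the baseline and run i of the optimized build were taken under matching conditions, which is what makes pairing meaningful. The latencies are invented for illustration. The constant 2.262 is the two-sided 5% critical value of Student's t with 9 degrees of freedom; a p-value would need a t-distribution CDF, for example from `scipy.stats.ttest_rel`.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t with 9 degrees of freedom (n = 10).
T_CRIT_DF9 = 2.262


def paired_t_statistic(before: list[float], after: list[float]) -> float:
    """t statistic for the mean of the paired differences before[i] - after[i]."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [b - a for b, a in zip(before, after)]
    spread = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (spread / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Hypothetical latencies in seconds for 10 paired runs.
    baseline = [10.2, 10.5, 9.9, 10.1, 10.4, 10.0, 10.3, 10.6, 9.8, 10.2]
    optimized = [6.7, 6.9, 6.5, 6.8, 6.6, 6.4, 6.9, 7.0, 6.5, 6.6]
    t = paired_t_statistic(baseline, optimized)
    verdict = "significant" if abs(t) > T_CRIT_DF9 else "not significant"
    print(f"t = {t:.2f} with 9 degrees of freedom: {verdict} at the 5% level")
```

If runs are not naturally paired, an unpaired test such as Welch's t-test would be the more defensible choice.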
Circularity Check
Empirical measurements on external benchmarks with no internal derivation chain
The paper describes an empirical system (SysLLMatic) that applies LLMs guided by a fixed catalog of 43 patterns and profiling to optimize code on HumanEval_CPP, SciMark2, and DaCapo benchmarks. Reported speedups (e.g., 1.54x latency vs. compiler) are direct measurements against external baselines rather than quantities computed from fitted parameters or reduced to self-citations. No equations, uniqueness theorems, or ansatzes are invoked that could collapse into the inputs by construction. The central claims rest on observable runtime results on fixed suites, making the evaluation self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLMs can correctly apply the supplied optimization patterns to profiled code regions without altering program semantics.
invented entities (1)
- Catalog of 43 optimization patterns (no independent evidence)
Forward citations
Cited by 2 Pith papers
- EcoAssist: Embedding Sustainability into AI-Assisted Frontend Development
  EcoAssist embeds energy estimation and optimization into AI-assisted frontend coding, reducing website energy use by 13-16% in benchmarks while preserving developer productivity.
- Sustainable Code Generation Using Large Language Models: A Systematic Literature Review
  A systematic review finds research on the sustainability of LLM-generated code to be limited, fragmented, and without accepted frameworks for measurement or benchmarking.