Leveraging LLMs to Automate Energy-Aware Refactoring of Parallel Scientific Codes
Pith reviewed 2026-05-22 17:17 UTC · model grok-4.3
The pith
LLMs guided by runtime power data can refactor parallel scientific codes to cut energy use by about one third on GPUs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents LASSI-EE, an automated LLM-based refactoring framework that generates energy-efficient parallel codes through a multi-stage, iterative approach integrating runtime power profiling, energy-aware prompting, self-correcting feedback loops, and an LLM-as-a-Judge agent for screening generated code. The authors evaluate LASSI-EE on twenty-two representative scientific benchmarks and applications running on NVIDIA A100 and AMD MI100 GPUs. Averaged over the trials that produced passing, energy-reducing refactorings, energy use falls by 36% on the MI100 and 34% on the A100.
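The percentages above are relative energy savings. As a minimal sketch, assuming the power profiler yields (time in seconds, power in watts) samples for each run, energy is the time integral of power and the saving is taken relative to the original code. The helper names `energy_joules` and `percent_reduction` are hypothetical and not taken from LASSI-EE:

```python
def energy_joules(samples: list[tuple[float, float]]) -> float:
    """Integrate (time_s, power_w) samples with the trapezoidal rule."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def percent_reduction(baseline_j: float, refactored_j: float) -> float:
    """Energy saved by the refactored code, as a percentage of the baseline."""
    return 100.0 * (baseline_j - refactored_j) / baseline_j


# Hypothetical traces: 250 W for 2 s (original) vs. 200 W for 1.65 s (refactored).
baseline = [(0.0, 250.0), (1.0, 250.0), (2.0, 250.0)]
refactored = [(0.0, 200.0), (1.65, 200.0)]
print(percent_reduction(energy_joules(baseline), energy_joules(refactored)))  # 34.0
```

A lower-power version that also runs much longer can still lose on energy, which is why the sketch integrates power over the whole run rather than comparing average watts.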
What carries the argument
LASSI-EE, a multi-stage iterative framework that combines runtime power profiling, energy-aware prompting, self-correcting feedback loops, and LLM-as-a-Judge screening to produce energy-reducing code changes.
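The control flow described above can be sketched as a loop. This is a hedged illustration, not the authors' implementation: `generate`, `validate`, `judge`, and `measure` are hypothetical callables standing in for the LLM call, the build-and-test step, the LLM-as-a-Judge screen, and the power-profiled run, and the iteration budget `max_iters` is an assumed parameter.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    code: str
    energy_j: float
    iteration: int


def refactor_loop(
    original: str,
    generate: Callable[[str, str], str],          # LLM: (code, feedback) -> candidate
    validate: Callable[[str], tuple[bool, str]],  # build + tests: (ok, error log)
    judge: Callable[[str], bool],                 # LLM-as-a-Judge screen
    measure: Callable[[str], float],              # profiled energy in joules
    max_iters: int = 5,
) -> Optional[Outcome]:
    """Return the lowest-energy screened candidate that beats the baseline, if any."""
    baseline = measure(original)
    best: Optional[Outcome] = None
    current = original
    feedback = f"baseline energy {baseline:.1f} J"
    for i in range(1, max_iters + 1):
        candidate = generate(current, feedback)
        ok, log = validate(candidate)
        if not ok:
            # Self-correction: the failure log goes into the next prompt.
            feedback = f"previous candidate failed: {log}"
            continue
        if not judge(candidate):
            feedback = "previous candidate rejected by the judge"
            continue
        energy = measure(candidate)
        if energy < baseline and (best is None or energy < best.energy_j):
            best = Outcome(candidate, energy, i)
            current = candidate  # keep refining from the best version so far
        # Energy-aware feedback: report the measurement either way.
        feedback = f"candidate energy {energy:.1f} J vs baseline {baseline:.1f} J"
    return best


# Toy stubs: string length stands in for measured energy.
result = refactor_loop(
    "x" * 10,
    generate=lambda code, fb: code[:-2],
    validate=lambda c: (len(c) > 0, "empty file"),
    judge=lambda c: True,
    measure=lambda c: float(len(c)),
    max_iters=3,
)
print(result.energy_j, result.iteration)  # 4.0 3
```

Passing the measured energy back as text is the simplest form of empirical feedback; a real system would also attach profiler detail and the judge's reasoning.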
If this is right
- Energy consumption of large-scale scientific applications on GPUs can be lowered automatically using empirical execution feedback.
- LLMs become practical tools for optimizing parallel codes for energy efficiency in addition to correctness and speed.
- Refactoring tasks that previously required manual expert effort can be handled through iterative prompting and verification loops.
Where Pith is reading between the lines
- The same feedback-driven loop could be applied to other hardware platforms if comparable power measurement tools exist.
- Repeated application across successive code versions might produce cumulative energy improvements over a project's lifetime.
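If savings did accumulate across successive code versions, they would compound multiplicatively rather than add, assuming each round's percentage is measured against the previous version's energy. A small worked example with hypothetical figures (two rounds at 34% each):

```python
from functools import reduce


def cumulative_reduction(steps: list[float]) -> float:
    """Overall fractional saving after successive fractional reductions."""
    remaining = reduce(lambda acc, r: acc * (1.0 - r), steps, 1.0)
    return 1.0 - remaining


print(round(cumulative_reduction([0.34, 0.34]), 4))  # 0.5644, not 0.68
```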
Load-bearing premise
LLM-generated refactorings preserve functional correctness for the original scientific results while delivering the measured energy savings.
What would settle it
A benchmark run where the refactored code passes screening and executes but returns different scientific output from the original or shows no energy reduction.
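The settling test can be stated mechanically. A minimal sketch, assuming outputs are flat lists of floats and using `math.isclose` with tolerances chosen here for illustration only; the paper's actual verification thresholds are not reported in this summary, which is one of the referee's objections:

```python
import math
from typing import Sequence


def is_counterexample(
    reference: Sequence[float],
    refactored: Sequence[float],
    baseline_energy_j: float,
    refactored_energy_j: float,
    rel_tol: float = 1e-6,   # assumed tolerance, not from the paper
    abs_tol: float = 1e-9,   # assumed tolerance, not from the paper
) -> bool:
    """True if a screened refactoring changes the results or saves no energy."""
    if len(reference) != len(refactored):
        return True
    outputs_match = all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(reference, refactored)
    )
    saves_energy = refactored_energy_j < baseline_energy_j
    return not outputs_match or not saves_energy


print(is_counterexample([1.0, 2.0], [1.0, 2.0000001], 100.0, 60.0))  # False
print(is_counterexample([1.0, 2.0], [1.0, 2.1], 100.0, 60.0))        # True
```

Real scientific outputs usually need per-quantity tolerances (and sometimes conservation checks) rather than one global threshold.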
Original abstract
Large language models (LLMs) are increasingly used for generating parallel scientific codes, with a primary focus on generating functionally correct code. Recent work has focused on generating performant code, with an emphasis on its execution time. However, energy efficiency is now recognized as a critical objective, given the significant power demands of large-scale compute systems. This paper addresses the research question of whether LLMs can generate energy-efficient parallel scientific codes when guided by empirical execution feedback. To answer this question, we propose LASSI-EE, an automated LLM-based refactoring framework that generates energy-efficient parallel codes through a multi-stage, iterative approach integrating runtime power profiling, energy-aware prompting, self-correcting feedback loops, and an LLM-as-a-Judge agent for screening generated code. We evaluate LASSI-EE using twenty-two representative scientific benchmarks and applications on NVIDIA A100 and AMD MI100 GPUs. The results indicate an average energy reduction of 36% for MI100 and 34% for A100, across trials that produced passing energy-reducing refactorings.
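One way energy-aware prompting could look in practice: the prompt carries the measured runtime, power, and energy back to the model alongside the source. This template is a hypothetical illustration; the wording, the `energy_aware_prompt` function, and its parameters are assumptions, not the prompts used by LASSI-EE.

```python
def energy_aware_prompt(code: str, runtime_s: float, avg_power_w: float,
                        gpu: str, feedback: str = "") -> str:
    """Build a refactoring prompt that includes measured energy data."""
    energy_j = runtime_s * avg_power_w  # E = average power x runtime
    lines = [
        f"Refactor the following parallel scientific code for the {gpu} GPU.",
        f"Measured baseline: {runtime_s:.3f} s at {avg_power_w:.1f} W average, "
        f"{energy_j:.1f} J total.",
        "Goal: reduce total energy while keeping the numerical results "
        "identical within tolerance.",
        "Return the complete refactored source file and nothing else.",
    ]
    if feedback:
        lines.append(f"Feedback on the previous attempt: {feedback}")
    lines += ["", "BEGIN SOURCE", code, "END SOURCE"]
    return "\n".join(lines)


print(energy_aware_prompt("/* kernel source */", 2.0, 250.0, "NVIDIA A100"))
```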
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces LASSI-EE, a multi-stage LLM-based framework for automated refactoring of parallel scientific codes to reduce energy consumption. It integrates runtime power profiling, energy-aware prompting, self-correcting feedback loops, and an LLM-as-Judge agent. Evaluated on 22 benchmarks and applications on NVIDIA A100 and AMD MI100 GPUs, it reports average energy reductions of 36% (MI100) and 34% (A100) across trials that produced passing energy-reducing refactorings.
Significance. If the central results hold after addressing the missing success-rate and verification details, the work would be significant for demonstrating practical LLM-driven energy optimization in HPC, moving beyond time-focused code generation. The empirical feedback loop and GPU-specific evaluation on representative scientific workloads are strengths; the approach could influence energy-aware refactoring tools if the pass rates and correctness guarantees are quantified.
major comments (3)
- [Abstract / Evaluation] Abstract and Evaluation section: The headline claims of 36% (MI100) and 34% (A100) average energy reduction are explicitly conditioned on 'trials that produced passing energy-reducing refactorings' without reporting the overall success rate, the distribution of savings among passers, or any failure-mode analysis. This directly affects interpretability of the central claim, as the averages cannot be read as expected performance of LASSI-EE without knowing what fraction of attempts succeeded or whether passers are biased toward easier benchmarks.
- [Methodology / Evaluation] Methodology and Evaluation sections: Functional correctness of the refactored scientific codes is asserted via self-correcting loops and LLM-as-Judge screening, yet no details are provided on verification methods for numerical/scientific results (e.g., tolerance checks against reference outputs, number of test cases, or error rates). This is load-bearing because undetected semantic errors could invalidate the energy savings.
- [Evaluation] Evaluation section: The abstract and results lack variance across trials, statistical significance tests, or per-benchmark breakdowns for the energy reductions. Without these, it is unclear whether the reported averages are robust or driven by a few outliers.
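The first objection turns on the gap between an average over successful trials and the expected saving per attempt. A minimal sketch with hypothetical numbers, assuming a trial that yields no passing energy-reducing refactoring leaves the original code in place (a 0% saving); `summarize` is an illustrative helper:

```python
from statistics import mean
from typing import Optional


def summarize(trials: list[Optional[float]]) -> dict[str, float]:
    """trials: % energy reduction per trial, or None if nothing passed."""
    wins = [r for r in trials if r is not None and r > 0]
    success_rate = len(wins) / len(trials)
    conditional = mean(wins) if wins else 0.0  # the abstract's kind of average
    # Failed or non-reducing trials fall back to the original code: 0% saving.
    expected = sum(wins) / len(trials)
    return {"success_rate": success_rate, "conditional_avg": conditional,
            "expected_avg": expected}


# Hypothetical: 10 trials, 4 succeed at about 35%, the rest fail.
print(summarize([35.0, 34.0, 36.0, 35.0] + [None] * 6))
# {'success_rate': 0.4, 'conditional_avg': 35.0, 'expected_avg': 14.0}
```

With a 40% success rate, a 35% conditional average corresponds to an expected 14% saving per attempt, which is why the success rate matters for reading the headline figures.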
minor comments (2)
- [Evaluation] Clarify the exact number of trials per benchmark and the definition of 'passing' (e.g., energy reduction threshold and correctness criteria) in the evaluation description.
- [Related Work] The paper could strengthen the related-work discussion by explicitly contrasting LASSI-EE against prior LLM code-generation efforts focused solely on performance rather than energy.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and agree that several clarifications and additions will strengthen the manuscript. Revisions will be made to improve interpretability and rigor of the evaluation.
Point-by-point responses
- Referee: The headline claims of 36% (MI100) and 34% (A100) average energy reduction are explicitly conditioned on 'trials that produced passing energy-reducing refactorings' without reporting the overall success rate, the distribution of savings among passers, or any failure-mode analysis. This directly affects interpretability of the central claim, as the averages cannot be read as expected performance of LASSI-EE without knowing what fraction of attempts succeeded or whether passers are biased toward easier benchmarks.
Authors: We acknowledge the referee's point on interpretability. The reported averages are conditioned on successful trials, as stated in the abstract and Evaluation section. To address this, we will revise the manuscript to report the overall success rate across all attempts (including the fraction of trials that produced passing refactorings), a distribution or histogram of energy savings among successful cases, and a concise failure-mode analysis. These details are available from our experimental logs and will be added to the Evaluation section and abstract where appropriate. Revision: yes.
- Referee: Functional correctness of the refactored scientific codes is asserted via self-correcting loops and LLM-as-Judge screening, yet no details are provided on verification methods for numerical/scientific results (e.g., tolerance checks against reference outputs, number of test cases, or error rates). This is load-bearing because undetected semantic errors could invalidate the energy savings.
Authors: We agree that explicit verification details for numerical correctness are essential for scientific codes. The Methodology section describes the self-correcting feedback loops and LLM-as-Judge screening, but we will expand the Evaluation section to include specifics on numerical verification: tolerance thresholds used for output comparisons against reference implementations, the number and types of test cases per benchmark, and observed error rates or failure counts during screening. This will clarify how semantic correctness was ensured beyond the LLM-based checks. Revision: yes.
- Referee: The abstract and results lack variance across trials, statistical significance tests, or per-benchmark breakdowns for the energy reductions. Without these, it is unclear whether the reported averages are robust or driven by a few outliers.
Authors: We appreciate this observation regarding statistical robustness. The current Evaluation section presents aggregate averages but does not include variance, per-benchmark breakdowns, or significance tests. We will revise to add: (1) per-benchmark energy reduction tables or figures with individual results, (2) measures of variance (e.g., standard deviation or interquartile range across trials), and (3) statistical significance tests (such as paired t-tests comparing original vs. refactored energy) to confirm the averages are not outlier-driven. These additions will be included in the revised Evaluation section. Revision: yes.
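The variance and paired-test additions promised above amount to a few lines of arithmetic. A minimal sketch, assuming one (original, refactored) energy pair per benchmark; the data are hypothetical, and the p-value would come from the Student t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`), which is omitted here to keep the block dependency-free:

```python
from math import sqrt
from statistics import mean, stdev


def paired_summary(original_j: list[float], refactored_j: list[float]) -> dict[str, float]:
    """Mean saving, its standard deviation, and the paired t statistic."""
    if len(original_j) != len(refactored_j) or len(original_j) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [o - r for o, r in zip(original_j, refactored_j)]
    d_bar, s_d = mean(diffs), stdev(diffs)  # sample standard deviation
    if s_d == 0:
        raise ValueError("zero variance: t statistic undefined")
    t = d_bar / (s_d / sqrt(len(diffs)))
    return {"mean_saving_j": d_bar, "sd_j": s_d, "t": t,
            "dof": float(len(diffs) - 1)}


# Hypothetical energies (J) for four benchmarks, before and after refactoring.
print(paired_summary([100.0, 102.0, 98.0, 101.0], [65.0, 68.0, 64.0, 66.0]))
```

Reporting the per-benchmark differences themselves, not just their mean, also lets a reader see whether a single outlier drives the average.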
Circularity Check
No significant circularity; empirical measurements are externally grounded
Full rationale
The paper proposes an LLM-driven refactoring framework (LASSI-EE) and evaluates it via direct runtime power profiling on physical NVIDIA A100 and AMD MI100 GPUs across 22 benchmarks. Reported energy reductions (36% MI100, 34% A100) are obtained from hardware measurements conditioned on passing trials, not from any fitted parameters, self-referential definitions, or equations that reduce to the inputs by construction. No derivation chain, uniqueness theorems, or ansatzes are present; the central claims rest on external, falsifiable execution data rather than on load-bearing self-citations or renamings of known results.
Axiom & Free-Parameter Ledger
invented entities (1)
- LASSI-EE framework: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel
  unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: "LASSI-EE ... multi-stage, iterative approach integrating runtime power profiling, energy-aware prompting, self-correcting feedback loops, and an LLM-as-a-Judge agent"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction
  unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: "energy-reduction@k ... expected energy reduction when generating k code candidates"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
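The energy-reduction@k passage quoted in this section describes an expected saving when k code candidates are generated. Its exact definition is not reproduced here, so the following is an assumption modeled on the pass@k estimator: given n scored candidates for one benchmark (failing or non-reducing candidates scored 0), estimate the expected best saving among k drawn without replacement. The largest value in a random k-subset is the i-th smallest of the n values with probability C(i-1, k-1)/C(n, k):

```python
from math import comb


def energy_reduction_at_k(reductions: list[float], k: int) -> float:
    """Expected best % reduction among k of n candidates (assumed definition)."""
    n = len(reductions)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Negative savings are clamped to 0: the original code would be kept instead.
    xs = sorted(max(r, 0.0) for r in reductions)
    # xs[i] (0-based) is the maximum of comb(i, k - 1) of the comb(n, k) subsets.
    return sum(xs[i] * comb(i, k - 1) for i in range(k - 1, n)) / comb(n, k)


# Hypothetical: four candidates, two failed, two saved 30% and 40%.
print(energy_reduction_at_k([0.0, 0.0, 30.0, 40.0], 2))  # 30.0
```

With k = 1 this reduces to the plain mean over all candidates, and with k = n it is the single best candidate, mirroring how pass@k interpolates between per-sample and best-of-n success.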
Forward citations
Cited by 2 Pith papers
- SysLLMatic: Large Language Models are Software System Optimizers
  SysLLMatic integrates LLMs with performance diagnostics and a 43-pattern catalog to optimize complex software, reporting 1.54x latency and 1.24x energy gains over compilers on large Java systems where prior LLM method...
- Sustainable Code Generation Using Large Language Models: A Systematic Literature Review
  A systematic review finds research on the sustainability of LLM-generated code to be limited, fragmented, and without accepted frameworks for measurement or benchmarking.