Coordinated Power Management on Heterogeneous Systems

Michael E. Papka; Valerie E. Taylor; Xingfu Wu; Zhiling Lan; Zhong Zheng

arxiv: 2508.07605 · v3 · submitted 2025-08-11 · 💻 cs.DC

Pith reviewed 2026-05-19 00:23 UTC · model grok-4.3

keywords: performance prediction, heterogeneous systems, collaborative filtering, power management, HPC, CPU-GPU, energy efficiency, profiling reduction

The pith

OPEN combines offline modeling with lightweight online profiling and collaborative filtering to predict performance on CPU-GPU systems at up to 98.29 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OPEN, a two-phase framework that addresses the high cost of exhaustive profiling for performance prediction in heterogeneous computing systems. In the offline phase it builds a predictor and an initial dense matrix; in the online phase it collects only lightweight measurements and applies collaborative filtering to fill in the rest. This matters because modern HPC workloads run on large combinations of CPUs and GPUs where full offline characterization is often impractical. If the approach works, it supplies accurate enough predictions to support runtime power decisions without the usual profiling overhead.

Core claim

OPEN performs performance prediction by constructing a performance predictor and dense matrix offline, then using lightweight online profiling together with collaborative filtering to generate predictions, reaching up to 98.29 percent accuracy on systems containing A100 and A30 GPUs while substantially lowering profiling cost.

What carries the argument

OPEN, a hybrid framework whose offline predictor and collaborative-filtering step convert sparse lightweight online measurements into full performance estimates for power-aware scheduling.
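
The step from sparse online measurements to a full row of estimates can be illustrated with a small neighbor-based collaborative filtering sketch. This is not the paper's implementation: the summary does not say which collaborative filtering variant OPEN uses, so the inverse-distance weighting, the function name `fill_row`, and the matrix layout (one row per profiled application, one column per power-cap setting) are all assumptions made for illustration.

```python
import math


def fill_row(dense, observed, eps=1e-9):
    """Estimate a new application's full performance row (illustrative only).

    dense:    list of rows from the offline matrix; each row holds normalized
              performance of one already-profiled application, one column per
              power-cap setting (hypothetical layout).
    observed: {column index: measured value} gathered by lightweight online
              profiling of the new application.
    """
    cols = sorted(observed)
    # Weight each known application by how closely it matches the new one
    # on the few settings that were actually measured.
    weights = []
    for row in dense:
        dist = math.sqrt(sum((row[c] - observed[c]) ** 2 for c in cols))
        weights.append(1.0 / (dist + eps))
    total = sum(weights)

    filled = []
    for c in range(len(dense[0])):
        if c in observed:
            filled.append(observed[c])  # keep real measurements untouched
        else:
            # Weighted average of the known applications at this setting.
            filled.append(sum(w * row[c] for w, row in zip(weights, dense)) / total)
    return filled
```

Calling `fill_row(dense, {0: 1.0, 2: 0.95})` on a two-application matrix returns a row in which the two measured settings are preserved and the remaining settings are borrowed mostly from the application whose measured values are closest. A factorization-based variant would replace the weighting step but keep the same inputs and outputs.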

If this is right

Runtime power decisions become feasible on large-scale heterogeneous nodes without exhaustive pre-characterization.
The same lightweight profiling step can be reused across multiple applications once the offline matrix exists.
Power-aware schedulers gain a practical data source for deciding CPU-GPU allocation under energy caps.
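
How a scheduler might consume such predictions can be sketched as follows. The paper, as summarized here, reports no such policy, so everything in this block is hypothetical: the function name `best_caps`, the dictionary layout, and the simplifying assumption that the node budget is the sum of the CPU cap and the GPU cap.

```python
def best_caps(predicted, budget_watts):
    """Pick the (cpu_cap, gpu_cap) pair with the highest predicted performance.

    predicted:    {(cpu_cap_watts, gpu_cap_watts): predicted normalized performance}
    budget_watts: node-level power budget; a pair is treated as feasible when
                  cpu_cap + gpu_cap does not exceed it (an assumed model).
    Returns None when no pair fits the budget.
    """
    feasible = {caps: perf for caps, perf in predicted.items()
                if sum(caps) <= budget_watts}
    if not feasible:
        return None
    # Highest predicted performance wins; ties go to the lower total power.
    return max(feasible, key=lambda caps: (feasible[caps], -sum(caps)))
```

With predictions for four cap pairs and a 350 W budget, the function returns the feasible pair with the best predicted performance, and it returns `None` when the budget is below every pair's total.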

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method may extend to other accelerators if the offline matrix is rebuilt for the new hardware.
Prediction errors could be further reduced by feeding runtime power measurements back into the online phase.
Integration with existing job schedulers would allow automatic throttling or migration when predicted power exceeds a budget.

Load-bearing premise

An offline-built performance predictor plus collaborative filtering on an initial dense matrix will deliver accurate online predictions from only lightweight profiling on the tested heterogeneous systems and applications.

What would settle it

Run OPEN on a new application or GPU model without rebuilding the offline matrix and observe whether accuracy falls below 90 percent.
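
The check above can be made concrete with a short sketch. The summary does not state how the paper defines prediction accuracy, so the definition used here, 100 times one minus the mean absolute percentage error, is an assumption, as are the function names and the default 90 percent threshold taken from the sentence above.

```python
def accuracy_percent(predicted, measured):
    """Accuracy as 100 * (1 - mean absolute percentage error).

    This definition is an assumption; the paper's exact metric is not given
    in the summary.
    """
    errors = [abs(p - m) / abs(m) for p, m in zip(predicted, measured)]
    return 100.0 * (1.0 - sum(errors) / len(errors))


def passes(predicted, measured, threshold=90.0):
    """True when accuracy on the held-out application meets the threshold."""
    return accuracy_percent(predicted, measured) >= threshold
```

Here `predicted` would hold OPEN's estimates for the unseen application or GPU and `measured` the values obtained by actually running each configuration.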

Figures

Figures reproduced from arXiv: 2508.07605 by Michael E. Papka, Valerie E. Taylor, Xingfu Wu, Zhiling Lan, Zhong Zheng.

**Figure 1.** Normalized performance for BERT training, UNet training, miniGAN, GROMACS, and ResNet50 training on an Intel(R) Xeon(R) Platinum 8380 system with an A100 GPU under various CPU and GPU power caps reveals differing sensitivity patterns across applications. (a) miniGAN (b) UNet training

**Figure 2.** Pareto frontiers of applications between normalized …

**Figure 3.** The OPEN framework consists of offline and online …

**Figure 4.** Mutual information scores between selected CPU and …

**Figure 5.** Applications with long CPU phases.

B. Online Profiling and Prediction. 1) Collaborative Filtering Model: Collaborative Filtering (CF) is a foundational technique in recommender systems, widely adopted following its popularization during the Netflix Prize competition [56]. The key idea is to predict missing entries in a sparse user–item interaction matrix by leveraging patterns learned from known data. Sever…

**Figure 6.** Normalized application energy efficiency on A100 with different power capping strategies.

**Figure 7.** Normalized application performance on A100 with different power capping strategies.

**Figure 1.** However, the coordinated CPU–GPU power capping …

Original abstract

Performance prediction is essential for energy-efficient computing in heterogeneous computing systems that integrate CPUs and GPUs. However, traditional performance modeling methods often rely on exhaustive offline profiling, which becomes impractical due to the large setting space and the high cost of profiling large-scale applications. In this paper, we present OPEN, a framework that consists of offline and online phases. The offline phase involves building a performance predictor and constructing an initial dense matrix. In the online phase, OPEN performs lightweight online profiling and leverages the performance predictor with collaborative filtering to make performance predictions. We evaluate OPEN on multiple heterogeneous systems, including those equipped with A100 and A30 GPUs. Results show that OPEN achieves prediction accuracy up to 98.29%. This demonstrates that OPEN effectively reduces profiling cost while maintaining high accuracy, making it practical for power-aware performance modeling in modern HPC environments. Overall, OPEN provides a lightweight solution for performance prediction under power constraints, enabling better runtime decisions in power-aware computing environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents OPEN, a framework for performance prediction on heterogeneous CPU-GPU systems consisting of an offline phase (building a performance predictor and initial dense matrix) and an online phase (lightweight profiling plus collaborative filtering). Evaluation on A100 and A30 systems reports prediction accuracy up to 98.29%, with the claim that this reduces profiling cost while supporting power-aware performance modeling and better runtime decisions under power constraints.

Significance. If the accuracy claims hold with proper validation, the method could reduce the prohibitive cost of exhaustive profiling in large configuration spaces of modern heterogeneous HPC systems, aiding practical power management. The offline predictor plus collaborative filtering approach is a plausible direction, but its significance for coordinated power management remains limited without demonstrated end-to-end benefits.

major comments (2)

Abstract and results: the central claim that OPEN 'enables better runtime decisions in power-aware computing environments' is not supported by any experiments; only prediction accuracy is reported, with no results on power capping, frequency scaling, energy minimization, or comparisons of end-to-end energy/performance against exhaustive profiling or other baselines.
Evaluation description: the reported 98.29% accuracy lacks supporting details on error bars, data exclusion rules, exact collaborative filtering implementation, or construction of the initial dense matrix, leaving the reliability of the accuracy claim difficult to assess.

minor comments (1)

The title emphasizes 'Coordinated Power Management' while the evaluation focuses exclusively on prediction accuracy; clarifying the scope and adding a short section on how predictions feed into management policies would improve alignment.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below with honest responses and indicate the revisions we will incorporate.

Referee: [—] Abstract and results: the central claim that OPEN 'enables better runtime decisions in power-aware computing environments' is not supported by any experiments; only prediction accuracy is reported, with no results on power capping, frequency scaling, energy minimization, or comparisons of end-to-end energy/performance against exhaustive profiling or other baselines.

Authors: We acknowledge that the manuscript presents only prediction accuracy results and contains no experiments on power capping, frequency scaling, energy minimization, or end-to-end comparisons. The core technical contribution is the offline predictor combined with online collaborative filtering for reduced-overhead performance prediction. We agree the claim in the abstract overreaches the presented evidence. In revision we will qualify or remove this phrasing from the abstract and conclusion, and add a short discussion section describing how the predictions could support runtime power decisions without claiming experimental validation of those benefits.

Revision: partial

Referee: [—] Evaluation description: the reported 98.29% accuracy lacks supporting details on error bars, data exclusion rules, exact collaborative filtering implementation, or construction of the initial dense matrix, leaving the reliability of the accuracy claim difficult to assess.

Authors: We agree that additional methodological details are needed for reproducibility and to substantiate the accuracy claim. In the revised manuscript we will report error bars or standard deviations on the accuracy figures, explicitly state any data exclusion or outlier removal rules, describe the exact collaborative filtering algorithm and its parameters, and detail how the initial dense matrix was constructed (including the number and selection of offline profiled configurations).

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in OPEN performance prediction derivation

The paper describes a two-phase framework where an offline phase constructs a performance predictor and an initial dense matrix, after which an online phase applies lightweight profiling plus collaborative filtering to generate predictions. This chain does not reduce any claimed result to a fitted parameter or self-referential definition of the target accuracy; the reported 98.29% accuracy is presented as an empirical evaluation outcome on A100/A30 systems rather than a quantity forced by construction from the inputs themselves. No self-citation load-bearing step, uniqueness theorem, or ansatz smuggling is invoked to justify the core method. The derivation therefore remains self-contained against external benchmarks and receives a normal non-finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that an offline-constructed performance predictor and initial dense matrix remain effective when combined with collaborative filtering during lightweight online profiling; no free parameters or invented entities are explicitly detailed in the abstract.

axioms (1)

Domain assumption: The offline-built performance predictor generalizes sufficiently to new online settings when augmented by collaborative filtering.
Invoked in the description of the online phase, which leverages the predictor for performance prediction.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Towards Energy Efficient Co-Scheduling in HPC
cs.DC 2026-04 unverdicted novelty 5.0

EcoSched jointly selects GPU counts via lightweight profiling and coschedules jobs with a score-based policy plus NUMA placement, delivering up to 14.8% energy savings, 30.1% makespan reduction, and 40.4% EDP improvem...
EcoShift: Performance-Aware Power Management for Power-Constrained Heterogeneous Systems
cs.DC 2026-04 unverdicted novelty 5.0

EcoShift uses online performance prediction plus dynamic programming to reallocate reclaimed power in heterogeneous CPU-GPU clusters, delivering up to 6% average performance gain while staying inside the total power limit.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 2 Pith papers

[1] K. Bergman, S. Borkar, D. Campbell, W. Carlson, W. Dally, M. Denneau, P. Franzon, W. Harrod, K. Hill, J. Hiller, et al., “Exascale computing study: Technology challenges in achieving exascale systems,” Defense Advanced Research Projects Agency Information Processing Techniques Office (DARPA IPTO), Tech. Rep., vol. 15, p. 181, 2008
[2] C. Cantalupo, J. Eastep, S. Jana, M. Kondo, M. Maiterth, A. Marathe, T. Patki, B. Rountree, R. Sakamoto, M. Schulz, et al., “A strawman for an hpc powerstack,” tech. rep., Intel Corporation (United States); Lawrence Livermore National Lab.(LLNL . . . , 2018
[3] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, “Rodinia: A benchmark suite for heterogeneous computing,” in 2009 IEEE international symposium on workload characterization (IISWC), pp. 44–54, IEEE, 2009
[4] S. Hong and H. Kim, “An integrated gpu power and performance model,” in Proceedings of the 37th annual international symposium on Computer architecture, pp. 280–289, 2010
[5] J. Lee, V. Sathisha, M. Schulte, K. Compton, and N. S. Kim, “Improving throughput of power-constrained gpus using dynamic voltage/frequency and core scaling,” in 2011 International Conference on Parallel Architectures and Compilation Techniques, pp. 111–120, IEEE, 2011
[7] K. Fan, B. Cosenza, and B. Juurlink, “Predictable gpus frequency scaling for energy and performance,” in Proceedings of the 48th International Conference on Parallel Processing, pp. 1–10, 2019
[11] G. Ali, M. Side, S. Bhalachandra, N. J. Wright, and Y. Chen, “Performance-aware energy-efficient gpu frequency selection using dnn-based models,” in Proceedings of the 52nd International Conference on Parallel Processing, pp. 433–442, 2023
[13] H. David, C. Fallin, E. Gorbatov, U. R. Hanebutte, and O. Mutlu, “Memory power management via dynamic voltage/frequency scaling,” in Proceedings of the 8th ACM international conference on Autonomic computing, pp. 31–40, 2011
[15] X. Mei, X. Chu, H. Liu, Y.-W. Leung, and Z. Li, “Energy efficient real-time task scheduling on cpu-gpu hybrid clusters,” in IEEE INFOCOM 2017-IEEE Conference on Computer Communications, pp. 1–9, IEEE, 2017
[17] Q. Zhu, B. Wu, X. Shen, L. Shen, and Z. Wang, “Co-run scheduling with power cap on integrated cpu-gpu systems,” in 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 967–977, IEEE, 2017
[21] J. Guerreiro, A. Ilic, N. Roma, and P. Tomás, “Modeling and decoupling the gpu power consumption for cross-domain dvfs,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 11, pp. 2494–2506, 2019
[22] X. Mei, Q. Wang, and X. Chu, “A survey and measurement study of gpu dvfs on energy conservation,” Digital Communications and Networks, vol. 3, no. 2, pp. 89–100, 2017
[24] Q. Wang, C. Liu, and X. Chu, “Gpgpu performance estimation for frequency scaling using cross-benchmarking,” in Proceedings of the 13th Annual Workshop on General Purpose Processing Using Graphics Processing Unit, pp. 31–40, 2020
[25] G. Ali, S. Bhalachandra, N. J. Wright, M. Side, and Y. Chen, “Optimal gpu frequency selection using multi-objective approaches for hpc systems,” in 2022 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–7, IEEE, 2022
[26] G. Ali, M. Side, S. Bhalachandra, N. J. Wright, and Y. Chen, “Performance-aware energy-efficient gpu frequency selection using dnn-based models,” in Proceedings of the 52nd International Conference on Parallel Processing, pp. 433–442, ACM, 2023
[27] S. Ilager, R. Muralidhar, K. Ramamohanarao, and R. Buyya, “A data-driven frequency scaling approach for deadline-aware energy efficient scheduling on graphics processing units (gpus),” in 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), pp. 579–588, IEEE, 2020
[28] K. Ma, X. Li, W. Chen, C. Zhang, and X. Wang, “Greengpu: A holistic approach to energy efficiency in gpu-cpu heterogeneous architectures,” in 2012 41st International Conference on Parallel Processing, pp. 48–57, IEEE, 2012
[29] T. Komoda, S. Hayashi, T. Nakada, S. Miwa, and H. Nakamura, “Power capping of cpu-gpu heterogeneous systems through coordinating dvfs and task mapping,” in 2013 IEEE 31st International Conference on Computer Design (ICCD), pp. 349–356, IEEE, 2013
[30] I. Paul, V. Ravi, S. Manne, M. Arora, and S. Yalamanchili, “Coordinated energy management in heterogeneous processors,” in SC ’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–12, IEEE, 2013
[31] I. Paul, W. Huang, M. Arora, and S. Yalamanchili, “Harmonia: Balancing compute and memory power in high-performance gpus,” in Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp. 54–65, ACM, 2015
[32] J. Guerreiro, A. Ilic, N. Roma, and P. Tomás, “Multi-kernel auto-tuning on gpus: Performance and energy-aware optimization,” in 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp. 438–445, IEEE, 2015
[33] A. Majumdar, L. Piga, I. Paul, J. L. Greathouse, W. Huang, and D. H. Albonesi, “Dynamic gpgpu power management using adaptive model predictive control,” in 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 613–624, IEEE, 2017
[34] K. Fan, B. Cosenza, and B. Juurlink, “Predictable gpus frequency scaling for energy and performance,” in Proceedings of the 48th International Conference on Parallel Processing, pp. 1–10, ACM, 2019
[35] J. Guerreiro, A. Ilic, N. Roma, and P. Tomás, “Dvfs-aware application classification to improve gpgpus energy efficiency,” Parallel Computing, vol. 83, pp. 93–117, 2019
[36] S. M. Nabavinejad, S. Reda, and M. Ebrahimi, “Coordinated batching and dvfs for dnn inference on gpu accelerators,” IEEE transactions on parallel and distributed systems, vol. 33, no. 10, pp. 2496–2508, 2022
[37] A. Sethia and S. Mahlke, “Equalizer: Dynamic tuning of gpu resources for efficient execution,” in 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 647–658, IEEE, 2014
[38] R. Nath and D. Tullsen, “The crisp performance model for dynamic voltage and frequency scaling in a gpgpu,” in Proceedings of the 48th international symposium on microarchitecture, pp. 281–293, 2015
[39] M. H. Santriaji and H. Hoffmann, “Grape: Minimizing energy for gpu applications with performance requirements,” in 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1–13, IEEE, 2016
[40] P. Zou, A. Li, K. Barker, and R. Ge, “Indicator-directed dynamic power management for iterative workloads on gpu-accelerated systems,” in 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), pp. 559–568, IEEE, 2020
[42] I. Paul, V. Ravi, S. Manne, M. Arora, and S. Yalamanchili, “Coordinated energy management in heterogeneous processors,” in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–12, 2013
[43] B. Hu and C. J. Rossbach, “Altis: Modernizing gpgpu benchmarks,” in 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 1–11, IEEE, 2020
[44] C. Weckert, L. Solis-Vasquez, J. Oppermann, A. Koch, and O. Sinnen, “Altis-sycl: Migrating altis benchmarking suite from cuda to sycl for gpus and fpgas,” in Proceedings of the SC’23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 547–555, 2023
[45] “ECP proxy apps suite.” https://proxyapps.exascaleproject.org/ecp-proxy-apps-suite/, 2025
[46] S. Farrell, M. Emani, J. Balma, L. Drescher, A. Drozd, A. Fink, G. Fox, D. Kanter, T. Kurth, P. Mattson, et al., “Mlperf™ hpc: A holistic benchmark suite for scientific machine learning on hpc systems,” in 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), pp. 33–45, IEEE, 2021
[47] Z. Jin and J. S. Vetter, “A benchmark suite for improving performance portability of the sycl programming model,” in 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 325–327, IEEE, 2023
[49] V. Kandiah, S. Peverelle, M. Khairy, J. Pan, A. Manjunath, T. G. Rogers, T. M. Aamodt, and N. Hardavellas, “Accelwattch: A power modeling framework for modern gpus,” in MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 738–753, 2021
[50] Q. Wang and X. Chu, “Gpgpu performance estimation with core and memory frequency scaling,” IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 12, pp. 2865–2881, 2020
[51] J. Guerreiro, A. Ilic, N. Roma, and P. Tomas, “Gpgpu power modeling for multi-domain voltage-frequency scaling,” in 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 789–800, IEEE, 2018
[52] Q. Wang and X. Chu, “Gpgpu power estimation with core and memory frequency scaling,” ACM SIGMETRICS Performance Evaluation Review, vol. 45, no. 2, pp. 73–78, 2017
[53] Y. Abe, H. Sasaki, S. Kato, K. Inoue, M. Edahiro, and M. Peres, “Power and performance characterization and modeling of gpu-accelerated systems,” in 2014 IEEE 28th international parallel and distributed processing symposium, pp. 113–122, IEEE, 2014
[54] B. Dutta, V. Adhinarayanan, and W.-c. Feng, “Gpu power prediction via ensemble machine learning for dvfs space exploration,” in Proceedings of the 15th ACM International Conference on Computing Frontiers, pp. 240–243, 2018
[55] G. Wu, J. L. Greathouse, A. Lyashevsky, N. Jayasena, and D. Chiou, “Gpgpu performance and power estimation using machine learning,” in 2015 IEEE 21st international symposium on high performance computer architecture (HPCA), pp. 564–576, IEEE, 2015
[56] Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan, “Large-scale parallel collaborative filtering for the netflix prize,” in Algorithmic Aspects in Information and Management: 4th International Conference, AAIM 2008, Shanghai, China, June 23-25, 2008. Proceedings 4, pp. 337–348, Springer, 2008
[57] B.-B. Cui, “Design and implementation of movie recommendation system based on knn collaborative filtering algorithm,” in ITM web of conferences, vol. 12, p. 04008, EDP Sciences, 2017
[58] H. Polat and W. Du, “Svd-based collaborative filtering with privacy,” in Proceedings of the 2005 ACM symposium on Applied computing, pp. 791–795, 2005
[59] D. Liang, R. G. Krishnan, M. D. Hoffman, and T. Jebara, “Variational autoencoders for collaborative filtering,” in Proceedings of the 2018 world wide web conference, pp. 689–698, 2018
[60] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” in Proceedings of the 26th International Conference on World Wide Web, WWW ’17, (Republic and Canton of Geneva, CHE), p. 173–182, International World Wide Web Conferences Steering Committee, 2017
[61] D. Van Der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark, and H. J. Berendsen, “Gromacs: fast, flexible, and free,” Journal of computational chemistry, vol. 26, no. 16, pp. 1701–1718, 2005
[62] L.-s. Atomic and M. M. P. Simulator, “Lammps,” available at: http://lammps.sandia.gov, 2013
[63] J. C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R. D. Skeel, L. Kale, and K. Schulten, “Scalable molecular dynamics with namd,” Journal of computational chemistry, vol. 26, no. 16, pp. 1781–1802, 2005
[64] K. Keahey, J. Anderson, Z. Zhen, P. Riteau, P. Ruth, D. Stanzione, M. Cevik, J. Colleran, H. S. Gunawi, C. Hammock, et al., “Lessons learned from the chameleon testbed,” in 2020 USENIX annual technical conference (USENIX ATC 20), pp. 219–233, 2020
[65] Y. Zhang, Q. Wang, Z. Lin, P. Xu, and B. Wang, “Improving gpu energy efficiency through an application-transparent frequency scaling policy with performance assurance,” in Proceedings of the Nineteenth European Conference on Computer Systems, pp. 769–785, 2024
[66] Z. Zhao, E. Rrapaj, S. Bhalachandra, B. Austin, H. A. Nam, and N. Wright, “Power analysis of nersc production workloads,” in Proceedings of the SC’23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1279–1287, 2023
[67] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen, “Collaborative filtering recommender systems,” in The adaptive web: methods and strategies of web personalization, pp. 291–324, Springer, 2007
[68] C. Delimitrou and C. Kozyrakis, “Paragon: Qos-aware scheduling for heterogeneous datacenters,” ACM SIGPLAN Notices, vol. 48, no. 4, pp. 77–88, 2013
[69] S. Salaria, A. Drozd, A. Podobas, and S. Matsuoka, “Predicting performance using collaborative filtering,” in 2018 IEEE International Conference on Cluster Computing (CLUSTER), pp. 504–514, IEEE, 2018

[1] [1]

Exascale computing study: Technology challenges in achieving exascale systems,

K. Bergman, S. Borkar, D. Campbell, W. Carlson, W. Dally, M. Den- neau, P. Franzon, W. Harrod, K. Hill, J. Hiller,et al., “Exascale computing study: Technology challenges in achieving exascale systems,” Defense Advanced Research Projects Agency Information Processing Techniques Office (DARPA IPTO), Tech. Rep, vol. 15, p. 181, 2008

work page 2008

[2] [2]

A strawman for an hpc powerstack,

C. Cantalupo, J. Eastep, S. Jana, M. Kondo, M. Maiterth, A. Marathe, T. Patki, B. Rountree, R. Sakamoto, M. Schulz,et al., “A strawman for an hpc powerstack,” tech. rep., Intel Corporation (United States); Lawrence Livermore National Lab.(LLNL . . . , 2018

work page 2018

[3] [3]

Rodinia: A benchmark suite for heterogeneous computing,

S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, “Rodinia: A benchmark suite for heterogeneous computing,” in2009 IEEE international symposium on workload characterization (IISWC), pp. 44–54, Ieee, 2009

work page 2009

[4] [4]

S. Hong and H. Kim, “An integrated GPU power and performance model,” in Proceedings of the 37th Annual International Symposium on Computer Architecture, pp. 280–289, 2010.

[5] [5]

J. Lee, V. Sathisha, M. Schulte, K. Compton, and N. S. Kim, “Improving throughput of power-constrained GPUs using dynamic voltage/frequency and core scaling,” in 2011 International Conference on Parallel Architectures and Compilation Techniques, pp. 111–120, IEEE, 2011.

[6] [7]

K. Fan, B. Cosenza, and B. Juurlink, “Predictable GPUs frequency scaling for energy and performance,” in Proceedings of the 48th International Conference on Parallel Processing, pp. 1–10, 2019.

[7] [11]

G. Ali, M. Side, S. Bhalachandra, N. J. Wright, and Y. Chen, “Performance-aware energy-efficient GPU frequency selection using DNN-based models,” in Proceedings of the 52nd International Conference on Parallel Processing, pp. 433–442, 2023.

[8] [13]

H. David, C. Fallin, E. Gorbatov, U. R. Hanebutte, and O. Mutlu, “Memory power management via dynamic voltage/frequency scaling,” in Proceedings of the 8th ACM International Conference on Autonomic Computing, pp. 31–40, 2011.

[9] [15]

X. Mei, X. Chu, H. Liu, Y.-W. Leung, and Z. Li, “Energy efficient real-time task scheduling on CPU-GPU hybrid clusters,” in IEEE INFOCOM 2017-IEEE Conference on Computer Communications, pp. 1–9, IEEE, 2017.

[10] [17]

Q. Zhu, B. Wu, X. Shen, L. Shen, and Z. Wang, “Co-run scheduling with power cap on integrated CPU-GPU systems,” in 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 967–977, IEEE, 2017.

[11] [21]

J. Guerreiro, A. Ilic, N. Roma, and P. Tomás, “Modeling and decoupling the GPU power consumption for cross-domain DVFS,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 11, pp. 2494–2506, 2019.

[12] [22]

X. Mei, Q. Wang, and X. Chu, “A survey and measurement study of GPU DVFS on energy conservation,” Digital Communications and Networks, vol. 3, no. 2, pp. 89–100, 2017.

[13] [24]

Q. Wang, C. Liu, and X. Chu, “GPGPU performance estimation for frequency scaling using cross-benchmarking,” in Proceedings of the 13th Annual Workshop on General Purpose Processing Using Graphics Processing Unit, pp. 31–40, 2020.

[14] [25]

G. Ali, S. Bhalachandra, N. J. Wright, M. Side, and Y. Chen, “Optimal GPU frequency selection using multi-objective approaches for HPC systems,” in 2022 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–7, IEEE, 2022.

[15] [26]

G. Ali, M. Side, S. Bhalachandra, N. J. Wright, and Y. Chen, “Performance-aware energy-efficient GPU frequency selection using DNN-based models,” in Proceedings of the 52nd International Conference on Parallel Processing, pp. 433–442, ACM, 2023.

[16] [27]

S. Ilager, R. Muralidhar, K. Ramamohanarao, and R. Buyya, “A data-driven frequency scaling approach for deadline-aware energy efficient scheduling on graphics processing units (GPUs),” in 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), pp. 579–588, IEEE, 2020.

[17] [28]

K. Ma, X. Li, W. Chen, C. Zhang, and X. Wang, “GreenGPU: A holistic approach to energy efficiency in GPU-CPU heterogeneous architectures,” in 2012 41st International Conference on Parallel Processing, pp. 48–57, IEEE, 2012.

[18] [29]

T. Komoda, S. Hayashi, T. Nakada, S. Miwa, and H. Nakamura, “Power capping of CPU-GPU heterogeneous systems through coordinating DVFS and task mapping,” in 2013 IEEE 31st International Conference on Computer Design (ICCD), pp. 349–356, IEEE, 2013.

[19] [30]

I. Paul, V. Ravi, S. Manne, M. Arora, and S. Yalamanchili, “Coordinated energy management in heterogeneous processors,” in SC ’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–12, IEEE, 2013.

[20] [31]

I. Paul, W. Huang, M. Arora, and S. Yalamanchili, “Harmonia: Balancing compute and memory power in high-performance GPUs,” in Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp. 54–65, ACM, 2015.

[21] [32]

J. Guerreiro, A. Ilic, N. Roma, and P. Tomás, “Multi-kernel auto-tuning on GPUs: Performance and energy-aware optimization,” in 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp. 438–445, IEEE, 2015.

[22] [33]

A. Majumdar, L. Piga, I. Paul, J. L. Greathouse, W. Huang, and D. H. Albonesi, “Dynamic GPGPU power management using adaptive model predictive control,” in 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 613–624, IEEE, 2017.

[23] [34]

K. Fan, B. Cosenza, and B. Juurlink, “Predictable GPUs frequency scaling for energy and performance,” in Proceedings of the 48th International Conference on Parallel Processing, pp. 1–10, ACM, 2019.

[24] [35]

J. Guerreiro, A. Ilic, N. Roma, and P. Tomás, “DVFS-aware application classification to improve GPGPUs energy efficiency,” Parallel Computing, vol. 83, pp. 93–117, 2019.

[25] [36]

S. M. Nabavinejad, S. Reda, and M. Ebrahimi, “Coordinated batching and DVFS for DNN inference on GPU accelerators,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 10, pp. 2496–2508, 2022.

[26] [37]

A. Sethia and S. Mahlke, “Equalizer: Dynamic tuning of GPU resources for efficient execution,” in 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 647–658, IEEE, 2014.

[27] [38]

R. Nath and D. Tullsen, “The CRISP performance model for dynamic voltage and frequency scaling in a GPGPU,” in Proceedings of the 48th International Symposium on Microarchitecture, pp. 281–293, 2015.

[28] [39]

M. H. Santriaji and H. Hoffmann, “GRAPE: Minimizing energy for GPU applications with performance requirements,” in 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1–13, IEEE, 2016.

[29] [40]

P. Zou, A. Li, K. Barker, and R. Ge, “Indicator-directed dynamic power management for iterative workloads on GPU-accelerated systems,” in 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), pp. 559–568, IEEE, 2020.

[30] [42]

I. Paul, V. Ravi, S. Manne, M. Arora, and S. Yalamanchili, “Coordinated energy management in heterogeneous processors,” in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–12, 2013.

[31] [43]

B. Hu and C. J. Rossbach, “Altis: Modernizing GPGPU benchmarks,” in 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 1–11, IEEE, 2020.

[32] [44]

C. Weckert, L. Solis-Vasquez, J. Oppermann, A. Koch, and O. Sinnen, “Altis-SYCL: Migrating Altis benchmarking suite from CUDA to SYCL for GPUs and FPGAs,” in Proceedings of the SC’23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 547–555, 2023.

[33] [45]

“ECP proxy apps suite.” https://proxyapps.exascaleproject.org/ecp-proxy-apps-suite/, 2025.

[34] [46]

S. Farrell, M. Emani, J. Balma, L. Drescher, A. Drozd, A. Fink, G. Fox, D. Kanter, T. Kurth, P. Mattson, et al., “MLPerf™ HPC: A holistic benchmark suite for scientific machine learning on HPC systems,” in 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), pp. 33–45, IEEE, 2021.

[35] [47]

Z. Jin and J. S. Vetter, “A benchmark suite for improving performance portability of the SYCL programming model,” in 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 325–327, IEEE, 2023.

[36] [49]

V. Kandiah, S. Peverelle, M. Khairy, J. Pan, A. Manjunath, T. G. Rogers, T. M. Aamodt, and N. Hardavellas, “AccelWattch: A power modeling framework for modern GPUs,” in MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 738–753, 2021.

[37] [50]

Q. Wang and X. Chu, “GPGPU performance estimation with core and memory frequency scaling,” IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 12, pp. 2865–2881, 2020.

[38] [51]

J. Guerreiro, A. Ilic, N. Roma, and P. Tomás, “GPGPU power modeling for multi-domain voltage-frequency scaling,” in 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 789–800, IEEE, 2018.

[39] [52]

Q. Wang and X. Chu, “GPGPU power estimation with core and memory frequency scaling,” ACM SIGMETRICS Performance Evaluation Review, vol. 45, no. 2, pp. 73–78, 2017.

[40] [53]

Y. Abe, H. Sasaki, S. Kato, K. Inoue, M. Edahiro, and M. Peres, “Power and performance characterization and modeling of GPU-accelerated systems,” in 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp. 113–122, IEEE, 2014.

[41] [54]

B. Dutta, V. Adhinarayanan, and W.-c. Feng, “GPU power prediction via ensemble machine learning for DVFS space exploration,” in Proceedings of the 15th ACM International Conference on Computing Frontiers, pp. 240–243, 2018.

[42] [55]

G. Wu, J. L. Greathouse, A. Lyashevsky, N. Jayasena, and D. Chiou, “GPGPU performance and power estimation using machine learning,” in 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 564–576, IEEE, 2015.

[43] [56]

Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan, “Large-scale parallel collaborative filtering for the Netflix Prize,” in Algorithmic Aspects in Information and Management: 4th International Conference, AAIM 2008, Shanghai, China, June 23-25, 2008. Proceedings 4, pp. 337–348, Springer, 2008.

[44] [57]

B.-B. Cui, “Design and implementation of movie recommendation system based on KNN collaborative filtering algorithm,” in ITM Web of Conferences, vol. 12, p. 04008, EDP Sciences, 2017.

[45] [58]

H. Polat and W. Du, “SVD-based collaborative filtering with privacy,” in Proceedings of the 2005 ACM Symposium on Applied Computing, pp. 791–795, 2005.

[46] [59]

D. Liang, R. G. Krishnan, M. D. Hoffman, and T. Jebara, “Variational autoencoders for collaborative filtering,” in Proceedings of the 2018 World Wide Web Conference, pp. 689–698, 2018.

[47] [60]

X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” in Proceedings of the 26th International Conference on World Wide Web, WWW ’17, (Republic and Canton of Geneva, CHE), pp. 173–182, International World Wide Web Conferences Steering Committee, 2017.

[48] [61]

D. Van Der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark, and H. J. Berendsen, “GROMACS: Fast, flexible, and free,” Journal of Computational Chemistry, vol. 26, no. 16, pp. 1701–1718, 2005.

[49] [62]

“LAMMPS: Large-scale Atomic/Molecular Massively Parallel Simulator,” available at: http://lammps.sandia.gov, 2013.

[50] [63]

J. C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R. D. Skeel, L. Kale, and K. Schulten, “Scalable molecular dynamics with NAMD,” Journal of Computational Chemistry, vol. 26, no. 16, pp. 1781–1802, 2005.

[51] [64]

K. Keahey, J. Anderson, Z. Zhen, P. Riteau, P. Ruth, D. Stanzione, M. Cevik, J. Colleran, H. S. Gunawi, C. Hammock, et al., “Lessons learned from the Chameleon testbed,” in 2020 USENIX Annual Technical Conference (USENIX ATC 20), pp. 219–233, 2020.

[52] [65]

Y. Zhang, Q. Wang, Z. Lin, P. Xu, and B. Wang, “Improving GPU energy efficiency through an application-transparent frequency scaling policy with performance assurance,” in Proceedings of the Nineteenth European Conference on Computer Systems, pp. 769–785, 2024.

[53] [66]

Z. Zhao, E. Rrapaj, S. Bhalachandra, B. Austin, H. A. Nam, and N. Wright, “Power analysis of NERSC production workloads,” in Proceedings of the SC’23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1279–1287, 2023.

[54] [67]

J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen, “Collaborative filtering recommender systems,” in The adaptive web: methods and strategies of web personalization, pp. 291–324, Springer, 2007.

[55] [68]

C. Delimitrou and C. Kozyrakis, “Paragon: QoS-aware scheduling for heterogeneous datacenters,” ACM SIGPLAN Notices, vol. 48, no. 4, pp. 77–88, 2013.

[56] [69]

S. Salaria, A. Drozd, A. Podobas, and S. Matsuoka, “Predicting performance using collaborative filtering,” in 2018 IEEE International Conference on Cluster Computing (CLUSTER), pp. 504–514, IEEE, 2018.