arxiv: 2604.26815 · v1 · submitted 2026-04-29 · 💻 cs.SE · cs.PF

What Is the Cost of Energy Monitoring? An Empirical Study on the Overhead of RAPL-Based Tools

Pith reviewed 2026-05-07 13:04 UTC · model grok-4.3

classification 💻 cs.SE cs.PF
keywords RAPL · energy monitoring · measurement overhead · power profiling · MSR access · performance tools · benchmarking

The pith

Existing RAPL energy tools add 0.25 to 46.75 percent time overhead at 1 kHz polling while simplified versions stay near the no-tool baseline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper measures the extra time and energy consumed by different programs that read RAPL power counters one thousand times per second. It runs seven tools, including two simple ones the authors wrote, against a no-monitoring baseline on six NAS benchmark functions and finds that most existing user-space tools slow the program noticeably. A second experiment times the basic read operations themselves and shows system calls take longer than direct register reads. The results matter because high-frequency energy monitoring is used to study software efficiency, yet the act of monitoring can change the runtimes and energy figures it is meant to record. If these overheads are ignored, the reported energy values for the same code can differ depending on which tool is chosen.

Core claim

Existing user-space RAPL tools introduce time overhead ranging from 0.25% to 46.75% at 1 kHz polling, whereas the authors' simplified tools maintain time overhead levels close to the no-tool baseline. System calls are slower than rdmsr, which is slower than traditionally long-running instructions like cpuid. RAPL-based energy measurement can be substantially improved by simplifying tool design and employing lower-level instructions to access RAPL values.
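
The percentages in this claim are overheads relative to the no-tool baseline. A minimal sketch of that calculation follows, assuming the usual definition (monitored time minus baseline time, divided by baseline time). The function name and the sample timings are hypothetical and are not taken from the paper.

```python
def relative_overhead_percent(baseline: float, with_tool: float) -> float:
    """Extra cost of a monitored run, as a percentage of the unmonitored baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (with_tool - baseline) / baseline * 100.0


# Hypothetical wall-clock timings in seconds, chosen only to illustrate the formula.
baseline_s = 20.0
monitored_s = 21.0
print(f"{relative_overhead_percent(baseline_s, monitored_s):.2f}%")
```

The same function applies to energy if both arguments are joules instead of seconds.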

What carries the argument

Different methods for polling RAPL model-specific registers (user-space applications, kernel modules, direct rdmsr instructions, and system calls such as sys/proc_read) and the execution latency each method adds at high frequency.
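
To make the polling step concrete, here is a rough sketch of one common user-space path on Linux with Intel CPUs: reading the package energy counter through the `/dev/cpu/N/msr` device and converting raw ticks to joules. It is an illustration under stated assumptions, not the implementation of any of the seven tools. The register addresses and bit layout follow the Intel SDM; the helper names and the raw sample values are hypothetical.

```python
import os
import struct

# Register addresses as documented in the Intel SDM.
MSR_RAPL_POWER_UNIT = 0x606
MSR_PKG_ENERGY_STATUS = 0x611


def read_msr(cpu: int, register: int) -> int:
    """Read one 64-bit MSR via the Linux msr driver (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The driver interprets the file offset as the register address.
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)


def energy_unit_joules(power_unit_raw: int) -> float:
    """Bits 12:8 hold the energy status unit ESU; one tick is 1 / 2**ESU joules."""
    esu = (power_unit_raw >> 8) & 0x1F
    return 1.0 / (1 << esu)


def energy_delta_joules(prev_raw: int, curr_raw: int, unit_j: float) -> float:
    """Energy between two samples, tolerating one wraparound of the 32-bit counter."""
    ticks = ((curr_raw & 0xFFFFFFFF) - (prev_raw & 0xFFFFFFFF)) & 0xFFFFFFFF
    return ticks * unit_j


# Hypothetical raw values, used so the sketch runs without MSR access.
unit = energy_unit_joules(0x000A0E03)  # ESU field is 14 in this sample value
print(energy_delta_joules(0xFFFFFFF0, 0x00000010, unit))  # 32 ticks across a wrap
```

Each `read_msr` call here opens the device, issues a system call, and closes it again, which is exactly the kind of per-poll cost that adds up at 1 kHz; a leaner design would at least keep the file descriptor open between polls.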

If this is right

  • High-frequency energy profiling can distort measured runtimes by tens of percent unless the monitoring tool is deliberately minimal.
  • Direct register reads and simplified kernel modules keep added time near the no-monitoring case.
  • System calls for reading RAPL counters are measurably slower than inline register instructions.
  • Energy figures collected with different tools on identical code can differ because of the tool's own overhead.
  • Practitioners can reduce distortion by preferring lower-level access methods when 1 kHz sampling is required.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If overhead depends on workload type, then energy reports for web servers or database queries could show different distortion than the compute-heavy NAS tests.
  • Library authors could expose a documented cost model so users know whether 1 kHz polling is safe for their particular experiment.
  • Future hardware might add sampling modes that avoid polling entirely, removing the latency trade-off measured here.
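
The cost-model idea can be sketched as a first-order estimate: if each poll takes a fixed amount of time away from the workload, the time overhead is the per-poll cost multiplied by the polling frequency. This is an editorial simplification, not a model from the paper; it ignores indirect effects such as cache pollution and context switches, and the function names and numbers below are hypothetical.

```python
def polling_overhead_percent(per_poll_us: float, frequency_hz: float) -> float:
    """Share of each second spent polling, assuming polls steal time from the workload."""
    return per_poll_us * 1e-6 * frequency_hz * 100.0


def max_frequency_hz(per_poll_us: float, budget_percent: float) -> float:
    """Highest polling rate that keeps the estimated overhead within the budget."""
    if per_poll_us <= 0:
        raise ValueError("per_poll_us must be positive")
    return (budget_percent / 100.0) / (per_poll_us * 1e-6)


# Hypothetical per-poll costs in microseconds, not measurements from the paper.
for cost_us in (1.0, 10.0, 100.0):
    pct = polling_overhead_percent(cost_us, 1000.0)
    print(f"{cost_us:6.1f} us per poll at 1 kHz -> about {pct:.1f}% of wall time")
```

Under this model a tool would need a per-poll cost in the hundreds of microseconds to reach the tens-of-percent overheads reported above, which suggests where to look when a tool is slow.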

Load-bearing premise

That the six NAS benchmark functions run on one hardware platform produce overhead percentages that will hold for other workloads and other machines.

What would settle it

Repeating the 1 kHz polling experiments with the same seven tools on a different CPU model or with a memory-bound or I/O-bound workload and obtaining time-overhead values outside the reported 0.25-46.75 percent range.

Figures

Figures reproduced from arXiv: 2604.26815 by Jeremy Diamond, Vincenzo Stoico.

Figure 1: Flamegraphs comparison in percent of compute
Figure 2: Runner and System Under Test Experimental Steps
Figure 3: Profiling time (in seconds) for each tool across
Figure 4: Execution time (in nanoseconds) per instruction
Figure 5: Lineplot of Energy and Time Relative Overhead for
Original abstract

The Running Average Power Limit (RAPL) interface is widely used to estimate software energy consumption via CPU and DRAM counters, but tool design differences and high-frequency polling can introduce measurement overhead, namely, extra time and energy consumed by the tool itself. This paper quantifies the impact of RAPL-based tools on high-frequency (1 kHz) energy monitoring and investigates mitigation strategies. We conduct two controlled experiments: the first evaluates seven tools, including a user-space application and a kernel module developed by the authors, against a no-tool baseline, using six NAS Benchmark functions to quantify overhead. The second experiment isolates and times key functions for polling Model-Specific Registers (MSRs) (rdmsr and sys/proc_read) to estimate their execution latencies and identify potential slowdowns. The results show that existing user-space tools can introduce substantial time overhead at 1 kHz, whereas our tools significantly reduce system call overhead and inline math overhead. The time overhead of existing tools ranges from 0.25% to 46.75%. Our solutions maintain time overhead levels close to the baseline. We also find that system calls are slower than rdmsr, which in turn is slower than traditionally long-running instructions like cpuid. These findings indicate that RAPL-based energy measurement can be substantially improved by simplifying tool design and employing lower-level instructions to access RAPL values. Our findings provide guidance for practitioners on how to develop high-frequency energy profiling tools, show possible situations that can skew energy values, and demonstrate that access to RAPL values can be faster using specific techniques.
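
The second experiment described in the abstract times individual access operations. The general shape of such a micro-benchmark can be sketched as below, with the caveat that this is not the authors' harness: interpreter overhead dominates at this scale, `rdmsr` cannot be issued from user-space Python, and the `os.pread` call on a temporary file merely stands in for "an operation that crosses into the kernel". The helper name and repeat count are hypothetical.

```python
import os
import statistics
import tempfile
import time


def median_latency_ns(op, repeats: int = 10_000) -> float:
    """Time `op` repeatedly and return the median per-call latency in nanoseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        op()
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples)


# Stand-in for a kernel-crossing read: 8 bytes from a temporary file via pread.
with tempfile.TemporaryFile() as f:
    f.write(b"\x00" * 8)
    f.flush()
    fd = f.fileno()
    syscall_ns = median_latency_ns(lambda: os.pread(fd, 8, 0))

# Stand-in for "no work": measures the timing loop's own cost.
noop_ns = median_latency_ns(lambda: None)

print(f"pread: {syscall_ns:.0f} ns, no-op: {noop_ns:.0f} ns")
```

Reporting the median rather than the mean keeps occasional scheduler interruptions from dominating the estimate, and the no-op figure gives a floor to subtract when comparing operations.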

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that RAPL-based energy monitoring tools incur measurable time overhead at 1 kHz polling, with existing user-space tools showing 0.25%–46.75% overhead relative to a no-tool baseline across six NAS benchmark functions, while the authors' simplified user-space application and kernel module keep overhead near baseline levels. A second experiment isolates MSR access latencies, finding syscalls slower than rdmsr (itself slower than cpuid), and concludes that simplifying tool design and using lower-level instructions can substantially reduce overhead. The work provides practitioner guidance on avoiding measurement artifacts in high-frequency energy profiling.

Significance. If the reported overhead ranges and latency comparisons hold under the described controls, this empirical study is significant for the software energy measurement community. It supplies concrete evidence that tool implementation choices directly affect the validity of energy data, quantifies the problem with controlled baseline comparisons, and demonstrates practical mitigations via simplified designs. The dual-experiment structure (workload-level overhead plus micro-benchmarked access costs) is a methodological strength that could inform future tool development and encourage routine overhead validation in energy studies.

major comments (1)
  1. The central guidance for practitioners rests on overhead percentages measured exclusively with six NAS Parallel Benchmark functions on one hardware platform. These kernels are dominated by predictable floating-point compute loops and do not exercise I/O, frequent system calls, or cache contention that could alter per-poll costs. Because the manuscript derives design recommendations from these specific numbers, the limited workload and platform sampling is load-bearing for the claim that the authors' tools 'significantly reduce' overhead in general; additional benchmarks or an explicit limitations discussion is needed to support broader applicability.
minor comments (1)
  1. The abstract states that seven tools were evaluated but does not enumerate them or their access mechanisms; a summary table in the methods section would improve traceability of the 0.25%–46.75% range.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. The single major comment highlights a valid concern about the generalizability of our results, which we address by committing to an explicit limitations discussion in the revised manuscript.

Point-by-point responses
  1. Referee: The central guidance for practitioners rests on overhead percentages measured exclusively with six NAS Parallel Benchmark functions on one hardware platform. These kernels are dominated by predictable floating-point compute loops and do not exercise I/O, frequent system calls, or cache contention that could alter per-poll costs. Because the manuscript derives design recommendations from these specific numbers, the limited workload and platform sampling is load-bearing for the claim that the authors' tools 'significantly reduce' overhead in general; additional benchmarks or an explicit limitations discussion is needed to support broader applicability.

    Authors: We agree that the evaluation is scoped to six NAS Parallel Benchmark kernels on a single platform and that these workloads are primarily compute-bound floating-point loops. This choice was deliberate to isolate polling overhead under controlled, repeatable conditions typical of many energy-profiling studies, but we recognize it does not cover I/O-intensive, system-call-heavy, or cache-contention scenarios that could change per-poll costs. We will add a dedicated Limitations section that (1) explicitly states the workload and platform constraints, (2) cautions against direct extrapolation to other workload classes, and (3) notes that the observed relative ordering of tool overheads (user-space vs. kernel-module vs. baseline) is expected to hold for similar compute-intensive polling patterns. This revision will make the scope of the practitioner guidance transparent without requiring new experiments.
    revision: yes

Circularity Check

0 steps flagged

No circularity: direct empirical timing measurements against external baseline

The paper conducts two controlled experiments that directly measure execution times of RAPL polling operations and compare them to a no-tool baseline using NAS benchmarks on specific hardware. All reported overhead percentages (0.25%–46.75% for existing tools, near-baseline for authors' tools) and latency comparisons (rdmsr vs. syscalls) are obtained from raw timing data collected in the experiments. No equations, fitted parameters, predictions, or models are derived; no self-citations support load-bearing claims; and no ansatz or uniqueness theorems are invoked. The derivation chain consists solely of experimental setup, data collection, and statistical summarization of observed differences, which does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The study rests on standard assumptions about benchmark representativeness and hardware counter accuracy; no free parameters, invented entities, or non-standard axioms are introduced in the abstract.
