
arxiv: 2605.27601 · v1 · pith:ZPW62TZ2new · submitted 2026-05-26 · 💻 cs.DC · cs.LG · cs.PF

A Methodology to Assess Power Modeling in Energy-Aware Federated Learning on Heterogeneous Mobile Devices

Pith reviewed 2026-06-29 15:22 UTC · model grok-4.3

classification 💻 cs.DC · cs.LG · cs.PF
keywords CPU power estimation · federated learning · energy efficiency · mobile devices · ARM SoCs · power modeling · heterogeneous computing

The pith

An analytical CMOS-based CPU power model achieves under 10% error on Android devices and enables 1.4 times lower energy use in federated learning than approximate models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a software-only methodology to estimate CPU power on heterogeneous multi-cluster ARM SoCs by mapping voltage rails to clusters. This allows use of an analytical CMOS-based model instead of the simplified approximations common in energy-aware federated learning. On two commodity Android devices the analytical model stays within 10% error while approximations reach errors as high as 959%. When plugged into the AnycostFL framework the accurate model delivers the target 80% accuracy at 1.4 times lower energy consumption.

Core claim

A reproducible CPU power estimation methodology that combines a rail-to-cluster mapping technique with the analytical CMOS-based model predicts CPU power with errors below 10% on heterogeneous Android devices, enabling the same 80% model accuracy in AnycostFL while consuming 1.4 times less energy than when the approximate model is used.
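
For orientation, the two model families being compared can be written out. The page does not reproduce the paper's equations, so the forms below are assumptions: the first is the textbook CMOS dynamic-power expression, and the second is the cubic-in-frequency simplification commonly seen in energy-aware FL work. The symbols $\alpha_c$, $C_{\mathrm{eff},c}$, $V_c$, $f_c$ and $\kappa$ are illustrative names, not the authors' notation.

```latex
% Analytical CMOS dynamic power of cluster c (textbook form, assumed here)
P_{\mathrm{dyn},c} = \alpha_c \, C_{\mathrm{eff},c} \, V_c^{2} \, f_c

% Frequency-only approximation (assumed form): voltage is folded into a constant
P_{\mathrm{approx},c} \approx \kappa \, f_c^{3}
```

The analytical form needs the per-cluster supply voltage $V_c$, which is what the rail-to-cluster mapping is meant to provide; the approximation avoids it at the cost of assuming a fixed voltage-frequency relationship.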

What carries the argument

Rail-to-cluster mapping technique that retrieves cluster-level supply voltage from software-accessible information on multi-cluster ARM SoCs.
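
A minimal sketch of the software-only inputs such a mapping would draw on, assuming a Linux kernel that exposes the standard CPUFreq and regulator sysfs attributes (`related_cpus`, `scaling_cur_freq`, `microvolts`). The helper names and the `rail_map` argument are hypothetical; how the paper actually decides which rail feeds which cluster is not described on this page, and on stock Android some of these files may not be readable without elevated privileges.

```python
from pathlib import Path


def read_clusters(cpufreq_root="/sys/devices/system/cpu/cpufreq"):
    """Return {policy_name: {"cpus": [...], "freq_khz": int}} from CPUFreq sysfs.

    Each policy directory groups the cores that share one frequency domain,
    which on multi-cluster ARM SoCs typically corresponds to one cluster.
    """
    clusters = {}
    for policy in sorted(Path(cpufreq_root).glob("policy*")):
        cpus = [int(c) for c in (policy / "related_cpus").read_text().split()]
        freq_khz = int((policy / "scaling_cur_freq").read_text().strip())
        clusters[policy.name] = {"cpus": cpus, "freq_khz": freq_khz}
    return clusters


def read_rail_microvolts(rail_dir):
    """Read a regulator's current voltage (in microvolts) from its sysfs directory."""
    return int((Path(rail_dir) / "microvolts").read_text().strip())


def cluster_voltages(rail_map):
    """Apply a rail-to-cluster map and return the supply voltage in volts per cluster.

    rail_map is a hypothetical {policy_name: regulator_dir} dictionary; building
    it correctly is the part the paper's mapping technique is responsible for.
    """
    return {name: read_rail_microvolts(d) / 1e6 for name, d in rail_map.items()}
```

Passing the sysfs roots as arguments keeps the sketch testable against a fake directory tree, which is also a convenient way to check a candidate mapping offline before running it on a device.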

If this is right

  • Energy-aware federated learning frameworks can select computation schedules that actually minimize energy rather than misestimate it.
  • Approximate power models risk leading to decisions that waste energy while still reaching the target accuracy.
  • The methodology removes the need for external measurement hardware when applying analytical models to commodity mobile devices.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same mapping approach could be tested on non-Android ARM platforms or other energy-aware distributed training systems.
  • Extending the model to include GPU or memory power on the same devices would give a fuller picture of total energy.
  • If the mapping holds across more SoC vendors, it would allow broader adoption of analytical models in mobile machine learning without custom hardware.

Load-bearing premise

The rail-to-cluster mapping can reliably retrieve cluster-level supply voltages on heterogeneous ARM SoCs using only software-accessible information without external hardware or extra device calibration.

What would settle it

Direct power measurements on additional Android devices with different multi-cluster SoC designs, comparing the analytical model's predictions against measured values across varied workloads.
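
Such a comparison could be scored along the following lines. This is only a sketch: the page does not say whether the paper reports mean or worst-case error, so the worst-case aggregation and the function names here are assumptions.

```python
def relative_error_pct(predicted_w, measured_w):
    """Absolute relative error of a power prediction, in percent."""
    if measured_w <= 0:
        raise ValueError("measured power must be positive")
    return abs(predicted_w - measured_w) / measured_w * 100.0


def compare_models(samples):
    """Worst-case percentage error of each model over a set of samples.

    samples: list of (measured_w, analytical_w, approximate_w) tuples,
    one per workload or frequency point, all in watts.
    """
    worst_analytical = max(relative_error_pct(a, m) for m, a, _ in samples)
    worst_approximate = max(relative_error_pct(p, m) for m, _, p in samples)
    return worst_analytical, worst_approximate
```

Running this per device and per workload, rather than pooling everything, would show whether the sub-10% bound holds uniformly or only on average.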

Figures

Figures reproduced from arXiv: 2605.27601 by Chaimae Jallouli, Karim Boubouh, Robert Basmadjian.

Figure 1: Main pipeline for computing CPU dynamic power on heterogeneous ARM-based mobile …
Figure 2: Dynamic power prediction comparison across analytical and approximate models on …
Figure 3: Cumulative computation energy vs. accuracy of AnycostFL on (a) Fashion-MNIST and …

Estimating CPU power on heterogeneous ARM-based commodity devices is challenging due to limited access to CPU's voltage domains. As a result, state-of-the-art energy-aware Federated Learning (FL) frameworks typically rely on simplified approximate power models to estimate computation energy, rather than the more accurate analytical CMOS-based model. To bridge this gap, we propose a reproducible CPU power estimation methodology combined with a rail-to-cluster mapping technique to retrieve cluster-level supply voltage. We evaluate our approach on two commodity Android devices and show that the analytical model predicts CPU power with errors below 10%, whereas the approximate model incurs errors of up to 959%. Using AnycostFL, a state-of-the-art energy-aware FL framework, we show that the analytical model achieves the same 80% model accuracy while consuming 1.4x less energy than the approximate model. These results highlight that approximate models can severely misestimate computation energy and lead to suboptimal decisions. This work facilitates the use of analytical CPU power models on heterogeneous multi-cluster ARM-based mobile SoCs without additional hardware support or external power measurement tools.
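
The "same accuracy at lower energy" comparison in the abstract can be read off cumulative-energy curves like those in Figure 3. The sketch below shows one way to do that, assuming per-round lists of cumulative energy and accuracy for each run; the function names and the data layout are hypothetical and not taken from AnycostFL.

```python
def energy_to_accuracy(cumulative_energy_j, accuracy, target=0.80):
    """Cumulative energy at the first round whose accuracy meets the target.

    Both lists are indexed by FL round. Returns None if the target is never reached.
    """
    for energy, acc in zip(cumulative_energy_j, accuracy):
        if acc >= target:
            return energy
    return None


def energy_ratio(approx_run, analytical_run, target=0.80):
    """Energy spent under the approximate model divided by that under the analytical one.

    Each run is a (cumulative_energy_j, accuracy) pair of per-round lists.
    """
    e_approx = energy_to_accuracy(*approx_run, target=target)
    e_analytical = energy_to_accuracy(*analytical_run, target=target)
    if e_approx is None or e_analytical is None:
        raise ValueError("target accuracy not reached in one of the runs")
    return e_approx / e_analytical
```

A ratio above 1 means the run guided by the approximate model spent more energy to reach the target; the paper reports this ratio as 1.4 at 80% accuracy.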

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes a reproducible methodology for CPU power estimation on heterogeneous multi-cluster ARM SoCs that combines an analytical CMOS-based power model with a rail-to-cluster mapping technique to recover per-cluster supply voltage from software-accessible registers. Evaluated on two Android devices, the analytical model achieves <10% prediction error while an approximate model reaches errors of up to 959%; when plugged into AnycostFL, the analytical model yields the same 80% target accuracy at 1.4x lower energy.

Significance. If the rail-to-cluster mapping proves robust, the work would enable more accurate energy-aware device selection and scheduling in federated learning on commodity mobile hardware without external measurement equipment, directly addressing a practical barrier that currently forces frameworks to rely on crude approximations.

major comments (2)
  1. [Methodology] Methodology (rail-to-cluster mapping): the central claim that the analytical model yields <10% error depends on accurate cluster-level voltage as input to the CMOS power equation; the manuscript validates the mapping only on the two tested devices and provides no independent cross-device verification or parameter-free derivation that would establish correctness on unseen heterogeneous ARM SoCs.
  2. [Evaluation] Evaluation section: the reported 1.4x energy reduction in AnycostFL is obtained by substituting the two power models into the same FL workload; without evidence that the mapping was derived without external hardware measurements during development, the error bounds and energy savings cannot be assumed to transfer beyond the two evaluated devices.
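
The first comment can be made quantitative with a short sensitivity argument. Assuming the analytical model has the textbook quadratic dependence on supply voltage (an assumption, since this page does not reproduce the paper's equation), a relative voltage error $\varepsilon$ caused by a wrong rail assignment propagates as follows, where $\hat{V}$ and $\hat{P}_{\mathrm{dyn}}$ denote the mis-mapped voltage and the resulting prediction.

```latex
\hat{V} = (1+\varepsilon)\,V
\quad\Longrightarrow\quad
\frac{\hat{P}_{\mathrm{dyn}} - P_{\mathrm{dyn}}}{P_{\mathrm{dyn}}}
  = (1+\varepsilon)^{2} - 1
  = 2\varepsilon + \varepsilon^{2}

% Example: \varepsilon = 0.05 gives 2(0.05) + 0.05^{2} = 0.1025, i.e. about 10.25%
```

Under that assumption, a 5% voltage error alone would use up roughly the whole 10% error budget, which is why the correctness of the mapping carries so much weight.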

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address the two major comments point-by-point below, providing clarifications on the methodology and evaluation while proposing targeted revisions to improve the manuscript.

  1. Referee: [Methodology] Methodology (rail-to-cluster mapping): the central claim that the analytical model yields <10% error depends on accurate cluster-level voltage as input to the CMOS power equation; the manuscript validates the mapping only on the two tested devices and provides no independent cross-device verification or parameter-free derivation that would establish correctness on unseen heterogeneous ARM SoCs.

    Authors: The rail-to-cluster mapping is constructed exclusively from standard, software-accessible Linux kernel interfaces (CPUFreq sysfs entries and device-tree voltage domain descriptions) that are present on heterogeneous ARM SoCs; no device-specific parameters or external calibration are required. We will revise the methodology section to include the complete derivation steps, pseudocode, and an explicit statement that the procedure is parameter-free once the kernel-exposed rails are read. While we acknowledge that additional devices would further strengthen the generalizability claims, the two evaluated SoCs already differ in cluster count, voltage domain organization, and kernel version, providing initial evidence of transferability. The source code for the mapping will remain publicly available to enable independent verification. revision: partial

  2. Referee: [Evaluation] Evaluation section: the reported 1.4x energy reduction in AnycostFL is obtained by substituting the two power models into the same FL workload; without evidence that the mapping was derived without external hardware measurements during development, the error bounds and energy savings cannot be assumed to transfer beyond the two evaluated devices.

    Authors: No external hardware measurements were used at any stage, including during development of the mapping. All voltage values were obtained directly from the same software registers later used at runtime; the <10% error figures were computed against on-device power readings collected via identical software interfaces. We will add an explicit paragraph in the evaluation section documenting this process and reiterating that the methodology requires no external equipment. Consequently, the 1.4x energy saving is a direct consequence of substituting the more accurate analytical model (whose error was measured on-device) into AnycostFL, and the same substitution can be performed on any other device exposing the same kernel interfaces. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical methodology validated by direct device measurements


The paper describes a reproducible methodology for retrieving cluster-level voltage on heterogeneous ARM SoCs and compares an analytical CMOS power model against an approximate model through experiments on two Android devices. The error bounds (<10% vs. up to 959%) and the 1.4x energy saving in AnycostFL are obtained by running the identical FL workload under each model and recording the outcomes; by construction, these results do not reduce to fitted parameters or self-citations. The central claims rest on on-device measurements that require no external hardware, rather than on any derivation that is definitionally equivalent to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the rail-to-cluster mapping can be performed from software-accessible registers alone and that the two tested devices are representative of heterogeneous ARM mobile SoCs. No free parameters are explicitly fitted in the abstract; the analytical CMOS model itself is treated as standard physics.

axioms (1)
  • domain assumption: The analytical CMOS power equation remains valid for modern mobile ARM clusters when supplied with accurate per-cluster voltage.
    Invoked when claiming <10% prediction error; the abstract treats the CMOS model as the ground truth against which approximations are judged.


