
arxiv: 2605.18862 · v1 · pith:SGAX56VZnew · submitted 2026-05-15 · 💻 cs.LG · cs.AI · cs.CR

Towards Family-Grouped Hierarchical Federated Learning on Sub-5KB Models: A Feasibility Study of Privacy-Preserving ECG Monitoring for Ultra-Resource-Constrained Wearables

Pith reviewed 2026-05-20 20:44 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CR
keywords federated learning · ECG monitoring · wearable devices · privacy-preserving machine learning · hierarchical federated learning · resource-constrained devices · arrhythmia detection · tiny neural networks

The pith

Family-Grouped Hierarchical Federated Learning cuts communication volume by 76.7 percent for ECG models on sub-5KB wearables.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that grouping devices by family for local aggregation before global updates makes federated learning practical on devices too small for standard models or high communication loads. It pairs this hierarchy with a purpose-built 669-parameter CNN-LSTM that fits in 4.65 KB of flash memory after INT8 quantization. Experiments on the MIT-BIH database indicate that the hierarchy alone cuts communication volume by 76.7% relative to ordinary federated averaging at comparable accuracy, and that combining it with the tiny model brings total communication down to 0.31% of the baseline, roughly one-third of one percent, at 91.9% accuracy. A sympathetic reader would care because the combination opens a route to private, always-on arrhythmia screening on everyday ultra-low-power watches and patches without shipping raw heart signals to any central server.

Core claim

Family-FL reduces communication volume by 76.7% compared to FedAvg while maintaining comparable accuracy. The Tiny CNN-LSTM model with 669 parameters achieves 91.9 +/- 1.2% accuracy, macro-F1 of 0.483 +/- 0.031, and per-class F1 of 0.80 for ventricular arrhythmia detection on the MIT-BIH Arrhythmia Database across five independent runs.
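
The gap between 91.9% accuracy and a macro-F1 of 0.483 is the pattern class imbalance tends to produce: accuracy is dominated by the majority class, while macro-F1 weights every class equally. A minimal Python sketch of that effect follows. The three-class confusion matrix is a made-up illustration, not MIT-BIH data, and the function names are hypothetical.

```python
def per_class_f1(cm):
    """Return the F1 score of each class for a square matrix cm[true][pred]."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted as something else
        fp = sum(cm[r][k] for r in range(n)) - tp   # predicted k, actually something else
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def accuracy(cm):
    """Fraction of all samples that sit on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


# Hypothetical counts: one dominant class and two rare ones.
cm = [
    [900, 5, 5],   # majority class, almost always classified correctly
    [40, 8, 2],    # rare class, mostly absorbed by the majority
    [30, 2, 8],    # rare class, mostly absorbed by the majority
]
f1 = per_class_f1(cm)
macro_f1 = sum(f1) / len(f1)
print(f"accuracy={accuracy(cm):.3f}  macro-F1={macro_f1:.3f}")
```

With these illustrative counts the accuracy is 0.916 while the macro-F1 is about 0.498, so a high accuracy and a macro-F1 near 0.5 are not contradictory when rare classes are poorly detected.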

What carries the argument

Family-Grouped Hierarchical Federated Learning (Family-FL), a three-tier architecture that performs intra-family aggregation as a natural privacy boundary before global synchronization, together with a hardware-constrained 669-parameter INT8-quantized CNN-LSTM that occupies 4.65 KB Flash and 2.95 KB RAM.
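
The aggregation flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the tiers are taken to be devices, a family aggregator, and a global server; each family aggregates once per global round; and averaging is weighted by local sample count as in FedAvg. The function names and toy parameter vectors are hypothetical.

```python
def weighted_average(models, weights):
    """Average equally shaped parameter vectors, weighted by sample count."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[i] for m, w in zip(models, weights)) / total for i in range(dim)]


def family_fl_round(families):
    """One global round: aggregate inside each family, then across families.

    families: list of families, each a list of (params, n_samples) pairs.
    """
    family_models, family_sizes = [], []
    for members in families:
        params = [p for p, _ in members]
        counts = [n for _, n in members]
        # Family tier: only this aggregate leaves the household.
        family_models.append(weighted_average(params, counts))
        family_sizes.append(sum(counts))
    # Global tier: one upload per family instead of one per device.
    return weighted_average(family_models, family_sizes)


# Toy example: a two-device family and a single-device family.
families = [
    [([1.0, 0.0], 10), ([3.0, 2.0], 30)],
    [([5.0, 4.0], 60)],
]
global_model = family_fl_round(families)
print(global_model)
```

Under these assumptions the two-stage average equals the flat sample-weighted average over all devices, so any communication saving comes from sending one aggregate per family upstream rather than from changing the result of a single aggregation step.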

If this is right

  • Continuous privacy-preserving ECG monitoring becomes feasible on microcontrollers with only a few kilobytes of memory.
  • Communication overhead drops enough to make federated training viable for battery-powered wearable sensors.
  • Ventricular arrhythmia detection reaches a per-class F1 of 0.80, which is clinically useful for home preliminary screening.
  • Total communication falls to 0.31 percent of standard FedAvg while accuracy stays within a few percentage points.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same family-style grouping could be tested on other household or workplace sensor streams such as blood-pressure or activity data.
  • Adding lightweight differential privacy noise inside the family tier might strengthen guarantees without destroying the communication savings.
  • If real-device trials match the simulations, the architecture could support large-scale deployment of federated health models on existing low-cost wearables.
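
The differential-privacy extension suggested above could take the form of the standard Gaussian mechanism applied at the family aggregator. The Python sketch below is an assumption about how that might look, not something the paper describes: each device update is clipped to an L2 norm bound, the clipped updates are summed, and Gaussian noise scaled to the bound is added before averaging. The names `clip_l2`, `noisy_family_mean`, and `noise_multiplier` are hypothetical, and no privacy accounting (epsilon, delta) is performed here.

```python
import math
import random


def clip_l2(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def noisy_family_mean(updates, max_norm, noise_multiplier, rng):
    """Clip each update, sum, add Gaussian noise per coordinate, then average."""
    clipped = [clip_l2(u, max_norm) for u in updates]
    dim = len(clipped[0])
    sigma = noise_multiplier * max_norm  # noise std tied to the clipping bound
    noisy_sum = [sum(u[i] for u in clipped) + rng.gauss(0.0, sigma) for i in range(dim)]
    return [s / len(clipped) for s in noisy_sum]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
family_updates = [[3.0, 4.0], [0.0, 0.5], [-1.0, 1.0]]
noisy = noisy_family_mean(family_updates, max_norm=1.0, noise_multiplier=0.5, rng=rng)
print(noisy)
```

Because the noise is divided by the number of family members, very small families may see the noise outweigh the signal, which is one reason the communication savings and the privacy guarantee would need to be evaluated together.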

Load-bearing premise

Two assumptions carry the claim: that family grouping supplies a sufficient natural privacy boundary for intra-family aggregation, and that simulation results on the MIT-BIH database will carry over to STC32G12K128-class microcontrollers even though no hardware deployment was performed.

What would settle it

Deploying the 669-parameter model on a physical STC32G12K128 microcontroller, running it across multiple simulated families, and directly measuring both the achieved communication volume and the arrhythmia detection F1 scores would confirm or refute the reported reductions and accuracy figures.

Figures

Figures reproduced from arXiv: 2605.18862 by Hangyu Wu.

Figure 1: The TinyCNN-LSTM model architecture designed for STM32-class microcontrollers. The network consists of a Conv1D feature extractor, an LSTM temporal module, and a fully-connected classifier, totaling only 637 parameters (4.7 KB Flash, 3.0 KB RAM).
Text captured alongside the caption: INT8 Quantization Strategy. We apply symmetric per-tensor post-training quantization (PTQ) using a representative calibration dataset of 100 samples. For each wei…
Figure 5: Accuracy–communication tradeoff. Family-FL-Tiny (Ours) achieves Pareto optimality for resource-constrained deployments, reducing communication by 99.7% with only 4.5% accuracy degradation compared to Family-FL.
Plot residue captured alongside the caption (confusion matrix; rows are true labels, columns are predicted labels):
  True \ Predicted   N     S     V     F     Q
  N                  0.99  0.01  0.00  0.00  0.00
  S                  0.62  0.38  0.02  0.01  0.01
  V                  0.12  0.03  0.82  0.02  0.01
  F                  0.55  0.15  0.12  0.08  0.10
  Q                  0.45  0.12  0.15  0.10  0.18
Figure 3: Flash memory and RAM usage comparison across methods. The red dashed line indicates the STM32L010K4 resource limit (128 KB Flash / 12 KB RAM). Only Family-FL-Tiny fits within the MCU constraint, achieving 43.5× Flash and 407× RAM reduction over baselines.
Plot residue captured alongside the caption (bar chart, log-scale axis "Total Communication (KB)"): FedAvg 853K, Family-FL 198K, Family-FL-Tiny (Ours) 3K; annotations "76.8% reduction" and "98.7% reduction".
Figure 4: Total communication cost per device over the entire training process. Family-FL reduces communication by 76.8% compared to FedAvg; Family-FL-Tiny achieves a 99.7% reduction, lowering the cost from 853 MB to only 2.6 KB.
Text captured alongside the caption: … for diagnostic applications. The intended use case is preliminary screening where the primary goal is detecting the most common abnormality (ventricular arrhythmias, F1 = 0.80) while flaggi…
Figure 7: Per-class F1-score comparison across all methods. All full-size methods (Centralized, FedAvg, FedProx, Family-FL) achieve strong performance across all classes. Tiny models (FedAvg-Tiny, Family-FL-Tiny) show severe degradation on rare classes (S, F, Q) due to limited model capacity, while maintaining reasonable performance on N and V classes. Family-FL-Tiny marginally outperforms FedAvg-Tiny on most classe…
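
The communication figures quoted in the captions and chart labels above can be cross-checked with simple arithmetic. The Python sketch below uses only numbers that appear on this page: the chart labels "853K" and "198K" on a KB axis, and the abstract's 0.31% figure. Treating the chart labels as exact values is an assumption, since they are rounded.

```python
fedavg_kb = 853_000      # "853K" on the chart's KB axis, i.e. about 853 MB
family_fl_kb = 198_000   # "198K" on the same axis

# Reduction of Family-FL relative to FedAvg, from the rounded chart labels.
family_reduction = 1 - family_fl_kb / fedavg_kb

# Abstract: Family-FL-Tiny reduces total communication to 0.31% of FedAvg.
tiny_kb = 0.0031 * fedavg_kb
tiny_reduction = 1 - tiny_kb / fedavg_kb

print(f"Family-FL reduction:      {family_reduction:.1%}")
print(f"Family-FL-Tiny total:     {tiny_kb:.0f} KB ({tiny_kb / 1000:.2f} MB)")
print(f"Family-FL-Tiny reduction: {tiny_reduction:.1%}")
```

The rounded labels give a Family-FL reduction of 76.8%, in line with the reported 76.7% to 76.8%. Applying 0.31% to 853 MB gives roughly 2.6 MB, which agrees with the chart's "3K" KB label and the 99.7% reduction, whereas the "2.6 KB" in the Figure 4 caption and the chart's "98.7% reduction" annotation do not follow from the same numbers; the source appears to be internally inconsistent on those two points.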
Original abstract

Cardiovascular disease remains the leading cause of death worldwide, and early detection of arrhythmias through continuous ECG monitoring on wearable devices can prevent life-threatening events. Federated Learning (FL) enables privacy-preserving collaborative training by keeping raw ECG data on device, yet standard FL incurs prohibitive communication overhead and standard deep learning models cannot fit on ultra-low-power microcontrollers. We propose Family-Grouped Hierarchical Federated Learning (Family-FL), a three-tier architecture that uses the family as a natural privacy boundary for intra-family aggregation before global synchronization. We further design a hardware-constrained Tiny CNN-LSTM architecture with only 669 parameters, INT8-quantized to occupy merely 4.65KB Flash and 2.95KB RAM, meeting the constraints of STC32G12K128-class microcontrollers. Experiments on the MIT-BIH Arrhythmia Database (mean of 5 independent runs with different seeds) demonstrate that Family-FL reduces communication volume by 76.7% compared to FedAvg while maintaining comparable accuracy. Family-FL-Tiny achieves 91.9 +/- 1.2% accuracy with macro-F1 of 0.483 +/- 0.031, reducing total communication to 0.31% of FedAvg. The model achieves reliable ventricular arrhythmia detection (per-class F1 = 0.80), the most clinically critical abnormality for home-based preliminary screening. These results demonstrate the technical feasibility of privacy-preserving federated learning on ultra-resource-constrained microcontrollers through simulation-based evaluation. We honestly discuss limitations: no hardware deployment, single-dataset validation (MIT-BIH, 47 subjects), reduced rare-class sensitivity, and absence of formal differential privacy guarantees.
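
The INT8 step mentioned in the abstract is described in the Figure 1 excerpt as symmetric per-tensor post-training quantization. A minimal Python sketch of that general technique follows, applied to weights only. It is an assumption about the form, not the authors' code: the function names and sample weights are hypothetical, and activation ranges, which would normally be estimated from the calibration samples, are not handled.

```python
def quantize_symmetric_int8(weights):
    """Per-tensor symmetric quantization: one scale, zero-point fixed at 0."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the symmetric INT8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.01, 0.0]   # hypothetical float weights of one tensor
q, scale = quantize_symmetric_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# At one byte per parameter, 669 INT8 weights need 669 bytes of storage.
weight_bytes = 669 * 1
print(q, scale, max_error, weight_bytes)
```

The rounding error per weight is bounded by half the scale. The 669 bytes cover weights alone; the page does not break down what accounts for the rest of the reported 4.65 KB Flash.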

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Family-Grouped Hierarchical Federated Learning (Family-FL), a three-tier FL architecture that performs intra-family aggregation before global synchronization, paired with a 669-parameter INT8-quantized CNN-LSTM model (4.65 KB Flash, 2.95 KB RAM) designed for STC32G12K128-class microcontrollers. On the MIT-BIH Arrhythmia Database, experiments (mean over five seeded runs) report that Family-FL reduces communication volume by 76.7% relative to FedAvg while achieving 91.9 ± 1.2% accuracy and macro-F1 of 0.483 ± 0.031 for the Tiny model, with per-class F1 = 0.80 on ventricular arrhythmia; the work frames these results as a simulation-based feasibility demonstration for privacy-preserving ECG monitoring on ultra-resource-constrained wearables and explicitly lists limitations including absence of hardware deployment and single-dataset scope.

Significance. If the simulation results translate to hardware, the work would demonstrate a concrete route to sub-5 KB federated models with substantial communication savings via hierarchical family grouping, which could enable privacy-preserving collaborative training on the most constrained wearables for continuous cardiac monitoring. The explicit reporting of five-run statistics, per-class metrics on the clinically critical arrhythmia, and open acknowledgment of limitations strengthen the contribution as a targeted feasibility study rather than an overclaimed deployment result.

major comments (2)
  1. [Abstract / Experiments] Abstract and Experiments section: the feasibility claim for the 669-parameter Tiny model on STC32G12K128-class devices rests solely on static memory footprint calculations (4.65 KB Flash / 2.95 KB RAM); no cycle-accurate emulation, SDK compilation, interrupt-latency measurements, or power profiling under realistic ECG sampling rates is reported, which is load-bearing for the central assertion that the architecture meets real-time constraints on the target microcontroller.
  2. [Experiments] Experiments section: while communication volume is reported as 76.7% lower than FedAvg and total communication as 0.31% of FedAvg, the manuscript does not provide the exact per-round byte counts, uplink/downlink breakdown, or the precise definition of 'family grouping' rounds versus global rounds that produce these figures; without this, the magnitude of the reported saving cannot be independently verified from the given experimental setup.
minor comments (2)
  1. [Experiments] The manuscript would benefit from a dedicated subsection or table explicitly comparing the Tiny model's parameter count, memory footprint, and FLOPs against the standard FedAvg baseline model used in the same experiments.
  2. [Method] Notation for the three-tier hierarchy (family aggregator, edge server, global server) should be introduced with a diagram or pseudocode early in the Method section to clarify the aggregation flow before the results are presented.
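
The per-round accounting requested in major comment 2 could be laid out along the following lines. This Python sketch is a hypothetical template, not the paper's method: it assumes every participant uploads and downloads the full model once per round, counts only traffic that reaches the global server, and uses invented values for the number of clients, families, and rounds.

```python
def flat_fl_bytes(model_bytes, n_clients, rounds):
    """Flat FedAvg: every client exchanges the full model with the server each round."""
    uplink = model_bytes * n_clients * rounds
    downlink = model_bytes * n_clients * rounds
    return {"uplink": uplink, "downlink": downlink, "total": uplink + downlink}


def hierarchical_wan_bytes(model_bytes, n_families, rounds):
    """Hierarchical: one aggregate per family is exchanged with the server each round."""
    uplink = model_bytes * n_families * rounds
    downlink = model_bytes * n_families * rounds
    return {"uplink": uplink, "downlink": downlink, "total": uplink + downlink}


# Hypothetical setup: 669-byte INT8 model, 40 devices in 10 families, 50 rounds.
flat = flat_fl_bytes(model_bytes=669, n_clients=40, rounds=50)
hier = hierarchical_wan_bytes(model_bytes=669, n_families=10, rounds=50)
saving = 1 - hier["total"] / flat["total"]
print(flat, hier, f"saving={saving:.1%}")
```

With these invented numbers the saving is 75%, which depends only on the ratio of families to devices under this model. It is not meant to reproduce the reported 76.7%; the point is that a table of this shape, with the real round schedule and intra-family traffic added, would let readers verify the headline figure.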

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for major revision. Below we respond point-by-point to the major comments, indicating the changes we will incorporate.

Point-by-point responses
  1. Referee: [Abstract / Experiments] Abstract and Experiments section: the feasibility claim for the 669-parameter Tiny model on STC32G12K128-class devices rests solely on static memory footprint calculations (4.65 KB Flash / 2.95 KB RAM); no cycle-accurate emulation, SDK compilation, interrupt-latency measurements, or power profiling under realistic ECG sampling rates is reported, which is load-bearing for the central assertion that the architecture meets real-time constraints on the target microcontroller.

    Authors: We agree that the feasibility demonstration for the Tiny model currently rests on static memory footprint calculations. The manuscript already positions the work as a simulation-based feasibility study and explicitly lists 'no hardware deployment' among its limitations. In revision we will (i) strengthen the abstract and Experiments section to state that real-time constraints are assessed via memory footprint only, (ii) add a short discussion of the assumptions made about interrupt latency and sampling rates, and (iii) include an explicit forward-looking statement that hardware profiling remains necessary future work. Because we cannot conduct the requested cycle-accurate or power measurements in the current revision cycle, the change is partial. revision: partial

  2. Referee: [Experiments] Experiments section: while communication volume is reported as 76.7% lower than FedAvg and total communication as 0.31% of FedAvg, the manuscript does not provide the exact per-round byte counts, uplink/downlink breakdown, or the precise definition of 'family grouping' rounds versus global rounds that produce these figures; without this, the magnitude of the reported saving cannot be independently verified from the given experimental setup.

    Authors: We accept this criticism. In the revised Experiments section we will add (a) a table listing exact per-round uplink and downlink byte counts for both Family-FL and FedAvg, (b) a clear definition of intra-family aggregation rounds versus global synchronization rounds, and (c) the arithmetic steps that yield the 76.7% reduction and 0.31% total-communication figures. These additions will enable independent verification. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper is an empirical feasibility study whose central claims (76.7% communication reduction, 91.9% accuracy, Tiny model size) are obtained by direct measurement against the external MIT-BIH dataset and standard FedAvg baseline. No equations, fitted parameters, or self-citations are shown to reduce the reported gains to quantities defined by the method itself. The architecture description, hierarchical aggregation, and quantization steps are presented as design choices evaluated experimentally rather than derived in a self-referential loop. This matches the default expectation for non-circular experimental work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central feasibility claim rests on standard federated learning assumptions plus the domain premise that family units provide a usable privacy boundary; no new free parameters or invented entities are introduced beyond the proposed architecture.

axioms (1)
  • Domain assumption: family grouping provides a natural and sufficient privacy boundary for intra-family model aggregation.
    Invoked to justify the three-tier architecture before global synchronization.


