Towards Family-Grouped Hierarchical Federated Learning on Sub-5KB Models: A Feasibility Study of Privacy-Preserving ECG Monitoring for Ultra-Resource-Constrained Wearables
Pith reviewed 2026-05-20 20:44 UTC · model grok-4.3
The pith
Family-Grouped Hierarchical Federated Learning cuts communication volume by 76.7 percent for ECG models on sub-5KB wearables.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Family-FL reduces communication volume by 76.7% compared to FedAvg while maintaining comparable accuracy. The Tiny CNN-LSTM model with 669 parameters achieves 91.9 ± 1.2% accuracy, a macro-F1 of 0.483 ± 0.031, and a per-class F1 of 0.80 for ventricular arrhythmia detection on the MIT-BIH Arrhythmia Database across five independent runs.
What carries the argument
Family-Grouped Hierarchical Federated Learning (Family-FL), a three-tier architecture that performs intra-family aggregation as a natural privacy boundary before global synchronization, together with a hardware-constrained 669-parameter INT8-quantized CNN-LSTM that occupies 4.65 KB Flash and 2.95 KB RAM.
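The two-level aggregation can be sketched as below. The sketch assumes plain FedAvg-style weighting by local sample count at both tiers. Several details are not stated here:
- the paper's exact weighting rule;
- the number of intra-family rounds per global round.

The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Weights = List[float]
Update = Tuple[Weights, int]  # (flattened model weights, number of local training samples)

def weighted_average(updates: List[Update]) -> Tuple[Weights, int]:
    """FedAvg-style aggregation: average weights in proportion to sample counts."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    avg = [sum(w[i] * n for w, n in updates) / total for i in range(dim)]
    return avg, total

def family_fl_round(families: Dict[str, List[Update]]) -> Weights:
    # Tier 1 -> Tier 2: each family aggregator combines only its own members' models,
    # so individual device updates never leave the family
    family_models = [weighted_average(members) for members in families.values()]
    # Tier 2 -> Tier 3: the global server receives one aggregated model per family
    global_model, _ = weighted_average(family_models)
    return global_model

families = {"A": [([1.0, 2.0], 10), ([3.0, 4.0], 30)], "B": [([5.0, 6.0], 20)]}
print(family_fl_round(families))
```

With sample-count weighting, a single Family-FL round yields the same global model as flat FedAvg over all devices. Under this assumption, any communication saving must come from the global server receiving one model per family instead of one per device. It does not come from different averaging arithmetic.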
If this is right
- Continuous privacy-preserving ECG monitoring becomes feasible on microcontrollers with only a few kilobytes of memory.
- Communication overhead drops enough to make federated training viable for battery-powered wearable sensors.
- Ventricular arrhythmia, which the authors identify as the most clinically critical abnormality for home-based preliminary screening, is detected with a per-class F1 of 0.80.
- With the Tiny model, total communication falls to 0.31 percent of standard FedAvg while accuracy remains at 91.9 percent.
Where Pith is reading between the lines
- The same family-style grouping could be tested on other household or workplace sensor streams such as blood-pressure or activity data.
- Adding lightweight differential privacy noise inside the family tier might strengthen guarantees without destroying the communication savings.
- If real-device trials match the simulations, the architecture could support large-scale deployment of federated health models on existing low-cost wearables.
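The differential-privacy idea above could work as follows: clip each member's update to an L2 bound, then add Gaussian noise to the family sum before it leaves the family tier. This is a hypothetical extension in the style of DP-FedAvg, not part of the paper, which reports no formal differential privacy guarantees. The sketch also does not compute a privacy budget (epsilon); that would require a separate privacy accountant.

```python
import math
import random
from typing import List, Optional

def clip_l2(update: List[float], max_norm: float) -> List[float]:
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def noisy_family_mean(updates: List[List[float]], max_norm: float,
                      noise_multiplier: float,
                      rng: Optional[random.Random] = None) -> List[float]:
    # Gaussian mechanism on the family sum: clip each member, add N(0, (z*C)^2) noise, then average
    rng = rng or random.Random()
    clipped = [clip_l2(u, max_norm) for u in updates]
    sigma = noise_multiplier * max_norm
    dim = len(clipped[0])
    noisy_sum = [sum(u[i] for u in clipped) + rng.gauss(0.0, sigma) for i in range(dim)]
    return [s / len(updates) for s in noisy_sum]

print(noisy_family_mean([[3.0, 4.0], [0.3, 0.4]], max_norm=1.0,
                        noise_multiplier=1.0, rng=random.Random(0)))
```

The noisy aggregate is the same size as the clean one. The payload sent to the global server, and hence the communication saving, is therefore unchanged; the cost shows up in accuracy instead.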
Load-bearing premise
Two premises carry the result. First, family grouping must supply a sufficient natural privacy boundary for intra-family aggregation. Second, simulation results on the MIT-BIH database must translate to actual performance on STC32G12K128-class microcontrollers, even though no hardware deployment was performed.
What would settle it
Deploying the 669-parameter model on a physical STC32G12K128 microcontroller, running it across multiple simulated families, and directly measuring both the achieved communication volume and the arrhythmia detection F1 scores would confirm or refute the reported reductions and accuracy figures.
Original abstract
Cardiovascular disease remains the leading cause of death worldwide, and early detection of arrhythmias through continuous ECG monitoring on wearable devices can prevent life-threatening events. Federated Learning (FL) enables privacy-preserving collaborative training by keeping raw ECG data on device. However, standard FL incurs prohibitive communication overhead, and standard deep learning models cannot fit on ultra-low-power microcontrollers. We propose Family-Grouped Hierarchical Federated Learning (Family-FL), a three-tier architecture that uses the family as a natural privacy boundary for intra-family aggregation before global synchronization. We further design a hardware-constrained Tiny CNN-LSTM architecture with only 669 parameters, INT8-quantized to occupy merely 4.65 KB Flash and 2.95 KB RAM, meeting the constraints of STC32G12K128-class microcontrollers. Experiments on the MIT-BIH Arrhythmia Database (mean of 5 independent runs with different seeds) demonstrate that Family-FL reduces communication volume by 76.7% compared to FedAvg while maintaining comparable accuracy. Family-FL-Tiny achieves 91.9 ± 1.2% accuracy with a macro-F1 of 0.483 ± 0.031, reducing total communication to 0.31% of FedAvg. The model achieves reliable ventricular arrhythmia detection (per-class F1 = 0.80), the most clinically critical abnormality for home-based preliminary screening. Through simulation-based evaluation, these results demonstrate the technical feasibility of privacy-preserving federated learning on ultra-resource-constrained microcontrollers. We explicitly discuss the limitations: no hardware deployment, single-dataset validation (MIT-BIH, 47 subjects), reduced rare-class sensitivity, and absence of formal differential privacy guarantees.
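The INT8 step can be illustrated with symmetric per-tensor quantization, one common scheme. The abstract does not say which variant the authors use (symmetric or asymmetric, per-tensor or per-channel), so the choice here is an assumption.

The arithmetic also shows that 669 INT8 weights take 669 bytes, about 0.65 KB. Most of the reported 4.65 KB Flash footprint therefore presumably goes to inference code and other constants. The abstract does not break this down.

```python
from typing import List, Tuple

def quantize_int8_symmetric(weights: List[float]) -> Tuple[List[int], float]:
    """Per-tensor symmetric INT8: q = clamp(round(w / scale), -127, 127), scale = max|w| / 127."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: List[int], scale: float) -> List[float]:
    return [v * scale for v in q]

def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    # Raw weight storage only; code, activations and buffers are not included
    return num_params * bytes_per_param

print(weight_bytes(669, 1), weight_bytes(669, 4))  # 669 2676 (INT8 vs float32 weight storage)
```

Rounding to the nearest step keeps each reconstructed weight within half a quantization step (scale / 2) of the original value.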
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Family-Grouped Hierarchical Federated Learning (Family-FL), a three-tier FL architecture that performs intra-family aggregation before global synchronization, paired with a 669-parameter INT8-quantized CNN-LSTM model (4.65 KB Flash, 2.95 KB RAM) designed for STC32G12K128-class microcontrollers. On the MIT-BIH Arrhythmia Database, experiments (mean over five seeded runs) report that Family-FL reduces communication volume by 76.7% relative to FedAvg while achieving 91.9 ± 1.2% accuracy and macro-F1 of 0.483 ± 0.031 for the Tiny model, with per-class F1 = 0.80 on ventricular arrhythmia; the work frames these results as a simulation-based feasibility demonstration for privacy-preserving ECG monitoring on ultra-resource-constrained wearables and explicitly lists limitations including absence of hardware deployment and single-dataset scope.
Significance. If the simulation results translate to hardware, the work would demonstrate a concrete route to sub-5 KB federated models with substantial communication savings via hierarchical family grouping, which could enable privacy-preserving collaborative training on the most constrained wearables for continuous cardiac monitoring. The explicit reporting of five-run statistics, per-class metrics on the clinically critical arrhythmia, and open acknowledgment of limitations strengthen the contribution as a targeted feasibility study rather than an overclaimed deployment result.
major comments (2)
- [Abstract / Experiments] Abstract and Experiments section: the feasibility claim for the 669-parameter Tiny model on STC32G12K128-class devices rests solely on static memory footprint calculations (4.65 KB Flash / 2.95 KB RAM); no cycle-accurate emulation, SDK compilation, interrupt-latency measurements, or power profiling under realistic ECG sampling rates is reported, which is load-bearing for the central assertion that the architecture meets real-time constraints on the target microcontroller.
- [Experiments] Experiments section: while communication volume is reported as 76.7% lower than FedAvg and total communication as 0.31% of FedAvg, the manuscript does not provide the exact per-round byte counts, uplink/downlink breakdown, or the precise definition of 'family grouping' rounds versus global rounds that produce these figures; without this, the magnitude of the reported saving cannot be independently verified from the given experimental setup.
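The first major comment asks whether inference keeps up with the ECG stream. A back-of-envelope check divides the estimated inference time by the time needed to acquire one input window. MIT-BIH records are sampled at 360 Hz.

The other inputs below are placeholders, not measured or documented values for the STC32G12K128:
- the MAC count;
- the cycles per MAC;
- the clock rate;
- the window length.

The estimate also ignores interrupt latency, ADC servicing and radio activity, which are exactly what the referee asks to have measured.

```python
def inference_duty_cycle(macs: int, cycles_per_mac: float, clock_hz: float,
                         window_samples: int, sample_rate_hz: float,
                         overhead_cycles: int = 0) -> float:
    """Fraction of each ECG window spent on inference (< 1.0 means it keeps up)."""
    inference_s = (macs * cycles_per_mac + overhead_cycles) / clock_hz
    window_s = window_samples / sample_rate_hz  # time to acquire one window
    return inference_s / window_s

# All values below are placeholders, not measured figures for the STC32G12K128
duty = inference_duty_cycle(macs=10_000, cycles_per_mac=20, clock_hz=20e6,
                            window_samples=360, sample_rate_hz=360)
print(f"{duty:.1%} of each window")
```

A result below 1.0 only says that inference fits between non-overlapping windows under these assumptions. Overlapping or beat-centered windows would shrink the available time.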
minor comments (2)
- [Experiments] The manuscript would benefit from a dedicated subsection or table explicitly comparing the Tiny model's parameter count, memory footprint, and FLOPs against the standard FedAvg baseline model used in the same experiments.
- [Method] Notation for the three-tier hierarchy (family aggregator, edge server, global server) should be introduced with a diagram or pseudocode early in the Method section to clarify the aggregation flow before the results are presented.
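The first minor comment requests a comparison table. Its entries can be computed with standard formulas: per-layer parameter counts, plus INT8 storage at one byte per parameter versus four for float32.

The layer sizes below are hypothetical and are not the paper's architecture; they do not sum to 669. FLOP counts would additionally require the input sequence lengths, which are not given here.

```python
def conv1d_params(in_ch: int, out_ch: int, kernel: int, bias: bool = True) -> int:
    return out_ch * in_ch * kernel + (out_ch if bias else 0)

def lstm_params(input_size: int, hidden: int, bias_vectors: int = 1) -> int:
    # 4 gates, each with input weights, recurrent weights and bias;
    # Keras uses one bias vector per gate, PyTorch nn.LSTM uses two (b_ih and b_hh)
    return 4 * (hidden * (input_size + hidden) + bias_vectors * hidden)

def dense_params(in_features: int, out_features: int, bias: bool = True) -> int:
    return out_features * in_features + (out_features if bias else 0)

def footprint_bytes(num_params: int) -> dict:
    # Weight storage only, for INT8 and float32
    return {"int8": num_params, "float32": 4 * num_params}

# Hypothetical layer sizes for illustration only; they are not the paper's architecture
layers = {
    "conv1d": conv1d_params(in_ch=1, out_ch=4, kernel=5),
    "lstm": lstm_params(input_size=4, hidden=8),
    "dense": dense_params(in_features=8, out_features=5),
}
total = sum(layers.values())
print(layers, total, footprint_bytes(total))
```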
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation for major revision. Below we respond point-by-point to the major comments, indicating the changes we will incorporate.
- Referee: [Abstract / Experiments] Abstract and Experiments section: the feasibility claim for the 669-parameter Tiny model on STC32G12K128-class devices rests solely on static memory footprint calculations (4.65 KB Flash / 2.95 KB RAM); no cycle-accurate emulation, SDK compilation, interrupt-latency measurements, or power profiling under realistic ECG sampling rates is reported, which is load-bearing for the central assertion that the architecture meets real-time constraints on the target microcontroller.
Authors: We agree that the feasibility demonstration for the Tiny model currently rests on static memory footprint calculations. The manuscript already positions the work as a simulation-based feasibility study and explicitly lists 'no hardware deployment' among its limitations. In revision we will (i) strengthen the abstract and Experiments section to state that real-time constraints are assessed via memory footprint only, (ii) add a short discussion of the assumptions made about interrupt latency and sampling rates, and (iii) include an explicit forward-looking statement that hardware profiling remains necessary future work. Because we cannot conduct the requested cycle-accurate or power measurements in the current revision cycle, the change is partial. revision: partial
- Referee: [Experiments] Experiments section: while communication volume is reported as 76.7% lower than FedAvg and total communication as 0.31% of FedAvg, the manuscript does not provide the exact per-round byte counts, uplink/downlink breakdown, or the precise definition of 'family grouping' rounds versus global rounds that produce these figures; without this, the magnitude of the reported saving cannot be independently verified from the given experimental setup.
Authors: We accept this criticism. In the revised Experiments section we will add (a) a table listing exact per-round uplink and downlink byte counts for both Family-FL and FedAvg, (b) a clear definition of intra-family aggregation rounds versus global synchronization rounds, and (c) the arithmetic steps that yield the 76.7% reduction and 0.31% total-communication figures. These additions will enable independent verification. revision: yes
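The accounting the authors promise could follow the sketch below. It separates traffic to the global server from device-to-family-aggregator traffic. All of the inputs are hypothetical:
- the number of devices;
- the number of families;
- the number of rounds;
- the 669-byte payload (669 INT8 parameters).

They are not chosen to reproduce the 76.7% or 0.31% figures.

```python
def fedavg_bytes(payload_bytes: int, num_devices: int, rounds: int) -> int:
    # Every device downloads the global model and uploads its update each round
    return rounds * num_devices * 2 * payload_bytes

def family_fl_global_bytes(payload_bytes: int, num_families: int, global_rounds: int) -> int:
    # Only one aggregator per family talks to the global server (one download + one upload)
    return global_rounds * num_families * 2 * payload_bytes

def intra_family_bytes(payload_bytes: int, num_devices: int,
                       global_rounds: int, family_rounds_per_global: int) -> int:
    # Device <-> family-aggregator traffic, if it is counted at all
    return global_rounds * family_rounds_per_global * num_devices * 2 * payload_bytes

def reduction(baseline: int, proposed: int) -> float:
    return 1.0 - proposed / baseline

# Hypothetical setup: 669-byte INT8 payload, 10 devices in 2 families, 5 rounds
base = fedavg_bytes(669, num_devices=10, rounds=5)
fam = family_fl_global_bytes(669, num_families=2, global_rounds=5)
print(base, fam, f"{reduction(base, fam):.1%}")
```

Counting only global-server traffic gives an 80% reduction under these assumptions. Adding one intra-family exchange per global round changes the picture: intra_family_bytes(669, 10, 5, 1) is 66,900 bytes, which pushes Family-FL above FedAvg. The definition of what counts as communication therefore matters for verifying the reported saving.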
Circularity Check
No significant circularity in derivation chain
The paper is an empirical feasibility study whose central claims (76.7% communication reduction, 91.9% accuracy, Tiny model size) are obtained by direct measurement against the external MIT-BIH dataset and standard FedAvg baseline. No equations, fitted parameters, or self-citations are shown to reduce the reported gains to quantities defined by the method itself. The architecture description, hierarchical aggregation, and quantization steps are presented as design choices evaluated experimentally rather than derived in a self-referential loop. This matches the default expectation for non-circular experimental work.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: family grouping provides a natural and sufficient privacy boundary for intra-family model aggregation.