arxiv: 2504.04678 · v2 · submitted 2025-04-07 · 💻 cs.NI

Beyond Assumptions: Measuring Federated Learning over Real 5G Networks

Pith reviewed 2026-05-22 21:18 UTC · model grok-4.3

classification 💻 cs.NI
keywords: federated learning, 5G networks, stragglers, wireless testbed, communication time, edge devices, O-RAN, Flower framework

The pith

A 5G testbed experiment finds that federated learning usually has one consistent straggler device rather than varying ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets up federated learning on a network of Raspberry Pi devices connected through a real 5G standalone testbed built with software-defined radios and open-source O-RAN software. It runs the Flower framework and measures communication times and machine learning performance over 5G, WiFi, and Ethernet links while changing bandwidth and scheduling settings. The results show that common ideas about how wireless delays create stragglers do not match the observed behavior. Instead, one device tends to lag consistently across many training rounds in most experiments. This finding points to the need for FL designs that handle persistent device differences when operating over next-generation wireless networks.

Core claim

Deploying FL using a 5G-NR SA testbed with Raspberry Pis and COTS components reveals that there is a consistent straggler in about 70% of trials, while in the other 30% high communication time causes competing stragglers. These results challenge common assumptions about communication time in FL over wireless networks.
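The distinction between a consistent straggler and competing stragglers can be made concrete with a minimal Python sketch. It assumes each trial is logged as a list of rounds, each mapping a client ID to its total round time in seconds. The function name `classify_trial`, the data layout, and the `min_share` cutoff are illustrative assumptions; this page does not state the criterion the paper used.

```python
from collections import Counter

def classify_trial(round_times, min_share=0.8):
    """Label a trial by who finishes last in each round.

    round_times: list of dicts, one per round, mapping client id -> seconds.
    min_share: hypothetical cutoff; the share of rounds one client must be
               slowest in for the trial to count as a consistent straggler.
    """
    if not round_times:
        raise ValueError("need at least one round")
    # In a synchronous round, the straggler is the slowest client.
    stragglers = [max(times, key=times.get) for times in round_times]
    client, count = Counter(stragglers).most_common(1)[0]
    share = count / len(stragglers)
    label = "consistent" if share >= min_share else "competing"
    return label, client, share

# Made-up numbers for illustration only, not measurements from the paper.
trial = [
    {"pi1": 4.1, "pi2": 6.3, "pi3": 4.0},
    {"pi1": 4.2, "pi2": 6.1, "pi3": 4.4},
    {"pi1": 4.0, "pi2": 6.5, "pi3": 4.1},
]
label, client, share = classify_trial(trial)
```

With a rule like this, the reported split would correspond to the fraction of trials labeled "consistent", and the result would depend on the chosen cutoff.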

What carries the argument

A 5G testbed of resource-constrained Raspberry Pi edge devices that communicate with a central server equipped with an SDR and running O-RAN software. FL runs on the Flower framework, extended with custom instrumentation that tracks communication and ML metrics across the 5G, WiFi, and Ethernet interfaces.
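As a rough illustration of what per-phase timing instrumentation can look like, the sketch below records wall-clock durations per client and phase. It is not the authors' instrumentation and does not use the Flower API; the `PhaseTimer` class, the phase names, and the client IDs are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class PhaseTimer:
    """Accumulates wall-clock durations per (client, phase)."""

    def __init__(self):
        self.records = defaultdict(list)

    @contextmanager
    def measure(self, client_id, phase):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the timed block raises.
            elapsed = time.perf_counter() - start
            self.records[(client_id, phase)].append(elapsed)

    def mean(self, client_id, phase):
        samples = self.records.get((client_id, phase))
        if not samples:
            raise KeyError(f"no samples for {client_id}/{phase}")
        return sum(samples) / len(samples)

timer = PhaseTimer()
with timer.measure("pi1", "uplink"):
    time.sleep(0.01)  # stand-in for sending model weights to the server
```

A timer like this only sees the local clock. Separating uplink from downlink time across machines would also need either synchronized clocks or measurements taken entirely on one side of the link.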

If this is right

  • FL performance over 5G is affected by external congestion in measurable ways.
  • Varying 5G bandwidths and uplink-downlink scheduling ratios change the observed communication times.
  • The testbed results can be checked against commercial 5G deployments for broader validity.
  • Straggler mitigation in wireless FL must account for cases where one device consistently lags.
  • Open-sourced instrumentation enables direct replication of the measurements on other hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • FL systems for IoT devices should include per-device profiling to detect persistent stragglers before training starts.
  • The same measurement approach could be applied to other wireless links such as WiFi 6 or future 6G to compare straggler patterns.
  • If consistent stragglers prove common, asynchronous FL updates or client selection methods may become more effective than synchronous rounds.
  • The open-sourced tools lower the barrier for other researchers to test FL under controlled wireless conditions.
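The per-device profiling idea in the first bullet could be sketched as follows. This is an editorial illustration, not a method from the paper: the function name `flag_slow_devices`, the `factor` cutoff, and the sample timings are all assumptions.

```python
from statistics import median

def flag_slow_devices(profile, factor=1.5):
    """Return devices whose median time exceeds factor x the fleet median.

    profile: dict mapping device id -> list of profiled round times (seconds).
    factor: hypothetical cutoff chosen for illustration.
    """
    per_device = {dev: median(times) for dev, times in profile.items() if times}
    if not per_device:
        return []
    fleet = median(per_device.values())
    return sorted(dev for dev, m in per_device.items() if m > factor * fleet)

# Made-up profiling data, not measurements from the paper.
profile = {
    "pi1": [4.0, 4.2, 4.1],
    "pi2": [9.8, 10.1, 9.9],
    "pi3": [4.3, 4.1, 4.2],
}
slow = flag_slow_devices(profile)
```

Devices flagged this way could then be handled by whatever mitigation fits the deployment, such as client selection or asynchronous updates.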

Load-bearing premise

The specific hardware and software setup in the custom 5G testbed produces communication and straggler patterns that would appear in commercial 5G networks and standard federated learning tasks.

What would settle it

Observing no consistent straggler across the majority of trials when repeating the FL experiments on a commercial 5G network would disprove the main finding.

Figures

Figures reproduced from arXiv: 2504.04678 by Robert J. Hayek (2), Joaquin Chung (2), Kayla Comer (1), Chandra R. Murthy (3), Rajkumar Kettimuthu (2), and Igor Kadota (1); (1) Northwestern University, (2) Argonne National Laboratory, (3) Indian Institute of Science.

Figure 1: 5G Testbed: gNB, Core, six 5G enabled nodes.
Figure 2: One Communication Round of Federated Learning
Figure 3: Network Architecture
Figure 4: Total communication round time over 10 trials
Figure 5: Worst validation accuracy of the local model evalu…
Figure 9: Average uplink and downlink times averaged for…
Figure 7: Average validation dataset evaluation time for all…
Figure 8: Average train dataset evaluation time for all nodes…
Figure 10: Worst local validation accuracy as measured by…
Figure 11: Comparison of download (t_d) and uplink (t_u) time over 5G with number of nodes increasing from three to six
Figure 13: Average 5G downlink time (top) and uplink time…
Figure 14: Average WiFi downlink time (top) and uplink time…
Original abstract

Deploying FL using IoT devices is an area poised to significantly benefit from advances in NextG wireless. In this paper, we deploy a FL application using a 5G-NR Standalone (SA) testbed with open-source and Commercial Off-the-Shelf (COTS) components. The 5G testbed architecture consists of a network of resource-constrained edge devices, namely Raspberry Pis, and a central server equipped with a Software Defined Radio (SDR) and running O-RAN software. Our testbed allows edge devices to communicate with the server using WiFi and Ethernet in addition to 5G. FL is deployed using the Flower FL framework, extended with custom instrumentation for communication and ML metrics. We analyze the FL application across three network interfaces--5G, WiFi, and Ethernet--as well as across 5G bandwidths and uplink-downlink scheduling ratios. Our experimental results challenge some common assumptions about communication time in FL over wireless and discuss the potential pitfalls of these assumptions. We find that there is a consistent straggler in about 70% of trials, while in the other 30%, high communication time causes competing stragglers. We also compare FL performance over 5G with and without external congestion and compare our testbed to commercial 5G to validate our findings in a broader context. For reproducibility, we have open-sourced our FL application, instrumentation tools, and testbed configuration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents an experimental evaluation of Federated Learning (FL) over a custom 5G-NR Standalone testbed built with Raspberry Pi edge devices, an SDR-equipped server running O-RAN software, and the Flower framework with custom instrumentation. It measures FL performance across 5G, WiFi, and Ethernet interfaces, varying 5G bandwidths and uplink-downlink scheduling ratios, reports a consistent straggler in approximately 70% of trials (with competing stragglers in the remaining 30%), examines effects of external congestion, compares the testbed to commercial 5G, and open-sources the application, tools, and configurations.

Significance. If the testbed behaviors generalize, the direct measurements provide useful empirical evidence that can challenge common assumptions about communication times and stragglers in wireless FL deployments. The open-sourcing of the FL application, instrumentation tools, and testbed configuration is a clear strength that supports reproducibility. The work could help inform practical FL system design for IoT over 5G networks.

major comments (2)
  1. [Results section reporting straggler percentages] The central empirical claim rests on the finding of a consistent straggler in about 70% of trials. However, the manuscript provides no details on the total number of trials, statistical tests used, error bars, variance across runs, or exclusion criteria, which limits the ability to assess the robustness and generalizability of this percentage.
  2. [Section on testbed validation against commercial 5G] The comparison to commercial 5G is invoked to validate the testbed findings in a broader context, but if this comparison is restricted to basic network metrics rather than FL-specific straggler and communication-time behaviors under matched conditions, it does not fully address the representativeness concern for the straggler claims.
minor comments (1)
  1. [Abstract and results] The abstract and results would benefit from explicit statements of the number of trials or repetitions performed for each condition to allow readers to contextualize the reported percentages.
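To show why the number of trials matters for a figure like "about 70%", the sketch below computes a 95% Wilson score interval for a proportion at several trial counts. The trial counts are hypothetical placeholders chosen for illustration; they are not taken from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

# Hypothetical trial counts, each with 70% of trials showing one straggler.
for n in (10, 50, 200):
    k = 7 * n // 10
    low, high = wilson_interval(k, n)
    print(f"n={n}: {k}/{n} -> [{low:.2f}, {high:.2f}]")
```

The interval narrows as the trial count grows, which is why the referee's request for the total number of trials bears directly on how much weight the percentage can carry.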

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Results section reporting straggler percentages] The central empirical claim rests on the finding of a consistent straggler in about 70% of trials. However, the manuscript provides no details on the total number of trials, statistical tests used, error bars, variance across runs, or exclusion criteria, which limits the ability to assess the robustness and generalizability of this percentage.

    Authors: We agree that additional methodological details are needed to allow readers to fully assess the robustness of the reported 70% figure. In the revised manuscript we will expand the relevant Results section to state the total number of trials conducted, describe any statistical tests or descriptive statistics applied, include error bars or variance measures across runs, and specify the exclusion criteria used (if any). These additions will be placed directly alongside the straggler-percentage claim.
    Revision: yes

  2. Referee: [Section on testbed validation against commercial 5G] The comparison to commercial 5G is invoked to validate the testbed findings in a broader context, but if this comparison is restricted to basic network metrics rather than FL-specific straggler and communication-time behaviors under matched conditions, it does not fully address the representativeness concern for the straggler claims.

    Authors: We appreciate the referee's distinction between network-level and FL-specific validation. Our existing comparison reports both basic network metrics and FL round-completion times; however, we acknowledge that the linkage to straggler behavior could be made more explicit. We will revise the section to highlight the FL-specific communication-time and straggler observations obtained under the commercial 5G conditions we were able to measure, while noting any limitations in achieving perfectly matched experimental conditions.
    Revision: partial

Circularity Check

0 steps flagged

No circularity: purely experimental measurements with no derivations or self-referential reductions

full rationale

The paper consists entirely of direct experimental reporting from a custom 5G-NR SA testbed using Raspberry Pis, SDR, O-RAN software, and the Flower FL framework. It measures communication times, straggler patterns (e.g., a consistent straggler in ~70% of trials), and performance across interfaces and bandwidths, then compares results to commercial 5G. No equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations appear in the provided text or abstract. Claims rest on observed data rather than on a derivation chain that reduces to prior fitted values or self-citations. The generalization concern raised in the referee report is an external validity issue, not a circularity in the paper's internal logic.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Experimental measurement paper with no free parameters or invented entities; central claim depends on the domain assumption that the testbed setup is representative of broader 5G conditions.

axioms (1)
  • domain assumption: The 5G testbed with Raspberry Pis, SDR, and O-RAN software accurately captures communication behaviors relevant to commercial 5G deployments.
    Invoked when generalizing experimental findings on stragglers and congestion to real-world 5G FL use cases.
