Multi-Worker Selection based Distributed Swarm Learning for Edge IoT with Non-i.i.d. Data

arxiv: 2509.18367 · v1 · submitted 2025-09-22 · 💻 cs.LG · cs.AI· cs.DC

Multi-Worker Selection based Distributed Swarm Learning for Edge IoT with Non-i.i.d. Data

Zhuoyu Yao , Yue Wang , Songyang Zhang , Yingshu Li , Zhipeng Cai , Zhi Tian This is my paper

Pith reviewed 2026-05-18 13:59 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.DC

keywords distributed swarm learningnon-i.i.d. datamulti-worker selectionedge IoTdata heterogeneityconvergence analysisM-DSL

0 comments p. Extension

The pith

A non-i.i.d. degree metric enables targeted multi-worker selection that improves convergence and accuracy in distributed swarm learning on heterogeneous edge IoT data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that a new metric for quantifying data heterogeneity lets the system pick multiple workers whose local updates contribute most to the global model. This matters because non-i.i.d. data in edge IoT settings normally slows convergence and lowers accuracy in standard distributed swarm learning. By linking the metric directly to performance evaluation, the approach supplies both a practical selection rule and a convergence proof. Experiments on varied heterogeneous datasets confirm that the resulting M-DSL method outperforms common benchmarks.

Core claim

M-DSL introduces a non-i.i.d. degree metric that measures statistical differences across local datasets and uses these scores to select multiple workers for global model updates. The design supplies a theoretical convergence bound for the resulting algorithm and demonstrates, through numerical tests on heterogeneous datasets and non-i.i.d. partitions, that the selected updates produce higher accuracy and faster stabilization than baseline DSL schemes.

What carries the argument

The non-i.i.d. degree metric, which quantifies statistical differences among local datasets and directly ties that measure to the evaluation of DSL performance.

If this is right

M-DSL supplies an explicit rule for choosing multiple workers that make prominent contributions to each global update.
The method yields a provable convergence guarantee for DSL under the measured heterogeneity.
Numerical results across multiple heterogeneous datasets and non-i.i.d. degrees show consistent gains over existing benchmarks.
The selection mechanism supports improved model scalability and energy efficiency for edge IoT deployments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The metric could be repurposed to guide worker selection in other distributed optimization settings that face statistical heterogeneity.
Real-time updates to the non-i.i.d. degree scores might allow adaptive worker pools that respond to changing data distributions.
Focusing updates on high-contribution workers suggests a route to lower total communication volume in large-scale IoT networks.

Load-bearing premise

The non-i.i.d. degree metric accurately captures statistical differences among local datasets and reliably links those differences to DSL performance.

What would settle it

An experiment in which M-DSL worker selection guided by the metric produces no better or worse final accuracy and convergence speed than uniform random selection on the same non-i.i.d. partitions would falsify the central claim.

Figures

Figures reproduced from arXiv: 2509.18367 by Songyang Zhang, Yingshu Li, Yue Wang, Zhipeng Cai, Zhi Tian, Zhuoyu Yao.

**Figure 2.** Figure 2: The non-i.i.d. CIFAR10 case II: A more common hetero [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: The i.i.d. case plays as a baseline performance that [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 3.** Figure 3: Learning performance evaluation of image classification by FedAvg, DSL and improved DSL. [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

read the original abstract

Recent advances in distributed swarm learning (DSL) offer a promising paradigm for edge Internet of Things. Such advancements enhance data privacy, communication efficiency, energy saving, and model scalability. However, the presence of non-independent and identically distributed (non-i.i.d.) data pose a significant challenge for multi-access edge computing, degrading learning performance and diverging training behavior of vanilla DSL. Further, there still lacks theoretical guidance on how data heterogeneity affects model training accuracy, which requires thorough investigation. To fill the gap, this paper first study the data heterogeneity by measuring the impact of non-i.i.d. datasets under the DSL framework. This then motivates a new multi-worker selection design for DSL, termed M-DSL algorithm, which works effectively with distributed heterogeneous data. A new non-i.i.d. degree metric is introduced and defined in this work to formulate the statistical difference among local datasets, which builds a connection between the measure of data heterogeneity and the evaluation of DSL performance. In this way, our M-DSL guides effective selection of multiple works who make prominent contributions for global model updates. We also provide theoretical analysis on the convergence behavior of our M-DSL, followed by extensive experiments on different heterogeneous datasets and non-i.i.d. data settings. Numerical results verify performance improvement and network intelligence enhancement provided by our M-DSL beyond the benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This adds a non-i.i.d. degree metric to pick multiple workers in DSL for edge IoT, but the metric's role in the convergence claim looks loose.

read the letter

The main thing here is that the authors define a new non-i.i.d. degree metric and use it to drive a multi-worker selection rule called M-DSL, with the goal of improving distributed swarm learning when local datasets in edge IoT are heterogeneous. They run experiments on varied non-i.i.d. settings and claim both convergence guarantees and gains over standard benchmarks. That focus on a real deployment pain point is the clearest positive. The selection idea itself is a straightforward extension of prior DSL work, and the experiments give at least some evidence that picking more than one worker can help under heterogeneity. They also attempt a theoretical analysis, which is better than many applied papers in this area. The soft spot is the link between the new metric and the stated convergence behavior. The abstract presents the metric as building a direct connection to performance and selection quality, yet it is not obvious from the description whether the proof actually substitutes or bounds the metric inside the key inequalities or whether the selection rule is simply a threshold heuristic. If the rate holds without the metric appearing in the derivation, then the performance lift may come from generic multi-worker averaging rather than the specific contribution. That makes it harder to credit the metric for the reported improvements. This paper is aimed at researchers working on distributed learning for resource-constrained IoT networks. Someone already familiar with DSL frameworks could pick up the selection heuristic and the experimental comparisons without much trouble. It is not a foundational result, but the practical angle and the attempt at theory are enough to justify sending it to a serious referee who can check the derivations and the exact experimental controls.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes M-DSL, a multi-worker selection algorithm for distributed swarm learning (DSL) in edge IoT settings with non-i.i.d. data. It defines a new non-i.i.d. degree metric to quantify statistical differences among local datasets, employs the metric to select workers making prominent contributions to global model updates, supplies a theoretical convergence analysis for M-DSL, and reports experimental results on heterogeneous datasets demonstrating performance gains relative to benchmarks.

Significance. If the non-i.i.d. degree metric is shown to be derived independently of the performance metrics and is explicitly substituted into the convergence bounds and selection optimality condition, the work could provide useful theoretical guidance for mitigating data heterogeneity effects in privacy-preserving distributed learning on resource-constrained devices. The experimental component on varied non-i.i.d. regimes adds practical value, but only if the metric-to-bound linkage is clarified.

major comments (2)

[Theoretical analysis] Theoretical analysis section: the convergence theorem states a rate for the global model but does not substitute or bound the non-i.i.d. degree metric inside the key inequality (e.g., the term controlling the divergence due to local data distributions). Without this step the metric is not load-bearing for the stated guarantee, undermining the claim that it directly connects heterogeneity measurement to DSL performance evaluation.
[M-DSL algorithm] M-DSL algorithm description: the multi-worker selection rule is defined via a threshold or ranking on the non-i.i.d. degree metric, yet this rule is not shown to be the argmax of any term appearing in the convergence expression. If the selection remains a heuristic rather than an optimization derived from the bound, the theoretical motivation for the metric is weakened and the reported gains cannot be attributed specifically to the new metric.

minor comments (2)

[Abstract] The abstract claims 'theoretical analysis on the convergence behavior' and 'extensive experiments'; a brief pointer to the specific theorem number and the main performance table would improve readability.
[Metric definition] Notation for the non-i.i.d. degree metric (e.g., symbol and scaling parameters) should be introduced once and used consistently in both the metric definition and the subsequent algorithm and proof sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below, clarifying the current connections in the manuscript while outlining revisions to strengthen the theoretical linkage between the non-i.i.d. degree metric, worker selection, and convergence guarantees.

read point-by-point responses

Referee: [Theoretical analysis] Theoretical analysis section: the convergence theorem states a rate for the global model but does not substitute or bound the non-i.i.d. degree metric inside the key inequality (e.g., the term controlling the divergence due to local data distributions). Without this step the metric is not load-bearing for the stated guarantee, undermining the claim that it directly connects heterogeneity measurement to DSL performance evaluation.

Authors: We appreciate this observation. The convergence theorem provides a rate that includes a divergence term arising from non-i.i.d. local datasets. The non-i.i.d. degree metric is defined precisely to quantify this statistical difference for each worker, and M-DSL selects workers to reduce the aggregate effect of this term. While an explicit substitution of the metric into the key inequality is not performed in the current version, the metric directly informs the selection that controls the divergence. To address the concern, we will revise the theoretical analysis to derive an explicit upper bound on the divergence term expressed in terms of the non-i.i.d. degree metric, making its role in the guarantee explicit. revision: yes
Referee: [M-DSL algorithm] M-DSL algorithm description: the multi-worker selection rule is defined via a threshold or ranking on the non-i.i.d. degree metric, yet this rule is not shown to be the argmax of any term appearing in the convergence expression. If the selection remains a heuristic rather than an optimization derived from the bound, the theoretical motivation for the metric is weakened and the reported gains cannot be attributed specifically to the new metric.

Authors: We agree that a direct derivation of the selection rule as the argmax of a term from the convergence bound would provide stronger motivation. The current rule ranks workers by the metric to prioritize those whose local distributions contribute less to divergence. To strengthen this, we will add a remark or short derivation in the algorithm section showing that selecting the k workers with the smallest non-i.i.d. degree values minimizes an upper bound on the divergence term appearing in the convergence rate. This will establish that the rule is theoretically motivated rather than purely heuristic. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

full rationale

The abstract introduces a non-i.i.d. degree metric to quantify dataset differences and connect heterogeneity to DSL performance, then separately states the M-DSL selection rule and provides theoretical convergence analysis verified by experiments. No equations or sections are available in the given text that reduce the convergence bound to a direct substitution of the metric, treat the selection criterion as a fitted parameter renamed as prediction, or rely on a self-citation chain for the uniqueness or load-bearing claim. The theoretical analysis is presented as an independent step following the metric definition, satisfying the requirement for explicit reduction before flagging circularity. This aligns with the default expectation for non-circular papers.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central addition is the non-i.i.d. degree metric whose definition and connection to performance are introduced without upstream derivation; the approach rests on the domain assumption that heterogeneity can be quantified in a way that directly predicts worker utility.

free parameters (1)

non-i.i.d. degree metric scaling or threshold parameters
The metric is introduced to measure statistical difference; any scaling constants or selection thresholds are likely chosen or fitted to make the connection to DSL performance hold.

axioms (1)

domain assumption Data heterogeneity measured by the non-i.i.d. degree metric directly affects DSL training accuracy and can be used to guide worker selection.
Invoked when the paper states that the metric builds a connection between heterogeneity measure and performance evaluation.

pith-pipeline@v0.9.0 · 5790 in / 1136 out tokens · 35155 ms · 2026-05-18T13:59:23.361003+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 2 internal anchors

[1]

Enhancing reliability of distributed learning over edge networks,

X. Fan, Y . Wang, Y . Li, Y . Hong, C. Luo, and Z. Tian, “Enhancing reliability of distributed learning over edge networks,” in2025 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). IEEE, 2025, pp. 501–506

work page 2025
[2]

A state-of-the-art survey on solving non-iid data in federated learning,

X. Ma, J. Zhu, Z. Lin, S. Chen, and Y . Qin, “A state-of-the-art survey on solving non-iid data in federated learning,”Future Generation Computer Systems, vol. 135, pp. 244–258, 2022

work page 2022
[3]

Efficient distributed swarm learning for edge computing,

X. Fan, Y . Wang, Y . Huo, and Z. Tian, “Efficient distributed swarm learning for edge computing,” inICC 2023-IEEE International Confer- ence on Communications. IEEE, 2023, pp. 3627–3632

work page 2023
[4]

Ganfed: Gan-based federated learning with non-iid datasets in edge iots,

X. Fan, Y . Wang, W. Zhang, Y . Li, Z. Cai, and Z. Tian, “Ganfed: Gan-based federated learning with non-iid datasets in edge iots,” in ICC 2024-IEEE International Conference on Communications. IEEE, 2024, pp. 5443–5448

work page 2024
[5]

Advances and open problems in federated learning,

P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummingset al., “Advances and open problems in federated learning,”Foundations and trends® in machine learning, vol. 14, no. 1–2, pp. 1–210, 2021

work page 2021
[6]

Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification

T.-M. H. Hsu, H. Qi, and M. Brown, “Measuring the effects of non- identical data distribution for federated visual classification,”arXiv preprint arXiv:1909.06335, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1909
[7]

Federated optimization in heterogeneous networks,

T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V . Smith, “Federated optimization in heterogeneous networks,”Proceedings of Machine learning and systems, vol. 2, pp. 429–450, 2020

work page 2020
[8]

Distributed swarm learning for edge internet of things,

Y . Wang, Z. Tian, X. Fan, Z. Cai, C. Nowzari, and K. Zeng, “Distributed swarm learning for edge internet of things,”IEEE Communications Magazine, vol. 62, no. 11, pp. 160–166, 2024

work page 2024
[9]

Cb-dsl: Communication- efficient and byzantine-robust distributed swarm learning on non-i.i.d. data,

X. Fan, Y . Wang, Y . Huo, and Z. Tian, “Cb-dsl: Communication- efficient and byzantine-robust distributed swarm learning on non-i.i.d. data,”IEEE Transactions on Cognitive Communications and Network- ing, vol. 10, no. 1, pp. 322–334, 2024

work page 2024
[10]

Wasserstein distance rivals kullback-leibler divergence for knowledge distillation,

J. Lv, H. Yang, and P. Li, “Wasserstein distance rivals kullback-leibler divergence for knowledge distillation,”Advances in Neural Information Processing Systems, vol. 37, pp. 65 445–65 475, 2025

work page 2025
[11]

Federated Learning with Non-IID Data

Y . Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V . Chandra, “Federated learning with non-iid data,”arXiv preprint arXiv:1806.00582, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[12]

Curse or redemption? how data heterogeneity affects the robustness of federated learning,

S. Zawad, A. Ali, P.-Y . Chen, A. Anwar, Y . Zhou, N. Baracaldo, Y . Tian, and F. Yan, “Curse or redemption? how data heterogeneity affects the robustness of federated learning,” inProceedings of AAAI conference on artificial intelligence, vol. 35, no. 12, 2021, pp. 10 807–10 814

work page 2021
[13]

Data normalization and standardization: a technical report,

P. J. M. Ali, R. H. Faraj, E. Koya, P. J. M. Ali, and R. H. Faraj, “Data normalization and standardization: a technical report,”Mach Learn Tech Rep, vol. 1, no. 1, pp. 1–6, 2014

work page 2014
[14]

Joint optimization of commu- nications and federated learning over the air,

X. Fan, Y . Wang, Y . Huo, and Z. Tian, “Joint optimization of commu- nications and federated learning over the air,”IEEE Transactions on Wireless Communications, vol. 21, no. 6, pp. 4434–4449, 2021

work page 2021
[15]

Qc-odkla: Quantized and communication- censored online decentralized kernel learning via lin- earized admm,

P. Xu, Y . Wang, X. Chen, and Z. Tian, “Qc-odkla: Quantized and communication- censored online decentralized kernel learning via lin- earized admm,”IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 12, pp. 17 987–17 999, 2024

work page 2024
[16]

1-bit compressive sensing for efficient federated learning over the air,

X. Fan, Y . Wang, Y . Huo, and Z. Tian, “1-bit compressive sensing for efficient federated learning over the air,”IEEE transactions on wireless communications, vol. 22, no. 3, pp. 2139–2155, 2022

work page 2022
[17]

Federated learning of deep networks using model averaging,

H. B. McMahan, E. Moore, D. Ramage, and B. A. y Arcas, “Federated learning of deep networks using model averaging,”arXiv preprint arXiv:1602.05629, vol. 2, no. 2, 2016

work page arXiv 2016
[18]

The mnist database of handwritten digit images for machine learning research,

L. Deng, “The mnist database of handwritten digit images for machine learning research,”IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012

work page 2012
[19]

Learning multiple layers of features from tiny images,

K. Alex, “Learning multiple layers of features from tiny images,” https://www. cs. toronto. edu/kriz/learning-features-2009-TR. pdf, 2009

work page 2009
[20]

Experimenting with normalization layers in federated learning on non- iid scenarios,

B. Casella, R. Esposito, A. Sciarappa, C. Cavazzoni, and M. Aldinucci, “Experimenting with normalization layers in federated learning on non- iid scenarios,”IEEE Access, vol. 12, pp. 47 961–47 971, 2024

work page 2024

[1] [1]

Enhancing reliability of distributed learning over edge networks,

X. Fan, Y . Wang, Y . Li, Y . Hong, C. Luo, and Z. Tian, “Enhancing reliability of distributed learning over edge networks,” in2025 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). IEEE, 2025, pp. 501–506

work page 2025

[2] [2]

A state-of-the-art survey on solving non-iid data in federated learning,

X. Ma, J. Zhu, Z. Lin, S. Chen, and Y . Qin, “A state-of-the-art survey on solving non-iid data in federated learning,”Future Generation Computer Systems, vol. 135, pp. 244–258, 2022

work page 2022

[3] [3]

Efficient distributed swarm learning for edge computing,

X. Fan, Y . Wang, Y . Huo, and Z. Tian, “Efficient distributed swarm learning for edge computing,” inICC 2023-IEEE International Confer- ence on Communications. IEEE, 2023, pp. 3627–3632

work page 2023

[4] [4]

Ganfed: Gan-based federated learning with non-iid datasets in edge iots,

X. Fan, Y . Wang, W. Zhang, Y . Li, Z. Cai, and Z. Tian, “Ganfed: Gan-based federated learning with non-iid datasets in edge iots,” in ICC 2024-IEEE International Conference on Communications. IEEE, 2024, pp. 5443–5448

work page 2024

[5] [5]

Advances and open problems in federated learning,

P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummingset al., “Advances and open problems in federated learning,”Foundations and trends® in machine learning, vol. 14, no. 1–2, pp. 1–210, 2021

work page 2021

[6] [6]

Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification

T.-M. H. Hsu, H. Qi, and M. Brown, “Measuring the effects of non- identical data distribution for federated visual classification,”arXiv preprint arXiv:1909.06335, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1909

[7] [7]

Federated optimization in heterogeneous networks,

T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V . Smith, “Federated optimization in heterogeneous networks,”Proceedings of Machine learning and systems, vol. 2, pp. 429–450, 2020

work page 2020

[8] [8]

Distributed swarm learning for edge internet of things,

Y . Wang, Z. Tian, X. Fan, Z. Cai, C. Nowzari, and K. Zeng, “Distributed swarm learning for edge internet of things,”IEEE Communications Magazine, vol. 62, no. 11, pp. 160–166, 2024

work page 2024

[9] [9]

Cb-dsl: Communication- efficient and byzantine-robust distributed swarm learning on non-i.i.d. data,

X. Fan, Y . Wang, Y . Huo, and Z. Tian, “Cb-dsl: Communication- efficient and byzantine-robust distributed swarm learning on non-i.i.d. data,”IEEE Transactions on Cognitive Communications and Network- ing, vol. 10, no. 1, pp. 322–334, 2024

work page 2024

[10] [10]

Wasserstein distance rivals kullback-leibler divergence for knowledge distillation,

J. Lv, H. Yang, and P. Li, “Wasserstein distance rivals kullback-leibler divergence for knowledge distillation,”Advances in Neural Information Processing Systems, vol. 37, pp. 65 445–65 475, 2025

work page 2025

[11] [11]

Federated Learning with Non-IID Data

Y . Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V . Chandra, “Federated learning with non-iid data,”arXiv preprint arXiv:1806.00582, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[12] [12]

Curse or redemption? how data heterogeneity affects the robustness of federated learning,

S. Zawad, A. Ali, P.-Y . Chen, A. Anwar, Y . Zhou, N. Baracaldo, Y . Tian, and F. Yan, “Curse or redemption? how data heterogeneity affects the robustness of federated learning,” inProceedings of AAAI conference on artificial intelligence, vol. 35, no. 12, 2021, pp. 10 807–10 814

work page 2021

[13] [13]

Data normalization and standardization: a technical report,

P. J. M. Ali, R. H. Faraj, E. Koya, P. J. M. Ali, and R. H. Faraj, “Data normalization and standardization: a technical report,”Mach Learn Tech Rep, vol. 1, no. 1, pp. 1–6, 2014

work page 2014

[14] [14]

Joint optimization of commu- nications and federated learning over the air,

X. Fan, Y . Wang, Y . Huo, and Z. Tian, “Joint optimization of commu- nications and federated learning over the air,”IEEE Transactions on Wireless Communications, vol. 21, no. 6, pp. 4434–4449, 2021

work page 2021

[15] [15]

Qc-odkla: Quantized and communication- censored online decentralized kernel learning via lin- earized admm,

P. Xu, Y . Wang, X. Chen, and Z. Tian, “Qc-odkla: Quantized and communication- censored online decentralized kernel learning via lin- earized admm,”IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 12, pp. 17 987–17 999, 2024

work page 2024

[16] [16]

1-bit compressive sensing for efficient federated learning over the air,

X. Fan, Y . Wang, Y . Huo, and Z. Tian, “1-bit compressive sensing for efficient federated learning over the air,”IEEE transactions on wireless communications, vol. 22, no. 3, pp. 2139–2155, 2022

work page 2022

[17] [17]

Federated learning of deep networks using model averaging,

H. B. McMahan, E. Moore, D. Ramage, and B. A. y Arcas, “Federated learning of deep networks using model averaging,”arXiv preprint arXiv:1602.05629, vol. 2, no. 2, 2016

work page arXiv 2016

[18] [18]

The mnist database of handwritten digit images for machine learning research,

L. Deng, “The mnist database of handwritten digit images for machine learning research,”IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012

work page 2012

[19] [19]

Learning multiple layers of features from tiny images,

K. Alex, “Learning multiple layers of features from tiny images,” https://www. cs. toronto. edu/kriz/learning-features-2009-TR. pdf, 2009

work page 2009

[20] [20]

Experimenting with normalization layers in federated learning on non- iid scenarios,

B. Casella, R. Esposito, A. Sciarappa, C. Cavazzoni, and M. Aldinucci, “Experimenting with normalization layers in federated learning on non- iid scenarios,”IEEE Access, vol. 12, pp. 47 961–47 971, 2024

work page 2024