Federated Distillation on Edge Devices: Efficient Client-Side Filtering for Non-IID Data

Ahmed Mujtaba; Gleb Radchenko; Marc Masana; Radu Prodan

arxiv: 2508.14769 · v2 · pith:Y5JUVXNK · submitted 2025-08-20 · 💻 cs.LG · cs.DC

Pith reviewed 2026-05-21 22:44 UTC · model grok-4.3

classification: cs.LG, cs.DC

keywords: federated distillation, edge devices, non-IID data, knowledge distillation, density ratio estimation, KMeans clustering, client-side filtering

The pith

EdgeFD replaces complex client-side density ratio estimators with KMeans to filter proxy data locally, removing server-side filtering and reaching near-IID accuracy in non-IID federated distillation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Federated distillation lets edge devices collaborate by exchanging soft model outputs (soft logits) rather than model parameters, which reduces communication and improves privacy. The paper introduces EdgeFD, a method that simplifies client-side filtering of proxy data in two ways: it replaces computationally heavy statistical density ratio estimators with an efficient KMeans-based estimator, and it eliminates the server-side filtering step entirely. Experiments with strong non-IID, weak non-IID, and IID client data distributions show accuracy close to the IID case without requiring a pre-trained teacher model on the server. A sympathetic reader would care because the lower computational load makes collaborative learning practical on phones, sensors, and other constrained devices, where data distributions naturally differ.
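The exchange described above can be sketched in a few lines. The NumPy code below is a minimal illustration, not EdgeFD's implementation: it assumes each client sends temperature-scaled softmax outputs on a shared, unlabeled proxy set together with a boolean in-distribution mask produced by its own local filter, and that the server simply averages the soft labels of the clients that kept each sample. The function names (`softmax`, `aggregate_soft_labels`, `distillation_loss`) and the `temperature` parameter are hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax with temperature scaling (numerically stabilised)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def aggregate_soft_labels(client_logits, masks, temperature=1.0):
    """Average client soft labels per proxy sample (hypothetical server step).

    client_logits: list of (n_proxy, n_classes) arrays, one per client.
    masks: list of (n_proxy,) boolean arrays; True means the client's local
        filter kept that proxy sample as in-distribution.
    Returns (targets, covered): averaged probabilities, and a boolean mask of
    proxy samples kept by at least one client. Uncovered rows stay zero.
    """
    probs = np.stack([softmax(logits, temperature) for logits in client_logits])  # (C, n, k)
    weights = np.stack(masks).astype(float)[:, :, None]  # (C, n, 1)
    counts = weights.sum(axis=0)  # (n, 1): how many clients kept each sample
    covered = counts[:, 0] > 0
    targets = np.zeros(probs.shape[1:])
    targets[covered] = (probs * weights).sum(axis=0)[covered] / counts[covered]
    return targets, covered


def distillation_loss(student_logits, targets, temperature=1.0):
    """Mean KL(targets || student) over the given proxy samples."""
    p = np.clip(targets, 1e-12, 1.0)
    q = np.clip(softmax(student_logits, temperature), 1e-12, 1.0)
    return float(np.mean(np.sum(targets * (np.log(p) - np.log(q)), axis=1)))
```

Because filtering happens entirely on the clients, the server in this sketch does no filtering of its own; it only averages what it receives. A client would then minimise `distillation_loss(student_logits[covered], targets[covered])` alongside its ordinary supervised loss on private data.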

Core claim

The paper claims that an efficient KMeans-based density ratio estimator running on each client can reliably identify and filter both in-distribution and out-of-distribution proxy data, thereby improving the quality of knowledge sharing in federated distillation. This client-only filtering removes the need for complex statistical density ratio estimators and for any server-side filtering of ambiguous knowledge, producing models whose accuracy stays close to IID performance even under strong non-IID conditions and without a pre-trained teacher model on the server.

What carries the argument

A KMeans-based density ratio estimator that filters in-distribution and out-of-distribution proxy data on each client before knowledge is shared.
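The source does not spell out how the KMeans model is turned into a density ratio, so the sketch below substitutes a simple distance-based stand-in: fit KMeans (plain Lloyd's algorithm) on a client's local feature vectors, then flag a proxy sample as in-distribution if its distance to the nearest centroid does not exceed a threshold taken as a quantile of the local data's own distances. The names `fit_kmeans`, `nearest_centroid_distance`, `filter_proxy`, and the quantile threshold rule are illustrative assumptions; EdgeFD's actual estimator may score samples differently.

```python
import numpy as np


def fit_kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points; leave empty clusters in place.
        new = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


def nearest_centroid_distance(x, centroids):
    """Euclidean distance from each row of x to its closest centroid."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2.min(axis=1))


def filter_proxy(local_x, proxy_x, k=1, quantile=0.95, seed=0):
    """Return a boolean mask: True if a proxy sample lies no farther from the
    local centroids than the given quantile of the local data's distances."""
    centroids = fit_kmeans(local_x, k, seed=seed)
    tau = np.quantile(nearest_centroid_distance(local_x, centroids), quantile)
    return nearest_centroid_distance(proxy_x, centroids) <= tau
```

With `k=1`, Lloyd's algorithm converges to the sample mean of the local features, which is one way to read the single-centroid initialisation quoted in the Lean-theorem section below; a larger `k` lets the filter follow multi-modal local data. The `quantile` argument plays the role of an ID detection threshold of the kind swept in Figure 5.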

If this is right

Clients perform filtering locally at far lower computational cost than statistical density ratio estimators require, making the process viable on resource-constrained edge hardware.
Eliminating server-side filtering removes an extra latency step from the overall workflow.
Accuracy remains close to IID levels across strong non-IID, weak non-IID, and IID client data distributions.
Deployment no longer requires a pre-trained teacher model on the server, simplifying system setup.
The method outperforms prior selective knowledge-sharing strategies in measured accuracy under heterogeneous conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Local filtering may allow federated distillation to scale to larger numbers of devices by removing any central filtering bottleneck.
The same client-side simplification could be tested in other distillation-based collaborative learning setups that face data heterogeneity.
Longer-term experiments could measure whether the accuracy advantage holds when client counts reach thousands or when data drifts over time.
Pairing the lighter filtering step with existing model compression techniques may produce further gains for very small edge devices.

Load-bearing premise

KMeans clustering on each client can accurately separate useful proxy data from irrelevant data without introducing bias or needing more complex statistical estimators.

What would settle it

An experiment in which replacing the KMeans estimator with a standard statistical density ratio method yields clearly higher accuracy or lower filtering error under strong non-IID conditions would falsify the central claim.
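The experiment above needs a "standard statistical density ratio method" as its baseline. One well-documented option is uLSIF (unconstrained least-squares importance fitting), the finite-basis relative of the KuLSIF estimator named in Figure 2. The sketch below fits r(x) = p_nu(x) / p_de(x) with Gaussian kernels centred on numerator samples and solves for the weights in closed form. Treating a client's local features as the numerator and the proxy pool as the denominator is an assumption about how such a baseline would be wired into federated distillation, and the defaults for `sigma`, `lam`, and `n_basis` are placeholders that would normally be chosen by cross-validation.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    """(n, b) matrix of Gaussian kernel values between rows of x and centers."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def fit_ulsif(x_nu, x_de, sigma=1.0, lam=0.1, n_basis=100, seed=0):
    """uLSIF estimate of r(x) = p_nu(x) / p_de(x); returns a callable r(x)."""
    rng = np.random.default_rng(seed)
    b = min(n_basis, len(x_nu))
    centers = x_nu[rng.choice(len(x_nu), size=b, replace=False)]
    phi_de = gaussian_kernel(x_de, centers, sigma)  # (n_de, b)
    phi_nu = gaussian_kernel(x_nu, centers, sigma)  # (n_nu, b)
    # H: second moment of the basis under the denominator samples.
    H = phi_de.T @ phi_de / len(x_de)
    # h: mean basis activation under the numerator samples.
    h = phi_nu.mean(axis=0)
    # Closed-form minimiser of 1/2 a^T H a - h^T a + lam/2 ||a||^2.
    alpha = np.linalg.solve(H + lam * np.eye(b), h)
    # uLSIF clips negative weights so the estimated ratio stays non-negative.
    alpha = np.maximum(alpha, 0.0)
    return lambda x: gaussian_kernel(x, centers, sigma) @ alpha
```

Scoring the same labelled ID/OOD proxy split with this estimator and with a KMeans-based filter, then comparing filtering error and downstream accuracy under strong non-IID partitions, is the kind of head-to-head comparison that would settle the question.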

Figures

Figures reproduced from arXiv: 2508.14769 by Ahmed Mujtaba, Gleb Radchenko, Marc Masana, Radu Prodan.

**Figure 1.** Global and client-side workflow of EdgeFD with KMeans density ratio estimation on heterogeneous devices, …

**Figure 2.** Learning and estimation time and memory comparison between KuLSIF-DRE …

**Figure 3.** Density ratio estimation comparison using randomly sampled two-feature data.

**Figure 4.** Principal component analysis of various datasets.

**Figure 5.** Effect of proxy samples percentage and ID detection threshold on test accuracy for different datasets.

Original abstract

Federated distillation has emerged as a promising collaborative machine learning approach, offering enhanced privacy protection and reduced communication compared to traditional federated learning by exchanging model outputs (soft logits) rather than full model parameters. However, existing methods employ complex selective knowledge-sharing strategies that require clients to identify in-distribution proxy data through computationally expensive statistical density ratio estimators. Additionally, server-side filtering of ambiguous knowledge introduces latency to the process. To address these challenges, we propose a robust, resource-efficient EdgeFD method that reduces the complexity of the client-side density ratio estimation and removes the need for server-side filtering. EdgeFD introduces an efficient KMeans-based density ratio estimator for effectively filtering both in-distribution and out-of-distribution proxy data on clients, significantly improving the quality of knowledge sharing. We evaluate EdgeFD across diverse practical scenarios, including strong non-IID, weak non-IID, and IID data distributions on clients, without requiring a pre-trained teacher model on the server for knowledge distillation. Experimental results demonstrate that EdgeFD outperforms state-of-the-art methods, consistently achieving accuracy levels close to IID scenarios even under heterogeneous and challenging conditions. The significantly reduced computational overhead of the KMeans-based estimator is suitable for deployment on resource-constrained edge devices, thereby enhancing the scalability and real-world applicability of federated distillation. The code is available online for reproducibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes EdgeFD, a federated distillation approach for edge devices that replaces complex statistical density ratio estimators with a KMeans-based client-side filter for selecting in-distribution proxy data, eliminates server-side filtering of ambiguous knowledge, and reports experiments showing outperformance over prior methods with accuracy approaching IID levels under strong non-IID, weak non-IID, and IID client data distributions, all without a pre-trained server teacher model. Code is released for reproducibility.

Significance. If the empirical gains are reproducible and attributable to the proposed filtering mechanism, the work could improve the deployability of federated distillation on resource-limited devices by lowering client computation and communication overhead while maintaining knowledge-sharing quality in heterogeneous settings. The reproducibility artifact is a positive contribution.

major comments (2)

[EdgeFD method description] The central performance claim rests on the KMeans-based density ratio estimator for client-side proxy filtering (described in the EdgeFD method section). No derivation, error bounds, or comparison to established density-ratio methods (KLIEP, uLSIF) is supplied; KMeans is a partitioning heuristic whose connection to reliable in/out-of-distribution separation in high-dimensional or non-convex feature spaces is not justified. This directly affects attribution of the reported accuracy gains to the proposed mechanism rather than to other implementation choices.
[Evaluation / Experimental results] The abstract and evaluation sections state that EdgeFD 'outperforms state-of-the-art methods' and achieves 'accuracy levels close to IID scenarios' under heterogeneous conditions, yet no quantitative metrics, datasets, baselines, error bars, or ablation tables are referenced. Without these, the load-bearing empirical claim cannot be assessed for statistical significance or robustness.

minor comments (2)

[Abstract] The abstract claims suitability for 'resource-constrained edge devices' but provides no concrete runtime, memory, or FLOPs measurements for the KMeans estimator versus prior statistical estimators.
[Method] Notation for the density ratio estimator and the precise clustering objective (e.g., number of clusters, distance metric, initialization) should be formalized with equations for clarity.
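For reference, the objects the referee asks to see formalized can be written in standard notation. The density ratio and the KMeans objective below are textbook definitions; the distance-threshold rule is an illustrative assumption about how a fitted KMeans model could be turned into an in-distribution test, not the paper's stated estimator. Here z is a feature embedding, p_c and p_proxy are the client-local and proxy feature densities, n_c is the number of local samples, and tau_c is a per-client threshold (all hypothetical notation).

```latex
\begin{aligned}
r_c(\mathbf{z}) &= \frac{p_c(\mathbf{z})}{p_{\mathrm{proxy}}(\mathbf{z})}, \\
\{\boldsymbol{\mu}_k\}_{k=1}^{K} &\in \arg\min_{\boldsymbol{\mu}_1,\dots,\boldsymbol{\mu}_K}
  \sum_{i=1}^{n_c} \min_{1 \le k \le K} \lVert \mathbf{z}_i - \boldsymbol{\mu}_k \rVert_2^2, \\
\mathrm{ID}_c(\mathbf{z}) &= \mathbb{1}\!\left[\min_{1 \le k \le K} \lVert \mathbf{z} - \boldsymbol{\mu}_k \rVert_2 \le \tau_c\right].
\end{aligned}
```

For K = 1 the minimiser is the sample mean of the local embeddings. In cost terms, one Lloyd iteration over n_c samples in d dimensions takes O(n_c K d) time, the fitted model stores K d numbers, and scoring a proxy sample costs O(K d). A least-squares kernel estimator such as uLSIF with b basis functions needs O((n_nu + n_de) b d + n_de b^2 + b^3) time to fit and O(b d) per score. These are the kinds of figures the runtime and memory comment asks to see reported.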

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive note on the reproducibility artifact. Below we respond point-by-point to the two major comments, indicating the revisions we will make to improve clarity and rigor.

Point-by-point responses

Referee: [EdgeFD method description] The central performance claim rests on the KMeans-based density ratio estimator for client-side proxy filtering (described in the EdgeFD method section). No derivation, error bounds, or comparison to established density-ratio methods (KLIEP, uLSIF) is supplied; KMeans is a partitioning heuristic whose connection to reliable in/out-of-distribution separation in high-dimensional or non-convex feature spaces is not justified. This directly affects attribution of the reported accuracy gains to the proposed mechanism rather than to other implementation choices.

Authors: We agree that the current method section would benefit from additional justification. KMeans is employed as a computationally lightweight heuristic specifically to meet the constraints of resource-limited edge devices, where established estimators such as KLIEP and uLSIF incur prohibitive overhead. In the revised manuscript we will expand the EdgeFD method description with a dedicated paragraph explaining the rationale: KMeans operates on client-side feature embeddings to partition proxy data into clusters, thereby approximating in-distribution selection without requiring density-ratio optimization. We will also add a complexity comparison (runtime and memory) against KLIEP and uLSIF and include an ablation that isolates the filtering component. These changes will strengthen attribution of the observed gains to the proposed client-side mechanism while acknowledging the heuristic nature of the approach. revision: yes
Referee: [Evaluation / Experimental results] The abstract and evaluation sections state that EdgeFD 'outperforms state-of-the-art methods' and achieves 'accuracy levels close to IID scenarios' under heterogeneous conditions, yet no quantitative metrics, datasets, baselines, error bars, or ablation tables are referenced. Without these, the load-bearing empirical claim cannot be assessed for statistical significance or robustness.

Authors: We acknowledge that the abstract and evaluation sections could reference the supporting results more explicitly. The manuscript already reports experiments across strong non-IID, weak non-IID, and IID partitions on standard image-classification datasets, comparing against relevant federated-distillation baselines and measuring both accuracy and client-side overhead. In the revision we will (i) update the abstract to cite the key quantitative improvements and (ii) add explicit cross-references from the text to the tables and figures that contain mean accuracies, standard deviations over repeated runs, and ablation results. These edits will make the empirical claims easier to verify without altering the underlying data. revision: yes

Circularity Check

0 steps flagged

No circularity: method proposal with experimental validation is self-contained

Full rationale

The paper introduces EdgeFD as a new client-side KMeans-based density ratio estimator for federated distillation, explicitly positioned as a simplification over prior complex statistical estimators and server-side filtering. No equations, derivations, or load-bearing steps are shown that reduce claimed performance gains to fitted parameters renamed as predictions, self-definitional loops, or self-citation chains. The abstract and description frame the contribution as an efficient heuristic with empirical evaluation across IID/non-IID scenarios, without invoking uniqueness theorems or smuggling ansatzes via prior work. This matches the default case of a standard method proposal that remains independent of its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Based on abstract only: the central claim rests on the effectiveness of KMeans clustering as a density ratio estimator for proxy data filtering and on the assumption that client-side filtering alone suffices without server intervention.

axioms (1)

Domain assumption: KMeans clustering can serve as an effective and computationally lighter substitute for statistical density ratio estimation in identifying in-distribution proxy data.
Invoked in the description of the EdgeFD method for client-side filtering.

invented entities (1)

EdgeFD method (no independent evidence)
Purpose: Resource-efficient federated distillation with client-side KMeans filtering and no server-side filtering.
New named approach introduced to address limitations of existing selective knowledge-sharing strategies.

pith-pipeline@v0.9.0 · 5778 in / 1328 out tokens · 43387 ms · 2026-05-21T22:44:21.839404+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "EdgeFD introduces an efficient KMeans-based density ratio estimator for effectively filtering both in-distribution and out-of-distribution proxy data on clients"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_strictMono_of_one_lt · unclear
Paper passage: "KMeans model initialized with a single centroid captures the distinct data distribution pattern"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 1 internal anchor

[1] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282. PMLR, 2017.

[2] Sohei Itahara, Takayuki Nishio, Yusuke Koda, Masahiro Morikura, and Koji Yamamoto. Distillation-based semi-supervised federated learning for communication-efficient collaborative training with non-IID private data. IEEE Transactions on Mobile Computing, 22(1):191–205, 2021.

[3] Jiawei Shao, Fangzhao Wu, and Jun Zhang. Selective knowledge sharing for privacy-preserving federated distillation without a good teacher. Nature Communications, 15(1):349, 2024.

[4] Latif U. Khan, Walid Saad, Zhu Han, Ekram Hossain, and Choong Seon Hong. Federated learning for Internet of Things: Recent advances, taxonomy, and open challenges. IEEE Communications Surveys & Tutorials, 23(3):1759–1799, 2021.

[5] Tian Li, Anit Kumar Sahu, Ameet Talwalkar, and Virginia Smith. Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3):50–60, 2020.

[6] Shiqiang Wang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, and Kevin Chan. Adaptive federated learning in resource constrained edge computing systems. IEEE Journal on Selected Areas in Communications, 37(6):1205–1221, 2019.

[7] Eunjeong Jeong, Seungeun Oh, Hyesung Kim, Jihong Park, Mehdi Bennis, and Seong-Lyun Kim. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-IID private data. In Proceedings of Neural Information Processing Systems, MLPCD Workshop, 2018.

[8] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. In Proceedings of Neural Information Processing Systems Workshop, 2014.

[9] Laiqiao Qin, Tianqing Zhu, Wanlei Zhou, and Philip S. Yu. Knowledge distillation in federated learning: A survey on long lasting challenges and new solutions. arXiv preprint arXiv:2406.10861, 2024.

[10] Felix Sattler, Arturo Marban, Roman Rischke, and Wojciech Samek. CFD: Communication-efficient federated distillation via soft-label quantization and delta coding. IEEE Transactions on Network Science and Engineering, 9(4):2025–2038, 2022.

[11] Gleb Radchenko and Victoria Andrea Fill. Edge AI collaborative learning: Bayesian approaches to uncertainty estimation, 2024.

[12] Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori. Density ratio estimation in machine learning. Cambridge University Press, 2012.

[13] Takafumi Kanamori, Taiji Suzuki, and Masashi Sugiyama. Statistical analysis of kernel-based least-squares density-ratio estimation. Machine Learning, 86:335–367, 2012.

[14] Daliang Li and Junpu Wang. FedMD: Heterogenous federated learning via model distillation. In Proceedings of Neural Information Processing Systems, FLDPC Workshop, 2019.

[15] Lin Li, Jianping Gou, Baosheng Yu, Lan Du, Zhang Yi, and Dacheng Tao. Federated distillation: A survey. arXiv preprint arXiv:2404.08564, 2024.

[16] Eunjeong Jeong, Seungeun Oh, Hyesung Kim, Jihong Park, Mehdi Bennis, and Seong-Lyun Kim. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-IID private data. arXiv preprint arXiv:1811.11479, 2018.

[17] Zhuangdi Zhu, Junyuan Hong, and Jiayu Zhou. Data-free knowledge distillation for heterogeneous federated learning. In International Conference on Machine Learning, pages 12878–12889. PMLR, 2021.

[18] Saira Bano, Nicola Tonellotto, Pietro Cassarà, and Alberto Gotta. FedCMD: A federated cross-modal knowledge distillation for drivers' emotion recognition. ACM Transactions on Intelligent Systems and Technology, 15(3):1–27, 2024.

[19] Hyowoon Seo, Jihong Park, Seungeun Oh, Mehdi Bennis, and Seong-Lyun Kim. Federated Knowledge Distillation, pages 457–485. Cambridge University Press, 2022.

[20] Dong Wang, Naifu Zhang, Meixia Tao, and Xu Chen. Knowledge selection and local updating optimization for federated knowledge distillation with heterogeneous models. IEEE Journal of Selected Topics in Signal Processing, 17(1):82–97, 2022.

[21] Felix Sattler, Arturo Marban, Roman Rischke, and Wojciech Samek. Communication-efficient federated distillation. arXiv preprint arXiv:2012.00632, 2020.

[22] Ilai Bistritz, Ariana Mann, and Nicholas Bambos. Distributed distillation for on-device learning. Advances in Neural Information Processing Systems, 33:22593–22604, 2020.

[23] Dianbo Sui, Yubo Chen, Jun Zhao, Yantao Jia, Yuantao Xie, and Weijian Sun. FedED: Federated learning via ensemble distillation for medical relation extraction. In Proceedings of Empirical Methods in Natural Language Processing, pages 2118–2128, 2020.

[24] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[25] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, 2017.

[26] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

This paper was accepted at FLTA, 2025. The final version will be ...
