Optimized Federated Knowledge Distillation with Distributed Neural Architecture Search
Pith reviewed 2026-05-21 05:38 UTC · model grok-4.3
The pith
Clients select their own models in federated learning to raise accuracy while slashing computation and communication.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FedKDNAS lets each client autonomously pick a lightweight architecture under accuracy and resource constraints. The client trains this model locally using supervised learning combined with knowledge distillation from server-provided targets. Only the model's predictions on a public reference set are shared with the server. The server aggregates and smooths these predictions, sometimes incorporating a teacher model, to generate stable distillation targets for the next training round. Tests on six datasets against six baselines confirm improved Pareto efficiency.
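The server-side step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the plain mean over clients, the convex teacher blend, and the names `average_predictions`, `blend`, `next_targets`, and `teacher_weight` are assumptions. Only the smoothing form, new = gamma * previous + (1 - gamma) * aggregated, follows the EMA recurrence quoted in the paper's appendix.

```python
from typing import List, Optional

Matrix = List[List[float]]  # one row of class scores per reference example


def average_predictions(client_preds: List[Matrix]) -> Matrix:
    """Element-wise mean of the clients' predictions on the reference set."""
    if not client_preds:
        raise ValueError("need predictions from at least one client")
    n_clients = len(client_preds)
    n_rows, n_cols = len(client_preds[0]), len(client_preds[0][0])
    return [
        [sum(p[i][j] for p in client_preds) / n_clients for j in range(n_cols)]
        for i in range(n_rows)
    ]


def blend(a: Matrix, b: Matrix, weight_a: float) -> Matrix:
    """Convex combination weight_a * a + (1 - weight_a) * b."""
    return [
        [weight_a * x + (1.0 - weight_a) * y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def next_targets(
    prev_smoothed: Optional[Matrix],
    client_preds: List[Matrix],
    gamma: float = 0.9,              # assumed smoothing factor
    teacher: Optional[Matrix] = None,
    teacher_weight: float = 0.5,     # assumed blend weight, not from the paper
) -> Matrix:
    """Aggregate client predictions, optionally mix in a teacher, then EMA-smooth."""
    aggregated = average_predictions(client_preds)
    if teacher is not None:
        aggregated = blend(teacher, aggregated, teacher_weight)
    if prev_smoothed is None:
        # First round: nothing to smooth against yet.
        return aggregated
    return blend(prev_smoothed, aggregated, gamma)
```

A larger `gamma` keeps the targets closer to the previous round, which is one way to read "stable distillation targets".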
What carries the argument
Client-driven neural architecture selection with server-side aggregation of predictions on a public reference set for generating distillation targets
Load-bearing premise
That clients can correctly and autonomously choose lightweight architectures matching their accuracy needs and device limits, and that a public reference set can be used without causing bias or privacy problems.
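To make the premise concrete, client-side selection under an accuracy floor and a resource budget might look like the sketch below. The review notes that the paper does not specify its search space or constraint, so everything here is hypothetical: the `Candidate` fields, the tie-breaking rule, and the choice of "cheapest feasible model" as the selection criterion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str            # hypothetical architecture label
    val_accuracy: float  # accuracy measured on the client's own validation split
    cost: float          # any resource proxy, e.g. parameter count or FLOPs


def select_architecture(
    candidates: List[Candidate], min_accuracy: float, max_cost: float
) -> Optional[Candidate]:
    """Return the cheapest candidate that meets both constraints, or None."""
    feasible = [
        c for c in candidates
        if c.val_accuracy >= min_accuracy and c.cost <= max_cost
    ]
    if not feasible:
        return None
    # Prefer lower cost; break ties in favour of higher accuracy.
    return min(feasible, key=lambda c: (c.cost, -c.val_accuracy))
```

The premise fails in exactly the case this sketch glosses over: when `val_accuracy` measured on a small local split is a poor estimate of the model's real accuracy.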
What would settle it
If a fixed client architecture in a standard federated setup achieves the same accuracy with similar or lower CPU and communication costs on the evaluated datasets, the advantage of the proposed method would be called into question.
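The test above is a Pareto-dominance check over accuracy, CPU cost, and communication cost. A minimal sketch, assuming each method is summarized by one (accuracy, cpu_cost, comm_cost) triple; the function names and the tuple layout are illustrative and not taken from the paper:

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (accuracy, cpu_cost, comm_cost)


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(results: Dict[str, Point]) -> List[str]:
    """Names of the methods that no other method dominates."""
    return sorted(
        name
        for name, point in results.items()
        if not any(
            dominates(other_point, point)
            for other_name, other_point in results.items()
            if other_name != name
        )
    )
```

Under this reading, the claimed advantage would be undermined if a fixed-architecture baseline dominated, or merely matched, the proposed method's point on a given dataset.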
Original abstract
Federated Learning (FL) enables collaborative model training without centralizing data. However, real-world deployments must simultaneously address statistical heterogeneity across client data (non-IID), system heterogeneity in device capabilities, and communication efficiency. Existing FL approaches mitigate these challenges through improved aggregation, personalization, or knowledge distillation, but they almost universally assume a fixed client architecture, limiting adaptability to heterogeneous data complexity and hardware constraints. This architectural constraint often leads to suboptimal trade-offs between accuracy and efficiency in real-world FL systems. This work introduces FedKDNAS, a distillation-driven FL framework that combines client-side neural architecture selection with distillation of server-coordinated knowledge. Each client autonomously selects a lightweight model under accuracy-resource constraints. It then trains it locally using a hybrid objective combining supervised learning and knowledge distillation and shares only predictions on a public reference set. The server then aggregates and smooths these predictions, optionally combining them with a teacher model, to produce stable distillation targets for the next round. Extensive evaluation on six datasets against six representative FL baselines (FedAvg, Ditto, FedMD, FedDF, FedDistill, Local-KD) demonstrates that FedKDNAS consistently achieves superior Pareto efficiency, improving accuracy by up to 15% under non-IID conditions, reducing client CPU usage by approximately 28%, and decreasing communication overhead by up to 44 times while maintaining lightweight logit-based communication.
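The "hybrid objective combining supervised learning and knowledge distillation" is not spelled out on this page. One common form, sketched here as an assumption rather than as the paper's loss, mixes cross-entropy on the true label with a temperature-scaled KL term toward the server's targets (the style introduced by Hinton et al.). The weight `alpha`, the `temperature`, and the T² scaling are all assumed, and for brevity both terms are evaluated on a single example, whereas in practice the distillation term would be computed on reference-set inputs.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], label: int) -> float:
    """Supervised term: negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def hybrid_loss(
    student_logits: List[float],
    label: int,
    target_logits: List[float],
    alpha: float = 0.5,        # assumed mixing weight
    temperature: float = 2.0,  # assumed softening temperature
) -> float:
    """(1 - alpha) * cross-entropy + alpha * T^2 * KL(target || student)."""
    supervised = cross_entropy(student_logits, label)
    student_soft = softmax(student_logits, temperature)
    target_soft = softmax(target_logits, temperature)
    distill = temperature ** 2 * kl_divergence(target_soft, student_soft)
    return (1.0 - alpha) * supervised + alpha * distill
```

Setting `alpha = 0` recovers purely local supervised training, which gives a simple ablation for how much the server targets contribute.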
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes FedKDNAS, a federated learning framework combining client-side neural architecture search for lightweight models with server-coordinated knowledge distillation. Clients train locally using a hybrid supervised-plus-distillation objective and communicate only logits on a shared public reference set; the server aggregates and smooths these predictions (optionally with a teacher) to form distillation targets for subsequent rounds. Empirical results on six datasets versus six baselines (FedAvg, Ditto, FedMD, FedDF, FedDistill, Local-KD) report up to 15% accuracy gains under non-IID conditions, ~28% client CPU reduction, and up to 44× lower communication overhead.
Significance. If the performance claims hold under rigorous controls, the work would contribute a practical approach to jointly addressing statistical heterogeneity, system heterogeneity, and communication constraints in FL via adaptive client architectures and logit-only communication. The multi-dataset, multi-baseline evaluation is a positive feature for empirical breadth.
major comments (2)
- [Abstract and §4] Abstract and §4 (Experiments): the central claims of up to 15% accuracy improvement, 28% CPU reduction, and 44× communication savings under non-IID conditions are presented without reported statistical tests, confidence intervals, or variance across random seeds and non-IID partitions. This leaves the superiority over FedDF and FedDistill weakly supported.
- [§3] §3 (Proposed Method): the distillation targets are formed by aggregating client logits on a public reference set. No description is given of how the reference set is constructed to remain representative across heterogeneous client distributions or to avoid selection bias; if the set skews toward any subpopulation, the smoothed targets become mis-calibrated, directly undermining both the accuracy and efficiency gains relative to baselines that also rely on distillation.
minor comments (2)
- [§3] The neural architecture search space and the exact accuracy-resource constraint used for client-side selection are not specified, hindering reproducibility.
- [§4] Hyperparameter tuning details and the precise non-IID partitioning procedure (e.g., Dirichlet concentration or label skew ratios) are omitted from the experimental setup.
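For context on the last comment, a Dirichlet label-skew split is one widely used way to generate non-IID partitions. The sketch below is a generic illustration using only the standard library; it is not the paper's procedure, which the report says is omitted, and the function name and rounding scheme are assumptions. A smaller `alpha` concentrates each class on fewer clients.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def dirichlet_partition(
    labels: Sequence[int], n_clients: int, alpha: float, seed: int = 0
) -> Dict[int, List[int]]:
    """Split example indices across clients with per-class Dirichlet(alpha) shares."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients: Dict[int, List[int]] = {c: [] for c in range(n_clients)}
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)
        # A Dirichlet draw is a vector of Gamma(alpha, 1) samples, normalized.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
        total = sum(weights)
        start, cumulative = 0, 0.0
        for c in range(n_clients):
            cumulative += weights[c] / total
            # The last client takes the remainder so no index is dropped.
            end = len(indices) if c == n_clients - 1 else round(cumulative * len(indices))
            clients[c].extend(indices[start:end])
            start = end
    return clients
```

Reporting the concentration value and the seed alongside results is what makes such a split reproducible.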
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of empirical rigor and methodological clarity. We address each point below and have revised the manuscript to incorporate the suggestions where appropriate.
- Referee: [Abstract and §4] Abstract and §4 (Experiments): the central claims of up to 15% accuracy improvement, 28% CPU reduction, and 44× communication savings under non-IID conditions are presented without reported statistical tests, confidence intervals, or variance across random seeds and non-IID partitions. This leaves the superiority over FedDF and FedDistill weakly supported.
  Authors: We agree that the absence of statistical tests and variance reporting weakens the strength of the empirical claims. In the revised version, we have rerun the experiments using 5 independent random seeds per non-IID partition setting. We now report mean accuracy, CPU usage, and communication cost together with standard deviations. We have also added paired t-test p-values comparing FedKDNAS against FedDF and FedDistill, showing that the reported gains remain statistically significant (p < 0.05) in the majority of evaluated settings. These changes appear in the abstract and Section 4.
  Revision: yes
- Referee: [§3] §3 (Proposed Method): the distillation targets are formed by aggregating client logits on a public reference set. No description is given of how the reference set is constructed to remain representative across heterogeneous client distributions or to avoid selection bias; if the set skews toward any subpopulation, the smoothed targets become mis-calibrated, directly undermining both the accuracy and efficiency gains relative to baselines that also rely on distillation.
  Authors: This is a valid concern. The original manuscript did not provide sufficient detail on reference-set construction. In the revision we have added an explicit description in Section 3: the reference set is a fixed, randomly sampled collection of 2,000 examples drawn from a publicly available held-out dataset that is completely disjoint from all client training data. We further include a short sensitivity study demonstrating that performance is stable across different random draws of the reference set, thereby reducing the risk of subpopulation skew and mis-calibration.
  Revision: yes
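The seed-level comparison described in the first response can be illustrated with a paired t statistic over five seeds. This is a sketch under stated assumptions: the accuracy values are placeholders invented for the example and are not results from the paper, and significance is judged against the standard two-sided 5% critical value for 4 degrees of freedom (2.776) rather than an exact p-value.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic for the mean of the per-seed differences a[i] - b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    spread = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if spread == 0.0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (spread / math.sqrt(len(diffs)))


# Two-sided 5% critical value of Student's t with 4 degrees of freedom (5 seeds).
T_CRIT_DF4 = 2.776

# Placeholder per-seed accuracies, NOT taken from the paper.
method_a = [0.81, 0.83, 0.80, 0.84, 0.82]
method_b = [0.78, 0.79, 0.79, 0.80, 0.78]

t_value = paired_t_statistic(method_a, method_b)
significant = abs(t_value) > T_CRIT_DF4
```

Pairing by seed matters because both methods see the same partition and initialization in each run, so the test compares differences rather than two independent samples.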
Circularity Check
Empirical framework with no circular derivation or self-referential claims
The paper presents FedKDNAS as an empirical FL framework that combines client-side NAS for lightweight models with server-side aggregation of logits on a public reference set for distillation. All reported gains (accuracy, CPU, communication) are obtained from experiments across six datasets and six baselines; no equations, closed-form derivations, or fitted parameters are shown that reduce these outcomes to quantities defined within the same paper. No self-citations are invoked as load-bearing uniqueness theorems, and the central mechanism is externally falsifiable by reproducing the described protocol. The analysis is therefore self-contained with no circular steps.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (relation to the paper passage: unclear)
  Paper passage: "Each client autonomously selects a lightweight model under accuracy-resource constraints. It then trains it locally using a hybrid objective combining supervised learning and knowledge distillation and shares only predictions on a public reference set. The server then aggregates and smooths these predictions..."
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, absolute_floor_iff_bare_distinguishability (relation to the paper passage: unclear)
  Paper passage: "The server aggregates these predictions, fuses them with teacher guidance, and broadcasts a smoothed distillation target to all clients for the next round."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
APPENDIX
Subtracting consecutive iterates of the EMA recurrence (13) gives
\begin{align}
\tilde{Z}^{(r)} - \tilde{Z}^{(r-1)} &= \gamma \tilde{Z}^{(r-1)} + (1-\gamma) Z^{(r)} - \tilde{Z}^{(r-1)} \tag{30} \\
&= (1-\gamma)\bigl(Z^{(r)} - \tilde{Z}^{(r-1)}\bigr), \tag{31}
\end{align}
establishing the first equality. To obta...