Analytic Personalized Federated Meta-Learning

Bangbang Ren; Chaoqun You; Deke Guo; Lailong Luo; Shunxian Gu; Zaipeng Xie; Zhihao Qu

arxiv: 2502.06915 · v2 · submitted 2025-02-10 · 💻 cs.DC · cs.LG

Shunxian Gu, Chaoqun You, Deke Guo, Zhihao Qu, Bangbang Ren, Zaipeng Xie, Lailong Luo

Pith reviewed 2026-05-23 03:57 UTC · model grok-4.3

classification 💻 cs.DC cs.LG

keywords: analytic federated learning · personalized meta-learning · deep neural networks · least-squares solutions · gradient-free training · model personalization · federated learning · heterogeneous data

The pith

pFedACnnL delivers personalized models for heterogeneous federated clients via analytic local objectives while FedACnnL trains DNNs layer-wise as distributed least-squares problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Analytic Federated Learning updates a global model in one closed-form least-squares step without gradients, but the resulting model performs poorly when client data distributions differ. The paper introduces FedACnnL to extend this approach to deep networks by modeling the training of each layer as a separate distributed least-squares problem. Building on that foundation, pFedACnnL solves an additional local analytic objective for each client that pulls the shared model toward that client's data distribution. Experiments report 4 to 8 percent higher test accuracy than the non-personalized version and state-of-the-art results across most convex and non-convex tasks, together with training-time reductions of 83 to 99 percent relative to conventional federated frameworks.

Core claim

By treating each DNN layer as an independent distributed least-squares problem, FedACnnL enables gradient-free collaborative training of deep models inside an analytic federated framework; pFedACnnL then produces a client-specific model by analytically solving a local objective that reconciles the global solution with the client's individual data distribution.

What carries the argument

Layer-wise modeling of DNN training as distributed least-squares problems, paired with closed-form analytic solution of a client-specific personalization objective.
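
To make the distributed least-squares idea concrete, here is a minimal single-layer sketch. It relies on the fact that a ridge-regularized least-squares problem over pooled data depends on the data only through the sums of each client's `X^T X` and `X^T Y`. The function names, the ridge term `reg`, and the synthetic clients are assumptions for illustration; the paper's actual aggregation rule may differ.

```python
import numpy as np

def client_statistics(X, Y):
    """Return the local Gram matrix X^T X and cross term X^T Y (hypothetical helper)."""
    return X.T @ X, X.T @ Y

def server_solve(stats, reg=1e-3):
    """Sum the client statistics and solve the regularized normal equations once."""
    gram = sum(g for g, _ in stats)
    cross = sum(c for _, c in stats)
    d = gram.shape[0]
    return np.linalg.solve(gram + reg * np.eye(d), cross)

rng = np.random.default_rng(0)
# Three hypothetical clients with different sample counts (synthetic data).
clients = [(rng.normal(size=(n, 5)), rng.normal(size=(n, 3))) for n in (20, 35, 50)]

# One upload per client, one closed-form solve on the server, no gradients.
W_fed = server_solve([client_statistics(X, Y) for X, Y in clients])

# Reference: the same ridge problem solved on the pooled data.
X_all = np.vstack([X for X, _ in clients])
Y_all = np.vstack([Y for _, Y in clients])
W_central = np.linalg.solve(X_all.T @ X_all + 1e-3 * np.eye(5), X_all.T @ Y_all)

print(np.allclose(W_fed, W_central))
```

In this sketch the clients never share raw samples, only `d × d` and `d × c` matrices, which is why a single communication round suffices for a linear layer.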

If this is right

pFedACnnL improves test accuracy by 4 to 8 percent over the non-personalized FedACnnL baseline.
FedACnnL reduces DNN training time by 83 to 99 percent compared with conventional federated learning frameworks.
pFedACnnL reaches state-of-the-art performance in most convex and non-convex experimental settings.
The analytic personalization step supports fast adaptation for complex federated tasks without requiring gradient information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same layer-wise least-squares construction could be applied to other gradient-free federated settings that currently rely on iterative updates.
Because each personalization step is a closed-form solve, the approach may scale to very large numbers of clients with lower per-client computation than gradient-based meta-learning.
If the layer-wise assumption holds for convolutional networks, it may also hold for other architectures such as transformers when the same distributed least-squares decomposition is applied.

Load-bearing premise

Modeling the training of each layer as a distributed least-squares problem enables effective DNN collaborative training within the AFL framework.

What would settle it

A controlled experiment on a heterogeneous image-classification task in which the layer-wise least-squares models produce lower accuracy than standard gradient-based federated meta-learning baselines would falsify the claim that the approach supports effective DNN training.

Figures

Figures reproduced from arXiv: 2502.06915 by Bangbang Ren, Chaoqun You, Deke Guo, Lailong Luo, Shunxian Gu, Zaipeng Xie, Zhihao Qu.

**Figure 1.** The resampling and flattening process in convolutional layers, which makes their weights updatable by Equation 2.

**Figure 3.** The overview of pFedACnnL. The cyan dashed box represents the initialization stage, while the brown dashed box and the orange dashed box represent the federated optimization stage and the local personalization stage, respectively.

**Figure 4.** The averaged total training time of each client in each framework on the MNIST and synthetic datasets.

**Figure 6.** The model performance and training time of our proposed frameworks on the CIFAR-10 dataset using the DCNN model.

Abstract

Analytic Federated Learning (AFL) is an enhanced gradient-free federated learning (FL) paradigm designed to accelerate training by updating the global model in a single step with closed-form least-squares (LS) solutions. However, the resulting global model suffers performance degradation across clients with heterogeneous data distributions. Meta-learning is a common approach to this problem: it delivers personalized local models for individual clients. Yet integrating meta-learning with AFL presents significant challenges. First, conventional AFL frameworks cannot support deep neural network (DNN) training, which can limit the fast adaptation capability of meta-learning for complex FL tasks. Second, existing meta-learning methods require gradient information, which is not involved in AFL. To overcome the first challenge, we propose an AFL framework, namely FedACnnL, in which a layer-wise DNN collaborative training method is designed by modeling the training of each layer as a distributed LS problem. For the second challenge, we further propose an analytic personalized federated meta-learning framework, namely pFedACnnL. It generates a personalized model for each client by analytically solving a local objective that bridges the gap between the global model and the individual data distribution. FedACnnL is theoretically proven to require significantly shorter training time than conventional FL frameworks on DNN training, and the reduction ratio is $83\%\sim99\%$ in the experiments. Meanwhile, pFedACnnL exceeds the vanilla FedACnnL in test accuracy by $4\%\sim8\%$, and it achieves state-of-the-art (SOTA) model performance in most cases of convex and non-convex settings compared with previous SOTA frameworks.
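
As an illustration of what "analytically solving a local objective" can look like, the sketch below uses a proximal least-squares objective, `||X_k W - Y_k||^2 + lam * ||W - W_global||^2`, whose minimizer has a closed form. This particular objective, the function name `personalize`, and the weight `lam` are assumptions chosen for clarity; the text above does not give the paper's actual local objective, which may be different.

```python
import numpy as np

def personalize(X_k, Y_k, W_global, lam=1.0):
    """Closed-form minimizer of ||X_k W - Y_k||_F^2 + lam * ||W - W_global||_F^2.

    Hypothetical objective for illustration, not the paper's formulation.
    """
    d = X_k.shape[1]
    A = X_k.T @ X_k + lam * np.eye(d)
    B = X_k.T @ Y_k + lam * W_global
    return np.linalg.solve(A, B)

rng = np.random.default_rng(1)
X_k = rng.normal(size=(40, 6))        # synthetic local features
Y_k = rng.normal(size=(40, 2))        # synthetic local targets
W_global = rng.normal(size=(6, 2))    # stand-in for the shared model

W_strong = personalize(X_k, Y_k, W_global, lam=1e8)    # stays near the global model
W_weak = personalize(X_k, Y_k, W_global, lam=1e-8)     # approaches the local LS fit
W_local = np.linalg.lstsq(X_k, Y_k, rcond=None)[0]

print(np.abs(W_strong - W_global).max(), np.abs(W_weak - W_local).max())
```

Under this assumed objective, `lam` interpolates between the shared solution and a purely local fit, and no gradient steps are needed on the client.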

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The layer-wise least-squares modeling for DNNs in FedACnnL does not hold up because non-linear activations make the per-layer objectives non-quadratic.

The core problem is that FedACnnL models each DNN layer's training as an independent distributed least-squares problem to get closed-form updates. This cannot work as described for standard networks because activations like ReLU turn the effective mapping non-linear, so the analytic solution solves a different objective than actual DNN training. That undercuts both the claimed 83-99% time cut and the extension to non-convex cases in pFedACnnL. The stress-test note matches what the abstract itself says about the construction.

The personalization step in pFedACnnL is described as analytically solving a local objective, but without equations it is unclear how it avoids the same mismatch or reduces to something already covered by prior AFL work. The abstract mentions theoretical proofs and SOTA results on convex and non-convex settings, yet supplies no derivations, error bounds, or dataset details, so those claims stay uncheckable from the given text.

What is new is the concrete pairing of layer-wise analytic training with an analytic personalization step inside a gradient-free FL loop; that combination does not appear in the cited prior methods. The direction of seeking closed-form updates for speed in heterogeneous FL is reasonable on its face, and the reported 4-8% accuracy lift over the non-personalized version would matter if reproducible.

This paper is mainly for people already working on analytic or gradient-free federated methods who want to see one attempt at scaling them to DNNs. A reader focused on practical edge deployment might skim the frameworks for ideas, but the missing grounding means it is not ready for citation or direct use. It deserves a serious referee to check whether the full derivations fix the non-linearity gap or whether the experiments actually test real DNN objectives.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes FedACnnL, an analytic federated learning framework for DNN training that models each layer's training as an independent distributed least-squares problem to obtain closed-form global updates in a single step without gradients. It further introduces pFedACnnL, which produces personalized client models by analytically solving a local objective that bridges the global model and heterogeneous data distributions. The paper asserts that FedACnnL reduces training time by 83%–99% relative to conventional FL frameworks, that pFedACnnL improves accuracy by 4%–8% over the non-personalized version, and that pFedACnnL achieves SOTA results in most convex and non-convex settings.

Significance. If the layer-wise LS construction correctly recovers the original DNN training dynamics (including non-linear activations) and the analytic personalization step is valid, the approach would constitute a meaningful advance in gradient-free FL by delivering orders-of-magnitude speed-ups together with built-in personalization, while supplying closed-form solutions that aid reproducibility and theoretical analysis.

major comments (2)

[Abstract] The central claim that FedACnnL enables effective DNN collaborative training (and thereby supports meta-learning adaptation) in non-convex settings rests on modeling each layer as a distributed LS problem. Because standard DNN layers contain non-linear activations, the per-layer objective is neither linear nor quadratic; the closed-form solution therefore optimizes a surrogate rather than the original objective. This directly undermines both the asserted 83%–99% time reduction and the SOTA accuracy claims for non-convex regimes.
[Abstract] The theoretical proof that FedACnnL requires significantly shorter training time is asserted without any derivation, complexity analysis, or explicit statement of the linearity/convexity assumptions required for the LS reduction. Without these details the 83%–99% experimental ratio cannot be evaluated for general DNNs.

minor comments (2)

Dataset descriptions, preprocessing steps, and hyper-parameter settings are absent from the provided text, preventing verification of the reported accuracy gains.
No equations, algorithm boxes, or pseudocode are supplied to show how the layer-wise LS mapping or the analytic personalization objective are formulated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address the two major points below regarding the layer-wise LS modeling for non-convex DNNs and the missing theoretical details on training time reduction. We plan revisions to clarify assumptions and add analysis.

Referee: [Abstract] The central claim that FedACnnL enables effective DNN collaborative training (and thereby supports meta-learning adaptation) in non-convex settings rests on modeling each layer as a distributed LS problem. Because standard DNN layers contain non-linear activations, the per-layer objective is neither linear nor quadratic; the closed-form solution therefore optimizes a surrogate rather than the original objective. This directly undermines both the asserted 83%–99% time reduction and the SOTA accuracy claims for non-convex regimes.

Authors: We acknowledge that non-linear activations make each layer's objective non-quadratic, so the closed-form LS solution optimizes a surrogate obtained by solving the linear weights independently after propagating features from prior layers. The manuscript presents this layer-wise construction as an enabling approximation for gradient-free analytic updates in DNNs. Empirical results across convex and non-convex regimes support the reported speed-ups and accuracy, but we agree the distinction from the original objective should be stated explicitly. We will revise the abstract and add a section clarifying the surrogate nature of the per-layer LS and its implications. Revision: yes.

Referee: [Abstract] The theoretical proof that FedACnnL requires significantly shorter training time is asserted without any derivation, complexity analysis, or explicit statement of the linearity/convexity assumptions required for the LS reduction. Without these details the 83%–99% experimental ratio cannot be evaluated for general DNNs.

Authors: The shorter training time follows from completing each layer's global update via a single closed-form LS step requiring one communication round, versus iterative gradient steps over epochs. We will add the requested derivation, a formal complexity comparison of communication and computation, and explicit statements of the per-layer linearity assumption (after feature propagation) in the revised manuscript. This will allow readers to evaluate the 83%–99% ratio under the stated conditions. Revision: yes.
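
The surrogate discussed in the first exchange, where each layer's linear weights are solved in closed form after features are propagated from earlier layers, can be sketched for a two-layer network as follows. How the hidden-layer targets are formed is not stated in the text above; the random label projection `P` used here is purely a hypothetical choice, as are the ridge term `reg`, the layer sizes, and the synthetic data.

```python
import numpy as np

def ridge(X, T, reg=1e-2):
    """Closed-form ridge solution argmin_W ||X W - T||_F^2 + reg * ||W||_F^2."""
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ T)

def relu(Z):
    return np.maximum(Z, 0.0)

rng = np.random.default_rng(2)
n, d, hidden, classes = 200, 10, 16, 3
X = rng.normal(size=(n, d))                  # synthetic inputs
labels = rng.integers(0, classes, size=n)
Y = np.eye(classes)[labels]                  # one-hot targets

# Hypothetical hidden-layer targets: a fixed random projection of the labels.
P = rng.normal(size=(classes, hidden))
W1 = ridge(X, Y @ P)       # layer 1 solved as its own LS problem
H = relu(X @ W1)           # features propagated through the non-linearity
W2 = ridge(H, Y)           # layer 2 solved on the propagated features

# Each solve is exact for its own quadratic objective; neither solve minimizes
# the end-to-end loss of the non-linear network, which is the referee's point.
surrogate_loss = np.mean((H @ W2 - Y) ** 2)
print(surrogate_loss)
```

Each `ridge` call here could be replaced by a sum of per-client statistics, so the layer-wise structure is compatible with a one-round-per-layer federated solve; whether the resulting network is competitive with gradient-trained models is an empirical question that this sketch does not address.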

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The abstract and provided excerpts describe FedACnnL as modeling each DNN layer as an independent distributed LS problem to obtain closed-form updates, and pFedACnnL as analytically solving a local objective for personalization. No quoted equations, self-citations, or steps reduce any claimed prediction, uniqueness result, or time-reduction proof to a fitted parameter or prior self-result by construction. The modeling choice is presented as an enabling assumption rather than a self-referential definition, and experimental ratios (83-99%) are reported separately from the analytic construction. The derivation chain therefore remains self-contained against external benchmarks with no load-bearing reductions to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review is abstract-only; ledger is therefore minimal and provisional. The work rests on standard least-squares solvability and the viability of per-layer decomposition for DNNs.

axioms (2)

domain assumption: Least-squares closed-form solutions can replace gradient-based updates for model training in federated settings.
Core premise of the AFL paradigm stated in the abstract.
domain assumption: Modeling each DNN layer independently as a distributed LS problem preserves sufficient representational power for complex tasks.
Required for the FedACnnL claim that the method supports DNN training and meta-learning adaptation.
