pith. machine review for the scientific record. sign in

arxiv: 2604.27833 · v1 · submitted 2026-04-30 · 💻 cs.CV · cs.LG

Recognition: unknown

Taming Noise-Induced Prototype Degradation for Privacy-Preserving Personalized Federated Fine-Tuning

Authors on Pith no claims yet

Pith reviewed 2026-05-07 06:34 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords federated learningdifferential privacyprototype perturbationpersonalized federated learningnoise adaptationclipping regularizationprivacy-utility trade-offmulti-domain adaptation
0
0 comments X

The pith

Variance-adaptive noise allocation in class prototypes reduces degradation while preserving local differential privacy in federated fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper aims to improve the balance between privacy and performance when sharing class prototypes across clients in personalized federated learning. The core idea is that uniform noise addition hurts more in directions where classes are already well separated by high variance, so the method reduces noise there. It also regularizes the model features to sit close to the clipping bound using a distillation signal. A sympathetic reader would care because this lets federated systems handle diverse data domains more effectively without increasing privacy risks or dropping accuracy as much. The theoretical part confirms the privacy level stays at least as strong as the standard approach.

Core claim

The authors establish that their groupwise variance-adaptive prototype perturbation mechanism, together with distillation-guided clipping regularization, delivers privacy guarantees no weaker than isotropic Gaussian perturbation under identical constraints. By allocating noise according to per-dimension class variance and concentrating feature norms near the clipping threshold, the approach limits prototype degradation in semantically important directions while satisfying the same differential privacy bounds.

What carries the argument

The Variance-adaptive Prototype Perturbation (VPP) that scales noise inversely to class variance per dimension, paired with Distillation-guided Clipping Regularization (DCR) that uses prediction consistency from a teacher model to adapt norms to the clip threshold.

If this is right

  • VPDR integrates as a client-side plug-in into existing ProtoPFL methods.
  • It delivers superior privacy-utility trade-offs on multi-domain benchmarks.
  • The approach maintains robustness against realistic attacks.
  • Privacy guarantees are no weaker than those of isotropic Gaussian noise.
  • Feature norms concentrate near the clipping threshold without losing prediction consistency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Data-dependent noise based on variance might apply to other private data aggregation tasks.
  • The variance-discriminability assumption could extend the method to additional modalities like text.
  • Combining this with secure multi-party computation could further strengthen privacy in federated settings.

Load-bearing premise

The premise that higher dimension-wise class variance corresponds directly to greater discriminability, justifying lower noise there, and that distillation can enforce norm concentration near the clipping threshold without altering the model's predictions.

What would settle it

If experiments on multi-domain benchmarks show that VPDR either provides weaker privacy or lower utility than the isotropic baseline under the same constraints, the claims would not hold.

Figures

Figures reproduced from arXiv: 2604.27833 by Hainan Zhang, Huan Zhang, Qinnan Zhang, Wangjie Qiu, Xiaodong Li, Yifan Sun, Yongxin Tong, Yuhua Wang, Zhiming Zheng.

Figure 1
Figure 1. Figure 1: Motivation illustration. In (b), the dashed circle shows uniform noise and the dashed ellipse our adaptive noise. Isotropic noise shifts the blue Class 1 prototype into the red Class 2 region. In (c), a generous clipping bound leaves feature norms almost un￾changed but forces a large noise scale, while an aggressive bound shrinks many features, causing severe information loss. has largely consumed availabl… view at source ↗
Figure 2
Figure 2. Figure 2: Architecture illustration of the ProtoPFL with VPDR. ① Private Prototype Calculation: each client runs VPP (Section 4.2) to privately partition embeddings, apply groupwise clipping, and add adaptive noise to prototypes; ② Upload: clients send privatized proto￾types; ③ Global Prototype Generation: the server aggregates or trains global prototypes; ④ Download: clients receive global prototypes; ⑤ Local Fine-… view at source ↗
Figure 3
Figure 3. Figure 3: Correlation between the discriminative score Sj and label mutual information I(zj ; y) on PACS. son and Spearman coefficients consistently exceeding 0.90 across domains (Figure 3b). This makes Sj a reliable and lightweight proxy for dimension-wise discriminability. Although never uploaded, the selected index set deter￾mines the groupwise clipping bounds and noise covariance. Thus, the released prototypes r… view at source ↗
Figure 4
Figure 4. Figure 4: Evaluation of feature norm and logit difference of FedPLVM [42] with IGPP on Office–Caltech. tillation that enforces prediction consistency between pre￾and post-clipped features without parameter coupling. Con￾cretely, we split the local classifier into a trainable student head fm and a momentum teacher f t m updated as θ t m = βθt m + (1 − β)θm, (16) where θ and θ t m denote the student and teacher parame… view at source ↗
Figure 5
Figure 5. Figure 5: Hyperparameter sensitivity of FedProto [36] framework with VPDR on PACS view at source ↗
Figure 6
Figure 6. Figure 6: reports the per-round, per-client wall-clock time on Office-Caltech and PACS, decomposed into Prototype Generation (ProtoGen) and Local Fine-Tuning (FT). Across six ProtoPFL frameworks, VPDR increases total runtime by an average of only 0.28s (8.1%) compared to the IGPP FedProto FedPCL FPL FedPLVM FedTGP MPFT Methods 0 1 2 3 Time (seconds) 0.87 0.99 1.99 2.13 0.90 1.02 2.06 2.26 0.93 0.90 2.22 2.25 0.92 1.… view at source ↗
Figure 7
Figure 7. Figure 7: compares how the per-client local DP budget (ϵ, δ) is allocated in IGPP and VPP. IGPP spends the entire budget on an isotropic Gaussian prototype mechanism, choosing σiso so that the resulting T-round mechanism is (ϵ, δ)–LDP for each client (Theorem 3.1). In contrast, VPP decomposes IGPP VPP reference mechanism 2 2 2 2 ref 1 1 1 ( , ) ( , ), A B A B           iso 1 ln(1/ ) ( , ) T c       … view at source ↗
Figure 8
Figure 8. Figure 8: T-SNE visualization of FedProto [36] framework with VPDR on Digits at T = 1 (top row) and T = 20 (bottom row). Test accuracy (%) is shown above each subplot. E.4. Model with Data Heterogeneity To evaluate VPDR under highly challenging conditions, we couple architectural diversity with two forms of data skew: (i) domain shift across multi-domain benchmarks, and (ii) label imbalance on CIFAR-10. We instantia… view at source ↗
read the original abstract

Prototype-based Personalized Federated Learning (ProtoPFL) enables efficient multi-domain adaptation by communicating compact class prototypes, but directly sharing them poses privacy risks. A common defense involves per-example $\ell_2$ clipping before prototype computation to bound sensitivity, followed by isotropic Gaussian noise to enforce Local Differential Privacy (LDP). However, Isotropic Gaussian Prototype Perturbation (IGPP) typically over-perturbs discriminative dimensions and struggles to balance the clipping threshold with representation fidelity. In this paper, we propose VPDR, a client-side privacy plug-in that seamlessly integrates into existing ProtoPFLs. Motivated by the observation that dimension-wise class variance reflects discriminability, we introduce Variance-adaptive Prototype Perturbation (VPP), which allocates less noise to discriminative subspaces, preserving semantic separability while ensuring privacy. We further develop Distillation-guided Clipping Regularization (DCR), which enables feature norms to adaptively concentrate near the predefined clipping threshold while maintaining prediction consistency. Theoretical analysis shows that our groupwise mechanism provides privacy guarantees no weaker than the isotropic baseline under the same privacy constraints. Extensive experiments on multi-domain benchmarks demonstrate that VPDR achieves a superior privacy-utility trade-off, outperforming IGPP in personalized federated fine-tuning without sacrificing robustness against realistic attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper proposes VPDR, a client-side privacy plug-in for prototype-based personalized federated learning (ProtoPFL). It introduces Variance-adaptive Prototype Perturbation (VPP) that allocates Gaussian noise per dimension or group according to empirical class-wise variance (less noise in high-variance discriminative subspaces) and Distillation-guided Clipping Regularization (DCR) that encourages feature norms to concentrate near a fixed clipping threshold while preserving prediction consistency. The central claims are that the resulting groupwise mechanism yields LDP guarantees no weaker than standard isotropic Gaussian prototype perturbation (IGPP) under identical privacy budgets, and that extensive experiments on multi-domain benchmarks show improved privacy-utility trade-offs without loss of robustness to realistic attacks.

Significance. If the privacy analysis holds, the work offers a practical way to reduce over-perturbation of discriminative dimensions in prototype sharing, which is a recurring bottleneck in privacy-preserving ProtoPFL. The empirical demonstration of better trade-offs versus IGPP and the seamless integration as a plug-in are useful contributions. The paper supplies reproducible experimental protocols and attack evaluations, which strengthens its practical value if the theoretical guarantee is made rigorous.

major comments (3)
  1. [Theoretical Analysis] Theoretical Analysis section (around the statement that the groupwise mechanism provides privacy guarantees no weaker than the isotropic baseline): the proof must explicitly bound the sensitivity of the empirical dimension-wise variance estimator used to set per-group sigma in VPP. Because sigma is data-dependent, the mechanism is not a simple post-processing of a fixed-sensitivity Gaussian mechanism; without a sensitivity bound or a worst-case privacy-loss argument over all possible variance realizations, the claim that privacy is 'no weaker' under the same constraints does not follow from standard Gaussian LDP calibration.
  2. [§3.2] §3.2 (VPP definition): the decision to allocate noise according to observed dimension-wise class variance relies on the assumption that variance directly reflects discriminability. This assumption is used both to motivate the method and to set the grouping thresholds; if the thresholds themselves are data-dependent, the mechanism requires an additional privacy accounting step that is not shown to be absorbed into the stated epsilon budget.
  3. [§4] §4 (DCR): the regularization is claimed to make feature norms concentrate near the clipping threshold while maintaining prediction consistency. The interaction between this regularization and the subsequent variance computation in VPP is not analyzed; any dependence introduced by DCR on the private features could further affect the sensitivity of the variance estimator and must be accounted for in the privacy proof.
minor comments (3)
  1. [§3.2] The notation distinguishing per-dimension versus per-group variance in VPP is introduced without a clear equation reference; a single equation defining the grouping rule and the resulting noise covariance would improve readability.
  2. [Experiments] Table 1 and Figure 3: the reported attack success rates should include standard deviations over multiple random seeds to allow readers to assess whether the claimed robustness margin is statistically reliable.
  3. [Experiments] The hyper-parameter table for DCR (temperature, distillation weight, etc.) is missing; all values used in the reported experiments should be listed explicitly.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of the privacy analysis that require strengthening. We agree with the need to rigorously bound the sensitivity of data-dependent components and will revise the manuscript to provide a complete proof that the privacy guarantees are no weaker than those of IGPP.

read point-by-point responses
  1. Referee: [Theoretical Analysis] Theoretical Analysis section (around the statement that the groupwise mechanism provides privacy guarantees no weaker than the isotropic baseline): the proof must explicitly bound the sensitivity of the empirical dimension-wise variance estimator used to set per-group sigma in VPP. Because sigma is data-dependent, the mechanism is not a simple post-processing of a fixed-sensitivity Gaussian mechanism; without a sensitivity bound or a worst-case privacy-loss argument over all possible variance realizations, the claim that privacy is 'no weaker' under the same constraints does not follow from standard Gaussian LDP calibration.

    Authors: We thank the referee for this important observation. The current theoretical analysis argues that the groupwise mechanism provides LDP guarantees no weaker than IGPP under identical privacy budgets by allocating noise according to variance, but it does not explicitly address the sensitivity of the variance estimator itself. We will revise the Theoretical Analysis section to include a formal bound on the sensitivity of the empirical dimension-wise variance. Given that prototypes are computed from l2-clipped features, the variance in each dimension has a bounded range, which allows us to derive the sensitivity and calibrate the noise accordingly or use a worst-case argument. This will make the privacy claim rigorous. revision: yes

  2. Referee: [§3.2] §3.2 (VPP definition): the decision to allocate noise according to observed dimension-wise class variance relies on the assumption that variance directly reflects discriminability. This assumption is used both to motivate the method and to set the grouping thresholds; if the thresholds themselves are data-dependent, the mechanism requires an additional privacy accounting step that is not shown to be absorbed into the stated epsilon budget.

    Authors: The variance-discriminability assumption is primarily for motivating VPP and is supported by the experimental results showing improved utility. For the grouping thresholds, we will revise §3.2 to specify a data-independent grouping strategy, such as using a fixed number of groups or thresholds based on expected variance ranges rather than empirical quantiles from the current data. Alternatively, if data-dependent grouping is retained, we will incorporate the privacy cost of the grouping step into the overall analysis. This will ensure the epsilon budget is respected. revision: yes

  3. Referee: [§4] §4 (DCR): the regularization is claimed to make feature norms concentrate near the clipping threshold while maintaining prediction consistency. The interaction between this regularization and the subsequent variance computation in VPP is not analyzed; any dependence introduced by DCR on the private features could further affect the sensitivity of the variance estimator and must be accounted for in the privacy proof.

    Authors: We acknowledge that the interaction between DCR and VPP's variance computation was not explicitly analyzed in the privacy proof. We will revise the manuscript to include an analysis showing that DCR, being a regularization applied to the model training, does not alter the sensitivity bound because the subsequent clipping operation still limits the feature norms, and thus the variance sensitivity remains the same as without DCR. We will add this to the Theoretical Analysis section to confirm that the privacy guarantees are unaffected. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central theoretical claim—that the groupwise variance-adaptive mechanism provides privacy guarantees no weaker than the isotropic Gaussian baseline—is presented as the output of a separate analysis rather than being definitionally equivalent to the method's inputs. The variance-based noise allocation is motivated by an empirical observation about discriminability and is not used to derive the privacy bound by construction; the bound is asserted to hold under the same constraints via groupwise analysis. No self-citation chain, fitted parameter renamed as prediction, or ansatz smuggled via prior work is load-bearing for the privacy result. The derivation remains self-contained against the stated LDP constraints and does not reduce the claimed guarantee to a tautology of the variance estimator itself.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 2 invented entities

The central claim rests on one key domain assumption about variance reflecting discriminability and on two newly introduced methods whose independent evidence is limited to the proposal itself.

free parameters (1)
  • clipping threshold
    Predefined l2 clipping bound that DCR is designed to make feature norms concentrate near.
axioms (1)
  • domain assumption Dimension-wise class variance reflects discriminability of prototype dimensions
    This observation is used to motivate allocating less noise to high-variance subspaces in VPP.
invented entities (2)
  • Variance-adaptive Prototype Perturbation (VPP) no independent evidence
    purpose: Allocates less noise to discriminative high-variance dimensions of prototypes
    New perturbation strategy introduced to improve utility over isotropic noise.
  • Distillation-guided Clipping Regularization (DCR) no independent evidence
    purpose: Makes feature norms adaptively concentrate near the clipping threshold while preserving prediction consistency
    New regularization technique using distillation.

pith-pipeline@v0.9.0 · 5552 in / 1594 out tokens · 91732 ms · 2026-05-07T06:34:52.631884+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

57 extracted references · 3 canonical work pages

  1. [1]

    Revisiting neural scaling laws in language and vision

    Ibrahim M Alabdulmohsin, Behnam Neyshabur, and Xiao- hua Zhai. Revisiting neural scaling laws in language and vision. InAdvances in Neural Information Processing Sys- tems, pages 22300–22312. Curran Associates, Inc., 2022. 1

  2. [2]

    Diprompt: Disentan- gled prompt tuning for multiple latent domain generaliza- tion in federated learning

    Sikai Bai, Jie Zhang, Song Guo, Shuaicheng Li, Jingcai Guo, Jun Hou, Tao Han, and Xiaocheng Lu. Diprompt: Disentan- gled prompt tuning for multiple latent domain generaliza- tion in federated learning. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27274–27283, 2024. 1

  3. [3]

    Personalization improves privacy- accuracy tradeoffs in federated learning

    Alberto Bietti, Chen-Yu Wei, Miroslav Dudik, John Lang- ford, and Steven Wu. Personalization improves privacy- accuracy tradeoffs in federated learning. InInternational Conference on Machine Learning, pages 1945–1962. PMLR,

  4. [4]

    Fair feder- ated learning under domain skew with local consistency and domain diversity

    Yuhang Chen, Wenke Huang, and Mang Ye. Fair feder- ated learning under domain skew with local consistency and domain diversity. In2024 IEEE/CVF Conference on Com- puter Vision and Pattern Recognition (CVPR), pages 12077– 12086, 2024. 1

  5. [5]

    Differentially private federated learning with local regularization and sparsification

    Anda Cheng, Peisong Wang, Xi Sheryl Zhang, and Jian Cheng. Differentially private federated learning with local regularization and sparsification. In2022 IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 10112–10121, 2022. 2, 1

  6. [6]

    Aldp-fl for adaptive local differen- tial privacy in federated learning.Scientific Reports, 15(1): 26679, 2025

    Lixin Cui and Xu Wu. Aldp-fl for adaptive local differen- tial privacy in federated learning.Scientific Reports, 15(1): 26679, 2025. 2, 1

  7. [7]

    Fedp3e: Privacy-preserving prototype ex- change for non-iid iot malware detection in cross-silo feder- ated learning.arXiv preprint arXiv:2507.07258, 2025

    Rami Darwish, Mahmoud Abdelsalam, Sajad Khorsandroo, and Kaushik Roy. Fedp3e: Privacy-preserving prototype ex- change for non-iid iot malware detection in cross-silo feder- ated learning.arXiv preprint arXiv:2507.07258, 2025. 2

  8. [8]

    An image is worth 16×16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, and Alexander Houlsby. An image is worth 16×16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representations (ICLR), 2021. 6, 4

  9. [9]

    Calibrating noise to sensitivity in private data analy- sis

    Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analy- sis. InTheory of cryptography conference, pages 265–284. Springer, 2006. 6

  10. [10]

    Per- sonalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach

    Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. Per- sonalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. InAdvances in Neural Information Processing Systems, pages 3557–3568. Curran Associates, Inc., 2020. 1

  11. [11]

    Per- sonalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach

    Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. Per- sonalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. InAdvances in Neural Information Processing Systems, pages 3557–3568. Curran Associates, Inc., 2020. 1, 2, 3, 5, 6, 7, 4

  12. [12]

    Differentially Private Federated Learning: A Client Level Perspective

    Robin C Geyer, Tassilo Klein, and Moin Nabi. Differentially private federated learning: A client level perspective.arXiv preprint arXiv:1712.07557, 2017. 2, 1

  13. [13]

    Geodesic flow kernel for unsupervised domain adaptation

    Boqing Gong, Yuan Shi, Fei Sha, and Kristen Grauman. Geodesic flow kernel for unsupervised domain adaptation. In2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2066–2073, 2012. 6, 4

  14. [14]

    Clustered feder- ated learning with adaptive local differential privacy on het- erogeneous iot data.IEEE Internet of Things Journal, 11(1): 137–146, 2024

    Zaobo He, Lintao Wang, and Zhipeng Cai. Clustered feder- ated learning with adaptive local differential privacy on het- erogeneous iot data.IEEE Internet of Things Journal, 11(1): 137–146, 2024. 2, 1

  15. [15]

    Re- thinking federated learning with domain shift: A prototype view

    Wenke Huang, Mang Ye, Zekun Shi, He Li, and Bo Du. Re- thinking federated learning with domain shift: A prototype view. In2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16312–16322, 2023. 1, 2, 3, 5, 6, 7, 4

  16. [16]

    Jonathan J. Hull. A database for handwritten text recogni- tion research.IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):550–554, 1994. 6, 3

  17. [17]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009. 7, 4

  18. [18]

    Lecun, L

    Y . Lecun, L. Bottou, Y . Bengio, and P. Haffner. Gradient- based learning applied to document recognition.Proceed- ings of the IEEE, 86(11):2278–2324, 1998. 6, 3

  19. [19]

    Fedsol: Stabilized orthogonal learn- ing with proximal restrictions in federated learning

    Gihun Lee, Minchan Jeong, Sangmook Kim, Jaehoon Oh, and Se-Young Yun. Fedsol: Stabilized orthogonal learn- ing with proximal restrictions in federated learning. In 2024 IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR), pages 12512–12522, 2024. 1

  20. [20]

    Hospedales

    Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy M. Hospedales. Deeper, broader and artier domain generaliza- tion. In2017 IEEE International Conference on Computer Vision (ICCV), pages 5543–5551, 2017. 6, 4

  21. [21]

    Ditto: Fair and robust federated learning through personalization

    Tian Li, Shengyuan Hu, Ahmad Beirami, and Virginia Smith. Ditto: Fair and robust federated learning through personalization. InProceedings of the 38th International Conference on Machine Learning, pages 6357–6368. PMLR,

  22. [22]

    Layer-wised model aggregation for personalized federated learning

    Xiaosong Ma, Jie Zhang, Song Guo, and Wenchao Xu. Layer-wised model aggregation for personalized federated learning. In2022 IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 10082–10091,

  23. [23]

    Communication- Efficient Learning of Deep Networks from Decentralized Data

    Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication- Efficient Learning of Deep Networks from Decentralized Data. InProceedings of the 20th International Conference on Artificial Intelligence and Statistics, pages 1273–1282. PMLR, 2017. 1

  24. [24]

    R ´enyi differential privacy

    Ilya Mironov. R ´enyi differential privacy. In2017 IEEE 30th computer security foundations symposium (CSF), pages 263–275. IEEE, 2017. 3

  25. [25]

    Scaling data- constrained language models

    Niklas Muennighoff, Alexander Rush, Boaz Barak, Teven Le Scao, Nouamane Tazi, Aleksandra Piktus, Sampo Pyysalo, Thomas Wolf, and Colin A Raffel. Scaling data- constrained language models. InAdvances in Neural Infor- mation Processing Systems, pages 50358–50376. Curran As- sociates, Inc., 2023. 1

  26. [26]

    Reading digits in natural images with unsupervised feature learning

    Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bis- sacco, Baolin Wu, Andrew Y Ng, et al. Reading digits in natural images with unsupervised feature learning. InNIPS workshop on deep learning and unsupervised feature learn- ing, page 4. Granada, 2011. 6, 3

  27. [27]

    Fedhide: Federated learn- ing by hiding in the neighbors

    Hyunsin Park and Sungrack Yun. Fedhide: Federated learn- ing by hiding in the neighbors. InEuropean Conference on Computer Vision, pages 405–422. Springer, 2024. 2

  28. [28]

    Su, and Michael I

    Anqi Qiao, Weijie J. Su, and Michael I. Zhang. Oneshot differentially private top-k selection. InProceedings of the 38th International Conference on Machine Learning. PMLR, 2021. 3, 4

  29. [29]

    Effects of degra- dations on deep neural network architectures

    P Roy, S Ghosh, S Bhattacharya, and U Pal. Effects of degra- dations on deep neural network architectures. arxiv 2018. arXiv preprint arXiv:1807.10108, 2018. 6, 3

  30. [30]

    John Wiley & Sons, 1999

    Henry Scheffe.The analysis of variance. John Wiley & Sons, 1999. 4

  31. [31]

    Fedawa: Adaptive optimiza- tion of aggregation weights in federated learning using client vectors

    Changlong Shi, He Zhao, Bingjie Zhang, Mingyuan Zhou, Dandan Guo, and Yi Chang. Fedawa: Adaptive optimiza- tion of aggregation weights in federated learning using client vectors. In2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 30651–30660, 2025. 1

  32. [32]

    Make landscape flatter in differentially private federated learning

    Yifan Shi, Yingqi Liu, Kang Wei, Li Shen, Xueqian Wang, and Dacheng Tao. Make landscape flatter in differentially private federated learning. In2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24552–24562, 2023. 2, 1

  33. [33]

    Membership inference attacks against machine learning models

    Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In2017 IEEE Symposium on Security and Privacy (SP), pages 3–18, 2017. 8

  34. [34]

    Dinh, Nguyen Tran, and Josh Nguyen

    Canh T. Dinh, Nguyen Tran, and Josh Nguyen. Personal- ized federated learning with moreau envelopes. InAdvances in Neural Information Processing Systems, pages 21394– 21405. Curran Associates, Inc., 2020. 1

  35. [35]

    Fedselect: Personalized fed- erated learning with customized selection of parameters for fine-tuning

    Rishub Tamirisa, Chulin Xie, Wenxuan Bao, Andy Zhou, Ron Arel, and Aviv Shamsian. Fedselect: Personalized fed- erated learning with customized selection of parameters for fine-tuning. In2024 IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 23985–23994,

  36. [36]

    Fedproto: Federated proto- type learning across heterogeneous clients

    Yue Tan, Guodong Long, Lu Liu, Tianyi Zhou, Qinghua Lu, Jing Jiang, and Chengqi Zhang. Fedproto: Federated proto- type learning across heterogeneous clients. InProceedings of the AAAI conference on artificial intelligence, pages 8432– 8440, 2022. 1, 2, 3, 6, 7, 8, 4, 5

  37. [37]

    Privacy-preserving personalized federated prompt learning for multimodal large language models

    Linh Tran, Wei Sun, Stacy Patterson, and Ana Milanova. Privacy-preserving personalized federated prompt learning for multimodal large language models. InThe Thirteenth In- ternational Conference on Learning Representations, 2025. 2, 3

  38. [38]

    R ´enyi divergence and kullback-leibler divergence.IEEE Transactions on Informa- tion Theory, 60(7):3797–3820, 2014

    Tim van Erven and Peter Harremos. R ´enyi divergence and kullback-leibler divergence.IEEE Transactions on Informa- tion Theory, 60(7):3797–3820, 2014. 3

  39. [39]

    Position: Will we run out of data? limits of llm scaling based on human- generated data

    Pablo Villalobos, Anson Ho, Jaime Sevilla, Tamay Be- siroglu, Lennart Heim, and Marius Hobbhahn. Position: Will we run out of data? limits of llm scaling based on human- generated data. InForty-first International Conference on Machine Learning, 2024. 1

  40. [40]

    Federated graph learning under domain shift with generalizable pro- totypes.Proceedings of the AAAI Conference on Artificial Intelligence, 38(14):15429–15437, 2024

    Guancheng Wan, Wenke Huang, and Mang Ye. Federated graph learning under domain shift with generalizable pro- totypes.Proceedings of the AAAI Conference on Artificial Intelligence, 38(14):15429–15437, 2024. 1

  41. [41]

    Fedfr-adp: Adaptive dif- ferential privacy with feedback regulation for robust model performance in federated learning.Information Fusion, 116: 102796, 2025

    Debao Wang and Shaopeng Guan. Fedfr-adp: Adaptive dif- ferential privacy with feedback regulation for robust model performance in federated learning.Information Fusion, 116: 102796, 2025. 2, 1

  42. [42]

    Taming cross-domain representation variance in feder- ated prototype learning with heterogeneous data domains

    Lei Wang, Jieming Bian, Letian Zhang, Chen Chen, and Jie Xu. Taming cross-domain representation variance in feder- ated prototype learning with heterogeneous data domains. In Advances in Neural Information Processing Systems, pages 88348–88372. Curran Associates, Inc., 2024. 1, 2, 3, 5, 6, 7, 4

  43. [43]

    Fedpcl-cdr: A fed- erated prototype-based contrastive learning framework for privacy-preserving cross-domain recommendation.Neural Networks, 196:108380, 2026

    Li Wang, Qiang Wu, and Min Xu. Fedpcl-cdr: A fed- erated prototype-based contrastive learning framework for privacy-preserving cross-domain recommendation.Neural Networks, 196:108380, 2026. 2

  44. [44]

    Local differential private data aggregation for discrete distribution estimation.IEEE Transactions on Parallel and Distributed Systems, 30(9):2046–2059, 2019

    Shaowei Wang, Liusheng Huang, Yiwen Nie, Xinyuan Zhang, Pengzhan Wang, Hongli Xu, and Wei Yang. Local differential private data aggregation for discrete distribution estimation.IEEE Transactions on Parallel and Distributed Systems, 30(9):2046–2059, 2019. 3

  45. [45]

    Federated learning with domain shift eraser

    Zheng Wang, Zihui Wang, Zheng Wang, Xiaoliang Fan, and Cheng Wang. Federated learning with domain shift eraser. In2025 IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR), pages 4978–4987, 2025. 1

  46. [46]

    Vincent Poor

    Kang Wei, Jun Li, Ming Ding, Chuan Ma, Hang Su, Bo Zhang, and H. Vincent Poor. User-level privacy-preserving federated learning: Analysis and performance optimization. IEEE Transactions on Mobile Computing, 21(9):3388–3401,

  47. [47]

    Perada: Parameter-efficient federated learning personalization with generalization guarantees

    Chulin Xie, De-An Huang, Wenda Chu, Daguang Xu, Chaowei Xiao, Bo Li, and Anima Anandkumar. Perada: Parameter-efficient federated learning personalization with generalization guarantees. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23838–23848, 2024. 1

  48. [48]

    Dynamic per- sonalized federated learning with adaptive differential pri- vacy

    Xiyuan Yang, Wenke Huang, and Mang Ye. Dynamic per- sonalized federated learning with adaptive differential pri- vacy. InAdvances in Neural Information Processing Sys- tems, pages 72181–72192. Curran Associates, Inc., 2023. 2, 1

  49. [49]

    Fedas: Bridg- ing inconsistency in personalized federated learning

    Xiyuan Yang, Wenke Huang, and Mang Ye. Fedas: Bridg- ing inconsistency in personalized federated learning. In 2024 IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR), pages 11986–11995, 2024. 1

  50. [50]

    Open- fedllm: Training large language models on decentralized private data via federated learning

    Rui Ye, Wenhao Wang, Jingyi Chai, Dihan Li, Zexi Li, Yinda Xu, Yaxin Du, Yanfeng Wang, and Siheng Chen. Open- fedllm: Training large language models on decentralized private data via federated learning. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, page 6137–6147, New York, NY , USA,

  51. [51]

    Association for Computing Machinery. 1

  52. [52]

    Fedala: Adaptive local aggregation for personalized federated learning.Proceed- ings of the AAAI Conference on Artificial Intelligence, 37 (9):11237–11244, 2023

    Jianqing Zhang, Yang Hua, Hao Wang, Tao Song, Zhengui Xue, Ruhui Ma, and Haibing Guan. Fedala: Adaptive local aggregation for personalized federated learning.Proceed- ings of the AAAI Conference on Artificial Intelligence, 37 (9):11237–11244, 2023. 1

  53. [53]

    An upload-efficient scheme for transferring knowledge from a server-side pre-trained generator to clients in heterogeneous federated learning

    Jianqing Zhang, Yang Liu, Yang Hua, and Jian Cao. An upload-efficient scheme for transferring knowledge from a server-side pre-trained generator to clients in heterogeneous federated learning. In2024 IEEE/CVF Conference on Com- puter Vision and Pattern Recognition (CVPR), pages 12109– 12119, 2024. 1, 2, 4

  54. [54]

    Jianqing Zhang, Yang Liu, Yang Hua, and Jian Cao. Fedtgp: Trainable global prototypes with adaptive-margin-enhanced contrastive learning for data and model heterogeneity in fed- erated learning.Proceedings of the AAAI Conference on Ar- tificial Intelligence, 38(15):16768–16776, 2024. 1, 2, 3, 5, 6, 7, 4

  55. [55]

    Fedgmkd: An efficient prototype federated learning framework through knowledge distillation and discrepancy-aware aggregation

    Jianqiao Zhang, Caifeng Shan, and Jungong Han. Fedgmkd: An efficient prototype federated learning framework through knowledge distillation and discrepancy-aware aggregation. InAdvances in Neural Information Processing Systems, pages 118326–118356. Curran Associates, Inc., 2024. 2, 3, 1

  56. [56]

    Enhancing federated do- main adaptation with multi-domain prototype-based feder- ated fine-tuning

    Jingyuan Zhang, Yiyang Duan, Shuaicheng Niu, Y ANG CAO, and Wei Yang Bryan Lim. Enhancing federated do- main adaptation with multi-domain prototype-based feder- ated fine-tuning. InThe Thirteenth International Conference on Learning Representations, 2025. 2, 3, 6, 7, 8, 4

  57. [57]

    Safely learn- ing with private data: A federated learning framework for large language model

    Jia-Ying Zheng, Hainan Zhang, Lingxiang Wang, Wangjie Qiu, Hong-Wei Zheng, and Zhi-Ming Zheng. Safely learn- ing with private data: A federated learning framework for large language model. InProceedings of the 2024 Confer- ence on Empirical Methods in Natural Language Processing, pages 5293–5306, Miami, Florida, USA, 2024. Association for Computational Li...