Improving Parameter-Efficient Federated Learning with Differentially Private Refactorization
Pith reviewed 2026-05-12 01:08 UTC · model grok-4.3
The pith
FedPower improves differentially private low-rank federated learning by reconstructing full-rank updates before noisy projection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FedPower reshapes server-side aggregation in cross-silo federated learning: it explicitly reconstructs and clips full-rank client updates to bound sensitivity, then projects the exact aggregated update back into a secure low-rank space using PowerDP, a differentially private low-rank factorization based on simultaneous subspace iteration that injects calibrated DP noise prior to the final orthonormalization step. Theoretical analyses establish sensitivity bounds for the subspace projections, showing that the mechanism achieves both sample-level and client-level differential privacy. Experiments on language understanding tasks demonstrate robustness under tight privacy budgets.
What carries the argument
PowerDP, the differentially private low-rank factorization mechanism based on simultaneous subspace iteration that injects calibrated DP noise prior to the final orthonormalization step to preserve matrix orthogonality
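As described, PowerDP resembles the noisy power method of Hardt and Price [6] adapted to simultaneous subspace iteration. The sketch below is an illustrative reconstruction, not the paper's implementation: the function name `powerdp`, the shapes, and the placement of a single noise scale `sigma` are assumptions; the key property shown is that noise is injected before each QR step, so the released factors stay exactly orthonormal.

```python
import numpy as np

def powerdp(W, rank, iters, sigma, rng):
    """Sketch of DP low-rank projection via noisy simultaneous subspace
    iteration. Gaussian noise is added to the iterate *before* each QR
    orthonormalization, so orthonormality of the output is preserved
    (QR of the noisy iterate is post-processing of a DP release)."""
    m, n = W.shape
    # Random orthonormal start for the right subspace.
    Q, _ = np.linalg.qr(rng.standard_normal((n, rank)))
    for _ in range(iters):
        # Noisy iterate: noise enters prior to orthonormalization.
        Y = W.T @ (W @ Q) + sigma * rng.standard_normal((n, rank))
        Q, _ = np.linalg.qr(Y)
    B = W @ Q  # low-rank factors: W is approximated by B @ Q.T
    return B, Q
```

With `sigma = 0` and an exactly rank-`rank` input, the returned factors reproduce `W` exactly, which is the sense in which the projection is "exact" before noise is added.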
If this is right
- FedPower achieves both sample-level and client-level differential privacy.
- The framework is robust against tight privacy budgets while adding negligible computational overheads.
- PowerDP improves the accuracy-privacy tradeoff compared to other noise injection schemes.
- The approach is validated on language understanding tasks in cross-silo settings.
- Evaluation against membership inference attacks confirms the privacy guarantees.
Where Pith is reading between the lines
- Similar reconstruction of full-rank updates before projection could help other low-rank methods in privacy-preserving distributed training.
- Adding noise before orthonormalization may be useful in other DP mechanisms that rely on subspace or matrix decompositions.
- The sensitivity bounds derived for subspace projections could apply to related projection-based techniques in private machine learning.
- Extending the framework beyond cross-silo to cross-device federated learning would be a natural next test of its scalability.
Load-bearing premise
Reconstructing full-rank client updates from low-rank factors, clipping them, and then projecting the aggregate back via PowerDP with noise before orthonormalization will not introduce errors that overwhelm the signal in restricted low-rank subspaces.
What would settle it
If experiments under a tight privacy budget such as ε = 1 showed model accuracy on language tasks falling well below the non-private LoRA baseline, or membership inference attacks succeeding above chance level, the claim would be falsified.
read the original abstract
Federated Learning (FL) with parameter-efficient fine-tuning, such as Low-Rank Adaptation (LoRA), enables scalable model training on distributed data. However, when combined with Differential Privacy (DP), LoRA often introduces errors during global aggregation and amplifies the negative effect of DP noise. Existing cross-silo FL approaches mitigate the aggregation error by freezing one LoRA module and applying output perturbation. However, in restricted low-rank subspaces, this additive noise frequently overwhelms the signals of the weight matrices, leading to suboptimal accuracy. To address this vulnerability, we propose FedPower, a differentially private cross-silo FL framework that reshapes server-side aggregation. Instead of perturbing mismatched low-rank factors, FedPower explicitly reconstructs and clips full-rank client updates to bound the sensitivity. The server then projects the exact aggregated update back into a secure low-rank space using PowerDP, a novel differentially private low-rank factorization mechanism. Based on simultaneous subspace iteration, PowerDP injects calibrated DP noise prior to the final orthonormalization step, effectively mitigating the negative effect of DP noise by preserving matrix orthogonality. We provide rigorous theoretical analyses establishing sensitivity bounds for subspace projections, proving that FedPower achieves both sample-level and client-level DP. Extensive experiments on various language understanding tasks in cross-silo FL settings show that FedPower is robust against tight privacy budgets while adding negligible computational overheads. Additional empirical study on different DP noise injection schemes validates the effectiveness of PowerDP in improving the tradeoff between accuracy and privacy. Evaluation on three different membership inference attacks validates the robustness and privacy-preserving capability of the proposed framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes FedPower, a cross-silo federated learning framework for differentially private LoRA fine-tuning. Client updates are reconstructed to full-rank form (ΔW = BA), clipped to bound sensitivity, aggregated at the server, and then projected back to low-rank via the new PowerDP mechanism, which applies simultaneous subspace iteration and injects calibrated DP noise before the final orthonormalization step. The authors claim rigorous sensitivity bounds establishing both sample-level and client-level DP, along with empirical results on language understanding tasks showing improved robustness to tight privacy budgets and negligible overhead compared to prior output-perturbation baselines.
Significance. If the sensitivity bounds hold and the utility gains are reproducible, the work would meaningfully advance DP parameter-efficient FL by mitigating noise dominance in restricted low-rank subspaces. The provision of theoretical analyses for the combined reconstruction-plus-projection pipeline and extensive experiments across multiple tasks and DP schemes are strengths that support the central claim.
major comments (2)
- [§4] §4 (Theoretical Analysis): The claimed sensitivity bounds for subspace projections in PowerDP must explicitly compose the sensitivity introduced by full-rank reconstruction (ΔW = BA) and clipping with the subsequent noisy iteration and orthonormalization; the current separation of these steps leaves open whether the effective sensitivity for client-level DP remains bounded as stated, particularly for the tight ε values used in the experiments.
- [§5.2] §5.2 (Experiments on language tasks): The reported accuracy improvements of FedPower over output-perturbation baselines under ε ≤ 1 rely on the assumption that post-projection error does not overwhelm the low-rank signal; without an ablation isolating the reconstruction-plus-PowerDP error term or reporting per-run variance, it is unclear whether the gains are robust or could be explained by the specific LoRA rank and clipping choices.
minor comments (2)
- [Figure 2] Figure 2: The diagram of the PowerDP iteration would benefit from explicit annotation of where the DP noise is added relative to the orthonormalization step.
- The notation distinguishing sample-level versus client-level sensitivity bounds could be made more consistent between the theorem statements and the experimental privacy budget settings.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of the theoretical composition and experimental robustness, which we address point by point below. We have revised the manuscript to incorporate clarifications and additional analyses where appropriate.
read point-by-point responses
-
Referee: §4 (Theoretical Analysis): The claimed sensitivity bounds for subspace projections in PowerDP must explicitly compose the sensitivity introduced by full-rank reconstruction (ΔW = BA) and clipping with the subsequent noisy iteration and orthonormalization; the current separation of these steps leaves open whether the effective sensitivity for client-level DP remains bounded as stated, particularly for the tight ε values used in the experiments.
Authors: We appreciate the referee's point on explicit composition. Section 4.1 establishes that full-rank reconstruction followed by clipping bounds the L2 sensitivity of each client's update to the clipping threshold C. PowerDP then receives this bounded aggregate and injects noise calibrated to the resulting sensitivity before orthonormalization. Orthonormalization is post-processing and preserves the DP guarantee. To address the concern directly, we have added a new paragraph in §4.2 that walks through the sequential composition: (i) clipping bounds per-client sensitivity, (ii) averaging scales sensitivity by 1/K, and (iii) PowerDP's Gaussian noise is scaled to this composed bound. The revised analysis confirms that client-level (ε,δ)-DP holds for the reported ε ≤ 1 values, with the proof details moved to the appendix for clarity. revision: yes
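The rebuttal's sequential composition can be made concrete with the standard Gaussian mechanism (Dwork and Roth [4]). The calibration below is the textbook formula under the rebuttal's stated scaling (per-client sensitivity `C`, averaging over `K` clients giving L2 sensitivity `C/K`); the paper's exact constant and adjacency notion may differ.

```python
import math

def gaussian_sigma(C, K, eps, delta):
    """Noise scale for (eps, delta)-DP on an average of K clipped updates.
    Clipping bounds per-client L2 sensitivity to C; averaging scales it
    to C/K; the classical Gaussian mechanism then requires
    sigma >= Delta * sqrt(2 ln(1.25/delta)) / eps (valid for eps <= 1)."""
    sensitivity = C / K
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
```

Orthonormalization after adding this noise is post-processing, so it does not consume additional privacy budget; doubling ε halves the required noise scale.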
-
Referee: §5.2 (Experiments on language tasks): The reported accuracy improvements of FedPower over output-perturbation baselines under ε ≤ 1 rely on the assumption that post-projection error does not overwhelm the low-rank signal; without an ablation isolating the reconstruction-plus-PowerDP error term or reporting per-run variance, it is unclear whether the gains are robust or could be explained by the specific LoRA rank and clipping choices.
Authors: We agree that isolating the projection error and reporting variance would strengthen the empirical claims. In the revised §5.2 we have added an ablation that compares FedPower against a non-private reconstruction baseline and a direct low-rank perturbation variant, reporting the Frobenius norm of the post-projection residual. The results show that PowerDP's error remains below the noise level of output-perturbation baselines. We now also report mean accuracy ± standard deviation over five independent runs for all tasks and privacy budgets. Additional tables vary LoRA rank (r=4,8,16) and clipping norm (C=1,5), confirming that the accuracy gains persist across these choices and are not explained by specific hyperparameter settings. revision: yes
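The residual metric promised in the ablation can be computed directly. The sketch below assumes the projection releases factors `B, Q` with `dW ≈ B @ Q.T` (names illustrative) and compares against the optimal rank-r residual from truncated SVD as a baseline.

```python
import numpy as np

def projection_residual(dW, B, Q):
    """Frobenius norm of the post-projection residual ||dW - B Q^T||_F,
    the error term the ablation isolates (sketch)."""
    return float(np.linalg.norm(dW - B @ Q.T))

def best_rank_r_residual(dW, r):
    """Baseline: the optimal rank-r residual from truncated SVD
    (Eckart-Young), a floor no rank-r projection can beat."""
    s = np.linalg.svd(dW, compute_uv=False)
    return float(np.sqrt(np.sum(s[r:] ** 2)))
```

Reporting both numbers per run would show how far the noisy projection sits above the information-theoretic floor at each (rank, ε) setting.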
Circularity Check
No circularity: sensitivity bounds and PowerDP derived from standard DP definitions and subspace iteration without reduction to inputs or self-citations
full rationale
The paper's central claims rest on explicit reconstruction of full-rank updates from LoRA factors, clipping for sensitivity bounding, aggregation, and then PowerDP projection via simultaneous subspace iteration with noise injected before orthonormalization. Theoretical sensitivity bounds for the subspace projections are presented as derived analyses proving sample- and client-level DP; these follow from the mechanism definition and standard DP composition rather than any fitted quantity or self-referential loop. No equations reduce the claimed DP guarantees or utility improvements to tautologies by construction, and the abstract and mechanism description invoke no load-bearing self-citations or uniqueness theorems from prior author work. The framework is self-contained against external DP and LoRA benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- LoRA rank r
- DP privacy budget (epsilon, delta)
axioms (2)
- standard math: Standard (epsilon, delta)-differential privacy definitions apply to both sample-level and client-level settings
- domain assumption: Simultaneous subspace iteration can be adapted to inject noise while preserving orthogonality after projection
invented entities (2)
-
PowerDP mechanism
no independent evidence
-
FedPower framework
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In ACM Conf. Comput. Commun. Secur. (CCS). ACM, 308–318
work page 2016
-
[2]
Maria-Florina Balcan, Simon Shaolei Du, Yining Wang, and Adams Wei Yu. 2016. An improved gap-dependency analysis of the noisy power method. In Conf. Learn. Theory (COLT). PMLR, 284–309
work page 2016
-
[3]
Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. 2012. The Johnson-Lindenstrauss transform itself preserves differential privacy. In IEEE Annu. Symp. Found. Comput. Sci. (FOCS). IEEE, 410–419
work page 2012
-
[4]
Cynthia Dwork and Aaron Roth. 2014. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9, 3–4 (Aug. 2014), 211–487
work page 2014
-
[5]
Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, and Michael Moeller. 2020. Inverting gradients – How easy is it to break privacy in federated learning?. In Adv. Neural Inf. Process. Syst. (NeurIPS). Curran Associates, Inc., 16937–16947
work page 2020
-
[6]
Moritz Hardt and Eric Price. 2014. The noisy power method: A meta algorithm with applications. In Adv. Neural Inf. Process. Syst. (NeurIPS), Vol. 27. Curran Associates, Inc.
work page 2014
-
[7]
Moritz Hardt and Aaron Roth. 2012. Beating randomized response on incoherent matrices. In ACM Symp. Theory Comput. (STOC). ACM, 1255–1268
work page 2012
-
[8]
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-rank adaptation of large language models. arXiv:2106.09685
work page 2021
-
[9]
Peter Kairouz, Monica Ribero Diaz, Keith Rush, and Abhradeep Thakurta. 2021. (Nearly) dimension independent private ERM with AdaGrad rates via publicly estimated subspaces. In Conf. Learn. Theory (COLT). PMLR, 2717–2746
work page 2021
-
[10]
Tianqu Kang, Zixin Wang, Hengtao He, Jun Zhang, Shenghui Song, and Khaled B Letaief. 2025. Federated low-rank adaptation with differential privacy for wireless networks. In IEEE Int. Mediterr. Conf. Commun. Netw. (MeditCom). IEEE, 1–6
work page 2025
-
[11]
Michael Kapralov and Kunal Talwar. 2013. On differentially private low rank approximation. In ACM-SIAM Symp. Discrete Algorithms (SODA). SIAM, 1395–1414
work page 2013
-
[12]
Xuechen Li, Florian Tramer, Percy Liang, and Tatsunori Hashimoto. 2022. Large language models can be strong differentially private learners. In Int. Conf. Learn. Represent. (ICLR). OpenReview.net
work page 2022
-
[13]
Xiao-Yang Liu, Rongyi Zhu, Daochen Zha, Jiechao Gao, Shan Zhong, Matt White, and Meikang Qiu. 2025. Differentially private low-rank adaptation of large language models
work page 2025
-
[14]
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692
work page 2019
-
[15]
Aravindh Mahendran and Andrea Vedaldi. 2015. Understanding deep image representations by inverting them. In IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR). IEEE, 5188–5196
work page 2015
-
[16]
Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2017. Communication-efficient learning of deep networks from decentralized data. In Int. Conf. Artif. Intell. Statist. (AISTATS). PMLR, 1273–1282
work page 2017
-
[17]
H Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. 2018. Learning differentially private recurrent language models. In Int. Conf. Learn. Represent. (ICLR). OpenReview.net
work page 2018
- [18]
-
[19]
Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Membership inference attacks against machine learning models. In IEEE Symp. Secur. Privacy (S&P). IEEE, 3–18
work page 2017
-
[20]
Raghav Singhal, Kaustubh Ponkshe, and Praneeth Vepakomma. 2025. FedEx-LoRA: Exact aggregation for federated and efficient fine-tuning of large language models. In Annu. Meet. Assoc. Comput. Linguist. (ACL). Association for Computational Linguistics, 1316–1336
work page 2025
-
[21]
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Conf. Empir. Methods Nat. Lang. Process. (EMNLP). Association for Computational Linguistics, 1631–1642
work page 2013
-
[22]
William J Stewart and Alan Jennings. 1981. A simultaneous iteration algorithm for real matrices. ACM Trans. Math. Softw. 7, 2 (Jun. 1981), 184–198
work page 1981
-
[23]
Youbang Sun, Zitao Li, Yaliang Li, and Bolin Ding. 2024. Improving LoRA in privacy-preserving federated learning. In Int. Conf. Learn. Represent. (ICLR). OpenReview.net
work page 2024
-
[24]
Linh Tran, Wei Sun, Stacy Patterson, and Ana Milanova. 2025. Privacy-preserving personalized federated prompt learning for multimodal large language models. In Int. Conf. Learn. Represent. (ICLR). OpenReview.net
work page 2025
-
[25]
Thijs Vogels, Sai Praneeth Karimireddy, and Martin Jaggi. 2019. PowerSGD: Practical low-rank gradient compression for distributed optimization. In Adv. Neural Inf. Process. Syst. (NeurIPS). Curran Associates, Inc., 14259–14268
work page 2019
-
[26]
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 353–355
work page 2018
-
[27]
Ziyao Wang, Zheyu Shen, Yexiao He, Guoheng Sun, Hongyi Wang, Lingjuan Lyu, and Ang Li. 2024. FLoRA: Federated fine-tuning large language models with heterogeneous low-rank adaptations. In Adv. Neural Inf. Process. Syst. (NeurIPS), Vol. 37. Curran Associates, Inc., 22513–22533
work page 2024
-
[28]
Lauren Watson, Chuan Guo, Graham Cormode, and Alexandre Sablayrolles. 2021. On the importance of difficulty calibration in membership inference attacks. In Int. Conf. Learn. Represent. (ICLR). OpenReview.net
work page 2021
-
[29]
Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In Conf. N. Am. Chapter Assoc. Comput. Linguist.: Hum. Lang. Technol. (NAACL-HLT). Association for Computational Linguistics, 1112–1122
work page 2018
- [30]
-
[31]
Qianren Yang, Yong Li, and Tao Zhao. 2025. FL-DPLoRA: An integrated and efficient privacy-preserving training framework for large language models in privacy-critical applications. In Int. Conf. Softw. Qual., Reliab., Secur. Companion (QRS-C). IEEE, 53–62
work page 2025
-
[32]
Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. 2018. Privacy risk in machine learning: Analyzing the connection to overfitting. In IEEE Comput. Secur. Found. Symp. (CSF). IEEE, 268–282
work page 2018
-
[33]
Da Yu, Huishuai Zhang, Wei Chen, and Tie-Yan Liu. 2021. Do not let privacy overbill utility: Gradient embedding perturbation for private learning. In Int. Conf. Learn. Represent. (ICLR). OpenReview.net
work page 2021
-
[34]
Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, and Tie-Yan Liu. 2021. Large scale private learning via low-rank reparametrization. In Int. Conf. Mach. Learn. (ICML). PMLR, 12208–12218
work page 2021
-
[35]
Jianyi Zhang, Saeed Vahidian, Martin Kuo, Chunyuan Li, Ruiyi Zhang, Tong Yu, Guoyin Wang, and Yiran Chen. 2024. Towards building the FederatedGPT: Federated instruction tuning. In IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP). IEEE, 6915–6919
work page 2024
-
[36]
Yingxue Zhou, Steven Wu, and Arindam Banerjee. 2021. Bypassing the ambient dimension: Private SGD with gradient subspace identification. In Int. Conf. Learn. Represent. (ICLR). OpenReview.net
work page 2021