DP-SelFT: Differentially Private Selective Fine-Tuning for Large Language Models
Pith reviewed 2026-05-20 13:30 UTC · model grok-4.3
The pith
Choosing which layers to fine-tune by running the selection on a DP synthetic dataset improves the utility of large language models under the same privacy budget.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DP-SelFT first builds a DP synthetic dataset, then evaluates candidate layer subsets by temporarily training each on a synthetic training split and scoring it on a synthetic validation split, under perturbation magnitudes matched to the downstream DP noise. The best subset is retained for the actual private fine-tuning on real data. Because the entire selection stage uses only the synthetic data, it is post-processing of an already-private output and consumes no privacy budget beyond what was spent generating that data. The matched worst-case perturbations are meant to favor layers that tolerate the clipping and noise that will appear in the real run.
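The selection stage described above can be sketched as a simple search loop. This is a minimal illustration, not the authors' implementation: the exhaustive enumeration in `candidate_subsets`, the `train_and_score` callback, and the toy scorer are all hypothetical. In a real run the callback would temporarily train the subset on the synthetic training split under matched perturbations, score it on the synthetic validation split, and restore the weights.

```python
from itertools import combinations
from typing import Callable, Iterable, Optional, Tuple

LayerSubset = Tuple[int, ...]


def candidate_subsets(num_layers: int, k: int) -> Iterable[LayerSubset]:
    """Enumerate all size-k layer subsets (an assumed candidate generator)."""
    return combinations(range(num_layers), k)


def select_layers(
    candidates: Iterable[LayerSubset],
    train_and_score: Callable[[LayerSubset], float],
) -> Tuple[LayerSubset, float]:
    """Return the candidate with the highest synthetic-validation score.

    Only synthetic data is touched inside train_and_score, so this loop
    never reads the real private dataset.
    """
    best_subset: Optional[LayerSubset] = None
    best_score = float("-inf")
    for subset in candidates:
        score = train_and_score(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    if best_subset is None:
        raise ValueError("no candidate subsets were provided")
    return best_subset, best_score


def toy_score(subset: LayerSubset) -> float:
    """Stand-in scorer: sums made-up per-layer usefulness values."""
    usefulness = [0.1, 0.5, 0.2, 0.9]
    return sum(usefulness[i] for i in subset)


best, score = select_layers(candidate_subsets(num_layers=4, k=2), toy_score)
```

With the toy scorer, the loop picks the two layers with the largest made-up usefulness values. Exhaustive enumeration grows combinatorially, so a practical candidate generator would need to be far more restrictive.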
What carries the argument
Layer-level selection performed on a DP synthetic dataset via temporary training under worst-case perturbations whose scale matches the noise of the final private fine-tuning step.
If this is right
- For any fixed privacy budget, the resulting model reaches higher accuracy on benchmark tasks than full-parameter DP fine-tuning or existing DP-LoRA baselines.
- The selection stage adds no further privacy cost because it runs exclusively on the synthetic dataset, which is already private.
- Chosen layers remain useful after gradient clipping and noise addition because the temporary training already simulates those effects.
- The method can be combined with existing parameter-efficient techniques without changing the privacy accounting.
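For reference, the clipping and noise addition mentioned in the list above follow the standard DP-SGD recipe: clip each per-example gradient to a fixed L2 norm, sum, add Gaussian noise scaled to that norm, and average. The sketch below shows one such step in NumPy. The optional `mask` argument, which freezes unselected coordinates, is an assumption about how a layer selection would plug in, not a detail taken from the paper.

```python
import numpy as np


def dp_noisy_gradient(per_example_grads, clip_norm, noise_multiplier, rng, mask=None):
    """One DP-SGD gradient estimate: per-example clipping plus Gaussian noise.

    per_example_grads: array of shape (batch_size, num_params)
    mask: optional 0/1 vector; coordinates with 0 are treated as frozen
    """
    grads = np.asarray(per_example_grads, dtype=float)
    if mask is not None:
        grads = grads * mask  # frozen coordinates contribute nothing
    # Scale each row so that its L2 norm is at most clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Gaussian noise with std = noise_multiplier * clip_norm on the sum.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    if mask is not None:
        noise = noise * mask  # no noise is needed on frozen coordinates
    return (clipped.sum(axis=0) + noise) / grads.shape[0]


rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 1.0]])

# With the noise switched off, only the clipping effect remains.
clipped_mean = dp_noisy_gradient(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)

# Freezing the last coordinate removes both its gradient and its noise.
masked = dp_noisy_gradient(
    grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng, mask=np.array([1.0, 1.0, 0.0])
)
```

Because the noise vector has one entry per trainable coordinate, training fewer coordinates injects less total noise. That is the intuition behind restricting where updates are applied.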
Where Pith is reading between the lines
- The same synthetic-data selection idea could be tested on vision or multimodal models where full fine-tuning is also expensive.
- If the selected layers turn out to be consistent across tasks, the approach might allow pre-computing reusable layer masks that further reduce per-task compute.
- Extending the perturbation matching to other DP mechanisms such as DP-SGD variants could broaden applicability.
Load-bearing premise
Layer subsets that perform well in temporary training on synthetic data with matched worst-case perturbations will remain the best choices when the same selection is applied to real private data.
What would settle it
On a real downstream dataset, private fine-tuning with the DP-SelFT selected layers produces no higher accuracy than private fine-tuning with randomly chosen layers or with all layers, under identical privacy parameters.
Original abstract
Large language models (LLMs) are commonly adapted to downstream tasks through fine-tuning, but fine-tuning data often contains sensitive information that may be leaked by the resulting model. Differential privacy (DP) offers formal protection against such leakage, yet DP fine-tuning of LLMs still suffers from substantial utility degradation due to gradient clipping and noise injection. Existing work improves this trade-off by combining DP with parameter-efficient fine-tuning methods such as LoRA, which constrain the form of updates. In this work, we study a complementary direction: selective fine-tuning, which constrains where updates are applied. We propose DP-SelFT, a framework for differentially private selective fine-tuning of LLMs. DP-SelFT addresses three DP-specific challenges in parameter selection: avoiding repeated privacy cost, improving stability under noisy estimates, and selecting parameters that remain useful under clipped and noisy updates. It first constructs a lightweight DP synthetic dataset and performs selection only on this synthetic data, so the selection stage incurs no additional privacy cost. It then conducts layer-level selection by temporarily training candidate layer subsets on a synthetic training split and evaluating them on a synthetic validation split. Crucially, this temporary training is performed under a perturbation regime matched to downstream DP fine-tuning, with worst-case perturbations of the same scale as DP noise. This favors layer subsets that are not only learnable but also robust to noisy private updates. Experiments on benchmark tasks show that DP-SelFT consistently improves the privacy-utility trade-off over existing DP fine-tuning baselines under the same privacy guarantees.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DP-SelFT, a framework for differentially private selective fine-tuning of large language models. It constructs a lightweight DP synthetic dataset on which layer-level selection is performed by temporarily training candidate layer subsets on a synthetic training split and evaluating on a synthetic validation split. This temporary training uses a perturbation regime with worst-case perturbations matched in scale to the downstream DP noise. The selection incurs no additional privacy cost due to the use of synthetic data. The authors claim that experiments on benchmark tasks show DP-SelFT consistently improves the privacy-utility trade-off over existing DP fine-tuning baselines under identical privacy guarantees.
Significance. If the synthetic-to-real transfer of selected layers holds, the method offers a useful complement to parameter-efficient techniques such as LoRA by focusing updates on layers that remain effective under clipping and noise. The design choice to match perturbation regimes during selection is a clear strength that directly targets DP-specific robustness. The approach avoids extra privacy expenditure for selection, which is a practical advantage. The manuscript provides a coherent description of the three DP-specific challenges it addresses.
major comments (2)
- §3 (Method): The central claim depends on the assumption that layer subsets chosen by temporary training on the DP synthetic dataset under matched worst-case perturbations remain effective when the actual DP fine-tuning is performed on real private data. No analysis of gradient statistics, feature scale, or noise interaction differences between synthetic and real data is provided, and no quantitative validation (e.g., overlap between synthetic-selected and oracle real-selected layers) is reported.
- §4 (Experiments): The reported improvements over baselines are not accompanied by ablations that isolate the contribution of the selection procedure or that test sensitivity to mismatches between the synthetic data distribution and the downstream task, leaving the load-bearing transfer assumption untested.
minor comments (2)
- Abstract: The phrase 'benchmark tasks' is used without naming the specific datasets or tasks, which would aid immediate assessment of scope.
- §3 (Notation): The description of the 'perturbation regime matched to downstream DP fine-tuning' would benefit from an explicit equation or pseudocode showing how the worst-case perturbation scale is computed and applied during the temporary training step.
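To make the last request concrete, here is one plausible way a matched worst-case perturbation could be computed. It is a sketch under stated assumptions, not the paper's definition: the radius formula in `matched_radius`, the first-order (sharpness-aware style) direction in `worst_case_perturbation`, and the toy quadratic loss are all illustrative choices, and the source does not give the actual rule.

```python
import numpy as np


def matched_radius(noise_multiplier, clip_norm, batch_size, num_trainable, lr):
    """Approximate L2 size of one DP-SGD step's noise on the parameters.

    Assumption: noise N(0, (sigma*C)^2 I) is added to the summed gradients,
    so the averaged gradient carries std sigma*C/B per coordinate. Its norm
    is roughly sigma*C*sqrt(d)/B, and the update scales it by lr.
    """
    return lr * noise_multiplier * clip_norm * np.sqrt(num_trainable) / batch_size


def worst_case_perturbation(grad, radius, mask):
    """First-order loss-maximizing step inside an L2 ball of the given radius,
    restricted to the trainable coordinates selected by mask."""
    g = grad * mask
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(g)
    return radius * g / norm


def loss(w):
    """Toy quadratic loss used only for illustration."""
    return 0.5 * float(np.sum(w ** 2))


def grad(w):
    """Gradient of the toy quadratic loss."""
    return w


w = np.array([1.0, -2.0, 0.5, 3.0])
mask = np.array([1.0, 1.0, 0.0, 0.0])  # only the first two coordinates train
rho = matched_radius(
    noise_multiplier=1.0, clip_norm=1.0, batch_size=8,
    num_trainable=int(mask.sum()), lr=0.1,
)
delta = worst_case_perturbation(grad(w), rho, mask)
perturbed_loss = loss(w + delta)
```

A temporary training step would then evaluate the gradient at `w + delta` rather than at `w`, so that candidate subsets are scored by how well they learn when pushed in the most harmful direction of noise-sized magnitude.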
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the work's significance. We address each major comment below and describe the revisions we will make to strengthen the manuscript.
- Referee: §3 (Method): The central claim depends on the assumption that layer subsets chosen by temporary training on the DP synthetic dataset under matched worst-case perturbations remain effective when the actual DP fine-tuning is performed on real private data. No analysis of gradient statistics, feature scale, or noise interaction differences between synthetic and real data is provided, and no quantitative validation (e.g., overlap between synthetic-selected and oracle real-selected layers) is reported.
  Authors: We agree that direct quantitative validation such as layer overlap with an oracle selection performed on real data would provide stronger support for the transfer assumption. However, constructing such an oracle would require non-private access to the real data, which is incompatible with the DP setting. The matched perturbation regime during selection is specifically designed to simulate the noise and clipping conditions of downstream DP fine-tuning, thereby favoring robust layers even if gradient statistics differ. In the revised manuscript we will add a new paragraph in §3 discussing expected differences in gradient norms and feature scales between the DP synthetic data and real data, supported by non-private auxiliary experiments on public proxies. We will also report the stability of selected layers across multiple synthetic data realizations.
  Revision: yes
- Referee: §4 (Experiments): The reported improvements over baselines are not accompanied by ablations that isolate the contribution of the selection procedure or that test sensitivity to mismatches between the synthetic data distribution and the downstream task, leaving the load-bearing transfer assumption untested.
  Authors: We acknowledge that isolating the selection procedure and testing robustness to synthetic-real distribution mismatch would strengthen the experimental section. In the revised version we will add an ablation that compares DP-SelFT against (i) full-layer DP fine-tuning and (ii) random layer selection, all under identical privacy budgets and perturbation scales. We will also include results using synthetic datasets generated at different privacy levels (higher and lower ε) and with an alternative synthesis method to quantify sensitivity to distribution mismatch. These additions will directly test the transfer assumption.
  Revision: yes
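The stability and overlap measurements discussed in these responses could be quantified with a set-overlap score. The sketch below uses Jaccard similarity between selected layer sets, which is an assumed metric (the rebuttal does not name one). The layer indices in `runs` are made-up placeholders, not results from the paper.

```python
from itertools import combinations
from typing import Iterable, Sequence


def jaccard(a: Iterable[int], b: Iterable[int]) -> float:
    """Jaccard similarity between two sets of selected layer indices."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty selections are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_jaccard(selections: Sequence[Iterable[int]]) -> float:
    """Average Jaccard similarity over all pairs of selections.

    Values near 1 mean the selection barely changes between synthetic
    data realizations; values near 0 mean it is unstable.
    """
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selections to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical selections from three synthetic data realizations.
runs = [(1, 3, 5), (1, 3, 6), (1, 3, 5)]
stability = mean_pairwise_jaccard(runs)
```

The same `jaccard` function could compare a synthetic-selected subset against a subset selected on a public proxy dataset, which sidesteps the need for non-private access to the real data.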
Circularity Check
No circularity detected in DP-SelFT derivation or claims
The paper presents a practical method for layer selection in DP fine-tuning that constructs a lightweight DP synthetic dataset and performs temporary training under a matched perturbation regime to choose subsets robust to noise. This procedure is justified by the standard post-processing property of differential privacy (selection on already-private synthetic data incurs no extra privacy cost) and does not reduce any claimed result to its inputs by definition, by fitted parameters renamed as predictions, or by self-citation chains. No equations or uniqueness theorems are invoked that collapse the selection criterion to the downstream data or prior author work; the experimental improvements are presented as empirical outcomes on benchmarks rather than tautological consequences of the construction. The approach is self-contained and is compared against external DP baselines.
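The privacy argument in this rationale can be stated compactly. The block below restates two textbook facts, post-processing and basic composition. The symbols are assumptions introduced here: $\mathcal{M}_{\mathrm{syn}}$ for the synthetic-data generator, $f$ for the layer-selection routine, and the budget split $(\varepsilon_{\mathrm{syn}}, \varepsilon_{\mathrm{ft}})$, which the source does not specify.

```latex
% Post-processing: if M_syn is (eps_syn, delta_syn)-DP and f does not
% access the private dataset D, then f(M_syn(D)) is (eps_syn, delta_syn)-DP.
\Pr\bigl[f(\mathcal{M}_{\mathrm{syn}}(D)) \in T\bigr]
  \le e^{\varepsilon_{\mathrm{syn}}}
      \Pr\bigl[f(\mathcal{M}_{\mathrm{syn}}(D')) \in T\bigr]
      + \delta_{\mathrm{syn}}
  \quad \text{for all neighboring } D, D' \text{ and all events } T.

% Basic composition: releasing the selection together with a fine-tuning
% run that is (eps_ft, delta_ft)-DP costs at most
(\varepsilon_{\mathrm{syn}} + \varepsilon_{\mathrm{ft}},\;
 \delta_{\mathrm{syn}} + \delta_{\mathrm{ft}}).
```

Read this way, "no extra privacy cost" means the selection step $f$ adds nothing on top of $\varepsilon_{\mathrm{syn}}$. The cost of generating the synthetic data itself still counts toward the total.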
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: DP-SelFT performs layer-level selection by temporarily training candidate layer subsets on a synthetic training split and evaluating them on a synthetic validation split under a perturbation regime matched to downstream DP fine-tuning, with worst-case perturbations of the same scale as DP noise.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Theorem 1 decomposes the effect of a noisy private update into a descent term and a perturbation-induced error term... dΛσ²C² captures the DP noise damage, which decreases with the number of trainable parameters.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.