On the Sample Complexity of Differentially Private Policy Optimization
Pith reviewed 2026-05-18 04:40 UTC · model grok-4.3
The pith
Differentially private policy optimization adds privacy costs only as lower-order terms in sample complexity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors formalize a differential privacy definition for policy optimization that addresses on-policy learning dynamics and the choice of privacy unit. They analyze the sample complexity of policy gradient, natural policy gradient, and related algorithms under differential privacy constraints via a unified framework. The central result is that privacy costs typically manifest as lower-order terms in the sample complexity, accompanied by several subtle observations particular to private policy optimization settings.
What carries the argument
A unified analytical framework for deriving sample complexity bounds of differentially private policy optimization algorithms.
Load-bearing premise
The analysis depends on the newly formalized definition of differential privacy for on-policy learning dynamics and the specific choice of privacy unit.
What would settle it
An experiment or calculation showing that the additional samples required for a fixed privacy level grow linearly with the number of iterations or episodes, rather than as a lower-order additive or multiplicative term, would falsify the main claim.
read the original abstract
Policy optimization (PO) is a cornerstone of modern reinforcement learning (RL), with diverse applications spanning robotics, healthcare, and large language model training. The increasing deployment of PO in sensitive domains, however, raises significant privacy concerns. In this paper, we initiate a theoretical study of differentially private policy optimization, focusing explicitly on its sample complexity. We first formalize an appropriate definition of differential privacy (DP) tailored to PO, addressing the inherent challenges arising from on-policy learning dynamics and the subtlety involved in defining the unit of privacy. We then systematically analyze the sample complexity of widely-used PO algorithms, including policy gradient (PG), natural policy gradient (NPG) and more, under DP constraints and various settings, via a unified framework. Our theoretical results demonstrate that privacy costs can often manifest as lower-order terms in the sample complexity, while also highlighting subtle yet important observations in private PO settings. These offer valuable practical insights for privacy-preserving PO algorithms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper initiates a theoretical study of differentially private policy optimization in reinforcement learning. It formalizes a DP definition tailored to on-policy learning dynamics and the choice of privacy unit, then analyzes the sample complexity of policy gradient (PG), natural policy gradient (NPG), and variants under DP constraints via a unified framework. The central claim is that privacy costs often manifest as lower-order terms in the sample complexity, with additional subtle observations in private PO settings.
Significance. If the results hold, the work provides practical insights for privacy-preserving PO in sensitive domains like healthcare and robotics by suggesting that DP can often be added with only lower-order sample overhead. The unified analysis across PG/NPG and the focus on on-policy dynamics are strengths; the paper merits credit for attempting a systematic treatment of privacy unit choice in trajectory-based learning.
major comments (2)
- [§3 (DP definition for PO)] §3 (DP definition for PO): The custom DP notion, which redefines the privacy unit to account for on-policy trajectory dependence, is load-bearing for the entire sample-complexity analysis. The manuscript provides no formal reduction to standard (ε,δ)-DP, no composition theorem over iterative policy updates, and no verification that the neighboring relation prevents leakage during gradient estimation or policy improvement steps. Without this, the separation of privacy costs as lower-order terms does not necessarily follow.
- [§4 (unified sample-complexity bounds)] §4 (unified sample-complexity bounds): The claim that privacy overhead is lower-order for both PG and NPG relies on the fixed custom DP definition and specific assumptions on how privacy noise interacts with on-policy sampling. The presentation does not explicitly derive or bound the interaction term between the privacy mechanism and the policy-update operator, leaving open whether the lower-order result holds under standard MDP assumptions or requires additional restrictions on the policy class or horizon.
minor comments (2)
- [Abstract] The abstract refers to 'and more' algorithms without listing the variants analyzed; this should be made precise in the introduction or contributions paragraph.
- [Notation and Preliminaries] Notation for privacy parameters (ε, δ) and sample terms should be introduced once and used consistently; occasional redefinition across sections reduces readability.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive comments, which help strengthen the rigor of our analysis. We appreciate the positive assessment of the paper's contributions in formalizing DP for policy optimization and analyzing sample complexity. We address each major comment below and will incorporate the suggested clarifications and additions in the revised manuscript.
read point-by-point responses
-
Referee: §3 (DP definition for PO): The custom DP notion, which redefines the privacy unit to account for on-policy trajectory dependence, is load-bearing for the entire sample-complexity analysis. The manuscript provides no formal reduction to standard (ε,δ)-DP, no composition theorem over iterative policy updates, and no verification that the neighboring relation prevents leakage during gradient estimation or policy improvement steps. Without this, the separation of privacy costs as lower-order terms does not necessarily follow.
Authors: We agree that establishing a formal link to standard DP strengthens the foundation. In the revision, we will add an appendix providing a formal reduction from our tailored DP definition to standard (ε,δ)-DP with respect to neighboring trajectories. We will also derive a composition theorem for the sequence of policy updates and explicitly verify that the chosen neighboring relation bounds sensitivity to prevent leakage in both gradient estimation and policy improvement steps. These additions will confirm that the lower-order privacy cost separation holds under the defined notion. revision: yes
-
Referee: §4 (unified sample-complexity bounds): The claim that privacy overhead is lower-order for both PG and NPG relies on the fixed custom DP definition and specific assumptions on how privacy noise interacts with on-policy sampling. The presentation does not explicitly derive or bound the interaction term between the privacy mechanism and the policy-update operator, leaving open whether the lower-order result holds under standard MDP assumptions or requires additional restrictions on the policy class or horizon.
Authors: We acknowledge the value of making the interaction term explicit. In the revised version, we will expand the unified analysis in Section 4 to derive and bound the interaction between the privacy noise and the policy-update operator. We will show that the lower-order overhead holds under standard MDP assumptions (bounded rewards, finite horizon, Lipschitz policies) without further restrictions on the policy class, and we will clarify any horizon-dependent conditions in the statements of the theorems. revision: yes
Circularity Check
No circularity: derivation self-contained from new DP definition and standard primitives
full rationale
The paper introduces a custom DP definition for on-policy PO and derives sample-complexity bounds for PG/NPG variants under that definition. No quoted step reduces a claimed result to a fitted parameter, self-citation chain, or input by construction. The central claim that privacy appears as a lower-order term follows directly from the analysis once the neighboring relation and privacy unit are fixed; this is a standard first-principles derivation rather than a renaming or self-referential fit. The work is self-contained against external DP and RL benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption A suitable differential privacy definition exists for on-policy policy optimization that respects learning dynamics and privacy unit choice.
Reference graph
Works this paper leans on
-
[1]
Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforce- ment learning.Machine learning, 8:229–256, 1992
work page 1992
-
[2]
Richard S Sutton, David McAllester, Satinder Singh, and Yishay Mansour. Policy gradient meth- ods for reinforcement learning with function approximation.Advances in neural information processing systems, 12, 1999
work page 1999
-
[3]
Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[4]
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Y Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[5]
Alekh Agarwal, Sham M Kakade, Jason D Lee, and Gaurav Mahajan. On the theory of policy gradient methods: Optimality, approximation, and distribution shift.Journal of Machine Learning Research, 22(98):1–76, 2021
work page 2021
-
[6]
A general sample complexity analysis of vanilla policy gradient
Rui Yuan, Robert M Gower, and Alessandro Lazaric. A general sample complexity analysis of vanilla policy gradient. InInternational Conference on Artificial Intelligence and Statistics, pages 3332–3380. PMLR, 2022
work page 2022
-
[7]
Adaptive trust region policy optimization: Global convergence and faster rates for regularized mdps
Lior Shani, Yonathan Efroni, and Shie Mannor. Adaptive trust region policy optimization: Global convergence and faster rates for regularized mdps. InProceedings of the AAAI Confer- ence on Artificial Intelligence, volume 34, pages 5668–5675, 2020
work page 2020
-
[8]
Boyi Liu, Qi Cai, Zhuoran Yang, and Zhaoran Wang. Neural trust region/proximal policy optimization attains globally optimal policy.Advances in neural information processing systems, 32, 2019
work page 2019
-
[9]
Calibrating noise to sensitivity in private data analysis
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. InTheory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3, pages 265–284. Springer, 2006
work page 2006
-
[10]
A natural policy gradient.Advances in neural information processing systems, 14, 2001
Sham M Kakade. A natural policy gradient.Advances in neural information processing systems, 14, 2001
work page 2001
-
[11]
Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J
Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, and Wen Sun. Rebel: Reinforcement learning via regressing relative rewards, 2024. URL https://arxiv.org/abs/2404.16767
-
[12]
Differentially private policy gradient.arXiv preprint arXiv:2501.19080, 2025
Alexandre Rio, Merwan Barlier, and Igor Colin. Differentially private policy gradient.arXiv preprint arXiv:2501.19080, 2025
-
[13]
Differentially private regret minimization in episodic markov decision processes
Sayak Ray Chowdhury and Xingyu Zhou. Differentially private regret minimization in episodic markov decision processes. InProceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 6375–6383, 2022. 10
work page 2022
-
[14]
Differentially private reinforcement learning with linear function approximation
Xingyu Zhou. Differentially private reinforcement learning with linear function approximation. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6(1):1–27, 2022
work page 2022
-
[15]
Differentially private policy evaluation
Borja Balle, Maziar Gomrokchi, and Doina Precup. Differentially private policy evaluation. In International Conference on Machine Learning, pages 2130–2138. PMLR, 2016
work page 2016
-
[16]
Privately aligning language models with reinforcement learning.arXiv preprint arXiv:2310.16960, 2023
Fan Wu, Huseyin A Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, and Robert Sim. Privately aligning language models with reinforcement learning.arXiv preprint arXiv:2310.16960, 2023
-
[17]
Regressing the relative future: Efficient policy optimization for multi-turn rlhf
Zhaolin Gao, Wenhao Zhan, Jonathan D Chang, Gokul Swamy, Kianté Brantley, Jason D Lee, and Wen Sun. Regressing the relative future: Efficient policy optimization for multi-turn rlhf. arXiv preprint arXiv:2410.04612, 2024
-
[18]
Tongxin Zhou, Yingfei Wang, Lu Yan, and Yong Tan. Spoiled for choice? personalized recommendation for healthcare decisions: A multiarmed bandit approach.Information Systems Research, 34(4):1493–1512, 2023
work page 2023
-
[19]
Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744, 2022
work page 2022
-
[20]
Direct preference optimization: Your language model is secretly a reward model
Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36:53728–53741, 2023
work page 2023
-
[21]
Differentially private empirical risk minimization.Journal of Machine Learning Research, 12(3), 2011
Kamalika Chaudhuri, Claire Monteleoni, and Anand D Sarwate. Differentially private empirical risk minimization.Journal of Machine Learning Research, 12(3), 2011
work page 2011
-
[22]
Private empirical risk minimization: Efficient algorithms and tight error bounds
Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In2014 IEEE 55th annual symposium on foundations of computer science, pages 464–473. IEEE, 2014
work page 2014
-
[23]
Raef Bassily, Vitaly Feldman, Kunal Talwar, and Abhradeep Guha Thakurta. Private stochastic convex optimization with optimal rates.Advances in neural information processing systems, 32, 2019
work page 2019
-
[24]
Private reinforcement learning with pac and regret guarantees
Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, and Steven Wu. Private reinforcement learning with pac and regret guarantees. InInternational Conference on Machine Learning, pages 9754–9764. PMLR, 2020
work page 2020
-
[25]
Private matchings and allocations.SIAM Journal on Computing, 45:1953–1984, 2016
J Hsu, Z Huang, A Roth, T Roughgarden, and ZS Wu. Private matchings and allocations.SIAM Journal on Computing, 45:1953–1984, 2016
work page 1953
-
[26]
Roshan Shariff and Or Sheffet. Differentially private contextual linear bandits.Advances in Neural Information Processing Systems, 31, 2018
work page 2018
-
[27]
Near-optimal differentially private reinforcement learning
Dan Qiao and Yu-Xiang Wang. Near-optimal differentially private reinforcement learning. In International Conference on Artificial Intelligence and Statistics, pages 9914–9940. PMLR, 2023
work page 2023
-
[28]
Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy.Founda- tions and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014
work page 2014
-
[29]
On the global optimum convergence of momentum- based policy gradient
Yuhao Ding, Junzi Zhang, and Javad Lavaei. On the global optimum convergence of momentum- based policy gradient. In Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera, editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 ofProceedings of Machine Learning Research, pages 1910–1934. PMLR, 2...
work page 1910
-
[30]
Yanli Liu, Kaiqing Zhang, Tamer Ba¸ sar, and Wotao Yin. An improved analysis of (variance- reduced) policy gradient and natural policy gradient methods, 2022. URL https://arxiv. org/abs/2211.07937. 11
-
[31]
Neural policy gradient methods: Global optimality and rates of convergence, 2019
Lingxiao Wang, Qi Cai, Zhuoran Yang, and Zhaoran Wang. Neural policy gradient methods: Global optimality and rates of convergence, 2019. URL https://arxiv.org/abs/1909. 01150
work page 2019
-
[32]
Hybrid rl: Using both offline and online data can make rl efficient,
Yuda Song, Yifei Zhou, Ayush Sekhari, J Andrew Bagnell, Akshay Krishnamurthy, and Wen Sun. Hybrid rl: Using both offline and online data can make rl efficient.arXiv preprint arXiv:2210.06718, 2022
-
[33]
Mechanism design via differential privacy
Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pages 94–103. IEEE, 2007
work page 2007
-
[34]
Gavin Brown, Jonathan Hayase, Samuel Hopkins, Weihao Kong, Xiyang Liu, Sewoong Oh, Juan C Perdomo, and Adam Smith. Insufficient statistics perturbation: Stable estimators for private least squares.arXiv preprint arXiv:2404.15409, 2024
-
[35]
Near-optimal private learning in linear contextual bandits, 2025
Fan Chen, Jiachun Li, Alexander Rakhlin, and David Simchi-Levi. Near-optimal private learning in linear contextual bandits, 2025. URLhttps://arxiv.org/abs/2502.13115
-
[36]
Rui Yuan, Simon S Du, Robert M Gower, Alessandro Lazaric, and Lin Xiao. Linear convergence of natural policy gradient methods with log-linear policies.arXiv preprint arXiv:2210.01400, 2022
-
[37]
Stochastic variance-reduced policy gradient
Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, and Marcello Restelli. Stochastic variance-reduced policy gradient. InInternational conference on machine learning, pages 4026–4035. PMLR, 2018
work page 2018
-
[38]
Trust region policy optimization
John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. InInternational conference on machine learning, pages 1889–1897. PMLR, 2015
work page 2015
-
[39]
Global optimality guarantees for policy gradient methods
Jalaj Bhandari and Daniel Russo. Global optimality guarantees for policy gradient methods. Operations Research, 72(5):1906–1927, 2024
work page 1906
-
[40]
Boyi Liu, Qi Cai, Zhuoran Yang, and Zhaoran Wang. Neural proximal/trust region policy optimization attains globally optimal policy (2019).arXiv preprint arXiv:1906.10306, 1906
-
[41]
Yanli Liu, Kaiqing Zhang, Tamer Basar, and Wotao Yin. An improved analysis of (variance- reduced) policy gradient and natural policy gradient methods.Advances in Neural Information Processing Systems, 33:7624–7636, 2020
work page 2020
-
[42]
Optimistic policy optimization with bandit feedback
Lior Shani, Yonathan Efroni, Aviv Rosenberg, and Shie Mannor. Optimistic policy optimization with bandit feedback. InInternational Conference on Machine Learning, pages 8604–8613. PMLR, 2020
work page 2020
-
[43]
Provably efficient exploration in policy optimization
Qi Cai, Zhuoran Yang, Chi Jin, and Zhaoran Wang. Provably efficient exploration in policy optimization. InInternational Conference on Machine Learning, pages 1283–1294. PMLR, 2020
work page 2020
-
[44]
Alekh Agarwal, Mikael Henaff, Sham Kakade, and Wen Sun. Pc-pg: Policy cover directed exploration for provable policy gradient learning.Advances in neural information processing systems, 33:13399–13412, 2020
work page 2020
-
[45]
Cautiously optimistic policy optimiza- tion and exploration with linear function approximation
Andrea Zanette, Ching-An Cheng, and Alekh Agarwal. Cautiously optimistic policy optimiza- tion and exploration with linear function approximation. InConference on Learning Theory, pages 4473–4525. PMLR, 2021
work page 2021
-
[46]
Qinghua Liu, Gellért Weisz, András György, Chi Jin, and Csaba Szepesvári. Optimistic natural policy gradient: a simple efficient policy optimization framework for online rl.Advances in Neural Information Processing Systems, 36:3560–3577, 2023
work page 2023
-
[47]
(nearly) optimal differentially private stochastic multi-arm bandits
Nikita Mishra and Abhradeep Thakurta. (nearly) optimal differentially private stochastic multi-arm bandits. InProceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, pages 592–601, 2015. 12
work page 2015
-
[48]
An optimal private stochastic-mab algorithm based on optimal private stopping rule
Touqir Sajed and Or Sheffet. An optimal private stochastic-mab algorithm based on optimal private stopping rule. InInternational Conference on Machine Learning, pages 5579–5588. PMLR, 2019
work page 2019
-
[49]
Distributed differential privacy in multi-armed bandits.arXiv preprint arXiv:2206.05772, 2022
Sayak Ray Chowdhury and Xingyu Zhou. Distributed differential privacy in multi-armed bandits.arXiv preprint arXiv:2206.05772, 2022
-
[50]
Multi-armed bandits with local differential privacy.arXiv preprint arXiv:2007.03121, 2020
Wenbo Ren, Xingyu Zhou, Jia Liu, and Ness B Shroff. Multi-armed bandits with local differential privacy.arXiv preprint arXiv:2007.03121, 2020
-
[51]
Yulian Wu, Xingyu Zhou, Youming Tao, and Di Wang. On private and robust bandits.Advances in Neural Information Processing Systems, 36:34778–34790, 2023
work page 2023
-
[52]
A reduction from linear contextual bandits lower bounds to estimations lower bounds
Jiahao He, Jiheng Zhang, and Rachel Q Zhang. A reduction from linear contextual bandits lower bounds to estimations lower bounds. InInternational Conference on Machine Learning, pages 8660–8677. PMLR, 2022
work page 2022
-
[53]
Shuffle private linear contextual bandits.arXiv preprint arXiv:2202.05567, 2022
Sayak Ray Chowdhury and Xingyu Zhou. Shuffle private linear contextual bandits.arXiv preprint arXiv:2202.05567, 2022
-
[54]
On differentially private federated linear contextual bandits.arXiv preprint arXiv:2302.13945, 2023
Xingyu Zhou and Sayak Ray Chowdhury. On differentially private federated linear contextual bandits.arXiv preprint arXiv:2302.13945, 2023
-
[55]
Paul Luyo, Evrard Garcelon, Alessandro Lazaric, and Matteo Pirotta. Differentially private ex- ploration in reinforcement learning with linear representation.arXiv preprint arXiv:2112.01585, 2021
-
[56]
Private stochastic convex optimization: optimal rates in linear time
Vitaly Feldman, Tomer Koren, and Kunal Talwar. Private stochastic convex optimization: optimal rates in linear time. InProceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 439–449, 2020
work page 2020
-
[57]
Private stochastic convex optimiza- tion: Optimal rates in l1 geometry
Hilal Asi, Vitaly Feldman, Tomer Koren, and Kunal Talwar. Private stochastic convex optimiza- tion: Optimal rates in l1 geometry. InInternational Conference on Machine Learning, pages 393–403. PMLR, 2021
work page 2021
-
[58]
Non-euclidean differentially private stochastic convex optimization
Raef Bassily, Cristóbal Guzmán, and Anupama Nandi. Non-euclidean differentially private stochastic convex optimization. InConference on Learning Theory, pages 474–499. PMLR, 2021
work page 2021
-
[59]
Janardhan Kulkarni, Yin Tat Lee, and Daogao Liu. Private non-smooth empirical risk minimiza- tion and stochastic convex optimization in subquadratic steps.arXiv preprint arXiv:2103.15352, 2021
-
[60]
Private federated learning without a trusted server: Optimal algorithms for convex losses
Andrew Lowy and Meisam Razaviyayn. Private federated learning without a trusted server: Optimal algorithms for convex losses. InThe Eleventh International Conference on Learning Representations, 2023
work page 2023
- [61]
-
[62]
Differentially private empirical risk minimization with non-convex loss functions
Di Wang, Changyou Chen, and Jinhui Xu. Differentially private empirical risk minimization with non-convex loss functions. InInternational Conference on Machine Learning, pages 6526–6535. PMLR, 2019
work page 2019
-
[63]
Private stochastic non-convex optimization: Adaptive algorithms and tighter generalization bounds
Yingxue Zhou, Xiangyi Chen, Mingyi Hong, Zhiwei Steven Wu, and Arindam Banerjee. Private stochastic non-convex optimization: Adaptive algorithms and tighter generalization bounds. arXiv preprint arXiv:2006.13501, 2020
-
[64]
Faster rates of convergence to stationary points in differentially private optimization
Raman Arora, Raef Bassily, Tomás González, Cristóbal A Guzmán, Michael Menart, and Enayat Ullah. Faster rates of convergence to stationary points in differentially private optimization. In International Conference on Machine Learning, pages 1060–1092. PMLR, 2023. 13
work page 2023
-
[65]
Daogao Liu, Arun Ganesh, Sewoong Oh, and Abhradeep Guha Thakurta. Private (stochastic) non-convex optimization revisited: Second-order stationary points and excess risks.Advances in Neural Information Processing Systems, 36:65618–65641, 2023
work page 2023
-
[66]
Daogao Liu and Kunal Talwar. Adaptive batch size for privately finding second-order stationary points.arXiv preprint arXiv:2410.07502, 2024
-
[67]
Yu-Xiang Wang. Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain.arXiv preprint arXiv:1803.02596, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[68]
Old techniques in differentially private linear regression
Or Sheffet. Old techniques in differentially private linear regression. InAlgorithmic Learning Theory, pages 789–827. PMLR, 2019
work page 2019
-
[69]
T Tony Cai, Yichen Wang, and Linjun Zhang. The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy.The Annals of Statistics, 49(5):2825–2850, 2021
work page 2021
-
[70]
Gavin Brown, Krishnamurthy Dvijotham, Georgina Evans, Daogao Liu, Adam Smith, and Abhradeep Thakurta. Private gradient descent for linear regression: Tighter error bounds and instance-specific uncertainty estimation.arXiv preprint arXiv:2402.13531, 2024
-
[71]
(nearly) optimal private linear regression for sub-gaussian data via adaptive clipping
Prateek Varshney, Abhradeep Thakurta, and Prateek Jain. (nearly) optimal private linear regression for sub-gaussian data via adaptive clipping. InConference on Learning Theory, pages 1126–1166. PMLR, 2022
work page 2022
-
[72]
Xiyang Liu, Prateek Jain, Weihao Kong, Sewoong Oh, and Arun Suggala. Label robust and differentially private linear regression: Computational and statistical efficiency.Advances in Neural Information Processing Systems, 36:23019–23033, 2023
work page 2023
-
[73]
Sample-optimal private regression in polynomial time.arXiv preprint arXiv:2503.24321, 2025
Prashanti Anderson, Ainesh Bakshi, Mahbod Majid, and Stefan Tiegel. Sample-optimal private regression in polynomial time.arXiv preprint arXiv:2503.24321, 2025
-
[74]
Dylan J Foster, Sham M Kakade, Jian Qian, and Alexander Rakhlin. The statistical complexity of interactive decision making.arXiv preprint arXiv:2112.13487, 2021. NeurIPS Paper Checklist 1.Claims Question: Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope? Answer: [Yes] Justification: : In the a...
-
[75]
e∇mJ(θ) + λ |X | 1 |Y| −[π x(θ)]x∈X 2# =E
Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...
work page 2025
-
[76]
who use PPO instead of NPG, and (iii) as privacy budget decreases (smaller ε), performance degrades as expected from theory. 36 AlgorithmεMean Final Reward Std Final Reward Best Epoch Mean PG N/A 334.37 25.25 361.70 DP-PG5 190.34 52.91 199.17 DP-PG3 143.87 22.88 187.17 NPG N/A 492.90 10.04 500.00 DP-NPG5 478.73 28.05 494.70 DP-NPG3 400.87 65.37 410.43 Tab...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.