pith. sign in

arxiv: 2502.06044 · v3 · submitted 2025-02-09 · 📊 stat.ML · cs.LG

Differentially Private Hyperparameter Tuning using Local Bayesian Optimization

Pith reviewed 2026-05-23 03:16 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords differentially private hyperparameter tuninglocal Bayesian optimizationGaussian process surrogateconvergence analysisblack-box optimizationprivacy preserving machine learninghyperparameter optimization
0
0 comments X

The pith

DP-GIBO converges to locally optimal hyperparameters under differential privacy with polynomial dimensional scaling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DP-GIBO to perform hyperparameter tuning when validation data must stay private. It applies local Bayesian optimization and uses a Gaussian process surrogate to generate private gradient approximations from noisy black-box evaluations. The central result is a convergence guarantee to a local optimum whose error depends on the privacy budget and grows only polynomially with dimension. Prior private tuning methods either rely on inefficient random search or global Bayesian optimization that scales exponentially, so the new approach targets the practical regime of moderate-to-high-dimensional search spaces.

Core claim

Under suitable conditions, DP-GIBO converges to a locally optimal hyperparameter configuration up to a privacy-dependent error, with dimensional dependence that is polynomial rather than exponential. Empirically, DP-GIBO provides scalable private hyperparameter tuning across multiple tasks, substantially outperforming non-private random search and global Bayesian optimization baselines in moderate-to-high-dimensional hyperparameter spaces.

What carries the argument

DP-GIBO, a differentially private local Bayesian optimization framework that privately approximates gradients using a Gaussian Process surrogate

If this is right

  • Convergence holds to a local optimum whose distance from the true optimum is controlled by the privacy parameter.
  • The method operates on noisy black-box validation objectives without requiring direct gradient access.
  • Empirical results show clear gains over random search and global Bayesian optimization once dimension reaches moderate values.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Local rather than global search may be the natural regime for privacy-preserving black-box optimization because privacy noise interacts less destructively with local structure.
  • The same surrogate-based private gradient mechanism could be tested on other sequential decision problems that require both privacy and high-dimensional search.
  • Downstream model accuracy after private tuning would be a direct test of whether the local convergence error remains tolerable in practice.

Load-bearing premise

Suitable conditions on noise levels, objective smoothness, and privacy parameters exist so the Gaussian process surrogate yields a private gradient approximation whose error still permits local convergence.

What would settle it

An experiment in which DP-GIBO's error grows exponentially with dimension or its performance falls below random search for any fixed privacy budget would falsify the polynomial scaling and convergence claims.

Figures

Figures reproduced from arXiv: 2502.06044 by Getoar Sopa, John P. Cunningham, Juraj Marusic, Marco Avella Medina.

Figure 1
Figure 1. Figure 1: Local private approach does not suffer from the curse of dimensionality. The performance of Global Bayesian Optimization methods, as well as random grid search methods, depend greatly on the dimension of the problem. By privatizing a Local Bayesian Optimization approach, we significantly improve performance in higher dimensions while preserving privacy. Pictured are DP-GIBO, Random Search, and UCB-BO. appr… view at source ↗
Figure 2
Figure 2. Figure 2: Results of experiments. Left. We present the results of Example 4.1, where we test the performance of Algorithm 1 on tuning the GP regression lengthscales in d = 15 dimensions, and we vary the level of permitted bias ε, privacy level µ and noise level σ. Right. In Example 4.2, we compare our privatized method to non-private GIBO (i.e. µ = ∞ ) and to random search. 5 Discussion High-dimensional black-box op… view at source ↗
Figure 3
Figure 3. Figure 3: Top: Estimates of a single coordinate of the model; Middle: The optimization path of the negative-log likelihood; Bottom: Comparison between the norm of the bias of the gradient estimate and the norm of the added noise for µ = 0.5 case. E.4 Linear regression Here we explore the proposed algorithm on simulated data from a linear regression model and compare the results to noisy gradient descent, a widely us… view at source ↗
Figure 4
Figure 4. Figure 4: Top: Estimates of a single coordinate of the regression parameter; Bottom: Gradient of the loss function at the current iteration, on a log scale. Setting θ = [1, 1, 1, 1]⊤, [PITH_FULL_IMAGE:figures/full_fig_p024_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Left: We used the correct variance of the noise; Middle: We overestimated the variance of the noise; Right: We underestimated the variance of the noise 25 [PITH_FULL_IMAGE:figures/full_fig_p025_5.png] view at source ↗
read the original abstract

Hyperparameter tuning is a key component of machine learning procedures, but when validation data contain sensitive user information, search mechanisms can leak private information through the selected configuration. Existing differentially private hyperparameter tuning methods often rely on near-random search, while prior differentially private Bayesian optimization approaches are typically global and, therefore, scale poorly with the hyperparameter dimensionality. We study differentially private hyperparameter tuning using local Bayesian optimization, focusing on settings where the validation objective is available only through noisy black box evaluations and gradients are unavailable or impractical to compute. We introduce DP-GIBO, a differentially private local Bayesian optimization framework that privately approximates gradients using a Gaussian Process surrogate. Under suitable conditions, we prove that DP-GIBO converges to a locally optimal hyperparameter configuration up to a privacy-dependent error, with dimensional dependence that is polynomial rather than exponential.Empirically, we show that DP-GIBO provides scalable private hyperparameter tuning across multiple tasks, substantially outperforming non-private random search and global Bayesian optimization baselines in moderate-to-high-dimensional hyperparameter spaces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces DP-GIBO, a differentially private local Bayesian optimization method for hyperparameter tuning when validation objectives are available only via noisy black-box evaluations. It privately approximates gradients via a Gaussian process surrogate and claims that, under suitable conditions, DP-GIBO converges to a locally optimal configuration up to a privacy-dependent error with polynomial (rather than exponential) dimensional dependence. Empirical results are reported showing substantial outperformance over non-private random search and global BO baselines in moderate-to-high-dimensional spaces.

Significance. If the convergence result holds under conditions that are realistic for typical privacy budgets (ε ≪ 1) and validation objectives, the work would supply a scalable private HPO primitive that fills a gap between near-random-search DP methods and dimensionally fragile global DP-BO approaches. The polynomial dimension scaling and local-search focus are potentially high-impact for modern ML pipelines whose hyperparameter spaces exceed a few dozen dimensions.

major comments (2)
  1. [Abstract] Abstract: The central convergence claim is stated only 'under suitable conditions' with no explicit bounds relating the GP length-scale, smoothness constant, noise variance, and privacy parameters (ε, δ) to the threshold required for the local descent step to make progress. Without these relations it is impossible to determine whether the claimed polynomial dimensional dependence remains meaningful for practical ε values.
  2. [Theoretical analysis] Theoretical analysis section (assumed to contain the proof): The manuscript provides no derivation details, explicit error bounds, or validation of the privacy mechanism implementation. The reader's report notes the absence of these elements, which are load-bearing for assessing whether the total error (approximation + privacy noise) stays below the local-convergence threshold.
minor comments (1)
  1. [Abstract] The abstract is information-dense; expanding the 'suitable conditions' clause into one additional sentence would improve readability without lengthening the paper.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that explicit parameter relations and expanded derivation details would strengthen the presentation of the convergence result and will incorporate these changes. We address each major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central convergence claim is stated only 'under suitable conditions' with no explicit bounds relating the GP length-scale, smoothness constant, noise variance, and privacy parameters (ε, δ) to the threshold required for the local descent step to make progress. Without these relations it is impossible to determine whether the claimed polynomial dimensional dependence remains meaningful for practical ε values.

    Authors: We agree that the abstract would benefit from greater specificity. The theoretical analysis establishes the polynomial dimensional dependence under assumptions on the GP length-scale, smoothness constant, noise variance, and privacy parameters (ε, δ), with the local descent step progressing when the total error remains below a derived threshold. In the revision we will update the abstract to reference these conditions explicitly and add a remark (or short corollary) in the theory section that relates the parameters to the progress threshold, allowing readers to assess relevance for practical ε ≪ 1. revision: yes

  2. Referee: [Theoretical analysis] Theoretical analysis section (assumed to contain the proof): The manuscript provides no derivation details, explicit error bounds, or validation of the privacy mechanism implementation. The reader's report notes the absence of these elements, which are load-bearing for assessing whether the total error (approximation + privacy noise) stays below the local-convergence threshold.

    Authors: The main text states the convergence theorem while the complete proof appears in the appendix; however, we acknowledge that the main body lacks intermediate derivation steps and an explicit combined error bound. We will expand the theoretical analysis section to include key proof steps and a bound showing that the sum of GP approximation error and privacy noise remains below the local-convergence threshold under the stated conditions. The privacy mechanism follows the Gaussian mechanism applied to the GP gradient estimate; we will add a short paragraph clarifying this and referencing the standard privacy analysis to address the implementation validation point. revision: yes

Circularity Check

0 steps flagged

No circularity: convergence theorem is a self-contained proof under stated assumptions

full rationale

The paper's central claim is a convergence theorem for DP-GIBO to a locally optimal point (up to privacy error with polynomial dimension scaling). This is presented as a mathematical result relying on suitable conditions for the GP surrogate and noise. No quoted equations or steps reduce the claimed prediction or theorem to a fitted parameter, self-definition, or self-citation chain by construction. The derivation chain is independent of the target result itself and does not invoke load-bearing self-citations or ansatzes smuggled from prior author work. This is the normal case of a self-contained theoretical analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on unspecified suitable conditions for the convergence proof and on the modeling choice that a Gaussian process surrogate can be made differentially private while still providing useful local gradient estimates. No free parameters or invented entities are named in the abstract.

axioms (1)
  • domain assumption Suitable conditions exist under which the private gradient approximation preserves local convergence
    Invoked in the sentence stating the convergence result.

pith-pipeline@v0.9.0 · 5717 in / 1251 out tokens · 18749 ms · 2026-05-23T03:16:08.267727+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

  1. [1]

    Deep learning with differential privacy

    Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security , pages 308--318, 2016

  2. [2]

    Differentially private inference via noisy optimization

    Marco Avella-Medina, Casey Bradshaw, and Po-Ling Loh. Differentially private inference via noisy optimization. The Annals of Statistics , 51(5):2067--2092, 2023

  3. [3]

    Private empirical risk minimization: Efficient algorithms and tight error bounds

    Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Foundations of Computer Science , 2014

  4. [4]

    The role of differential privacy in gdpr compliance

    Rachel Cummings and Deven Desai. The role of differential privacy in gdpr compliance. In Proceedings of the Conference on Fairness, Accountability, and Transparency , 2018

  5. [5]

    Gaussian differential privacy

    Jinshuo Dong, Aaron Roth, and Weijie J Su. Gaussian differential privacy. Journal of the Royal Statistical Society: Series B , 84(1):3--37, 2022

  6. [6]

    Differential privacy

    Cynthia Dwork. Differential privacy. In International Colloquium on Automata, Languages, and Programming . Springer, 2006

  7. [7]

    High-dimensional bayesian optimization with sparse axis-aligned subspaces

    David Eriksson and Martin Jankowiak. High-dimensional bayesian optimization with sparse axis-aligned subspaces. In Uncertainty in Artificial Intelligence , pages 493--503. PMLR, 2021

  8. [8]

    Scalable global optimization via local Bayesian optimization

    David Eriksson, Michael Pearce, Jacob Gardner, Ryan D Turner, and Matthias Poloczek. Scalable global optimization via local Bayesian optimization. Advances in Neural Information Processing Systems , 2019

  9. [9]

    Private stochastic convex optimization: Optimal rates in linear time

    Vitaly Feldman, Tomer Koren, and Kunal Talwar. Private stochastic convex optimization: Optimal rates in linear time. In Symposium on Theory of Computing , 2020

  10. [10]

    Bayesian optimization

    Roman Garnett. Bayesian optimization . Cambridge University Press, 2023

  11. [11]

    Faster differentially private convex optimization via second-order methods

    Arun Ganesh, Mahdi Haghifam, Thomas Steinke, and Abhradeep Guha Thakurta. Faster differentially private convex optimization via second-order methods. Advances in Neural Information Processing Systems , 36:79426--79438, 2023

  12. [12]

    Relative location of CT slices on axial axis

    Franz Graf, Hans-Peter Kriegel, Matthias Schubert, Sebastian Poelsterl, and Alexander Cavallaro. Relative location of CT slices on axial axis . UCI Machine Learning Repository, 2011. DOI : https://doi.org/10.24432/C5CP6G

  13. [13]

    Vanilla bayesian optimization performs great in high dimensions

    Carl Hvarfner, Erik Orm Hellsten, and Luigi Nardi. Vanilla bayesian optimization performs great in high dimensions. In International Conference on Machine Learning , pages 20793--20817. PMLR, 2024

  14. [14]

    Towards practical differentially private convex optimization

    Roger Iyengar, Joseph P Near, Dawn Song, Om Thakkar, Abhradeep Thakurta, and Lun Wang. Towards practical differentially private convex optimization. In 2019 IEEE symposium on security and privacy (SP) , pages 299--316. IEEE, 2019

  15. [15]

    Machine learning applications in cancer prognosis and prediction

    Konstantina Kourou, Themis P Exarchos, Konstantinos P Exarchos, Michalis V Karamouzis, and Dimitrios I Fotiadis. Machine learning applications in cancer prognosis and prediction. Computational and Structural Biotechnology Journal , 2015

  16. [16]

    Differentially private Bayesian optimization

    Matt Kusner, Jacob Gardner, Roman Garnett, and Kilian Weinberger. Differentially private Bayesian optimization. In International Conference on Machine Learning , 2015

  17. [17]

    Private selection from private candidates

    Jingcheng Liu and Kunal Talwar. Private selection from private candidates. In Symposium on Theory of Computing , 2019

  18. [18]

    Local latent space bayesian optimization over structured inputs

    Natalie Maus, Haydn Jones, Juston Moore, Matt J Kusner, John Bradshaw, and Jacob Gardner. Local latent space bayesian optimization over structured inputs. In Advances in Neural Information Processing Systems , 2022

  19. [19]

    The role of adaptive optimizers for honest private hyperparameter selection

    Shubhankar Mohapatra, Sajin Sasy, Xi He, Gautam Kamath, and Om Thakkar. The role of adaptive optimizers for honest private hyperparameter selection. In AAAI Conference on Artificial Intelligence , 2022

  20. [20]

    Local policy search with bayesian optimization

    Sarah M \"u ller, Alexander von Rohr, and Sebastian Trimpe. Local policy search with bayesian optimization. Advances in Neural Information Processing Systems , 2021

  21. [21]

    Lectures on convex optimization , volume 137

    Yurii Nesterov. Lectures on convex optimization , volume 137. Springer, 2018

  22. [22]

    Hyperparameter tuning with renyi differential privacy

    Nicolas Papernot and Thomas Steinke. Hyperparameter tuning with renyi differential privacy. International Conference on Learning Representations , 2022

  23. [23]

    High-dimensional statistics

    Philippe Rigollet and Jan-Christian H \"u tter. High-dimensional statistics. Lecture notes for Course 18S997 , 2023

  24. [24]

    Shuang Song, Kamalika Chaudhuri, and Anand D. Sarwate. Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing , 2013

  25. [25]

    Practical Bayesian optimization of machine learning algorithms

    Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems , 2012

  26. [26]

    Evading the curse of dimensionality in unconstrained private GLM s

    Shuang Song, Thomas Steinke, Om Thakkar, and Abhradeep Thakurta. Evading the curse of dimensionality in unconstrained private GLM s. In International Conference on Artificial Intelligence and Statistics , pages 2638--2646. PMLR, 2021

  27. [27]

    Taking the human out of the loop: A review of Bayesian optimization

    Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando De Freitas. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE , 2015

  28. [28]

    Dp-hypo: An adaptive private framework for hyperparameter optimization

    Hua Wang, Sheng Gao, Huanyu Zhang, Weijie Su, and Milan Shen. Dp-hypo: An adaptive private framework for hyperparameter optimization. Advances in Neural Information Processing Systems , 2023

  29. [29]

    The behavior and convergence of local Bayesian optimization

    Kaiwen Wu, Kyurae Kim, Roman Garnett, and Jacob Gardner. The behavior and convergence of local Bayesian optimization. Advances in Neural Information Processing Systems , 2024

  30. [30]

    A statistical framework for differential privacy

    Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association , 105(489):375--389, 2010

  31. [31]

    Gaussian differentially private robust mean estimation and inference

    Myeonghun Yu, Zhao Ren, and Wen-Xin Zhou. Gaussian differentially private robust mean estimation and inference. Bernoulli , 30(4):3059--3088, 2024