Integrating Feature Correlation in Differential Privacy with Applications in DP-ERM
Pith reviewed 2026-05-07 03:32 UTC · model grok-4.3
The pith
CorrDP relaxes privacy for insensitive features according to their total variation distance to sensitive ones, yielding better utility in DP-ERM.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By defining a correlation-aware differential privacy notion that relaxes protection for insensitive features in line with their total variation distance to sensitive features, the paper shows that DP-ERM can achieve improved utility bounds while still satisfying the formal privacy definition, with comparable guarantees when the correlations are estimated from the dataset itself.
What carries the argument
The CorrDP framework, which quantifies correlations via total variation distance to determine the allowable privacy relaxation for each feature and applies distance-dependent noise calibration in the gradient descent process for empirical risk minimization.
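To make the quantity concrete, here is a minimal sketch of a plug-in estimate of one plausible correlation distance. It assumes, hypothetically, a binary sensitive feature and a discrete insensitive feature, and measures correlation as the total variation distance between the conditional distributions of the insensitive feature given each value of the sensitive one; this summary does not state the paper's exact definition. The names `empirical_tv` and `correlation_distance` are illustrative.

```python
from collections import Counter


def empirical_tv(xs, ys):
    """Plug-in total variation distance between the empirical distributions
    of two discrete samples: 0.5 * sum_v |p(v) - q(v)|."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    px, py = Counter(xs), Counter(ys)  # missing keys count as 0
    support = set(px) | set(py)
    return 0.5 * sum(abs(px[v] / len(xs) - py[v] / len(ys)) for v in support)


def correlation_distance(insensitive, sensitive):
    """Hypothetical correlation distance: TV between the empirical
    P(X_insensitive | S = 0) and P(X_insensitive | S = 1)."""
    group0 = [x for x, s in zip(insensitive, sensitive) if s == 0]
    group1 = [x for x, s in zip(insensitive, sensitive) if s == 1]
    return empirical_tv(group0, group1)


# Insensitive feature fully determined by S: distance 1.
print(correlation_distance(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 1.0
# Insensitive feature independent of S in this sample: distance 0.
print(correlation_distance(["a", "b", "a", "b"], [0, 0, 1, 1]))  # 0.0
```

The two extremes bracket the relaxation: a distance of 0 would permit the most relaxation for that feature, and a distance of 1 would leave it as exposed as the sensitive feature itself.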
If this is right
- DP-ERM under CorrDP obtains improved theoretical utility guarantees relative to standard differential privacy.
- Estimating correlation distances from the dataset achieves comparable privacy-utility performance.
- The framework yields higher accuracy than standard DP-ERM on both synthetic and real-world data containing insensitive features.
- Distance-dependent noise addition in gradients is the mechanism delivering the utility improvement.
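The distance-dependent noise idea can be sketched as a single noisy gradient step. The calibration here is an assumption made for illustration, not the paper's: coordinates of sensitive features receive Gaussian noise with standard deviation `sigma`, and coordinates of insensitive features receive `d * sigma`, so `d = 0` adds no noise there and `d = 1` falls back to uniform noise. The helper `noisy_gradient_step` is hypothetical, and the sketch does not by itself establish any privacy guarantee (it omits gradient clipping and privacy accounting, for example).

```python
import random


def noisy_gradient_step(w, grad, sensitive_mask, lr, sigma, d, rng):
    """One gradient-descent step with group-dependent Gaussian noise.

    Hypothetical calibration: noise std is `sigma` on coordinates flagged as
    sensitive and `d * sigma` on insensitive ones, where d in [0, 1] is the
    (assumed known) total variation distance to the sensitive features.
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("a total variation distance must lie in [0, 1]")
    new_w = []
    for w_j, g_j, is_sensitive in zip(w, grad, sensitive_mask):
        std = sigma if is_sensitive else d * sigma
        noisy_g = g_j + rng.gauss(0.0, std)  # std = 0 leaves g_j unchanged
        new_w.append(w_j - lr * noisy_g)
    return new_w


rng = random.Random(0)
# Coordinate 0 is sensitive; coordinate 1 is insensitive and independent (d = 0).
print(noisy_gradient_step([0.0, 0.0], [1.0, 1.0], [True, False],
                          lr=0.1, sigma=1.0, d=0.0, rng=rng))
```

With `d = 0` the insensitive coordinate takes an exact, noise-free step, which is where a utility gain over uniform noise would come from under this assumed calibration.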
Where Pith is reading between the lines
- Similar correlation-based relaxations could be applied to other DP primitives beyond ERM, such as private stochastic gradient descent in deep learning.
- Data pipelines might incorporate automated correlation estimation as a preprocessing step to identify opportunities for privacy relaxation.
- Alternative distance or dependence measures could be substituted for total variation distance if they provide tighter bounds on information leakage.
Load-bearing premise
The total variation distance between the distributions of insensitive and sensitive features fully captures the risk of information leakage, such that relaxing privacy based on it does not permit any unexpected privacy breach.
What would settle it
Observing a privacy violation in a constructed dataset where an insensitive feature has low total variation distance but high predictive power for a sensitive attribute under the CorrDP noise levels.
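One candidate construction for that test can be sketched, assuming (hypothetically) that the correlation distance is measured separately for each insensitive feature. With two independent uniform bits X1, X2 and sensitive attribute S = X1 XOR X2, each insensitive feature has zero total variation distance to S on its own, yet the pair determines S exactly. This summary does not say whether CorrDP measures distance per feature or over the joint insensitive vector; under a joint measure the construction reports the full distance of 1.

```python
from collections import Counter
from itertools import product


def tv(p, q):
    """Total variation distance between two finite distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


def conditional(records, feature, s_value):
    """Empirical distribution of feature(record) among records with s == s_value."""
    rows = [r for r in records if r["s"] == s_value]
    counts = Counter(feature(r) for r in rows)
    return {k: c / len(rows) for k, c in counts.items()}


def distance(records, feature):
    """TV between the feature's conditional distributions given S = 0 and S = 1."""
    return tv(conditional(records, feature, 0), conditional(records, feature, 1))


# Uniform, independent bits x1, x2 (one record per combination); S is their XOR.
records = [{"x1": a, "x2": b, "s": a ^ b} for a, b in product([0, 1], repeat=2)]

print(distance(records, lambda r: r["x1"]))             # 0.0
print(distance(records, lambda r: r["x2"]))             # 0.0
print(distance(records, lambda r: (r["x1"], r["x2"])))  # 1.0
```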
Original abstract
Standard differential privacy imposes uniform privacy constraints across all features, overlooking the inherent distinction between sensitive and insensitive features in practice. In this paper, we introduce a relaxed definition of differential privacy that accounts for such heterogeneity, allowing certain features to be treated as insensitive even when correlated with sensitive ones. We propose a correlation-aware framework, $\textsf{CorrDP}$, which relaxes privacy for insensitive features while accounting for their correlations with sensitive features, with the correlations quantified using total variation distance. We design algorithms for differentially private empirical risk minimization (DP-ERM) under the $\textsf{CorrDP}$ framework, incorporating distance-dependent noise into gradients for improved theoretical utility guarantees. When the correlation distance is unknown, we estimate it from the dataset and show that it achieves a comparable privacy-utility guarantee. We perform experiments on synthetic and real-world datasets and show that $\textsf{CorrDP}$-based DP-ERM algorithms consistently outperform the standard DP framework in the presence of insensitive features.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CorrDP, a relaxed differential privacy definition that accounts for heterogeneity between sensitive and insensitive features by quantifying their correlations via total variation distance. It develops DP-ERM algorithms that incorporate distance-dependent noise into gradients to obtain improved theoretical utility bounds. When the correlation distance is unknown, the paper estimates it directly from the training data and asserts that this yields comparable privacy-utility guarantees. Experiments on synthetic and real-world datasets show that CorrDP-based methods outperform standard DP-ERM when insensitive features are present.
Significance. If the privacy analysis for the data-dependent estimation step holds, the framework could meaningfully improve utility in DP-ERM settings with heterogeneous feature sensitivities, a common practical scenario. The use of total variation distance supplies a concrete, quantifiable relaxation, and the distance-dependent noise mechanism is a natural algorithmic extension. The empirical validation on real data adds practical relevance, but the overall significance hinges on whether the estimation procedure preserves the claimed (ε,δ) guarantees without additional leakage.
major comments (1)
- [Estimation procedure for unknown correlation distance (abstract and algorithmic section)] The claim that estimating the correlation distance d from the dataset achieves comparable privacy-utility guarantees (stated in the abstract and developed in the algorithmic section) is load-bearing for the main result when d is unknown. The estimated d directly scales the noise magnitude added to the gradients in the DP-ERM procedure. The manuscript does not appear to bound the sensitivity of the total-variation estimator or to fold its privacy cost into the overall calibration; a non-private estimator therefore risks either under-noising or allowing an adversary to infer correlations (and thereby information about sensitive features) from the observed noise level or output model. A revised version must either supply a private estimator (e.g., via the Laplace mechanism on a bounded-sensitivity statistic) or prove that the estimation error can be absorbed into the existing utility bound.
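The second risk named in this comment, inferring the correlation from the observed noise level, can be illustrated with a small simulation. It assumes, hypothetically, that insensitive coordinates receive Gaussian noise with standard deviation `d * sigma`, that `d` was computed non-privately, that the noise multiplier `sigma` is public, and that an observer sees repeated noisy releases of a coordinate whose true value is known to be zero. The root-mean-square noise divided by `sigma` then recovers `d` closely. The function names and the calibration are illustrative only.

```python
import math
import random
import statistics


def noisy_releases(d, sigma, n, rng):
    """n noisy releases of an insensitive coordinate whose true value is 0,
    under the hypothetical calibration std = d * sigma."""
    return [rng.gauss(0.0, d * sigma) for _ in range(n)]


def infer_distance(observed, sigma):
    """Observer's estimate of d: root-mean-square noise (true value known to be 0)
    divided by the public noise multiplier sigma."""
    return math.sqrt(statistics.fmean(x * x for x in observed)) / sigma


rng = random.Random(42)
true_d, sigma = 0.4, 1.0
estimate = infer_distance(noisy_releases(true_d, sigma, 20_000, rng), sigma)
print(f"true d = {true_d}, inferred d = {estimate:.3f}")
```

Because the leaked quantity is a function of the sensitive feature's relationship to the data, this is the channel a private estimator of `d` would need to close.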
minor comments (2)
- [Abstract] The abstract states that the framework yields 'improved theoretical utility guarantees' but does not identify the precise baseline (standard DP-SGD, prior relaxed DP notions, etc.) or the functional improvement (e.g., better dependence on dimension, sample size, or Lipschitz constant). Adding a short comparison sentence would clarify the advance.
- [Preliminaries / definition of CorrDP] The definition of total variation distance and its estimation should explicitly address whether features are discrete or continuous and how the estimator behaves in moderate-to-high dimensions, where finite-sample TV estimation can be biased or high-variance.
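The finite-sample concern in the last comment can be seen with a plug-in estimator on a single continuous feature. The sketch discretizes samples into equal-width bins (an assumed choice; the paper's estimator is not described here) and compares two independent samples drawn from the same distribution, whose true total variation distance is 0. The plug-in estimate is nonetheless positive, and the bias generally grows as bins become finer relative to the sample size, which is the same mechanism that makes joint histograms over many features unreliable in moderate-to-high dimensions. `binned_tv` is a hypothetical helper.

```python
import random


def binned_tv(xs, ys, bins, lo, hi):
    """Plug-in TV distance after binning samples into equal-width bins on [lo, hi].
    Values outside the range are clamped into the first or last bin."""
    width = (hi - lo) / bins

    def histogram(zs):
        counts = [0] * bins
        for z in zs:
            idx = min(bins - 1, max(0, int((z - lo) / width)))
            counts[idx] += 1
        return [c / len(zs) for c in counts]

    p, q = histogram(xs), histogram(ys)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


rng = random.Random(7)
# Two independent samples from the same Uniform(0, 1) distribution: true TV is 0.
xs = [rng.random() for _ in range(50)]
ys = [rng.random() for _ in range(50)]
for bins in (2, 10, 50):
    print(bins, round(binned_tv(xs, ys, bins, 0.0, 1.0), 3))
```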
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. The primary concern identified is the lack of a formal privacy analysis for the data-dependent estimation of the correlation distance. We agree that this aspect requires explicit treatment to substantiate the claimed guarantees and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Estimation procedure for unknown correlation distance (abstract and algorithmic section)] The claim that estimating the correlation distance d from the dataset achieves comparable privacy-utility guarantees (stated in the abstract and developed in the algorithmic section) is load-bearing for the main result when d is unknown. The estimated d directly scales the noise magnitude added to the gradients in the DP-ERM procedure. The manuscript does not appear to bound the sensitivity of the total-variation estimator or to fold its privacy cost into the overall calibration; a non-private estimator therefore risks either under-noising or allowing an adversary to infer correlations (and thereby information about sensitive features) from the observed noise level or output model. A revised version must either supply a private estimator (e.g., via the Laplace mechanism on a bounded-sensitivity statistic) or prove that the estimation error can be absorbed into the existing utility bound.
Authors: We thank the referee for identifying this important gap. The current manuscript estimates the correlation distance directly from the data and asserts comparable guarantees, but does not explicitly bound the sensitivity of the total-variation estimator or allocate privacy budget to the estimation step. In the revised version we will introduce a private estimator: we will first establish that the empirical total-variation distance statistic has bounded sensitivity (linear in the inverse sample size for discrete features), then release a noisy version via the Laplace mechanism. A fixed fraction of the overall privacy budget will be reserved for this release; the remaining budget will be used to calibrate the distance-dependent noise in the DP-ERM gradients. The resulting estimation error will be absorbed into the existing utility analysis as an additive term that vanishes with the sample size, preserving the improved privacy-utility tradeoff. The abstract, algorithmic section, and theoretical statements will be updated to reflect the revised procedure and guarantees.
Revision: yes.
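The revised procedure described in this response, a Laplace release of the empirical TV statistic using a reserved fraction of the budget, can be sketched as follows. Two assumptions are made explicit: the group sizes `n0` and `n1` are treated as public and fixed, so replacing one record within a group changes the empirical TV by at most `1 / min(n0, n1)` (by the triangle inequality for TV); and the leftover budget follows basic sequential composition. The function name `private_tv_release` and the default fraction are hypothetical, not taken from the paper.

```python
import random


def laplace(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_tv_release(tv_hat, n0, n1, eps_total, frac=0.1, rng=None):
    """Release a noisy correlation distance and return the leftover budget.

    Assumes n0 and n1 (group sizes) are public and fixed, so the empirical TV
    has sensitivity at most 1 / min(n0, n1) under replacement of one record.
    """
    if eps_total <= 0.0:
        raise ValueError("eps_total must be positive")
    if not 0.0 < frac < 1.0:
        raise ValueError("frac must lie strictly between 0 and 1")
    rng = rng or random.Random()
    eps_est = frac * eps_total
    sensitivity = 1.0 / min(n0, n1)
    noisy = tv_hat + laplace(sensitivity / eps_est, rng)
    d_private = min(1.0, max(0.0, noisy))  # clipping to [0, 1] is post-processing
    return d_private, eps_total - eps_est  # basic sequential composition


d_private, eps_left = private_tv_release(0.3, n0=500, n1=800, eps_total=1.0,
                                         frac=0.2, rng=random.Random(3))
print(d_private, eps_left)
```

Because the released distance is itself noisy, a more conservative variant could add a high-probability bound on the Laplace noise (for scale b, the noise exceeds b·ln(1/β) in absolute value with probability β) before calibrating the gradient noise, which speaks to the under-noising risk; whether the revision takes this route is not stated.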
Circularity Check
No significant circularity in derivation chain
The paper defines CorrDP via total variation distance to relax privacy for insensitive features, then derives DP-ERM algorithms that inject distance-dependent noise into gradients. The claim that estimating the distance from the dataset yields comparable privacy-utility guarantees is presented as a separate result rather than a tautological reduction of the core theorems to their inputs. No equation is shown to equal its own fitted parameter by construction, no self-citation chain is load-bearing for the main framework, and the derivation remains independent of the estimation step. This is the normal case of a self-contained theoretical development against standard DP-ERM baselines.
Axiom & Free-Parameter Ledger
free parameters (1)
- correlation distance
axioms (1)
- Domain assumption: total variation distance is an appropriate measure for quantifying correlations in the context of privacy relaxation.