The Value of Collaboration in Convex Machine Learning with Differential Privacy
Pith reviewed 2026-05-25 17:56 UTC · model grok-4.3
The pith
In convex machine learning with differential privacy, the fitness gap to the non-private model is inversely proportional to the square of the total training dataset size and the square of the privacy budget.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When minimizing convex loss functions via noisy differentially-private stochastic gradient descent on distributed datasets, the difference between the fitness achieved with privacy and the fitness without privacy is inversely proportional to the size of the training datasets squared and the privacy budget squared.
What carries the argument
Noisy stochastic gradient descent under differential privacy applied to convex losses across multiple non-overlapping datasets, yielding a closed-form bound on excess fitness.
Load-bearing premise
The loss functions being minimized must be convex.
What would settle it
Running the private and non-private training on datasets of different total sizes and checking whether the observed fitness difference scales exactly as one over n squared for fixed privacy budget.
Figures
read the original abstract
In this paper, we apply machine learning to distributed private data owned by multiple data owners, entities with access to non-overlapping training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, using the fitness cost, as a function of privacy budget and size of the distributed datasets to capture the trade-off between privacy and utility in machine learning. This way, we can predict the outcome of collaboration among privacy-aware data owners prior to executing potentially computationally-expensive machine learning algorithms. Particularly, we show that the difference between the fitness of the trained machine learning model using differentially-private gradient queries and the fitness of the trained machine model in the absence of any privacy concerns is inversely proportional to the size of the training datasets squared and the privacy budget squared. We successfully validate the performance prediction with the actual performance of the proposed privacy-aware learning algorithms, applied to: financial datasets for determining interest rates of loans using regression; and detecting credit card frauds using support vector machines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that for convex machine learning on distributed private datasets, noisy DP-SGD yields an excess fitness gap (relative to non-private training) that scales as 1/(N² ε²), enabling a priori prediction of collaboration value among data owners; this is derived for convex losses and validated on loan regression and credit-card SVM tasks.
Significance. If the scaling holds, the result would provide a closed-form, parameter-light way to forecast utility loss from DP in collaborative convex ML, which is practically useful for privacy-sensitive domains like finance. The validation on two real datasets adds empirical grounding, though the absence of strong-convexity regularization details limits immediate applicability.
major comments (3)
- [Abstract] Abstract: the claimed inverse-square scaling 1/(N² ε²) for the fitness gap holds only under μ-strong convexity (where the privacy-induced excess risk is Θ(σ²/μ) and σ ∝ 1/(ε N)); for plain convex Lipschitz losses the standard DP-ERM bound is O(1/(ε N)). The title and abstract state the result for “convex” losses without mentioning strong convexity, making the central proportionality claim incorrect in the stated generality.
- [Abstract] Abstract (validation paragraph): the claim that the scaling prediction “was validated on two real datasets” is presented without derivation details, error bars, regularization parameters, or exclusion criteria for the loan regression and credit-card SVM experiments. This leaves the empirical support for the 1/(N² ε²) claim unverifiable from the given text.
- [Derivation (likely §3–4)] The distributed collaboration analysis and the fitness-gap derivation must be checked for an explicit invocation of the strong-convexity inequality; if the proof relies only on convexity, the stated proportionality cannot be recovered.
minor comments (1)
- [Abstract] Notation for the fitness cost and privacy budget should be introduced with explicit definitions before the scaling claim is stated.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments correctly identify that our claimed scaling requires strong convexity, which was implicit in the analysis but not stated explicitly in the abstract and title. We will revise the manuscript to clarify this assumption and expand the empirical details.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claimed inverse-square scaling 1/(N² ε²) for the fitness gap holds only under μ-strong convexity (where the privacy-induced excess risk is Θ(σ²/μ) and σ ∝ 1/(ε N)); for plain convex Lipschitz losses the standard DP-ERM bound is O(1/(ε N)). The title and abstract state the result for “convex” losses without mentioning strong convexity, making the central proportionality claim incorrect in the stated generality.
Authors: We agree with this assessment. The 1/(N² ε²) scaling is derived under the assumption of μ-strong convexity (used to bound excess risk as Θ(σ²/μ) with σ ∝ 1/(ε N)). The manuscript's abstract and title use the term 'convex' without qualification. We will revise the abstract, title, and introduction to explicitly state 'strongly convex' losses. revision: yes
-
Referee: [Abstract] Abstract (validation paragraph): the claim that the scaling prediction “was validated on two real datasets” is presented without derivation details, error bars, regularization parameters, or exclusion criteria for the loan regression and credit-card SVM experiments. This leaves the empirical support for the 1/(N² ε²) claim unverifiable from the given text.
Authors: We will add the missing details in the revised manuscript, including the value of the strong-convexity parameter μ used for regularization, standard error bars from repeated trials, and dataset preprocessing steps. This will allow readers to verify the empirical match to the predicted scaling. revision: yes
-
Referee: [Derivation (likely §3–4)] The distributed collaboration analysis and the fitness-gap derivation must be checked for an explicit invocation of the strong-convexity inequality; if the proof relies only on convexity, the stated proportionality cannot be recovered.
Authors: The derivation invokes the strong-convexity inequality when relating the excess fitness to the noise variance (specifically in the step bounding the optimality gap after noisy gradient steps). We will insert an explicit remark and reference to this step in the revised version of Sections 3–4 to make the reliance on strong convexity transparent. revision: partial
Circularity Check
No circularity: scaling bound derived from noisy-SGD analysis, not fitted or self-referential
full rationale
The paper presents the 1/(N²ε²) fitness-gap scaling as a derived bound obtained by analyzing the excess risk of noisy gradient descent on convex losses (abstract and full derivation sections). The result is obtained from standard DP-SGD excess-risk analysis under the stated convexity assumption and is then validated on separate financial and fraud datasets. No equation reduces the claimed proportionality to a parameter fitted from the same data used for validation, no load-bearing self-citation chain is invoked to establish the scaling, and the derivation does not rename an empirical pattern or smuggle an ansatz via prior work. The central claim therefore remains independent of its own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The machine-learning objective is convex
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
the difference … is inversely proportional to the size of the training datasets squared and the privacy budget squared (Theorem 2, under L-strong convexity and λ-Lipschitz gradient)
-
IndisputableMonolith/Foundation/BranchSelection.leanbranch_selection unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Assumption 2. g1 and g2 are convex functions of θ … f is also a convex function
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Revisiting the governance o f privacy: Contemporary policy instruments in global perspective,
C. J. Bennett and C. D. Raab, “Revisiting the governance o f privacy: Contemporary policy instruments in global perspective,” Regulation & Governance, 2018
work page 2018
-
[2]
O. Shamir and T. Zhang, “Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging s chemes,” in International Conference on Machine Learning , pp. 71–79, 2013
work page 2013
-
[3]
Lending club loan data: Analyze lending club’s i s- sued loans
W. Kan, “Lending club loan data: Analyze lending club’s i s- sued loans.” https://www. kaggle. com/wendykan/lending-club-loan-data , Date Accessed: 17 Oct 2018
work page 2018
-
[4]
Credit card fraud detecti on: Anonymized credit card transactions labeled as fraudulent or genuine
Machine Learning Group–ULB, “Credit card fraud detecti on: Anonymized credit card transactions labeled as fraudulent or genuine.” https://www. kaggle. com/mlg-ulb/creditcardfraud/home, Date Accessed: 27 Nov 2018
work page 2018
-
[5]
Privacy preserving data minin g,
Y . Lindell and B. Pinkas, “Privacy preserving data minin g,” in Advances in Cryptology — CRYPTO 2000 (M. Bellare, ed.), (Berlin, Heidelberg), pp. 36–54, Springer Berlin Heidelberg, 2000
work page 2000
-
[6]
Privacy-preserving multiv ariate statistical analysis: Linear regression and classificatio n,
W. Du, Y . S. Han, and S. Chen, “Privacy-preserving multiv ariate statistical analysis: Linear regression and classificatio n,” in Proceedings of the 2004 SIAM international conference on data mining , pp. 222–233, SIAM, 2004
work page 2004
-
[7]
Privacy preserving associati on rule mining in vertically partitioned data,
J. V aidya and C. Clifton, “Privacy preserving associati on rule mining in vertically partitioned data,” in Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data m ining, pp. 639–644, ACM, 2002
work page 2002
-
[8]
Privacy-p reserving naive bayes classification,
J. V aidya, M. Kantarcıo˘ glu, and C. Clifton, “Privacy-p reserving naive bayes classification,” The VLDB Journal , vol. 17, no. 4, pp. 879–898, 2008
work page 2008
-
[9]
Privacy-preserving di stributed k- means clustering over arbitrarily partitioned data,
G. Jagannathan and R. N. Wright, “Privacy-preserving di stributed k- means clustering over arbitrarily partitioned data,” in Proceedings of the eleventh ACM SIGKDD international conference on Knowle dge discovery in data mining , pp. 593–599, ACM, 2005
work page 2005
-
[10]
Practical secure aggregation for privacy-preserving machine learning,
K. Bonawitz, V . Ivanov, B. Kreuter, A. Marcedone, H. B. M cMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, “Practical secure aggregation for privacy-preserving machine learning,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Secur ity, pp. 1175–1191, ACM, 2017
work page 2017
-
[11]
ML confidential: Machine learning on encrypted data,
T. Graepel, K. Lauter, and M. Naehrig, “ML confidential: Machine learning on encrypted data,” in International Conference on Information Security and Cryptology , pp. 1–21, Springer, 2012
work page 2012
-
[12]
Chiron: Privacy-preserving Machine Learning as a Service
T. Hunt, C. Song, R. Shokri, V . Shmatikov, and E. Witchel , “Chi- ron: Privacy-preserving machine learning as a service,” arXiv preprint arXiv:1803.05961, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[13]
Multi- key privacy-preserving deep learning in cloud computing,
P . Li, J. Li, Z. Huang, T. Li, C.-Z. Gao, S.-M. Yiu, and K. C hen, “Multi- key privacy-preserving deep learning in cloud computing,” Future Gen- eration Computer Systems , vol. 74, pp. 76–85, 2017
work page 2017
-
[14]
Privacy-preserving deep learning via additively homomorphic encryption,
Y . Aono, T. Hayashi, L. Wang, S. Moriai, et al. , “Privacy-preserving deep learning via additively homomorphic encryption,” IEEE Transac- tions on Information F orensics and Security , vol. 13, no. 5, pp. 1333– 1345, 2018
work page 2018
-
[15]
Cryptonets: Applying neural networks to encr ypted data with high throughput and accuracy,
R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Na ehrig, and J. Wernsing, “Cryptonets: Applying neural networks to encr ypted data with high throughput and accuracy,” in International Conference on Machine Learning , pp. 201–210, 2016
work page 2016
-
[16]
A. D. Sarwate and K. Chaudhuri, “Signal processing and m achine learn- ing with differential privacy: Algorithms and challenges f or continuous data,” IEEE signal processing magazine , vol. 30, no. 5, pp. 86–94, 2013
work page 2013
-
[17]
Fu nctional mechanism: Regression analysis under differential privac y,
J. Zhang, Z. Zhang, X. Xiao, Y . Y ang, and M. Winslett, “Fu nctional mechanism: Regression analysis under differential privac y,” Proceedings of the VLDB Endowment , vol. 5, no. 11, pp. 1364–1375, 2012
work page 2012
-
[18]
Privacy-preserving l ogistic regres- sion,
K. Chaudhuri and C. Monteleoni, “Privacy-preserving l ogistic regres- sion,” in Advances in Neural Information Processing Systems , pp. 289– 296, 2009
work page 2009
-
[19]
On t he differen- tial privacy of Bayesian inference,
Z. Zhang, B. I. P . Rubinstein, and C. Dimitrakakis, “On t he differen- tial privacy of Bayesian inference,” in AAAI Conference on Artificial Intelligence, pp. 2365–2371, 2016
work page 2016
-
[20]
Dynamic differential privacy for A DMM-based distributed classification learning,
T. Zhang and Q. Zhu, “Dynamic differential privacy for A DMM-based distributed classification learning,” IEEE Transactions on Information F orensics and Security, vol. 12, no. 1, pp. 172–187, 2017
work page 2017
-
[21]
DP-ADMM: ADMM - based distributed learning with differential privacy,
Z. Huang, R. Hu, Y . Gong, and E. Chan-Tin, “DP-ADMM: ADMM - based distributed learning with differential privacy,” Preprint: arXiv preprint arXiv:1808.10101 , 2018
-
[22]
Differentially priv ate distributed optimization,
Z. Huang, S. Mitra, and N. V aidya, “Differentially priv ate distributed optimization,” in Proceedings of the 2015 International Conference on Distributed Computing and Networking , p. 4, 2015
work page 2015
-
[23]
Different ially private dis- tributed convex optimization via functional perturbation ,
E. Nozari, P . Tallapragada, and J. Cort´ es, “Different ially private dis- tributed convex optimization via functional perturbation ,” IEEE Trans- actions on Control of Network Systems , vol. 5, no. 1, pp. 395–408, 2018
work page 2018
-
[24]
Differentially private cloud -based multi- agent optimization with constraints,
M. Hale and M. Egersted, “Differentially private cloud -based multi- agent optimization with constraints,” in Proceedings of the American Control Conference, pp. 1235–1240, 2015
work page 2015
-
[25]
Privacy-preserving deep l earning,
R. Shokri and V . Shmatikov, “Privacy-preserving deep l earning,” in Proceedings of the 22nd ACM SIGSAC conference on computer an d communications security , pp. 1310–1321, ACM, 2015
work page 2015
-
[26]
Deep learning with differential pr ivacy,
M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mirono v, K. Talwar, and L. Zhang, “Deep learning with differential pr ivacy,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer an d Communications Security , pp. 308–318, 2016
work page 2016
-
[27]
Learning Differentially Private Recurrent Language Models
H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang, “Learn - ing differentially private recurrent language models,” arXiv preprint arXiv:1710.06963, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[28]
Privacy-preserving Machine Learning through Data Obfuscation
T. Zhang, Z. He, and R. B. Lee, “Privacy-preserving mach ine learning through data obfuscation,” arXiv preprint arXiv:1807.01860 , 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[29]
Differentially priv ate distributed constrained optimization,
S. Han, U. Topcu, and G. J. Pappas, “Differentially priv ate distributed constrained optimization,” IEEE Transactions on Automatic Control , vol. 62, no. 1, pp. 50–64, 2017
work page 2017
-
[30]
N. Z. Shor, Minimization methods for non-differentiable functions , vol. 3 of Springer Series in Computational Mathematics . Berlin, Heidelberg: Springer, 2012
work page 2012
-
[31]
The algorithmic foundations of di fferential privacy,
C. Dwork and A. Roth, “The algorithmic foundations of di fferential privacy,” F oundations and Trends in Theoretical Computer Science , vol. 9, no. 3–4, pp. 211–407, 2014
work page 2014
-
[32]
B. Grimmer, “Convergence rates for deterministic and s tochastic sub- gradient methods without lipschitz continuity,” SIAM Journal on Opti- mization, vol. 29, no. 2, pp. 1350–1365, 2019
work page 2019
-
[33]
Private empiric al risk minimiza- tion: Efficient algorithms and tight error bounds,
R. Bassily, A. Smith, and A. Thakurta, “Private empiric al risk minimiza- tion: Efficient algorithms and tight error bounds,” in 2014 IEEE 55th Annual Symposium on F oundations of Computer Science , pp. 464–473, IEEE, 2014
work page 2014
-
[34]
Nesterov, Introductory Lectures on Convex Optimization: A Basic Course
Y . Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, Springer US, 2013. APPENDIX A PROOF OF THEOREM 1 First, because of (6), we have ∥Qℓ(Dℓ;k) − Qℓ(D′ ℓ;k))∥1 = 1 nℓ ∑ {x,y }∈D ℓ ξ¯gx,y 2 (θ[k]) − ∑ {x,y }∈D ′ ℓ ξ¯gx,y 2 (θ[k]) 1 = 1 nℓ ∥ξ¯gx,y 2 (θ[k])|{x,y }∈D ℓ ⊆D ′ ℓ −ξ¯gx,y 2 (θ[k])|{x,...
work page 2013
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.