pith. sign in

arxiv: 1906.09679 · v1 · pith:G2A33JV6new · submitted 2019-06-24 · 💻 cs.CR · cs.LG· stat.ML

The Value of Collaboration in Convex Machine Learning with Differential Privacy

Pith reviewed 2026-05-25 17:56 UTC · model grok-4.3

classification 💻 cs.CR cs.LGstat.ML
keywords differential privacyconvex machine learningstochastic gradient descentcollaborative learningprivacy utility tradeoffregressionsupport vector machinesfinancial datasets
0
0 comments X

The pith

In convex machine learning with differential privacy, the fitness gap to the non-private model is inversely proportional to the square of the total training dataset size and the square of the privacy budget.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that when data owners collaborate on convex machine learning tasks by sharing noisy gradients protected by differential privacy, the excess fitness cost of the resulting model compared to non-private training decreases as the inverse square of the combined dataset size times the inverse square of the privacy budget. This closed-form relationship lets participants predict the benefit of pooling their private data before committing to computation. A reader would care because it turns the privacy-utility tradeoff into a quantifiable decision tool rather than an empirical guess, and the authors confirm the prediction matches actual runs on loan interest rate regression and credit card fraud detection.

Core claim

When minimizing convex loss functions via noisy differentially-private stochastic gradient descent on distributed datasets, the difference between the fitness achieved with privacy and the fitness without privacy is inversely proportional to the size of the training datasets squared and the privacy budget squared.

What carries the argument

Noisy stochastic gradient descent under differential privacy applied to convex losses across multiple non-overlapping datasets, yielding a closed-form bound on excess fitness.

Load-bearing premise

The loss functions being minimized must be convex.

What would settle it

Running the private and non-private training on datasets of different total sizes and checking whether the observed fitness difference scales exactly as one over n squared for fixed privacy budget.

Figures

Figures reproduced from arXiv: 1906.09679 by David Smith, Farhad Farokhi, Mohamed Ali Kaafar, Nan Wu.

Figure 1
Figure 1. Figure 1: The communication structure between the learner and [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Statistics of relative fitness of the stochastic grad [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 5
Figure 5. Figure 5: Relative fitness of the stochastic gradient method in [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 4
Figure 4. Figure 4: Relative fitness of the stochastic gradient method in [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 6
Figure 6. Figure 6: Relative fitness of the stochastic gradient method in [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Statistics of relative fitness of the stochastic grad [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Relative fitness of the stochastic gradient method in [PITH_FULL_IMAGE:figures/full_fig_p009_8.png] view at source ↗
Figure 10
Figure 10. Figure 10: Relative fitness of the stochastic gradient method i [PITH_FULL_IMAGE:figures/full_fig_p010_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Relative fitness of the stochastic gradient method i [PITH_FULL_IMAGE:figures/full_fig_p011_11.png] view at source ↗
read the original abstract

In this paper, we apply machine learning to distributed private data owned by multiple data owners, entities with access to non-overlapping training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, using the fitness cost, as a function of privacy budget and size of the distributed datasets to capture the trade-off between privacy and utility in machine learning. This way, we can predict the outcome of collaboration among privacy-aware data owners prior to executing potentially computationally-expensive machine learning algorithms. Particularly, we show that the difference between the fitness of the trained machine learning model using differentially-private gradient queries and the fitness of the trained machine model in the absence of any privacy concerns is inversely proportional to the size of the training datasets squared and the privacy budget squared. We successfully validate the performance prediction with the actual performance of the proposed privacy-aware learning algorithms, applied to: financial datasets for determining interest rates of loans using regression; and detecting credit card frauds using support vector machines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that for convex machine learning on distributed private datasets, noisy DP-SGD yields an excess fitness gap (relative to non-private training) that scales as 1/(N² ε²), enabling a priori prediction of collaboration value among data owners; this is derived for convex losses and validated on loan regression and credit-card SVM tasks.

Significance. If the scaling holds, the result would provide a closed-form, parameter-light way to forecast utility loss from DP in collaborative convex ML, which is practically useful for privacy-sensitive domains like finance. The validation on two real datasets adds empirical grounding, though the absence of strong-convexity regularization details limits immediate applicability.

major comments (3)
  1. [Abstract] Abstract: the claimed inverse-square scaling 1/(N² ε²) for the fitness gap holds only under μ-strong convexity (where the privacy-induced excess risk is Θ(σ²/μ) and σ ∝ 1/(ε N)); for plain convex Lipschitz losses the standard DP-ERM bound is O(1/(ε N)). The title and abstract state the result for “convex” losses without mentioning strong convexity, making the central proportionality claim incorrect in the stated generality.
  2. [Abstract] Abstract (validation paragraph): the claim that the scaling prediction “was validated on two real datasets” is presented without derivation details, error bars, regularization parameters, or exclusion criteria for the loan regression and credit-card SVM experiments. This leaves the empirical support for the 1/(N² ε²) claim unverifiable from the given text.
  3. [Derivation (likely §3–4)] The distributed collaboration analysis and the fitness-gap derivation must be checked for an explicit invocation of the strong-convexity inequality; if the proof relies only on convexity, the stated proportionality cannot be recovered.
minor comments (1)
  1. [Abstract] Notation for the fitness cost and privacy budget should be introduced with explicit definitions before the scaling claim is stated.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments correctly identify that our claimed scaling requires strong convexity, which was implicit in the analysis but not stated explicitly in the abstract and title. We will revise the manuscript to clarify this assumption and expand the empirical details.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claimed inverse-square scaling 1/(N² ε²) for the fitness gap holds only under μ-strong convexity (where the privacy-induced excess risk is Θ(σ²/μ) and σ ∝ 1/(ε N)); for plain convex Lipschitz losses the standard DP-ERM bound is O(1/(ε N)). The title and abstract state the result for “convex” losses without mentioning strong convexity, making the central proportionality claim incorrect in the stated generality.

    Authors: We agree with this assessment. The 1/(N² ε²) scaling is derived under the assumption of μ-strong convexity (used to bound excess risk as Θ(σ²/μ) with σ ∝ 1/(ε N)). The manuscript's abstract and title use the term 'convex' without qualification. We will revise the abstract, title, and introduction to explicitly state 'strongly convex' losses. revision: yes

  2. Referee: [Abstract] Abstract (validation paragraph): the claim that the scaling prediction “was validated on two real datasets” is presented without derivation details, error bars, regularization parameters, or exclusion criteria for the loan regression and credit-card SVM experiments. This leaves the empirical support for the 1/(N² ε²) claim unverifiable from the given text.

    Authors: We will add the missing details in the revised manuscript, including the value of the strong-convexity parameter μ used for regularization, standard error bars from repeated trials, and dataset preprocessing steps. This will allow readers to verify the empirical match to the predicted scaling. revision: yes

  3. Referee: [Derivation (likely §3–4)] The distributed collaboration analysis and the fitness-gap derivation must be checked for an explicit invocation of the strong-convexity inequality; if the proof relies only on convexity, the stated proportionality cannot be recovered.

    Authors: The derivation invokes the strong-convexity inequality when relating the excess fitness to the noise variance (specifically in the step bounding the optimality gap after noisy gradient steps). We will insert an explicit remark and reference to this step in the revised version of Sections 3–4 to make the reliance on strong convexity transparent. revision: partial

Circularity Check

0 steps flagged

No circularity: scaling bound derived from noisy-SGD analysis, not fitted or self-referential

full rationale

The paper presents the 1/(N²ε²) fitness-gap scaling as a derived bound obtained by analyzing the excess risk of noisy gradient descent on convex losses (abstract and full derivation sections). The result is obtained from standard DP-SGD excess-risk analysis under the stated convexity assumption and is then validated on separate financial and fraud datasets. No equation reduces the claimed proportionality to a parameter fitted from the same data used for validation, no load-bearing self-citation chain is invoked to establish the scaling, and the derivation does not rename an empirical pattern or smuggle an ansatz via prior work. The central claim therefore remains independent of its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The scaling result rests on the convexity of the loss (domain assumption) and the standard properties of Gaussian noise added for differential privacy; no free parameters or new entities are introduced in the abstract.

axioms (1)
  • domain assumption The machine-learning objective is convex
    Required for the noisy gradient analysis to yield a closed-form fitness gap depending only on n and epsilon.

pith-pipeline@v0.9.0 · 5722 in / 1158 out tokens · 26577 ms · 2026-05-25T17:56:04.047462+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 3 internal anchors

  1. [1]

    Revisiting the governance o f privacy: Contemporary policy instruments in global perspective,

    C. J. Bennett and C. D. Raab, “Revisiting the governance o f privacy: Contemporary policy instruments in global perspective,” Regulation & Governance, 2018

  2. [2]

    Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging s chemes,

    O. Shamir and T. Zhang, “Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging s chemes,” in International Conference on Machine Learning , pp. 71–79, 2013

  3. [3]

    Lending club loan data: Analyze lending club’s i s- sued loans

    W. Kan, “Lending club loan data: Analyze lending club’s i s- sued loans.” https://www. kaggle. com/wendykan/lending-club-loan-data , Date Accessed: 17 Oct 2018

  4. [4]

    Credit card fraud detecti on: Anonymized credit card transactions labeled as fraudulent or genuine

    Machine Learning Group–ULB, “Credit card fraud detecti on: Anonymized credit card transactions labeled as fraudulent or genuine.” https://www. kaggle. com/mlg-ulb/creditcardfraud/home, Date Accessed: 27 Nov 2018

  5. [5]

    Privacy preserving data minin g,

    Y . Lindell and B. Pinkas, “Privacy preserving data minin g,” in Advances in Cryptology — CRYPTO 2000 (M. Bellare, ed.), (Berlin, Heidelberg), pp. 36–54, Springer Berlin Heidelberg, 2000

  6. [6]

    Privacy-preserving multiv ariate statistical analysis: Linear regression and classificatio n,

    W. Du, Y . S. Han, and S. Chen, “Privacy-preserving multiv ariate statistical analysis: Linear regression and classificatio n,” in Proceedings of the 2004 SIAM international conference on data mining , pp. 222–233, SIAM, 2004

  7. [7]

    Privacy preserving associati on rule mining in vertically partitioned data,

    J. V aidya and C. Clifton, “Privacy preserving associati on rule mining in vertically partitioned data,” in Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data m ining, pp. 639–644, ACM, 2002

  8. [8]

    Privacy-p reserving naive bayes classification,

    J. V aidya, M. Kantarcıo˘ glu, and C. Clifton, “Privacy-p reserving naive bayes classification,” The VLDB Journal , vol. 17, no. 4, pp. 879–898, 2008

  9. [9]

    Privacy-preserving di stributed k- means clustering over arbitrarily partitioned data,

    G. Jagannathan and R. N. Wright, “Privacy-preserving di stributed k- means clustering over arbitrarily partitioned data,” in Proceedings of the eleventh ACM SIGKDD international conference on Knowle dge discovery in data mining , pp. 593–599, ACM, 2005

  10. [10]

    Practical secure aggregation for privacy-preserving machine learning,

    K. Bonawitz, V . Ivanov, B. Kreuter, A. Marcedone, H. B. M cMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, “Practical secure aggregation for privacy-preserving machine learning,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Secur ity, pp. 1175–1191, ACM, 2017

  11. [11]

    ML confidential: Machine learning on encrypted data,

    T. Graepel, K. Lauter, and M. Naehrig, “ML confidential: Machine learning on encrypted data,” in International Conference on Information Security and Cryptology , pp. 1–21, Springer, 2012

  12. [12]

    Chiron: Privacy-preserving Machine Learning as a Service

    T. Hunt, C. Song, R. Shokri, V . Shmatikov, and E. Witchel , “Chi- ron: Privacy-preserving machine learning as a service,” arXiv preprint arXiv:1803.05961, 2018

  13. [13]

    Multi- key privacy-preserving deep learning in cloud computing,

    P . Li, J. Li, Z. Huang, T. Li, C.-Z. Gao, S.-M. Yiu, and K. C hen, “Multi- key privacy-preserving deep learning in cloud computing,” Future Gen- eration Computer Systems , vol. 74, pp. 76–85, 2017

  14. [14]

    Privacy-preserving deep learning via additively homomorphic encryption,

    Y . Aono, T. Hayashi, L. Wang, S. Moriai, et al. , “Privacy-preserving deep learning via additively homomorphic encryption,” IEEE Transac- tions on Information F orensics and Security , vol. 13, no. 5, pp. 1333– 1345, 2018

  15. [15]

    Cryptonets: Applying neural networks to encr ypted data with high throughput and accuracy,

    R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Na ehrig, and J. Wernsing, “Cryptonets: Applying neural networks to encr ypted data with high throughput and accuracy,” in International Conference on Machine Learning , pp. 201–210, 2016

  16. [16]

    Signal processing and m achine learn- ing with differential privacy: Algorithms and challenges f or continuous data,

    A. D. Sarwate and K. Chaudhuri, “Signal processing and m achine learn- ing with differential privacy: Algorithms and challenges f or continuous data,” IEEE signal processing magazine , vol. 30, no. 5, pp. 86–94, 2013

  17. [17]

    Fu nctional mechanism: Regression analysis under differential privac y,

    J. Zhang, Z. Zhang, X. Xiao, Y . Y ang, and M. Winslett, “Fu nctional mechanism: Regression analysis under differential privac y,” Proceedings of the VLDB Endowment , vol. 5, no. 11, pp. 1364–1375, 2012

  18. [18]

    Privacy-preserving l ogistic regres- sion,

    K. Chaudhuri and C. Monteleoni, “Privacy-preserving l ogistic regres- sion,” in Advances in Neural Information Processing Systems , pp. 289– 296, 2009

  19. [19]

    On t he differen- tial privacy of Bayesian inference,

    Z. Zhang, B. I. P . Rubinstein, and C. Dimitrakakis, “On t he differen- tial privacy of Bayesian inference,” in AAAI Conference on Artificial Intelligence, pp. 2365–2371, 2016

  20. [20]

    Dynamic differential privacy for A DMM-based distributed classification learning,

    T. Zhang and Q. Zhu, “Dynamic differential privacy for A DMM-based distributed classification learning,” IEEE Transactions on Information F orensics and Security, vol. 12, no. 1, pp. 172–187, 2017

  21. [21]

    DP-ADMM: ADMM - based distributed learning with differential privacy,

    Z. Huang, R. Hu, Y . Gong, and E. Chan-Tin, “DP-ADMM: ADMM - based distributed learning with differential privacy,” Preprint: arXiv preprint arXiv:1808.10101 , 2018

  22. [22]

    Differentially priv ate distributed optimization,

    Z. Huang, S. Mitra, and N. V aidya, “Differentially priv ate distributed optimization,” in Proceedings of the 2015 International Conference on Distributed Computing and Networking , p. 4, 2015

  23. [23]

    Different ially private dis- tributed convex optimization via functional perturbation ,

    E. Nozari, P . Tallapragada, and J. Cort´ es, “Different ially private dis- tributed convex optimization via functional perturbation ,” IEEE Trans- actions on Control of Network Systems , vol. 5, no. 1, pp. 395–408, 2018

  24. [24]

    Differentially private cloud -based multi- agent optimization with constraints,

    M. Hale and M. Egersted, “Differentially private cloud -based multi- agent optimization with constraints,” in Proceedings of the American Control Conference, pp. 1235–1240, 2015

  25. [25]

    Privacy-preserving deep l earning,

    R. Shokri and V . Shmatikov, “Privacy-preserving deep l earning,” in Proceedings of the 22nd ACM SIGSAC conference on computer an d communications security , pp. 1310–1321, ACM, 2015

  26. [26]

    Deep learning with differential pr ivacy,

    M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mirono v, K. Talwar, and L. Zhang, “Deep learning with differential pr ivacy,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer an d Communications Security , pp. 308–318, 2016

  27. [27]

    Learning Differentially Private Recurrent Language Models

    H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang, “Learn - ing differentially private recurrent language models,” arXiv preprint arXiv:1710.06963, 2017

  28. [28]

    Privacy-preserving Machine Learning through Data Obfuscation

    T. Zhang, Z. He, and R. B. Lee, “Privacy-preserving mach ine learning through data obfuscation,” arXiv preprint arXiv:1807.01860 , 2018

  29. [29]

    Differentially priv ate distributed constrained optimization,

    S. Han, U. Topcu, and G. J. Pappas, “Differentially priv ate distributed constrained optimization,” IEEE Transactions on Automatic Control , vol. 62, no. 1, pp. 50–64, 2017

  30. [30]

    N. Z. Shor, Minimization methods for non-differentiable functions , vol. 3 of Springer Series in Computational Mathematics . Berlin, Heidelberg: Springer, 2012

  31. [31]

    The algorithmic foundations of di fferential privacy,

    C. Dwork and A. Roth, “The algorithmic foundations of di fferential privacy,” F oundations and Trends in Theoretical Computer Science , vol. 9, no. 3–4, pp. 211–407, 2014

  32. [32]

    Convergence rates for deterministic and s tochastic sub- gradient methods without lipschitz continuity,

    B. Grimmer, “Convergence rates for deterministic and s tochastic sub- gradient methods without lipschitz continuity,” SIAM Journal on Opti- mization, vol. 29, no. 2, pp. 1350–1365, 2019

  33. [33]

    Private empiric al risk minimiza- tion: Efficient algorithms and tight error bounds,

    R. Bassily, A. Smith, and A. Thakurta, “Private empiric al risk minimiza- tion: Efficient algorithms and tight error bounds,” in 2014 IEEE 55th Annual Symposium on F oundations of Computer Science , pp. 464–473, IEEE, 2014

  34. [34]

    Nesterov, Introductory Lectures on Convex Optimization: A Basic Course

    Y . Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, Springer US, 2013. APPENDIX A PROOF OF THEOREM 1 First, because of (6), we have ∥Qℓ(Dℓ;k) − Qℓ(D′ ℓ;k))∥1 = 1 nℓ      ∑ {x,y }∈D ℓ ξ¯gx,y 2 (θ[k]) − ∑ {x,y }∈D ′ ℓ ξ¯gx,y 2 (θ[k])      1 = 1 nℓ ∥ξ¯gx,y 2 (θ[k])|{x,y }∈D ℓ ⊆D ′ ℓ −ξ¯gx,y 2 (θ[k])|{x,...