pith. sign in

arxiv: 2605.16219 · v1 · pith:U64BQ3LCnew · submitted 2026-05-15 · 💻 cs.LG · stat.ML

The Privacy Price of Tail-Risk Learning: Effective Tail Sample Size in Differentially Private CVaR Optimization

Pith reviewed 2026-05-20 20:31 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords differential privacyCVaR optimizationtail riskexcess risk boundseffective sample sizestochastic convex optimizationprivacy price
0
0 comments X

The pith

Differential privacy reduces the effective sample size for CVaR learning from n to ε n τ.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that adding differential privacy to Conditional Value at Risk optimization changes the governing sample size from the total number of records n to the product ε n τ, where τ is the tail mass and ε is the privacy parameter. Excess risk then splits cleanly into ordinary statistical error from estimating the tail and a separate privacy price that scales as 1 over that effective size. Complete rates are derived for scalar estimation and finite hypothesis classes, while modular reductions show the same 1/(ε n τ) scaling for convex Lipschitz problems, inheriting dimension dependence from private stochastic convex optimization. The results frame private CVaR as standard private learning performed on an effective dataset of only Θ(n τ) tail records.

Core claim

Private CVaR excess risk decomposes into ordinary tail-risk statistical error and a privacy price. Scalar estimation achieves rate Θ(B min{1, (nτ)^{-1/2} + (ε n τ)^{-1}}), finite classes of size M achieve Θ(B min{1, sqrt(log(2M)/(nτ)) + log(2M)/(ε n τ)}), and convex Lipschitz learning requires a CVaR-specific privacy term that necessarily scales as 1/(ε n τ) with dimension factors from private SCO. These complete rates hold under pure DP, with lower bounds extending to approximate DP in small-δ regimes.

What carries the argument

The effective private tail sample size ε n τ, which replaces n as the quantity controlling the privacy contribution to excess risk in CVaR optimization.

If this is right

  • Scalar and finite-class CVaR learners under pure DP attain the stated two-term rates with the privacy term explicit.
  • Lower bounds extend to approximate DP only in the small-δ regimes stated in the analysis.
  • Convex Lipschitz CVaR optimization inherits at least Ω(1/(ε n τ)) privacy cost in excess risk, plus dimension dependence.
  • The canonical hard subproblem is ordinary private learning restricted to Θ(n τ) informative tail records.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Data collection budgets for private tail-risk models may need to grow proportionally to 1/τ to offset the reduced effective size.
  • Tail-focused metrics appear more privacy-sensitive than mean-risk metrics because only a fraction τ of samples contribute information.
  • Similar effective-sample-size reductions may apply to other coherent risk measures or quantile-based objectives.

Load-bearing premise

The complete rates and modular reductions assume pure differential privacy, with the convex case further requiring Lipschitz convex losses.

What would settle it

Measure private CVaR excess risk while fixing n τ but varying n and τ independently; if the privacy term does not stay proportional to 1/(ε n τ), the claimed decomposition fails.

Figures

Figures reproduced from arXiv: 2605.16219 by El Mustapha Mansouri.

Figure 1
Figure 1. Figure 1: Error decomposition in private CVaR learning. CVaR concentrates learning on the worst τ -fraction of the sample, leaving only nτ informative tail observa￾tions. Ordinary tail-risk estimation contributes the statistical term 1/ √ nτ , while differential privacy adds a separate price of order 1/(εnτ ). Thus privacy acts on the effective private tail sample size εnτ , not on the full sample size n. This is th… view at source ↗
Figure 2
Figure 2. Figure 2: The Privacy–Tail-Risk Frontier in Private CVaR Learning. CVaR concen￾trates the objective on the worst τ -fraction of the loss distribution, so only about nτ records carry tail information. Record-level differential privacy must protect each high-loss tail record, producing sensitivity of order B/(nτ ). Consequently, the privacy-relevant sample size is εnτ , not εn. Nontrivial private tail-risk learning re… view at source ↗
Figure 3
Figure 3. Figure 3: Tail-sample embedding reduction. Ordinary expected-risk learning with m samples is embedded into CVaR learning by activating each record with probability τ and assigning inactive records zero loss. The embedded loss ℓ(w; (T, Y )) = T a(w; Y ) places the informative observations in the upper τ -tail, so CVaR recovers the ordinary expected loss on the active distribution. Thus the effective-sample-size heuri… view at source ↗
Figure 4
Figure 4. Figure 4: Matched convex upper and lower transfers. The upper route applies the rescaled Rockafellar–Uryasev lift and imports the Euclidean approximate-DP SCO upper primitive, producing the displayed private term. The lower route uses the tail-sample embedding to transfer Euclidean private-SCO hardness into CVaR learning. The tilde-Omega notation suppresses the logarithmic factors inherited from the imported Euclide… view at source ↗
read the original abstract

Differential privacy changes the effective sample size governing CVaR learning. For tail mass $\tau$, the privacy-relevant sample size is not $n$, but $n\tau$; equivalently, the effective private tail sample size is $\epsilon n\tau$. Private CVaR excess risk decomposes into ordinary tail-risk statistical error and a privacy price. This decomposition is complete for scalar estimation and finite classes: scalar estimation has rate $\Theta(B \min\{1,(n\tau)^{-1/2}+(\epsilon n\tau)^{-1}\})$, and finite classes of size $M$ have rate $\Theta(B \min\{1,\sqrt{\log(2M)/(n\tau)}+\log(2M)/(\epsilon n\tau)\})$. These complete rates hold under pure DP, and their lower bounds extend to approximate DP in the stated small-$\delta$ regimes. For convex Lipschitz learning, modular upper and lower reductions show that the CVaR-specific privacy term necessarily scales as $1/(\epsilon n\tau)$, with dimension dependence inherited from private stochastic convex optimization. Together, these results identify ordinary private learning on $\Theta(n\tau)$ informative tail records as the canonical hard subproblem inside private CVaR learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims that differential privacy alters the effective sample size for CVaR learning from n to nτ (for tail mass τ), so that the privacy-relevant size is ε n τ. Private CVaR excess risk decomposes into ordinary tail-risk statistical error plus a privacy price. Complete matching upper and lower bounds are given for scalar estimation (rate Θ(B min{1, (nτ)^{-1/2} + (ε n τ)^{-1}})) and finite classes of size M (rate Θ(B min{1, √(log(2M)/(nτ)) + log(2M)/(ε n τ)})) under pure DP, with lower bounds extending to approximate DP for small δ. For convex Lipschitz losses the CVaR-specific privacy term scales as 1/(ε n τ) with dimension dependence inherited from private stochastic convex optimization. The central hard subproblem is therefore ordinary private learning on Θ(nτ) informative tail records.

Significance. If the modular reductions and matching bounds hold, the work supplies a clean, interpretable decomposition that isolates the privacy price of tail-risk learning and identifies the effective tail sample size as the governing quantity. The explicit pure-DP rates, small-δ extensions, and reduction to private SCO for the convex case are concrete strengths that would be useful for both theory and practice in private risk-sensitive learning.

major comments (1)
  1. [Abstract, §3] Abstract and §3: the claim that lower bounds extend to approximate DP 'in the stated small-δ regimes' requires an explicit statement of the δ threshold (e.g., δ = o(1/(nτ)) or similar) under which the pure-DP lower bound carries over; without this the extension is not yet load-bearing for the approximate-DP case.
minor comments (2)
  1. Notation: the paper uses both ε and ε n τ for the privacy parameter; a single consistent symbol for the effective privacy budget on the tail would improve readability.
  2. [Abstract] The abstract states 'complete rates' for scalar and finite-class cases; adding a short table or corollary that lists the exact constants or log factors omitted by Θ notation would help readers verify the decomposition.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and the helpful suggestion regarding the approximate-DP extension. We address the comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract, §3] Abstract and §3: the claim that lower bounds extend to approximate DP 'in the stated small-δ regimes' requires an explicit statement of the δ threshold (e.g., δ = o(1/(nτ)) or similar) under which the pure-DP lower bound carries over; without this the extension is not yet load-bearing for the approximate-DP case.

    Authors: We agree that the current phrasing leaves the precise δ threshold implicit. The manuscript states that the lower bounds extend to approximate DP 'in the stated small-δ regimes,' but does not spell out the concrete condition (such as δ = o(1/(nτ))). In the revision we will add an explicit statement of the required δ scaling, derived directly from the proof, so that the extension is fully rigorous and load-bearing. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents private CVaR excess risk as decomposing into ordinary tail-risk statistical error plus a privacy price, with effective sample size nτ for both terms. The stated rates for scalar estimation and finite hypothesis classes are derived from standard DP concentration and statistical learning bounds applied to the tail subsample; the convex Lipschitz case reduces modularly to private stochastic convex optimization with the CVaR-specific term inheriting the 1/(ε n τ) scaling. No equations define a quantity in terms of itself, no fitted parameters are relabeled as predictions, and no load-bearing steps rely on self-citations or imported uniqueness theorems. The argument is self-contained against external DP and learning-theory primitives under the explicitly stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claims rest on standard domain assumptions from differential privacy and convex optimization; no free parameters or new entities are introduced in the abstract.

axioms (3)
  • domain assumption Pure differential privacy
    Complete rates stated to hold under pure DP.
  • domain assumption Small-δ regime for approximate DP lower bounds
    Lower bounds extend to approximate DP only in stated small-δ regimes.
  • domain assumption Convex Lipschitz losses
    Modular upper and lower reductions for the convex case.

pith-pipeline@v0.9.0 · 5748 in / 1480 out tokens · 87134 ms · 2026-05-20T20:31:19.542210+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

  1. [1]

    Acerbi and D

    C. Acerbi and D. Tasche. On the coherence of expected shortfall. Journal of Banking and Finance, 26(7):1487--1503, 2002

  2. [2]

    Artzner, F

    P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath. Coherent measures of risk. Mathematical Finance, 9(3):203--228, 1999

  3. [3]

    P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, 33(4):1497--1537, 2005

  4. [4]

    Bassily, V

    R. Bassily, V. Feldman, K. Talwar, and A. G. Thakurta. Private stochastic convex optimization with optimal rates. Advances in Neural Information Processing Systems, 2019

  5. [5]

    Bassily, A

    R. Bassily, A. Smith, and A. G. Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. Proceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science, pages 464--473, 2014

  6. [6]

    Boucheron, G

    S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013

  7. [7]

    D. B. Brown. Large deviations bounds for estimating conditional value-at-risk. Operations Research Letters, 35(6):722--730, 2007

  8. [8]

    Chaudhuri, C

    K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069--1109, 2011

  9. [9]

    Chen and G

    D. Chen and G. A. Chua. Differentially private stochastic convex optimization under a quantile loss function. Proceedings of the 40th International Conference on Machine Learning, PMLR 202:4435--4461, 2023

  10. [10]

    Duchi, T

    J. Duchi, T. Hashimoto, and H. Namkoong. Distributionally robust losses for latent covariate mixtures. Operations Research, 71(2):649--664, 2023

  11. [11]

    Dwork, F

    C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. Theory of Cryptography Conference, LNCS 3876:265--284, Springer, 2006

  12. [12]

    Dwork and A

    C. Dwork and A. Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 9(3--4):211--407, 2014

  13. [13]

    Gillenwater, M

    J. Gillenwater, M. Joseph, and A. Kulesza. Differentially private quantiles. Proceedings of the 38th International Conference on Machine Learning, PMLR 139:3713--3722, 2021

  14. [14]

    Hashimoto, M

    T. Hashimoto, M. Srivastava, H. Namkoong, and P. Liang. Fairness without demographics in repeated loss minimization. Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1929--1938, 2018

  15. [15]

    McSherry and K

    F. McSherry and K. Talwar. Mechanism design via differential privacy. Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, pages 94--103, 2007

  16. [16]

    Mhammedi, B

    Z. Mhammedi, B. Guedj, and R. C. Williamson. PAC-Bayesian bound for the conditional value at risk. Advances in Neural Information Processing Systems, 2020

  17. [17]

    D. K. Mulumudi, P. Manupriya, G. Aminian, and A. Raj. On the generalization and robustness in conditional value-at-risk. arXiv preprint arXiv:2602.18053, 2026

  18. [18]

    R. T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2(3):21--41, 2000

  19. [19]

    R. T. Rockafellar and S. Uryasev. Conditional value-at-risk for general loss distributions. Journal of Banking and Finance, 26(7):1443--1471, 2002

  20. [20]

    Ruszczy\'nski and A

    A. Ruszczy\'nski and A. Shapiro. Optimization of convex risk functions. Mathematics of Operations Research, 31(3):433--452, 2006

  21. [21]

    A. Shapiro. On Kusuoka representation of law invariant risk measures. Mathematics of Operations Research, 38(1):142--152, 2013

  22. [22]

    A. Smith. Privacy-preserving statistical estimation with optimal convergence rates. Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, pages 813--821, 2011

  23. [23]

    Takeda and T

    A. Takeda and T. Kanamori. A robust approach based on conditional value-at-risk measure to statistical learning problems. European Journal of Operational Research, 198(1):287--296, 2009

  24. [24]

    P. S. Thomas and E. Learned-Miller. Concentration inequalities for conditional value at risk. Proceedings of the 36th International Conference on Machine Learning, PMLR 97:6225--6233, 2019

  25. [25]

    A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, 1996

  26. [26]

    D. Xu, M. Ding, Z. Ma, H. Xie, Y. Tao, A. Slaitane, and D. Wang. Differentially private non-convex distributionally robust optimization. arXiv preprint arXiv:2602.16155, 2026

  27. [27]

    Zhou and R

    X. Zhou and R. Bassily. Differentially private worst-group risk minimization. Proceedings of the 41st International Conference on Machine Learning, PMLR 235:61783--61803, 2024