The Privacy Price of Tail-Risk Learning: Effective Tail Sample Size in Differentially Private CVaR Optimization

El Mustapha Mansouri

arxiv: 2605.16219 · v1 · pith:U64BQ3LCnew · submitted 2026-05-15 · 💻 cs.LG · stat.ML

The Privacy Price of Tail-Risk Learning: Effective Tail Sample Size in Differentially Private CVaR Optimization

El Mustapha Mansouri This is my paper

Pith reviewed 2026-05-20 20:31 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords differential privacyCVaR optimizationtail riskexcess risk boundseffective sample sizestochastic convex optimizationprivacy price

0 comments

The pith

Differential privacy reduces the effective sample size for CVaR learning from n to ε n τ.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that adding differential privacy to Conditional Value at Risk optimization changes the governing sample size from the total number of records n to the product ε n τ, where τ is the tail mass and ε is the privacy parameter. Excess risk then splits cleanly into ordinary statistical error from estimating the tail and a separate privacy price that scales as 1 over that effective size. Complete rates are derived for scalar estimation and finite hypothesis classes, while modular reductions show the same 1/(ε n τ) scaling for convex Lipschitz problems, inheriting dimension dependence from private stochastic convex optimization. The results frame private CVaR as standard private learning performed on an effective dataset of only Θ(n τ) tail records.

Core claim

Private CVaR excess risk decomposes into ordinary tail-risk statistical error and a privacy price. Scalar estimation achieves rate Θ(B min{1, (nτ)^{-1/2} + (ε n τ)^{-1}}), finite classes of size M achieve Θ(B min{1, sqrt(log(2M)/(nτ)) + log(2M)/(ε n τ)}), and convex Lipschitz learning requires a CVaR-specific privacy term that necessarily scales as 1/(ε n τ) with dimension factors from private SCO. These complete rates hold under pure DP, with lower bounds extending to approximate DP in small-δ regimes.

What carries the argument

The effective private tail sample size ε n τ, which replaces n as the quantity controlling the privacy contribution to excess risk in CVaR optimization.

If this is right

Scalar and finite-class CVaR learners under pure DP attain the stated two-term rates with the privacy term explicit.
Lower bounds extend to approximate DP only in the small-δ regimes stated in the analysis.
Convex Lipschitz CVaR optimization inherits at least Ω(1/(ε n τ)) privacy cost in excess risk, plus dimension dependence.
The canonical hard subproblem is ordinary private learning restricted to Θ(n τ) informative tail records.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Data collection budgets for private tail-risk models may need to grow proportionally to 1/τ to offset the reduced effective size.
Tail-focused metrics appear more privacy-sensitive than mean-risk metrics because only a fraction τ of samples contribute information.
Similar effective-sample-size reductions may apply to other coherent risk measures or quantile-based objectives.

Load-bearing premise

The complete rates and modular reductions assume pure differential privacy, with the convex case further requiring Lipschitz convex losses.

What would settle it

Measure private CVaR excess risk while fixing n τ but varying n and τ independently; if the privacy term does not stay proportional to 1/(ε n τ), the claimed decomposition fails.

Figures

Figures reproduced from arXiv: 2605.16219 by El Mustapha Mansouri.

**Figure 1.** Figure 1: Error decomposition in private CVaR learning. CVaR concentrates learning on the worst τ -fraction of the sample, leaving only nτ informative tail observations. Ordinary tail-risk estimation contributes the statistical term 1/ √ nτ , while differential privacy adds a separate price of order 1/(εnτ ). Thus privacy acts on the effective private tail sample size εnτ , not on the full sample size n. This is th… view at source ↗

**Figure 2.** Figure 2: The Privacy–Tail-Risk Frontier in Private CVaR Learning. CVaR concentrates the objective on the worst τ -fraction of the loss distribution, so only about nτ records carry tail information. Record-level differential privacy must protect each high-loss tail record, producing sensitivity of order B/(nτ ). Consequently, the privacy-relevant sample size is εnτ , not εn. Nontrivial private tail-risk learning re… view at source ↗

**Figure 3.** Figure 3: Tail-sample embedding reduction. Ordinary expected-risk learning with m samples is embedded into CVaR learning by activating each record with probability τ and assigning inactive records zero loss. The embedded loss ℓ(w; (T, Y )) = T a(w; Y ) places the informative observations in the upper τ -tail, so CVaR recovers the ordinary expected loss on the active distribution. Thus the effective-sample-size heuri… view at source ↗

**Figure 4.** Figure 4: Matched convex upper and lower transfers. The upper route applies the rescaled Rockafellar–Uryasev lift and imports the Euclidean approximate-DP SCO upper primitive, producing the displayed private term. The lower route uses the tail-sample embedding to transfer Euclidean private-SCO hardness into CVaR learning. The tilde-Omega notation suppresses the logarithmic factors inherited from the imported Euclide… view at source ↗

read the original abstract

Differential privacy changes the effective sample size governing CVaR learning. For tail mass $\tau$, the privacy-relevant sample size is not $n$, but $n\tau$; equivalently, the effective private tail sample size is $\epsilon n\tau$. Private CVaR excess risk decomposes into ordinary tail-risk statistical error and a privacy price. This decomposition is complete for scalar estimation and finite classes: scalar estimation has rate $\Theta(B \min\{1,(n\tau)^{-1/2}+(\epsilon n\tau)^{-1}\})$, and finite classes of size $M$ have rate $\Theta(B \min\{1,\sqrt{\log(2M)/(n\tau)}+\log(2M)/(\epsilon n\tau)\})$. These complete rates hold under pure DP, and their lower bounds extend to approximate DP in the stated small-$\delta$ regimes. For convex Lipschitz learning, modular upper and lower reductions show that the CVaR-specific privacy term necessarily scales as $1/(\epsilon n\tau)$, with dimension dependence inherited from private stochastic convex optimization. Together, these results identify ordinary private learning on $\Theta(n\tau)$ informative tail records as the canonical hard subproblem inside private CVaR learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's core contribution is showing that private CVaR optimization is governed by an effective sample size of nτ rather than n, with clean decompositions and reductions that make the privacy price explicit.

read the letter

The main takeaway is that differential privacy changes the governing sample size for CVaR from n to nτ. The excess risk then splits into ordinary tail statistical error plus a privacy term that scales with 1/(ε n τ). This holds with explicit rates for scalar estimation and finite classes, and reduces to private SCO for the convex Lipschitz case while preserving the tail scaling and picking up the usual dimension factors. The modular structure is the part that lands cleanly: they treat the informative tail records as a standard private learning problem on Θ(nτ) points instead of forcing a new analysis from scratch. That framing is useful and avoids unnecessary complexity. The rates look derived from standard primitives without circularity, and the assumptions (pure DP for tight bounds, small-δ extension, Lipschitz convex losses) are stated up front. The lower bounds matching the upper bounds in the simple cases is a plus. The soft spots are limited. The convex case is a reduction, so its tightness rides on existing private SCO results; if those have looseness in high dimensions, it carries through here. The work stays within CVaR and does not claim broader implications for other risk measures. No empirical validation appears, but that is expected for this style of theoretical paper. The citation pattern is appropriate and does not over-rely on self-citation. This is for researchers working on private optimization or risk-sensitive learning who already know DP-SCO and basic CVaR. A reader wanting precise accounting of privacy cost for tail objectives will get direct value. The thinking is clear and the argument structure holds together without load-bearing gaps. It deserves a serious referee to check the full derivations and any edge cases in the regime restrictions.

Referee Report

1 major / 2 minor

Summary. The paper claims that differential privacy alters the effective sample size for CVaR learning from n to nτ (for tail mass τ), so that the privacy-relevant size is ε n τ. Private CVaR excess risk decomposes into ordinary tail-risk statistical error plus a privacy price. Complete matching upper and lower bounds are given for scalar estimation (rate Θ(B min{1, (nτ)^{-1/2} + (ε n τ)^{-1}})) and finite classes of size M (rate Θ(B min{1, √(log(2M)/(nτ)) + log(2M)/(ε n τ)})) under pure DP, with lower bounds extending to approximate DP for small δ. For convex Lipschitz losses the CVaR-specific privacy term scales as 1/(ε n τ) with dimension dependence inherited from private stochastic convex optimization. The central hard subproblem is therefore ordinary private learning on Θ(nτ) informative tail records.

Significance. If the modular reductions and matching bounds hold, the work supplies a clean, interpretable decomposition that isolates the privacy price of tail-risk learning and identifies the effective tail sample size as the governing quantity. The explicit pure-DP rates, small-δ extensions, and reduction to private SCO for the convex case are concrete strengths that would be useful for both theory and practice in private risk-sensitive learning.

major comments (1)

[Abstract, §3] Abstract and §3: the claim that lower bounds extend to approximate DP 'in the stated small-δ regimes' requires an explicit statement of the δ threshold (e.g., δ = o(1/(nτ)) or similar) under which the pure-DP lower bound carries over; without this the extension is not yet load-bearing for the approximate-DP case.

minor comments (2)

Notation: the paper uses both ε and ε n τ for the privacy parameter; a single consistent symbol for the effective privacy budget on the tail would improve readability.
[Abstract] The abstract states 'complete rates' for scalar and finite-class cases; adding a short table or corollary that lists the exact constants or log factors omitted by Θ notation would help readers verify the decomposition.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and the helpful suggestion regarding the approximate-DP extension. We address the comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract, §3] Abstract and §3: the claim that lower bounds extend to approximate DP 'in the stated small-δ regimes' requires an explicit statement of the δ threshold (e.g., δ = o(1/(nτ)) or similar) under which the pure-DP lower bound carries over; without this the extension is not yet load-bearing for the approximate-DP case.

Authors: We agree that the current phrasing leaves the precise δ threshold implicit. The manuscript states that the lower bounds extend to approximate DP 'in the stated small-δ regimes,' but does not spell out the concrete condition (such as δ = o(1/(nτ))). In the revision we will add an explicit statement of the required δ scaling, derived directly from the proof, so that the extension is fully rigorous and load-bearing. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents private CVaR excess risk as decomposing into ordinary tail-risk statistical error plus a privacy price, with effective sample size nτ for both terms. The stated rates for scalar estimation and finite hypothesis classes are derived from standard DP concentration and statistical learning bounds applied to the tail subsample; the convex Lipschitz case reduces modularly to private stochastic convex optimization with the CVaR-specific term inheriting the 1/(ε n τ) scaling. No equations define a quantity in terms of itself, no fitted parameters are relabeled as predictions, and no load-bearing steps rely on self-citations or imported uniqueness theorems. The argument is self-contained against external DP and learning-theory primitives under the explicitly stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claims rest on standard domain assumptions from differential privacy and convex optimization; no free parameters or new entities are introduced in the abstract.

axioms (3)

domain assumption Pure differential privacy
Complete rates stated to hold under pure DP.
domain assumption Small-δ regime for approximate DP lower bounds
Lower bounds extend to approximate DP only in stated small-δ regimes.
domain assumption Convex Lipschitz losses
Modular upper and lower reductions for the convex case.

pith-pipeline@v0.9.0 · 5748 in / 1480 out tokens · 87134 ms · 2026-05-20T20:31:19.542210+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Private CVaR excess risk decomposes into ordinary tail-risk statistical error and a privacy price; scalar estimation has rate Θ(B min{1,(nτ)^{-1/2}+(ε nτ)^{-1}})
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

sensitivity B min{1,1/(nτ)} for minimized empirical CVaR

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1]

Acerbi and D

C. Acerbi and D. Tasche. On the coherence of expected shortfall. Journal of Banking and Finance, 26(7):1487--1503, 2002

work page 2002
[2]

Artzner, F

P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath. Coherent measures of risk. Mathematical Finance, 9(3):203--228, 1999

work page 1999
[3]

P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, 33(4):1497--1537, 2005

work page 2005
[4]

Bassily, V

R. Bassily, V. Feldman, K. Talwar, and A. G. Thakurta. Private stochastic convex optimization with optimal rates. Advances in Neural Information Processing Systems, 2019

work page 2019
[5]

Bassily, A

R. Bassily, A. Smith, and A. G. Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. Proceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science, pages 464--473, 2014

work page 2014
[6]

Boucheron, G

S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013

work page 2013
[7]

D. B. Brown. Large deviations bounds for estimating conditional value-at-risk. Operations Research Letters, 35(6):722--730, 2007

work page 2007
[8]

Chaudhuri, C

K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069--1109, 2011

work page 2011
[9]

Chen and G

D. Chen and G. A. Chua. Differentially private stochastic convex optimization under a quantile loss function. Proceedings of the 40th International Conference on Machine Learning, PMLR 202:4435--4461, 2023

work page 2023
[10]

Duchi, T

J. Duchi, T. Hashimoto, and H. Namkoong. Distributionally robust losses for latent covariate mixtures. Operations Research, 71(2):649--664, 2023

work page 2023
[11]

Dwork, F

C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. Theory of Cryptography Conference, LNCS 3876:265--284, Springer, 2006

work page 2006
[12]

Dwork and A

C. Dwork and A. Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 9(3--4):211--407, 2014

work page 2014
[13]

Gillenwater, M

J. Gillenwater, M. Joseph, and A. Kulesza. Differentially private quantiles. Proceedings of the 38th International Conference on Machine Learning, PMLR 139:3713--3722, 2021

work page 2021
[14]

Hashimoto, M

T. Hashimoto, M. Srivastava, H. Namkoong, and P. Liang. Fairness without demographics in repeated loss minimization. Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1929--1938, 2018

work page 1929
[15]

McSherry and K

F. McSherry and K. Talwar. Mechanism design via differential privacy. Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, pages 94--103, 2007

work page 2007
[16]

Mhammedi, B

Z. Mhammedi, B. Guedj, and R. C. Williamson. PAC-Bayesian bound for the conditional value at risk. Advances in Neural Information Processing Systems, 2020

work page 2020
[17]

D. K. Mulumudi, P. Manupriya, G. Aminian, and A. Raj. On the generalization and robustness in conditional value-at-risk. arXiv preprint arXiv:2602.18053, 2026

work page arXiv 2026
[18]

R. T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2(3):21--41, 2000

work page 2000
[19]

R. T. Rockafellar and S. Uryasev. Conditional value-at-risk for general loss distributions. Journal of Banking and Finance, 26(7):1443--1471, 2002

work page 2002
[20]

Ruszczy\'nski and A

A. Ruszczy\'nski and A. Shapiro. Optimization of convex risk functions. Mathematics of Operations Research, 31(3):433--452, 2006

work page 2006
[21]

A. Shapiro. On Kusuoka representation of law invariant risk measures. Mathematics of Operations Research, 38(1):142--152, 2013

work page 2013
[22]

A. Smith. Privacy-preserving statistical estimation with optimal convergence rates. Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, pages 813--821, 2011

work page 2011
[23]

Takeda and T

A. Takeda and T. Kanamori. A robust approach based on conditional value-at-risk measure to statistical learning problems. European Journal of Operational Research, 198(1):287--296, 2009

work page 2009
[24]

P. S. Thomas and E. Learned-Miller. Concentration inequalities for conditional value at risk. Proceedings of the 36th International Conference on Machine Learning, PMLR 97:6225--6233, 2019

work page 2019
[25]

A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, 1996

work page 1996
[26]

D. Xu, M. Ding, Z. Ma, H. Xie, Y. Tao, A. Slaitane, and D. Wang. Differentially private non-convex distributionally robust optimization. arXiv preprint arXiv:2602.16155, 2026

work page arXiv 2026
[27]

Zhou and R

X. Zhou and R. Bassily. Differentially private worst-group risk minimization. Proceedings of the 41st International Conference on Machine Learning, PMLR 235:61783--61803, 2024

work page 2024

[1] [1]

Acerbi and D

C. Acerbi and D. Tasche. On the coherence of expected shortfall. Journal of Banking and Finance, 26(7):1487--1503, 2002

work page 2002

[2] [2]

Artzner, F

P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath. Coherent measures of risk. Mathematical Finance, 9(3):203--228, 1999

work page 1999

[3] [3]

P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, 33(4):1497--1537, 2005

work page 2005

[4] [4]

Bassily, V

R. Bassily, V. Feldman, K. Talwar, and A. G. Thakurta. Private stochastic convex optimization with optimal rates. Advances in Neural Information Processing Systems, 2019

work page 2019

[5] [5]

Bassily, A

R. Bassily, A. Smith, and A. G. Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. Proceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science, pages 464--473, 2014

work page 2014

[6] [6]

Boucheron, G

S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013

work page 2013

[7] [7]

D. B. Brown. Large deviations bounds for estimating conditional value-at-risk. Operations Research Letters, 35(6):722--730, 2007

work page 2007

[8] [8]

Chaudhuri, C

K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069--1109, 2011

work page 2011

[9] [9]

Chen and G

D. Chen and G. A. Chua. Differentially private stochastic convex optimization under a quantile loss function. Proceedings of the 40th International Conference on Machine Learning, PMLR 202:4435--4461, 2023

work page 2023

[10] [10]

Duchi, T

J. Duchi, T. Hashimoto, and H. Namkoong. Distributionally robust losses for latent covariate mixtures. Operations Research, 71(2):649--664, 2023

work page 2023

[11] [11]

Dwork, F

C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. Theory of Cryptography Conference, LNCS 3876:265--284, Springer, 2006

work page 2006

[12] [12]

Dwork and A

C. Dwork and A. Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 9(3--4):211--407, 2014

work page 2014

[13] [13]

Gillenwater, M

J. Gillenwater, M. Joseph, and A. Kulesza. Differentially private quantiles. Proceedings of the 38th International Conference on Machine Learning, PMLR 139:3713--3722, 2021

work page 2021

[14] [14]

Hashimoto, M

T. Hashimoto, M. Srivastava, H. Namkoong, and P. Liang. Fairness without demographics in repeated loss minimization. Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1929--1938, 2018

work page 1929

[15] [15]

McSherry and K

F. McSherry and K. Talwar. Mechanism design via differential privacy. Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, pages 94--103, 2007

work page 2007

[16] [16]

Mhammedi, B

Z. Mhammedi, B. Guedj, and R. C. Williamson. PAC-Bayesian bound for the conditional value at risk. Advances in Neural Information Processing Systems, 2020

work page 2020

[17] [17]

D. K. Mulumudi, P. Manupriya, G. Aminian, and A. Raj. On the generalization and robustness in conditional value-at-risk. arXiv preprint arXiv:2602.18053, 2026

work page arXiv 2026

[18] [18]

R. T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2(3):21--41, 2000

work page 2000

[19] [19]

R. T. Rockafellar and S. Uryasev. Conditional value-at-risk for general loss distributions. Journal of Banking and Finance, 26(7):1443--1471, 2002

work page 2002

[20] [20]

Ruszczy\'nski and A

A. Ruszczy\'nski and A. Shapiro. Optimization of convex risk functions. Mathematics of Operations Research, 31(3):433--452, 2006

work page 2006

[21] [21]

A. Shapiro. On Kusuoka representation of law invariant risk measures. Mathematics of Operations Research, 38(1):142--152, 2013

work page 2013

[22] [22]

A. Smith. Privacy-preserving statistical estimation with optimal convergence rates. Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, pages 813--821, 2011

work page 2011

[23] [23]

Takeda and T

A. Takeda and T. Kanamori. A robust approach based on conditional value-at-risk measure to statistical learning problems. European Journal of Operational Research, 198(1):287--296, 2009

work page 2009

[24] [24]

P. S. Thomas and E. Learned-Miller. Concentration inequalities for conditional value at risk. Proceedings of the 36th International Conference on Machine Learning, PMLR 97:6225--6233, 2019

work page 2019

[25] [25]

A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, 1996

work page 1996

[26] [26]

D. Xu, M. Ding, Z. Ma, H. Xie, Y. Tao, A. Slaitane, and D. Wang. Differentially private non-convex distributionally robust optimization. arXiv preprint arXiv:2602.16155, 2026

work page arXiv 2026

[27] [27]

Zhou and R

X. Zhou and R. Bassily. Differentially private worst-group risk minimization. Proceedings of the 41st International Conference on Machine Learning, PMLR 235:61783--61803, 2024

work page 2024