Optimal Gap-Dependent Regret for Private Stochastic Decision-Theoretic Online Learning
Pith reviewed 2026-06-29 13:21 UTC · model grok-4.3
The pith
A horizon-free pure-DP algorithm achieves gap-dependent regret bounded by 1000 times (log K over Delta_min plus log K over epsilon) for all time horizons T in stochastic decision-theoretic online learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We study stochastic decision-theoretic online learning with full information and event-level pure differential privacy. For K actions, losses in [0,1], and a unique best action separated from the second-best by gap Delta_min, the known lower bound is of order log K over min(Delta_min, epsilon). We give a horizon-free pure-DP algorithm and prove the explicit regret bound Reg_T <= 1000 * (log K / Delta_min + log K / epsilon) for every horizon T. The algorithm partitions time into blocks of exponentially increasing size, plays a single action throughout each block, and chooses the next action by an exponential mechanism applied to a data-independent random prefix of the previous block. The rand
What carries the argument
Block partitioning with exponentially growing sizes combined with an exponential mechanism on a data-independent random prefix of the prior block; the prefix turns each block's regret into a controllable sum of softmax errors whose total is bounded by an entropy potential.
If this is right
- The achieved regret is independent of the time horizon T.
- The privacy cost appears only as the additive term log K / epsilon and does not interact with the gap term.
- The same block structure yields a pure-DP guarantee while preserving the non-private gap-dependent rate up to the constant factor.
- The construction works for any sequence of loss vectors satisfying the unique-gap assumption.
Where Pith is reading between the lines
- The random-prefix technique may extend to other settings where one wants to decouple privacy noise from gap-dependent selection.
- If the constant 1000 can be reduced, the same algorithmic skeleton would immediately improve the explicit bound.
- The entropy-potential argument might apply to other full-information private online problems that admit a softmax error decomposition.
Load-bearing premise
The analysis assumes a unique best action separated from the second-best by a fixed gap Delta_min, losses bounded in [0,1], and that the random-prefix exponential mechanism converts block regret into a sum of softmax errors that an entropy potential can control.
What would settle it
An explicit instance with unique gap Delta_min where the algorithm's realized regret exceeds 1000 times (log K / Delta_min + log K / epsilon) on some run of length T, or a matching lower bound construction showing a strictly smaller universal constant is impossible.
read the original abstract
We study stochastic decision-theoretic online learning with full information and event-level pure differential privacy. A COLT open problem of Hu and Mehta asks to determine the optimal gap-dependent regret rate for stochastic decision-theoretic online learning under pure event-level differential privacy. For $K$ actions, losses in $[0,1]$, and a unique best action separated from the second-best action by gap $\Delta_{\min}$, the known lower bound is of order $ \frac{\log K}{\min\{\Delta_{\min},\varepsilon\}}, $ or equivalently, up to universal constants, of order \[ \frac{\log K}{\Delta_{\min}}+\frac{\log K}{\varepsilon}. \] We give a horizon-free pure-DP algorithm and prove the explicit regret bound \[ \operatorname{Reg}_T \le 1000 \cdot \left(\frac{\log K}{\Delta_{\min}}+\frac{\log K}{\varepsilon}\right) \] for every horizon $T$. The numerical constant is not optimized. The algorithm partitions time into blocks of exponentially increasing size, plays a single action throughout each block, and chooses the next action by an exponential mechanism applied to a data-independent random prefix of the previous block. The random prefix converts block regret into a sum, over all prefix lengths, of softmax selection errors. A single entropy-potential argument controls all privacy-dominated large-gap actions at cost $\log K/\varepsilon$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to resolve the COLT open problem of Hu and Mehta by giving a horizon-free pure event-level DP algorithm for stochastic decision-theoretic online learning. For K actions and losses in [0,1] with unique best action separated by gap Δ_min, it proves the explicit T-independent regret bound Reg_T ≤ 1000 ⋅ (log K / Δ_min + log K / ε) that matches the known lower bound up to the universal constant 1000. The algorithm partitions time into exponentially growing blocks, plays a single action per block, and selects the next action via the exponential mechanism applied to a data-independent random prefix of the prior block; a single entropy-potential argument is asserted to bound the total contribution of privacy-dominated actions by O(log K / ε).
Significance. If the central analysis holds, the result is significant: it supplies the first explicit horizon-free pure-DP algorithm achieving the optimal gap-dependent rate, directly answering the open question. The block-partitioning construction together with the random-prefix reduction and entropy-potential argument are technically notable for eliminating all log T factors while preserving pure DP.
major comments (1)
- [Abstract] Abstract (final paragraph describing the algorithm): the claim that 'a single entropy-potential argument controls all privacy-dominated large-gap actions at cost log K/ε' is load-bearing for the T-independent bound. With Θ(log T) blocks of exponentially increasing size, the random-prefix construction produces a sum over an increasing number of prefix lengths; without an explicit definition of the potential function, its update rule across blocks, and the telescoping argument that prevents accumulation of log T factors, it is impossible to verify that the total privacy cost remains O(log K / ε) rather than O(log T ⋅ log K / ε).
minor comments (1)
- [Abstract] The constant 1000 is explicitly stated as unoptimized; a short remark on whether the analysis yields a concrete (even if large) improvement path would aid readers.
Simulated Author's Rebuttal
We thank the referee for their thorough review and positive assessment of the significance of our work. We address the major comment regarding the entropy-potential argument below. We agree that additional details are needed for clarity and will revise the manuscript to include them.
read point-by-point responses
-
Referee: [Abstract] Abstract (final paragraph describing the algorithm): the claim that 'a single entropy-potential argument controls all privacy-dominated large-gap actions at cost log K/ε' is load-bearing for the T-independent bound. With Θ(log T) blocks of exponentially increasing size, the random-prefix construction produces a sum over an increasing number of prefix lengths; without an explicit definition of the potential function, its update rule across blocks, and the telescoping argument that prevents accumulation of log T factors, it is impossible to verify that the total privacy cost remains O(log K / ε) rather than O(log T ⋅ log K / ε).
Authors: We acknowledge that the abstract's description is high-level and that the full details of the entropy-potential argument are crucial for verifying the bound. In the revised version of the paper, we will expand the relevant section to explicitly define the entropy potential function, describe its update rule after each block, and provide the telescoping argument showing that the total contribution from privacy-dominated actions sums to O(log K / ε) independent of T. The argument relies on the fact that the random prefix ensures the selection error is averaged over prefixes, and the potential decreases sufficiently to offset the privacy cost without accumulation over the log T blocks due to the exponential block sizes and the uniform control over large-gap actions. revision: yes
Circularity Check
Block partitioning and single entropy-potential argument yield horizon-free bound with no self-referential reduction
full rationale
The paper derives the stated regret bound from an explicit block-partitioning construction (exponentially growing blocks) combined with an entropy-potential argument that is asserted to telescope the summed softmax errors over all prefix lengths and all blocks at total cost O(log K / ε). No equations are presented in which a fitted parameter is renamed as a prediction, a quantity is defined in terms of itself, or a load-bearing step reduces to a self-citation. The central claim therefore remains independent of its own outputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- 1000 =
1000
axioms (2)
- standard math Pure event-level differential privacy definition and properties of the exponential mechanism
- domain assumption Existence of a unique best action with fixed gap Δ_min and losses in [0,1]
Reference graph
Works this paper leans on
-
[1]
St´ ephane Boucheron, G´ abor Lugosi, and Pascal Massart.Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 02 2013. ISBN 9780199535255. doi: 10.1093/acprof:oso/9780199535255.001.0001. URLhttps://doi.org/ 10.1093/acprof:oso/9780199535255.001.0001
work page doi:10.1093/acprof:oso/9780199535255.001.0001 2013
-
[2]
Differential privacy
Cynthia Dwork. Differential privacy. automata, languages and programming, 2006
2006
-
[3]
A decision-theoretic generalization of on-line learning and an application to boosting.Journal of computer and system sciences, 55(1):119–139, 1997
Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting.Journal of computer and system sciences, 55(1):119–139, 1997
1997
-
[4]
Probability inequalities for sums of bounded random variables.Journal of the American statistical association, 58(301):13–30, 1963
Wassily Hoeffding. Probability inequalities for sums of bounded random variables.Journal of the American statistical association, 58(301):13–30, 1963
1963
-
[5]
Bingshan Hu and Nishant A. Mehta. Open problem: Optimal rates for stochastic decision- theoretic online learning under differentially privacy. In Shipra Agrawal and Aaron Roth, editors,Proceedings of Thirty Seventh Conference on Learning Theory, volume 247 ofPro- ceedings of Machine Learning Research, pages 5330–5334. PMLR, 30 Jun–03 Jul 2024. URL https:/...
2024
-
[6]
Bingshan Hu, Zhiming Huang, Nishant A. Mehta, and Nidhi Hegde. Near-optimal algo- rithms for differentially private online learning in a stochastic environment.arXiv preprint arXiv:2102.07929, 2021
-
[7]
On minimaxity of follow the leader strategy in the stochastic setting
Wojciech Kot lowski. On minimaxity of follow the leader strategy in the stochastic setting. Theoretical Computer Science, 742:50–65, 2018
2018
-
[8]
Mechanism design via differential privacy
Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pages 94–103. IEEE, 2007
2007
-
[9]
On the optimality of the hedge algorithm in the stochastic regime.Journal of Machine Learning Research, 20(83):1–28, 2019
Jaouad Mourtada and St´ ephane Ga¨ ıffas. On the optimality of the hedge algorithm in the stochastic regime.Journal of Machine Learning Research, 20(83):1–28, 2019
2019
-
[10]
Improved regret in stochastic decision-theoretic online learning under differential privacy
Ruihan Wu and Yu-Xiang Wang. Improved regret in stochastic decision-theoretic online learning under differential privacy. In Matus Telgarsky and Jonathan Ullman, editors,Pro- ceedings of The 37th International Conference on Algorithmic Learning Theory, volume 313 ofProceedings of Machine Learning Research, pages 1–22. PMLR, 23–26 Feb 2026. URL https://pro...
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.