pith. sign in

arxiv: 2602.00417 · v2 · submitted 2026-01-31 · 📊 stat.ML · cs.LG

Shuffle and Joint Differential Privacy for Generalized Linear Contextual Bandits

Pith reviewed 2026-05-16 09:27 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords contextual banditsgeneralized linear modelsdifferential privacyshuffle privacyjoint privacyregret analysisprivate optimization
0
0 comments X

The pith

The first algorithms for generalized linear contextual bandits achieve near-optimal regret under shuffle and joint differential privacy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops the initial methods that let generalized linear contextual bandits run while satisfying shuffle differential privacy for stochastic contexts and joint differential privacy for adversarial contexts. Earlier private bandit work stayed inside linear reward models because they allow closed-form estimators, but GLMs require private convex optimization whose error must be tracked through changing design matrices and folded into regret bounds. The resulting algorithms recover regret rates that differ from the non-private benchmark by only a sqrt(d/epsilon) factor in the stochastic case and by an additive privacy term in the adversarial case, all while needing nothing beyond l2-bounded contexts and no spectral assumptions on the data distribution.

Core claim

The paper constructs a shuffle-DP algorithm for stochastic contexts whose dominant regret term is O(d^{3/2} sqrt(T log T)/sqrt(epsilon)) and a joint-DP algorithm for adversarial contexts whose regret is O(d sqrt(T) log T + d^{3/4} sqrt(T/epsilon) (log T) (d + log T)^{1/4}). Both match the leading non-private rate up to the stated privacy-dependent factors and require only that the GLM link function admit convex optimization whose error can be bounded and incorporated into the regret analysis.

What carries the argument

Private convex optimization routines that produce bounded-error estimators for the GLM parameter while preserving differential privacy across a sequence of evolving design matrices, with the optimization error explicitly propagated into the regret bound.

If this is right

  • The privacy cost appears as a multiplicative sqrt(d/epsilon) factor for stochastic contexts and only an additive term for adversarial contexts.
  • No spectral assumptions on the context distribution are required beyond l2 boundedness.
  • The same convex-optimization-plus-regret-tracking technique extends prior linear private bandit results to the GLM setting.
  • The algorithms remain valid for any GLM link function that supports bounded-error private convex optimization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same error-tracking approach could be applied to other online convex optimization problems that lack closed-form solutions.
  • The regret expressions suggest that privacy overhead shrinks relative to the non-private term as T grows large in the adversarial setting.
  • The methods might be instantiated with any off-the-shelf private convex solver whose utility guarantees can be stated in terms of the GLM loss.

Load-bearing premise

The GLM link function admits convex optimization whose error can be bounded and incorporated into the regret analysis when contexts are merely l2-bounded.

What would settle it

A concrete counterexample or numerical instance in which the private optimization error grows too fast to be absorbed into the stated regret bounds under only l2-bounded contexts.

Figures

Figures reproduced from arXiv: 2602.00417 by Sahasrajit Sarmasarkar.

Figure 1
Figure 1. Figure 1: Cumulative regret under different generalized linear models. Private-GLM variants are plotted with distinct [PITH_FULL_IMAGE:figures/full_fig_p015_1.png] view at source ↗
read the original abstract

We present the first algorithms for generalized linear contextual bandits under shuffle differential privacy and joint differential privacy. While prior work on private contextual bandits has been restricted to linear reward models -- which admit closed-form estimators -- generalized linear models (GLMs) pose fundamental new challenges: no closed-form estimator exists, requiring private convex optimization; privacy must be tracked across multiple evolving design matrices; and optimization error must be explicitly incorporated into regret analysis. We address these challenges under two privacy models and context settings. For stochastic contexts, we design a shuffle-DP algorithm achieving $\tilde{O}(d^{3/2}\sqrt{T \log T}/\sqrt{\varepsilon})$ regret in dominant term, differing from the non-private rate by a factor of $\sqrt{d/\varepsilon}$. For adversarial contexts, we provide a joint-DP algorithm with regret $\tilde{O}\!\big(d\sqrt{T} \log T + d^{3/4}\sqrt{T/\varepsilon}\,(\log T)\,(d + \log T)^{1/4}\big)$ -- matching the non-private rate $\tilde{O}(d\sqrt{T} \log T)$ in the leading term, with privacy contributing only an additive correction. Unlike prior work on locally private GLM bandits, our methods require no spectral assumptions on the context distribution beyond $\ell_2$ boundedness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper presents the first algorithms for generalized linear contextual bandits under shuffle differential privacy (stochastic contexts) and joint differential privacy (adversarial contexts). For stochastic contexts it achieves regret Õ(d^{3/2} √(T log T)/√ε), differing from the non-private rate by √(d/ε). For adversarial contexts it achieves Õ(d √(T log T) + d^{3/4} √(T/ε) (log T) (d + log T)^{1/4}), matching the non-private leading term with only an additive privacy correction. The analysis relies on ℓ₂-bounded contexts, standard GLM link assumptions (monotonicity and Lipschitz continuity), and explicit incorporation of private convex optimization error into the regret decomposition without additional spectral assumptions.

Significance. If the stated regret bounds hold, the work meaningfully extends private contextual bandits from linear to generalized linear models, addressing the lack of closed-form estimators and the need to track privacy across evolving design matrices. The explicit folding of optimization error into the regret analysis and the near-matching of non-private rates (especially the leading term for adversarial contexts) are technically valuable. The relaxation to only ℓ₂ boundedness rather than spectral assumptions broadens applicability. The derivations appear to use standard GLM regret decomposition techniques augmented with privacy mechanisms.

minor comments (3)
  1. [Abstract] Abstract: the regret expressions are stated without a one-sentence pointer to the proof strategy (e.g., “by folding private convex optimization error into the standard GLM decomposition”); adding this would improve readability for readers who stop at the abstract.
  2. [§3] Notation: the evolving design matrix is referenced in multiple places; a single consolidated definition (perhaps in §3) with explicit dependence on the privacy mechanism would reduce cross-referencing.
  3. [Table 1] Table 1 (comparison with prior work): the row for “this work” lists the two regret bounds but omits the precise privacy parameter dependence; aligning the table columns with the theorem statements would help.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation of minor revision. The summary correctly identifies the core contributions: the first shuffle-DP and joint-DP algorithms for GLM contextual bandits, the regret bounds, and the relaxation to only ℓ₂-bounded contexts without spectral assumptions. No specific major comments appear in the report, so we provide no point-by-point rebuttals below.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces new algorithms for GLM contextual bandits under shuffle and joint differential privacy. Regret bounds are obtained by explicitly incorporating the error term from private convex optimization into the standard GLM regret decomposition, relying only on ℓ₂-bounded contexts and monotonicity/Lipschitz assumptions on the link function. No equations reduce a claimed prediction or uniqueness result to a fitted parameter or self-citation by construction. The central claims rest on independent algorithmic constructions and standard analysis techniques rather than self-referential definitions or imported uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Relies on standard differential privacy definitions and GLM convexity assumptions from prior literature; no new free parameters or invented entities are introduced.

axioms (1)
  • domain assumption GLM link function permits convex optimization with bounded error
    Required to incorporate optimization error into regret analysis.

pith-pipeline@v0.9.0 · 5535 in / 912 out tokens · 28573 ms · 2026-05-16T09:27:35.509518+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 1 internal anchor

  1. [1]

    What Doubling Tricks Can and Can't Do for Multi-Armed Bandits

    PMLR, 30 Jun–03 Jul 2024b. URL https: //proceedings.mlr.press/v247/azize24a.html. Borja Balle, Gilles Barthe, and Marco Gaboardi. Pri- vacy amplification by subsampling: tight analyses via 9 couplings and divergences. InProceedings of the 32nd International Conference on Neural Information Pro- cessing Systems, NIPS’18, page 6280–6290, Red Hook, NY, USA, ...

  2. [2]

    Concentrated Differential Privacy: Simplifica- tions, Extensions, and Lower Bounds,

    Springer-Verlag. ISBN 9783662536407. doi: 10.1007/978-3-662-53641-4 24. URL https://doi. org/10.1007/978-3-662-53641-4_24. T.-H. Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics.ACM Trans. Inf. Syst. Secur., 14(3), November 2011. ISSN 1094-9224. doi: 10.1145/2043621.2043626. URL https://doi. org/10.1145/2043621.2043626. ...

  3. [3]

    Korhonen, A single-exponential time 2-approximation algorithm for treewidth, in: IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS 2021), 2022, pp

    doi: 10.1109/FOCS52979.2021.00096. Sarah Filippi, Olivier Cappe, Aur´ elien Garivier, and Csaba Szepesv´ ari. Parametric bandits: The generalized linear case. In J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors,Advances in Neural Information Processing Systems, volume 23. Curran Associates, Inc., 2010. URL https://proceedings. ...

  4. [4]

    Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Ma- nurangsi, Rasmus Pagh, and Ameya Velingker

    URL https://proceedings.mlr.press/v167/ garcelon22a.html. Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Ma- nurangsi, Rasmus Pagh, and Ameya Velingker. Pure differentially private summation from anonymous mes- sages. In11th Innovations in Theoretical Computer Science Conference (ITC 2020), volume 151 ofLIPIcs, pages 15:1–15:23. Schloss Dagstuhl–Leibniz-Ze...

  5. [5]

    URL https://www.aeaweb.org/articles?id= 10.1257/aer.104.5.431. J. Kiefer and J. Wolfowitz. The equivalence of two ex- tremum problems.Canadian Journal of Mathematics, 12:363–366, 1960. doi: 10.4153/CJM-1960-030-4. Jiachun Li, David Simchi-Levi, and Yining Wang. On the optimal regret of locally private linear contextual bandit.arXiv preprint arXiv:2404.094...

  6. [6]

    Schapire

    Association for Computing Machinery. ISBN 9781605587998. doi: 10.1145/1772690.1772758. URL https://doi.org/10.1145/1772690.1772758. Lihong Li, Yu Lu, and Dengyong Zhou. Provably optimal algorithms for generalized linear contextual bandits. In Doina Precup and Yee Whye Teh, editors,Proceedings of the 34th International Conference on Machine Learn- ing, vol...

  7. [7]

    doi: 10.1109/CSF.2017

    IEEE Computer Society. doi: 10.1109/CSF.2017

  8. [8]

    URLhttp://dx.doi.org/10.1109/ CSF.2017.11

    URL https://doi.ieeecomputersociety.org/ 10.1109/CSF.2017.11. Ryan Rogers, Aaron Roth, Adam Smith, and Om Thakkar. Max-Information, Differential Pri- vacy, and Post-selection Hypothesis Testing . In2016 IEEE 57th Annual Symposium on Foundations of Com- puter Science (FOCS), pages 487–494, Los Alami- tos, CA, USA, October 2016. IEEE Computer Soci- ety. doi...

  9. [9]

    ISBN 9781450380539

    Association for Computing Machinery. ISBN 9781450380539. doi: 10.1145/3406325.3451004. URL https://doi.org/10.1145/3406325.3451004. Ayush Sawarni, Nirjhar Das, Siddharth Barman, and Gaurav Sinha. Generalized linear bandits with limited adaptivity. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neu...

  10. [10]

    Dynamic transitive closure via dynamic matrix inverse

    IEEE Computer Society. doi: 10.1109/FOCS. 2010.12. URL https://doi.ieeecomputersociety. org/10.1109/FOCS.2010.12. Yogatheesan Varatharajah and Brent Berry. A contextual- bandit-based approach for informed decision-making in clinical trials.Life, 12(8), 2022. ISSN 2075-1729. doi: 10.3390/life12081277. URL https://www.mdpi. com/2075-1729/12/8/1277. Roman Ve...

  11. [11]

    doi: 10.1017/ 9781108231596

    ISBN 978-1-108-41519-4. doi: 10.1017/ 9781108231596. URL https://www.cambridge. org/core/books/highdimensional-probability/ 797C466DA29743D2C8213493BD2D2102. Justin Whitehouse, Aaditya Ramdas, Ryan Rogers, and Zhiwei Steven Wu. Fully-adaptive composition in differ- ential privacy. InProceedings of the 40th International Conference on Machine Learning, ICM...

  12. [12]

    R p dΛT + RS d1/4Λ1/2 T Λ1/2 δ d+ Λ T 1/4 p εmin(κ ∗,1) # (51) γ=C

    Set the number of iteration T = ε2s2 dlog(sd/δ) and parameter η = 2S L √ T . Let matrix H∗ =Pt j=1 ˙µ(⟨xj, θ∗⟩)xjxT j + N where λminI⪯ N ⪯λ maxI with probability at least 1− 1 T 6 Then, ifλ min ≥20dRlogTwith probability at least1− 3 T 2 , the following inequalities hold ||θ∗ − ˆθ||H∗ ≤24RS p logT+d+ R(logT+d)√λmin + 2S p λmax + √ 4RSν(31) To prove this le...

  13. [13]

    Let N denote the set of all values computed in the binary tree

    Let nt denote the set of nodes that were updated at time step t and Nt denote their corresponding values. Let N denote the set of all values computed in the binary tree. We use a−i to denote all the coordinates of a vector a except the ith coordinate. Similarly, a≤i and a<i denote the coordinates of the first i and i− 1 coordinates of the vector a respect...