Shuffle and Joint Differential Privacy for Generalized Linear Contextual Bandits

Sahasrajit Sarmasarkar

arxiv: 2602.00417 · v2 · submitted 2026-01-31 · 📊 stat.ML · cs.LG

Shuffle and Joint Differential Privacy for Generalized Linear Contextual Bandits

Sahasrajit Sarmasarkar This is my paper

Pith reviewed 2026-05-16 09:27 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords contextual banditsgeneralized linear modelsdifferential privacyshuffle privacyjoint privacyregret analysisprivate optimization

0 comments

The pith

The first algorithms for generalized linear contextual bandits achieve near-optimal regret under shuffle and joint differential privacy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops the initial methods that let generalized linear contextual bandits run while satisfying shuffle differential privacy for stochastic contexts and joint differential privacy for adversarial contexts. Earlier private bandit work stayed inside linear reward models because they allow closed-form estimators, but GLMs require private convex optimization whose error must be tracked through changing design matrices and folded into regret bounds. The resulting algorithms recover regret rates that differ from the non-private benchmark by only a sqrt(d/epsilon) factor in the stochastic case and by an additive privacy term in the adversarial case, all while needing nothing beyond l2-bounded contexts and no spectral assumptions on the data distribution.

Core claim

The paper constructs a shuffle-DP algorithm for stochastic contexts whose dominant regret term is O(d^{3/2} sqrt(T log T)/sqrt(epsilon)) and a joint-DP algorithm for adversarial contexts whose regret is O(d sqrt(T) log T + d^{3/4} sqrt(T/epsilon) (log T) (d + log T)^{1/4}). Both match the leading non-private rate up to the stated privacy-dependent factors and require only that the GLM link function admit convex optimization whose error can be bounded and incorporated into the regret analysis.

What carries the argument

Private convex optimization routines that produce bounded-error estimators for the GLM parameter while preserving differential privacy across a sequence of evolving design matrices, with the optimization error explicitly propagated into the regret bound.

If this is right

The privacy cost appears as a multiplicative sqrt(d/epsilon) factor for stochastic contexts and only an additive term for adversarial contexts.
No spectral assumptions on the context distribution are required beyond l2 boundedness.
The same convex-optimization-plus-regret-tracking technique extends prior linear private bandit results to the GLM setting.
The algorithms remain valid for any GLM link function that supports bounded-error private convex optimization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same error-tracking approach could be applied to other online convex optimization problems that lack closed-form solutions.
The regret expressions suggest that privacy overhead shrinks relative to the non-private term as T grows large in the adversarial setting.
The methods might be instantiated with any off-the-shelf private convex solver whose utility guarantees can be stated in terms of the GLM loss.

Load-bearing premise

The GLM link function admits convex optimization whose error can be bounded and incorporated into the regret analysis when contexts are merely l2-bounded.

What would settle it

A concrete counterexample or numerical instance in which the private optimization error grows too fast to be absorbed into the stated regret bounds under only l2-bounded contexts.

Figures

Figures reproduced from arXiv: 2602.00417 by Sahasrajit Sarmasarkar.

**Figure 1.** Figure 1: Cumulative regret under different generalized linear models. Private-GLM variants are plotted with distinct [PITH_FULL_IMAGE:figures/full_fig_p015_1.png] view at source ↗

read the original abstract

We present the first algorithms for generalized linear contextual bandits under shuffle differential privacy and joint differential privacy. While prior work on private contextual bandits has been restricted to linear reward models -- which admit closed-form estimators -- generalized linear models (GLMs) pose fundamental new challenges: no closed-form estimator exists, requiring private convex optimization; privacy must be tracked across multiple evolving design matrices; and optimization error must be explicitly incorporated into regret analysis. We address these challenges under two privacy models and context settings. For stochastic contexts, we design a shuffle-DP algorithm achieving $\tilde{O}(d^{3/2}\sqrt{T \log T}/\sqrt{\varepsilon})$ regret in dominant term, differing from the non-private rate by a factor of $\sqrt{d/\varepsilon}$. For adversarial contexts, we provide a joint-DP algorithm with regret $\tilde{O}\!\big(d\sqrt{T} \log T + d^{3/4}\sqrt{T/\varepsilon}\,(\log T)\,(d + \log T)^{1/4}\big)$ -- matching the non-private rate $\tilde{O}(d\sqrt{T} \log T)$ in the leading term, with privacy contributing only an additive correction. Unlike prior work on locally private GLM bandits, our methods require no spectral assumptions on the context distribution beyond $\ell_2$ boundedness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives the first shuffle-DP and joint-DP algorithms for GLM contextual bandits, with regret bounds that fold in optimization error and track privacy over multiple design matrices.

read the letter

The punchline is that the work extends private contextual bandits beyond linear models to GLMs. It supplies explicit algorithms for both shuffle DP on stochastic contexts and joint DP on adversarial contexts, and the regret expressions match the non-private rates in the leading terms for the adversarial case while adding only a controlled sqrt(d/eps) factor for the stochastic one. The analysis uses only l2-bounded contexts plus standard monotonicity and Lipschitz assumptions on the link function, without extra spectral requirements on the context distribution. That is a genuine advance over prior linear-only private work and locally private GLM results. The paper handles the three stated challenges directly: private convex optimization replaces the closed-form estimator, privacy is tracked across evolving design matrices via the shuffle and joint mechanisms, and optimization error is folded into the regret decomposition. The stress-test note confirms the full manuscript carries out these derivations without hidden gaps. The only soft spot worth noting is that the stochastic regret still carries the extra sqrt(d) factor from privacy; this looks like the natural price of the model rather than a flaw, but it is the term that future tightening would target. No circularity appears in the regret expressions, and the citation pattern properly positions the result against existing linear and local-DP baselines. This is the kind of paper I would bring to a reading group on private online learning. It is worth citing if you work in that corner of stat.ML. A serious editor should send it to peer review because the technical pieces are grounded and the claims are verifiable from the derivations provided.

Referee Report

0 major / 3 minor

Summary. The paper presents the first algorithms for generalized linear contextual bandits under shuffle differential privacy (stochastic contexts) and joint differential privacy (adversarial contexts). For stochastic contexts it achieves regret Õ(d^{3/2} √(T log T)/√ε), differing from the non-private rate by √(d/ε). For adversarial contexts it achieves Õ(d √(T log T) + d^{3/4} √(T/ε) (log T) (d + log T)^{1/4}), matching the non-private leading term with only an additive privacy correction. The analysis relies on ℓ₂-bounded contexts, standard GLM link assumptions (monotonicity and Lipschitz continuity), and explicit incorporation of private convex optimization error into the regret decomposition without additional spectral assumptions.

Significance. If the stated regret bounds hold, the work meaningfully extends private contextual bandits from linear to generalized linear models, addressing the lack of closed-form estimators and the need to track privacy across evolving design matrices. The explicit folding of optimization error into the regret analysis and the near-matching of non-private rates (especially the leading term for adversarial contexts) are technically valuable. The relaxation to only ℓ₂ boundedness rather than spectral assumptions broadens applicability. The derivations appear to use standard GLM regret decomposition techniques augmented with privacy mechanisms.

minor comments (3)

[Abstract] Abstract: the regret expressions are stated without a one-sentence pointer to the proof strategy (e.g., “by folding private convex optimization error into the standard GLM decomposition”); adding this would improve readability for readers who stop at the abstract.
[§3] Notation: the evolving design matrix is referenced in multiple places; a single consolidated definition (perhaps in §3) with explicit dependence on the privacy mechanism would reduce cross-referencing.
[Table 1] Table 1 (comparison with prior work): the row for “this work” lists the two regret bounds but omits the precise privacy parameter dependence; aligning the table columns with the theorem statements would help.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation of minor revision. The summary correctly identifies the core contributions: the first shuffle-DP and joint-DP algorithms for GLM contextual bandits, the regret bounds, and the relaxation to only ℓ₂-bounded contexts without spectral assumptions. No specific major comments appear in the report, so we provide no point-by-point rebuttals below.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces new algorithms for GLM contextual bandits under shuffle and joint differential privacy. Regret bounds are obtained by explicitly incorporating the error term from private convex optimization into the standard GLM regret decomposition, relying only on ℓ₂-bounded contexts and monotonicity/Lipschitz assumptions on the link function. No equations reduce a claimed prediction or uniqueness result to a fitted parameter or self-citation by construction. The central claims rest on independent algorithmic constructions and standard analysis techniques rather than self-referential definitions or imported uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Relies on standard differential privacy definitions and GLM convexity assumptions from prior literature; no new free parameters or invented entities are introduced.

axioms (1)

domain assumption GLM link function permits convex optimization with bounded error
Required to incorporate optimization error into regret analysis.

pith-pipeline@v0.9.0 · 5535 in / 912 out tokens · 28573 ms · 2026-05-16T09:27:35.509518+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 1 internal anchor

[1]

What Doubling Tricks Can and Can't Do for Multi-Armed Bandits

PMLR, 30 Jun–03 Jul 2024b. URL https: //proceedings.mlr.press/v247/azize24a.html. Borja Balle, Gilles Barthe, and Marco Gaboardi. Pri- vacy amplification by subsampling: tight analyses via 9 couplings and divergences. InProceedings of the 32nd International Conference on Neural Information Pro- cessing Systems, NIPS’18, page 6280–6290, Red Hook, NY, USA, ...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1145/3372297 2018
[2]

Concentrated Differential Privacy: Simplifica- tions, Extensions, and Lower Bounds,

Springer-Verlag. ISBN 9783662536407. doi: 10.1007/978-3-662-53641-4 24. URL https://doi. org/10.1007/978-3-662-53641-4_24. T.-H. Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics.ACM Trans. Inf. Syst. Secur., 14(3), November 2011. ISSN 1094-9224. doi: 10.1145/2043621.2043626. URL https://doi. org/10.1145/2043621.2043626. ...

work page doi:10.1007/978-3-662-53641-4 2011
[3]

Korhonen, A single-exponential time 2-approximation algorithm for treewidth, in: IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS 2021), 2022, pp

doi: 10.1109/FOCS52979.2021.00096. Sarah Filippi, Olivier Cappe, Aur´ elien Garivier, and Csaba Szepesv´ ari. Parametric bandits: The generalized linear case. In J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors,Advances in Neural Information Processing Systems, volume 23. Curran Associates, Inc., 2010. URL https://proceedings. ...

work page doi:10.1109/focs52979.2021.00096 2021
[4]

Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Ma- nurangsi, Rasmus Pagh, and Ameya Velingker

URL https://proceedings.mlr.press/v167/ garcelon22a.html. Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Ma- nurangsi, Rasmus Pagh, and Ameya Velingker. Pure differentially private summation from anonymous mes- sages. In11th Innovations in Theoretical Computer Science Conference (ITC 2020), volume 151 ofLIPIcs, pages 15:1–15:23. Schloss Dagstuhl–Leibniz-Ze...

work page doi:10.4230/lipics.itc.2020.15 2020
[5]

URL https://www.aeaweb.org/articles?id= 10.1257/aer.104.5.431. J. Kiefer and J. Wolfowitz. The equivalence of two ex- tremum problems.Canadian Journal of Mathematics, 12:363–366, 1960. doi: 10.4153/CJM-1960-030-4. Jiachun Li, David Simchi-Levi, and Yining Wang. On the optimal regret of locally private linear contextual bandit.arXiv preprint arXiv:2404.094...

work page doi:10.1257/aer.104.5.431 1960
[6]

Schapire

Association for Computing Machinery. ISBN 9781605587998. doi: 10.1145/1772690.1772758. URL https://doi.org/10.1145/1772690.1772758. Lihong Li, Yu Lu, and Dengyong Zhou. Provably optimal algorithms for generalized linear contextual bandits. In Doina Precup and Yee Whye Teh, editors,Proceedings of the 34th International Conference on Machine Learn- ing, vol...

work page doi:10.1145/1772690.1772758 2071
[7]

doi: 10.1109/CSF.2017

IEEE Computer Society. doi: 10.1109/CSF.2017

work page doi:10.1109/csf.2017 2017
[8]

URLhttp://dx.doi.org/10.1109/ CSF.2017.11

URL https://doi.ieeecomputersociety.org/ 10.1109/CSF.2017.11. Ryan Rogers, Aaron Roth, Adam Smith, and Om Thakkar. Max-Information, Differential Pri- vacy, and Post-selection Hypothesis Testing . In2016 IEEE 57th Annual Symposium on Foundations of Com- puter Science (FOCS), pages 487–494, Los Alami- tos, CA, USA, October 2016. IEEE Computer Soci- ety. doi...

work page doi:10.1109/csf.2017.11 2017
[9]

ISBN 9781450380539

Association for Computing Machinery. ISBN 9781450380539. doi: 10.1145/3406325.3451004. URL https://doi.org/10.1145/3406325.3451004. Ayush Sawarni, Nirjhar Das, Siddharth Barman, and Gaurav Sinha. Generalized linear bandits with limited adaptivity. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neu...

work page doi:10.1145/3406325.3451004 2024
[10]

Dynamic transitive closure via dynamic matrix inverse

IEEE Computer Society. doi: 10.1109/FOCS. 2010.12. URL https://doi.ieeecomputersociety. org/10.1109/FOCS.2010.12. Yogatheesan Varatharajah and Brent Berry. A contextual- bandit-based approach for informed decision-making in clinical trials.Life, 12(8), 2022. ISSN 2075-1729. doi: 10.3390/life12081277. URL https://www.mdpi. com/2075-1729/12/8/1277. Roman Ve...

work page doi:10.1109/focs 2010
[11]

doi: 10.1017/ 9781108231596

ISBN 978-1-108-41519-4. doi: 10.1017/ 9781108231596. URL https://www.cambridge. org/core/books/highdimensional-probability/ 797C466DA29743D2C8213493BD2D2102. Justin Whitehouse, Aaditya Ramdas, Ryan Rogers, and Zhiwei Steven Wu. Fully-adaptive composition in differ- ential privacy. InProceedings of the 40th International Conference on Machine Learning, ICM...

work page 2023
[12]

R p dΛT + RS d1/4Λ1/2 T Λ1/2 δ d+ Λ T 1/4 p εmin(κ ∗,1) # (51) γ=C

Set the number of iteration T = ε2s2 dlog(sd/δ) and parameter η = 2S L √ T . Let matrix H∗ =Pt j=1 ˙µ(⟨xj, θ∗⟩)xjxT j + N where λminI⪯ N ⪯λ maxI with probability at least 1− 1 T 6 Then, ifλ min ≥20dRlogTwith probability at least1− 3 T 2 , the following inequalities hold ||θ∗ − ˆθ||H∗ ≤24RS p logT+d+ R(logT+d)√λmin + 2S p λmax + √ 4RSν(31) To prove this le...

work page 2024
[13]

Let N denote the set of all values computed in the binary tree

Let nt denote the set of nodes that were updated at time step t and Nt denote their corresponding values. Let N denote the set of all values computed in the binary tree. We use a−i to denote all the coordinates of a vector a except the ith coordinate. Similarly, a≤i and a<i denote the coordinates of the first i and i− 1 coordinates of the vector a respect...

work page 2014

[1] [1]

What Doubling Tricks Can and Can't Do for Multi-Armed Bandits

PMLR, 30 Jun–03 Jul 2024b. URL https: //proceedings.mlr.press/v247/azize24a.html. Borja Balle, Gilles Barthe, and Marco Gaboardi. Pri- vacy amplification by subsampling: tight analyses via 9 couplings and divergences. InProceedings of the 32nd International Conference on Neural Information Pro- cessing Systems, NIPS’18, page 6280–6290, Red Hook, NY, USA, ...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1145/3372297 2018

[2] [2]

Concentrated Differential Privacy: Simplifica- tions, Extensions, and Lower Bounds,

Springer-Verlag. ISBN 9783662536407. doi: 10.1007/978-3-662-53641-4 24. URL https://doi. org/10.1007/978-3-662-53641-4_24. T.-H. Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics.ACM Trans. Inf. Syst. Secur., 14(3), November 2011. ISSN 1094-9224. doi: 10.1145/2043621.2043626. URL https://doi. org/10.1145/2043621.2043626. ...

work page doi:10.1007/978-3-662-53641-4 2011

[3] [3]

Korhonen, A single-exponential time 2-approximation algorithm for treewidth, in: IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS 2021), 2022, pp

doi: 10.1109/FOCS52979.2021.00096. Sarah Filippi, Olivier Cappe, Aur´ elien Garivier, and Csaba Szepesv´ ari. Parametric bandits: The generalized linear case. In J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors,Advances in Neural Information Processing Systems, volume 23. Curran Associates, Inc., 2010. URL https://proceedings. ...

work page doi:10.1109/focs52979.2021.00096 2021

[4] [4]

Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Ma- nurangsi, Rasmus Pagh, and Ameya Velingker

URL https://proceedings.mlr.press/v167/ garcelon22a.html. Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Ma- nurangsi, Rasmus Pagh, and Ameya Velingker. Pure differentially private summation from anonymous mes- sages. In11th Innovations in Theoretical Computer Science Conference (ITC 2020), volume 151 ofLIPIcs, pages 15:1–15:23. Schloss Dagstuhl–Leibniz-Ze...

work page doi:10.4230/lipics.itc.2020.15 2020

[5] [5]

URL https://www.aeaweb.org/articles?id= 10.1257/aer.104.5.431. J. Kiefer and J. Wolfowitz. The equivalence of two ex- tremum problems.Canadian Journal of Mathematics, 12:363–366, 1960. doi: 10.4153/CJM-1960-030-4. Jiachun Li, David Simchi-Levi, and Yining Wang. On the optimal regret of locally private linear contextual bandit.arXiv preprint arXiv:2404.094...

work page doi:10.1257/aer.104.5.431 1960

[6] [6]

Schapire

Association for Computing Machinery. ISBN 9781605587998. doi: 10.1145/1772690.1772758. URL https://doi.org/10.1145/1772690.1772758. Lihong Li, Yu Lu, and Dengyong Zhou. Provably optimal algorithms for generalized linear contextual bandits. In Doina Precup and Yee Whye Teh, editors,Proceedings of the 34th International Conference on Machine Learn- ing, vol...

work page doi:10.1145/1772690.1772758 2071

[7] [7]

doi: 10.1109/CSF.2017

IEEE Computer Society. doi: 10.1109/CSF.2017

work page doi:10.1109/csf.2017 2017

[8] [8]

URLhttp://dx.doi.org/10.1109/ CSF.2017.11

URL https://doi.ieeecomputersociety.org/ 10.1109/CSF.2017.11. Ryan Rogers, Aaron Roth, Adam Smith, and Om Thakkar. Max-Information, Differential Pri- vacy, and Post-selection Hypothesis Testing . In2016 IEEE 57th Annual Symposium on Foundations of Com- puter Science (FOCS), pages 487–494, Los Alami- tos, CA, USA, October 2016. IEEE Computer Soci- ety. doi...

work page doi:10.1109/csf.2017.11 2017

[9] [9]

ISBN 9781450380539

Association for Computing Machinery. ISBN 9781450380539. doi: 10.1145/3406325.3451004. URL https://doi.org/10.1145/3406325.3451004. Ayush Sawarni, Nirjhar Das, Siddharth Barman, and Gaurav Sinha. Generalized linear bandits with limited adaptivity. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neu...

work page doi:10.1145/3406325.3451004 2024

[10] [10]

Dynamic transitive closure via dynamic matrix inverse

IEEE Computer Society. doi: 10.1109/FOCS. 2010.12. URL https://doi.ieeecomputersociety. org/10.1109/FOCS.2010.12. Yogatheesan Varatharajah and Brent Berry. A contextual- bandit-based approach for informed decision-making in clinical trials.Life, 12(8), 2022. ISSN 2075-1729. doi: 10.3390/life12081277. URL https://www.mdpi. com/2075-1729/12/8/1277. Roman Ve...

work page doi:10.1109/focs 2010

[11] [11]

doi: 10.1017/ 9781108231596

ISBN 978-1-108-41519-4. doi: 10.1017/ 9781108231596. URL https://www.cambridge. org/core/books/highdimensional-probability/ 797C466DA29743D2C8213493BD2D2102. Justin Whitehouse, Aaditya Ramdas, Ryan Rogers, and Zhiwei Steven Wu. Fully-adaptive composition in differ- ential privacy. InProceedings of the 40th International Conference on Machine Learning, ICM...

work page 2023

[12] [12]

R p dΛT + RS d1/4Λ1/2 T Λ1/2 δ d+ Λ T 1/4 p εmin(κ ∗,1) # (51) γ=C

Set the number of iteration T = ε2s2 dlog(sd/δ) and parameter η = 2S L √ T . Let matrix H∗ =Pt j=1 ˙µ(⟨xj, θ∗⟩)xjxT j + N where λminI⪯ N ⪯λ maxI with probability at least 1− 1 T 6 Then, ifλ min ≥20dRlogTwith probability at least1− 3 T 2 , the following inequalities hold ||θ∗ − ˆθ||H∗ ≤24RS p logT+d+ R(logT+d)√λmin + 2S p λmax + √ 4RSν(31) To prove this le...

work page 2024

[13] [13]

Let N denote the set of all values computed in the binary tree

Let nt denote the set of nodes that were updated at time step t and Nt denote their corresponding values. Let N denote the set of all values computed in the binary tree. We use a−i to denote all the coordinates of a vector a except the ith coordinate. Similarly, a≤i and a<i denote the coordinates of the first i and i− 1 coordinates of the vector a respect...

work page 2014