AdaPrivate-TS: Private Thompson Sampling for Contextual Bandits with Privacy Amplification

Eranga Ukwatta; Mohammadreza Riyazat

arxiv: 2606.21757 · v1 · pith:Y4HWGYTVnew · submitted 2026-06-19 · 💻 cs.LG · cs.CR· stat.ML

AdaPrivate-TS: Private Thompson Sampling for Contextual Bandits with Privacy Amplification

Mohammadreza Riyazat , Eranga Ukwatta This is my paper

Pith reviewed 2026-06-26 14:24 UTC · model grok-4.3

classification 💻 cs.LG cs.CRstat.ML

keywords contextual banditsdifferential privacyThompson samplingprivacy amplificationevent-level privacyzero-concentrated DPlinear models

0 comments

The pith

Differential privacy noise in contextual bandit Thompson Sampling is absorbed as increased uncertainty, producing only logarithmic privacy cost over time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

AdaPrivate-TS applies Thompson Sampling to contextual bandits while adding differential privacy through batched noise addition. The added Gaussian noise changes the posterior sampling distribution in a way that naturally increases exploration rather than harming decisions. Because contexts are stochastic and privacy is event-level, batches can be composed in parallel, so the total privacy budget grows only logarithmically with the number of rounds. This lets the method reach 93 to 99 percent of non-private performance at moderate privacy levels and beat other private algorithms on real recommendation data.

Core claim

By adding noise to the sufficient statistics of a linear model and then sampling from the inflated posterior covariance v squared times A inverse plus sigma squared times A to the minus two, Thompson Sampling automatically treats privacy noise as extra uncertainty. Under event-level privacy and stochastic contexts, batched zCDP composition therefore incurs only an O of square root d times log T over square root rho privacy cost, which remains logarithmic in the horizon T.

What carries the argument

The structured covariance inflation v^2 A^{-1} + sigma^2 A^{-2} that lets Thompson Sampling interpret privacy noise as uncertainty, together with parallel composition of batched zCDP updates under event-level privacy.

If this is right

Achieves 93-99% of non-private performance at privacy levels epsilon between 0.5 and 5
Outperforms private UCB variants by 0.5 to 3.7 percent, with larger gains at extreme privacy
Poisson subsampling amplification yields extra 2-5 percent improvement at low epsilon
Performs best overall on MovieLens and Jester among event-level private methods for epsilon at or above 2
Maintains advantage over UCB even when features themselves are privatized via DP-SVD

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same noise-as-uncertainty mechanism could apply to other posterior-sampling bandit methods
Relaxing the stochastic-context assumption to adversarial contexts would likely restore linear dependence on T in the privacy cost
Dynamic adjustment of batch sizes based on observed privacy spend might further lower the effective cost
Similar covariance inflation effects may appear in non-bandit linear models that use additive privacy noise

Load-bearing premise

Contexts arrive stochastically from a fixed distribution, allowing independent batches to be protected by parallel composition without interaction effects.

What would settle it

Running the algorithm on a sequence of adversarially chosen contexts and observing that the cumulative privacy cost or regret grows linearly rather than logarithmically with T would disprove the main scaling result.

read the original abstract

We present AdaPrivate-TS, a differentially private contextual bandit algorithm that combines Thompson Sampling with batched zCDP composition. Our key insight is that differential privacy noise inflates the posterior covariance in a structured way: adding Gaussian noise $N(0,\sigma^2 I)$ to $b$ yields sampling covariance $v^2 A^{-1} + \sigma^2 A^{-2}$, which Thompson Sampling interprets as increased uncertainty rather than pure corruption. Under event-level privacy (protecting individual interactions) with stochastic contexts, we prove that the privacy cost is only $O(\sqrt{d}\,\log T/\sqrt{\rho})$, logarithmic in $T$, because parallel composition amortizes noise across batches. Additionally, we explore privacy amplification via Poisson subsampling, which can reduce effective noise at stringent privacy budgets. Experiments on synthetic and real-world datasets demonstrate: (1) AdaPrivate-TS achieves 93-99% of non-private performance at $\varepsilon \in [0.5, 5]$, outperforming UCB by 0.5-3.7% and up to 18% with tuned adaptive exploration at extreme $\varepsilon$; (2) privacy amplification provides additional 2-5% gains at low $\varepsilon$; (3) on MovieLens and Jester, AdaPrivate-TS achieves the best overall performance among event-level baselines, dominating at $\varepsilon \geq 2$; (4) under DP-SVD private features, TS's advantage over UCB grows to +11%, confirming noise-as-uncertainty is not limited to reward privacy. We provide rigorous proofs for privacy guarantees under interactive zCDP composition and comprehensive evaluation including convergence curves, 12-seed CIs, and DP-SVD feature ablation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's main advance is showing how batched zCDP plus structured posterior inflation lets Thompson Sampling achieve logarithmic privacy cost under event-level privacy with stochastic contexts.

read the letter

The central claim is that adding Gaussian noise to the linear model parameters produces a specific covariance inflation that Thompson Sampling reads as extra uncertainty, and that batched composition under event-level privacy then drives the total privacy cost down to O(sqrt(d) log T / sqrt(rho)). That combination looks new relative to earlier private bandit papers.

The experiments are the strongest part. They include 12-seed confidence intervals, ablation on DP-SVD features, and direct comparisons showing 93-99% of non-private regret at moderate epsilon, with measurable gains from Poisson subsampling at tighter budgets. The fact that the advantage over UCB grows under private features is a useful data point.

The soft spots are the modeling assumptions. The logarithmic cost and the clean interpretation of the inflated covariance both require stochastic contexts and event-level privacy; the abstract is explicit about this, but it means the result does not automatically carry over to adversarial contexts or user-level privacy. Without the full proofs it is hard to judge how the extra sigma^2 A^{-2} term interacts with the regret analysis or whether the O(sqrt(d) log T) bound absorbs it cleanly. The noise scale sigma also remains a free parameter.

This is for researchers working on private contextual bandits and long-horizon private RL. Anyone who needs to run Thompson Sampling under differential privacy will find the covariance view and the batching trick worth checking. The work is coherent on its own terms and the experiments are reproducible enough to merit referee time.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces AdaPrivate-TS, a differentially private contextual bandit algorithm combining Thompson Sampling with batched zCDP composition. It claims that adding Gaussian noise N(0, σ²I) to the linear sufficient statistic b produces a structured posterior covariance inflation v²A⁻¹ + σ²A⁻² that Thompson Sampling interprets as increased uncertainty. Under event-level privacy and stochastic contexts, parallel composition is shown to yield a privacy cost of only O(√d log T / √ρ), logarithmic in T. The paper supplies rigorous proofs for the interactive zCDP guarantees and reports experiments on synthetic and real-world data (MovieLens, Jester) demonstrating 93-99% of non-private performance at ε ∈ [0.5, 5], with gains from Poisson subsampling and advantages over UCB baselines, including under DP-SVD private features.

Significance. If the central claims hold, the work is significant for private online learning: it demonstrates that the Bayesian structure of Thompson Sampling can absorb DP noise as uncertainty rather than incurring additive regret penalties, producing only logarithmic privacy overhead in T. The empirical results, including 12-seed confidence intervals, ablation studies, and the DP-SVD feature experiment, provide concrete evidence of practical utility. The explicit use of zCDP composition theorems and the structured covariance derivation are strengths that distinguish the contribution from generic noise-addition approaches.

major comments (2)

[Abstract and privacy analysis] Abstract, privacy guarantees paragraph: the O(√d log T / √ρ) bound is derived under event-level privacy and stochastic contexts via parallel composition; the manuscript should cite the precise theorem (or appendix lemma) that establishes this amortization and confirm that the stochastic-context assumption is necessary for the logarithmic dependence.
[Covariance inflation derivation] Covariance inflation claim: the statement that adding N(0, σ²I) to b yields sampling covariance v²A⁻¹ + σ²A⁻² is load-bearing for the 'noise-as-uncertainty' interpretation; an explicit derivation (even if short) should appear in the main text or a numbered appendix equation to allow verification that no additional terms arise from the interactive setting.

minor comments (2)

Ensure every experimental figure reports the 12-seed confidence intervals mentioned in the abstract; several convergence curves appear to omit error bars.
The abstract references 'rigorous proofs for privacy guarantees under interactive zCDP composition'; a short pointer in the main text to the appendix theorem numbers would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation for minor revision. The two major comments are constructive and we address them point-by-point below, agreeing to incorporate clarifications that strengthen the manuscript.

read point-by-point responses

Referee: [Abstract and privacy analysis] Abstract, privacy guarantees paragraph: the O(√d log T / √ρ) bound is derived under event-level privacy and stochastic contexts via parallel composition; the manuscript should cite the precise theorem (or appendix lemma) that establishes this amortization and confirm that the stochastic-context assumption is necessary for the logarithmic dependence.

Authors: We agree. The O(√d log T / √ρ) bound follows from applying the parallel composition property of zCDP to the batched updates under event-level privacy when contexts are drawn i.i.d. from a fixed distribution; this is formalized in Appendix Lemma 4 (Batched zCDP Composition under Stochastic Contexts) together with the standard zCDP parallel composition theorem (Bun & Steinke, 2016). The stochastic-context assumption is required for the amortization to yield only logarithmic dependence on T; without it, sequential composition would produce a linear dependence. We will add an explicit citation to Appendix Lemma 4 in the abstract's privacy paragraph and a one-sentence clarification of the assumption's necessity in the introduction. revision: yes
Referee: [Covariance inflation derivation] Covariance inflation claim: the statement that adding N(0, σ²I) to b yields sampling covariance v²A⁻¹ + σ²A⁻² is load-bearing for the 'noise-as-uncertainty' interpretation; an explicit derivation (even if short) should appear in the main text or a numbered appendix equation to allow verification that no additional terms arise from the interactive setting.

Authors: We agree that an explicit derivation improves verifiability. The covariance follows directly from the posterior update: after adding noise to the sufficient statistic b, the posterior mean and covariance are obtained by completing the square in the Gaussian likelihood, producing exactly v²A⁻¹ + σ²A⁻² with no cross terms from the interactive protocol (the noise is added after the linear sufficient statistic is formed). We will insert a short numbered derivation (three lines) as Equation (3) in Section 3.2 of the main text, with a pointer to the full interactive zCDP proof in Appendix B confirming no additional interaction-induced terms appear. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives the structured covariance inflation v²A⁻¹ + σ²A⁻² directly from adding N(0,σ²I) to b and interprets it as extra uncertainty for Thompson Sampling; this is a standard linear-algebra consequence, not a self-referential fit. The O(√d log T / √ρ) privacy cost bound is obtained from external interactive zCDP composition theorems under the stated event-level and stochastic-context assumptions, with parallel composition amortizing noise across batches. No step renames a fitted quantity as a prediction, imports uniqueness via self-citation, or reduces the central claim to an internal definition. The derivation remains self-contained against the cited composition results and Gaussian posterior mechanics.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The abstract relies on standard properties of Gaussian noise and zCDP composition; no new free parameters or invented entities are introduced beyond the usual privacy noise scale sigma and batch size choices.

free parameters (1)

noise scale sigma
Chosen to meet the target privacy budget rho; appears as a tunable parameter in the algorithm description.

axioms (2)

standard math Gaussian noise added to sufficient statistic b produces the stated inflated covariance v^2 A^{-1} + sigma^2 A^{-2}
Invoked in the key insight paragraph; follows from standard multivariate Gaussian conditioning.
domain assumption Parallel composition amortizes noise across batches under event-level privacy
Central to the O(sqrt(d) log T) claim; assumes stochastic contexts and independent interactions.

pith-pipeline@v0.9.1-grok · 5860 in / 1652 out tokens · 20380 ms · 2026-06-26T14:24:31.197125+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

14 extracted references

[1]

Calibrating noise to sensitivity in private data analysis

C. Dwork, F. McSherry, K. Nissim, and A. Smith. “Calibrating noise to sensitivity in private data analysis”. In:TCC. 2006, pp. 265–284

2006
[2]

Differentially private contextual linear bandits

R. Shariff and O. Sheffet. “Differentially private contextual linear bandits”. In:NeurIPS. Vol. 31. 2018

2018
[3]

Concentrated differential privacy: Simplifications, extensions, and lower bounds

M. Bun and T. Steinke. “Concentrated differential privacy: Simplifications, extensions, and lower bounds”. In:TCC. 2016, pp. 635–658

2016
[4]

On the likelihood that one unknown probability exceeds another in view of the evidence of two samples

W. R. Thompson. “On the likelihood that one unknown probability exceeds another in view of the evidence of two samples”. In:Biometrika25.3/4 (1933), pp. 285–294

1933
[5]

Thompson sampling for contextual bandits with linear payoffs

S. Agrawal and N. Goyal. “Thompson sampling for contextual bandits with linear payoffs”. In:ICML. 2013, pp. 127–135

2013
[6]

Privacy amplification by subsampling: Tight analyses via couplings and divergences

B. Balle, G. Barthe, and M. Gavin. “Privacy amplification by subsampling: Tight analyses via couplings and divergences”. In:NeurIPS. Vol. 31. 2018

2018
[7]

Rényi differential privacy of the sampled Gaussian mechanism

I. Mironov, K. Talwar, and L. Zhang. “Rényi differential privacy of the sampled Gaussian mechanism”. In:arXiv preprint arXiv:1908.10530. 2019

arXiv 1908
[8]

The movielens datasets: History and context

F. M. Harper and J. A. Konstan. “The movielens datasets: History and context”. In:ACM TIIS5.4 (2015), pp. 1–19

2015
[9]

Eigentaste: A constant time collaborative filtering algorithm

K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. “Eigentaste: A constant time collaborative filtering algorithm”. In:Information Retrieval4.2 (2001), pp. 133–151

2001
[10]

Concentrated Differential Privacy for Bandits

A. Azize and D. Basu. “Concentrated Differential Privacy for Bandits”. In:IEEE SaTML. arXiv:2309.00557. 2024

arXiv 2024
[11]

Dynamic Global Sensitivity for Differentially Private Contextual Bandits

H. Wang, D. Zhao, and H. Wang. “Dynamic Global Sensitivity for Differentially Private Contextual Bandits”. In:arXiv preprint arXiv:2208.14555(2022)

arXiv 2022
[12]

Near-optimal Thompson sampling-based algorithms for differentially private stochastic bandits

B. Hu and N. Hegde. “Near-optimal Thompson sampling-based algorithms for differentially private stochastic bandits”. In:UAI. 2022, pp. 844–852

2022
[13]

Locally differentially private (contextual) bandits learning

K. Zheng, T. Cai, W. Huang, Z. Li, and L. Wang. “Locally differentially private (contextual) bandits learning”. In:NeurIPS. Vol. 33. 2020

2020
[14]

Shuffle Private Linear Contextual Bandits

S. R. Chowdhury and X. Zhou. “Shuffle Private Linear Contextual Bandits”. In:ICML. 2022, pp. 3984–4009. 12

2022

[1] [1]

Calibrating noise to sensitivity in private data analysis

C. Dwork, F. McSherry, K. Nissim, and A. Smith. “Calibrating noise to sensitivity in private data analysis”. In:TCC. 2006, pp. 265–284

2006

[2] [2]

Differentially private contextual linear bandits

R. Shariff and O. Sheffet. “Differentially private contextual linear bandits”. In:NeurIPS. Vol. 31. 2018

2018

[3] [3]

Concentrated differential privacy: Simplifications, extensions, and lower bounds

M. Bun and T. Steinke. “Concentrated differential privacy: Simplifications, extensions, and lower bounds”. In:TCC. 2016, pp. 635–658

2016

[4] [4]

On the likelihood that one unknown probability exceeds another in view of the evidence of two samples

W. R. Thompson. “On the likelihood that one unknown probability exceeds another in view of the evidence of two samples”. In:Biometrika25.3/4 (1933), pp. 285–294

1933

[5] [5]

Thompson sampling for contextual bandits with linear payoffs

S. Agrawal and N. Goyal. “Thompson sampling for contextual bandits with linear payoffs”. In:ICML. 2013, pp. 127–135

2013

[6] [6]

Privacy amplification by subsampling: Tight analyses via couplings and divergences

B. Balle, G. Barthe, and M. Gavin. “Privacy amplification by subsampling: Tight analyses via couplings and divergences”. In:NeurIPS. Vol. 31. 2018

2018

[7] [7]

Rényi differential privacy of the sampled Gaussian mechanism

I. Mironov, K. Talwar, and L. Zhang. “Rényi differential privacy of the sampled Gaussian mechanism”. In:arXiv preprint arXiv:1908.10530. 2019

arXiv 1908

[8] [8]

The movielens datasets: History and context

F. M. Harper and J. A. Konstan. “The movielens datasets: History and context”. In:ACM TIIS5.4 (2015), pp. 1–19

2015

[9] [9]

Eigentaste: A constant time collaborative filtering algorithm

K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. “Eigentaste: A constant time collaborative filtering algorithm”. In:Information Retrieval4.2 (2001), pp. 133–151

2001

[10] [10]

Concentrated Differential Privacy for Bandits

A. Azize and D. Basu. “Concentrated Differential Privacy for Bandits”. In:IEEE SaTML. arXiv:2309.00557. 2024

arXiv 2024

[11] [11]

Dynamic Global Sensitivity for Differentially Private Contextual Bandits

H. Wang, D. Zhao, and H. Wang. “Dynamic Global Sensitivity for Differentially Private Contextual Bandits”. In:arXiv preprint arXiv:2208.14555(2022)

arXiv 2022

[12] [12]

Near-optimal Thompson sampling-based algorithms for differentially private stochastic bandits

B. Hu and N. Hegde. “Near-optimal Thompson sampling-based algorithms for differentially private stochastic bandits”. In:UAI. 2022, pp. 844–852

2022

[13] [13]

Locally differentially private (contextual) bandits learning

K. Zheng, T. Cai, W. Huang, Z. Li, and L. Wang. “Locally differentially private (contextual) bandits learning”. In:NeurIPS. Vol. 33. 2020

2020

[14] [14]

Shuffle Private Linear Contextual Bandits

S. R. Chowdhury and X. Zhou. “Shuffle Private Linear Contextual Bandits”. In:ICML. 2022, pp. 3984–4009. 12

2022