AdaPrivate-TS: Private Thompson Sampling for Contextual Bandits with Privacy Amplification
Pith reviewed 2026-06-26 14:24 UTC · model grok-4.3
The pith
Differential privacy noise in contextual bandit Thompson Sampling is absorbed as increased uncertainty, producing only logarithmic privacy cost over time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By adding noise to the sufficient statistics of a linear model and then sampling from the inflated posterior covariance v squared times A inverse plus sigma squared times A to the minus two, Thompson Sampling automatically treats privacy noise as extra uncertainty. Under event-level privacy and stochastic contexts, batched zCDP composition therefore incurs only an O of square root d times log T over square root rho privacy cost, which remains logarithmic in the horizon T.
What carries the argument
The structured covariance inflation v^2 A^{-1} + sigma^2 A^{-2} that lets Thompson Sampling interpret privacy noise as uncertainty, together with parallel composition of batched zCDP updates under event-level privacy.
If this is right
- Achieves 93-99% of non-private performance at privacy levels epsilon between 0.5 and 5
- Outperforms private UCB variants by 0.5 to 3.7 percent, with larger gains at extreme privacy
- Poisson subsampling amplification yields extra 2-5 percent improvement at low epsilon
- Performs best overall on MovieLens and Jester among event-level private methods for epsilon at or above 2
- Maintains advantage over UCB even when features themselves are privatized via DP-SVD
Where Pith is reading between the lines
- The same noise-as-uncertainty mechanism could apply to other posterior-sampling bandit methods
- Relaxing the stochastic-context assumption to adversarial contexts would likely restore linear dependence on T in the privacy cost
- Dynamic adjustment of batch sizes based on observed privacy spend might further lower the effective cost
- Similar covariance inflation effects may appear in non-bandit linear models that use additive privacy noise
Load-bearing premise
Contexts arrive stochastically from a fixed distribution, allowing independent batches to be protected by parallel composition without interaction effects.
What would settle it
Running the algorithm on a sequence of adversarially chosen contexts and observing that the cumulative privacy cost or regret grows linearly rather than logarithmically with T would disprove the main scaling result.
read the original abstract
We present AdaPrivate-TS, a differentially private contextual bandit algorithm that combines Thompson Sampling with batched zCDP composition. Our key insight is that differential privacy noise inflates the posterior covariance in a structured way: adding Gaussian noise $N(0,\sigma^2 I)$ to $b$ yields sampling covariance $v^2 A^{-1} + \sigma^2 A^{-2}$, which Thompson Sampling interprets as increased uncertainty rather than pure corruption. Under event-level privacy (protecting individual interactions) with stochastic contexts, we prove that the privacy cost is only $O(\sqrt{d}\,\log T/\sqrt{\rho})$, logarithmic in $T$, because parallel composition amortizes noise across batches. Additionally, we explore privacy amplification via Poisson subsampling, which can reduce effective noise at stringent privacy budgets. Experiments on synthetic and real-world datasets demonstrate: (1) AdaPrivate-TS achieves 93-99% of non-private performance at $\varepsilon \in [0.5, 5]$, outperforming UCB by 0.5-3.7% and up to 18% with tuned adaptive exploration at extreme $\varepsilon$; (2) privacy amplification provides additional 2-5% gains at low $\varepsilon$; (3) on MovieLens and Jester, AdaPrivate-TS achieves the best overall performance among event-level baselines, dominating at $\varepsilon \geq 2$; (4) under DP-SVD private features, TS's advantage over UCB grows to +11%, confirming noise-as-uncertainty is not limited to reward privacy. We provide rigorous proofs for privacy guarantees under interactive zCDP composition and comprehensive evaluation including convergence curves, 12-seed CIs, and DP-SVD feature ablation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces AdaPrivate-TS, a differentially private contextual bandit algorithm combining Thompson Sampling with batched zCDP composition. It claims that adding Gaussian noise N(0, σ²I) to the linear sufficient statistic b produces a structured posterior covariance inflation v²A⁻¹ + σ²A⁻² that Thompson Sampling interprets as increased uncertainty. Under event-level privacy and stochastic contexts, parallel composition is shown to yield a privacy cost of only O(√d log T / √ρ), logarithmic in T. The paper supplies rigorous proofs for the interactive zCDP guarantees and reports experiments on synthetic and real-world data (MovieLens, Jester) demonstrating 93-99% of non-private performance at ε ∈ [0.5, 5], with gains from Poisson subsampling and advantages over UCB baselines, including under DP-SVD private features.
Significance. If the central claims hold, the work is significant for private online learning: it demonstrates that the Bayesian structure of Thompson Sampling can absorb DP noise as uncertainty rather than incurring additive regret penalties, producing only logarithmic privacy overhead in T. The empirical results, including 12-seed confidence intervals, ablation studies, and the DP-SVD feature experiment, provide concrete evidence of practical utility. The explicit use of zCDP composition theorems and the structured covariance derivation are strengths that distinguish the contribution from generic noise-addition approaches.
major comments (2)
- [Abstract and privacy analysis] Abstract, privacy guarantees paragraph: the O(√d log T / √ρ) bound is derived under event-level privacy and stochastic contexts via parallel composition; the manuscript should cite the precise theorem (or appendix lemma) that establishes this amortization and confirm that the stochastic-context assumption is necessary for the logarithmic dependence.
- [Covariance inflation derivation] Covariance inflation claim: the statement that adding N(0, σ²I) to b yields sampling covariance v²A⁻¹ + σ²A⁻² is load-bearing for the 'noise-as-uncertainty' interpretation; an explicit derivation (even if short) should appear in the main text or a numbered appendix equation to allow verification that no additional terms arise from the interactive setting.
minor comments (2)
- Ensure every experimental figure reports the 12-seed confidence intervals mentioned in the abstract; several convergence curves appear to omit error bars.
- The abstract references 'rigorous proofs for privacy guarantees under interactive zCDP composition'; a short pointer in the main text to the appendix theorem numbers would improve readability.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and recommendation for minor revision. The two major comments are constructive and we address them point-by-point below, agreeing to incorporate clarifications that strengthen the manuscript.
read point-by-point responses
-
Referee: [Abstract and privacy analysis] Abstract, privacy guarantees paragraph: the O(√d log T / √ρ) bound is derived under event-level privacy and stochastic contexts via parallel composition; the manuscript should cite the precise theorem (or appendix lemma) that establishes this amortization and confirm that the stochastic-context assumption is necessary for the logarithmic dependence.
Authors: We agree. The O(√d log T / √ρ) bound follows from applying the parallel composition property of zCDP to the batched updates under event-level privacy when contexts are drawn i.i.d. from a fixed distribution; this is formalized in Appendix Lemma 4 (Batched zCDP Composition under Stochastic Contexts) together with the standard zCDP parallel composition theorem (Bun & Steinke, 2016). The stochastic-context assumption is required for the amortization to yield only logarithmic dependence on T; without it, sequential composition would produce a linear dependence. We will add an explicit citation to Appendix Lemma 4 in the abstract's privacy paragraph and a one-sentence clarification of the assumption's necessity in the introduction. revision: yes
-
Referee: [Covariance inflation derivation] Covariance inflation claim: the statement that adding N(0, σ²I) to b yields sampling covariance v²A⁻¹ + σ²A⁻² is load-bearing for the 'noise-as-uncertainty' interpretation; an explicit derivation (even if short) should appear in the main text or a numbered appendix equation to allow verification that no additional terms arise from the interactive setting.
Authors: We agree that an explicit derivation improves verifiability. The covariance follows directly from the posterior update: after adding noise to the sufficient statistic b, the posterior mean and covariance are obtained by completing the square in the Gaussian likelihood, producing exactly v²A⁻¹ + σ²A⁻² with no cross terms from the interactive protocol (the noise is added after the linear sufficient statistic is formed). We will insert a short numbered derivation (three lines) as Equation (3) in Section 3.2 of the main text, with a pointer to the full interactive zCDP proof in Appendix B confirming no additional interaction-induced terms appear. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper derives the structured covariance inflation v²A⁻¹ + σ²A⁻² directly from adding N(0,σ²I) to b and interprets it as extra uncertainty for Thompson Sampling; this is a standard linear-algebra consequence, not a self-referential fit. The O(√d log T / √ρ) privacy cost bound is obtained from external interactive zCDP composition theorems under the stated event-level and stochastic-context assumptions, with parallel composition amortizing noise across batches. No step renames a fitted quantity as a prediction, imports uniqueness via self-citation, or reduces the central claim to an internal definition. The derivation remains self-contained against the cited composition results and Gaussian posterior mechanics.
Axiom & Free-Parameter Ledger
free parameters (1)
- noise scale sigma
axioms (2)
- standard math Gaussian noise added to sufficient statistic b produces the stated inflated covariance v^2 A^{-1} + sigma^2 A^{-2}
- domain assumption Parallel composition amortizes noise across batches under event-level privacy
Reference graph
Works this paper leans on
-
[1]
Calibrating noise to sensitivity in private data analysis
C. Dwork, F. McSherry, K. Nissim, and A. Smith. “Calibrating noise to sensitivity in private data analysis”. In:TCC. 2006, pp. 265–284
2006
-
[2]
Differentially private contextual linear bandits
R. Shariff and O. Sheffet. “Differentially private contextual linear bandits”. In:NeurIPS. Vol. 31. 2018
2018
-
[3]
Concentrated differential privacy: Simplifications, extensions, and lower bounds
M. Bun and T. Steinke. “Concentrated differential privacy: Simplifications, extensions, and lower bounds”. In:TCC. 2016, pp. 635–658
2016
-
[4]
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
W. R. Thompson. “On the likelihood that one unknown probability exceeds another in view of the evidence of two samples”. In:Biometrika25.3/4 (1933), pp. 285–294
1933
-
[5]
Thompson sampling for contextual bandits with linear payoffs
S. Agrawal and N. Goyal. “Thompson sampling for contextual bandits with linear payoffs”. In:ICML. 2013, pp. 127–135
2013
-
[6]
Privacy amplification by subsampling: Tight analyses via couplings and divergences
B. Balle, G. Barthe, and M. Gavin. “Privacy amplification by subsampling: Tight analyses via couplings and divergences”. In:NeurIPS. Vol. 31. 2018
2018
-
[7]
Rényi differential privacy of the sampled Gaussian mechanism
I. Mironov, K. Talwar, and L. Zhang. “Rényi differential privacy of the sampled Gaussian mechanism”. In:arXiv preprint arXiv:1908.10530. 2019
arXiv 1908
-
[8]
The movielens datasets: History and context
F. M. Harper and J. A. Konstan. “The movielens datasets: History and context”. In:ACM TIIS5.4 (2015), pp. 1–19
2015
-
[9]
Eigentaste: A constant time collaborative filtering algorithm
K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. “Eigentaste: A constant time collaborative filtering algorithm”. In:Information Retrieval4.2 (2001), pp. 133–151
2001
-
[10]
Concentrated Differential Privacy for Bandits
A. Azize and D. Basu. “Concentrated Differential Privacy for Bandits”. In:IEEE SaTML. arXiv:2309.00557. 2024
arXiv 2024
-
[11]
Dynamic Global Sensitivity for Differentially Private Contextual Bandits
H. Wang, D. Zhao, and H. Wang. “Dynamic Global Sensitivity for Differentially Private Contextual Bandits”. In:arXiv preprint arXiv:2208.14555(2022)
arXiv 2022
-
[12]
Near-optimal Thompson sampling-based algorithms for differentially private stochastic bandits
B. Hu and N. Hegde. “Near-optimal Thompson sampling-based algorithms for differentially private stochastic bandits”. In:UAI. 2022, pp. 844–852
2022
-
[13]
Locally differentially private (contextual) bandits learning
K. Zheng, T. Cai, W. Huang, Z. Li, and L. Wang. “Locally differentially private (contextual) bandits learning”. In:NeurIPS. Vol. 33. 2020
2020
-
[14]
Shuffle Private Linear Contextual Bandits
S. R. Chowdhury and X. Zhou. “Shuffle Private Linear Contextual Bandits”. In:ICML. 2022, pp. 3984–4009. 12
2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.