Worst-Case Regret Bounds for Combinatorial Thompson Sampling in Sleeping Semi-Bandits
Pith reviewed 2026-05-13 07:35 UTC · model grok-4.3
The pith
Combinatorial Thompson sampling with Gaussian priors achieves first worst-case regret bound of Õ(m√NT) for sleeping semi-bandits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CTS-G attains a worst-case regret upper bound of Õ(m√NT) with a matching lower bound Ω̃(m√NT) in sleeping semi-bandits under adversarial availability. CL-SG, by sampling a single shared Gaussian seed each round, attains the improved bound Õ(√(mNT)) with matching lower bound Ω(√(mNT)).
What carries the argument
CL-SG's shared Gaussian seed sampling, which coordinates exploration across arms by using one random seed per round instead of independent samples.
If this is right
- The regret for CL-SG scales as the square root of the product of the number of arms m and time horizon T, enabling better scaling in large systems.
- Matching lower bounds confirm that the upper bounds cannot be improved asymptotically.
- CL-SG outperforms CTS-G and CTS-B on real-world datasets.
- The analysis applies to adversarial changes in arm availability while keeping combinatorial constraints fixed.
Where Pith is reading between the lines
- CL-SG's coordination trick could be applied to other Thompson sampling algorithms to reduce regret in non-combinatorial settings.
- Wireless network designers can now select algorithms with proven regret guarantees for fluctuating link availability.
- Future work might test these bounds in online settings with learned combinatorial constraints rather than fixed ones.
Load-bearing premise
The feasible actions must satisfy fixed combinatorial constraints that are known in advance and do not change over time.
What would settle it
Observe the cumulative regret of CL-SG in a synthetic sleeping semi-bandit instance with m arms and T rounds under adversarial availability; if it grows faster than order sqrt(mNT), the bound is falsified.
Figures
read the original abstract
We revisit combinatorial Thompson sampling (CTS) for semi-bandits with sleeping arms, where arm availability varies over time and actions must satisfy combinatorial constraints, as in wireless mesh routing with fluctuating link availability. Despite its practical relevance, CTS has been hindered by several long-standing problems: (i) the absence of worst-case regret guarantees in the semi-bandit setting even without sleeping arms, (ii) the lack of theory under adversarially varying availability, and (iii) the consistently weak empirical performance of CTS with Gaussian priors (CTS-G). This paper resolves these long-standing issues by providing the first worst-case regret analysis of CTS-G, proving an upper bound of $\tilde{O}(m\sqrt{NT})$ and a matching lower bound of $\tilde{\Omega}(m\sqrt{NT})$. To bridge the gap between theory and practice, we further propose CL-SG, a simple CTS-G variant that samples a single shared Gaussian seed each round to coordinate exploration across arms. We show that CL-SG achieves an improved regret bound of $\tilde{O}(\sqrt{mNT})$, together with a matching lower bound $\Omega(\sqrt{mNT})$. Experiments on real-world datasets demonstrate that CL-SG consistently outperforms strong baselines including CTS-G and CTS-B, and we open-source our implementation for reproducibility.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to deliver the first worst-case regret analysis of combinatorial Thompson sampling with Gaussian priors (CTS-G) for sleeping semi-bandits, establishing a matching upper bound of Õ(m√(NT)) and lower bound of Ω̃(m√(NT)). It further introduces CL-SG, which draws one shared Gaussian seed per round to coordinate exploration, and proves that this variant attains the improved regret Õ(√(mNT)) with a matching lower bound Ω(√(mNT)). Empirical results on real-world datasets are reported to show CL-SG outperforming CTS-G and CTS-B baselines.
Significance. If the stated bounds are correct, the work closes two long-standing gaps: the absence of worst-case guarantees for CTS in semi-bandits (even without sleeping) and the lack of theory under adversarial availability. The matching upper/lower bounds for both algorithms, together with the open-sourced implementation, would constitute a substantive contribution to the combinatorial bandit literature.
major comments (1)
- [Abstract / CL-SG analysis] Abstract / CL-SG analysis: the improved Õ(√(mNT)) regret for CL-SG is asserted to follow from sampling a single shared Gaussian seed per round. Under adversarial sleeping, the set of available arms (and therefore the effective support of the seed) changes arbitrarily each round. The manuscript must supply the explicit argument showing that the anti-concentration or variance properties of the shared seed survive this random restriction and do not collapse to the weaker m√(NT) scaling of plain CTS-G; without that step the central improvement claim is not yet secured.
minor comments (1)
- [Abstract] The abstract uses m, N, T without immediate definition; an early sentence or footnote clarifying that m is the number of base arms, N the maximum action size, and T the horizon would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that the CL-SG analysis requires an explicit argument showing that the shared seed's anti-concentration properties are preserved under adversarial sleeping, and we will add this in the revision.
read point-by-point responses
-
Referee: [Abstract / CL-SG analysis] Abstract / CL-SG analysis: the improved Õ(√(mNT)) regret for CL-SG is asserted to follow from sampling a single shared Gaussian seed per round. Under adversarial sleeping, the set of available arms (and therefore the effective support of the seed) changes arbitrarily each round. The manuscript must supply the explicit argument showing that the anti-concentration or variance properties of the shared seed survive this random restriction and do not collapse to the weaker m√(NT) scaling of plain CTS-G; without that step the central improvement claim is not yet secured.
Authors: We agree the current draft leaves this step implicit. In the revised manuscript we will add a dedicated lemma (new Lemma 4.3) proving that the shared standard-Gaussian seed g_t ~ N(0,1), when restricted to the adversarially chosen available set S_t, still satisfies the required anti-concentration: for any fixed direction v supported on S_t, P(|<g_t, v>| < ε) ≤ O(ε) uniformly in S_t. The proof uses the rotational invariance of the Gaussian together with the fact that the restriction operator is a coordinate projection whose operator norm is at most 1; consequently the marginal variance along any unit vector in the span of S_t remains exactly 1, preventing collapse to the per-arm independent scaling of CTS-G. This lemma is then plugged directly into the existing regret decomposition to recover the Õ(√(mNT)) bound. We will also include a short remark explaining why the same argument fails for independent per-arm seeds. revision: yes
Circularity Check
No significant circularity; bounds derived via standard analysis techniques
full rationale
The paper presents the first worst-case regret analysis for CTS-G under sleeping semi-bandits, claiming an upper bound of Õ(m√NT) with matching lower bound, and an improved Õ(√(mNT)) bound for the CL-SG variant using a shared Gaussian seed. These derivations rely on standard semi-bandit regret decomposition, concentration inequalities, and Thompson sampling posterior analysis rather than any self-definitional reduction, fitted parameter renamed as prediction, or load-bearing self-citation. No equations or steps in the provided claims reduce the target bounds to quantities defined by the paper's own inputs. The central claims remain independent of any circular construction, consistent with the reader's assessment of standard techniques.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Standard sub-Gaussian concentration bounds for Thompson sampling analysis
Reference graph
Works this paper leans on
-
[1]
Z. Huang, B. Hu, and J. Pan, “Bridging the Regret Gap in Combinatorial Thompson Sampling: Worst-Case Guarantees and Algorithmic Refine- ment,” inProc. IEEE Conference on Computer Communications (IN- FOCOM). IEEE, 2026
work page 2026
-
[2]
Regret Bounds for Sleeping Experts and Bandits,
R. Kleinberg, A. Niculescu-Mizil, and Y . Sharma, “Regret Bounds for Sleeping Experts and Bandits,”Machine Learning, vol. 80, no. 2-3, pp. 245–272, 2010
work page 2010
-
[3]
Analysis of Thompson Sampling for Stochastic Sleeping Bandits,
A. Chatterjee, G. Ghalme, S. Jain, R. Vaish, and Y . Narahari, “Analysis of Thompson Sampling for Stochastic Sleeping Bandits,” inProc. Conference on Uncertainty in Artificial Intelligence (UAI), 2017
work page 2017
-
[4]
Intelligent Caching Algorithms in Heterogeneous Wireless Networks with Uncertainty,
B. Hu, Y . Chen, Z. Huang, N. A. Mehta, and J. Pan, “Intelligent Caching Algorithms in Heterogeneous Wireless Networks with Uncertainty,” in Proc. IEEE Conference on Distributed Computing Systems (ICDCS). IEEE, 2019
work page 2019
-
[5]
Hardness of Online Sleeping Combinatorial Optimization Problems,
S. Kale, C. Lee, and D. Pal, “Hardness of Online Sleeping Combinatorial Optimization Problems,” inProc. Advances in Neural Information Processing Systems (NeurIPS), 2016, pp. 2181–2189
work page 2016
-
[6]
Online Combinatorial Optimization with Stochastic Decision Sets and Adversarial Losses,
G. Neu and M. Valko, “Online Combinatorial Optimization with Stochastic Decision Sets and Adversarial Losses,” inProc. Advances in Neural Information Processing Systems (NeurIPS), 2014, pp. 2780– 2788
work page 2014
-
[7]
Combinatorial Sleeping Bandits with Fairness Constraints,
F. Li, J. Liu, and B. Ji, “Combinatorial Sleeping Bandits with Fairness Constraints,” inProc. IEEE Conference on Computer Communica- tions (INFOCOM). IEEE, May 2019, pp. 1702–1710
work page 2019
-
[8]
Combinatorial Sleeping Bandits with Fairness Constraints,
——, “Combinatorial Sleeping Bandits with Fairness Constraints,”IEEE Transactions on Network Science and Engineering (TNSE), vol. 7, no. 3, pp. 1799–1813, 2019
work page 2019
-
[9]
Achieving Regular and Fair Learning in Combi- natorial Multi-Armed Bandit,
X. Wu and B. Li, “Achieving Regular and Fair Learning in Combi- natorial Multi-Armed Bandit,” inProc. IEEE Conference on Computer Communications (INFOCOM). IEEE, 2024, pp. 361–370
work page 2024
-
[10]
On the Low-Complexity of Fair Learning for Combinatorial Multi-Armed Bandit,
X. Wu, B. Ji, and B. Li, “On the Low-Complexity of Fair Learning for Combinatorial Multi-Armed Bandit,” inProc. IEEE Conference on Computer Communications (INFOCOM), 2025
work page 2025
-
[11]
An Empirical Evaluation of Thompson Sampling,
O. Chapelle and L. Li, “An Empirical Evaluation of Thompson Sampling,” inProc. Advances in Neural Information Processing Sys- tems (NeurIPS), 2011, pp. 2249–2257
work page 2011
-
[12]
Thompson Sampling for Combinatorial Semi-Bandits,
S. Wang and W. Chen, “Thompson Sampling for Combinatorial Semi-Bandits,” inProc. International Conference on Machine Learn- ing (ICML), 2018, pp. 5101–5109
work page 2018
-
[13]
Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits,
P. Perrault, E. Boursier, M. Valko, and V . Perchet, “Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits,” inProc. Ad- vances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 5429–5440
work page 2020
-
[14]
On the Suboptimality of Thompson Sam- pling in High Dimensions,
R. Zhang and R. Combes, “On the Suboptimality of Thompson Sam- pling in High Dimensions,” inProc. Advances in Neural Information Processing Systems (NeurIPS), vol. 34, 2021, pp. 8345–8354
work page 2021
-
[15]
Thompson Sampling for Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox,
——, “Thompson Sampling for Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox,” inProc. Advances in Neural Information Processing Systems (NeurIPS), 2024
work page 2024
-
[16]
Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits,
B. Kveton, Z. Wen, A. Ashkan, and C. Szepesvari, “Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits,” inProc. Artificial Intelligence and Statistics (AISTATS), 2015, pp. 535–543
work page 2015
-
[17]
Y . Gai, B. Krishnamachari, and R. Jain, “Combinatorial Network Opti- mization with Unknown Variables: Multi-Armed Bandits with Linear Rewards and Individual Observations,”IEEE/ACM Transactions on Networking (TON), vol. 20, no. 5, pp. 1466–1478, 2012
work page 2012
-
[18]
Combinatorial Bandits Revisited,
R. Combes, M. S. T. M. Shahi, A. Proutiereet al., “Combinatorial Bandits Revisited,” inProc. Advances in Neural Information Processing Systems (NeurIPS), 2015, pp. 2116–2124
work page 2015
-
[19]
Combinatorial Multi-Armed Bandit: General Framework and Applications,
W. Chen, Y . Wang, and Y . Yuan, “Combinatorial Multi-Armed Bandit: General Framework and Applications,” inProc. International Confer- ence on Machine Learning (ICML), 2013, pp. 151–159
work page 2013
-
[20]
Tight Lower Bounds for Combinato- rial Multi-Armed Bandits,
N. Merlis and S. Mannor, “Tight Lower Bounds for Combinato- rial Multi-Armed Bandits,” inProc. Conference on Learning The- ory (COLT). PMLR, 2020, pp. 2830–2857
work page 2020
-
[21]
Combinatorial Semi-Bandit with Known Covariance,
R. Degenne and V . Perchet, “Combinatorial Semi-Bandit with Known Covariance,” inProc. Advances in Neural Information Processing Sys- tems (NeurIPS), vol. 29. Curran Associates, Inc., 2016
work page 2016
-
[22]
T. Lattimore and C. Szepesv ´ari,Bandit Algorithms. Cambridge University Press, 2020
work page 2020
-
[23]
Thompson Sampling for Combinatorial Net- work Optimization in Unknown Environments,
A. H ¨uy¨uk and C. Tekin, “Thompson Sampling for Combinatorial Net- work Optimization in Unknown Environments,”IEEE/ACM Transac- tions on Networking (TON), vol. 28, no. 6, pp. 2836–2849, 2020
work page 2020
-
[24]
Sleeping Combi- natorial Bandits,
K. Abhishek, G. Ghalme, S. Gujar, and Y . Narahari, “Sleeping Combi- natorial Bandits,”arXiv preprint arXiv:2106.01624, 2021
-
[25]
Near-Optimal Regret Bounds for Thomp- son Sampling (Corrected),
S. Agrawal and N. Goyal, “Near-Optimal Regret Bounds for Thomp- son Sampling (Corrected),” http://www.columbia.edu/ ∼sa3305/papers/ j3-corrected.pdf, 2017
work page 2017
-
[26]
Worst-Case Regret Bounds for Exploration via Randomized Value Functions,
D. Russo, “Worst-Case Regret Bounds for Exploration via Randomized Value Functions,” inProc. Advances in Neural Information Processing Systems, vol. 32, 2019
work page 2019
-
[27]
Near-Optimal Randomized Exploration for Tabular Markov Decision Processes,
Z. Xiong, R. Shen, Q. Cui, M. Fazel, and S. S. Du, “Near-Optimal Randomized Exploration for Tabular Markov Decision Processes,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), 2022
work page 2022
-
[28]
Experiences from the Design, Deployment, and Usage of the UCSB MeshNet Testbed,
H. Lundgren, K. Ramachandran, E. Belding-Royer, K. Almeroth, M. Benny, A. Hewatt, A. Touma, and A. Jardosh, “Experiences from the Design, Deployment, and Usage of the UCSB MeshNet Testbed,” IEEE Wireless Communications, vol. 13, no. 2, pp. 18–29, 2006
work page 2006
-
[29]
I. Sheriff, E. Belding, K. C. Almeroth, and K. N. Ramachandran, “UCSB/MeshNet,” February 2007. [Online]. Available: https://doi.org/ 10.15783/C7WS3S APPENDIX Notations:LetF t−1 denote by the history of past ac- tions and rewards until the end of roundt−1. Recall that JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 8 EΘt[·] :=E[· |Θ t]andPr Θt(·)...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.