pith. machine review for the scientific record. sign in

arxiv: 2605.09277 · v2 · submitted 2026-05-10 · 💻 cs.LG

Worst-Case Regret Bounds for Combinatorial Thompson Sampling in Sleeping Semi-Bandits

Pith reviewed 2026-05-13 07:35 UTC · model grok-4.3

classification 💻 cs.LG
keywords combinatorial thompson samplingsleeping semi-banditsregret boundsworst-case analysisgaussian priorscoordinated explorationadversarial availability
0
0 comments X

The pith

Combinatorial Thompson sampling with Gaussian priors achieves first worst-case regret bound of Õ(m√NT) for sleeping semi-bandits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper delivers the first worst-case regret analysis for combinatorial Thompson sampling using Gaussian priors in semi-bandits where arms can sleep or become unavailable over time. It proves that the standard CTS-G method has regret bounded above by Õ(m√NT) and establishes a matching lower bound of Ω̃(m√NT). The authors introduce CL-SG, a variant that samples one shared Gaussian seed per round to coordinate exploration, which improves the bound to Õ(√(mNT)) with a matching lower bound. These results address gaps in theory for settings like wireless mesh routing with fluctuating links and show better empirical performance than prior CTS variants.

Core claim

CTS-G attains a worst-case regret upper bound of Õ(m√NT) with a matching lower bound Ω̃(m√NT) in sleeping semi-bandits under adversarial availability. CL-SG, by sampling a single shared Gaussian seed each round, attains the improved bound Õ(√(mNT)) with matching lower bound Ω(√(mNT)).

What carries the argument

CL-SG's shared Gaussian seed sampling, which coordinates exploration across arms by using one random seed per round instead of independent samples.

If this is right

  • The regret for CL-SG scales as the square root of the product of the number of arms m and time horizon T, enabling better scaling in large systems.
  • Matching lower bounds confirm that the upper bounds cannot be improved asymptotically.
  • CL-SG outperforms CTS-G and CTS-B on real-world datasets.
  • The analysis applies to adversarial changes in arm availability while keeping combinatorial constraints fixed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • CL-SG's coordination trick could be applied to other Thompson sampling algorithms to reduce regret in non-combinatorial settings.
  • Wireless network designers can now select algorithms with proven regret guarantees for fluctuating link availability.
  • Future work might test these bounds in online settings with learned combinatorial constraints rather than fixed ones.

Load-bearing premise

The feasible actions must satisfy fixed combinatorial constraints that are known in advance and do not change over time.

What would settle it

Observe the cumulative regret of CL-SG in a synthetic sleeping semi-bandit instance with m arms and T rounds under adversarial availability; if it grows faster than order sqrt(mNT), the bound is falsified.

Figures

Figures reproduced from arXiv: 2605.09277 by Bingshan Hu, Jianping Pan, Zhiming Huang.

Figure 2
Figure 2. Figure 2: Edge weights shown as plain text only. No border [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Comparison of the regret for both settings with [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 1
Figure 1. Figure 1: Setting 1: Wireless Mesh Network. s 1 2 3 4 5 6 7 8 9 t 4 1 2 2 4 3 1 3 1 3 3 1 3 2 [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 3
Figure 3. Figure 3: Effect of different exploration rates γ = 0.01, 0.1, 0.5, and 1 for CTS-G and CL-SG. sample per round. CL-SG achieves near-optimal performance, with a worst-case upper bound of O˜( √ mNT) and a matching lower bound of Ω( ˜ √ mNT). Empirically, CL-SG significantly outperforms existing benchmarks such as CTS-B and Com￾bUCB, demonstrating both improved accuracy and efficiency across diverse environments. Look… view at source ↗
Figure 4
Figure 4. Figure 4: Effect of different exploration rates γ = 0.01, 0.1, 0.5, and 1 for CTS-G and CL-SG. CL-SG and CTS-G achieve the lowest regret when γ = 0.01. If we continue to increase γ, both algorithms suffer a larger regret. This suggests that a certain lower level of randomness (exploration) is more effective in practice. VI. CONCLUSION This paper addresses a long-standing open problem by establishing the first worst-… view at source ↗
read the original abstract

We revisit combinatorial Thompson sampling (CTS) for semi-bandits with sleeping arms, where arm availability varies over time and actions must satisfy combinatorial constraints, as in wireless mesh routing with fluctuating link availability. Despite its practical relevance, CTS has been hindered by several long-standing problems: (i) the absence of worst-case regret guarantees in the semi-bandit setting even without sleeping arms, (ii) the lack of theory under adversarially varying availability, and (iii) the consistently weak empirical performance of CTS with Gaussian priors (CTS-G). This paper resolves these long-standing issues by providing the first worst-case regret analysis of CTS-G, proving an upper bound of $\tilde{O}(m\sqrt{NT})$ and a matching lower bound of $\tilde{\Omega}(m\sqrt{NT})$. To bridge the gap between theory and practice, we further propose CL-SG, a simple CTS-G variant that samples a single shared Gaussian seed each round to coordinate exploration across arms. We show that CL-SG achieves an improved regret bound of $\tilde{O}(\sqrt{mNT})$, together with a matching lower bound $\Omega(\sqrt{mNT})$. Experiments on real-world datasets demonstrate that CL-SG consistently outperforms strong baselines including CTS-G and CTS-B, and we open-source our implementation for reproducibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims to deliver the first worst-case regret analysis of combinatorial Thompson sampling with Gaussian priors (CTS-G) for sleeping semi-bandits, establishing a matching upper bound of Õ(m√(NT)) and lower bound of Ω̃(m√(NT)). It further introduces CL-SG, which draws one shared Gaussian seed per round to coordinate exploration, and proves that this variant attains the improved regret Õ(√(mNT)) with a matching lower bound Ω(√(mNT)). Empirical results on real-world datasets are reported to show CL-SG outperforming CTS-G and CTS-B baselines.

Significance. If the stated bounds are correct, the work closes two long-standing gaps: the absence of worst-case guarantees for CTS in semi-bandits (even without sleeping) and the lack of theory under adversarial availability. The matching upper/lower bounds for both algorithms, together with the open-sourced implementation, would constitute a substantive contribution to the combinatorial bandit literature.

major comments (1)
  1. [Abstract / CL-SG analysis] Abstract / CL-SG analysis: the improved Õ(√(mNT)) regret for CL-SG is asserted to follow from sampling a single shared Gaussian seed per round. Under adversarial sleeping, the set of available arms (and therefore the effective support of the seed) changes arbitrarily each round. The manuscript must supply the explicit argument showing that the anti-concentration or variance properties of the shared seed survive this random restriction and do not collapse to the weaker m√(NT) scaling of plain CTS-G; without that step the central improvement claim is not yet secured.
minor comments (1)
  1. [Abstract] The abstract uses m, N, T without immediate definition; an early sentence or footnote clarifying that m is the number of base arms, N the maximum action size, and T the horizon would improve readability.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the CL-SG analysis requires an explicit argument showing that the shared seed's anti-concentration properties are preserved under adversarial sleeping, and we will add this in the revision.

read point-by-point responses
  1. Referee: [Abstract / CL-SG analysis] Abstract / CL-SG analysis: the improved Õ(√(mNT)) regret for CL-SG is asserted to follow from sampling a single shared Gaussian seed per round. Under adversarial sleeping, the set of available arms (and therefore the effective support of the seed) changes arbitrarily each round. The manuscript must supply the explicit argument showing that the anti-concentration or variance properties of the shared seed survive this random restriction and do not collapse to the weaker m√(NT) scaling of plain CTS-G; without that step the central improvement claim is not yet secured.

    Authors: We agree the current draft leaves this step implicit. In the revised manuscript we will add a dedicated lemma (new Lemma 4.3) proving that the shared standard-Gaussian seed g_t ~ N(0,1), when restricted to the adversarially chosen available set S_t, still satisfies the required anti-concentration: for any fixed direction v supported on S_t, P(|<g_t, v>| < ε) ≤ O(ε) uniformly in S_t. The proof uses the rotational invariance of the Gaussian together with the fact that the restriction operator is a coordinate projection whose operator norm is at most 1; consequently the marginal variance along any unit vector in the span of S_t remains exactly 1, preventing collapse to the per-arm independent scaling of CTS-G. This lemma is then plugged directly into the existing regret decomposition to recover the Õ(√(mNT)) bound. We will also include a short remark explaining why the same argument fails for independent per-arm seeds. revision: yes

Circularity Check

0 steps flagged

No significant circularity; bounds derived via standard analysis techniques

full rationale

The paper presents the first worst-case regret analysis for CTS-G under sleeping semi-bandits, claiming an upper bound of Õ(m√NT) with matching lower bound, and an improved Õ(√(mNT)) bound for the CL-SG variant using a shared Gaussian seed. These derivations rely on standard semi-bandit regret decomposition, concentration inequalities, and Thompson sampling posterior analysis rather than any self-definitional reduction, fitted parameter renamed as prediction, or load-bearing self-citation. No equations or steps in the provided claims reduce the target bounds to quantities defined by the paper's own inputs. The central claims remain independent of any circular construction, consistent with the reader's assessment of standard techniques.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The regret analysis relies on standard concentration inequalities for Gaussian variables and combinatorial structure assumptions common to the bandit literature; no free parameters or new entities are introduced in the abstract.

axioms (1)
  • standard math Standard sub-Gaussian concentration bounds for Thompson sampling analysis
    Invoked implicitly for deriving the regret upper bounds in the semi-bandit setting.

pith-pipeline@v0.9.0 · 5533 in / 1205 out tokens · 51214 ms · 2026-05-13T07:35:39.241557+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

  1. [1]

    Bridging the Regret Gap in Combinatorial Thompson Sampling: Worst-Case Guarantees and Algorithmic Refine- ment,

    Z. Huang, B. Hu, and J. Pan, “Bridging the Regret Gap in Combinatorial Thompson Sampling: Worst-Case Guarantees and Algorithmic Refine- ment,” inProc. IEEE Conference on Computer Communications (IN- FOCOM). IEEE, 2026

  2. [2]

    Regret Bounds for Sleeping Experts and Bandits,

    R. Kleinberg, A. Niculescu-Mizil, and Y . Sharma, “Regret Bounds for Sleeping Experts and Bandits,”Machine Learning, vol. 80, no. 2-3, pp. 245–272, 2010

  3. [3]

    Analysis of Thompson Sampling for Stochastic Sleeping Bandits,

    A. Chatterjee, G. Ghalme, S. Jain, R. Vaish, and Y . Narahari, “Analysis of Thompson Sampling for Stochastic Sleeping Bandits,” inProc. Conference on Uncertainty in Artificial Intelligence (UAI), 2017

  4. [4]

    Intelligent Caching Algorithms in Heterogeneous Wireless Networks with Uncertainty,

    B. Hu, Y . Chen, Z. Huang, N. A. Mehta, and J. Pan, “Intelligent Caching Algorithms in Heterogeneous Wireless Networks with Uncertainty,” in Proc. IEEE Conference on Distributed Computing Systems (ICDCS). IEEE, 2019

  5. [5]

    Hardness of Online Sleeping Combinatorial Optimization Problems,

    S. Kale, C. Lee, and D. Pal, “Hardness of Online Sleeping Combinatorial Optimization Problems,” inProc. Advances in Neural Information Processing Systems (NeurIPS), 2016, pp. 2181–2189

  6. [6]

    Online Combinatorial Optimization with Stochastic Decision Sets and Adversarial Losses,

    G. Neu and M. Valko, “Online Combinatorial Optimization with Stochastic Decision Sets and Adversarial Losses,” inProc. Advances in Neural Information Processing Systems (NeurIPS), 2014, pp. 2780– 2788

  7. [7]

    Combinatorial Sleeping Bandits with Fairness Constraints,

    F. Li, J. Liu, and B. Ji, “Combinatorial Sleeping Bandits with Fairness Constraints,” inProc. IEEE Conference on Computer Communica- tions (INFOCOM). IEEE, May 2019, pp. 1702–1710

  8. [8]

    Combinatorial Sleeping Bandits with Fairness Constraints,

    ——, “Combinatorial Sleeping Bandits with Fairness Constraints,”IEEE Transactions on Network Science and Engineering (TNSE), vol. 7, no. 3, pp. 1799–1813, 2019

  9. [9]

    Achieving Regular and Fair Learning in Combi- natorial Multi-Armed Bandit,

    X. Wu and B. Li, “Achieving Regular and Fair Learning in Combi- natorial Multi-Armed Bandit,” inProc. IEEE Conference on Computer Communications (INFOCOM). IEEE, 2024, pp. 361–370

  10. [10]

    On the Low-Complexity of Fair Learning for Combinatorial Multi-Armed Bandit,

    X. Wu, B. Ji, and B. Li, “On the Low-Complexity of Fair Learning for Combinatorial Multi-Armed Bandit,” inProc. IEEE Conference on Computer Communications (INFOCOM), 2025

  11. [11]

    An Empirical Evaluation of Thompson Sampling,

    O. Chapelle and L. Li, “An Empirical Evaluation of Thompson Sampling,” inProc. Advances in Neural Information Processing Sys- tems (NeurIPS), 2011, pp. 2249–2257

  12. [12]

    Thompson Sampling for Combinatorial Semi-Bandits,

    S. Wang and W. Chen, “Thompson Sampling for Combinatorial Semi-Bandits,” inProc. International Conference on Machine Learn- ing (ICML), 2018, pp. 5101–5109

  13. [13]

    Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits,

    P. Perrault, E. Boursier, M. Valko, and V . Perchet, “Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits,” inProc. Ad- vances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 5429–5440

  14. [14]

    On the Suboptimality of Thompson Sam- pling in High Dimensions,

    R. Zhang and R. Combes, “On the Suboptimality of Thompson Sam- pling in High Dimensions,” inProc. Advances in Neural Information Processing Systems (NeurIPS), vol. 34, 2021, pp. 8345–8354

  15. [15]

    Thompson Sampling for Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox,

    ——, “Thompson Sampling for Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox,” inProc. Advances in Neural Information Processing Systems (NeurIPS), 2024

  16. [16]

    Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits,

    B. Kveton, Z. Wen, A. Ashkan, and C. Szepesvari, “Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits,” inProc. Artificial Intelligence and Statistics (AISTATS), 2015, pp. 535–543

  17. [17]

    Combinatorial Network Opti- mization with Unknown Variables: Multi-Armed Bandits with Linear Rewards and Individual Observations,

    Y . Gai, B. Krishnamachari, and R. Jain, “Combinatorial Network Opti- mization with Unknown Variables: Multi-Armed Bandits with Linear Rewards and Individual Observations,”IEEE/ACM Transactions on Networking (TON), vol. 20, no. 5, pp. 1466–1478, 2012

  18. [18]

    Combinatorial Bandits Revisited,

    R. Combes, M. S. T. M. Shahi, A. Proutiereet al., “Combinatorial Bandits Revisited,” inProc. Advances in Neural Information Processing Systems (NeurIPS), 2015, pp. 2116–2124

  19. [19]

    Combinatorial Multi-Armed Bandit: General Framework and Applications,

    W. Chen, Y . Wang, and Y . Yuan, “Combinatorial Multi-Armed Bandit: General Framework and Applications,” inProc. International Confer- ence on Machine Learning (ICML), 2013, pp. 151–159

  20. [20]

    Tight Lower Bounds for Combinato- rial Multi-Armed Bandits,

    N. Merlis and S. Mannor, “Tight Lower Bounds for Combinato- rial Multi-Armed Bandits,” inProc. Conference on Learning The- ory (COLT). PMLR, 2020, pp. 2830–2857

  21. [21]

    Combinatorial Semi-Bandit with Known Covariance,

    R. Degenne and V . Perchet, “Combinatorial Semi-Bandit with Known Covariance,” inProc. Advances in Neural Information Processing Sys- tems (NeurIPS), vol. 29. Curran Associates, Inc., 2016

  22. [22]

    Lattimore and C

    T. Lattimore and C. Szepesv ´ari,Bandit Algorithms. Cambridge University Press, 2020

  23. [23]

    Thompson Sampling for Combinatorial Net- work Optimization in Unknown Environments,

    A. H ¨uy¨uk and C. Tekin, “Thompson Sampling for Combinatorial Net- work Optimization in Unknown Environments,”IEEE/ACM Transac- tions on Networking (TON), vol. 28, no. 6, pp. 2836–2849, 2020

  24. [24]

    Sleeping Combi- natorial Bandits,

    K. Abhishek, G. Ghalme, S. Gujar, and Y . Narahari, “Sleeping Combi- natorial Bandits,”arXiv preprint arXiv:2106.01624, 2021

  25. [25]

    Near-Optimal Regret Bounds for Thomp- son Sampling (Corrected),

    S. Agrawal and N. Goyal, “Near-Optimal Regret Bounds for Thomp- son Sampling (Corrected),” http://www.columbia.edu/ ∼sa3305/papers/ j3-corrected.pdf, 2017

  26. [26]

    Worst-Case Regret Bounds for Exploration via Randomized Value Functions,

    D. Russo, “Worst-Case Regret Bounds for Exploration via Randomized Value Functions,” inProc. Advances in Neural Information Processing Systems, vol. 32, 2019

  27. [27]

    Near-Optimal Randomized Exploration for Tabular Markov Decision Processes,

    Z. Xiong, R. Shen, Q. Cui, M. Fazel, and S. S. Du, “Near-Optimal Randomized Exploration for Tabular Markov Decision Processes,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), 2022

  28. [28]

    Experiences from the Design, Deployment, and Usage of the UCSB MeshNet Testbed,

    H. Lundgren, K. Ramachandran, E. Belding-Royer, K. Almeroth, M. Benny, A. Hewatt, A. Touma, and A. Jardosh, “Experiences from the Design, Deployment, and Usage of the UCSB MeshNet Testbed,” IEEE Wireless Communications, vol. 13, no. 2, pp. 18–29, 2006

  29. [29]

    UCSB/MeshNet,

    I. Sheriff, E. Belding, K. C. Almeroth, and K. N. Ramachandran, “UCSB/MeshNet,” February 2007. [Online]. Available: https://doi.org/ 10.15783/C7WS3S APPENDIX Notations:LetF t−1 denote by the history of past ac- tions and rewards until the end of roundt−1. Recall that JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 8 EΘt[·] :=E[· |Θ t]andPr Θt(·)...