Spectral Thompson sampling
Pith reviewed 2026-05-10 12:55 UTC · model grok-4.3
The pith
Spectral Thompson sampling achieves a regret bound of order d sqrt(T log N) for graph-structured bandits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the graph bandit setting where the expected payoff of each arm is smooth across neighboring nodes, the SpectralTS algorithm attains a high-probability regret bound of d sqrt(T ln N). The quantity d is the effective dimension of the graph, T is the time horizon, and N is the total number of arms. This scaling matches the best known results for the problem while supplying a computationally lighter alternative to prior algorithms.
What carries the argument
Spectral Thompson Sampling (SpectralTS), which uses the graph's spectral structure to enforce payoff smoothness across neighboring nodes and to reduce the problem to an effective dimension d.
If this is right
- Regret stays sublinear in T and grows only with the effective dimension d rather than the full number of arms N.
- The algorithm applies to large-scale recommender systems and advertising where choices can be arranged on a graph.
- Computational cost remains practical as N increases because the method avoids explicit dependence on the full arm count.
- Performance on synthetic and real data matches or exceeds that of standard Thompson sampling and other baselines.
Where Pith is reading between the lines
- The same spectral reduction could be applied to other bandit algorithms such as UCB when rewards are graph-smooth.
- If the underlying graph must be learned from data, the regret bound would need an extra term that accounts for graph estimation error.
- Graphs with very low effective dimension d would yield the largest practical savings compared with unstructured bandit methods.
Load-bearing premise
Neighboring nodes on the graph have similar expected payoffs, so that a small effective dimension d captures the structure.
What would settle it
A sequence of experiments on a fixed graph with verified payoff smoothness in which SpectralTS regret grows faster than order d sqrt(T ln N) for large T would contradict the bound.
Figures
read the original abstract
Thompson Sampling (TS) has attracted a lot of interest due to its good empirical performance, in particular in the computational advertising. Though successful, the tools for its performance analysis appeared only recently. In this paper, we describe and analyze SpectralTS algorithm for a bandit problem, where the payoffs of the choices are smooth given an underlying graph. In this setting, each choice is a node of a graph and the expected payoffs of the neighboring nodes are assumed to be similar. Although the setting has application both in recommender systems and advertising, the traditional algorithms would scale poorly with the number of choices. For that purpose we consider an effective dimension d, which is small in real-world graphs. We deliver the analysis showing that the regret of SpectralTS scales as d*sqrt(T ln N) with high probability, where T is the time horizon and N is the number of choices. Since a d*sqrt(T ln N) regret is comparable to the known results, SpectralTS offers a computationally more efficient alternative. We also show that our algorithm is competitive on both synthetic and real-world data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SpectralTS, a Thompson sampling algorithm for stochastic bandits where arms correspond to nodes of a graph and expected rewards are assumed smooth with respect to the graph structure. It defines an effective dimension d from the spectrum of the graph Laplacian (or normalized adjacency matrix) and provides an analysis establishing a high-probability regret bound of order d sqrt(T ln N). The algorithm is positioned as computationally more efficient than standard TS for large N while achieving regret comparable to known bounds; empirical results on synthetic and real-world data are also reported.
Significance. If the stated regret bound holds, the work supplies a theoretically supported, scalable alternative for graph-structured bandits that exploits low effective dimension to avoid poor scaling with N. The adaptation of posterior sampling and concentration arguments to a finite-dimensional spectral embedding is a clear strength, as is the explicit high-probability guarantee under the smoothness assumption. This is particularly relevant for recommender and advertising applications where graphs are large but spectrally compressible.
minor comments (3)
- [§2] §2 (algorithm description): the precise form of the posterior update and the truncation to the d-dimensional spectral embedding should be stated explicitly, including how the covariance is maintained in the embedded space.
- [§4] The proof sketch in §4 adapts standard TS tail bounds, but the union bound over the N arms and the dependence on the effective dimension d should be written out with the precise constants and failure probability to make the high-probability claim fully transparent.
- [Experiments] Figure 1 and the real-world experiments: label the axes consistently with the regret definition used in the analysis (cumulative vs. per-step) and report the number of independent runs together with error bars.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of our work on SpectralTS and for recommending minor revision. The provided summary correctly captures the algorithm's regret scaling and computational advantages for graph-structured bandits.
Circularity Check
No significant circularity detected
full rationale
The paper derives a high-probability regret bound of order d sqrt(T ln N) for SpectralTS by adapting standard Thompson Sampling concentration inequalities to a finite-dimensional spectral embedding of the graph Laplacian. The effective dimension d is defined directly from the eigenvalues of the graph operator, independently of the regret quantity. No step in the claimed derivation reduces by construction to a fitted parameter, a self-referential definition, or a load-bearing self-citation; the analysis relies on external sub-Gaussian tail bounds and union bounds that are not internal to the result. The derivation is therefore self-contained against standard probabilistic tools.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Expected payoffs of neighboring nodes in the graph are similar
Reference graph
Works this paper leans on
-
[1]
" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...
-
[2]
Abbasi-Yadkori, Y.; P\' a l, D.; and Szepesv\' a ri, C. 2011. Improved Algorithms for Linear Stochastic Bandits . In Neural Information Processing Systems (NeurIPS)
work page 2011
-
[3]
Agrawal, S., and Goyal, N. 2012a. Analysis of Thompson Sampling for the multi-armed bandit problem . Conference on Learning Theory (COLT)
-
[4]
Agrawal, S., and Goyal, N. 2012b. Thompson Sampling for Contextual Bandits with Linear Payoffs . CoRR, abs/1209.3352, http://arxiv.org/abs/1209.3352
-
[5]
Agrawal, S., and Goyal, N. 2013a. Further Optimal Regret Bounds for Thompson Sampling . In International Conference on Artificial Intelligence and Statistics (AISTATS) , volume 31, 99--107
-
[6]
Agrawal, S., and Goyal, N. 2013b. Thompson Sampling for Contextual Bandits with Linear Payoffs . In International Conference on Machine Learning (ICML)
-
[7]
Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time Analysis of the Multiarmed Bandit Problem . Machine Learning 47(2-3):235--256
work page 2002
-
[8]
Auer, P. 2002. Using confidence bounds for exploitation-exploration trade-offs . Journal of Machine Learning Research 3:397--422
work page 2002
-
[9]
Barab\' a si, A.-L., and Albert, R. 1999. Emergence of scaling in random networks . Science 286:11
work page 1999
-
[10]
Belkin, M.; Niyogi, P.; and Sindhwani, V. 2006. Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples . Journal of Machine Learning Research 7:2399--2434
work page 2006
-
[11]
Chapelle, O., and Li, L. 2011. An Empirical Evaluation of Thompson Sampling . In Neural Information Processing Systems (NeurIPS)
work page 2011
-
[12]
Dani, V.; Hayes, T. P.; and Kakade, S. M. 2008. The Price of Bandit Information for Online Optimization . In Neural Information Processing Systems (NeurIPS) . MIT Press
work page 2008
-
[13]
Kaufmann, E.; Korda, N.; and Munos, R. 2012. Thompson Sampling: An Asymptotically Optimal Finite Time Analysis . Algorithmic Learning Theory (ALT)
work page 2012
-
[14]
Korda, N.; Kaufmann, E.; and Munos, R. 2013. Thompson Sampling for 1-Dimensional Exponential Family Bandits . In Neural Information Processing Systems (NeurIPS)
work page 2013
-
[15]
Koutis, I.; Miller, G. L.; and Peng, R. 2010. Approaching Optimality for Solving SDD Linear Systems . In IEEE Symposium on Foundations of Computer Science (FOCS) , 235--244. IEEE
work page 2010
-
[16]
Lam, S., and Herlocker, J. 2012. MovieLens 1M Dataset . http://www.grouplens.org/node/12
work page 2012
-
[17]
Li, L.; Chu, W.; Langford, J.; and Schapire, R. E. 2010. A Contextual-Bandit Approach to Personalized News Article Recommendation . International World Wide Web Conference (WWW)
work page 2010
-
[18]
C.; Korda, N.; Lee, A.; and Leslie, D
May, B. C.; Korda, N.; Lee, A.; and Leslie, D. S. 2012. Optimistic Bayesian sampling in contextual-bandit problems . Journal of Machine Learning Research 13(1):2069--2106
work page 2012
-
[19]
Pazzani, M. J., and Billsus, D. 2007. Content-Based Recommendation Systems . The adaptive web
work page 2007
-
[20]
Russo, D., and Van Roy , B. 2013. Eluder Dimension and the Sample Complexity of Optimistic Exploration . In Neural Information Processing Systems (NeurIPS)
work page 2013
-
[21]
Russo, D., and Van Roy , B. 2014. Learning to Optimize Via Posterior Sampling . Mathematics of Operations Research
work page 2014
-
[22]
Srinivas, N.; Krause, A.; Kakade, S.; and Seeger, M. 2010. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design . International Conference on Machine Learning (ICML) 1015--1022
work page 2010
-
[23]
Thompson, W. R. 1933. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples . Biometrika 25:285--294
work page 1933
-
[24]
Valko, M.; Munos, R.; Kveton, B.; and Koc\' a k, T. 2014. Spectral Bandits for Smooth Graph Functions . In International Conference on Machine Learning (ICML)
work page 2014
-
[25]
Yue, Y.; Hong, S. A.; and Guestrin, C. 2012. Hierarchical Exploration for Accelerating Contextual Bandits . In International Conference on Machine Learning (ICML) , 1895--1902. New York, NY, USA: ACM
work page 2012
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.