Spectral bandits
Pith reviewed 2026-05-07 14:49 UTC · model grok-4.3
The pith
Bandit algorithms learn graph-smooth payoffs with regret that scales in the graph's effective dimension rather than its size.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We study a bandit problem where the payoffs of arms are smooth on a graph. We introduce the notion of an effective dimension, which is small in real-world graphs, and propose three algorithms for solving our problem that scale linearly or sublinearly in this dimension. Our experiments on a content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
What carries the argument
The effective dimension of the graph, a measure of the complexity of smooth functions on the graph that controls how regret grows with the number of arms.
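The mechanism can be illustrated with a SpectralUCB-style sketch: a linear bandit whose arm features are rows of the Laplacian eigenvector matrix, with a regularizer that penalizes high-frequency (large-eigenvalue) directions. The hyperparameters, confidence width, and toy graph below are illustrative choices, not the paper's:

```python
import numpy as np

def spectral_ucb(adj, true_payoff, T=200, mu=1.0, beta=1.0, noise=0.1, seed=0):
    """SpectralUCB-style bandit: LinUCB in the Laplacian eigenbasis, with the
    spectral penalty diag(lambda + mu) shrinking non-smooth directions."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    L = np.diag(adj.sum(axis=1)) - adj        # unnormalized graph Laplacian
    lam, Q = np.linalg.eigh(L)                # arm i has feature vector Q[i]
    A = np.diag(lam + mu)                     # regularized Gram matrix
    b = np.zeros(n)
    regret = 0.0
    for _ in range(T):
        A_inv = np.linalg.inv(A)
        theta = A_inv @ b                     # ridge estimate in the eigenbasis
        widths = np.sqrt(np.einsum('ij,jk,ik->i', Q, A_inv, Q))
        arm = int(np.argmax(Q @ theta + beta * widths))
        reward = true_payoff[arm] + noise * rng.standard_normal()
        A += np.outer(Q[arm], Q[arm])         # rank-one design update
        b += reward * Q[arm]
        regret += true_payoff.max() - true_payoff[arm]
    return regret

# Path graph with a smooth payoff peaking in the middle.
n = 30
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
payoff = np.sin(np.linspace(0.0, np.pi, n))
print(spectral_ucb(adj, payoff))  # cumulative regret over T rounds
```

The spectral penalty is what lets the regret depend on the effective dimension rather than on n: smooth payoffs concentrate on the low-eigenvalue directions, so few coordinates of theta matter.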
If this is right
- Regret remains manageable even when the graph contains thousands of nodes, provided the effective dimension stays small.
- The same algorithms apply directly to content-based recommendation where items form an undirected graph.
- A useful estimator of all item values can be obtained after evaluating only tens of nodes.
- The framework transfers smoothness ideas from manifold learning into an online decision setting.
Where Pith is reading between the lines
- The effective dimension could be estimated on the fly from observed payoffs to adapt the algorithm without knowing the graph spectrum in advance.
- The same smoothness assumption might let the approach extend to time-varying graphs where edges or labels drift slowly.
- Connections to spectral methods in semi-supervised learning suggest the algorithms could also accelerate batch recommendation tasks.
Load-bearing premise
The expected payoffs are smooth on the graph, so neighboring nodes have similar values and the effective dimension suffices to bound the regret.
What would settle it
Running the algorithms on a graph whose payoffs violate smoothness and observing that regret then grows linearly with the total number of nodes instead of the effective dimension.
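A minimal version of that check compares the Dirichlet energy f^T L f, which the smoothness assumption requires to be small, between a graph-aligned payoff and one drawn independently of the graph (both synthetic here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ring graph: each node connected to its two neighbors.
n = 100
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
L = np.diag(adj.sum(axis=1)) - adj  # unnormalized Laplacian

smooth = np.sin(2 * np.pi * np.arange(n) / n)  # varies slowly along the ring
rough = rng.standard_normal(n)                 # ignores the graph entirely

def dirichlet(f):
    """f^T L f = sum over edges of (f_i - f_j)^2."""
    return float(f @ L @ f)

print(dirichlet(smooth), dirichlet(rough))  # smooth energy is far lower
```

A payoff vector with energy comparable to the random one is exactly the regime where the effective-dimension bound should stop applying.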
Original abstract
Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node of an undirected graph and its expected rating is similar to the one of its neighbors. The goal is to recommend items that have high expected ratings. We aim for the algorithms where the cumulative regret with respect to the optimal policy would not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose three algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies a bandit problem in which arm payoffs are smooth on an undirected graph, making it applicable to content recommendation where items are nodes and expected ratings are similar for neighbors. It introduces the notion of an effective dimension d_eff (derived from the graph and smoothness) that is claimed to be small for real-world graphs, proposes three algorithms whose regret scales linearly or sublinearly in d_eff rather than the number of nodes, and reports experiments showing that user preferences over thousands of items can be estimated from only tens of node evaluations.
Significance. If the smoothness assumption holds and d_eff is both small and controls regret as claimed, the framework offers a principled way to obtain scalable regret bounds for graph-structured bandits, with direct relevance to recommendation systems. The effective-dimension concept is a potentially useful addition if rigorously tied to the Laplacian or kernel and supported by verifiable bounds. The experimental claim of learning from few evaluations would be a strong practical result if the graphs and data satisfy the smoothness needed to keep d_eff small.
major comments (2)
- [Experimental evaluation] The central regret-scaling claim requires that expected payoffs are sufficiently smooth on the graph so that d_eff remains small. The experimental section provides no measurement, visualization, or statistical check of smoothness (e.g., correlation of ratings between neighbors) or of the realized value of d_eff on the recommendation graphs used. Without this, the reported performance cannot be attributed to the claimed linear/sublinear scaling in d_eff rather than to other factors.
- [§3] The three algorithms of §3 are stated to achieve regret linear or sublinear in d_eff, but the derivation of the effective dimension and the precise dependence of the regret bounds on d_eff (including any hidden constants or assumptions on the graph spectrum) are not shown to be parameter-free, or independent of the ambient dimension, in a way that survives when smoothness holds only approximately.
minor comments (2)
- [§2] Notation for the graph Laplacian and the precise definition of d_eff should be introduced earlier and used consistently; several passages refer to “the effective dimension” without an equation reference on first use.
- [Experimental evaluation] The experimental figures would benefit from error bars or multiple random seeds; current plots appear to show single runs.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to strengthen the presentation of our results and analysis.
Point-by-point responses
-
Referee: [Experimental evaluation] The central regret-scaling claim requires that expected payoffs are sufficiently smooth on the graph so that d_eff remains small. The experimental section provides no measurement, visualization, or statistical check of smoothness (e.g., correlation of ratings between neighbors) or of the realized value of d_eff on the recommendation graphs used. Without this, the reported performance cannot be attributed to the claimed linear/sublinear scaling in d_eff rather than to other factors.
Authors: We agree that explicit verification of the smoothness assumption and computation of d_eff on the experimental graphs would better support attribution of the observed performance to the theoretical scaling. In the revised manuscript, we will add measurements of neighbor rating correlations (including visualizations and summary statistics) along with the realized values of d_eff for each recommendation graph used. These additions will confirm that d_eff remains small in the datasets considered and that the smoothness condition holds sufficiently to justify the regret bounds. revision: yes
-
Referee: [§3] The three algorithms of §3 are stated to achieve regret linear or sublinear in d_eff, but the derivation of the effective dimension and the precise dependence of the regret bounds on d_eff (including any hidden constants or assumptions on the graph spectrum) are not shown to be parameter-free, or independent of the ambient dimension, in a way that survives when smoothness holds only approximately.
Authors: The effective dimension is defined via the spectrum of the normalized graph Laplacian under the smoothness prior, specifically d_eff = sum_{i=1}^n 1/(1 + mu * lambda_i) for smoothness parameter mu. The regret bounds for the three algorithms are derived by reducing the problem to an effective subspace of dimension d_eff, yielding bounds that scale as O(d_eff sqrt(T log T)) or better and are independent of the ambient dimension n. We will expand the algorithm descriptions in §3 and the regret analysis to include the complete derivation, explicit dependence on d_eff, discussion of constants, and an extension to approximate smoothness that adds a controlled bias term. This will clarify that the bounds remain valid and parameter-free in the relevant sense when the smoothness assumption holds approximately. revision: yes
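The formula stated in the response above is easy to evaluate numerically. Note that it is the response's stated form, which may differ from the paper's exact definition; the star graph and mu values are illustrative:

```python
import numpy as np

def effective_dimension(L, mu):
    """d_eff = sum_i 1 / (1 + mu * lambda_i), per the formula above,
    where lambda_i are the eigenvalues of the symmetric Laplacian L."""
    lam = np.linalg.eigvalsh(L)
    return float(np.sum(1.0 / (1.0 + mu * lam)))

# Star graph on n nodes: hub connected to n - 1 leaves.
# Its Laplacian spectrum is {0, 1 (multiplicity n-2), n}.
n = 1000
adj = np.zeros((n, n))
adj[0, 1:] = adj[1:, 0] = 1.0
L = np.diag(adj.sum(axis=1)) - adj

for mu in (0.1, 1.0, 10.0):
    print(mu, effective_dimension(L, mu))  # shrinks well below n as mu grows
```

The point of the example: a hub-dominated graph has a thousand nodes but, for moderate smoothness parameters, an effective dimension an order of magnitude smaller.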
Circularity Check
No circularity: effective dimension defined from graph and smoothness premise
Full rationale
The paper states the smoothness assumption on payoffs upfront as a modeling premise for the bandit problem on graphs, then defines effective dimension as a graph property that is small under this assumption. Algorithms are constructed to achieve regret scaling in this dimension, and experiments report empirical performance on recommendation data. No step reduces a claimed prediction or uniqueness result to a fitted parameter, self-citation, or definitional tautology; the central claims remain independent of their inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: payoffs of arms are smooth on the graph (expected ratings are similar for neighboring nodes).
invented entities (1)
- Effective dimension (no independent evidence)