Spectral bandits for smooth graph functions

Branislav Kveton; Michal Valko; R\'emi Munos; Tom\'a\v{s} Koc\'ak

arxiv: 2604.18420 · v1 · submitted 2026-04-20 · 📊 stat.ML · cs.LG

Spectral bandits for smooth graph functions

Michal Valko , R\'emi Munos , Branislav Kveton , Tom\'a\v{s} Koc\'ak This is my paper

Pith reviewed 2026-05-10 03:55 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords spectral banditsgraph smoothnesseffective dimensionbandit regretcontent recommendationonline learningsemi-supervised learningregret bounds

0 comments

The pith

Smooth payoffs on graphs let bandit algorithms scale with effective dimension instead of graph size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines bandit problems in which the expected payoffs of arms are smooth functions on a given graph, a setup common in content recommendation where similar items have similar ratings. It defines an effective dimension for such functions that remains small on real-world graphs and develops two algorithms whose regret scales linearly or sublinearly with this dimension. The result is that preferences over thousands of items can be estimated well by sampling only tens of nodes. This matters because standard bandit methods would suffer regret that grows with the full number of arms, making them impractical for large graphs.

Core claim

The paper claims that in a bandit setting with graph-smooth payoffs, the cumulative regret with respect to the optimal policy scales with the effective dimension of the payoff function rather than the number of nodes, and provides two algorithms achieving linear and sublinear scaling in this dimension, validated by experiments on real recommendation data where few evaluations suffice for good estimation.

What carries the argument

The effective dimension, a measure of the complexity of smooth functions on graphs that is typically small for real-world structures and used to bound the regret of the proposed spectral bandit algorithms.

Load-bearing premise

The expected payoffs form a smooth function on the graph and the effective dimension of this function is small on real graphs.

What would settle it

Experiments on real-world graphs showing that the effective dimension grows proportionally with the number of nodes, or that the regret of the proposed algorithms scales linearly with graph size rather than the claimed dimension.

Figures

Figures reproduced from arXiv: 2604.18420 by Branislav Kveton, Michal Valko, R\'emi Munos, Tom\'a\v{s} Koc\'ak.

**Figure 2.** Figure 2: Effective dimension d in the regime T < N. The effective dimension for this data is much smaller than the ambient dimension D = N, which is 500 and 4546 respectively. The actual form of Definition 1 comes from Lemma 6 and will become apparent in Section 5. The dependence of the effective dimension on T comes from the fact, that d is related to the number of “non-negligible” dimensions characterizing the… view at source ↗

**Figure 3.** Figure 3: Cumulative regret for random graphs models Finally, using J = 1 + ⌊log2 T⌋, δ ′ = δ/(KJ), and β(δ ′ ) ≤ β(δ/(K(1 + log2 T))), we obtain the result of Theorem 2. Remark 3. If we set Λ = I in Algorithm 2 as in Remark 2, we get a new algorithm, LINEARELIMINATOR, which is a competitor to SupLinRel (Auer, 2002) and as a corollary to Theorem 2 also enjoys O˜( √ DT) upper bound on the cumulative regret. On the o… view at source ↗

**Figure 4.** Figure 4: MovieLens and Flixster: Cumulative regret for 50 randomly chosen users. Horizontal axis shows the user number. 0 10 20 30 40 50 60 70 80 90 100 0 50 100 150 time T cumulative regret J = 20 J = 200 J = 2000 J = 20 J=200 J=2000 100 101 102 103 104 computational time in seconds [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Regret and computational time with reduced basis both algorithms for T = 100. Here SPECTRALUCB suffers only about one fourth of regret over LinUCB, specifically 43.4611 vs. 133.0996 on average. 6.3. Flixster Experiments We also performed experiments on users preferences from the movie recommendation website Flixster. The social network of the users was crawled by Jamali & Ester (2010) and then clustered … view at source ↗

read the original abstract

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node and its expected rating is similar to its neighbors. The goal is to recommend items that have high expected ratings. We aim for the algorithms where the cumulative regret with respect to the optimal policy would not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of nodes evaluations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper defines an effective dimension for smooth functions on graphs and gives two bandit algorithms whose regret scales with that dimension rather than the full node count.

read the letter

The core contribution is a way to handle bandits where arm payoffs are smooth on a graph by introducing an effective dimension that stays small on real graphs. This lets them build algorithms with linear and sublinear dependence on that dimension, which directly targets the scaling problem in content recommendation and similar settings. The experiments back this up by showing decent preference estimates on thousands of items after evaluating only tens of nodes. That part is useful and the framing feels fresh compared to earlier graph bandit work. The smoothness assumption is stated clearly and the effective dimension is derived from it rather than fitted post hoc, so the logic holds together without obvious circularity. The main soft spot is that everything hinges on the effective dimension actually being small and on the graph capturing the right similarities; if the graph is misspecified or smoothness is weak, the regret bounds degrade and you are back to scaling with the full number of nodes. The paper does not spend much time on how to choose or validate the graph in practice, which leaves a gap for deployment. Proof details are not in the abstract but the setup looks standard enough that the derivations should be checkable. This is for people working on online learning with structure or recommendation systems who already think in terms of graphs and spectral methods. A reader who needs concrete algorithms that avoid full-graph computation will find something to use. It is worth sending to peer review because the scaling claim is concrete, the experiments are on real data, and the idea is straightforward to test or extend.

Referee Report

2 major / 3 minor

Summary. The paper studies a bandit problem where expected arm payoffs form a smooth function on a given graph, with applications to content recommendation. It introduces the notion of an effective dimension of such a function, which is claimed to be small on real-world graphs, and proposes two algorithms whose regret scales linearly or sublinearly in this dimension (rather than the total number of nodes). Experiments on a real-world recommendation task show that preferences over thousands of items can be learned from evaluations at only tens of nodes.

Significance. If the regret bounds hold under the smoothness assumption and the effective dimension is indeed small on practical graphs, the work provides a scalable approach to graph bandits that avoids poor dependence on graph size. This has potential impact for online recommendation and semi-supervised learning on graphs. The real-world experiments offer supporting evidence for practical utility when the smoothness condition is met.

major comments (2)

[Section 3 (or wherever the effective dimension is formally defined)] The definition and properties of the effective dimension (introduced to control regret scaling) are central to the main claims, but the manuscript should explicitly state the precise mathematical definition and any assumptions (e.g., on the graph Laplacian spectrum or smoothness parameter) under which it remains small for real graphs; without this, the scaling claims rest on an unverified quantity.
[Regret analysis section (likely §4 or §5)] The regret analysis for the two proposed algorithms must be checked for the precise dependence on the effective dimension; if the bounds are derived only under additional assumptions not stated in the abstract (e.g., on the decay of eigenvalues), this should be clarified as it directly affects whether the linear/sublinear scaling holds in general.

minor comments (3)

[Introduction] The introduction and abstract could more clearly distinguish the two algorithms and their respective linear vs. sublinear scaling regimes.
[Experiments] The experimental section would benefit from additional details on graph construction from the recommendation dataset and how the smoothness assumption was validated or enforced.
[Preliminaries / Problem setup] Notation for the graph, smoothness, and effective dimension should be introduced consistently and early to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive review and the recommendation for minor revision. The comments on clarifying the effective dimension and its role in the regret bounds are helpful, and we will incorporate explicit statements to address them. We respond point by point below.

read point-by-point responses

Referee: [Section 3 (or wherever the effective dimension is formally defined)] The definition and properties of the effective dimension (introduced to control regret scaling) are central to the main claims, but the manuscript should explicitly state the precise mathematical definition and any assumptions (e.g., on the graph Laplacian spectrum or smoothness parameter) under which it remains small for real graphs; without this, the scaling claims rest on an unverified quantity.

Authors: We appreciate the referee highlighting the need for greater explicitness. The effective dimension is formally introduced and defined in Section 3 in terms of the spectrum of the graph Laplacian and the smoothness parameter of the payoff function. We will revise the manuscript to state the definition more prominently at the start of Section 3, followed by a short paragraph discussing the spectral assumptions (rapid eigenvalue decay typical of real-world graphs with community structure) under which the quantity remains small. This will make the scaling claims self-contained without relying on external verification. revision: yes
Referee: [Regret analysis section (likely §4 or §5)] The regret analysis for the two proposed algorithms must be checked for the precise dependence on the effective dimension; if the bounds are derived only under additional assumptions not stated in the abstract (e.g., on the decay of eigenvalues), this should be clarified as it directly affects whether the linear/sublinear scaling holds in general.

Authors: The regret bounds in Sections 4 and 5 are derived directly from the smoothness assumption and the definition of the effective dimension; they hold in general under the problem formulation without further unstated restrictions on eigenvalue decay. The linear scaling for the first algorithm and sublinear scaling for the second follow from the analysis once the effective dimension is substituted. We will add a clarifying sentence in the regret analysis section (and a cross-reference in the abstract) to emphasize that no additional assumptions are required beyond those already stated. This ensures the dependence is transparent. revision: yes

Circularity Check

0 steps flagged

No significant circularity; effective dimension derived independently from graph smoothness

full rationale

The paper defines an effective dimension for smooth graph functions as a derived spectral quantity that remains small on real-world graphs, enabling regret bounds that scale with this dimension rather than the full node count. This is not a fitted parameter renamed as a prediction, nor does any central claim reduce by construction to its inputs via self-definition or self-citation chains. The smoothness assumption and graph structure provide independent grounding, with experiments on recommendation data serving as external checks. The derivation chain is self-contained under the stated model, consistent with the abstract's presentation of the effective dimension as a non-tautological quantity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that payoffs are smooth on the graph and on the existence of a small effective dimension; no free parameters or invented entities beyond the effective dimension are stated.

axioms (1)

domain assumption Expected payoffs of arms form a smooth function on the given graph
Explicitly stated as the modeling assumption that makes the problem suitable for recommendation.

invented entities (1)

effective dimension no independent evidence
purpose: Quantity that measures the complexity of the smooth graph function and controls the regret scaling
Introduced in the paper as small on real-world graphs; no independent evidence supplied in the abstract.

pith-pipeline@v0.9.0 · 5462 in / 1208 out tokens · 30092 ms · 2026-05-10T03:55:28.819467+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

[1]

Improved Algorithms for Linear Stochastic Bandits

Abbasi-Yadkori, Y , P´al, D, and Szepesv ´ari, C. Improved Algorithms for Linear Stochastic Bandits. InNeural In- formation Processing Systems (NeurIPS). 2011

work page 2011
[2]

D, Hazan, E, and Rakhlin, A

Abernethy, J. D, Hazan, E, and Rakhlin, A. Competing in the Dark: An Efficient Algorithm for Bandit Linear Op- timization. InConference on Learning Theory (COLT), 2008

work page 2008
[3]

From Bandits to Experts: A Tale of Domination and In- dependence

Alon, N, Cesa-Bianchi, N, Gentile, C, and Mansour, Y . From Bandits to Experts: A Tale of Domination and In- dependence. InNeural Information Processing Systems (NeurIPS), 2013

work page 2013
[4]

Using confidence bounds for exploitation- exploration trade-offs.Journal of Machine Learning Re- search, 3:397–422, March 2002

Auer, P. Using confidence bounds for exploitation- exploration trade-offs.Journal of Machine Learning Re- search, 3:397–422, March 2002. ISSN 1532-4435

work page 2002
[5]

UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem

Auer, P and Ortner, R. UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem. Periodica Mathematica Hungarica, 2010

work page 2010
[6]

Weighted sums of certain dependent random variables.Tohoku Mathematical Journal, 19(3):357– 367, 1967

Azuma, K. Weighted sums of certain dependent random variables.Tohoku Mathematical Journal, 19(3):357– 367, 1967. Barab´asi, A.-L and Albert, R. Emergence of scaling in ran- dom networks.Science, 286:11, 1999

work page 1967
[7]

Regularization and Semi-Supervised Learning on Large Graphs

Belkin, M, Matveeva, I, and Niyogi, P. Regularization and Semi-Supervised Learning on Large Graphs. InConfer- ence on Learning Theory (COLT), 2004

work page 2004
[8]

Manifold Regu- larization: A Geometric Framework for Learning from Labeled and Unlabeled Examples.Journal of Machine Learning Research, 7:2399–2434, 2006

Belkin, M, Niyogi, P, and Sindhwani, V . Manifold Regu- larization: A Geometric Framework for Learning from Labeled and Unlabeled Examples.Journal of Machine Learning Research, 7:2399–2434, 2006

work page 2006
[9]

J, and Chen, J

Billsus, D, Pazzani, M. J, and Chen, J. A learning agent for wireless news access. InIUI, pp. 33–36, 2000

work page 2000
[10]

X- armed bandits.Journal of Machine Learning Research, 12:1587–1627, 2011

Bubeck, S, Munos, R, Stoltz, G, and Szepesvari, C. X- armed bandits.Journal of Machine Learning Research, 12:1587–1627, 2011

work page 2011
[11]

Towards min- imax policies for online linear optimization with bandit feedback

Bubeck, S, Cesa-Bianchi, N, and Kakade, S. Towards min- imax policies for online linear optimization with bandit feedback. InConference on Learning Theory (COLT), 2012

work page 2012
[12]

Leverag- ing Side Observations in Stochastic Bandits

Caron, S, Kveton, B, Lelarge, M, and Bhagat, S. Leverag- ing Side Observations in Stochastic Bandits. InConfer- ence on Uncertainty in Artificial Intelligence (UAI), pp. 142–151, 2012

work page 2012
[13]

A Gang of Bandits

Cesa-Bianchi, N, Gentile, C, and Zappella, G. A Gang of Bandits. InNeural Information Processing Systems (NeurIPS), 2013

work page 2013
[14]

H, Kittur, A, Hong, J

Chau, D. H, Kittur, A, Hong, J. I, and Faloutsos, C. Apolo: making sense of large network data by combining rich user interaction and machine learning. InConference on Human Factors in Computing Systems (CHI), 2011

work page 2011
[15]

Contextual Bandits with Linear Payoff Functions

Chu, L, Li, L, Reyzin, L, and Schapire, R. Contextual Bandits with Linear Payoff Functions. InInternational Conference on Artificial Intelligence and Statistics (AIS- TATS), 2011

work page 2011
[16]

P, and Kakade, S

Dani, V , Hayes, T. P, and Kakade, S. M. Stochastic Linear Optimization under Bandit Feedback. InConference on Learning Theory (COLT), 2008

work page 2008
[17]

Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Pro- cess Bandit Optimization

Desautels, T, Krause, A, and Burdick, J. Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Pro- cess Bandit Optimization. InInternational Conference on Machine Learning (ICML), 2012. Graclus. Graclus, 2013. URL www.cs.utexas.edu/users/ dml/Software/graclus.html

work page 2012
[18]

A matrix factorization technique with trust propagation for recommendation in social net- works

Jamali, M and Ester, M. A matrix factorization technique with trust propagation for recommendation in social net- works. InACM Conference on Recommender Systems (RecSys). ACM, 2010

work page 2010
[19]

Cambridge Uni- versity Press, 2010

Jannach, D, Zanker, M, Felfernig, A, and Friedrich, G.Rec- ommender Systems: An Introduction. Cambridge Uni- versity Press, 2010

work page 2010
[20]

Matrix Completion from a Few Entries

Keshavan, R, Oh, S, and Montanari, A. Matrix Completion from a Few Entries. InIEEE International Symposium on Information Theory, pp. 324–328, 2009

work page 2009
[21]

Multi-armed ban- dit problems in metric spaces

Kleinberg, R, Slivkins, A, and Upfal, E. Multi-armed ban- dit problems in metric spaces. InACM Symposium on Theory of Computing (STOC), 2008

work page 2008
[22]

L, and Tolliver, D

Koutis, I, Miller, G. L, and Tolliver, D. Combinatorial pre- conditioners and multilevel solvers for problems in com- puter vision and image processing.Computer Vision and Image Understanding, 115:1638–1646, 2011

work page 2011
[23]

MovieLens 1M Dataset

Lam, S and Herlocker, J. MovieLens 1M Dataset. http://www.grouplens.org/node/12, 2012

work page 2012
[24]

Li, L, Chu, W, Langford, J, and Schapire, R. E. A Contextual-Bandit Approach to Personalized News Ar- ticle Recommendation.World Wide Web Conference (WWW), 2010

work page 2010
[25]

Birds of a Feather: Homophily in Social Networks.Annual Review of Sociology, 27:415–444, 2001

McPherson, M, Smith-Lovin, L, and Cook, J. Birds of a Feather: Homophily in Social Networks.Annual Review of Sociology, 27:415–444, 2001

work page 2001
[26]

A Variant of Azuma’s Inequality for Martin- gales with Subgaussian Tails.CoRR, abs/1110.2, 2011

Shamir, O. A Variant of Azuma’s Inequality for Martin- gales with Subgaussian Tails.CoRR, abs/1110.2, 2011

work page 2011
[27]

Contextual Bandits with Similarity Informa- tion.Conference on Learning Theory (COLT), pp

Slivkins, A. Contextual Bandits with Similarity Informa- tion.Conference on Learning Theory (COLT), pp. 1–27, 2009

work page 2009
[28]

Gaus- sian Process Optimization in the Bandit Setting: No Re- Spectral Bandits gret and Experimental Design.International Conference on Machine Learning (ICML), 2010

Srinivas, N, Krause, A, Kakade, S, and Seeger, M. Gaus- sian Process Optimization in the Bandit Setting: No Re- Spectral Bandits gret and Experimental Design.International Conference on Machine Learning (ICML), 2010

work page 2010
[29]

Finite-Time Analysis of Kernelised Contextual Bandits

Valko, M, Korda, N, Munos, R, Flaounas, I, and Cristian- ini, N. Finite-Time Analysis of Kernelised Contextual Bandits. InConference on Uncertainty in Artificial In- telligence (UAI), 2013

work page 2013
[30]

Springer, 2005

Zhang, F.The Schur complement and its applications, vol- ume 4. Springer, 2005

work page 2005
[31]

Semi-Supervised Learning Literature Survey

Zhu, X. Semi-Supervised Learning Literature Survey. Technical Report 1530, U. of Wisconsin-Madison, 2008. Spectral Bandits Supplementary Material Confidence ellipsoid Lemma 9.For any symmetric, positive semi-definite ma- trixXand any vectoru: (X+uu T )−1≺X−1 Proof.For any vectory, by Sherman–Morrison formula: − ( uTX−1y )T ( uTX−1y ) 1 +u TX−1u ≤0 y T ( X...

work page 2008

[1] [1]

Improved Algorithms for Linear Stochastic Bandits

Abbasi-Yadkori, Y , P´al, D, and Szepesv ´ari, C. Improved Algorithms for Linear Stochastic Bandits. InNeural In- formation Processing Systems (NeurIPS). 2011

work page 2011

[2] [2]

D, Hazan, E, and Rakhlin, A

Abernethy, J. D, Hazan, E, and Rakhlin, A. Competing in the Dark: An Efficient Algorithm for Bandit Linear Op- timization. InConference on Learning Theory (COLT), 2008

work page 2008

[3] [3]

From Bandits to Experts: A Tale of Domination and In- dependence

Alon, N, Cesa-Bianchi, N, Gentile, C, and Mansour, Y . From Bandits to Experts: A Tale of Domination and In- dependence. InNeural Information Processing Systems (NeurIPS), 2013

work page 2013

[4] [4]

Using confidence bounds for exploitation- exploration trade-offs.Journal of Machine Learning Re- search, 3:397–422, March 2002

Auer, P. Using confidence bounds for exploitation- exploration trade-offs.Journal of Machine Learning Re- search, 3:397–422, March 2002. ISSN 1532-4435

work page 2002

[5] [5]

UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem

Auer, P and Ortner, R. UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem. Periodica Mathematica Hungarica, 2010

work page 2010

[6] [6]

Weighted sums of certain dependent random variables.Tohoku Mathematical Journal, 19(3):357– 367, 1967

Azuma, K. Weighted sums of certain dependent random variables.Tohoku Mathematical Journal, 19(3):357– 367, 1967. Barab´asi, A.-L and Albert, R. Emergence of scaling in ran- dom networks.Science, 286:11, 1999

work page 1967

[7] [7]

Regularization and Semi-Supervised Learning on Large Graphs

Belkin, M, Matveeva, I, and Niyogi, P. Regularization and Semi-Supervised Learning on Large Graphs. InConfer- ence on Learning Theory (COLT), 2004

work page 2004

[8] [8]

Manifold Regu- larization: A Geometric Framework for Learning from Labeled and Unlabeled Examples.Journal of Machine Learning Research, 7:2399–2434, 2006

Belkin, M, Niyogi, P, and Sindhwani, V . Manifold Regu- larization: A Geometric Framework for Learning from Labeled and Unlabeled Examples.Journal of Machine Learning Research, 7:2399–2434, 2006

work page 2006

[9] [9]

J, and Chen, J

Billsus, D, Pazzani, M. J, and Chen, J. A learning agent for wireless news access. InIUI, pp. 33–36, 2000

work page 2000

[10] [10]

X- armed bandits.Journal of Machine Learning Research, 12:1587–1627, 2011

Bubeck, S, Munos, R, Stoltz, G, and Szepesvari, C. X- armed bandits.Journal of Machine Learning Research, 12:1587–1627, 2011

work page 2011

[11] [11]

Towards min- imax policies for online linear optimization with bandit feedback

Bubeck, S, Cesa-Bianchi, N, and Kakade, S. Towards min- imax policies for online linear optimization with bandit feedback. InConference on Learning Theory (COLT), 2012

work page 2012

[12] [12]

Leverag- ing Side Observations in Stochastic Bandits

Caron, S, Kveton, B, Lelarge, M, and Bhagat, S. Leverag- ing Side Observations in Stochastic Bandits. InConfer- ence on Uncertainty in Artificial Intelligence (UAI), pp. 142–151, 2012

work page 2012

[13] [13]

A Gang of Bandits

Cesa-Bianchi, N, Gentile, C, and Zappella, G. A Gang of Bandits. InNeural Information Processing Systems (NeurIPS), 2013

work page 2013

[14] [14]

H, Kittur, A, Hong, J

Chau, D. H, Kittur, A, Hong, J. I, and Faloutsos, C. Apolo: making sense of large network data by combining rich user interaction and machine learning. InConference on Human Factors in Computing Systems (CHI), 2011

work page 2011

[15] [15]

Contextual Bandits with Linear Payoff Functions

Chu, L, Li, L, Reyzin, L, and Schapire, R. Contextual Bandits with Linear Payoff Functions. InInternational Conference on Artificial Intelligence and Statistics (AIS- TATS), 2011

work page 2011

[16] [16]

P, and Kakade, S

Dani, V , Hayes, T. P, and Kakade, S. M. Stochastic Linear Optimization under Bandit Feedback. InConference on Learning Theory (COLT), 2008

work page 2008

[17] [17]

Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Pro- cess Bandit Optimization

Desautels, T, Krause, A, and Burdick, J. Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Pro- cess Bandit Optimization. InInternational Conference on Machine Learning (ICML), 2012. Graclus. Graclus, 2013. URL www.cs.utexas.edu/users/ dml/Software/graclus.html

work page 2012

[18] [18]

A matrix factorization technique with trust propagation for recommendation in social net- works

Jamali, M and Ester, M. A matrix factorization technique with trust propagation for recommendation in social net- works. InACM Conference on Recommender Systems (RecSys). ACM, 2010

work page 2010

[19] [19]

Cambridge Uni- versity Press, 2010

Jannach, D, Zanker, M, Felfernig, A, and Friedrich, G.Rec- ommender Systems: An Introduction. Cambridge Uni- versity Press, 2010

work page 2010

[20] [20]

Matrix Completion from a Few Entries

Keshavan, R, Oh, S, and Montanari, A. Matrix Completion from a Few Entries. InIEEE International Symposium on Information Theory, pp. 324–328, 2009

work page 2009

[21] [21]

Multi-armed ban- dit problems in metric spaces

Kleinberg, R, Slivkins, A, and Upfal, E. Multi-armed ban- dit problems in metric spaces. InACM Symposium on Theory of Computing (STOC), 2008

work page 2008

[22] [22]

L, and Tolliver, D

Koutis, I, Miller, G. L, and Tolliver, D. Combinatorial pre- conditioners and multilevel solvers for problems in com- puter vision and image processing.Computer Vision and Image Understanding, 115:1638–1646, 2011

work page 2011

[23] [23]

MovieLens 1M Dataset

Lam, S and Herlocker, J. MovieLens 1M Dataset. http://www.grouplens.org/node/12, 2012

work page 2012

[24] [24]

Li, L, Chu, W, Langford, J, and Schapire, R. E. A Contextual-Bandit Approach to Personalized News Ar- ticle Recommendation.World Wide Web Conference (WWW), 2010

work page 2010

[25] [25]

Birds of a Feather: Homophily in Social Networks.Annual Review of Sociology, 27:415–444, 2001

McPherson, M, Smith-Lovin, L, and Cook, J. Birds of a Feather: Homophily in Social Networks.Annual Review of Sociology, 27:415–444, 2001

work page 2001

[26] [26]

A Variant of Azuma’s Inequality for Martin- gales with Subgaussian Tails.CoRR, abs/1110.2, 2011

Shamir, O. A Variant of Azuma’s Inequality for Martin- gales with Subgaussian Tails.CoRR, abs/1110.2, 2011

work page 2011

[27] [27]

Contextual Bandits with Similarity Informa- tion.Conference on Learning Theory (COLT), pp

Slivkins, A. Contextual Bandits with Similarity Informa- tion.Conference on Learning Theory (COLT), pp. 1–27, 2009

work page 2009

[28] [28]

Gaus- sian Process Optimization in the Bandit Setting: No Re- Spectral Bandits gret and Experimental Design.International Conference on Machine Learning (ICML), 2010

Srinivas, N, Krause, A, Kakade, S, and Seeger, M. Gaus- sian Process Optimization in the Bandit Setting: No Re- Spectral Bandits gret and Experimental Design.International Conference on Machine Learning (ICML), 2010

work page 2010

[29] [29]

Finite-Time Analysis of Kernelised Contextual Bandits

Valko, M, Korda, N, Munos, R, Flaounas, I, and Cristian- ini, N. Finite-Time Analysis of Kernelised Contextual Bandits. InConference on Uncertainty in Artificial In- telligence (UAI), 2013

work page 2013

[30] [30]

Springer, 2005

Zhang, F.The Schur complement and its applications, vol- ume 4. Springer, 2005

work page 2005

[31] [31]

Semi-Supervised Learning Literature Survey

Zhu, X. Semi-Supervised Learning Literature Survey. Technical Report 1530, U. of Wisconsin-Madison, 2008. Spectral Bandits Supplementary Material Confidence ellipsoid Lemma 9.For any symmetric, positive semi-definite ma- trixXand any vectoru: (X+uu T )−1≺X−1 Proof.For any vectory, by Sherman–Morrison formula: − ( uTX−1y )T ( uTX−1y ) 1 +u TX−1u ≤0 y T ( X...

work page 2008