pith. machine review for the scientific record. sign in

arxiv: 2604.25272 · v1 · submitted 2026-04-28 · 📊 stat.ML · cs.AI· cs.LG

Recognition: unknown

Spectral bandits

Authors on Pith no claims yet

Pith reviewed 2026-05-07 14:49 UTC · model grok-4.3

classification 📊 stat.ML cs.AIcs.LG
keywords spectral banditsgraph banditseffective dimensioncontent recommendationonline learningregret minimizationsmooth functions on graphsbandit algorithms
0
0 comments X

The pith

Bandit algorithms learn graph-smooth payoffs with regret that scales in the graph's effective dimension rather than its size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies a bandit problem where the expected payoffs of arms form a smooth function on an undirected graph, a setting that directly models content recommendation where similar items receive similar ratings. The authors introduce an effective dimension of the graph that stays small on real-world structures and give three algorithms whose regret grows only linearly or sublinearly with that dimension. Traditional methods would incur regret linear in the total number of arms, which becomes infeasible for graphs with thousands of nodes. The new algorithms therefore make online selection practical by recovering a useful estimator of all node values after evaluating only a few dozen of them.

Core claim

We study a bandit problem where the payoffs of arms are smooth on a graph. We introduce the notion of an effective dimension, which is small in real-world graphs, and propose three algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.

What carries the argument

The effective dimension of the graph, a measure of the complexity of smooth functions on the graph that controls how regret grows with the number of arms.

If this is right

  • Regret remains manageable even when the graph contains thousands of nodes, provided the effective dimension stays small.
  • The same algorithms apply directly to content-based recommendation where items form an undirected graph.
  • A useful estimator of all item values can be obtained after evaluating only tens of nodes.
  • The framework transfers smoothness ideas from manifold learning into an online decision setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The effective dimension could be estimated on the fly from observed payoffs to adapt the algorithm without knowing the graph spectrum in advance.
  • The same smoothness assumption might let the approach extend to time-varying graphs where edges or labels drift slowly.
  • Connections to spectral methods in semi-supervised learning suggest the algorithms could also accelerate batch recommendation tasks.

Load-bearing premise

The expected payoffs are smooth on the graph, so neighboring nodes have similar values and the effective dimension suffices to bound the regret.

What would settle it

Running the algorithms on a graph whose payoffs violate smoothness and observing that regret then grows linearly with the total number of nodes instead of the effective dimension.

Figures

Figures reproduced from arXiv: 2604.25272 by Branislav Kveton, Michal Valko, R\'emi Munos, Shipra Agrawal, Tom\'a\v{s} Koc\'ak.

Figure 2
Figure 2. Figure 2: Difference between d and 2dold for real world datasets. From left to right: Flixster dataset with N = 972, Movielens dataset with N = 618, and LastFM dataset with N = 804. Figures 2 and 3 show how d behaves compared to 2dold on the generated and the real Flixster, Movielens, and LastFM network graphs.1 We use some of them for the experiments 1We set Λ to ΛL + λI with λ = 0.1, where ΛL is the graph Laplacia… view at source ↗
Figure 3
Figure 3. Figure 3: Difference between d and 2dold for random graphs on N = 1000 nodes. From left to right: Erd˝os-Renyi graph with the probability 0.03 of an edge, Barab´asi-Albert graph with one edge per added node, Barab´asi-Albert graph with ten edges per added node. in Section 7. The figures clearly demonstrate the gap between d and 2dold while both of the quantities are much smaller then D. In fact, effective dimension … view at source ↗
Figure 4
Figure 4. Figure 4: The dependence of cumulative regret on confidence and regularization parameters. view at source ↗
Figure 5
Figure 5. Figure 5: Cumulative regret comparison of algorithms for different underlying graphs. view at source ↗
Figure 6
Figure 6. Figure 6: Cumulative regret of SpectralTS and SpectralUCB for reward functions with different smoothness. 34 view at source ↗
Figure 7
Figure 7. Figure 7: The impact of lazy updates and Sherman-Morrison formula on running time. view at source ↗
Figure 8
Figure 8. Figure 8: Cumulative regret and running time of SpectralUCB with reduced basis. The second part of the dataset is used for parameter estimation. Similarly as for the first part, we complete ratings using low-rank factorization. The reason for using a different part of the dataset is to avoid dependencies. The last part of the dataset is used to build our similarity graph over movies. We factor the dataset in the sam… view at source ↗
Figure 9
Figure 9. Figure 9: Comparison of spectral and linear bandit algorithms for a subset of users. view at source ↗
Figure 10
Figure 10. Figure 10: Comparison of spectral and linear bandit algorithms. view at source ↗
read the original abstract

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node of an undirected graph and its expected rating is similar to the one of its neighbors. The goal is to recommend items that have high expected ratings. We aim for the algorithms where the cumulative regret with respect to the optimal policy would not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose three algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies a bandit problem in which arm payoffs are smooth on an undirected graph, making it applicable to content recommendation where items are nodes and expected ratings are similar for neighbors. It introduces the notion of an effective dimension d_eff (derived from the graph and smoothness) that is claimed to be small for real-world graphs, proposes three algorithms whose regret scales linearly or sublinearly in d_eff rather than the number of nodes, and reports experiments showing that user preferences over thousands of items can be estimated from only tens of node evaluations.

Significance. If the smoothness assumption holds and d_eff is both small and controls regret as claimed, the framework offers a principled way to obtain scalable regret bounds for graph-structured bandits, with direct relevance to recommendation systems. The effective-dimension concept is a potentially useful addition if rigorously tied to the Laplacian or kernel and supported by verifiable bounds. The experimental claim of learning from few evaluations would be a strong practical result if the graphs and data satisfy the smoothness needed to keep d_eff small.

major comments (2)
  1. [Experimental evaluation] The central regret-scaling claim requires that expected payoffs are sufficiently smooth on the graph so that d_eff remains small. The experimental section provides no measurement, visualization, or statistical check of smoothness (e.g., correlation of ratings between neighbors) or of the realized value of d_eff on the recommendation graphs used. Without this, the reported performance cannot be attributed to the claimed linear/sublinear scaling in d_eff rather than to other factors.
  2. [§3] §3 (algorithm descriptions) and the regret analysis: the three algorithms are stated to achieve regret linear or sublinear in d_eff, but the derivation of the effective dimension and the precise dependence of the regret bound on d_eff (including any hidden constants or assumptions on the graph spectrum) are not shown to be parameter-free or independent of the ambient dimension in a way that survives when smoothness is only approximate.
minor comments (2)
  1. [§2] Notation for the graph Laplacian and the precise definition of d_eff should be introduced earlier and used consistently; several passages refer to “the effective dimension” without an equation reference on first use.
  2. [Experimental evaluation] The experimental figures would benefit from error bars or multiple random seeds; current plots appear to show single runs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to strengthen the presentation of our results and analysis.

read point-by-point responses
  1. Referee: [Experimental evaluation] The central regret-scaling claim requires that expected payoffs are sufficiently smooth on the graph so that d_eff remains small. The experimental section provides no measurement, visualization, or statistical check of smoothness (e.g., correlation of ratings between neighbors) or of the realized value of d_eff on the recommendation graphs used. Without this, the reported performance cannot be attributed to the claimed linear/sublinear scaling in d_eff rather than to other factors.

    Authors: We agree that explicit verification of the smoothness assumption and computation of d_eff on the experimental graphs would better support attribution of the observed performance to the theoretical scaling. In the revised manuscript, we will add measurements of neighbor rating correlations (including visualizations and summary statistics) along with the realized values of d_eff for each recommendation graph used. These additions will confirm that d_eff remains small in the datasets considered and that the smoothness condition holds sufficiently to justify the regret bounds. revision: yes

  2. Referee: [§3] §3 (algorithm descriptions) and the regret analysis: the three algorithms are stated to achieve regret linear or sublinear in d_eff, but the derivation of the effective dimension and the precise dependence of the regret bound on d_eff (including any hidden constants or assumptions on the graph spectrum) are not shown to be parameter-free or independent of the ambient dimension in a way that survives when smoothness is only approximate.

    Authors: The effective dimension is defined via the spectrum of the normalized graph Laplacian under the smoothness prior, specifically d_eff = sum_{i=1}^n 1/(1 + mu * lambda_i) for smoothness parameter mu. The regret bounds for the three algorithms are derived by reducing the problem to an effective subspace of dimension d_eff, yielding bounds that scale as O(d_eff sqrt(T log T)) or better and are independent of the ambient dimension n. We will expand the algorithm descriptions in §3 and the regret analysis to include the complete derivation, explicit dependence on d_eff, discussion of constants, and an extension to approximate smoothness that adds a controlled bias term. This will clarify that the bounds remain valid and parameter-free in the relevant sense when the smoothness assumption holds approximately. revision: yes

Circularity Check

0 steps flagged

No circularity: effective dimension defined from graph and smoothness premise

full rationale

The paper states the smoothness assumption on payoffs upfront as a modeling premise for the bandit problem on graphs, then defines effective dimension as a graph property that is small under this assumption. Algorithms are constructed to achieve regret scaling in this dimension, and experiments report empirical performance on recommendation data. No step reduces a claimed prediction or uniqueness result to a fitted parameter, self-citation, or definitional tautology; the central claims remain independent of their inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption of graph smoothness and the newly introduced effective dimension; no free parameters or invented entities with independent evidence are described in the abstract.

axioms (1)
  • domain assumption Payoffs of arms are smooth on the graph (expected ratings similar for neighboring nodes)
    Explicitly stated as the framework for the bandit problem and content recommendation application.
invented entities (1)
  • effective dimension no independent evidence
    purpose: Measure of graph complexity that controls algorithm scaling and regret
    New notion introduced in the paper and claimed to be small for real-world graphs.

pith-pipeline@v0.9.0 · 5470 in / 1222 out tokens · 68749 ms · 2026-05-07T14:49:40.909654+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

64 extracted references · 11 canonical work pages

  1. [1]

    https://papers.nips.cc/paper/4417-improved-algorithms-for-linear-stochastic-bandits.pdf Improved algorithms for linear stochastic bandits

    Yasin Abbasi-Yadkori, D \' a vid P \' a l, and Csaba Szepesv \' a ri. https://papers.nips.cc/paper/4417-improved-algorithms-for-linear-stochastic-bandits.pdf Improved algorithms for linear stochastic bandits . In Neural Information Processing Systems (NeurIPS), 2011

  2. [2]

    http://proceedings.mlr.press/v54/abeille17a/abeille17a.pdf Linear Thompson sampling revisited

    Marc Abeille and Alessandro Lazaric. http://proceedings.mlr.press/v54/abeille17a/abeille17a.pdf Linear Thompson sampling revisited . In International Conference on Artificial Intelligence and Statistics (AISTATS), 2017

  3. [3]

    http://web.eecs.umich.edu/ jabernet/123-Abernethy.pdf Competing in the dark: An efficient algorithm for bandit linear optimization

    Jacob D Abernethy, Elad Hazan, and Alexander Rakhlin. http://web.eecs.umich.edu/ jabernet/123-Abernethy.pdf Competing in the dark: An efficient algorithm for bandit linear optimization. In Conference on Learning Theory (COLT), 2008

  4. [4]

    http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf Analysis of Thompson sampling for the multi-armed bandit problem

    Shipra Agrawal and Navin Goyal. http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf Analysis of Thompson sampling for the multi-armed bandit problem . In Conference on Learning Theory (COLT), 2012

  5. [5]

    http://proceedings.mlr.press/v31/agrawal13a.pdf Further optimal regret bounds for Thompson sampling

    Shipra Agrawal and Navin Goyal. http://proceedings.mlr.press/v31/agrawal13a.pdf Further optimal regret bounds for Thompson sampling . In International Conference on Artificial Intelligence and Statistics (AISTATS), 2013 a

  6. [6]

    http://proceedings.mlr.press/v28/agrawal13.pdf Thompson sampling for contextual bandits with linear payoffs

    Shipra Agrawal and Navin Goyal. http://proceedings.mlr.press/v28/agrawal13.pdf Thompson sampling for contextual bandits with linear payoffs . In International Conference on Machine Learning (ICML), 2013 b

  7. [7]

    https://papers.nips.cc/paper/4908-from-bandits-to-experts-a-tale-of-domination-and-independence.pdf From bandits to experts: A tale of domination and independence

    Noga Alon, Nicol \` o Cesa-Bianchi, Claudio Gentile, and Yishay Mansour. https://papers.nips.cc/paper/4908-from-bandits-to-experts-a-tale-of-domination-and-independence.pdf From bandits to experts: A tale of domination and independence . In Neural Information Processing Systems (NeurIPS), 2013

  8. [8]

    http://proceedings.mlr.press/v40/Alon15.pdf Online learning with feedback graphs: Beyond bandits

    Noga Alon, Nicol \` o Cesa-Bianchi, Ofer Dekel, and Tomer Koren. http://proceedings.mlr.press/v40/Alon15.pdf Online learning with feedback graphs: Beyond bandits . In Conference on Learning Theory (COLT), 2015

  9. [9]

    https://arxiv.org/abs/1409.8428 Nonstochastic multi-armed bandits with graph-structured feedback

    Noga Alon, Nicol \` o Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, and Ohad Shamir. https://arxiv.org/abs/1409.8428 Nonstochastic multi-armed bandits with graph-structured feedback . SIAM Journal on Computing, 46 0 (6): 0 1785--1826, 2017

  10. [10]

    http://www.jmlr.org/papers/volume3/auer02a/auer02a.pdf Using confidence bounds for exploitation-exploration trade-offs

    Peter Auer. http://www.jmlr.org/papers/volume3/auer02a/auer02a.pdf Using confidence bounds for exploitation-exploration trade-offs . Journal of Machine Learning Research, 3: 0 397--422, 2002

  11. [11]

    http://personal.unileoben.ac.at/rortner/Pubs/UCBRev.pdf revisited: Improved regret bounds for the stochastic multi-armed bandit problem

    Peter Auer and Ronald Ortner. http://personal.unileoben.ac.at/rortner/Pubs/UCBRev.pdf revisited: Improved regret bounds for the stochastic multi-armed bandit problem . Periodica Mathematica Hungarica, 2010

  12. [12]

    The nonstochastic multiarmed bandit problem,

    Peter Auer, Nicol \` o Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. https://epubs.siam.org/doi/pdf/10.1137/S0097539701398375 The nonstochastic multi-armed bandit problem . Journal on Computing, 32 0 (1): 0 48--77, 2002

  13. [13]

    http://people.cs.uchicago.edu/ niyogi/papersps/reg \_ colt.pdf Regularization and semi-supervised learning on large graphs

    Mikhail Belkin, Irina Matveeva, and Partha Niyogi. http://people.cs.uchicago.edu/ niyogi/papersps/reg \_ colt.pdf Regularization and semi-supervised learning on large graphs . In Conference on Learning Theory (COLT), 2004

  14. [14]

    http://www.jmlr.org/papers/volume7/belkin06a/belkin06a.pdf Manifold regularization: A geometric framework for learning from labeled and unlabeled examples

    Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. http://www.jmlr.org/papers/volume7/belkin06a/belkin06a.pdf Manifold regularization: A geometric framework for learning from labeled and unlabeled examples . Journal of Machine Learning Research, 7: 0 2399--2434, 2006

  15. [15]

    Pazzani, and James Chen

    Daniel Billsus, Michael J. Pazzani, and James Chen. https://www.ics.uci.edu/ pazzani/Publications/billsuspazzanichen.pdf A learning agent for wireless news access . In International Conference on Intelligent User Interfaces, 2000

  16. [16]

    http://www.jmlr.org/papers/volume12/bubeck11a/bubeck11a.pdf -armed bandits

    S \' e bastien Bubeck, R \' e mi Munos, Gilles Stoltz, and Csaba Szepesv \' a ri. http://www.jmlr.org/papers/volume12/bubeck11a/bubeck11a.pdf -armed bandits . Journal of Machine Learning Research, 12: 0 1587--1627, 2011

  17. [17]

    S \' e bastien Bubeck, Nicol \` o Cesa-Bianchi, and Sham M. Kakade. http://proceedings.mlr.press/v23/bubeck12a/bubeck12a.pdf Towards minimax policies for online linear optimization with bandit feedback . In Conference on Learning Theory (COLT), 2012

  18. [18]

    Swapna Buccapatnam, Atilla Eryilmaz, and Ness B. Shroff. https://www.orie.cornell.edu/orie/research/groups/multheavytail/upload/mabSigfinal.pdf Stochastic bandits with side observations on networks . In International Conference on Measurement and Modeling of Computer Systems, 2014

  19. [19]

    https://arxiv.org/pdf/1210.4839.pdf Leveraging side observations in stochastic bandits

    St \' e phane Caron, Branislav Kveton, Marc Lelarge, and Smriti Bhagat. https://arxiv.org/pdf/1210.4839.pdf Leveraging side observations in stochastic bandits. In Uncertainty in Artificial Intelligence (UAI), 2012

  20. [20]

    http://www.ii.uni.wroc.pl/ lukstafi/pmwiki/uploads/AGT/Prediction \_ Learning \_ and \_ Games.pdf Prediction, learning, and games

    Nicol \` o Cesa-Bianchi and G \' a bor Lugosi. http://www.ii.uni.wroc.pl/ lukstafi/pmwiki/uploads/AGT/Prediction \_ Learning \_ and \_ Games.pdf Prediction, learning, and games . Cambridge University Press, 2006

  21. [21]

    https://papers.nips.cc/paper/5006-a-gang-of-bandits.pdf A gang of bandits

    Nicol \` o Cesa-Bianchi, Claudio Gentile, and Giovanni Zappella. https://papers.nips.cc/paper/5006-a-gang-of-bandits.pdf A gang of bandits . In Neural Information Processing Systems (NeurIPS), 2013

  22. [22]

    https://arxiv.org/pdf/1605.08722.pdf An empirical evaluation of Thompson sampling

    Olivier Chapelle and Lihong Li. https://arxiv.org/pdf/1605.08722.pdf An empirical evaluation of Thompson sampling . In Neural Information Processing Systems (NeurIPS). 2011

  23. [23]

    Hong, and Christos Faloutsos

    Duen Horng Chau, Aniket Kittur, Jason I. Hong, and Christos Faloutsos. https://www.cc.gatech.edu/ dchau/papers/11-chi-apolo.pdf Apolo: Making sense of large network data by combining rich user interaction and machine learning . In Conference on Human Factors in Computing Systems, 2011

  24. [24]

    http://proceedings.mlr.press/v15/chu11a/chu11a.pdf Contextual bandits with linear payoff functions

    Lei Chu, Lihong Li, Lev Reyzin, and Robert E Schapire. http://proceedings.mlr.press/v15/chu11a/chu11a.pdf Contextual bandits with linear payoff functions . In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011

  25. [25]

    http://proceedings.mlr.press/v32/combes14.pdf Unimodal bandits: Regret lower bounds and optimal algorithms

    Richard Combes and Alexandre Prouti \` e re. http://proceedings.mlr.press/v32/combes14.pdf Unimodal bandits: Regret lower bounds and optimal algorithms . In International Conference on Machine Learning (ICML), 2014

  26. [26]

    https://repository.upenn.edu/cgi/viewcontent.cgi?article=1501 & context=statistics \_ papers Stochastic linear optimization under bandit feedback

    Varsha Dani, Thomas P Hayes, and Sham M Kakade. https://repository.upenn.edu/cgi/viewcontent.cgi?article=1501 & context=statistics \_ papers Stochastic linear optimization under bandit feedback . In Conference on Learning Theory (COLT), 2008

  27. [27]

    https://icml.cc/2012//papers/602.pdf Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization

    Thomas Desautels, Andreas Krause, and Joel Burdick. https://icml.cc/2012//papers/602.pdf Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization . In International Conference on Machine Learning (ICML), 2012

  28. [28]

    Meng Fang and Dacheng Tao. http://delivery.acm.org/10.1145/2630000/2623672/p1106-fang.pdf?ip=193.49.212.233 & id=2623672 & acc=ACTIVE SERVICE & key=7EBF6E77E86B478F.5C2A4B72BE2A7DDF.4D4702B0C3E38B35.4D4702B0C3E38B35 & \_ \_ acm \_ \_ =1528847270 \_ 23b4902004322713593e1389d42ae48c Networked bandits with disjoint linear payoffs . In International Conferenc...

  29. [29]

    http://proceedings.mlr.press/v32/gentile14.pdf Online clustering of bandits

    Claudio Gentile, Shuai Li, and Giovanni Zappella. http://proceedings.mlr.press/v32/gentile14.pdf Online clustering of bandits . In International Conference on Machine Learning (ICML), 2014

  30. [30]

    http://proceedings.mlr.press/v70/gentile17a/gentile17a.pdf On context-dependent clustering of bandits

    Claudio Gentile, Shuai Li, Purushottam Kar, Alexandros Karatzoglou, Giovanni Zappella, and Evans Etrue. http://proceedings.mlr.press/v70/gentile17a/gentile17a.pdf On context-dependent clustering of bandits . In International Conference on Machine Learning (ICML), 2017

  31. [31]

    http://www.cs.utexas.edu/users/dml/software/graclus.html http://www.cs.utexas.edu/users/dml/Software/graclus.html

    Graclus. http://www.cs.utexas.edu/users/dml/software/graclus.html http://www.cs.utexas.edu/users/dml/Software/graclus.html. University of Texas , 2013

  32. [32]

    https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7023409 Online spectral learning on a graph with bandit feedback

    Quanquan Gu and Jiawei Han. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7023409 Online spectral learning on a graph with bandit feedback . In International Conference on Data Mining, 2014

  33. [33]

    http://proceedings.mlr.press/v37/hanawal15.pdf Cheap bandits

    Manjesh Hanawal, Venkatesh Saligrama, Michal Valko, and R \' e mi Munos. http://proceedings.mlr.press/v37/hanawal15.pdf Cheap bandits . In International Conference on Machine Learning (ICML), 2015

  34. [34]

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.459.691 & rep=rep1 & type=pdf A matrix factorization technique with trust propagation for recommendation in social networks

    Mohsen Jamali and Martin Ester. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.459.691 & rep=rep1 & type=pdf A matrix factorization technique with trust propagation for recommendation in social networks . In Conference on Recommender systems, 2010

  35. [35]

    www.scholat.com/teamwork/teamworkdownloadscholar.html?id=542 & teamId=316 Recommender systems: An introduction

    Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. www.scholat.com/teamwork/teamworkdownloadscholar.html?id=542 & teamId=316 Recommender systems: An introduction . Cambridge University Press, 2010

  36. [36]

    https://arxiv.org/pdf/1205.4217.pdf Thompson sampling: An asymptotically optimal finite-time analysis

    Emilie Kaufmann, Nathaniel Korda, and R \' e mi Munos. https://arxiv.org/pdf/1205.4217.pdf Thompson sampling: An asymptotically optimal finite-time analysis . Algorithmic Learning Theory, 2012

  37. [37]

    https://arxiv.org/pdf/0901.3150.pdf Matrix completion from a few entries

    Raghunandan Keshavan, Sewoong Oh, and Andrea Montanari. https://arxiv.org/pdf/0901.3150.pdf Matrix completion from a few entries . In International Symposium on Information Theory, 2009

  38. [38]

    https://arxiv.org/pdf/0809.4882.pdf Multi-armed bandit problems in metric spaces

    Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. https://arxiv.org/pdf/0809.4882.pdf Multi-armed bandit problems in metric spaces . In Symposium on Theory of Computing (STOC), 2008

  39. [39]

    Tom \' a s Koc \' a k, Gergely Neu, Michal Valko, and R \' e mi Munos. https://papers.nips.cc/paper/5462-efficient-learning-by-implicit-exploration-in-bandit-problems-with-side-observations.pdf Efficient learning by implicit exploration in bandit problems with side observations . In Neural Information Processing Systems (NeurIPS), 2014 a

  40. [40]

    https://hal.inria.fr/hal-00981575v2/document Spectral Thompson sampling

    Tom \' a s Koc \' a k, Michal Valko, R \' e mi Munos, and Shipra Agrawal. https://hal.inria.fr/hal-00981575v2/document Spectral Thompson sampling . In AAAI Conference on Artificial Intelligence (AAAI), 2014 b

  41. [41]

    http://proceedings.mlr.press/v51/kocak16-supp.pdf Online learning with noisy side observations

    Tom \' a s Koc \' a k, Gergely Neu, and Michal Valko. http://proceedings.mlr.press/v51/kocak16-supp.pdf Online learning with noisy side observations . In International Conference on Artificial Intelligence and Statistics (AISTATS), 2016 a

  42. [42]

    https://hal.inria.fr/hal-01320588/document Online learning with Erd os-R \' e nyi side-observation graphs

    Tom \' a s Koc \' a k, Gergely Neu, and Michal Valko. https://hal.inria.fr/hal-01320588/document Online learning with Erd os-R \' e nyi side-observation graphs . In Uncertainty in Artificial Intelligence (UAI), 2016 b

  43. [43]

    http://proceedings.mlr.press/v48/korda16.pdf Distributed clustering of linear bandits in peer to peer networks

    Nathan Korda, Bal \' a zs Sz \" o r \' e nyi, and Shuai Li. http://proceedings.mlr.press/v48/korda16.pdf Distributed clustering of linear bandits in peer to peer networks . In International Conference on Machine Learning (ICML), 2016

  44. [44]

    Miller, and David Tolliver

    Ioannis Koutis, Gary L. Miller, and David Tolliver. http://www.cs.cmu.edu/ ./jkoutis/papers/cviu \_ preprint.pdf Combinatorial preconditioners and multilevel solvers for problems in computer vision and image processing . Computer Vision and Image Understanding, 115 0 (12): 0 1638--1646, 2011

  45. [45]

    http://www.grouplens.org/node/12 http://www.grouplens.org/node/12

    Shyong Lam and Jon Herlocker. http://www.grouplens.org/node/12 http://www.grouplens.org/node/12. MovieLens 1M dataset , 2012

  46. [46]

    Schapire

    Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. http://rob.schapire.net/papers/www10.pdf A contextual-bandit approach to personalized news article recommendation . International World Wide Web Conference, 2010

  47. [47]

    https://arxiv.org/pdf/1502.03473.pdf Collaborative filtering bandits

    Shuai Li, Alexandros Karatzoglou, and Claudio Gentile. https://arxiv.org/pdf/1502.03473.pdf Collaborative filtering bandits . In Conference on Research and Development in Information Retrieval, 2016

  48. [48]

    https://pdfs.semanticscholar.org/f72b/71c747d2f487e8c0ade09f4d31e4ad2c0185.pdf Active search and bandits on graphs using sigma-optimality

    Yifei Ma, Tzu-Kuo Huang, and Jeff Schneider. https://pdfs.semanticscholar.org/f72b/71c747d2f487e8c0ade09f4d31e4ad2c0185.pdf Active search and bandits on graphs using sigma-optimality . In Uncertainty in Artificial Intelligence (UAI), 2015

  49. [49]

    https://papers.nips.cc/paper/4366-from-bandits-to-experts-on-the-value-of-side-observations.pdf From bandits to experts: On the value of side-observations

    Shie Mannor and Ohad Shamir. https://papers.nips.cc/paper/4366-from-bandits-to-experts-on-the-value-of-side-observations.pdf From bandits to experts: On the value of side-observations . In Neural Information Processing Systems (NeurIPS), 2011

  50. [50]

    May, Nathaniel Korda, Anthony Lee, and David S

    Benedict C. May, Nathaniel Korda, Anthony Lee, and David S. Leslie. http://www.jmlr.org/papers/volume13/may12a/may12a.pdf Optimistic Bayesian sampling in contextual-bandit problems . Journal of Machine Learning Research, 13 0 (1): 0 2069--2106, 2012

  51. [51]

    http://aris.ss.uci.edu/ lin/52.pdf Birds of a feather: Homophily in social networks

    Miller McPherson, Lynn Smith-Lovin, and James Cook. http://aris.ss.uci.edu/ lin/52.pdf Birds of a feather: Homophily in social networks . Annual Review of Sociology, 27: 0 415--444, 2001

  52. [52]

    Narang, Akshay Gadde, and Antonio Ortega

    Sunil K. Narang, Akshay Gadde, and Antonio Ortega. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.650.2525 & rep=rep1 & type=pdf Signal processing techniques for interpolation in graph structured data . In International Conference on Acoustics, Speech and Signal Processing, 2013

  53. [53]

    http://proceedings.mlr.press/v19/slivkins11a/slivkins11a.pdf Contextual bandits with similarity information

    Aleksandrs Slivkins. http://proceedings.mlr.press/v19/slivkins11a/slivkins11a.pdf Contextual bandits with similarity information . In Conference on Learning Theory (COLT), 2009

  54. [54]

    Gaussian process optimization in the bandit setting: No regret and experimental design,

    Niranjan Srinivas, Andreas Krause, Sham M. Kakade, and Matthias Seeger. https://arxiv.org/pdf/0912.3995.pdf Gaussian process optimization in the bandit setting: No regret and experimental design . International Conference on Machine Learning (ICML), 2010

  55. [55]

    On the likelihood that one unknown probability exceeds another in view of the evidence of two samples

    William R. Thompson. https://www.jstor.org/stable/pdf/2332286.pdf On the likelihood that one unknown probability exceeds another in view of the evidence of two samples . Biometrika, 25: 0 285--294, 1933

  56. [56]

    https://hal.inria.fr/tel-01359757/document Bandits on graphs and structures

    Michal Valko. https://hal.inria.fr/tel-01359757/document Bandits on graphs and structures . habilitation, \' E cole normale sup \' e rieure de Cachan, 2016

  57. [57]

    http://researchers.lille.inria.fr/ valko/hp/serve.php?what=publications/valko2010online.pdf Online semi-supervised learning on quantized graphs

    Michal Valko, Branislav Kveton, Ling Huang, and Daniel Ting. http://researchers.lille.inria.fr/ valko/hp/serve.php?what=publications/valko2010online.pdf Online semi-supervised learning on quantized graphs . In Uncertainty in Artificial Intelligence (UAI), 2010

  58. [58]

    https://hal.inria.fr/hal-00826946/document Finite-time analysis of kernelised contextual bandits

    Michal Valko, Nathan Korda, R \' e mi Munos, Ilias Flaounas, and Nelo Cristianini. https://hal.inria.fr/hal-00826946/document Finite-time analysis of kernelised contextual bandits . In Uncertainty in Artificial Intelligence (UAI), 2013

  59. [59]

    http://proceedings.mlr.press/v32/valko14.pdf Spectral bandits for smooth graph functions

    Michal Valko, R \' e mi Munos, Branislav Kveton, and Tom \' a s Koc \' a k. http://proceedings.mlr.press/v32/valko14.pdf Spectral bandits for smooth graph functions . In International Conference on Machine Learning (ICML), 2014

  60. [60]

    http://www.kyb.mpg.de/fileadmin/user \_ upload/files/publications/attachments/Luxburg07 \_ tutorial \_ 4488 \ tutorial on spectral clustering

    Ulrike von Luxburg. http://www.kyb.mpg.de/fileadmin/user \_ upload/files/publications/attachments/Luxburg07 \_ tutorial \_ 4488 \ tutorial on spectral clustering . Statistics and Computing, 17 0 (4): 0 395--416, 2007

  61. [61]

    https://www.stat.berkeley.edu/ mjwain/stat210b/ STAT 210B A dvanced mathematical statistics

    Martin Wainwright. https://www.stat.berkeley.edu/ mjwain/stat210b/ STAT 210B A dvanced mathematical statistics . Lecture notes, University of California at Berkeley, 2015

  62. [62]

    http://www.icml-2011.org/papers/50 \_ icmlpaper.pdf Unimodal bandits

    Jia Yuan Yu and Shie Mannor. http://www.icml-2011.org/papers/50 \_ icmlpaper.pdf Unimodal bandits . In International Conference on Machine Learning (ICML), 2011

  63. [63]

    http://pages.cs.wisc.edu/ jerryzhu/pub/ssl \_ survey.pdf Semi-supervised learning literature survey

    Xiaojin Zhu. http://pages.cs.wisc.edu/ jerryzhu/pub/ssl \_ survey.pdf Semi-supervised learning literature survey . Technical Report 1530, University of Wisconsin-Madison, 2008

  64. [64]

    write newline

    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...