Spectral bandits
Pith reviewed 2026-05-07 14:49 UTC · model grok-4.3
The pith
Bandit algorithms learn graph-smooth payoffs with regret that scales in the graph's effective dimension rather than its size.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We study a bandit problem where the payoffs of arms are smooth on a graph. We introduce the notion of an effective dimension, which is small in real-world graphs, and propose three algorithms for solving our problem that scale linearly or sublinearly in this dimension. Our experiments on a content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
What carries the argument
The effective dimension of the graph, a measure of the complexity of smooth functions on the graph that controls how regret grows with the number of arms.
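The mechanism can be illustrated with a SpectralUCB-style sketch: a linear bandit whose arm features are rows of the Laplacian eigenvector matrix, with a regularizer that penalizes high-frequency (large-eigenvalue) directions. The hyperparameters, confidence width, and toy graph below are illustrative choices, not the paper's:

```python
import numpy as np

def spectral_ucb(adj, true_payoff, T=200, mu=1.0, beta=1.0, noise=0.1, seed=0):
    """SpectralUCB-style bandit: LinUCB in the Laplacian eigenbasis, with the
    spectral penalty diag(lambda + mu) shrinking non-smooth directions."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    L = np.diag(adj.sum(axis=1)) - adj        # unnormalized graph Laplacian
    lam, Q = np.linalg.eigh(L)                # arm i has feature vector Q[i]
    A = np.diag(lam + mu)                     # regularized Gram matrix
    b = np.zeros(n)
    regret = 0.0
    for _ in range(T):
        A_inv = np.linalg.inv(A)
        theta = A_inv @ b                     # ridge estimate in the eigenbasis
        widths = np.sqrt(np.einsum('ij,jk,ik->i', Q, A_inv, Q))
        arm = int(np.argmax(Q @ theta + beta * widths))
        reward = true_payoff[arm] + noise * rng.standard_normal()
        A += np.outer(Q[arm], Q[arm])         # rank-one design update
        b += reward * Q[arm]
        regret += true_payoff.max() - true_payoff[arm]
    return regret

# Path graph with a smooth payoff peaking in the middle.
n = 30
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
payoff = np.sin(np.linspace(0.0, np.pi, n))
print(spectral_ucb(adj, payoff))  # cumulative regret over T rounds
```

The spectral penalty is what lets the regret depend on the effective dimension rather than on n: smooth payoffs concentrate on the low-eigenvalue directions, so few coordinates of theta matter.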
If this is right
- Regret remains manageable even when the graph contains thousands of nodes, provided the effective dimension stays small.
- The same algorithms apply directly to content-based recommendation where items form an undirected graph.
- A useful estimator of all item values can be obtained after evaluating only tens of nodes.
- The framework transfers smoothness ideas from manifold learning into an online decision setting.
Where Pith is reading between the lines
- The effective dimension could be estimated on the fly from observed payoffs to adapt the algorithm without knowing the graph spectrum in advance.
- The same smoothness assumption might let the approach extend to time-varying graphs where edges or labels drift slowly.
- Connections to spectral methods in semi-supervised learning suggest the algorithms could also accelerate batch recommendation tasks.
Load-bearing premise
The expected payoffs are smooth on the graph, so neighboring nodes have similar values and the effective dimension suffices to bound the regret.
What would settle it
Running the algorithms on a graph whose payoffs violate smoothness and observing that regret then grows linearly with the total number of nodes instead of the effective dimension.
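A minimal version of that check compares the Dirichlet energy f^T L f, which the smoothness assumption requires to be small, between a graph-aligned payoff and one drawn independently of the graph (both synthetic here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ring graph: each node connected to its two neighbors.
n = 100
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
L = np.diag(adj.sum(axis=1)) - adj  # unnormalized Laplacian

smooth = np.sin(2 * np.pi * np.arange(n) / n)  # varies slowly along the ring
rough = rng.standard_normal(n)                 # ignores the graph entirely

def dirichlet(f):
    """f^T L f = sum over edges of (f_i - f_j)^2."""
    return float(f @ L @ f)

print(dirichlet(smooth), dirichlet(rough))  # smooth energy is far lower
```

A payoff vector with energy comparable to the random one is exactly the regime where the effective-dimension bound should stop applying.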
Original abstract
Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node of an undirected graph and its expected rating is similar to the one of its neighbors. The goal is to recommend items that have high expected ratings. We aim for the algorithms where the cumulative regret with respect to the optimal policy would not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose three algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies a bandit problem in which arm payoffs are smooth on an undirected graph, making it applicable to content recommendation where items are nodes and expected ratings are similar for neighbors. It introduces the notion of an effective dimension d_eff (derived from the graph and smoothness) that is claimed to be small for real-world graphs, proposes three algorithms whose regret scales linearly or sublinearly in d_eff rather than the number of nodes, and reports experiments showing that user preferences over thousands of items can be estimated from only tens of node evaluations.
Significance. If the smoothness assumption holds and d_eff is both small and controls regret as claimed, the framework offers a principled way to obtain scalable regret bounds for graph-structured bandits, with direct relevance to recommendation systems. The effective-dimension concept is a potentially useful addition if rigorously tied to the Laplacian or kernel and supported by verifiable bounds. The experimental claim of learning from few evaluations would be a strong practical result if the graphs and data satisfy the smoothness needed to keep d_eff small.
major comments (2)
- [Experimental evaluation] The central regret-scaling claim requires that expected payoffs are sufficiently smooth on the graph so that d_eff remains small. The experimental section provides no measurement, visualization, or statistical check of smoothness (e.g., correlation of ratings between neighbors) or of the realized value of d_eff on the recommendation graphs used. Without this, the reported performance cannot be attributed to the claimed linear/sublinear scaling in d_eff rather than to other factors.
- [§3] The three algorithms of §3 are stated to achieve regret linear or sublinear in d_eff, but the derivation of the effective dimension and the precise dependence of the regret bounds on d_eff (including any hidden constants or assumptions on the graph spectrum) are not shown to be parameter-free, or independent of the ambient dimension, in a way that survives when smoothness holds only approximately.
minor comments (2)
- [§2] Notation for the graph Laplacian and the precise definition of d_eff should be introduced earlier and used consistently; several passages refer to “the effective dimension” without an equation reference on first use.
- [Experimental evaluation] The experimental figures would benefit from error bars or multiple random seeds; current plots appear to show single runs.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to strengthen the presentation of our results and analysis.
Point-by-point responses
-
Referee: [Experimental evaluation] The central regret-scaling claim requires that expected payoffs are sufficiently smooth on the graph so that d_eff remains small. The experimental section provides no measurement, visualization, or statistical check of smoothness (e.g., correlation of ratings between neighbors) or of the realized value of d_eff on the recommendation graphs used. Without this, the reported performance cannot be attributed to the claimed linear/sublinear scaling in d_eff rather than to other factors.
Authors: We agree that explicit verification of the smoothness assumption and computation of d_eff on the experimental graphs would better support attribution of the observed performance to the theoretical scaling. In the revised manuscript, we will add measurements of neighbor rating correlations (including visualizations and summary statistics) along with the realized values of d_eff for each recommendation graph used. These additions will confirm that d_eff remains small in the datasets considered and that the smoothness condition holds sufficiently to justify the regret bounds. revision: yes
-
Referee: [§3] The three algorithms of §3 are stated to achieve regret linear or sublinear in d_eff, but the derivation of the effective dimension and the precise dependence of the regret bounds on d_eff (including any hidden constants or assumptions on the graph spectrum) are not shown to be parameter-free, or independent of the ambient dimension, in a way that survives when smoothness holds only approximately.
Authors: The effective dimension is defined via the spectrum of the normalized graph Laplacian under the smoothness prior, specifically d_eff = sum_{i=1}^n 1/(1 + mu * lambda_i) for smoothness parameter mu. The regret bounds for the three algorithms are derived by reducing the problem to an effective subspace of dimension d_eff, yielding bounds that scale as O(d_eff sqrt(T log T)) or better and are independent of the ambient dimension n. We will expand the algorithm descriptions in §3 and the regret analysis to include the complete derivation, explicit dependence on d_eff, discussion of constants, and an extension to approximate smoothness that adds a controlled bias term. This will clarify that the bounds remain valid and parameter-free in the relevant sense when the smoothness assumption holds approximately. revision: yes
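The formula stated in the response above is easy to evaluate numerically. Note that it is the response's stated form, which may differ from the paper's exact definition; the star graph and mu values are illustrative:

```python
import numpy as np

def effective_dimension(L, mu):
    """d_eff = sum_i 1 / (1 + mu * lambda_i), per the formula above,
    where lambda_i are the eigenvalues of the symmetric Laplacian L."""
    lam = np.linalg.eigvalsh(L)
    return float(np.sum(1.0 / (1.0 + mu * lam)))

# Star graph on n nodes: hub connected to n - 1 leaves.
# Its Laplacian spectrum is {0, 1 (multiplicity n-2), n}.
n = 1000
adj = np.zeros((n, n))
adj[0, 1:] = adj[1:, 0] = 1.0
L = np.diag(adj.sum(axis=1)) - adj

for mu in (0.1, 1.0, 10.0):
    print(mu, effective_dimension(L, mu))  # shrinks well below n as mu grows
```

The point of the example: a hub-dominated graph has a thousand nodes but, for moderate smoothness parameters, an effective dimension an order of magnitude smaller.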
Circularity Check
No circularity: effective dimension defined from graph and smoothness premise
Full rationale
The paper states the smoothness assumption on payoffs upfront as a modeling premise for the bandit problem on graphs, then defines effective dimension as a graph property that is small under this assumption. Algorithms are constructed to achieve regret scaling in this dimension, and experiments report empirical performance on recommendation data. No step reduces a claimed prediction or uniqueness result to a fitted parameter, self-citation, or definitional tautology; the central claims remain independent of their inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: payoffs of arms are smooth on the graph (expected ratings are similar for neighboring nodes).
invented entities (1)
- Effective dimension (no independent evidence)