pith. sign in

arxiv: 2606.27917 · v1 · pith:LDYBOMGOnew · submitted 2026-06-26 · 💻 cs.LG

Graph Dimensionality Reduction for Contextual Bandits: Structure-Specific Regret Bounds under Approximate Smoothness and Noisy Eigenspaces

Pith reviewed 2026-06-29 04:21 UTC · model grok-4.3

classification 💻 cs.LG
keywords contextual banditsgraph structurespectral dimensionality reductionregret boundslinear UCBapproximate smoothnessnoisy graphs
0
0 comments X

The pith

Projecting arm features onto a graph's low-frequency spectral subspace produces the first Õ(k√T) regret bound for contextual bandits, replacing dependence on ambient dimension d with dependence on subspace dimension k.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that when arms lie on a graph and rewards are approximately smooth across connected arms, one can project the feature vectors into the k-dimensional low-frequency part of the graph spectrum and then run linear UCB only in that reduced space. The resulting algorithm avoids paying the full price of exploring in the original d dimensions. A central technical step is showing that any high-frequency reward component does not automatically produce linear-in-T regret; its cost is instead governed by how much it actually affects the specific sequence of arms that get played. The same analysis supplies an explicit additive penalty when the graph itself must be estimated from data or when smoothness holds only approximately. This matters for recommendation and retrieval tasks whose arms naturally form graphs, because the usual dimensionality-reduction methods discard that structure and therefore explore more than necessary.

Core claim

GraphDR-LinUCB projects each arm's feature vector onto the k-dimensional low-frequency spectral subspace of the graph Laplacian and then executes linear UCB inside the projected space; under the assumption that the reward function is approximately smooth on the graph, the regret is bounded by Õ(k√T) rather than the usual Õ(d√T), and a perturbation argument extends the bound to noisy or estimated graphs while charging an explicit term for any mismatch between the true reward smoothness and the chosen subspace plus any error in the estimated eigenvectors.

What carries the argument

GraphDR-LinUCB, the algorithm that projects arm features onto the graph's low-frequency spectral subspace before running linear UCB, together with the perturbation argument that controls the effect of noisy eigenspaces.

If this is right

  • Regret scales with the chosen subspace dimension k instead of the original feature dimension d.
  • High-frequency reward components incur a path-dependent cost rather than an automatic linear penalty.
  • An explicit additive term appears for any mismatch between reward smoothness and the chosen subspace or for error in the estimated graph eigenvectors.
  • A simple spectral comparison operator predicts which of two candidate subspaces will produce lower regret on a given dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same path-dependent costing argument could be applied to other structured bandit problems whose actions lie on a manifold or graph.
  • The predictive spectral comparison might allow automatic selection of the best reducer on new data without additional validation runs.
  • If the graph is only approximately known, the explicit penalty terms give a concrete budget for how much graph estimation error can be tolerated before the reduction loses its advantage.

Load-bearing premise

The reward function is approximately smooth on the graph, so the realized cost of any high-frequency component is set by its effect along the actual sequence of arms played rather than by a worst-case linear-in-T penalty that ignores the path.

What would settle it

An experiment in which the high-frequency reward component is shown to produce a linear-in-T penalty no matter which sequence of arms is played would falsify the claimed regret bound.

Figures

Figures reproduced from arXiv: 2606.27917 by Anuj Sharma, Ibne Farabi Shihab, Joyanta Jyoti Mondal.

Figure 1
Figure 1. Figure 1: GraphDR-LinUCB. The graph determines a bottom-k low-frequency Laplacian eigenbasis Ebk; arm features are projected to R k and LinUCB runs there. Only the high-frequency residual r⊥,k left outside the subspace can hurt the learner; its effect is governed by the realized graph quantities SRLT and CRWT . The shuffle control permutes node labels before computing Ebk, preserving the projected dimension but dest… view at source ↗
Figure 2
Figure 2. Figure 2: Main synthetic comparison with the graph [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Dimension scaling at fixed k ⋆ = 5. GraphDR￾LinUCB remains nearly flat as d = n grows; full LinUCB grows with ambient dimension. Random GraphDR-shuffled (control) JL+LinUCB PCA+LinUCB LinUCB-full GraphDR+LinUCB (ours) Method 10 2 10 3 Cumulativ e re gret R(T) 2613 2546 2364 2297 418 18 [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Second graph family: random geometric graph. [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
read the original abstract

Contextual bandits with graph-structured arms arise in recommendation, citation retrieval, and social advertising, where arms connected on a graph tend to share reward signal. Standard dimensionality reduction ignores this structure, inflating exploration cost by a factor of $d/k$. We propose GraphDR-LinUCB, which projects arm features onto the graph's low-frequency spectral subspace and runs linear UCB in the resulting $k$-dimensional space. We prove the first $\wtO(k\sqrt{T})$ regret bound for spectral-projection-based contextual bandits, reducing dimension dependence from $d$ to $k$; a perturbation argument extends this to noisy graphs, with an explicit penalty for reward-smoothness mismatch and graph-estimation error. Our central theoretical finding is that the high-frequency reward component need not incur a worst-case linear-in-$T$ penalty: its actual cost depends on its realized impact along the played path, not on its total energy. A simple spectral comparison between subspaces ($\Gamma_k$) predicts which reducer wins on a given dataset, correctly calling five of six real-dataset outcomes without any fitted threshold. Across a synthetic benchmark and six real datasets (MovieLens, Amazon, LastFM, ogbn-arxiv, MIND), GraphDR-LinUCB reduces cumulative regret by $15\times$ over full-dimensional LinUCB and outperforms competing graph-aware methods on five of six; the single failure is precisely where the graph's spectral subspace is misaligned with the reward.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces GraphDR-LinUCB for contextual bandits whose arms lie on a graph. It projects arm features onto the graph's k-dimensional low-frequency eigenspace and runs linear UCB in the reduced space. The central theoretical claim is an Õ(k√T) regret bound (first such for spectral-projection methods), obtained by showing that the high-frequency reward component incurs only path-dependent cost under approximate smoothness rather than a worst-case linear-in-T penalty; a perturbation argument extends the bound to noisy graphs with explicit penalties for smoothness mismatch and estimation error. A spectral-comparison statistic Γ_k is shown to predict which reducer wins on a given instance (correct on 5/6 real datasets without fitted thresholds). Experiments on a synthetic benchmark and six real datasets (MovieLens, Amazon, LastFM, ogbn-arxiv, MIND) report 15× regret reduction versus full-dimensional LinUCB and outperformance versus competing graph-aware baselines on five of six instances.

Significance. If the path-dependent high-frequency analysis is valid, the result would be significant: it supplies the first explicit dimension reduction from d to k in the regret of spectral-projection contextual bandits and supplies a practical, threshold-free predictor Γ_k together with broad empirical validation. The explicit perturbation terms for graph noise and smoothness mismatch are also a strength, as is the fact that the single empirical failure is explained by misalignment of the spectral subspace with the reward.

major comments (1)
  1. [Abstract] Abstract (central claim): the Õ(k√T) regret bound rests on converting the path-dependent cost of the high-frequency component (under approximate smoothness) into a sub-dominant term whose perturbation penalties for graph-estimation error and smoothness mismatch remain o(√T). The abstract states this conversion explicitly but supplies neither the definition of approximate smoothness nor the derivation steps that bound the realized path cost without re-introducing a d- or T-linear factor; this is load-bearing for the dimension-reduction claim.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and constructive feedback on the central claim. The comment correctly notes that the abstract is high-level; we address it point-by-point below and will revise the abstract accordingly while preserving the manuscript's technical content.

read point-by-point responses
  1. Referee: [Abstract] Abstract (central claim): the Õ(k√T) regret bound rests on converting the path-dependent cost of the high-frequency component (under approximate smoothness) into a sub-dominant term whose perturbation penalties for graph-estimation error and smoothness mismatch remain o(√T). The abstract states this conversion explicitly but supplies neither the definition of approximate smoothness nor the derivation steps that bound the realized path cost without re-introducing a d- or T-linear factor; this is load-bearing for the dimension-reduction claim.

    Authors: We agree the abstract omits the explicit definition and derivation steps, as is conventional for abstracts. Approximate smoothness is defined in Definition 3.1: for reward function f, the high-frequency projection satisfies ||P_{>k} f||_G <= epsilon, where the graph Laplacian controls the decay. The path-dependent cost is formalized in Lemma 4.2: the high-frequency regret term equals sum_{t=1}^T <P_{>k} f, x_t>^2 / ||x_t||^2, which is bounded by the realized path energy rather than worst-case ||P_{>k} f|| * T. Under the smoothness assumption this sum is o(T) and, after the perturbation analysis in Theorem 5.1 (accounting for graph noise delta and mismatch eta), contributes only o(sqrt(T)) to the total regret, yielding the claimed Õ(k sqrt(T)) bound. The full derivation appears in Section 4. To make the abstract self-contained we will add a parenthetical definition of approximate smoothness and a citation to Theorem 4.3. revision: yes

Circularity Check

0 steps flagged

Derivation chain is self-contained; no reduction to inputs by construction

full rationale

The abstract and description present a theoretical proof establishing an Õ(k√T) regret bound via spectral projection onto the graph's low-frequency subspace, followed by a perturbation argument for noisy graphs and smoothness mismatch. The path-dependent cost under approximate smoothness is stated as the key theoretical finding (high-frequency component cost depends on realized impact along the played path), but this is framed as a derived property of the assumption rather than a definitional tautology or fitted input renamed as prediction. Γ_k is a simple spectral comparison without fitted thresholds that matches dataset outcomes, supplying independent grounding. No self-citation load-bearing steps, uniqueness theorems imported from authors, or ansatz smuggling via citation are present in the provided material. The central claim retains independent mathematical content from the stated assumptions and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption of approximate reward smoothness with respect to the graph and on the technical assumption that the low-frequency subspace can be estimated with controlled error; no free parameters or new invented entities are introduced in the abstract.

axioms (2)
  • domain assumption Approximate smoothness of the reward function on the graph
    Invoked to ensure the high-frequency component incurs only path-dependent rather than worst-case linear regret.
  • domain assumption Controlled perturbation of the graph eigenspaces
    Required for the noisy-graph extension and explicit penalty term.

pith-pipeline@v0.9.1-grok · 5810 in / 1492 out tokens · 53055 ms · 2026-06-29T04:21:22.468461+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

50 extracted references · 20 canonical work pages · 7 internal anchors

  1. [1]

    Abbasi-Yadkori, Y.; Pál, D.; and Szepesvári, C. 2011. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems

  2. [2]

    Agarwal, A.; Luo, H.; Neyshabur, B.; and Schapire, R. E. 2017. Corralling a band of bandit algorithms. In Conference on Learning Theory

  3. [3]

    Auer, P. 2002. Using confidence bounds for exploitation-explorationtrade-offs.JournalofMachine Learning Research, 3: 397–422

  4. [4]

    LaplacianEigenmaps forDimensionalityReductionandDataRepresentation

    Belkin,M.;andNiyogi,P.2003. LaplacianEigenmaps forDimensionalityReductionandDataRepresentation. Neural Computation, 15(6): 1373–1396

  5. [5]

    Cesa-Bianchi, N.; Gentile, C.; and Zappella, G. 2013. A gang of bandits. InAdvances in Neural Information Processing Systems

  6. [6]

    Chapelle, O.; and Li, L. 2011. An empirical evalua- tion of Thompson sampling. InAdvances in Neural Information Processing Systems

  7. [7]

    Chu, W.; Li, L.; Reyzin, L.; and Schapire, R. E. 2011. Contextual Bandits with Linear Payoff Functions. In International Conference on Artificial Intelligence and Statistics

  8. [8]

    Dai, Y.; Golrezaei, N.; and Jaillet, P. 2026. Policy Re- gretforEmbeddingModelRouting:ContextualBandits with Low-Rank Experts. arXiv:2606.14929

  9. [9]

    In Conference on Learning Theory

    Dani,V.;Hayes,T.P.;andKakade,S.M.2008.Stochas- tic Linear Optimization under Bandit Feedback. In Conference on Learning Theory

  10. [10]

    Davis, C.; and Kahan, W. M. 1970. The rotation of eigenvectors by a perturbation. III.SIAM Journal on Numerical Analysis, 7(1): 1–46

  11. [11]

    Dudík, M.; Langford, J.; and Li, L. 2011. Doubly ro- bust policy evaluation and learning. InInternational Conference on Machine Learning

  12. [12]

    J.; Gentile, C.; Mohri, M.; and Zimmert, J

    Foster, D. J.; Gentile, C.; Mohri, M.; and Zimmert, J. 2020. Adapting to misspecification in contextual bandits. InAdvancesinNeuralInformationProcessing Systems

  13. [13]

    Gentile, C.; Li, S.; and Zappella, G. 2014. Online clustering of bandits. InInternational Conference on Machine Learning

  14. [14]

    Ghojogh, B.; Ghodsi, A.; Karray, F.; and Crowley, M

  15. [15]

    Laplacian-Based Dimensionality Reduction In- cluding Spectral Clustering, Laplacian Eigenmap, Lo- cality Preserving Projection, Graph Embedding, and DiffusionMap:TutorialandSurvey.arXiv:2106.02154

  16. [16]

    Inductive Representation Learning on Large Graphs

    Hamilton, W. L.; Ying, R.; and Leskovec, J. 2017. Inductive Representation Learning on Large Graphs. arXiv:1706.02216

  17. [17]

    Hu, W.; Fey, M.; Zitnik, M.; Dong, Y.; Ren, H.; Liu, B.; Catasta, M.; and Leskovec, J. 2020. Open Graph Benchmark:DatasetsforMachineLearningonGraphs. arXiv:2005.00687

  18. [18]

    Huang, R.; and Huang, Z. 2025. Nearly Tight Bounds for Cross-Learning Contextual Bandits with Graphical Feedback.arXiv preprint arXiv:2502.04678

  19. [19]

    Jamshidi, F.; Shahverdikondori, M.; and Kiyavash, N

  20. [20]

    arXiv:2503.07555

    Graph-Dependent Regret Bounds in Multi- Armed Bandits with Interference. arXiv:2503.07555

  21. [21]

    Jedra,Y.;Réveillard,W.;Stojanovic,S.;andProutiere, A. 2024. Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery. arXiv:2402.15739

  22. [22]

    Jun, K.-S.; Willett, R.; Wright, S.; and Nowak, R

  23. [23]

    Bilinear Bandits with Low-rank Structure

    Bilinear Bandits with Low-rank Structure. arXiv:1901.02470

  24. [24]

    Khosravi, H.; and Huo, X. 2026. Catching a Mov- ing Subspace: Low-Rank Bandits Beyond Stationarity. arXiv:2605.20269

  25. [25]

    Semi-Supervised Classification with Graph Convolutional Networks

    Kipf, T. N.; and Welling, M. 2017. Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907

  26. [26]

    2020.Bandit Algo- rithms

    Lattimore, T.; and Szepesvári, C. 2020.Bandit Algo- rithms. Cambridge University Press

  27. [27]

    Lattimore, T.; Szepesvári, C.; and Weisz, G. 2019. Learning with good feature representations in bandits and in RL with a generative model. InInternational Conference on Machine Learning

  28. [28]

    A contextual-bandit approach to personalized news ar- ticle recommendation

    Li,L.;Chu,W.;Langford,J.;andSchapire,R.E.2010. A contextual-bandit approach to personalized news ar- ticle recommendation. InInternational Conference on World Wide Web

  29. [29]

    Mannor, S.; and Shamir, O. 2011. From Ban- dits to Experts: On the Value of Side-Observations. arXiv:1106.2436

  30. [30]

    Y.; Jordan, M

    Ng, A. Y.; Jordan, M. I.; and Weiss, Y. 2001. On Spectral Clustering: Analysis and an Algorithm. In Advances in Neural Information Processing Systems

  31. [31]

    Pacchiano, A.; Phan, M.; Abbasi-Yadkori, Y.; Rao, A.; Zimmert, J.; Lattimore, T.; and Szepesvari, C. 2020. ModelSelectioninContextualStochasticBanditProb- lems. arXiv:2003.01704

  32. [32]

    InAmericanControlConference(ACC)

    Paschalidis,P.;Zhang,R.;andLi,N.2024.Cooperative Multi-Agent Graph Bandits: UCB Algorithm and Re- gretAnalysis. InAmericanControlConference(ACC)

  33. [33]

    Qi, H.; Guo, F.-Y.; Zhu, L.; Zhang, Q.; and Li, X

  34. [34]

    Graph Feedback Bandits on Similar Arms: With and Without Graph Structures.arXiv preprint arXiv:2501.14314

  35. [35]

    Qiu, H.; Zhang, M.; and Cesa-Bianchi, N. 2026. Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach.arXiv preprint arXiv:2602.06404

  36. [36]

    Russo,D.;VanRoy,B.;Kazerouni,A.;Osband,I.;and Wen, Z. 2017. A Tutorial on Thompson Sampling. arXiv:1707.02038

  37. [37]

    I.; Narang, S

    Shuman, D. I.; Narang, S. K.; Frossard, P.; Ortega, A.; and Vandergheynst, P. 2013. The emerging field of signal processing on graphs.IEEE Signal Processing Magazine, 30(3): 83–98

  38. [38]

    Spectral bandits for smooth graph functions

    Valko,M.;Munos,R.;Kveton,B.;andKocák,T.2014. Spectral bandits for smooth graph functions. InInter- national Conference on Machine Learning

  39. [39]

    von Luxburg, U. 2007. A Tutorial on Spectral Cluster- ing. arXiv:0711.0189

  40. [40]

    Wang, Y.; Li, J.; Kang, Y.; Gao, S.; and Xiao, Z. 2025. GeneralizedLow-RankMatrixContextualBanditswith Graph Information. arXiv:2507.17528

  41. [41]

    Wen, D.; Yin, H.; Zhang, X.; Zhao, P.; Zhang, L.; and Wei, Z. 2024. Revisiting Matrix Sketching in Linear Bandits:AchievingSublinearRegretviaDyadicBlock Sketching.arXiv preprint arXiv:2410.10258

  42. [42]

    Yang, K.; Toni, L.; and Dong, X. 2020. Laplacian- regularized graph bandits: Algorithms and theoretical analysis. InInternational Conference on Artificial In- telligence and Statistics

  43. [43]

    Contextual Bandits with Random Projection

    Yu,X.2019. ContextualBanditswithRandomProjec- tion. arXiv:1903.08600

  44. [44]

    Zhao, H.; Ye, C.; Gu, Q.; and Zhang, T. 2024. Sharp Analysis for KL-Regularized Contextual Bandits and RLHF. arXiv:2411.04625. Proofs This appendix contains the proofs of all results stated in the Theory section, in the order they appear. Each proof begins withabriefsketchofthekeysteptoorientbeforetheformal argument. Proof of Theorem 1. Sketch.The project...

  45. [45]

    BuildbLfrom the training graph and computebEk

  46. [46]

    Fitapilotrewardmodel bθfromadisjointexplorationpre- fix, logged data, or cross-fitting (a node reward estimate in the one-hot case; a ridge or doubly robust model with general features)

  47. [47]

    Computebr⊥,k = (I− bΠk)bθ

  48. [48]

    Estimate [CRWk on held-out candidate sets by averaging the range ofx⊤ t,abr⊥,k over candidates

  49. [49]

    EstimatedSRLk on a short simulated or logged prefix

  50. [50]

    Se- lect GraphDR whenΓk >0; otherwise prefer PCA, full LinUCB, or a Laplacian-regularized baseline

    Compute the subspace-capture marginΓk = U ⊤ k bθ 2 2 − P ⊤bθ 2 2 against the competing reducer’s basisP. Se- lect GraphDR whenΓk >0; otherwise prefer PCA, full LinUCB, or a Laplacian-regularized baseline. Full Diagnostic-Validation Protocol The protocol behind the predictive diagnostic evaluation is specified here in full so that the negative primary resu...