pith. machine review for the scientific record.

arxiv: 2604.25021 · v1 · submitted 2026-04-27 · 💻 cs.LG

Recognition: unknown

Dynamic Regret for Online Regression in RKHS via Discounted VAW and Subspace Approximation

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 03:59 UTC · model grok-4.3

classification 💻 cs.LG
keywords: online regression · dynamic regret · reproducing kernel Hilbert space · discounted VAW · subspace approximation · kernel truncation · online learning

The pith

Online regression with the square loss in a reproducing kernel Hilbert space achieves dynamic regret bounds that scale with the path length of a time-varying comparator sequence, obtained by running discounted VAW forecasters on finite-dimensional subspace approximations of the RKHS.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method for online regression with the square loss in an RKHS under a dynamic regret criterion, comparing the learner to a sequence of time-varying functions whose total change is measured by the sum of RKHS norms of consecutive differences. It transfers the discounted Vovk-Azoury-Warmuth (VAW) approach from finite dimensions to the RKHS setting by approximating the space with finite-dimensional subspaces obtained from feature expansions or Mercer truncations. For each such subspace, an ensemble of discounted VAW forecasters is run over a geometric grid of discount factors, and the resulting approximation error is controlled by the uniform projection error of the kernel sections. This construction yields explicit dynamic regret bounds for several kernel classes in both fast and slow regimes.
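To make the machinery concrete, here is a minimal sketch of the finite-dimensional building block: a discounted VAW forecaster plus a geometric grid of discount factors. It is written from the standard VAW recursion and is an assumption-laden reading, not the paper's exact algorithm; the regularization, grid spacing, and meta-aggregation may all differ.

    import numpy as np

    class DiscountedVAW:
        """Sketch of a discounted Vovk-Azoury-Warmuth forecaster on R^d.

        gamma = 1 recovers the classical (static) VAW forecaster; the
        exact regularization and update in the paper may differ.
        """

        def __init__(self, dim, gamma, reg=1.0):
            self.gamma = gamma              # discount factor in (0, 1]
            self.reg = reg                  # ridge parameter lambda
            self.A = np.zeros((dim, dim))   # discounted sum of x_s x_s^T
            self.b = np.zeros(dim)          # discounted sum of y_s x_s

        def predict(self, x):
            # VAW folds the current feature x into the covariance before
            # predicting; this is what separates it from discounted ridge.
            A_t = self.gamma * self.A + np.outer(x, x)
            w = np.linalg.solve(self.reg * np.eye(len(x)) + A_t,
                                self.gamma * self.b)
            return float(w @ x)

        def update(self, x, y):
            self.A = self.gamma * self.A + np.outer(x, x)
            self.b = self.gamma * self.b + y * x

    # Geometric grid of discount factors: one expert per gamma, so the
    # ensemble has only O(log T) members for horizon T.
    T, d = 10_000, 8
    grid = [1.0 - 2.0 ** (-j) for j in range(1, int(np.log2(T)) + 1)]
    experts = [DiscountedVAW(d, g) for g in grid]

Each expert in the grid tracks comparators drifting at a different rate; a meta-learner (itself VAW-based in the paper) aggregates their predictions.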

Core claim

For a fixed subspace the method runs a VAW-based ensemble of discounted VAW forecasters over a geometric grid of discount factors; the additional approximation error is controlled by the uniform projection error of kernel sections. Orthogonal truncation methods are introduced that start from a feature expansion, impose an inner product making the features orthonormal, and take the spans of the first m basis functions as the approximation spaces. These are applied to explicit expansions for Gaussian and analytic kernels, to Mercer truncations whose guarantees depend on the eigenvalue decay, and to subspaces spanned by kernel sections for Matérn kernels.
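One concrete instance of the orthogonal truncation idea: the one-dimensional Gaussian kernel factors as k(x, y) = exp(-(x-y)^2/2) = sum_k phi_k(x) phi_k(y) with phi_k(x) = x^k exp(-x^2/2) / sqrt(k!), the explicit expansion studied in the Cotter-Keshet-Srebro line of work [6]. Declaring the phi_k orthonormal, the squared projection error of a kernel section K_x onto the span of the first m features is the tail sum over k >= m of phi_k(x)^2. A hedged numerical check of how fast that tail shrinks on a bounded domain (the normalizations here are ours and may differ from the paper's):

    import numpy as np
    from math import exp, factorial

    def phi(k, x):
        # k-th explicit feature of the 1-D Gaussian kernel exp(-(x-y)^2/2):
        # phi_k(x) = x^k exp(-x^2/2) / sqrt(k!), from
        # exp(-(x-y)^2/2) = exp(-x^2/2) exp(-y^2/2) sum_k (xy)^k / k!.
        return x ** k * exp(-x ** 2 / 2) / np.sqrt(float(factorial(k)))

    def projection_error_sq(m, x):
        # Squared RKHS distance from the kernel section K_x to the span
        # of the first m features: k(x,x) - sum_{k<m} phi_k(x)^2, since
        # the features are orthonormal in the induced inner product and
        # k(x,x) = 1 for this kernel.
        return 1.0 - sum(phi(k, x) ** 2 for k in range(m))

    # Uniform (sup over a bounded domain) projection error of the sections.
    xs = np.linspace(-1.0, 1.0, 201)
    for m in (2, 4, 8, 16):
        delta_sq = max(projection_error_sq(m, x) for x in xs)
        print(f"m = {m:2d}   sup_x ||K_x - P_m K_x||_H^2 ~ {delta_sq:.3e}")

The tail decays faster than any geometric rate in m, which is what makes the fast regime plausible for Gaussian kernels: a truncation dimension only logarithmic in T already drives the cumulative approximation error below the path-length term.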

What carries the argument

VAW-based ensemble of discounted VAW forecasters over a geometric grid of discount factors on finite-dimensional subspace approximations of the RKHS, with approximation error bounded by the uniform projection error of kernel sections.

If this is right

  • Dynamic regret bounds are obtained that depend on the path length of the time-varying comparator in the RKHS norm.
  • Fast-regime bounds hold for explicit feature expansions of Gaussian and analytic dot-product kernels.
  • Mercer truncations yield dynamic regret bounds in fast and slow regimes depending on the eigenvalue decay rate.
  • The construction applies to subspaces spanned by kernel sections and produces bounds for Matérn kernels; a numerical sketch of this projection appears after this list.
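
For the kernel-section subspaces in the last bullet, the squared projection error of K_x onto span{K_{z_1}, ..., K_{z_m}} has the closed form k(x, x) - k_z(x)^T G^{-1} k_z(x), where G is the Gram matrix of the centers; in the kernel-interpolation literature this is the squared power function. A hedged sketch with equispaced centers and a Matérn-3/2 kernel (the paper's choice of centers and smoothness parameters may differ):

    import numpy as np

    def matern32(x, y, ell=0.5):
        # Matern kernel with smoothness nu = 3/2 (standard closed form).
        r = np.abs(x - y) * np.sqrt(3.0) / ell
        return (1.0 + r) * np.exp(-r)

    def section_projection_error_sq(x, centers, kernel):
        # Squared RKHS distance from K_x to span{K_{z_1}, ..., K_{z_m}}:
        # k(x,x) - k_z(x)^T G^{-1} k_z(x), with G the Gram matrix of the
        # centers (the squared "power function" of kernel interpolation).
        G = kernel(centers[:, None], centers[None, :])
        kx = kernel(centers, x)
        # A small jitter keeps the solve stable when centers are close.
        coef = np.linalg.solve(G + 1e-10 * np.eye(len(centers)), kx)
        return kernel(x, x) - kx @ coef

    xs = np.linspace(0.0, 1.0, 401)
    for m in (4, 8, 16, 32):
        centers = np.linspace(0.0, 1.0, m)   # equispaced, for illustration
        delta_sq = max(section_projection_error_sq(x, centers, matern32)
                       for x in xs)
        print(f"m = {m:2d}   sup_x ||K_x - P_m K_x||_H^2 ~ {delta_sq:.3e}")

For finitely smooth kernels like Matérn the power function decays only polynomially in the fill distance of the centers, which is consistent with the slower regimes the paper reports for this family.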

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Choosing subspaces that capture the dominant directions of variation in the data stream appears key to keeping the approximation error low.
  • The geometric grid over discount factors provides robustness to unknown rates of change in the comparator sequence.
  • Similar subspace reduction techniques could be explored for other online learning tasks in infinite-dimensional function spaces.

Load-bearing premise

The uniform projection error of kernel sections onto the chosen finite-dimensional subspaces remains small enough not to dominate the dynamic regret term.
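
Schematically, and as our reading rather than a theorem quoted from the paper, the decomposition behind this premise looks like:

    % Our reading, not a theorem quoted from the paper; constants and
    % logarithmic factors are suppressed. P_T = sum_t ||u_t - u_{t-1}||_H is
    % the comparator path length, delta_m = sup_x ||K_x - P_m K_x||_H the
    % uniform projection error at subspace dimension m, Gamma the discount grid.
    \[
      \mathrm{Reg}_T(u_{1:T})
      \;\lesssim\;
      \underbrace{\sqrt{T\,(1+P_T)}}_{\text{discounted VAW on the subspace}}
      \;+\;
      \underbrace{\mathrm{polylog}(T)}_{\text{ensemble over }\Gamma}
      \;+\;
      \underbrace{T\,\delta_m}_{\text{approximation}}.
    \]
    % The premise is exactly that m can be chosen, possibly as m(T), with
    % T * delta_m = o(sqrt(T (1 + P_T))).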

What would settle it

If experiments show that for typical data streams the cumulative uniform projection error grows faster than the path-length contribution to the regret, the method would fail to deliver useful bounds.

read the original abstract

We study online regression with the square loss in a reproducing kernel Hilbert space under a dynamic regret criterion. The learner is compared with a time-varying comparator sequence, and the bounds depend on its path length in the RKHS norm. The proposed method transfers the finite-dimensional discounted Vovk-Azoury-Warmuth approach of Jacobsen & Cutkosky (2024) to the RKHS setting by means of finite-dimensional subspace approximations. For a fixed subspace, we run a VAW-based ensemble of discounted VAW forecasters over a geometric grid of discount factors. The additional approximation error is controlled by the uniform projection error of kernel sections. We then introduce a general orthogonal truncation method: starting from a feature expansion of the kernel, we construct the associated RKHS by introducing an inner product that makes the feature functions orthonormal, and then use the spans of the first basis functions as finite-dimensional approximation spaces. The resulting subspace reduction is applied to several approximation schemes. Explicit feature expansions yield fast-regime bounds for Gaussian and analytic dot-product kernels. Mercer truncations provide a spectral approximation method and lead to dynamic regret bounds in fast and slow regimes, depending on the eigenvalue decay. Finally, we study subspaces spanned by kernel sections and apply this construction to Matérn kernels.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper studies online regression with square loss in an RKHS under dynamic regret, where the comparator sequence has bounded path length in the RKHS norm. It transfers the finite-dimensional discounted VAW forecaster of Jacobsen & Cutkosky (2024) to the RKHS setting via finite-dimensional subspace approximations: for a fixed subspace an ensemble of discounted VAW instances is run over a geometric grid of discount factors, with the extra error bounded by the uniform projection error of kernel sections onto the subspace. A general orthogonal truncation construction is then introduced from feature expansions, yielding explicit fast-regime bounds for Gaussian and analytic dot-product kernels, Mercer spectral truncations that produce fast/slow-regime bounds depending on eigenvalue decay, and kernel-section subspaces applied to Matérn kernels.

Significance. If the error decompositions are rigorous and the projection errors can be controlled to remain sublinear in T while preserving an online algorithm, the work would meaningfully extend dynamic-regret analysis to infinite-dimensional kernel settings and supply concrete, kernel-specific constructions that practitioners could use. The explicit orthogonal truncation recipes for several standard kernels constitute a concrete strength.

major comments (3)
  1. [Abstract and orthogonal truncation construction] Abstract and the paragraph introducing the subspace reduction: the claim that 'the additional approximation error is controlled by the uniform projection error of kernel sections' is load-bearing. For any fixed finite-dimensional subspace of an infinite-dimensional RKHS (Gaussian, Matérn, etc.), the uniform projection error δ is a positive constant on generic streams; the per-round excess loss is then Θ(δ) and the total regret acquires an additive O(T δ) term. The manuscript must exhibit the full regret decomposition (presumably in the analysis section following the algorithm) and show explicitly how this term is rendered o(T) or absorbed into the path-length term, either by T-dependent truncation or by an online choice of subspace.
  2. [Mercer truncations] Section on Mercer truncations and eigenvalue decay: the fast- and slow-regime bounds are stated to depend on the decay rate of the Mercer eigenvalues. The truncation level must be specified (fixed, T-dependent, or data-dependent). If the level grows with T, the manuscript must verify that the resulting algorithm remains strictly online (no lookahead) and that the dynamic-regret guarantee still holds with the stated dependence on path length V.
  3. [VAW-based ensemble over discount factors] Ensemble construction for a fixed subspace: the transfer of the discounted-VAW ensemble from the finite-dimensional setting is plausible, but the additional regret incurred by the geometric grid of discount factors plus the subspace projection must be shown not to dominate the O(√(T V)) term. An explicit statement of the final bound (including all additive terms) is required to confirm the result is non-vacuous.
minor comments (2)
  1. [Notation] Notation for the RKHS norm and path length V should be introduced once and used consistently; several passages mix ||·||_H with the path-length definition.
  2. [Theorem statements] The abstract states that bounds are obtained 'in fast and slow regimes'; the precise definitions of these regimes (in terms of eigenvalue decay or truncation dimension) should appear in the theorem statements rather than only in the text.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive review, and for acknowledging the potential value of the explicit kernel constructions. We address each major comment below. We agree that the current manuscript requires additional explicit decompositions, specifications of truncation levels, and consolidated bounds to ensure full rigor and clarity. We will make the necessary revisions.

read point-by-point responses
  1. Referee: [Abstract and orthogonal truncation construction] Abstract and the paragraph introducing the subspace reduction: the claim that 'the additional approximation error is controlled by the uniform projection error of kernel sections' is load-bearing. For any fixed finite-dimensional subspace of an infinite-dimensional RKHS (Gaussian, Matérn, etc.), the uniform projection error δ is a positive constant on generic streams; the per-round excess loss is then Θ(δ) and the total regret acquires an additive O(T δ) term. The manuscript must exhibit the full regret decomposition (presumably in the analysis section following the algorithm) and show explicitly how this term is rendered o(T) or absorbed into the path-length term, either by T-dependent truncation or by an online choice of subspace.

    Authors: We agree that a fixed subspace would produce a linear O(T δ) term and render the bound vacuous. Our orthogonal truncation constructions are T-dependent: the subspace dimension is selected as a deterministic function of T (and kernel parameters) to drive the uniform projection error δ_T to zero sufficiently fast that T δ_T remains sublinear in T (or is absorbed into the path-length term). The choice depends only on T and known kernel properties, preserving the online nature of the algorithm. We will add an explicit regret decomposition theorem (in the analysis following the algorithm) that isolates the approximation term and shows how it is controlled for each kernel family. revision: yes

  2. Referee: [Mercer truncations] Section on Mercer truncations and eigenvalue decay: the fast- and slow-regime bounds are stated to depend on the decay rate of the Mercer eigenvalues. The truncation level must be specified (fixed, T-dependent, or data-dependent). If the level grows with T, the manuscript must verify that the resulting algorithm remains strictly online (no lookahead) and that the dynamic-regret guarantee still holds with the stated dependence on path length V.

    Authors: The truncation level m in the Mercer construction is T-dependent and chosen from the known eigenvalue decay rate to balance the approximation and regret terms. Because m(T) is a deterministic function of T alone, the algorithm remains strictly online with no data-dependent lookahead. We will explicitly state the formula for m(T) and add a verification in the proof that the dynamic-regret bound retains its dependence on the path length V, with the additional approximation terms remaining sublinear (a back-of-envelope version of this balancing appears after these responses). revision: yes

  3. Referee: [VAW-based ensemble over discount factors] Ensemble construction for a fixed subspace: the transfer of the discounted-VAW ensemble from the finite-dimensional setting is plausible, but the additional regret incurred by the geometric grid of discount factors plus the subspace projection must be shown not to dominate the O(√(T V)) term. An explicit statement of the final bound (including all additive terms) is required to confirm the result is non-vacuous.

    Authors: We agree that an explicit consolidated bound is required. The analysis already shows that the geometric grid over discount factors contributes only a logarithmic factor while the projection error is controlled by the T-dependent subspace choice. We will add a single theorem that states the complete dynamic regret bound for each kernel (including the ensemble overhead and projection terms) and confirms that these additive contributions remain o(T) or O(√(T V log T)) and do not dominate the main term. revision: yes
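
A back-of-envelope version of the balancing invoked in responses 2 and 3, under an assumed polynomial eigenvalue decay; the exponent and constants are illustrative, not the paper's stated rates:

    % Assume Mercer eigenvalues lambda_k ~ k^{-2s} (s > 1/2) with uniformly
    % bounded eigenfunctions, so the uniform projection error at truncation
    % level m satisfies, up to constants,
    %   delta_m^2  <=  sum_{k > m} lambda_k  ~  m^{1-2s}.
    % Matching the approximation term T * delta_m against the main term
    % sqrt(T (1 + P_T)) gives
    \[
      T \, m^{(1-2s)/2} \;\asymp\; \sqrt{T\,(1+P_T)}
      \quad\Longrightarrow\quad
      m(T) \;\asymp\; \Bigl(\frac{T}{1+P_T}\Bigr)^{1/(2s-1)},
    \]
    % a deterministic function of T and the decay exponent alone, so the
    % algorithm stays strictly online; the discount grid adds
    % |Gamma| = O(log T) experts and hence only polylogarithmic overhead.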

Circularity Check

0 steps flagged

No circularity; derivation transfers external finite-dimensional result via standard RKHS approximation

full rationale

The paper explicitly builds the RKHS method on the finite-dimensional discounted VAW forecaster of Jacobsen & Cutkosky (2024), an external reference. Dynamic regret is defined externally via comparator path length in the RKHS norm. The additional error term is bounded by the uniform projection error of kernel sections onto the chosen subspace, which follows directly from the definition of orthogonal projection in Hilbert space and does not reduce to a fitted parameter or self-referential assumption. No self-citations appear, no uniqueness theorems are imported from the authors' prior work, and no ansatz or renaming of known results is used to close the derivation. The construction remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Relies on standard RKHS axioms and the external definition of dynamic regret via path length; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • standard math The RKHS is a reproducing kernel Hilbert space with the standard inner product and reproducing property.
    Invoked throughout to define the function space and projection errors.
  • domain assumption The comparator sequence has bounded path length in the RKHS norm.
    Central to the dynamic regret criterion stated in the abstract.

pith-pipeline@v0.9.0 · 5534 in / 1332 out tokens · 45369 ms · 2026-05-08T03:59:16.919675+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

31 extracted references · 10 canonical work pages

  1. [1] M. Abramowitz and I.A. Stegun, eds. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. 10th printing. Vol. 55. Applied Mathematics Series. Washington, D.C.: National Bureau of Standards, 1972.

  2. [2] M.S. Agranovich. Sobolev Spaces, Their Generalizations, and Elliptic Problems in Smooth and Lipschitz Domains. Cham: Springer, 2015. doi:10.1007/978-3-319-14648-5.

  3. [3] K.S. Azoury and M.K. Warmuth. "Relative loss bounds for on-line density estimation with the exponential family of distributions". In: Machine Learning 43.3 (2001), pp. 211–246. doi:10.1023/A:1010896012157.

  4. [4] D. Baby, H. Hasson, and Y. Wang. "Dynamic Regret for Strongly Adaptive Methods and Optimality of Online KRR". In: arXiv preprint arXiv:2111.11550 (2021). arXiv:2111.11550 [cs.LG].

  5. [5] D. Baby and Y.-X. Wang. "Optimal Dynamic Regret in Exp-Concave Online Learning". In: Proceedings of the 34th Conference on Learning Theory. Ed. by M. Belkin and S. Kpotufe. Vol. 134. Proceedings of Machine Learning Research. PMLR, 2021, pp. 359–409.

  6. [6] A. Cotter, J. Keshet, and N. Srebro. "Explicit Approximations of the Gaussian Kernel". In: Proceedings of the 8th International Conference on Machine Learning and Applications. IEEE, 2011, pp. 183–190. doi:10.1109/ICMLA.2011.152.

  7. [7] F. Cucker and D.-X. Zhou. Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, 2007. isbn: 052186559X.

  8. [8] P. Gaillard and S. Gerchinovitz. "A Chaining Algorithm for Online Nonparametric Regression". In: Proceedings of the 28th Conference on Learning Theory. Vol. 40. 2015, pp. 1–33.

  9. [9] A. Gammerman, Y. Kalnishkan, and V. Vovk. "On-Line Prediction with Kernels and the Complexity Approximation Principle". In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI 2004). Ed. by D.M. Chickering and J.Y. Halpern. Banff, Canada: AUAI Press, 2004, pp. 170–176.

  10. [10] A. Jacobsen and A. Cutkosky. "Online linear regression in dynamic environments via discounting". In: Proceedings of the 41st International Conference on Machine Learning. Vol. 235. Proceedings of Machine Learning Research. 2024, pp. 21083–21120.

  11. [11] R. Jézéquel, P. Gaillard, and A. Rudi. "Efficient online learning with kernels for adversarial large scale problems". In: Advances in Neural Information Processing Systems 32 (2019).

  12. [12] M. Kanagawa et al. Gaussian Processes and Reproducing Kernels: Connections and Equivalences. 2025. doi:10.48550/arXiv.2506.17366. arXiv:2506.17366.

  13. [13] P. Kar and H. Karnick. "Random Feature Maps for Dot Product Kernels". In: Proceedings of the 15th International Conference on Artificial Intelligence and Statistics. Vol. 22. JMLR Workshop and Conference Proceedings. 2012, pp. 583–591.

  14. [14] G. Little and J.B. Reade. "Eigenvalues of Analytic Kernels". In: SIAM Journal on Mathematical Analysis 15.1 (1984), pp. 133–136. doi:10.1137/0515009.

  15. [15] R. Merris. Combinatorics. 2nd ed. Hoboken, NJ: John Wiley & Sons, 2003.

  16. [16] F.J. Narcowich, J.D. Ward, and H. Wendland. "Sobolev Bounds on Functions with Scattered Zeros, with Applications to Radial Basis Function Surface Fitting". In: Mathematics of Computation 74.250 (2005), pp. 743–763. doi:10.1090/S0025-5718-04-01675-6.

  17. [17] V.I. Paulsen and M. Raghupathi. An Introduction to the Theory of Reproducing Kernel Hilbert Spaces. Cambridge: Cambridge University Press, 2016.

  18. [18] A. Rakhlin and K. Sridharan. "Online non-parametric regression". In: Proceedings of the 27th Conference on Learning Theory. Vol. 35. Proceedings of Machine Learning Research. 2014, pp. 1232–1264.

  19. [19] D.B. Rokhlin. A Hierarchical Vovk-Azoury-Warmuth Forecaster with Discounting for Online Regression in RKHS. 2025. doi:10.48550/arXiv.2506.22631. arXiv:2506.22631 [cs.LG].

  20. [20] D.B. Rokhlin and G.A. Karapetyants. "Ensembling discounted VAW experts with the VAW meta-learner". In: Global and Stochastic Analysis 12.5 (2025), pp. 45–58.

  21. [21] G. Santin and R. Schaback. "Approximation of eigenfunctions in kernel-based spaces". In: Advances in Computational Mathematics 42.4 (2016), pp. 973–993.

  22. [22] R. Schaback and H. Wendland. "Approximation by Positive Definite Kernels". In: Advanced Problems in Constructive Approximation. Ed. by M.D. Buhmann and D.H. Mache. Vol. 142. International Series of Numerical Mathematics. Basel: Birkhäuser, 2002, pp. 203–221.

  23. [23] I.J. Schoenberg. "Positive Definite Functions on Spheres". In: Duke Mathematical Journal 9.1 (1942), pp. 96–108.

  24. [24] M.L. Stein. Interpolation of Spatial Data: Some Theory for Kriging. Springer, 1999. doi:10.1007/978-1-4612-1494-6.

  25. [25] I. Steinwart and A. Christmann. Support Vector Machines. New York: Springer-Verlag, 2008.

  26. [26] R. Takhanov. "On the speed of uniform convergence in Mercer's theorem". In: Journal of Mathematical Analysis and Applications 518.2 (2023), p. 126718. doi:10.1016/j.jmaa.2022.126718.

  27. [27] R. Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018.

  28. [28] V. Vovk. "Competitive on-line statistics". In: International Statistical Review 69.2 (2001), pp. 213–248. doi:10.1111/j.1751-5823.2001.tb00457.x.

  29. [29] V. Vovk. "On-line regression competitive with reproducing kernel Hilbert spaces". In: International Conference on Theory and Applications of Models of Computation. Springer, 2006, pp. 452–463.

  30. [30] J. Yuan and A.G. Lamperski. "Trading-Off Static and Dynamic Regret in Online Least-Squares and Beyond". In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. 2020.

  31. [31] O. Zadorozhnyi et al. Online Nonparametric Regression with Sobolev Kernels. arXiv:2102.03594. 2021.