Learning Operators by Regularized Stochastic Gradient Descent with Operator-valued Kernels

Jia-Qi Yang; Lei Shi

arxiv: 2504.18184 · v4 · submitted 2025-04-25 · 📊 stat.ML · cs.LG · math.FA · math.ST · stat.TH

Pith reviewed 2026-05-22 18:18 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · math.FA · math.ST · stat.TH

keywords operator learning · regularized SGD · operator-valued kernels · vector-valued RKHS · dimension-independent bounds · convergence rates · statistical inverse problems · structured prediction

The pith

Regularized SGD with operator-valued kernels delivers dimension-independent bounds for learning regression operators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the problem of estimating an unknown operator that maps from a Polish space into a separable Hilbert space, where the operator belongs to a vector-valued reproducing kernel Hilbert space generated by an operator-valued kernel. It examines regularized stochastic gradient descent in two regimes: an online version with polynomially decaying step sizes and regularization, and a finite-horizon version with fixed parameters. Under structural and distributional assumptions, the analysis yields error bounds for both prediction and estimation that do not grow with the dimension of the output space. These bounds are shown to be near-optimal in expectation, while high-probability versions imply almost sure convergence of the iterates.
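
The two regimes can be written side by side. The display below is a sketch of the standard least-squares form of regularized SGD in a vector-valued RKHS. This page does not show the paper's update rule, so the exact form and the symbols (step size eta_t, regularization parameter lambda_t, exponents theta_1 and theta_2, horizon T, evaluation operator K_x) are notation assumed here, not taken from the paper.

```latex
% Assumed notation: K is the operator-valued kernel, K_x y = K(\cdot, x) y,
% (x_t, y_t) is the sample observed at step t, and f_1 = 0.
\begin{align*}
f_{t+1} &= f_t - \eta_t \bigl[ K_{x_t}\bigl( f_t(x_t) - y_t \bigr) + \lambda_t f_t \bigr], \\
\text{online:} \quad & \eta_t = \eta_1 t^{-\theta_1}, \qquad \lambda_t = \lambda_1 t^{-\theta_2}, \\
\text{finite-horizon:} \quad & \eta_t \equiv \eta, \qquad \lambda_t \equiv \lambda \qquad (1 \le t \le T).
\end{align*}
```

In the finite-horizon case the constants eta and lambda would typically be chosen as functions of T; the specific choice is part of the paper's analysis and is not reproduced here.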

Core claim

Under suitable structural and distributional assumptions on the target operator and the data-generating process, regularized stochastic gradient descent algorithms applied to the vector-valued RKHS induced by an operator-valued kernel produce dimension-independent bounds on prediction and estimation errors. The resulting rates are near-optimal in expectation for both online and finite-horizon settings, and high-probability estimates are derived that imply almost sure convergence in infinite-dimensional output spaces.

What carries the argument

Regularized stochastic gradient descent iterates on the vector-valued reproducing kernel Hilbert space induced by an operator-valued kernel, which regularizes the ill-posed inverse problem of operator estimation.
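
A minimal Python sketch of such an iteration may help fix ideas. It assumes a least-squares loss, a separable kernel K(x, x') = k(x, x') I with a Gaussian scalar kernel k, and a finite-dimensional output standing in for the Hilbert space. These choices, the default parameter values, and the names `gaussian_gram`, `online_regularized_sgd`, and `predict` are illustrative assumptions, not the paper's algorithm or code.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth):
    """Scalar Gaussian kernel k(a_i, b_j) for the rows of a and b."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * bandwidth**2))


def online_regularized_sgd(xs, ys, eta1=0.5, lam1=0.1,
                           theta_eta=0.5, theta_lam=0.25, bandwidth=0.3):
    """One pass over (xs, ys); returns coefficients c_i of f = sum_i k(x_i, .) c_i.

    Assumed update (least squares, kernel k(x, x') * Identity):
        f_{t+1} = (1 - eta_t * lam_t) f_t - eta_t * k(x_t, .) (f_t(x_t) - y_t)
    with eta_t = eta1 * t**(-theta_eta) and lam_t = lam1 * t**(-theta_lam).
    """
    n, d_out = ys.shape
    coefs = np.zeros((n, d_out))
    for t in range(n):
        step = t + 1
        eta_t = eta1 * step ** (-theta_eta)
        lam_t = lam1 * step ** (-theta_lam)
        # f_t(x_t): only the first t centres carry non-zero coefficients.
        k_row = gaussian_gram(xs[t:t + 1], xs[:t], bandwidth)[0]
        residual = k_row @ coefs[:t] - ys[t]
        coefs[:t] *= 1.0 - eta_t * lam_t   # shrinkage from the regularizer
        coefs[t] = -eta_t * residual       # new centre at x_t
    return coefs


def predict(xs_train, coefs, xs_new, bandwidth=0.3):
    """Evaluate the learned map at the rows of xs_new."""
    return gaussian_gram(xs_new, xs_train, bandwidth) @ coefs
```

Calling `online_regularized_sgd(xs, ys)` with `xs` of shape `(n, d_in)` and `ys` of shape `(n, d_out)` returns an `(n, d_out)` coefficient array, and `predict(xs, coefs, xs_new)` evaluates the resulting map. Storing one coefficient vector per sample keeps the sketch short, at the price of a per-step cost that grows with the number of samples seen.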

If this is right

The method yields near-optimal rates without explicit dependence on output dimension, enabling use in high- or infinite-dimensional output settings.
High-probability bounds guarantee almost sure convergence of the learned operator iterates.
The framework directly applies to structured prediction tasks where outputs are elements of a Hilbert space.
Concrete examples show the approach extends to learning solution operators for parametric partial differential equations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The general technique for high-probability guarantees in infinite dimensions could be adapted to other kernel methods that operate on function-valued data.
Numerical experiments on function-valued regression problems with growing output dimension would provide direct checks on whether the predicted dimension independence appears in practice.
The same regularization and step-size schedules might transfer to related stochastic approximation schemes for operator equations arising in control or inverse problems.
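
The numerical check suggested above could start from a simulation like the following. It is a hypothetical sketch, not an experiment from the paper: the target, the noise model, the Gaussian kernel, the step-size schedule, and the names `run_sgd`, `target`, and `experiment` are all assumptions. The 1/j decay across output coordinates is chosen so that the target and the noise keep a bounded total norm as the output dimension grows, which is the kind of condition under which a dimension-free bound is plausible.

```python
import numpy as np

BANDWIDTH = 0.3  # assumed Gaussian kernel width


def run_sgd(xs, ys, eta1=0.5, lam1=0.01):
    """Online regularized SGD with the separable kernel k(x, x') * Identity."""
    n, d_out = ys.shape
    # Gram matrix of the scalar Gaussian kernel on 1-D inputs.
    gram = np.exp(-(xs[:, None] - xs[None, :]) ** 2 / (2.0 * BANDWIDTH**2))
    coefs = np.zeros((n, d_out))  # row i multiplies k(x_i, .)
    for t in range(n):
        eta_t = eta1 * (t + 1) ** (-0.5)
        lam_t = lam1 * (t + 1) ** (-0.25)
        pred = gram[t, :t] @ coefs[:t]      # current prediction at x_t
        coefs[:t] *= 1.0 - eta_t * lam_t    # regularization shrinkage
        coefs[t] = -eta_t * (pred - ys[t])  # gradient step on the new sample
    return coefs


def target(xs, d_out):
    """Assumed target: coordinate j is sin(pi * x + j) / j."""
    j = np.arange(1, d_out + 1)
    return np.sin(np.pi * xs[:, None] + j[None, :]) / j[None, :]


def experiment(d_out, n_train=300, n_test=200, seed=0):
    """Mean squared test error of the last iterate for output dimension d_out."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, d_out + 1)
    x_tr = rng.uniform(-1.0, 1.0, n_train)
    x_te = rng.uniform(-1.0, 1.0, n_test)
    noise = 0.1 * rng.standard_normal((n_train, d_out)) / j
    coefs = run_sgd(x_tr, target(x_tr, d_out) + noise)
    k_te = np.exp(-(x_te[:, None] - x_tr[None, :]) ** 2 / (2.0 * BANDWIDTH**2))
    diff = k_te @ coefs - target(x_te, d_out)
    return float(np.mean(np.sum(diff**2, axis=1)))


errors = {d: experiment(d) for d in (2, 20, 200)}
for d, err in errors.items():
    print(f"output dimension {d:4d}: test error {err:.4f}")
```

Comparing the printed errors across output dimensions gives a first impression only. A serious check would average over seeds, vary the sample size, and use targets and noise that satisfy the paper's actual assumptions.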

Load-bearing premise

The target operator and the data-generating process satisfy structural and distributional assumptions that keep the problem well-behaved enough for dimension-free error control.

What would settle it

Construct a data distribution and a target operator that satisfy the paper's stated assumptions yet yield a prediction error that grows with the dimension of the output Hilbert space. If the observed error scales with dimension, the dimension-independence claim fails.

Figures

Figures reproduced from arXiv: 2504.18184 by Jia-Qi Yang, Lei Shi.

**Figure 2.** Commutative diagram of PCA encoder-decoder framework.

Original abstract

We consider a class of statistical inverse problems involving the estimation of a regression operator from a Polish space to a separable Hilbert space, where the target lies in a vector-valued reproducing kernel Hilbert space induced by an operator-valued kernel. To address the associated ill-posedness, we analyze regularized stochastic gradient descent (SGD) algorithms in both online and finite-horizon settings. The former uses polynomially decaying step sizes and regularization parameters, while the latter adopts fixed values. Under suitable structural and distributional assumptions, we establish dimension-independent bounds for prediction and estimation errors. The resulting convergence rates are near-optimal in expectation, and we also derive high-probability estimates that imply almost sure convergence. Our analysis introduces a general technique for obtaining high-probability guarantees in infinite-dimensional settings. We illustrate the practical scope of our framework with applications to structured prediction and parametric PDEs, providing examples that reflect how the approach can be applied in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper analyzes regularized SGD for operator learning with operator-valued kernels and claims dimension-independent near-optimal rates plus high-probability bounds in infinite dimensions.

The authors analyze regularized SGD for learning regression operators from a Polish space to a Hilbert space, using operator-valued kernels. They obtain dimension-independent prediction and estimation error bounds that are near-optimal, plus high-probability versions that give almost sure convergence, for both the online and the finite-horizon algorithm. What is new is the combination of regularized SGD with these kernels for statistical inverse problems, together with a general technique for high-probability bounds in infinite dimensions.

The paper does a good job of laying out its assumptions, and the applications to structured prediction and parametric PDEs make the work more concrete.

The potential soft spot is the high-probability analysis. It is worth checking whether the martingale concentration step really works without dimension dependence for general kernels. If the proof uses a fixed test function or a chaining argument sensitive to the kernel's eigenvalue decay, the dimension-independence claim could be weaker than stated. The abstract mentions suitable structural and distributional assumptions, but the details matter here.

The paper is aimed at researchers in statistical learning theory who work on operator estimation and kernel methods, and readers interested in convergence rates for SGD in infinite-dimensional settings would find value in it. The work is coherent enough on its own terms to deserve a serious referee. I recommend putting it through peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes regularized stochastic gradient descent (SGD) for estimating a regression operator from a Polish space to a separable Hilbert space, where the target lies in a vector-valued RKHS induced by an operator-valued kernel. It considers both online (polynomially decaying step sizes and regularization) and finite-horizon (fixed parameters) settings. Under structural and distributional assumptions (source condition, noise moments), the authors derive dimension-independent bounds on prediction and estimation errors, establish near-optimal rates in expectation, and provide high-probability estimates implying almost-sure convergence via a general technique for infinite-dimensional settings. Illustrations are given for structured prediction and parametric PDEs.

Significance. If the central claims hold, the work would advance the theoretical analysis of operator learning in infinite-dimensional spaces by supplying convergence guarantees for SGD that remain dimension-independent. The proposed general technique for high-probability bounds in separable Hilbert spaces could serve as a template for related statistical inverse problems. The applications to structured prediction and PDEs demonstrate practical relevance, though the primary contribution is the theoretical development of error bounds.

major comments (2)

[§4, High-probability analysis] The martingale concentration step used to obtain high-probability operator-norm bounds must be verified to apply directly to general operator-valued kernels without implicit finite-rank or trace-class restrictions. If the argument reduces the process via a fixed test functional or employs a chaining argument whose covering numbers depend on the effective dimension of the RKHS, the claimed dimension-independence may fail for kernels with slowly decaying eigenvalues; the structural assumptions listed do not explicitly preclude this.
[Theorem 3.1 and Corollary 3.2, expectation bounds] The near-optimality claim for the convergence rates in expectation relies on the specific choice of polynomially decaying step sizes; it is unclear whether the constants remain uniform when the source condition parameter and noise moments vary simultaneously, which could affect the dimension-free character of the final rates.

minor comments (2)

[§2] Notation for the operator-valued kernel and the associated RKHS should be introduced with an explicit reference to the reproducing property in the vector-valued case to avoid ambiguity when passing between scalar and operator settings.
[§3.2] The finite-horizon setting would benefit from a short remark clarifying how the fixed regularization parameter interacts with the horizon length to maintain the claimed rates.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and indicate where clarifications will be incorporated in the revision.

Referee: [§4, High-probability analysis] The martingale concentration step used to obtain high-probability operator-norm bounds must be verified to apply directly to general operator-valued kernels without implicit finite-rank or trace-class restrictions. If the argument reduces the process via a fixed test functional or employs a chaining argument whose covering numbers depend on the effective dimension of the RKHS, the claimed dimension-independence may fail for kernels with slowly decaying eigenvalues; the structural assumptions listed do not explicitly preclude this.

Authors: We appreciate the referee's concern. The high-probability bounds in Section 4 are obtained via a general martingale concentration inequality for processes taking values in separable Hilbert spaces (invoking a vector-valued version of Freedman's inequality or equivalent results that hold without finite-rank or trace-class assumptions). The argument bounds the operator norm directly using the separability of the codomain and the uniform boundedness of the operator-valued kernel; it neither reduces the process to a fixed test functional nor employs chaining whose covering numbers depend on the effective dimension of the RKHS. The source condition together with the moment assumptions on the noise control the variance terms uniformly, so that the final rates remain dimension-independent even when the kernel eigenvalues decay slowly. To make this generality explicit, we will add a short remark after the statement of the concentration lemma.

Revision: partial

Referee: [Theorem 3.1 and Corollary 3.2, expectation bounds] The near-optimality claim for the convergence rates in expectation relies on the specific choice of polynomially decaying step sizes; it is unclear whether the constants remain uniform when the source condition parameter and noise moments vary simultaneously, which could affect the dimension-free character of the final rates.

Authors: The near-optimality statements in Theorem 3.1 and Corollary 3.2 are with respect to the minimax rates that are known to depend on the source-condition index and the noise-moment order. The polynomial schedules for the step size and regularization parameter are chosen precisely to attain these rates. The multiplicative constants appearing in the bounds are explicit functions of those parameters (as well as the kernel bound and the initial error); they are therefore not claimed to be uniform over all possible source indices and noise moments. The dimension-free character of the rates refers exclusively to the absence of any dependence on the dimension of the input Polish space or the output Hilbert space, which is preserved regardless of the values taken by the source and noise parameters. We will add a clarifying paragraph in the discussion following Corollary 3.2 that makes this dependence explicit and reiterates that dimension independence is unaffected.

Revision: partial

Circularity Check

0 steps flagged

Theoretical derivation of dimension-independent bounds from structural assumptions; minor self-citation present but not load-bearing for central claims.

The paper derives prediction and estimation error bounds for regularized SGD with operator-valued kernels under explicit structural and distributional assumptions (source conditions, noise moments, step-size schedules). These bounds are obtained via standard concentration and martingale arguments in separable Hilbert spaces rather than by fitting parameters to data or reducing predictions to inputs by construction. No self-definitional loops, fitted-input predictions, or uniqueness theorems imported from the authors' prior work appear in the derivation chain. The high-probability estimates are presented as a general technique for infinite-dimensional settings, but the analysis remains self-contained against external benchmarks once the listed assumptions are granted. A low score of 2 accounts for possible routine self-citations that do not carry the central claims.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Central claims rest on standard but unspecified structural and distributional assumptions typical of statistical learning in Hilbert spaces; no free parameters are explicitly fitted to data in the abstract description.

free parameters (1)

step sizes and regularization parameters
Polynomially decaying or fixed values chosen for the online and finite-horizon algorithms to achieve the stated rates.

axioms (1)

Domain assumption: suitable structural and distributional assumptions on the regression operator and data distribution.
Invoked to establish dimension-independent bounds and convergence rates.

pith-pipeline@v0.9.0 · 5695 in / 1136 out tokens · 101790 ms · 2026-05-22T18:18:58.017475+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: We consider a class of statistical inverse problems involving the estimation of a regression operator from a Polish space to a separable Hilbert space, where the target lies in a vector-valued reproducing kernel Hilbert space induced by an operator-valued kernel.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: Under suitable structural and distributional assumptions, we establish dimension-independent bounds for prediction and estimation errors.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


C´ eline Brouard, Marie Szafranski, and Florence d’Alch´ e Buc. Input output kernel regression: Su- pervised and semi-supervised structured output prediction with operator-valued kernels.Journal of Machine Learning Research, 17(176):1–48, 2016

work page 2016

[8] [8]

Minimax and adaptive prediction for functional linear regression

T Tony Cai and Ming Yuan. Minimax and adaptive prediction for functional linear regression. Journal of the American Statistical Association, 107(499):1201–1216, 2012

work page 2012

[9] [9]

Optimal rates for the regularized least-squares algo- rithm.Foundations of Computational Mathematics, 7:331–368, 2007

Andrea Caponnetto and Ernesto De Vito. Optimal rates for the regularized least-squares algo- rithm.Foundations of Computational Mathematics, 7:331–368, 2007

work page 2007

[10] [10]

Universal multi- task kernels.Journal of Machine Learning Research, 9:1615–1646, 2008

Andrea Caponnetto, Charles A Micchelli, Massimiliano Pontil, and Yiming Ying. Universal multi- task kernels.Journal of Machine Learning Research, 9:1615–1646, 2008

work page 2008

[11] [11]

Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem.Analysis and Applications, 4(04):377– 408, 2006

Claudio Carmeli, Ernesto De Vito, and Alessandro Toigo. Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem.Analysis and Applications, 4(04):377– 408, 2006

work page 2006

[12] [12]

Vector valued reproducing kernel Hilbert spaces and universality.Analysis and Applications, 8(01):19–61, 2010

Claudio Carmeli, Ernesto De Vito, Alessandro Toigo, and Veronica Umanit´ a. Vector valued reproducing kernel Hilbert spaces and universality.Analysis and Applications, 8(01):19–61, 2010

work page 2010

[13] [13]

A consistent regularization approach for structured prediction.Advances in Neural Information Processing Systems, 29, 2016

Carlo Ciliberto, Lorenzo Rosasco, and Alessandro Rudi. A consistent regularization approach for structured prediction.Advances in Neural Information Processing Systems, 29, 2016

work page 2016

[14] [14]

A general framework for consistent structured prediction with implicit loss embeddings.Journal of Machine Learning Research, 21(98):1–67, 2020

Carlo Ciliberto, Lorenzo Rosasco, and Alessandro Rudi. A general framework for consistent structured prediction with implicit loss embeddings.Journal of Machine Learning Research, 21(98):1–67, 2020

work page 2020

[15] [15]

American Mathematical Society, 2000

John B Conway.A Course in Operator Theory. American Mathematical Society, 2000

work page 2000

[16] [16]

Nonparametric stochastic approximation with large step- sizes.The Annals of Statistics, pages 1363–1399, 2016

Aymeric Dieuleveut and Francis Bach. Nonparametric stochastic approximation with large step- sizes.The Annals of Statistics, pages 1363–1399, 2016

work page 2016

[17] [17]

Harder, better, faster, stronger convergence rates for least-squares regression.Journal of Machine Learning Research, 18(101):1– 51, 2017

Aymeric Dieuleveut, Nicolas Flammarion, and Francis Bach. Harder, better, faster, stronger convergence rates for least-squares regression.Journal of Machine Learning Research, 18(101):1– 51, 2017

work page 2017

[18] [18]

John Wiley & Sons, 1988

Nelson Dunford and Jacob T Schwartz.Linear Operators, Part 1: General Theory, volume 10. John Wiley & Sons, 1988

work page 1988

[19] [19]

Learning multiple tasks with kernel methods.Journal of Machine Learning Research, 6(4), 2005

Theodoros Evgeniou, Charles A Micchelli, Massimiliano Pontil, and John Shawe-Taylor. Learning multiple tasks with kernel methods.Journal of Machine Learning Research, 6(4), 2005. 54

work page 2005

[20] [20]

A survey of kernels for structured data.ACM SIGKDD Explorations Newsletter, 5(1):49–58, 2003

Thomas G¨ artner. A survey of kernels for structured data.ACM SIGKDD Explorations Newsletter, 5(1):49–58, 2003

work page 2003

[21] [21]

Capacity dependent analysis for functional online learning algorithms.Applied and Computational Harmonic Analysis, 67:101567, 2023

Xin Guo, Zheng-Chu Guo, and Lei Shi. Capacity dependent analysis for functional online learning algorithms.Applied and Computational Harmonic Analysis, 67:101567, 2023

work page 2023

[22] [22]

Hoi, Doyen Sahoo, Jing Lu, and Peilin Zhao

Steven C.H. Hoi, Doyen Sahoo, Jing Lu, and Peilin Zhao. Online learning: A comprehensive survey.Neurocomputing, 459:249–289, 2021

work page 2021

[23] [23]

Nonlinear functional regression: A functional RKHS approach

Hachem Kadri, Emmanuel Duflos, Philippe Preux, St´ ephane Canu, and Manuel Davy. Nonlinear functional regression: A functional RKHS approach. InProceedings of the Thirteenth Interna- tional Conference on Artificial Intelligence and Statistics, pages 374–380. JMLR Workshop and Conference Proceedings, 2010

work page 2010

[24] [24]

Operator-valued kernels for learning from functional response data.Journal of Machine Learning Research, 17(20):1–54, 2016

Hachem Kadri, Emmanuel Duflos, Philippe Preux, St´ ephane Canu, Alain Rakotomamonjy, and Julien Audiffren. Operator-valued kernels for learning from functional response data.Journal of Machine Learning Research, 17(20):1–54, 2016

work page 2016

[25] [25]

Functional regularized least squares classification with operator-valued kernels

Hachem Kadri, Asma Rabaoui, Philippe Preux, Emmanuel Duflos, and Alain Rakotomamonjy. Functional regularized least squares classification with operator-valued kernels. In28th Interna- tional Conference on Machine Learning (ICML), pages 993–1000. ACM, 2011

work page 2011

[26] [26]

Multiple operator- valued kernel learning.Advances in Neural Information Processing Systems, 25, 2012

Hachem Kadri, Alain Rakotomamonjy, Philippe Preux, and Francis Bach. Multiple operator- valued kernel learning.Advances in Neural Information Processing Systems, 25, 2012

work page 2012

[27] [27]

A structured prediction approach for label ranking.Advances in Neural Information Processing Systems, 31, 2018

Anna Korba, Alexandre Garcia, and Florence d’Alch´ e Buc. A structured prediction approach for label ranking.Advances in Neural Information Processing Systems, 31, 2018

work page 2018

[28] [28]

Operator learning with PCA-Net: upper and lower complexity bounds.Journal of Machine Learning Research, 24(318):1–67, 2023

Samuel Lanthaler. Operator learning with PCA-Net: upper and lower complexity bounds.Journal of Machine Learning Research, 24(318):1–67, 2023

work page 2023

[29] [29]

Error estimates for deep- onets: A deep learning framework in infinite dimensions.Transactions of Mathematics and Its Applications, 6(1):tnac001, 2022

Samuel Lanthaler, Siddhartha Mishra, and George E Karniadakis. Error estimates for deep- onets: A deep learning framework in infinite dimensions.Transactions of Mathematics and Its Applications, 6(1):tnac001, 2022

work page 2022

[30] [30]

Fourier neural operator for parametric partial differential equations

Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Burigede liu, Kaushik Bhat- tacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. InInternational Conference on Learning Representations, 2020

work page 2020

[31] [31]

Nonlinear functional models for functional responses in reproducing kernel Hilbert spaces.Canadian Journal of Statistics, 35(4):597–606, 2007

Heng Lian. Nonlinear functional models for functional responses in reproducing kernel Hilbert spaces.Canadian Journal of Statistics, 35(4):597–606, 2007

work page 2007

[32] [32]

Statistical optimality of divide and conquer kernel-based functional linear regression.Journal of Machine Learning Research, 25(155):1–56, 2024

Jiading Liu and Lei Shi. Statistical optimality of divide and conquer kernel-based functional linear regression.Journal of Machine Learning Research, 25(155):1–56, 2024

work page 2024

[33] [33]

Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators

Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021

work page 2021

[34] [34]

On learning vector-valued functions.Neural Com- putation, 17(1):177–204, 2005

Charles A Micchelli and Massimiliano Pontil. On learning vector-valued functions.Neural Com- putation, 17(1):177–204, 2005

work page 2005

[35] [35]

Statistical optimality of stochastic gradient descent on hard learning problems through multiple passes.Advances in Neural Infor- mation Processing Systems, 31, 2018

Loucas Pillaud-Vivien, Alessandro Rudi, and Francis Bach. Statistical optimality of stochastic gradient descent on hard learning problems through multiple passes.Advances in Neural Infor- mation Processing Systems, 31, 2018

work page 2018

[36] [36]

Optimum bounds for the distributions of martingales in banach spaces.The Annals of Probability, pages 1679–1706, 1994

Iosif Pinelis. Optimum bounds for the distributions of martingales in banach spaces.The Annals of Probability, pages 1679–1706, 1994

work page 1994

[37] [37]

Sous-espaces hilbertiens d’espaces vectoriels topologiques et noyaux associ´ es (noyaux reproduisants).Journal D’analyse Math´ ematique, 13:115–256, 1964

Laurent Schwartz. Sous-espaces hilbertiens d’espaces vectoriels topologiques et noyaux associ´ es (noyaux reproduisants).Journal D’analyse Math´ ematique, 13:115–256, 1964. 55

work page 1964

[38] [38]

Learning operators with stochastic gradient descent in general Hilbert spaces.arXiv preprint arXiv:2402.04691, 2024

Lei Shi and Jia-Qi Yang. Learning operators with stochastic gradient descent in general Hilbert spaces.arXiv preprint arXiv:2402.04691, 2024

work page arXiv 2024

[39] [39]

Online learning algorithms.Foundations of Computational Mathe- matics, 6:145–170, 2006

Steve Smale and Yuan Yao. Online learning algorithms.Foundations of Computational Mathe- matics, 6:145–170, 2006

work page 2006

[40] [40]

Online learning as stochastic approximation of regularization paths: Optimality and almost-sure convergence.IEEE Transactions on Information Theory, 60(9):5716– 5735, 2014

Pierre Tarres and Yuan Yao. Online learning as stochastic approximation of regularization paths: Optimality and almost-sure convergence.IEEE Transactions on Information Theory, 60(9):5716– 5735, 2014

work page 2014

[41] [41]

Last iterate convergence of sgd for least-squares in the interpolation regime.Advances in Neural Information Processing Systems, 34:21581–21591, 2021

Aditya Vardhan Varre, Loucas Pillaud-Vivien, and Nicolas Flammarion. Last iterate convergence of sgd for least-squares in the interpolation regime.Advances in Neural Information Processing Systems, 34:21581–21591, 2021

work page 2021

[42] [42]

Cambridge University Press, 2004

Holger Wendland.Scattered Data Approximation, volume 17. Cambridge University Press, 2004

work page 2004

[43] [43]

Ker- nel dependency estimation.Advances in Neural Information Processing Systems, 15, 2002

Jason Weston, Olivier Chapelle, Vladimir Vapnik, Andr´ e Elisseeff, and Bernhard Sch¨ olkopf. Ker- nel dependency estimation.Advances in Neural Information Processing Systems, 15, 2002

work page 2002

[44] [44]

Learning deep neural network representations for koopman operators of nonlinear dynamical systems

Enoch Yeung, Soumya Kundu, and Nathan Hodas. Learning deep neural network representations for koopman operators of nonlinear dynamical systems. In2019 American Control Conference (ACC), pages 4832–4839, 2019

work page 2019

[45] [45]

Online gradient descent learning algorithms.Foundations of Computational Mathematics, 8:561–596, 2008

Yiming Ying and Massimiliano Pontil. Online gradient descent learning algorithms.Foundations of Computational Mathematics, 8:561–596, 2008

work page 2008

[46] [46]

A reproducing kernel Hilbert space approach to functional linear regression.The Annals of Statistics, 38(6):3412–3444, 2010

Ming Yuan and T Tony Cai. A reproducing kernel Hilbert space approach to functional linear regression.The Annals of Statistics, 38(6):3412–3444, 2010

work page 2010

[47] [47]

An algorithmic view of l2 regularization and some path-following algorithms.Journal of Machine Learning Research, 22(138):1–62, 2021

Yunzhang Zhu and Renxiong Liu. An algorithmic view of l2 regularization and some path-following algorithms.Journal of Machine Learning Research, 22(138):1–62, 2021. 56

work page 2021