How does feature learning reshape the function space?

Bruno Loureiro; Fanghui Liu; João Lobo; Long Tran-Than

arxiv: 2605.17718 · v1 · pith:JC54ULOCnew · submitted 2026-05-18 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-19 22:22 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords: feature learning, function space, gradient descent, high-dimensional regime, spiked covariance, adaptive kernel, two-layer networks, spectral structure

The pith

In high dimensions, one large gradient step on a two-layer network produces features whose distribution approximates a target-dependent spiked Gaussian covariance, inducing a data-adaptive kernel that reshapes the function space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that feature learning during gradient descent amounts to a specific change in the distribution of network features rather than a simple rescaling of a fixed kernel. In the proportional high-dimensional limit, a sufficiently large gradient step makes the post-update feature distribution approximately Gaussian, with a covariance that carries an extra spike aligned to the target function. This change creates an adaptive kernel whose eigenstructure favors directions that match the signal in the data. A reader should care because it supplies a concrete function-space account of why a trained neural network can represent functions that a static kernel method cannot.

Core claim

We prove that, in the high-dimensional proportional regime, after a large gradient step the post-update feature distribution is well approximated by a target-dependent spiked Gaussian covariance. This induces a data-adaptive kernel that reshapes the function space and modifies its spectral structure. Feature learning can be viewed as a distributional transformation in parameter space or input space, or equivalently as the introduction of a target-dependent kernel. In particular, the update selectively amplifies eigenvalues aligned with the target direction and mixes leading eigenfunctions, coupling the top radial mode with a target-aligned quadratic harmonic. The overall effect is a data-adaptive deformation that preferentially enhances directions aligned with the signal in the data.

What carries the argument

Target-dependent spiked Gaussian covariance that approximates the post-update feature distribution and thereby induces the data-adaptive kernel.
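
For orientation, a generic rank-one spiked covariance and the kernel it induces can be written as below. This is only a sketch of the general shape: the spike strength θ, the unit direction v, and the activation σ are placeholder symbols assumed here, and the paper's exact expressions for the spike are not reproduced.

```latex
% Generic rank-one spiked covariance (assumed form, not the paper's exact statement).
\Sigma = I_d + \theta\, v v^\top, \qquad \|v\|_2 = 1,\ \theta \ge 0,
\qquad
K_\Sigma(x, x') = \mathbb{E}_{w \sim \mathcal{N}(0, \Sigma)}
  \big[\sigma(w^\top x)\,\sigma(w^\top x')\big].
```

Setting θ = 0 recovers the isotropic, target-independent kernel, so any dependence on the target enters through the spike alone.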

If this is right

The induced kernel selectively amplifies eigenvalues aligned with the target direction.
Leading eigenfunctions mix, coupling the top radial mode with a target-aligned quadratic harmonic.
Feature learning acts as a distributional transformation in parameter space or input space.
Early training deforms the function space to preferentially enhance directions aligned with the signal rather than rescaling a fixed kernel.
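
One way to see how a spike could tie a radial term to a target-aligned quadratic term is a toy computation. The sketch below assumes weights w ~ N(0, Σ) with Σ = I + θ v vᵀ and a purely quadratic activation, both hypothetical choices made so that the expectation has a closed form (Isserlis' theorem); it is not the paper's derivation. Under these assumptions E[(w·x)²(w·x')²] = (xᵀΣx)(x'ᵀΣx') + 2(xᵀΣx')², and xᵀΣx = ‖x‖² + θ(v·x)² mixes the radial quantity ‖x‖² with the quadratic (v·x)².

```python
import numpy as np

def spiked_quadratic_kernel(x, xp, v, theta):
    """Closed form of E[(w.x)^2 (w.xp)^2] for w ~ N(0, I + theta * v v^T)."""
    sigma = np.eye(len(v)) + theta * np.outer(v, v)
    sxx, spp, sxp = x @ sigma @ x, xp @ sigma @ xp, x @ sigma @ xp
    return sxx * spp + 2.0 * sxp ** 2

def monte_carlo_kernel(x, xp, v, theta, n_samples=400_000, seed=0):
    """Monte Carlo estimate of the same expectation, used as a sanity check."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, len(v)))
    # Multiply by Sigma^{1/2} = I + (sqrt(1 + theta) - 1) v v^T (v is a unit vector).
    w = z + (np.sqrt(1.0 + theta) - 1.0) * np.outer(z @ v, v)
    return np.mean((w @ x) ** 2 * (w @ xp) ** 2)

# Hypothetical toy setup: spike along the first coordinate axis.
d, theta = 5, 3.0
v = np.zeros(d); v[0] = 1.0          # assumed target direction
u = np.zeros(d); u[1] = 1.0          # a direction orthogonal to the target
x = (v + u) / np.sqrt(2.0)

k_exact = spiked_quadratic_kernel(x, u, v, theta)
k_mc = monte_carlo_kernel(x, u, v, theta)
k_along = spiked_quadratic_kernel(v, v, v, theta)   # 3 * (1 + theta)^2
k_ortho = spiked_quadratic_kernel(u, u, v, theta)   # 3, unaffected by the spike
print(k_exact, k_mc, k_along, k_ortho)
```

In this toy setting the kernel's diagonal grows by a factor (1 + θ)² along the spike direction and is unchanged in orthogonal directions, which is the kind of selective amplification the list above describes.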

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Repeated gradient steps could compound the distributional shift, producing successively more adapted kernels at later training stages.
The same spiked approximation may appear in deeper networks if each layer experiences an analogous large update.
Low-dimensional regimes or small step sizes offer a direct test of where the reshaping mechanism breaks down.
This view connects to analyses of kernel evolution under gradient flow by supplying an explicit distributional mechanism for the adaptation.

Load-bearing premise

The analysis requires the high-dimensional proportional regime together with a large enough gradient step size so that the updated features can be approximated by the spiked Gaussian form.

What would settle it

Numerical computation of the empirical covariance of the hidden features after one large gradient step, on data with finite but proportional n and d, checking whether the observed matrix deviates from the predicted target-dependent spike.
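
A minimal sketch of such a check is given below. Everything specific in it is an assumption made for illustration rather than the paper's setup: the tanh activation, the single-index target tanh(x·v), the ±1/p second layer, the squared loss, the step size eta = 4p, and the reading of "features" as the rows of the updated first-layer weight matrix. It only tests for the presence and direction of a spike, not for the paper's predicted spike strength.

```python
import numpy as np

def one_step_weight_covariance(n=2000, d=100, p=200, eta=None, seed=0):
    """Take one gradient step on the first layer and return the spectrum of the
    empirical covariance of the updated weight rows, plus the alignment of its
    top eigenvector with the (assumed) target direction v."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)                        # hypothetical target direction
    X = rng.standard_normal((n, d))               # Gaussian inputs
    y = np.tanh(X @ v)                            # hypothetical single-index target
    W0 = rng.standard_normal((p, d)) / np.sqrt(d) # first-layer initialization
    a = rng.choice([-1.0, 1.0], size=p) / p       # fixed second layer (assumed scaling)
    if eta is None:
        eta = 4.0 * p                             # "large" step; an assumed choice

    pre = X @ W0.T                                # (n, p) pre-activations
    resid = y - np.tanh(pre) @ a                  # (n,) residuals at initialization
    sig_prime = 1.0 - np.tanh(pre) ** 2           # derivative of tanh
    # Gradient of 0.5 * mean(resid^2) with respect to the first-layer weights.
    G = -((resid[:, None] * sig_prime * a[None, :]).T @ X) / n   # (p, d)
    W1 = W0 - eta * G

    cov = W1.T @ W1 / p                           # (d, d) empirical covariance of rows
    evals, evecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    alignment = abs(evecs[:, -1] @ v)
    return evals, alignment

evals_big, align_big = one_step_weight_covariance()
evals_none, align_none = one_step_weight_covariance(eta=0.0)
print("top/second eigenvalue, large step:", evals_big[-1] / evals_big[-2])
print("top/second eigenvalue, no step:   ", evals_none[-1] / evals_none[-2])
print("alignment with target, large step:", align_big)
```

Comparing the large-step run with the eta = 0 baseline separates a genuine outlier eigenvalue from the bulk of the initial spectrum. Sweeping eta, or the ratio n/d, in the same function would probe where the spike stops dominating.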

Figures

Figures reproduced from arXiv: 2605.17718 by Bruno Loureiro, Fanghui Liu, João Lobo, Long Tran-Than.

Original abstract

Feature learning is widely regarded as the key mechanism distinguishing neural networks from fixed-kernel methods, yet its impact on the induced function space remains poorly understood. In this work, we precisely characterize how the function space spanned by the features of a two-layer neural network evolves during gradient descent training. We prove that, in the high-dimensional proportional regime, after a large gradient step the post-update feature distribution is well approximated by a target-dependent spiked Gaussian covariance. This induces a data-adaptive kernel that reshapes the function space and modifies its spectral structure. Our analysis reveals that feature learning can be interpreted as a distributional transformation in either parameter space or input space, equivalently as the introduction of a target-dependent kernel. In particular, it selectively amplifies eigenvalues aligned with the target direction and mixes leading eigenfunctions, coupling the top radial mode with a target-aligned quadratic harmonic. Overall, our results provide a precise function-space perspective on early-stage feature learning: rather than just rescaling a fixed kernel, gradient descent induces a data-adaptive deformation that preferentially enhances directions aligned with the signal in the data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

After one large gradient step in the proportional limit, the features get approximated by a target-dependent spiked covariance that induces a data-adaptive kernel with specific eigenfunction mixing.

The main point is that the paper derives an approximation for the post-update feature distribution after a single large gradient step. In the high-dimensional regime where n and d grow together with fixed ratio, this distribution looks like a spiked Gaussian whose spike direction depends on the target. The result is a kernel that adapts to the data and changes its spectrum by boosting aligned directions while mixing the top radial mode with a target-aligned quadratic term.

Referee Report

2 major / 2 minor

Summary. The paper claims that in the high-dimensional proportional regime (n, d → ∞ with n/d fixed), a single large gradient step on a two-layer neural network produces a post-update feature distribution that is well approximated by a target-dependent spiked Gaussian covariance. This induces a data-adaptive kernel that reshapes the function space, selectively amplifying eigenvalues aligned with the target direction and mixing leading eigenfunctions (e.g., coupling the top radial mode with a target-aligned quadratic harmonic). Feature learning is interpreted equivalently as a distributional transformation in parameter space or input space.

Significance. If the central approximation holds with the stated precision, the work supplies a rigorous function-space view of early-stage feature learning that goes beyond fixed-kernel or NTK analyses by exhibiting an explicit data-adaptive deformation of the spectral structure. The result is potentially useful for understanding how gradient descent modifies the effective kernel during the initial phase of training and for designing adaptive kernels that capture target-aligned directions.

major comments (2)

[§3.1, Theorem 1] The spiked-Gaussian approximation is asserted to hold after a 'sufficiently large' gradient step, yet the statement provides neither an explicit lower bound on the step size η nor quantitative error bounds (in total variation or Wasserstein distance) that depend on n, d, and η. Without these, it is impossible to verify the domain of validity of the claimed limit or to assess how the approximation degrades when the step-size condition is relaxed.
[§4.2, Eq. (17)] The induced kernel is defined via the expectation over the spiked covariance; however, the derivation of the eigenvalue amplification and eigenfunction mixing (radial mode coupled to quadratic harmonic) appears to rest on an additional assumption that the target is exactly aligned with a single direction. The paper does not state whether this alignment is necessary or how the spectral reshaping generalizes to targets with multiple relevant directions.

minor comments (2)

[§2] Notation for the proportional limit (n/d → γ) is introduced in §2 but used inconsistently in the statement of the main result; a single displayed definition would improve readability.
[Figure 2] The caption does not specify the value of the step size η used in the simulation, making it difficult to relate the plotted spectra to the theoretical regime.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which help clarify the scope and presentation of our results. We respond to each major comment below.

Referee: [§3.1, Theorem 1] The spiked-Gaussian approximation is asserted to hold after a 'sufficiently large' gradient step, yet the statement provides neither an explicit lower bound on the step size η nor quantitative error bounds (in total variation or Wasserstein distance) that depend on n, d, and η. Without these, it is impossible to verify the domain of validity of the claimed limit or to assess how the approximation degrades when the step-size condition is relaxed.

Authors: We agree that greater precision on the step-size condition would improve the statement. In the proof of Theorem 1 the requirement that η be sufficiently large arises from ensuring the target-dependent spike dominates the fluctuation terms in the high-dimensional limit; this translates to η exceeding a constant determined by the data variance and the Lipschitz constant of the activation. We will revise the theorem to state an explicit lower bound of this form. Our analysis is strictly asymptotic (n, d → ∞ with n/d fixed), so we do not derive non-asymptotic total-variation or Wasserstein bounds; we will add a remark noting this limitation and the resulting domain of validity.
Revision: partial.

Referee: [§4.2, Eq. (17)] The induced kernel is defined via the expectation over the spiked covariance; however, the derivation of the eigenvalue amplification and eigenfunction mixing (radial mode coupled to quadratic harmonic) appears to rest on an additional assumption that the target is exactly aligned with a single direction. The paper does not state whether this alignment is necessary or how the spectral reshaping generalizes to targets with multiple relevant directions.

Authors: The single-direction setting is adopted for expository clarity, as it already exhibits the essential phenomenon of target-dependent eigenvalue amplification and the specific radial-to-quadratic mixing. The underlying spiked-covariance construction extends immediately to a finite number of spikes aligned with a multi-dimensional target subspace; the induced kernel then amplifies the corresponding eigenspace and produces analogous mixing within that subspace. We will revise §4.2 to state the single-direction assumption explicitly and add a short paragraph describing the multi-spike generalization, confirming that the qualitative conclusions on data-adaptive reshaping remain unchanged.
Revision: yes.

Circularity Check

0 steps flagged

No circularity: the derivation proceeds from explicit high-dimensional limit assumptions without reduction to fitted inputs or self-referential definitions.

full rationale

The paper derives the spiked Gaussian covariance approximation for post-update features from gradient descent dynamics under the stated proportional regime (n,d→∞, n/d fixed) and large step-size condition. This is presented as a proven limit result rather than an ansatz, fit, or self-definition. No equations reduce the target-dependent kernel or spectral reshaping to a tautology or to a parameter fitted from the same data being predicted. Self-citations, if present, are not load-bearing for the central claim, which rests on the regime-specific analysis rather than prior author work invoked as uniqueness. The result is not a renaming of a known pattern but a characterization of function-space evolution. The derivation chain is therefore self-contained against the explicit assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the high-dimensional proportional limit and the large-gradient-step approximation; these are standard modeling choices in the field but constitute the main unverified assumptions for the result.

axioms (2)

Domain assumption: high-dimensional proportional regime, n, d → ∞ with n/d = γ fixed. Invoked to obtain the spiked Gaussian approximation after one gradient step.
Domain assumption: sufficiently large gradient step size. Required for the post-update feature distribution to concentrate around the target-dependent spike.

discussion (0)

Reference graph

Works this paper leans on

300 extracted references · 300 canonical work pages · 2 internal anchors

[1]

R., Millman, K

Charles R. Harris and K. Jarrod Millman and St. Array programming with. 2020 , month = sep, journal =. doi:10.1038/s41586-020-2649-2 , publisher =

work page doi:10.1038/s41586-020-2649-2 2020
[2]

and Haberland, Matt and Reddy, Tyler and Cournapeau, David and Burovski, Evgeni and Peterson, Pearu and Weckesser, Warren and Bright, Jonathan and

Virtanen, Pauli and Gommers, Ralf and Oliphant, Travis E. and Haberland, Matt and Reddy, Tyler and Cournapeau, David and Burovski, Evgeni and Peterson, Pearu and Weckesser, Warren and Bright, Jonathan and. Nature Methods , year =

work page
[3]

and Varoquaux, G

Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V. and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P. and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E. , journal=. Scikit-learn: Machine Learning in

work page
[4]

PyTorch: An Imperative Style, High-Performance Deep Learning Library , url =

Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu an...

work page
[5]

Journal of Machine Learning Research , year =

Xiangxiang Xu and Lizhong Zheng , title =. Journal of Machine Learning Research , year =

work page
[6]

2025 , journal=

Learning Multi-Index Models with Hyper-Kernel Ridge Regression , author=. 2025 , journal=

work page 2025
[7]

2011 , eprint=

Introduction to the non-asymptotic analysis of random matrices , author=. 2011 , eprint=

work page 2011
[8]

, journal=

Price, R. , journal=. A useful theorem for nonlinear devices having Gaussian inputs , year=

work page
[9]

, journal=

McMahon, E. , journal=. An extension of Price's theorem (Corresp.) , year=

work page
[10]

2025 , eprint=

Learning single-index models via harmonic decomposition , author=. 2025 , eprint=

work page 2025
[11]

Mathematics of the USSR-Sbornik , volume=

Distribution of eigenvalues for some sets of random matrices , author=. Mathematics of the USSR-Sbornik , volume=

work page
[12]

2024 , eprint=

A non-asymptotic theory of Kernel Ridge Regression: deterministic equivalents, test error, and GCV estimator , author=. 2024 , eprint=

work page 2024
[13]

2024 , eprint=

Optimal Rates of Kernel Ridge Regression under Source Condition in Large Dimensions , author=. 2024 , eprint=

work page 2024
[14]

Advances in Neural Information Processing Systems , pages =

Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks , author =. Advances in Neural Information Processing Systems , pages =

work page
[15]

Linear Algebra and Its Applications , volume=

Characterization of the subdifferential of some matrix norms , author=. Linear Algebra and Its Applications , volume=

work page
[16]

International Conference on Machine Learning , pages=

Gaussian process kernels for pattern discovery and extrapolation , author=. International Conference on Machine Learning , pages=

work page
[17]

ICML , pages=

Gaussian process kernels for pattern discovery and extrapolation , author=. ICML , pages=

work page
[18]

Journal of Machine Learning Research , volume=

Algorithms for learning kernels based on centered alignment , author=. Journal of Machine Learning Research , volume=

work page
[19]

Journal of Machine Learning Research , volume=

Multiple kernel learning algorithms , author=. Journal of Machine Learning Research , volume=

work page
[20]

and Song, Le and Wilson, Andrew Gordon , booktitle=

Yang, Zichao and Smola, Alexander J. and Song, Le and Wilson, Andrew Gordon , booktitle=. \`

work page
[21]

Fixed point and

Ma, Shiqian and Goldfarb, Donald and Chen, Lifeng , journal=. Fixed point and

work page
[22]

Mathematical Programming , volume=

Smooth minimization of non-smooth functions , author=. Mathematical Programming , volume=

work page
[23]

Advances in Neural Information Processing Systems , year=

Convolutional kernel networks , author=. Advances in Neural Information Processing Systems , year=

work page
[24]

International Conference on Machine Learning , pages=

Learning a kernel matrix for nonlinear dimensionality reduction , author=. International Conference on Machine Learning , pages=

work page
[25]

International Conference on Machine Learning , pages=

Geometry-aware metric learning , author=. International Conference on Machine Learning , pages=

work page
[26]

International Conference on Computer Analysis of Images and Patterns , pages=

Learning geometry-aware kernels in a regularization framework , author=. International Conference on Computer Analysis of Images and Patterns , pages=

work page
[27]

Neural Networks , volume=

Ideal regularization for learning kernels from labels , author=. Neural Networks , volume=

work page
[28]

Journal of Machine Learning Research , volume=

A family of simple non-parametric kernel learning algorithms , author=. Journal of Machine Learning Research , volume=

work page
[29]

IEEE Transactions on Systems Man and Cybernetics Part B , volume=

An explicit nonlinear mapping for manifold learning , author=. IEEE Transactions on Systems Man and Cybernetics Part B , volume=

work page
[30]

International conference on machine learning , pages=

On a nonlinear generalization of sparse coding and dictionary learning , author=. International conference on machine learning , pages=

work page
[31]

IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

Graph embedding and extensions: a general framework for dimensionality reduction , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

work page
[32]

Advances in Neural Information Processing Systems , pages=

Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering , author=. Advances in Neural Information Processing Systems , pages=

work page
[33]

AAAI Conference on Artificial Intelligence , pages=

A Generalised Solution to the Out-of-Sample Extension Problem in Manifold Learning , author=. AAAI Conference on Artificial Intelligence , pages=

work page
[34]

IEEE Transactions on Neural Networks , volume=

Semi-supervised kernel matrix learning by kernel propagation , author=. IEEE Transactions on Neural Networks , volume=

work page
[35]

Journal of Machine Learning Research , volume=

Metric and kernel learning using a linear transformation , author=. Journal of Machine Learning Research , volume=

work page
[36]

Journal of Machine Learning Research , volume=

Learning the kernel with hyperkernels , author=. Journal of Machine Learning Research , volume=

work page
[37]

Advances in Neural Information Processing Systems , year=

On valid optimal assignment kernels and applications to graph classification , author=. Advances in Neural Information Processing Systems , year=

work page
[38]

Advances in Neural Information Processing Systems , pages=

Nonparametric transforms of graph kernels for semi-supervised learning , author=. Advances in Neural Information Processing Systems , pages=

work page
[39]

Advances in Neural Information Processing Systems , pages=

Fast kernel learning for multidimensional pattern extrapolation , author=. Advances in Neural Information Processing Systems , pages=

work page
[40]

International Conference on Machine Learning , pages=

Two-stage learning kernel algorithms , author=. International Conference on Machine Learning , pages=

work page
[41]

ICML , pages=

Two-stage learning kernel algorithms , author=. ICML , pages=

work page
[42]

Proceedins of Advances in Neural Information Processing Systems , pages =

Learning kernels with random features , author =. Proceedins of Advances in Neural Information Processing Systems , pages =

work page
[43]

NeurIPS , pages =

Learning kernels with random features , author =. NeurIPS , pages =

work page
[44]

Foundations of Computational Mathematics , volume=

Analysis of support vector machines regression , author=. Foundations of Computational Mathematics , volume=

work page
[45]

Foundations of Computational Mathematics , volume=

Learning rates of least-square regularized regression , author=. Foundations of Computational Mathematics , volume=

work page
[46]

Annals of Statistics , volume=

Fast learning rate of multiple kernel learning: trade-off between sparsity and smoothness , author=. Annals of Statistics , volume=

work page
[47]

Conference on Learning Theory , year=

Optimal rates for regularized least squares regression , author=. Conference on Learning Theory , year=

work page
[48]

2004 , booktitle=

Learning with non-positive kernels , author=. 2004 , booktitle=

work page 2004
[49]

, author=

Learning with convex loss and indefinite kernels. , author=. Neural Computation , volume=

work page
[50]

Applied and Computational Harmonic Analysis , volume=

Least square regression with indefinite kernels and coefficient regularization , author=. Applied and Computational Harmonic Analysis , volume=

work page
[51]

International Conference on Artificial Neural Networks , pages=

Indefinite support vector regression , author=. International Conference on Artificial Neural Networks , pages=

work page
[52]

AAAI Conference on Artificial Intelligence , pages=

Nonlinear pairwise layer and its training for kernel learning , author=. AAAI Conference on Artificial Intelligence , pages=

work page
[53]

Advances in Neural Information Processing Systems , year=

Learning low-dimensional metrics , author=. Advances in Neural Information Processing Systems , year=

work page
[54]

Foundations and Trends in Machine Learning , volume=

Metric learning: a survey , author=. Foundations and Trends in Machine Learning , volume=

work page
[55]

Huang, Zhiwu and Wang, Ruiping and Shan, Shiguang and Li, Xianqiu and Chen, Xilin , year=. Log-

work page
[56]

1974 , publisher=

Indefinite inner product spaces , author=. 1974 , publisher=

work page 1974
[57]

Foundations and Trends

Pairwise independence and derandomization , author=. Foundations and Trends. 2006 , publisher=

work page 2006
[58]

Neural computation , volume=

SVM Soft Margin Classifiers: Linear Programming versus Quadratic Programming , author=. Neural computation , volume=

work page
[59]

Foundations of Computational Mathematics , volume=

Best Choices for Regularization Parameters in Learning Theory: On the Bias—Variance Problem , author=. Foundations of Computational Mathematics , volume=

work page
[60]

Gaussian and

Kondor, Risi and Jebara, Tony , booktitle=. Gaussian and

work page
[61]

Journal of Machine Learning Research , volume=

Graph kernels , author=. Journal of Machine Learning Research , volume=

work page
[62]

Journal of Applied Mathematics , volume=

Approximation analysis of learning algorithms for support vector regression and quantile regression , author=. Journal of Applied Mathematics , volume=

work page
[63]

, journal=

Shi, Lei and Huang, Xiaolin and Tian, Zheng and Suykens, Johan A.K. , journal=. Quantile regression with _1 -regularization and

work page
[64]

2007 , publisher=

Learning theory: an approximation theory viewpoint , author=. 2007 , publisher=

work page 2007
[65]

Conditionally positive definite kernels for

Boughorbel, Sabri and Tarel, J-P and Boujemaa, Nozha , booktitle=. Conditionally positive definite kernels for

work page
[66]

IEEE Transactions on Image Processing , volume=

Out-of-sample generalizations for supervised manifold learning for classification , author=. IEEE Transactions on Image Processing , volume=

work page
[67]

International Conference on Computer Vision , pages=

Attribute and simile classifiers for face verification , author=. International Conference on Computer Vision , pages=

work page
[68]

Journal of Complexity , volume=

The covering number in learning theory , author=. Journal of Complexity , volume=. 2002 , publisher=

work page 2002
[69]

Journal of Machine Learning Research , volume=

Learning theory approach to minimum error entropy criterion , author=. Journal of Machine Learning Research , volume=

work page
[70]

Fast rates for support vector machines using

Steinwart, Ingo and Scovel, Clint , journal=. Fast rates for support vector machines using

work page
[71]

Scalable

Jang, Phillip A and Loeb, Andrew and Davidow, Matthew and Wilson, Andrew G , booktitle=. Scalable

work page
[72]

IEEE transactions on pattern analysis and machine intelligence , volume=

Representation learning: A review and new perspectives , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2013 , publisher=

work page 2013
[73]

CHI'06 extended abstracts on Human factors in computing systems , pages=

Being accurate is not enough: how accuracy metrics have hurt recommender systems , author=. CHI'06 extended abstracts on Human factors in computing systems , pages=. 2006 , organization=

work page 2006
[74]

Advances in Computational Mathematics , volume=

Concentration estimates for learning with unbounded sampling , author=. Advances in Computational Mathematics , volume=. 2013 , publisher=

work page 2013
[75]

arXiv preprint arXiv:1711.07271 , year=

Positive semi-definite embedding for dimensionality reduction and out-of-sample extensions , author=. arXiv preprint arXiv:1711.07271 , year=

work page arXiv
[76]

, author=

Learning with varying insensitive loss. , author=. Applied Mathematics Letters , volume=

work page
[77]

Machine Learning , volume=

Support-vector networks , author=. Machine Learning , volume=. 1995 , publisher=

work page 1995
[78]

Journal of Machine Learning Research , volume=

Building support vector machines with reduced classifier complexity , author=. Journal of Machine Learning Research , volume=

work page
[79]

International Conference on Machine Learning , pages=

A divide-and-conquer solver for kernel support vector machines , author=. International Conference on Machine Learning , pages=

work page
[80]

ICML , pages=

A divide-and-conquer solver for kernel support vector machines , author=. ICML , pages=

work page

Showing first 80 references.

[1] [1]

R., Millman, K

Charles R. Harris and K. Jarrod Millman and St. Array programming with. 2020 , month = sep, journal =. doi:10.1038/s41586-020-2649-2 , publisher =

work page doi:10.1038/s41586-020-2649-2 2020

[2] [2]

and Haberland, Matt and Reddy, Tyler and Cournapeau, David and Burovski, Evgeni and Peterson, Pearu and Weckesser, Warren and Bright, Jonathan and

Virtanen, Pauli and Gommers, Ralf and Oliphant, Travis E. and Haberland, Matt and Reddy, Tyler and Cournapeau, David and Burovski, Evgeni and Peterson, Pearu and Weckesser, Warren and Bright, Jonathan and. Nature Methods , year =

work page

[3] [3]

and Varoquaux, G

Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V. and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P. and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E. , journal=. Scikit-learn: Machine Learning in

work page

[4] [4]

PyTorch: An Imperative Style, High-Performance Deep Learning Library , url =

Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu an...

work page

[5] [5]

Journal of Machine Learning Research , year =

Xiangxiang Xu and Lizhong Zheng , title =. Journal of Machine Learning Research , year =

work page

[6] [6]

2025 , journal=

Learning Multi-Index Models with Hyper-Kernel Ridge Regression , author=. 2025 , journal=

work page 2025

[7] [7]

2011 , eprint=

Introduction to the non-asymptotic analysis of random matrices , author=. 2011 , eprint=

work page 2011

[8] [8]

, journal=

Price, R. , journal=. A useful theorem for nonlinear devices having Gaussian inputs , year=

work page

[9] [9]

, journal=

McMahon, E. , journal=. An extension of Price's theorem (Corresp.) , year=

work page

[10] [10]

2025 , eprint=

Learning single-index models via harmonic decomposition , author=. 2025 , eprint=

work page 2025

[11] [11]

Mathematics of the USSR-Sbornik , volume=

Distribution of eigenvalues for some sets of random matrices , author=. Mathematics of the USSR-Sbornik , volume=

work page

[12] [12]

2024 , eprint=

A non-asymptotic theory of Kernel Ridge Regression: deterministic equivalents, test error, and GCV estimator , author=. 2024 , eprint=

work page 2024

[13] [13]

2024 , eprint=

Optimal Rates of Kernel Ridge Regression under Source Condition in Large Dimensions , author=. 2024 , eprint=

work page 2024

[14] [14]

Advances in Neural Information Processing Systems , pages =

Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks , author =. Advances in Neural Information Processing Systems , pages =

work page

[15] [15]

Linear Algebra and Its Applications , volume=

Characterization of the subdifferential of some matrix norms , author=. Linear Algebra and Its Applications , volume=

work page

[16] [16]

International Conference on Machine Learning , pages=

Gaussian process kernels for pattern discovery and extrapolation , author=. International Conference on Machine Learning , pages=

work page

[17] [17]

ICML , pages=

Gaussian process kernels for pattern discovery and extrapolation , author=. ICML , pages=

work page

[18] [18]

Journal of Machine Learning Research , volume=

Algorithms for learning kernels based on centered alignment , author=. Journal of Machine Learning Research , volume=

work page

[19] [19]

Journal of Machine Learning Research , volume=

Multiple kernel learning algorithms , author=. Journal of Machine Learning Research , volume=

work page

[20] Yang, Zichao and Smola, Alexander J. and Song, Le and Wilson, Andrew Gordon.

[21] Ma, Shiqian and Goldfarb, Donald and Chen, Lifeng. Fixed point and …

[22] Smooth minimization of non-smooth functions. Mathematical Programming.

[23] Convolutional kernel networks. Advances in Neural Information Processing Systems.

[24] Learning a kernel matrix for nonlinear dimensionality reduction. International Conference on Machine Learning.

[25] Geometry-aware metric learning. International Conference on Machine Learning.

[26] Learning geometry-aware kernels in a regularization framework. International Conference on Computer Analysis of Images and Patterns.

[27] Ideal regularization for learning kernels from labels. Neural Networks.

[28] A family of simple non-parametric kernel learning algorithms. Journal of Machine Learning Research.

[29] An explicit nonlinear mapping for manifold learning. IEEE Transactions on Systems Man and Cybernetics Part B.

[30] On a nonlinear generalization of sparse coding and dictionary learning. International Conference on Machine Learning.

[31] Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32] Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering. Advances in Neural Information Processing Systems.

[33] A Generalised Solution to the Out-of-Sample Extension Problem in Manifold Learning. AAAI Conference on Artificial Intelligence.

[34] Semi-supervised kernel matrix learning by kernel propagation. IEEE Transactions on Neural Networks.

[35] Metric and kernel learning using a linear transformation. Journal of Machine Learning Research.

[36] Learning the kernel with hyperkernels. Journal of Machine Learning Research.

[37] On valid optimal assignment kernels and applications to graph classification. Advances in Neural Information Processing Systems.

[38] Nonparametric transforms of graph kernels for semi-supervised learning. Advances in Neural Information Processing Systems.

[39] Fast kernel learning for multidimensional pattern extrapolation. Advances in Neural Information Processing Systems.

[40] Two-stage learning kernel algorithms. International Conference on Machine Learning.

[41] Two-stage learning kernel algorithms. ICML.

[42] Learning kernels with random features. Proceedings of Advances in Neural Information Processing Systems.

[43] Learning kernels with random features. NeurIPS.

[44] Analysis of support vector machines regression. Foundations of Computational Mathematics.

[45] Learning rates of least-square regularized regression. Foundations of Computational Mathematics.

[46] Fast learning rate of multiple kernel learning: trade-off between sparsity and smoothness. Annals of Statistics.

[47] Optimal rates for regularized least squares regression. Conference on Learning Theory.

[48] Learning with non-positive kernels. 2004.

[49] Learning with convex loss and indefinite kernels. Neural Computation.

[50] Least square regression with indefinite kernels and coefficient regularization. Applied and Computational Harmonic Analysis.

[51] Indefinite support vector regression. International Conference on Artificial Neural Networks.

[52] Nonlinear pairwise layer and its training for kernel learning. AAAI Conference on Artificial Intelligence.

[53] Learning low-dimensional metrics. Advances in Neural Information Processing Systems.

[54] Metric learning: a survey. Foundations and Trends in Machine Learning.

[55] Huang, Zhiwu and Wang, Ruiping and Shan, Shiguang and Li, Xianqiu and Chen, Xilin. Log- …

[56] Indefinite inner product spaces. 1974.

[57] Pairwise independence and derandomization. Foundations and Trends, 2006.

[58] SVM Soft Margin Classifiers: Linear Programming versus Quadratic Programming. Neural Computation.

[59] Best Choices for Regularization Parameters in Learning Theory: On the Bias-Variance Problem. Foundations of Computational Mathematics.

[60] Kondor, Risi and Jebara, Tony. Gaussian and …

[61] Graph kernels. Journal of Machine Learning Research.

[62] Approximation analysis of learning algorithms for support vector regression and quantile regression. Journal of Applied Mathematics.

[63] Shi, Lei and Huang, Xiaolin and Tian, Zheng and Suykens, Johan A.K. Quantile regression with $\ell_1$-regularization and …

[64] Learning theory: an approximation theory viewpoint. 2007.

[65] Boughorbel, Sabri and Tarel, J-P and Boujemaa, Nozha. Conditionally positive definite kernels for …

[66] Out-of-sample generalizations for supervised manifold learning for classification. IEEE Transactions on Image Processing.

[67] Attribute and simile classifiers for face verification. International Conference on Computer Vision.

[68] The covering number in learning theory. Journal of Complexity, 2002.

[69] Learning theory approach to minimum error entropy criterion. Journal of Machine Learning Research.

[70] Steinwart, Ingo and Scovel, Clint. Fast rates for support vector machines using …

[71] Jang, Phillip A and Loeb, Andrew and Davidow, Matthew and Wilson, Andrew G. Scalable …

[72] Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.

[73] Being accurate is not enough: how accuracy metrics have hurt recommender systems. CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[74] Concentration estimates for learning with unbounded sampling. Advances in Computational Mathematics, 2013.

[75] Positive semi-definite embedding for dimensionality reduction and out-of-sample extensions. arXiv preprint arXiv:1711.07271.

[76] Learning with varying insensitive loss. Applied Mathematics Letters.

[77] Support-vector networks. Machine Learning, 1995.

[78] Building support vector machines with reduced classifier complexity. Journal of Machine Learning Research.

[79] A divide-and-conquer solver for kernel support vector machines. International Conference on Machine Learning.

[80] A divide-and-conquer solver for kernel support vector machines. ICML.