Advances in Neural Information Processing Systems, volume=
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
years: 2026 (6)
citing papers explorer
- Fixed-order PCA: Theory for Overestimated Factor Models
  Establishes asymptotic consistency of factor estimates and √T-normality in factor-augmented regressions for fixed R ≥ r using anisotropic local laws from random matrix theory.
- Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model
  A solvable hierarchical model with power-law feature strengths yields explicit power-law scaling of prediction error through sequential recovery of latent directions by a layer-wise spectral algorithm.
- Large Dimensional Kernel Ridge Regression: Extending to Product Kernels
  Extends high-dimensional KRR to product kernels, proving convergence rates that recover minimax optimality for source condition s ≤ 1, saturation for s > 1, and multiple-descent phenomena with respect to sample size n.
- Wahkon: A Statistically Principled Deep RKHS Superposition Network
  Wahkon unifies Kolmogorov superposition with RKHS regularization to produce a deep network whose penalized estimator is exactly the MAP under a hierarchical GP prior and achieves minimax-optimal rates.
- AdamO: A Collapse-Suppressed Optimizer for Offline RL
  AdamO modifies Adam with an orthogonality correction to ensure the spectral radius of the TD update operator stays below one, providing a theoretical stability guarantee for offline RL.
- There Will Be a Scientific Theory of Deep Learning
  A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.