arXiv preprint arXiv:2201.04753 (2022)

3 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

stat.ML · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Two steps of gradient descent on the first-layer weights of linear-width two-layer networks produce a spiked random matrix with ⌊α₂/(1/2 − α₁)⌋ outliers, each corresponding to a learned direction; batch reuse allows the network to capture directions with information exponent greater than one.

Spectral Perturbation of the Empirical Fisher Information Matrix under Weight Quantization

stat.ML · 2026-06-26 · unverdicted · novelty 6.0

Derives Weyl-based perturbation bounds showing that, up to higher-order terms, quantization increases the dominant eigenvalue of the empirical Fisher information matrix (FIM), with supporting measurements on language models.

Bayesian Inference with Shaped Deep Non-linear MLPs

math.ST · 2026-05-29 · unverdicted · novelty 5.0

In the LP/N = Θ(1) regime, Bayesian predictive posteriors for deep MLPs equal those of data-dependent kernels to first order, with a criterion identifying data processes that benefit from larger effective depth.
