Advances in Neural Information Processing Systems, year=

Neural Tangent Kernel: Convergence and Generalization in Neural Networks, author=

3 Pith papers cite this work.

representative citing papers

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

stat.ML · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Two steps of gradient descent on the first-layer weights of linear-width two-layer networks produce a spiked random matrix with ⌊α₂/(1/2 − α₁)⌋ outliers, each corresponding to a learned direction; batch reuse further allows the network to capture directions with information exponent greater than one.
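As a small arithmetic illustration of the outlier count floor(alpha2/(1/2-alpha1)), the sketch below simply evaluates that expression. It assumes alpha1 < 1/2 (so the denominator is positive) and alpha2 >= 0, as the formula suggests; what alpha1 and alpha2 measure is defined in the cited paper, not here, and the function name `num_outliers` is hypothetical.

```python
from fractions import Fraction
import math


def num_outliers(alpha1, alpha2):
    """Evaluate floor(alpha2 / (1/2 - alpha1)) exactly.

    Hypothetical helper for illustration only. Assumes alpha1 < 1/2 and
    alpha2 >= 0; inputs may be ints, exact floats, or strings like "1/4".
    """
    a1 = Fraction(alpha1)
    a2 = Fraction(alpha2)
    gap = Fraction(1, 2) - a1  # the denominator 1/2 - alpha1
    if gap <= 0:
        raise ValueError("expected alpha1 < 1/2")
    if a2 < 0:
        raise ValueError("expected alpha2 >= 0")
    # math.floor on a Fraction returns an int, with no rounding error
    return math.floor(a2 / gap)


# Example: alpha1 = 1/4, alpha2 = 0.6 gives floor(0.6 / 0.25) = floor(2.4)
print(num_outliers("1/4", "0.6"))
```

Exact rationals are used instead of floats because the count jumps at integer values of the ratio, where floating-point rounding could push the result across a boundary.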

Unveiling High-Probability Generalization in Decentralized SGD

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

High-probability generalization bounds for D-SGD are derived at the optimal rate O(log(1/δ)/√(mn)) via pointwise uniform stability, in both convex and non-convex settings.

Rethinking the Rank Threshold for LoRA Fine-Tuning

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

For binary classification in the NTK regime, LoRA rank r = 1 suffices and is often optimal under cross-entropy loss, reducing the previously known sufficient condition from r ≥ 12 to r = 1.

citing papers explorer

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent stat.ML · 2026-05-18 · unverdicted · none · ref 159 · 2 links
Unveiling High-Probability Generalization in Decentralized SGD cs.LG · 2026-05-11 · unverdicted · none · ref 120
Rethinking the Rank Threshold for LoRA Fine-Tuning cs.LG · 2026-05-05 · unverdicted · none · ref 5
