Exact, tractable Gauss-Newton optimization in deep reversible architectures reveal poor generalization. arXiv preprint arXiv:2411.07979

Davide Buffelli, Jamie McGowan, Wangkun Xu, Alexandru Cioba, Da-shan Shiu, Guillaume Hennequin, Alberto Bernacchia · arXiv 2411.07979

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime

cs.LG · 2026-01-06 · unverdicted · novelty 5.0

Preconditioned gradient descent mitigates spectral bias and reduces grokking delays by enabling uniform parameter space exploration in the NTK regime, confirming grokking as a transition to the rich regime.

citing papers explorer

Showing 1 of 1 citing paper.

On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime

cs.LG · 2026-01-06 · unverdicted · none · ref 3

Preconditioned gradient descent mitigates spectral bias and reduces grokking delays by enabling uniform parameter space exploration in the NTK regime, confirming grokking as a transition to the rich regime.
