When does preconditioning help or hurt generalization? arXiv preprint arXiv:2006.10732

Amari, S · 2020 · arXiv 2006.10732

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

The Statistical Cost of Adaptation in Multi-Source Transfer Learning

math.ST · 2026-05-10 · unverdicted · novelty 8.0

Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension

cs.LG · 2025-02-07 · unverdicted · novelty 6.0

In ridgeless regression with low intrinsic dimension, discrepancy between weak and strong models reduces W2S generalization variance by dim(V_s)/N in the discrepant subspace while inheriting it in the overlap.

On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime

cs.LG · 2026-01-06 · unverdicted · novelty 5.0

Preconditioned gradient descent mitigates spectral bias and reduces grokking delays by enabling uniform parameter space exploration in the NTK regime, confirming grokking as a transition to the rich regime.

citing papers explorer

The Statistical Cost of Adaptation in Multi-Source Transfer Learning · math.ST · 2026-05-10 · unverdicted · none · ref 160
Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation · cs.LG · 2026-05-18 · unverdicted · none · ref 106
Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension · cs.LG · 2025-02-07 · unverdicted · none · ref 1
On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime · cs.LG · 2026-01-06 · unverdicted · none · ref 1
