pith.

arXiv preprint arXiv:2206.05794

4 Pith papers cite this work. Citation polarity classification is still being indexed.

fields: cs.LG (3), cs.CL (1)

years: 2026 (4)

representative citing papers

The Implicit Bias of Depth: From Neural Collapse to Softmax Codes

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Depth induces an implicit low-rank bias in deep unconstrained feature models trained with unregularized multiclass cross-entropy, promoting softmax codes over neural collapse via more efficient norm propagation.

Does Weight Decay Enhance Training Stability?

cs.LG · 2026-05-15 · conditional · novelty 6.0

Weight decay slows progressive sharpening at the edge of stability, inducing damped oscillations in CNNs and a phase transition to sub-2/η sharpness in MLPs driven by parameter-sharpness gradient alignment, yielding more stable NTK dynamics.

citing papers explorer

Showing 4 of 4 citing papers.

  • The Implicit Bias of Depth: From Neural Collapse to Softmax Codes cs.LG · 2026-05-21 · unverdicted · none · ref 137

    Depth induces an implicit low-rank bias in deep unconstrained feature models trained with unregularized multiclass cross-entropy, promoting softmax codes over neural collapse via more efficient norm propagation.

  • Evolutionary Search for Automated Design of Uncertainty Quantification Methods cs.CL · 2026-04-03 · unverdicted · none · ref 2

    LLM-driven evolutionary search discovers unsupervised UQ methods, expressed as Python programs, that improve ROC-AUC by up to 6.7% over manually designed baselines on atomic claim verification across 9 datasets and generalize out of distribution.

  • Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics cs.LG · 2026-05-19 · conditional · none · ref 10

    Weight decay controls distinct learning regimes in grokking transformers trained on modular arithmetic; these regimes are tracked by new, cheap attention-based diagnostics, with empirical fits of a critical value and an exponent.

  • Does Weight Decay Enhance Training Stability? cs.LG · 2026-05-15 · conditional · none · ref 8

    Weight decay slows progressive sharpening at the edge of stability, inducing damped oscillations in CNNs and a phase transition to sub-2/η sharpness in MLPs driven by parameter-sharpness gradient alignment, yielding more stable NTK dynamics.