7 Pith papers cite this work.
Years: 2026 (7)
Representative citing papers
citing papers explorer
- The Implicit Bias of Depth: From Neural Collapse to Softmax Codes
  Depth induces an implicit low-rank bias in deep unconstrained feature models trained with unregularized multiclass cross-entropy, promoting softmax codes over neural collapse via more efficient norm propagation.
- Generative Transfer for Entropic Optimal Transport with Unknown Costs
  A generative transfer framework using iterative path-wise tilting integrated with conditional flow matching recovers target entropic optimal transport couplings from reference samples, achieving O(δ) convergence in Wasserstein-1 distance.
- The Global Empirical NTK: Self-Referential Bias and Dimensionality of Gradient Descent Learning
  The global empirical NTK for finite-width networks has a universal Kronecker-core form that makes it structurally low-rank and biases gradient descent toward dominant modes of joint input-hidden activity.
- State-Space NTK Collapse Near Bifurcations
  Bifurcations cause sNTK to reduce to a dominant rank-one channel matching normal forms, collapsing effective rank and funneling gradient descent into critical dynamical directions.
- Towards E-Value Based Stopping Rules for Bayesian Deep Ensembles
  E-value sequential tests enable early stopping of MCMC sampling in Bayesian deep ensembles, often needing only a fraction of the full budget while improving over standard deep ensembles.
- A Unified Approach for Computing Wasserstein Barycenters of Discrete and Continuous Measures
  A mirror descent algorithm computes exact Wasserstein barycenters for mixed discrete and continuous input measures with convergence guarantees.
- PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting