arXiv preprint arXiv:1906.05890 , year=
9 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (9)
verdicts: UNVERDICTED (9)
representative citing papers
citing papers explorer
- The Implicit Bias of Depth: From Neural Collapse to Softmax Codes
  Depth induces an implicit low-rank bias in deep unconstrained feature models trained with unregularized multiclass cross-entropy, promoting softmax codes over neural collapse via more efficient norm propagation.
- Efficient Techniques for Data Reconstruction, with Finite-Width Recovery Guarantees
  A unified data reconstruction attack achieves provable finite-width recovery in random feature networks and efficient subspace-based reconstruction for general models using weight changes.
- Implicit Bias in Deep Linear Discriminant Analysis
  Gradient flow on deep diagonal linear LDA networks with balanced initialization converts additive updates to multiplicative updates, automatically conserving the (2/L) quasi-norm.
- Convergence of Continual Learning in Homogeneous Deep Networks
  Continual classification in homogeneous models is sequential projections onto margin sets, with local linear convergence under regularity properties for random and cyclic tasks, extended to regression.
- A Theory on Flow Matching with Neural Networks
  Establishes convergence guarantees for overparameterized 2-layer ReLU networks in flow matching, generalization bounds for the velocity-field objective, and Wasserstein guarantees for generated samples, using multi-task representation learning bounds.
- The Neural Tangent Kernel for Classification
  Wide neural networks with cross-entropy loss remain in the lazy training regime under parameter-space regularization or non-degenerate targets, allowing explicit NTK-based solution characterization and uncertainty analysis.
- The Effect of Mini-Batch Noise on the Implicit Bias of Adam
  Mini-batch noise reverses how Adam's β2 controls anti-regularization, making default momentum values suitable for small batches but requiring β1 closer to β2 for large batches to favor flatter minima.
- Prediction horizon shapes representations in predictive learning
  Longer prediction horizons in predictive learning interact with model biases to recover the latent geometry of the task.
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
  SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.