Depth induces an implicit low-rank bias in deep unconstrained feature models trained with unregularized multiclass cross-entropy, promoting softmax codes over neural collapse via more efficient norm propagation.
arXiv preprint arXiv:1906.05890
9 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (9)
verdicts: UNVERDICTED (9)
representative citing papers
A unified data reconstruction attack achieves provable finite-width recovery in random feature networks and efficient subspace-based reconstruction for general models using weight changes.
Gradient flow on deep diagonal linear LDA networks with balanced initialization converts additive updates to multiplicative updates, automatically conserving the (2/L) quasi-norm.
Continual classification in homogeneous models is sequential projections onto margin sets, with local linear convergence under regularity properties for random and cyclic tasks, extended to regression.
Establishes convergence guarantees for overparameterized 2-layer ReLU networks in flow matching, generalization bounds for the velocity-field objective, and Wasserstein guarantees for generated samples, using multi-task representation learning bounds.
Wide neural networks with cross-entropy loss remain in the lazy training regime under parameter-space regularization or non-degenerate targets, allowing explicit NTK-based solution characterization and uncertainty analysis.
Mini-batch noise reverses how Adam's β2 controls anti-regularization, making default momentum values suitable for small batches but requiring β1 closer to β2 for large batches to favor flatter minima.
Longer prediction horizons in predictive learning interact with model biases to recover the latent geometry of the task.
SPIN lets a weak LLM become strong by generating training data from its own previous versions and training to prefer human-annotated responses over its own outputs, outperforming on benchmarks even DPO trained with extra GPT-4 data.