arXiv preprint arXiv:1906.05890 , year=

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks , author= · 1906 · arXiv 1906.05890

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

representative citing papers

The Implicit Bias of Depth: From Neural Collapse to Softmax Codes

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Depth induces an implicit low-rank bias in deep unconstrained feature models trained with unregularized multiclass cross-entropy, promoting softmax codes over neural collapse via more efficient norm propagation.

Efficient Techniques for Data Reconstruction, with Finite-Width Recovery Guarantees

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

A unified data reconstruction attack achieves provable finite-width recovery in random feature networks and efficient subspace-based reconstruction for general models using weight changes.

Implicit Bias in Deep Linear Discriminant Analysis

cs.LG · 2026-03-03 · unverdicted · novelty 7.0

Gradient flow on deep diagonal linear LDA networks with balanced initialization converts additive updates to multiplicative updates, automatically conserving the (2/L) quasi-norm.

Convergence of Continual Learning in Homogeneous Deep Networks

cs.LG · 2026-06-29 · unverdicted · novelty 6.0

Continual classification in homogeneous models is sequential projections onto margin sets, with local linear convergence under regularity properties for random and cyclic tasks, extended to regression.

A Theory on Flow Matching with Neural Networks

cs.LG · 2026-06-08 · unverdicted · novelty 6.0

Establishes convergence guarantees for overparameterized 2-layer ReLU networks in flow matching, generalization bounds for the velocity-field objective, and Wasserstein guarantees for generated samples, using multi-task representation learning bounds.

The Neural Tangent Kernel for Classification

cs.LG · 2026-05-17 · unverdicted · novelty 6.0 · 2 refs

Wide neural networks with cross-entropy loss remain in the lazy training regime under parameter-space regularization or non-degenerate targets, allowing explicit NTK-based solution characterization and uncertainty analysis.

The Effect of Mini-Batch Noise on the Implicit Bias of Adam

cs.LG · 2026-02-02 · unverdicted · novelty 6.0

Mini-batch noise reverses how Adam's β2 controls anti-regularization, making default momentum values suitable for small batches but requiring β1 closer to β2 for large batches to favor flatter minima.

Prediction horizon shapes representations in predictive learning

cs.LG · 2025-11-12 · unverdicted · novelty 6.0

Longer prediction horizons in predictive learning interact with model biases to recover the latent geometry of the task.

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

cs.LG · 2024-01-02 · unverdicted · novelty 6.0

SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.

citing papers explorer

Showing 9 of 9 citing papers after filters.

The Implicit Bias of Depth: From Neural Collapse to Softmax Codes cs.LG · 2026-05-21 · unverdicted · none · ref 108
Depth induces an implicit low-rank bias in deep unconstrained feature models trained with unregularized multiclass cross-entropy, promoting softmax codes over neural collapse via more efficient norm propagation.
Efficient Techniques for Data Reconstruction, with Finite-Width Recovery Guarantees cs.LG · 2026-05-07 · unverdicted · none · ref 4
A unified data reconstruction attack achieves provable finite-width recovery in random feature networks and efficient subspace-based reconstruction for general models using weight changes.
Implicit Bias in Deep Linear Discriminant Analysis cs.LG · 2026-03-03 · unverdicted · none · ref 11
Gradient flow on deep diagonal linear LDA networks with balanced initialization converts additive updates to multiplicative updates, automatically conserving the (2/L) quasi-norm.
Convergence of Continual Learning in Homogeneous Deep Networks cs.LG · 2026-06-29 · unverdicted · none · ref 3
Continual classification in homogeneous models is sequential projections onto margin sets, with local linear convergence under regularity properties for random and cyclic tasks, extended to regression.
A Theory on Flow Matching with Neural Networks cs.LG · 2026-06-08 · unverdicted · none · ref 262
Establishes convergence guarantees for overparameterized 2-layer ReLU networks in flow matching, generalization bounds for the velocity-field objective, and Wasserstein guarantees for generated samples, using multi-task representation learning bounds.
The Neural Tangent Kernel for Classification cs.LG · 2026-05-17 · unverdicted · none · ref 4 · 2 links
Wide neural networks with cross-entropy loss remain in the lazy training regime under parameter-space regularization or non-degenerate targets, allowing explicit NTK-based solution characterization and uncertainty analysis.
The Effect of Mini-Batch Noise on the Implicit Bias of Adam cs.LG · 2026-02-02 · unverdicted · none · ref 42
Mini-batch noise reverses how Adam's β2 controls anti-regularization, making default momentum values suitable for small batches but requiring β1 closer to β2 for large batches to favor flatter minima.
Prediction horizon shapes representations in predictive learning cs.LG · 2025-11-12 · unverdicted · none · ref 5
Longer prediction horizons in predictive learning interact with model biases to recover the latent geometry of the task.
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models cs.LG · 2024-01-02 · unverdicted · none · ref 243
SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.

arXiv preprint arXiv:1906.05890 , year=

fields

years

verdicts

representative citing papers

citing papers explorer