Gradient descent maximizes the margin of homogeneous neural networks

arXiv 1906.05890

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

The Implicit Bias of Depth: From Neural Collapse to Softmax Codes

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Depth induces an implicit low-rank bias in deep unconstrained feature models trained with unregularized multiclass cross-entropy, promoting softmax codes over neural collapse via more efficient norm propagation.

Efficient Techniques for Data Reconstruction, with Finite-Width Recovery Guarantees

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

A unified data reconstruction attack achieves provable finite-width recovery in random feature networks and efficient subspace-based reconstruction for general models using weight changes.

Implicit Bias in Deep Linear Discriminant Analysis

cs.LG · 2026-03-03 · unverdicted · novelty 7.0

Gradient flow on deep diagonal linear LDA networks with balanced initialization converts additive updates to multiplicative updates, automatically conserving the (2/L) quasi-norm.

The Effect of Mini-Batch Noise on the Implicit Bias of Adam

cs.LG · 2026-02-02 · unverdicted · novelty 6.0

Mini-batch noise reverses how Adam's β2 controls anti-regularization, making default momentum values suitable for small batches but requiring β1 closer to β2 for large batches to favor flatter minima.

Prediction horizon shapes representations in predictive learning

cs.LG · 2025-11-12 · unverdicted · novelty 6.0

Longer prediction horizons in predictive learning interact with model biases to recover the latent geometry of the task.

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

cs.LG · 2024-01-02 · unverdicted · novelty 6.0

SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.

The Neural Tangent Kernel for Classification

cs.LG · 2026-05-17

citing papers explorer

Showing 7 of 7 citing papers.

The Implicit Bias of Depth: From Neural Collapse to Softmax Codes · cs.LG · 2026-05-21 · unverdicted · none · ref 108
Efficient Techniques for Data Reconstruction, with Finite-Width Recovery Guarantees · cs.LG · 2026-05-07 · unverdicted · none · ref 4
Implicit Bias in Deep Linear Discriminant Analysis · cs.LG · 2026-03-03 · unverdicted · none · ref 11
The Effect of Mini-Batch Noise on the Implicit Bias of Adam · cs.LG · 2026-02-02 · unverdicted · none · ref 42
Prediction horizon shapes representations in predictive learning · cs.LG · 2025-11-12 · unverdicted · none · ref 5
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models · cs.LG · 2024-01-02 · unverdicted · none · ref 243
The Neural Tangent Kernel for Classification · cs.LG · 2026-05-17 · unreviewed · ref 4
