arXiv preprint arXiv:1906.05890

9 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

The Implicit Bias of Depth: From Neural Collapse to Softmax Codes

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Depth induces an implicit low-rank bias in deep unconstrained feature models trained with unregularized multiclass cross-entropy, promoting softmax codes over neural collapse via more efficient norm propagation.

Implicit Bias in Deep Linear Discriminant Analysis

cs.LG · 2026-03-03 · unverdicted · novelty 7.0

Gradient flow on deep diagonal linear LDA networks with balanced initialization converts additive updates to multiplicative updates, automatically conserving the (2/L) quasi-norm.

Convergence of Continual Learning in Homogeneous Deep Networks

cs.LG · 2026-06-29 · unverdicted · novelty 6.0

Continual classification in homogeneous models is sequential projections onto margin sets, with local linear convergence under regularity properties for random and cyclic tasks, extended to regression.

A Theory on Flow Matching with Neural Networks

cs.LG · 2026-06-08 · unverdicted · novelty 6.0

Establishes convergence guarantees for overparameterized 2-layer ReLU networks in flow matching, generalization bounds for the velocity-field objective, and Wasserstein guarantees for generated samples, using multi-task representation learning bounds.

The Neural Tangent Kernel for Classification

cs.LG · 2026-05-17 · unverdicted · novelty 6.0 · 2 refs

Wide neural networks with cross-entropy loss remain in the lazy training regime under parameter-space regularization or non-degenerate targets, allowing explicit NTK-based solution characterization and uncertainty analysis.

The Effect of Mini-Batch Noise on the Implicit Bias of Adam

cs.LG · 2026-02-02 · unverdicted · novelty 6.0

Mini-batch noise reverses how Adam's β2 controls anti-regularization, making default momentum values suitable for small batches but requiring β1 closer to β2 for large batches to favor flatter minima.

citing papers explorer

Showing 9 of 9 citing papers after filters.

  • The Implicit Bias of Depth: From Neural Collapse to Softmax Codes cs.LG · 2026-05-21 · unverdicted · none · ref 108

    Depth induces an implicit low-rank bias in deep unconstrained feature models trained with unregularized multiclass cross-entropy, promoting softmax codes over neural collapse via more efficient norm propagation.

  • Efficient Techniques for Data Reconstruction, with Finite-Width Recovery Guarantees cs.LG · 2026-05-07 · unverdicted · none · ref 4

    A unified data reconstruction attack achieves provable finite-width recovery in random feature networks and efficient subspace-based reconstruction for general models using weight changes.

  • Implicit Bias in Deep Linear Discriminant Analysis cs.LG · 2026-03-03 · unverdicted · none · ref 11

    Gradient flow on deep diagonal linear LDA networks with balanced initialization converts additive updates to multiplicative updates, automatically conserving the (2/L) quasi-norm.

  • Convergence of Continual Learning in Homogeneous Deep Networks cs.LG · 2026-06-29 · unverdicted · none · ref 3

    Continual classification in homogeneous models is sequential projections onto margin sets, with local linear convergence under regularity properties for random and cyclic tasks, extended to regression.

  • A Theory on Flow Matching with Neural Networks cs.LG · 2026-06-08 · unverdicted · none · ref 262

    Establishes convergence guarantees for overparameterized 2-layer ReLU networks in flow matching, generalization bounds for the velocity-field objective, and Wasserstein guarantees for generated samples, using multi-task representation learning bounds.

  • The Neural Tangent Kernel for Classification cs.LG · 2026-05-17 · unverdicted · none · ref 4 · 2 links

    Wide neural networks with cross-entropy loss remain in the lazy training regime under parameter-space regularization or non-degenerate targets, allowing explicit NTK-based solution characterization and uncertainty analysis.

  • The Effect of Mini-Batch Noise on the Implicit Bias of Adam cs.LG · 2026-02-02 · unverdicted · none · ref 42

    Mini-batch noise reverses how Adam's β2 controls anti-regularization, making default momentum values suitable for small batches but requiring β1 closer to β2 for large batches to favor flatter minima.

  • Prediction horizon shapes representations in predictive learning cs.LG · 2025-11-12 · unverdicted · none · ref 5

    Longer prediction horizons in predictive learning interact with model biases to recover the latent geometry of the task.

  • Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models cs.LG · 2024-01-02 · unverdicted · none · ref 243

    SPIN lets a weak LLM become strong by generating its own training data from previous model versions and training to prefer human-annotated responses over its own outputs; on benchmarks it outperforms models trained with DPO supplemented by extra GPT-4 preference data.