International Conference on Learning Representations

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

cs.AI · 2026-05-20 · conditional · novelty 7.0

The DPO-RLHF equivalence holds only when the optimal policy prefers the human-preferred responses; otherwise DPO optimizes relative advantage and can prefer worse outputs, a failure mode the paper addresses by introducing CPO.

The Global Empirical NTK: Self-Referential Bias and Dimensionality of Gradient Descent Learning

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

The global empirical NTK for finite-width networks has a universal Kronecker-core form that makes it structurally low-rank and biases gradient descent toward dominant modes of joint input-hidden activity.

State-Space NTK Collapse Near Bifurcations

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Bifurcations cause sNTK to reduce to a dominant rank-one channel matching normal forms, collapsing effective rank and funneling gradient descent into critical dynamical directions.

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

cs.LG · 2024-01-02 · unverdicted · novelty 6.0

SPIN strengthens a weak LLM by having it generate training data from its own previous versions and training it to prefer human-annotated responses over its own outputs; on benchmarks it outperforms DPO even when DPO is given extra GPT-4 data.

citing papers explorer

Showing 4 of 4 citing papers.

Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

cs.AI · 2026-05-20 · conditional · none · ref 39

The Global Empirical NTK: Self-Referential Bias and Dimensionality of Gradient Descent Learning

cs.LG · 2026-05-09 · unverdicted · none · ref 39

State-Space NTK Collapse Near Bifurcations

cs.LG · 2026-05-12 · unverdicted · none · ref 39

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

cs.LG · 2024-01-02 · unverdicted · none · ref 156
