Deep double descent: Where bigger models and more data hurt. Journal of Statistical Mechanics: Theory and Experiment, 2021(12):124003

Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever · 2021

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

cs.LG · 2025-04-29 · accept · novelty 7.0

One training example via RLVR boosts LLM math reasoning from 17.6% to 35.7% average across six benchmarks.

Transformers for dynamical systems learn transfer operators in-context

cs.LG · 2026-02-21 · unverdicted · novelty 6.0

Small transformers learn to forecast unseen dynamical systems in-context by using delay embeddings to recover the manifold and forecasting its invariant sets via a transfer-operator strategy.

$\boldsymbol{\lambda}$-Orthogonality Regularization for Compatible Representation Learning

cs.LG · 2025-09-20 · conditional · novelty 6.0

λ-Orthogonality regularization enables distribution-specific adaptation of representations via affine transformations while retaining original learned structures.

A Ridge Too Far: Correcting Over-Shrinkage via Negative Regularization

cs.LG · 2025-08-24 · unverdicted · novelty 6.0

Negative-capable ridge regression uses controlled negative regularization as anti-shrinkage to increase effective complexity along weak eigendirections and mitigate underfitting in small-data regression.

Unveiling Memorization-Generalization Coexistence: A Case Study on Arithmetic Tasks with Label Noise

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

Experiments on modular arithmetic with heavy label noise show that over-parameterized networks form a distributed internal generalization structure that can be extracted via frequency methods to achieve high accuracy despite 80% noise.

citing papers explorer

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

cs.LG · 2025-04-29 · accept · none · ref 65

One training example via RLVR boosts LLM math reasoning from 17.6% to 35.7% average across six benchmarks.

Transformers for dynamical systems learn transfer operators in-context

cs.LG · 2026-02-21 · unverdicted · none · ref 48

Small transformers learn to forecast unseen dynamical systems in-context by using delay embeddings to recover the manifold and forecasting its invariant sets via a transfer-operator strategy.

$\boldsymbol{\lambda}$-Orthogonality Regularization for Compatible Representation Learning

cs.LG · 2025-09-20 · conditional · none · ref 69

λ-Orthogonality regularization enables distribution-specific adaptation of representations via affine transformations while retaining original learned structures.

A Ridge Too Far: Correcting Over-Shrinkage via Negative Regularization

cs.LG · 2025-08-24 · unverdicted · none · ref 20

Negative-capable ridge regression uses controlled negative regularization as anti-shrinkage to increase effective complexity along weak eigendirections and mitigate underfitting in small-data regression.

Unveiling Memorization-Generalization Coexistence: A Case Study on Arithmetic Tasks with Label Noise

cs.LG · 2026-05-18 · unverdicted · none · ref 15

Experiments on modular arithmetic with heavy label noise show that over-parameterized networks form a distributed internal generalization structure that can be extracted via frequency methods to achieve high accuracy despite 80% noise.
