On large-batch training for deep learning: Generalization gap and sharp minima

Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tak Peter Tang · 2017

4 Pith papers cite this work.

Citation roles: background (1)

Citation polarity: background (1)

Representative citing papers

A Ridge Too Far: Correcting Over-Shrinkage via Negative Regularization

cs.LG · 2025-08-24 · unverdicted · novelty 6.0 · ref 61

Negative-capable ridge regression uses controlled negative regularization as anti-shrinkage to increase effective complexity along weak eigendirections and mitigate underfitting in small-data regression.

Causal-Aware Foundation-Model for Bilevel Optimization in Discrete Choice Settings

cs.LG · 2026-05-07 · unverdicted · novelty 4.0 · ref 25

C3PO is a foundation model for bilevel pricing optimization that trains on simulated discrete choice data and retrieves elasticity priors from literature to improve revenue KPIs under business constraints.

Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes

cs.LG · 2026-05-07 · unreviewed · ref 38 · 2 refs

Depth, Not Data: An Analysis of Hessian Spectral Bifurcation

cs.LG · 2026-01-31 · unreviewed · ref 5
