Proceedings of the Forty-Third International Conference on Machine Learning

2026 · arXiv 2603.01968

2 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

A Stochastic–Geometric Theory of Scaling Laws in Grokking

stat.ML · 2026-06-29 · unverdicted · novelty 6.0

A stochastic-geometric model of solution-space topology under Adam derives explicit scaling laws for grokking transition time as a function of learning rate, batch size, and L2 coefficient.

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

Larger models succeed on rare and complex tasks by reducing gradient interference from common tasks, allowing rare-task features to accumulate, as shown via synthetic task mixtures and OLMo pretraining from 4M to 4B parameters.

citing papers explorer

A Stochastic–Geometric Theory of Scaling Laws in Grokking · stat.ML · 2026-06-29 · unverdicted · none · ref 27
Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention · cs.LG · 2026-05-28 · unverdicted · none · ref 65
