Annual Meeting of the Association for Computational Linguistics

Satwik Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom · arXiv 2211.12316

2 Pith papers cite this work.

representative citing papers

The two clocks and the innovation window: When and how generative models learn rules

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.

On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime

cs.LG · 2026-01-06 · unverdicted · novelty 5.0

Preconditioned gradient descent mitigates spectral bias and reduces grokking delays by enabling uniform parameter space exploration in the NTK regime, confirming grokking as a transition to the rich regime.

