Context gating in associative memories boosts inter-memory separation and sparsity for exponential retrieval gains, admits a unique fixed point driven by direct bias and feedback, and matches in-context learning dynamics in transformers like Llama-3.
3 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
Representative citing papers
Uniform-based discrete diffusion models behave as associative memories that retrieve unseen data, with a dataset-size-driven memorization-to-generalization transition detectable via conditional entropy of token predictions.
Ordinary least squares is a special case of the single-layer linear transformer when attention parameters are set via spectral decomposition of the empirical covariance matrix.
Context-Gated Associative Retrieval: From Theory to Transformers