Title resolution pending

2025 · arXiv 2510.22026

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Stability and Generalization in Looped Transformers

cs.LG · 2026-04-16 · unverdicted · novelty 8.0

Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.

Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime

math.AP · 2026-05-11 · unverdicted · novelty 6.0

In the low-temperature regime, the token distribution in mean-field transformers concentrates onto the push-forward under a key-query-value projection with Wasserstein distance scaling as √(log(β+1)/β) exp(Ct) + exp(-ct).

When Does Removing LayerNorm Help? Activation Bounding as a Regime-Dependent Implicit Regularizer

cs.LG · 2026-04-25 · unverdicted · novelty 5.0

DyT improves validation loss 27% at 64M params/1M tokens but worsens it 19% at 118M tokens, with saturation levels predicting the sign of the effect.

Polaris: Coupled Orbital Polar Embeddings for Hierarchical Concept Learning

cs.LG · 2026-04-30

citing papers explorer

Showing 4 of 4 citing papers.

Stability and Generalization in Looped Transformers · cs.LG · 2026-04-16 · unverdicted · none · ref 13
Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime · math.AP · 2026-05-11 · unverdicted · none · ref 31
When Does Removing LayerNorm Help? Activation Bounding as a Regime-Dependent Implicit Regularizer · cs.LG · 2026-04-25 · unverdicted · none · ref 13
Polaris: Coupled Orbital Polar Embeddings for Hierarchical Concept Learning · cs.LG · 2026-04-30 · unreviewed · ref 64
