Root Mean Square Layer Normalization

Zhang, Biao; Sennrich, Rico

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

The Recurrent Transformer: Greater Effective Depth and Efficient Decoding

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

Recurrent Transformers add per-layer recurrent memory via self-attention on own activations plus a tiling algorithm that reduces training memory traffic, yielding better C4 pretraining cross-entropy than parameter-matched standard transformers with fewer layers.

A geometric relation of the error introduced by sampling a language model's output distribution to its internal state

cs.LG · 2026-05-06 · unverdicted · novelty 5.0 · 2 refs

A geometric 1-form on token embeddings has curvature that couples to semantic world models in language models, as evidenced by clustering on chess board regions and piece importance.

How Language Models Process Negation

cs.CL · 2026-05-04

citing papers explorer

Showing 3 of 3 citing papers.

The Recurrent Transformer: Greater Effective Depth and Efficient Decoding cs.LG · 2026-04-23 · unverdicted · none · ref 46
A geometric relation of the error introduced by sampling a language model's output distribution to its internal state cs.LG · 2026-05-06 · unverdicted · none · ref 7 · 2 links
How Language Models Process Negation cs.CL · 2026-05-04 · unreviewed · ref 32
