Root mean square layer normalization

Biao Zhang, Rico Sennrich · 2019

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

cs.LG · 2024-07-05 · conditional · novelty 8.0

TTT layers treat the hidden state as a trainable model updated at test time, allowing linear-complexity sequence models to scale perplexity reduction with context length unlike Mamba.

CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model

cs.LG · 2025-06-10 · unverdicted · novelty 6.0

CodeBrain introduces a decoupled TFDual-Tokenizer and multi-scale EEGSSM architecture for an EEG foundation model pretrained on a large corpus, claiming strong generalization across eight downstream tasks and ten datasets.

Prune, Update and Trim: Robust Structured Pruning for Large Language Models

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

Putri is a structured pruning technique for LLMs that compensates for pruning errors via weight updates and sequential processing while pruning at the attention-head level to reach state-of-the-art results at extreme sparsity.

citing papers explorer

Showing 3 of 3 citing papers.

Learning to (Learn at Test Time): RNNs with Expressive Hidden States cs.LG · 2024-07-05 · conditional · none · ref 84
TTT layers treat the hidden state as a trainable model updated at test time, allowing linear-complexity sequence models to scale perplexity reduction with context length unlike Mamba.
CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model cs.LG · 2025-06-10 · unverdicted · none · ref 65
CodeBrain introduces a decoupled TFDual-Tokenizer and multi-scale EEGSSM architecture for an EEG foundation model pretrained on a large corpus, claiming strong generalization across eight downstream tasks and ten datasets.
Prune, Update and Trim: Robust Structured Pruning for Large Language Models cs.LG · 2026-05-18 · unverdicted · none · ref 34
Putri is a structured pruning technique for LLMs that compensates for pruning errors via weight updates and sequential processing while pruning at the attention-head level to reach state-of-the-art results at extreme sparsity.

Root mean square layer normalization

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer