TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

URL https://arxiv · 2023 · arXiv 2307.14995

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 2

citation-polarity summary

background 1 unclear 1

representative citing papers

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

cs.LG · 2024-05-31 · unverdicted · novelty 7.0

Transformers and SSMs are unified through structured state space duality, producing a 2-8X faster Mamba-2 model that remains competitive with Transformers.

Higher-order Linear Attention

cs.LG · 2025-10-31 · unverdicted · novelty 6.0

Higher-order Linear Attention realizes second-order and higher interactions in linear-time causal attention via constant-size state and associative scans.

Kimi Linear: An Expressive, Efficient Attention Architecture

cs.CL · 2025-10-30 · unverdicted · novelty 6.0

Kimi Linear hybridizes linear attention with a new KDA module to beat full attention on tasks while slashing KV cache by 75% and speeding decoding up to 6x.

Lizard: An Efficient Linearization Framework for Large Language Models

cs.CL · 2025-07-11 · unverdicted · novelty 6.0

Lizard linearizes Transformer LLMs via subquadratic attention and adaptive learnable modules, recovering near-original performance while outperforming prior linearization methods on MMLU and associative recall.

Gated Linear Attention Transformers with Hardware-Efficient Training

cs.LG · 2023-12-11 · unverdicted · novelty 6.0

Gated linear attention Transformers achieve competitive language modeling results with linear-time inference, superior length generalization, and higher training throughput than Mamba.

citing papers explorer

Showing 5 of 5 citing papers.

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality cs.LG · 2024-05-31 · unverdicted · none · ref 82
Transformers and SSMs are unified through structured state space duality, producing a 2-8X faster Mamba-2 model that remains competitive with Transformers.
Higher-order Linear Attention cs.LG · 2025-10-31 · unverdicted · none · ref 10
Higher-order Linear Attention realizes second-order and higher interactions in linear-time causal attention via constant-size state and associative scans.
Kimi Linear: An Expressive, Efficient Attention Architecture cs.CL · 2025-10-30 · unverdicted · none · ref 80
Kimi Linear hybridizes linear attention with a new KDA module to beat full attention on tasks while slashing KV cache by 75% and speeding decoding up to 6x.
Lizard: An Efficient Linearization Framework for Large Language Models cs.CL · 2025-07-11 · unverdicted · none · ref 17
Lizard linearizes Transformer LLMs via subquadratic attention and adaptive learnable modules, recovering near-original performance while outperforming prior linearization methods on MMLU and associative recall.
Gated Linear Attention Transformers with Hardware-Efficient Training cs.LG · 2023-12-11 · unverdicted · none · ref 72
Gated linear attention Transformers achieve competitive language modeling results with linear-time inference, superior length generalization, and higher training throughput than Mamba.

TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer