Efficient transformers: A survey.ACM Computing Surveys, 55(6):1–28

Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler · 2022 · DOI 10.1145/3530811

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

open at publisher browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

cs.CL · 2023-08-28 · unverdicted · novelty 8.0

LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).

From Sparsity to Simplicity: Enabling Simpler Sequential Replacements via Sparse Attention Distillation

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

Sparsity-guided distillation enables replacing attention layers in ViTs with simpler sequential modules, with sparser layers showing smaller performance drops.

Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers

cs.LG · 2026-05-09 · unverdicted · novelty 3.0

This survey organizes LLM optimizer literature into categories and argues the field is shifting toward rigorous, multi-factor comparisons of convergence, memory, stability, and complexity.

Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations

cs.AR · 2026-05-12

citing papers explorer

Showing 4 of 4 citing papers.

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding cs.CL · 2023-08-28 · unverdicted · none · ref 121
LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).
From Sparsity to Simplicity: Enabling Simpler Sequential Replacements via Sparse Attention Distillation cs.LG · 2026-05-15 · unverdicted · none · ref 23
Sparsity-guided distillation enables replacing attention layers in ViTs with simpler sequential modules, with sparser layers showing smaller performance drops.
Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers cs.LG · 2026-05-09 · unverdicted · none · ref 42
This survey organizes LLM optimizer literature into categories and argues the field is shifting toward rigorous, multi-factor comparisons of convergence, memory, stability, and complexity.
Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations cs.AR · 2026-05-12 · unreviewed · ref 13

Efficient transformers: A survey.ACM Computing Surveys, 55(6):1–28

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer