Accelerating transformer pre-training with 2:4 sparsity. arXiv preprint arXiv:2404.01847, 2024.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG
Years: 2026
Verdicts: UNVERDICTED

Representative citing paper:
ELAS: Efficient Pre-Training of Low-Rank Large Language Models via 2:4 Activation Sparsity
ELAS pre-trains low-rank LLMs by applying 2:4 activation sparsity after squared ReLU to cut memory and accelerate training with minimal performance loss.
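As a rough illustration of the mechanism named in the summary, the sketch below (not the ELAS authors' code) applies a squared-ReLU activation and then enforces 2:4 structured sparsity by keeping the two largest-magnitude values in every group of four along the last dimension. Function names such as `squared_relu` and `two_four_sparsify` are assumptions for this example; the actual training speedups reported by such methods depend on hardware 2:4 sparse kernels, which this plain PyTorch sketch does not use.

```python
# Minimal sketch of 2:4 activation sparsity after squared ReLU (assumed names,
# not the ELAS implementation). Requires PyTorch.
import torch


def squared_relu(x: torch.Tensor) -> torch.Tensor:
    """Squared ReLU activation: relu(x) ** 2."""
    return torch.relu(x) ** 2


def two_four_sparsify(x: torch.Tensor) -> torch.Tensor:
    """Zero the 2 smallest-magnitude entries in each contiguous group of 4
    along the last dimension (2:4 structured sparsity)."""
    *lead, d = x.shape
    assert d % 4 == 0, "last dimension must be divisible by 4"
    groups = x.reshape(*lead, d // 4, 4)
    # Indices of the top-2 magnitudes within each group of 4.
    topk = groups.abs().topk(k=2, dim=-1).indices
    mask = torch.zeros_like(groups).scatter_(-1, topk, 1.0)
    return (groups * mask).reshape(*lead, d)


if __name__ == "__main__":
    h = torch.randn(2, 8)                      # toy pre-activations
    sparse_act = two_four_sparsify(squared_relu(h))
    print(sparse_act)                          # at most 2 nonzeros per group of 4
```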