Accelerating transformer pre-training with 2:4 sparsity. arXiv preprint arXiv:2404.01847, 2024.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG
Years: 2026
Verdicts: UNVERDICTED

Representative citing paper:
ELAS: Efficient Pre-Training of Low-Rank Large Language Models via 2:4 Activation Sparsity
ELAS pre-trains low-rank LLMs by applying 2:4 activation sparsity after squared ReLU to cut memory and accelerate training with minimal performance loss.
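As a rough illustration of the mechanism named in the summary, the sketch below (not the ELAS authors' code) applies a squared-ReLU activation and then enforces 2:4 structured sparsity by keeping the two largest-magnitude values in every group of four along the last dimension. Function names such as `squared_relu` and `two_four_sparsify` are assumptions for this example; the actual training speedups reported by such methods depend on hardware 2:4 sparse kernels, which this plain PyTorch sketch does not use.

```python
# Minimal sketch of 2:4 activation sparsity after squared ReLU (assumed names,
# not the ELAS implementation). Requires PyTorch.
import torch


def squared_relu(x: torch.Tensor) -> torch.Tensor:
    """Squared ReLU activation: relu(x) ** 2."""
    return torch.relu(x) ** 2


def two_four_sparsify(x: torch.Tensor) -> torch.Tensor:
    """Zero the 2 smallest-magnitude entries in each contiguous group of 4
    along the last dimension (2:4 structured sparsity)."""
    *lead, d = x.shape
    assert d % 4 == 0, "last dimension must be divisible by 4"
    groups = x.reshape(*lead, d // 4, 4)
    # Indices of the top-2 magnitudes within each group of 4.
    topk = groups.abs().topk(k=2, dim=-1).indices
    mask = torch.zeros_like(groups).scatter_(-1, topk, 1.0)
    return (groups * mask).reshape(*lead, d)


if __name__ == "__main__":
    h = torch.randn(2, 8)                      # toy pre-activations
    sparse_act = two_four_sparsify(squared_relu(h))
    print(sparse_act)                          # at most 2 nonzeros per group of 4
```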