Relu strikes back: Exploiting activation sparsity in large language models

Iman Mirzadeh, Keivan Alizadeh, Sachin Mehta, Carlo C Del Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar · 2023 · arXiv 2310.04564

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

representative citing papers

SiLIF: Structured State Space Model Dynamics and Parametrization for Spiking Neural Networks

cs.NE · 2025-06-04 · unverdicted · novelty 6.0

SiLIF models apply SSM dynamics and parametrization to spiking neurons for stable training, reaching new SOTA on event-based and raw-audio speech datasets while using half the compute of SSMs via synaptic delays.

Resting Neurons, Active Insights: Robustifying Activation Sparsity in LLMs via Spontaneity

cs.LG · 2025-12-14 · unverdicted · novelty 5.0 · 2 refs

SPON adds a small set of trainable input-independent activation vectors as representational anchors, trained by distribution matching, to stabilize sparse activation in LLMs and recover performance lost to hidden-state distribution shifts.

Motivating Next-Gen Accelerators with Flexible (N:M) Activation Sparsity via Benchmarking Lightweight Post-Training Sparsification Approaches

cs.LG · 2025-09-26 · unverdicted · novelty 5.0

Post-training N:M activation pruning preserves generative performance in LLMs better than equivalent weight pruning, with the 8:16 pattern emerging as a practical hardware-friendly choice.

citing papers explorer

Showing 3 of 3 citing papers.

SiLIF: Structured State Space Model Dynamics and Parametrization for Spiking Neural Networks cs.NE · 2025-06-04 · unverdicted · none · ref 3
SiLIF models apply SSM dynamics and parametrization to spiking neurons for stable training, reaching new SOTA on event-based and raw-audio speech datasets while using half the compute of SSMs via synaptic delays.
Resting Neurons, Active Insights: Robustifying Activation Sparsity in LLMs via Spontaneity cs.LG · 2025-12-14 · unverdicted · none · ref 15 · 2 links
SPON adds a small set of trainable input-independent activation vectors as representational anchors, trained by distribution matching, to stabilize sparse activation in LLMs and recover performance lost to hidden-state distribution shifts.
Motivating Next-Gen Accelerators with Flexible (N:M) Activation Sparsity via Benchmarking Lightweight Post-Training Sparsification Approaches cs.LG · 2025-09-26 · unverdicted · none · ref 17
Post-training N:M activation pruning preserves generative performance in LLMs better than equivalent weight pruning, with the 8:16 pattern emerging as a practical hardware-friendly choice.

Relu strikes back: Exploiting activation sparsity in large language models

fields

years

verdicts

representative citing papers

citing papers explorer