ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models

Iman Mirzadeh, Keivan Alizadeh, Sachin Mehta, Carlo C Del Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar · 2023 · arXiv 2310.04564

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models

cs.LG · 2026-05-26 · unverdicted · novelty 6.0

RT-Lynx shifts DiT sparsity from weights to activations and reports up to a 1.55x linear-layer speedup while preserving generation quality across multiple diffusion models.

SiLIF: Structured State Space Model Dynamics and Parametrization for Spiking Neural Networks

cs.NE · 2025-06-04 · unverdicted · novelty 6.0

SiLIF models apply SSM dynamics and parametrization to spiking neurons for stable training, reaching new SOTA on event-based and raw-audio speech datasets while using half the compute of SSMs via synaptic delays.

Resting Neurons, Active Insights: Robustifying Activation Sparsity in LLMs via Spontaneity

cs.LG · 2025-12-14 · unverdicted · novelty 5.0 · 2 refs

SPON adds a small set of trainable input-independent activation vectors as representational anchors, trained by distribution matching, to stabilize sparse activation in LLMs and recover performance lost to hidden-state distribution shifts.

Motivating Next-Gen Accelerators with Flexible (N:M) Activation Sparsity via Benchmarking Lightweight Post-Training Sparsification Approaches

cs.LG · 2025-09-26 · unverdicted · novelty 5.0

Post-training N:M activation pruning preserves generative performance in LLMs better than equivalent weight pruning, with the 8:16 pattern emerging as a practical hardware-friendly choice.

PowLU: An Activation Function for Stable Pre-Training of LLMs

cs.CL · 2026-05-25 · unverdicted · novelty 4.0

PowLU replaces SwiGLU with a rational-power activation to reduce outlier amplification and numerical instability during large-scale LLM pre-training while matching performance.

citing papers explorer

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models · cs.LG · 2026-05-26 · unverdicted · none · ref 44
PowLU: An Activation Function for Stable Pre-Training of LLMs · cs.CL · 2026-05-25 · unverdicted · none · ref 14
