RT-Lynx shifts DiT sparsity from weights to activations, reports up to 1.55x linear-layer speedup while preserving generation quality across multiple diffusion models.
Relu strikes back: Exploiting activation sparsity in large language models
5 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 5representative citing papers
SiLIF models apply SSM dynamics and parametrization to spiking neurons for stable training, reaching new SOTA on event-based and raw-audio speech datasets while using half the compute of SSMs via synaptic delays.
SPON adds a small set of trainable input-independent activation vectors as representational anchors, trained by distribution matching, to stabilize sparse activation in LLMs and recover performance lost to hidden-state distribution shifts.
Post-training N:M activation pruning preserves generative performance in LLMs better than equivalent weight pruning, with the 8:16 pattern emerging as a practical hardware-friendly choice.
PowLU replaces SwiGLU with a rational-power activation to reduce outlier amplification and numerical instability during large-scale LLM pre-training while matching performance.
citing papers explorer
-
SiLIF: Structured State Space Model Dynamics and Parametrization for Spiking Neural Networks
SiLIF models apply SSM dynamics and parametrization to spiking neurons for stable training, reaching new SOTA on event-based and raw-audio speech datasets while using half the compute of SSMs via synaptic delays.
-
Resting Neurons, Active Insights: Robustifying Activation Sparsity in LLMs via Spontaneity
SPON adds a small set of trainable input-independent activation vectors as representational anchors, trained by distribution matching, to stabilize sparse activation in LLMs and recover performance lost to hidden-state distribution shifts.
-
Motivating Next-Gen Accelerators with Flexible (N:M) Activation Sparsity via Benchmarking Lightweight Post-Training Sparsification Approaches
Post-training N:M activation pruning preserves generative performance in LLMs better than equivalent weight pruning, with the 8:16 pattern emerging as a practical hardware-friendly choice.