hub Canonical reference

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

· 2018 · cs.LG · arXiv 1803.03635

Canonical reference. 70% of citing Pith papers cite this work as background.

40 Pith papers citing it

Background 70% of classified citations

open full Pith review browse 40 citing papers arXiv PDF

abstract

Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 8 method 1 other 1

citation-polarity summary

background 7 unclear 2 use method 1

representative citing papers

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

cs.CL · 2023-05-12 · conditional · novelty 8.0

Tiny language models under 10M parameters trained on a synthetic children's story dataset generate fluent, consistent, multi-paragraph English text with near-perfect grammar and reasoning.

Progress measures for grokking via mechanistic interpretability

cs.LG · 2023-01-12 · accept · novelty 8.0

Grokking arises from gradual amplification of a Fourier-based circuit in the weights followed by removal of memorizing components.

You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector

cs.RO · 2026-03-16 · conditional · novelty 7.0

Optimizing a single constant initial noise vector for frozen generative robot policies improves success rates on 38 of 43 tasks by up to 58% relative improvement.

Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution

cs.LG · 2025-02-04 · unverdicted · novelty 7.0

Neurons exhibit concept-conditioned activation ranges forming Gaussian-like distributions with minimal overlap, and range-based interventions via NeuronLens outperform neuron-level masking in targeted manipulation with reduced collateral effects.

Universal Differential Equations for Scientific Machine Learning

cs.LG · 2020-01-13 · unverdicted · novelty 7.0

Universal Differential Equations unify scientific models with machine learning by embedding flexible approximators into differential equations, enabling applications from biological mechanism discovery to high-dimensional optimization.

Importance Estimation for Neural Network Pruning

cs.LG · 2019-06-25 · unverdicted · novelty 7.0

Taylor-expansion importance scoring enables layer-agnostic pruning of neural networks that outperforms prior methods on ImageNet accuracy-FLOPs trade-offs.

TENNOR: Trustworthy Execution for Neural Networks through Obliviousness and Retrievals

cs.CR · 2026-05-08 · unverdicted · novelty 7.0

TENNOR enables efficient private training of wide neural networks in TEEs by recasting sparsification as doubly oblivious LSH retrievals and introducing MP-WTA to cut hash table memory by 50x while preserving accuracy.

Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution

cs.PL · 2026-04-19 · unverdicted · novelty 7.0

A new partitioning algorithm that provably load-balances arbitrary sparse tensor algebra expressions by generalizing parallel merging to multi-operand, multi-dimensional hierarchical structures, implemented in a compiler framework.

Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain Alignment

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

Lesioning a shared core in multilingual LLMs drops whole-brain fMRI encoding correlation by 60.32%, while language-specific lesions selectively weaken predictions only for the matched native language.

Minimal Information Control Invariance via Vector Quantization

eess.SY · 2026-04-03 · unverdicted · novelty 7.0

A vector-quantized autoencoder learns minimal control codebooks for forward invariance in sampled-data control, achieving 157x reduction over grid baselines on a 12D quadrotor model.

A Geometric Analysis of Sign-Magnitude Asymmetry in a ReLU + RMSNorm Block under Ternary Quantization

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

Sign-flip perturbations produce π/(π-2) ≈ 2.75 times more transverse output energy than equal-norm sign-preserving perturbations in a ReLU + RMSNorm block because ReLU creates directional asymmetry that RMSNorm's transverse projection exposes.

Prediction horizon shapes representations in predictive learning

cs.LG · 2025-11-12 · unverdicted · novelty 6.0

Longer prediction horizons in predictive learning interact with model biases to recover the latent geometry of the task.

MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs

cs.LG · 2025-06-15 · unverdicted · novelty 6.0

MaskPro learns categorical distributions over groups of M weights to generate exact (N:M) sparsity via N-way sampling without replacement and stabilizes training with a moving average tracker of loss residuals.

Poisoning with A Pill: Circumventing Detection in Federated Learning

cs.LG · 2024-07-22 · unverdicted · novelty 6.0

A three-stage pill-based augmentation makes existing FL poisoning attacks evade popular defenses while raising error rates up to 7x on both IID and non-IID data.

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

cs.LG · 2023-10-19 · conditional · novelty 6.0

SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.

Sparse Autoencoders Find Highly Interpretable Features in Language Models

cs.LG · 2023-09-15 · unverdicted · novelty 6.0

Sparse autoencoders applied to language model activations yield more interpretable and monosemantic features than alternative approaches, enabling finer causal analysis on the indirect object identification task.

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

cs.CL · 2023-06-01 · conditional · novelty 6.0

AWQ quantizes LLM weights to low bits by scaling salient channels based on activation statistics, outperforming prior methods on language, coding, math, and multi-modal benchmarks.

Compact SO(3) Equivariant Atomistic Foundation Models via Structural Pruning

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Structural pruning of SO(3) equivariant atomistic models from large checkpoints yields 1.5-4x fewer parameters and 2.5-4x less pre-training compute than small models trained from scratch, while outperforming them on most Matbench Discovery metrics and downstream tasks.

XPERT: Expert Knowledge Transfer for Effective Training of Language Models

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

XPERT extracts and reuses cross-domain expert knowledge from pre-trained MoE LLMs via inference analysis and tensor decomposition to improve performance and convergence in downstream language model training.

Gaussians on a Diet: High-Quality Memory-Bounded 3D Gaussian Splatting Training

cs.CV · 2026-04-21 · conditional · novelty 6.0

A dynamic training framework for 3D Gaussian Splatting alternates incremental pruning and adaptive growing of primitives to maintain high rendering quality at up to 80% lower peak memory than standard 3DGS.

Training Deep Visual Networks Beyond Loss and Accuracy Through a Dynamical Systems Approach

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

Introduces integration, metastability, and dynamical stability index measures from layer activations and reports patterns distinguishing CIFAR-10 from CIFAR-100 difficulty plus early convergence signals across ResNet variants, DenseNet, MobileNetV2, VGG-16, and a Vision Transformer.

SubFLOT: Submodel Extraction for Efficient and Personalized Federated Learning via Optimal Transport

cs.LG · 2026-04-08 · unverdicted · novelty 6.0

SubFLOT uses optimal transport to generate data-aware personalized submodels via server-side pruning and scaling-based adaptive regularization to mitigate parametric divergence in heterogeneous federated learning.

SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models

cs.LG · 2026-04-06 · unverdicted · novelty 6.0

SLaB compresses LLM weights via sparse-lowrank-binary decomposition guided by activation-aware scores, achieving up to 36% lower perplexity than prior methods at 50% compression on Llama models.

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

cs.CL · 2020-06-30 · unverdicted · novelty 6.0

GShard supplies automatic sharding and conditional computation support that enabled training a 600-billion-parameter multilingual translation model on thousands of TPUs with superior quality.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution cs.LG · 2025-02-04 · unverdicted · none · ref 6 · internal anchor
Neurons exhibit concept-conditioned activation ranges forming Gaussian-like distributions with minimal overlap, and range-based interventions via NeuronLens outperform neuron-level masking in targeted manipulation with reduced collateral effects.
Prediction horizon shapes representations in predictive learning cs.LG · 2025-11-12 · unverdicted · none · ref 1 · internal anchor
Longer prediction horizons in predictive learning interact with model biases to recover the latent geometry of the task.
MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs cs.LG · 2025-06-15 · unverdicted · none · ref 11 · internal anchor
MaskPro learns categorical distributions over groups of M weights to generate exact (N:M) sparsity via N-way sampling without replacement and stabilizes training with a moving average tracker of loss residuals.

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer