Training your sparse neural network better with any mask

Ajay Kumar Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zhangyang Wang · 2022

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs

cs.LG · 2023-09-29 · unverdicted · novelty 5.0

Pruning small-magnitude weights from pre-trained LLMs causes monotonic irreversible performance degradation on difficult downstream tasks, supporting the Junk DNA Hypothesis that these weights hold essential knowledge.

citing papers explorer

Showing 1 of 1 citing paper.

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs cs.LG · 2023-09-29 · unverdicted · none · ref 21
Pruning small-magnitude weights from pre-trained LLMs causes monotonic irreversible performance degradation on difficult downstream tasks, supporting the Junk DNA Hypothesis that these weights hold essential knowledge.

Training your sparse neural network better with any mask

fields

years

verdicts

representative citing papers

citing papers explorer