Dissecting Pruned Neural Networks

David Bau; Jonathan Frankle

arxiv: 1907.00262 · v1 · pith:RYQMQXQOnew · submitted 2019-06-29 · cs.LG · cs.CV · cs.NE · stat.ML


Pith reviewed 2026-05-25 12:41 UTC · model grok-4.3


keywords: pruning, interpretability, neural networks, network dissection, ResNet-50, ImageNet, model compression, disentangled representations


The pith

ResNet-50 models trained on ImageNet keep the same number of interpretable concepts and units until more than 90 percent of their parameters have been pruned.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Pruning removes large numbers of parameters from neural networks while preserving accuracy. The paper measures whether this process also preserves the count of hidden units that represent human-recognizable concepts, using network dissection to identify such units. It finds that the number of these interpretable concepts and units remains unchanged until pruning reaches the point where accuracy begins to decline. The result is reported for ResNet-50 trained on ImageNet, where the counts hold steady until more than 90 percent of parameters have been removed. This indicates that the parameters removed by pruning are not required for maintaining this form of interpretability.
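To make "removing parameters" concrete, here is a minimal sketch of magnitude pruning on a flat list of weights. The abstract does not say which pruning method the paper uses, so the choice of magnitude pruning, the function name `magnitude_prune`, and the toy weights are all assumptions made for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of `weights`.

    Assumption: magnitude pruning stands in for whatever method the paper
    actually uses. Returns (pruned_weights, mask), where mask[i] is 1 for
    a kept weight and 0 for a pruned one.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = round(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    mask = [0 if i in pruned_idx else 1 for i in range(len(weights))]
    pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned, mask


# Toy weights (illustrative only, not taken from any real model).
weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02, 0.15, -0.4, 0.08, 0.6]
pruned, mask = magnitude_prune(weights, sparsity=0.9)
print(sum(mask), "of", len(mask), "weights kept")
```

At 90 percent sparsity only the single largest-magnitude weight survives in this toy vector, which gives a sense of how aggressive the regime discussed in the paper is.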

Core claim

Pruning has no detrimental effect on the measure of interpretability until so few parameters remain that accuracy begins to drop. ResNet-50 models trained on ImageNet maintain the same number of interpretable concepts and units until more than 90% of parameters have been pruned.

What carries the argument

Network dissection, which counts hidden units that learn disentangled representations of human-recognizable concepts.
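The counting step can be sketched as follows: threshold a unit's activations into a binary mask, compare that mask with each labeled concept mask by intersection over union (IoU), and call the unit interpretable if its best IoU clears a cutoff. This is a simplified sketch, not the authors' pipeline. The flat-list representation, the names `iou` and `dissect_unit`, and the `min_iou` default of 0.04 are assumptions made for illustration.

```python
def iou(unit_mask, concept_mask):
    """Intersection over union of two equal-length binary masks."""
    inter = sum(1 for u, c in zip(unit_mask, concept_mask) if u and c)
    union = sum(1 for u, c in zip(unit_mask, concept_mask) if u or c)
    return inter / union if union else 0.0


def dissect_unit(activations, threshold, concept_masks, min_iou=0.04):
    """Return (best_concept or None, best_iou) for a single unit.

    `activations` is a flat list of the unit's activations over all
    pixels of all probe images; `concept_masks` maps a concept name to a
    binary mask of the same length. `min_iou` is an assumed cutoff.
    """
    unit_mask = [a > threshold for a in activations]
    scores = {name: iou(unit_mask, m) for name, m in concept_masks.items()}
    best = max(scores, key=scores.get)
    if scores[best] > min_iou:
        return best, scores[best]
    return None, scores[best]


# Toy data (hypothetical): six pixels and two concepts.
activations = [0.1, 0.9, 0.8, 0.2, 0.7, 0.05]
concepts = {
    "dog": [0, 1, 1, 0, 0, 0],
    "grass": [1, 0, 0, 1, 0, 1],
}
label, score = dissect_unit(activations, threshold=0.5, concept_masks=concepts)
print(label, round(score, 3))
```

The interpretability measure discussed here is then the number of units that receive a label, together with the number of distinct concepts those labels cover.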

If this is right

The structure removed by pruning does not include the units that encode the measured interpretable concepts.
This measure of interpretability remains stable under compression as long as accuracy is preserved.
Accuracy and the count of interpretable units decline together once pruning exceeds the point where unnecessary parameters are exhausted.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The encoding of these concepts may be redundant enough to survive removal of most parameters.
The same pattern could be tested on other architectures or datasets to check whether it is general.
Pruning might serve as a compression method that leaves explanatory units intact.

Load-bearing premise

Network dissection continues to provide a reliable count of disentangled human-recognizable concepts after pruning without the reduced capacity introducing systematic bias into the measurement.

What would settle it

A measured drop in the number of interpretable units in a ResNet-50 on ImageNet that occurs before accuracy declines would falsify the central claim.
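The falsification condition can be written as a small check over a pruning sweep. The sketch below is illustrative: the function names, the tolerance parameters, and the toy sweeps are hypothetical and do not come from the paper.

```python
def first_drop(values, baseline, tolerance):
    """Index of the first value more than `tolerance` below `baseline`, or None."""
    for i, v in enumerate(values):
        if v < baseline - tolerance:
            return i
    return None


def units_drop_before_accuracy(sweep, acc_tol, unit_tol):
    """True if the interpretable-unit count drops before accuracy does.

    `sweep` is a list of (sparsity, accuracy, n_interpretable_units)
    tuples ordered by increasing sparsity; the first entry is treated as
    the unpruned baseline. The tolerances are assumed, user-chosen values.
    """
    _, base_acc, base_units = sweep[0]
    acc_i = first_drop([a for _, a, _ in sweep], base_acc, acc_tol)
    unit_i = first_drop([u for _, _, u in sweep], base_units, unit_tol)
    if unit_i is None:
        return False  # The unit count never dropped.
    return acc_i is None or unit_i < acc_i


# Made-up sweeps for illustration only; these are not measurements.
toy_consistent = [(0.0, 1.00, 100), (0.5, 1.00, 101), (0.9, 0.99, 99), (0.95, 0.80, 80)]
toy_violating = [(0.0, 1.00, 100), (0.5, 1.00, 80), (0.9, 0.99, 60)]
print(units_drop_before_accuracy(toy_consistent, acc_tol=0.02, unit_tol=5))
print(units_drop_before_accuracy(toy_violating, acc_tol=0.02, unit_tol=5))
```

The choice of tolerances matters: with a single run per pruning level there is no variance estimate to set them from, which is the same gap the referee report raises.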

Original abstract

Pruning is a standard technique for removing unnecessary structure from a neural network to reduce its storage footprint, computational demands, or energy consumption. Pruning can reduce the parameter-counts of many state-of-the-art neural networks by an order of magnitude without compromising accuracy, meaning these networks contain a vast amount of unnecessary structure. In this paper, we study the relationship between pruning and interpretability. Namely, we consider the effect of removing unnecessary structure on the number of hidden units that learn disentangled representations of human-recognizable concepts as identified by network dissection. We aim to evaluate how the interpretability of pruned neural networks changes as they are compressed. We find that pruning has no detrimental effect on this measure of interpretability until so few parameters remain that accuracy beings to drop. Resnet-50 models trained on ImageNet maintain the same number of interpretable concepts and units until more than 90% of parameters have been pruned.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper measures network dissection counts of interpretable units in pruned ResNet-50 and reports that they hold steady until more than 90% of parameters are pruned, the point at which accuracy starts to fall.


The main point is that ResNet-50 on ImageNet keeps the same number of units flagged as interpretable by network dissection until more than 90% of parameters are pruned, which is also where accuracy begins to drop. The work takes the existing dissection pipeline and runs it on successively pruned versions of the model to count matches against Broden concepts. This specific quantification had not been reported in the pruning papers referenced in the abstract, so the measurement itself is the new piece. It gives a clean empirical trace of how one proxy for disentangled concepts behaves under compression, which is a direct check rather than a theoretical argument. The result is easy to read and ties two lines of work together with numbers from a standard architecture and dataset.

The soft spot is the stress-test concern about the metric. Dissection scores units by IoU against fixed concepts using activation thresholds and selectivity that depend on the empirical distribution of activations. Pruning changes sparsity and co-activation patterns, so the same underlying concepts could produce different IoU values without any re-calibration or re-validation of the pipeline on the pruned models. The abstract supplies no detail on whether this was checked, how many runs were done, or what statistical controls were applied. If the full paper does not contain those checks, the plateau could be partly an artifact of the measurement rather than evidence that the concepts are preserved.

This paper is for readers already working on pruning or on concept-based interpretability who want a concrete data point on their interaction. It is not a broad theory paper and does not claim to solve deployment questions on its own. It deserves a serious referee because the experiment is well-scoped and the connection it draws is worth verifying in detail, even if the interpretation of the count needs more scrutiny on the metric side.
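The calibration concern in the letter can be illustrated with a toy example: a threshold chosen as a top quantile of one activation distribution can behave very differently on a distribution whose dynamic range has shifted. The abstract does not say whether thresholds were recomputed for each pruned model, so this sketch only shows why the question matters. The function names, the 5 percent fraction, and the halved activations are hypothetical.

```python
def top_quantile_threshold(activations, top_fraction):
    """Value that roughly `top_fraction` of `activations` exceed."""
    ordered = sorted(activations, reverse=True)
    k = min(len(ordered) - 1, max(1, round(len(ordered) * top_fraction)))
    return ordered[k]


def firing_rate(activations, threshold):
    """Fraction of activations strictly above `threshold`."""
    return sum(a > threshold for a in activations) / len(activations)


# Toy activations: a "dense" unit, and the same unit with its dynamic
# range halved as a stand-in for a pruning-induced distribution shift.
dense = [i / 100 for i in range(100)]
shifted = [a * 0.5 for a in dense]

t_dense = top_quantile_threshold(dense, top_fraction=0.05)
t_shifted = top_quantile_threshold(shifted, top_fraction=0.05)

print(firing_rate(dense, t_dense))      # threshold calibrated on dense
print(firing_rate(shifted, t_dense))    # stale threshold on shifted data
print(firing_rate(shifted, t_shifted))  # threshold recalibrated
```

In this toy case the stale threshold silences the unit entirely, while the recalibrated one restores the intended firing rate. A per-model quantile threshold would adapt in this way, but it would not by itself rule out the changes in co-activation statistics that the letter also mentions.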

Referee Report

2 major / 1 minor

Summary. The paper claims that pruning ResNet-50 models trained on ImageNet has no detrimental effect on interpretability—as measured by the number of hidden units learning disentangled human-recognizable concepts via network dissection—until more than 90% of parameters are removed, at which point accuracy also begins to decline. The work positions this as evidence that the 'unnecessary structure' removed by pruning does not include the units responsible for these interpretable concepts.

Significance. If the central empirical result is robust, it would indicate that aggressive unstructured pruning preserves the count of concept-aligned units, implying that interpretability is concentrated in a small, resilient subset of parameters. This could guide pruning algorithms that explicitly protect interpretable representations and inform theoretical accounts of how overparameterization relates to disentangled feature learning.

major comments (2)

[Abstract] The central claim equates stable dissection counts with preserved interpretability, yet the text supplies no indication that the network-dissection pipeline (activation thresholds, IoU computation against Broden concepts, selectivity criteria) was re-validated or re-calibrated on the pruned models. Because pruning changes sparsity, dynamic range, and co-activation statistics, these quantities are distribution-dependent; without explicit checks, the plateau until >90% pruning could be a measurement artifact.
[Results] Experimental protocol (implied by the abstract's empirical finding): no details are given on the number of independent runs, statistical tests for the 'same number' claim, or controls that would rule out systematic bias in the dissection metric after pruning. These omissions are load-bearing because the weakest assumption is precisely that the metric remains unbiased under the altered activation regime.

minor comments (1)

[Abstract] 'accuracy beings to drop' is a typographical error and should read 'accuracy begins to drop'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, indicating planned revisions where appropriate.


Referee: [Abstract] The central claim equates stable dissection counts with preserved interpretability, yet the text supplies no indication that the network-dissection pipeline (activation thresholds, IoU computation against Broden concepts, selectivity criteria) was re-validated or re-calibrated on the pruned models. Because pruning changes sparsity, dynamic range, and co-activation statistics, these quantities are distribution-dependent; without explicit checks, the plateau until >90% pruning could be a measurement artifact.

Authors: The network dissection procedure followed the exact protocol and hyperparameters from the original Network Dissection paper, applied uniformly to all models. We agree that the manuscript would benefit from explicit discussion of metric stability under pruning-induced distribution shifts. In the revised version we will add a dedicated paragraph and supplementary analysis confirming that activation thresholds and concept IoU distributions do not exhibit systematic drift with increasing sparsity.
Revision: partial

Referee: [Results] Experimental protocol (implied by the abstract's empirical finding): no details are given on the number of independent runs, statistical tests for the 'same number' claim, or controls that would rule out systematic bias in the dissection metric after pruning. These omissions are load-bearing because the weakest assumption is precisely that the metric remains unbiased under the altered activation regime.

Authors: Results are reported from the standard single training run per pruning level, consistent with common practice for large-scale ImageNet experiments. No formal statistical tests or multi-seed controls were included. We acknowledge these omissions weaken the robustness claim. The revision will add an explicit experimental-protocol subsection noting the single-run limitation and, where computationally feasible, supplementary multi-seed verification or a bias-control argument based on the observed invariance of the Broden concept set.
Revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical measurement


The paper presents an experimental study measuring the number of interpretable units via network dissection before and after pruning ResNet-50 models. No derivation, equations, fitted parameters, or predictions appear in the claim; the result is a direct count from applying an external method to pruned networks. The central observation is therefore self-contained against the reported benchmarks and does not reduce to any self-referential step.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that network dissection yields a stable, meaningful count of interpretable units even after network capacity is reduced by pruning.

axioms (1)

Domain assumption: Network dissection reliably identifies units that learn disentangled representations of human-recognizable concepts.
This metric is used to quantify interpretability before and after pruning.

