pith. sign in

arxiv: 1907.04613 · v1 · pith:2E5NHKO3new · submitted 2019-07-10 · 💻 cs.CL · cs.LG

Neural Networks as Explicit Word-Based Rules

Pith reviewed 2026-05-25 00:01 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords convolutional networksinterpretabilitysentiment classificationword-based rulesfilter visualizationnatural language processingmodel explanation
0
0 comments X

The pith

A convolutional network for sentiment classification can be rewritten as explicit word-based rules that recover its original performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies a visualization technique from computer vision to interpret the weight matrices of a convolutional neural network used for sentiment classification. By maximizing the response of each filter, the authors identify sets of words that activate the filters, turning them into explicit word-based rules. These rules are then used in place of the network to classify text. The extracted rules achieve the same level of performance as the original model. This demonstrates that the network's behavior can be fully captured by simple combinations of word indicators.

Core claim

The filters of a convolutional network for sentiment classification can be interpreted as word-based rules by maximizing their responses over input words. The resulting rules recover the performance of the original model on the classification task.

What carries the argument

Maximizing filter responses on word embeddings to extract sets of words that define each filter's activation rule.

Load-bearing premise

That the words maximizing each filter's response form a complete and accurate representation of what the filter computes in the full model.

What would settle it

Testing the extracted word rules on the original test set and finding that their accuracy is substantially lower than the neural network's accuracy.

read the original abstract

Filters of convolutional networks used in computer vision are often visualized as image patches that maximize the response of the filter. We use the same approach to interpret weight matrices in simple architectures for natural language processing tasks. We interpret a convolutional network for sentiment classification as word-based rules. Using the rule, we recover the performance of the original model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript adapts the max-response visualization technique from computer vision to interpret the weight matrices of a convolutional neural network for sentiment classification. Individual words that maximize filter responses are presented as explicit word-based rules; the authors claim these rules recover the performance of the original CNN on the test set.

Significance. If the extracted rules provably match the CNN's decisions (including max-pooling and the final classifier), the work would supply a concrete, falsifiable method for turning a neural NLP model into an interpretable rule set. The performance-recovery check is a strength that directly tests the fidelity of the interpretation.

major comments (2)
  1. [Abstract] Abstract: the central claim that 'using the rule, we recover the performance of the original model' is load-bearing, yet the abstract (and the description provided) supplies no account of how single-filter argmax words are aggregated into rules that replicate the CNN's max-pooling across filters and final linear layer.
  2. [Method] The transfer of the CV visualization technique assumes that words maximizing individual filter responses automatically compose into complete, composable rules; the manuscript does not demonstrate that this holds once n-gram interactions and the downstream classifier are taken into account.
minor comments (1)
  1. Clarify the exact procedure (thresholding, combination function, handling of negative contributions) used to turn per-filter word lists into executable rules.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important points about the clarity of our claims and the description of how the extracted rules are formed and applied. We respond to each major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'using the rule, we recover the performance of the original model' is load-bearing, yet the abstract (and the description provided) supplies no account of how single-filter argmax words are aggregated into rules that replicate the CNN's max-pooling across filters and final linear layer.

    Authors: We agree that the abstract is concise and does not include an explicit account of the aggregation step. The manuscript forms rules by extracting the single word that maximizes the response of each filter and then applies these words by computing their activations under the same convolutional and pooling operations as the original network before feeding into the final linear layer. We will revise the abstract to add a brief clause describing this aggregation process so that the central claim is better supported within the word limit. revision: yes

  2. Referee: [Method] The transfer of the CV visualization technique assumes that words maximizing individual filter responses automatically compose into complete, composable rules; the manuscript does not demonstrate that this holds once n-gram interactions and the downstream classifier are taken into account.

    Authors: The manuscript demonstrates composition empirically: the extracted per-filter words are substituted back into the original CNN architecture (including max-pooling over the filter responses and the downstream linear classifier), and test-set performance is recovered. This empirical match serves as evidence that the rules capture the net effect of the model's operations. We acknowledge that an additional explicit walk-through of how n-gram filter responses and the classifier weights interact with the selected words would improve transparency. We will add a short clarifying paragraph in the method section. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical interpretation validated externally

full rationale

The paper applies max-response visualization (transferred from CV) to extract word rules from a CNN's filters for sentiment classification, then reports that the resulting rules recover the CNN's test performance. This is an empirical demonstration of interpretability rather than a mathematical derivation or prediction step. No equations reduce a claimed result to its own fitted inputs by construction, no self-citation chain bears the central claim, and the performance match is measured against held-out data rather than being tautological. The approach is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No information on free parameters, axioms, or invented entities available from the abstract alone.

pith-pipeline@v0.9.0 · 5564 in / 877 out tokens · 21912 ms · 2026-05-25T00:01:36.803192+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.