Model-Level GNN Explanations via Rule-to-Graph Readout for Logit Reconstruction

Aedan J. DeFrates; Baochun Li; Di Niu; Jiuding Yang; Keith G. Mills; Shengyao Lu

arxiv: 2503.09051 · v2 · submitted 2025-03-12 · 💻 cs.LG · cs.AI

Model-Level GNN Explanations via Rule-to-Graph Readout for Logit Reconstruction

Shengyao Lu , Jiuding Yang , Aedan J. DeFrates , Keith G. Mills , Baochun Li , Di Niu This is my paper

Pith reviewed 2026-05-23 00:29 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords graph neural networksmodel explanationsrule-based explanationslogit reconstructionsubgraph conceptsGNN interpretabilityglobal explanations

0 comments

The pith

Logical rules built from subgraph concepts reconstruct a GNN's multiclass logits via its frozen classifier head.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a model-level explanation technique for graph neural networks that moves the target from extracting class-specific rules to reconstructing the model's raw multiclass output logits. Grounded subgraph patterns are assembled into logical rules whose embeddings are calculated straight from their symbolic form; these rules are then routed through the original frozen classifier head to produce the same logits the base GNN would output. The resulting explanations are global, remain valid on graphs never seen during training, support direct identification of contributing subgraphs, and permit per-rule contribution scores at inference time. Experiments across synthetic and real graph-classification tasks show the reconstructed logits match the original model's probability outputs at high fidelity, while the method runs substantially faster than earlier class-wise rule extractors.

Core claim

By recasting the pretrained GNN's graph-level readout as a weighted rule-level readout, logical rules formed from grounded subgraph concepts yield embeddings that, when passed through the frozen classifier head, faithfully reproduce the base model's raw multiclass logits on both training and unseen graphs.

What carries the argument

Rule-to-graph readout, which computes embeddings directly from the symbolic structure of logical rules and routes active rules through the frozen original classifier head to reconstruct logits.

If this is right

Explanations remain instantiable on unseen graphs without retraining.
Subgraph-level grounding becomes directly available for each active rule.
Rule-level contribution analysis can be performed at test time for any input.
Rule ablations show critical rules support the predicted class while suppressing others.
Prediction agreement matches or exceeds prior class-wise explainers while running up to 20 times faster.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same rule embeddings could be inspected to diagnose whether a GNN has learned spurious subgraph patterns.
Because rules act as functional units, one could test whether editing or removing specific rules changes downstream predictions in a controllable way.
The framework might generalize to other readout-based architectures if the classifier head is kept frozen and the rule embedding step is adapted.

Load-bearing premise

Embeddings derived solely from the symbolic form of the logical rules contain enough information for the frozen classifier head to match the original GNN's multiclass logits on both seen and unseen graphs.

What would settle it

Compute the probability-level difference between the rule-reconstructed logits and the base GNN logits on a held-out test set of graphs; large systematic mismatch would falsify the reconstruction claim.

read the original abstract

We propose a novel model-level GNN explanation framework that shifts the explanation target from class-wise rule extraction to rule-based logit reconstruction. Our method recasts the graph-level readout of a pretrained GNN as a weighted rule-level readout: grounded subgraph concepts are composed into logical rules, rule embeddings are computed directly from their symbolic structure, and active rules are passed through the frozen classifier head to reconstruct the GNN's raw multiclass logits. As a result, our approach provides global explanations that remain instantiable on unseen graphs, support subgraph-level grounding, and admit rule-level contribution analysis at test-time. Experiments on three synthetic and two real-world graph classification benchmarks show that our approach faithfully reconstructs the base GNN's raw multiclass logits, achieving high probability-level fidelity across datasets. Rule-level ablations further demonstrate that the identified critical rules actively support the predicted class while suppressing non-target classes, suggesting that they act as functional units rather than merely serving as post-hoc symbolic artifacts. Compared with prior class-wise rule-based explainers, our approach achieves competitive or better prediction agreement while being up to \(20\times\) faster, and additionally provides rule weights, test-time grounding, and logit-level contribution analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's new move is embedding symbolic rules and feeding them through the frozen GNN head for logit reconstruction, which is distinct from class-wise methods but rests on an unproven semantic match between symbolic encodings and actual readouts.

read the letter

This paper shifts GNN explanations from class-wise rule extraction to reconstructing the model's raw multiclass logits. Grounded subgraphs get composed into logical rules, embeddings are taken straight from the symbolic form, and active rules go through the original frozen classifier head. The result is meant to give global explanations that still work on unseen graphs plus rule-level contribution scores at test time.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a model-level GNN explanation framework that recasts the graph readout as a weighted rule-level readout: grounded subgraph concepts are composed into logical rules whose embeddings are derived directly from symbolic structure and passed through the frozen original classifier head to reconstruct the pretrained GNN's raw multiclass logits. It claims that the resulting global explanations are instantiable on unseen graphs, support subgraph grounding and rule-level contribution analysis at test time, achieve high probability-level fidelity on three synthetic and two real-world benchmarks, demonstrate functional (non-post-hoc) rule behavior via ablations, and offer competitive prediction agreement at up to 20× speed relative to prior class-wise rule extractors.

Significance. If the reconstruction is shown to be non-circular and the symbolic-to-embedding mapping preserves the semantics required by the frozen head, the method would supply a distinctive combination of global, instantiable explanations with logit-level diagnostics and test-time applicability that is currently unavailable from class-wise rule-based explainers.

major comments (2)

[Abstract / Methods] Abstract and Methods (rule-embedding construction): the claim that rule embeddings computed purely from symbolic structure can be fed to the frozen classifier head to reconstruct logits on unseen graphs is load-bearing; the manuscript must explicitly define whether this mapping is a fixed function, a learned projection, or otherwise, and must demonstrate that the resulting vectors lie in the distribution expected by the head (i.e., reproduce the numerical outputs that actual GNN readouts would produce) rather than merely fitting training logits.
[Experiments] Experiments (fidelity results): the abstract asserts “high probability-level fidelity” and “faithful reconstruction” yet supplies no quantitative metrics, error distributions, or per-class breakdown; without these numbers and without an explicit check that reconstruction error does not increase on held-out graphs, the generalization claim cannot be evaluated.

minor comments (1)

[Abstract] The abstract states that rules “act as functional units rather than merely serving as post-hoc symbolic artifacts”; this phrasing should be replaced by a precise statement of what the ablation actually measures (e.g., change in reconstructed logit when a rule is removed).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment point-by-point below, providing clarifications on the technical construction and committing to revisions that strengthen the presentation of our results without altering the core claims.

read point-by-point responses

Referee: [Abstract / Methods] Abstract and Methods (rule-embedding construction): the claim that rule embeddings computed purely from symbolic structure can be fed to the frozen classifier head to reconstruct logits on unseen graphs is load-bearing; the manuscript must explicitly define whether this mapping is a fixed function, a learned projection, or otherwise, and must demonstrate that the resulting vectors lie in the distribution expected by the head (i.e., reproduce the numerical outputs that actual GNN readouts would produce) rather than merely fitting training logits.

Authors: We agree this claim is central and benefits from explicit definition. The rule embeddings are produced by a fixed, non-learned function that encodes the symbolic structure (predicate identities, variable bindings, and logical connectives) using a deterministic composition of predefined embeddings followed by a fixed aggregation operator; no parameters are trained for this mapping and it is applied identically at test time. Section 3.2 already describes the construction, but we will add a dedicated paragraph clarifying that the mapping is fixed (not a learned projection) and that the resulting vectors are passed directly to the frozen head. To demonstrate alignment with the expected distribution, the experiments already report low reconstruction MSE on both training and held-out graphs; we will augment this with a direct comparison of embedding-norm statistics between rule-derived vectors and actual GNN readouts to confirm they occupy the same numerical regime. revision: yes
Referee: [Experiments] Experiments (fidelity results): the abstract asserts “high probability-level fidelity” and “faithful reconstruction” yet supplies no quantitative metrics, error distributions, or per-class breakdown; without these numbers and without an explicit check that reconstruction error does not increase on held-out graphs, the generalization claim cannot be evaluated.

Authors: The Experiments section (4.2–4.3) already contains quantitative fidelity metrics (probability-level fidelity and logit MSE) together with comparisons against baselines on all five datasets. However, we acknowledge that the abstract itself contains no numerical values and that per-class breakdowns plus explicit held-out error analysis are not presented in a single consolidated view. We will revise the abstract to include the key aggregate fidelity numbers. In addition, we will add a new table reporting per-class fidelity, reconstruction-error histograms, and a side-by-side comparison of MSE on training versus test graphs to directly verify that error does not increase on unseen data. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation uses independent frozen classifier head on symbolic embeddings

full rationale

The paper's central mechanism computes rule embeddings directly from the symbolic structure of logical rules over grounded subgraphs and passes them through the original GNN's frozen classifier head to reconstruct logits. This construction is self-contained against external benchmarks because the head is unchanged from the pretrained model, the embedding function is defined from symbolic structure rather than fitted to the target logits, and fidelity is evaluated on both training and unseen graphs. No load-bearing step reduces by definition or self-citation to the inputs; the reconstruction claim therefore retains independent empirical content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach appears to rely on standard GNN components and logical rule composition without additional postulated objects.

pith-pipeline@v0.9.0 · 5762 in / 1279 out tokens · 82995 ms · 2026-05-23T00:29:58.861574+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 4.2 ... if AGG and UPDATE ... are injective, the l-th layer node embedding h(l)_v is a Perfect Rooted Tree Representation of the full l-hop subtree
IndisputableMonolith/Foundation/AbsoluteFloorClosure absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

we feed the trainable weighted sum of the embeddings of the m̂ global graph concepts to the original classifier ... optimize ... NLL loss with L2-penalty on wt

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.