Model-Level GNN Explanations via Rule-to-Graph Readout for Logit Reconstruction
Pith reviewed 2026-05-23 00:29 UTC · model grok-4.3
The pith
Logical rules built from subgraph concepts reconstruct a GNN's multiclass logits via its frozen classifier head.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By recasting the pretrained GNN's graph-level readout as a weighted rule-level readout, logical rules formed from grounded subgraph concepts yield embeddings that, when passed through the frozen classifier head, faithfully reproduce the base model's raw multiclass logits on both training and unseen graphs.
What carries the argument
Rule-to-graph readout, which computes embeddings directly from the symbolic structure of logical rules and routes active rules through the frozen original classifier head to reconstruct logits.
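As a rough illustration of this mechanism, the sketch below pools the embeddings of the active rules with per-rule weights and passes the pooled vector through a frozen head. It rests on assumptions the page does not confirm: the head is taken to be a single linear layer, rule weights are scalars that are already given, and the names `rule_readout`, `frozen_head`, and `reconstruct_logits` are hypothetical.

```python
def rule_readout(rule_embs, weights, active):
    """Weighted sum of the embeddings of active rules (one pooled vector)."""
    dim = len(rule_embs[0])
    pooled = [0.0] * dim
    for emb, w, is_active in zip(rule_embs, weights, active):
        if is_active:
            for d in range(dim):
                pooled[d] += w * emb[d]
    return pooled


def frozen_head(z, W, b):
    """Assumed linear classifier head, logits = W z + b; W and b are never updated."""
    return [sum(w_cd * z_d for w_cd, z_d in zip(row, z)) + b_c
            for row, b_c in zip(W, b)]


def reconstruct_logits(rule_embs, weights, active, W, b):
    """Route the pooled rule embedding through the frozen head."""
    return frozen_head(rule_readout(rule_embs, weights, active), W, b)


# Toy values, invented for illustration only.
rule_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [0.5, 2.0, 1.0]
active = [True, False, True]   # which rules fire on this graph
W = [[1.0, -1.0], [0.0, 2.0]]  # 2 classes x 2 embedding dimensions
b = [0.1, -0.1]

logits = reconstruct_logits(rule_embs, weights, active, W, b)
```

Only the set of active rules changes from graph to graph in this sketch; the embeddings, weights, and head stay fixed, which is what would let the same explanation be instantiated on unseen inputs.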
If this is right
- Explanations remain instantiable on unseen graphs without retraining.
- Subgraph-level grounding becomes directly available for each active rule.
- Rule-level contribution analysis can be performed at test time for any input.
- Rule ablations show critical rules support the predicted class while suppressing others.
- Prediction agreement matches or exceeds prior class-wise explainers while running up to 20 times faster.
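The rule-ablation consequence can be made concrete with a small sketch: switch off one active rule at a time and record how each class logit moves. This is an illustration under assumptions, not the paper's procedure. The linear head, the scalar rule weights, and the names `logits_from_rules` and `ablation_deltas` are all hypothetical.

```python
def logits_from_rules(rule_embs, weights, active, W, b):
    """Weighted rule readout followed by an assumed linear frozen head."""
    dim = len(rule_embs[0])
    z = [sum(w * e[d] for e, w, a in zip(rule_embs, weights, active) if a)
         for d in range(dim)]
    return [sum(wc * zd for wc, zd in zip(row, z)) + bc for row, bc in zip(W, b)]


def ablation_deltas(rule_embs, weights, active, W, b):
    """Map each active rule index to (ablated logits - base logits)."""
    base = logits_from_rules(rule_embs, weights, active, W, b)
    deltas = {}
    for i, is_active in enumerate(active):
        if not is_active:
            continue
        mask = list(active)
        mask[i] = False  # remove exactly one rule
        ablated = logits_from_rules(rule_embs, weights, mask, W, b)
        deltas[i] = [x - y for x, y in zip(ablated, base)]
    return deltas


# Toy values, invented for illustration only.
rule_embs = [[1.0, 0.0], [0.0, 1.0]]
weights = [0.5, 1.0]
active = [True, True]
W = [[1.0, -1.0], [0.0, 2.0]]
b = [0.1, -0.1]

deltas = ablation_deltas(rule_embs, weights, active, W, b)
```

In this toy setup, removing rule 1 lowers the logit of class 1 and raises the logit of class 0, which is the pattern the bullet describes: a rule that supports the predicted class while suppressing the other.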
Where Pith is reading between the lines
- The same rule embeddings could be inspected to diagnose whether a GNN has learned spurious subgraph patterns.
- Because rules act as functional units, one could test whether editing or removing specific rules changes downstream predictions in a controllable way.
- The framework might generalize to other readout-based architectures if the classifier head is kept frozen and the rule embedding step is adapted.
Load-bearing premise
Embeddings derived solely from the symbolic form of the logical rules contain enough information for the frozen classifier head to match the original GNN's multiclass logits on both seen and unseen graphs.
What would settle it
Compute the probability-level difference between the rule-reconstructed logits and the base GNN logits on a held-out test set of graphs; large systematic mismatch would falsify the reconstruction claim.
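A minimal sketch of such a check follows. The page does not say which probability-level metric the paper uses, so the total variation distance between softmax outputs is an assumption chosen for illustration, and `fidelity_report` is a hypothetical name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fidelity_report(base_logits, recon_logits):
    """Mean total variation distance and argmax agreement over a set of graphs."""
    n = len(base_logits)
    tv_sum = 0.0
    agree = 0
    for zb, zr in zip(base_logits, recon_logits):
        pb, pr = softmax(zb), softmax(zr)
        tv_sum += 0.5 * sum(abs(p - q) for p, q in zip(pb, pr))
        agree += int(pb.index(max(pb)) == pr.index(max(pr)))
    return {"mean_tv": tv_sum / n, "agreement": agree / n}


# Toy held-out logits, invented for illustration only.
base = [[2.0, 0.0], [0.0, 1.0]]
recon = [[2.0, 0.0], [0.0, 1.0]]
report = fidelity_report(base, recon)
```

A mean distance near zero together with agreement near one on held-out graphs would be consistent with the reconstruction claim; a large or class-dependent distance would count against it.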
Original abstract
We propose a novel model-level GNN explanation framework that shifts the explanation target from class-wise rule extraction to rule-based logit reconstruction. Our method recasts the graph-level readout of a pretrained GNN as a weighted rule-level readout: grounded subgraph concepts are composed into logical rules, rule embeddings are computed directly from their symbolic structure, and active rules are passed through the frozen classifier head to reconstruct the GNN's raw multiclass logits. As a result, our approach provides global explanations that remain instantiable on unseen graphs, support subgraph-level grounding, and admit rule-level contribution analysis at test-time. Experiments on three synthetic and two real-world graph classification benchmarks show that our approach faithfully reconstructs the base GNN's raw multiclass logits, achieving high probability-level fidelity across datasets. Rule-level ablations further demonstrate that the identified critical rules actively support the predicted class while suppressing non-target classes, suggesting that they act as functional units rather than merely serving as post-hoc symbolic artifacts. Compared with prior class-wise rule-based explainers, our approach achieves competitive or better prediction agreement while being up to \(20\times\) faster, and additionally provides rule weights, test-time grounding, and logit-level contribution analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a model-level GNN explanation framework that recasts the graph readout as a weighted rule-level readout: grounded subgraph concepts are composed into logical rules whose embeddings are derived directly from symbolic structure and passed through the frozen original classifier head to reconstruct the pretrained GNN's raw multiclass logits. It claims that the resulting global explanations are instantiable on unseen graphs, support subgraph grounding and rule-level contribution analysis at test time, achieve high probability-level fidelity on three synthetic and two real-world benchmarks, demonstrate functional (non-post-hoc) rule behavior via ablations, and offer competitive prediction agreement at up to 20× speed relative to prior class-wise rule extractors.
Significance. If the reconstruction is shown to be non-circular and the symbolic-to-embedding mapping preserves the semantics required by the frozen head, the method would supply a distinctive combination of global, instantiable explanations with logit-level diagnostics and test-time applicability that is currently unavailable from class-wise rule-based explainers.
major comments (2)
- [Abstract / Methods] Abstract and Methods (rule-embedding construction): the claim that rule embeddings computed purely from symbolic structure can be fed to the frozen classifier head to reconstruct logits on unseen graphs is load-bearing; the manuscript must explicitly define whether this mapping is a fixed function, a learned projection, or otherwise, and must demonstrate that the resulting vectors lie in the distribution expected by the head (i.e., reproduce the numerical outputs that actual GNN readouts would produce) rather than merely fitting training logits.
- [Experiments] Experiments (fidelity results): the abstract asserts “high probability-level fidelity” and “faithful reconstruction” yet supplies no quantitative metrics, error distributions, or per-class breakdown; without these numbers and without an explicit check that reconstruction error does not increase on held-out graphs, the generalization claim cannot be evaluated.
minor comments (1)
- [Abstract] The abstract states that rules “act as functional units rather than merely serving as post-hoc symbolic artifacts”; this phrasing should be replaced by a precise statement of what the ablation actually measures (e.g., change in reconstructed logit when a rule is removed).
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment point-by-point below, providing clarifications on the technical construction and committing to revisions that strengthen the presentation of our results without altering the core claims.
- Referee: [Abstract / Methods] Abstract and Methods (rule-embedding construction): the claim that rule embeddings computed purely from symbolic structure can be fed to the frozen classifier head to reconstruct logits on unseen graphs is load-bearing; the manuscript must explicitly define whether this mapping is a fixed function, a learned projection, or otherwise, and must demonstrate that the resulting vectors lie in the distribution expected by the head (i.e., reproduce the numerical outputs that actual GNN readouts would produce) rather than merely fitting training logits.
  Authors: We agree this claim is central and benefits from explicit definition. The rule embeddings are produced by a fixed, non-learned function that encodes the symbolic structure (predicate identities, variable bindings, and logical connectives) using a deterministic composition of predefined embeddings followed by a fixed aggregation operator; no parameters are trained for this mapping and it is applied identically at test time. Section 3.2 already describes the construction, but we will add a dedicated paragraph clarifying that the mapping is fixed (not a learned projection) and that the resulting vectors are passed directly to the frozen head. To demonstrate alignment with the expected distribution, the experiments already report low reconstruction MSE on both training and held-out graphs; we will augment this with a direct comparison of embedding-norm statistics between rule-derived vectors and actual GNN readouts to confirm they occupy the same numerical regime.
  Revision: yes
- Referee: [Experiments] Experiments (fidelity results): the abstract asserts “high probability-level fidelity” and “faithful reconstruction” yet supplies no quantitative metrics, error distributions, or per-class breakdown; without these numbers and without an explicit check that reconstruction error does not increase on held-out graphs, the generalization claim cannot be evaluated.
  Authors: The Experiments section (4.2–4.3) already contains quantitative fidelity metrics (probability-level fidelity and logit MSE) together with comparisons against baselines on all five datasets. However, we acknowledge that the abstract itself contains no numerical values and that per-class breakdowns plus explicit held-out error analysis are not presented in a single consolidated view. We will revise the abstract to include the key aggregate fidelity numbers. In addition, we will add a new table reporting per-class fidelity, reconstruction-error histograms, and a side-by-side comparison of MSE on training versus test graphs to directly verify that error does not increase on unseen data.
  Revision: yes
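The two checks promised in the rebuttal, a train-versus-test MSE comparison and a comparison of embedding-norm statistics, could look roughly like the sketch below. It is an illustration under assumptions: the function names are hypothetical and the toy numbers are invented rather than taken from the paper.

```python
import math
from statistics import mean, pstdev


def logit_mse(base_logits, recon_logits):
    """Mean squared error over all graphs and classes."""
    return mean((a - b) ** 2
                for zb, zr in zip(base_logits, recon_logits)
                for a, b in zip(zb, zr))


def generalization_gap(train_pair, test_pair):
    """Test MSE minus train MSE; a clearly positive gap means error grows on unseen graphs."""
    return logit_mse(*test_pair) - logit_mse(*train_pair)


def norm_stats(vectors):
    """Mean and population standard deviation of Euclidean norms."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return mean(norms), pstdev(norms)


# Toy data, invented for illustration only.
train_pair = ([[1.0, 2.0]], [[1.0, 2.0]])   # (base logits, reconstructed logits)
test_pair = ([[1.0, 2.0]], [[1.0, 4.0]])
gap = generalization_gap(train_pair, test_pair)

gnn_readouts = [[3.0, 4.0], [0.0, 5.0]]
rule_vectors = [[5.0, 0.0], [4.0, 3.0]]
gnn_mean, gnn_std = norm_stats(gnn_readouts)
rule_mean, rule_std = norm_stats(rule_vectors)
```

Matching norm statistics are only a necessary condition: two sets of vectors can share norms and still point in different directions, so this check complements rather than replaces the logit-level comparison.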
Circularity Check
No significant circularity; derivation uses independent frozen classifier head on symbolic embeddings
Full rationale
The paper's central mechanism computes rule embeddings directly from the symbolic structure of logical rules over grounded subgraphs and passes them through the original GNN's frozen classifier head to reconstruct logits. This construction is self-contained against external benchmarks because the head is unchanged from the pretrained model, the embedding function is defined from symbolic structure rather than fitted to the target logits, and fidelity is evaluated on both training and unseen graphs. No load-bearing step reduces by definition or self-citation to the inputs; the reconstruction claim therefore retains independent empirical content.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · tag: unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: Theorem 4.2 ... if AGG and UPDATE ... are injective, the l-th layer node embedding \(h^{(l)}_v\) is a Perfect Rooted Tree Representation of the full l-hop subtree
- IndisputableMonolith/Foundation/AbsoluteFloorClosure · absolute_floor_iff_bare_distinguishability · tag: unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: we feed the trainable weighted sum of the embeddings of the m̂ global graph concepts to the original classifier ... optimize ... NLL loss with L2-penalty on wt
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.