Binary Spiking Neural Networks as Causal Models
Pith reviewed 2026-05-07 11:43 UTC · model grok-4.3
The pith
Representing binary spiking neural networks as binary causal models allows SAT and SMT solvers to compute abductive explanations that exclude irrelevant features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We formally define a binary spiking neural network and represent its spiking activity as a binary causal model. Thanks to this causal representation we can use a SAT solver as well as an SMT solver to compute abductive explanations of the network's output. When the method is applied to a network trained on MNIST, the resulting explanations are expressed in terms of pixel-level features and contain only features that participate in a causal chain to the classification.
What carries the argument
The binary causal model itself: each neuron's spike events are mapped to binary variables and their temporal dependencies to directed causal links, which turns the network's forward pass into a logical theory on which abduction can be performed.
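As a rough illustration of that mapping, the sketch below unrolls a tiny network over time and creates one Boolean structural equation per (neuron, time step) variable. Everything in it is an assumption made for illustration: the neuron names, weights, thresholds, the number of time steps `T`, and above all the memoryless firing rule, which may be simpler than the neuron model the paper actually defines.

```python
T = 2  # number of simulated time steps (assumed)

# neuron -> (incoming weights, firing threshold); all values are made up
LAYERS = {
    "h0": ({"x0": 1, "x1": 1, "x2": -1}, 2),
    "h1": ({"x0": -1, "x1": 1, "x2": 1}, 2),
    "out": ({"h0": 1, "h1": 1}, 1),
}

def build_equations():
    """Create one Boolean structural equation per (neuron, time step) variable."""
    equations = {}
    for t in range(T):
        for neuron, (weights, threshold) in LAYERS.items():
            parents = [(src, t) for src in weights]  # directed causal links

            def eq(values, weights=weights, threshold=threshold, t=t):
                # Memoryless rule (assumption): spike iff weighted input >= threshold.
                total = sum(w * values[(src, t)] for src, w in weights.items())
                return int(total >= threshold)

            equations[(neuron, t)] = (parents, eq)
    return equations

def evaluate(equations, input_spikes):
    """Solve the acyclic model given the exogenous input spike values."""
    values = dict(input_spikes)
    # Insertion order is topological: hidden neurons precede "out" at each step.
    for var, (_, eq) in equations.items():
        values[var] = eq(values)
    return values

equations = build_equations()
inputs = {(f"x{i}", t): bit for t in range(T) for i, bit in enumerate([1, 1, 0])}
values = evaluate(equations, inputs)
print(values[("out", T - 1)])  # prints 1: h0 fires (1 + 1 - 0 >= 2), so out fires
```

Each entry of `equations` records its parent variables explicitly, so the dictionary doubles as the causal graph and as the logical theory a solver would reason over.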
If this is right
- Abductive explanations computed from the causal model contain only causally relevant features and exclude any that have no effect on the output.
- Both SAT and SMT solvers can be applied directly to the same causal model to produce such explanations.
- The method yields pixel-level explanations for classifications on the MNIST dataset that satisfy the relevance guarantee.
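The relevance property in the list above can be seen on a toy example. The sketch below computes a subset-minimal abductive explanation with a standard deletion-based loop. The classifier `classify` is hypothetical, and the brute-force `is_sufficient` check only stands in for the SAT or SMT query a real implementation would issue; it is not the paper's procedure.

```python
from itertools import product

def classify(x):
    """Toy Boolean classifier (hypothetical): x[3] never influences the output."""
    return int((x[0] and x[1]) or x[2])

def is_sufficient(fixed, instance, target):
    """True if fixing `fixed` to the instance's values forces the output `target`.

    A solver would decide this by showing that "fixed values and a different
    output" is unsatisfiable; here every completion is simply enumerated.
    """
    free = [i for i in range(len(instance)) if i not in fixed]
    for bits in product([0, 1], repeat=len(free)):
        x = list(instance)
        for i, b in zip(free, bits):
            x[i] = b
        if classify(x) != target:
            return False
    return True

def abductive_explanation(instance):
    """Deletion-based search: drop each feature whose removal keeps sufficiency."""
    target = classify(instance)
    kept = set(range(len(instance)))
    for i in range(len(instance)):
        if is_sufficient(kept - {i}, instance, target):
            kept.discard(i)
    return sorted(kept)

print(abductive_explanation([1, 1, 0, 1]))  # prints [0, 1]
print(abductive_explanation([0, 0, 1, 0]))  # prints [2]
```

Feature 3 is dropped in both cases because no value of it can change the output, which is the sense in which a subset-minimal explanation cannot contain a completely irrelevant feature.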
Where Pith is reading between the lines
- The same causal-model construction could be tried on binary spiking networks trained for tasks other than digit recognition to test whether the relevance guarantee still holds.
- Because the explanations rest on explicit causal relations, they could be used to generate minimal intervention sets that would change a given classification.
- The approach suggests a route for embedding formal verification inside spiking-network pipelines without leaving the domain of binary logic.
Load-bearing premise
The spiking activity of a trained binary spiking neural network can be faithfully captured by a binary causal model without omitting any behavior that affects the final classification.
What would settle it
A concrete input pattern for which the binary spiking neural network produces a classification that cannot be recovered from any assignment of values to the variables in the corresponding causal model.
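For a network small enough to enumerate, such a counterexample can be searched for directly. The sketch below is purely illustrative: `network_output` is a hypothetical neuron that accumulates membrane potential across two time steps, `memoryless_model` is a deliberately lossy encoding that ignores accumulation, and `faithful_model` is an encoding that matches the neuron on this toy case. None of them is taken from the paper.

```python
from itertools import product

THRESHOLD = 2  # firing threshold (assumed)

def network_output(a0, b0, a1, b1):
    """Reference neuron: accumulates input over two steps, resets after firing."""
    potential, fired = 0, 0
    for step_input in (a0 + b0, a1 + b1):
        potential += step_input
        if potential >= THRESHOLD:
            fired, potential = 1, 0
    return fired

def memoryless_model(a0, b0, a1, b1):
    """Lossy encoding: each step is judged in isolation, with no accumulation."""
    return int(a0 + b0 >= THRESHOLD or a1 + b1 >= THRESHOLD)

def faithful_model(a0, b0, a1, b1):
    """Encoding that accounts for the potential carried over from step 0."""
    return int(a0 + b0 + a1 + b1 >= THRESHOLD)

def find_counterexample(network, model, n_inputs=4):
    """Return the first input on which network and model disagree, else None."""
    for bits in product([0, 1], repeat=n_inputs):
        if network(*bits) != model(*bits):
            return bits
    return None

print(find_counterexample(network_output, memoryless_model))  # prints (0, 1, 0, 1)
print(find_counterexample(network_output, faithful_model))    # prints None
```

On the input `(0, 1, 0, 1)` the neuron fires only because potential carries over between steps, so an explanation computed from the memoryless encoding would be about a different function than the one the network computes.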
Original abstract
We provide a causal analysis of Binary Spiking Neural Networks (BSNNs) to explain their behavior. We formally define a BSNN and represent its spiking activity as a binary causal model. Thanks to this causal representation, we are able to explain the output of the network by leveraging logic-based methods. In particular, we show that we can successfully use a SAT solver as well as an SMT solver to compute abductive explanations from this binary causal model. To illustrate our approach, we trained the BSNN on the standard MNIST dataset and applied our SAT-based and SMT-based methods to find abductive explanations of the network's classifications based on pixel-level features. We also compared the explanations found against SHAP, a popular method in the area of explainable AI. We show that, unlike SHAP, our approach guarantees that an explanation it finds does not contain completely irrelevant features.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper defines Binary Spiking Neural Networks (BSNNs) and maps their spiking activity to a binary causal model. It then uses SAT and SMT solvers to compute abductive explanations for classifications, illustrates the method on a BSNN trained on MNIST using pixel-level features, and compares the resulting explanations to SHAP, claiming that the logic-based approach guarantees explanations contain no completely irrelevant features.
Significance. If the mapping from BSNN dynamics to the binary causal model is faithful, the work would provide a formal, solver-based route to abductive explanations for spiking networks that avoids the irrelevant-feature problem of perturbation methods such as SHAP. The absence of quantitative metrics, error analysis, or a proof of semantic preservation, however, leaves the central claim unverified and reduces the immediate contribution to the literature on explainable AI for neuromorphic models.
Major comments (3)
- [Abstract] The assertion that SAT/SMT solvers are 'successfully' used to compute abductive explanations and that the method was compared to SHAP is unsupported by any quantitative results, accuracy figures, runtime data, or statistical analysis of explanation quality. Without these, the claim that the approach 'guarantees' no irrelevant features cannot be evaluated against the baseline.
- [Causal-model construction] The construction of the binary causal model from the BSNN (described after the formal definition of the network): the mapping encodes spiking activity via per-timestep binary variables. No argument or experiment is supplied showing that this encoding preserves membrane-potential accumulation, refractory dynamics, or inter-layer timing dependencies that affect the final classification. If any such behavior is omitted, abductive explanations extracted from the causal model may miss causally relevant features even while satisfying the 'no irrelevant features' property inside the model.
- [MNIST experiments] MNIST illustration section: the paper reports that explanations were obtained for network classifications but supplies neither the number of test instances examined, nor any measure of agreement with the network's actual decision boundary, nor a direct comparison (e.g., feature overlap, fidelity, or stability) against SHAP. This leaves the superiority claim untested.
Minor comments (2)
- [Notation] The notation used for the causal variables (spike indicators, layer indices, time steps) should be introduced once in a single table or definition block to avoid repeated re-definition.
- [Solver encoding] The manuscript would benefit from an explicit statement of the precise SAT/SMT encoding (variables, clauses, objective) used to extract the abductive explanations.
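To make the request for an explicit encoding concrete, here is one possible shape such a statement could take. It is not the paper's encoding, which this page does not reproduce. The sketch assumes a single threshold neuron, DIMACS-style signed-integer literals, and a truth-table CNF that is exponential in fan-in and therefore unsuitable for MNIST-sized layers. The brute-force `satisfiable` function stands in for a real SAT solver call.

```python
from itertools import product

def threshold_gate_cnf(inputs, output, weights, threshold):
    """CNF for: output <-> (sum_i weights[i] * inputs[i] >= threshold).

    One clause per input assignment: either the inputs differ from that
    assignment, or the output takes the value the gate prescribes.
    """
    clauses = []
    for bits in product([0, 1], repeat=len(inputs)):
        fires = sum(w * b for w, b in zip(weights, bits)) >= threshold
        clause = [-v if b else v for v, b in zip(inputs, bits)]
        clause.append(output if fires else -output)
        clauses.append(clause)
    return clauses

def satisfiable(clauses, n_vars):
    """Brute-force satisfiability check (placeholder for a SAT solver)."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False

def entails(clauses, n_vars, assumptions, goal):
    """Assumptions force goal iff theory + assumptions + (not goal) is UNSAT."""
    query = clauses + [[lit] for lit in assumptions] + [[-goal]]
    return not satisfiable(query, n_vars)

# Variables 1..3 are input spikes, variable 4 is the neuron's output spike.
cnf = threshold_gate_cnf([1, 2, 3], 4, weights=[1, 1, 1], threshold=2)
print(entails(cnf, 4, [1, 2], 4))  # prints True: two active inputs force a spike
print(entails(cnf, 4, [1], 4))     # prints False: one active input is not enough
```

The `entails` query is the sufficiency test at the heart of abductive explanation: a candidate set of input literals is an explanation when the theory, the literals, and the negated output are jointly unsatisfiable.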
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable comments. We provide point-by-point responses to the major comments below. Where appropriate, we indicate revisions that will be incorporated into the next version of the manuscript to address the concerns raised.
- Referee: [Abstract] The assertion that SAT/SMT solvers are 'successfully' used to compute abductive explanations and that the method was compared to SHAP is unsupported by any quantitative results, accuracy figures, runtime data, or statistical analysis of explanation quality. Without these, the claim that the approach 'guarantees' no irrelevant features cannot be evaluated against the baseline.
  Authors: We agree that the abstract could be more precise in describing the results. The manuscript presents the formal mapping and demonstrates the approach on MNIST examples, showing that the logic-based method produces explanations without irrelevant features by construction, in contrast to SHAP, which can include them. However, we acknowledge the lack of quantitative metrics such as runtime or fidelity scores. In the revised manuscript, we will update the abstract to reflect the illustrative nature of the experiments and add quantitative analysis, including runtime data for the solvers and a comparison of explanation sizes or fidelity to the model's decisions.
  Revision: yes
- Referee: [Causal-model construction] The construction of the binary causal model from the BSNN (described after the formal definition of the network): the mapping encodes spiking activity via per-timestep binary variables. No argument or experiment is supplied showing that this encoding preserves membrane-potential accumulation, refractory dynamics, or inter-layer timing dependencies that affect the final classification. If any such behavior is omitted, abductive explanations extracted from the causal model may miss causally relevant features even while satisfying the 'no irrelevant features' property inside the model.
  Authors: The binary causal model is derived directly from the BSNN definition by representing each neuron's spike at each time step as a binary variable, with clauses encoding the spiking rules based on the network's weights and thresholds. This is intended to be a faithful abstraction for the purpose of causal analysis of spike-based decisions. We concede that a formal proof of equivalence to the full continuous dynamics (including exact membrane potential accumulation) is not included. We will add a subsection providing a formal argument that the discrete binary encoding preserves the causal dependencies relevant to the classification output, focusing on the spike timings that determine the final class.
  Revision: yes
- Referee: [MNIST experiments] MNIST illustration section: the paper reports that explanations were obtained for network classifications but supplies neither the number of test instances examined, nor any measure of agreement with the network's actual decision boundary, nor a direct comparison (e.g., feature overlap, fidelity, or stability) against SHAP. This leaves the superiority claim untested.
  Authors: The MNIST section is presented as an illustration of the method rather than a comprehensive empirical study. We report examples for specific classifications but did not include aggregate statistics. We agree this limits the ability to evaluate performance quantitatively. In the revision, we will specify the number of instances tested, provide measures such as the average size of explanations and their fidelity (how well they predict the output when features are set), and include a direct comparison table with SHAP regarding feature relevance and overlap.
  Revision: yes
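The measures named in this response can be pinned down on a toy case. The sketch below defines fidelity as the fraction of completions of the fixed features that keep the original prediction, and uses Jaccard overlap to compare two feature sets. The classifier and the second feature set (standing in for the top-ranked features of an attribution method) are hypothetical. No SHAP values are computed here and no results from the paper are reproduced.

```python
from itertools import product

def classify(x):
    """Toy Boolean classifier (hypothetical): x[3] never influences the output."""
    return int((x[0] and x[1]) or x[2])

def fidelity(features, instance):
    """Fraction of completions of the fixed features that keep the prediction."""
    target = classify(instance)
    free = [i for i in range(len(instance)) if i not in features]
    hits = total = 0
    for bits in product([0, 1], repeat=len(free)):
        x = list(instance)
        for i, b in zip(free, bits):
            x[i] = b
        hits += classify(x) == target
        total += 1
    return hits / total

def jaccard(a, b):
    """Overlap between two feature sets, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

instance = [1, 1, 0, 1]
abductive = {0, 1}     # a sufficient set for this instance
attribution = {0, 3}   # hypothetical top-2 features from an attribution ranking

print(fidelity(abductive, instance))    # prints 1.0
print(fidelity(attribution, instance))  # prints 0.75
print(jaccard(abductive, attribution))
```

Under this definition an abductive explanation has fidelity 1.0 by construction, so explanation size and overlap are the measures that actually distinguish the two approaches in a comparison table.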
Circularity Check
No circularity: direct definitional mapping with independent solver application
The paper defines a BSNN, then directly encodes its spiking activity into a binary causal model to enable SAT/SMT computation of abductive explanations. This encoding is presented as a formal representation rather than a derived prediction or fitted result. No equations reduce to prior self-citations, no parameters are fit then renamed as predictions, and no uniqueness theorems or ansatzes are smuggled in. The MNIST comparison to SHAP is an external empirical check, and the 'no irrelevant features' property follows from standard abductive reasoning over the constructed model. The derivation chain is self-contained and does not collapse to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Spiking activity of a BSNN can be represented as a binary causal model.
Reference graph
Works this paper leans on
-
[3]
1 arXiv:2604.27007v1 [cs.AI] 29 Apr 2026 Figure 2: Visualization of abductive explanation for digit
work page internal anchor Pith review Pith/arXiv arXiv 2026