The Tsetlin Machine Goes Deep: Logical Learning and Reasoning With Graphs
Pith reviewed 2026-05-19 03:44 UTC · model grok-4.3
The pith
The Graph Tsetlin Machine learns nested deep clauses from graph inputs via message passing to recognize sub-graph patterns with exponentially fewer rules.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through message passing, the GraphTM builds nested deep clauses to recognize sub-graph patterns with exponentially fewer clauses, increasing both interpretability and data utilization. This moves the Tsetlin Machine beyond fixed-length flat inputs to support versatile graph representations while preserving its core logical and efficient properties.
What carries the argument
Message passing on graphs to construct nested deep clauses that recognize sub-graph patterns.
Load-bearing premise
Message passing on graphs can reliably construct interpretable deep clauses while preserving the efficiency and accuracy advantages of standard Tsetlin Machines across diverse tasks.
What would settle it
An experiment on a new graph dataset in which the GraphTM requires as many or more clauses as a flat Tsetlin Machine, or loses accuracy relative to standard deep models without a clear gain in interpretability, would falsify the central claim.
Figures
read the original abstract
Pattern recognition with concise and flat AND-rules makes the Tsetlin Machine (TM) both interpretable and efficient, while the power of Tsetlin automata enables accuracy comparable to deep learning on an increasing number of datasets. We introduce the Graph Tsetlin Machine (GraphTM) for learning interpretable deep clauses from graph-structured input. Moving beyond flat, fixed-length input, the GraphTM gets more versatile, supporting sequences, grids, relations, and multimodality. Through message passing, the GraphTM builds nested deep clauses to recognize sub-graph patterns with exponentially fewer clauses, increasing both interpretability and data utilization. For image classification, GraphTM preserves interpretability and achieves 3.86%-points higher accuracy on CIFAR-10 than a convolutional TM. For tracking action coreference, faced with increasingly challenging tasks, GraphTM outperforms other reinforcement learning methods by up to 20.6%-points. In recommendation systems, it tolerates increasing noise to a great extent, similar to a GCN. Finally, for viral genome sequence data, GraphTM is competitive with BiLSTM-CNN and GCN accuracy-wise, training ~2.5x faster than GCN. The GraphTM's application to these varied fields demonstrates how graph representation learning and deep clauses bring new possibilities for TM learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Graph Tsetlin Machine (GraphTM), an extension of the Tsetlin Machine to graph-structured inputs. Using message passing, it constructs nested deep clauses for recognizing sub-graph patterns, claiming exponentially fewer clauses, enhanced interpretability, and improved data utilization. Empirical evaluations are provided on CIFAR-10 image classification, action coreference tracking, recommendation systems, and viral genome sequence data, reporting accuracy gains or competitiveness against baselines such as convolutional TMs, reinforcement learning methods, GCNs, and BiLSTM-CNNs.
Significance. Should the proposed mechanism for building interpretable nested clauses via message passing prove sound, this could represent a meaningful advance in interpretable AI by integrating graph-based learning with the logical, automata-driven framework of Tsetlin Machines. The reported performance improvements across diverse tasks highlight potential for more efficient and versatile logical reasoning systems.
major comments (2)
- The central claim that message passing builds 'nested deep clauses' to recognize sub-graph patterns with exponentially fewer clauses requires a concrete example or formal definition showing how messages are mapped to literals while preserving the conjunction structure and Tsetlin automata feedback. Without this, it is unclear if the resulting rules remain flat AND-rules or if interpretability guarantees carry over. This is load-bearing for the interpretability and efficiency advantages asserted in the abstract.
- Specific accuracy figures are reported (e.g., 3.86%-points improvement on CIFAR-10), but the manuscript appears to lack full details on experimental setups, error bars, hyperparameter choices, and statistical tests. This makes it difficult to assess the reliability of the performance claims and the assertion of better data utilization.
minor comments (2)
- Consider adding a short illustrative example of a deep clause or the clause reduction factor to make the 'exponentially fewer clauses' claim more tangible for readers.
- Ensure consistent notation for graph elements (nodes, edges, messages) and how they relate to Tsetlin literals.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will incorporate to strengthen the presentation.
read point-by-point responses
-
Referee: The central claim that message passing builds 'nested deep clauses' to recognize sub-graph patterns with exponentially fewer clauses requires a concrete example or formal definition showing how messages are mapped to literals while preserving the conjunction structure and Tsetlin automata feedback. Without this, it is unclear if the resulting rules remain flat AND-rules or if interpretability guarantees carry over. This is load-bearing for the interpretability and efficiency advantages asserted in the abstract.
Authors: We agree that a concrete example and explicit formal mapping are needed to make the mechanism fully transparent. In the revised manuscript we will add a new subsection that walks through a small illustrative graph, showing the exact mapping of each message to literals, the preservation of conjunction at every nesting level, and the extension of Tsetlin automata feedback to the resulting deep clauses. This addition will demonstrate that the rules remain logical AND-rules while the nesting enables sub-graph pattern recognition, thereby supporting the claimed interpretability and efficiency gains. revision: yes
-
Referee: Specific accuracy figures are reported (e.g., 3.86%-points improvement on CIFAR-10), but the manuscript appears to lack full details on experimental setups, error bars, hyperparameter choices, and statistical tests. This makes it difficult to assess the reliability of the performance claims and the assertion of better data utilization.
Authors: We acknowledge that the current experimental reporting is insufficient for full reproducibility and statistical assessment. In the revision we will expand the experimental section with complete hyperparameter tables for every dataset and baseline, error bars computed over multiple independent runs, and appropriate statistical tests (e.g., paired t-tests) for the reported accuracy differences, including the 3.86%-point CIFAR-10 gain and the data-utilization claims. revision: yes
Circularity Check
No circularity: claims rest on new graph message-passing construction and empirical results
full rationale
The paper defines GraphTM by extending standard TM clause learning with message passing on graphs to produce nested clauses. This is presented as a novel architectural choice rather than a derivation that reduces to prior fitted parameters or self-citations by construction. Central claims about exponential clause reduction and preserved interpretability are tied to experimental outcomes on CIFAR-10, action tracking, recommendations, and genomes, not to any self-referential equation or uniqueness theorem imported from the authors' prior work. No load-bearing step equates a 'prediction' to an input fit or renames an existing result. Self-citations to base TM are normal background and do not carry the new graph-specific results.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Through message passing, the GraphTM builds nested deep clauses to recognize sub-graph patterns with exponentially fewer clauses
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat induction and embed_strictMono unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Cj = C0j ∧ C1j ∧ ⋯ ∧ CDj … message submission … inbox I^d_q
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Interpretable rule-based architecture for gnss jamming signal classification
Sindhusha Jeeru, Lei Jiao, Per-Arne Andersen, and Ole-Christoffer Granmo. Interpretable rule-based architecture for gnss jamming signal classification. IEEE Sensors Journal, PP:1–1, 01 2025
work page 2025
-
[2]
Using Tsetlin Machine to discover interpretable rules in natural language processing applications
Rupsa Saha, Ole-Christoffer Granmo, and Morten Goodwin. Using Tsetlin Machine to discover interpretable rules in natural language processing applications. Expert Systems, 40(4):e12873,
-
[3]
_eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/exsy.12873
-
[4]
Enhancing interpretable clauses semantically using pretrained word representation
Rohan Kumar Yadav, Lei Jiao, Ole-Christoffer Granmo, and Morten Goodwin. Enhancing interpretable clauses semantically using pretrained word representation. In Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pages 265–274. Association for Computational Linguistics, November 2021
work page 2021
-
[5]
Tsetlin machine embedding: Representing words using logical expressions
Bimal Bhattarai, Ole-Christoffer Granmo, Lei Jiao, Rohan Yadav, and Jivitesh Sharma. Tsetlin machine embedding: Representing words using logical expressions. In Yvette Graham and Matthew Purver, editors,Findings of the Association for Computational Linguistics: EACL 2024, pages 1512–1522, St. Julian’s, Malta, March 2024. Association for Computational Linguistics
work page 2024
-
[6]
V ojtech Halenka, Ahmed K. Kadhim, Paul F. A. Clarke, Bimal Bhattarai, Rupsa Saha, Ole- Christoffer Granmo, Lei Jiao, and Per-Arne Andersen. Exploring effects of hyperdimensional vectors for tsetlin machines. In 2024 International Symposium on the Tsetlin Machine (ISTM), pages 1–8, 2024
work page 2024
-
[7]
Multimodal learning with graphs
Yasha Ektefaie, George Dasoulas, Ayush Noori, Maha Farhat, and Marinka Zitnik. Multimodal learning with graphs. Nature Machine Intelligence, 5(4):340–350, April 2023. Publisher: Nature Publishing Group
work page 2023
-
[8]
Ole-Christoffer Granmo. The tsetlin machine–a game theoretic bandit driven approach to optimal pattern recognition with propositional logic. arXiv preprint arXiv:1804.01508, 2018
-
[9]
The convolutional tsetlin machine
Ole-Christoffer Granmo, Sondre Glimsdal, Lei Jiao, Morten Goodwin, Christian W Omlin, and Geir Thore Berge. The convolutional tsetlin machine. arXiv preprint arXiv:1905.09688, 2019
-
[10]
Jivitesh Sharma, Rohan Yadav, Ole-Christoffer Granmo, and Lei Jiao. Drop Clause: Enhanc- ing Performance, Robustness and Pattern Recognition Capabilities of the Tsetlin Machine. Proceedings of the AAAI Conference on Artificial Intelligence, 37(11):13547–13555, Jun. 2023
work page 2023
-
[11]
Darshana Abeyrathna, Ole-Christoffer Granmo, Xuan Zhang, Lei Jiao, and Morten Goodwin
K. Darshana Abeyrathna, Ole-Christoffer Granmo, Xuan Zhang, Lei Jiao, and Morten Goodwin. The Regression Tsetlin Machine - A Novel Approach to Interpretable Non-Linear Regression. Philosophical Transactions of the Royal Society A, 378, 2020
work page 2020
-
[12]
Tsetlin machine for solving contextual bandit problems
Raihan Seraj, Jivitesh Sharma, and Ole-Christoffer Granmo. Tsetlin machine for solving contextual bandit problems. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems , volume 35, pages 30194–30205. Curran Associates, Inc., 2022
work page 2022
-
[13]
Dmitri A. Rachkovskij. Representation and processing of structures with binary sparse dis- tributed codes. IEEE transactions on Knowledge and Data Engineering, 13(2):261–276, 2001
work page 2001
-
[14]
Coalesced multi-output tsetlin machines with clause sharing
Sondre Glimsdal and Ole-Christoffer Granmo. Coalesced multi-output tsetlin machines with clause sharing. arXiv preprint arXiv:2108.07594, 2021
-
[15]
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. 10
work page 1998
-
[16]
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Han Xiao, Kashif Rasul, and Roland V ollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[17]
Learning multiple layers of features from tiny images
Alex Krizhevsky. Learning multiple layers of features from tiny images. University of Toronto, 2009
work page 2009
-
[18]
Smørvik, and Ole-Christoffer Granmo
Ylva Grønningsæter, Halvor S. Smørvik, and Ole-Christoffer Granmo. An optimized toolbox for advanced image processing with tsetlin machine composites. In 2024 International Symposium on the Tsetlin Machine (ISTM), pages 1–8, 2024
work page 2024
-
[19]
Geometric deep learning on graphs and manifolds using mixture model cnns
Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodola, Jan Svoboda, and Michael M Bronstein. Geometric deep learning on graphs and manifolds using mixture model cnns. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5115–5124, 2017
work page 2017
-
[20]
Convolutional neural networks on graphs with fast localized spectral filtering
Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems, 2016
work page 2016
-
[21]
Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y . Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT ’11, page 142–150, USA, 2011. Association for Computational Linguistics
work page 2011
-
[22]
Character-level convolutional networks for text classification
Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 1, NIPS’15, page 649–657, Cambridge, MA, USA, 2015. MIT Press
work page 2015
-
[23]
Annotating Expressions of Opinions and Emotions in Language
Janyce Wiebe, Theresa Wilson, and Claire Cardie. Annotating Expressions of Opinions and Emotions in Language. Language Resources and Evaluation, 39(2):165–210, May 2005
work page 2005
-
[24]
Simpler Context-Dependent Logical Forms via Model Projections
Reginald Long, Panupong Pasupat, and Percy Liang. Simpler context-dependent logical forms via model projections. arXiv preprint arXiv:1606.05378, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[25]
From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood
Kelvin Guu, Panupong Pasupat, Evan Zheran Liu, and Percy Liang. From language to pro- grams: Bridging reinforcement learning and maximum marginal likelihood. arXiv preprint arXiv:1704.07926, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
- [26]
-
[27]
Ncbi taxonomy: enhanced access via ncbi datasets
Eric Cox, Mirian Tsuchiya, Stacy Ciufo, John Torcivia, Robert Falk, W Anderson, J Holmes, Vichet Hem, Laurie Breen, Emily Davis, Anne Ketter, Peifen Zhang, Vladimir Soussov, Conrad Schoch, and Nuala O’Leary. Ncbi taxonomy: enhanced access via ncbi datasets. Nucleic acids research, 53, 10 2024
work page 2024
-
[28]
Include" is selected. The action becomes “Exclude
Lei Jiao, Xuan Zhang, Ole-Christoffer Granmo, and Kuruge Darshana Abeyrathna. On the Convergence of Tsetlin Machines for the XOR Operator.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):6072–6085, 2022. A Appendix / supplemental material A.1 Standard Tsetline Machine We briefly introduce here the learning entities in a Tsetlin Machin...
work page 2022
-
[29]
Claims Question: Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope? Answer: [Yes] Justification: The main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope, which is further validated by experiments in Section 3. Guidelines: • The answer NA mean...
-
[30]
Limitations Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: Limitations of the proposed architecture with regards to different datasets are discussed along side the experimental results. Computational efficiency is reported where applicable. Guidelines: • The answer NA means that the pape...
-
[31]
Guidelines: • The answer NA means that the paper does not include theoretical results
Theory Assumptions and Proofs Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof? 28 Answer: [NA] Justification: This work proposes a new Tsetlin Machine architecture and explores its applicability across various datasets. Guidelines: • The answer NA means that the paper does not in...
-
[32]
Guidelines: • The answer NA means that the paper does not include experiments
Experimental Result Reproducibility Question: Does the paper fully disclose all the information needed to reproduce the main ex- perimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)? Answer: [Yes] Justification: Code will be provided as p...
-
[33]
Guidelines: • The answer NA means that paper does not include experiments requiring code
Open access to data and code 29 Question: Does the paper provide open access to the data and code, with sufficient instruc- tions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Anonymized code for experiments will be provided in the supplemental material, and details for open acce...
-
[34]
Guidelines: • The answer NA means that the paper does not include experiments
Experimental Setting/Details Question: Does the paper specify all the training and test details (e.g., data splits, hyper- parameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [Yes] Justification: Training and test details are available via the appendix and the supplemental material. Guidelines: • The ans...
-
[35]
Experimental run wise details are added in supplemental material
Experiment Statistical Significance Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments? Answer: [Yes] Justification: Standard deviation are reported alongside obtained accuracies for most experi- ments. Experimental run wise details are added in s...
-
[36]
Guidelines: • The answer NA means that the paper does not include experiments
Experiments Compute Resources Question: For each experiment, does the paper provide sufficient information on the com- puter resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: Compute resources are elaborated in Section 3. Guidelines: • The answer NA means that the paper does no...
-
[37]
Guidelines: • The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics
Code Of Ethics Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics https://neurips.cc/public/EthicsGuidelines? Answer: [Yes] Justification: This work maintains strict adherence to the NeurIPS Code of Ethics. Guidelines: • The answer NA means that the authors have not reviewed the NeurIPS Code of Et...
-
[38]
Guidelines: • The answer NA means that there is no societal impact of the work performed
Broader Impacts Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed? Answer: [NA] Justification: This work proposes a novel architecture for the Tsetlin Machine and its applicability on smaller datsets, thus wider applications and further societal impact is beyond the scope of this ...
-
[39]
Guidelines: • The answer NA means that the paper poses no such risks
Safeguards Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)? Answer: [NA] Justification: The data and models described in this paper pose no such risk. Guidelines: • The answer NA mean...
-
[40]
Guidelines: • The answer NA means that the paper does not use existing assets
Licenses for existing assets Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected? Answer: [Yes] Justification: All licences and credits have been properly acknowledged in the references. Guidelines: • The an...
-
[41]
Guidelines: • The answer NA means that the paper does not release new assets
New Assets Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets? Answer: [Yes] Justification: Applicable assets are made available via supplemental material. Guidelines: • The answer NA means that the paper does not release new assets. • Researchers should communicate the details of the da...
-
[42]
Crowdsourcing and Research with Human Subjects Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)? Answer: [NA] Justification: Not applicable as this research does not use human su...
-
[43]
Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.