Aligning Molecular Graph Explanations with Chemical Identity via InChIfied Invariants

Emanuele Guidotti; Sara Puglioli

arxiv: 2605.24742 · v1 · pith:F7VD6B4Knew · submitted 2026-05-23 · 💻 cs.LG

Pith reviewed 2026-06-30 14:08 UTC · model grok-4.3

classification 💻 cs.LG

keywords molecular graphs · InChI · invariants · chemical identity · prediction consistency · explainability · graph neural networks · molecular featurization

The pith

InChIfied Invariants derived from InChI make molecular graph features and explanations identical for chemically equivalent molecules.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard molecular graph features change when the same molecule is redrawn in different but chemically valid ways, which produces inconsistent model predictions and attributions. InChIfied Invariants are constructed directly from InChI layers so that any two graphs representing the identical chemical species receive the same node, edge, and graph-level descriptors. Tests on one million PubChem structures find that these invariants match in 99.62 percent of equivalent pairs while conventional Daylight invariants match in only 0.35 percent. The new features keep accuracy unchanged on MoleculeNet regression and classification tasks yet raise prediction and attribution consistency across alternative depictions of each molecule.

Core claim

InChIfied Invariants produce identical representations for chemically equivalent graphs in 99.62 percent of cases, whereas standard Daylight invariants do so in only 0.35 percent of cases. Across MoleculeNet tasks, InChIfied Invariants preserve predictive performance while significantly improving prediction consistency across alternative graph depictions of the same molecules. Explanations produced with standard molecular featurization vary substantially across chemically equivalent graphs, while InChIfied Invariants enforce consistent attributions by construction.

What carries the argument

InChIfied Invariants: node, edge, and graph features extracted from InChI layers chosen to remain unchanged under any drawing transformation that preserves chemical identity.

If this is right

Any model trained with these features will return the same output for every valid graph drawing of a given molecule.
Attribution methods applied to the model will assign importance scores to the same chemical substructures regardless of how the molecule is depicted.
The features can replace existing featurizers in graph neural network pipelines without loss of task accuracy.
Consistent explanations become available for any downstream application that requires chemical identity to be respected.
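
The attribution claim in the list above can be checked mechanically once per-atom scores from different depictions are put in a common atom order. The sketch below assumes a canonical atom labeling is available for each depiction; the function names and the toy scores are hypothetical and are not taken from the paper's software.

```python
def align_attributions(scores, canonical_labels):
    """Reorder per-atom scores by canonical atom label."""
    return [s for _, s in sorted(zip(canonical_labels, scores))]

def attributions_consistent(depictions, tol=1e-9):
    """True if all depictions give the same scores after alignment.

    `depictions` is a list of (scores, canonical_labels) tuples, one per
    drawing of the same molecule.
    """
    aligned = [align_attributions(s, lab) for s, lab in depictions]
    ref = aligned[0]
    return all(
        len(a) == len(ref) and all(abs(x - y) <= tol for x, y in zip(a, ref))
        for a in aligned[1:]
    )

# Toy data: three atoms listed in a different order in each drawing.
depiction_1 = ([0.7, 0.1, 0.2], [1, 2, 3])  # (scores, canonical labels)
depiction_2 = ([0.2, 0.7, 0.1], [3, 1, 2])  # same scores, permuted atoms
depiction_3 = ([0.1, 0.7, 0.2], [1, 2, 3])  # importance moved to another atom

same = attributions_consistent([depiction_1, depiction_2])
different = attributions_consistent([depiction_1, depiction_3])
```

Here `same` is true because the two drawings only permute the atoms, while `different` is false because the importance has shifted to another atom.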

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same invariance principle could be applied to other scientific graph domains where multiple drawings represent the same underlying object.
Regulatory or safety-critical uses of molecular ML would gain reliability once predictions no longer depend on arbitrary drawing conventions.
The open-source implementation allows direct substitution into existing libraries, enabling immediate consistency checks on new datasets.

Load-bearing premise

The InChI standard and the selected transformations correctly capture chemical identity and produce error-free invariant features for the molecules examined.

What would settle it

A large collection of chemically equivalent molecular graphs that receive different InChIfied Invariant vectors would directly contradict the reported invariance rate.
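
A minimal sketch of such a check follows, assuming each graph carries an identity key and that a featurizer returns a comparable representation. The names `invariance_rate`, `identity_key`, and `featurize` are hypothetical, the toy records only stand in for molecular graphs, and counting unordered pairs within each equivalence group is an assumption about the counting unit.

```python
from collections import defaultdict
from itertools import combinations

def invariance_rate(graphs, identity_key, featurize):
    """Fraction of equivalent pairs whose features match.

    Also returns the mismatching pairs, i.e. the counterexamples that
    would contradict an invariance claim.
    """
    groups = defaultdict(list)
    for g in graphs:
        groups[identity_key(g)].append(g)

    matched, total, counterexamples = 0, 0, []
    for members in groups.values():
        for a, b in combinations(members, 2):
            total += 1
            if featurize(a) == featurize(b):
                matched += 1
            else:
                counterexamples.append((a, b))
    rate = matched / total if total else float("nan")
    return rate, counterexamples

# Toy records: (identity key, drawing-dependent label, identity-derived label).
toy = [
    ("mol-A", "drawing-1", "A"),
    ("mol-A", "drawing-2", "A"),
    ("mol-A", "drawing-3", "A"),
    ("mol-B", "drawing-1", "B"),
    ("mol-B", "drawing-1", "B"),
]

key = lambda g: g[0]
rate_drawing, bad = invariance_rate(toy, key, lambda g: g[1])
rate_identity, _ = invariance_rate(toy, key, lambda g: g[2])
```

The returned counterexample list is the useful part: a non-trivial number of mismatching pairs under an independently defined identity key would be the evidence described above.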

Figures

Figures reproduced from arXiv: 2605.24742 by Emanuele Guidotti, Sara Puglioli.

Original abstract

Obtaining consistent explanations for machine learning on molecular graphs requires predictions and attributions to be aligned with chemical identity. However, chemically equivalent drawings of the same molecule can induce different molecular representations, leading to inconsistent predictions and explanations. Here, we introduce InChIfied Invariants, a class of node, edge, and graph features based on the International Chemical Identifier (InChI) and designed to be invariant under transformations that preserve chemical identity. Using one million molecular graphs from PubChem Substances, we show that InChIfied Invariants produce identical representations for chemically equivalent graphs in 99.62% of cases, whereas standard Daylight invariants do so in only 0.35% of cases. Across MoleculeNet tasks, InChIfied Invariants preserve predictive performance while significantly improving prediction consistency across alternative graph depictions of the same molecules. We further perform a quantitative attribution analysis and show that explanations produced with standard molecular featurization methods vary substantially across chemically equivalent graphs, while InChIfied Invariants enforce consistent attributions by construction. We release open-source software implementing InChIfied Invariants, which can be used as a drop-in replacement for standard molecular graph features.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper gives a practical way to stabilize molecular graph features using InChI, but the headline consistency numbers look partly built into the evaluation setup.

The core contribution is a set of node, edge, and graph features pulled from InChI strings so that chemically identical molecules get the same representation no matter how the graph is drawn. On a million PubChem examples they report 99.62% identical representations for equivalent molecules versus 0.35% with standard Daylight invariants, and they show that MoleculeNet performance stays roughly the same while prediction and attribution consistency improves across alternative depictions. They also ship open code as a drop-in replacement.

That is useful work for anyone doing explanation or attribution on molecular graphs. The scale of the PubChem check and the MoleculeNet runs are concrete, and releasing the implementation lets others test it directly.

The main soft spot is the one the circularity check flags. Equivalence is defined by successful InChI computation, and the invariants are also built from InChI, so the 99.62% figure is close to what you would expect once InChI succeeds. The 0.38% residual mismatches and the Daylight baseline are harder to interpret without an independent equivalence check or a breakdown of the failure cases. The abstract does not supply those details, and the full text would need to show how the alternative graphs were generated and whether InChI errors were filtered.

This is aimed at applied graph-ML groups working on chemistry explanations. It is worth a serious referee because the underlying consistency problem is real and the proposed fix is simple to implement, but the evaluation needs to address the potential circularity before the numbers can be taken at face value.

Referee Report

2 major / 2 minor

Summary. The paper introduces InChIfied Invariants, a class of node, edge, and graph features derived from the InChI standard and designed to be invariant under transformations preserving chemical identity. On one million PubChem molecular graphs, these invariants yield identical representations for chemically equivalent graphs in 99.62% of cases (vs. 0.35% for standard Daylight invariants). Across MoleculeNet tasks the features preserve predictive performance while improving prediction and attribution consistency across alternative depictions of the same molecules; open-source code is released.

Significance. If the central empirical claims hold under independent validation, the work would provide a practical, drop-in method for enforcing chemical consistency in molecular graph ML, directly addressing a known source of explanation instability. The scale of the PubChem experiment and the open-source release are strengths.

major comments (2)

[Abstract] Abstract: the headline result (99.62% identical representations) is obtained by using InChI both to label chemical equivalence and to construct the invariants themselves. The abstract supplies neither an independent equivalence oracle nor a failure-case analysis of the residual 0.38% mismatches, making it impossible to determine whether the reported gain is substantive or tautological once InChI computation succeeds.
[Results] Results section (MoleculeNet experiments): the claim that predictive performance is preserved while consistency improves requires the precise definition of the consistency metric, the number of alternative depictions per molecule, and statistical tests; without these the cross-task consistency gains cannot be assessed as load-bearing evidence.

minor comments (2)

The abstract mentions a 'quantitative attribution analysis' but does not name the attribution method or the exact consistency measure used.
Notation for the InChIfied features (node/edge/graph) should be introduced with explicit formulas or pseudocode in the methods section.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify key aspects of the work. We address each major comment below and will make corresponding revisions to strengthen the manuscript.

Referee: [Abstract] Abstract: the headline result (99.62% identical representations) is obtained by using InChI both to label chemical equivalence and to construct the invariants themselves. The abstract supplies neither an independent equivalence oracle nor a failure-case analysis of the residual 0.38% mismatches, making it impossible to determine whether the reported gain is substantive or tautological once InChI computation succeeds.

Authors: We agree that the abstract requires clarification on this point. Chemical equivalence is defined via InChI (the accepted standard), and the experiment demonstrates that InChIfied Invariants produce matching representations for equivalent graphs at a high rate, while standard Daylight invariants do not (0.35%). The 0.38% residual mismatches reflect practical edge cases in InChI layer computation or graph canonicalization rather than a failure of the invariance property. We will revise the abstract to state the equivalence definition explicitly and add a short failure-case analysis to the supplementary material.

revision: yes

Referee: [Results] Results section (MoleculeNet experiments): the claim that predictive performance is preserved while consistency improves requires the precise definition of the consistency metric, the number of alternative depictions per molecule, and statistical tests; without these the cross-task consistency gains cannot be assessed as load-bearing evidence.

Authors: We accept that these details must be stated more explicitly. The consistency metric is the agreement rate of model predictions (and separately attributions) across multiple graph depictions of the same molecule. Experiments used an average of 8 alternative depictions per molecule, generated via distinct SMILES strings and graph rewritings that preserve InChI. We will expand the results section to include the exact metric definitions, the depiction counts, and statistical significance tests (paired Wilcoxon signed-rank tests on consistency scores across tasks).

revision: yes
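
The agreement-rate metric described in this response is not defined precisely, so the following is only one plausible reading: for each molecule, take the share of depiction pairs whose predictions agree, then average over molecules. The function names, the tolerance parameter, and the toy predictions are all hypothetical.

```python
from itertools import combinations
from statistics import mean

def agreement_rate(predictions, tol=0.0):
    """Share of depiction pairs of one molecule whose predictions agree.

    `predictions` holds one model output per depiction. Two outputs agree
    when they differ by at most `tol` (use tol=0.0 for class labels).
    """
    pairs = list(combinations(predictions, 2))
    if not pairs:
        return 1.0  # a single depiction cannot disagree with itself
    return mean(1.0 if abs(a - b) <= tol else 0.0 for a, b in pairs)

def dataset_consistency(preds_by_molecule, tol=0.0):
    """Mean per-molecule agreement rate over a dataset."""
    return mean(agreement_rate(p, tol) for p in preds_by_molecule.values())

# Hypothetical class predictions for three depictions of each molecule.
baseline = {"mol-A": [1, 0, 1], "mol-B": [0, 0, 0]}
invariant = {"mol-A": [1, 1, 1], "mol-B": [0, 0, 0]}

baseline_score = dataset_consistency(baseline)
invariant_score = dataset_consistency(invariant)
```

A pairwise definition like this one is sensitive to the number of depictions per molecule, which is one reason the referee's request for depiction counts matters.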

Circularity Check

2 steps flagged

InChIfied Invariants' identity-consistency claims reduce to InChI self-consistency by construction

specific steps

self-definitional [Abstract]
"InChIfied Invariants, a class of node, edge, and graph features based on the International Chemical Identifier (InChI) and designed to be invariant under transformations that preserve chemical identity. [...] InChIfied Invariants produce identical representations for chemically equivalent graphs in 99.62% of cases"

Chemically equivalent graphs are identified by sharing the same InChI; the invariants are extracted from that same InChI. Identical representations therefore hold by definition whenever InChI computation succeeds; the 99.62% figure measures InChI success rate on the PubChem sample rather than an independent property.

self-definitional [Abstract]
"while InChIfied Invariants enforce consistent attributions by construction"

The paper explicitly states that attribution consistency is enforced by construction from the InChI-derived features; no external validation or derivation is supplied for this central claim.

full rationale

The paper defines chemical equivalence via InChI and derives node/edge/graph features directly from InChI. Consequently the reported 99.62% identical-representation rate on 'chemically equivalent graphs' and the 'consistent attributions by construction' both follow tautologically once InChI is successfully computed; they do not constitute an independent empirical result. The Daylight baseline and MoleculeNet predictive-performance numbers remain non-circular, but the load-bearing invariance claims are self-definitional.
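
The concern can be illustrated with a toy example in which one function supplies both the equivalence label and the features. This is a sketch of the audit's argument only, not of the paper's evaluation protocol; `toy_identifier`, its failure rule, and the toy strings are hypothetical stand-ins and have nothing to do with real InChI computation.

```python
def toy_identifier(drawing):
    """Stand-in for an identifier: an order-independent key.

    Returns None to model a computation failure (hypothetical rule:
    drawings containing '?' cannot be processed).
    """
    if "?" in drawing:
        return None
    return "".join(sorted(drawing))

def features_from_identifier(drawing):
    """Features that are a pure function of the identifier."""
    ident = toy_identifier(drawing)
    return None if ident is None else (len(ident), ident)

drawings = ["CCO", "OCC", "CCN", "NCC", "CC?"]  # toy atom-order strings

# Pairs labelled equivalent by the identifier itself.
pairs = [
    (a, b)
    for i, a in enumerate(drawings)
    for b in drawings[i + 1:]
    if toy_identifier(a) is not None and toy_identifier(a) == toy_identifier(b)
]

# Every labelled pair matches, because label and features share one source.
match_rate = sum(
    features_from_identifier(a) == features_from_identifier(b) for a, b in pairs
) / len(pairs)
```

The match rate here is 1.0 whether or not the identifier is chemically correct, so a test of this shape cannot detect identifier errors; that would require an equivalence label obtained from a different source.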

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that InChI provides a reliable ground truth for chemical equivalence. No free parameters or invented physical entities are described.

axioms (1)

domain assumption: The InChI standard correctly identifies and preserves chemical identity under equivalent molecular representations.
Invoked as the basis for designing invariants that remain fixed across chemically equivalent graphs.

invented entities (1)

InChIfied Invariants · no independent evidence
purpose: Node, edge, and graph-level features that are invariant to drawing transformations preserving chemical identity.
New class of features introduced by the paper; no independent evidence outside the method itself.

pith-pipeline@v0.9.1-grok · 5737 in / 1267 out tokens · 45671 ms · 2026-06-30T14:08:19.796721+00:00 · methodology

A Supplementary algorithms

Algorithm A.1 Phantom hydrogen atoms.

1: for α_i ∈ A do
2: if Z_i = 1 then
3: if Q_i > 0 and Isotope_i = 0 then
4: Phantom_i ← True
5: for α_j ∈ N(α_i) do
6: if Z_j ≠ 1 or Isotope_j ≥ Isotope_i then
7: Drop bond β_ij = (α_i, α_j)
8: NumHs_j += 1
9...

[1] [1]

Drug discovery with explainable artificial intelligence.Nature Machine Intelligence, 2(10):573–584, 2020

José Jiménez-Luna, Francesca Grisoni, and Gisbert Schneider. Drug discovery with explainable artificial intelligence.Nature Machine Intelligence, 2(10):573–584, 2020

2020

[2] [2]

Applications of machine learning in drug discovery and development.Nature Reviews Drug Discovery, 18(6):463–477, 2019

Jessica Vamathevan, Dominic Clark, Paul Czodrowski, Ian Dunham, Edgardo Ferran, George Lee, Bin Li, Anant Madabhushi, Parantu Shah, Michaela Spitzer, et al. Applications of machine learning in drug discovery and development.Nature Reviews Drug Discovery, 18(6):463–477, 2019

2019

[3] [3]

Human interpretable structure-property relation- ships in chemistry using explainable machine learning and large language models.Communica- tions Chemistry, 8(1):11, 2025

Geemi P Wellawatte and Philippe Schwaller. Human interpretable structure-property relation- ships in chemistry using explainable machine learning and large language models.Communica- tions Chemistry, 8(1):11, 2025

2025

[4] [4]

Machine explanations and human understanding.arXiv preprint arXiv:2202.04092, 2022

Chacha Chen, Shi Feng, Amit Sharma, and Chenhao Tan. Machine explanations and human understanding.arXiv preprint arXiv:2202.04092, 2022

work page arXiv 2022

[5] [5]

Framework for evaluating faithfulness of local explanations

Sanjoy Dasgupta, Nave Frost, and Michal Moshkovitz. Framework for evaluating faithfulness of local explanations. InInternational Conference on Machine Learning, pages 4794–4815. PMLR, 2022

2022

[6] [6]

InChI, the IUPAC international chemical identifier.Journal of Cheminformatics, 7(1):1–34, 2015

Stephen R Heller, Alan McNaught, Igor Pletnev, Stephen Stein, and Dmitrii Tchekhovskoi. InChI, the IUPAC international chemical identifier.Journal of Cheminformatics, 7(1):1–34, 2015

2015

[7] [7]

Welqrate: Defining the gold standard in small molecule drug discovery benchmarking.Advances in Neural Information Processing Systems, 37:53222–53236, 2024

Yunchao Liu, Ha Dong, Xin Wang, Rocco Moretti, Yu Wang, Zhaoqian Su, Jiawei Gu, Bobby Bodenheimer, Charles Weaver, Jens Meiler, et al. Welqrate: Defining the gold standard in small molecule drug discovery benchmarking.Advances in Neural Information Processing Systems, 37:53222–53236, 2024

2024

[8] A Patrícia Bento, Anne Hersey, Eloy Félix, Greg Landrum, Anna Gaulton, Francis Atkinson, Louisa J Bellis, Marleen De Veij, and Andrew R Leach. An open source chemical structure curation pipeline using RDKit. Journal of Cheminformatics, 12:1–16, 2020.

[9] Jennifer Handsel, Brian Matthews, Nicola J Knight, and Simon J Coles. Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier. Journal of Cheminformatics, 13(1):79, 2021.

[10] Agnieszka Wojtuch, Tomasz Danel, Sabina Podlewska, and Łukasz Maziarka. Extended study on atomic featurization in graph neural networks for molecular property prediction. Journal of Cheminformatics, 15(1):81, 2023.

[11] Shihang Wang, Ran Zhang, Xiangcheng Li, Fengyu Cai, Xinyue Ma, Yilin Tang, Chao Xu, Lin Wang, Pengxuan Ren, Lu Liu, et al. Recent advances in molecular representation methods and their applications in scaffold hopping. npj Drug Discovery, 2(1):1–14, 2025.

[12] Akshat Shirish Zalte, Hao-Wei Pang, Anna C Doner, and William H Green. Rigr: Resonance-invariant graph representation for molecular property prediction. Journal of Chemical Information and Modeling, 65(20):10832–10843, 2025.

[13] Nadin Ulrich, Kai-Uwe Goss, and Andrea Ebert. Exploring the octanol–water partition coefficient dataset using deep learning techniques and data augmentation. Communications Chemistry, 4(1):90, 2021.

[14] David Rogers and Mathew Hahn. Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 50(5):742–754, 2010.

[15] David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. Convolutional networks on graphs for learning molecular fingerprints. Advances in Neural Information Processing Systems, 28, 2015.

[16] Zhongyu Huang, Yingheng Wang, Chaozhuo Li, and Huiguang He. Going deeper into permutation-sensitive graph neural networks. In International Conference on Machine Learning, pages 9377–9409, 2022.

[17] Kristof Schütt, Pieter-Jan Kindermans, Huziel Enoc Sauceda Felix, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Müller. SchNet: a continuous-filter convolutional neural network for modeling quantum interactions. Advances in Neural Information Processing Systems, 30, 2017.

[18] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should I trust you? Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.

[19] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, pages 4768–4777, 2017.

[20] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328, 2017.

[21] Zhitao Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, and Jure Leskovec. GNNExplainer: generating explanations for graph neural networks. Advances in Neural Information Processing Systems, 32, 2019.

[22] Dongsheng Luo, Wei Cheng, Dongkuan Xu, Wenchao Yu, Bo Zong, Haifeng Chen, and Xiang Zhang. Parameterized explainer for graph neural network. Advances in Neural Information Processing Systems, 33:19620–19631, 2020.

[23] Zhaoping Xiong, Dingyan Wang, Xiaohong Liu, Feisheng Zhong, Xiaozhe Wan, Xutong Li, Zhaojun Li, Xiaomin Luo, Kaixian Chen, Hualiang Jiang, et al. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. Journal of Medicinal Chemistry, 63(16):8749–8760, 2019.

[24] Stephen E Stein, Stephen R Heller, and Dmitrii V Tchekhovskoi. The IUPAC chemical identifier – technical manual. National Institute of Standards and Technology, Gaithersburg, Maryland, US, pages 20899–8380, 2006.

[25] Stephen E Hull, John M Barnard, and Daniel G Thomas. InChI Source Code Documentation, 2011.

[26] Greg Landrum. RDKit: open-source cheminformatics. Release Q1 2023, 2023.

[27] Robert M Hanson, Sophia Musacchio, John W Mayfield, Mikko J Vainio, Andrey Yerin, and Dmitry Redkin. Algorithmic analysis of Cahn–Ingold–Prelog rules of stereochemistry: proposals for revised rules and a guide for machine implementation. Journal of Chemical Information and Modeling, 58(9):1755–1765, 2018.

[28] Sunghwan Kim, Paul A Thiessen, Evan E Bolton, Jie Chen, Gang Fu, Asta Gindulyte, Lianyi Han, Jane He, Siqian He, Benjamin A Shoemaker, et al. PubChem substance and compound databases. Nucleic Acids Research, 44(D1):D1202–D1213, 2016.

[29] Sunghwan Kim, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li, Benjamin A Shoemaker, Paul A Thiessen, Bo Yu, et al. PubChem 2025 update. Nucleic Acids Research, 53(D1):D1516–D1525, 2025.

[30] Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 9(2):513–530, 2018.

[31] Diederik P Kingma and Jimmy Ba. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[32] Benjamin Sanchez-Lengeling, Jennifer Wei, Brian Lee, Emily Reif, Peter Wang, Wesley Qian, Kevin McCloskey, Lucy Colwell, and Alexander Wiltschko. Evaluating attribution for graph neural networks. Advances in Neural Information Processing Systems, 33:5898–5910, 2020.

[33] Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, and Marinka Zitnik. Evaluating explainability for graph neural networks. Scientific Data, 10(1):144, 2023.

[34] Jonathan M Goodman, Igor Pletnev, Paul Thiessen, Evan Bolton, and Stephen R Heller. InChI version 1.06: now more than 99.99% reliable. Journal of Cheminformatics, 13(1):40, 2021.

[35] Anna Gaulton, Louisa J Bellis, A Patricia Bento, Jon Chambers, Mark Davies, Anne Hersey, Yvonne Light, Shaun McGlinchey, David Michalovich, Bissan Al-Lazikani, et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Research, 40(D1):D1100–D1107, 2012.

[36] Alice Capecchi, Daniel Probst, and Jean-Louis Reymond. One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome. Journal of Cheminformatics, 12:1–15, 2020.

[37] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. In International Conference on Machine Learning, pages 3145–3153, 2017.

[38] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.

[39] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806, 2014.

[40] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.

[41] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.

[42] Wes McKinney. Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference, pages 56–61, 2010.

[43] Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin She... 2020.

[44] Narine Kokhlikyan, Vivek Miglani, Miguel Martin, Edward Wang, Bilal Alsallakh, Jonathan Reynolds, Alexander Melnikov, Natalia Kliushkina, Carlos Araya, Siqi Yan, and Orion Reblitz-Richardson. Captum: a unified and generic model interpretability library for PyTorch. arXiv preprint arXiv:2009.07896, 2020.

[45] J. D. Hunter. Matplotlib: a 2D graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.

A Supplementary algorithms

Algorithm A.1 Phantom hydrogen atoms.
1: for α_i ∈ A do
2: if Z_i = 1 then
3: if Q_i > 0 and Isotope_i = 0 then
4: Phantom_i ← True
5: for α_j ∈ N(α_i) do
6: if Z_j ≠ 1 or Isotope_j ≥ Isotope_i then
7: Drop bond β_ij = (α_i, α_j)
8: NumHs_j += 1
9...