ExPath: Targeted Pathway Inference for Biological Knowledge Bases via Graph Learning and Explanation
Pith reviewed 2026-05-23 02:23 UTC · model grok-4.3
The pith
ExPath infers targeted pathways in biological networks by identifying the links that most drive graph classification of experimental data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ExPath is a subgraph inference framework that classifies graphs representing bio-networks after integrating experimental molecular data, possibly through foundation models, and designates the links contributing most to those classifications as targeted pathways. Across evaluations on 301 bio-networks the inferred pathways achieve up to 4.5 times higher Fidelity+ and 14 times lower Fidelity- than explainer baselines while preserving signaling chains up to 4 times longer.
What carries the argument
The ExPath subgraph inference framework that classifies bio-networks from experimental data and extracts pathways according to their contribution to the classification outcome.
If this is right
- Targeted pathways can be recovered directly from classification contributions without additional specialized analyses.
- Experimental molecular data can be integrated into pathway inference through existing biological foundation models.
- New ML-oriented metrics allow quantitative comparison of pathway necessity and sufficiency.
- Inferred pathways maintain longer intact signaling chains than those from baseline methods.
Where Pith is reading between the lines
- The classification-contribution principle could be tested on non-biological graphs to extract important substructures in other domains.
- Repeated application as new experimental data arrives could keep pathway databases current without manual curation.
- The fidelity metrics introduced here might serve as proxies for biological relevance in settings where ground-truth pathways are scarce.
Load-bearing premise
Links that contribute more to the classification decision correspond to biologically targeted pathways.
What would settle it
Independent wet-lab tests on a new set of bio-networks that measure whether the pathways ranked highest by ExPath match the actual experimentally confirmed targets better than pathways ranked by existing explainers.
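One way such a comparison could be scored is precision at k: the fraction of the k highest-ranked links that appear among the experimentally confirmed targets. The sketch below is illustrative only. The function name `precision_at_k`, the edge labels, and both rankings are hypothetical and do not come from the paper.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def precision_at_k(ranked_edges: List[Edge], confirmed: Set[Edge], k: int) -> float:
    """Fraction of the top-k ranked edges that are in the confirmed set."""
    top = ranked_edges[:k]
    if not top:
        return 0.0
    return sum(1 for edge in top if edge in confirmed) / len(top)


# Hypothetical data: placeholder node names, not real molecules.
confirmed_targets = {("A", "B"), ("C", "D")}
ranking_method_1 = [("A", "B"), ("C", "D"), ("B", "C"), ("D", "E")]
ranking_method_2 = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

p1 = precision_at_k(ranking_method_1, confirmed_targets, k=2)
p2 = precision_at_k(ranking_method_2, confirmed_targets, k=2)
print(p1, p2)
```

A method whose top-ranked links agree more often with confirmed targets scores higher. The choice of k and of the ground-truth set would have to be fixed before the wet-lab results are known.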
Original abstract
Retrieving targeted pathways in biological knowledge bases, particularly when incorporating wet-lab experimental data, remains a challenging task and often requires downstream analyses and specialized expertise. In this paper, we frame this challenge as a solvable graph learning and explaining task and propose a novel subgraph inference framework, ExPath, that explicitly integrates experimental data to classify various graphs (bio-networks) in biological databases. The links (representing pathways) that contribute more to classification can be considered as targeted pathways. Our framework can seamlessly integrate biological foundation models to encode the experimental molecular data. We propose ML-oriented biological evaluations and a new metric. Experiments involving 301 bio-network evaluations demonstrate that pathways inferred by ExPath are biologically meaningful, achieving up to 4.5x higher Fidelity+ (necessity) and 14x lower Fidelity- (sufficiency) than explainer baselines, while preserving signaling chains up to 4x longer.
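The abstract names Fidelity+ and Fidelity- without giving formulas. A minimal sketch follows, assuming the probability-based definitions common in the GNN explainability literature: Fidelity+ is the drop in predicted probability when the explanation edges are removed, and Fidelity- is the drop when only the explanation edges are kept. The paper's exact definitions may differ. The classifier `toy_predict` and its weights are hypothetical stand-ins for a trained graph model.

```python
import math
from typing import Callable, FrozenSet, Tuple

Edge = Tuple[str, str]
Predictor = Callable[[FrozenSet[Edge]], float]


def fidelity_plus(predict: Predictor, edges: FrozenSet[Edge],
                  explanation: FrozenSet[Edge]) -> float:
    """Probability drop after removing the explanation edges (necessity)."""
    return predict(edges) - predict(edges - explanation)


def fidelity_minus(predict: Predictor, edges: FrozenSet[Edge],
                   explanation: FrozenSet[Edge]) -> float:
    """Probability drop after keeping only the explanation edges (sufficiency)."""
    return predict(edges) - predict(edges & explanation)


# Hypothetical toy classifier: a logistic function of summed edge weights.
TOY_WEIGHTS = {("A", "B"): 2.0, ("B", "C"): 2.0, ("C", "D"): 0.1, ("A", "D"): 0.1}


def toy_predict(edges: FrozenSet[Edge]) -> float:
    score = sum(TOY_WEIGHTS.get(edge, 0.0) for edge in edges)
    return 1.0 / (1.0 + math.exp(-(score - 2.0)))


GRAPH = frozenset(TOY_WEIGHTS)
STRONG = frozenset({("A", "B"), ("B", "C")})  # edges the toy model relies on
WEAK = frozenset({("C", "D"), ("A", "D")})    # edges the toy model barely uses

for name, expl in (("strong", STRONG), ("weak", WEAK)):
    print(name, fidelity_plus(toy_predict, GRAPH, expl),
          fidelity_minus(toy_predict, GRAPH, expl))
```

Under these definitions a good explanation has high Fidelity+ and low Fidelity-, which matches the direction of the gains reported in the abstract.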
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ExPath, a subgraph inference framework that frames targeted pathway retrieval in biological knowledge bases as a graph classification task. Experimental molecular data is integrated into bio-networks, which are then classified using graph learning (potentially with biological foundation models); explanation techniques identify high-contribution links as 'targeted pathways.' On 301 bio-networks, it reports up to 4.5x higher Fidelity+ (necessity) and 14x lower Fidelity- (sufficiency) than explainer baselines, plus preservation of signaling chains up to 4x longer, while introducing ML-oriented biological evaluations and a new metric.
Significance. If the central premise holds and the quantitative gains are reproducible with proper validation, the work could offer a scalable, automated method for inferring biologically relevant pathways directly from experimental data embedded in knowledge graphs, reducing reliance on manual downstream analyses. The integration of foundation models and the focus on fidelity metrics tailored to pathway necessity/sufficiency are potentially useful extensions of graph explanation techniques to systems biology.
major comments (3)
- [Abstract, Experiments] Abstract and § on methods/experiments: no details are provided on model architecture (e.g., GNN type, layers), training procedure, baseline implementations, how the 301 networks were selected, or statistical significance testing. These omissions make the reported 4.5x/14x fidelity gains impossible to assess or reproduce and are load-bearing for the performance claims.
- [Abstract, Evaluation] Evaluation and premise in abstract: the claim that 'links that contribute more to classification can be considered as targeted pathways' and that results are 'biologically meaningful' rests on fidelity metrics alone. No independent validation against curated pathway databases, known signaling chains, or perturbation data is described; if classification contribution does not correlate with biological targeting, the fidelity improvements remain internal to the ML task and do not support the inference claim.
- [Abstract, Evaluation] The new metric and 'ML-oriented biological evaluations' are referenced but not defined or shown to differ substantively from standard explainer fidelity; without explicit definitions or ablation showing they capture biological relevance beyond classification contribution, the biological utility claim cannot be evaluated.
minor comments (1)
- [Abstract] The abstract states quantitative improvements but provides no equations, pseudocode, or high-level architecture diagram, making the framework hard to follow at a glance.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment point-by-point below with clarifications from the manuscript and indicate revisions where appropriate to improve reproducibility and clarity.
- Referee: [Abstract, Experiments] Abstract and § on methods/experiments: no details are provided on model architecture (e.g., GNN type, layers), training procedure, baseline implementations, how the 301 networks were selected, or statistical significance testing. These omissions make the reported 4.5x/14x fidelity gains impossible to assess or reproduce and are load-bearing for the performance claims.
Authors: We agree that centralized and expanded details are needed for full reproducibility. The full manuscript describes the GNN architecture, training procedure, baseline implementations, network selection from public biological databases, and statistical testing in the Methods and Experiments sections. To address the concern directly, we will add a dedicated 'Implementation Details' subsection (or appendix) that consolidates hyperparameters, pseudocode where relevant, explicit selection criteria for the 301 networks, and the exact statistical tests with p-values in the revised version. revision: yes
- Referee: [Abstract, Evaluation] Evaluation and premise in abstract: the claim that 'links that contribute more to classification can be considered as targeted pathways' and that results are 'biologically meaningful' rests on fidelity metrics alone. No independent validation against curated pathway databases, known signaling chains, or perturbation data is described; if classification contribution does not correlate with biological targeting, the fidelity improvements remain internal to the ML task and do not support the inference claim.
Authors: The premise rests on integrating experimental molecular data into the bio-networks prior to classification, so that the classification decision is driven by the experimental signals; explanatory links are therefore those most relevant to the experimental conditions and can be interpreted as targeted pathways in that context. In addition to Fidelity+ and Fidelity- (necessity and sufficiency), the manuscript reports preservation of signaling chains up to 4x longer than baselines, which constitutes validation against known biological signaling structures. We will revise the abstract and evaluation sections to more explicitly link the experimental data integration to the biological interpretation and to highlight the signaling-chain results as supporting evidence. revision: partial
- Referee: [Abstract, Evaluation] The new metric and 'ML-oriented biological evaluations' are referenced but not defined or shown to differ substantively from standard explainer fidelity; without explicit definitions or ablation showing they capture biological relevance beyond classification contribution, the biological utility claim cannot be evaluated.
Authors: The ML-oriented biological evaluations center on the signaling-chain preservation metric (average length of retained continuous signaling paths), which is orthogonal to standard fidelity because it directly measures a topological property relevant to pathway biology. The new metric combines this with the fidelity scores. We will add explicit mathematical definitions, formulas, and an ablation study comparing the new metrics against standard fidelity alone in the revised Experiments section to demonstrate their distinct contribution. revision: yes
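The rebuttal describes the signaling-chain metric only in words. As a rough illustration of the idea, the sketch below measures the longest simple directed path, counted in edges, that survives in a retained edge set. This is one plausible reading, not the paper's definition, which is not given here. The function name `longest_chain` and the node labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

Edge = Tuple[str, str]


def longest_chain(edges: Iterable[Edge]) -> int:
    """Length, in edges, of the longest simple directed path in `edges`.

    Uses exhaustive depth-first search, so it suits small subgraphs only.
    """
    edge_list = list(edges)
    successors: Dict[str, List[str]] = defaultdict(list)
    for source, target in edge_list:
        successors[source].append(target)

    def dfs(node: str, seen: FrozenSet[str]) -> int:
        best = 0
        for nxt in successors[node]:
            if nxt not in seen:  # keep the path simple (no repeated nodes)
                best = max(best, 1 + dfs(nxt, seen | {nxt}))
        return best

    nodes = {u for u, _ in edge_list} | {v for _, v in edge_list}
    return max((dfs(node, frozenset({node})) for node in nodes), default=0)


# Hypothetical retained subgraphs with placeholder node names.
intact = [("A", "B"), ("B", "C"), ("C", "D")]
fragmented = [("A", "B"), ("C", "D")]

print(longest_chain(intact), longest_chain(fragmented))
```

Two explanations with the same number of edges can differ sharply on this measure, which is why it is not redundant with edge-level fidelity scores.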
Circularity Check
No significant circularity; derivation applies standard graph learning with explicit framing assumption
The paper frames the task as graph classification on bio-networks and states that 'The links (representing pathways) that contribute more to classification can be considered as targeted pathways' as a direct modeling premise rather than a derived result. No equations, fitted parameters renamed as predictions, or self-citation chains are present that reduce the claimed Fidelity+ / Fidelity- gains or biological meaningfulness to inputs by construction. The evaluations use adapted explainer metrics on 301 networks, and the framework integrates standard techniques without self-referential definitions or uniqueness theorems imported from prior author work. The load-bearing assumption equates classification contribution with targeting but does not create a circular derivation; it remains an open premise subject to external checks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Links contributing more to graph classification correspond to biologically targeted pathways
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: The links (representing pathways) that contribute more to classification can be considered as targeted pathways... PATHMAMBA... PATHEXPLAINER... Fidelity+ (necessity) and Fidelity- (sufficiency)
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: PATHMAMBA is strictly more expressive than 1-WL GNNs... explanations of EXPATH... higher-order structural patterns
What these tags mean
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.