Flexible inference of evolutionary accumulation dynamics using uncertain observational data

Iain G. Johnston; Jessica Renz; Morten Brun

arXiv: 2502.05872 · v4 · submitted 2025-02-09 · 🧬 q-bio.PE

Pith reviewed 2026-05-23 03:35 UTC · model grok-4.3

classification 🧬 q-bio.PE

keywords: evolutionary pathways · hypercubic inference · uncertain data · accumulation dynamics · pathway inference · multidrug resistance · tuberculosis · observational data

The pith

HyperLAU infers evolutionary accumulation pathways from data even when up to half the features are uncertain.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents HyperLAU, an algorithm that extends hypercubic inference to datasets containing uncertainties, allowing the use of cross-sectional, phylogenetic, and longitudinal observations without discarding uncertain entries. It demonstrates that the method recovers the main pathways identified by prior tools when as many as 50 percent of input features are uncertain. A tuberculosis multidrug resistance example shows that the approach yields additional pathway information while avoiding biases introduced by simply excluding uncertain data. Sympathetic readers care because many real-world evolutionary datasets are sparse or noisy, and methods that tolerate uncertainty can draw on larger portions of available evidence.

Core claim

HyperLAU is a new algorithm for hypercubic inference that learns dynamic pathways and feature interactions from data that includes uncertainties, even when large sets of particular features remain unobserved across the source dataset. It is shown to highlight the main pathways recovered by other tools when up to 50 percent of the features in the input data are uncertain and to reduce biases that arise when uncertain portions of the data are simply excluded.

What carries the argument

HyperLAU algorithm, which extends hypercubic inference models to incorporate uncertain observational data for pathway learning.
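One common way to use an uncertain entry rather than discard it is to marginalize over it. An observation with uncertain features is treated as the set of all fully specified hypercube states it could represent, and its probability is the sum over that set. The sketch below is a minimal illustration under stated assumptions and is not HyperLAU's actual likelihood, which this summary does not describe. The '?' encoding and the function names `compatible_states` and `observation_likelihood` are hypothetical.

```python
from itertools import product


def compatible_states(barcode: str) -> list[str]:
    """Expand a barcode such as '1?0' into every fully specified binary state
    it could represent. Each '?' (assumed marker for an uncertain feature) is
    replaced by both '0' and '1'; known entries are kept as they are."""
    options = [("0", "1") if ch == "?" else (ch,) for ch in barcode]
    return ["".join(choice) for choice in product(*options)]


def observation_likelihood(barcode: str, state_probs: dict[str, float]) -> float:
    """Probability of an uncertain observation under a model that assigns a
    probability to each hypercube state (hypothetical input format):
    the sum of the probabilities of all compatible states."""
    return sum(state_probs.get(s, 0.0) for s in compatible_states(barcode))


if __name__ == "__main__":
    probs = {"00": 0.1, "10": 0.3, "01": 0.2, "11": 0.4}
    print(compatible_states("1?0"))          # ['100', '110']
    print(observation_likelihood("1?", probs))
```

With the toy probabilities above, the observation "1?" has probability 0.3 + 0.4 = 0.7 (up to floating-point rounding). A fully uncertain "??" carries no information and has probability 1.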

If this is right

Datasets with up to 50 percent uncertain features can still be used to infer main evolutionary pathways.
Biases introduced by excluding uncertain data entries can be reduced or avoided.
Additional information on evolutionary pathways becomes available in applications such as multidrug resistance analysis.
Cross-sectional, phylogenetic, and longitudinal data sources can be combined even when individual features carry uncertainty.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same uncertainty-handling logic could be adapted to other accumulation models outside the hypercubic setting.
Medical datasets with noisy or incomplete observations may benefit from similar flexible inference without forced data pruning.
Testing on synthetic datasets with controlled uncertainty levels would provide direct checks on pathway recovery rates.
The approach may allow retrospective re-analysis of existing evolutionary studies that previously discarded uncertain observations.

Load-bearing premise

The hypercubic model structure continues to represent the underlying evolutionary process accurately when large fractions of the input features are uncertain or unobserved.

What would settle it

Application of HyperLAU to a dataset with known accumulation pathways where the recovered pathways diverge from those found by other tools once 40-50 percent of features are marked uncertain.
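A synthetic check of this kind needs a way to mark a controlled fraction of a known dataset as uncertain. The sketch below assumes that "features marked uncertain" means individual entries, chosen uniformly at random and replaced by a '?' placeholder. Both that interpretation and the function name `mask_features` are hypothetical. Uniform masking is the easiest case. Correlated or state-dependent masking would be a harder and arguably more realistic test.

```python
import random


def mask_features(barcodes: list[str], fraction: float, seed: int = 0) -> list[str]:
    """Replace a chosen fraction of all entries with '?'.

    Entries are sampled uniformly without replacement (missing completely at
    random), so the original barcodes remain available as ground truth."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if not barcodes:
        return []
    n_rows, n_cols = len(barcodes), len(barcodes[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    rng = random.Random(seed)  # seeded for reproducible masking
    chosen = rng.sample(cells, round(fraction * len(cells)))
    rows = [list(b) for b in barcodes]
    for i, j in chosen:
        rows[i][j] = "?"
    return ["".join(r) for r in rows]


if __name__ == "__main__":
    truth = ["000", "100", "110", "111"] * 5
    print(mask_features(truth, 0.5, seed=1))
```

Running HyperLAU and the comparison tools on masked copies at increasing fractions, and checking whether the recovered pathways still match the known ones, would directly test the 40 to 50 percent regime.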

Figures

Figures reproduced from arXiv: 2502.05872 by Iain G. Johnston, Jessica Renz, Morten Brun.

**Figure 1.** HyperLAU workflow. Learning evolutionary trajectories on a hypercube, based on data that contains uncertainties. (A) Dataset (structure can be cross-sectional or longitudinal) that contains information about the presence (red/dark) or absence (green/gradient) of certain features. White boxes indicate missing/uncertain information. (B) Translation of the data into binary barcodes, 1 = presence of the featur…

**Figure 2.** Visualization of evolutionary pathways inferred by HyperLAU based on some toy examples. Plots show inferred transition networks through the evolutionary state space from 000... (top) to 111... (bottom) (as in Figure 1D). In all plots, the thickness of the edges represents the probability flux between the corresponding state nodes (all coloured edges have minimum 0.05). Coefficient of variation (CV) is illu…

**Figure 3.** Visualisation of evolutionary pathways learned by HyperLAU based on the tuberculosis dataset [Casali et al., 2014]. Plots show inferred transition networks through the evolutionary state space from 000... (top) to 111... (bottom) (as in Figure 1D). In all plots, the thickness of the edges represents the probability flux between the corresponding state nodes (all coloured edges have minimum 0.05). Coeffici…

**Figure 4.** Inference of anti-microbial evolution in tuberculosis based on a full dataset including uncertainties. (A) Visualisation of the used dataset from [Casali et al., 2014] embedded in a phylogeny. Each row in the matrix corresponds to a bacterial isolate that is a tip in the phylogeny. Each column in the matrix describes resistance to a different drug: red fields in the profile represent missing data, white f…
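The edge thicknesses in Figures 2 and 3 encode probability flux. Suppose a fitted model supplies, for each state, the probability of moving next to each state with one more feature. Flux can then be computed layer by layer starting from 000..., as in the sketch below. The flux of an edge is the probability of visiting its source state times the transition probability. The transition-table format and the names `edge_flux` and `strong_edges` are hypothetical. The 0.05 default mirrors the minimum flux for coloured edges stated in the captions.

```python
def edge_flux(L: int, trans: dict[str, dict[str, float]]) -> dict[tuple[str, str], float]:
    """Probability flux along each edge of an L-feature hypercube.

    trans[s][t] is the (assumed) probability that a trajectory in state s
    moves next to state t, where t has exactly one more feature than s.
    Trajectories start at '0' * L."""
    visit = {"0" * L: 1.0}  # probability that a trajectory passes through each state
    flux: dict[tuple[str, str], float] = {}
    for k in range(L):
        # States with k features receive flux only from layer k-1, which has
        # already been processed, so their visit probabilities are complete.
        layer = [s for s in visit if s.count("1") == k]
        for s in layer:
            for t, p in trans.get(s, {}).items():
                f = visit[s] * p
                flux[(s, t)] = f
                visit[t] = visit.get(t, 0.0) + f
    return flux


def strong_edges(flux: dict[tuple[str, str], float], threshold: float = 0.05) -> set[tuple[str, str]]:
    """Edges whose flux is at least the threshold (the ones a plot would colour)."""
    return {edge for edge, f in flux.items() if f >= threshold}


if __name__ == "__main__":
    trans = {"00": {"10": 0.7, "01": 0.3}, "10": {"11": 1.0}, "01": {"11": 1.0}}
    print(edge_flux(2, trans))
```

With this toy table, 70% of trajectories acquire the first feature first. The edges 00→10 and 10→11 therefore each carry flux 0.7, and the alternative route carries 0.3.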

Original abstract

Understanding and predicting evolutionary accumulation pathways is a key objective in many fields of research, ranging from classical evolutionary biology to diverse applications in medicine. In this context, we are often confronted with the problem that data is sparse and uncertain. To make the best possible use of the available data, inference approaches that can handle this uncertainty are required. One family of approaches that can use not only cross-sectional data, but also phylogenetically related and longitudinal data, is 'hypercubic inference' models. In this article we introduce HyperLAU, a new algorithm for hypercubic inference that makes it possible to use datasets including uncertainties for learning evolutionary pathways. Expanding the flexibility of accumulation modelling, HyperLAU allows us to infer dynamic pathways and interactions between features, even when large sets of particular features are unobserved across the source dataset. We show that HyperLAU is able to highlight the main pathways found by other tools, even when up to 50% of the features in the input data are uncertain. Additionally, we demonstrate how it can help to overcome possible biases that can occur when reducing the data used by excluding uncertain parts. We illustrate the approach with a case study on multidrug resistance in tuberculosis, showing that HyperLAU allows more flexible data and provides new information about evolutionary pathways compared to existing approaches.
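The "hypercubic" state space referred to above can be sketched directly. With L binary features there are 2^L states. Assuming irreversible accumulation, each edge adds exactly one feature, taking a trajectory from 000... towards 111.... The function `hypercube` is a hypothetical illustration of this construction, not code from HyperLAU.

```python
from itertools import product


def hypercube(L: int) -> tuple[list[str], list[tuple[str, str]]]:
    """All 2**L binary states and the accumulation edges between them.

    An edge (s, t) flips exactly one '0' in s to '1' (features are gained,
    never lost, under the irreversibility assumption)."""
    states = ["".join(bits) for bits in product("01", repeat=L)]
    edges = [
        (s, s[:i] + "1" + s[i + 1:])
        for s in states
        for i in range(L)
        if s[i] == "0"
    ]
    return states, edges


if __name__ == "__main__":
    states, edges = hypercube(3)
    print(len(states), len(edges))
```

For L features this gives L * 2^(L-1) edges (12 for L = 3). Because the state space grows exponentially with L, inference methods on the hypercube must cope with large state spaces.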

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

HyperLAU lets hypercubic models keep uncertain observations instead of dropping them, but the 50% uncertainty claim rests on one dataset without controlled checks.

The paper introduces HyperLAU as a new algorithm that extends hypercubic inference to datasets with uncertain or unobserved features. This is the core practical move: instead of discarding rows or columns with missing or uncertain entries, the method tries to use them while still recovering accumulation pathways and interactions. On the tuberculosis multidrug-resistance example it recovers the main pathways reported by other tools even when up to half the features are uncertain, and it shows that simply excluding uncertain data can shift the inferred order of events.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces HyperLAU, a new algorithm extending hypercubic inference models to accommodate uncertain or missing observational data (cross-sectional, phylogenetic, or longitudinal) when inferring evolutionary accumulation pathways and feature interactions. It claims that HyperLAU recovers the dominant pathways identified by existing tools even when up to 50% of input features are uncertain, mitigates bias from simply discarding uncertain observations, and yields additional pathway information on a multidrug-resistance tuberculosis dataset.

Significance. If the uncertainty-handling mechanism proves robust, the work could meaningfully expand the usable data volume for accumulation modeling in evolutionary biology and clinical microbiology. The approach directly targets a common practical limitation (sparse/uncertain features) rather than assuming complete observations. However, the significance is currently limited by the absence of controlled validation or analytic guarantees beyond a single real-world case study.

major comments (3)

[Abstract] Abstract and introduction: the central empirical claim (recovery of main pathways with up to 50% uncertain features on the TB dataset) rests on a single case study without reported controlled simulations that systematically vary the fraction, correlation structure, or type of uncertainty while holding the true accumulation graph fixed. This leaves open whether the hypercubic model structure plus the chosen uncertainty encoding introduces systematic bias under the stated conditions.
[Abstract] Abstract: no analytic bound, parameter-free derivation, or error-propagation analysis is referenced for how uncertainty in individual features propagates through the hypercubic inference procedure; the manuscript therefore provides no a-priori reason to expect the 50% threshold to be general rather than dataset-specific.
[Abstract] The manuscript states that HyperLAU 'allows more flexible data' and 'provides new information' compared with existing tools, yet supplies no quantitative comparison (e.g., pathway overlap metrics, false-positive rates on simulated graphs, or sensitivity to the uncertainty representation) that would establish these improvements are not artifacts of the particular TB dataset or the chosen baseline tools.
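The third comment asks for pathway-overlap metrics. One simple metric of that kind is the Jaccard index between the sets of edges that carry at least a threshold flux in two inferred networks. The sketch assumes each network is given as a mapping from (source, target) state pairs to flux. The function name and the 0.05 default threshold, taken from the figure captions, are illustrative choices. Other metrics, such as rank correlations of feature acquisition order, may be more appropriate.

```python
def jaccard_overlap(
    flux_a: dict[tuple[str, str], float],
    flux_b: dict[tuple[str, str], float],
    threshold: float = 0.05,
) -> float:
    """Jaccard index between the edge sets whose flux meets the threshold.

    Returns 1.0 for identical edge sets and 0.0 for disjoint ones. Two
    networks with no edges above the threshold count as identical."""
    a = {e for e, f in flux_a.items() if f >= threshold}
    b = {e for e, f in flux_b.items() if f >= threshold}
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = {("00", "10"): 0.7, ("00", "01"): 0.3, ("10", "11"): 0.7, ("01", "11"): 0.3}
    b = {("00", "10"): 0.98, ("00", "01"): 0.02, ("10", "11"): 0.98, ("01", "11"): 0.02}
    print(jaccard_overlap(a, b))
```

In the example, the second network drops the minor route below threshold. The two networks then share 2 of 4 strong edges, giving an overlap of 0.5.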

minor comments (2)

[Abstract] The abstract refers to 'hypercubic inference' models without a brief reminder of the underlying state-space construction or the precise meaning of 'accumulation dynamics'; a one-sentence definition would aid readers unfamiliar with the prior literature.
[Abstract] The claim that the method 'overcome[s] possible biases' from data reduction is stated without specifying how bias is measured or what the baseline bias magnitude is on the TB example.
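As an illustration of how exclusion bias could be measured when the truth is known, consider a hypothetical toy dataset. In it, one feature is uncertain only in isolates that already carry another feature. Dropping every row with an uncertain entry then removes exactly those isolates, and the estimated frequency of the other feature collapses. The data and names below are invented for illustration and are not drawn from the TB example.

```python
def feature_frequency(barcodes: list[str], j: int) -> float:
    """Fraction of barcodes in which feature j is present ('1')."""
    return sum(b[j] == "1" for b in barcodes) / len(barcodes)


# Hypothetical ground truth: six fully observed isolates, three features.
truth = ["110", "100", "000", "010", "111", "101"]

# Feature 2 is uncertain only in isolates that carry feature 0, so the
# missingness depends on the data (not missing completely at random).
observed = [b[:2] + "?" if b[0] == "1" else b for b in truth]

# Complete-case analysis: drop every barcode that contains an uncertain entry.
complete = [b for b in observed if "?" not in b]

if __name__ == "__main__":
    # True frequency of feature 0 is 4/6; the complete-case estimate is 0.0.
    print(feature_frequency(truth, 0), feature_frequency(complete, 0))
```

Reporting the gap between such estimates on data with known ground truth is one concrete way to state a "baseline bias magnitude".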

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We address each major comment below, acknowledging the limitations in the current version and outlining specific revisions to address them.

Referee: [Abstract] Abstract and introduction: the central empirical claim (recovery of main pathways with up to 50% uncertain features on the TB dataset) rests on a single case study without reported controlled simulations that systematically vary the fraction, correlation structure, or type of uncertainty while holding the true accumulation graph fixed. This leaves open whether the hypercubic model structure plus the chosen uncertainty encoding introduces systematic bias under the stated conditions.

Authors: We agree that the current empirical support relies primarily on the TB case study and that controlled simulations would strengthen the claims. In the revised manuscript we will add a dedicated simulation study section that systematically varies the fraction of uncertain features (including around the 50% level), their correlation structure, and uncertainty types, using fixed ground-truth accumulation graphs. This will allow direct assessment of potential bias introduced by the uncertainty encoding. revision: yes
Referee: [Abstract] Abstract: no analytic bound, parameter-free derivation, or error-propagation analysis is referenced for how uncertainty in individual features propagates through the hypercubic inference procedure; the manuscript therefore provides no a-priori reason to expect the 50% threshold to be general rather than dataset-specific.

Authors: The manuscript does not currently include analytic bounds or formal error-propagation analysis, as the emphasis was on algorithmic extension and practical application. We will add a section in the revision that analyzes uncertainty propagation through the hypercubic model, including any available bounds or sensitivity results, to provide additional justification for the observed performance levels. revision: yes
Referee: [Abstract] The manuscript states that HyperLAU 'allows more flexible data' and 'provides new information' compared with existing tools, yet supplies no quantitative comparison (e.g., pathway overlap metrics, false-positive rates on simulated graphs, or sensitivity to the uncertainty representation) that would establish these improvements are not artifacts of the particular TB dataset or the chosen baseline tools.

Authors: We acknowledge the absence of quantitative metrics comparing pathway recovery, false-positive rates, and sensitivity across methods. The revised manuscript will incorporate such comparisons, using both the TB dataset and the new simulations, to quantify improvements in pathway overlap and robustness relative to baselines. revision: yes

Circularity Check

0 steps flagged

No circularity: new algorithm with empirical case-study validation

The manuscript introduces HyperLAU as a new inference algorithm extending hypercubic accumulation models to handle uncertain or missing features. No derivation chain, parameter fitting, or prediction step is shown to reduce tautologically to its own inputs or to a self-citation. Performance claims rest on an empirical demonstration that the algorithm recovers pathways identified by other tools on one TB multidrug-resistance dataset, which constitutes independent validation rather than a self-referential loop. The central contribution is therefore algorithmic and observational, not a closed mathematical identity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no details on parameters, axioms, or new entities are provided.

pith-pipeline@v0.9.0 · 5761 in / 939 out tokens · 22893 ms · 2026-05-23T03:35:10.563065+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

An Algebraic Approach to Evolutionary Accumulation Models
stat.AP 2025-11 unverdicted novelty 6.0

An algebraic approach defines semi-algebraic parameter sets from underlying polynomial structures in evolutionary processes before likelihood maximization, showing compatibility with existing statistical EvAM models w...

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · cited by 1 Pith paper

[1] Aga, O. N. L., Brun, M., Dauda, K. A., Diaz-Uriarte, R., Giannakis, K., and Johnston, I. G. (2024). HyperTraPS-CT: Inference and prediction for accumulation pathways with flexible data and model structures. PLOS Computational Biology, 20(9):e1012393.
[2] Beerenwinkel, N., Eriksson, N., and Sturmfels, B. (2006). Evolution on distributive lattices. Journal of Theoretical Biology, 242(2):409–420.
[3] Beerenwinkel, N., Eriksson, N., and Sturmfels, B. (2007). Conjunctive Bayesian networks. Bernoulli, 13(4):893–909.
[4] Beerenwinkel, N., Schwarz, R. F., Gerstung, M., and Markowetz, F. (2015). Cancer Evolution: Mathematical Models and Computational Inference. Systematic Biology, 64(1):e1–e25.
[5] Beerenwinkel, N. and Sullivant, S. (2009). Markov models for accumulating mutations. Biometrika, 96(3):645–661.
[6] Casali, N., Nikolayevskyy, V., Balabanova, Y., Harris, S. R., Ignatyeva, O., Kontsevaya, I., Corander, J., Bryant, J., Parkhill, J., Nejentsev, S., Horstmann, R. D., Brown, T., and Drobniewski, F. (2014). Evolution and transmission of drug-resistant tuberculosis in a Russian population. Nature Genetics, 46(3):279–286.
[7] Chen, J. (2023). Timed hazard networks: Incorporating temporal difference for oncogenetic analysis. PLOS ONE, 18(3):e0283004.
[8] Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems:1695.
[9] Csárdi, G., Nepusz, T., Traag, V., Horvát, S., Zanini, F., Noom, D., and Müller, K. (2024). igraph: Network Analysis and Visualization in R.
[10] Dalgıç, Ö. O., Wu, H., Safa Erenay, F., Sir, M. Y., Özaltın, O. Y., Crum, B. A., and Pasupathy, K. S. (2021). Mapping of critical events in disease progression through binary classification: Application to amyotrophic lateral sclerosis. Journal of Biomedical Informatics, 123:103895.
[11] Desper, R., Jiang, F., Kallioniemi, O.-P., Moch, H., Papadimitriou, C. H., and Schäffer, A. A. (1999). Inferring Tree Models for Oncogenesis from Comparative Genome Hybridization Data. Journal of Computational Biology, 6(1):37–51.
[12] Diaz-Uriarte, R. and Johnston, I. (2025). A picture guide to cancer progression and monotonic accumulation models: evolutionary assumptions, plausible interpretations, and alternative uses. IEEE Access.
[13] Diaz-Uriarte, R. and Herrera-Nieto, P. (2022). EvAM-Tools: tools for evolutionary accumulation and cancer progression models. Bioinformatics, 38(24):5457–5459.
[14] Diaz-Uriarte, R. and Vasallo, C. (2019). Every which way? On predicting tumor evolution using cancer progression models. PLOS Computational Biology, 15(8):e1007246.
[15] Gao, Y., Gaither, J., Chifman, J., and Kubatko, L. (2022). A phylogenetic approach to inferring the order in which mutations arise during cancer progression. PLOS Computational Biology, 18(12):e1010560.
[16] Gerstung, M., Baudis, M., Moch, H., and Beerenwinkel, N. (2009). Quantifying cancer progression with conjunctive Bayesian networks. Bioinformatics, 25(21):2809–2815.
[17] Gotovos, A., Burkholz, R., Quackenbush, J., and Jegelka, S. (2021). Scaling up Continuous-Time Markov Chains Helps Resolve Underspecification. In Advances in Neural Information Processing Systems, volume 34, pages 14580–14592. Curran Associates, Inc.
[18] Greenbury, S. F., Barahona, M., and Johnston, I. G. (2020). HyperTraPS: Inferring Probabilistic Patterns of Trait Acquisition in Evolutionary and Disease Progression Pathways. Cell Systems, 10(1):39–51.e10.
[19] Hjelm, M., Höglund, M., and Lagergren, J. (2006). New Probabilistic Network Models and Algorithms for Oncogenesis. Journal of Computational Biology, 13(4):853–865.
[20] Johnston, I. G. and Diaz-Uriarte, R. (2024). A hypercubic Mk model framework for capturing reversibility in disease, cancer, and evolutionary accumulation modelling. bioRxiv.
[21] Johnston, I. G., Hoffmann, T., Greenbury, S. F., Cominetti, O., Jallow, M., Kwiatkowski, D., Barahona, M., Jones, N. S., and Casals-Pascual, C. (2019). Precision identification of high-risk phenotypes and progression pathways in severe malaria without requiring longitudinal data. npj Digital Medicine, 2(1):1–9.
[22] Johnston, I. G. and Williams, B. P. (2016). Evolutionary Inference across Eukaryotes Identifies Specific Pressures Favoring Mitochondrial Gene Retention. Cell Systems, 2(2):101–111.
[23] Kassambara, A. (2023). ggpubr: 'ggplot2' Based Publication Ready Plots.
[24] Lewis, P. O. (2001). A Likelihood Approach to Estimating Phylogeny from Discrete Morphological Character Data. Systematic Biology, 50(6):913–925.
[25] Luo, X. G., Kuipers, J., and Beerenwinkel, N. (2023). Joint inference of exclusivity patterns and recurrent trajectories from tumor mutation trees. Nature Communications, 14(1):3676.
[26] Maier, U.-G., Zauner, S., Woehle, C., Bolte, K., Hempel, F., Allen, J. F., and Martin, W. F. (2013). Massively Convergent Evolution for Ribosomal Protein Gene Content in Plastid and Mitochondrial Genomes. Genome Biology and Evolution, 5(12):2318–2329.
[27] Moen, M. T. and Johnston, I. G. (2023). HyperHMM: efficient inference of evolutionary and progressive dynamics on hypercubic transition graphs. Bioinformatics, 39(1):btac803.
[28] Montazeri, H., Kuipers, J., Kouyos, R., Böni, J., Yerly, S., Klimkait, T., Aubert, V., Günthard, H. F., Beerenwinkel, N., and The Swiss HIV Cohort Study (2016). Large-scale inference of conjunctive Bayesian networks. Bioinformatics, 32(17):i727–i735.
[29] Murray, C. J. L., Ikuta, K. S., Sharara, F., Swetschinski, L., Robles Aguilar, G., Gray, A., Han, C., Bisignano, C., Rao, P., Wool, E., Johnson, S. C., Browne, A. J., Chipeta, M. G., Fell, F., Hackett, S., Haines-Woodhouse, G., Kashef Hamadani, B. H., Kumaran, E. A. P., McManigal, B., Achalapong, S., Agarwal, R., Akech, S., Albertson, S., Amuasi, J., Andr...
[30] Nichol, D., Jeavons, P., Fletcher, A. G., Bonomo, R. A., Maini, P. K., Paul, J. L., Gatenby, R. A., Anderson, A. R. A., and Scott, J. G. (2015). Steering Evolution with Sequential Therapy to Prevent the Emergence of Bacterial Antibiotic Resistance. PLOS Computational Biology, 11(9):e1004493.
[31] Nicol, P. B., Coombes, K. R., Deaver, C., Chkrebtii, O., Paul, S., Toland, A. E., and Asiaee, A. (2021). Oncogenetic network estimation with disjunctive Bayesian networks. Computational and Systems Oncology, 1(2):e1027.
[32] Pagel, M. (1994). Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proceedings of the Royal Society, 255(1342):37–45.
[33] Pedersen, T. L. (2024). ggraph: An Implementation of Grammar of Graphics for Graphs and Networks.
[34] Renz, J., Dauda, K. A., Aga, O. N. L., Diaz-Uriarte, R., Löhr, I. H., Blomberg, B., and Johnston, I. G. (2024). Evolutionary accumulation modelling in AMR: machine learning to infer and predict evolutionary dynamics of multi-drug resistance.
[35] Rupp, K., Schill, R., Süskind, J., Georg, P., Klever, M., Lösch, A., Grasedyck, L., Wettig, T., and Spang, R. (2024). Differentiated uniformization: a new method for inferring Markov chains on combinatorial state spaces including stochastic epidemic models. Computational Statistics.
[36] Sanderson, C. and Curtin, R. (2016). Armadillo: a template-based C++ library for linear algebra. Journal of Open Source Software, 1(2):26.
[37] Sanderson, C. and Curtin, R. (2019). Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation. Mathematical and Computational Applications, 24(3).
[38] Schill, R., Klever, M., Lösch, A., Hu, Y. L., Vocht, S., Rupp, K., Grasedyck, L., Spang, R., and Beerenwinkel, N. (2024a). Overcoming Observation Bias for Cancer Progression Modeling. In Ma, J., editor, Research in Computational Molecular Biology, pages 217–234, Cham. Springer Nature Switzerland.
[39] Schill, R., Klever, M., Rupp, K., Hu, Y. L., Lösch, A., Georg, P., Pfahler, S., Vocht, S., Hansch, S., Wettig, T., Grasedyck, L., and Spang, R. (2024b). Reconstructing Disease Histories in Huge Discrete State Spaces. KI - Künstliche Intelligenz.
[40] Schill, R., Solbrig, S., Wettig, T., and Spang, R. (2020). Modelling cancer progression using Mutual Hazard Networks. Bioinformatics, 36(1):241–249.
[41] Schwartz, R. and Schäffer, A. A. (2017). The evolution of tumour phylogenetics: principles and practice. Nature Reviews Genetics, 18(4):213–229.
[42] Slowikowski, K. (2024). ggrepel: Automatically Position Non-Overlapping Text Labels with 'ggplot2'. https://ggrepel.slowkow.com/, https://github.com/slowkow/ggrepel
[43] Szabo, A. and Boucher, K. (2002). Estimating an oncogenetic tree when false negatives and positives are present. Mathematical Biosciences, 176(2):219–236.
[44] Takeshima, H. and Ushijima, T. (2019). Accumulation of genetic and epigenetic alterations in normal cells and cancer risk. npj Precision Oncology, 3(1):1–8.
[45] Tan, L., Serene, S., Chao, H. X., and Gore, J. (2011). Hidden Randomness between Fitness Landscapes Limits Reverse Evolution. Physical Review Letters, 106(19):198102.
[46] Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
[47] Wickham, H. (2023). stringr: Simple, Consistent Wrappers for Common String Operations.
[48] Wickham, H. and Bryan, J. (2025). readxl: Read Excel Files. R package version 1.4.5, https://github.com/tidyverse/readxl
[49] Wickham, H., François, R., Henry, L., Müller, K., and Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation. R package version 1.1.4, https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org
[50] Wickham, H., Vaughan, D., and Girlich, M. (2024). tidyr: Tidy Messy Data. R package version 1.3.1, https://github.com/tidyverse/tidyr
[51] Youn, A. and Simon, R. (2012). Estimating the order of mutations during tumorigenesis from tumor genome sequencing data. Bioinformatics, 28(12):1555–1561.

[1] [1]

Aga, O. N. L., Brun, M., Dauda, K. A., Diaz-Uriarte, R., Giannakis, K., and Johnston, I. G. (2024). HyperTraPS - CT : Inference and prediction for accumulation pathways with flexible data and model structures. PLOS Computational Biology , 20(9):e1012393

work page 2024

[2] [2]

Beerenwinkel, N., Eriksson, N., and Sturmfels, B. (2006). Evolution on distributive lattices. Journal of Theoretical Biology , 242(2):409--420

work page 2006

[3] [3]

Beerenwinkel, N., Eriksson, N., and Sturmfels, B. (2007). Conjunctive Bayesian networks. Bernoulli , 13(4):893--909

work page 2007

[4] [4]

F., Gerstung, M., and Markowetz, F

Beerenwinkel, N., Schwarz, R. F., Gerstung, M., and Markowetz, F. (2015). Cancer Evolution : Mathematical Models and Computational Inference . Systematic Biology , 64(1):e1--e25

work page 2015

[5] [5]

and Sullivant, S

Beerenwinkel, N. and Sullivant, S. (2009). Markov models for accumulating mutations. Biometrika , 96(3):645--661

work page 2009

[6] [6]

R., Ignatyeva, O., Kontsevaya, I., Corander, J., Bryant, J., Parkhill, J., Nejentsev, S., Horstmann, R

Casali, N., Nikolayevskyy, V., Balabanova, Y., Harris, S. R., Ignatyeva, O., Kontsevaya, I., Corander, J., Bryant, J., Parkhill, J., Nejentsev, S., Horstmann, R. D., Brown, T., and Drobniewski, F. (2014). Evolution and transmission of drug-resistant tuberculosis in a Russian population. Nature Genetics , 46(3):279--286

work page 2014

[7] [7]

Chen, J. (2023). Timed hazard networks: Incorporating temporal difference for oncogenetic analysis. PLOS ONE , 18(3):e0283004

work page 2023

[8] [8]

and Nepusz, T

Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal , Complex Systems:1695

work page 2006

[9] [9]

Csárdi, G., Nepusz, T., Traag, V., Horvát, S., Zanini, F., Noom, D., and Müller, K. (2024). \ igraph\ : Network Analysis and Visualization in R

work page 2024

[10] [10]

O., Wu, H., Safa Erenay, F., Sir, M

Dalgıç, Ö. O., Wu, H., Safa Erenay, F., Sir, M. Y., Özaltın, O. Y., Crum, B. A., and Pasupathy, K. S. (2021). Mapping of critical events in disease progression through binary classification: Application to amyotrophic lateral sclerosis. Journal of Biomedical Informatics , 123:103895

work page 2021

[11] [11]

H., and Schäffer, A

Desper, R., Jiang, F., Kallioniemi, O.-P., Moch, H., Papadimitriou, C. H., and Schäffer, A. A. (1999). Inferring Tree Models for Oncogenesis from Comparative Genome Hybridization Data . Journal of Computational Biology , 6(1):37--51

work page 1999

[12] [12]

Diaz-Uriarte, R and Johnston, I. (2025). A picture guide to cancer progression and monotonic accumulation models: evolutionary assumptions, plausible interpretations, and alternative uses. IEEE Access

work page 2025

[13] [13]

Diaz-Uriarte, R and Herrera-Nieto, P. (2022). EvAM-Tools: tools for evolutionary accumulation and cancer progression models. uses. Bioinformatics , 38(24), 5457--5459

work page 2022

[14] [14]

and Vasallo, C

Diaz-Uriarte, R. and Vasallo, C. (2019). Every which way? On predicting tumor evolution using cancer progression models. PLOS Computational Biology , 15(8):e1007246

work page 2019

[15] [15]

Gao, Y., Gaither, J., Chifman, J., and Kubatko, L. (2022). A phylogenetic approach to inferring the order in which mutations arise during cancer progression. PLOS Computational Biology , 18(12):e1010560

work page 2022

[16] [16]

Gerstung, M., Baudis, M., Moch, H., and Beerenwinkel, N. (2009). Quantifying cancer progression with conjunctive Bayesian networks. Bioinformatics , 25(21):2809--2815

work page 2009

[17] [17]

Gotovos, A., Burkholz, R., Quackenbush, J., and Jegelka, S. (2021). Scaling up Continuous - Time Markov Chains Helps Resolve Underspecification . In Advances in Neural Information Processing Systems , volume 34, pages 14580--14592. Curran Associates, Inc

work page 2021

[18] [18]

F., Barahona, M., and Johnston, I

Greenbury, S. F., Barahona, M., and Johnston, I. G. (2020). HyperTraPS : Inferring Probabilistic Patterns of Trait Acquisition in Evolutionary and Disease Progression Pathways . Cell Systems , 10(1):39--51.e10

work page 2020

[19] [19]

Hjelm, M., Höglund, M., and Lagergren, J. (2006). New Probabilistic Network Models and Algorithms for Oncogenesis . Journal of Computational Biology , 13(4):853--865

work page 2006

[20] [20]

Johnston, I. G. and Diaz-Uriarte, R. (2024). A hypercubic Mk model framework for capturing reversibility in disease, cancer, and evolutionary accumulation modelling. Publication Title: bioRxiv

work page 2024

[21] [21]

G., Hoffmann, T., Greenbury, S

Johnston, I. G., Hoffmann, T., Greenbury, S. F., Cominetti, O., Jallow, M., Kwiatkowski, D., Barahona, M., Jones, N. S., and Casals-Pascual, C. (2019). Precision identification of high-risk phenotypes and progression pathways in severe malaria without requiring longitudinal data. npj Digital Medicine , 2(1):1--9

[22] Johnston, I. G. and Williams, B. P. (2016). Evolutionary Inference across Eukaryotes Identifies Specific Pressures Favoring Mitochondrial Gene Retention. Cell Systems, 2(2):101–111.

[23] Kassambara, A. (2023). ggpubr: 'ggplot2' Based Publication Ready Plots.

[24] Lewis, P. O. (2001). A Likelihood Approach to Estimating Phylogeny from Discrete Morphological Character Data. Systematic Biology, 50(6):913–925.

[25] Luo, X. G., Kuipers, J., and Beerenwinkel, N. (2023). Joint inference of exclusivity patterns and recurrent trajectories from tumor mutation trees. Nature Communications, 14(1):3676.

[26] Maier, U.-G., Zauner, S., Woehle, C., Bolte, K., Hempel, F., Allen, J. F., and Martin, W. F. (2013). Massively Convergent Evolution for Ribosomal Protein Gene Content in Plastid and Mitochondrial Genomes. Genome Biology and Evolution, 5(12):2318–2329.

[27] Moen, M. T. and Johnston, I. G. (2023). HyperHMM: efficient inference of evolutionary and progressive dynamics on hypercubic transition graphs. Bioinformatics, 39(1):btac803.

[28] Montazeri, H., Kuipers, J., Kouyos, R., Böni, J., Yerly, S., Klimkait, T., Aubert, V., Günthard, H. F., Beerenwinkel, N., and The Swiss HIV Cohort Study (2016). Large-scale inference of conjunctive Bayesian networks. Bioinformatics, 32(17):i727–i735.

[29] Murray, C. J. L., Ikuta, K. S., Sharara, F., Swetschinski, L., Robles Aguilar, G., Gray, A., Han, C., Bisignano, C., Rao, P., Wool, E., Johnson, S. C., Browne, A. J., Chipeta, M. G., Fell, F., Hackett, S., Haines-Woodhouse, G., Kashef Hamadani, B. H., Kumaran, E. A. P., McManigal, B., Achalapong, S., Agarwal, R., Akech, S., Albertson, S., Amuasi, J., Andr...

[30] Nichol, D., Jeavons, P., Fletcher, A. G., Bonomo, R. A., Maini, P. K., Paul, J. L., Gatenby, R. A., Anderson, A. R. A., and Scott, J. G. (2015). Steering Evolution with Sequential Therapy to Prevent the Emergence of Bacterial Antibiotic Resistance. PLOS Computational Biology, 11(9):e1004493.

[31] Nicol, P. B., Coombes, K. R., Deaver, C., Chkrebtii, O., Paul, S., Toland, A. E., and Asiaee, A. (2021). Oncogenetic network estimation with disjunctive Bayesian networks. Computational and Systems Oncology, 1(2):e1027.

[32] Pagel, M. (1994). Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proceedings of the Royal Society, 255(1342):37–45.

[33] Pedersen, T. L. (2024). ggraph: An Implementation of Grammar of Graphics for Graphs and Networks.

[34] Renz, J., Dauda, K. A., Aga, O. N. L., Diaz-Uriarte, R., Löhr, I. H., Blomberg, B., and Johnston, I. G. (2024). Evolutionary accumulation modelling in AMR: machine learning to infer and predict evolutionary dynamics of multi-drug resistance.

[35] Rupp, K., Schill, R., Süskind, J., Georg, P., Klever, M., Lösch, A., Grasedyck, L., Wettig, T., and Spang, R. (2024). Differentiated uniformization: a new method for inferring Markov chains on combinatorial state spaces including stochastic epidemic models. Computational Statistics.

[36] Sanderson, C. and Curtin, R. (2016). Armadillo: a template-based C++ library for linear algebra. Journal of Open Source Software, 1(2):26.

[37] Sanderson, C. and Curtin, R. (2019). Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation. Mathematical and Computational Applications, 24(3).

[38] Schill, R., Klever, M., Lösch, A., Hu, Y. L., Vocht, S., Rupp, K., Grasedyck, L., Spang, R., and Beerenwinkel, N. (2024a). Overcoming Observation Bias for Cancer Progression Modeling. In Ma, J., editor, Research in Computational Molecular Biology, pages 217–234, Cham. Springer Nature Switzerland.

[39] Schill, R., Klever, M., Rupp, K., Hu, Y. L., Lösch, A., Georg, P., Pfahler, S., Vocht, S., Hansch, S., Wettig, T., Grasedyck, L., and Spang, R. (2024b). Reconstructing Disease Histories in Huge Discrete State Spaces. KI - Künstliche Intelligenz.

[40] Schill, R., Solbrig, S., Wettig, T., and Spang, R. (2020). Modelling cancer progression using Mutual Hazard Networks. Bioinformatics, 36(1):241–249.

[41] Schwartz, R. and Schäffer, A. A. (2017). The evolution of tumour phylogenetics: principles and practice. Nature Reviews Genetics, 18(4):213–229.

[42] Slowikowski, K. (2024). ggrepel: Automatically Position Non-Overlapping Text Labels with 'ggplot2'. https://ggrepel.slowkow.com/, https://github.com/slowkow/ggrepel

[43] Szabo, A. and Boucher, K. (2002). Estimating an oncogenetic tree when false negatives and positives are present. Mathematical Biosciences, 176(2):219–236.

[44] Takeshima, H. and Ushijima, T. (2019). Accumulation of genetic and epigenetic alterations in normal cells and cancer risk. npj Precision Oncology, 3(1):1–8.

[45] Tan, L., Serene, S., Chao, H. X., and Gore, J. (2011). Hidden Randomness between Fitness Landscapes Limits Reverse Evolution. Physical Review Letters, 106(19):198102.

[46] Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

[47] Wickham, H. (2023). stringr: Simple, Consistent Wrappers for Common String Operations.

[48] Wickham, H. and Bryan, J. (2025). readxl: Read Excel Files. R package version 1.4.5, https://github.com/tidyverse/readxl

[49] Wickham, H., François, R., Henry, L., Müller, K., and Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation. R package version 1.1.4, https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org

[50] Wickham, H., Vaughan, D., and Girlich, M. (2024). tidyr: Tidy Messy Data. R package version 1.3.1, https://github.com/tidyverse/tidyr

[51] Youn, A. and Simon, R. (2012). Estimating the order of mutations during tumorigenesis from tumor genome sequencing data. Bioinformatics, 28(12):1555–1561.
