Flexible inference of evolutionary accumulation dynamics using uncertain observational data
Pith reviewed 2026-05-23 03:35 UTC · model grok-4.3
The pith
HyperLAU infers evolutionary accumulation pathways from data even when up to half the features are uncertain.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HyperLAU is a new algorithm for hypercubic inference that learns dynamic pathways and feature interactions from data that includes uncertainties, even when large sets of particular features remain unobserved across the source dataset. It is shown to highlight the main pathways recovered by other tools when up to 50 percent of the features in the input data are uncertain and to reduce biases that arise when uncertain portions of the data are simply excluded.
What carries the argument
HyperLAU algorithm, which extends hypercubic inference models to incorporate uncertain observational data for pathway learning.
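To make the idea of an uncertain observation concrete, the sketch below enumerates the hypercube states that one partially observed record could correspond to. It assumes observations are strings over `0`, `1`, and a `?` marker for an uncertain feature. That encoding and the function name `compatible_states` are illustrative assumptions, not taken from the HyperLAU implementation.

```python
from itertools import product


def compatible_states(observation: str, unknown: str = "?") -> list[str]:
    """Return every binary state consistent with a partially observed record.

    Assumption: '0' = feature absent, '1' = feature present,
    `unknown` = feature state not observed (hypothetical encoding).
    """
    options = []
    for ch in observation:
        if ch == unknown:
            options.append(("0", "1"))  # an uncertain feature may be either value
        elif ch in ("0", "1"):
            options.append((ch,))       # an observed feature is fixed
        else:
            raise ValueError(f"unexpected character {ch!r} in observation")
    return ["".join(bits) for bits in product(*options)]


print(compatible_states("1?0"))  # ['100', '110']
```

An observation with k uncertain features is compatible with 2^k states, which is why a likelihood over such data has to sum over sets of states rather than score a single vertex of the hypercube.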
If this is right
- Datasets with up to 50 percent uncertain features can still be used to infer main evolutionary pathways.
- Biases introduced by excluding uncertain data entries can be reduced or avoided.
- Additional information on evolutionary pathways becomes available in applications such as multidrug resistance analysis.
- Cross-sectional, phylogenetic, and longitudinal data sources can be combined even when individual features carry uncertainty.
Where Pith is reading between the lines
- The same uncertainty-handling logic could be adapted to other accumulation models outside the hypercubic setting.
- Medical datasets with noisy or incomplete observations may benefit from similar flexible inference without forced data pruning.
- Testing on synthetic datasets with controlled uncertainty levels would provide direct checks on pathway recovery rates.
- The approach may allow retrospective re-analysis of existing evolutionary studies that previously discarded uncertain observations.
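A synthetic test with controlled uncertainty levels could begin with a masking step like the minimal sketch below. It assumes equal-length binary strings as input and a `?` marker for uncertain entries; the function `mask_entries` and its parameters are hypothetical and not part of any published tool.

```python
import random


def mask_entries(data: list[str], fraction: float, seed: int = 0,
                 unknown: str = "?") -> list[str]:
    """Replace a fixed fraction of all entries with an 'uncertain' marker.

    Assumption: every row has the same length; entries are chosen
    uniformly at random without replacement over the whole dataset.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible experiments
    n_features = len(data[0]) if data else 0
    positions = [(i, j) for i in range(len(data)) for j in range(n_features)]
    n_mask = round(fraction * len(positions))
    chosen = set(rng.sample(positions, n_mask))
    return [
        "".join(unknown if (i, j) in chosen else ch for j, ch in enumerate(row))
        for i, row in enumerate(data)
    ]


complete = ["1100", "1000", "1110"]
print(mask_entries(complete, 0.5))
```

Uniform masking is only one choice. Masking whole features (columns) or correlating the masked positions with feature values would probe the harder cases raised in the referee report.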
Load-bearing premise
The hypercubic model structure continues to represent the underlying evolutionary process accurately when large fractions of the input features are uncertain or unobserved.
What would settle it
Application of HyperLAU to a dataset with known accumulation pathways where the recovered pathways diverge from those found by other tools once 40-50 percent of features are marked uncertain.
Original abstract
Understanding and predicting evolutionary accumulation pathways is a key objective in many fields of research, ranging from classical evolutionary biology to diverse applications in medicine. In this context, we are often confronted with the problem that data is sparse and uncertain. To use the available data as well as possible, inference approaches that can handle this uncertainty are required. One approach that allows us to use not only cross-sectional data, but also phylogenetically related and longitudinal data, is the use of 'hypercubic inference' models. In this article we introduce HyperLAU, a new algorithm for hypercubic inference that makes it possible to use datasets including uncertainties for learning evolutionary pathways. Expanding the flexibility of accumulation modelling, HyperLAU allows us to infer dynamic pathways and interactions between features, even when large sets of particular features are unobserved across the source dataset. We show that HyperLAU is able to highlight the main pathways found by other tools, even when up to 50% of the features in the input data are uncertain. Additionally, we demonstrate how it can help to overcome possible biases that can occur when reducing the used data by excluding uncertain parts. We illustrate the approach with a case study on multidrug resistance in tuberculosis, showing that HyperLAU allows more flexible data and provides new information about evolutionary pathways compared to existing approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces HyperLAU, a new algorithm extending hypercubic inference models to accommodate uncertain or missing observational data (cross-sectional, phylogenetic, or longitudinal) when inferring evolutionary accumulation pathways and feature interactions. It claims that HyperLAU recovers the dominant pathways identified by existing tools even when up to 50% of input features are uncertain, mitigates bias from simply discarding uncertain observations, and yields additional pathway information on a multidrug-resistance tuberculosis dataset.
Significance. If the uncertainty-handling mechanism proves robust, the work could meaningfully expand the usable data volume for accumulation modeling in evolutionary biology and clinical microbiology. The approach directly targets a common practical limitation (sparse/uncertain features) rather than assuming complete observations. However, the significance is currently limited by the absence of controlled validation or analytic guarantees beyond a single real-world case study.
major comments (3)
- [Abstract] Abstract and introduction: the central empirical claim (recovery of main pathways with up to 50% uncertain features on the TB dataset) rests on a single case study without reported controlled simulations that systematically vary the fraction, correlation structure, or type of uncertainty while holding the true accumulation graph fixed. This leaves open whether the hypercubic model structure plus the chosen uncertainty encoding introduces systematic bias under the stated conditions.
- [Abstract] Abstract: no analytic bound, parameter-free derivation, or error-propagation analysis is referenced for how uncertainty in individual features propagates through the hypercubic inference procedure; the manuscript therefore provides no a-priori reason to expect the 50% threshold to be general rather than dataset-specific.
- [Abstract] The manuscript states that HyperLAU 'allows more flexible data' and 'provides new information' compared with existing tools, yet supplies no quantitative comparison (e.g., pathway overlap metrics, false-positive rates on simulated graphs, or sensitivity to the uncertainty representation) that would establish these improvements are not artifacts of the particular TB dataset or the chosen baseline tools.
minor comments (2)
- [Abstract] The abstract refers to 'hypercubic inference' models without a brief reminder of the underlying state-space construction or the precise meaning of 'accumulation dynamics'; a one-sentence definition would aid readers unfamiliar with the prior literature.
- [Abstract] The claim that the method 'overcome[s] possible biases' from data reduction is stated without specifying how bias is measured or what the baseline bias magnitude is on the TB example.
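For readers unfamiliar with the state-space construction mentioned in the first minor comment, the following is a minimal sketch of a hypercubic transition graph under the usual assumption of irreversible accumulation: each state is a set of acquired features, encoded as an integer bitmask, and each edge acquires exactly one new feature. The function name `hypercube_edges` is illustrative only.

```python
def hypercube_edges(n_features: int) -> list[tuple[int, int]]:
    """List directed edges of the accumulation hypercube on n_features bits.

    Assumption: features are only ever gained, one at a time, so each
    edge sets exactly one bit that was 0 in the source state.
    """
    edges = []
    for state in range(2 ** n_features):
        for bit in range(n_features):
            if not state & (1 << bit):            # feature not yet acquired
                edges.append((state, state | (1 << bit)))
    return edges


for src, dst in hypercube_edges(2):
    print(f"{src:02b} -> {dst:02b}")
```

With L features the graph has 2^L states and L * 2^(L-1) edges, so the number of transition parameters grows quickly with L. An accumulation pathway is a directed path from the all-zeros state towards the all-ones state.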
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments on our manuscript. We address each major comment below, acknowledging the limitations in the current version and outlining specific revisions to address them.
-
Referee: [Abstract] Abstract and introduction: the central empirical claim (recovery of main pathways with up to 50% uncertain features on the TB dataset) rests on a single case study without reported controlled simulations that systematically vary the fraction, correlation structure, or type of uncertainty while holding the true accumulation graph fixed. This leaves open whether the hypercubic model structure plus the chosen uncertainty encoding introduces systematic bias under the stated conditions.
Authors: We agree that the current empirical support relies primarily on the TB case study and that controlled simulations would strengthen the claims. In the revised manuscript we will add a dedicated simulation study section that systematically varies the fraction of uncertain features (including around the 50% level), their correlation structure, and uncertainty types, using fixed ground-truth accumulation graphs. This will allow direct assessment of potential bias introduced by the uncertainty encoding. revision: yes
-
Referee: [Abstract] Abstract: no analytic bound, parameter-free derivation, or error-propagation analysis is referenced for how uncertainty in individual features propagates through the hypercubic inference procedure; the manuscript therefore provides no a-priori reason to expect the 50% threshold to be general rather than dataset-specific.
Authors: The manuscript does not currently include analytic bounds or formal error-propagation analysis, as the emphasis was on algorithmic extension and practical application. We will add a section in the revision that analyzes uncertainty propagation through the hypercubic model, including any available bounds or sensitivity results, to provide additional justification for the observed performance levels. revision: yes
-
Referee: [Abstract] The manuscript states that HyperLAU 'allows more flexible data' and 'provides new information' compared with existing tools, yet supplies no quantitative comparison (e.g., pathway overlap metrics, false-positive rates on simulated graphs, or sensitivity to the uncertainty representation) that would establish these improvements are not artifacts of the particular TB dataset or the chosen baseline tools.
Authors: We acknowledge the absence of quantitative metrics comparing pathway recovery, false-positive rates, and sensitivity across methods. The revised manuscript will incorporate such comparisons, using both the TB dataset and the new simulations, to quantify improvements in pathway overlap and robustness relative to baselines. revision: yes
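One simple candidate for the pathway overlap metrics discussed in the last exchange is the Jaccard index over the sets of transitions each method highlights. The sketch below is an assumption about how such a comparison might be set up; neither the referee report nor the rebuttal specifies a metric, and `jaccard_overlap` is a hypothetical helper.

```python
from typing import Hashable, Iterable


def jaccard_overlap(edges_a: Iterable[Hashable], edges_b: Iterable[Hashable]) -> float:
    """Jaccard index |A & B| / |A | B| between two sets of highlighted edges.

    Assumption: each edge is a hashable pair such as ("000", "100").
    Two empty sets are treated as identical (overlap 1.0).
    """
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy edge sets, invented for illustration only.
method_one = {("000", "100"), ("100", "110")}
method_two = {("000", "100"), ("100", "101")}
print(jaccard_overlap(method_one, method_two))
```

The Jaccard index ignores transition probabilities. A weighted comparison, for example over edge fluxes, would be more sensitive but needs both tools to report comparable weights.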
Circularity Check
No circularity: new algorithm with empirical case-study validation
The manuscript introduces HyperLAU as a new inference algorithm extending hypercubic accumulation models to handle uncertain or missing features. No derivation chain, parameter fitting, or prediction step is shown to reduce tautologically to its own inputs or to a self-citation. Performance claims rest on an empirical demonstration that the algorithm recovers pathways identified by other tools on one TB multidrug-resistance dataset, which constitutes independent validation rather than a self-referential loop. The central contribution is therefore algorithmic and observational, not a closed mathematical identity.
Forward citations
Cited by 1 Pith paper
-
An Algebraic Approach to Evolutionary Accumulation Models
An algebraic approach defines semi-algebraic parameter sets from underlying polynomial structures in evolutionary processes before likelihood maximization, showing compatibility with existing statistical EvAM models w...