Phylogenetic Tree Inference with Tropical Axial Attention
Pith reviewed 2026-05-15 06:07 UTC · model grok-4.3
The pith
Tropical axial attention replaces standard attention with max-plus operators to learn phylogenetic distances aligned with tree geometry.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Tropical axial attention provides a natural geometric framework for phylogenetic inference. By using max-plus operators instead of vanilla softmax dot-product attention, the architecture induces a piecewise-linear structure aligned with dynamic programming formulations. From multi-species sequence alignments, the model learns all pairwise distances and is trained using a combination of ℓ1 and tropical symmetric distance metric losses with an ultrametric violation penalty. The paper invokes the isomorphism between the space of phylogenetic trees on n species and the tropical Grassmannian as its geometric motivation, and reports distance matrices substantially closer to their balanced minimum evolution (BME)-induced tree metrics than those of baseline models.
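The loss ingredients named above can be made concrete with a small sketch. The paper's exact definitions are not given in this summary, so everything here is an assumption: the "tropical symmetric distance" is taken to be the tropical projective metric max(u - v) - min(u - v) on the vector of pairwise distances, the ultrametric penalty is taken to be the gap between the largest and second-largest distance in each triple, and the weights `lam_trop` and `lam_ultra` are hypothetical.

```python
from itertools import combinations

def l1_loss(pred, target):
    """Mean absolute difference over the upper triangle of two n x n matrices."""
    pairs = list(combinations(range(len(pred)), 2))
    return sum(abs(pred[i][j] - target[i][j]) for i, j in pairs) / len(pairs)

def tropical_distance(pred, target):
    """Assumed form of the tropical metric: max(u - v) - min(u - v),
    where u and v are the upper-triangle vectors of the two matrices."""
    diffs = [pred[i][j] - target[i][j]
             for i, j in combinations(range(len(pred)), 2)]
    return max(diffs) - min(diffs)

def ultrametric_violation(d):
    """Mean over triples of (largest - second largest) pairwise distance.
    It is zero exactly when every triple attains its maximum at least twice."""
    gaps = []
    for i, j, k in combinations(range(len(d)), 3):
        _, second, largest = sorted((d[i][j], d[i][k], d[j][k]))
        gaps.append(largest - second)
    return sum(gaps) / len(gaps)

def total_loss(pred, target, lam_trop=1.0, lam_ultra=1.0):
    """Hypothetical weighted sum of the three terms."""
    return (l1_loss(pred, target)
            + lam_trop * tropical_distance(pred, target)
            + lam_ultra * ultrametric_violation(pred))

# Toy 3-taxon example (made-up numbers): target is ultrametric, pred is not.
target = [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
pred = [[0, 2, 4], [2, 0, 5], [4, 5, 0]]
print(ultrametric_violation(target), ultrametric_violation(pred))
```

On the toy input the target has zero violation, while the perturbed prediction is penalized by the amount its largest distance exceeds the second largest.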
What carries the argument
Tropical Axial Attention, which replaces vanilla softmax dot-product attention with max-plus operators. The swap induces a piecewise-linear structure that the paper aligns with phylogenetic tree geometry via the tropical Grassmannian.
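To illustrate the kind of swap being described, the sketch below contrasts a single-query softmax attention step with a max-plus analogue. This is not the paper's layer: the scoring rule max_c(q_c + k_c) and the aggregation max_j(score_j + v_j) are assumed forms chosen to show the semiring substitution (sum becomes max, product becomes sum), and both function names are hypothetical.

```python
import math

def softmax_attention(q, keys, values):
    """Standard attention for one query: softmax(q . k_j)-weighted average of values."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[c] for w, v in zip(weights, values)) / total
            for c in range(dim)]

def maxplus_attention(q, keys, values):
    """Assumed max-plus analogue: tropical 'dot product' scores max_c(q_c + k_c),
    then a tropical aggregation max_j(score_j + v_j) per output coordinate."""
    scores = [max(a + b for a, b in zip(q, k)) for k in keys]
    dim = len(values[0])
    return [max(s + v[c] for s, v in zip(scores, values)) for c in range(dim)]

keys = [[1, 0], [0, 2]]
values = [[5, 0], [0, 3]]
print(maxplus_attention([0, 1], keys, values))
print(softmax_attention([0, 1], keys, values))
```

The max-plus version is built only from additions and maxima, so its output is a piecewise-linear function of the inputs; shifting the query by a constant shifts the output by the same constant.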
If this is right
- The tropical model produces distance matrices substantially closer to BME-induced tree metrics than baseline models on DS1-DS11 alignments.
- Tropical attention acts as a geometric inductive bias that improves neural phylogenetic inference under distribution shift.
- Enforcing tree-metric consistency through tropical losses becomes a practical route to better distance estimates when true trees are unavailable.
- The isomorphism with the tropical Grassmannian supplies a principled reason to prefer max-plus attention over standard dot-product attention for this domain.
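A standard fact behind the isomorphism mentioned in the last bullet is the four-point condition: a dissimilarity map is a tree metric when, for every four taxa, the maximum of the three pairwise sums d(i,j)+d(k,l), d(i,k)+d(j,l), d(i,l)+d(j,k) is attained at least twice. The sketch below measures how far a matrix is from satisfying it. It is an illustration only, not a procedure taken from the paper, and the function names are hypothetical.

```python
from itertools import combinations

def four_point_gap(d, i, j, k, l):
    """Largest minus second-largest of the three pairwise sums for one quartet.
    A gap of zero means the maximum is attained at least twice."""
    sums = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
    return sums[2] - sums[1]

def max_four_point_violation(d):
    """Worst four-point gap over all quartets of taxa (needs at least 4 taxa)."""
    return max(four_point_gap(d, *quartet)
               for quartet in combinations(range(len(d)), 4))

# Toy quartet tree ((A,B),(C,D)) with all five edges of length 1 (made-up example).
tree = [[0, 2, 3, 3],
        [2, 0, 3, 3],
        [3, 3, 0, 2],
        [3, 3, 2, 0]]
# Same matrix with d(A,C) inflated to 4, which breaks the condition.
noisy = [row[:] for row in tree]
noisy[0][2] = noisy[2][0] = 4
print(max_four_point_violation(tree), max_four_point_violation(noisy))
```

Ultrametrics are the special case of tree metrics that come from equidistant rooted trees, which is why an ultrametric penalty is a stricter requirement than the four-point condition.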
Where Pith is reading between the lines
- The same max-plus replacement could be tested on other hierarchical or ultrametric learning problems outside phylogenetics.
- Hybrid pipelines that feed tropical attention distances into existing BME or NJ algorithms might improve accuracy without full retraining.
- Controlled simulations with ground-truth trees would let researchers measure how much the ultrametric penalty reduces metric error beyond what the data alone provides.
- If the piecewise-linear inductive bias generalizes, similar attention swaps could benefit sequence models in other domains that involve additive or tree-like distances.
Load-bearing premise
The piecewise-linear structure induced by max-plus operators and the ultrametric violation penalty will align with actual phylogenetic tree geometry even when true trees are unknown and evaluation relies on BME-induced metrics.
What would settle it
Run the model on simulated alignments generated from known true trees. If the tropical distances deviate more from the true tree metric than the baseline distances do, the claimed geometric advantage is falsified.
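The comparison step of such a check could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the true tree metric is computed as leaf-to-leaf path lengths with Floyd-Warshall (itself a min-plus dynamic program), the error measure is mean absolute error, and the toy tree, the `compare` helper, and its argument names are all hypothetical. Simulating alignments and running the models is outside its scope.

```python
INF = float("inf")

def tree_metric(n_nodes, edges, leaves):
    """All-pairs path lengths on a weighted tree via Floyd-Warshall,
    restricted to the rows and columns of the leaf nodes."""
    dist = [[0.0 if a == b else INF for b in range(n_nodes)]
            for a in range(n_nodes)]
    for u, v, w in edges:
        dist[u][v] = dist[v][u] = w
    for m in range(n_nodes):
        for a in range(n_nodes):
            for b in range(n_nodes):
                dist[a][b] = min(dist[a][b], dist[a][m] + dist[m][b])
    return [[dist[a][b] for b in leaves] for a in leaves]

def mean_abs_error(est, truth):
    """Mean absolute error over the upper triangle."""
    n = len(truth)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(abs(est[i][j] - truth[i][j]) for i, j in pairs) / len(pairs)

def compare(tropical_est, baseline_est, truth):
    """Positive result: the tropical estimate is farther from the true metric."""
    return mean_abs_error(tropical_est, truth) - mean_abs_error(baseline_est, truth)

# Toy known tree (made up): leaves 0-3, internal nodes 4 and 5,
# topology ((0,1),(2,3)), every edge of length 1.
edges = [(0, 4, 1.0), (1, 4, 1.0), (2, 5, 1.0), (3, 5, 1.0), (4, 5, 1.0)]
truth = tree_metric(6, edges, leaves=[0, 1, 2, 3])
print(truth)
```

In an actual experiment, `tropical_est` and `baseline_est` would be the distance matrices predicted by the two models on alignments simulated from the same tree.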
Original abstract
In this work, we introduce a Tropical Axial Attention neural reasoning architecture that replaces vanilla softmax dot-product attention with max-plus operators, inducing a piecewise-linear structure aligned with dynamic programming formulations. From multi-species sequence alignments, our model learns all possible pairwise distances and is trained using a combination of $\ell_1$ and tropical symmetric distance metric losses with an ultrametric violation penalty. We leverage the well known isomorphic relationship between the space of all phylogenetic trees with $n$ species and tropical Grassmannian to show that tropical attention provides a natural geometric framework for phylogenetic inference. On empirical $DS1-DS11$ alignments, where true trees are unknown, the tropical model produces distance matrices that are substantially closer to their BME-induced tree metrics than the baseline models. These results suggest that tropical attention is a useful geometric inductive bias for neural phylogenetic inference, especially under distribution shift and when tree-metric consistency is important.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper introduces a Tropical Axial Attention neural architecture for phylogenetic tree inference from multi-species sequence alignments. It replaces standard softmax dot-product attention with max-plus operators to induce piecewise-linear structures aligned with dynamic programming. The model learns all pairwise distances and is trained with a combination of ℓ1 loss, tropical symmetric distance metric loss, and an ultrametric violation penalty. It invokes the known isomorphism between phylogenetic trees with n species and the tropical Grassmannian to argue that tropical attention supplies a natural geometric framework. On the DS1-DS11 empirical alignments (true trees unknown), the tropical model is reported to produce distance matrices substantially closer to their BME-induced tree metrics than baseline models.
Significance. If the central empirical claim can be substantiated without proxy circularity, the work would supply a geometrically motivated inductive bias for neural phylogenetic inference that aligns with tree metrics and ultrametric properties. This could be especially useful under distribution shift, where standard attention mechanisms may not enforce tree-like consistency. The explicit use of max-plus operators and the tropical loss terms represent a concrete attempt to embed tropical geometry into sequence-to-distance models.
major comments (3)
- [Abstract and §4] Abstract and §4 (Empirical Evaluation on DS1-DS11): The claim that the tropical model produces distance matrices 'substantially closer' to BME-induced tree metrics is load-bearing for the utility argument, yet the evaluation applies BME to the model's own outputs while the training loss (ℓ1 + tropical symmetric distance + ultrametric violation penalty) explicitly incentivizes ultrametric/tree-like distances. With true trees unavailable on these datasets, the metric risks measuring consistency with the training objective rather than independent phylogenetic accuracy.
- [§2] §2 (Geometric Framework): The assertion that tropical attention provides a 'natural geometric framework' rests on the external isomorphism between phylogenetic trees and the tropical Grassmannian, but the manuscript does not demonstrate that the learned axial-attention parameters map to points in this space or that the max-plus replacement directly recovers tree metrics beyond the effect of the added loss terms.
- [§4.1] §4.1 (Baselines, Metrics, and Ablations): No error bars, statistical significance tests, or ablation results isolating the contribution of the tropical symmetric distance loss versus the ultrametric penalty are reported for the 'substantially closer' distances. This leaves the magnitude and robustness of the improvement over baselines difficult to assess.
minor comments (2)
- [Abstract] Abstract: The description of DS1-DS11 would be clearer if the number of taxa, alignment lengths, and sources of the datasets were stated explicitly.
- [Throughout] Throughout: The max-plus operator notation and its relation to the standard attention formula should be introduced with a short worked example to aid readers unfamiliar with tropical semirings.
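A worked example of the kind requested in the last minor comment might look like the following. The semiring definitions and the small-temperature limit are standard; the 2x2 matrices are made up for illustration and do not come from the paper.

```latex
% Max-plus semiring: tropical addition is max, tropical multiplication is +.
\[
a \oplus b = \max(a, b), \qquad a \odot b = a + b .
\]
% Matrix product over the max-plus semiring.
\[
(A \odot B)_{ij} = \bigoplus_{k} A_{ik} \odot B_{kj} = \max_{k}\,\bigl(A_{ik} + B_{kj}\bigr).
\]
% Worked 2x2 example.
\[
A = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}, \quad
B = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}, \quad
A \odot B =
\begin{pmatrix}
\max(0+3,\, 2+0) & \max(0+0,\, 2+1) \\
\max(1+3,\, 0+0) & \max(1+0,\, 0+1)
\end{pmatrix}
=
\begin{pmatrix} 3 & 3 \\ 4 & 1 \end{pmatrix}.
\]
% Link to softmax-style normalization: log-sum-exp tends to max as the temperature goes to zero.
\[
\lim_{\tau \to 0^{+}} \tau \log \sum_{k} \exp\!\left(\frac{x_k}{\tau}\right) = \max_{k} x_k .
\]
```

The last line is one way to relate the two regimes: the log-sum-exp that underlies softmax normalization degenerates to a hard maximum in the zero-temperature limit.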
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We address each major comment below, providing clarifications and indicating the revisions incorporated in the updated manuscript.
-
Referee: [Abstract and §4] Abstract and §4 (Empirical Evaluation on DS1-DS11): The claim that the tropical model produces distance matrices 'substantially closer' to BME-induced tree metrics is load-bearing for the utility argument, yet the evaluation applies BME to the model's own outputs while the training loss (ℓ1 + tropical symmetric distance + ultrametric violation penalty) explicitly incentivizes ultrametric/tree-like distances. With true trees unavailable on these datasets, the metric risks measuring consistency with the training objective rather than independent phylogenetic accuracy.
Authors: We agree that the BME comparison serves as a proxy metric and that the training losses encourage tree-like structure, which introduces a risk of measuring consistency with the objective rather than independent accuracy. All models (including baselines) are evaluated under identical BME post-processing, so relative improvements isolate the contribution of tropical axial attention. We have revised the abstract and §4 to explicitly acknowledge this proxy nature and the unavailability of true trees on DS1-DS11. We also added a brief discussion of this limitation and its implications for interpreting the results.
revision: partial
-
Referee: [§2] §2 (Geometric Framework): The assertion that tropical attention provides a 'natural geometric framework' rests on the external isomorphism between phylogenetic trees and the tropical Grassmannian, but the manuscript does not demonstrate that the learned axial-attention parameters map to points in this space or that the max-plus replacement directly recovers tree metrics beyond the effect of the added loss terms.
Authors: The isomorphism is invoked strictly as geometric motivation for replacing dot-product attention with max-plus operators, whose piecewise-linear behavior aligns with tropical semiring operations used in tree metric formulations. We do not claim that the learned parameters lie in the tropical Grassmannian or that max-plus alone recovers tree metrics independently of the loss terms. We have added a clarifying paragraph in §2 that distinguishes the motivational role of the isomorphism from any direct parameter-space embedding.
revision: yes
-
Referee: [§4.1] §4.1 (Baselines, Metrics, and Ablations): No error bars, statistical significance tests, or ablation results isolating the contribution of the tropical symmetric distance loss versus the ultrametric penalty are reported for the 'substantially closer' distances. This leaves the magnitude and robustness of the improvement over baselines difficult to assess.
Authors: We agree that error bars, significance testing, and targeted ablations are required to substantiate the reported improvements. The revised §4.1 now includes standard error bars across five independent runs, Wilcoxon signed-rank tests for pairwise model comparisons, and ablation tables that separately remove the tropical symmetric distance loss and the ultrametric penalty to quantify their individual contributions.
revision: yes
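The two statistics named in this response can be sketched in a few lines. The rebuttal does not say how they were computed, so this is only an illustration under assumptions: the standard error is the sample standard deviation over runs divided by the square root of the run count, and the signed-rank routine returns the rank sums without a p-value. All numbers below are made up, and in practice a library routine such as `scipy.stats.wilcoxon` would usually be preferred.

```python
import math
import statistics

def standard_error(values):
    """Sample standard deviation divided by sqrt(number of runs)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def signed_rank_sums(x, y):
    """Wilcoxon signed-rank sums (W_plus, W_minus) for paired samples.
    Zero differences are dropped; tied |differences| share their average rank.
    Ties are detected by exact equality, which suits this toy example only."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        avg_rank = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1 .. end+1
        for t in range(pos, end + 1):
            ranks[order[t]] = avg_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Made-up paired errors for two hypothetical models on four datasets.
model_a = [1, 2, 3, 4]
model_b = [2, 2, 1, 8]
print(signed_rank_sums(model_a, model_b))
print(standard_error([1, 2, 3, 4, 5]))
```

With only eleven datasets (DS1-DS11), a paired rank test of this kind has limited power, so reporting the per-dataset differences alongside the test is a reasonable complement.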
Circularity Check
No significant circularity detected
The paper motivates its architecture via the well-known external isomorphism between phylogenetic trees and the tropical Grassmannian, an independent mathematical fact not derived from the model's parameters or losses. Empirical evaluation on DS1-DS11 (true trees unknown) reports that the model's distances are closer to BME-induced metrics than baselines; this is a comparative result against external baselines rather than a quantity forced by construction from the training losses. No self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Isomorphic relationship between the space of phylogenetic trees with n species and the tropical Grassmannian
Reference graph
Works this paper leans on
-
[1]
Axiotis, K. and Tzamos, C. (2018). Capacitated dynamic programming: Faster knapsack and graph algorithms. arXiv preprint arXiv:1802.06440
-
[2]
Bhatt, S., Sabol, J., Dey, P., Penn, M. J., Duchene, D., and Yoshida, R. (2025). Phylogenetics in a warm place: computational aspects of the tropical Grassmannian. arXiv preprint arXiv:2512.21765
-
[3]
Billera, L. J., Holmes, S. P., and Vogtmann, K. (2001). Geometry of the space of phylogenetic trees. Advances in Applied Mathematics 27, 733–767. doi:10.1006/aama.2001.0759
- [4]
-
[5]
Chen, Y., Huang, J., Yang, C., Hsu, K., and Liu, H. (2023). A comprehensive phylogenetic analysis of SARS-CoV-2: Utilizing a novel and convenient in-house RT-PCR method for characterization without virus culture and BSL-3 facilities. Viruses 16
-
[6]
Day, W. H. (1987). Computational complexity of inferring phylogenies from dissimilarity matrices. Bulletin of Mathematical Biology 49, 461–467. doi:10.1007/BF02458863
-
[7]
Desper, R. and Gascuel, O. (2002). Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. Journal of Computational Biology 9, 687–705. doi:10.1089/106652702761034136
-
[8]
Eickmeyer, K., Huggins, P., Pachter, L., and Yoshida, R. (2008). On the optimality of the neighbor-joining algorithm. Algorithms Mol Biol 3
-
[9]
Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution
-
[10]
Guindon, S., Dufayard, J.-F., Lefort, V., Anisimova, M., Hordijk, W., and Gascuel, O. (2010). New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Systematic Biology 59, 307–321. doi:10.1093/sysbio/syq010
-
[11]
Guindon, S. and Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology 52, 696–704. doi:10.1080/10635150390235520
-
[12]
[Dataset] Hashemi, B., Pasque, K., Teska, C., and Yoshida, R. (2025). Tropical attention: Neural algorithmic reasoning for combinatorial algorithms. The Thirty-ninth Annual Conference on Neural Information Processing Systems
-
[13]
[Dataset] Ho, J., Kalchbrenner, N., Weissenborn, D., and Salimans, T. (2019). Axial attention in multidimensional transformers
-
[14]
Huelsenbeck, J. P. and Ronquist, F. (2001). MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755. doi:10.1093/bioinformatics/17.8.754
-
[15]
Joswig, M. and Schröter, B. (2022). Parametric shortest-path algorithms via tropical geometry. Mathematics of Operations Research 47, 2065–2081
-
[16]
Jukna, S. (2014). Lower bounds for tropical circuits and dynamic programs. Theory of Computing Systems 57, 160–194
-
[17]
Kuhner, M. K. and Felsenstein, J. (1994). A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Molecular Biology and Evolution 11, 459–468. doi:10.1093/oxfordjournals.molbev.a040126
-
[18]
Lakner, C., van der Mark, P., Huelsenbeck, J. P., Larget, B., and Ronquist, F. (2008). Efficiency of Markov chain Monte Carlo tree proposals in Bayesian phylogenetics. Systematic Biology 57, 86–103. doi:10.1080/10635150801886156
-
[19]
Lanave, C., Preparata, G., Saccone, C., and Serio, G. (1984). A new method for calculating evolutionary substitution rates. Journal of Molecular Evolution 20, 86–93
-
[20]
Lefort, V., Desper, R., and Gascuel, O. (2015). FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Molecular Biology and Evolution
-
[21]
Ly-Trong, N., Naser-Khdour, S., Lanfear, R., and Minh, B. Q. (2022). AliSim: A Fast and Versatile Phylogenetic Sequence Simulator for the Genomic Era. Molecular Biology and Evolution 39, msac092. doi:10.1093/molbev/msac092
-
[22]
Maclagan, D. and Sturmfels, B. (2015). Introduction to Tropical Geometry (American Mathematical Society)
- [23]
-
[24]
Nesterenko, L., Blassel, L., Veber, P., and Gascuel, O. (2025). Phyloformer: Fast, accurate, and versatile phylogenetic reconstruction with deep neural networks. Molecular Biology and Evolution 42, msaf051. doi:10.1093/molbev/msaf051
-
[25]
Nguyen, L.-T. et al. (2015). IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution
-
[26]
Price, M. N., Dehal, P. S., and Arkin, A. P. (2010). FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE
-
[27]
Ren, Y., Zha, S., Bi, J., Sanchez, J. A., Monical, C., Delcourt, M., et al. (2021). A combinatorial method for connecting BHV spaces representing different numbers of taxa. arXiv preprint arXiv:1708.02626 (October 30, 2021 update)
-
[28]
Robinson, D. F. and Foulds, L. R. (1981). Comparison of phylogenetic trees. Bellman Prize in Mathematical Biosciences 53, 131–147
-
[29]
Saitou, N. and Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution
-
[30]
Soltis, D. E. and Soltis, P. S. (2003). The role of phylogenetics in comparative genetics. Plant Physiology 132, 1790–1800. doi:10.1104/pp.103.022509
-
[31]
Speyer, D. and Sturmfels, B. (2004). The tropical Grassmannian. Advances in Geometry 4, 389–411
-
[32]
Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics
- [33]
-
[34]
[Dataset] Wang, H., Zhu, Y., Green, B., Adam, H., Yuille, A., and Chen, L.-C. (2020). Axial-deeplab: Stand-alone axial-attention for panoptic segmentation
-
[35]
[Dataset] Wang, P. (2021). axial-attention. https://github.com/lucidrains/axial-attention
-
[36]
Whidden, C. and Matsen, F. A. (2015). Quantifying MCMC exploration of phylogenetic tree space. Systematic Biology 64, 472–491. doi:10.1093/sysbio/syv006
(Chris Teska, Kurt Pasque, and Ruriko Yoshida) Department of Operations Research, Naval Postgraduate School, Monterey, CA, USA
Email address, Chris Teska: christopher.teska@nps.edu
(Baran Hashemi) Max Planck...