
arXiv: 2605.13894 · v1 · pith:BZ2X5GU3 · submitted 2026-05-12 · 🧬 q-bio.PE · cs.LG

Phylogenetic Tree Inference with Tropical Axial Attention

Pith reviewed 2026-05-15 06:07 UTC · model grok-4.3

classification 🧬 q-bio.PE cs.LG
keywords phylogenetic inference · tropical geometry · max-plus attention · distance matrices · BME metrics · ultrametric penalty · sequence alignments · neural networks

The pith

Tropical axial attention replaces standard attention with max-plus operators to learn phylogenetic distances aligned with tree geometry.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a neural architecture that uses tropical axial attention for phylogenetic tree inference. It replaces the usual softmax dot-product attention with max-plus operators, producing a piecewise-linear structure that mirrors dynamic programming formulations. The model is trained on multi-species sequence alignments to predict all pairwise distances, using a mix of ℓ1 loss and tropical symmetric distance losses plus a penalty for violating ultrametric properties. Drawing on the known isomorphism between phylogenetic tree space and the tropical Grassmannian, the authors argue that this provides a natural geometric framework for the task. On the empirical datasets DS1 through DS11, where true trees are unknown, the model's distance matrices lie substantially closer to the tree metrics of the balanced minimum evolution (BME) trees fitted to them than do those of the baseline models, which the authors read as a useful inductive bias, especially under distribution shift.

Core claim

Tropical axial attention provides a natural geometric framework for phylogenetic inference. By using max-plus operators instead of vanilla softmax dot-product attention, the architecture induces a piecewise-linear structure aligned with dynamic programming formulations. From multi-species sequence alignments, the model learns all pairwise distances and is trained with a combination of ℓ1 and tropical symmetric distance losses plus an ultrametric violation penalty. Motivated by the isomorphism between the space of phylogenetic trees on n species and the tropical Grassmannian, the model produces distance matrices substantially closer to their BME-induced tree metrics than those of the baseline models.
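The three training terms can be made concrete, although this page does not give the paper's formulas. The sketch below is a minimal, stdlib-only illustration under stated assumptions: the tropical symmetric distance is taken to be the standard tropical metric on the vector of pairwise distances, max_i(u_i - v_i) - min_i(u_i - v_i); the ultrametric violation penalty is assumed to be the gap between the two largest sides of every leaf triple (zero exactly when d(i,k) <= max(d(i,j), d(j,k)) holds throughout); and the weights `lam_tr` and `lam_um` are hypothetical.

```python
from itertools import combinations

Matrix = list[list[float]]  # symmetric pairwise-distance matrix over n leaves


def upper(d: Matrix) -> list[float]:
    """Strict upper triangle: one entry per unordered leaf pair (i < j)."""
    return [d[i][j] for i, j in combinations(range(len(d)), 2)]


def l1_loss(pred: Matrix, target: Matrix) -> float:
    """Mean absolute error over leaf pairs."""
    p, t = upper(pred), upper(target)
    return sum(abs(a - b) for a, b in zip(p, t)) / len(p)


def tropical_symmetric_distance(pred: Matrix, target: Matrix) -> float:
    """Tropical metric max(u - v) - min(u - v); blind to a constant offset."""
    diff = [a - b for a, b in zip(upper(pred), upper(target))]
    return max(diff) - min(diff)


def ultrametric_violation(d: Matrix) -> float:
    """Assumed penalty: sum over leaf triples of (largest side - second largest)."""
    total = 0.0
    for i, j, k in combinations(range(len(d)), 3):
        _, mid, top = sorted((d[i][j], d[i][k], d[j][k]))
        total += top - mid  # zero iff the two largest sides are equal
    return total


def combined_loss(pred: Matrix, target: Matrix,
                  lam_tr: float = 0.1, lam_um: float = 0.01) -> float:
    """Hypothetical weighting of the three terms named in the paper."""
    return (l1_loss(pred, target)
            + lam_tr * tropical_symmetric_distance(pred, target)
            + lam_um * ultrametric_violation(pred))
```

For example, `[[0, 1, 3], [1, 0, 3], [3, 3, 0]]` (two close leaves and an outgroup) has zero violation, while `[[0, 1, 2], [1, 0, 3], [2, 3, 0]]` incurs a penalty of 1. Because the tropical term ignores a uniform shift of all distances, in this sketch the ℓ1 term is what pins down absolute scale.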

What carries the argument

Tropical axial attention, which replaces vanilla softmax dot-product attention with max-plus operators to induce a piecewise-linear structure aligned with phylogenetic tree geometry via the tropical Grassmannian.
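A minimal sketch of the replacement itself, assuming one natural max-plus formulation; this page does not reproduce the paper's layer equations, and every name below is illustrative. The score between a query and a key becomes the tropical inner product max_t(q_t + k_t), and the softmax-weighted average of values becomes a max of score plus value. Learned projections, multiple heads, and normalization are omitted.

```python
import math

Vec = list[float]


def tropical_dot(q: Vec, k: Vec) -> float:
    """Max-plus inner product: max_t (q_t + k_t)."""
    return max(a + b for a, b in zip(q, k))


def max_plus_attention(Q: list[Vec], K: list[Vec], V: list[Vec]) -> list[Vec]:
    """For each query: scores s_j = tropical_dot(q, k_j), then
    out[c] = max_j (s_j + V[j][c]) in place of a softmax-weighted sum."""
    out = []
    for q in Q:
        scores = [tropical_dot(q, k) for k in K]
        out.append([max(s + v[c] for s, v in zip(scores, V))
                    for c in range(len(V[0]))])
    return out


def softmax_attention(Q: list[Vec], K: list[Vec], V: list[Vec]) -> list[Vec]:
    """Standard scaled dot-product attention, for comparison."""
    scale = math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        logits = [sum(a * b for a, b in zip(q, k)) / scale for k in K]
        top = max(logits)
        weights = [math.exp(x - top) for x in logits]  # numerically stable softmax
        z = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / z
                    for c in range(len(V[0]))])
    return out


def smooth_max(xs: Vec, temperature: float) -> float:
    """temperature * log(sum(exp(x / temperature))), which tends to max(xs) as
    temperature -> 0: max-plus is the zero-temperature limit of log-sum-exp."""
    top = max(xs)
    return top + temperature * math.log(
        sum(math.exp((x - top) / temperature) for x in xs))
```

Every operation in `max_plus_attention` is a max of sums, so its output is piecewise-linear in Q, K and V; `smooth_max` shows the standard sense in which max-plus arithmetic is the zero-temperature limit of the log-sum-exp that underlies softmax.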

If this is right

  • The tropical model produces distance matrices substantially closer to BME-induced tree metrics than baseline models on DS1-DS11 alignments.
  • Tropical attention acts as a geometric inductive bias that improves neural phylogenetic inference under distribution shift.
  • Enforcing tree-metric consistency through tropical losses becomes a practical route to better distance estimates when true trees are unavailable.
  • The isomorphism with the tropical Grassmannian supplies a principled reason to prefer max-plus attention over standard dot-product attention for this domain.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same max-plus replacement could be tested on other hierarchical or ultrametric learning problems outside phylogenetics.
  • Hybrid pipelines that feed tropical attention distances into existing BME or NJ algorithms might improve accuracy without full retraining.
  • Controlled simulations with ground-truth trees would let researchers measure how much the ultrametric penalty reduces metric error beyond what the data alone provides.
  • If the piecewise-linear inductive bias generalizes, similar attention swaps could benefit sequence models in other domains that involve additive or tree-like distances.
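The hybrid-pipeline idea in the second bullet above amounts to handing a predicted distance matrix to a classical distance method. As a hedged illustration, here is textbook Saitou-Nei neighbor joining in plain Python; it stands in for NJ only (it is not BME or FastME), and the input below is a toy additive matrix rather than model output.

```python
def neighbor_joining(d: list[list[float]],
                     names: list[str]) -> list[tuple[str, str, float]]:
    """Textbook Saitou-Nei neighbor joining on a symmetric distance matrix.
    Returns (node, parent, branch_length) edges of an unrooted tree; internal
    nodes are named n0, n1, ... (an arbitrary convention for this sketch)."""
    D = {a: {b: float(d[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes) for a in nodes}  # row sums over active nodes
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b);
        # ties are broken by node name so the result is deterministic.
        _, a, b = min(((n - 2) * D[x][y] - r[x] - r[y], x, y)
                      for i, x in enumerate(nodes) for y in nodes[i + 1:])
        u = f"n{counter}"
        counter += 1
        la = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))  # branch length a -> u
        edges += [(a, u, la), (b, u, D[a][b] - la)]
        # Distances from the new node u to every remaining node.
        D[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                D[u][c] = D[c][u] = 0.5 * (D[a][c] + D[b][c] - D[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, D[a][b]))
    return edges
```

On an additive input such as `[[0, 3, 5, 5], [3, 0, 6, 6], [5, 6, 0, 2], [5, 6, 2, 0]]`, NJ recovers the generating quartet tree exactly, including its branch lengths.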

Load-bearing premise

The piecewise-linear structure induced by max-plus operators and the ultrametric violation penalty will align with actual phylogenetic tree geometry even when true trees are unknown and evaluation relies on BME-induced metrics.

What would settle it

Run the model on simulated alignments generated from known true trees. If the tropical distances deviate more from the true tree metric than the baseline distances do, the claimed geometric advantage would be falsified.
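The falsification test reduces to a comparison against known metrics. A minimal sketch, assuming each simulated dataset supplies its true tree metric (path lengths on the generating tree) and that error is the mean absolute deviation over leaf pairs; the error measure and every name below are hypothetical illustrations, not a protocol from the paper.

```python
from itertools import combinations

Matrix = list[list[float]]


def mean_abs_error(est: Matrix, true: Matrix) -> float:
    """Average |est_ij - true_ij| over unordered leaf pairs."""
    pairs = list(combinations(range(len(true)), 2))
    return sum(abs(est[i][j] - true[i][j]) for i, j in pairs) / len(pairs)


def compare_to_truth(tropical: list[Matrix], baseline: list[Matrix],
                     truth: list[Matrix]) -> tuple[float, float]:
    """Mean error of each model against the true tree metrics, averaged over
    simulated datasets. The advantage is falsified if tropical > baseline."""
    trop = sum(mean_abs_error(e, t) for e, t in zip(tropical, truth)) / len(truth)
    base = sum(mean_abs_error(e, t) for e, t in zip(baseline, truth)) / len(truth)
    return trop, base
```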

Figures

Figures reproduced from arXiv: 2605.13894 by Baran Hashemi, Chris Teska, Kurt Pasque, Ruriko Yoshida.

Figure 1: Tropical axial attention takes $R \in \mathbb{R}^{M \times P \times d_{\mathrm{model}}}$ and tropically projects it to Q, K, V tensors. Attention is performed in parallel along both the M and P axes, preserving the geometric relevance of both axes for phylogenetic inference.
  • Sequence axis attention: For a fixed leaf pair (i, j) ∈ P, attention across M sites lets the model learn mutation evidence across the alignment.
  • Pair axis attention: For a f…
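The two attention axes in Figure 1 can be sketched on a tensor indexed [site][pair][feature]. This is a hypothetical, stdlib-only sketch: the learned projections to Q, K and V are omitted (queries, keys and values are the raw features), and merging the two axes by an elementwise max is an assumption, not the paper's stated rule.

```python
Tensor3 = list[list[list[float]]]  # R[m][p][c]: site m, leaf pair p, feature c


def max_plus_self_attention(X: list[list[float]]) -> list[list[float]]:
    """Max-plus self-attention over a sequence of feature vectors; queries, keys
    and values are X itself (learned tropical projections omitted)."""
    out = []
    for q in X:
        scores = [max(a + b for a, b in zip(q, k)) for k in X]
        out.append([max(s + v[c] for s, v in zip(scores, X))
                    for c in range(len(q))])
    return out


def sequence_axis(R: Tensor3) -> Tensor3:
    """For each fixed leaf pair p, attend across the M alignment sites."""
    M, P = len(R), len(R[0])
    per_pair = [max_plus_self_attention([R[m][p] for m in range(M)])
                for p in range(P)]
    return [[per_pair[p][m] for p in range(P)] for m in range(M)]


def pair_axis(R: Tensor3) -> Tensor3:
    """For each fixed site m, attend across the P leaf pairs."""
    return [max_plus_self_attention(R[m]) for m in range(len(R))]


def axial_block(R: Tensor3) -> Tensor3:
    """Both axes run in parallel; merging them by elementwise max is an assumption."""
    A, B = sequence_axis(R), pair_axis(R)
    return [[[max(a, b) for a, b in zip(A[m][p], B[m][p])]
             for p in range(len(R[0]))] for m in range(len(R))]
```

Each axis attends over one dimension at a time, so the cost grows with M·M·P plus P·P·M score evaluations rather than (M·P)² for full attention over the flattened tensor.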
Figure 2: Tree reconstruction error by leaf count on simulated test data generated under the training regime. Panels show (a) RF distance, (b) normalized RF distance, (c) weighted RF distance, and (d) KF distance. Lower values indicate better agreement with the reference tree associated with the input MSA. Phyloformer 2 achieves the strongest topological performance under RF-based metrics, while Phyloformer generall…
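Panels (a) and (b) of Figure 2 use the Robinson-Foulds (RF) distance, which counts the bipartitions of the leaf set present in one tree but not the other. A minimal sketch, assuming binary unrooted trees written as nested tuples of leaf names (an illustrative encoding, not the paper's code); the weighted RF and KF panels also need branch lengths and are not covered here.

```python
def leaves(t) -> set[str]:
    """Leaf labels under a node; a tree is a leaf name (str) or a tuple of subtrees."""
    return {t} if isinstance(t, str) else set().union(*(leaves(c) for c in t))


def splits(t) -> set[frozenset[str]]:
    """Nontrivial bipartitions induced by internal edges, each stored as the side
    that excludes a fixed reference leaf so that equal splits compare equal."""
    everything = leaves(t)
    ref = min(everything)
    found = set()

    def walk(node) -> set[str]:
        if isinstance(node, str):
            return {node}
        below = set().union(*(walk(c) for c in node))
        side = everything - below if ref in below else below
        if 2 <= len(side) <= len(everything) - 2:  # skip trivial splits
            found.add(frozenset(side))
        return below

    walk(t)
    return found


def rf_distance(t1, t2) -> int:
    """Robinson-Foulds distance: size of the symmetric difference of split sets."""
    return len(splits(t1) ^ splits(t2))


def normalized_rf(t1, t2) -> float:
    """RF divided by its maximum, 2(n - 3), for binary unrooted trees on n leaves."""
    n = len(leaves(t1))
    return rf_distance(t1, t2) / (2 * (n - 3))
```

For instance, `(("a", "b"), ("c", "d"), "e")` and `(("a", "b"), ("c", "e"), "d")` share one of their two nontrivial splits, giving RF 2 and normalized RF 0.5.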
Original abstract

In this work, we introduce a Tropical Axial Attention neural reasoning architecture that replaces vanilla softmax dot-product attention with max-plus operators, inducing a piecewise-linear structure aligned with dynamic programming formulations. From multi-species sequence alignments, our model learns all possible pairwise distances and is trained using a combination of $\ell_1$ and tropical symmetric distance metric losses with an ultrametric violation penalty. We leverage the well-known isomorphic relationship between the space of all phylogenetic trees with $n$ species and the tropical Grassmannian to show that tropical attention provides a natural geometric framework for phylogenetic inference. On empirical DS1-DS11 alignments, where true trees are unknown, the tropical model produces distance matrices that are substantially closer to their BME-induced tree metrics than those of the baseline models. These results suggest that tropical attention is a useful geometric inductive bias for neural phylogenetic inference, especially under distribution shift and when tree-metric consistency is important.
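The isomorphism invoked in the abstract has a concrete, checkable form. A metric on n leaves is a tree metric exactly when, for every four leaves i, j, k, l, the largest of d_ij + d_kl, d_ik + d_jl and d_il + d_jk is attained at least twice (the four-point condition); Speyer and Sturmfels identify these conditions with the tropical Plücker relations that cut out the tropical Grassmannian, up to sign and lineality conventions. The sketch below is a hedged illustration, not the paper's evaluation: it scores tree-metric consistency by the worst four-point gap, whereas the paper compares each matrix to the path metric of a fitted BME tree. The helper `path_metric` is hypothetical and only builds test metrics from a weighted edge list.

```python
from itertools import combinations


def four_point_violation(d: list[list[float]]) -> float:
    """Worst gap, over all quartets, between the largest and second-largest of
    d_ij + d_kl, d_ik + d_jl, d_il + d_jk. Zero iff the four-point condition holds."""
    worst = 0.0
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
        worst = max(worst, sums[2] - sums[1])
    return worst


def path_metric(edges, n_leaves: int) -> list[list[float]]:
    """Tree metric (sum of branch lengths along each path) for a weighted tree
    given as (u, v, length) edges; leaves are the integers 0..n_leaves-1."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    d = [[0.0] * n_leaves for _ in range(n_leaves)]
    for s in range(n_leaves):
        stack, seen = [(s, 0.0)], {s}
        while stack:  # depth-first search; paths in a tree are unique
            node, dist = stack.pop()
            if isinstance(node, int) and node < n_leaves:
                d[s][node] = dist
            for nb, w in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append((nb, dist + w))
    return d
```

Any matrix built by `path_metric` has zero violation, while perturbing a single entry usually breaks the tie among the three sums and produces a positive gap.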

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. This paper introduces a Tropical Axial Attention neural architecture for phylogenetic tree inference from multi-species sequence alignments. It replaces standard softmax dot-product attention with max-plus operators to induce piecewise-linear structures aligned with dynamic programming. The model learns all pairwise distances and is trained with a combination of ℓ1 loss, tropical symmetric distance metric loss, and an ultrametric violation penalty. It invokes the known isomorphism between phylogenetic trees with n species and the tropical Grassmannian to argue that tropical attention supplies a natural geometric framework. On the DS1-DS11 empirical alignments (true trees unknown), the tropical model is reported to produce distance matrices substantially closer to their BME-induced tree metrics than baseline models.

Significance. If the central empirical claim can be substantiated without proxy circularity, the work would supply a geometrically motivated inductive bias for neural phylogenetic inference that aligns with tree metrics and ultrametric properties. This could be especially useful under distribution shift, where standard attention mechanisms may not enforce tree-like consistency. The explicit use of max-plus operators and the tropical loss terms represent a concrete attempt to embed tropical geometry into sequence-to-distance models.

major comments (3)
  1. Abstract and §4 (Empirical Evaluation on DS1-DS11): The claim that the tropical model produces distance matrices 'substantially closer' to BME-induced tree metrics is load-bearing for the utility argument, yet the evaluation applies BME to the model's own outputs while the training loss (ℓ1 + tropical symmetric distance + ultrametric violation penalty) explicitly incentivizes ultrametric/tree-like distances. With true trees unavailable on these datasets, the metric risks measuring consistency with the training objective rather than independent phylogenetic accuracy.
  2. §2 (Geometric Framework): The assertion that tropical attention provides a 'natural geometric framework' rests on the external isomorphism between phylogenetic trees and the tropical Grassmannian, but the manuscript does not demonstrate that the learned axial-attention parameters map to points in this space or that the max-plus replacement directly recovers tree metrics beyond the effect of the added loss terms.
  3. §4.1 (Baselines, Metrics, and Ablations): No error bars, statistical significance tests, or ablation results isolating the contribution of the tropical symmetric distance loss versus the ultrametric penalty are reported for the 'substantially closer' distances. This leaves the magnitude and robustness of the improvement over baselines difficult to assess.
minor comments (2)
  1. Abstract: The description of DS1-DS11 would be clearer if the number of taxa, alignment lengths, and sources of the datasets were stated explicitly.
  2. Throughout: The max-plus operator notation and its relation to the standard attention formula should be introduced with a short worked example to aid readers unfamiliar with tropical semirings.
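The worked example requested above can be sketched as follows. In the max-plus (tropical) semiring, "addition" is max and "multiplication" is +, so the matrix product becomes (A ⊗ B)[i][j] = max_k (A[i][k] + B[k][j]). Repeated products compute best-path values, which is the sense in which max-plus operators align with dynamic programming. Assuming a score-then-aggregate form of max-plus attention (an assumption; the paper's exact layer may differ), attention is two such products, Q ⊗ Kᵀ followed by ⊗ V. All names are illustrative.

```python
NEG_INF = float("-inf")  # tropical "zero": the identity element for max


def maxplus_matmul(A, B):
    """Tropical matrix product: (A ⊗ B)[i][j] = max_k (A[i][k] + B[k][j])."""
    return [[max(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def tropical_attention(Q, K, V):
    """Scores S = Q ⊗ Kᵀ (tropical inner products), output S ⊗ V."""
    return maxplus_matmul(maxplus_matmul(Q, transpose(K)), V)


# Longest-path DP on a small DAG: edges 0->1 (2.0), 1->2 (3.0), 0->2 (1.0).
W = [[NEG_INF, 2.0, 1.0],
     [NEG_INF, NEG_INF, 3.0],
     [NEG_INF, NEG_INF, NEG_INF]]
W2 = maxplus_matmul(W, W)  # best weight over paths with exactly two edges
print(W2[0][2])  # 5.0, via 0 -> 1 -> 2
```

Replacing max with min gives the min-plus variant used for shortest paths; the algebra is otherwise identical.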

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment below, providing clarifications and indicating the revisions incorporated in the updated manuscript.

Point-by-point responses
  1. Referee: Abstract and §4 (Empirical Evaluation on DS1-DS11): The claim that the tropical model produces distance matrices 'substantially closer' to BME-induced tree metrics is load-bearing for the utility argument, yet the evaluation applies BME to the model's own outputs while the training loss (ℓ1 + tropical symmetric distance + ultrametric violation penalty) explicitly incentivizes ultrametric/tree-like distances. With true trees unavailable on these datasets, the metric risks measuring consistency with the training objective rather than independent phylogenetic accuracy.

    Authors: We agree that the BME comparison serves as a proxy metric and that the training losses encourage tree-like structure, which introduces a risk of measuring consistency with the objective rather than independent accuracy. All models (including baselines) are evaluated under identical BME post-processing, so relative improvements isolate the contribution of tropical axial attention. We have revised the abstract and §4 to explicitly acknowledge this proxy nature and the unavailability of true trees on DS1-DS11. We also added a brief discussion of this limitation and its implications for interpreting the results. revision: partial

  2. Referee: §2 (Geometric Framework): The assertion that tropical attention provides a 'natural geometric framework' rests on the external isomorphism between phylogenetic trees and the tropical Grassmannian, but the manuscript does not demonstrate that the learned axial-attention parameters map to points in this space or that the max-plus replacement directly recovers tree metrics beyond the effect of the added loss terms.

    Authors: The isomorphism is invoked strictly as geometric motivation for replacing dot-product attention with max-plus operators, whose piecewise-linear behavior aligns with tropical semiring operations used in tree metric formulations. We do not claim that the learned parameters lie in the tropical Grassmannian or that max-plus alone recovers tree metrics independently of the loss terms. We have added a clarifying paragraph in §2 that distinguishes the motivational role of the isomorphism from any direct parameter-space embedding. revision: yes

  3. Referee: §4.1 (Baselines, Metrics, and Ablations): No error bars, statistical significance tests, or ablation results isolating the contribution of the tropical symmetric distance loss versus the ultrametric penalty are reported for the 'substantially closer' distances. This leaves the magnitude and robustness of the improvement over baselines difficult to assess.

    Authors: We agree that error bars, significance testing, and targeted ablations are required to substantiate the reported improvements. The revised §4.1 now includes standard error bars across five independent runs, Wilcoxon signed-rank tests for pairwise model comparisons, and ablation tables that separately remove the tropical symmetric distance loss and the ultrametric penalty to quantify their individual contributions. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper motivates its architecture via the well-known external isomorphism between phylogenetic trees and the tropical Grassmannian, an independent mathematical fact not derived from the model's parameters or losses. Empirical evaluation on DS1-DS11 (true trees unknown) reports that the model's distances are closer to BME-induced metrics than baselines; this is a comparative result against external baselines rather than a quantity forced by construction from the training losses. No self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the external mathematical isomorphism between phylogenetic tree space and the tropical Grassmannian plus the assumption that max-plus attention plus the chosen losses will enforce tree-metric properties.

axioms (1)
  • domain assumption: Isomorphic relationship between the space of phylogenetic trees with n species and the tropical Grassmannian
    Invoked to argue that tropical attention supplies a natural geometric framework.

pith-pipeline@v0.9.0 · 5459 in / 1248 out tokens · 37428 ms · 2026-05-15T06:07:07.596298+00:00 · methodology


Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 1 internal anchor

  1. [1]

    Capacitated Dynamic Programming: Faster Knapsack and Graph Algorithms

    Axiotis, K. and Tzamos, C. (2018). Capacitated dynamic programming: Faster knapsack and graph algorithms. arXiv preprint arXiv:1802.06440

  2. [2]

    Phylogenetics in a Warm Place: Computational Aspects of the Tropical Grassmannian

    Bhatt, S., Sabol, J., Dey, P., Penn, M. J., Duchene, D., and Yoshida, R. (2025). Phylogenetics in a warm place: computational aspects of the tropical Grassmannian. arXiv preprint arXiv:2512.21765

  3. [3]

    Geometry of the Space of Phylogenetic Trees

    Billera, L. J., Holmes, S. P., and Vogtmann, K. (2001). Geometry of the space of phylogenetic trees. Advances in Applied Mathematics 27, 733–767. doi:10.1006/aama.2001.0759

  4. [4]

    Blassel, L., Sauvage, N., Barrat-Charlaix, P., Boussau, B., Lartillot, N., and Jacob, L. (2025). Likelihood-free inference of phylogenetic tree posterior distributions. arXiv preprint arXiv:2510.12976

  5. [5]

    Chen, Y., Huang, J., Yang, C., Hsu, K., and Liu, H. (2023). A comprehensive phylogenetic analysis of SARS-CoV-2: Utilizing a novel and convenient in-house RT-PCR method for characterization without virus culture and BSL-3 facilities. Viruses 16

  6. [6]

    Day, W. H. (1987). Computational complexity of inferring phylogenies from dissimilarity matrices. Bulletin of Mathematical Biology 49, 461–467. doi:10.1007/BF02458863

  7. [7]

    Fast and Accurate Phylogeny Reconstruction Algorithms Based on the Minimum-Evolution Principle

    Desper, R. and Gascuel, O. (2002). Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology 9, 687–705. doi:10.1089/106652702761034136

  8. [8]

    Eickmeyer, K., Huggins, P., Pachter, L., and Yoshida, R. (2008). On the optimality of the neighbor-joining algorithm. Algorithms for Molecular Biology 3

  9. [9]

    Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution

  10. [10]

    Guindon, S., Dufayard, J.-F., Lefort, V., Anisimova, M., Hordijk, W., and Gascuel, O. (2010). New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Systematic Biology 59, 307–321. doi:10.1093/sysbio/syq010

  11. [11]

    A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood

    Guindon, S. and Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology 52, 696–704. doi:10.1080/10635150390235520

  12. [12]

    Hashemi, B., Pasque, K., Teska, C., and Yoshida, R. (2025). Tropical attention: Neural algorithmic reasoning for combinatorial algorithms. In The Thirty-ninth Annual Conference on Neural Information Processing Systems

  13. [13]

    Ho, J., Kalchbrenner, N., Weissenborn, D., and Salimans, T. (2019). Axial attention in multidimensional transformers

  14. [14]

    Huelsenbeck, J. P. and Ronquist, F. (2001). MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755. doi:10.1093/bioinformatics/17.8.754

  15. [15]

    Parametric Shortest-Path Algorithms via Tropical Geometry

    Joswig, M. and Schröter, B. (2022). Parametric shortest-path algorithms via tropical geometry. Mathematics of Operations Research 47, 2065–2081

  16. [16]

    Jukna, S. (2014). Lower bounds for tropical circuits and dynamic programs. Theory of Computing Systems 57, 160–194

  17. [17]

    Kuhner, M. K. and Felsenstein, J. (1994). A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Molecular Biology and Evolution 11, 459–468. doi:10.1093/oxfordjournals.molbev.a040126

  18. [18]

    Efficiency of Markov Chain Monte Carlo Tree Proposals in Bayesian Phylogenetics

    Lakner, C., van der Mark, P., Huelsenbeck, J. P., Larget, B., and Ronquist, F. (2008). Efficiency of Markov chain Monte Carlo tree proposals in Bayesian phylogenetics. Systematic Biology 57, 86–103. doi:10.1080/10635150801886156

  19. [19]

    Lanave, C., Preparata, G., Saccone, C., and Serio, G. (1984). A new method for calculating evolutionary substitution rates. Journal of Molecular Evolution 20, 86–93

  20. [20]

    Lefort, V., Desper, R., and Gascuel, O. (2015). FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Molecular Biology and Evolution

  21. [21]

    Ly-Trong, N., Naser-Khdour, S., Lanfear, R., and Minh, B. Q. (2022). AliSim: A Fast and Versatile Phylogenetic Sequence Simulator for the Genomic Era. Molecular Biology and Evolution 39, msac092. doi:10.1093/molbev/msac092

  22. [22]

    Introduction to Tropical Geometry

    Maclagan, D. and Sturmfels, B. (2015). Introduction to Tropical Geometry (American Mathematical Society)

  23. [23]

    Monod, A., Lin, B., Yoshida, R., and Kang, Q. (2022). Tropical geometry of phylogenetic tree space: A statistical perspective. arXiv preprint arXiv:1805.12400

  24. [24]

    Nesterenko, L., Blassel, L., Veber, P., and Gascuel, O. (2025). Phyloformer: Fast, accurate, and versatile phylogenetic reconstruction with deep neural networks. Molecular Biology and Evolution 42, msaf051. doi:10.1093/molbev/msaf051

  25. [25]

    Nguyen, L.-T., et al. (2015). IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution

  26. [26]

    FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments

    Price, M. N., Dehal, P. S., and Arkin, A. P. (2010). FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE

  27. [27]

    A Combinatorial Method for Connecting BHV Spaces Representing Different Numbers of Taxa

    Ren, Y., Zha, S., Bi, J., Sanchez, J. A., Monical, C., Delcourt, M., et al. (2021). A combinatorial method for connecting BHV spaces representing different numbers of taxa. arXiv preprint arXiv:1708.02626 (October 30, 2021 update)

  28. [28]

    Robinson, D. F. and Foulds, L. R. (1981). Comparison of phylogenetic trees. Mathematical Biosciences 53, 131–147

  29. [29]

    The Neighbor-Joining Method: A New Method for Reconstructing Phylogenetic Trees

    Saitou, N. and Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution

  30. [30]

    Soltis, D. E. and Soltis, P. S. (2003). The role of phylogenetics in comparative genetics. Plant Physiology 132, 1790–1800. doi:10.1104/pp.103.022509

  31. [31]

    The Tropical Grassmannian

    Speyer, D. and Sturmfels, B. (2004). The tropical Grassmannian. Advances in Geometry 4, 389–411

  32. [32]

    Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics

  33. [33]

    Attention Is All You Need

    Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, vol. 30, 5998–6008

  34. [34]

    Wang, H., Zhu, Y., Green, B., Adam, H., Yuille, A., and Chen, L.-C. (2020). Axial-DeepLab: Stand-alone axial-attention for panoptic segmentation

  35. [35]

    Wang, P. (2021). axial-attention. https://github.com/lucidrains/axial-attention

  36. [36]

    Quantifying MCMC Exploration of Phylogenetic Tree Space

    Whidden, C. and Matsen, F. A. (2015). Quantifying MCMC exploration of phylogenetic tree space. Systematic Biology 64, 472–491. doi:10.1093/sysbio/syv006

(Chris Teska, Kurt Pasque, and Ruriko Yoshida) Department of Operations Research, Naval Postgraduate School, Monterey, CA, USA. Email address (Chris Teska): christopher.teska@nps.edu. (Baran Hashemi) Max Planck...