Recognition: unknown
HyperEvoGen: Exploring deep phylogeny using non-Euclidean variational inference
Pith reviewed 2026-05-08 08:38 UTC · model grok-4.3
The pith
HyperEvoGen embeds protein sequences in hyperbolic space so latent distances scale with true evolutionary divergence and preserve phylogenetic structure.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HyperEvoGen is a Poincaré variational autoencoder with adversarial training, hyperbolic latent geometry, and a compound loss function that learns evolutionarily meaningful representations from single-family alignments. The arrangement of protein sequences in this embedding preserves phylogenetic structure and produces latent distances which scale with true evolutionary divergence, enabling fast scalable modeling that outperforms standard p-distance or substitution-corrected methods on deep divergences.
What carries the argument
Poincaré variational autoencoder whose hyperbolic latent geometry and compound loss embed sequences to maintain hierarchical relatedness and produce distances proportional to evolutionary divergence.
If this is right
- Ancestral sequence reconstructions become more accurate than those from standard distance-based methods on simulated deep phylogenies.
- Sequence generation achieves higher quality than Potts models while using substantially less training time.
- Large protein families can be modeled scalably while retaining hierarchical evolutionary structure in the embeddings.
- Latent distances provide a non-saturating metric for quantifying evolutionary divergence between sequences.
Where Pith is reading between the lines
- The same hyperbolic embeddings could be tested for compatibility with structural data to refine divergence estimates.
- If latent distances reliably track divergence, the model might support inference of branch lengths without separate substitution models.
- Extending the single-family training regime to multi-family alignments could reveal whether conserved hyperbolically embedded motifs appear across distant homologs.
Load-bearing premise
The hyperbolic latent geometry and compound loss function can extract phylogenetically meaningful representations from single-family alignments alone.
What would settle it
On the Potts-coupled simulation benchmarks, ancestral reconstruction accuracy no higher than that of conventional p-distance or Jukes-Cantor baselines would falsify the central claim.
read the original abstract
Homologous proteins evolve from a common ancestral sequence, constrained by intricate patterns of co-evolving residues. Accurate reconstruction of evolutionary histories remains a challenge, primarily due to the inability of the existing approaches to capture long-range coevolutionary ties and lack of a precise metric to represent the evolutionary distance between sequences. Standard approaches are based on p-distance or substitution-corrected measures such as Jukes-Cantor. These methods saturate in cases of deep evolutionary divergence, losing all evolutionary signal after enough time. We present HyperEvoGen, a Poincar\'e variational autoencoder with adversarial training, hyperbolic latent geometry, and a compound loss function that learns evolutionarily meaningful representations from single-family alignments. The arrangement of protein sequences in HyperEvoGen's hyperbolic embedding aims to preserve phylogenetic structure and produce latent distances which scale with true evolutionary divergence. HyperEvoGen enables fast, scalable modeling of protein evolution while preserving hierarchical relatedness in a geometry-aware representation. On Potts-coupled simulation benchmarks, it produces more accurate ancestral reconstructions than conventional baselines, and offers higher-quality sequence generation with less training time than Potts models. This combination of accuracy and throughput supports large-family evolutionary studies and accelerates design-oriented applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces HyperEvoGen, a Poincaré variational autoencoder with adversarial training, hyperbolic latent geometry, and a compound loss function. It learns embeddings from single-family protein alignments such that the arrangement of sequences preserves phylogenetic structure and latent distances scale with true evolutionary divergence. The work claims more accurate ancestral sequence reconstruction on Potts-coupled simulation benchmarks than conventional baselines, plus higher-quality sequence generation with less training time than Potts models.
Significance. If the central claims hold, the method would address saturation of standard distance metrics (p-distance, Jukes-Cantor) at deep divergences by exploiting hyperbolic geometry's suitability for hierarchical data. This could enable scalable modeling of long-range coevolution from alignments alone and support large-family phylogenetic studies as well as design applications. The simulation benchmark setup supplies a falsifiable test of whether latent distances track divergence.
major comments (2)
- [Abstract] Abstract: performance claims on Potts-coupled simulation benchmarks are stated without any quantitative results, error bars, specific metrics, or derivation showing how the compound loss produces distances that scale with divergence.
- [Abstract] The abstract frames the representations as 'evolutionarily meaningful' yet supplies no independent validation that the latent metric is not simply recovering parameters already fitted by the compound loss; this risks circular evaluation.
minor comments (1)
- Notation for the compound loss weights and the precise form of the hyperbolic distance are not introduced in the provided text, making it hard to reproduce the scaling claim.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight opportunities to strengthen the clarity of our claims in the abstract. We respond to each major comment below and indicate the revisions we will make.
read point-by-point responses
-
Referee: [Abstract] Abstract: performance claims on Potts-coupled simulation benchmarks are stated without any quantitative results, error bars, specific metrics, or derivation showing how the compound loss produces distances that scale with divergence.
Authors: We agree that the abstract would be improved by including concise quantitative support for the performance claims. The full manuscript already reports specific metrics (e.g., ancestral reconstruction accuracy and generation quality) with error bars from repeated simulations in the Results section, along with a description of the compound loss components in Methods that are intended to promote distance scaling with divergence. In the revision we will incorporate key numerical highlights and a brief statement on the loss design into the abstract itself. revision: yes
-
Referee: [Abstract] The abstract frames the representations as 'evolutionarily meaningful' yet supplies no independent validation that the latent metric is not simply recovering parameters already fitted by the compound loss; this risks circular evaluation.
Authors: We appreciate the concern about potential circularity. While the compound loss is constructed to encourage phylogenetic structure, the primary validation relies on an independent downstream task: accuracy of ancestral sequence reconstruction on simulated data whose true phylogenies and divergence times are generated by the Potts model and are never seen during training. This provides an external check that the learned metric captures evolutionary signal beyond the loss terms. We will revise the abstract to explicitly distinguish the loss objective from this independent benchmark validation. revision: yes
Circularity Check
No significant circularity
full rationale
The paper introduces HyperEvoGen as a Poincaré VAE trained on single-family alignments to produce hyperbolic embeddings that preserve phylogenetic structure. Claims of improved ancestral reconstruction and sequence generation are evaluated on independent Potts-coupled simulation benchmarks with known ground-truth phylogenies and divergence times. No equations, loss terms, or distance metrics are shown to reduce by construction to the fitted parameters themselves; the model is optimized to capture co-evolution and hierarchy, while benchmark metrics (reconstruction accuracy, generation quality) are computed against external simulation labels. No self-citation chains or uniqueness theorems are invoked as load-bearing premises. The derivation is therefore self-contained against the stated external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- compound loss weights
axioms (1)
- domain assumption Hyperbolic geometry preserves hierarchical phylogenetic structure better than Euclidean space for sequence embeddings
Reference graph
Works this paper leans on
-
[1]
Oxford University Press, ??? (2000).https://books.google.com/books?id=vtWW9bmVd1IC
Nei, M., Kumar, S.: Molecular Evolution and Phylogenetics. Oxford University Press, ??? (2000).https://books.google.com/books?id=vtWW9bmVd1IC
2000
-
[2]
Bioengineer- ing11(5), 480 (2024) https://doi.org/10.3390/bioengineering11050480
Zou, Y., Zhang, Z., Zeng, Y., Hu, H., Hao, Y., Huang, S., Li, B.: Common Methods for Phylogenetic Tree Construction and Their Implementation in R. Bioengineer- ing11(5), 480 (2024) https://doi.org/10.3390/bioengineering11050480 . Number:
-
[3]
PLoS Biology9(3), 1000602 (2011) https: //doi.org/10.1371/journal.pbio.1000602
Philippe, H., Brinkmann, H., Lavrov, D.V., Littlewood, D.T.J., Manuel, M., W¨ orheide, G., Baurain, D.: Resolving Difficult Phylogenetic Questions: Why More Sequences Are Not Enough. PLoS Biology9(3), 1000602 (2011) https: //doi.org/10.1371/journal.pbio.1000602 . Accessed 2025-08-11
-
[4]
Journal of Molecular Evolution92(2), 181–206 (2024) https://doi
Sennett, M.A., Theobald, D.L.: Extant Sequence Reconstruction: The Accuracy of Ancestral Sequence Reconstructions Evaluated by Extant Sequence Cross- Validation. Journal of Molecular Evolution92(2), 181–206 (2024) https://doi. org/10.1007/s00239-024-10162-3 . Accessed 2025-08-05
-
[5]
Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstruct- ing phylogenetic trees. Molecular Biology and Evolution4(4), 406–425 (1987) https://doi.org/10.1093/oxfordjournals.molbev.a040454
-
[6]
OUP Oxford, ??? (2007)
Liberles, D.A.: Ancestral Sequence Reconstruction. OUP Oxford, ??? (2007). Google-Books-ID: G3YTDAAAQBAJ
2007
-
[7]
Current opinion in structural biology43, 55–62 (2017) https://doi.org/10.1016/j.sbi.2016.11.004
Levy, R.M., Haldane, A., Flynn, W.F.: Potts Hamiltonian models of protein co- variation, free energy landscapes, and evolutionary fitness. Current opinion in structural biology43, 55–62 (2017) https://doi.org/10.1016/j.sbi.2016.11.004 . Accessed 2025-08-05
-
[8]
Nature Biotechnology35(2), 128–135 (2017) https://doi.org/10.1038/nbt.3769 17
Hopf, T.A., Ingraham, J.B., Poelwijk, F.J., Sch¨ arfe, C.P.I., Springer, M., Sander, C., Marks, D.S.: Mutation effects predicted from sequence co-variation. Nature Biotechnology35(2), 128–135 (2017) https://doi.org/10.1038/nbt.3769 17
-
[9]
Nature Communications12(1), 6302 (2021) https://doi.org/10.1038/s41467-021-26529-9
McGee, F., Hauri, S., Novinger, Q., Vucetic, S., Levy, R.M., Carnevale, V., Hal- dane, A.: The generative capacity of probabilistic protein sequence models. Nature Communications12(1), 6302 (2021) https://doi.org/10.1038/s41467-021-26529-9 . Accessed 2024-08-06
-
[10]
Kingma, D.P., Welling, M.: Auto-Encoding Variational Bayes. arXiv. arXiv:1312.6114 [stat] (2022). https://doi.org/10.48550/arXiv.1312.6114 . http: //arxiv.org/abs/1312.6114 Accessed 2025-08-11
-
[11]
Nickel, M., Kiela, D.: Poincar´ e Embeddings for Learning Hierarchical Representa- tions. arXiv. arXiv:1705.08039 [cs] (2017). https://doi.org/10.48550/arXiv.1705. 08039 . http://arxiv.org/abs/1705.08039 Accessed 2025-08-13
-
[12]
Mathieu, E., Lan, C.L., Maddison, C.J., Tomioka, R., Teh, Y.W.: Continuous Hierarchical Representations with Poincar\’e Variational Auto-Encoders (2019). https://arxiv.org/abs/1901.06033v3 Accessed 2024-09-04
-
[13]
Biology Methods and Protocols6(1), 006 (2021) https://doi.org/10.1093/biomethods/bpab006
Matsumoto, H., Mimori, T., Fukunaga, T.: Novel metric for hyperbolic phy- logenetic tree embeddings. Biology Methods and Protocols6(1), 006 (2021) https://doi.org/10.1093/biomethods/bpab006 . Accessed 2025-08-19
-
[14]
Biology11(9), 1256 (2022) https://doi.org/ 10.3390/biology11091256
Jiang, Y., Tabaghi, P., Mirarab, S.: Learning Hyperbolic Embedding for Phyloge- netic Tree Placement and Updates. Biology11(9), 1256 (2022) https://doi.org/ 10.3390/biology11091256 . Accessed 2025-08-12
-
[15]
PLOS Computational Biology19(4), 1011084 (2023) https://doi.org/10.1371/journal.pcbi.1011084
Macaulay, M., Darling, A., Fourment, M.: Fidelity of hyperbolic space for Bayesian phylogenetic inference. PLOS Computational Biology19(4), 1011084 (2023) https://doi.org/10.1371/journal.pcbi.1011084 . Accessed 2026-03-19
-
[16]
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv. arXiv:1406.2661 [cs, stat] (2014). https://doi.org/10.48550/arXiv.1406.2661 . http://arxiv.org/abs/1406.2661 Accessed 2024-09-04
work page internal anchor Pith review doi:10.48550/arxiv.1406.2661 2014
-
[17]
Nature Reviews Genetics14(4), 249–261 (2013) https://doi.org/10.1038/nrg3414
Juan, D., Pazos, F., Valencia, A.: Emerging methods in protein co-evolution. Nature Reviews Genetics14(4), 249–261 (2013) https://doi.org/10.1038/nrg3414 . Accessed 2025-08-05
-
[18]
Domingo, J., Baeza-Centurion, P., Lehner, B.: The Causes and Consequences of Genetic Interactions (Epistasis). Annual Review of Genomics and Human Genet- ics20, 433–460 (2019) https://doi.org/10.1146/annurev-genom-083118-014857
-
[19]
Molecular Biology and Evolution33(1), 268–280 (2016) https://doi.org/10.1093/molbev/msv211
Figliuzzi, M., Jacquier, H., Schug, A., Tenaillon, O., Weigt, M.: Coevolution- ary Landscape Inference and the Context-Dependence of Mutations in Beta- Lactamase TEM-1. Molecular Biology and Evolution33(1), 268–280 (2016) https://doi.org/10.1093/molbev/msv211 . Accessed 2025-08-11 18
-
[20]
Yule, G.U.: II.—A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F. R. S. Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a Biological Character213(402-410), 21–87 (1997) https://doi.org/10.1098/rstb.1925.0002 . Accessed 2025-08-19
-
[21]
Accessed 2025-08-19
Wong, T.K.F., Ly-Trong, N., Ren, H., Ba˜ nos, H., Roger, A.J., Susko, E., Bielow, C., Maio, N.D., Goldman, N., Hahn, M.W., Huttley, G., Lanfear, R., Minh, B.Q.: IQ-TREE 3: Phylogenomic Inference Software using Complex Evolutionary Models (2025). Accessed 2025-08-19
2025
-
[22]
Nature Methods14(6), 587–589 (2017) https://doi.org/10.1038/nmeth.4285
Kalyaanamoorthy, S., Minh, B.Q., Wong, T.K.F., Haeseler, A., Jermiin, L.S.: ModelFinder: fast model selection for accurate phylogenetic estimates. Nature Methods14(6), 587–589 (2017) https://doi.org/10.1038/nmeth.4285 . Accessed 2025-08-19
-
[23]
Horta, E.R., Lage-Castellanos, A., Mulet, R.: Ancestral Sequence Reconstruc- tion for Co-evolutionary models. Journal of Statistical Mechanics: Theory and Experiment2022(1), 013502 (2022) https://doi.org/10.1088/1742-5468/ac3d93 . arXiv:2108.03801 [cond-mat]. Accessed 2025-08-12
-
[24]
Nucleic Acids Research53(D1), 523–534 (2025) https://doi.org/10.1093/nar/ gkae997
Paysan-Lafosse, T., Andreeva, A., Blum, M., Chuguransky, S., Grego, T., Pinto, B., Salazar, G., Bileschi, M., Llinares-L´ opez, F., Meng-Papaxanthos, L., Colwell, L., Grishin, N., Schaeffer, R.D., Clementel, D., Tosatto, S.E., Sonnhammer, E., Wood, V., Bateman, A.: The Pfam protein families database: embracing AI/ML. Nucleic Acids Research53(D1), 523–534 ...
-
[25]
Computer Physics Communications260, 107312 (2021) https://doi.org/10.1016/j.cpc.2020.107312
Haldane, A., Levy, R.M.: Mi3-GPU: MCMC-based inverse Ising inference on GPUs for protein covariation analysis. Computer Physics Communications260, 107312 (2021) https://doi.org/10.1016/j.cpc.2020.107312 . Accessed 2025-08-05
-
[26]
Nature Methods15(10), 816–822 (2018) https://doi.org/10.1038/s41592-018-0138-4
Riesselman, A.J., Ingraham, J.B., Marks, D.S.: Deep generative models of genetic variation capture the effects of mutations. Nature Methods15(10), 816–822 (2018) https://doi.org/10.1038/s41592-018-0138-4
-
[27]
Nature Reviews Genetics5(5), 366–375 (2004) https://doi.org/10
Thornton, J.W.: Resurrecting ancient genes: experimental analysis of extinct molecules. Nature Reviews Genetics5(5), 366–375 (2004) https://doi.org/10. 1038/nrg1324 . Accessed 2025-08-19
2004
-
[28]
Molecular Phylogenetics and Evolution214, 108473 (2026) https://doi.org/10.1016/j.ympev.2025.108473
Ferreiro, D., Pazos, E., Arenas, M.: Trends in substitution models of protein evolution for phylogenetic inference. Molecular Phylogenetics and Evolution214, 108473 (2026) https://doi.org/10.1016/j.ympev.2025.108473 . Accessed 2026-03- 19
-
[29]
Di Bari, L., Bisardi, M., Cotogno, S., Weigt, M., Zamponi, F.: Emergent time scales of epistasis in protein evolution. Proceedings of the National Academy of Sciences121(40), 2406807121 (2024) https://doi.org/10.1073/pnas.2406807121 . 19 Accessed 2025-08-13
-
[30]
Nishikawa, Mustafa Acar, and David A
Nishikawa, K.K., Hoppe, N., Smith, R., Bingman, C., Raman, S.: Epistasis shapes the fitness landscape of an allosteric specificity switch. Nature Communications 12(1), 5562 (2021) https://doi.org/10.1038/s41467-021-25826-7 . Accessed 2025- 08-19 20
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.