Pith. Machine review for the scientific record.

arXiv: 2601.22866 · v2 · submitted 2026-01-30 · 🧬 q-bio.GN

Recognition: 2 theorem links

· Lean Theorem

Classification of SARS-CoV-2 Variants through The Epistatical Circos Plots with Convolutional Neural Networks

Pith reviewed 2026-05-16 09:40 UTC · model grok-4.3

classification 🧬 q-bio.GN
keywords SARS-CoV-2 · variant classification · direct coupling analysis · Circos plots · convolutional neural networks · epistasis · genomics · variant of concern

The pith

SARS-CoV-2 variants are classified with high accuracy by turning epistatic couplings into Circos plots and feeding the images to convolutional neural networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a pipeline that first applies direct coupling analysis (DCA) to large sets of SARS-CoV-2 genomic sequences to infer pairwise mutational interactions, then renders those interactions as Circos plot images, and finally trains convolutional neural networks on the images to assign each image, and hence the set of sequences it was built from, to one of four variant classes: Alpha, Delta, Omicron, or Else. The best model reaches a weighted-average F1-score of 98.68 percent with an AUC near 1. A reader would care because rapid, accurate classification from raw sequences could support real-time tracking of which lineages are spreading. The work demonstrates that epistatic signatures captured in these visual representations carry enough lineage-specific information for image-based deep learning to succeed at scale.

Core claim

By converting DCA-inferred mutational couplings from SARS-CoV-2 sequences into Circos images and training CNN models on them, the framework achieves reliable classification of variants into four classes, with the best model attaining a weighted-average F1-score of 98.68±0.75 percent and an AUC close to 1, indicating that the visualized epistatic signatures carry sufficient discriminative information.
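Both headline metrics can be recomputed from a classifier's predictions. The sketch below is a minimal, hypothetical illustration, assuming integer labels 0-3 (standing in for Alpha, Delta, Omicron, Else) and one probability column per class from the CNN. The function names are invented here, and because the review does not say how the reported AUC is averaged, the sketch returns one one-vs-rest AUC per class. Weighted-average F1 weights each class's F1 by its number of true examples, so a well-represented class pulls the average toward its own score.

```python
import numpy as np

def weighted_f1(y_true, y_pred, n_classes):
    """Support-weighted mean of per-class F1 (the 'weighted' averaging convention)."""
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp        # predicted as class k but truly another class
    fn = cm.sum(axis=1) - tp        # truly class k but predicted as another class
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); 0 for a class that is neither present nor predicted.
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    support = cm.sum(axis=1)        # number of true examples per class
    return float((f1 * support).sum() / support.sum())

def binary_auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; tied scores count one half.

    Assumes at least one positive and one negative example.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Fraction of (positive, negative) pairs that the scores rank correctly.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

def one_vs_rest_auc(prob, y, n_classes):
    """Per-class AUC, treating class k as positive and every other class as negative."""
    prob = np.asarray(prob, dtype=float)
    y = np.asarray(y)
    return [binary_auc(prob[:, k], y == k) for k in range(n_classes)]

# Toy example: one Delta image (class 1) misclassified as Omicron (class 2).
print(weighted_f1([0, 0, 1, 1, 2, 3], [0, 0, 1, 2, 2, 3], n_classes=4))  # roughly 0.833
```

Computed once per run or fold, the mean and standard deviation of `weighted_f1` would give a figure in the same form as 98.68 ± 0.75%; the review does not state how the authors aggregated their runs.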

What carries the argument

Circos plot images rendered from the pairwise couplings inferred by direct coupling analysis (DCA), which serve as the input to convolutional neural networks that classify SARS-CoV-2 variants.
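The DCA step can be illustrated with a minimal sketch. It assumes each locus has already been reduced to a binary ±1 variable (an assumption; the review does not state the authors' encoding) and fits an Ising model by asymmetric pseudolikelihood maximisation with plain gradient descent, in the spirit of the pseudolikelihood DCA papers by Ekeberg et al. in the reference graph. It is not the authors' implementation, and `plm_dca_binary`, `lam`, `lr` and `n_iter` are hypothetical names with illustrative values.

```python
import numpy as np

def plm_dca_binary(S, lam=0.01, lr=0.5, n_iter=500):
    """Asymmetric pseudolikelihood fit of an Ising model to +/-1 data.

    S: (N, L) array of +1/-1 values (N sequences, L loci).
    Returns fields h (L,) and symmetric couplings J (L, L) with zero diagonal.
    """
    S = np.asarray(S, dtype=float)
    N, L = S.shape
    h = np.zeros(L)
    J = np.zeros((L, L))
    for i in range(L):
        others = np.arange(L) != i           # a locus is not coupled to itself
        X, y = S[:, others], S[:, i]         # neighbouring loci and target locus
        hi, Ji = 0.0, np.zeros(L - 1)
        for _ in range(n_iter):
            field = hi + X @ Ji              # local field h_i + sum_j J_ij s_j
            # Derivative of log(1 + exp(-2 y field)) with respect to field.
            g = -2.0 * y / (1.0 + np.exp(2.0 * y * field))
            hi -= lr * g.mean()
            Ji -= lr * (X.T @ g / N + 2.0 * lam * Ji)   # L2-regularised gradient step
        h[i] = hi
        J[i, others] = Ji
    return h, 0.5 * (J + J.T)                # average the two one-sided estimates

# Synthetic check: locus 1 copies locus 0 with 10% flips, locus 2 is independent.
rng = np.random.default_rng(0)
s0 = rng.choice([-1, 1], size=2000)
s1 = np.where(rng.random(2000) < 0.1, -s0, s0)
s2 = rng.choice([-1, 1], size=2000)
h, J = plm_dca_binary(np.column_stack([s0, s1, s2]))
print(np.round(J, 2))   # J[0, 1] should stand out; J[0, 2] and J[1, 2] stay near 0
```

Larger |J_ij| marks a stronger inferred pairwise coupling, and these are the values a Circos rendering would draw as links. Note that, as the referee points out, correlations from shared ancestry also produce large |J_ij|; the fit itself cannot tell the two apart.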

Load-bearing premise

The Circos plots derived from DCA couplings on the available sequences contain enough lineage-specific epistatic information for the CNN to generalize across the four classes despite temporal and sequence-availability biases.

What would settle it

Applying the trained model to an independent set of sequences from a newly emerged variant or sublineage and measuring whether the weighted F1-score falls substantially below 95 percent.

Figures

Figures reproduced from arXiv: 2601.22866 by Bo Jing, Erik Aurell, Hong-Li Zeng, Kai-Rui Zhang.

Figure 1. Temporal distribution of SARS-CoV-2 genome sequences by variants. The main panel shows the daily …
Figure 2. Effect of filtering threshold on the number of surviving loci for four variant categories. With the threshold …
Figure 3. The graph is an image of the epistasis of data sets, with the line segments indicating the top 200 significant …
Figure 4. Training loss and validation accuracy curves of …
Figure 5. Confusion matrix of the DenseNet121 model
Figure 6. Receiver operating characteristic (ROC) curves
Figure 7. Visualization of EfficientNet
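Figures 2 and 3 refer to a locus-filtering threshold and to drawing only the top 200 couplings. A minimal sketch of both steps follows, under loudly stated assumptions: a locus is encoded as +1 where a sequence differs from the reference and -1 where it matches, filtering keeps loci whose minor-state frequency reaches a cut-off, and couplings are ranked by |J_ij|. The function names and the exact filter are hypothetical; the review does not state the authors' criteria.

```python
import numpy as np

def binarise(seqs, reference):
    """Hypothetical encoding: +1 where a base differs from the reference, -1 where it matches.

    Real pipelines usually treat gaps and ambiguous bases ('-', 'N') separately.
    """
    M = np.array([list(s) for s in seqs])    # (N, L) array of single characters
    ref = np.array(list(reference))          # (L,) reference bases
    return np.where(M != ref[None, :], 1, -1)

def filter_loci(S, threshold):
    """Keep loci whose minor-state frequency is at least `threshold`."""
    freq = (S == 1).mean(axis=0)              # frequency of the non-reference state
    maf = np.minimum(freq, 1.0 - freq)
    keep = np.flatnonzero(maf >= threshold)
    return S[:, keep], keep

def top_k_pairs(J, k):
    """The k locus pairs (i < j) with the largest |J_ij|, strongest first."""
    iu, ju = np.triu_indices(J.shape[0], k=1)     # each unordered pair once
    order = np.argsort(-np.abs(J[iu, ju]), kind="stable")[:k]
    return [(int(iu[m]), int(ju[m]), float(J[iu[m], ju[m]])) for m in order]

S = binarise(["ACGT", "ACGA", "TCGA", "ACGT"], "ACGT")
S_kept, kept = filter_loci(S, threshold=0.2)   # keeps loci 0 and 3
```

Indices returned by `top_k_pairs` refer to the filtered columns; `kept[i]` maps them back to alignment positions, which is what a circular genome track needs. Raising the threshold shrinks the number of surviving loci, the trade-off Figure 2 describes.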
Original abstract

The COVID-19 pandemic has profoundly affected global health, driven by the remarkable transmissibility and mutational adaptability of the SARS-CoV-2 virus. Although five variants of concern, Alpha, Beta, Gamma, Delta, and Omicron, have been identified, the classification task in this study is formulated using four classes: Alpha, Delta, Omicron, and Else, reflecting the sequence availability and temporal coverage of the dataset. Here, we develop an integrative framework that combines direct coupling analysis (DCA), Circos-based visualization, and convolutional neural networks (CNNs) to characterize lineage-specific epistatic signatures from large-scale SARS-CoV-2 genomic sequences. DCA-inferred pairwise mutational couplings were transformed into Circos images, which were then used as inputs for CNN-based classification models. The proposed framework achieved robust variant classification, with the best-performing model reaching a weighted-average F1-score of 98.68 ± 0.75% and an AUC close to 1.
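The abstract does not spell out how couplings become pictures. A Circos link plot places loci on a circle by genome position and joins coupled pairs with curved chords. The sketch below shows that geometry under stated assumptions: positions map linearly to angles, each chord is a quadratic Bézier curve bent toward the centre, and stroke width scales linearly with |J_ij|. None of these choices, nor the function names, come from the paper; the actual colour scale, coupling threshold and image resolution are exactly what the referee asks the authors to document.

```python
import math

def chord_endpoints(pos_i, pos_j, genome_length, radius=1.0):
    """Place two genome positions on a circle: position 0 at angle 0, counter-clockwise."""
    def point(pos):
        theta = 2.0 * math.pi * pos / genome_length   # fraction of genome -> angle
        return (radius * math.cos(theta), radius * math.sin(theta))
    return point(pos_i), point(pos_j)

def bezier_link(p, q, n_points=20, control=(0.0, 0.0)):
    """Sample a quadratic Bezier curve from p to q that bends toward `control`."""
    pts = []
    for k in range(n_points):
        t = k / (n_points - 1)
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2   # Bernstein weights
        pts.append((a * p[0] + b * control[0] + c * q[0],
                    a * p[1] + b * control[1] + c * q[1]))
    return pts

def link_width(coupling, max_abs_coupling, min_w=0.5, max_w=4.0):
    """Linearly map |J_ij| onto a stroke width between min_w and max_w."""
    frac = abs(coupling) / max_abs_coupling
    return min_w + frac * (max_w - min_w)

# A link between positions 0 and 250 on a hypothetical 1000-nt circle.
p, q = chord_endpoints(0, 250, genome_length=1000)
curve = bezier_link(p, q, n_points=3)
```

Every visual choice here (angle origin, curvature, width range) changes the pixels the CNN sees, which is one reason the rendering parameters matter for reproducibility.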

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an integrative framework that applies direct coupling analysis (DCA) to SARS-CoV-2 genomic sequences to infer pairwise mutational couplings, converts these into Circos plots, and feeds the resulting images into convolutional neural networks (CNNs) for classification into four categories (Alpha, Delta, Omicron, Else). The central claim is that the Circos visualizations encode lineage-specific epistatic signatures that enable robust classification, with the best model achieving a weighted-average F1-score of 98.68±0.75% and AUC near 1.

Significance. If the reported performance genuinely reflects extraction of functional epistatic patterns rather than phylogenetic or sampling artifacts, the approach could offer a visually interpretable, image-based alternative to sequence-based classifiers for viral variant typing. The integration of DCA, Circos visualization, and CNNs is a coherent pipeline, but the absence of controls for temporal and phylogenetic structure limits the strength of the epistasis interpretation.

major comments (2)
  1. [Abstract] The headline performance metrics (F1 98.68±0.75%, AUC≈1) are presented without any description of dataset cardinality, train/test split ratios, cross-validation scheme, or class-balance handling. These omissions are load-bearing because the abstract itself notes unequal sequence availability across lineages and temporal coverage; without these details it is impossible to determine whether the CNN is learning lineage-specific epistasis or simply phylogenetic correlations induced by the pooled alignment.
  2. [Methods/Results] DCA + Circos pipeline: standard DCA applied to a temporally heterogeneous alignment is expected to recover strong couplings driven by shared ancestry and sampling structure rather than functional constraints. No phylogeny-aware null model, time-stratified ablation, or control that removes phylogenetic signal (e.g., by using only contemporaneous sequences or a phylogenetic generalized least-squares correction) is described. This directly undermines the claim that the Circos plots isolate “lineage-specific epistatic signatures.”
minor comments (2)
  1. [Abstract] The class label “Else” is introduced without a precise definition of which sequences it contains or how it differs from the three named variants of concern.
  2. [Figures/Methods] Figure captions and methods: notation for the Circos plot construction (color scale, coupling threshold, image resolution) is not fully specified, hindering reproducibility.
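The time-stratified ablation and the contemporaneous-sequence control suggested in major comment 2 amount to two data splits. A minimal sketch follows, assuming each sequence carries a hypothetical `(collection_date, variant_label)` record; in the full pipeline each partition would then be passed through DCA and Circos rendering separately before training and evaluation.

```python
from datetime import date

def temporal_split(records, cutoff):
    """Train on records collected before `cutoff`, test on those collected on or after it."""
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test

def contemporaneous(records, start, end):
    """Keep only records collected within [start, end] (inclusive)."""
    return [r for r in records if start <= r[0] <= end]

# Hypothetical (collection_date, variant_label) records.
records = [
    (date(2021, 1, 5), "Alpha"),
    (date(2021, 7, 1), "Delta"),
    (date(2022, 1, 10), "Omicron"),
]
train, test = temporal_split(records, cutoff=date(2021, 6, 1))
window = contemporaneous(records, date(2021, 6, 1), date(2021, 12, 31))
```

Because Alpha, Delta and Omicron circulated in largely successive periods, a single global cutoff would push whole classes to one side of the split. Applying the cutoff within each class (earlier versus later sequences of the same variant) is one way to keep all four classes in both partitions.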

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and describe the revisions we will implement to improve clarity and strengthen the interpretation of our results.

Point-by-point responses
  1. Referee: [Abstract] The headline performance metrics (F1 98.68±0.75%, AUC≈1) are presented without any description of dataset cardinality, train/test split ratios, cross-validation scheme, or class-balance handling. These omissions are load-bearing because the abstract itself notes unequal sequence availability across lineages and temporal coverage; without these details it is impossible to determine whether the CNN is learning lineage-specific epistasis or simply phylogenetic correlations induced by the pooled alignment.

    Authors: We agree that the abstract should include these methodological details for full transparency. In the revised manuscript we will expand the abstract to report the total number of sequences (with per-class counts), the train/test split ratios, the cross-validation procedure, and the class-balance strategy (weighted loss function). These additions will allow readers to better evaluate the robustness of the reported performance. revision: yes

  2. Referee: [Methods/Results] DCA + Circos pipeline: standard DCA applied to a temporally heterogeneous alignment is expected to recover strong couplings driven by shared ancestry and sampling structure rather than functional constraints. No phylogeny-aware null model, time-stratified ablation, or control that removes phylogenetic signal (e.g., by using only contemporaneous sequences or a phylogenetic generalized least-squares correction) is described. This directly undermines the claim that the Circos plots isolate “lineage-specific epistatic signatures.”

    Authors: We acknowledge that DCA on temporally heterogeneous data can capture phylogenetic correlations in addition to functional epistasis. Our current pipeline does not include explicit phylogeny-aware controls or time-stratified ablations. We will add a dedicated limitations paragraph discussing this potential confound and include a supplementary analysis restricted to contemporaneous sequences within each lineage to test whether the Circos-derived features remain discriminative. We maintain that the distinct visual patterns and high classification accuracy provide supporting evidence for lineage-specific signatures, but agree that additional controls will strengthen the epistasis interpretation. revision: partial
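The rebuttal names a weighted loss function as the class-balance strategy but does not give the weights. One common choice, sketched here as an assumption rather than the authors' setting, is inverse-frequency ("balanced") weighting, w_k = N / (K n_k), combined with a weighted cross-entropy normalised by the total weight of the batch, which is how PyTorch's `torch.nn.CrossEntropyLoss` treats class weights under `reduction='mean'`. The function names are hypothetical.

```python
import numpy as np

def balanced_class_weights(y, n_classes):
    """w_k = N / (K * n_k): rarer classes get proportionally larger weights.

    Assumes every class occurs at least once in y.
    """
    counts = np.bincount(y, minlength=n_classes).astype(float)
    return len(y) / (n_classes * counts)

def weighted_cross_entropy(probs, y, weights):
    """Weighted negative log-likelihood, normalised by the total weight of the batch."""
    probs = np.asarray(probs, dtype=float)
    y = np.asarray(y)
    w = weights[y]                                   # weight of each example's true class
    nll = -np.log(probs[np.arange(len(y)), y])       # -log p(true class)
    return float((w * nll).sum() / w.sum())

y = [0, 0, 0, 1]                                     # class 1 is under-represented
weights = balanced_class_weights(y, n_classes=2)     # approximately [0.667, 2.0]
```

Weighting changes what the optimiser attends to, but it does not remove the phylogenetic or temporal structure raised in comment 2; it only keeps the smaller classes from being ignored.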

Circularity Check

0 steps flagged

No circularity: standard ML pipeline with empirical performance metrics

Full rationale

The paper's chain consists of applying DCA to pooled SARS-CoV-2 sequences to obtain couplings, rendering them as Circos images, and training CNN classifiers whose weighted F1-score and AUC are measured on held-out data. These metrics are not equivalent to the inputs by construction, nor do any equations or self-citations reduce the classification result to a tautology or fitted parameter. The derivation remains a conventional data-driven workflow whose outputs depend on the actual sequence patterns and model training rather than definitional identity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the standard assumption that DCA recovers biologically meaningful pairwise couplings and on the paper-specific choice to render those couplings as Circos images suitable for CNNs.

axioms (2)
  • domain assumption: Direct coupling analysis accurately infers epistatic couplings from aligned genomic sequences
    Invoked when the abstract states DCA-inferred pairwise mutational couplings are transformed into Circos images.
  • ad hoc to paper: Circos plots preserve the essential epistatic signatures needed for CNN classification
    The transformation step is introduced by the authors without independent justification in the abstract.

pith-pipeline@v0.9.0 · 5480 in / 1364 out tokens · 30838 ms · 2026-05-16T09:40:45.362448+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 2 internal anchors

  1. [1]

    Data Collection. This section details the data preprocessing framework used in this study. Specifically, raw SARS-CoV-2 genomic sequences retrieved from GISAID were subjected to systematic cleaning, alignment, and quality filtering to construct high-confidence multiple sequence alignments (MSAs). Subsequently, the Direct Coupling Analysis (DCA) method wa...

  2. [2]

    Wuhan-Hu-1

    Sequence Alignment and Loci Filtering. All remaining raw genomic sequences within each variant group were aligned using multiple sequence alignment (MSA) to ensure positional correspondence across nucleotide sites. Here the online server MAFFT [20, 21] was used to perform the alignment with “Wuhan-Hu-1”

  3. [3]

    unknown nucleotide

    serving as the reference sequence. Such manipulation not only speeds up the alignment but also reduces the burden on computational resources. With the obtained MSAs, each sub-dataset can be represented as a large matrix $S = \{\sigma^n_i \mid n = 1, 2, \ldots, N;\ i = 1, 2, \ldots, L\}$, where N denotes the number of aligned genomic sequences in the sub-dataset and L the number of loci. H...

  4. [4]

    Wuhan-Hu-1

    Direct Coupling Analysis. Direct Coupling Analysis (DCA), also referred to as maximum-entropy methods, inverse statistical mechanics, or model inference in an exponential family, means learning a probabilistic model of the Gibbs–Boltzmann type from data and then using the parameters of this model to predict a quantity of interest. ...

  5. [5]

    VGG. The VGGNet family, introduced by Simonyan and Zisserman in 2014 [45], represents one of the most classical and widely adopted convolutional neural network (CNN) architectures for visual feature extraction. It is built on the principle of stacking small 3×3 convolutional kernels with stride 1 and padding 1, followed by 2×2 max-pooling layers with st...

  6. [6]

    MobileNet. MobileNets [46] is a lightweight CNN architecture specifically designed for efficient computation in resource-limited environments while maintaining high accuracy. Unlike traditional CNNs that rely on standard convolution operations, MobileNet adopts depthwise separable convolutions to decouple spatial and channel-wise filtering. This oper...

  7. [7]

    DenseNet. DenseNet, proposed by Huang et al. in 2017 [48], is a deep CNN that introduces the concept of dense connectivity to enhance feature reuse and gradient propagation across layers. Unlike traditional feed-forward architectures, where each layer receives input only from its immediate predecessor, DenseNet establishes direct connections from e...

  8. [8]

    EfficientNet. EfficientNetB0, proposed by Tan and Le in 2019 [49], represents a new generation of CNN optimized through compound scaling. Unlike traditional approaches that scale depth, width, or input resolution independently, EfficientNet employs a unified scaling strategy that jointly adjusts all three dimensions using a small set of fixed coefficient...

  9. [9]

    Model Training Configuration. The setting of hyperparameters plays a crucial role in determining the performance of deep learning models. Therefore, a series of experiments were conducted to select the most suitable hyperparameter configuration for all CNN architectures. Specifically, the learning rate, batch size, and number of iterations were systemati...

  10. [10]

    Comparative Analysis between Random Initialization and Transfer Learning. The results in Table IV clearly show the substantial performance improvement achieved through transfer learning. When initialized with pre-trained ImageNet weights, all five CNN architectures exhibited faster convergence, higher classification accuracy, and stronger generaliza...

  11. [11]

    https://covid19.who.int/

    World Health Organization. https://covid19.who.int/. Accessed June 2025

  12. [12]

    SARS-CoV-2 variants, spike mutations and immune escape

    William T Harvey, Alessandro M Carabelli, Ben Jackson, Ravindra K Gupta, Emma C Thomson, Ewan M Harrison, Catherine Ludden, Richard Reeve, Andrew Rambaut, COVID-19 Genomics UK (COG-UK) Consortium, et al. SARS-CoV-2 variants, spike mutations and immune escape. Nature Reviews Microbiology, 19(7):409–424, 2021

  13. [13]

    Freunde von GISAID e.V. GISAID. https://www.gisaid.org/. Accessed 2022

  14. [14]

    Compensatory epistasis maintains ACE2 affinity in SARS-CoV-2 Omicron BA.1

    Alief Moulana, Thomas Dupic, Angela M Phillips, Jeffrey Chang, Serafina Nieves, Anne A Roffler, Allison J Greaney, Tyler N Starr, Jesse D Bloom, and Michael M Desai. Compensatory epistasis maintains ACE2 affinity in SARS-CoV-2 Omicron BA.1. Nature Communications, 13(1):7011, 2022

  15. [15]

    Epistasis lowers the genetic barrier to SARS-CoV-2 neutralizing antibody escape

    Leander Witte, Viren A Baharani, Fabian Schmidt, Zijun Wang, Alice Cho, Raphael Raspe, Camila Guzman-Cardozo, Frauke Muecksch, Marie Canis, Debby J Park, et al. Epistasis lowers the genetic barrier to SARS-CoV-2 neutralizing antibody escape. Nature Communications, 14(1):302, 2023

  16. [16]

    Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models

    Magnus Ekeberg, Cecilia Lövkvist, Yueheng Lan, Martin Weigt, and Erik Aurell. Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models. Phys. Rev. E, 87:012707, January 2013

  17. [17]

    Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences

    Magnus Ekeberg, Tuomo Hartonen, and Erik Aurell. Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences. J. Comput. Phys., 276:341–356, 2014

  18. [18]

    High-resolution protein complexes from integrating genomic information with molecular simulation

    Alexander Schug, Martin Weigt, José N Onuchic, Terence Hwa, and Hendrik Szurmant. High-resolution protein complexes from integrating genomic information with molecular simulation. Proceedings of the National Academy of Sciences, 106(52):22124–22129, 2009

  19. [20]

    Temporal epistasis inference from more than 3,500,000 SARS-CoV-2 genomic sequences

    Hong-Li Zeng, Yue Liu, Vito Dichio, and Erik Aurell. Temporal epistasis inference from more than 3,500,000 SARS-CoV-2 genomic sequences. Phys. Rev. E, 106:044409, Oct 2022

  20. [21]

    Two fitness inference schemes compared using allele frequencies from 1068 391 sequences sampled in the UK during the COVID-19 pandemic

    Hong-Li Zeng, Cheng-Long Yang, Bo Jing, John Barton, and Erik Aurell. Two fitness inference schemes compared using allele frequencies from 1068 391 sequences sampled in the UK during the COVID-19 pandemic. Physical Biology, 22(1):016003, Nov 2025

  21. [22]

    Circos: an information aesthetic for comparative genomics

    Martin Krzywinski, Jacqueline Schein, Inanc Birol, Joseph Connors, Randy Gascoyne, Doug Horsman, Steven J Jones, and Marco A Marra. Circos: an information aesthetic for comparative genomics. Genome Research, 19(9):1639–1645, 2009

  22. [23]

    Deep learning

    Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015

  23. [24]

    ImageNet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 2012

  24. [25]

    Improved protein structure prediction using potentials from deep learning

    Andrew W Senior, Richard Evans, John Jumper, James Kirkpatrick, Laurent Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander WR Nelson, Alex Bridgland, et al. Improved protein structure prediction using potentials from deep learning. Nature, 577(7792):706–710, 2020

  25. [26]

    Avocado: a multi-scale deep tensor factorization method learns a latent representation of the human epigenome

    Jacob Schreiber, Timothy Durham, Jeffrey Bilmes, and William Stafford Noble. Avocado: a multi-scale deep tensor factorization method learns a latent representation of the human epigenome. Genome Biology, 21(1):81, 2020

  26. [27]

    Predicting effects of noncoding variants with deep learning–based sequence model

    Jian Zhou and Olga G Troyanskaya. Predicting effects of noncoding variants with deep learning–based sequence model. Nature Methods, 12(10):931–934, 2015

  27. [28]

    Alejandro Lopez-Rincon, Alberto Tonda, Lucero Mendoza-Maldonado, Daphne G. J. C. Mulders, Richard Molenkamp, Carmina A. Perez-Romero, Eric Claassen, Johan Garssen, and Aletta D. Kraneveld. Classification and specific primer design for accurate detection of SARS-CoV-2 using deep learning. Scientific Reports, 11(1):947, Jan 2021

  28. [29]

    The lag in SARS-CoV-2 genome submissions to GISAID

    Kishan Kalia, Gayatri Saberwal, and Gaurav Sharma. The lag in SARS-CoV-2 genome submissions to GISAID. Nature Biotechnology, 39(9):1058–1060, 2021

  29. [30]

    MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization

    Kazutaka Katoh, John Rozewicki, and Kazunori D Yamada. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Briefings in Bioinformatics, 20(4):1160–1166, 2019

  30. [31]

    aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity

    Shigehiro Kuraku, Christian M Zmasek, Osamu Nishimura, and Kazutaka Katoh. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity. Nucleic Acids Research, 41(W1):W22–W28, 2013

  31. [32]

    A new coronavirus associated with human respiratory disease in China

    Fan Wu, Su Zhao, Bin Yu, Yan-Mei Chen, Wen Wang, Zhi-Gang Song, Yi Hu, Zhao-Wu Tao, Jun-Hua Tian, Yuan-Yuan Pei, et al. A new coronavirus associated with human respiratory disease in China. Nature, 579(7798):265–269, 2020

  32. [33]

    Global analysis of more than 50,000 SARS-CoV-2 genomes reveals epistasis between eight viral genes

    Hong-Li Zeng, Vito Dichio, Edwin Rodríguez Horta, Kaisa Thorell, and Erik Aurell. Global analysis of more than 50,000 SARS-CoV-2 genomes reveals epistasis between eight viral genes. Proceedings of the National Academy of Sciences, 117(49):31519–31526, 2020

  33. [34]

    Identification of direct residue contacts in protein–protein interaction by message passing

    Martin Weigt, Robert A. White, Hendrik Szurmant, James A. Hoch, and Terence Hwa. Identification of direct residue contacts in protein–protein interaction by message passing. Proceedings of the National Academy of Sciences, 106(1):67–72, 2009

  34. [35]

    Direct-coupling analysis of residue coevolution captures native contacts across many protein families

    Faruck Morcos, Andrea Pagnani, Bryan Lunt, Arianna Bertolino, Debora S. Marks, Chris Sander, Riccardo Zecchina, José N. Onuchic, Terence Hwa, and Martin Weigt. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proceedings of the National Academy of Sciences, 108(49):E1293–E1301, 2011

  35. [36]

    Sequence co-evolution gives 3D contacts and structures of protein complexes

    Thomas A Hopf, Charlotta P I Schärfe, João P G L M Rodrigues, Anna G Green, Oliver Kohlbacher, Chris Sander, Alexandre M J J Bonvin, and Debora S Marks. Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife, 3:e03430, Sep 2014

  36. [37]

    Inverse statistical problems: from the inverse Ising problem to data science

    H. Chau Nguyen, Riccardo Zecchina, and Johannes Berg. Inverse statistical problems: from the inverse Ising problem to data science. Advances in Physics, 66(3):197–261, 2017

  37. [38]

    Inverse statistical physics of protein sequences: a key issues review

    Simona Cocco, Christoph Feinauer, Matteo Figliuzzi, Rémi Monasson, and Martin Weigt. Inverse statistical physics of protein sequences: a key issues review. Reports on Progress in Physics, 81:032601, Mar 2018

  38. [39]

    The Maximum Entropy Fallacy Redux?

    Erik Aurell. The Maximum Entropy Fallacy Redux? PLoS Computational Biology, 12:e1004777, May 2016

  39. [40]

    Inferring Contacting Residues Within and Between Proteins: What Do the Probabilities Mean?

    Erik van Nimwegen. Inferring Contacting Residues Within and Between Proteins: What Do the Probabilities Mean? PLoS Computational Biology, 12:e1004726, May 2016

  40. [41]

    Attainment of quasi linkage equilibrium when gene frequencies are changing by natural selection

    Motoo Kimura. Attainment of quasi linkage equilibrium when gene frequencies are changing by natural selection. Genetics, 52(5):875–890, 1965

  41. [42]

    Competition between recombination and epistasis can cause a transition from allele to genotype selection

    Richard A. Neher and Boris I. Shraiman. Competition between recombination and epistasis can cause a transition from allele to genotype selection. Proc. Natl. Acad. Sci., 106(16):6866–6871, 2009

  42. [43]

    Statistical genetics and evolution of quantitative traits

    Richard A. Neher and Boris I. Shraiman. Statistical genetics and evolution of quantitative traits. Rev. Mod. Phys., 83:1283–1300, Nov 2011

  43. [44]

    DCA for genome-wide epistasis analysis: the statistical genetics perspective

    Chen-Yi Gao, Fabio Cecconi, Angelo Vulpiani, Hai-Jun Zhou, and Erik Aurell. DCA for genome-wide epistasis analysis: the statistical genetics perspective. Physical Biology, 16(2):026002, 2019

  44. [45]

    Inferring genetic fitness from genomic data

    Hong-Li Zeng and Erik Aurell. Inferring genetic fitness from genomic data. Phys. Rev. E, 101:052409, May 2020

  45. [46]

    Statistical genetics in and out of quasi-linkage equilibrium

    Vito Dichio, Hong-Li Zeng, and Erik Aurell. Statistical genetics in and out of quasi-linkage equilibrium. Reports on Progress in Physics, 86(5):052601, Apr 2023

  46. [47]

    Emergence of clones in sexual populations

    Richard A Neher, Marija Vucelja, Mark Mezard, and Boris I Shraiman. Emergence of clones in sexual populations. Journal of Statistical Mechanics: Theory and Experiment, 2013(01):P01008, Jan 2013

  47. [48]

    Statistical analysis of non-lattice data

    Julian Besag. Statistical analysis of non-lattice data. Journal of the Royal Statistical Society: Series D (The Statistician), 24(3):179–195, 1975

  48. [49]

    High-dimensional Ising model selection using ℓ1-regularized logistic regression

    Pradeep Ravikumar, Martin J Wainwright, and John D Lafferty. High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics, 38(3):1287–1319, 2010

  49. [50]

    Inverse Ising inference using all the data

    Erik Aurell and Magnus Ekeberg. Inverse Ising inference using all the data. Physical Review Letters, 108(9):090201, 2012

  50. [51]

    Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models

    Magnus Ekeberg, Cecilia Lövkvist, Yueheng Lan, Martin Weigt, and Erik Aurell. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Physical Review E, 87(1):012707, 2013

  51. [52]

    Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences

    Magnus Ekeberg, Tuomo Hartonen, and Erik Aurell. Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences. Journal of Computational Physics, 276:341–356, 2014

  52. [53]

    Data from "xiaohuolongx/circos-data-set." GitHub

    JingBo. Data from "xiaohuolongx/circos-data-set." GitHub. https://github.com/xiaohuolongx/circos-data-set.git, 2023

  53. [54]

    A survey of convolutional neural networks: analysis, applications, and prospects

    Zewen Li, Fan Liu, Wenjie Yang, Shouheng Peng, and Jun Zhou. A survey of convolutional neural networks: analysis, applications, and prospects. IEEE Transactions on Neural Networks and Learning Systems, 2021

  54. [55]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014

  55. [56]

    MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

    Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017

  56. [57]

    Searching for MobileNetV3

    Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1314–1324, 2019

  57. [58]

    Densely connected convolutional networks

    Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017

  58. [59]

    EfficientNet: Rethinking model scaling for convolutional neural networks

    Mingxing Tan and Quoc Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, pages 6105–