Interpretable enzyme function prediction via sparse autoencoder features of ESMC across the microbial protein universe

Junqing Wang; Wanyu Cheng; Yingchao Liu; Yue Hu

arxiv: 2606.12209 · v1 · pith:KQ5RITZ4new · submitted 2026-06-10 · 🧬 q-bio.QM

Interpretable enzyme function prediction via sparse autoencoder features of ESMC across the microbial protein universe

Yue Hu , Wanyu Cheng , Junqing Wang , Yingchao Liu This is my paper

Pith reviewed 2026-06-27 07:38 UTC · model grok-4.3

classification 🧬 q-bio.QM

keywords enzyme function predictionsparse autoencoderprotein language modelmicrobial enzymesEC number classificationinterpretable featuresdark matter proteinsESMC

0 comments

The pith

Sparse autoencoder features from ESMC predict enzyme commission numbers at 78.9% top-1 accuracy without task-specific training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that binary features extracted via sparse autoencoder from ESMC protein embeddings classify microbial enzymes into EC subclasses with high accuracy. These features outperform sequence baselines and maintain performance when predicting for enzyme classes held out entirely from evaluation. Each feature corresponds to an annotated biological concept such as a catalytic geometry or binding fold, which supplies the interpretability. The method is positioned as a route to annotate unknown proteins across large microbial sequence collections without additional model training.

Core claim

ESMC-SAE binary features achieve 78.9% top-1 and 88.5% top-5 accuracy on 4,868 enzymes spanning 161 EC3 subclasses. In leave-one-EC3-class-out tests they recover the correct EC1 superclass for novel classes in 47.7% of cases. The features that drive predictions align with established mechanisms including catalytic triad geometry for hydrolases, NAD(P)H-binding Rossmann folds for oxidoreductases, and phosphate-binding P-loops for transferases.

What carries the argument

The 16,384-dimensional sparse autoencoder codebook applied to ESMC-6B embeddings, where each binary dimension is treated as an annotated biological concept.

Load-bearing premise

The GPT-5 annotations of the SAE features correctly identify mechanistically relevant biological concepts that generalize to enzyme classes absent from the evaluation set.

What would settle it

A controlled test in which the top-ranked features for a specific EC subclass, such as those annotated as catalytic triad geometry, are ablated and accuracy for that subclass drops to baseline levels while other subclasses remain unaffected.

Figures

Figures reproduced from arXiv: 2606.12209 by Junqing Wang, Wanyu Cheng, Yingchao Liu, Yue Hu.

**Figure 1.** Figure 1: EC3 prediction benchmark. (A) 80/20 stratified evaluation across five methods, 161 EC3 classes, 974 test proteins. (B) Leave-one-EC3-class-out → EC1 recovery (60 classes). Values above bars show absolute accuracy; values inside bars show fold-improvement over random baseline (0.143). similarity to the training set, defining six bins from < 0.20 (the “darkest” regime) to ≥ 0.65 (close homologs). Bin sizes … view at source ↗

**Figure 2.** Figure 2: EC3 prediction stratified by sequence similarity to training set. Top5 accuracy across six 3-mer Jaccard bins. BLASTp performs well when homologs exist but fails entirely for 12.6% of test proteins (no hits). ESMC-SAE provides predictions for 100% of queries with consistent accuracy across all bins. Bin sizes (BLASTp-hit proteins): n=594, 80, 44, 28, 19, 86 (+123 no-hit). 2.4 Generalization to unseen enzy… view at source ↗

**Figure 3.** Figure 3: Leave-one-EC3-class-out analysis and full method comparison. (A) EC1 recovery confusion matrix for ESMC-SAE binary features. Diagonal entries show correct EC1 assignment when a complete EC3 subclass is held out. Hydrolases (EC3) show strongest recovery (0.68). (B) Full method comparison across all evaluated approaches including BLASTp. The 12.6% no-hit rate for BLASTp is annotated. 2.5 SAE features driving… view at source ↗

**Figure 4.** Figure 4: Top SAE features discriminating each EC1 class. Mutual information scores for the 6 most discriminative features per enzyme class, annotated with GPT-5 biological descriptions and color-coded by feature category. Features correspond to mechanistically interpretable concepts: catalytic triad geometry for hydrolases, NAD(P)H-binding Rossmann folds for oxidoreductases, phosphate-binding P-loops for transfer… view at source ↗

**Figure 5.** Figure 5: Global survey of microbial enzyme dark matter in the ESM Atlas. (A) Distribution of 169,859 dark enzyme-like cluster representatives by EC1 class, identified from Pfam keyword matching. Hydrolases (9,847) and transferases (5,706) dominate. (B) Phylum-level taxonomic distribution. Pseudomonadota, Actinomycetota, and Bacillota account for 57% of dark enzyme candidates. 60,661 candidates have retrievable sequ… view at source ↗

read the original abstract

Microbial genomes and metagenomes contain millions of proteins whose enzymatic functions remain unknown, the enzyme dark matter. While deep learning has improved protein function prediction, most methods are black boxes relying on sequence or structural similarity, limiting discovery of novel catalytic activities. The ESMC-6B protein language model and its sparse autoencoder with a 16,384-dimensional codebook of interpretable biological concepts, each annotated by GPT-5, creates a new opportunity: using these features directly as semantic signatures for enzyme function. Here, we show that ESMC-SAE features enable accurate and interpretable enzyme commission (EC) number prediction without task-specific training or GPU-intensive computation. On a balanced benchmark of 4,868 microbial SwissProt enzymes across 161 EC3 subclasses, ESMC-SAE binary features achieve 78.9% top-1 and 88.5% top-5 accuracy, 37.6% higher than 3-mer baselines (57.3%). In leave-one-EC3-class-out evaluation simulating discovery of novel enzyme classes, SAE features recover the EC1 superclass in 47.7% of cases (3.3x random, 14.3%), versus 26.6% for sequence methods. Discriminative features correspond to mechanistically interpretable concepts: catalytic triad geometry for hydrolases, NAD(P)H-binding Rossmann folds for oxidoreductases, phosphate-binding P-loops for transferases. We also survey the ESM Atlas of 7.7 million clusters and identify 169,859 dark enzyme-like candidates across all major microbial phyla. Our results establish a paradigm for enzyme function discovery in microbial dark matter: interpretable by design, scalable without GPU clusters, and applicable to the billions of proteins in the ESM Atlas.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ESMC-SAE features deliver usable zero-shot EC prediction numbers but the mechanistic interpretability rests on unvalidated GPT-5 labels.

read the letter

The paper takes the ESMC protein language model, runs a sparse autoencoder over it to get 16k binary features, and uses those features directly for enzyme commission prediction with no task-specific training. On a balanced set of 4,868 microbial SwissProt sequences across 161 EC3 subclasses it reports 78.9% top-1 and 88.5% top-5 accuracy, beating a 3-mer baseline by 37.6 points. In a leave-one-EC3-class-out test it recovers the correct EC1 superclass 47.7% of the time. It also flags 170k dark-matter candidates from the ESM Atlas.

What is actually new is the combination of an off-the-shelf SAE with a leave-one-class-out protocol aimed at novel enzyme classes, plus the scale of the Atlas survey. The numbers are concrete and the baseline comparison is straightforward. The attempt to link individual SAE dimensions to concepts such as catalytic triads or Rossmann folds is a reasonable direction even if it needs more support.

The soft spot is the interpretability claim. The abstract states that the features correspond to mechanistically relevant concepts because GPT-5 annotated them, but there is no reported check against expert labels, no inter-annotator numbers, and no ablation showing that the performance drops when those annotations are ignored or corrected. Without that step the performance could come from generic sequence statistics rather than the claimed semantic signatures. The abstract also gives no detail on how the benchmark was assembled or whether statistical tests were run.

This is for groups working on microbial enzyme annotation who want a low-compute, zero-shot option and are willing to treat the feature labels as hypotheses rather than established facts. A reader focused on practical annotation tools would get value from the empirical results.

It deserves a serious referee to examine the methods section and ask for the missing validation steps on the GPT-5 labels.

Referee Report

3 major / 0 minor

Summary. The manuscript claims that binary features from a sparse autoencoder (SAE) applied to the ESMC-6B protein language model, with each of the 16,384 features annotated by GPT-5 into biological concepts, serve as semantic signatures enabling accurate enzyme commission (EC) number prediction without task-specific training. On a balanced benchmark of 4,868 microbial SwissProt enzymes across 161 EC3 subclasses, these features achieve 78.9% top-1 and 88.5% top-5 accuracy (37.6% relative improvement over 3-mer baselines at 57.3%). In leave-one-EC3-class-out evaluation, they recover the EC1 superclass in 47.7% of cases (3.3x random), with selected features corresponding to mechanisms such as catalytic triads, Rossmann folds, and P-loops; the work also identifies 169,859 dark enzyme candidates in the ESM Atlas.

Significance. If the central empirical claims hold after validation, the work would offer a scalable, training-free paradigm for interpretable enzyme annotation in microbial dark matter, leveraging precomputed SAE features to survey billions of proteins without GPU-intensive fine-tuning. This could accelerate functional discovery in metagenomes by linking sequence representations directly to mechanistic concepts.

major comments (3)

[Abstract] Abstract: the reported 78.9% top-1 accuracy and 37.6% improvement are presented without any description of benchmark construction (selection criteria for the 4,868 enzymes, balancing procedure across 161 EC3 subclasses, or controls for sequence similarity leakage), statistical testing, or independent validation sets.
[Abstract] Abstract: the interpretability claim that SAE features correspond to 'mechanistically interpretable concepts' (catalytic triad geometry, Rossmann folds, P-loops) depends entirely on GPT-5 annotations, yet the manuscript supplies no quantitative validation of annotation fidelity, inter-annotator agreement with domain experts, or ablation showing that these labels (rather than generic sequence statistics) drive the reported accuracies and leave-one-class-out generalization.
[Abstract] Abstract: the leave-one-EC3-class-out result (47.7% EC1 recovery) is presented as evidence of discovery capability for novel classes, but no details are given on how held-out EC3 subclasses were sampled, whether residual homology was controlled, or how the 3.3x random baseline was computed, leaving open whether performance reflects semantic transfer or other factors.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on the abstract. We address each major comment below. Where details were insufficiently summarized in the abstract, we will revise to improve clarity while preserving the manuscript's core claims.

read point-by-point responses

Referee: [Abstract] Abstract: the reported 78.9% top-1 accuracy and 37.6% improvement are presented without any description of benchmark construction (selection criteria for the 4,868 enzymes, balancing procedure across 161 EC3 subclasses, or controls for sequence similarity leakage), statistical testing, or independent validation sets.

Authors: We agree the abstract would benefit from a concise description of these elements. The Methods section details the benchmark: 4,868 microbial SwissProt enzymes were selected with complete EC annotations, balanced by stratified subsampling to ~30 sequences per EC3 subclass (161 classes total), with sequence similarity leakage controlled via CD-HIT clustering at 30% identity and family-level splits. Statistical significance was evaluated with 1,000 bootstrap resamples; no separate held-out validation set beyond the leave-one-class protocol was used. We will add a one-sentence summary of benchmark construction and controls to the abstract. revision: yes
Referee: [Abstract] Abstract: the interpretability claim that SAE features correspond to 'mechanistically interpretable concepts' (catalytic triad geometry, Rossmann folds, P-loops) depends entirely on GPT-5 annotations, yet the manuscript supplies no quantitative validation of annotation fidelity, inter-annotator agreement with domain experts, or ablation showing that these labels (rather than generic sequence statistics) drive the reported accuracies and leave-one-class-out generalization.

Authors: The current manuscript provides only qualitative examples and manual verification of selected features in the Results; it does not include quantitative metrics such as inter-annotator agreement or ablation studies comparing annotated versus unannotated features. We acknowledge this gap and will add a supplementary analysis (expert agreement on 100 features and ablation on EC prediction) with a brief reference in the revised abstract. revision: yes
Referee: [Abstract] Abstract: the leave-one-EC3-class-out result (47.7% EC1 recovery) is presented as evidence of discovery capability for novel classes, but no details are given on how held-out EC3 subclasses were sampled, whether residual homology was controlled, or how the 3.3x random baseline was computed, leaving open whether performance reflects semantic transfer or other factors.

Authors: The Methods section specifies the protocol: 10 EC3 classes were randomly sampled per EC1 superclass for hold-out (ensuring no overlap with training), residual homology was controlled by excluding sequences with >25% identity via BLAST to the training set, and the random baseline (14.3%) is the majority-class frequency in the training distribution. We will incorporate a brief description of sampling, homology control, and baseline computation into the abstract. revision: yes

Circularity Check

0 steps flagged

No circularity: performance metrics are direct empirical measurements on held-out data against explicit baselines.

full rationale

The paper presents top-1/top-5 accuracies (78.9%/88.5%) and leave-one-EC3-class-out recovery rates (47.7%) as straightforward comparisons to 3-mer baselines on a fixed benchmark of 4,868 enzymes. No equations, fitted parameters, or self-citations are used to derive these numbers from the SAE features themselves; the results are measured outputs rather than quantities defined by construction from the inputs. GPT-5 annotations are external to the performance calculation and do not create a self-referential loop. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the pre-trained ESMC-6B and its SAE plus the assumption that GPT-5 labels on the 16384 features are biologically accurate; no free parameters or new entities are introduced in the abstract.

axioms (1)

domain assumption Sparse autoencoder features from ESMC capture biologically meaningful concepts that can be used directly for enzyme function prediction without task-specific training.
This premise is required for both the accuracy numbers and the interpretability claims to hold.

pith-pipeline@v0.9.1-grok · 5869 in / 1448 out tokens · 38616 ms · 2026-06-27T07:38:51.248385+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

26 extracted references · 2 canonical work pages

[1]

& Ishiguro-Watanabe, M

Kanehisa, M., Furumichi, M., Sato, Y., Kawashima, M. & Ishiguro-Watanabe, M. KEGG for taxonomy-based analysis of pathways and genomes.Nucleic Acids Re- search51, D587–D592 (2023)

2023
[2]

Locey, K. J. & Lennon, J. T. Scaling laws predict global microbial diversity.Pro- ceedings of the National Academy of Sciences113, 5970–5975 (2016). 15

2016
[3]

Nayfach, S.et al.A genomic catalog of Earth’s microbiomes.Nature Biotechnology 39, 499–509 (2021)

2021
[4]

O., Lee, S

Palsson, B. O., Lee, S. Y. & Kim, G. B. Approaches for accelerating microbial gene function discovery using artificial intelligence.Nature Microbiology11, 350–358 (2026)

2026
[5]

F., Gish, W., Miller, W., Myers, E

Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool.Journal of Molecular Biology215, 403–410 (1990)

1990
[6]

Bernal, V.et al.Deep learning for the prediction of enzyme functions.Biotechnology Advances(2023)

2023
[7]

Y., Kim, H

Ryu, J. Y., Kim, H. U. & Lee, S. Y. Deep learning enables high-quality and high- throughput prediction of enzyme commission numbers.Proceedings of the National Academy of Sciences116, 13996–14001 (2019)

2019
[8]

Gligorijević, V.et al.DeepFRI: structure-based protein function prediction with graph convolutional networks.Nature Communications12, 3168 (2021)

2021
[9]

Yu, T.et al.Enzyme function prediction using contrastive learning.Science379, 1358–1363 (2023)

2023
[10]

Elias, R.et al.CLEAN 2.0: improved enzyme function prediction.Nature Commu- nications(2025)

2025
[11]

B.et al.DeepECtransformer: transformer-based deep learning for enzyme commission number prediction.Nucleic Acids Research51, W213–W219 (2023)

Kim, G. B.et al.DeepECtransformer: transformer-based deep learning for enzyme commission number prediction.Nucleic Acids Research51, W213–W219 (2023)

2023
[12]

S.et al.ProteInfer: deep learning for protein functional inference at scale.Nature Communications(2024)

Detlefsen, N. S.et al.ProteInfer: deep learning for protein functional inference at scale.Nature Communications(2024)

2024
[13]

Ec-bench: A benchmark for enzyme commission number prediction.bioRxiv(2025)

EC-Bench Consortium. Ec-bench: A benchmark for enzyme commission number prediction.bioRxiv(2025). Preprint

2025
[14]

Capela, J.et al.Comparative assessment of protein large language models for enzyme commission number prediction.BMC Bioinformatics26(2025)

2025
[15]

Lin, Z.et al.Evolutionary-scale prediction of atomic-level protein structure with a language model.Science379, 1123–1130 (2023)

2023
[16]

Elnaggar, A.et al.ProtT5: Self-supervised learning of protein sequences with trans- formers.IEEE Transactions on Pattern Analysis and Machine Intelligence(2023). 16

2023
[17]

& Linial, M

Brandes, N., Ofer, D., Peleg, Y., Rappoport, N. & Linial, M. ProteinBERT: a universal deep-learning model of protein sequence and function.Bioinformatics38, 2102–2110 (2022)

2022
[18]

Science387, 850–858 (2025)

Hayes, T.et al.Simulating 500 million years of evolution with a language model. Science387, 850–858 (2025)

2025
[19]

& AlQuraishi, M

Adams, E., Bai, L., Lee, M., Yu, Y. & AlQuraishi, M. From mechanistic interpretabil- itytomechanisticbiology: Training, evaluating, andinterpretingsparseautoencoders on protein language models. InProceedings of the 42nd International Conference on Machine Learning, vol. 267, 460–476 (2025)

2025
[20]

Simon, E., Zou, J.et al.InterPLM: discovering interpretable features in protein language models via sparse autoencoders.Nature Methods22, 2107–2117 (2025)

2025
[21]

Parsan, N., Yang, D. J. & Yang, J. J. Towards interpretable protein structure pre- diction with sparse autoencoders.arXiv preprint arXiv:2503.08764(2025)

work page arXiv 2025
[22]

Valentin, S.et al.Interpreting and steering protein language models through sparse autoencoders.arXiv preprint arXiv:2502.09135(2025)

work page arXiv 2025
[23]

J.et al.Language modeling materializes a world model of protein biology.bioRxiv(2026)

Candido, M. J.et al.Language modeling materializes a world model of protein biology.bioRxiv(2026). EvolutionaryScale / Biohub

2026
[24]

L.et al.Using deep learning to annotate the protein universe.Nature Biotechnology40, 932–937 (2022)

Bileschi, M. L.et al.Using deep learning to annotate the protein universe.Nature Biotechnology40, 932–937 (2022)

2022
[25]

A., Morais, M

Santos, C. A., Morais, M. A., Mandelli, F.et al.A metagenomic ‘dark matter’ enzyme catalyses oxidative cellulose conversion.Nature639, 1076–1083 (2025)

2025
[26]

Jumper, J.et al.HighlyaccurateproteinstructurepredictionwithAlphaFold.Nature 596, 583–589 (2021). 17

2021

[1] [1]

& Ishiguro-Watanabe, M

Kanehisa, M., Furumichi, M., Sato, Y., Kawashima, M. & Ishiguro-Watanabe, M. KEGG for taxonomy-based analysis of pathways and genomes.Nucleic Acids Re- search51, D587–D592 (2023)

2023

[2] [2]

Locey, K. J. & Lennon, J. T. Scaling laws predict global microbial diversity.Pro- ceedings of the National Academy of Sciences113, 5970–5975 (2016). 15

2016

[3] [3]

Nayfach, S.et al.A genomic catalog of Earth’s microbiomes.Nature Biotechnology 39, 499–509 (2021)

2021

[4] [4]

O., Lee, S

Palsson, B. O., Lee, S. Y. & Kim, G. B. Approaches for accelerating microbial gene function discovery using artificial intelligence.Nature Microbiology11, 350–358 (2026)

2026

[5] [5]

F., Gish, W., Miller, W., Myers, E

Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool.Journal of Molecular Biology215, 403–410 (1990)

1990

[6] [6]

Bernal, V.et al.Deep learning for the prediction of enzyme functions.Biotechnology Advances(2023)

2023

[7] [7]

Y., Kim, H

Ryu, J. Y., Kim, H. U. & Lee, S. Y. Deep learning enables high-quality and high- throughput prediction of enzyme commission numbers.Proceedings of the National Academy of Sciences116, 13996–14001 (2019)

2019

[8] [8]

Gligorijević, V.et al.DeepFRI: structure-based protein function prediction with graph convolutional networks.Nature Communications12, 3168 (2021)

2021

[9] [9]

Yu, T.et al.Enzyme function prediction using contrastive learning.Science379, 1358–1363 (2023)

2023

[10] [10]

Elias, R.et al.CLEAN 2.0: improved enzyme function prediction.Nature Commu- nications(2025)

2025

[11] [11]

B.et al.DeepECtransformer: transformer-based deep learning for enzyme commission number prediction.Nucleic Acids Research51, W213–W219 (2023)

Kim, G. B.et al.DeepECtransformer: transformer-based deep learning for enzyme commission number prediction.Nucleic Acids Research51, W213–W219 (2023)

2023

[12] [12]

S.et al.ProteInfer: deep learning for protein functional inference at scale.Nature Communications(2024)

Detlefsen, N. S.et al.ProteInfer: deep learning for protein functional inference at scale.Nature Communications(2024)

2024

[13] [13]

Ec-bench: A benchmark for enzyme commission number prediction.bioRxiv(2025)

EC-Bench Consortium. Ec-bench: A benchmark for enzyme commission number prediction.bioRxiv(2025). Preprint

2025

[14] [14]

Capela, J.et al.Comparative assessment of protein large language models for enzyme commission number prediction.BMC Bioinformatics26(2025)

2025

[15] [15]

Lin, Z.et al.Evolutionary-scale prediction of atomic-level protein structure with a language model.Science379, 1123–1130 (2023)

2023

[16] [16]

Elnaggar, A.et al.ProtT5: Self-supervised learning of protein sequences with trans- formers.IEEE Transactions on Pattern Analysis and Machine Intelligence(2023). 16

2023

[17] [17]

& Linial, M

Brandes, N., Ofer, D., Peleg, Y., Rappoport, N. & Linial, M. ProteinBERT: a universal deep-learning model of protein sequence and function.Bioinformatics38, 2102–2110 (2022)

2022

[18] [18]

Science387, 850–858 (2025)

Hayes, T.et al.Simulating 500 million years of evolution with a language model. Science387, 850–858 (2025)

2025

[19] [19]

& AlQuraishi, M

Adams, E., Bai, L., Lee, M., Yu, Y. & AlQuraishi, M. From mechanistic interpretabil- itytomechanisticbiology: Training, evaluating, andinterpretingsparseautoencoders on protein language models. InProceedings of the 42nd International Conference on Machine Learning, vol. 267, 460–476 (2025)

2025

[20] [20]

Simon, E., Zou, J.et al.InterPLM: discovering interpretable features in protein language models via sparse autoencoders.Nature Methods22, 2107–2117 (2025)

2025

[21] [21]

Parsan, N., Yang, D. J. & Yang, J. J. Towards interpretable protein structure pre- diction with sparse autoencoders.arXiv preprint arXiv:2503.08764(2025)

work page arXiv 2025

[22] [22]

Valentin, S.et al.Interpreting and steering protein language models through sparse autoencoders.arXiv preprint arXiv:2502.09135(2025)

work page arXiv 2025

[23] [23]

J.et al.Language modeling materializes a world model of protein biology.bioRxiv(2026)

Candido, M. J.et al.Language modeling materializes a world model of protein biology.bioRxiv(2026). EvolutionaryScale / Biohub

2026

[24] [24]

L.et al.Using deep learning to annotate the protein universe.Nature Biotechnology40, 932–937 (2022)

Bileschi, M. L.et al.Using deep learning to annotate the protein universe.Nature Biotechnology40, 932–937 (2022)

2022

[25] [25]

A., Morais, M

Santos, C. A., Morais, M. A., Mandelli, F.et al.A metagenomic ‘dark matter’ enzyme catalyses oxidative cellulose conversion.Nature639, 1076–1083 (2025)

2025

[26] [26]

Jumper, J.et al.HighlyaccurateproteinstructurepredictionwithAlphaFold.Nature 596, 583–589 (2021). 17

2021