arxiv: 2607.00180 · v1 · pith:RGRAKMQSnew · submitted 2026-06-30 · 🧬 q-bio.BM

SF-Cluster: Frustration-Guided MSA Subsampling for Alternative Protein Conformation Recovery

Pith reviewed 2026-07-02 00:39 UTC · model grok-4.3

classification 🧬 q-bio.BM
keywords MSA subsampling · protein conformation · energetic frustration · alternative conformations · allosteric systems · structure prediction · fold switching

The pith

SF-Cluster subsamples MSAs via local energetic frustration patterns to recover alternative protein conformations more reliably than sequence-space methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SF-Cluster as a way to select subsets of multiple sequence alignments by using predicted local energetic frustration patterns instead of sequence similarity. This selection targets specific conformational states in proteins that switch between two structures. On a set of 48 test cases covering fold-switching, allosteric, oligomerization, and disordered proteins, the method raises the rate at which the target alternative conformation is recovered compared with AF-Cluster. The gain is largest for allosteric systems. The same selected alignments also work when fed to a different structure predictor, showing that the conformational information sits in the composition of the alignment itself rather than in any single model.

Core claim

SF-Cluster improves target-state recovery of the alternative conformation over AF-Cluster across the two-state classes in a benchmark of 48 cases, with the largest improvement observed for allosteric systems. The recovery advantage is largely explained by the effective depth of the selected subsets, which frustration-pattern selection reliably reaches. At the same time, highly frustrated residues are enriched at sites supported by deep mutational scanning and NMR two-state exchange, and frustration covariation is enriched at state-switching contacts while remaining distinct from coevolutionary coupling.

What carries the argument

Frustration-pattern-based MSA subsampling, in which predicted local energetic frustration patterns serve as the guide for choosing alignment subsets that favor one conformational basin over another.
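
To make the idea of subsampling in a frustration-pattern space concrete, the sketch below picks a fixed-size subset of homologs that are spread out over a set of per-homolog embeddings. It is an illustration only: the embedding vectors, the greedy farthest-point rule, and the name `select_diverse_subset` are assumptions of this example and are not the paper's selection procedure.

```python
import math
import random


def select_diverse_subset(embeddings, subset_size=32, seed=0):
    """Greedy farthest-point sampling over per-homolog embeddings.

    embeddings: list of equal-length float vectors, one per homolog
    (hypothetical frustration-pattern embeddings).
    Returns the indices of the selected homologs, all distinct.
    """
    n = len(embeddings)
    k = min(subset_size, n)
    if k == 0:
        return []
    first = random.Random(seed).randrange(n)
    chosen = [first]
    # Distance from each homolog to its nearest already-selected homolog.
    min_dist = [math.dist(e, embeddings[first]) for e in embeddings]
    min_dist[first] = -1.0  # mark as taken so it is never picked again
    while len(chosen) < k:
        nxt = max(range(n), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, e in enumerate(embeddings):
            if min_dist[i] >= 0.0:
                min_dist[i] = min(min_dist[i], math.dist(e, embeddings[nxt]))
        min_dist[nxt] = -1.0
    return chosen
```

The default `subset_size=32` mirrors the subset size quoted in the Figure 1 caption. Farthest-point sampling is used here only because it is a simple way to span diverse modes.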

If this is right

  • The selected MSAs transfer to an architecturally distinct predictor, showing that the conformational signal resides in MSA composition.
  • Matched-depth controls indicate that the recovery advantage is largely explained by the effective depth reached by frustration-pattern selection.
  • Highly frustrated residues are enriched at sites supported by deep mutational scanning and NMR two-state exchange.
  • Frustration covariation is enriched at state-switching contacts yet remains distinct from coevolutionary coupling.
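
The enrichment statements above can be read as a ratio of hit rates. A minimal sketch, assuming enrichment means the fraction of selected residues that are functional divided by the background fraction over the whole protein; the paper may define it differently, and the function name and inputs are hypothetical.

```python
def fold_enrichment(selected, functional, n_residues):
    """Hit rate among selected residues divided by the background hit rate.

    selected:   residue indices flagged as highly frustrated (assumed input)
    functional: residue indices supported by an experiment such as DMS or NMR
    n_residues: total number of residues in the protein
    """
    selected, functional = set(selected), set(functional)
    if not selected or not functional or n_residues <= 0:
        raise ValueError("need non-empty sets and a positive residue count")
    hit_rate = len(selected & functional) / len(selected)
    background = len(functional) / n_residues
    return hit_rate / background
```

A value of 1 means the selected residues are no more likely to be functional than a random residue; values above 1 indicate enrichment.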

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same frustration signal could be tested as an input feature for predictors that do not rely on MSAs at all.
  • Frustration-guided selection might be combined with other orthogonal signals such as predicted contacts or evolutionary couplings to further refine basin targeting.
  • The enrichment of frustration at experimentally validated switching sites suggests a route to prioritize residues for mutagenesis experiments aimed at shifting conformational equilibria.

Load-bearing premise

That patterns of predicted local energetic frustration form a signal largely independent of sequence similarity and that this signal reliably encodes which conformational basin an MSA subset will favor.

What would settle it

A new benchmark of two-state proteins in which frustration-selected MSAs produce no recovery gain over AF-Cluster once MSA depth is matched, or in which the selected alignments fail to transfer to a second predictor.

Figures

Figures reproduced from arXiv: 2607.00180 by Chunbin Gu, Ge Liu, Hanqun Cao, Pheng Ann Heng, Pranam Chatterjee, Zijun Gao.

Figure 1. Overview of SF-Cluster. (a) For each homolog in the MSA, a per-residue frustration profile is computed and compressed into a region-level embedding (frustration High to Low; gaps grey). (b) Homologs are positioned by their geometry in frustration-pattern space, and a single mosaic selection draws fixed-size subsets (n = 32) spanning diverse modes, intended to enrich the target basin while retaining the alt…
Figure 2. SF-Cluster improves AF-Cluster-style target-state recovery of the alternative conformation. Two-state recovery was scored using the AF-Cluster-style dual-reference criterion, requiring a full-chain Cα RMSD to the target reference of at most 3 Å, a lower RMSD to the target than to the dominant reference, and a mean pLDDT of at least 70 (Methods). (a) Target-state recovery rate by class on the cleaned bench…
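
The three conditions in the Figure 2 caption can be written as a single predicate. A minimal sketch, assuming the RMSD values (in Å) and mean pLDDT (0 to 100 scale) have already been computed; the function names and the rule that a case counts as recovered when any of its predictions passes are assumptions of this example, not taken from the paper.

```python
def recovers_target(rmsd_target, rmsd_dominant, mean_plddt,
                    rmsd_cutoff=3.0, plddt_cutoff=70.0):
    """Dual-reference check for one predicted structure.

    Passes only if the prediction is within the RMSD cutoff of the target
    reference, closer to the target than to the dominant reference, and
    confidently predicted.
    """
    return (rmsd_target <= rmsd_cutoff
            and rmsd_target < rmsd_dominant
            and mean_plddt >= plddt_cutoff)


def case_recovered(predictions):
    """Assumed aggregation: a case is recovered if any prediction passes.

    predictions: iterable of (rmsd_target, rmsd_dominant, mean_plddt) tuples.
    """
    return any(recovers_target(*p) for p in predictions)
```

The second condition is what makes the criterion dual-reference: a structure that sits within 3 Å of both references is only counted if it is closer to the target.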
Figure 3. The selected MSAs carry the conformational signal to a second predictor. (a) Proteins recovering the target conformation under Boltz-1 (n = 10): zero from single sequence, four with SF-Cluster MSAs, three with depth-matched random MSAs (SF-Cluster versus depth-matched, McNemar p = 1.0). (b) Per-protein target recovery under AF2 and Boltz-1 with SF-Cluster MSAs.
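
McNemar's test, quoted in the Figure 3 caption, depends only on the discordant pairs: proteins recovered by one method but not the other. The caption gives the totals (four versus three of ten) but not the discordant counts, so the counts in the test calls are hypothetical. A minimal sketch of the two-sided exact version:

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: pairs where only the first method succeeds
    c: pairs where only the second method succeeds
    Under the null, each discordant pair is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Totals of four versus three imply b - c = 1, and any such split has an odd number of discordant pairs, for which this formula returns exactly 1.0. That agrees with the reported p = 1.0.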
Figure 4. Frustration marks functional and state-switching residues along an axis distinct from coevolution. (a) Enrichment of the most-frustrated residues at DMS functional sites across eight assays (red, individually significant; grey, n.s.); mean enrichment 3.3×, combined significance Stouffer p ≈ 10^-16. (b) Recall of NMR two-state exchange residues by switch-confident frustration, per protein; pooled recall 62…
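
The combined significance in the Figure 4 caption uses Stouffer's method, which converts each assay's p-value to a z-score and pools them. A minimal sketch of the unweighted, one-sided form; whether the paper weights assays or uses a different sidedness is not stated here, and the p-values in any call are the caller's own.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combined_p(p_values):
    """Combine independent one-sided p-values with unweighted Stouffer's Z.

    Each p must lie strictly between 0 and 1.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    norm = NormalDist()
    # Upper-tail z-score: small p gives large positive z.
    z_scores = [-norm.inv_cdf(p) for p in p_values]
    z = sum(z_scores) / sqrt(len(z_scores))
    return norm.cdf(-z)
```

Because the z-scores are summed before dividing by the square root of the assay count, several moderately significant assays can combine into a much smaller p-value than any one of them.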
Figure 5. Frustration covariation targets state-switching contacts that coevolution does not. (a) Shared variance ρ² between frustration covariation and coevolution per protein (log scale). (b) Enrichment at state-switching contacts (∆CM) per protein; significant for frustration on KaiB and RfaH (asterisks). (c) Pooled enrichment at switching versus constant contacts (frustration 1.47× at switching contacts, coevol…
Figure 6. Recovery is governed by per-subset effective depth. (a) Target-state recovery (AF-Cluster-style dual-reference endpoint) versus per-subset effective depth (Neff80, log scale); recovery rises with effective depth and saturates near Neff80 ≈ 30 for SF-Cluster, depth-matched random and FI-shuffled subsets, while depth-blind AF-Cluster subsets fall below this regime. (b) Per-case recovery difference, SF-Clust…
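
Effective depth at an 80% identity threshold is commonly computed by down-weighting each sequence by the number of sequences in the alignment that are at least 80% identical to it. The sketch below follows that common convention as an assumption; the paper's exact Neff80 definition, including how gaps are treated, may differ.

```python
def sequence_identity(a, b):
    """Fraction of columns, non-gap in both sequences, with the same residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def neff(msa, identity_threshold=0.8):
    """Effective number of sequences in an aligned MSA (list of strings).

    Each sequence contributes 1 / (number of sequences, itself included,
    whose identity to it meets the threshold).
    """
    total = 0.0
    for seq in msa:
        neighbours = sum(
            sequence_identity(seq, other) >= identity_threshold
            for other in msa
        )
        total += 1.0 / max(neighbours, 1)  # guard for all-gap rows
    return total
```

Under this weighting, a subset of 32 near-identical sequences has an effective depth close to 1, while 32 mutually dissimilar sequences have an effective depth of 32.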
Original abstract

Deep-learning structure predictors are sensitive to their multiple sequence alignment (MSA) input, making MSA subsampling a practical route to recovering alternative conformations. Existing approaches such as AF-Cluster operate in sequence space, providing limited control over which conformational basin is sampled. We introduce SF-Cluster, which subsamples MSAs using patterns of predicted local energetic frustration, a representation largely independent of sequence similarity. Across a benchmark of 48 cases spanning fold-switching, allosteric, oligomerization-coupled, and intrinsically disordered systems, and using an AF-Cluster-style dual-reference RMSD criterion, SF-Cluster improves target-state recovery of the alternative conformation over AF-Cluster across the two-state classes, with the largest improvement observed for allosteric systems (+15.5 percentage points). The selected MSAs transfer to an architecturally distinct predictor, indicating that the conformational signal resides in MSA composition. Mechanistically, matched-depth controls show that this recovery advantage is largely explained by the effective depth of the selected subsets, which frustration-pattern selection reliably reaches. At the same time, highly frustrated residues are enriched at sites supported by deep mutational scanning and NMR two-state exchange, and frustration covariation is enriched at state-switching contacts while remaining distinct from coevolutionary coupling. Together, these results identify frustration patterns as a transferable representation for conformational prediction and position MSA subsampling as a representation-guided reweighting problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces SF-Cluster, a method for subsampling MSAs using patterns of predicted local energetic frustration to recover alternative protein conformations with deep-learning predictors. On a benchmark of 48 cases across fold-switching, allosteric, oligomerization, and disordered systems, and using an AF-Cluster-style dual-reference RMSD criterion, SF-Cluster improves target-state recovery over AF-Cluster (largest gain +15.5 pp in allosteric systems). The selected MSAs transfer to an architecturally distinct predictor; frustration is enriched at DMS- and NMR-supported sites and at state-switching contacts (distinct from coevolution); matched-depth controls indicate that recovery gains are largely explained by the effective depth reached by frustration-pattern selection.

Significance. If the benchmark results, transferability, and orthogonal validations hold, the work supplies a practical MSA-subsampling tool, empirical evidence that effective depth is a dominant factor in conformational recovery, and biological support linking local frustration to two-state dynamics. It frames MSA subsampling as representation-guided reweighting and identifies frustration patterns as a transferable signal, which could guide future methods even if the primary mechanism is depth selection.

major comments (1)
  1. [Abstract] Abstract: The claim that frustration patterns supply 'a representation largely independent of sequence similarity' that 'captures conformational basin information' is in tension with the statement that 'matched-depth controls show that this recovery advantage is largely explained by the effective depth of the selected subsets.' If depth-matched random subsampling yields statistically indistinguishable recovery rates, the method reduces to reliable depth reweighting rather than frustration-guided basin targeting; the manuscript should directly compare SF-Cluster to depth-matched random controls and revise the mechanistic interpretation accordingly.
minor comments (2)
  1. [Abstract] Abstract: No error bars, confidence intervals, or statistical significance tests are reported for the percentage-point improvements or enrichment statistics.
  2. [Abstract] Abstract: The frustration prediction method is referenced but not described at even a high level (e.g., which energy function or software is used), making it impossible to assess reproducibility from the abstract alone.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful and constructive review. We address the single major comment below.

  1. Referee: [Abstract] Abstract: The claim that frustration patterns supply 'a representation largely independent of sequence similarity' that 'captures conformational basin information' is in tension with the statement that 'matched-depth controls show that this recovery advantage is largely explained by the effective depth of the selected subsets.' If depth-matched random subsampling yields statistically indistinguishable recovery rates, the method reduces to reliable depth reweighting rather than frustration-guided basin targeting; the manuscript should directly compare SF-Cluster to depth-matched random controls and revise the mechanistic interpretation accordingly.

    Authors: We acknowledge the tension in the abstract wording. The manuscript already reports matched-depth controls showing that recovery gains are largely explained by the effective depth reliably reached via frustration-pattern selection. This supports interpreting the method primarily as representation-guided depth reweighting rather than direct basin targeting. We agree the phrase 'captures conformational basin information' risks overstating the case and will revise the abstract to align the mechanistic claims with the depth-based findings while retaining the statements on independence from sequence similarity (supported by the method design) and the orthogonal enrichment results at DMS-, NMR-, and state-switching sites. The matched-depth controls already provide the requested comparison to depth-matched selection; we will ensure this is stated explicitly in the revised text. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on external benchmarks and depth-matched controls

The paper evaluates on an external 48-case benchmark using an AF-Cluster-style dual-reference RMSD criterion and explicitly reports matched-depth controls showing that recovery gains are largely attributable to effective MSA depth reached by the selection procedure. No equations, fitted parameters, or self-citations are shown that reduce the reported improvements or the frustration representation claim to quantities defined by the authors' own prior fits or inputs. The central positioning of frustration patterns as a transferable signal is supported by enrichment observations (deep mutational scanning, NMR, state-switching contacts) that remain distinct from coevolutionary coupling, keeping the derivation self-contained against external data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities. The method implicitly relies on the accuracy of an upstream frustration predictor and on the validity of the dual-reference RMSD criterion, but these are not quantified or derived within the provided text.

pith-pipeline@v0.9.1-grok · 5796 in / 1245 out tokens · 25144 ms · 2026-07-02T00:39:40.385007+00:00 · methodology
