EvoFlows: Evolutionary Edit-Based Flow-Matching for Protein Engineering
Pith reviewed 2026-05-15 12:51 UTC · model grok-4.3
The pith
EvoFlows models protein engineering as edit flows between evolutionarily related sequences to generate variants with controllable insertions, deletions and substitutions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EvoFlows learns mutational trajectories between evolutionarily related protein sequences via edit flows, allowing it to perform a controllable number of insertions, deletions, and substitutions on a template sequence while generating variants that remain consistent with natural protein families.
What carries the argument
Edit flows, which represent the learned continuous trajectories of edit operations (insertions, deletions, substitutions) that transform one evolutionarily related sequence into another.
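To make the three edit operations concrete, here is a minimal sketch of how insertions, deletions, and substitutions act on a template sequence. It illustrates only the discrete edit space, not the learned flow or the paper's sampling procedure. The names `Edit`, `apply_edit`, and `apply_edits`, and the convention that each position indexes into the current (already edited) sequence, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Edit:
    """One hypothetical edit operation on a protein sequence."""
    op: str                        # "ins", "del", or "sub"
    pos: int                       # index into the current sequence
    residue: Optional[str] = None  # required for "ins" and "sub"


def apply_edit(seq: str, edit: Edit) -> str:
    """Return a new sequence with a single edit applied."""
    if edit.op in ("ins", "sub") and edit.residue is None:
        raise ValueError(f"{edit.op} needs a residue")
    if edit.op == "ins":
        # Insert before `pos`; pos == len(seq) appends at the end.
        if not 0 <= edit.pos <= len(seq):
            raise IndexError("insertion position out of range")
        return seq[:edit.pos] + edit.residue + seq[edit.pos:]
    if not 0 <= edit.pos < len(seq):
        raise IndexError("edit position out of range")
    if edit.op == "del":
        return seq[:edit.pos] + seq[edit.pos + 1:]
    if edit.op == "sub":
        return seq[:edit.pos] + edit.residue + seq[edit.pos + 1:]
    raise ValueError(f"unknown operation: {edit.op}")


def apply_edits(seq: str, edits: Iterable[Edit]) -> str:
    """Apply edits one after another, each against the current sequence."""
    for edit in edits:
        seq = apply_edit(seq, edit)
    return seq


# Toy template (not from the paper): one substitution, one deletion, one insertion.
variant = apply_edits(
    "MKTAYIAK",
    [Edit("sub", 2, "S"), Edit("del", 0), Edit("ins", 7, "G")],
)  # "KSAYIAKG"
```

Because insertions and deletions change the sequence length, the variant need not be the same length as the template, which is the property that masked-language and fixed-length diffusion approaches lack.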
If this is right
- The model can generate sequences whose length differs from the template without requiring pre-chosen edit positions.
- Generated variants remain statistically consistent with natural families drawn from UniRef and OAS while lying farther from the starting sequence than outputs from leading baselines.
- Both the type and the location of each mutation are predicted jointly rather than in separate steps.
- The same framework supports optimization tasks that require variable numbers of changes, such as directed evolution campaigns.
Where Pith is reading between the lines
- The approach could be paired with structure prediction or physics-based scoring to rank variants for specific functional targets.
- If the in-silico consistency metrics correlate with experimental success, the method could reduce the number of sequences that must be synthesized and tested.
- Similar edit-flow formulations might apply to other variable-length biological sequences such as RNA or antibody regions when aligned evolutionary data are available.
Load-bearing premise
That mutational trajectories learned from evolutionarily related sequences will produce useful engineered proteins that generalize beyond natural variation.
What would settle it
Synthesizing and assaying EvoFlows-generated variants in the laboratory to measure whether they retain or improve function compared with the template and with variants from autoregressive or masked-language baselines.
Original abstract
We introduce EvoFlows, a variable-length protein sequence-to-sequence modeling approach designed for protein engineering. Existing protein language models are poorly suited for optimization tasks: autoregressive models require full sequence generation, masked language and discrete diffusion models rely on pre-specified mutation locations, and no existing methods naturally support insertions and deletions relative to a template sequence. EvoFlows learns mutational trajectories between evolutionarily related protein sequences via edit flows, allowing it to perform a controllable number of mutations (insertions, deletions, and substitutions) on a template sequence, predicting not only _which_ mutation to perform, but also _where_ it should occur. Through extensive _in silico_ evaluation on diverse protein families from UniRef and OAS, we show that EvoFlows generates variants that remain consistent with natural protein families while exploring farther from template sequences than leading baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces EvoFlows, a variable-length sequence-to-sequence model for protein engineering based on edit flows that learns mutational trajectories (insertions, deletions, substitutions) between evolutionarily related sequences from datasets like UniRef and OAS. Unlike autoregressive PLMs, masked LMs, or discrete diffusion models, it supports controllable edits without pre-specifying locations. The central claim is that extensive in silico evaluations demonstrate generated variants remain consistent with natural protein families while achieving greater distance from template sequences than leading baselines.
Significance. If the in silico consistency and distance metrics hold and correlate with functional properties, EvoFlows could address key limitations in existing protein language models for optimization tasks by enabling natural handling of indels and mutation placement. This would represent a methodological advance in controllable sequence generation for directed evolution. The paper's strength lies in its focus on evolutionary edit trajectories, but the lack of wet-lab validation or orthogonal functional assays limits claims about engineering utility beyond natural variation.
major comments (2)
- [Evaluation section] Evaluation section (and abstract): the claim of 'extensive in silico evaluation' and superiority over baselines is load-bearing for the central result, yet no specific quantitative metrics (e.g., mean edit distances, family likelihood or MSA scores, R² values, or statistical tests such as p-values or confidence intervals) are reported. Without these, including details on baseline implementations, data splits, and exclusion rules, the assertion that EvoFlows explores farther while remaining consistent cannot be assessed.
- [§3] §3 (model formulation): the edit-flow trajectories are learned from evolutionary pairs, but the manuscript does not address whether the resulting variants generalize to functional improvements outside observed natural variation. The chosen consistency metrics (family likelihood) and distance measures may simply recover non-functional sequences at higher edit distance; a concrete test or ablation on held-out functional data would strengthen this link.
minor comments (2)
- [Abstract] Abstract: 'leading baselines' are referenced without naming the specific methods (e.g., autoregressive, diffusion, or masked models) or their implementations; add this for reproducibility.
- [§2] Notation in the edit-flow equations: clarify how variable-length handling is achieved in the flow-matching objective to avoid ambiguity with standard diffusion formulations.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each major point below and indicate the revisions that will be incorporated in the next version.
-
Referee: [Evaluation section] Evaluation section (and abstract): the claim of 'extensive in silico evaluation' and superiority over baselines is load-bearing for the central result, yet no specific quantitative metrics (e.g., mean edit distances, family likelihood or MSA scores, R² values, or statistical tests such as p-values or confidence intervals) are reported. Without these, including details on baseline implementations, data splits, and exclusion rules, the assertion that EvoFlows explores farther while remaining consistent cannot be assessed.
Authors: We agree that explicit numerical reporting is necessary to substantiate the claims. In the revised manuscript we will add a new results table that reports mean edit distances (with standard deviations and 95% confidence intervals), average family likelihood under an independent MSA model, and MSA consistency scores for EvoFlows versus all baselines. We will also include p-values from paired Wilcoxon signed-rank tests and report R² values where regression analyses appear. The methods section will be expanded with complete baseline implementation details (hyperparameters, adaptation for variable-length generation), the precise train/test splits (UniRef cluster-based partitioning to prevent leakage), and all sequence exclusion rules (length, identity, and quality filters). These additions will make the quantitative comparisons fully reproducible and assessable.
Revision: yes
-
Referee: [§3] §3 (model formulation): the edit-flow trajectories are learned from evolutionary pairs, but the manuscript does not address whether the resulting variants generalize to functional improvements outside observed natural variation. The chosen consistency metrics (family likelihood) and distance measures may simply recover non-functional sequences at higher edit distance; a concrete test or ablation on held-out functional data would strengthen this link.
Authors: The manuscript uses consistency with natural evolutionary families as a standard computational proxy for plausibility, which is the appropriate scope for an in silico method. We acknowledge that an explicit link to functional outcomes would be valuable. In revision we will add an ablation study on held-out functional benchmarks (e.g., ProteinGym fitness datasets). Variants will be generated at controlled edit distances and evaluated for correlation between our consistency metrics and predicted fitness; we will report whether higher-distance EvoFlows sequences maintain or improve functional scores relative to baselines. This directly tests whether the distance-consistency tradeoff recovers non-functional sequences.
Revision: partial
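The summary statistics promised in the first response (mean, standard deviation, 95% confidence interval) can be sketched with the standard library alone. This is an illustration, not the authors' analysis code: the function name `mean_with_ci`, the normal-approximation interval with z = 1.96, and the toy distances are all assumptions, and the paired Wilcoxon signed-rank test mentioned in the rebuttal is deliberately left out.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_with_ci(
    values: Sequence[float], z: float = 1.96
) -> Tuple[float, float, Tuple[float, float]]:
    """Return (mean, sample std, CI) using a normal approximation.

    z = 1.96 gives an approximate 95% interval; for small samples a
    t-based interval would be wider, so this is a simplifying assumption.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a sample std")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)          # sample (n - 1) standard deviation
    half_width = z * sd / math.sqrt(n)     # z times the standard error
    return mean, sd, (mean - half_width, mean + half_width)


# Toy edit distances for eight generated variants (illustrative numbers only).
toy_distances = [2, 4, 4, 4, 5, 5, 7, 9]
mean, sd, (ci_low, ci_high) = mean_with_ci(toy_distances)
```

Reporting the interval alongside the mean lets a reader judge whether a gap between two methods' edit distances exceeds sampling noise.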
Circularity Check
No circularity: new edit-flow model trained on evolutionary pairs with external in silico benchmarks
The paper defines EvoFlows as a sequence-to-sequence edit-flow model that learns mutational trajectories directly from pairs of evolutionarily related sequences. Generation proceeds by applying learned edit operations to a template, with the number and type of edits controlled at inference. Evaluation metrics (family consistency via likelihood or MSA scores on UniRef/OAS, edit distance from template) are computed post-generation against held-out or external sequence sets and compared to independent baselines. No equation reduces a claimed prediction to a fitted parameter by algebraic identity, no uniqueness theorem is imported from self-citation, and no ansatz is smuggled via prior work. The central claim therefore rests on empirical comparison rather than definitional closure.
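The "edit distance from template" metric named above can be illustrated with the classic Levenshtein dynamic program, which counts the minimum number of single-residue insertions, deletions, and substitutions separating two sequences. Whether the paper uses exactly this unit-cost variant is an assumption; the function name `edit_distance` and the example sequences are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between sequences a and b."""
    # prev[j] holds the distance between the first i-1 residues of a
    # and the first j residues of b; only two rows are kept in memory.
    prev = list(range(len(b) + 1))
    for i, res_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty prefix takes i deletions
        for j, res_b in enumerate(b, start=1):
            substitution_cost = 0 if res_a == res_b else 1
            curr.append(
                min(
                    prev[j] + 1,                      # delete res_a
                    curr[j - 1] + 1,                  # insert res_b
                    prev[j - 1] + substitution_cost,  # match or substitute
                )
            )
        prev = curr
    return prev[-1]


# Toy template and variant (not from the paper).
distance = edit_distance("MKTAYIAK", "KSAYIAKG")
```

Unlike Hamming distance, this measure is defined for sequences of different lengths, so it applies to variants produced with insertions and deletions.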
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Evolutionary relationships between protein sequences provide a reliable source of mutational trajectories for learning edit operations
Forward citations
Cited by 1 Pith paper
-
Tree-Conditioned Edit Flows for Ancestral Sequence Reconstruction
A new tree-conditioned edit-flow model for ancestral sequence reconstruction achieves reasonable accuracy on substitution-only evolved sequences and superior localization of changes on natural indel-rich sequences.
Reference graph
Works this paper leans on
-
[1]
Information on EC 2.6.1.51 - serine-pyruvate transaminase, 2025
https://doi.org/10.1093/bioinformatics/btac020 BRENDA Enzyme Database. Information on EC 2.6.1.51 - serine-pyruvate transaminase, 2025. Chen, A., Stanton, S. D., Alberstein, R. G., Watkins, A. M., Bonneau, R., Gligorijevic, V., Cho, K., and Frey, N. C. LLMs Are Highly-Constrained Biophysical Sequence Optimizers. NeurIPS 2024 Workshop on AI for New Drug Mo...
-
[2]
https://openreview.net/forum?id=Lm8T39vLDTE Jr, T. F. T., and Bepler, T. Understanding protein function with a multimodal retrieval-augmented foundation model. The Thirty-Ninth Annual Conference on Neural Information Processing Systems, 2025. https://openreview.net/forum?id=fKerD2AQai Koudelakova, T., Bidmanova, S., Dvorak, P., Pavelka, A., Chaloupkova,...
-
[3]
https://doi.org/10.1093/bioinformatics/btac353 Leslie, C., Eskin, E., and Noble, W. S. The spectrum kernel: a string kernel for SVM protein classification. Pac Symp Biocomput, 564–575, 2002. Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., Smetanin, N., Verkuil, R., Kabeli, O., Shmueli, Y., and others. Evolutionary-scale prediction of atomic-level pr...
-
[4]
FGF2 - Fibroblast growth factor 2 - Gallus gallus (Chicken), 2025
https://doi.org/10.1093/bioinformatics/btm098 UniProt Consortium. FGF2 - Fibroblast growth factor 2 - Gallus gallus (Chicken), 2025. Verkuil, R., Kabeli, O., Du, Y., Wicky, B. I. M., Milles, L. F., Dauparas, J., Baker, D., Ovchinnikov, S., Sercu, T., and Rives, A. Language models generalize beyond natural proteins. Biorxiv, 2022. https://doi.org/10.1101/2...
-
[5]
as biologically informed priors. These frequencies reflect the natural abundance of amino acids in protein databases and have been empirically validated across diverse protein families. The smoothed probability for amino acid $a$ is calculated as: $p_{a,\alpha} = \frac{x_a + \alpha \mu_a}{N + \alpha \cdot d}$ (24), where $x_a$ is the observed count of amino acid $a$, $N$ is the total number of observ...
-
[6]
Ensures biologically plausible probability estimates even with limited data
-
[7]
Weights pseudo-counts according to amino acid abundance in natural proteins rather than treating all amino acids equally
-
[8]
Provides a Bayesian interpretation with Dirichlet priors informed by empirical protein evolution
Foundation Models for Science: Real-World Impact and Science-First Design, ICLR 2026
-
[9]
Reduces the variance of frequency estimates while introducing minimal bias, as the smoothed probabilities converge to maximum likelihood estimates as dataset size increases. This approach is particularly valuable when comparing generated sequences to natural sequences, as it ensures that distance measurements remain finite and meaningful even when datasets...
-
[10]
It is efficient to compute using sparse representations, avoiding the need to explicitly enumerate all $|\mathcal{A}|^k$ possible $k$-mers
-
[11]
It makes no assumptions about the data distribution, unlike learned embeddings
-
[12]
It captures local sequence composition
-
[13]
Its simplicity ensures robustness when evaluating artificial sequences (Kucera et al., 2022).
C Protein Types and Datasets
We use the following seed proteins to construct homolog datasets via iterative profile search.
Anti-SARS-CoV-2 VHH (Ty1)
Ty1 is an alpaca-derived single-domain antibody (nanobody) that targets the receptor-binding domain (RBD) of the...