Protein Folding with Neural Ordinary Differential Equations

Arielle Sanford; Christian B. Mendl; Shuo Sun

arXiv: 2510.16253 · v1 · pith:CSJVYP53new · submitted 2025-10-17 · 💻 cs.LG · cs.AI · q-bio.BM · q-bio.QM · stat.ML

Pith reviewed 2026-05-18 05:38 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · q-bio.BM · q-bio.QM · stat.ML

keywords neural ordinary differential equations · protein structure prediction · continuous depth · Evoformer · attention mechanisms · secondary structure · machine learning

The pith

A Neural ODE replaces the Evoformer's 48 discrete blocks, giving constant memory cost in depth and a tunable runtime-accuracy trade-off in protein structure prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a continuous-depth version of the Evoformer by recasting its stacked blocks as the dynamics of a Neural ODE while preserving the core attention operations. This formulation keeps memory usage fixed regardless of effective depth through the adjoint method and permits trading compute time for precision with adaptive solvers. A sympathetic reader would care if the approach yields structurally plausible protein predictions and reliably identifies elements such as alpha-helices after far less training than the original model. The authors demonstrate this on structure prediction tasks, reporting plausible outputs and partial secondary-structure capture with only 17.5 hours of single-GPU training.

Core claim

The central claim is that the Evoformer's attention-based operations can be faithfully expressed as the vector field of an ordinary differential equation, so that the 48 discrete layers become a discretization of a continuous-time model solved by Neural ODE techniques; this yields constant memory cost in depth, a tunable runtime-accuracy tradeoff via solver choice, and protein predictions that remain structurally plausible while capturing certain secondary structures such as alpha-helices.

What carries the argument

Neural ODE parameterization of the Evoformer, which treats the discrete stacked blocks as a discretization of a continuous dynamical system whose solution is obtained by adaptive integration.
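The "stack as discretization" reading can be illustrated with a toy example. The sketch below is not the paper's model: `vector_field` is a hypothetical stand-in (simple linear decay) for the Evoformer update direction, and `residual_stack` is an illustrative name. It shows only that a stack of residual updates x <- x + h·f(x) is forward Euler on dx/dt = f(x), and that more, smaller blocks approach the continuous solution.

```python
import math

def vector_field(x):
    # Hypothetical stand-in for the learned update direction f(x);
    # the real model would apply attention operations here.
    return [-xi for xi in x]

def residual_stack(x, num_blocks):
    """Apply num_blocks residual updates x <- x + h * f(x) with h = 1 / num_blocks.

    This is exactly forward Euler on dx/dt = f(x) over the interval [0, 1].
    """
    h = 1.0 / num_blocks
    for _ in range(num_blocks):
        fx = vector_field(x)
        x = [xi + h * fi for xi, fi in zip(x, fx)]
    return x

x0 = [1.0, -2.0]
# Exact flow of dx/dt = -x at t = 1, used as the continuous-depth reference.
exact = [xi * math.exp(-1.0) for xi in x0]

err_48 = max(abs(a - b) for a, b in zip(residual_stack(x0, 48), exact))
err_480 = max(abs(a - b) for a, b in zip(residual_stack(x0, 480), exact))
print(f"48 blocks: {err_48:.2e}   480 blocks: {err_480:.2e}")
```

In this toy setting the 48-block stack is one particular fixed-step discretization; the continuous formulation leaves the choice of step count to the solver.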

If this is right

The continuous model generates structurally plausible protein structure predictions.
Certain secondary structure elements such as alpha-helices are reliably captured.
Training the model requires only 17.5 hours on a single GPU.
Memory cost remains constant with respect to model depth.
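The constant-memory claim rests on the adjoint method, which can be sketched on a scalar toy problem. Everything below is an illustrative assumption rather than the authors' code: the dynamics dx/dt = θ·x, the loss L = x(T), and the names `rk4` and `adjoint_gradient`. The forward pass keeps only the final state; the backward pass integrates the state, the adjoint a(t) = dL/dx(t), and the running gradient together from T back to 0, so no intermediate activations are stored.

```python
import math

def rk4(f, y, t0, t1, steps):
    """Fixed-step RK4 that keeps only the current state (no trajectory stored)."""
    h = (t1 - t0) / steps  # negative when integrating backward in time
    t = t0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y

def adjoint_gradient(theta, x0, T, steps=200):
    """Return (x(T), dL/dtheta) for dx/dt = theta * x with loss L = x(T)."""
    # Forward pass: only x(T) is kept.
    (xT,) = rk4(lambda t, y: [theta * y[0]], [x0], 0.0, T, steps)

    def augmented(t, y):
        x, a, g = y
        return [theta * x,    # dx/dt = f(x, theta)
                -a * theta,   # da/dt = -a * df/dx
                -a * x]       # dg/dt = -a * df/dtheta
    # Backward pass from t = T to t = 0 with a(T) = dL/dx(T) = 1 and g(T) = 0.
    _, _, grad = rk4(augmented, [xT, 1.0, 0.0], T, 0.0, steps)
    return xT, grad

xT, grad = adjoint_gradient(theta=0.5, x0=2.0, T=1.0)
analytic = 1.0 * 2.0 * math.exp(0.5)  # d/dtheta of x0 * exp(theta * T)
print(abs(grad - analytic))
```

One known caveat of this scheme is that re-integrating the state backward can be numerically unstable for some dynamics, which is a separate question from the memory saving shown here.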

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The continuous formulation may integrate more naturally with time-dependent physical simulations of folding trajectories.
Adaptive solvers could support deployment across hardware with different speed and memory limits by adjusting integration tolerances.
The same continuous-depth replacement may extend to other deep attention architectures used in biomolecular tasks.
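The point about adjusting integration tolerances can be made concrete with a minimal adaptive solver. This is a sketch under stated assumptions, not the solver used in the paper: it integrates a scalar toy ODE with an embedded Euler/Heun pair, and `adaptive_heun` and its step-size controller are illustrative choices. It shows only that a tighter tolerance costs more function evaluations and returns a smaller error.

```python
import math

def adaptive_heun(f, y0, t0, t1, tol):
    """Integrate a scalar ODE with an embedded Euler/Heun pair.

    Returns (y(t1), number of evaluations of f).
    """
    t, y, h, n_evals = t0, y0, (t1 - t0) / 10, 0
    while t < t1:
        h = min(h, t1 - t)              # never step past the end point
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        n_evals += 2
        y_low = y + h * k1              # first-order (Euler) estimate
        y_high = y + h / 2 * (k1 + k2)  # second-order (Heun) estimate
        err = abs(y_high - y_low)       # local error estimate
        if err <= tol:                  # accept the step
            t, y = t + h, y_high
        # Grow or shrink the step, clamped to avoid wild changes.
        h *= min(2.0, max(0.2, 0.9 * math.sqrt(tol / max(err, 1e-16))))
    return y, n_evals

decay = lambda t, y: -y                 # toy dynamics with known solution exp(-t)
exact = math.exp(-2.0)

y_loose, evals_loose = adaptive_heun(decay, 1.0, 0.0, 2.0, tol=1e-2)
y_tight, evals_tight = adaptive_heun(decay, 1.0, 0.0, 2.0, tol=1e-5)
err_loose, err_tight = abs(y_loose - exact), abs(y_tight - exact)
print(evals_loose, err_loose)
print(evals_tight, err_tight)
```

On constrained hardware one would pick the loosest tolerance whose error is still acceptable; whether that error translates into acceptable structure quality is an empirical question the toy example cannot answer.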

Load-bearing premise

The core attention-based operations of the Evoformer can be represented as the vector field of an ordinary differential equation without substantial loss of the modeling power needed for protein conformation constraints.

What would settle it

A side-by-side evaluation on a benchmark set of proteins containing known alpha-helices in which the continuous model produces no such helices while the original discrete model does would show that the replacement does not preserve necessary modeling capacity.
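A rough version of this check could be scripted as follows. The sketch assumes backbone dihedral angles (phi, psi) in degrees are already available for each prediction. The angle window around the textbook alpha-helix region (roughly phi ≈ -57°, psi ≈ -47°) and the minimum run length of four residues are illustrative assumptions, not a standard secondary-structure assignment. The angle lists are made-up toy inputs, not results from the paper.

```python
def helical_mask(dihedrals, phi_range=(-90.0, -30.0), psi_range=(-77.0, -17.0)):
    """Flag residues whose (phi, psi) fall inside an assumed helical window."""
    return [phi_range[0] <= phi <= phi_range[1] and psi_range[0] <= psi <= psi_range[1]
            for phi, psi in dihedrals]

def longest_run(mask):
    """Length of the longest stretch of consecutive True values."""
    best = current = 0
    for flag in mask:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def has_helix(dihedrals, min_run=4):
    """True if at least min_run consecutive residues look helical (assumed cutoff)."""
    return longest_run(helical_mask(dihedrals)) >= min_run

# Toy inputs only: six helix-like residues followed by three extended ones,
# versus a fully extended chain.
discrete_prediction = [(-60.0, -45.0)] * 6 + [(-120.0, 130.0)] * 3
continuous_prediction = [(-120.0, 130.0)] * 9

print(has_helix(discrete_prediction), has_helix(continuous_prediction))
```

A real comparison would use an established secondary-structure assignment tool on both models' outputs over the same benchmark set.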

Original abstract

Recent advances in protein structure prediction, such as AlphaFold, have demonstrated the power of deep neural architectures like the Evoformer for capturing complex spatial and evolutionary constraints on protein conformation. However, the depth of the Evoformer, comprising 48 stacked blocks, introduces high computational costs and rigid layerwise discretization. Inspired by Neural Ordinary Differential Equations (Neural ODEs), we propose a continuous-depth formulation of the Evoformer, replacing its 48 discrete blocks with a Neural ODE parameterization that preserves its core attention-based operations. This continuous-time Evoformer achieves constant memory cost (in depth) via the adjoint method, while allowing a principled trade-off between runtime and accuracy through adaptive ODE solvers. Benchmarking on protein structure prediction tasks, we find that the Neural ODE-based Evoformer produces structurally plausible predictions and reliably captures certain secondary structure elements, such as alpha-helices, though it does not fully replicate the accuracy of the original architecture. However, our model achieves this performance using dramatically fewer resources, just 17.5 hours of training on a single GPU, highlighting the promise of continuous-depth models as a lightweight and interpretable alternative for biomolecular modeling. This work opens new directions for efficient and adaptive protein structure prediction frameworks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Neural ODE Evoformer trades accuracy for efficiency with limited evidence backing the claims.

The main thing to know is that this paper replaces the 48 discrete blocks of the Evoformer with a Neural ODE parameterization. This keeps the attention operations but runs them in continuous time, which should mean constant memory use no matter the depth and a way to trade speed for accuracy with the solver. The results are structurally plausible and catch alpha-helices, but they don't reach the accuracy of the standard version.

What works here is the resource angle. Training in 17.5 hours on a single GPU is a real plus, and the authors are straightforward about the accuracy shortfall instead of claiming parity. Applying Neural ODEs this way to the Evoformer is the new part.

The soft spots come down to the evidence level. The abstract gives no error bars, no specific baseline scores, and little on the dataset, so the performance claims are hard to weigh. The stress-test point about whether the core attention mechanisms stay intact in the ODE form is worth paying attention to. If the continuous dynamics change how the signals move through the model, you could end up with outputs that look reasonable but don't enforce the conformation rules as tightly as the discrete stack does.

This paper is for people who want lighter versions of protein structure predictors, maybe for quicker iterations in drug design or similar. Readers who like seeing Neural ODEs applied to real problems in biology would get something from it. I would recommend sending it to peer review. The efficiency idea is worth a closer look with fuller experiments and comparisons.

Referee Report

2 major / 2 minor

Summary. The paper proposes replacing the 48 discrete blocks of the Evoformer with a Neural ODE parameterization dx/dt = f_θ(x) that preserves core attention-based operations (MSA attention, pair attention, transition). This yields constant memory cost in depth via the adjoint method and enables runtime-accuracy trade-offs through adaptive ODE solvers. On protein structure prediction tasks the resulting model produces structurally plausible outputs that capture alpha-helices, albeit with lower accuracy than the original discrete architecture, while requiring only 17.5 hours of single-GPU training.
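The summary writes the dynamics generically as dx/dt = f_θ(x), while the paper passage quoted in the Lean section of this page gives the gated form f(m, z) = (σ_m(t)·(m′ − m), σ_z(t)·(z′ − z)). The short derivation below is a reading aid under an assumption this page does not confirm: that (m′, z′) denotes the output of one Evoformer-style block applied to the current MSA and pair representations (m, z), with σ_m and σ_z acting as time-dependent gates. The step size h and index k are notation introduced here. Under that assumption, one forward Euler step is a convex-style blend of the current state and the block output, and it reduces to the discrete block when h·σ = 1.

```latex
% Assumption (not stated on this page): (m', z') is the output of one
% Evoformer-style block applied to (m, z); h and k are introduced here.
\begin{aligned}
\frac{dm}{dt} &= \sigma_m(t)\,\bigl(m' - m\bigr), \qquad
\frac{dz}{dt} = \sigma_z(t)\,\bigl(z' - z\bigr), \\
m_{k+1} &= m_k + h\,\sigma_m(t_k)\,\bigl(m'_k - m_k\bigr)
         = \bigl(1 - h\,\sigma_m(t_k)\bigr)\,m_k + h\,\sigma_m(t_k)\,m'_k, \\
h\,\sigma_m(t_k) &= 1 \;\Longrightarrow\; m_{k+1} = m'_k .
\end{aligned}
```

The same algebra applies to z with σ_z. If the assumption holds, the discrete stack corresponds to one specific gate and step-size setting, which is the sense in which the continuous model could contain the original architecture as a special case.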

Significance. If the continuous formulation can be shown to retain the essential conformation-modeling capacity of the discrete Evoformer, the work would demonstrate a practical route to memory-efficient, adaptive-depth biomolecular networks. The explicit use of the adjoint method for constant memory and the reported low-resource training constitute concrete strengths that could be leveraged in follow-on studies.

major comments (2)

[Methods / Model Definition] The central claim that the Neural ODE faithfully encodes the core attention operations without substantial loss of modeling power for protein conformation constraints is load-bearing yet unsupported by direct evidence. No ablation, attention-map comparison, or intermediate-state analysis is presented to verify that the continuous vector field f_θ(x) reproduces the iterative refinement behavior of the original 48-block stack.
[Experiments / Results] Benchmarking results are reported without error bars, baseline comparisons against the discrete Evoformer or other continuous-depth models, or dataset statistics. This absence makes it impossible to determine whether the “structurally plausible” predictions and alpha-helix capture constitute a meaningful advance or merely superficial agreement with ground truth.

minor comments (2)

[Model Architecture] Specify the precise functional form of the vector field f_θ(x) (e.g., how MSA and pair representations are injected into the ODE right-hand side) so that readers can assess continuity with the original Evoformer equations.
[Experiments] Clarify the adaptive solver tolerances and step-size heuristics used to realize the claimed runtime-accuracy trade-off; include at least one table showing wall-clock time versus TM-score or RMSD for different tolerance settings.
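The table requested in the last comment needs two measurements per tolerance setting: wall-clock time and a structural error. A minimal sketch of both helpers follows. It assumes the predicted and reference coordinates are already superposed (no alignment is performed, and TM-score is not computed), and the names `rmsd` and `timed` are illustrative. The coordinates in the usage lines are toy values, not data from the paper.

```python
import math
import time

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the two structures are already superposed; no alignment is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum((a - b) ** 2
                for p, q in zip(coords_a, coords_b)
                for a, b in zip(p, q))
    return math.sqrt(total / len(coords_a))

def timed(func, *args):
    """Run func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

# Toy coordinates: the second atom is displaced by 2 units along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]

value, seconds = timed(rmsd, reference, predicted)
print(f"RMSD = {value:.4f}, computed in {seconds:.6f} s")
```

In a real benchmark, `timed` would wrap the model's prediction call at each solver tolerance, and the RMSD would be computed after optimal superposition.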

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our work. We address the major comments point by point below, and have updated the manuscript accordingly where revisions were needed.

Referee: [Methods / Model Definition] The central claim that the Neural ODE faithfully encodes the core attention operations without substantial loss of modeling power for protein conformation constraints is load-bearing yet unsupported by direct evidence. No ablation, attention-map comparison, or intermediate-state analysis is presented to verify that the continuous vector field f_θ(x) reproduces the iterative refinement behavior of the original 48-block stack.

Authors: We appreciate this observation. The Neural ODE parameterization is designed such that the vector field f_θ(x) incorporates the same attention mechanisms (MSA attention, pair attention, and transition) as the discrete Evoformer blocks, but in a continuous-time formulation. This is achieved by defining the ODE dynamics to mirror the operations within each block. While direct ablations and attention map comparisons were not included in the initial submission to focus on the overall feasibility and efficiency gains, we acknowledge their value for validating the modeling power. In the revised manuscript, we will include an analysis of intermediate states and a comparison of attention patterns between the continuous and discrete models to provide direct evidence.
revision: yes

Referee: [Experiments / Results] Benchmarking results are reported without error bars, baseline comparisons against the discrete Evoformer or other continuous-depth models, or dataset statistics. This absence makes it impossible to determine whether the “structurally plausible” predictions and alpha-helix capture constitute a meaningful advance or merely superficial agreement with ground truth.

Authors: We agree that additional statistical rigor would strengthen the experimental section. The reported results demonstrate that the model produces plausible structures and captures alpha-helices using only 17.5 hours of single-GPU training, which is a key contribution highlighting the efficiency of the continuous approach. However, we did not provide error bars from multiple independent runs or detailed dataset statistics in the original manuscript. We will revise the paper to include error bars where feasible, provide dataset statistics, and add explicit comparisons to the discrete 48-block Evoformer as well as other continuous-depth baselines to better contextualize the accuracy trade-offs.
revision: yes

Circularity Check

0 steps flagged

No significant circularity in continuous-depth Evoformer proposal

The paper defines a Neural ODE parameterization that encodes the Evoformer's attention operations as the vector field of a continuous-depth model, then evaluates it empirically on protein structure benchmarks. This is an explicit architectural substitution and modeling choice rather than a mathematical derivation whose outputs reduce to its inputs by construction. No self-citations, fitted parameters renamed as predictions, or uniqueness theorems appear in the abstract or described chain. The reported performance (plausible structures, partial secondary-structure capture, reduced resources) is benchmark-driven and does not tautologically follow from the ODE formulation itself. The central assumption of faithful preservation is stated openly as a hypothesis to be tested, not smuggled in via prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the Evoformer's attention operations remain effective when lifted into a continuous ODE flow; no free parameters or invented entities are described in the abstract.

axioms (1)

Domain assumption: The core attention-based operations of the Evoformer can be preserved in a continuous-time formulation.
Invoked when the 48 discrete blocks are replaced by the Neural ODE parameterization.

pith-pipeline@v0.9.0 · 5756 in / 1263 out tokens · 31996 ms · 2026-05-18T05:38:13.699546+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "replacing its 48 discrete blocks with a Neural ODE parameterization that preserves its core attention-based operations... f(m, z) = (σ_m(t)·(m′ − m), σ_z(t)·(z′ − z))"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "continuous-depth formulation of the Evoformer... adjoint sensitivity method for backpropagation"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.