Quantitative and Predictive Folding Models from Limited Single-Molecule Data Using Simulation-Based Inference
Pith reviewed 2026-05-19 00:52 UTC · model grok-4.3
The pith
Simulation-based inference extracts accurate folding landscapes and dynamics from a single two-second trajectory.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce a simulation-based inference framework that integrates physics-based modeling with deep learning to derive quantitative folding models from minimal SMFS data. Applied to constant-force measurements, this method reconstructs the free energy landscape and folding dynamics of a DNA hairpin from a single two-second experimental trajectory, yielding results consistent with deconvolution methods that require substantially larger datasets. The framework extends to a riboswitch aptamer, resolving a landscape with four metastable states from one trajectory, while Bayesian inference quantifies uncertainties in parameters such as diffusion coefficients without separate calibrations.
What carries the argument
A neural network trained via simulation-based inference on synthetic trajectories generated from a physics-based model of the molecule, linkers, and instrument noise; the network inverts observed data to infer the underlying free-energy profile, diffusion coefficient, and experimental parameters.
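As a rough illustration of the kind of forward model such a network could be trained on, the sketch below integrates overdamped Brownian dynamics on a two-dimensional harmonic-linker surface G(x, q) = G0(x) + (k_l/2)(x - q)^2, with separate diffusion coefficients for the molecular coordinate x and the measured coordinate q. The double-well G0, every parameter value, and the function names are assumptions chosen for illustration; energies are in units of kT. This is not the paper's simulator and no network is trained here.

```python
import math
import random


def double_well(x, barrier=4.0, x0=1.0):
    """Toy molecular free energy G0(x) in kT, with wells at +/- x0 (assumed form)."""
    return barrier * ((x / x0) ** 2 - 1.0) ** 2


def double_well_force(x, barrier=4.0, x0=1.0):
    """Force -dG0/dx for the toy double well."""
    return -4.0 * barrier * x * ((x / x0) ** 2 - 1.0) / x0 ** 2


def simulate(n_steps, dt=1e-3, d_x=1.0, d_q=1.0, k_l=5.0, noise_sd=0.0, seed=0):
    """Euler-Maruyama integration on G(x, q) = G0(x) + k_l / 2 * (x - q)**2.

    Returns the trace of the measured coordinate q, optionally with added
    Gaussian read-out noise of standard deviation noise_sd.
    """
    rng = random.Random(seed)
    x, q = -1.0, -1.0  # start both coordinates in the left well
    trace = []
    for _ in range(n_steps):
        # Evaluate both forces before updating either coordinate.
        fx = double_well_force(x) - k_l * (x - q)  # -dG/dx
        fq = k_l * (x - q)                         # -dG/dq
        x += d_x * fx * dt + math.sqrt(2.0 * d_x * dt) * rng.gauss(0.0, 1.0)
        q += d_q * fq * dt + math.sqrt(2.0 * d_q * dt) * rng.gauss(0.0, 1.0)
        trace.append(q + noise_sd * rng.gauss(0.0, 1.0))
    return trace
```

Calling `simulate(2000, noise_sd=0.05, seed=1)` returns one synthetic trace; in a simulation-based inference workflow, many such traces drawn over a prior on the parameters would form the training set.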
If this is right
- The inferred models generate new trajectories that match experimental thermodynamics and kinetics without further fitting.
- Uncertainties on all parameters, including diffusion coefficients and linker stiffness, are obtained directly from the data.
- The method works for systems with multiple metastable states and tertiary contacts from a single short recording.
- Quantitative models become feasible for biomolecular systems where collecting large datasets is impractical.
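One common way to report the uncertainties mentioned in the list above is a central credible interval computed from posterior samples. The helper below is a minimal sketch of that calculation with linear interpolation between order statistics; the function name and the 95% default are assumptions, and it presumes posterior samples for a single parameter are already available.

```python
import math


def credible_interval(samples, level=0.95):
    """Central credible interval from posterior samples of one parameter."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)

    def quantile(p):
        # Linear interpolation between the two nearest order statistics.
        pos = p * (len(ordered) - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(ordered) - 1)
        frac = pos - lo
        return ordered[lo] * (1.0 - frac) + ordered[hi] * frac

    tail = (1.0 - level) / 2.0
    return quantile(tail), quantile(1.0 - tail)
```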
Where Pith is reading between the lines
- The framework could be adapted to other single-molecule modalities such as fluorescence or optical tweezers that share similar noise and artifact characteristics.
- Short-trajectory inference opens the possibility of tracking folding changes in response to varying conditions within one continuous experiment rather than averaging many separate runs.
- Because the approach separates the physical model from the inference step, it could be extended to include additional molecular degrees of freedom once faster simulation engines become available.
Load-bearing premise
The physics-based simulation model used to generate training data accurately captures all relevant instrumental noise, linker artifacts, and stochastic dynamics of the real experimental system.
What would settle it
If simulated trajectories drawn from the inferred parameters fail to reproduce the observed folding rates, equilibrium occupancies, or transition statistics in a new, independent experimental recording of the same molecule, the reconstruction would be shown to be inaccurate.
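A check of this kind can be sketched for a two-state system as follows: reduce each trajectory to an equilibrium occupancy and a transition count, then ask whether the values from an independent recording fall within the spread of the simulated ones. The two-threshold state assignment, the function names, and the min/max comparison are illustrative assumptions, not the paper's procedure; a real test would likely use full predictive distributions.

```python
def two_state_summary(trace, lower, upper):
    """Return (upper-state occupancy, number of transitions) for a trace.

    Two thresholds give hysteresis: values between `lower` and `upper`
    keep the previous state, so noise near the barrier is not counted
    as a transition.
    """
    state = None
    in_upper = 0
    counted = 0
    transitions = 0
    for value in trace:
        if value >= upper:
            new_state = 1
        elif value <= lower:
            new_state = 0
        else:
            new_state = state
        if new_state is None:
            continue  # state not yet determined; skip this sample
        if state is not None and new_state != state:
            transitions += 1
        state = new_state
        in_upper += state
        counted += 1
    occupancy = in_upper / counted if counted else float("nan")
    return occupancy, transitions


def within_range(observed, simulated):
    """Crude check: does the observed value lie inside the simulated spread?"""
    return min(simulated) <= observed <= max(simulated)
```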
Original abstract
The study of biomolecular folding has been greatly advanced by single-molecule force spectroscopy (SMFS), which enables the observation of the dynamics of individual molecules. However, extracting quantitative models of fundamental properties such as folding landscapes from SMFS data is very challenging due to instrumental noise, linker artifacts, and the inherent stochasticity of the process, often requiring extensive datasets and complex calibration. Here, we introduce a framework based on simulation-based inference (SBI) that overcomes these limitations by integrating physics-based modeling with deep learning. We first apply this framework to analyze constant-force measurements of a DNA hairpin. From a single experimental trajectory of only two seconds, we successfully reconstruct the hairpin's free energy landscape and folding dynamics, obtaining results in close agreement with established deconvolution methods that require 10-100 times more data. Furthermore, we demonstrate the generality of our approach by applying it to a riboswitch aptamer featuring multiple states and tertiary contacts, resolving the profile of a landscape featuring four metastable states from a single trajectory. The Bayesian nature of this approach robustly quantifies uncertainties for all inferred parameters, including diffusion coefficients and linker stiffness, without needing independent measurements of instrument properties. The inferred models are predictive, generating simulated trajectories that quantitatively reproduce experimental thermodynamics and kinetics. The ability to derive statistically robust models from minimal datasets is crucial for investigating complex biomolecular systems where extensive data collection is impractical, paving the way for novel applications of SMFS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a simulation-based inference (SBI) framework integrating physics-based modeling with deep learning to derive quantitative and predictive models of biomolecular folding from limited single-molecule force spectroscopy (SMFS) data. For a DNA hairpin, it reconstructs the free energy landscape and folding dynamics from a single 2-second experimental trajectory, claiming close agreement with established deconvolution methods that require 10-100 times more data. The approach is extended to a riboswitch aptamer with four metastable states and tertiary contacts. The Bayesian procedure quantifies uncertainties in parameters including diffusion coefficients and linker stiffness without independent calibrations, and the inferred models generate simulated trajectories that reproduce experimental thermodynamics and kinetics.
Significance. If the central results hold, the work has substantial significance for single-molecule biophysics. It directly addresses the practical barrier of extensive data requirements in SMFS by enabling landscape and kinetic reconstruction from minimal trajectories, which is especially valuable for complex or low-abundance systems. The built-in uncertainty quantification for instrumental and physical parameters, combined with demonstrated predictive power of the inferred models, represents a methodological strength. The generality shown on the multi-state riboswitch further broadens potential impact.
major comments (2)
- [Abstract] The claim of 'close agreement' with established deconvolution methods is stated without any quantitative metrics (e.g., RMSD between landscapes, overlap of rate distributions, or statistical tests), error bars, or explicit validation details. This absence makes it impossible to evaluate whether the 10-100× data reduction is quantitatively supported or merely qualitative.
- [Methods] The physics-based simulator used to train the SBI network is load-bearing for the headline result. The abstract asserts that the posterior quantifies uncertainties in diffusion coefficients and linker stiffness 'without needing independent measurements,' yet this holds only if the forward model already encodes the correct functional forms and ranges for all relevant noise, filtering, and linker effects. No section provides explicit validation of the simulator against independent controls or known benchmark trajectories.
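The metrics requested in the first major comment could take a form like the sketch below: an RMSD between two free-energy profiles sampled on the same grid, and an overlap coefficient between two histograms on shared bins. Both definitions are assumptions chosen for illustration (the report does not specify them); the RMSD subtracts the mean difference first because free energies are defined only up to an additive constant.

```python
import math


def landscape_rmsd(g_a, g_b):
    """RMSD between two profiles on the same grid, after removing the offset."""
    if len(g_a) != len(g_b) or not g_a:
        raise ValueError("profiles must be non-empty and the same length")
    diffs = [a - b for a, b in zip(g_a, g_b)]
    offset = sum(diffs) / len(diffs)  # arbitrary additive constant
    return math.sqrt(sum((d - offset) ** 2 for d in diffs) / len(diffs))


def overlap_coefficient(p, q):
    """Sum over bins of min(p_i, q_i) after normalising each histogram to 1."""
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    total_p, total_q = sum(p), sum(q)
    return sum(min(a / total_p, b / total_q) for a, b in zip(p, q))
```

With these definitions, identical profiles shifted by a constant give an RMSD of zero, and identical histograms give an overlap of one.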
minor comments (2)
- Add a dedicated table or figure panel that directly compares key extracted quantities (barrier heights, well depths, folding/unfolding rates) between the SBI posterior and the reference deconvolution results, including uncertainties.
- Clarify the precise architecture and training procedure of the inference network (e.g., number of simulations, summary statistics used, network depth) so that the method can be reproduced from the text alone.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the significance of our work and for the constructive major comments. We address each point below and describe the revisions we will implement.
- Referee: [Abstract] The claim of 'close agreement' with established deconvolution methods is stated without any quantitative metrics (e.g., RMSD between landscapes, overlap of rate distributions, or statistical tests), error bars, or explicit validation details. This absence makes it impossible to evaluate whether the 10-100× data reduction is quantitatively supported or merely qualitative.
Authors: We agree that quantitative metrics would strengthen the abstract and enable a more rigorous assessment of the data-reduction claim. In the revised manuscript we will add explicit metrics, including the RMSD between the SBI-inferred free-energy landscape and the reference landscape from established deconvolution, an overlap coefficient for the rate distributions, and posterior-derived error bars. We will also add a sentence directing readers to the relevant main-text and supplementary figures that contain the full validation details.
Revision: yes
- Referee: [Methods] The physics-based simulator used to train the SBI network is load-bearing for the headline result. The abstract asserts that the posterior quantifies uncertainties in diffusion coefficients and linker stiffness 'without needing independent measurements,' yet this holds only if the forward model already encodes the correct functional forms and ranges for all relevant noise, filtering, and linker effects. No section provides explicit validation of the simulator against independent controls or known benchmark trajectories.
Authors: We appreciate the referee’s emphasis on forward-model validation. The simulator is built from established SMFS physical models (worm-like-chain linkers, trap dynamics, and additive noise) whose functional forms are standard in the literature. The current manuscript already shows that trajectories generated from the inferred posterior reproduce experimental thermodynamics and kinetics, providing indirect support. Nevertheless, we agree that a dedicated validation subsection would increase transparency. In the revision we will add a supplementary section and figure that compares simulator outputs against both analytical benchmark cases and independent experimental controls for the DNA-hairpin system, thereby confirming that the chosen functional forms and parameter ranges adequately capture the dominant effects.
Revision: yes
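For readers unfamiliar with the worm-like-chain linker model named in this response, a widely used closed-form approximation is the Marko-Siggia interpolation formula, F = (kT / Lp) [1 / (4 (1 - x/L)^2) - 1/4 + x/L]. The sketch below evaluates it; whether the paper's simulator uses this particular form is not stated here, and the function name, units (pN, nm), and default thermal energy are assumptions.

```python
def wlc_force(extension, contour_length, persistence_length, kt=4.11):
    """Marko-Siggia worm-like-chain force (pN) at a given extension (nm).

    kt is the thermal energy in pN nm (about 4.11 near 298 K).
    """
    z = extension / contour_length  # fractional extension
    if not 0.0 <= z < 1.0:
        raise ValueError("extension must lie in [0, contour_length)")
    return (kt / persistence_length) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)
```

The force vanishes at zero extension and diverges as the extension approaches the contour length, which is the qualitative behaviour a linker model needs to reproduce.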
Circularity Check
No significant circularity; SBI inference is self-contained against external benchmarks
full rationale
The paper's core procedure trains a neural network on trajectories generated from an independent physics-based simulator (incorporating force, linkers, and diffusion) and then applies the trained network to infer parameters and landscapes from a single experimental trace. This chain does not reduce to self-definition or fitted-input-as-prediction because the simulator is constructed from first-principles models of the instrument and molecule rather than being tuned to the target trace itself. Results are explicitly cross-checked against established deconvolution methods that operate on 10-100 times more data, supplying an external benchmark. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked to force the outcome. Posterior predictive checks (simulated trajectories reproducing observed thermodynamics and kinetics) constitute standard validation rather than tautological reproduction of inputs.
Axiom & Free-Parameter Ledger
free parameters (2)
- diffusion coefficients
- linker stiffness
axioms (1)
- domain assumption: The simulation model accurately reproduces instrumental noise and linker effects present in real SMFS experiments
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Passage: We used the harmonic-linker model... G(x,q) = G_0(x) + (k_l/2)(x − q)^2... anisotropic Brownian dynamics... cubic spline interpolation... SNPE to infer posterior
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Passage: From a single experimental trajectory of only two seconds, we successfully reconstruct the hairpin's free energy landscape
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.