CrystalBoltz: End-to-End Protein Structure Determination via Experiment-Guided Diffusion for X-Ray Crystallography
Pith reviewed 2026-05-20 20:11 UTC · model grok-4.3
The pith
CrystalBoltz conditions a pre-trained diffusion model on X-ray structure-factor amplitudes to sample and refine protein structures directly from diffraction data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CrystalBoltz casts crystallographic refinement as Bayesian inference over atomic structures and operates directly on structure-factor amplitudes. It moves from unguided generation with a pre-trained prior over protein structures to experiment-guided posterior sampling, followed by atomic coordinate and B-factor refinement.
What carries the argument
Experiment-guided posterior sampling that conditions a pre-trained diffusion prior over protein structures on measured structure-factor amplitudes.
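The toy sketch below illustrates this mechanism: it steers a diffusion prior with the gradient of a measurement log-likelihood, in the style of diffusion posterior sampling (Chung et al., in the reference list). It is not CrystalBoltz. It assumes a one-dimensional Gaussian prior whose score is known in closed form, standing in for a learned structure prior, and a variance-exploding noise schedule. The measurement is a hypothetical y = |x0| + noise, which, like a diffraction amplitude, discards the sign ("phase") of x0. The function name and all parameter values are illustrative.

```python
import numpy as np


def dps_sample_abs_measurement(y, sigma_y=0.3, n_samples=2000, n_steps=1000,
                               sigma_max=10.0, sigma_min=0.01, seed=0):
    """Toy DPS-style posterior sampler (hypothetical helper, not CrystalBoltz code).

    Prior: x0 ~ N(0, 1). Forward process (variance exploding): x_t = x0 + sigma * noise.
    Measurement: y = |x0| + N(0, sigma_y^2), so the sign of x0 is not observed.
    """
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(sigma_max, sigma_min, n_steps)
    # Start from the noisy marginal N(0, 1 + sigma_max^2).
    x = rng.normal(0.0, np.sqrt(1.0 + sigma_max**2), size=n_samples)
    for s_now, s_next in zip(sigmas[:-1], sigmas[1:]):
        var = 1.0 + s_now**2
        prior_score = -x / var  # exact score of the noisy prior marginal N(0, var)
        x0_hat = x / var        # Tweedie estimate E[x0 | x_t]
        # Gradient of log N(y; |x0_hat|, sigma_y^2) w.r.t. x_t, chained through x0_hat.
        lik_grad = (y - np.abs(x0_hat)) * np.sign(x0_hat) / sigma_y**2 / var
        step = s_now**2 - s_next**2
        # Reverse-SDE (ancestral) update using the likelihood-guided score.
        x = x + step * (prior_score + lik_grad) + np.sqrt(step) * rng.normal(size=n_samples)
    return x


samples = dps_sample_abs_measurement(1.5)
print(np.mean(np.abs(samples)), np.mean(samples > 0))
```

Because y cannot distinguish x0 from -x0, the samples split roughly evenly between modes near +y and -y, pulled slightly toward zero by the prior. In the crystallographic setting, the learned structural prior has to resolve ambiguities of this kind.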
If this is right
- Atomic models can be obtained with lower coordinate errors than the strongest existing baselines.
- R-factors improve relative to the same baselines.
- Runtime drops by a factor of 33 compared with existing experimentally guided refinement.
- The workflow applies across multiple protein crystallography datasets without per-target retraining.
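The R-factor comparisons above rest on the formula R = Σ_h | |F_obs(h)| - k|F_calc(h)| | / Σ_h |F_obs(h)|. It is evaluated on the working reflections (R_work) and on a held-out test set (R_free, Brünger 1992, in the reference list). The sketch below assumes a single least-squares scale factor k fitted on the working set only. Refinement programs also model bulk solvent and anisotropic scaling, which this ignores. The function names are hypothetical.

```python
def ls_scale(f_obs, f_calc):
    """Hypothetical helper: least-squares scale k minimizing sum_h (F_obs - k * F_calc)^2."""
    num = sum(fo * fc for fo, fc in zip(f_obs, f_calc))
    den = sum(fc * fc for fc in f_calc)
    return num / den


def r_factor(f_obs, f_calc, k):
    """Crystallographic R = sum |F_obs - k * F_calc| / sum F_obs over the given reflections."""
    num = sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc))
    return num / sum(f_obs)


def r_work_r_free(f_obs, f_calc, is_free):
    """Split reflections by the free-set flag; the scale is fitted on the working set only."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    wo, wc = zip(*work)
    to, tc = zip(*test)
    k = ls_scale(wo, wc)
    return r_factor(wo, wc, k), r_factor(to, tc, k)
```

Fitting k on the working set keeps the free reflections fully out of the fit. That separation is what allows R_free to flag overfitting.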
Where Pith is reading between the lines
- Similar conditioning of generative priors could be tested on other experimental modalities that also produce incomplete data, such as electron microscopy.
- If the prior continues to generalize, the method may reduce the amount of manual intervention needed for novel or low-resolution targets.
- Faster end-to-end pipelines could increase throughput in structural biology projects that rely on repeated structure determination.
Load-bearing premise
A generative model pre-trained on existing protein structures can be conditioned on fresh experimental diffraction data without major loss of physical consistency.
What would settle it
New crystallographic datasets in which CrystalBoltz produces higher coordinate RMSD or higher R-factors than conventional experimentally guided refinement would falsify the reported performance gains.
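Coordinate RMSD depends on how the models are aligned and which atoms are compared, and this page specifies neither. The sketch below shows one common convention. It assumes two matched (N, 3) coordinate arrays and applies optimal rigid-body superposition with the Kabsch algorithm. For models refined in the same crystal frame, one might instead compare coordinates directly, after accounting for allowed origin shifts. The function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """Hypothetical helper: RMSD of matched (N, 3) point sets after optimal rotation and translation."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Cross-covariance matrix and its SVD.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))
```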
Original abstract
Generative models trained on public databases of protein structures, most of which have been determined by X-ray crystallography, now provide powerful priors for structure prediction. However, they are not readily conditioned on the measurements from a new crystallographic experiment, limiting their use for X-ray structure determination. In crystallography, the measured structure-factor amplitudes do not by themselves determine an electron density map or atomic structure because the associated phases are unobserved and must be inferred. Structure determination therefore remains an inverse problem in which candidate models must be both structurally plausible and consistent with measured diffraction data, often requiring substantial manual refinement by human experts. Emerging methods aim to incorporate experimental information more directly into predictive and refinement workflows. We present CrystalBoltz, a generative framework that casts crystallographic refinement as Bayesian inference over atomic structures and operates directly on structure-factor amplitudes. CrystalBoltz moves from unguided generation with a pre-trained prior over protein structures to experiment-guided posterior sampling, followed by atomic coordinate and B-factor refinement. Across multiple protein crystallography datasets, CrystalBoltz attains lower coordinate RMSD and lower R-factors than the strongest baselines considered, while reducing runtime by a factor of 33 relative to existing experimentally guided refinement.
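The claim that amplitudes alone do not determine the structure can be checked numerically. The sketch below assumes point scatterers with real scattering factors at fractional coordinates, with no anomalous scattering and no space-group symmetry. Translating every atom by the same vector changes the phase of each structure factor F(hkl) but leaves every amplitude |F(hkl)| unchanged. Inverting the structure through the origin has the same effect. These are only the simplest ambiguities. In general, any choice of phases combined with the measured amplitudes yields some density map, so extra information is needed to select a plausible one; in CrystalBoltz that information comes from the structural prior. The function name and atom list are hypothetical.

```python
import cmath
import math


def point_structure_factor(hkl, atoms):
    """Hypothetical helper: F(hkl) = sum_j f_j * exp(2*pi*i * (h x_j + k y_j + l z_j))."""
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        for f, (x, y, z) in atoms
    )


# Illustrative (scattering factor, fractional coordinates) pairs.
atoms = [(6.0, (0.12, 0.31, 0.45)), (7.0, (0.52, 0.08, 0.77)), (8.0, (0.83, 0.64, 0.21))]
shift = (0.10, 0.25, 0.40)
shifted = [(f, (x + shift[0], y + shift[1], z + shift[2])) for f, (x, y, z) in atoms]
inverted = [(f, (-x, -y, -z)) for f, (x, y, z) in atoms]

for hkl in [(1, 0, 0), (0, 2, 1), (1, 1, 3)]:
    F = point_structure_factor(hkl, atoms)
    F_shift = point_structure_factor(hkl, shifted)
    F_inv = point_structure_factor(hkl, inverted)
    # Amplitudes agree; phases generally differ.
    print(hkl, round(abs(F), 6), round(abs(F_shift), 6), round(abs(F_inv), 6),
          round(cmath.phase(F), 3), round(cmath.phase(F_shift), 3), round(cmath.phase(F_inv), 3))
```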
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CrystalBoltz, a generative framework that treats X-ray crystallographic structure determination as Bayesian inference. It starts from a pre-trained diffusion prior over protein structures (trained on PDB entries), conditions this prior on measured structure-factor amplitudes to draw posterior samples of atomic structures, and then refines the atomic coordinates and B-factors of these samples. The central empirical claim is that the resulting models achieve lower coordinate RMSD and lower R-factors than the strongest baselines while delivering a 33-fold runtime reduction relative to existing experimentally guided refinement pipelines.
Significance. If the conditioning step can be shown to produce physically consistent structures without substantial domain shift from the PDB training distribution, the method would constitute a meaningful advance in automated structure solution. It would demonstrate that diffusion-based generative priors can be effectively fused with experimental likelihoods in a manner that both accelerates refinement and improves final model quality, a result with direct implications for high-throughput crystallography.
major comments (3)
- [§3.2] §3.2 (Posterior sampling): the manuscript does not specify whether structure-factor amplitudes enter the reverse SDE through a differentiable forward model, classifier-free guidance, or an auxiliary likelihood term. Without this detail it is impossible to assess whether the sampled structures remain physically consistent or whether the final refinement step is still required to achieve the reported metrics.
- [Table 2, §4.3] Table 2 and §4.3: the reported 33× runtime reduction and RMSD/R-factor gains are presented without error bars, without the number of independent runs, and without explicit listing of the baseline methods’ hyper-parameters or convergence criteria. These omissions prevent evaluation of whether the improvements are statistically robust or sensitive to implementation details.
- [§4.1] §4.1 (Datasets): the claim that the pre-trained prior generalizes to new experimental amplitudes rests on an unverified assumption of limited domain shift. No ablation is shown that isolates the effect of the conditioning mechanism from the subsequent refinement stage.
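The Table 2 comment asks for error bars and for the number of independent runs. The arithmetic is simple but easy to report inconsistently. The standard-library sketch below computes the per-metric mean and sample standard deviation and the speedup as a ratio of mean wall-clock times. It also computes Welch's t statistic, one optional way to gauge whether a difference between two methods exceeds run-to-run variation. The function names are hypothetical, and no numbers from the paper are reproduced.

```python
import math
import statistics


def summarize_runs(values):
    """Hypothetical helper: mean and sample standard deviation (n - 1 denominator) across runs."""
    return statistics.mean(values), statistics.stdev(values)


def speedup(baseline_seconds, method_seconds):
    """Ratio of mean wall-clock times; e.g. 33.0 means the method is 33x faster on average."""
    return statistics.mean(baseline_seconds) / statistics.mean(method_seconds)


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se
```

With only a handful of runs per method, the t statistic indicates the scale of a difference relative to its noise and does not by itself settle significance.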
minor comments (2)
- [Figure 3] Figure 3 caption: the electron-density isosurface threshold is not stated, making visual comparison of the maps difficult to reproduce.
- [Eq. (7)] Notation: the symbol for the structure-factor amplitude likelihood is introduced without an explicit definition linking it to the standard crystallographic |F| term.
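The Eq. (7) comment asks how the amplitude likelihood relates to the usual |F| term. As one concrete reference point, and without claiming that Eq. (7) takes this form, the sketch below implements the standard acentric σA (Rice) likelihood of Read (1990, in the reference list) for normalized amplitudes E_o and E_c: p(E_o | E_c) = 2E_o/(1 - σA²) · exp(-(E_o² + σA² E_c²)/(1 - σA²)) · I0(2σA E_o E_c/(1 - σA²)). It assumes acentric reflections. It ignores measurement error, which treatments such as French & Wilson (1978) or Read & McCoy (2016) address. The function names are hypothetical.

```python
import math


def log_i0(x):
    """Hypothetical helper: log of the modified Bessel function I0(x) for x >= 0."""
    if x < 20.0:
        # Power series I0(x) = sum_k ((x/2)^2)^k / (k!)^2; no overflow in this range.
        q = (x / 2.0) ** 2
        term, total, k = 1.0, 1.0, 0
        while term > 1e-17 * total:
            k += 1
            term *= q / (k * k)
            total += term
        return math.log(total)
    # Large-argument expansion: I0(x) ~ e^x / sqrt(2*pi*x) * (1 + 1/(8x) + 9/(128x^2) + 225/(3072x^3)).
    corr = 1.0 + 1.0 / (8 * x) + 9.0 / (128 * x**2) + 225.0 / (3072 * x**3)
    return x - 0.5 * math.log(2 * math.pi * x) + math.log(corr)


def rice_log_likelihood(e_obs, e_calc, sigma_a):
    """Acentric log p(E_obs | E_calc) for normalized amplitudes, e_obs > 0 and 0 <= sigma_a < 1."""
    v = 1.0 - sigma_a**2
    return (
        math.log(2.0 * e_obs / v)
        - (e_obs**2 + sigma_a**2 * e_calc**2) / v
        + log_i0(2.0 * sigma_a * e_obs * e_calc / v)
    )
```

With σA = 0, the model carries no information and the expression reduces to the Wilson distribution 2E_o exp(-E_o²). As σA approaches 1, the density concentrates around E_o ≈ E_c.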
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript describing CrystalBoltz. We have carefully considered each major comment and provide point-by-point responses below, indicating where revisions will be made to address the concerns.
-
Referee: [§3.2] §3.2 (Posterior sampling): the manuscript does not specify whether structure-factor amplitudes enter the reverse SDE through a differentiable forward model, classifier-free guidance, or an auxiliary likelihood term. Without this detail it is impossible to assess whether the sampled structures remain physically consistent or whether the final refinement step is still required to achieve the reported metrics.
Authors: We agree that the description of the posterior sampling procedure in §3.2 lacks sufficient technical detail. The structure-factor amplitudes are incorporated using an auxiliary likelihood term that is added to the reverse SDE, computed via a differentiable forward model that simulates the diffraction process from atomic coordinates. This approach ensures that the generated samples are guided towards physical consistency with the experimental data. The subsequent atomic refinement step is retained to fine-tune B-factors and resolve any minor inconsistencies, as is standard in crystallographic workflows. In the revised manuscript, we will expand §3.2 with a detailed description of this mechanism, including the mathematical formulation of the likelihood term and a note on the role of refinement. revision: yes
-
Referee: [Table 2, §4.3] Table 2 and §4.3: the reported 33× runtime reduction and RMSD/R-factor gains are presented without error bars, without the number of independent runs, and without explicit listing of the baseline methods’ hyper-parameters or convergence criteria. These omissions prevent evaluation of whether the improvements are statistically robust or sensitive to implementation details.
Authors: The referee correctly identifies that additional statistical details would strengthen the empirical claims. We will revise Table 2 to include error bars representing standard deviations over 5 independent runs for each metric. We will also add a new subsection or appendix that lists the hyper-parameters used for CrystalBoltz and all baseline methods, along with their convergence criteria. This will enable readers to better evaluate the robustness of the reported 33× speedup and quality improvements. revision: yes
-
Referee: [§4.1] §4.1 (Datasets): the claim that the pre-trained prior generalizes to new experimental amplitudes rests on an unverified assumption of limited domain shift. No ablation is shown that isolates the effect of the conditioning mechanism from the subsequent refinement stage.
Authors: We appreciate this point regarding the need for more rigorous validation of generalization. While the test proteins were chosen to be distinct from the PDB training set, we acknowledge that an explicit analysis of domain shift and an ablation study would be beneficial. In the revised manuscript, we will add an ablation experiment that compares the performance of the full CrystalBoltz pipeline against a version without the experiment-guided conditioning (i.e., using only the prior followed by refinement). Additionally, we will include a quantitative assessment of structural similarity between the training distribution and the test cases to address the domain shift concern. revision: yes
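The authors' first response describes the likelihood as computed through a differentiable forward model from atomic coordinates to diffraction. The sketch below is a minimal such forward model under strong simplifying assumptions: point atoms with resolution-independent scattering factors f0, isotropic B-factors, a cubic P1 cell of edge a, and no bulk solvent or symmetry. It computes F_calc(h) = Σ_j f0_j exp(-B_j s²/4) exp(2πi h·x_j) with s = |h|/a. A least-squares amplitude loss stands in for the actual likelihood, and its hand-derived gradient with respect to fractional coordinates can be checked against finite differences. In practice, an autodiff framework or a dedicated differentiable calculator (the reference list includes SFcalculator) would supply the gradient. All names are hypothetical.

```python
import cmath
import math


def _atom_terms(hkl, coords, f0, b_factors, a):
    """Hypothetical helper: per-atom terms f0_j * exp(-B_j s^2 / 4) * exp(2*pi*i h.x_j)."""
    s2 = (hkl[0] ** 2 + hkl[1] ** 2 + hkl[2] ** 2) / a**2  # s = 1/d in a cubic cell
    terms = []
    for x, f, b in zip(coords, f0, b_factors):
        phase = 2 * math.pi * (hkl[0] * x[0] + hkl[1] * x[1] + hkl[2] * x[2])
        terms.append(f * math.exp(-b * s2 / 4) * cmath.exp(1j * phase))
    return terms


def amplitude_loss_and_grad(hkls, coords, f0, b_factors, a, f_obs):
    """L = sum_h (|F_calc(h)| - F_obs(h))^2 and dL/dx_j with respect to fractional coordinates."""
    loss = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for hkl, fo in zip(hkls, f_obs):
        terms = _atom_terms(hkl, coords, f0, b_factors, a)
        F = sum(terms)
        amp = abs(F)
        loss += (amp - fo) ** 2
        if amp == 0.0:
            continue  # |F| is not differentiable at F = 0
        for j, t in enumerate(terms):
            # d|F|/dx_jk = Re(conj(F) * dF/dx_jk) / |F|, with dF/dx_jk = 2*pi*i * h_k * t_j
            common = (F.conjugate() * 2j * math.pi * t).real / amp
            for k in range(3):
                grad[j][k] += 2 * (amp - fo) * common * hkl[k]
    return loss, grad
```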
Circularity Check
No significant circularity detected in the derivation chain
The paper presents CrystalBoltz as a new generative framework that shifts from a pre-trained diffusion prior over protein structures to experiment-guided posterior sampling conditioned on structure-factor amplitudes, followed by refinement. No equations or steps in the provided abstract or description reduce claimed outputs (lower RMSD, R-factors, 33x speedup) to inputs by construction, nor do they rely on self-citations for uniqueness theorems or ansatzes that would make the central Bayesian inference claim tautological. The method introduces novel conditioning and sampling procedures that are evaluated empirically on external datasets, rendering the derivation self-contained.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel
Tag: unclear (relation between the paper passage and the cited Recognition theorem).
CrystalBoltz formulates structure determination as diffusion posterior sampling... guided by gradients from a differentiable crystallographic likelihood on observed structure-factor amplitudes
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction
Tag: unclear (relation between the paper passage and the cited Recognition theorem).
We cast crystallographic structure determination as experiment-guided posterior sampling with a diffusion prior
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
[1] Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Laurence Willmore, Andrew J. Ballard, Joshua Bambrick, Sebastian W. Bodenstein, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 2024. doi: 10.1038/s41586-024-07487-w
[2] Pavel V Afonine, Ralf W Grosse-Kunstleve, Nathaniel Echols, Jeffrey J Headd, Nigel W Moriarty, Marat Mustyakimov, Thomas C Terwilliger, Alexandre Urzhumtsev, Peter H Zwart, and Paul D Adams. Towards automated crystallographic structure refinement with phenix.refine. Biological Crystallography, 68(4):352–367, 2012.
[3] PV Afonine, RW Grosse-Kunstleve, PD Adams, and A Urzhumtsev. Bulk-solvent and overall scaling revisited: faster calculations, improved results. Biological Crystallography, 69(4):625–634, 2013.
[4] Gustaf Ahdritz, Nazim Bouatta, Christina Floristean, Sachin Kadyan, Qinghui Xia, William Gerecke, Timothy J O’Donnell, Daniel Berenberg, Ian Fisk, Niccolò Zanichelli, et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nature Methods, 21(8):1514–1524, 2024.
[5] Minkyung Baek, Frank DiMaio, Ivan Anishchenko, Justas Dauparas, Sergey Ovchinnikov, Gyu Rie Lee, Jue Wang, Qian Cong, Lisa N. Kinch, R. Dustin Schaeffer, Claudia Millán, Hahnbeom Park, Carson Adams, Caleb R. Glassman, Andy DeGiovanni, Jose H. Pereira, Andria V. Rodrigues, Alberdina A. van Dijk, Ana C. Ebrecht, Diederik J. Opperman, Theo Sagmeister, Chris...
[6] Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, and Philip E. Bourne. The Protein Data Bank. Nucleic Acids Research, 28(1):235–242, 2000. doi: 10.1093/nar/28.1.235
[7] Axel T Brünger. Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature, 355(6359):472–475, 1992.
[8] Jasper Butcher, Rohith Krishna, Raktim Mitra, Rafael I Brent, Yanjing Li, Nathaniel Corley, Paul T Kim, Jonathan Funk, Simon Mathis, Saman Salike, et al. De novo design of all-atom biomolecular interactions with RFdiffusion3. bioRxiv, 2025.
[9] Hyungjin Chung, Jeongsol Kim, Michael Thompson Mccann, Marc Louis Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=OnD9zGAGT0k
[10] Giannis Daras, Hyungjin Chung, Chieh-Hsin Lai, Yuki Mitsufuji, Jong Chul Ye, Peyman Milanfar, Alexandros G. Dimakis, and Mauricio Delbracio. A survey on diffusion models for inverse problems, 2024. URL https://arxiv.org/abs/2410.00083
[11] Diego Del Alamo, Davide Sala, Hassane S Mchaourab, and Jens Meiler. Sampling alternative conformational states of transporters and receptors with AlphaFold2. eLife, 11:e75751, 2022.
[12] Alisia Fadini, Minhuan Li, Airlie J. McCoy, Thomas C. Terwilliger, Randy J. Read, Doeke R. Hekstra, and Mohammed AlQuraishi. AlphaFold as a prior: experimental structure determination conditioned on a pretrained neural network. Nature Methods, 23(7):785–795, 2026. doi: 10.1038/s41592-026-03047-4
[13] S. French and K. Wilson. On the treatment of negative intensity observations. Acta Crystallographica Section A, 34(4):517–525, 1978. doi: 10.1107/S0567739478001114
[14] Wayne A Hendrickson. Facing the phase problem. IUCrJ, 10(5):521–543, 2023. doi: 10.1107/S2052252523006449
[15] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reim...
[16] Wolfgang Kabsch. XDS. Biological Crystallography, 66(2):125–132, 2010.
[17] Yogesh Kalakoti and Björn Wallner. AFsample2 predicts multiple conformations and ensembles with AlphaFold2. Communications Biology, 8(1):373, 2025.
[18] Ronan M Keegan, Adam J Simpkin, and Daniel J Rigden. The success rate of processed predicted models in molecular replacement: implications for experimental phasing in the AlphaFold era. Biological Crystallography, 80(11), 2024.
[19] Minseo Kim, Axel Levy, and Gordon Wetzstein. Dual ascent diffusion for inverse problems. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026.
[20] Axel Levy, Eric R. Chan, Sara Fridovich-Keil, Frederic Poitevin, Ellen D. Zhong, and Gordon Wetzstein. Solving inverse problems in protein space using diffusion-based priors, 2024. URL https://arxiv.org/abs/2406.04239
[21] Minhuan Li, Kevin M Dalton, and Doeke Romke Hekstra. SFcalculator: connecting deep generative models and crystallography. bioRxiv, pages 2025–01, 2025.
[22] Minhuan Li, Jiequn Han, Pilar Cossio, and Luhuan Wu. Robust inference-time steering of protein diffusion models via embedding optimization. arXiv preprint arXiv:2602.05285, 2026.
[23] Dorothee Liebschner, Pavel V. Afonine, Matthew L. Baker, Gábor Bunkóczi, Vincent B. Chen, Tristan I. Croll, Bradley Hintze, Li-Wei Hung, Swati Jain, Airlie J. McCoy, Nigel W. Moriarty, Robert D. Oeffner, Billy K. Poon, Mikhail G. Prisant, Randy J. Read, Jane S. Richardson, David C. Richardson, Michael D. Sammito, Oleg V. Sobolev, Daniel H. Stockwell, Th... doi: 10.1107/S2059798319011471
[24] Sai Advaith Maddipatla, Nadav Bojan, Meital Bojan, Sanketh Vedula, Ailie Marx, Paul Schanda, and Alexander Bronstein. Inverse problems with experiment-guided AlphaFold. In ICLR 2025 Workshop on Generative and Experimental Perspectives for Biomolecular Design, 2025. URL https://openreview.net/forum?id=1gp130uxfw
[25] Airlie J McCoy, Massimo D Sammito, and Randy J Read. Implications of AlphaFold2 for crystallographic phasing by molecular replacement. Acta Crystallographica Section D: Structural Biology, 78(1):1–13, 2022. doi: 10.1107/S2059798321012122
[26] G. N. Murshudov, A. A. Vagin, and E. J. Dodson. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallographica Section D: Biological Crystallography, 53(3):240–255, 1997. doi: 10.1107/S0907444996012255
[27] G. N. Murshudov, P. Skubák, A. A. Lebedev, N. S. Pannu, R. A. Steiner, R. A. Nicholls, M. D. Winn, F. Long, and A. A. Vagin. Refmac5 for the refinement of macromolecular crystal structures. Acta Crystallographica Section D: Biological Crystallography, 67(4):355–367, 2011. doi: 10.1107/S0907444911001314
[28] Robert D Oeffner, Tristan I Croll, Claudia Millán, Billy K Poon, Christopher J Schlicksup, Randy J Read, and Tom C Terwilliger. Putting AlphaFold models to work with phenix.process_predicted_model and ISOLDE. Biological Crystallography, 78(11):1303–1314, 2022.
[29] Saro Passaro, Gabriele Corso, Jeremy Wohlwend, Mateo Reveiz, Stephan Thaler, Vignesh Ram Somnath, Noah Getz, Tally Portnoi, Julien Roy, Hannes Stark, David Kwabi-Addo, Dominique Beaini, Tommi Jaakkola, and Regina Barzilay. Boltz-2: Towards accurate and efficient binding affinity prediction. bioRxiv, 2025.
[30] Rishwanth Raghu, Axel Levy, Gordon Wetzstein, and Ellen D. Zhong. Multiscale guidance of protein structure prediction with heterogeneous cryo-EM data. In NeurIPS, 2025.
[31] Randy J. Read. Structure-factor probabilities for related structures. Acta Crystallographica Section A, 46(11):900–912, 1990. doi: 10.1107/S0108767390005529
[32] Randy J Read and Airlie J McCoy. A log-likelihood-gain intensity target for crystallographic phasing that accounts for experimental error. Biological Crystallography, 72(3):375–387, 2016.
[33] Gale Rhodes. Crystallography Made Crystal Clear: A Guide for Users of Macromolecular Models. Complementary Science. Academic Press, Amsterdam, 3rd edition, 2006. ISBN 978-0-12-587073-3
[34] Bernhard Rupp. Biomolecular Crystallography: Principles, Practice, and Application to Structural Biology. Garland Science, New York, 1st edition, 2009. ISBN 9780429258756
[35] Ramachandran Srinivasan and Soundarajan Parthasarathy. Some Statistical Applications in X-ray Crystallography. Pergamon Press, Oxford, 1976. ISBN 0080180469
[36] Richard A Stein and Hassane S Mchaourab. SPEACH_AF: Sampling protein ensembles and conformational heterogeneity with AlphaFold2. PLoS Computational Biology, 18(8):e1010483, 2022.
[37] Garry Taylor. The phase problem. Biological Crystallography, 59(11):1881–1890, 2003.
[38] ByteDance AML AI4Science Team, Xinshi Chen, Yuxuan Zhang, Chan Lu, Wenzhi Ma, Jiaqi Guan, Chengyue Gong, Jincai Yang, Hanyu Zhang, Ke Zhang, et al. Protenix: advancing structure prediction through a comprehensive AlphaFold3 reproduction. bioRxiv, pages 2025–01, 2025.
[39] Thomas C. Terwilliger et al. AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination. Nature Methods, 2023. doi: 10.1038/s41592-023-02031-9
[40] Wei Wang, Zhening Gong, and Wayne A Hendrickson. AlphaFold-guided molecular replacement for solving challenging crystal structures. Acta Crystallographica Section D: Structural Biology, 81:4–21, 2025. doi: 10.1107/S2059798324011999
[41] Hannah K Wayment-Steele, Adedolapo Ojoawo, Renee Otten, Julia M Apitz, Warintra Pitsawong, Marc Hömberger, Sergey Ovchinnikov, Lucy Colwell, and Dorothee Kern. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature, 625(7996):832–839, 2024.
[42] Graeme Winter. xia2: an expert system for macromolecular crystallography data reduction. Journal of Applied Crystallography, 43(1):186–190, 2010. doi: 10.1107/S0021889809045701
[43] Graeme Winter, David G Waterman, James M Parkhurst, Aaron S Brewster, Richard J Gildea, Markus Gerstel, Luis Fuentes-Montero, Melanie Vollmar, Tara Michels-Clark, Iris D Young, et al. DIALS: implementation and evaluation of a new integration package. Biological Crystallography, 74(2):85–97, 2018.
[44] Jeremy Wohlwend, Gabriele Corso, Saro Passaro, Mateo Reveiz, Ken Leidal, Wojtek Swiderski, Tally Portnoi, Itamar Chinn, Jacob Silterra, Tommi Jaakkola, and Regina Barzilay. Boltz-1: Democratizing biomolecular interaction modeling. bioRxiv, 2024. doi: 10.1101/2024.11.19.624167. URL https://www.biorxiv.org/content/10.1101/2024.11.19.624167v2
[45] Bingliang Zhang, Wenda Chu, Julius Berner, Chenlin Meng, Anima Anandkumar, and Yang Song. Improving diffusion inverse problem solving with decoupled noise annealing. In International Conference on Learning Representations, 2025.