pith.

arxiv: 2606.20756 · v1 · pith:MWQB3ZR7 · submitted 2026-06-18 · ⚛️ physics.chem-ph · cs.AI · cs.LG

A large-scale foundation model enables simulation-to-real adaptation for nuclear magnetic resonance-based molecular structure analysis

Pith reviewed 2026-06-26 15:29 UTC · model grok-4.3

classification ⚛️ physics.chem-ph cs.AI cs.LG
keywords NMR spectroscopy, foundation model, simulation-to-real adaptation, molecular structure analysis, pre-training, spectral representations, structure retrieval

The pith

UltraNMR pre-trained on 158 million simulated NMR spectra adapts to experimental data for state-of-the-art molecular structure analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces UltraNMR as a foundation model trained on a large collection of simulated NMR spectra to learn representations that generalize to real experimental spectra. By using simulation pre-training, it addresses the scarcity of labeled real data that has limited previous AI applications in NMR spectroscopy. When fine-tuned on experimental tasks, the model outperforms UltraNMR variants trained only on the target experimental data without simulation pre-training. This approach also supports building large spectral libraries for molecule retrieval and aids in identifying unknown compounds from natural sources.

Core claim

UltraNMR is trained on 158 million paired simulated 1H and 13C NMR spectra using domain-specific pre-training objectives that capture intra- and inter-spectral dependencies. Adapting this model to molecular structure analysis tasks on real experimental NMR spectra produces state-of-the-art results that surpass those of UltraNMR variants trained directly on the downstream experimental data without simulation pre-training. The model further enables encoding of simulated spectra into a library covering 94 million molecules for structure-aware retrieval, and it has been applied to elucidate the structures of two previously unknown natural products.
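To make "structure-aware retrieval" concrete, here is a minimal sketch that assumes each library entry is a fixed-length embedding produced by the encoder and that entries are ranked by cosine similarity to the query embedding. The names `cosine` and `top_k_retrieve` and the toy 3-d vectors are hypothetical; the review does not describe UltraNMR's actual similarity measure or index, and a 94-million-entry library would in practice need an approximate nearest-neighbour index rather than the exhaustive scan shown here.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def top_k_retrieve(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Hypothetical exhaustive search: the k library IDs most similar to the query."""
    scored = ((mol_id, cosine(query, vec)) for mol_id, vec in library.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy library: molecule IDs mapped to made-up "spectral embeddings".
library = {
    "mol_A": [1.0, 0.0, 0.0],
    "mol_B": [0.0, 1.0, 0.0],
    "mol_C": [0.7, 0.7, 0.0],
}
hits = top_k_retrieve([1.0, 0.1, 0.0], library, k=2)
print([mol_id for mol_id, _ in hits])  # ['mol_A', 'mol_C']
```

A linear scan keeps the example self-contained; swapping in an approximate index changes the search cost, not the ranking idea.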

What carries the argument

UltraNMR, a foundation model pre-trained on simulated NMR spectra with objectives designed to capture spectral dependencies for simulation-to-real transfer.
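The review does not spell out UltraNMR's pre-training objectives. One common way to capture inter-spectral dependencies between paired spectra is a symmetric contrastive (InfoNCE-style) loss that pulls together the embeddings of one molecule's 1H and 13C spectra and pushes apart those of different molecules in the same batch. The sketch below illustrates that generic technique, not UltraNMR's objective; the function names, the temperature of 0.1, and the toy 2-d embeddings are assumptions.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (assumes a non-zero embedding)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(
    h_emb: Sequence[Sequence[float]],
    c_emb: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Hypothetical inter-spectral loss: row i of h_emb (1H) pairs with row i of c_emb (13C)."""
    h = [_normalize(v) for v in h_emb]
    c = [_normalize(v) for v in c_emb]
    # logits[i][j] = cosine(h_i, c_j) / temperature
    logits = [[sum(a * b for a, b in zip(hi, cj)) / temperature for cj in c] for hi in h]
    transposed = [list(col) for col in zip(*logits)]
    # Average the 1H -> 13C and 13C -> 1H directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))


aligned = symmetric_info_nce([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = symmetric_info_nce([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(aligned < shuffled)  # True: correctly matched pairs give the lower loss
```

In a real model the embeddings would come from the spectrum encoders and the loss would be back-propagated; here only the loss arithmetic is shown.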

Load-bearing premise

Simulated NMR spectra capture enough of the statistical properties and noise characteristics of real experimental spectra for pre-trained representations to transfer without major corrections.
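To show what "noise characteristics" refers to, the sketch below renders a 1D spectrum from a peak list as a sum of Lorentzian lines on a uniform ppm grid plus additive Gaussian noise. It is a toy under stated assumptions: the function names, linewidth, grid, and noise level are hypothetical, and it models none of the solvent signals, impurities, phasing errors, or baseline artifacts found in real spectra. Those omissions are what the premise above bets will not need major correction.

```python
import random
from typing import List, Sequence, Tuple


def lorentzian(x: float, center: float, hwhm: float) -> float:
    """Unit-height Lorentzian line with half-width at half-maximum `hwhm` (ppm)."""
    return 1.0 / (1.0 + ((x - center) / hwhm) ** 2)


def render_spectrum(
    peaks: Sequence[Tuple[float, float]],  # (chemical shift in ppm, relative height)
    ppm_min: float = 0.0,
    ppm_max: float = 12.0,
    n_points: int = 1201,
    hwhm: float = 0.01,
    noise_sd: float = 0.0,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Hypothetical renderer: Lorentzian peaks on a uniform grid plus Gaussian noise."""
    rng = random.Random(seed)
    step = (ppm_max - ppm_min) / (n_points - 1)
    axis = [ppm_min + i * step for i in range(n_points)]
    signal = [
        sum(h * lorentzian(x, c, hwhm) for c, h in peaks) + rng.gauss(0.0, noise_sd)
        for x in axis
    ]
    return axis, signal


# Toy peak list (an aliphatic and an aromatic signal); values are illustrative only.
axis, clean = render_spectrum([(1.2, 3.0), (7.3, 5.0)])
_, noisy = render_spectrum([(1.2, 3.0), (7.3, 5.0)], noise_sd=0.05)
print(round(axis[clean.index(max(clean))], 2))  # 7.3, the taller peak
```

Comparing `clean` and `noisy` makes the point of the premise tangible: the simulator decides which perturbations the model ever sees during pre-training.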

What would settle it

Observing that a model trained solely on real experimental NMR data achieves higher accuracy than the simulation-pre-trained UltraNMR on the same set of molecular structure analysis tasks would falsify the central claim.
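The falsification test above is a paired comparison: both models are scored on the same experimental test items. A minimal sketch of how such a comparison could be quantified is a paired bootstrap over per-item correctness, reporting the fraction of resamples in which one model's accuracy exceeds the other's. The function name, the number of resamples, and the toy outcome vectors are assumptions for illustration, not results from the paper.

```python
import random
from typing import Sequence


def paired_bootstrap_win_rate(
    correct_a: Sequence[int],  # 1 if model A got test item i right, else 0
    correct_b: Sequence[int],  # same test items, model B
    n_resamples: int = 2000,
    seed: int = 0,
) -> float:
    """Fraction of bootstrap resamples in which model A's accuracy exceeds model B's."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same test items")
    rng = random.Random(seed)
    n = len(correct_a)
    wins = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample items, keeping pairs intact
        diff = sum(correct_a[i] - correct_b[i] for i in idx)
        if diff > 0:
            wins += 1
    return wins / n_resamples


# Toy outcomes: A (e.g., simulation-pre-trained) vs B (e.g., trained on experimental data only).
a = [1] * 80 + [0] * 20
b = [1] * 70 + [0] * 30
print(paired_bootstrap_win_rate(a, b))  # close to 1.0 when A is consistently better
```

Resampling items rather than each model's scores independently preserves the pairing, which is what makes the comparison sensitive to per-item differences.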

Original abstract

Nuclear Magnetic Resonance (NMR) spectroscopy is a powerful tool for molecular structure analysis, and spectral artificial intelligence offers great potential for its rapid and automated interpretation. However, the scarcity of experimental NMR datasets has constrained deep learning in this domain to narrow, task-specific applications that lack broad generalization. Here, we introduce UltraNMR, a large-scale foundation model for NMR that leverages the intrinsic properties of NMR spectra to learn generalizable spectral representations. We collected 158 million paired simulated ¹H and ¹³C NMR spectra to train UltraNMR, employing multiple domain-specific pre-training objectives. UltraNMR captures both intra-spectral and inter-spectral dependencies, enabling seamless simulation-to-real adaptation. We demonstrate that adapting UltraNMR to a range of molecular structure analysis tasks on experimental NMR spectra consistently yields state-of-the-art performance and clearly outperforms UltraNMR variants trained directly on downstream data without simulation pre-training. We also construct a large-scale NMR spectral vector library by encoding simulated NMR spectra using UltraNMR, covering 94 million unique molecules and enabling effective structure-aware retrieval. In real-world applications, UltraNMR facilitates the structural elucidation of two previously unknown natural products from Chinese herbal medicines recorded in the Chinese Pharmacopoeia. These results suggest that large-scale simulation pre-training can effectively bridge the simulation-to-real gap, enabling robust and generalizable molecular structure analysis of real-world NMR spectra.
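The abstract does not say how intra-spectral dependencies are learned. A familiar technique, borrowed from BERT-style masked language modelling, is to hide a random subset of peaks in a spectrum and train the model to reconstruct them from the remaining peaks. The sketch below shows only that masking step on a peak list; the function name, the masking ratio, and the use of `None` as the mask marker are hypothetical and not taken from the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mask_peaks(
    shifts: Sequence[float],
    mask_ratio: float = 0.15,
    seed: int = 0,
) -> Tuple[List[Optional[float]], List[int]]:
    """Hide a random subset of peaks (None marks a hidden peak).

    Returns the masked peak list and the sorted indices a model would be trained
    to reconstruct. Assumes a non-empty peak list.
    """
    rng = random.Random(seed)
    # Hide at least one peak, and never more peaks than exist.
    n_mask = min(len(shifts), max(1, round(mask_ratio * len(shifts))))
    hidden = sorted(rng.sample(range(len(shifts)), n_mask))
    hidden_set = set(hidden)
    masked: List[Optional[float]] = [
        None if i in hidden_set else s for i, s in enumerate(shifts)
    ]
    return masked, hidden


c13_shifts = [14.1, 22.7, 31.9, 63.1, 128.4, 130.2, 171.0]  # toy 13C peak list (ppm)
masked, hidden = mask_peaks(c13_shifts, mask_ratio=0.3)
targets = [c13_shifts[i] for i in hidden]  # values the model would have to predict
print(len(hidden), masked)
```

The reconstruction loss itself (for example, regression or classification over binned shifts) is left out because the abstract gives no detail about it.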

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces UltraNMR, a large-scale foundation model pre-trained on 158 million simulated paired 1H and 13C NMR spectra using multiple domain-specific pre-training objectives to capture intra- and inter-spectral dependencies. It claims that fine-tuning/adapting this model to experimental NMR spectra yields state-of-the-art performance across molecular structure analysis tasks, clearly outperforming UltraNMR variants trained directly on the downstream experimental data without simulation pre-training. The work also constructs a vector library encoding 94 million unique molecules for structure-aware retrieval and applies the model to elucidate structures of two previously unknown natural products.

Significance. If the simulation-to-real transfer results hold under rigorous validation, the work would be significant for NMR-based molecular analysis: it shows that large-scale simulation pre-training can address experimental data scarcity and enable generalizable representations, with potential for broad impact in automated spectral interpretation and retrieval in chemistry.

major comments (2)
  1. [Abstract] Abstract (and central claim): the assertion that simulation pre-training 'clearly outperforms' direct-training variants and achieves consistent SOTA requires quantitative support (e.g., specific accuracy/F1 metrics, dataset sizes for downstream tasks, error bars, and statistical tests) to establish that gains arise from the pre-training rather than model scale or optimization choices; without these, the simulation-to-real adaptation cannot be verified as the load-bearing factor.
  2. [Results / Methods] The weakest assumption (simulated spectra reproducing real experimental joint statistics in shifts, couplings, noise, solvent effects, impurities, and artifacts) is load-bearing for all transfer claims; the manuscript must include explicit validation (e.g., distributional comparisons or ablation on simulator fidelity) in the results or methods sections, as omission of these effects could explain observed gains independently of the pre-training strategy.
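One concrete form the requested distributional comparison could take is a two-sample Kolmogorov-Smirnov statistic between chemical shifts drawn from simulated and from experimental spectra (for example, per nucleus or per functional-group class). The plain-Python sketch below computes the standard KS statistic D, the largest gap between the two empirical CDFs; the toy shift values are made up, and in practice `scipy.stats.ks_2samp` computes the same statistic together with a p-value.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # Both ECDFs only jump at observed values, so checking those points suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


simulated = [7.26, 7.31, 7.35, 7.40]     # toy aromatic 1H shifts (ppm)
experimental = [7.28, 7.33, 7.37, 7.42]  # toy values, slightly offset
print(ks_statistic(simulated, simulated))  # 0.0 for identical samples
```

A small D per shift region would support the fidelity assumption; a large D in a specific region would point to where the simulator and real spectra disagree.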
minor comments (2)
  1. [Abstract] The phrase 'seamless simulation-to-real adaptation' is imprecise; the adaptation procedure (e.g., fine-tuning protocol, any domain-adversarial components, or retrieval method) should be defined with concrete steps and hyperparameters.
  2. [Abstract] Clarify the exact number and nature of the 'range of molecular structure analysis tasks' and the composition of the experimental test sets to allow reproducibility assessment.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight important points for strengthening the presentation of our results on simulation-to-real transfer. We address each major comment below and will incorporate revisions as noted.

Point-by-point responses
  1. Referee: [Abstract] Abstract (and central claim): the assertion that simulation pre-training 'clearly outperforms' direct-training variants and achieves consistent SOTA requires quantitative support (e.g., specific accuracy/F1 metrics, dataset sizes for downstream tasks, error bars, and statistical tests) to establish that gains arise from the pre-training rather than model scale or optimization choices; without these, the simulation-to-real adaptation cannot be verified as the load-bearing factor.

    Authors: We agree that the abstract would be strengthened by including explicit quantitative metrics. The full manuscript already reports detailed performance numbers (accuracy, F1, dataset sizes) with comparisons to direct-training baselines across multiple tasks, including error bars from multiple runs. In the revision we will add a concise summary of these key metrics, dataset sizes, and statistical significance indicators directly into the abstract to make the central claims self-contained and verifiable. revision: yes

  2. Referee: [Results / Methods] The weakest assumption (simulated spectra reproducing real experimental joint statistics in shifts, couplings, noise, solvent effects, impurities, and artifacts) is load-bearing for all transfer claims; the manuscript must include explicit validation (e.g., distributional comparisons or ablation on simulator fidelity) in the results or methods sections, as omission of these effects could explain observed gains independently of the pre-training strategy.

    Authors: We acknowledge that explicit validation of simulator fidelity is important for supporting the transfer claims. The current manuscript demonstrates successful downstream transfer on experimental data but does not contain the requested distributional comparisons or simulator ablations. We will add these analyses (e.g., shift/coupling distribution overlays and fidelity ablations) to the Methods and Results sections in the revision to directly address this point. revision: yes
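The rebuttal promises error bars from multiple runs and statistical-significance indicators. A minimal sketch of the usual reporting, assuming one test-set accuracy per random seed for each training regime, is the mean ± sample standard deviation plus Welch's t statistic for the difference in means. The accuracies below are placeholders, not numbers from the manuscript; turning t into a p-value needs the t distribution, for example via `scipy.stats.ttest_ind(..., equal_var=False)`.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_sd(xs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample (n - 1) standard deviation across seeds."""
    return statistics.mean(xs), statistics.stdev(xs)


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for mean(a) - mean(b)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


pretrained = [0.81, 0.83, 0.82]  # placeholder per-seed accuracies
scratch = [0.74, 0.76, 0.75]     # placeholder per-seed accuracies
m, s = mean_sd(pretrained)
print(f"{m:.3f} ± {s:.3f}")      # 0.820 ± 0.010
t, df = welch_t(pretrained, scratch)
```

Welch's form is used because it does not assume that the two training regimes have equal run-to-run variance.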

Circularity Check

0 steps flagged

No circularity: empirical ML pipeline with no derivations or self-referential reductions.

full rationale

The paper describes a standard simulation-pretrain-then-adapt pipeline on 158M simulated NMR pairs followed by fine-tuning and evaluation on experimental spectra. No equations, derivations, or parameter-fitting steps are presented that could reduce a claimed prediction to its own inputs by construction. All performance claims are external empirical comparisons (SOTA on real data, outperforming direct-training baselines), with no load-bearing self-citations or ansatzes invoked. This is a self-contained empirical result whose validity rests on data and benchmarks outside the model itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The central claim implicitly rests on the unstated premise that simulated spectra are sufficiently representative.

pith-pipeline@v0.9.1-grok · 5810 in / 1165 out tokens · 24809 ms · 2026-06-26T15:29:17.506336+00:00 · methodology

