
arxiv: 2508.08441 · v3 · submitted 2025-08-04 · 🧬 q-bio.QM · cs.CE · cs.LG

SpectraLLM: Uncovering the Ability of LLMs for Molecular Structure Elucidation from Multi-Spectral Data

Pith reviewed 2026-05-19 01:07 UTC · model grok-4.3

classification: 🧬 q-bio.QM, cs.CE, cs.LG
keywords: molecular structure elucidation, large language models, multi-spectral data, IR spectroscopy, NMR spectroscopy, mass spectrometry, structure prediction, spectroscopic analysis

The pith

SpectraLLM shows large language models can predict molecular structures by treating multiple spectra as shared language input.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SpectraLLM, an LLM that takes one or more spectra as input and outputs a predicted molecular structure in an end-to-end manner. It converts continuous spectra such as IR, Raman, UV-Vis, and NMR along with discrete mass spectra into a common language representation. This lets the model combine complementary substructural clues that single-modality methods miss. The model is pretrained and fine-tuned on small-molecule data and evaluated on four public benchmarks, where it exceeds prior single-spectrum approaches. Accuracy rises further when the model reasons over several spectral types at once.

Core claim

SpectraLLM performs end-to-end structure prediction by reasoning over one or multiple spectra, representing both continuous (IR, Raman, UV-Vis, NMR) and discrete (MS) modalities in a shared language space that enables capture of complementary substructural patterns, and achieves state-of-the-art performance on four public benchmark datasets while showing robustness in unimodal use and gains from multi-spectral inputs.

What carries the argument

Shared language-space representation of diverse spectra, which converts both continuous and discrete inputs into token sequences the LLM can process jointly to integrate complementary substructural information.
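
To make the idea of a shared language-space representation concrete, here is a minimal Python sketch. It is not the paper's encoder: the function names `spectrum_to_text` and `build_prompt`, the `<IR>`-style tags, the `position:intensity` format, and the rounding precision are all assumptions chosen for illustration, and the peak values are made up. Each spectrum is reduced to a peak list and written out as plain text, so several modalities can be concatenated into one model input.

```python
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[float, float]  # (position, relative intensity in [0, 1])


def spectrum_to_text(modality: str, peaks: Iterable[Peak], decimals: int = 1) -> str:
    """Serialize one peak list as a tagged text line (hypothetical format)."""
    parts = [f"{pos:.{decimals}f}:{inten:.2f}" for pos, inten in sorted(peaks)]
    return f"<{modality}> " + " ".join(parts)


def build_prompt(spectra: Dict[str, List[Peak]]) -> str:
    """Concatenate several modalities into a single text input."""
    return "\n".join(spectrum_to_text(name, peaks) for name, peaks in spectra.items())


# Made-up peaks: IR positions in cm^-1, MS positions in m/z.
prompt = build_prompt({
    "IR": [(1715.0, 0.90), (2950.0, 0.40)],
    "MS": [(43.0, 1.00), (58.0, 0.35)],
})
print(prompt)
```

In this sketch the continuous IR spectrum and the discrete mass spectrum end up in the same textual form, which is the property the argument relies on. How much chemical detail survives depends on the precision kept at this step.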

If this is right

  • The model surpasses single-modality baselines on four public benchmark datasets.
  • Prediction accuracy increases when the model jointly reasons over multiple spectral types.
  • Performance remains strong even when only one spectrum type is provided.
  • The approach creates a scalable route for language-based analysis of spectroscopic data without database lookup.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same language-space mapping could be applied to other analytical signals such as chromatography or imaging data.
  • Training on larger or more diverse molecular sets might extend the method beyond small molecules.
  • Integration with existing rule-based or graph-based structure generators could further constrain outputs.

Load-bearing premise

Converting different spectra into a shared language format lets the model reliably detect and combine substructural patterns that are not visible in any single spectrum alone.

What would settle it

Running the model on a held-out set of molecules where multi-spectral inputs produce no accuracy gain over the best single-spectrum baseline, or where overall accuracy falls below existing non-LLM spectrum-to-structure methods.
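
The test described above can be sketched in a few lines of Python. This is an illustrative outline, not the paper's evaluation code: `top_k_accuracy` and `multimodal_gain` are hypothetical helper names, the toy predictions are invented, and structures are compared as exact strings on the assumption that both sides were canonicalized the same way beforehand.

```python
from typing import Dict, Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   targets: Sequence[str], k: int = 1) -> float:
    """Fraction of molecules whose target is among the first k predictions."""
    if len(ranked_predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("empty evaluation set")
    hits = sum(t in preds[:k] for preds, t in zip(ranked_predictions, targets))
    return hits / len(targets)


def multimodal_gain(accuracy_by_input: Dict[str, float], multi_key: str = "multi") -> float:
    """Multi-spectral accuracy minus the best single-spectrum accuracy."""
    singles = [v for name, v in accuracy_by_input.items() if name != multi_key]
    return accuracy_by_input[multi_key] - max(singles)


# Toy held-out set, invented for illustration only.
targets = ["CCO", "CC(=O)O", "c1ccccc1"]
predictions = {
    "IR": [["CCO"], ["CCO"], ["C1CCCCC1"]],
    "NMR": [["CCO"], ["CC(=O)O"], ["C1CCCCC1"]],
    "multi": [["CCO"], ["CC(=O)O"], ["c1ccccc1"]],
}
accuracy = {name: top_k_accuracy(p, targets, k=1) for name, p in predictions.items()}
gain = multimodal_gain(accuracy)
print(accuracy, gain)
```

A gain at or below zero on a properly held-out set would count against the multi-spectral claim. A positive gain would still need error bars before it could count in favor.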

Original abstract

Automated molecular structure elucidation remains challenging, as existing approaches often depend on pre-compiled databases or restrict themselves to single spectroscopic modalities. Here we introduce SpectraLLM, a large language model that performs end-to-end structure prediction by reasoning over one or multiple spectra. Unlike conventional spectrum-to-structure pipelines, SpectraLLM represents both continuous (IR, Raman, UV-Vis, NMR) and discrete (MS) modalities in a shared language space, enabling it to capture substructural patterns that are complementary across different spectral types. We pretrain and fine-tune the model on small-molecule domains and evaluate it on four public benchmark datasets. SpectraLLM achieves state-of-the-art performance, substantially surpassing single-modality baselines. Moreover, it demonstrates strong robustness in unimodal settings and further improves prediction accuracy when jointly reasoning over diverse spectra, establishing a scalable paradigm for language-based spectroscopic analysis. Code is available at https://github.com/OPilgrim/SpectraLLM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SpectraLLM, a large language model for end-to-end molecular structure elucidation that reasons over single or multiple spectra (IR, Raman, UV-Vis, NMR, MS) by converting them into a shared language space. It claims to capture complementary substructural patterns across modalities, achieving state-of-the-art performance on four public benchmark datasets while outperforming single-modality baselines and showing further gains in multi-modal settings. The model is pretrained and fine-tuned on small-molecule data, with code released.

Significance. If the performance claims and multi-modal improvements hold under rigorous evaluation, this could represent a meaningful advance in automated structure elucidation by providing a scalable, language-based framework that integrates diverse spectral data without relying on pre-compiled databases. The open code supports reproducibility, which strengthens the contribution if the results prove robust.

major comments (2)
  1. [Abstract] Abstract and evaluation sections: The abstract asserts SOTA results, substantial gains over single-modality baselines, and further multi-modal improvements, yet provides no quantitative metrics, baseline details, error bars, dataset statistics, or specific performance numbers. Full assessment of whether the data support the central claim requires explicit reporting of these in the results section (e.g., accuracy, top-k rates, or comparison tables).
  2. [Methods] Representation and methods sections: The core claim that a shared language space enables the LLM to capture complementary substructural patterns across continuous (IR, Raman, UV-Vis, NMR) and discrete (MS) spectra hinges on the spectrum-to-text conversion preserving chemically discriminative features such as exact chemical shifts, coupling constants, or relative intensities. If the encoding relies on coarse peak lists, fixed binning, or textual descriptions, small but decisive differences between isomers could be lost, making any multi-modal gain potentially attributable to increased data volume rather than true cross-modal reasoning. Please provide the exact tokenization/encoding procedure and ablation studies isolating information loss.
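
The information-loss concern in the second major comment can be illustrated with a small Python sketch. The function `bin_tokens`, the bin widths, and the two chemical shifts are hypothetical and are not taken from the paper. The point is only that a fixed bin width decides whether two nearby peaks remain distinguishable after encoding.

```python
from typing import List, Sequence


def bin_tokens(shifts_ppm: Sequence[float], bin_width: float) -> List[int]:
    """Map each chemical shift to the index of its nearest bin centre."""
    return [round(shift / bin_width) for shift in shifts_ppm]


# Two hypothetical 1H shifts that differ by 0.02 ppm (made-up values).
isomer_a = [7.22]
isomer_b = [7.24]

coarse_a, coarse_b = bin_tokens(isomer_a, 0.1), bin_tokens(isomer_b, 0.1)
fine_a, fine_b = bin_tokens(isomer_a, 0.01), bin_tokens(isomer_b, 0.01)

print(coarse_a == coarse_b)  # both shifts fall in the same 0.1 ppm bin
print(fine_a == fine_b)      # the shifts get different 0.01 ppm bins
```

An ablation that varies the bin width in this way, while holding everything else fixed, is one way to measure how much the encoding itself costs.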
minor comments (1)
  1. [Evaluation] The paper would benefit from clearer notation distinguishing unimodal vs. multi-modal input formats and from explicit discussion of how the four public benchmark datasets were split or preprocessed.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We have addressed each major comment below and describe the changes we will make to improve clarity and rigor in the revised manuscript.

  1. Referee: [Abstract] Abstract and evaluation sections: The abstract asserts SOTA results, substantial gains over single-modality baselines, and further multi-modal improvements, yet provides no quantitative metrics, baseline details, error bars, dataset statistics, or specific performance numbers. Full assessment of whether the data support the central claim requires explicit reporting of these in the results section (e.g., accuracy, top-k rates, or comparison tables).

    Authors: We agree that the abstract would be strengthened by the inclusion of specific quantitative metrics. In the revised manuscript we will update the abstract to report key performance numbers (e.g., top-1 accuracy on each of the four benchmarks and the magnitude of improvement over single-modality baselines). The full results, including error bars, dataset statistics, and comparison tables, are already presented in the evaluation section; we will ensure these are cross-referenced clearly from the abstract.

    Revision: yes

  2. Referee: [Methods] Representation and methods sections: The core claim that a shared language space enables the LLM to capture complementary substructural patterns across continuous (IR, Raman, UV-Vis, NMR) and discrete (MS) spectra hinges on the spectrum-to-text conversion preserving chemically discriminative features such as exact chemical shifts, coupling constants, or relative intensities. If the encoding relies on coarse peak lists, fixed binning, or textual descriptions, small but decisive differences between isomers could be lost, making any multi-modal gain potentially attributable to increased data volume rather than true cross-modal reasoning. Please provide the exact tokenization/encoding procedure and ablation studies isolating information loss.

    Authors: We appreciate the referee’s emphasis on verifying that the spectrum-to-text conversion retains chemically relevant information. The current manuscript describes the conversion as a peak-based discretization that encodes position, intensity, and multiplicity information into tokens; however, we acknowledge that additional detail and targeted ablations would strengthen the claim. In the revision we will expand the methods section with the precise tokenization algorithm (including bin widths for continuous spectra and handling of discrete MS peaks) and add new ablation experiments that match data volume across single- and multi-modal settings to isolate the contribution of cross-modal reasoning.

    Revision: yes
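
One way to read "match data volume" in the response above is to give every input configuration the same total number of training tokens. The Python sketch below shows that idea under stated assumptions: `match_token_budget` is a hypothetical helper, data volume is assumed to mean input-token count, and the example sizes are invented. The rebuttal does not say how the authors will define or enforce the match.

```python
import random
from typing import List, Tuple

Example = Tuple[str, int]  # (example id, number of input tokens)


def match_token_budget(examples: List[Example], budget: int, seed: int = 0) -> List[Example]:
    """Shuffle, then greedily keep every example that still fits in the budget."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    chosen, used = [], 0
    for example in shuffled:
        if used + example[1] <= budget:
            chosen.append(example)
            used += example[1]
    return chosen


# Invented sizes: a multi-spectral example costs three times the tokens here.
single = [(f"s{i}", 100) for i in range(9)]
multi = [(f"m{i}", 300) for i in range(9)]
budget = min(sum(n for _, n in single), sum(n for _, n in multi))

single_subset = match_token_budget(single, budget)
multi_subset = match_token_budget(multi, budget)
print(len(single_subset), len(multi_subset))
```

With the budgets equalized, a remaining accuracy gap between the two settings is harder to attribute to the multi-spectral model simply having seen more input.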

Circularity Check

0 steps flagged

No circularity; claims rest on external benchmark evaluation


The paper describes a standard LLM pipeline: spectra are tokenized into a shared language space, the model is pretrained and fine-tuned on small-molecule data, and performance is measured on four independent public benchmark datasets. No equations, fitted parameters, or self-citations are presented that reduce any prediction or uniqueness claim to the inputs by construction. The central result (multi-modal improvement over single-modality baselines) is an empirical comparison against external test sets rather than a quantity defined internally by the model itself.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim depends on the assumption that spectral modalities can be tokenized into a shared language space and that fine-tuning on small-molecule data yields generalizable structure predictions; these are domain assumptions rather than independently derived facts.

free parameters (1)
  • LLM fine-tuning hyperparameters and model weights
    Weights and training choices are fitted to spectral datasets to achieve the reported performance.
axioms (1)
  • domain assumption: Spectral data from different modalities can be represented in a shared language space that preserves substructural information
    Invoked to justify end-to-end reasoning over multiple spectra.

pith-pipeline@v0.9.0 · 5724 in / 1278 out tokens · 41053 ms · 2026-05-19T01:07:23.071844+00:00 · methodology


