MSAlign: Aligning Molecule and Mass Spectra Foundation Models for Metabolite Identification
Pith reviewed 2026-05-20 07:43 UTC · model grok-4.3
The pith
Aligning frozen foundation models for mass spectra and molecules via lightweight projections and contrastive learning improves retrieval of metabolite structures from spectra.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MSAlign learns a shared representation space by aligning two frozen foundation models through lightweight MLP projections trained with a candidate-based contrastive objective, leading to consistent outperformance over existing approaches in molecule retrieval from mass spectra.
What carries the argument
MSAlign, the method that aligns frozen foundation models for mass spectra and molecules using lightweight MLP projections and candidate-based contrastive training to create a shared representation space for improved retrieval.
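A minimal sketch of what a candidate-based contrastive objective over frozen embeddings could look like. It assumes cosine similarity, a softmax over each spectrum's own candidate list, and a temperature of 0.1; none of these choices is confirmed by this page. The function names, the two-layer projection, and the toy vectors are hypothetical.

```python
import math

def mlp(x, w1, b1, w2, b2):
    """Hypothetical two-layer projection head: linear -> ReLU -> linear."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def candidate_contrastive_loss(spec_z, cand_z, true_idx, temperature=0.1):
    """Cross-entropy over one spectrum's candidate list (assumed form).

    spec_z:   projected spectrum embedding
    cand_z:   projected embeddings of all candidate molecules
    true_idx: position of the correct molecule among the candidates
    """
    logits = [cosine(spec_z, c) / temperature for c in cand_z]
    top = max(logits)  # subtract the max for a stable log-sum-exp
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_norm - logits[true_idx]

# Toy example: the spectrum points in the same direction as candidate 0.
spectrum = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss_correct = candidate_contrastive_loss(spectrum, candidates, true_idx=0)
loss_wrong = candidate_contrastive_loss(spectrum, candidates, true_idx=2)
```

In a real training loop only the projection weights would be updated, while the two encoders that produce the input embeddings stay frozen. The sketch shows the forward computation only.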
If this is right
- MSAlign is simple to implement and fast to train compared to prior methods.
- It consistently outperforms existing approaches across all benchmarks for molecule retrieval.
- The candidate-based contrastive objective enables effective alignment without joint fine-tuning of the foundation models.
- Quantifying distribution shift provides a way to evaluate and improve data splitting strategies in retrieval benchmarks.
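This page does not say how the paper defines its distribution-shift measure. Purely as an illustrative stand-in, the sketch below scores a split by how far each test molecule is from its nearest training molecule under Tanimoto similarity on fingerprint bit sets. The choice of metric, the function names, and the toy fingerprints are all assumptions, not the authors' method.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def nearest_train_similarity(test_fps, train_fps):
    """For each test molecule, similarity to its closest training molecule."""
    return [max(tanimoto(t, tr) for tr in train_fps) for t in test_fps]

def shift_score(test_fps, train_fps):
    """Assumed stand-in score: 0 means every test molecule has an identical
    training fingerprint; values near 1 mean the test set is far from train."""
    sims = nearest_train_similarity(test_fps, train_fps)
    return 1.0 - sum(sims) / len(sims)

# Toy fingerprints as sets of "on" bit indices (hypothetical data).
train = [{1, 2, 3}, {4, 5}]
test = [{1, 2, 3}, {4, 6}]
score = shift_score(test, train)
```

A score like this makes the leakage-versus-shift trade-off visible: a random split would tend to score low because near-duplicates end up on both sides, while a stricter structure-based split would tend to score higher.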
Where Pith is reading between the lines
- This approach indicates that lightweight alignment can suffice for multimodal tasks in chemistry instead of full model retraining.
- The technique might extend to aligning other types of spectral and structural data in related scientific domains.
- Releasing unified code and splits could standardize comparisons and reduce implementation barriers in metabolomics research.
Load-bearing premise
The frozen foundation models for mass spectra and molecules already encode sufficiently rich and compatible features that lightweight MLP projections plus candidate-based contrastive training can reliably improve retrieval without needing joint fine-tuning or suffering from distribution shift in real-world candidate sets.
What would settle it
Observing that MSAlign does not outperform baselines on a benchmark with substantial distribution shift between the candidate sets used in training and real-world conditions would falsify the central claim.
Original abstract
Accurately identifying metabolites, i.e., small molecules, from mass spectrometry data remains a core challenge in metabolomics, with broad applications in drug discovery, environmental analysis, and clinical research. We address the Molecule Retrieval task, which consists in recovering the chemical structure of a metabolite from its MS/MS spectrum given a set of candidate molecules. While the recent release of benchmark datasets such as MassSpecGym and Spectraverse has considerably accelerated the development of novel machine learning approaches, the complexity of data preprocessing pipelines and the lack of unified implementations make methods and results difficult to reproduce and compare. We make three contributions. First, we propose a unified framework encompassing recent approaches based on representation alignment and contrastive learning. Second, we introduce MSAlign, inspired by multimodal alignment in vision-language models, which learns a shared representation space by aligning two frozen foundation models (DreaMS for mass spectra and ChemBERTa for molecules) through lightweight MLP projections trained with a candidate-based contrastive objective. MSAlign is simple to implement, fast to train, and consistently outperforms existing approaches across all benchmarks. Third, we investigate a long-standing evaluation problem: data splitting strategies in molecule retrieval implicitly trade off data leakage against domain shift. We formalize this tension by introducing a quantitative measure of distribution shift, and use it to evaluate splitting strategies in existing benchmarks. All datasets, splits, candidate sets, and a unified implementation of MSAlign and baselines are publicly released to support reproducible research.
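The Molecule Retrieval task described in the abstract is usually scored by checking how highly the true molecule ranks among the candidates. The sketch below shows one way to compute such a top-k hit rate. The function names, the pessimistic tie handling, and the toy scores are assumptions for illustration and are not taken from the paper.

```python
def rank_of_true(scores, true_idx):
    """1-based rank of the true candidate; ties count against it."""
    target = scores[true_idx]
    better_or_equal = sum(
        1 for i, s in enumerate(scores) if i != true_idx and s >= target
    )
    return better_or_equal + 1

def hit_rate_at_k(queries, k):
    """Fraction of queries whose true molecule is ranked within the top k.

    queries: list of (scores, true_idx) pairs, one per spectrum, where
             scores[i] is the model's similarity to candidate i.
    """
    hits = sum(1 for scores, true_idx in queries
               if rank_of_true(scores, true_idx) <= k)
    return hits / len(queries)

# Three toy spectra whose true molecules rank 1st, 2nd, and 3rd.
queries = [
    ([0.9, 0.1, 0.3], 0),
    ([0.2, 0.8, 0.5], 2),
    ([0.1, 0.2, 0.7], 0),
]
top1 = hit_rate_at_k(queries, k=1)
top3 = hit_rate_at_k(queries, k=3)
```

Counting ties against the true candidate is a conservative choice: a model that assigns the same score to every candidate gets no credit.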
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces MSAlign, a lightweight method to align frozen DreaMS (mass spectra) and ChemBERTa (molecules) foundation models via MLP projections trained with a candidate-based contrastive objective for metabolite identification from MS/MS spectra. It also proposes a quantitative distribution-shift metric to analyze data-splitting strategies, critiques existing benchmarks for trading leakage against shift, and releases unified code, datasets, splits, and baseline implementations.
Significance. If the central results hold, MSAlign shows that simple alignment of independently pre-trained models can deliver consistent retrieval gains without joint fine-tuning, providing an efficient and reproducible approach. The distribution-shift metric addresses a persistent evaluation issue in molecule retrieval. The public release of all datasets, splits, candidate sets, and a unified implementation framework is a clear strength that supports reproducibility and future work.
major comments (2)
- [§4] §4 (experimental results): the central claim of consistent outperformance across all benchmarks is presented without error bars, standard deviations, or statistical significance tests over multiple random seeds or runs; this makes it difficult to assess whether the reported gains over baselines are robust.
- [§5] §5 (distribution shift analysis): the paper introduces a quantitative shift measure and explicitly notes that existing splits trade leakage against domain shift, yet the main benchmark results rely on the very splits critiqued in this section; this creates a tension with the claim that gains will hold for realistic candidate sets (e.g., PubChem or HMDB) that may exhibit larger shifts.
minor comments (2)
- [Figure 1] The architecture diagram (Figure 1 or 2) would benefit from explicit notation of the projection dimensions and the exact form of the contrastive loss to improve clarity for readers implementing the method.
- [Table 2] Table 2 (or equivalent results table) lists baseline comparisons but does not indicate which components of the unified framework were used for each baseline; adding a short column or footnote would aid reproducibility.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript. We address each major comment point by point below and describe the revisions we will make to improve the robustness of the reported results and to clarify the relationship between our benchmark evaluations and the distribution-shift analysis.
- Referee: [§4] §4 (experimental results): the central claim of consistent outperformance across all benchmarks is presented without error bars, standard deviations, or statistical significance tests over multiple random seeds or runs; this makes it difficult to assess whether the reported gains over baselines are robust.
  Authors: We agree that reporting variability across runs would strengthen the central claims. In the revised manuscript we will rerun all experiments with at least five random seeds, report mean performance together with standard deviations for every metric and baseline, and add paired statistical significance tests (e.g., Wilcoxon signed-rank) between MSAlign and the strongest baselines on each benchmark.
  Revision: yes
- Referee: [§5] §5 (distribution shift analysis): the paper introduces a quantitative shift measure and explicitly notes that existing splits trade leakage against domain shift, yet the main benchmark results rely on the very splits critiqued in this section; this creates a tension with the claim that gains will hold for realistic candidate sets (e.g., PubChem or HMDB) that may exhibit larger shifts.
  Authors: We acknowledge the tension. The main tables use the canonical MassSpecGym and Spectraverse splits solely to enable head-to-head comparison with all previously published numbers; this is the conventional practice when introducing a new method. Section 5 then quantifies the leakage-versus-shift trade-off for these and alternative splits using the new metric we introduce. In the revision we will add an explicit paragraph in the discussion that (i) states that the scope of the current claims is the standard benchmarks and (ii) notes that the gains may be larger or smaller under higher-shift regimes. We will also include a short additional experiment that evaluates MSAlign on one higher-shift split constructed according to the metric, thereby directly addressing the referee’s concern about realistic candidate sets.
  Revision: partial
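The seed-level reporting promised in the first response could be sketched as follows. The rebuttal names the Wilcoxon signed-rank test; to stay dependency-free, this sketch swaps in an exact paired sign-flip permutation test, which is a different (though related) procedure. The function names and the example accuracies are hypothetical.

```python
import itertools
import statistics

def mean_and_std(values):
    """Mean and sample standard deviation across seeds."""
    return statistics.mean(values), statistics.stdev(values)

def paired_sign_flip_pvalue(a, b):
    """Exact two-sided paired permutation test on per-seed differences.

    Under the null hypothesis each difference is equally likely to be
    positive or negative, so every sign pattern is enumerated.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
    return extreme / total

# Hypothetical top-1 accuracies (in percent) over five seeds.
method_acc = [25, 26, 27, 28, 29]
baseline_acc = [24, 25, 26, 27, 28]
method_mean, method_std = mean_and_std(method_acc)
p_value = paired_sign_flip_pvalue(method_acc, baseline_acc)
```

With only five paired seeds, the smallest two-sided p-value this test can return is 2/32 = 0.0625, even when the method wins on every seed. Pairing at the level of individual test spectra, or using more seeds, would be needed for a seed-count-independent significance claim.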
Circularity Check
No significant circularity; method and claims are empirically grounded in independent pre-trained models and external benchmarks
full rationale
The paper's core contribution is an empirical alignment method (MSAlign) that freezes independently pre-trained DreaMS and ChemBERTa models, adds lightweight MLP projections, and trains them with a candidate-based contrastive loss on retrieval tasks. This construction does not reduce any claimed performance gain to a fitted parameter defined by the evaluation data itself, nor does it rely on self-citation for load-bearing uniqueness theorems or ansatzes. The newly introduced distribution-shift metric is used to analyze existing splits rather than to derive the method's superiority. All reported outperformance is validated on public benchmarks (MassSpecGym, Spectraverse) with released code and splits, keeping the derivation self-contained against external data rather than tautological.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: DreaMS and ChemBERTa produce representations that are alignable by simple MLPs for the retrieval task.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: MSAlign ... aligning two frozen foundation models (DreaMS for mass spectra and ChemBERTa for molecules) through lightweight MLP projections trained with a candidate-based contrastive objective.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: we formalize this tension by introducing a quantitative measure of distribution shift
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1]
- [2] Alseekh, S., Aharoni, A., Brotman, Y., Contrepois, K., D’Auria, J., Ewald, J., C. Ewald, J., Fraser, P. D., Giavalisco, P., Hall, R. D., Heinemann, M., Link, H., Luo, J., Neumann, S., Nielsen, J., Perez de Souza, L., Saito, K., Sauer, U., Schroeder, F. C., Schuster, S., Siuzdak, G., Skirycz, A., Sumner, L. W., Snyder, M. P., Tang, H., Tohge, T., Wang, Y ... (2021)
- [3] Bakır, G., Hofmann, T., Schölkopf, B., Smola, A. J., Taskar, B., and Vishwanathan, S. V. N., editors (2007). Predicting Structured Data. MIT Press, Cambridge, MA.
- [4] Bittremieux, W., Wang, M., and Dorrestein, P. C. (2022). The critical role that spectral libraries play in capturing the metabolomics community knowledge. Metabolomics, 18(12):94.
- [5] Bohde, M., Manjrekar, M., Wang, R., Ji, S., and Coley, C. W. (2025). DiffMS: diffusion generation of molecules conditioned on mass spectra. In Proceedings of the 42nd International Conference on Machine Learning, ICML’25. JMLR.org.
- [6] Brogat-Motte, L., Flamary, R., Brouard, C., Rousu, J., and d’Alché Buc, F. (2022). Learning to predict graphs with fused gromov-wasserstein barycenters. In International Conference on Machine Learning, pages 2321–2335. PMLR.
- [7] Brouard, C., Shen, H., Dührkop, K., d’Alché Buc, F., Böcker, S., and Rousu, J. (2016). Fast metabolite identification with input output kernel regression. Bioinformatics, 32(12):i28–i36.
- [8] Bushuiev, R., Bushuiev, A., de Jonge, N. F., Young, A., Kretschmer, F., Samusevich, R., Heirman, J., Wang, F., Zhang, L., Dührkop, K., et al. (2024). MassSpecGym: A benchmark for the discovery and identification of molecules. Advances in Neural Information Processing Systems, 37:110010–110027.
- [9] Bushuiev, R., Bushuiev, A., Samusevich, R., Brungs, C., Sivic, J., and Pluskal, T. (2025). Self-supervised learning of molecular representations from millions of tandem mass spectra using DreaMS. Nature Biotechnology, pages 1–11.
- [10] Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR.
- [11] Chen, Y. Z., Rushing, B., and Hassoun, S. (2026). FLARE: Fine-grained learning for alignment of spectra-molecule representation enhances metabolite annotation. bioRxiv, pages 2026–01.
- [12] Damont, A., Darii, E., Cao, C., Legrand, A., Perret, A., Dechaumet, S., Woods, A. S., Junot, C., Tabet, J.-C., and Fenaille, F. (2025). Exploring the fragmentation of sodiated species involving covalent-bond cleavages for metabolite characterization. Rapid Communications in Mass Spectrometry, page e10133.
- [13] de Jonge, N., van der Hooft, J. J. J., and Probst, D. (2025). To Bin or not to Bin: Alternative Representations of Mass Spectra.
- [14] De Vijlder, T., Valkenborg, D., Lemière, F., Romijn, E. P., Laukens, K., and Cuyckens, F. (2018). A tutorial in small molecule identification via electrospray ionization-mass spectrometry: The practical art of structural elucidation. Mass Spectrometry Reviews, 37(5):607–629.
- [15]
- [16] Domingo-Almenara, X., Montenegro-Burke, J. R., Benton, H. P., and Siuzdak, G. (2018). Annotation: A Computational Solution for Streamlining Metabolomics Analysis. Analytical Chemistry, 90(1):480–489.
- [17] Dührkop, K., Fleischauer, M., Ludwig, M., Aksenov, A. A., Melnik, A. V., Meusel, M., Dorrestein, P. C., Rousu, J., and Böcker, S. (2019). SIRIUS 4: A rapid tool for turning tandem mass spectra into metabolite structure information. Nature Methods, 16(4):299–302.
- [18] Dührkop, K., Nothias, L.-F., Fleischauer, M., Reher, R., Ludwig, M., Hoffmann, M. A., Petras, D., Gerwick, W. H., Rousu, J., Dorrestein, P. C., et al. (2021). Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nature biotechnology, 39(4):462–471.
- [19] Dührkop, K., Shen, H., Meusel, M., Rousu, J., and Böcker, S. (2015). Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proceedings of the National Academy of Sciences, 112(41):12580–12585.
- [20] El Abiead, Y., Rutz, A., Zuffa, S., Amer, B., Xing, S., Brungs, C., Schmid, R., Correia, M. S. P., Caraballo-Rodriguez, A. M., Zarrinpar, A., Mannochio-Russo, H., Witting, M., Mohanty, I., Pluskal, T., Bittremieux, W., Knight, R., Patterson, A. D., van der Hooft, J. J. J., Böcker, S., Dunn, W. B., Linington, R. G., Wishart, D. S., Wolfender, J.-L., Fieh... (2025)
- [21] El Ahmad, T., Brogat-Motte, L., Laforgue, P., and d’Alché Buc, F. (2024). Sketch in, sketch out: Accelerating both learning and inference for structured prediction with kernels. In International conference on artificial intelligence and statistics, pages 109–117. PMLR.
- [22] Elser, D., Huber, F., and Gaquerel, E. (2023). Mass2SMILES: deep learning based fast prediction of structures and functional groups directly from high-resolution ms/ms spectra. bioRxiv, pages 2023–07.
- [23] Fan, Z., Alley, A., Ghaffari, K., and Ressom, H. W. (2020). MetFID: Artificial neural network-based compound fingerprint prediction for metabolite annotation. Metabolomics, 16(10):104.
- [24] Farahani, A., Voghoei, S., Rasheed, K., and Arabnia, H. R. (2021). A brief review of domain adaptation. Advances in data science and information engineering: proceedings from ICDATA2020 and IKE 2020, pages 877–894.
- [25] Flamary, R., Vincent-Cuaz, C., Courty, N., Gramfort, A., Kachaiev, O., Tran, H. Q., David, L., Bonet, C., Cassereau, N., Gnassounou, T., et al. (2024). Pot python optimal transport (version 0.9.5), 2024. URL https://github.com/PythonOT/POT, 10.
- [26] Goldman, S., Wohlwend, J., Stražar, M., Haroush, G., Xavier, R. J., and Coley, C. W. (2023a). Annotating metabolite mass spectra with domain-inspired chemical formula transformers. Nature Machine Intelligence, 5(9):965–979.
- [27] Goldman, S., Xin, J., Provenzano, J., and Coley, C. W. (2023b). MIST-CF: Chemical formula inference from tandem mass spectra. Journal of Chemical Information and Modeling, 64(7):2421–2431.
- [28] Gupta, V., Qiang, H., Chung, H.-H., Herbst, E., and Skinnider, M. A. (2026). Comprehensive curation and harmonization of small-molecule MS/MS libraries in Spectraverse. Analytical Chemistry, 98(5):3934–3943.
- [29]
- [30] Heinonen, M., Shen, H., Zamboni, N., and Rousu, J. (2012). Metabolite identification and molecular fingerprint prediction through machine learning. Bioinformatics, 28(18):2333–2341.
- [31] Heirman, J. and Bittremieux, W. (2024). Reusability report: annotating metabolite mass spectra with domain-inspired chemical formula transformers. Nature Machine Intelligence, 6(11):1296–1302.
- [32] Hong, Y., Li, S., Ye, Y., and Tang, H. (2025). FIDDLE: a deep learning method for chemical formulas prediction from tandem mass spectra. Nature Communications, 16(1):11102.
- [33] Huh, M., Cheung, B., Wang, T., and Isola, P. (2024). Position: The platonic representation hypothesis. In ICML, pages 20617–20642.
- [34] Ji, H., Deng, H., Lu, H., and Zhang, Z. (2020). Predicting a molecular fingerprint from an electron ionization mass spectrum with deep neural networks. Analytical chemistry, 92(13):8649–8653.
- [35]
- [36] Kalia, A., Zhou Chen, Y., Krishnan, D., and Hassoun, S. (2025). JESTR: Joint embedding space technique for ranking candidate molecules for the annotation of untargeted metabolomics data. Bioinformatics, 41(7):btaf354.
- [37] Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B. A., Thiessen, P. A., Yu, B., et al. (2023). PubChem 2023 update. Nucleic acids research, 51(D1):D1373–D1380.
- [38] Kind, T., Tsugawa, H., Cajka, T., Ma, Y., Lai, Z., Mehta, S. S., Wohlgemuth, G., Barupal, D. K., Showalter, M. R., Arita, M., and Fiehn, O. (2018). Identification of small molecules using accurate mass MS/MS search. Mass Spectrometry Reviews, 37(4):513–532.
- [39]
- [40] Kudriavtseva, P., Kashkinov, M., and Kertész-Farkas, A. (2021). Deep convolutional neural networks help scoring tandem mass spectrometry data in database-searching approaches. Journal of proteome research, 20(10):4708–4717.
- [41] Landrum, G. et al. (2013). Rdkit documentation. Release, 1(1-79):4.
- [42] LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., Huang, F., et al. (2006). A tutorial on energy-based learning. Predicting structured data, 1(0).
- [43] Litsa, E., Chenthamarakshan, V., Das, P., and Kavraki, L. (2021). Spec2Mol: An end-to-end deep learning framework for translating ms/ms spectra to de-novo molecules. ChemRxiv.
- [44] Ludwig, M., Broeckling, C. D., Dorrestein, P. C., Dührkop, K., Schymanski, E. L., Böcker, S., and Nothias, L.-F. (2020). Studying charge migration fragmentation of sodiated precursor ions in collision-induced dissociation at the library scale. Journal of the American Society for Mass Spectrometry, 32(1):180–186.
- [45]
- [46] Nguyen, D. H., Nguyen, C. H., and Mamitsuka, H. (2019). Recent advances and prospects of computational methods for metabolite identification: A review with emphasis on machine learning approaches. Briefings in Bioinformatics, 20(6):2028–2043.
- [47] Nowozin, S. and Lampert, C. H. (2011). Structured prediction and learning in computer vision. Foundations and Trends in Computer Graphics and Vision, 6(3-4):3–4.
- [48] Peyré, G. and Cuturi, M. (2019). Computational optimal transport with applications to data sciences. Foundations and Trends® in Machine Learning, 11(5-6):355–607.
- [49] Pollmann, J., Bushuiev, R., Bushuiev, A., Pluskal, T., and Huber, F. (2026). Bridging ms2 spectra and chemical space: Advances in spectral similarity, molecular retrieval, and de novo structure discovery. chemrxiv.15000536.
- [50] Quinn, R. A., Melnik, A. V., Vrbanac, A., Fu, T., Patras, K. A., Christy, M. P., Bodai, Z., Belda-Ferre, P., Tripathi, A., Chung, L. K., Downes, M., Welch, R. D., Quinn, M., Humphrey, G., Panitchpakdi, M., Weldon, K. C., Aksenov, A., da Silva, R., Avila-Pacheco, J., Clish, C., Bae, S., Mallick, H., Franzosa, E. A., Lloyd-Price, J., Bussell, R., Thron, T.... (2020)
- [51] Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I. (2021). Learning transferable visual models from natural language supervision. In ICML, pages 8748–8763.
- [52] Rakhshaninejad, M., De Waele, G., Jürgens, M., and Waegeman, W. (2026). Reliable molecular retrieval from mass spectra using conformal prediction. bioRxiv, pages 2026–03.
- [53]
- [54] Rong, Y., Bian, Y., Xu, T., Xie, W., Wei, Y., Huang, W., and Huang, J. (2020). Self-supervised graph transformer on large-scale molecular data. Advances in neural information processing systems, 33:12559–12571.
- [55]
- [56] Russo, F. F., Nowatzky, Y., Jaeger, C., Parr, M. K., Benner, P., Muth, T., and Lisec, J. (2024). Machine learning methods for compound annotation in non-targeted mass spectrometry—A brief overview of fingerprinting, in silico fragmentation and de novo methods. Rapid Communications in Mass Spectrometry, 38(20):e9876.
- [57] Schymanski, E. L., Jeon, J., Gulde, R., Fenner, K., Ruff, M., Singer, H. P., and Hollender, J. (2014). Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence. Environmental Science & Technology, 48(4):2097–2098.
- [58] Stravs, M. A., Dührkop, K., Böcker, S., and Zamboni, N. (2022). MSNovelist: de novo structure generation from mass spectra. Nature Methods, 19(7):865–870.
- [59]
- [60] Vaniya, A. and Fiehn, O. (2022). Revisiting CASMI: Compound ID for 500 new unknowns, using LC-MS/MS data.
- [61] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
- [62] Vouitsis, N., Liu, Z., Gorti, S. K., Villecroze, V., Cresswell, J. C., Yu, G., Loaiza-Ganem, G., and Volkovs, M. (2024). Data-efficient multimodal fusion on a single GPU. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27239–27251.
- [63] Wang, T. and Isola, P. (2020). Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In International conference on machine learning, pages 9929–9939. PMLR.
- [64]
- [65] Wishart, D. S. (2019). Metabolomics for investigating physiological and pathophysiological processes. Physiological Reviews, 99(4):1819–1875.
- [66] Wu, Z., Ramsundar, B., Feinberg, E. N., Gomes, J., Geniesse, C., Pappu, A. S., Leswing, K., and Pande, V. (2018). MoleculeNet: a benchmark for molecular machine learning. Chemical science, 9(2):513–530.
- [67] Xing, S., Shen, S., Xu, B., Li, X., and Huan, T. (2023). BUDDY: Molecular formula discovery via bottom-up MS/MS interrogation. Nature Methods, 20(6):881–890.
- [68] Xu, R. and Zhu, J. (2025). Unveiling the dark matter of the metabolome: A narrative review of bioinformatics tools for LC-HRMS-based compound annotation. Talanta, 295:128327.
- [69] Zhang, L., Yang, Q., and Agrawal, A. (2025). Assessing and learning alignment of unimodal vision and language models. In CVPR, pages 14604–14614.
A Implementation details
All SMILES strings are canonicalized and sanitized using RDKit [41]. Chemical formulas and weights are computed by explicitly accounting for implicit hydrogen atoms. The molecular ma...