SCENT: Aligning Mass Spectra with Molecular Structure for Olfactory Perception
Pith reviewed 2026-06-29 19:12 UTC · model grok-4.3
The pith
Aligning mass spectra with chemical structure embeddings enables odor prediction from spectra alone at test time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SCENT uses multi-modal contrastive learning to align electron ionization mass spectrometry representations with pretrained chemical structure embeddings, so that only mass spectra are required at inference while still supporting accurate prediction of multi-label odor descriptors at levels comparable to models that receive explicit structure input.
What carries the argument
Spectrum-to-Chemical Embedding alignmeNT (SCENT), a contrastive learning framework that pulls mass-spectra representations toward pretrained structure embeddings and pushes unrelated pairs apart.
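The pull-together, push-apart behaviour described here can be illustrated with a symmetric InfoNCE-style loss. This is a minimal sketch, not the paper's stated objective: the abstract does not specify the loss, and the function name `symmetric_info_nce`, the temperature value, and the random toy embeddings are all assumptions made for illustration.

```python
import numpy as np

def symmetric_info_nce(spec_emb, struct_emb, temperature=0.07):
    """Hypothetical symmetric InfoNCE loss over a batch of matched pairs.

    Row i of spec_emb and row i of struct_emb are assumed to describe
    the same molecule; every other row in the batch acts as a negative.
    """
    # L2-normalize so that dot products are cosine similarities.
    s = spec_emb / np.linalg.norm(spec_emb, axis=1, keepdims=True)
    m = struct_emb / np.linalg.norm(struct_emb, axis=1, keepdims=True)
    logits = s @ m.T / temperature  # shape (batch, batch)

    def cross_entropy_diag(z):
        # Softmax cross-entropy where the correct class for row i is column i.
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the spectrum-to-structure and structure-to-spectrum directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

# Toy data (assumed): 8 molecules with 16-dimensional embeddings.
rng = np.random.default_rng(0)
struct = rng.normal(size=(8, 16))
aligned = struct + 0.01 * rng.normal(size=(8, 16))  # spectra close to their structures
shuffled = np.roll(struct, 1, axis=0)               # every pair mismatched

loss_aligned = symmetric_info_nce(aligned, struct)
loss_shuffled = symmetric_info_nce(shuffled, struct)
```

In this toy setting the loss is lower when each spectrum embedding sits next to its own structure embedding than when the pairs are mismatched, which is the signal a contrastive objective uses to train the spectrum encoder.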
If this is right
- The spectrum-only model beats standard MS-only baselines on multi-label odor descriptor prediction.
- Performance reaches levels comparable to models that receive explicit molecular structure at test time.
- The learned representations more closely match continuous human perceptual ratings than baselines.
- The approach generalizes to real-world laboratory-measured mass spectra.
Where Pith is reading between the lines
- Portable or field-deployable sensors could use this alignment to estimate odor properties without needing chemical structure databases at runtime.
- The same alignment strategy might transfer to other analytical signals such as infrared spectra or chromatography data for related perceptual or functional predictions.
- If the structure embeddings already encode perceptual semantics, the method effectively distills that knowledge into a cheaper input modality.
Load-bearing premise
The contrastive alignment successfully moves perceptual semantic information into the spectrum encoder so that spectra alone become sufficient for accurate odor prediction.
What would settle it
A held-out evaluation, run after training the alignment, in which the spectrum-only model performs no better on odor descriptor prediction than an untrained spectrum baseline.
Original abstract
Predicting human olfactory perception from molecular structure has seen remarkable progress, yet these approaches require explicit chemical structure at inference, which is not available in practical sensing settings. We address this gap by exploring direct electron ionization mass spectrometry (EI-MS), a sensing technique that acquires chemically informative fragmentation fingerprints in seconds, as an alternative input modality for olfactory prediction. We contribute Spectrum-to-Chemical Embedding alignmeNT (SCENT), a multi-modal contrastive learning framework that aligns EI-MS representations with pretrained chemical structure embeddings, while requiring only mass spectra at inference. On the multi-label odor descriptor prediction task, SCENT significantly outperforms MS-only baselines and achieves performance comparable to structure-based models, despite requiring no explicit molecular structure at test time. The learned representations also better approximate continuous human perceptual ratings and generalize to real-world lab-measured spectra, suggesting that cross-modal alignment is an effective strategy for grounding analytical spectra in chemical semantics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SCENT, a contrastive learning framework that aligns EI-MS spectra representations with pretrained molecular structure embeddings. At inference, only mass spectra are required for multi-label odor descriptor prediction. The central claim is that SCENT significantly outperforms MS-only baselines while matching the performance of structure-based models, with additional benefits in approximating continuous perceptual ratings and generalizing to real-world spectra.
Significance. If the alignment successfully transfers odor-relevant perceptual semantics, the work would enable practical sensing applications where molecular structures are unavailable, using rapid EI-MS acquisition. The approach demonstrates a concrete use of cross-modal contrastive learning to ground analytical data in chemical semantics without requiring structure at test time.
major comments (2)
- [Abstract] The claim that contrastive alignment transfers perceptual semantic information such that spectra alone suffice for accurate odor prediction is not supported by any reported cross-modal retrieval metrics, embedding-space odor correlation analysis, or ablation removing the contrastive term; without these, outperformance over MS baselines could arise from generic chemical similarity rather than olfactory transfer.
- [Abstract] The weakest assumption (that structure embeddings encode perceptual rather than purely structural features and that the alignment objective prioritizes perceptual axes) is load-bearing for the parity-with-structure-models claim, yet no quantitative evidence is provided to rule out that the MS embeddings improve baselines without carrying the claimed perceptual signal.
minor comments (1)
- [Abstract] Dataset sizes, number of odor descriptors, specific baselines, and exact performance metrics (e.g., F1, mAP) are omitted, which hinders immediate assessment of the reported gains.
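For readers unfamiliar with the metrics the referee asks for, multi-label F1 can be computed per descriptor and then averaged (macro) or pooled over all descriptors (micro). The sketch below is illustrative only: the paper's exact metric definitions are not given in the abstract, and the function name `multilabel_f1` and the toy label matrices are assumptions.

```python
import numpy as np

def multilabel_f1(y_true, y_pred, average="macro"):
    """F1 for binary multi-label predictions (rows: molecules, columns: descriptors)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    # Per-descriptor counts of true positives, false positives, false negatives.
    tp = (y_true & y_pred).sum(axis=0).astype(float)
    fp = (~y_true & y_pred).sum(axis=0).astype(float)
    fn = (y_true & ~y_pred).sum(axis=0).astype(float)
    if average == "micro":
        # Pool the counts over all descriptors before computing F1.
        tp, fp, fn = tp.sum(), fp.sum(), fn.sum()
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 here when the denominator is 0.
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(np.mean(f1))

# Toy example (assumed): 3 molecules, 3 hypothetical odor descriptors.
y_true = [[1, 0, 1],
          [0, 1, 0],
          [1, 1, 0]]
y_pred = [[1, 0, 0],
          [0, 1, 0],
          [1, 0, 0]]

macro_f1 = multilabel_f1(y_true, y_pred, average="macro")
micro_f1 = multilabel_f1(y_true, y_pred, average="micro")
```

Macro averaging weights rare descriptors as heavily as common ones, so the two numbers can diverge substantially on a skewed descriptor vocabulary; reporting which one is used matters when comparing against baselines.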
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need for stronger evidence supporting the perceptual transfer claims. We address each major comment below and commit to revisions that directly test the alignment's role in transferring olfactory semantics.
- Referee: [Abstract] The claim that contrastive alignment transfers perceptual semantic information such that spectra alone suffice for accurate odor prediction is not supported by any reported cross-modal retrieval metrics, embedding-space odor correlation analysis, or ablation removing the contrastive term; without these, outperformance over MS baselines could arise from generic chemical similarity rather than olfactory transfer.
  Authors: We agree that the current manuscript lacks these direct diagnostics and that outperformance alone does not conclusively isolate perceptual transfer from generic chemical similarity. In the revised version we will add: (i) cross-modal retrieval metrics (spectrum-to-molecule and molecule-to-spectrum recall@K), (ii) embedding-space analysis correlating aligned MS vectors with odor descriptor labels, and (iii) an ablation that trains an identical architecture without the contrastive term and reports the resulting drop in odor-descriptor F1. These additions will quantify whether the contrastive objective specifically aligns perceptual axes.
  Revision: yes
- Referee: [Abstract] The weakest assumption (that structure embeddings encode perceptual rather than purely structural features and that the alignment objective prioritizes perceptual axes) is load-bearing for the parity-with-structure-models claim, yet no quantitative evidence is provided to rule out that the MS embeddings improve baselines without carrying the claimed perceptual signal.
  Authors: We acknowledge that the manuscript does not yet provide quantitative evidence ruling out purely structural transfer. We will add two analyses in revision: (1) direct correlation between the pretrained structure embeddings and continuous human perceptual ratings (e.g., Pearson r on intensity or pleasantness), and (2) a controlled comparison showing that odor-prediction performance of the aligned MS embeddings exceeds that of non-contrastively trained MS embeddings by a margin comparable to the structure-model gap. These results will test whether the alignment objective preferentially captures perceptual rather than generic structural dimensions.
  Revision: yes
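Two of the diagnostics promised in the responses above, recall@K for cross-modal retrieval and Pearson r against perceptual ratings, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the use of cosine similarity for ranking, and the toy embeddings are hypothetical and are not taken from the paper.

```python
import numpy as np

def recall_at_k(query_emb, gallery_emb, k=1):
    """Fraction of queries whose matched gallery item ranks in the top k.

    Row i of query_emb is assumed to match row i of gallery_emb;
    ranking uses cosine similarity.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sim = q @ g.T
    # Indices of the k most similar gallery items for each query.
    top_k = np.argsort(-sim, axis=1)[:, :k]
    targets = np.arange(len(q))[:, None]
    return float(np.mean((top_k == targets).any(axis=1)))

def pearson_r(x, y):
    """Pearson correlation between two 1-D sequences (assumes neither is constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Toy data (assumed): 20 molecules with 8-dimensional embeddings.
rng = np.random.default_rng(0)
mol = rng.normal(size=(20, 8))
spec = mol + 0.01 * rng.normal(size=(20, 8))  # well-aligned spectrum embeddings

spectrum_to_molecule_r1 = recall_at_k(spec, mol, k=1)
molecule_to_spectrum_r1 = recall_at_k(mol, spec, k=1)
rating_corr = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

High recall@K alone would show that the alignment recovers molecular identity, not that it carries odor information, which is why the referee asks for the correlation and ablation analyses alongside it.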
Circularity Check
No circularity: empirical contrastive alignment claims rest on reported experiments, not definitional reduction
The paper describes a standard multi-modal contrastive framework (SCENT) that aligns EI-MS representations to pretrained structure embeddings and then evaluates odor descriptor prediction empirically. No equations, training objectives, or performance metrics are shown to reduce by construction to the inputs (e.g., no fitted parameter renamed as prediction, no self-definitional loop, no load-bearing self-citation chain). The central claim—that alignment transfers perceptual information—is presented as an experimental outcome rather than a mathematical identity. This is the most common honest finding for an applied ML paper whose results are benchmark-driven.