Machine learning for smell: Ordinal odor strength prediction of molecular perfumery components

Julia Westermayr; Peter Fichtelmann

arxiv: 2512.08683 · v1 · pith:EKS4YBUXnew · submitted 2025-12-09 · ⚛️ physics.chem-ph


Pith reviewed 2026-05-21 18:25 UTC · model grok-4.3

classification ⚛️ physics.chem-ph

keywords: odor strength prediction, machine learning, molecular descriptors, ordinal classification, fragrance design, perfumery, cheminformatics, SHAP analysis


The pith

A machine learning approach predicts odor strength categories for new perfume molecules by training on an integrated dataset of over 2000 compounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates an ordinal dataset labeling more than 2000 molecules as odorless, low, medium, or high strength by merging two public sources. It tests multiple molecular representations and algorithms to find the best way to predict these categories from structure. Analysis shows that features like molecular size, polarity, rings, and branching are most important, aligning with how molecules reach and activate smell receptors. This framework supports estimating strength for untested molecules. Such predictions could speed up the search for new fragrances by reducing the need for physical experiments.

Core claim

By integrating two public sources into an ordinal odor strength dataset of over 2,000 molecules mapped to odorless, low, medium, and high categories, and applying supervised learning across various encodings, the work demonstrates that molecular size, polarity, ring features, and branching drive odor strength predictions, consistent with mass-transport constraints, thereby enabling reliable estimation for novel molecules.

What carries the argument

The ordinal supervised learning framework that combines molecular encodings with algorithms and uses dimensionality reduction plus SHAP analysis to identify primary drivers of odor strength.
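
To illustrate what SHAP-style attribution computes, here is a minimal sketch of exact Shapley values for a single sample. The feature names, the toy scoring function, and the all-zero baseline are assumptions made purely for illustration; they are not the paper's descriptors or model. The SHAP library approximates the same quantity efficiently for real models.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude `name`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_f = dict(without)
                with_f[name] = x[name]
                total += weight * (predict(with_f) - predict(without))
        phi[name] = total
    return phi


# Hypothetical toy model: a made-up linear score with one interaction term.
def toy_score(f):
    return 2.0 * f["polarity"] - 0.5 * f["size"] + 1.0 * f["rings"] * f["polarity"]


x = {"size": 4.0, "polarity": 1.0, "rings": 2.0}
baseline = {"size": 0.0, "polarity": 0.0, "rings": 0.0}
phi = shapley_values(toy_score, x, baseline)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between the two features that form it.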

If this is right

Novel molecules can be screened for odor strength without synthesis or sensory testing.
The identified molecular features provide interpretable rules for designing stronger or weaker scents.
The scalable method serves as a starting point for computational fragrance development in industries like perfumery and food.
Similar ordinal approaches could be applied to other scarce olfactory data sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Combining this with generative AI models might allow automated design of molecules with desired odor profiles.
Validation on independent sensory data could reveal if the public sources introduce systematic biases in labeling.
Extending the model to predict continuous intensity values or specific odor descriptors would increase its utility for practical applications.

Load-bearing premise

The integration of two different public sources produces consistent and accurate ordinal labels for odor strength without substantial noise, bias, or incompatibility in the mapping process.
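
The premise above can be made concrete with a small sketch of how two sources with different label vocabularies might be mapped onto one ordinal scale, with disagreements set aside for review. The source vocabularies, the mapping tables, and the molecule identifiers below are hypothetical; the paper's actual mapping rules are not stated on this page.

```python
# Shared ordinal scale taken from the four categories named in the paper.
ORDINAL = {"odorless": 0, "low": 1, "medium": 2, "high": 3}

# Hypothetical source vocabularies (assumptions for illustration only).
SOURCE_A_MAP = {"none": "odorless", "weak": "low", "moderate": "medium", "strong": "high"}
SOURCE_B_MAP = {"odorless": "odorless", "low": "low", "medium": "medium", "high": "high"}


def merge_sources(source_a, source_b):
    """Return (merged, conflicts) for two {molecule: raw_label} dictionaries.

    Molecules on which the two sources disagree are excluded from `merged`
    and reported in `conflicts` as (label_from_a, label_from_b).
    """
    a = {mol: ORDINAL[SOURCE_A_MAP[lab]] for mol, lab in source_a.items()}
    b = {mol: ORDINAL[SOURCE_B_MAP[lab]] for mol, lab in source_b.items()}
    merged, conflicts = {}, {}
    for mol in sorted(set(a) | set(b)):
        if mol in a and mol in b and a[mol] != b[mol]:
            conflicts[mol] = (a[mol], b[mol])
        else:
            merged[mol] = a.get(mol, b.get(mol))
    return merged, conflicts


# Made-up entries keyed by SMILES strings.
source_a = {"CCO": "weak", "c1ccccc1": "strong", "O": "none"}
source_b = {"CCO": "low", "c1ccccc1": "medium", "CC(=O)O": "high"}
merged, conflicts = merge_sources(source_a, source_b)
```

Dropping conflicting molecules is only one possible policy; keeping the label from a preferred source or averaging and rounding are alternatives, and each choice affects label noise differently.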

What would settle it

Collecting independent human sensory ratings for a new set of 100-200 molecules and checking whether the model's predictions match these ratings at rates significantly better than chance or simple baselines.
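
A check of this kind could be sketched as follows: compare the model against a majority-class baseline using both exact-match accuracy and an ordinal error (mean absolute distance between category indices). All label values below are made up, and the sketch does not include the significance test that a real validation would need, such as a permutation test.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of exact category matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average distance in category steps; respects the ordinal scale."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Always predict the most frequent training category."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * n


# Hypothetical labels: 0 = odorless, 1 = low, 2 = medium, 3 = high.
y_train = [2, 2, 1, 2, 3, 0, 2, 1]
y_true = [0, 1, 2, 2, 3, 1]    # independent sensory ratings (made up)
y_model = [0, 1, 2, 3, 3, 2]   # model predictions (made up)
y_base = majority_baseline(y_train, len(y_true))

model_acc, base_acc = accuracy(y_true, y_model), accuracy(y_true, y_base)
model_mae, base_mae = mean_absolute_error(y_true, y_model), mean_absolute_error(y_true, y_base)
```

Reporting the ordinal error alongside accuracy matters because predicting "high" for a "medium" molecule is a smaller mistake than predicting "odorless".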

Figures

Figures reproduced from arXiv: 2512.08683 by Julia Westermayr, Peter Fichtelmann.

Original abstract

Predicting olfactory perception directly from molecular structure is central to fragrance design, which plays a role in a wide range of industries, such as perfumery, food and beverage, and health care. Among olfactory attributes, odor strength is a key factor in shaping odor perception, but its modeling has been impeded by scarce and fragmented intensity data. In this work, we introduce an ordinal odor strength data set of over 2,000 molecules by integrating two different public sources, mapping structures to odorless, low, medium, and high categories. Across several molecular encodings and supervised learning algorithms, we compared different prediction strategies. Dimensionality reduction and SHAP analysis identify molecular size, polarity, ring features, and branching as primary drivers, consistent with mass-transport constraints on volatility, sorption, and receptor access. This scalable ordinal framework enables reliable odor-strength estimation for novel molecules and provides a foundation for in silico fragrance design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

They integrated two sources into a 2000+ molecule ordinal dataset for odor strength and ran standard ML plus SHAP, but the abstract shows no performance numbers or label-agreement checks.


The main takeaway is that this paper pulls together an ordinal dataset of over 2000 molecules by combining two public sources and maps them to odorless/low/medium/high, then compares molecular encodings and models while using SHAP to flag size, polarity, rings, and branching as key drivers. That feature list lines up with basic physical expectations around volatility and access, which is a small plus for interpretability over pure black-box work.

Referee Report

2 major / 1 minor

Summary. The manuscript constructs an ordinal odor-strength dataset of >2000 molecules by integrating two public sources and mapping structures onto four categories (odorless, low, medium, high). It then benchmarks multiple molecular encodings and supervised classifiers for predicting these ordinal labels, followed by dimensionality reduction and SHAP analysis that highlights molecular size, polarity, ring features, and branching as dominant features. The central claim is that the resulting scalable framework supports reliable in-silico estimation of odor strength for novel molecules and thereby provides a foundation for fragrance design.

Significance. If the label integration is shown to be consistent and the models demonstrate robust generalization, the work would fill a documented gap in olfactory QSAR by supplying both a sizable curated dataset and interpretable predictors tied to volatility and receptor-access mechanisms. The explicit use of SHAP for post-hoc feature attribution is a strength that could guide future structure–odor studies.

major comments (2)

§2 (Dataset Construction): The mapping of two distinct public sources onto the four ordinal categories is presented without any inter-source agreement metric, overlap statistics, or external validation. Because this label set is the sole supervision signal for all subsequent model training and SHAP interpretations, the absence of such diagnostics leaves the central claim of reliable estimation for novel molecules vulnerable to systematic label noise or source-specific bias.
§4 (Model Evaluation): No quantitative performance figures, cross-validation scheme details, or class-imbalance handling are referenced in the main results; without these, it is impossible to judge whether the reported feature importances translate into practically useful predictive accuracy.

minor comments (1)

Abstract / §2: The abstract gives the dataset size only as “over 2,000”. The methods section should state the exact count after deduplication and filtering, and the two statements should be consistent.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments on our manuscript. We address each of the major comments below and have made revisions to the manuscript to strengthen the presentation of our dataset construction and model evaluation procedures.


Referee: §2 (Dataset Construction): The mapping of two distinct public sources onto the four ordinal categories is presented without any inter-source agreement metric, overlap statistics, or external validation. Because this label set is the sole supervision signal for all subsequent model training and SHAP interpretations, the absence of such diagnostics leaves the central claim of reliable estimation for novel molecules vulnerable to systematic label noise or source-specific bias.

Authors: We agree that including quantitative measures of consistency between the two public sources would enhance the reliability of the dataset. In the revised manuscript, we have added overlap statistics, including the number of molecules common to both sources and the agreement rate on their ordinal labels. We have also computed an inter-rater agreement metric (Cohen's kappa) for the overlapping subset. Regarding external validation, we note that the sources are established public databases, but we discuss potential biases in the mapping process in the updated Section 2. These additions directly address concerns about label noise and support the robustness of our subsequent analyses.

Revision: yes
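
For readers unfamiliar with the agreement metric named in this response, the following is a minimal sketch of unweighted Cohen's kappa on the molecules shared by two sources. The label sequences are made up for illustration. Because the categories are ordered, a weighted kappa that penalizes distant disagreements more heavily may be the more natural choice; that variant is not shown here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each source's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources use one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ordinal labels for eight molecules present in both sources.
source_a = [0, 1, 1, 2, 3, 3, 2, 0]
source_b = [0, 1, 2, 2, 3, 2, 2, 0]
kappa = cohens_kappa(source_a, source_b)
```

In this made-up example the raw agreement rate is 0.75, while kappa is lower because part of that agreement would be expected by chance.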
Referee: §4 (Model Evaluation): No quantitative performance figures, cross-validation scheme details, or class-imbalance handling are referenced in the main results; without these, it is impossible to judge whether the reported feature importances translate into practically useful predictive accuracy.

Authors: We appreciate this observation. While detailed performance metrics were provided in the supplementary materials, we acknowledge that they should be more prominently featured in the main text for better accessibility. In the revision, we have incorporated a new table in the results section summarizing key performance metrics from cross-validation, including accuracy, macro-F1 score, and ordinal-specific metrics. We have also added explicit details on the cross-validation procedure (stratified 5-fold CV) and the methods used to handle class imbalance, such as weighted loss functions in the classifiers. This allows readers to better assess the practical utility of the models alongside the SHAP interpretations.

Revision: yes
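
The evaluation ingredients named in this response can be sketched in plain Python: a stratified fold assignment, inverse-frequency class weights, and the macro-averaged F1 score. This is an illustrative sketch under assumptions, not the authors' pipeline: the label counts are made up, the fold assignment is deterministic with no shuffling, and the weighting formula is the common "balanced" heuristic rather than anything confirmed by the paper.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k=5):
    """Assign sample indices to k folds so each class is spread evenly."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    next_fold = 0
    for label in sorted(by_class):
        # Round-robin within each class; the counter carries over between
        # classes so the total fold sizes also stay balanced.
        for idx in by_class[label]:
            folds[next_fold % k].append(idx)
            next_fold += 1
    return folds


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return {c: n / (n_classes * counts[c]) for c in counts}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced labels: 0 = odorless, 1 = low, 2 = medium, 3 = high.
labels = [0] * 2 + [1] * 4 + [2] * 10 + [3] * 4
folds = stratified_folds(labels, k=5)
weights = balanced_class_weights(labels)
score = macro_f1([0, 1, 2, 2, 3, 1], [0, 1, 2, 3, 3, 2])
```

Macro-F1 gives every category the same influence on the score, so a model that ignores a rare category is penalized even if overall accuracy stays high.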

Circularity Check

0 steps flagged

Empirical ML framework is self-contained with no derivation chain


The paper describes an empirical supervised learning pipeline: integration of two public odor datasets into four ordinal categories, featurization of molecules, training of classifiers/regressors, and post-hoc SHAP analysis for feature importance. No equations, ansatzes, uniqueness theorems, or self-citations are invoked to derive predictions. The mapping of sources to labels and the resulting model outputs are not shown to reduce to fitted parameters by construction; they remain falsifiable against external benchmarks. This matches the default expectation of no significant circularity for data-driven work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the quality and consistency of the integrated public dataset plus the assumption that trained models generalize to unseen molecules; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: Public data sources can be reliably mapped to consistent ordinal odor strength categories without major labeling conflicts or errors.
Basis: The abstract states integration of two sources into odorless/low/medium/high labels; this mapping is a prerequisite for the supervised learning task.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation to the paper passage: unclear

Paper passage: Dimensionality reduction and SHAP analysis identify molecular size, polarity, ring features, and branching as primary drivers, consistent with mass-transport constraints on volatility, sorption, and receptor access.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

See & Sniff: Learning Visuo-Olfactory Representations
cs.CV · 2026-06 · unverdicted · novelty 7.0

Introduces SmellNet-V synthetic visuo-olfactory dataset and See & Sniff self-supervised framework that learns aligned representations and produces smell saliency maps.


[1] [1]

T o address class imbalance, all predictors used cost-sensitive learning via odor strength category weighted loss functions

for the direct and the second step of the indirect approach: the macro-averaged MSE across odor strength categories of the validation set; 2) for the ﬁrst step of the indirect approach (binary classiﬁer: if a molecule is odorous): F1-score where the target was the minority class. T o address class imbalance, all predictors used cost-sensitive learning via...

work page

[2] [2]

There Are More Astronauts Alive than There Are Perfumers

A. Wiltschko, “There Are More Astronauts Alive than There Are Perfumers”: The Complex Supply Chains of Scents , https://www .luxcapital.com/content/there-are-more-astronauts-alive-than-there-are-perfumers-the-complex-supply-chains- of-scents, 11/04/2023, Interview by Danny Crichton

work page 2023

[3] [3]

E. J. Mayhew , C. J. Arayata, R. C. Gerkin, B. K. Lee, J. M. Magill, L. L. Snyder , K. A. Little, C. W. Yu and J. D. Mainland, Proc. Natl. Acad. Sci. U.S.A. , 2022, 119, e2116576119

work page 2022

[4] [4]

G. T om, C. Ser , E. M. Rajaonson, S. Lo, H. S. Park, B. K. Lee and B. Sánchez-Lengeling, arXiv, 2025, preprint, arXiv:2501.16271, https://arxiv .org/abs/2501.16271

work page arXiv 2025

[5] [5]

B. Lee, E. J. Mayhew , B. Sánchez-Lengeling, J. N. Wei, W. W. Qian, K. A. Little, M. Andres, B. B. Nguyen, T. Moloy , J. Y asonik, J. K. Parker , R. C. Gerkin, J. D. Mainland and A. B. Wiltschko, Science, 2023, 381, 999–1006

work page 2023

[6] [6]

Y. Wang, Q. Zhao, M. Ma and J. Xu, Appl. Sci., 2022, 12, 8777

work page 2022

[7] [7]

Sharma, R

A. Sharma, R. Kumar , S. Ranjta and P. K. Varadwaj, J. Chem. Inf. Model. , 2021, 61, 676–688

work page 2021

[8] [8]

Sisson, A

L. Sisson, A. A. Barsainyan, M. Sharma and R. Kumar , ACS Omega, 2025, 10, 8980–8992

work page 2025

[9] [9]

Luebke, The Good Scents Company , http://thegoodscentscompany .com/, (accessed November 2025)

B. Luebke, The Good Scents Company , http://thegoodscentscompany .com/, (accessed November 2025)

work page 2025

[10] [10]

Lefﬁngwell & Associates, PMP 2001 - Database of Perfumery Materials and Performance ,

work page 2001

[11] [11]

H. R. Moskowitz, A. Dravnieks and L. Klarman, Atten. Percept. Psychophys., 1976, 19, 122–128

work page 1976

[12] [12]

Keller , M

A. Keller , M. Hempstead, I. A. Gomez, A. N. Gilbert and L. B. Vosshall, BMC Neurosci., 2012, 13, 122–122

work page 2012

[13] [13]

Keller , R

A. Keller , R. C. Gerkin, Y. Guan, A. Dhurandhar , G. T uru, B. Szalai, J. D. Mainland, Y. Ihara, C. W. Yu, R. Wolﬁnger , C. Vens, L. Schietgat, K. De Grave, R. Norel, D. O. P. Consortium, G. Stolovitzky , G. A. Cecchi, L. B. Vosshall and P. Meyer , Science, 2017, 355, 820–826

work page 2017

[14] [14]

Wakayama, M

H. Wakayama, M. Sakasai, K. Y oshikawa and M. Inoue, Ind. Eng. Chem. Res. , 2019, 58, 15036–15044

work page 2019

[15] [15]

Ravia, K

A. Ravia, K. Snitz, D. Honigstein, M. Finkel, R. Zirler , O. Perl, L. Secundo, C. Laudamiel, D. Harel and N. Sobel, Nature, 2020, 588, 118–123

work page 2020

[16] [16]

Y. Ma, K. T ang, Y. Xu and T. Thomas-Danguin, Data Brief, 2021, 36, 107143

work page 2021

[17] [17]

PTB -XL, a large publicly available electrocardiography dataset

A. Bierling, A. Croy , T. Jesgarzewsky , M. Rommel, G. Cuniberti and T. Hummel, Sci. Data, 2025, 12, DOI: 10.1038/s41597–025– 04644–2

work page doi:10.1038/s41597 2025

[18] [18]

Fechner , Elemente Der Psychophysik , Breitkopf u

G. Fechner , Elemente Der Psychophysik , Breitkopf u. Härtel, 1860. 9

work page

[19] [19]

S. S. Stevens, Psychol. Rev., 1957, 64, 153–181

work page 1957

[20] [20]

Chastrette, T

M. Chastrette, T. Thomas-Danguin and E. Rallet, Chem. Senses, 1998, 23, 181–196

work page 1998

[21] [21]

L. J. van Gemert, Odour Thresholds: Compilations of Odour Threshold Values in Air , Water and Other Media , Oliemans Punter & Partners BV, Utrecht, The Netherlands, second enlarged and revised edition edn, 2011

work page 2011

[22] [22]

Audouin, F

V. Audouin, F. Bonnet, Z. M. Vickers and G. A. Reineccius, In Gas Chromatography-Olfactometry , ed. J.V. Leland, P. Schieberle, A. Buettner , T. E. Acree, American Chemical Society , Washington, DC, 2001, vol. 782, Chapter 14, pp. 156–171

work page 2001

[23] [23]

Pellegrino, K

R. Pellegrino, K. Samoilova, Y. Ihara, M. Andres, V. Singh, R. C. Gerkin, A. Koulakov and J. D. Mainland, bioRxiv, 2025, preprint, DOI: 10.1101/2025.08.08.668954

work page doi:10.1101/2025.08.08.668954 2025

[24] [24]

PubChem, https://pubchem.ncbi.nlm.nih.gov/, (accessed November 2025)

work page 2025

[25] [25]

Ruddigkeit, R

L. Ruddigkeit, R. van Deursen, L. C. Blum and J.-L. Reymond, J. Chem. Inf. Model. , 2012, 52, 2864–2875

work page 2012

[26] [26]

Thiboud, Perfumes, Springer Netherlands, Dordrecht, 1994, pp

M. Thiboud, Perfumes, Springer Netherlands, Dordrecht, 1994, pp. 253–286

work page 1994

[27] [27]

Pearson, Philos

K. Pearson, Philos. Mag., 1901, 2, 559–572

work page 1901

[28] [28]

Hotelling, J

H. Hotelling, J. Educ. Psychol. , 1933, 24, 498–520

work page 1933

[29] [30]

McInnes, J

L. McInnes, J. J. Healy , N. Saul and L. Großberger , J. Open Source Softw. , 2018, 3, 861

work page 2018

[30] [31]

Lloyd, IEEE Trans

S. Lloyd, IEEE Trans. Inf. Theory, 1982, 28, 129–136

work page 1982

[31] [32]

D. J. Hand, G. J. McLachlan and K. E. Basford, J. R. Stat. Soc. C: Appl. Stat. , 1989, 38, 384

work page 1989

[32] [33]

Ester , H.-P

M. Ester , H.-P. Kriegel, J. Sander and X. Xu, Data Min. Knowl. Discov. , 1996, 226–231

work page 1996

[33] [34]

Malik, IEEE Trans

Jianbo Shi and J. Malik, IEEE Trans. Pattern Anal. Mach. Intell. , 2000, 22, 888–905

work page 2000

[34] [35]

A. Ng, M. Jordan and Y. Weiss, NIPS’01: Proceedings of the 15th International Conference on Neural Information Processing Systems: Natural and Synthetic , MIT Press, Vancouver , British Columbia, Canada, 2001, vol. 14, pp. 849–856

work page 2001

[35] [36]

J. H. Ward, J. Am. Stat. Assoc. , 1963, 58, 236–244

work page 1963

[36] [37]

S. M. Lundberg and S.-I. Lee, NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems , Curran Associates Inc., Red Hook, NY, USA, 2017, pp. 4765–4774

work page 2017

[37] [38]

Verhulst, Nouv

P.-F. Verhulst, Nouv. Mem. Acad. R. Sci. Bruxelles , 1845, 18, 1–38

work page

[38] [39]

Breiman, Mach

L. Breiman, Mach. Learn., 2001, 45, 5–32

work page 2001

[39] [40]

Chen and C

T. Chen and C. Guestrin, KDD ’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery , San Francisco, California, USA, 2016, pp. 785–794

work page 2016

[40] [41]

D. E. Rumelhart, G. E. Hinton and R. J. Williams, Nature, 1986, 323, 533–536

work page 1986

[41] [42]

A. Paszke, S. Gross, F. Massa, A. Lerer, J. T. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, ..., arXiv, 2019

[42] [43]

W. Cao, V. Mirjalili and S. Raschka, Pattern Recognit. Lett., 2020, 140, 325–331

[43] [44]

H. L. Morgan, J. Chem. Doc., 1965, 5, 107–113

[44] [45]

D. Rogers and M. Hahn, J. Chem. Inf. Model., 2010, 50, 742–754

[45] [46]

J. L. Durant, B. A. Leland, D. R. Henry and J. G. Nourse, J. Chem. Inf. Comput. Sci., 2002, 42, 1273–1280

[46] [47]

R. Nilakantan, N. Bauman, J. S. Dixon and R. Venkataraghavan, J. Chem. Inf. Comput. Sci., 1987, 27, 82–85

[47] [48]

R. E. Carhart, D. H. Smith and R. Venkataraghavan, J. Chem. Inf. Comput. Sci., 1985, 25, 64–73

[48] [49]

W. Ahmad, E. Simon, S. Chithrananda, G. Grand and B. Ramsundar, arXiv, 2022, preprint, arXiv:2209.01712, https://arxiv.org/abs/2209.01712

[49] [50]

K. Yang, K. Swanson, W. Jin, C. W. Coley, P. Eiden, H. Gao, A. Guzman-Perez, T. Hopper, B. Kelley, M. Mathea, A. Palmer, V. Settels, T. S. Jaakkola, K. F. Jensen and R. Barzilay, J. Chem. Inf. Model., 2019, 59, 3370–3388

[50] [51]

E. Heid, K. P. Greenman, Y. Chung, S.-C. Li, D. E. Graff, F. H. Vermeire, H. Wu, W. H. Green and C. J. McGill, J. Chem. Inf. Model., 2024, 64, 9–17

[51] [52]

V. Ramanujan, T. Nguyen, S. Oh, A. Farhadi and L. Schmidt, Advances in Neural Information Processing Systems, Curran Associates, Inc., 2023, vol. 36, pp. 66426–66437

[52] [53]

J. Burns, A. S. Zalte and W. Green, arXiv, 2025, preprint, arXiv:2506.15792, https://arxiv.org/abs/2506.15792

[53] [54]

K. Coussement, M. Z. Abedin, M. Kraus, S. Maldonado and K. Topuz, Decis. Support Syst., 2024, 184, 114276

[54] [55]

L. S. Shapley, Contributions to the Theory of Games (AM-28), Volume II, Princeton University Press, 1953, pp. 307–318

[55] [56]

L. H. Hall and L. B. Kier, Reviews in Computational Chemistry, Wiley, 1st edn, 1991, vol. 2, pp. 367–422

[56] [57]

M. Paoli, D. Münch, A. Haase, E. M. C. Skoulakis, L. Turin and C. G. Galizia, eNeuro, 2017, 4, ENEURO.0070-17.2017

[57] [58]

J. D. Mainland, J. N. Lundström, J. Reisert and G. Lowe, Trends Neurosci., 2014, 37, 443–454

[58] [59]

L. Richardson, Beautiful Soup, https://www.crummy.com/software/BeautifulSoup/, (accessed September 2025)

[59] [60]

American Chemical Society, CAS Common Chemistry API, https://commonchemistry.cas.org/, (accessed October 2025)

[60] [61]

E. A. Hamel, J. B. Castro, T. J. Gould, R. Pellegrino, Z. Liang, L. A. Coleman, F. Patel, D. S. Wallace, T. Bhatnagar, J. D. Mainland and R. C. Gerkin, Sci. Data, 2024, 11, 1220

[61] [62]

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot and É. Duchesnay, J. Mach. Learn. Res., 2011, 12, 2825–2830

[62] [63]

T. Akiba, S. Sano, T. Yanase, T. Ohta and M. Koyama, KDD ’19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Association for Computing Machinery, Anchorage, AK, USA, 2019, pp. 2623–2631

[63] [64]

K. Aas, M. Jullum and A. Løland, Artif. Intell., 2021, 298, 103502

[64] [65]

T. Hothorn, K. Hornik and A. Zeileis, J. Comput. Graph. Stat., 2006, 15, 651–674

[65] [66]

L. H. B. Olsen, I. K. Glad, M. Jullum and K. Aas, J. Mach. Learn. Res., 2022, 23, 1–51

[66] [67]

L. H. B. Olsen, I. K. Glad, M. Jullum and K. Aas, Data Min. Knowl. Discov., 2024, 38, 1782–1829

Supporting Information for "Machine learning for smell: Ordinal odor strength prediction of molecular perfumery components"

Peter Fichtelmann^a and Julia Westermayr^a,b

a Wilhelm-Ostwald Institute of Physical and Theoretical Chemistry, Leipzig University,...
