
arxiv: 2606.24990 · v1 · pith:WI2BOOJOnew · submitted 2026-06-23 · 💻 cs.LG · cs.AI

Uncertainty-aware reinforcement learning for chemical language models

Pith reviewed 2026-06-26 00:15 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords reinforcement learning · chemical language models · uncertainty estimation · molecular design · hit discovery · de novo design · conformal prediction

The pith

Uncertainty-aware RL for chemical language models raises true hit rate from 0.5 to 0.75 by favoring reliable predictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes two ways to add predictive uncertainty to reinforcement learning loops that train chemical language models for de novo molecular design. One treats uncertainty as an extra term in the reward; the other uses it to down-weight policy updates on molecules whose properties fall outside the scorer’s confident domain. Across a synthetic test bed and two real tasks, one scored with ChemProp models and one with a conformal-prediction wrapper around a random forest classifier, the resulting policies explore lower-uncertainty regions of chemical space. This produces molecules whose predicted scores are more likely to match experimental reality, increasing the true hit rate by 0.25 (from 0.5 to 0.75) and nearly doubling the absolute number of true hits while leaving average molecular scores unchanged.

Core claim

Treating uncertainty either as an additional optimization objective or as a modulator of policy updates lets CLMs avoid high-uncertainty regions, yielding generated molecules whose property predictions are more likely to be correct and thereby raising the fraction of true hits from 0.5 to 0.75 without lowering the average molecular score.

What carries the argument

Uncertainty modulation of policy updates (or multi-objective reward) that down-weights molecules far from the training distribution of the property predictor.
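
The two mechanisms can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the weighted geometric mean used for aggregation, the `1 - uncertainty` reliability term, and the plain mean over weighted losses are all assumptions chosen for illustration, and uncertainty is assumed to be pre-scaled to [0, 1].

```python
import numpy as np

def score_modulation(property_scores, uncertainty, weight=1.0):
    """Add a reliability term as one more scoring component (hypothetical form).

    property_scores: array of shape (n_molecules, n_components), values in [0, 1].
    uncertainty: array of shape (n_molecules,), values in [0, 1]; higher = less reliable.
    """
    reliability = 1.0 - np.clip(uncertainty, 0.0, 1.0)
    components = np.column_stack([property_scores, reliability])
    weights = np.append(np.ones(property_scores.shape[1]), weight)
    # Weighted geometric mean over components; clipping avoids log(0).
    logs = np.log(np.clip(components, 1e-12, 1.0))
    return np.exp((logs * weights).sum(axis=1) / weights.sum())

def loss_modulation(per_molecule_loss, uncertainty):
    """Scale each molecule's loss by its reliability before averaging (hypothetical form)."""
    reliability = 1.0 - np.clip(uncertainty, 0.0, 1.0)
    return float(np.mean(reliability * per_molecule_loss))
```

In the first function the policy sees a lower reward for an uncertain molecule; in the second the reward is untouched, but an uncertain molecule contributes less to the gradient step.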

If this is right

  • True hit rate in de novo design tasks rises from 0.5 to 0.75.
  • Total number of true hits nearly doubles while average molecular score stays the same.
  • Exploration is steered toward lower-uncertainty regions of chemical space.
  • The same two uncertainty-handling mechanisms work across synthetic, ChemProp, and conformal-prediction settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method may transfer to other generative models whose property oracles also produce uncertainty estimates.
  • Success hinges on keeping the uncertainty model calibrated as the generative policy drifts farther from the original data.
  • If calibration fails for out-of-distribution molecules, the modulation step could inadvertently reinforce over-confident errors.

Load-bearing premise

The uncertainty estimates from the ChemProp or conformal-prediction models remain accurate and well-calibrated for molecules generated outside the original training distribution.

What would settle it

An experiment in which the uncertainty estimates are deliberately miscalibrated on the generated molecules, after which the uncertainty-aware RL shows no gain or a loss in true hit rate, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.24990 by Borja Medina, Jon Paul Janet.

Figure 1: RL setup in REINVENT4. The framework consists of an RNN prior that samples molecules from a learned chemical distribution. Generated molecules are evaluated using multiple scoring functions, which provide scalar predictions; these predictions can be interpreted as samples from an underlying distribution. The individual scalar scores are aggregated into a single final score used for the RL optimization proc…
Figure 2: Comparison between point estimation and probabilistic scoring functions. In the point estimation scoring function, the score distribution is summarized into a single point estimate, which is then transformed. In contrast, the probabilistic scoring function transforms the entire score distribution using Monte Carlo sampling, and the final value is computed as the mean of the transformed MPO scores. Orange…
Figure 3: Schematic representation of the different strategies for incorporating uncertainty in RL. The top panel shows the default REINVENT4 approach, which ignores uncertainty in predictions. The Score Modulation (SM) strategy includes uncertainty as an additional component in the final scoring function, whereas the Loss Modulation (LM) strategy modulates the contribution of each molecule during the gradient updat…
Figure 4: Results for the Model System with one noisy scoring component using RDKit logP as the scoring predictor. We report (a) the total number of accumulated hit-scaffolds, (b) the true hit ratio and (c) the mean transformed distance, which in the Model System corresponds to the uncertainty measure. For all reported scores, except uncertainty, values closer to 1 indicate better performance. Results are averaged over fiv…
Figure 5: Results for the Model System with two noisy scoring components using RDKit logP and RDKit BertzCT as the scoring predictors. We report (a) the total number of accumulated hit-scaffolds, (b) the ratio of true to total accumulated hit-scaffolds. Additionally, we report (c) the geometric mean of the transformed distances, which in the Model System corresponds to the uncertainty measure. For all reported scores, except…
Figure 6: Results obtained using the ChemProp Predictor model as the activity scoring predictor. We report (a) the total number of accumulated hit-scaffolds, (b) the number of false hits among them, and (c) the ratio of true to total accumulated hit-scaffolds. Additionally, we report several metrics throughout the RL run, including (d) the ChemProp Predictor score used to guide the optimization, (e) the ChemProp oracle…
Figure 7: Results obtained using a CP built on top of a RF classifier as the activity scoring predictor. We report (a) the total number of accumulated hit-scaffolds, (b) the number of false hits, where false hits were defined as hits that are ultimately classified as uncertain by the CP, and (c) the ratio of true to total accumulated hit-scaffolds. Additionally, we report several metrics throughout the RL run, inclu…
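
The contrast drawn in the Figure 2 caption, transforming a point estimate versus averaging the transform over Monte Carlo samples, can be sketched as follows. This is an illustration only: the Gaussian predictive distribution, the sigmoid transform, and its `midpoint` and `steepness` values are assumptions, not settings taken from the paper.

```python
import numpy as np

def sigmoid_transform(x, midpoint=6.0, steepness=2.0):
    """Map a raw prediction to a score in (0, 1); parameters are hypothetical."""
    return 1.0 / (1.0 + np.exp(-steepness * (x - midpoint)))

def point_estimate_score(mean, transform=sigmoid_transform):
    """Summarize the prediction by its mean, then transform."""
    return float(transform(mean))

def probabilistic_score(mean, std, transform=sigmoid_transform,
                        n_samples=10_000, seed=0):
    """Transform Monte Carlo samples of the prediction, then average."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mean, std, size=n_samples)
    return float(np.mean(transform(samples)))
```

With a nonlinear transform the two values differ. For a prediction whose mean sits above the sigmoid midpoint, a wide predictive distribution pulls the probabilistic score below the point-estimate score, so uncertain molecules are scored more conservatively.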
Original abstract

Reinforcement Learning (RL) has become a powerful paradigm for de novo molecular design, enabling Chemical Language Models (CLMs) to navigate and explore the chemical space while optimizing specific desired properties. However, the existing RL frameworks treat all scoring functions as deterministic oracles, neglecting the inherent uncertainty attached to the predictions of the different molecular properties. This can lead to the exploration of highly-uncertain regions of the chemical space, focusing on the generation of highly scored molecules which are poorly supported by the training data. This can destabilize the optimization process, yielding predictions that are far from their true values. We propose and compare two complementary ways of incorporating predictive uncertainty into RL. In the first one, uncertainty is treated as an additional optimization objective and incorporated along with the rest of the scoring functions, allowing the policy to trade off exploitation against reliability. Secondly, uncertainty is used to modulate policy updates, reducing the influence of molecules whose properties lie far outside the scoring function confidence domain. Both approaches were evaluated across three different settings: (i) a controlled model system, in which the prediction error is modeled as a Gaussian distribution, with a variance proportional to the distance to the training data; and two real-world tasks, making use of (ii) ChemProp models and (iii) a Conformal Prediction wrapper applied to a Random forest classifier. We show that uncertainty-aware RL enables CLMs to explore chemical space more robustly by favoring lower-uncertainty regions. This leads to more reliable hit discovery without compromising molecular score, increasing the true hit rate by 0.25 (from 0.5 to 0.75), and nearly doubling the total number of true hits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes two methods to incorporate predictive uncertainty into RL for chemical language models (CLMs): treating uncertainty as an additional optimization objective alongside property scores, and using uncertainty to modulate the magnitude of policy updates. These are evaluated in a controlled Gaussian-error simulation and two real-world tasks (ChemProp models; conformal prediction on random forests), with the central empirical claim being an increase in true hit rate from 0.5 to 0.75 and nearly doubled total true hits without loss in molecular score.

Significance. If the uncertainty estimates remain well-calibrated on the OOD molecules produced by the policy, the work provides a practical way to stabilize RL-driven molecular optimization and reduce the risk of exploiting spurious high scores. The controlled-to-real-world progression and the two complementary uncertainty-handling strategies are strengths; the absence of any machine-checked proofs or parameter-free derivations is noted but does not detract from the empirical focus.

major comments (2)
  1. [real-world evaluation] Real-world evaluation sections: no calibration diagnostics (prediction-interval coverage, error-vs-uncertainty correlation, or reliability diagrams) are reported for the final set of molecules generated by the RL policy. Because these molecules are produced by optimizing the very scoring functions whose uncertainty is being used, they lie outside the original training support by construction; without such checks the attribution of the 0.25 hit-rate gain to uncertainty awareness cannot be verified.
  2. [controlled setting and results] Controlled Gaussian-error setting and abstract results: the reported true-hit-rate improvement (0.5 → 0.75) is given without error bars, number of independent runs, statistical tests, or explicit parameterization of how variance scales with distance to training data and how molecules are labeled as true hits. These omissions make it impossible to judge whether the observed difference is robust or reproducible.
minor comments (1)
  1. Notation for the two uncertainty-handling variants is introduced only in the abstract and methods; a short table or explicit equation labels in the results would improve readability when comparing the “additional objective” versus “modulate updates” variants.
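
Two of the diagnostics named in the first major comment, prediction-interval coverage and error-vs-uncertainty correlation, can be sketched in a few lines. This is a hedged illustration rather than the authors' evaluation code: it assumes a regression predictor that reports a Gaussian standard deviation `sigma` per molecule, uses a nominal 95% interval (`z=1.96`), and computes a Spearman correlation with a simple ranking helper that ignores ties.

```python
import numpy as np

def interval_coverage(y_true, y_pred, sigma, z=1.96):
    """Fraction of true values inside the interval y_pred +/- z * sigma."""
    return float(np.mean(np.abs(y_true - y_pred) <= z * sigma))

def _ranks(x):
    """Ordinal ranks (0-based); ties are broken arbitrarily."""
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks

def error_uncertainty_spearman(y_true, y_pred, sigma):
    """Spearman correlation between absolute error and reported uncertainty."""
    abs_error = np.abs(y_true - y_pred)
    return float(np.corrcoef(_ranks(abs_error), _ranks(sigma))[0, 1])
```

For a well-calibrated Gaussian predictor, coverage should sit near 0.95 and the correlation should be clearly positive. Coverage well below the nominal level on the RL-generated molecules would indicate the over-confidence the referee is concerned about.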

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the evaluation methodology. We address each major comment below and will revise the manuscript to incorporate the suggested diagnostics and statistical details.

  1. Referee: [real-world evaluation] Real-world evaluation sections: no calibration diagnostics (prediction-interval coverage, error-vs-uncertainty correlation, or reliability diagrams) are reported for the final set of molecules generated by the RL policy. Because these molecules are produced by optimizing the very scoring functions whose uncertainty is being used, they lie outside the original training support by construction; without such checks the attribution of the 0.25 hit-rate gain to uncertainty awareness cannot be verified.

    Authors: We agree that calibration checks on the OOD molecules produced by the policy are necessary to strengthen the attribution of performance gains to uncertainty awareness. The original manuscript reported calibration metrics only on the static held-out test sets for the property predictors. In the revision we will compute and report prediction-interval coverage, error-vs-uncertainty correlation, and reliability diagrams specifically on the final molecules generated by each RL policy for both the ChemProp and conformal-prediction experiments. revision: yes

  2. Referee: [controlled setting and results] Controlled Gaussian-error setting and abstract results: the reported true-hit-rate improvement (0.5 → 0.75) is given without error bars, number of independent runs, statistical tests, or explicit parameterization of how variance scales with distance to training data and how molecules are labeled as true hits. These omissions make it impossible to judge whether the observed difference is robust or reproducible.

    Authors: The controlled experiments were performed with five independent random seeds. We will add error bars (standard deviation across seeds), state the number of runs explicitly, and include a paired statistical test on the hit-rate difference. The Gaussian variance is parameterized as linearly proportional to the distance in the pre-trained molecular embedding to the nearest training point; true hits are defined as molecules whose noise-free property value exceeds the threshold. These details and the corresponding figures will be expanded in the revised manuscript. revision: yes
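
The parameterization described in this response can be sketched as below, under stated assumptions. The proportionality constant `k`, the function names, and the per-molecule (rather than per-scaffold) counting are hypothetical; the response only states that the variance grows linearly with the distance to the nearest training point and that true hits are molecules whose noise-free value exceeds the threshold.

```python
import numpy as np

def noisy_prediction(true_value, distance, k=0.5, rng=None):
    """Add Gaussian noise whose variance is k * distance (k is hypothetical)."""
    rng = np.random.default_rng() if rng is None else rng
    variance = k * np.asarray(distance, dtype=float)
    return true_value + rng.normal(0.0, np.sqrt(variance))

def true_hit_ratio(predicted, true_value, threshold):
    """Share of predicted hits whose noise-free value also exceeds the threshold."""
    predicted_hits = predicted > threshold
    if not predicted_hits.any():
        return float("nan")  # no predicted hits, ratio undefined
    return float(np.mean(true_value[predicted_hits] > threshold))
```

In this setup a molecule far from the training data can clear the threshold through noise alone, which is what lowers the true hit ratio when the policy is allowed to drift into high-distance regions.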

Circularity Check

0 steps flagged

No circularity; central claims rest on independent empirical evaluation


The paper proposes two uncertainty-aware RL variants for CLMs and reports empirical hit-rate improvements from controlled Gaussian-error simulations plus real-world ChemProp and conformal-prediction experiments. No derivation chain, equation, or self-citation reduces the reported true-hit-rate gains (0.5 to 0.75) to a fitted parameter or input by construction; the evaluation metrics are computed directly from generated molecules against held-out oracles. Self-citations, if present, are not load-bearing for the main result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities. The approach inherits standard RL assumptions (Markov decision process, policy gradient) and the calibration properties of the cited property predictors; none of these are enumerated or justified in the provided text.

pith-pipeline@v0.9.1-grok · 5831 in / 1325 out tokens · 24970 ms · 2026-06-26T00:15:38.998343+00:00 · methodology

