Uncertainty-aware reinforcement learning for chemical language models

Borja Medina; Jon Paul Janet

arxiv: 2606.24990 · v1 · pith:WI2BOOJOnew · submitted 2026-06-23 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-26 00:15 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords reinforcement learning · chemical language models · uncertainty estimation · molecular design · hit discovery · de novo design · conformal prediction

The pith

Uncertainty-aware RL for chemical language models raises true hit rate from 0.5 to 0.75 by favoring reliable predictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes two ways to add predictive uncertainty to reinforcement learning loops that train chemical language models for de novo molecular design. One treats uncertainty as an extra term in the reward; the other uses it to down-weight policy updates on molecules whose properties fall outside the scorer’s confident domain. In both a synthetic test bed and two real tasks that employ ChemProp or conformal-prediction wrappers, the resulting policies explore lower-uncertainty regions of chemical space. This produces molecules whose predicted scores are more likely to match experimental reality, increasing the true hit rate by 0.25 and nearly doubling the absolute number of true hits while leaving average molecular scores unchanged.

Core claim

Treating uncertainty either as an additional optimization objective or as a modulator of policy updates lets CLMs avoid high-uncertainty regions, yielding generated molecules whose property predictions are more likely to be correct and thereby raising the fraction of true hits from 0.5 to 0.75 without lowering the average molecular score.

What carries the argument

Uncertainty modulation of policy updates (or multi-objective reward) that down-weights molecules far from the training distribution of the property predictor.
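
The two mechanisms can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the geometric-mean aggregation, the `1 - uncertainty` reliability term, and the assumption that scores and uncertainties are already scaled to [0, 1] are all hypothetical choices made for this example.

```python
import numpy as np

def score_modulation(property_scores, uncertainty):
    """Score modulation: treat reliability as one more scoring component.

    property_scores: array (n_molecules, n_components), values in [0, 1].
    uncertainty:     array (n_molecules,), in [0, 1], 1 = least reliable.
    Returns the geometric mean over all components plus (1 - uncertainty).
    """
    scores = np.asarray(property_scores, dtype=float)
    reliability = 1.0 - np.asarray(uncertainty, dtype=float)
    components = np.column_stack([scores, reliability])
    # Clip to avoid log(0); a zero component still drives the score to ~0.
    log_components = np.log(np.clip(components, 1e-12, 1.0))
    return np.exp(log_components.mean(axis=1))

def loss_modulation(per_molecule_loss, uncertainty):
    """Loss modulation: scale each molecule's loss term by (1 - uncertainty)."""
    weights = 1.0 - np.asarray(uncertainty, dtype=float)
    losses = np.asarray(per_molecule_loss, dtype=float)
    return float(np.mean(weights * losses))

scores = np.array([[0.9], [0.9]])   # same predicted score for both molecules
uncertainty = np.array([0.1, 0.8])  # second prediction is far less reliable
print(score_modulation(scores, uncertainty))
print(loss_modulation(np.array([1.0, 3.0]), uncertainty))
```

In this toy setup both molecules have the same predicted score, but the less reliable one receives a lower aggregated reward under score modulation and contributes less to the batch loss under loss modulation.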

If this is right

True hit rate in de novo design tasks rises from 0.5 to 0.75.
Total number of true hits nearly doubles while average molecular score stays the same.
Exploration is steered toward lower-uncertainty regions of chemical space.
The same two uncertainty-handling mechanisms work across synthetic, ChemProp, and conformal-prediction settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method may transfer to other generative models whose property oracles also produce uncertainty estimates.
Success hinges on keeping the uncertainty model calibrated as the generative policy drifts farther from the original data.
If calibration fails for out-of-distribution molecules, the modulation step could inadvertently reinforce over-confident errors.

Load-bearing premise

The uncertainty estimates from the ChemProp or conformal-prediction models remain accurate and well-calibrated for molecules generated outside the original training distribution.

What would settle it

An experiment in which the uncertainty estimates are deliberately miscalibrated on the generated molecules, after which the uncertainty-aware RL shows no gain or a loss in true hit rate, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.24990 by Borja Medina, Jon Paul Janet.

**Figure 1.** RL setup in REINVENT4. The framework consists of an RNN prior that samples molecules from a learned chemical distribution. Generated molecules are evaluated using multiple scoring functions, which provide scalar predictions; these predictions can be interpreted as samples from an underlying distribution. The individual scalar scores are aggregated into a single final score used for the RL optimization proc…

**Figure 2.** Comparison between point estimation and probabilistic scoring functions. In the point estimation scoring function, the score distribution is summarized into a single point estimate, which is then transformed. In contrast, the probabilistic scoring function transforms the entire score distribution using Monte Carlo sampling, and the final value is computed as the mean of the transformed MPO scores. Orange…

**Figure 3.** Schematic representation of the different strategies for incorporating uncertainty in RL. The top panel shows the default REINVENT4 approach, which ignores uncertainty in predictions. The Score Modulation (SM) strategy includes uncertainty as an additional component in the final scoring function, whereas the Loss Modulation (LM) strategy modulates the contribution of each molecule during the gradient updat…

**Figure 4.** Results for the Model System with one noisy scoring component using RDKit logP as the scoring predictor. We report (a) the total number of accumulated hit-scaffolds, (b) the true hit ratio, and (c) the mean transformed distance, which in the Model System corresponds to the uncertainty measure. For all reported scores, except uncertainty, values closer to 1 indicate better performance. Results are averaged over fiv…

**Figure 5.** Results for the Model System with two noisy scoring components using RDKit logP and RDKit BertzCT as the scoring predictors. We report (a) the total number of accumulated hit-scaffolds, (b) the ratio of true to total accumulated hit-scaffolds. Additionally, we report (c) the geometric mean of the transformed distances, which in the Model System corresponds to the uncertainty measure. For all reported scores, except…

**Figure 6.** Results obtained using the ChemProp Predictor model as the activity scoring predictor. We report (a) the total number of accumulated hit-scaffolds, (b) the number of false hits among them, and (c) the ratio of true to total accumulated hit-scaffolds. Additionally, we report several metrics throughout the RL run, including (d) the ChemProp Predictor score used to guide the optimization, (e) the ChemProp oracle…

**Figure 7.** Results obtained using a conformal predictor (CP) built on top of a random forest (RF) classifier as the activity scoring predictor. We report (a) the total number of accumulated hit-scaffolds, (b) the number of false hits, where false hits were defined as hits that are ultimately classified as uncertain by the CP, and (c) the ratio of true to total accumulated hit-scaffolds. Additionally, we report several metrics throughout the RL run, inclu…
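
The contrast drawn in the Figure 2 caption, between transforming a point estimate and transforming the whole score distribution, can be illustrated with a short sketch. The Gaussian predictive distribution, the sigmoid transform, and all function names below are assumptions made for this example; they are not taken from the paper or from the REINVENT4 code base.

```python
import numpy as np

def sigmoid_transform(x, midpoint=0.0, steepness=1.0):
    """Map a raw property value to a score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-steepness * (x - midpoint)))

def point_estimate_score(mean, transform):
    """Summarize the prediction by its mean, then transform."""
    return float(transform(mean))

def probabilistic_score(mean, std, transform, n_samples=10_000, seed=0):
    """Transform Monte Carlo samples of the prediction, then average."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mean, std, size=n_samples)
    return float(np.mean(transform(samples)))

point = point_estimate_score(2.0, sigmoid_transform)
prob = probabilistic_score(2.0, 2.0, sigmoid_transform)
print(point, prob)
```

Because the transform is nonlinear, the two values differ whenever the predictive spread is non-zero. In this example the probabilistic score is lower than the point-estimate score, since part of the sampled distribution falls on the low side of the sigmoid.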

Original abstract

Reinforcement Learning (RL) has become a powerful paradigm for de novo molecular design, enabling Chemical Language Models (CLMs) to navigate and explore the chemical space while optimizing specific desired properties. However, the existing RL frameworks treat all scoring functions as deterministic oracles, neglecting the inherent uncertainty attached to the predictions of the different molecular properties. This can lead to the exploration of highly-uncertain regions of the chemical space, focusing on the generation of highly scored molecules which are poorly supported by the training data. This can destabilize the optimization process, yielding predictions that are far from their true values. We propose and compare two complementary ways of incorporating predictive uncertainty into RL. In the first one, uncertainty is treated as an additional optimization objective and incorporated along with the rest of the scoring functions, allowing the policy to trade off exploitation against reliability. Secondly, uncertainty is used to modulate policy updates, reducing the influence of molecules whose properties lie far outside the scoring function confidence domain. Both approaches were evaluated across three different settings: (i) a controlled model system, in which the prediction error is modeled as a Gaussian distribution, with a variance proportional to the distance to the training data; and two real-world tasks, making use of (ii) ChemProp models and (iii) a Conformal Prediction wrapper applied to a Random forest classifier. We show that uncertainty-aware RL enables CLMs to explore chemical space more robustly by favoring lower-uncertainty regions. This leads to more reliable hit discovery without compromising molecular score, increasing the true hit rate by 0.25 (from 0.5 to 0.75), and nearly doubling the total number of true hits.
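
For readers unfamiliar with the conformal-prediction wrapper mentioned in setting (iii), the following is a minimal sketch of split (inductive) conformal classification on top of any probabilistic classifier. It is a generic textbook construction, not the authors' pipeline: the function names, the nonconformity score `1 - p(true class)`, and the choice of `alpha` are assumptions, and the paper may use a different variant.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold from held-out (calibration) predictions.

    cal_probs:  array (n, n_classes) of predicted class probabilities.
    cal_labels: integer array (n,) of true class indices.
    Nonconformity score = 1 - probability assigned to the true class.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1.0 - alpha)))  # rank of the conformal quantile
    if k > n:
        return 1.0  # too few calibration points: every class is included
    return float(np.sort(scores)[k - 1])

def prediction_sets(test_probs, threshold):
    """Boolean mask (n_test, n_classes): True where the class is in the set."""
    return (1.0 - np.asarray(test_probs, dtype=float)) <= threshold
```

A molecule whose prediction set contains both classes is one the wrapper cannot label confidently, which matches the "uncertain" outcome referred to in the Figure 7 caption.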

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper shows a clear hit-rate lift from adding uncertainty to RL for CLMs, but calibration of the uncertainty estimates on the out-of-distribution (OOD) generated molecules is not demonstrated.

The core result is that two simple uncertainty integrations (treating it as an extra reward term and using it to scale policy updates) raise the true hit rate from 0.5 to 0.75 in their tests while keeping molecular scores intact. The controlled Gaussian-error setting confirms the mechanism works when the uncertainty model is correct by construction. The ChemProp and conformal-prediction runs extend the same pattern to realistic oracles.

This is an application of existing uncertainty-aware RL ideas to chemical language models rather than a new algorithm. The controlled experiment is the strongest part because it isolates the effect without relying on unverified calibration.

The main gap is that the real-world gains rest on the assumption that ChemProp and conformal uncertainties remain accurate for molecules the policy itself generates. Those molecules are out-of-distribution by design, yet the abstract reports no coverage checks or error-vs-uncertainty plots on the final set. Without that, it is hard to attribute the improvement specifically to uncertainty awareness. The lack of error bars or significance tests on the hit-rate numbers is also noticeable.

The work is aimed at groups already running RL pipelines for de novo design who want a practical robustness tweak. The methods are described clearly enough that a referee could check the calibration issue and the experimental details. It is worth sending to review.

Referee Report

2 major / 1 minor

Summary. The paper proposes two methods to incorporate predictive uncertainty into RL for chemical language models (CLMs): treating uncertainty as an additional optimization objective alongside property scores, and using uncertainty to modulate the magnitude of policy updates. These are evaluated in a controlled Gaussian-error simulation and two real-world tasks (ChemProp models; conformal prediction on random forests), with the central empirical claim being an increase in true hit rate from 0.5 to 0.75 and nearly doubled total true hits without loss in molecular score.

Significance. If the uncertainty estimates remain well-calibrated on the OOD molecules produced by the policy, the work provides a practical way to stabilize RL-driven molecular optimization and reduce the risk of exploiting spurious high scores. The controlled-to-real-world progression and the two complementary uncertainty-handling strategies are strengths; the absence of any machine-checked proofs or parameter-free derivations is noted but does not detract from the empirical focus.

major comments (2)

[real-world evaluation] Real-world evaluation sections: no calibration diagnostics (prediction-interval coverage, error-vs-uncertainty correlation, or reliability diagrams) are reported for the final set of molecules generated by the RL policy. Because these molecules are produced by optimizing the very scoring functions whose uncertainty is being used, they lie outside the original training support by construction; without such checks the attribution of the 0.25 hit-rate gain to uncertainty awareness cannot be verified.
[controlled setting and results] Controlled Gaussian-error setting and abstract results: the reported true-hit-rate improvement (0.5 → 0.75) is given without error bars, number of independent runs, statistical tests, or explicit parameterization of how variance scales with distance to training data and how molecules are labeled as true hits. These omissions make it impossible to judge whether the observed difference is robust or reproducible.
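
Two of the diagnostics requested in the first major comment can be sketched in a few lines. This is an illustrative example only: it assumes a regression predictor that returns a mean and a standard deviation per molecule, uses Gaussian intervals and a Pearson correlation, and runs on synthetic data rather than on anything from the paper.

```python
import numpy as np

def interval_coverage(y_true, y_pred, sigma, z=1.96):
    """Fraction of true values inside the interval y_pred +/- z * sigma."""
    return float(np.mean(np.abs(y_true - y_pred) <= z * sigma))

def error_uncertainty_correlation(y_true, y_pred, sigma):
    """Pearson correlation between absolute error and predicted sigma."""
    return float(np.corrcoef(np.abs(y_true - y_pred), sigma)[0, 1])

# Synthetic, well-calibrated case: errors are drawn with the stated sigma.
rng = np.random.default_rng(0)
sigma = rng.uniform(0.1, 2.0, size=20_000)
y_pred = np.zeros_like(sigma)
y_true = rng.normal(y_pred, sigma)

coverage = interval_coverage(y_true, y_pred, sigma)
correlation = error_uncertainty_correlation(y_true, y_pred, sigma)
print(coverage, correlation)
```

For a well-calibrated predictor the coverage should sit close to the nominal 95% and the correlation should be clearly positive. Running the same two functions on the molecules generated by the RL policy, with measured or oracle values as `y_true`, is the kind of check the comment asks for.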

minor comments (1)

Notation for the two uncertainty-handling variants is introduced only in the abstract and methods; a short table or explicit equation labels in the results would improve readability when comparing the “additional objective” versus “modulate updates” variants.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the evaluation methodology. We address each major comment below and will revise the manuscript to incorporate the suggested diagnostics and statistical details.

Referee: [real-world evaluation] Real-world evaluation sections: no calibration diagnostics (prediction-interval coverage, error-vs-uncertainty correlation, or reliability diagrams) are reported for the final set of molecules generated by the RL policy. Because these molecules are produced by optimizing the very scoring functions whose uncertainty is being used, they lie outside the original training support by construction; without such checks the attribution of the 0.25 hit-rate gain to uncertainty awareness cannot be verified.

Authors: We agree that calibration checks on the OOD molecules produced by the policy are necessary to strengthen the attribution of performance gains to uncertainty awareness. The original manuscript reported calibration metrics only on the static held-out test sets for the property predictors. In the revision we will compute and report prediction-interval coverage, error-vs-uncertainty correlation, and reliability diagrams specifically on the final molecules generated by each RL policy for both the ChemProp and conformal-prediction experiments. revision: yes

Referee: [controlled setting and results] Controlled Gaussian-error setting and abstract results: the reported true-hit-rate improvement (0.5 → 0.75) is given without error bars, number of independent runs, statistical tests, or explicit parameterization of how variance scales with distance to training data and how molecules are labeled as true hits. These omissions make it impossible to judge whether the observed difference is robust or reproducible.

Authors: The controlled experiments were performed with five independent random seeds. We will add error bars (standard deviation across seeds), state the number of runs explicitly, and include a paired statistical test on the hit-rate difference. The Gaussian variance is parameterized as linearly proportional to the distance in the pre-trained molecular embedding to the nearest training point; true hits are defined as molecules whose noise-free property value exceeds the threshold. These details and the corresponding figures will be expanded in the revised manuscript. revision: yes
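
The controlled setting described in this response can be sketched as below. This is a hedged reconstruction from the wording of the simulated rebuttal and the abstract, not the authors' code: the Euclidean distance, the proportionality constant `k`, the function names, and the use of generic vectors in place of a pre-trained molecular embedding are all assumptions.

```python
import numpy as np

def nearest_train_distance(x, train):
    """Euclidean distance from each row of x to its nearest row in train."""
    diffs = x[:, None, :] - train[None, :, :]
    return np.linalg.norm(diffs, axis=-1).min(axis=1)

def noisy_scores(true_values, distances, k=1.0, seed=0):
    """Add Gaussian noise whose variance is k * distance to training data."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(k * np.asarray(distances, dtype=float))
    return true_values + rng.normal(0.0, 1.0, size=len(true_values)) * std

def true_hit_ratio(noisy, true_values, threshold):
    """Among molecules whose noisy score passes the threshold (hits),
    the fraction whose noise-free value also passes it (true hits)."""
    hits = noisy >= threshold
    if not hits.any():
        return float("nan")
    return float(np.mean(true_values[hits] >= threshold))
```

A molecule that coincides with a training point has zero distance and therefore a noise-free score, while molecules far from the training data can become hits through noise alone, which is what lowers the true hit ratio.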

Circularity Check

0 steps flagged

No circularity; central claims rest on independent empirical evaluation

The paper proposes two uncertainty-aware RL variants for CLMs and reports empirical hit-rate improvements from controlled Gaussian-error simulations plus real-world ChemProp and conformal-prediction experiments. No derivation chain, equation, or self-citation reduces the reported true-hit-rate gains (0.5 to 0.75) to a fitted parameter or input by construction; the evaluation metrics are computed directly from generated molecules against held-out oracles. Self-citations, if present, are not load-bearing for the main result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities. The approach inherits standard RL assumptions (Markov decision process, policy gradient) and the calibration properties of the cited property predictors; none of these are enumerated or justified in the provided text.


work page doi:10.1021/acs.jcim 2020
[55]

Tom Frömbgen, Elizaveta Surzhikova, Jürgen Dölz, Jonny Proppe, Barbara Kirchner, and Christoph R. Jacob. Uncer- tainty quantification for <i>in silico</i> chemistry.Chemi- cal Reviews, 126(7):4189–4236, March 2026. ISSN 1520-6890. doi: 10.1021/acs.chemrev.5c00931. URL http://dx.doi.org/10.1021/acs. chemrev.5c00931

work page doi:10.1021/acs.chemrev.5c00931 2026
[56]

Rasmussen, Chenru Duan, Heather J

Maria H. Rasmussen, Chenru Duan, Heather J. Kulik, and Jan H. Jensen. Uncertain of uncertainties? a com- parison of uncertainty quantification metrics for chemical data sets. Journal of Cheminformatics, 15(1), December 2023. ISSN 1758-2946. doi: 10.1186/s13321-023-00790-0. URLhttp://dx.doi.org/10.1186/ s13321-023-00790-0

work page doi:10.1186/s13321-023-00790-0 2023
[57]

Costas D. Maranas. Optimal molec- ular design under property prediction uncertainty.AIChE Journal, 43(5): 1250–1264, May 1997. ISSN 1547-

1997
[58]

URLhttp://dx.doi.org/10.1002/ aic.690430514

doi: 10.1002/aic.690430514. URLhttp://dx.doi.org/10.1002/ aic.690430514

work page doi:10.1002/aic.690430514
[59]

Heil, Philip M

Thomas Blaschke, Josep Arús-Pous, Hongming Chen, Christian Margreit- ter, Christian Tyrchan, Ola Engkvist, Kostas Papadopoulos, and Atanas Patronov. Reinvent 2.0: An ai tool for de novo drug design.Journal of Chemical Information and Model- ing, 60(12):5918–5922, October 2020. ISSN 1549-960X. doi: 10.1021/acs. jcim.0c00915. URLhttp://dx.doi. org/10.1021/a...

work page doi:10.1021/acs 2020
[60]

Libinvent: Reaction-based generative scaffold decoration for in silico library design.Journal of Chemical Information and Modeling, 62(9):2046– 2063, 2022

Vendy Fialková, Jiaxi Zhao, Kostas Papadopoulos, Ola Engkvist, Es- ben Jannik Bjerrum, Thierry Ko- gej, and Atanas Patronov. Libin- vent: Reaction-based generative scaf- fold decoration for <i>in silico</i> library design.Journal of Chemi- cal Information and Modeling, 62(9): 2046–2063, August 2021. ISSN 1549- 960X. doi: 10.1021/acs.jcim.1c00469. URLhttp:...

work page doi:10.1021/acs.jcim.1c00469 2046
[61]

Sample effi- cient reinforcement learning with ac- tive learning for molecular design

Michael Dodds, Jeff Guo, Thomas Löhr, Alessandro Tibo, Ola Engkvist, and Jon Paul Janet. Sample effi- cient reinforcement learning with ac- tive learning for molecular design. Chemical Science, 15(11):4146–4160,
[62]

doi: 10.1039/ d3sc04653b

ISSN 2041-6539. doi: 10.1039/ d3sc04653b. URLhttp://dx.doi. org/10.1039/D3SC04653B

work page doi:10.1039/d3sc04653b 2041
[63]

Intro- ducing conformal prediction in pre- dictive modeling

Ulf Norinder, Lars Carlsson, Scott Boyer, and Martin Eklund. Intro- ducing conformal prediction in pre- dictive modeling. a transparent and 26 flexible alternative to applicability domain determination.Journal of Chemical Information and Modeling, 54(6):1596–1603, May 2014. ISSN 1549-960X. doi: 10.1021/ci5001168. URLhttp://dx.doi.org/10.1021/ ci5001168

work page doi:10.1021/ci5001168 2014
[64]

nonconformist: Python implementation of the conformal prediction framework

Henrik Linusson. nonconformist: Python implementation of the conformal prediction framework. https://github.com/donlnz/ nonconformist, 2017. 27 Supplementary Information for Uncertainty-aware reinforcement learning for chemical language models Borja Medina Molecular AI, Discovery Sciences, BioPharmaceuticals R&D AstraZeneca AB Gothenburg, Sweden borja.med...

2017
[65]

Aggregation by Molecule ChEMBL ID, retaining the canonical SMILES and the median pChEMBL Value
[66]

Noisy Compo- nent

Aggregation by canonical SMILES, retaining the first Molecule ChEMBL ID and the median pChEMBL Value. With this procedure we make sure we end up with one potency value per unique canonical compound. For the creation of the ChemPropEGFR full , the downloaded EGFR curated dataset was randomly split into train/validation/test partitions using a fixed random ...

2025

[1]

Anastasiia V. Sadybekov and Vsevolod Katritch. Computational approaches streamlining drug discovery. Nature, 616(7958):673–685, April 2023. ISSN 1476-4687. doi: 10.1038/s41586-023-05905-z. URL http://dx.doi.org/10.1038/s41586-023-05905-z

[2]

Rıza Özçelik, Helena Brinkmann, Emanuele Criscuolo, and Francesca Grisoni. Generative deep learning for de novo drug design - a chemical space odyssey. Journal of Chemical Information and Modeling, 65(14):7352–7372, July 2025. ISSN 1549-960X. doi: 10.1021/acs.jcim.5c00641. URL http://dx.doi.org/10.1021/acs.jcim.5c00641

[3]

Christian Sandoval-Pauker, Sheng Yin, Alexandria Castillo, Neidy Ocuane, Diego Puerto-Diaz, and Dino Villagrán. Computational chemistry as applied in environmental research: Opportunities and challenges. ACS ES&T Engineering, 4(1):66–95, October 2023. ISSN 2690-0645. doi: 10.1021/acsestengg.3c00227. URL http://dx.doi.org/10.1021/acsestengg.3c00227

[4]

Kevin C. Leonard, Faruque Hasan, Helen F. Sneddon, and Fengqi You. Can artificial intelligence and machine learning be used to accelerate sustainable chemistry and engineering? ACS Sustainable Chemistry & Engineering, 9(18):6126–6129, May 2021. ISSN 2168- doi: 10.1021/acssuschemeng.1c02741. URL http://dx.doi.org/10.1021/acssuschemeng.1c02741

[6]

Siddarth K. Achar and John A. Keith. Small data machine learning approaches in molecular and materials science. Chemical Reviews, 124(24):13571–13573, December 2024. ISSN 1520-6890. doi: 10.1021/acs.chemrev.4c00957. URL http://dx.doi.org/10.1021/acs.chemrev.4c00957

[7]

Chenru Duan, Aditya Nandy, and Heather J. Kulik. Machine learning for the discovery, design, and engineering of materials. Annual Review of Chemical and Biomolecular Engineering, 13(1):405–429, June. ISSN 1947-5446. doi: 10.1146/annurev-chembioeng-092320-120230. URL http://dx.doi.org/10.1146/annurev-chembioeng-092320-120230

[9]

Jean-Louis Reymond, Lars Ruddigkeit, Lorenz Blum, and Ruud van Deursen. The enumeration of chemical space. WIREs Computational Molecular Science, 2(5):717–733, April 2012. ISSN 1759-0884. doi: 10.1002/wcms.1104. URL http://dx.doi.org/10.1002/wcms.1104

[10]

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140–1144, December 2018...

[11]

OpenAI, Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique P. d. O. Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor,... doi: 10.48550/ARXIV.1912.06680. URL https://arxiv.org/abs/1912.06680

[13]

Bharat Singh, Rajesh Kumar, and Vinay Pratap Singh. Reinforcement learning in robotic applications: a comprehensive survey. Artificial Intelligence Review, 55(2):945–990, April 2021. ISSN 1573-7462. doi: 10.1007/s10462-021-09997-9. URL http://dx.doi.org/10.1007/s10462-021-09997-9

[14]

Victor Talpaert, Ibrahim Sobh, B Ravi Kiran, Patrick Mannion, Senthil Yogamani, Ahmad El-Sallab, and Patrick Perez. Exploring applications of deep reinforcement learning for real-world autonomous driving systems. arXiv, 2019. URL https://arxiv.org/abs/1901.01536

[15]

B Ravi Kiran, Ibrahim Sobh, Victor Talpaert, Patrick Mannion, Ahmad A. Al Sallab, Senthil Yogamani, and Patrick Pérez. Deep reinforcement learning for autonomous driving: A survey. IEEE Transactions on Intelligent Transportation Systems, 23(6):4909–4926, 2022. doi: 10.1109/TITS.2021.3054625

[16]

Maria Korshunova, Niles Huang, Stephen Capuzzi, Dmytro S. Radchenko, Olena Savych, Yuriy S. Moroz, Carrow I. Wells, Timothy M. Willson, Alexander Tropsha, and Olexandr Isayev. Generative and reinforcement learning approaches for the automated de novo design of bioactive compounds. Communications Chemistry, 5(1), October 2022. ISSN 2399-3669. doi: 10.1038/s42004-022-00733-0

[17]

Zhenpeng Zhou, Steven Kearnes, Li Li, Richard N. Zare, and Patrick Riley. Optimization of molecules via deep reinforcement learning. Scientific Reports, 9(1), July 2019. ISSN 2045-2322. doi: 10.1038/s41598-019-47148-x. URL http://dx.doi.org/10.1038/s41598-019-47148-x

[18]

Niclas Ståhl, Göran Falkman, Alexander Karlsson, Gunnar Mathiason, and Jonas Boström. Deep reinforcement learning for multiparameter optimization in de novo drug design. Journal of Chemical Information and Modeling, 59(7):3166–3176, June 2019. ISSN 1549-960X. doi: 10.1021/acs.jcim.9b00325. URL http://dx.doi.org/10.1021/acs.jcim.9b00325

[19]

Marcus Olivecrona, Thomas Blaschke, Ola Engkvist, and Hongming Chen. Molecular de-novo design through deep reinforcement learning. Journal of Cheminformatics, 9(1), September 2017. ISSN 1758-2946. doi: 10.1186/s13321-017-0235-x. URL http://dx.doi.org/10.1186/s13321-017-0235-x

[20]

Mariya Popova, Olexandr Isayev, and Alexander Tropsha. Deep reinforcement learning for de novo drug design. Science Advances, 4(7), July. ISSN 2375-2548. doi: 10.1126/sciadv.aap7885. URL http://dx.doi.org/10.1126/sciadv.aap7885

[22]

Richard S Sutton, Andrew G Barto, et al. Reinforcement learning: An introduction, volume 1. MIT Press, Cambridge, 1998

[23]

Hannes H. Loeffler, Jiazhen He, Alessandro Tibo, Jon Paul Janet, Alexey Voronov, Lewis H. Mervin, and Ola Engkvist. Reinvent 4: Modern AI–driven generative molecule design. Journal of Cheminformatics, 16(1), February 2024. ISSN 1758-2946. doi: 10.1186/s13321-024-00812-5. URL http://dx.doi.org/10.1186/s13321-024-00812-5

[24]

Albert Bou, Morgan Thomas, Sebastian Dittert, Carles Navarro, Maciej Majewski, Ye Wang, Shivam Patel, Gary Tresadern, Mazen Ahmad, Vincent Moens, Woody Sherman, Simone Sciabola, and Gianni De Fabritiis. Acegen: Reinforcement learning of generative chemical agents for drug discovery. Journal of Chemical Information and Modeling, 64(15):5900–5911, August 2024. doi: 10.1021/acs.jcim.4c00895

[25]

Jeff Guo, Junwu Chen, Anthony GX-Chen, and Philippe Schwaller. Sample-efficient generative molecular design using memory manipulation. Nature Machine Intelligence, 8(3):449–460, March 2026. ISSN 2522-5839. doi: 10.1038/s42256-026-01200-4. URL http://dx.doi.org/10.1038/s42256-026-01200-4

[26]

Owen Lockwood and Mei Si. A review of uncertainty for deep reinforcement learning. arXiv, 2022. doi: 10.48550/ARXIV.2208.09052. URL https://arxiv.org/abs/2208.09052

[27]

Alex Kendall and Yarin Gal. What uncertainties do we need in Bayesian deep learning for computer vision? In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/...

[28]

Lewis H. Mervin, Simon Johansson, Elizaveta Semenova, Kathryn A. Giblin, and Ola Engkvist. Uncertainty quantification in drug design. Drug Discovery Today, 26(2):474–489, February 2021. ISSN 1359- doi: 10.1016/j.drudis.2020.11.027. URL http://dx.doi.org/10.1016/j.drudis.2020.11.027

[31]

Eric-Jan Wagenmakers, Michael Lee, Tom Lodewyckx, and Geoffrey J. Iverson. Bayesian Versus Frequentist Inference, pages 181–207. Springer New York, New York, NY, 2008. ISBN 978-0-387-09612- doi: 10.1007/978-0-387-09612-4_9. URL https://doi.org/10.1007/978-0-387-09612-4_9

[34]

Robin Willink and Rod White. Disentangling classical and Bayesian approaches to uncertainty analysis. New Zealand: Measurement Standards Laboratory, 2012

[35]

Kevin Tran, Willie Neiswanger, Junwoong Yoon, Qingyang Zhang, Eric Xing, and Zachary W Ulissi. Methods for comparing uncertainty quantifications for material property predictions. Machine Learning: Science and Technology, 1(2):025006, May 2020. ISSN 2632-2153. doi: 10.1088/2632-2153/ab7e1a. URL http://dx.doi.org/10.1088/2632-2153/ab7e1a

[36]

Benjamin Kompa, Jasper Snoek, and Andrew L. Beam. Empirical frequentist coverage of deep learning uncertainty quantification procedures. Entropy, 23(12):1608, November 2021. ISSN 1099-4300. doi: 10.3390/e23121608. URL http://dx.doi.org/10.3390/e23121608

[37]

Emilio Dorigatti, Benjamin Schubert, Bernd Bischl, and David Ruegamer. Frequentist uncertainty quantification in semi-structured neural networks. In Francisco Ruiz, Jennifer Dy, and Jan-Willem van de Meent, editors, Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proceedings of Machine Learning R...

[38]

Alexander Tropsha, Paola Gramatica, and Vijay K. Gombar. The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR & Combinatorial Science, 22(1):69–77, April 2003. ISSN 1611-0218. doi: 10.1002/qsar.200390007. URL http://dx.doi.org/10.1002/qsar.200390007

[39]

Sabcho Dimitrov, Gergana Dimitrova, Todor Pavlov, Nadezhda Dimitrova, Grace Patlewicz, Jay Niemela, and Ovanes Mekenyan. A stepwise approach for defining the applicability domain of SAR and QSAR models. Journal of Chemical Information and Modeling, 45(4):839–849, June. ISSN 1549-960X. doi: 10.1021/ci0500381. URL http://dx.doi.org/10.1021/ci0500381

[41]

Lennart Eriksson, Joanna Jaworska, Andrew P Worth, Mark T D Cronin, Robert M McDowell, and Paola Gramatica. Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. Environmental Health Perspectives, 111(10):1361–1375, August 2003. ISSN 1552-9924. doi: 10.1289/ehp.5758. UR...

[42]

Domenico Gadaleta, Giuseppe Felice Mangiatordi, Marco Catto, Angelo Carotti, and Orazio Nicolotti. Applicability domain for QSAR models: Where theory meets reality. International Journal of Quantitative Structure-Property Relationships, 1(1):45–63, January 2016. ISSN 2379-7479. doi: 10.4018/ijqspr.2016010102. URL http://dx.doi.org/10.4018/IJQSPR....

[43]

Faizan Sahigara, Kamel Mansouri, Davide Ballabio, Andrea Mauri, Viviana Consonni, and Roberto Todeschini. Comparison of different approaches to define the applicability domain of QSAR models. Molecules, 17(5):4791–4810, April 2012. ISSN 1420-3049. doi: 10.3390/molecules17054791. URL http://dx.doi.org/10.3390/molecules17054791

[44]

Lane E. Schultz, Yiqi Wang, Ryan Jacobs, and Dane Morgan. A general approach for determining applicability domain of machine learning models. npj Computational Materials, 11(1), April 2025. ISSN 2057-3960. doi: 10.1038/s41524-025-01573-x. URL http://dx.doi.org/10.1038/s41524-025-01573-x

[45]

Anastasios N. Angelopoulos and Stephen Bates. Conformal prediction: A gentle introduction. Foundations and Trends® in Machine Learning, 16(4):494–591, March 2023. ISSN 1935-8245. doi: 10.1561/2200000101. URL http://dx.doi.org/10.1561/2200000101

[46]

Seul Lee, Jaehyeong Jo, and Sung Ju Hwang. Exploring chemical space with score-based out-of-distribution generation. arXiv, 2023. URL https://arxiv.org/abs/2206.07632

[47]

Abdulelah S. Alshehri, Bryan Tantisujjatham, and Maher M. Alrashed. Uncertainty-aware deep reinforcement learning approach for computational molecular design. Industrial & Engineering Chemistry Research, 64(20):10117–10130, May 2025. ISSN 1520-5045. doi: 10.1021/acs.iecr.4c04993. URL http://dx.doi.org/10.1021/acs.iecr.4c04993

[48]

Jie Zhang, Rocío Mercado, Ola Engkvist, and Hongming Chen. Comparative study of deep generative models on chemical space coverage. Journal of Chemical Information and Modeling, 61(6):2572–2581, May 2021. ISSN 1549-960X. doi: 10.1021/acs.jcim.0c01328. URL http://dx.doi.org/10.1021/acs.jcim.0c01328

[49]

Philipp Renz, Dries Van Rompaey, Jörg Kurt Wegner, Sepp Hochreiter, and Günter Klambauer. On failure modes in molecule generation and optimization. Drug Discovery Today: Technologies, 32-33:55–63, December 2019. ISSN 1740- doi: 10.1016/j.ddtec.2020.09.003. URL http://dx.doi.org/10.1016/j.ddtec.2020.09.003

[52]

Tatsuya Yoshizawa, Shoichi Ishida, Tomohiro Sato, Masateru Ohta, Teruki Honma, and Kei Terayama. A data-driven generative strategy to avoid reward hacking in multi-objective molecular design. Nature Communications, 16(1), March 2025. ISSN 2041-1723. doi: 10.1038/s41467-025-57582-3. URL http://dx.doi.org/10.1038/s41467-025-57582-3

[53]

Ullrika Sahlin. Uncertainty in QSAR predictions. Alternatives to Laboratory Animals, 41(1):111–125, March 2013. ISSN 2632-3559. doi: 10.1177/026119291304100111. URL http://dx.doi.org/10.1177/026119291304100111

[54]

Lior Hirschfeld, Kyle Swanson, Kevin Yang, Regina Barzilay, and Connor W. Coley. Uncertainty quantification using neural networks for molecular property prediction. Journal of Chemical Information and Modeling, 60(8):3770–3780, July 2020. ISSN 1549-960X. doi: 10.1021/acs.jcim.0c00502. URL http://dx.doi.org/10.1021/acs.jcim.0c00502

[55]

Tom Frömbgen, Elizaveta Surzhikova, Jürgen Dölz, Jonny Proppe, Barbara Kirchner, and Christoph R. Jacob. Uncertainty quantification for in silico chemistry. Chemical Reviews, 126(7):4189–4236, March 2026. ISSN 1520-6890. doi: 10.1021/acs.chemrev.5c00931. URL http://dx.doi.org/10.1021/acs.chemrev.5c00931

[56]

Maria H. Rasmussen, Chenru Duan, Heather J. Kulik, and Jan H. Jensen. Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets. Journal of Cheminformatics, 15(1), December 2023. ISSN 1758-2946. doi: 10.1186/s13321-023-00790-0. URL http://dx.doi.org/10.1186/s13321-023-00790-0

[57]

Costas D. Maranas. Optimal molecular design under property prediction uncertainty. AIChE Journal, 43(5):1250–1264, May 1997. ISSN 1547- doi: 10.1002/aic.690430514. URL http://dx.doi.org/10.1002/aic.690430514

[59]

Thomas Blaschke, Josep Arús-Pous, Hongming Chen, Christian Margreitter, Christian Tyrchan, Ola Engkvist, Kostas Papadopoulos, and Atanas Patronov. Reinvent 2.0: An AI tool for de novo drug design. Journal of Chemical Information and Modeling, 60(12):5918–5922, October 2020. ISSN 1549-960X. doi: 10.1021/acs.jcim.0c00915. URL http://dx.doi.org/10.1021/a...

[60]

Vendy Fialková, Jiaxi Zhao, Kostas Papadopoulos, Ola Engkvist, Esben Jannik Bjerrum, Thierry Kogej, and Atanas Patronov. Libinvent: Reaction-based generative scaffold decoration for in silico library design. Journal of Chemical Information and Modeling, 62(9):2046–2063, August 2021. ISSN 1549-960X. doi: 10.1021/acs.jcim.1c00469. URL http:...

[61]

Michael Dodds, Jeff Guo, Thomas Löhr, Alessandro Tibo, Ola Engkvist, and Jon Paul Janet. Sample efficient reinforcement learning with active learning for molecular design. Chemical Science, 15(11):4146–4160. ISSN 2041-6539. doi: 10.1039/d3sc04653b. URL http://dx.doi.org/10.1039/D3SC04653B

[63]

Ulf Norinder, Lars Carlsson, Scott Boyer, and Martin Eklund. Introducing conformal prediction in predictive modeling. A transparent and flexible alternative to applicability domain determination. Journal of Chemical Information and Modeling, 54(6):1596–1603, May 2014. ISSN 1549-960X. doi: 10.1021/ci5001168. URL http://dx.doi.org/10.1021/ci5001168

[64]

Henrik Linusson. nonconformist: Python implementation of the conformal prediction framework. https://github.com/donlnz/nonconformist, 2017.

Supplementary Information for Uncertainty-aware reinforcement learning for chemical language models

Borja Medina, Molecular AI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca AB, Gothenburg, Sweden. borja.med...

Aggregation by Molecule ChEMBL ID, retaining the canonical SMILES and the median pChEMBL Value.

Aggregation by canonical SMILES, retaining the first Molecule ChEMBL ID and the median pChEMBL Value. This procedure ensures that we end up with one potency value per unique canonical compound. For the creation of the ChemPropEGFR full, the downloaded EGFR curated dataset was randomly split into train/validation/test partitions using a fixed random ...
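The two aggregation steps and the seeded random split described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' curation code: the record layout (ChEMBL ID, canonical SMILES, pChEMBL value), the function names, the 80/10/10 split fractions, and the seed value 0 are all hypothetical, because the source text is truncated before it states the split details. The SMILES strings are assumed to be canonical already.

```python
import random
from statistics import median


def aggregate_by_id(records):
    """Step 1: one row per ChEMBL ID, keeping its SMILES and the median pChEMBL."""
    groups = {}
    for chembl_id, smiles, pchembl in records:
        entry = groups.setdefault(chembl_id, {"smiles": smiles, "values": []})
        entry["values"].append(pchembl)
    return [(cid, g["smiles"], median(g["values"])) for cid, g in groups.items()]


def aggregate_by_smiles(rows):
    """Step 2: one row per canonical SMILES, keeping the first ChEMBL ID seen."""
    groups = {}
    for chembl_id, smiles, pchembl in rows:
        entry = groups.setdefault(smiles, {"first_id": chembl_id, "values": []})
        entry["values"].append(pchembl)
    return [(g["first_id"], smi, median(g["values"])) for smi, g in groups.items()]


def random_split(rows, fractions=(0.8, 0.1, 0.1), seed=0):
    """Seeded shuffle followed by a train/validation/test cut.

    The fractions and the seed are assumed values chosen for illustration.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_val = int(fractions[1] * len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# Toy records: (Molecule ChEMBL ID, canonical SMILES, pChEMBL Value).
records = [
    ("CHEMBL1", "CCO", 6.0),
    ("CHEMBL1", "CCO", 7.0),
    ("CHEMBL2", "CCO", 8.0),
    ("CHEMBL3", "c1ccccc1", 5.0),
]
by_id = aggregate_by_id(records)
unique = aggregate_by_smiles(by_id)
train, val, test = random_split(unique)
```

In the toy data, two ChEMBL IDs share one SMILES string, so the second step collapses them into a single row. This is the case the procedure guards against when it requires one potency value per unique compound.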