arxiv: 2409.02231 · v5 · submitted 2024-09-03 · ⚛️ physics.chem-ph · cs.LG

SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration

Pith reviewed 2026-05-23 21:14 UTC · model grok-4.3

classification ⚛️ physics.chem-ph cs.LG
keywords large language models · supervised fine-tuning · chemical space exploration · drug discovery · molecular generation · direct preference optimization · SmileyLlama

The pith

Large language models can be fine-tuned with engineered prompts to generate drug-like molecules with user-specified properties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports that a large language model can be transformed, via supervised fine-tuning on engineered prompts, into SmileyLlama, a model that explores chemical space to produce valid and novel drug-like molecules. This matters because users can direct molecule generation through a familiar language interface rather than relying on separate specialized models or on chatbots with only general chemistry knowledge. The work further applies direct preference optimization both to tighten prompt adherence and to support an optimization loop for molecules with optimized 3D conformations and strong binding to targets. The same base model is also reported to keep most of its original language abilities while performing the chemical task.

Core claim

By training an LLM to speak directly as a chemical language model through supervised fine-tuning on engineered prompts, SmileyLlama reliably generates molecules that match user-specified properties. Direct preference optimization strengthens adherence to those prompts and integrates into a reinforcement learning setup that favors molecules with optimized conformations and high binding affinity. The resulting system is benchmarked against both general pre-trained language models and chemical language models trained from scratch, while the overall supervised fine-tuning plus preference optimization approach is presented as extensible beyond drug discovery.
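For orientation, the standard direct preference optimization objective (Rafailov et al.) for a single preference pair can be sketched as follows. This is a generic illustration of the published DPO loss, not the authors' training code; the function name `dpo_pair_loss`, the default `beta`, and the example log-probabilities are assumptions introduced here.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed token log-probability of a full response
    under either the trainable policy or the frozen reference model.
    `beta` (assumed value) scales how strongly the policy is pushed away
    from the reference.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Illustrative numbers only: the policy has shifted probability mass
# toward the chosen response relative to the reference.
loss_untrained = dpo_pair_loss(-10.0, -10.0, -10.0, -10.0)
loss_improved = dpo_pair_loss(-8.0, -12.0, -10.0, -10.0)
print(loss_untrained, loss_improved)
```

When the policy equals the reference the margin is zero and the loss is log 2; the loss falls as the policy favors the chosen response more than the reference does.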

What carries the argument

Supervised fine-tuning on engineered prompts combined with direct preference optimization, which together convert the LLM into SmileyLlama for directed molecular generation.

If this is right

  • SmileyLlama produces valid and novel drug-like molecules at rates comparable to or better than chemical language models trained from scratch.
  • Direct preference optimization inside the iMiner framework yields molecules with improved 3D conformations and higher binding affinity to chosen targets.
  • The model continues to handle ordinary natural language queries alongside its chemical generation task.
  • The supervised fine-tuning and direct preference optimization steps can be applied to other chemical, biological, or materials generation problems.
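The validity and novelty rates mentioned in the first point can be computed along the following lines. This is a minimal sketch using one common convention (validity over all samples, uniqueness over valid samples, novelty over unique valid samples); the paper may define the metrics differently. The validator is passed in as a callable: in practice it would typically wrap a cheminformatics parser, while `toy_is_valid` below is a hypothetical stand-in that only checks balanced parentheses. The sketch also assumes the SMILES strings are already canonicalized, so string equality means molecule equality.

```python
def generation_metrics(generated, training_set, is_valid):
    """Validity, uniqueness, and novelty for a batch of generated strings."""
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles):
    """Hypothetical placeholder validator: non-empty, balanced parentheses.

    A real pipeline would parse the string with a chemistry toolkit.
    """
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


# Illustrative inputs only.
generated = ["CCO", "CCO", "c1ccccc1", "C(C", "CCN"]
training = {"CCO"}
print(generation_metrics(generated, training, toy_is_valid))
```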

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Chemists could steer molecule design through ordinary conversational instructions instead of writing specialized code or SMILES strings.
  • The retained language abilities open the possibility of chaining the model with other language-based tools for multi-step design reasoning.
  • Similar prompt-based adaptation might shorten the data requirements when moving the same base model to new scientific domains.

Load-bearing premise

That supervised fine-tuning on engineered prompts plus direct preference optimization will produce reliable molecule generation matching user properties while keeping most natural language capabilities intact.

What would settle it

A controlled test set of prompts that specify concrete molecular properties such as molecular weight range or binding score threshold, followed by independent chemical validation showing that the generated structures fail to meet those properties at rates comparable to or worse than baseline models.
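Such a test reduces to measuring, for each model, the fraction of generated molecules whose independently computed properties fall inside the bounds stated in the prompt. A minimal sketch is given below, assuming the properties have already been computed by an external tool; the names `satisfies` and `adherence_rate`, the property keys, and all numeric values are hypothetical and chosen only for illustration.

```python
def satisfies(props, constraints):
    """True if every constrained property lies within its (low, high) bounds.

    A bound of None means that side is unconstrained.
    """
    for name, (low, high) in constraints.items():
        value = props[name]
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


def adherence_rate(molecules, constraints):
    """Fraction of molecules that meet all prompt constraints."""
    if not molecules:
        return 0.0
    hits = sum(satisfies(m, constraints) for m in molecules)
    return hits / len(molecules)


# Upper bounds in the style of the Rule-of-5 prompt quoted in Figure 3.
constraints = {"mw": (None, 500.0), "logp": (None, 5.0), "hbd": (None, 5)}

# Made-up property values standing in for independently validated outputs.
model_outputs = [
    {"mw": 320.4, "logp": 2.1, "hbd": 2},
    {"mw": 512.6, "logp": 4.0, "hbd": 1},  # violates the MW bound
    {"mw": 410.0, "logp": 5.6, "hbd": 3},  # violates the logP bound
    {"mw": 287.3, "logp": 1.2, "hbd": 0},
]
print(adherence_rate(model_outputs, constraints))
```

Running the same function over a baseline model's outputs for the same prompts gives the comparison the test calls for.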

Figures

Figures reproduced from arXiv: 2409.02231 by Andrew Gritsevskiy, Dorian Bagni, Joseph M. Cavanagh, Kunyang Sun, Teresa Head-Gordon, Thomas D. Bannister, Yingze Wang.

Figure 1: A visualization of the SFT workflow for SmileyLlama. Given the Llama-3.1-8B-Instruct model [26], we used prompt-response pairs consisting of calculated molecular properties and completed SMILES strings to fine-tune Llama on SMILES string completions, yielding SmileyLlama. Crucially, we construct the prompt for each example using properties calculated from the correct response (a SMILES string from ChEMBLv3…
Figure 2: Distribution comparisons for different properties of the generated molecules from SmileyLlama (blue) with molecules from the training dataset from ChEMBL (gold). (a) UMAP visualization of a random selection of 10,000 ChEMBL molecules and 10,000 SmileyLlama-generated molecules, using 15 neighbors and a minimum distance of 0.1. (b) The molecular properties considered are fraction of sp3 hybridized carbons an…
Figure 3: Conditional generation with SmileyLlama for fragment growth and before and after DPO compared to ChEMBL. (a) Example molecules generated by growing from one of the Enamine substructures and to satisfy Lipinski’s Rule of 5 using the prompt Output a SMILES string for a drug like molecule with the following properties: a substructure of O=C(O)c1ccc(C(F)(F)F)cc1, <= 500 MW, <=5 logP, <= 5 H-bond donors, <= 10 …
Figure 4: Comparison of the shift in docking score distributions for iMiner compared to SmileyLlama over optimization epochs as illustrated for SARS2-MPro. (a) For iMiner, in later epochs diversity crashes which explains the sharpening peaks in later iterations. SmileyLlama with DPO (SL+DPO) enforces diversity throughout the optimizations (Algorithm S3), which accounts for the broad peaks, and shows superior data ef…
Figure 5: SmileyLlama de novo generated molecules in the active site of SARS2 main protease. Surface rendering of the SmileyLlama generated molecules in the SARS2 Mpro canonical binding pocket. Generated by SmileyLlama after optimization with (a) the SARS2PRO prompt. (b) and (c) the SARS2Pro+Ro5 prompt. Supplementary Table S2 provides their SMILES string and docking scores, and Supplementary Figure S3 shows their do…
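The Figure 1 caption describes building each training prompt from properties calculated on the target SMILES string itself, so that every prompt is true of its response. A minimal sketch of that idea follows. It assumes the properties are computed beforehand by a cheminformatics toolkit and reuses the prompt prefix and clause style visible in the Figure 3 caption; the `THRESHOLDS` table, the function name `build_pair`, and the example property values are hypothetical, and the paper's actual property set and formatting may differ.

```python
PROMPT_PREFIX = ("Output a SMILES string for a drug like molecule "
                 "with the following properties: ")

# Hypothetical upper bounds, one per property (an assumption of this sketch).
THRESHOLDS = {"MW": 500, "logP": 5, "H-bond donors": 5}


def build_pair(smiles, props):
    """Build one prompt-response pair for supervised fine-tuning.

    Only bounds that the molecule actually satisfies are written into
    the prompt, so the response is always a correct completion.
    """
    clauses = [
        f"<= {limit} {name}"
        for name, limit in THRESHOLDS.items()
        if name in props and props[name] <= limit
    ]
    return {"prompt": PROMPT_PREFIX + ", ".join(clauses), "response": smiles}


# Illustrative property values for the substructure quoted in Figure 3.
pair = build_pair(
    "O=C(O)c1ccc(C(F)(F)F)cc1",
    {"MW": 190.1, "logP": 2.6, "H-bond donors": 1},
)
print(pair["prompt"])
print(pair["response"])
```

Deriving the prompt from the response avoids any need for hand-labeled prompts: every molecule in the training set yields its own consistent example.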
Original abstract

We show that large language models (LLMs) can be transformed via supervised fine-tuning (SFT) of engineered prompts into SmileyLlama for exploring the chemical space of drug molecules. We benchmark SmileyLlama against pre-trained LLMs and chemical language models (CLMs) trained from scratch for generating valid and novel drug-like molecules, and use direct preference optimization (DPO) to both improve SmileyLlama's adherence to a prompt and as part of the iMiner reinforcement learning framework to predict molecules with optimized 3D conformations and high binding affinity to drug targets. By training an LLM to speak directly as a CLM, while retaining most of its natural language capabilities, we show that we can reliably generate molecules with user-specified properties rather than acting only as a chatbot with knowledge of chemistry or as a virtual assistant. While SmileyLlama is geared toward drug discovery, the SFT/DPO/LLM framework can be extended to other chemical, biological, and materials applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that large language models can be transformed via supervised fine-tuning (SFT) of engineered prompts into SmileyLlama for exploring the chemical space of drug molecules. It benchmarks this model against pre-trained LLMs and chemical language models (CLMs) for generating valid and novel drug-like molecules, employs direct preference optimization (DPO) to improve prompt adherence, and integrates DPO with the iMiner reinforcement learning framework to predict molecules with optimized 3D conformations and high binding affinity to targets. The central assertion is that this produces reliable generation of molecules with user-specified properties while retaining most natural language capabilities, with potential extension to other chemical and materials applications.

Significance. If the quantitative benchmarks, error bars, and controls for natural-language retention are supplied and hold, the work could offer a meaningful bridge between flexible natural-language interfaces and directed chemical generation, potentially enabling more intuitive prompt-based exploration than standalone CLMs in drug discovery.

major comments (2)
  1. [Abstract] The assertion of benchmarks against pre-trained LLMs and CLMs plus improvements via DPO supplies no numerical results, error bars, or method details, rendering the central performance claims unverifiable from the text.
  2. [Abstract] The load-bearing claim that SFT/DPO produces a model that both generates molecules with user-specified properties and retains most natural-language capabilities lacks any before/after metrics on general NL tasks (e.g., MMLU, GSM8K) or chemistry QA, and no ablation controls isolating prompt engineering from capability loss are described.
minor comments (1)
  1. The iMiner framework and the precise form of the engineered prompts would benefit from explicit pseudocode or algorithmic description to allow reproduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback on the abstract. We agree that the abstract requires strengthening with quantitative results and will revise it in the next version of the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The assertion of benchmarks against pre-trained LLMs and CLMs plus improvements via DPO supplies no numerical results, error bars, or method details, rendering the central performance claims unverifiable from the text.

    Authors: We agree the abstract is currently too high-level. In revision we will insert the key numerical benchmark outcomes (validity, novelty, and uniqueness rates versus baselines; DPO-induced gains in prompt adherence) together with standard deviations from repeated runs, plus a one-sentence pointer to the methods section for experimental details.
    Revision: yes

  2. Referee: [Abstract] The load-bearing claim that SFT/DPO produces a model that both generates molecules with user-specified properties and retains most natural-language capabilities lacks any before/after metrics on general NL tasks (e.g., MMLU, GSM8K) or chemistry QA, and no ablation controls isolating prompt engineering from capability loss are described.

    Authors: The primary scope of the work is directed chemical generation rather than general LLM evaluation. We will revise the abstract to qualify the retention statement and, where internal chemistry-specific QA results exist, include a brief before/after comparison. Full ablation details on prompt engineering versus capability retention appear in the supplementary material of the current manuscript.
    Revision: partial

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical SFT/DPO training

Full rationale

The paper describes transforming LLMs via supervised fine-tuning on engineered prompts into SmileyLlama, with subsequent DPO for prompt adherence and iMiner for 3D affinity optimization, then benchmarking generation of valid/novel molecules against pre-trained LLMs and CLMs. No mathematical derivations, equations, or fitted parameters are invoked that reduce claimed performance to self-definition or construction from inputs. The central premise of dual molecule generation plus retained natural-language capability is presented as an outcome of new training rather than a self-citation chain, uniqueness theorem, or renamed known result. This is a standard empirical training paper whose results are externally falsifiable via the described benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, axioms, or invented entities; all ledger entries are therefore empty.

Reference graph

Works this paper leans on

85 extracted references · 85 canonical work pages · 11 internal anchors

  1. [1]

    Chemical language models for de novo drug design: Challenges and opportunities

    Grisoni, F. Chemical language models for de novo drug design: Challenges and opportunities. Current opinion in structural biology 2023, 79, 102527

  2. [2]

    SMILES, a chemical language and information system

    Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36

  3. [3]

    Self-referencing embedded strings (SELFIES): A 100

    Krenn, M.; Hase, F.; Nigam, A.; Friederich, P.; Aspuru-Guzik, A. Self-referencing embedded strings (SELFIES): A 100

  4. [4]

    MolGAN: An implicit generative model for small molecular graphs

    De Cao, N.; Kipf, T. MolGAN: An implicit generative model for small molecular graphs. ArXiv 2018, abs/1805.11973

  5. [5]

    Generative Models for De Novo Drug Design

    Tong, X.; Liu, X.; Tan, X.; Li, X.; Jiang, J.; Xiong, Z.; Xu, T.; Jiang, H.; Qiao, N.; Zheng, M. Generative Models for De Novo Drug Design. Journal of medicinal chemistry 2021

  6. [6]

    Language models can learn complex molecular distributions

    Flam-Shepherd, D.; Zhu, K.; Aspuru-Guzik, A. Language models can learn complex molecular distributions. Nature Communications 2022, 13, 3293

  7. [7]

    Chemical language models enable navigation in sparsely populated chemical space

    Skinnider, M.; Stacey, R.; Wishart, D.; Foster, L. Chemical language models enable navigation in sparsely populated chemical space. Nature Machine Intelligence 2021, 3, 759 – 770

  8. [8]

    REINVENT 2.0: An AI Tool for De Novo Drug Design

    Blaschke, T.; Arús-Pous, J.; Chen, H.; Margreitter, C.; Tyrchan, C.; Engkvist, O.; Papadopoulos, K.; Patronov, A. REINVENT 2.0: An AI Tool for De Novo Drug Design. Journal of chemical information and modeling 2020

  9. [9]

    Chemical language models for de novo drug design: Challenges and opportunities

    Grisoni, F. Chemical language models for de novo drug design: Challenges and opportunities. Current opinion in structural biology 2023, 79, 102527

  10. [10]

    ChEMBL: a large-scale bioactivity database for drug discovery

    Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: a large-scale bioactivity database for drug discovery. 40, D1100–D1107

  11. [11]

    ZINC-22–A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand Discovery

    Tingle, B. I.; Tang, K. G.; Castanon, M.; Gutierrez, J. J.; Khurelbaatar, M.; Dandarchuluun, C.; Moroz, Y. S.; Irwin, J. J. ZINC-22–A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand Discovery. J. Chem. Info. Model. 2023, 63, 1166–1176

  12. [12]

    Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules

    Gómez-Bombarelli, R.; Wei, J. N.; Duvenaud, D.; Hernández-Lobato, J. M.; Sánchez-Lengeling, B.; Sheberla, D.; Aguilera-Iparraguirre, J.; Hirzel, T. D.; Adams, R. P.; Aspuru-Guzik, A. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Central Science 2018, 4, 268–276

  13. [13]

    Long Short-Term Memory

    Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Computation 1997, 9, 1735–1780

  14. [14]

    Generative Recurrent Networks for De Novo Drug Design

    Gupta, A.; Müller, A. T.; Huisman, B. J. H.; Fuchs, J. A.; Schneider, P.; Schneider, G. Generative Recurrent Networks for De Novo Drug Design. Molecular Informatics 2017, 37

  15. [15]

    Improving Language Understanding by Generative Pre-Training

    Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018; https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf

  16. [16]

    MolGPT: Molecular Generation Using a Transformer-Decoder Model

    Bagal, V.; Aggarwal, R.; Vinod, P. K.; Priyakumar, U. MolGPT: Molecular Generation Using a Transformer-Decoder Model. Journal of chemical information and modeling 2021, 62, 2064–2076

  17. [17]

    Efficiently Modeling Long Sequences with Structured State Spaces

    Gu, A.; Goel, K.; Ré, C. Efficiently Modeling Long Sequences with Structured State Spaces. 2022; https://arxiv.org/abs/2111.00396

  18. [18]

    Chemical language modeling with structured state space sequence models

    Özçelik, R.; de Ruiter, S.; Criscuolo, E.; Grisoni, F. Chemical language modeling with structured state space sequence models. Nature Communications 2024, 15, 6176

  19. [19]

    cMolGPT: A Conditional Generative Pre-Trained Transformer for Target-Specific De Novo Molecular Generation

    Wang, Y.; Zhao, H.; Sciabola, S.; Wang, W. cMolGPT: A Conditional Generative Pre-Trained Transformer for Target-Specific De Novo Molecular Generation. Molecules 2023, 28

  20. [20]

    Optimization of Molecules via Deep Reinforcement Learning

    Zhou, Z.; Kearnes, S.; Li, L.; Zare, R. N.; Riley, P. Optimization of Molecules via Deep Reinforcement Learning. Scientific Reports 2019, 9, 10752, Published: 24 July 2019

  21. [21]

    Mining for Potent Inhibitors through Artificial Intelligence and Physics: A Unified Methodology for Ligand Based and Structure Based Drug Design

    Li, J.; Zhang, O.; Sun, K.; Wang, Y.; Guan, X.; Bagni, D.; Haghighatlari, M.; Kearns, F. L.; Parks, C.; Amaro, R. E.; Head-Gordon, T. Mining for Potent Inhibitors through Artificial Intelligence and Physics: A Unified Methodology for Ligand Based and Structure Based Drug Design. Journal of Chemical Information and Modeling 2024,

  22. [22]

    Two decades of statistical language modeling: where do we go from here?

    Rosenfeld, R. Two decades of statistical language modeling: where do we go from here? Proceedings of the IEEE 2000, 88, 1270–1278

  23. [23]

    Attention Is All You Need

    Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. 2023; https://arxiv.org/abs/1706.03762

  24. [24]

    Language Models are Unsupervised Multitask Learners

    Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models are Unsupervised Multitask Learners. 2018,

  25. [25]

    GPT-4 Technical Report

    OpenAI et al. GPT-4 Technical Report. http://arxiv.org/abs/2303.08774

  26. [26]

    Dubey, A. et al. The Llama 3 Herd of Models. http://arxiv.org/abs/2407.21783

  27. [27]

    ChatGPT Research Group for Optimizing the Crystallinity of MOFs and COFs

    Zheng, Z.; Zhang, O.; Nguyen, H. L.; Rampal, N.; Alawadhi, A. H.; Rong, Z.; Head-Gordon, T.; Borgs, C.; Chayes, J. T.; Yaghi, O. M. ChatGPT Research Group for Optimizing the Crystallinity of MOFs and COFs. ACS Cent. Sci. 2023, 9, 2161–2170

  28. [28]

    Autonomous chemical research with large language models

    Boiko, D. A.; MacKnight, R.; Kline, B.; Gomes, G. Autonomous chemical research with large language models. 624, 570–578, Publisher: Nature Publishing Group

  29. [29]

    Augmenting large language models with chemistry tools

    M. Bran, A.; Cox, S.; Schilter, O.; Baldassari, C.; White, A. D.; Schwaller, P. Augmenting large language models with chemistry tools. 6, 525–535, Publisher: Nature Publishing Group

  30. [30]

    Translation between Molecules and Natural Language

    Edwards, C.; Lai, T.; Ros, K.; Honke, G.; Cho, K.; Ji, H. Translation between Molecules and Natural Language. 2022; http://arxiv.org/abs/2204.11817, arXiv:2204.11817 [cs]

  31. [31]

    BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations

    Pei, Q.; Zhang, W.; Zhu, J.; Wu, K.; Gao, K.; Wu, L.; Xia, Y.; Yan, R. BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations. ArXiv 2023, abs/2310.07276

  32. [32]

    LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset

    Yu, B.; Baker, F. N.; Chen, Z.; Ning, X.; Sun, H. LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset. 2024; https://arxiv.org/abs/2402.09391

  33. [33]

    Efficient Evolutionary Search Over Chemical Space with Large Language Models

    Wang, H.; Skreta, M.; Ser, C.-T.; Gao, W.; Kong, L.; Strieth-Kalthoff, F.; Duan, C.; Zhuang, Y.; Yu, Y.; Zhu, Y.; Du, Y.; Aspuru-Guzik, A.; Neklyudov, K.; Zhang, C. Efficient Evolutionary Search Over Chemical Space with Large Language Models. 2024; http://arxiv.org/abs/2406.16976, arXiv:2406.16976 [physics]

  34. [34]

    Small Molecule Optimization with Large Language Models

    Guevorguian, P.; Bedrosian, M.; Fahradyan, T.; Chilingaryan, G.; Khachatrian, H.; Aghajanyan, A. Small Molecule Optimization with Large Language Models. http://arxiv.org/abs/2407.18897, version: 1

  35. [35]

    Large Language Models as Molecular Design Engines

    Bhattacharya, D.; Cassady, H.; Hickner, M.; Reinhart, W. Large Language Models as Molecular Design Engines. 2024; https://chemrxiv.org/engage/chemrxiv/article-details/664c98ea418a5379b0e07d31

  36. [36]

    DrugLLM: Open Large Language Model for Few-shot Molecule Generation

    Liu, X.; Guo, Y.; Li, H.; Liu, J.; Huang, S.; Ke, B.; Lv, J. DrugLLM: Open Large Language Model for Few-shot Molecule Generation. ArXiv 2024,

  37. [37]

    Leveraging language model for advanced multiproperty molecular optimization via prompt engineering

    Wu, Z.; Zhang, O.; Wang, X.; Fu, L.; Zhao, H.; Wang, J.; Du, H.; Jiang, D.; Deng, Y.; Cao, D.; Hsieh, C.-Y.; Hou, T. Leveraging language model for advanced multiproperty molecular optimization via prompt engineering. 1–11, Publisher: Nature Publishing Group

  38. [38]

    Improving Targeted Molecule Generation through Language Model Fine-Tuning Via Reinforcement Learning

    Ahmed, S. J.; Elattar, M. A. Improving Targeted Molecule Generation through Language Model Fine-Tuning Via Reinforcement Learning. 2024; http://arxiv.org/abs/2405.06836, arXiv:2405.06836 [cs, q-bio]

  39. [39]

    Direct Preference Optimization: Your Language Model is Secretly a Reward Model

    Rafailov, R.; Sharma, A.; Mitchell, E.; Ermon, S.; Manning, C. D.; Finn, C. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. http://arxiv.org/abs/2305.18290

  40. [40]

    Recent Progress in the Drug Development Targeting SARS-CoV-2 Main Protease as Treatment for COVID-19

    Cui, W.; Yang, K.; Yang, H. Recent Progress in the Drug Development Targeting SARS-CoV-2 Main Protease as Treatment for COVID-19. Front. Mol. Biosci. 2020, 7, 398

  41. [41]

    SynLlama: Generating Synthesizable Molecules and Their Analogs with Large Language Models

    Sun, K.; Bagni, D.; Cavanagh, J. M.; Wang, Y.; Sawyer, J. M.; Gritsevskiy, A.; Zhang, O.; Head-Gordon, T. SynLlama: Generating Synthesizable Molecules and Their Analogs with Large Language Models. 2025; https://arxiv.org/abs/2503.12602

  42. [42]

    RDKit: Open-Source Cheminformatics Software

    Landrum, G. RDKit: Open-Source Cheminformatics Software. 2016,

  43. [43]

    The ’rule of three’ for fragment-based drug discovery: where are we now?

    Jhoti, H.; Williams, G.; Rees, D. C.; Murray, C. W. The ’rule of three’ for fragment-based drug discovery: where are we now? 12, 644–644, Publisher: Nature Publishing Group

  44. [44]

    Molecular Properties That Influence the Oral Bioavailability of Drug Candidates

    Veber, D. F.; Johnson, S. R.; Cheng, H.-Y.; Smith, B. R.; Ward, K. W.; Kopple, K. D. Molecular Properties That Influence the Oral Bioavailability of Drug Candidates. 45, 2615–2623, Publisher: American Chemical Society

  45. [45]

    Lian, W. axolotl. https://github.com/axolotl-ai-cloud/axolotl/tree/main

  46. [46]

    LoRA: Low-Rank Adaptation of Large Language Models

    Hu, E. J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. http://arxiv.org/abs/2106.09685

  47. [47]

    FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

    Dao, T.; Fu, D. Y.; Ermon, S.; Rudra, A.; Ré, C. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Advances in Neural Information Processing Systems (NeurIPS). 2022

  48. [48]

    FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

    Dao, T. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. International Conference on Learning Representations (ICLR). 2024

  49. [49]

    Adam: A Method for Stochastic Optimization

    Kingma, D. P.; Ba, J. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980

  50. [50]

    Taori, R.; Gulrajani, I.; Zhang, T.; Dubois, Y.; Li, X.; Guestrin, C.; Liang, P.; Hashimoto, T. B. Stanford Alpaca: An Instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca, 2023

  51. [51]

    Self-Instruct: Aligning Language Models with Self-Generated Instructions

    Wang, Y.; Kordi, Y.; Mishra, S.; Liu, A.; Smith, N. A.; Khashabi, D.; Hajishirzi, H. Self-Instruct: Aligning Language Models with Self-Generated Instructions. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Toronto, Canada, 2023; pp 13484–13508

  52. [52]

    GuacaMol: Benchmarking Models for de Novo Molecular Design

    Brown, N.; Fiscato, M.; Segler, M. H.; Vaucher, A. C. GuacaMol: Benchmarking Models for de Novo Molecular Design. Journal of Chemical Information and Modeling 2019, 59, 1096–1108

  53. [53]

    Fréchet ChemNet distance: a metric for generative models for molecules in drug discovery

    Preuer, K.; Renz, P.; Unterthiner, T.; Hochreiter, S.; Klambauer, G. Fréchet ChemNet distance: a metric for generative models for molecules in drug discovery. J. Chem. Inform. Model. 2018, 58, 1736–1741

  54. [54]

    VGAE-MCTS: A New Molecular Generative Model Combining the Variational Graph Auto-Encoder and Monte Carlo Tree Search

    Iwata, H.; Nakai, T.; Koyama, T.; Matsumoto, S.; Kojima, R.; Okuno, Y. VGAE-MCTS: A New Molecular Generative Model Combining the Variational Graph Auto-Encoder and Monte Carlo Tree Search. Journal of Chemical Information and Modeling 2023, 63, 7392–7400

  55. [55]

    Quantifying the chemical beauty of drugs

    Bickerton, G. R.; Paolini, G. V.; Besnard, J.; Muresan, S.; Hopkins, A. L. Quantifying the chemical beauty of drugs. Nature Chem. 2012, 4, 90–98

  56. [56]

    Prediction of physicochemical parameters by atomic contributions

    Wildman, S. A.; Crippen, G. M. Prediction of physicochemical parameters by atomic contributions. J. Chem. Inform. Comp. Sci. 1999, 39, 868–873

  57. [57]

    Topological polar surface area: a useful descriptor in 2D-QSAR

    S, P.; RJ, D. Topological polar surface area: a useful descriptor in 2D-QSAR. Curr Med Chem 2009, 16, 21–41

  58. [58]

    Lessons learnt from assembling screening libraries for drug discovery for neglected diseases

    Brenk, R.; Schipani, A.; James, D.; Krasowski, A.; Gilbert, I. H.; Frearson, J.; Wyatt, P. G. Lessons learnt from assembling screening libraries for drug discovery for neglected diseases. ChemMedChem 2008, 3, 435

  59. [59]

    NumeroLogic: Number Encoding for Enhanced LLMs’ Numerical Reasoning

    Schwartz, E.; Choshen, L.; Shtok, J.; Doveh, S.; Karlinsky, L.; Arbelle, A. NumeroLogic: Number Encoding for Enhanced LLMs’ Numerical Reasoning. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Miami, Florida, USA, 2024; pp 206–212

  60. [60]

    Multivariate Density Estimation; John Wiley & Sons, Ltd, 1992; Chapter 6, pp 125–193

  61. [61]

    Enamine Essential Fragment Library

    Enamine Essential Fragment Library. https://enamine.net/compound-libraries/fragment-libraries/essential-library, Accessed: 2024-08-23

  62. [62]

    Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings

    Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings1PII of original article: S0169-409X(96)00423-

  63. [63]

    Advanced Drug Delivery Reviews 2001, 46, 3–26, Special issue dedicated to Dr

    The article was originally published in Advanced Drug Delivery Reviews 23 (1997) 3–25.1. Advanced Drug Delivery Reviews 2001, 46, 3–26, Special issue dedicated to Dr. Eric Tomlinson, Advanced Drug Delivery Reviews, A Selection of the Most Highly Cited Articles, 1991-1998

  64. [64]

    Preference Optimization for Molecular Language Models

    Park, R.; Theisen, R.; Sahni, N.; Patek, M.; Cicho ´nska, A.; Rahman, R. Preference Optimization for Molecular Language Models. http://arxiv.org/abs/2310.12304

  65. [65]

    Molecular de-novo design through deep reinforcement learning

    Olivecrona, M.; Blaschke, T.; Engkvist, O.; Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 2017, 9, 1–14

  66. [66]

    Deep reinforcement learning for de novo drug design

    Popova, M.; Isayev, O.; Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 2018, 4, eaap7885

  67. [67]

    Trott, O.; Olson, A. J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comp. Chem. 2010, 31, 455–461

  68. [68]

    Regularizing and Optimizing LSTM Language Models

    Merity, S.; Keskar, N. S.; Socher, R. Regularizing and Optimizing LSTM Language Models. International Conference on Learning Representations. 2018

  69. [69]

    Proximal Policy Optimization Algorithms

    Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. CoRR 2017, abs/1707.06347

  70. [70]

    Jin, Z.; Du, X.; Xu, Y.; Deng, Y.; Liu, M.; Zhao, Y.; Zhang, B.; Li, X.; Zhang, L.; Peng, C.; et al. Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature 2020, 582, 289–293

  71. [71]

    Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors

    Zhang, L.; Lin, D.; Sun, X.; Curth, U.; Drosten, C.; Sauerhering, L.; Becker, S.; Rox, K.; Hilgenfeld, R. Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors. Science 2020, 368, 409–412

  72. [72]

    Zhang, C.-H. et al. Potent Noncovalent Inhibitors of the Main Protease of SARS-CoV-2 from Molecular Sculpting of the Drug Perampanel Guided by Free Energy Perturbation Calculations. ACS Central Science 2021, 7, 467–475

  73. [73]

    TTD: Therapeutic Target Database describing target druggability information

    Zhou, Y.; Zhang, Y.; Zhao, D.; Yu, X.; Shen, X.; Zhou, Y.; Wang, S.; Qiu, Y.; Chen, Y.; Zhu, F. TTD: Therapeutic Target Database describing target druggability information. Nucleic Acids Res 2024, 52, D1465–D1477

  74. [74]

    Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions

    Ertl, P.; Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of Cheminformatics 2009, 1, 8

  75. [75]

    Gao, L. et al. A framework for few-shot language model evaluation. 2024; https://zenodo.org/records/12608602

  76. [76]

    Measuring Massive Multitask Language Understanding

    Hendrycks, D.; Burns, C.; Basart, S.; Zou, A.; Mazeika, M.; Song, D.; Steinhardt, J. Measuring Massive Multitask Language Understanding

  77. [77]

    Wang, Y. et al. MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark. 2024; https://arxiv.org/abs/2406.01574

  78. [78]

    GPQA: A Graduate-Level Google-Proof Q&A Benchmark

    Rein, D.; Hou, B. L.; Stickland, A. C.; Petty, J.; Pang, R. Y.; Dirani, J.; Michael, J.; Bowman, S. R. GPQA: A Graduate-Level Google-Proof Q&A Benchmark. 2023; https://arxiv.org/abs/2311.12022

  79. [79]

    Measuring Mathematical Problem Solving With the MATH Dataset

    Hendrycks, D.; Burns, C.; Kadavath, S.; Arora, A.; Basart, S.; Tang, E.; Song, D.; Steinhardt, J. Measuring Mathematical Problem Solving With the MATH Dataset. 2021; https://arxiv.org/abs/2103.03874

  80. [80]

    Gema, A. P. et al. Are We Done with MMLU? Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). Albuquerque, New Mexico, 2025; pp 5069–5096

Showing first 80 references.