pith. machine review for the scientific record.

arxiv: 2304.05376 · v5 · submitted 2023-04-11 · ⚛️ physics.chem-ph · stat.ML

ChemCrow: Augmenting large-language models with chemistry tools

Pith reviewed 2026-05-15 19:01 UTC · model grok-4.3

classification: physics.chem-ph, stat.ML
keywords: large language models, chemistry tools, autonomous synthesis, organocatalysts, chromophore, drug discovery, materials design, LLM agents

The pith

An LLM agent augmented with 18 chemistry tools autonomously plans and executes real syntheses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ChemCrow, a system that connects a large language model to 18 specialized chemistry tools so the model can handle organic synthesis, drug discovery, and materials design tasks. It reports that the resulting agent independently planned and carried out the syntheses of an insect repellent and three organocatalysts, and guided the discovery of a new chromophore. A sympathetic reader would care because this approach lowers the steep learning curve that keeps many computational tools out of reach and lets non-experts attempt laboratory-level work with less supervision. The evaluation combines LLM-based assessment with expert review and reports effective performance across varied chemical workflows. The work also notes that GPT-4, used as an evaluator, cannot distinguish clearly wrong GPT-4 completions from ChemCrow's outputs, highlighting how tool access changes what the base model can achieve.

Core claim

By integrating 18 expert-designed tools, ChemCrow augments large-language-model performance in chemistry so that new capabilities emerge. The agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts, and guided the discovery of a novel chromophore. Evaluations by both an LLM and human experts support its effectiveness across a diverse set of chemical tasks. The system addresses the limitations of standalone language models by giving them access to external knowledge sources and specialized functions, thereby bridging experimental and computational chemistry.

What carries the argument

The set of 18 expert-designed chemistry tools that the large language model can invoke to retrieve data, run calculations, and guide experimental steps, turning the model into an agent that produces and follows multi-step plans.
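
The invoke-observe-continue pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ChemCrow's implementation: the two tools, the `scripted_policy` stub that stands in for the language model, and the `run_agent` loop are all hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in tools; these are NOT ChemCrow's actual tools.
def molecular_weight(name: str) -> str:
    """Toy lookup table standing in for a real chemistry tool."""
    table = {"water": "18.015", "ethanol": "46.069"}
    return table.get(name.lower(), "unknown")

def safety_note(name: str) -> str:
    """Toy tool that only reports that it has no data."""
    return f"no safety data available for {name} in this toy example"

TOOLS: Dict[str, Callable[[str], str]] = {
    "molecular_weight": molecular_weight,
    "safety_note": safety_note,
}

# Each history entry is (tool_name, tool_input, observation).
History = List[Tuple[str, str, str]]

def scripted_policy(history: History) -> Tuple[str, str]:
    """Stub standing in for the LLM (an assumption for illustration).

    Returns (tool_name, tool_input), or ("finish", answer) to stop.
    """
    if not history:
        return ("molecular_weight", "ethanol")
    last_observation = history[-1][2]
    return ("finish", f"Molecular weight of ethanol: {last_observation} g/mol")

def run_agent(policy: Callable[[History], Tuple[str, str]],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """Ask the policy for an action, run the tool, feed the result back."""
    history: History = []
    for _ in range(max_steps):
        action, arg = policy(history)
        if action == "finish":
            return arg
        tool = tools.get(action)
        # Unknown tool names become an observation instead of a crash.
        observation = tool(arg) if tool else f"error: unknown tool {action!r}"
        history.append((action, arg, observation))
    return "stopped: step limit reached"

if __name__ == "__main__":
    print(run_agent(scripted_policy, TOOLS))
```

The step limit and the error observation for unknown tool names are design choices of this sketch. They show one way a loop of this kind can fail visibly when the model misreads a tool result or asks for a tool that does not exist.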

If this is right

  • Routine chemical planning and execution become automated across synthesis, discovery, and design workflows.
  • Expert chemists receive assistance while non-experts gain access to previously inaccessible capabilities.
  • The gap between computational predictions and actual laboratory experiments narrows.
  • Scientific progress accelerates because tool-augmented agents handle tasks that once required extensive manual coordination.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pattern of tool integration could be applied to adjacent domains such as biology or materials science to create parallel autonomous systems.
  • Longer-horizon experiments become feasible if the agent can iteratively adjust plans based on intermediate tool results.
  • Safety protocols will need explicit design because the agent operates without constant human oversight on real chemical reactions.

Load-bearing premise

The base large language model can reliably interpret tool outputs and avoid hallucinating invalid chemistry when it builds multi-step plans.

What would settle it

Laboratory execution of a synthesis plan produced by the agent yields no product, the wrong product, or an unsafe outcome without any human correction or filtering.

Original abstract

Over the last decades, excellent computational chemistry tools have been developed. Integrating them into a single platform with enhanced accessibility could help reaching their full potential by overcoming steep learning curves. Recently, large-language models (LLMs) have shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 18 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our agent autonomously planned and executed the syntheses of an insect repellent, three organocatalysts, and guided the discovery of a novel chromophore. Our evaluation, including both LLM and expert assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and Chemcrow's performance. Our work not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces ChemCrow, an LLM-based agent augmented with 18 expert-designed chemistry tools to address tasks in organic synthesis, drug discovery, and materials design. It reports that the agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts, and guided the discovery of a novel chromophore, with supporting evaluations from both LLMs and human experts. The work also claims that GPT-4 evaluators cannot distinguish ChemCrow outputs from clearly incorrect GPT-4 completions.

Significance. If the autonomy and reliability claims are substantiated, the integration of multiple domain-specific tools into a single agent framework would represent a practical advance in making computational chemistry tools more accessible and bridging experimental and computational workflows. The reported case studies provide concrete demonstrations of multi-step planning, which could lower barriers for non-experts while aiding experts.

major comments (2)
  1. [Results] Results (autonomous synthesis cases): The central claim of autonomous planning and execution requires evidence that the reported successful trajectories (insect repellent, organocatalysts, chromophore) were produced without human filtering, correction, or post-hoc selection of paths. The manuscript supplies neither the complete tool-call traces nor an explicit statement confirming absence of human intervention at any step; this directly affects the strength of the autonomy assertion.
  2. [Evaluation] Evaluation section: The claim that GPT-4 cannot distinguish wrong GPT-4 completions from ChemCrow outputs is presented as surprising but lacks sufficient protocol detail, including how the 'clearly wrong' completions were constructed, the exact evaluator prompt, and any error analysis or inter-rater statistics. This weakens support for the evaluation rigor.
minor comments (2)
  1. [Abstract] Abstract and introduction: The phrase 'autonomously planned and executed' should be qualified with a brief note on the scope of human oversight in tool design and result validation to avoid overstatement.
  2. [Methods] Methods: A table listing the 18 tools with brief descriptions and input/output formats would improve reproducibility and clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which highlights important aspects for strengthening the clarity and rigor of our claims regarding autonomy and evaluation. We have carefully considered each major comment and provide point-by-point responses below, along with our plans for revision.

  1. Referee: [Results] Results (autonomous synthesis cases): The central claim of autonomous planning and execution requires evidence that the reported successful trajectories (insect repellent, organocatalysts, chromophore) were produced without human filtering, correction, or post-hoc selection of paths. The manuscript supplies neither the complete tool-call traces nor an explicit statement confirming absence of human intervention at any step; this directly affects the strength of the autonomy assertion.

    Authors: We agree that explicit documentation is necessary to fully substantiate the autonomy claims. In the revised manuscript, we will add a clear statement in the Results section confirming that the reported trajectories were generated without human filtering, correction, or post-hoc selection of paths. We will also include the complete tool-call traces and interaction logs for all three case studies (insect repellent, organocatalysts, and chromophore) as supplementary material, allowing readers to directly inspect the autonomous execution process. revision: yes

  2. Referee: [Evaluation] Evaluation section: The claim that GPT-4 cannot distinguish wrong GPT-4 completions from ChemCrow outputs is presented as surprising but lacks sufficient protocol detail, including how the 'clearly wrong' completions were constructed, the exact evaluator prompt, and any error analysis or inter-rater statistics. This weakens support for the evaluation rigor.

    Authors: We acknowledge that additional protocol details are required to support the evaluation claims rigorously. In the revised manuscript, we will expand the Evaluation section to provide: (i) a precise description of how the 'clearly wrong' GPT-4 completions were generated, (ii) the exact prompt template used for the GPT-4 evaluator, and (iii) any available error analysis along with inter-rater statistics where applicable. These additions will improve reproducibility and strengthen the interpretation of the results. revision: yes
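
One inter-rater statistic that could be reported for the evaluator comparison is Cohen's kappa, which corrects raw agreement between two raters for the agreement expected by chance. The sketch below is illustrative only: the choice of kappa is an assumption (the source does not say which statistic would be used), and the labels are made-up placeholders, not data from the paper.

```python
from collections import Counter
from typing import Sequence

def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)

if __name__ == "__main__":
    # Made-up labels for four tasks: which answer each rater preferred.
    llm_evaluator = ["A", "A", "B", "B"]
    human_expert = ["A", "B", "B", "B"]
    print(cohen_kappa(llm_evaluator, human_expert))
```

A value near 1 means the two raters agree well beyond chance, and a value near 0 means their agreement is about what chance alone would produce.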

Circularity Check

0 steps flagged

No circularity: empirical agent evaluation with no derivation chain

The manuscript presents an LLM agent (ChemCrow) augmented by 18 chemistry tools and evaluates it via case studies of autonomous synthesis planning and execution. No mathematical derivation, equations, or parameter-fitting procedure exists whose outputs reduce by construction to the inputs. Claims rest on reported experimental trajectories and expert/LLM assessments rather than self-definitional loops, fitted-input predictions, or load-bearing self-citations. The work is therefore self-contained as an empirical demonstration.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the existing capabilities of large language models and pre-built chemistry tools; no new free parameters, axioms beyond standard LLM assumptions, or invented entities are introduced.

axioms (1)
  • domain assumption: Large language models can follow multi-step instructions and correctly invoke external tools when given appropriate prompts.
    The entire agent architecture depends on this capability of the base model.

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Can Agents Price a Reaction? Evaluating LLMs on Chemical Cost Reasoning

    cs.AI 2026-05 unverdicted novelty 7.0

    LLM agents reach only 50.6% accuracy on chemical cost estimation within 25% error even with tools, dropping with noise due to parsing, pack selection, and tool-use failures.

  2. AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents

    physics.flu-dyn 2026-05 conditional novelty 7.0

    AI CFD Scientist autonomously discovers a Spalart-Allmaras runtime correction reducing lower-wall Cf RMSE by 7.89% on the periodic hill at Reh=5600 while using a vision-language gate to detect 14 of 16 silent failures...

  3. AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents

    physics.flu-dyn 2026-05 conditional novelty 7.0

    AI CFD Scientist autonomously finds a Spalart-Allmaras turbulence correction that lowers wall-friction error by 7.89% versus DNS on the periodic hill case using vision-language physics verification.

  4. AstroAlertBench: Evaluating the Accuracy, Reasoning, and Honesty of Multimodal LLMs in Astronomical Classification

    astro-ph.IM 2026-05 unverdicted novelty 7.0

    AstroAlertBench evaluates multimodal LLMs on astronomical classification accuracy, reasoning, and honesty using real ZTF alerts, revealing that high accuracy often diverges from self-assessed reasoning quality.

  5. SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources

    cs.AI 2026-04 unverdicted novelty 7.0

    SkillFoundry mines heterogeneous scientific resources into a self-evolving library of validated agent skills, with 71.1% novelty versus prior libraries and measurable gains on coding benchmarks plus two genomics tasks.

  6. The limits of bio-molecular modeling with large language models : a cross-scale evaluation

    cs.LG 2026-04 unverdicted novelty 7.0

    LLMs perform adequately on bio-molecular classification tasks but remain weak on regression, with hybrid architectures outperforming others on long sequences and fine-tuning hurting generalization.

  7. Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

    cs.CL 2025-11 unverdicted novelty 7.0

    Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.

  8. ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery

    cs.LG 2026-05 unverdicted novelty 6.0

    ToolMol integrates evolutionary algorithms with agentic LLMs and precise RDKit tools to optimize multi-objective drug properties, yielding ligands with over 10% better predicted binding affinity and 35% gains in absol...

  9. Towards a Virtual Neuroscientist: Autonomous Neuroimaging Analysis via Multi-Agent Collaboration

    cs.AI 2026-05 unverdicted novelty 6.0

    NIAgent uses code-centric multi-agent collaboration and hierarchical verification to build adaptive neuroimaging pipelines that outperform static baselines on ADHD-200 and ADNI data.

  10. ADKO: Agentic Decentralized Knowledge Optimization

    cs.LG 2026-05 unverdicted novelty 6.0

    ADKO is a decentralized framework where agents share compact GP-derived tokens and LM insights to achieve collaborative Bayesian optimization with a decomposed regret bound that includes compression and approximation losses.

  11. FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution

    cs.LG 2026-05 unverdicted novelty 6.0

    FAME models scientific topic trajectories in continuous time to forecast paper impact more accurately than LLMs by aligning manuscripts with field momentum in a dynamic latent space.

  12. AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents

    physics.flu-dyn 2026-05 unverdicted novelty 6.0

    An integrated AI agent framework for CFD uses vision-based physics gates to autonomously discover a Spalart-Allmaras runtime correction that cuts lower-wall skin-friction error by 7.89% versus DNS on the periodic hill...

  13. YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts

    cs.LG 2026-05 unverdicted novelty 6.0

    YOTOnet achieves improved zero-shot cross-domain fault diagnosis on bearing datasets by combining a physics-aware invariant feature distiller with domain-conditioned sparse experts, showing performance scaling as more...

  14. AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

    cs.LG 2024-10 accept novelty 6.0

    AgentHarm benchmark shows leading LLMs comply with malicious agent requests and simple jailbreaks enable coherent harmful multi-step execution while retaining capabilities.

  15. A Survey on Large Language Model based Autonomous Agents

    cs.AI 2023-08 accept novelty 6.0

    A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future di...

  16. ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models

    cs.CL 2023-05 conditional novelty 6.0

    ReWOO decouples reasoning from tool observations in augmented language models, delivering 5x token efficiency and 4% higher accuracy on multi-step reasoning benchmarks like HotpotQA.

  17. ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery

    cs.LG 2026-05 unverdicted novelty 5.0

    ToolMol is an evolutionary agentic framework that pairs multi-objective genetic algorithms with LLM tool-calling to generate drug-like ligands with over 10% better predicted binding affinity and 35% better ABFE scores...

  18. The HTC-Claw: Automating Discovery through High-Throughput Computational Campaigns

    cond-mat.mtrl-sci 2026-04 unverdicted novelty 5.0

    HTC-Claw is a new intelligent high-throughput computing platform that decomposes research goals into adaptive task workflows for automated materials discovery.

  19. EconAI: Dynamic Persona Evolution and Memory-Aware Agents in Evolving Economic Environments

    cs.MA 2026-05 unverdicted novelty 4.0

    EconAI adds memory weighting and economic sentiment indexing to LLM agents so they adapt short-term actions to long-term goals inside a single macro/micro simulation loop.

  20. Bridging Perception and Action: A Lightweight Multimodal Meta-Planner Framework for Robust Earth Observation Agents

    cs.MA 2026-05 unverdicted novelty 4.0

    The LMMP framework improves tool-calling accuracy and task success rates for Earth observation agents by grounding plans in multimodal features and remote sensing expert knowledge via a two-stage training process.

  21. A Scoping Review of Large Language Model-Based Pedagogical Agents

    cs.AI 2026-04 unverdicted novelty 4.0

    A scoping review of 52 studies maps four design dimensions for LLM-based pedagogical agents and notes trends such as multi-agent systems and ethical issues.

  22. Materials Informatics Across the Length Scales

    cond-mat.mtrl-sci 2026-04 unverdicted novelty 2.0

    A survey of data-driven methods for materials modeling at nanoscale, mesoscale, and micro-to-continuum scales that identifies established capabilities, data quality issues, and obstacles to cross-scale integration.

Reference graph

Works this paper leans on

118 extracted references · 118 canonical work pages · cited by 19 Pith papers · 12 internal anchors

  1. [1]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transform- ers for language understanding. arXiv preprint arXiv:1810.04805 2018,

  2. [2]

    D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A., et al

    Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J. D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A., et al. Language models are few-shot learners.Advances in neural information processing systems 2020, 33, 1877–1901

  3. [3]

    On the Opportunities and Risks of Foundation Models

    Bommasani, R.; Hudson, D. A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M. S.; Bohg, J.; Bosselut, A.; Brunskill, E., et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 2021,

  4. [4]

    PaLM: Scaling Language Modeling with Pathways

    Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H. W.; Sutton, C.; Gehrmann, S., et al. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311 2022,

  5. [5]

    Sparks of Artificial General Intelligence: Early experiments with GPT-4

    Bubeck, S.; Chandrasekaran, V .; Eldan, R.; Gehrke, J.; Horvitz, E.; Kamar, E.; Lee, P.; Lee, Y . T.; Li, Y .; Lundberg, S., et al. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712 2023,

  6. [6]

    https://copilot.github.com

    GitHub Copilot: Your AI pair programmer. https://copilot.github.com

  7. [7]

    Li, R. et al. StarCoder: may the source be with you! 2023

  8. [8]

    A.; Rice, A.; Rifkin, D.; Simister, S.; Sittampalam, G.; Aftandilian, E

    Ziegler, A.; Kalliamvakou, E.; Li, X. A.; Rice, A.; Rifkin, D.; Simister, S.; Sittampalam, G.; Aftandilian, E. Productivity assessment of neural code completion. 2022, 21–29

  9. [9]

    N.; Kaiser, Ł.; Polo- sukhin, I

    Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; Polo- sukhin, I. Attention is all you need. Advances in neural information processing systems 2017, 30

  10. [10]

    Toolformer: Language Models Can Teach Themselves to Use Tools

    Schick, T.; Dwivedi-Yu, J.; Dessì, R.; Raileanu, R.; Lomeli, M.; Zettlemoyer, L.; Cancedda, N.; Scialom, T. Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761 2023,

  11. [11]

    M.; Pimentel, A

    Castro Nascimento, C. M.; Pimentel, A. S. Do Large Language Models Understand Chemistry? A Conversation with ChatGPT. Journal of Chemical Information and Modeling 2023, 63, 1649–1655

  12. [12]

    OpenAI, GPT-4 Technical Report. 2023

  13. [13]

    Training language models to follow instructions with human feedback

    Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A., et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 2022, 35, 27730–27744

  14. [14]

    D.; Hocky, G

    White, A. D.; Hocky, G. M.; Gandhi, H. A.; Ansari, M.; Cox, S.; Wellawatte, G. P.; Sasmal, S.; Yang, Z.; Liu, K.; Singh, Y ., et al. Assessment of chemistry knowledge in large language models that generate code. Digital Discovery 2023,

  15. [15]

    M.; Corbett, P

    Lowe, D. M.; Corbett, P. T.; Murray-Rust, P.; Glen, R. C. Chemical Name to Structure: OPSIN, an Open Source Solution. Journal of Chemical Information and Modeling 2011, 51, 739–753, PMID: 21384929

  16. [16]

    W.; Barzilay, R.; Jaakkola, T

    Coley, C. W.; Barzilay, R.; Jaakkola, T. S.; Green, W. H.; Jensen, K. F. Prediction of organic reaction outcomes using machine learning. ACS central science 2017, 3, 434–443. 12

  17. [17]

    W.; Jin, W.; Rogers, L.; Jamison, T

    Coley, C. W.; Jin, W.; Rogers, L.; Jamison, T. F.; Jaakkola, T. S.; Green, W. H.; Barzilay, R.; Jensen, K. F. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 2019, 10, 370–377

  18. [18]

    A.; Bekas, C.; Lee, A

    Schwaller, P.; Laino, T.; Gaudin, T.; Bolgar, P.; Hunter, C. A.; Bekas, C.; Lee, A. A. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS central science 2019, 5, 1572–1583

  19. [19]

    Transfer learning enables the molecular transformer to predict regio-and stereoselective reactions on carbohydrates

    Pesciullesi, G.; Schwaller, P.; Laino, T.; Reymond, J.-L. Transfer learning enables the molecular transformer to predict regio-and stereoselective reactions on carbohydrates. Nat. Commun. 2020, 11, 1–8

  20. [20]

    Irwin, R.; Dimitriadis, S.; He, J.; Bjerrum, E. J. Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 2022, 3, 015022

  21. [21]

    P.; Klucznik, T.; Molga, K.; Dittwald, P.; Startek, M.; Bajczyk, M.; Grzybowski, B

    Szymku´c, S.; Gajewska, E. P.; Klucznik, T.; Molga, K.; Dittwald, P.; Startek, M.; Bajczyk, M.; Grzybowski, B. A. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. - Int. Ed. 2016, 55, 5904–5937

  22. [22]

    H.; Preuss, M.; Waller, M

    Segler, M. H.; Preuss, M.; Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 2018, 555, 604–610

  23. [23]

    W.; Thomas, D

    Coley, C. W.; Thomas, D. A.; Lummiss, J. A.; Jaworski, J. N.; Breen, C. P.; Schultz, V .; Hart, T.; Fishman, J. S.; Rogers, L.; Gao, H., et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 2019, 365

  24. [24]

    H.; Haeuselmann, R

    Schwaller, P.; Petraglia, R.; Zullo, V .; Nair, V . H.; Haeuselmann, R. A.; Pisoni, R.; Bekas, C.; Iuliano, A.; Laino, T. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chemical science 2020, 11, 3316–3325

  25. [25]

    AiZyn- thFinder: a fast, robust and flexible open-source software for retrosynthetic planning

    Genheden, S.; Thakkar, A.; Chadimová, V .; Reymond, J.-L.; Engkvist, O.; Bjerrum, E. AiZyn- thFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J. Cheminf. 2020, 12, 1–9

  26. [26]

    Molga, K.; Szymku´c, S.; Grzybowski, B. A. Chemist Ex Machina: Advanced Synthesis Planning by Computers. Acc. Chem. Res. 2021, 54, 1094–1106

  27. [27]

    C.; Laplaza, R.; Bunne, C.; Krause, A.; Corminboeuf, C.; Laino, T

    Schwaller, P.; Vaucher, A. C.; Laplaza, R.; Bunne, C.; Krause, A.; Corminboeuf, C.; Laino, T. Machine intelligence for chemical reaction space. Wiley Interdisciplinary Reviews: Computational Molecular Science 2022, 12, e1604

  28. [28]

    DeepTox: toxicity prediction using deep learning

    Mayr, A.; Klambauer, G.; Unterthiner, T.; Hochreiter, S. DeepTox: toxicity prediction using deep learning. Frontiers in Environmental Science2016, 3, 80

  29. [29]

    Analyzing learned molecular representations for property prediction

    Yang, K.; Swanson, K.; Jin, W.; Coley, C.; Eiden, P.; Gao, H.; Guzman-Perez, A.; Hopper, T.; Kelley, B.; Mathea, M., et al. Analyzing learned molecular representations for property prediction. Journal of chemical information and modeling 2019, 59, 3370–3388

  30. [30]

    arXiv preprint arXiv:2010.09885 (2020)

    Chithrananda, S.; Grand, G.; Ramsundar, B. Chemberta: Large-scale self-supervised pretraining for molecular property prediction. arXiv preprint arXiv:2010.09885 2020,

  31. [31]

    Exposing the limitations of molecular machine learning with activity cliffs

    van Tilborg, D.; Alenicheva, A.; Grisoni, F. Exposing the limitations of molecular machine learning with activity cliffs. Journal of Chemical Information and Modeling 2022, 62, 5938–5951

  32. [32]

    M.; Schwaller, P.; Ortega-Guerrero, A.; Smit, B

    Jablonka, K. M.; Schwaller, P.; Ortega-Guerrero, A.; Smit, B. Is GPT-3 all you need for low-data discovery in chemistry? 2023,

  33. [33]

    N.; Duvenaud, D.; Hernández-Lobato, J

    Gómez-Bombarelli, R.; Wei, J. N.; Duvenaud, D.; Hernández-Lobato, J. M.; Sánchez-Lengeling, B.; Sheberla, D.; Aguilera-Iparraguirre, J.; Hirzel, T. D.; Adams, R. P.; Aspuru-Guzik, A. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Cent. Sci. 2018, 4, 268–276, PMID: 29532027

  34. [34]

    REINVENT 2.0: an AI tool for de novo drug design

    Blaschke, T.; Arús-Pous, J.; Chen, H.; Margreitter, C.; Tyrchan, C.; Engkvist, O.; Papadopoulos, K.; Patronov, A. REINVENT 2.0: an AI tool for de novo drug design. Journal of chemical information and modeling 2020, 60, 5918–5922. 13

  35. [35]

    Machine learning for perovskite materials design and discovery.npj Computational Materials 2021, 7, 1–18, Number: 1 Publisher: Nature Publishing Group

    Tao, Q.; Xu, P.; Li, M.; Lu, W. Machine learning for perovskite materials design and discovery.npj Computational Materials 2021, 7, 1–18, Number: 1 Publisher: Nature Publishing Group

  36. [36]

    Gómez-Bombarelli, R. et al. Design of efficient molecular organic light-emitting diodes by a high- throughput virtual screening and experimental approach. Nature Materials 2016, 15, 1120–1127, Number: 10 Publisher: Nature Publishing Group

  37. [37]

    J.; Stevens, J.; Li, J.; Parasram, M.; Damani, F.; Alvarado, J

    Shields, B. J.; Stevens, J.; Li, J.; Parasram, M.; Damani, F.; Alvarado, J. I. M.; Janey, J. M.; Adams, R. P.; Doyle, A. G. Bayesian reaction optimization as a tool for chemical synthesis. Nature 2021, 590, 89–96

  38. [38]

    Torres, J. A. G.; Lau, S. H.; Anchuri, P.; Stevens, J. M.; Tabora, J. E.; Li, J.; Borovika, A.; Adams, R. P.; Doyle, A. G. A Multi-Objective Active Learning Platform and Web App for Reaction Optimization. Journal of the American Chemical Society 2022, 144, 19999–20007

  39. [39]

    C.; Michtavy, S

    Ramos, M. C.; Michtavy, S. S.; Porosoff, M. D.; White, A. D. Bayesian Optimization of Catalysts With In-context Learning. arXiv preprint arXiv:2304.05341 2023,

  40. [40]

    Integrating learning and reasoning with deep logic models

    Marra, G.; Giannini, F.; Diligenti, M.; Gori, M. Integrating learning and reasoning with deep logic models. 2020, 517–532

  41. [41]

    Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

    Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Chi, E.; Le, Q.; Zhou, D. Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903 2022,

  42. [42]

    Large Language Models Are Reasoning Teachers.arXiv preprint arXiv:2212.10071 2022,

    Ho, N.; Schmid, L.; Yun, S.-Y . Large Language Models Are Reasoning Teachers.arXiv preprint arXiv:2212.10071 2022,

  43. [43]

    ReAct: Synergizing Reasoning and Acting in Language Models

    Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.; Cao, Y . React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629 2022,

  44. [44]

    Star: Bootstrapping reasoning with reasoning.Advances in Neural Information Processing Systems 2022, 35, 15476–15488

    Zelikman, E.; Wu, Y .; Mu, J.; Goodman, N. Star: Bootstrapping reasoning with reasoning.Advances in Neural Information Processing Systems 2022, 35, 15476–15488

  45. [45]

    Limitations of machine learning models when predicting compounds with completely new chemistries: possible improvements applied to the discovery of new non-fullerene acceptors

    Zhao, Z.-W.; del Cueto, M.; Troisi, A. Limitations of machine learning models when predicting compounds with completely new chemistries: possible improvements applied to the discovery of new non-fullerene acceptors. Digital Discovery 2022, 1, 266–276

  46. [46]

    C.; Schwaller, P.; Geluykens, J.; Nair, V

    Vaucher, A. C.; Schwaller, P.; Geluykens, J.; Nair, V . H.; Iuliano, A.; Laino, T. Inferring experimental procedures from text-based representations of chemical reactions. Nature communications 2021, 12, 2573

  47. [47]

    C.; Nair, V

    Schwaller, P.; Probst, D.; Vaucher, A. C.; Nair, V . H.; Kreutter, D.; Laino, T.; Reymond, J.-L. Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 2021, 3, 144–152

  48. [48]

    https://github.com/rxn4chemistry/rxn4chemistry, 2020; Accessed: April 2023

    rxn4Chemistry, rxn4Chemistry. https://github.com/rxn4chemistry/rxn4chemistry, 2020; Accessed: April 2023

  49. [49]

    Thakkar, A.; Kogej, T.; Reymond, J.-L.; Engkvist, O.; Bjerrum, E. J. Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain. Chemical science 2020, 11, 154–168

  50. [50]

    “Ring breaker”: neural network driven synthesis prediction of the ring system chemical space

    Thakkar, A.; Selmi, N.; Reymond, J.-L.; Engkvist, O.; Bjerrum, E. J. “Ring breaker”: neural network driven synthesis prediction of the ring system chemical space. Journal of medicinal chemistry 2020, 63, 8791–8808

  51. [51]

    MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

    Yang, Z.; Li, L.; Wang, J.; Lin, K.; Azarnasab, E.; Ahmed, F.; Liu, Z.; Liu, C.; Zeng, M.; Wang, L. MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action. arXiv preprint arXiv:2303.11381 2023

  52. [52]

    HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace

    Shen, Y.; Song, K.; Tan, X.; Li, D.; Lu, W.; Zhuang, Y. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace. 2023

  53. [53]

    MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning

    Karpas, E.; Abend, O.; Belinkov, Y.; Lenz, B.; Lieber, O.; Ratner, N.; Shoham, Y.; Bata, H.; Levine, Y.; Leyton-Brown, K., et al. MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning. arXiv preprint arXiv:2205.00445 2022

  54. [54]

    Emergent autonomous scientific research capabilities of large language models

    Boiko, D. A.; MacKnight, R.; Gomes, G. Emergent autonomous scientific research capabilities of large language models. arXiv preprint 2023

  55. [55]

    IBM RoboRXN

    IBM RoboRXN | Science | IBM Research. https://research.ibm.com/science/ibm-roborxn/ [Accessed 12-May-2023]

  56. [56]

    Wittkopp, A.; Schreiner, P. R. Metal-Free, Noncovalent Catalysis of Diels–Alder Reactions by Neutral Hydrogen Bond Donors in Organic Solvents and in Water. Chemistry – A European Journal 2003, 9, 407–414

  57. [57]

    H-Bonding Additives Act Like Lewis Acid Catalysts

    Schreiner, P. R.; Wittkopp, A. H-Bonding Additives Act Like Lewis Acid Catalysts. Organic Letters 2002, 4, 217–220, Publisher: American Chemical Society

  58. [58]

    Catalytic Enantioselective Friedel–Crafts Alkylation of Indoles with Nitroalkenes by Using a Simple Thiourea Organocatalyst

    Herrera, R. P.; Sgarzani, V.; Bernardi, L.; Ricci, A. Catalytic Enantioselective Friedel–Crafts Alkylation of Indoles with Nitroalkenes by Using a Simple Thiourea Organocatalyst. Angewandte Chemie International Edition 2005, 44, 6576–6579

  59. [59]

    Enantioselective Michael Reaction of Malonates to Nitroolefins Catalyzed by Bifunctional Organocatalysts

    Okino, T.; Hoashi, Y.; Takemoto, Y. Enantioselective Michael Reaction of Malonates to Nitroolefins Catalyzed by Bifunctional Organocatalysts. Journal of the American Chemical Society 2003, 125, 12672–12673, Publisher: American Chemical Society

  60. [60]

    Lowe, D. M. Extraction of chemical structures and reactions from the literature. Ph.D. thesis, University of Cambridge, 2012

  61. [61]

    MoleculeNet: a benchmark for molecular machine learning

    Wu, Z.; Ramsundar, B.; Feinberg, E. N.; Gomes, J.; Geniesse, C.; Pappu, A. S.; Leswing, K.; Pande, V. MoleculeNet: a benchmark for molecular machine learning. Chemical science 2018, 9, 513–530

  62. [62]

    G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

    Liu, Y.; Iter, D.; Xu, Y.; Wang, S.; Xu, R.; Zhu, C. GPTEval: NLG Evaluation using GPT-4 with Better Human Alignment. arXiv preprint arXiv:2303.16634 2023

  63. [63]

    GPTs are GPTs: An early look at the labor market impact potential of large language models

    Eloundou, T.; Manning, S.; Mishkin, P.; Rock, D. GPTs are GPTs: An early look at the labor market impact potential of large language models. arXiv preprint arXiv:2303.10130 2023

  64. [64]

    Network search algorithms and scoring functions for advanced-level computerized synthesis planning

    Grzybowski, B. A.; Badowski, T.; Molga, K.; Szymkuć, S. Network search algorithms and scoring functions for advanced-level computerized synthesis planning. WIREs Computational Molecular Science 2023, 13, e1630

  65. [65]

    Artificial intelligence and automation in computer aided synthesis planning

    Thakkar, A.; Johansson, S.; Jorner, K.; Buttar, D.; Reymond, J.-L.; Engkvist, O. Artificial intelligence and automation in computer aided synthesis planning. Reaction chemistry & engineering 2021, 6, 27–51

  66. [66]

    Dual use of artificial-intelligence-powered drug discovery

    Urbina, F.; Lentzos, F.; Invernizzi, C.; Ekins, S. Dual use of artificial-intelligence-powered drug discovery. Nature Machine Intelligence 2022, 4, 189–191

  67. [67]

    A teachable moment for dual-use

    Urbina, F.; Lentzos, F.; Invernizzi, C.; Ekins, S. A teachable moment for dual-use. Nature machine intelligence 2022, 4, 607–607

  68. [68]

    Censoring chemical data to mitigate dual use risk

    Campbell, Q. L.; Herington, J.; White, A. D. Censoring chemical data to mitigate dual use risk. arXiv preprint arXiv:2304.10510 2023

  69. [69]

    Scaling Laws for Reward Model Overoptimization

    Gao, L.; Schulman, J.; Hilton, J. Scaling Laws for Reward Model Overoptimization. arXiv preprint arXiv:2210.10760 2022

  70. [70]

    Improving language understanding by generative pre-training

    Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I., et al. Improving language understanding by generative pre-training. 2018

  71. [71]

    Trustworthy AI: From Principles to Practices

    Li, B.; Qi, P.; Liu, B.; Di, S.; Liu, J.; Pei, J.; Yi, J.; Zhou, B. Trustworthy AI: From Principles to Practices. ACM Computing Surveys 2021, 55, 1–46

  72. [72]

    Natural language processing models that automate programming will transform chemistry research and teaching

    Hocky, G. M.; White, A. D. Natural language processing models that automate programming will transform chemistry research and teaching. Digital Discovery 2022, 1, 79–83

  73. [73]

    Foundation Models and Fair Use

    Henderson, P.; Li, X.; Jurafsky, D.; Hashimoto, T.; Lemley, M. A.; Liang, P. Foundation Models and Fair Use. arXiv preprint arXiv:2303.15715 2023

  74. [74]

    The Role of Cooperation in Responsible AI Development

    Askell, A.; Brundage, M.; Hadfield, G. The Role of Cooperation in Responsible AI Development. 2019

  75. [75]

    Collective action on artificial intelligence: A primer and review

    Neufville, R. d.; Baum, S. D. Collective action on artificial intelligence: A primer and review. Technology in Society 2021, 66, 101649

  76. [76]

    LLaMA: Open and Efficient Foundation Language Models

    Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; Rodriguez, A.; Joulin, A.; Grave, E.; Lample, G. LLaMA: Open and Efficient Foundation Language Models. 2023

  77. [77]

    Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality

    Chiang, W.-L.; Li, Z.; Lin, Z.; Sheng, Y.; Wu, Z.; Zhang, H.; Zheng, L.; Zhuang, S.; Zhuang, Y.; Gonzalez, J. E.; Stoica, I.; Xing, E. P. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. 2023; https://lmsys.org/blog/2023-03-30-vicuna/

  78. [78]

    Orca: Progressive Learning from Complex Explanation Traces of GPT-4

    Mukherjee, S.; Mitra, A.; Jawahar, G.; Agarwal, S.; Palangi, H.; Awadallah, A. Orca: Progressive Learning from Complex Explanation Traces of GPT-4. 2023

  79. [79]

    LangChain

    Chase, H. LangChain. 2022; https://github.com/hwchase17/langchain

  80. [80]

    Measuring and Narrowing the Compositionality Gap in Language Models

    Press, O.; Zhang, M.; Min, S.; Schmidt, L.; Smith, N. A.; Lewis, M. Measuring and Narrowing the Compositionality Gap in Language Models. arXiv preprint arXiv:2210.03350 2022

Showing first 80 references.