arxiv: 2304.05376 · v5 · submitted 2023-04-11 · ⚛️ physics.chem-ph · stat.ML

ChemCrow: Augmenting large-language models with chemistry tools

Andres M Bran , Sam Cox , Oliver Schilter , Carlo Baldassari , Andrew D White , Philippe Schwaller

Pith reviewed 2026-05-15 19:01 UTC · model grok-4.3

keywords: large language models · chemistry tools · autonomous synthesis · organocatalysts · chromophore · drug discovery · materials design · LLM agents

The pith

An LLM agent augmented with 18 chemistry tools autonomously plans and executes real syntheses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ChemCrow, a system that connects a large language model to 18 specialized chemistry tools so the model can handle organic synthesis, drug discovery, and materials design tasks. It demonstrates that the resulting agent can independently plan and carry out the syntheses of an insect repellent and three organocatalysts, and that it helped identify a new chromophore. A sympathetic reader would care because this approach lowers the steep learning curve that keeps many computational tools out of reach and lets non-experts carry out laboratory-relevant tasks with less hands-on supervision. The evaluation combines LLM-based assessment with expert review and reports success across varied chemical workflows. The work also notes that GPT-4, used as an evaluator, cannot distinguish clearly wrong GPT-4 completions from ChemCrow's outputs, which highlights the limits of LLM-based evaluation for chemistry.

Core claim

By integrating 18 expert-designed tools, ChemCrow augments large-language-model performance in chemistry so that new capabilities emerge. The agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts, and guided the discovery of a novel chromophore. Evaluations by both an LLM and human experts support its effectiveness across a diverse set of chemical tasks. The system overcomes the limitations of standalone language models by giving them access to external knowledge sources and specialized functions, thereby bridging experimental and computational chemistry.

What carries the argument

The set of 18 expert-designed chemistry tools that the large language model can invoke to retrieve data, run calculations, and guide experimental steps, turning the model into an agent that produces and follows multi-step plans.
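The mechanism in this paragraph is a loop in which the model alternates between choosing a tool and reading its output. The sketch below is a minimal, hypothetical version of such a ReAct-style loop (reasoning interleaved with tool calls) in Python. The two tools (`Name2SMILES`, `HeavyAtoms`) and the scripted policy that stands in for the LLM are illustrative assumptions, not ChemCrow's actual tools, prompts, or code.

```python
from typing import Callable, Dict, List, Tuple

# A tool maps a text input to a text observation.
Tool = Callable[[str], str]
# The "LLM" reads the transcript so far and returns (action, argument).
Policy = Callable[[List[str]], Tuple[str, str]]


def name_to_smiles(name: str) -> str:
    # Hypothetical stand-in for a name-to-structure tool; it only knows one molecule.
    lookup = {"caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"}
    return lookup.get(name.strip().lower(), "unknown")


def heavy_atom_count(smiles: str) -> str:
    # Crude toy: counts uppercase letters, so it ignores aromatic lowercase atoms
    # and miscounts two-letter elements such as Cl. A real tool would use a cheminformatics library.
    return str(sum(1 for ch in smiles if ch.isupper()))


TOOLS: Dict[str, Tool] = {"Name2SMILES": name_to_smiles, "HeavyAtoms": heavy_atom_count}


def run_agent(policy: Policy, tools: Dict[str, Tool], task: str,
              max_steps: int = 5) -> Tuple[str, List[str]]:
    """ReAct-style loop: the policy picks a tool or a final answer; tool outputs are fed back."""
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        action, arg = policy(transcript)
        if action == "Final Answer":
            transcript.append(f"Final Answer: {arg}")
            return arg, transcript
        if action in tools:
            observation = tools[action](arg)
        else:
            # Report the bad call back to the model instead of crashing, so it can recover.
            observation = f"Error: unknown tool {action!r}"
        transcript.append(f"Action: {action}[{arg}] -> Observation: {observation}")
    return "stopped: step budget exhausted", transcript


def scripted_policy(transcript: List[str]) -> Tuple[str, str]:
    # Stand-in for the language model: a fixed two-tool plan, chosen by step count.
    last_obs = transcript[-1].split("Observation: ")[-1]
    step = len(transcript) - 1
    if step == 0:
        return "Name2SMILES", "caffeine"
    if step == 1:
        return "HeavyAtoms", last_obs
    return "Final Answer", last_obs


answer, log = run_agent(scripted_policy, TOOLS, "How many heavy atoms does caffeine have?")
print(answer)  # 14 (caffeine is C8H10N4O2: 8 + 4 + 2 heavy atoms)
```

In a real agent, `scripted_policy` would be replaced by a prompt to the language model plus parsing of its reply. Turning unknown-tool errors into observations, rather than exceptions, gives the model a chance to correct itself, which is exactly where the load-bearing premise below (reliable interpretation of tool outputs) is tested.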

If this is right

Routine chemical planning and execution become automated across synthesis, discovery, and design workflows.
Expert chemists receive assistance while non-experts gain access to previously inaccessible capabilities.
The gap between computational predictions and actual laboratory experiments narrows.
Scientific progress accelerates because tool-augmented agents handle tasks that once required extensive manual coordination.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pattern of tool integration could be applied to adjacent domains such as biology or materials science to create parallel autonomous systems.
Longer-horizon experiments become feasible if the agent can iteratively adjust plans based on intermediate tool results.
Safety protocols will need explicit design because the agent operates without constant human oversight on real chemical reactions.

Load-bearing premise

The base large language model can reliably interpret tool outputs and avoid hallucinating invalid chemistry when it builds multi-step plans.

What would settle it

Laboratory execution of a synthesis plan produced by the agent yields no product, the wrong product, or an unsafe outcome without any human correction or filtering.

Original abstract

Over the last decades, excellent computational chemistry tools have been developed. Integrating them into a single platform with enhanced accessibility could help them reach their full potential by overcoming steep learning curves. Recently, large-language models (LLMs) have shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 18 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts, and guided the discovery of a novel chromophore. Our evaluation, including both LLM and expert assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and ChemCrow's performance. Our work not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

ChemCrow gives a concrete example of wiring 18 chemistry tools to an LLM agent and running it on real synthesis targets, but the autonomy and evaluation claims need tighter documentation to hold up fully.

ChemCrow is a working demonstration of an LLM agent equipped with chemistry tools that can plan syntheses for real targets. The authors report autonomous planning and execution of syntheses of an insect repellent and three organocatalysts, plus agent-guided discovery of a novel chromophore. The concrete new element is the combination of 18 tools with the agent framework, which shows how to bridge computational tools and experimental chemistry without requiring the model to internalize every detail.

What the paper does well is lay out a clear set of tools and show results on diverse tasks. The dual assessment by LLM and expert judges is a straightforward way to check the quality of the plans and outcomes. The system lowers the barrier for non-experts while giving experts a starting point for more complex work.

The soft spots are the autonomy claim and the surprising evaluation result. The paper needs to show that the successful trajectories involved no human edits or post-hoc selection; otherwise the evidence presented does not fully support the autonomy claim. Likewise, the finding that GPT-4 cannot distinguish wrong GPT-4 outputs from ChemCrow's needs a clearer description of the test setup to be convincing. Neither problem is fatal, but together they mean the central claims are only partially backed so far.

This paper is for readers in computational chemistry and AI agents who are interested in practical applications, especially those who want to replicate or extend tool-augmented systems in science. It deserves a serious referee because the implementation is specific and the results are tied to actual chemical outcomes. I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces ChemCrow, an LLM-based agent augmented with 18 expert-designed chemistry tools to address tasks in organic synthesis, drug discovery, and materials design. It reports that the agent autonomously planned and executed syntheses of an insect repellent, three organocatalysts, and guided discovery of a novel chromophore, with supporting evaluations from both LLMs and human experts. The work also claims that GPT-4 evaluators cannot distinguish ChemCrow outputs from clearly incorrect GPT-4 completions.

Significance. If the autonomy and reliability claims are substantiated, the integration of multiple domain-specific tools into a single agent framework would represent a practical advance in making computational chemistry tools more accessible and bridging experimental and computational workflows. The reported case studies provide concrete demonstrations of multi-step planning, which could lower barriers for non-experts while aiding experts.

major comments (2)

[Results] Results (autonomous synthesis cases): The central claim of autonomous planning and execution requires evidence that the reported successful trajectories (insect repellent, organocatalysts, chromophore) were produced without human filtering, correction, or post-hoc selection of paths. The manuscript supplies neither the complete tool-call traces nor an explicit statement confirming absence of human intervention at any step; this directly affects the strength of the autonomy assertion.
[Evaluation] Evaluation section: The claim that GPT-4 cannot distinguish wrong GPT-4 completions from ChemCrow outputs is presented as surprising but lacks sufficient protocol detail, including how the 'clearly wrong' completions were constructed, the exact evaluator prompt, and any error analysis or inter-rater statistics. This weakens support for the evaluation rigor.
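The first major comment asks for complete tool-call traces. As an illustration of what a machine-checkable trace could look like, here is a minimal Python sketch assuming a hypothetical JSON Lines log in which each step records who acted. The schema and field names (`step`, `actor`, `tool`, `input`, `output`) are assumptions, not a format the paper uses.

```python
import json
from typing import Dict, List

# Hypothetical JSON Lines schema: one object per step with
# "step" (0-based index), "actor" ("agent" or "human"), "tool", "input", "output".
CLEAN_TRACE = "\n".join([
    '{"step": 0, "actor": "agent", "tool": "Name2SMILES", "input": "caffeine", "output": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"}',
    '{"step": 1, "actor": "agent", "tool": "HeavyAtoms", "input": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "output": "14"}',
])


def load_trace(jsonl: str) -> List[Dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl.splitlines() if line.strip()]


def audit_trace(steps: List[Dict]) -> List[str]:
    """Return problems found; an empty list means no recorded intervention or gap."""
    problems = []
    for expected, step in enumerate(steps):
        if step.get("step") != expected:
            problems.append(f"expected step {expected}, found {step.get('step')} (gap or reordering)")
        if step.get("actor") != "agent":
            problems.append(f"step {step.get('step')}: actor is {step.get('actor')!r}")
    return problems


print(audit_trace(load_trace(CLEAN_TRACE)))  # []
```

A check like this only catches interventions that the log records. It cannot reveal post-hoc selection among runs; addressing that would require reporting every run attempted, for example with a run identifier and outcome for each.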

minor comments (2)

[Abstract] Abstract and introduction: The phrase 'autonomously planned and executed' should be qualified with a brief note on the scope of human oversight in tool design and result validation to avoid overstatement.
[Methods] Methods: A table listing the 18 tools with brief descriptions and input/output formats would improve reproducibility and clarity.
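The second minor comment asks for a table of the 18 tools with input and output formats. One way to keep such a table and the agent's tool descriptions from drifting apart is to generate both from a single registry. The Python sketch below assumes a hypothetical `ToolSpec` record and three made-up entries; it does not reproduce the paper's tool list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToolSpec:
    name: str           # identifier the agent uses when calling the tool
    description: str    # one-line purpose; the same text can go into the agent prompt
    input_format: str   # what the tool expects, e.g. "molecule name" or "SMILES"
    output_format: str  # what the tool returns


def render_tool_table(specs: List[ToolSpec]) -> str:
    """Render the registry as a pipe-delimited table, rejecting duplicate names."""
    names = [s.name for s in specs]
    if len(set(names)) != len(names):
        raise ValueError("duplicate tool names in registry")
    lines = ["| Tool | Description | Input | Output |", "|---|---|---|---|"]
    lines += [f"| {s.name} | {s.description} | {s.input_format} | {s.output_format} |" for s in specs]
    return "\n".join(lines)


# Illustrative entries only; these are not the paper's 18 tools.
EXAMPLE_REGISTRY = [
    ToolSpec("name_to_smiles", "Convert a molecule name to a structure", "molecule name", "SMILES string"),
    ToolSpec("reaction_predict", "Predict the product of a reaction", "reactant SMILES", "product SMILES"),
    ToolSpec("safety_check", "Flag controlled or hazardous substances", "SMILES string", "yes/no with reason"),
]

print(render_tool_table(EXAMPLE_REGISTRY))
```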

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which highlights important aspects for strengthening the clarity and rigor of our claims regarding autonomy and evaluation. We have carefully considered each major comment and provide point-by-point responses below, along with our plans for revision.

Point-by-point responses

Referee: [Results] Results (autonomous synthesis cases): The central claim of autonomous planning and execution requires evidence that the reported successful trajectories (insect repellent, organocatalysts, chromophore) were produced without human filtering, correction, or post-hoc selection of paths. The manuscript supplies neither the complete tool-call traces nor an explicit statement confirming absence of human intervention at any step; this directly affects the strength of the autonomy assertion.

Authors: We agree that explicit documentation is necessary to fully substantiate the autonomy claims. In the revised manuscript, we will add a clear statement in the Results section confirming that the reported trajectories were generated without human filtering, correction, or post-hoc selection of paths. We will also include the complete tool-call traces and interaction logs for all three case studies (insect repellent, organocatalysts, and chromophore) as supplementary material, allowing readers to directly inspect the autonomous execution process. revision: yes
Referee: [Evaluation] Evaluation section: The claim that GPT-4 cannot distinguish wrong GPT-4 completions from ChemCrow outputs is presented as surprising but lacks sufficient protocol detail, including how the 'clearly wrong' completions were constructed, the exact evaluator prompt, and any error analysis or inter-rater statistics. This weakens support for the evaluation rigor.

Authors: We acknowledge that additional protocol details are required to support the evaluation claims rigorously. In the revised manuscript, we will expand the Evaluation section to provide: (i) a precise description of how the 'clearly wrong' GPT-4 completions were generated, (ii) the exact prompt template used for the GPT-4 evaluator, and (iii) any available error analysis along with inter-rater statistics where applicable. These additions will improve reproducibility and strengthen the interpretation of the results. revision: yes
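The referee also asks for inter-rater statistics. Cohen's kappa is one standard chance-corrected agreement measure for two raters, for example a GPT-4 evaluator and a human expert grading the same completions. The sketch below is a generic Python implementation, not the paper's protocol; the pass/fail labels are invented for illustration and are not data from the paper.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if each rater labelled at random with their own label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label throughout")
    return (observed - expected) / (1.0 - expected)


# Invented grades for six completions.
llm_grades = ["pass", "pass", "fail", "fail", "pass", "fail"]
expert_grades = ["pass", "fail", "fail", "fail", "pass", "pass"]
print(round(cohen_kappa(llm_grades, expert_grades), 3))  # 0.333
```

Here the raters agree on 4 of 6 items (0.667), but with each using "pass" half the time, chance alone gives 0.5, so kappa is (0.667 - 0.5) / (1 - 0.5) = 0.333. Reporting kappa alongside raw agreement makes it easier to judge whether an LLM evaluator tracks expert judgment beyond chance.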

Circularity Check

0 steps flagged

No circularity: empirical agent evaluation with no derivation chain

The manuscript presents an LLM agent (ChemCrow) augmented by 18 chemistry tools and evaluates it via case studies of autonomous synthesis planning and execution. No mathematical derivation, equations, or parameter-fitting procedure exists whose outputs reduce by construction to the inputs. Claims rest on reported experimental trajectories and expert/LLM assessments rather than self-definitional loops, fitted-input predictions, or load-bearing self-citations. The work is therefore self-contained as an empirical demonstration.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the existing capabilities of large language models and pre-built chemistry tools; no new free parameters, axioms beyond standard LLM assumptions, or invented entities are introduced.

axioms (1)

Domain assumption: large language models can follow multi-step instructions and correctly invoke external tools when given appropriate prompts.
The entire agent architecture depends on this capability of the base model.

pith-pipeline@v0.9.0 · 5524 in / 1245 out tokens · 49531 ms · 2026-05-15T19:01:47.404607+00:00 · methodology

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Can Agents Price a Reaction? Evaluating LLMs on Chemical Cost Reasoning
cs.AI 2026-05 unverdicted novelty 7.0

LLM agents reach only 50.6% accuracy on chemical cost estimation within 25% error even with tools, dropping with noise due to parsing, pack selection, and tool-use failures.
AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents
physics.flu-dyn 2026-05 conditional novelty 7.0

AI CFD Scientist autonomously discovers a Spalart-Allmaras runtime correction reducing lower-wall Cf RMSE by 7.89% on the periodic hill at Reh=5600 while using a vision-language gate to detect 14 of 16 silent failures...
AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents
physics.flu-dyn 2026-05 conditional novelty 7.0

AI CFD Scientist autonomously finds a Spalart-Allmaras turbulence correction that lowers wall-friction error by 7.89% versus DNS on the periodic hill case using vision-language physics verification.
AstroAlertBench: Evaluating the Accuracy, Reasoning, and Honesty of Multimodal LLMs in Astronomical Classification
astro-ph.IM 2026-05 unverdicted novelty 7.0

AstroAlertBench evaluates multimodal LLMs on astronomical classification accuracy, reasoning, and honesty using real ZTF alerts, revealing that high accuracy often diverges from self-assessed reasoning quality.
SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources
cs.AI 2026-04 unverdicted novelty 7.0

SkillFoundry mines heterogeneous scientific resources into a self-evolving library of validated agent skills, with 71.1% novelty versus prior libraries and measurable gains on coding benchmarks plus two genomics tasks.
The limits of bio-molecular modeling with large language models: a cross-scale evaluation
cs.LG 2026-04 unverdicted novelty 7.0

LLMs perform adequately on bio-molecular classification tasks but remain weak on regression, with hybrid architectures outperforming others on long sequences and fine-tuning hurting generalization.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
cs.CL 2025-11 unverdicted novelty 7.0

Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.
ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery
cs.LG 2026-05 unverdicted novelty 6.0

ToolMol integrates evolutionary algorithms with agentic LLMs and precise RDKit tools to optimize multi-objective drug properties, yielding ligands with over 10% better predicted binding affinity and 35% gains in absol...
Towards a Virtual Neuroscientist: Autonomous Neuroimaging Analysis via Multi-Agent Collaboration
cs.AI 2026-05 unverdicted novelty 6.0

NIAgent uses code-centric multi-agent collaboration and hierarchical verification to build adaptive neuroimaging pipelines that outperform static baselines on ADHD-200 and ADNI data.
ADKO: Agentic Decentralized Knowledge Optimization
cs.LG 2026-05 unverdicted novelty 6.0

ADKO is a decentralized framework where agents share compact GP-derived tokens and LM insights to achieve collaborative Bayesian optimization with a decomposed regret bound that includes compression and approximation losses.
FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution
cs.LG 2026-05 unverdicted novelty 6.0

FAME models scientific topic trajectories in continuous time to forecast paper impact more accurately than LLMs by aligning manuscripts with field momentum in a dynamic latent space.
AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents
physics.flu-dyn 2026-05 unverdicted novelty 6.0

An integrated AI agent framework for CFD uses vision-based physics gates to autonomously discover a Spalart-Allmaras runtime correction that cuts lower-wall skin-friction error by 7.89% versus DNS on the periodic hill...
YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts
cs.LG 2026-05 unverdicted novelty 6.0

YOTOnet achieves improved zero-shot cross-domain fault diagnosis on bearing datasets by combining a physics-aware invariant feature distiller with domain-conditioned sparse experts, showing performance scaling as more...
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
cs.LG 2024-10 accept novelty 6.0

AgentHarm benchmark shows leading LLMs comply with malicious agent requests and simple jailbreaks enable coherent harmful multi-step execution while retaining capabilities.
A Survey on Large Language Model based Autonomous Agents
cs.AI 2023-08 accept novelty 6.0

A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future di...
ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
cs.CL 2023-05 conditional novelty 6.0

ReWOO decouples reasoning from tool observations in augmented language models, delivering 5x token efficiency and 4% higher accuracy on multi-step reasoning benchmarks like HotpotQA.
ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery
cs.LG 2026-05 unverdicted novelty 5.0

ToolMol is an evolutionary agentic framework that pairs multi-objective genetic algorithms with LLM tool-calling to generate drug-like ligands with over 10% better predicted binding affinity and 35% better ABFE scores...
The HTC-Claw: Automating Discovery through High-Throughput Computational Campaigns
cond-mat.mtrl-sci 2026-04 unverdicted novelty 5.0

HTC-Claw is a new intelligent high-throughput computing platform that decomposes research goals into adaptive task workflows for automated materials discovery.
EconAI: Dynamic Persona Evolution and Memory-Aware Agents in Evolving Economic Environments
cs.MA 2026-05 unverdicted novelty 4.0

EconAI adds memory weighting and economic sentiment indexing to LLM agents so they adapt short-term actions to long-term goals inside a single macro/micro simulation loop.
Bridging Perception and Action: A Lightweight Multimodal Meta-Planner Framework for Robust Earth Observation Agents
cs.MA 2026-05 unverdicted novelty 4.0

The LMMP framework improves tool-calling accuracy and task success rates for Earth observation agents by grounding plans in multimodal features and remote sensing expert knowledge via a two-stage training process.
A Scoping Review of Large Language Model-Based Pedagogical Agents
cs.AI 2026-04 unverdicted novelty 4.0

A scoping review of 52 studies maps four design dimensions for LLM-based pedagogical agents and notes trends such as multi-agent systems and ethical issues.
Materials Informatics Across the Length Scales
cond-mat.mtrl-sci 2026-04 unverdicted novelty 2.0

A survey of data-driven methods for materials modeling at nanoscale, mesoscale, and micro-to-continuum scales that identifies established capabilities, data quality issues, and obstacles to cross-scale integration.

Reference graph

Works this paper leans on

118 extracted references · 118 canonical work pages · cited by 19 Pith papers · 12 internal anchors

[1]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transform- ers for language understanding. arXiv preprint arXiv:1810.04805 2018,

work page internal anchor Pith review Pith/arXiv arXiv 2018
[2]

D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A., et al

Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J. D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A., et al. Language models are few-shot learners.Advances in neural information processing systems 2020, 33, 1877–1901

work page 2020
[3]

On the Opportunities and Risks of Foundation Models

Bommasani, R.; Hudson, D. A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M. S.; Bohg, J.; Bosselut, A.; Brunskill, E., et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 2021,

work page internal anchor Pith review Pith/arXiv arXiv 2021
[4]

PaLM: Scaling Language Modeling with Pathways

Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H. W.; Sutton, C.; Gehrmann, S., et al. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311 2022,

work page internal anchor Pith review Pith/arXiv arXiv 2022
[5]

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Bubeck, S.; Chandrasekaran, V .; Eldan, R.; Gehrke, J.; Horvitz, E.; Kamar, E.; Lee, P.; Lee, Y . T.; Li, Y .; Lundberg, S., et al. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712 2023,

work page internal anchor Pith review Pith/arXiv arXiv 2023
[6]

https://copilot.github.com

GitHub Copilot: Your AI pair programmer. https://copilot.github.com

work page
[7]

Li, R. et al. StarCoder: may the source be with you! 2023

work page 2023
[8]

A.; Rice, A.; Rifkin, D.; Simister, S.; Sittampalam, G.; Aftandilian, E

Ziegler, A.; Kalliamvakou, E.; Li, X. A.; Rice, A.; Rifkin, D.; Simister, S.; Sittampalam, G.; Aftandilian, E. Productivity assessment of neural code completion. 2022, 21–29

work page 2022
[9]

N.; Kaiser, Ł.; Polo- sukhin, I

Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; Polo- sukhin, I. Attention is all you need. Advances in neural information processing systems 2017, 30

work page 2017
[10]

Toolformer: Language Models Can Teach Themselves to Use Tools

Schick, T.; Dwivedi-Yu, J.; Dessì, R.; Raileanu, R.; Lomeli, M.; Zettlemoyer, L.; Cancedda, N.; Scialom, T. Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761 2023,

work page internal anchor Pith review Pith/arXiv arXiv 2023
[11]

M.; Pimentel, A

Castro Nascimento, C. M.; Pimentel, A. S. Do Large Language Models Understand Chemistry? A Conversation with ChatGPT. Journal of Chemical Information and Modeling 2023, 63, 1649–1655

work page 2023
[12]

OpenAI, GPT-4 Technical Report. 2023

work page 2023
[13]

Training language models to follow instructions with human feedback

Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A., et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 2022, 35, 27730–27744

work page 2022
[14]

D.; Hocky, G

White, A. D.; Hocky, G. M.; Gandhi, H. A.; Ansari, M.; Cox, S.; Wellawatte, G. P.; Sasmal, S.; Yang, Z.; Liu, K.; Singh, Y ., et al. Assessment of chemistry knowledge in large language models that generate code. Digital Discovery 2023,

work page 2023
[15]

M.; Corbett, P

Lowe, D. M.; Corbett, P. T.; Murray-Rust, P.; Glen, R. C. Chemical Name to Structure: OPSIN, an Open Source Solution. Journal of Chemical Information and Modeling 2011, 51, 739–753, PMID: 21384929

work page 2011
[16]

W.; Barzilay, R.; Jaakkola, T

Coley, C. W.; Barzilay, R.; Jaakkola, T. S.; Green, W. H.; Jensen, K. F. Prediction of organic reaction outcomes using machine learning. ACS central science 2017, 3, 434–443. 12

work page 2017
[17]

W.; Jin, W.; Rogers, L.; Jamison, T

Coley, C. W.; Jin, W.; Rogers, L.; Jamison, T. F.; Jaakkola, T. S.; Green, W. H.; Barzilay, R.; Jensen, K. F. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 2019, 10, 370–377

work page 2019
[18]

A.; Bekas, C.; Lee, A

Schwaller, P.; Laino, T.; Gaudin, T.; Bolgar, P.; Hunter, C. A.; Bekas, C.; Lee, A. A. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS central science 2019, 5, 1572–1583

work page 2019
[19]

Transfer learning enables the molecular transformer to predict regio-and stereoselective reactions on carbohydrates

Pesciullesi, G.; Schwaller, P.; Laino, T.; Reymond, J.-L. Transfer learning enables the molecular transformer to predict regio-and stereoselective reactions on carbohydrates. Nat. Commun. 2020, 11, 1–8

work page 2020
[20]

Irwin, R.; Dimitriadis, S.; He, J.; Bjerrum, E. J. Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 2022, 3, 015022

work page 2022
[21]

P.; Klucznik, T.; Molga, K.; Dittwald, P.; Startek, M.; Bajczyk, M.; Grzybowski, B

Szymku´c, S.; Gajewska, E. P.; Klucznik, T.; Molga, K.; Dittwald, P.; Startek, M.; Bajczyk, M.; Grzybowski, B. A. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. - Int. Ed. 2016, 55, 5904–5937

work page 2016
[22]

H.; Preuss, M.; Waller, M

Segler, M. H.; Preuss, M.; Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 2018, 555, 604–610

work page 2018
[23]

W.; Thomas, D

Coley, C. W.; Thomas, D. A.; Lummiss, J. A.; Jaworski, J. N.; Breen, C. P.; Schultz, V .; Hart, T.; Fishman, J. S.; Rogers, L.; Gao, H., et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 2019, 365

work page 2019
[24]

H.; Haeuselmann, R

Schwaller, P.; Petraglia, R.; Zullo, V .; Nair, V . H.; Haeuselmann, R. A.; Pisoni, R.; Bekas, C.; Iuliano, A.; Laino, T. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chemical science 2020, 11, 3316–3325

work page 2020
[25]

AiZyn- thFinder: a fast, robust and flexible open-source software for retrosynthetic planning

Genheden, S.; Thakkar, A.; Chadimová, V .; Reymond, J.-L.; Engkvist, O.; Bjerrum, E. AiZyn- thFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J. Cheminf. 2020, 12, 1–9

work page 2020
[26]

Molga, K.; Szymku´c, S.; Grzybowski, B. A. Chemist Ex Machina: Advanced Synthesis Planning by Computers. Acc. Chem. Res. 2021, 54, 1094–1106

work page 2021
[27]

C.; Laplaza, R.; Bunne, C.; Krause, A.; Corminboeuf, C.; Laino, T

Schwaller, P.; Vaucher, A. C.; Laplaza, R.; Bunne, C.; Krause, A.; Corminboeuf, C.; Laino, T. Machine intelligence for chemical reaction space. Wiley Interdisciplinary Reviews: Computational Molecular Science 2022, 12, e1604

work page 2022
[28]

DeepTox: toxicity prediction using deep learning

Mayr, A.; Klambauer, G.; Unterthiner, T.; Hochreiter, S. DeepTox: toxicity prediction using deep learning. Frontiers in Environmental Science2016, 3, 80

work page
[29]

Analyzing learned molecular representations for property prediction

Yang, K.; Swanson, K.; Jin, W.; Coley, C.; Eiden, P.; Gao, H.; Guzman-Perez, A.; Hopper, T.; Kelley, B.; Mathea, M., et al. Analyzing learned molecular representations for property prediction. Journal of chemical information and modeling 2019, 59, 3370–3388

work page 2019
[30]

arXiv preprint arXiv:2010.09885 (2020)

Chithrananda, S.; Grand, G.; Ramsundar, B. Chemberta: Large-scale self-supervised pretraining for molecular property prediction. arXiv preprint arXiv:2010.09885 2020,

work page arXiv 2010
[31]

Exposing the limitations of molecular machine learning with activity cliffs

van Tilborg, D.; Alenicheva, A.; Grisoni, F. Exposing the limitations of molecular machine learning with activity cliffs. Journal of Chemical Information and Modeling 2022, 62, 5938–5951

work page 2022
[32]

M.; Schwaller, P.; Ortega-Guerrero, A.; Smit, B

Jablonka, K. M.; Schwaller, P.; Ortega-Guerrero, A.; Smit, B. Is GPT-3 all you need for low-data discovery in chemistry? 2023,

work page 2023
[33]

N.; Duvenaud, D.; Hernández-Lobato, J

Gómez-Bombarelli, R.; Wei, J. N.; Duvenaud, D.; Hernández-Lobato, J. M.; Sánchez-Lengeling, B.; Sheberla, D.; Aguilera-Iparraguirre, J.; Hirzel, T. D.; Adams, R. P.; Aspuru-Guzik, A. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Cent. Sci. 2018, 4, 268–276, PMID: 29532027

work page 2018
[34]

REINVENT 2.0: an AI tool for de novo drug design

Blaschke, T.; Arús-Pous, J.; Chen, H.; Margreitter, C.; Tyrchan, C.; Engkvist, O.; Papadopoulos, K.; Patronov, A. REINVENT 2.0: an AI tool for de novo drug design. Journal of chemical information and modeling 2020, 60, 5918–5922. 13

work page 2020
[35]

Machine learning for perovskite materials design and discovery.npj Computational Materials 2021, 7, 1–18, Number: 1 Publisher: Nature Publishing Group

Tao, Q.; Xu, P.; Li, M.; Lu, W. Machine learning for perovskite materials design and discovery.npj Computational Materials 2021, 7, 1–18, Number: 1 Publisher: Nature Publishing Group

work page 2021
[36]

Gómez-Bombarelli, R. et al. Design of efficient molecular organic light-emitting diodes by a high- throughput virtual screening and experimental approach. Nature Materials 2016, 15, 1120–1127, Number: 10 Publisher: Nature Publishing Group

work page 2016
[37]

J.; Stevens, J.; Li, J.; Parasram, M.; Damani, F.; Alvarado, J

Shields, B. J.; Stevens, J.; Li, J.; Parasram, M.; Damani, F.; Alvarado, J. I. M.; Janey, J. M.; Adams, R. P.; Doyle, A. G. Bayesian reaction optimization as a tool for chemical synthesis. Nature 2021, 590, 89–96

work page 2021
[38]

Torres, J. A. G.; Lau, S. H.; Anchuri, P.; Stevens, J. M.; Tabora, J. E.; Li, J.; Borovika, A.; Adams, R. P.; Doyle, A. G. A Multi-Objective Active Learning Platform and Web App for Reaction Optimization. Journal of the American Chemical Society 2022, 144, 19999–20007

work page 2022
[39]

C.; Michtavy, S

Ramos, M. C.; Michtavy, S. S.; Porosoff, M. D.; White, A. D. Bayesian Optimization of Catalysts With In-context Learning. arXiv preprint arXiv:2304.05341 2023,

work page arXiv 2023
[40]

Integrating learning and reasoning with deep logic models

Marra, G.; Giannini, F.; Diligenti, M.; Gori, M. Integrating learning and reasoning with deep logic models. 2020, 517–532

work page 2020
[41]

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Chi, E.; Le, Q.; Zhou, D. Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903 2022,

[42]

Ho, N.; Schmid, L.; Yun, S.-Y. Large Language Models Are Reasoning Teachers. arXiv preprint arXiv:2212.10071 2022

[43]

Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.; Cao, Y. ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629 2022

[44]

Zelikman, E.; Wu, Y.; Mu, J.; Goodman, N. STaR: Bootstrapping reasoning with reasoning. Advances in Neural Information Processing Systems 2022, 35, 15476–15488

[45]

Zhao, Z.-W.; del Cueto, M.; Troisi, A. Limitations of machine learning models when predicting compounds with completely new chemistries: possible improvements applied to the discovery of new non-fullerene acceptors. Digital Discovery 2022, 1, 266–276

[46]

Vaucher, A. C.; Schwaller, P.; Geluykens, J.; Nair, V. H.; Iuliano, A.; Laino, T. Inferring experimental procedures from text-based representations of chemical reactions. Nature communications 2021, 12, 2573

[47]

Schwaller, P.; Probst, D.; Vaucher, A. C.; Nair, V. H.; Kreutter, D.; Laino, T.; Reymond, J.-L. Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 2021, 3, 144–152

[48]

rxn4Chemistry, rxn4Chemistry. https://github.com/rxn4chemistry/rxn4chemistry, 2020; Accessed: April 2023

[49]

Thakkar, A.; Kogej, T.; Reymond, J.-L.; Engkvist, O.; Bjerrum, E. J. Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain. Chemical science 2020, 11, 154–168

[50]

Thakkar, A.; Selmi, N.; Reymond, J.-L.; Engkvist, O.; Bjerrum, E. J. “Ring breaker”: neural network driven synthesis prediction of the ring system chemical space. Journal of medicinal chemistry 2020, 63, 8791–8808

[51]

Yang, Z.; Li, L.; Wang, J.; Lin, K.; Azarnasab, E.; Ahmed, F.; Liu, Z.; Liu, C.; Zeng, M.; Wang, L. MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action. arXiv preprint arXiv:2303.11381 2023

[52]

Shen, Y.; Song, K.; Tan, X.; Li, D.; Lu, W.; Zhuang, Y. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace. 2023

[53]

Karpas, E.; Abend, O.; Belinkov, Y.; Lenz, B.; Lieber, O.; Ratner, N.; Shoham, Y.; Bata, H.; Levine, Y.; Leyton-Brown, K., et al. MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning. arXiv preprint arXiv:2205.00445 2022

[54]

Boiko, D. A.; MacKnight, R.; Gomes, G. Emergent autonomous scientific research capabilities of large language models. arXiv preprint 2023

[55]

IBM RoboRXN | Science | IBM Research — research.ibm.com. https://research.ibm.com/science/ibm-roborxn/, [Accessed 12-May-2023]

[56]

Wittkopp, A.; Schreiner, P. R. Metal-Free, Noncovalent Catalysis of Diels–Alder Reactions by Neutral Hydrogen Bond Donors in Organic Solvents and in Water. Chemistry – A European Journal 2003, 9, 407–414

[57]

Schreiner, P. R.; Wittkopp, A. H-Bonding Additives Act Like Lewis Acid Catalysts. Organic Letters 2002, 4, 217–220

[58]

Herrera, R. P.; Sgarzani, V.; Bernardi, L.; Ricci, A. Catalytic Enantioselective Friedel–Crafts Alkylation of Indoles with Nitroalkenes by Using a Simple Thiourea Organocatalyst. Angewandte Chemie International Edition 2005, 44, 6576–6579

[59]

Okino, T.; Hoashi, Y.; Takemoto, Y. Enantioselective Michael Reaction of Malonates to Nitroolefins Catalyzed by Bifunctional Organocatalysts. Journal of the American Chemical Society 2003, 125, 12672–12673

[60]

Lowe, D. M. Extraction of chemical structures and reactions from the literature. Ph.D. thesis, University of Cambridge, 2012

[61]

Wu, Z.; Ramsundar, B.; Feinberg, E. N.; Gomes, J.; Geniesse, C.; Pappu, A. S.; Leswing, K.; Pande, V. MoleculeNet: a benchmark for molecular machine learning. Chemical science 2018, 9, 513–530

[62]

Liu, Y.; Iter, D.; Xu, Y.; Wang, S.; Xu, R.; Zhu, C. GPTEval: NLG Evaluation using GPT-4 with Better Human Alignment. arXiv preprint arXiv:2303.16634 2023

[63]

Eloundou, T.; Manning, S.; Mishkin, P.; Rock, D. GPTs are GPTs: An early look at the labor market impact potential of large language models. arXiv preprint arXiv:2303.10130 2023

[64]

Grzybowski, B. A.; Badowski, T.; Molga, K.; Szymkuć, S. Network search algorithms and scoring functions for advanced-level computerized synthesis planning. WIREs Computational Molecular Science 2023, 13, e1630

[65]

Thakkar, A.; Johansson, S.; Jorner, K.; Buttar, D.; Reymond, J.-L.; Engkvist, O. Artificial intelligence and automation in computer aided synthesis planning. Reaction chemistry & engineering 2021, 6, 27–51

[66]

Urbina, F.; Lentzos, F.; Invernizzi, C.; Ekins, S. Dual use of artificial-intelligence-powered drug discovery. Nature Machine Intelligence 2022, 4, 189–191

[67]

Urbina, F.; Lentzos, F.; Invernizzi, C.; Ekins, S. A teachable moment for dual-use. Nature machine intelligence 2022, 4, 607–607

[68]

Campbell, Q. L.; Herington, J.; White, A. D. Censoring chemical data to mitigate dual use risk. arXiv preprint arXiv:2304.10510 2023

[69]

Gao, L.; Schulman, J.; Hilton, J. Scaling Laws for Reward Model Overoptimization. arXiv preprint arXiv:2210.10760 2022

[70]

Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I., et al. Improving language understanding by generative pre-training. 2018

[71]

Li, B.; Qi, P.; Liu, B.; Di, S.; Liu, J.; Pei, J.; Yi, J.; Zhou, B. Trustworthy AI: From Principles to Practices. ACM Computing Surveys 2021, 55, 1–46

[72]

Hocky, G. M.; White, A. D. Natural language processing models that automate programming will transform chemistry research and teaching. Digital Discovery 2022, 1, 79–83

[73]

Henderson, P.; Li, X.; Jurafsky, D.; Hashimoto, T.; Lemley, M. A.; Liang, P. Foundation Models and Fair Use. arXiv preprint arXiv:2303.15715 2023

[74]

Askell, A.; Brundage, M.; Hadfield, G. The Role of Cooperation in Responsible AI Development. 2019

[75]

Neufville, R. d.; Baum, S. D. Collective action on artificial intelligence: A primer and review. Technology in Society 2021, 66, 101649

[76]

Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; Rodriguez, A.; Joulin, A.; Grave, E.; Lample, G. LLaMA: Open and Efficient Foundation Language Models. 2023

[77]

Chiang, W.-L.; Li, Z.; Lin, Z.; Sheng, Y.; Wu, Z.; Zhang, H.; Zheng, L.; Zhuang, S.; Zhuang, Y.; Gonzalez, J. E.; Stoica, I.; Xing, E. P. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. 2023; https://lmsys.org/blog/2023-03-30-vicuna/

[78]

Mukherjee, S.; Mitra, A.; Jawahar, G.; Agarwal, S.; Palangi, H.; Awadallah, A. Orca: Progressive Learning from Complex Explanation Traces of GPT-4. 2023

[79]

Chase, H. LangChain. 2022; https://github.com/hwchase17/langchain

[80]

Press, O.; Zhang, M.; Min, S.; Schmidt, L.; Smith, N. A.; Lewis, M. Measuring and Narrowing the Compositionality Gap in Language Models. arXiv preprint arXiv:2210.03350 2022
