ChemCrow: Augmenting large-language models with chemistry tools
Pith reviewed 2026-05-15 19:01 UTC · model grok-4.3
The pith
An LLM agent augmented with 18 chemistry tools autonomously plans and executes real syntheses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By integrating 18 expert-designed tools, ChemCrow augments large-language-model performance in chemistry, and new capabilities emerge. The agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts, and it guided the discovery of a novel chromophore. Evaluations by both an LLM evaluator and human experts are reported to confirm its effectiveness across a diverse set of chemical tasks. The system addresses the limitations of standalone language models by giving them access to external knowledge sources and specialized functions, thereby bridging experimental and computational chemistry.
What carries the argument
The set of 18 expert-designed chemistry tools that the large language model can invoke to retrieve data, run calculations, and guide experimental steps, turning the model into an agent that produces and follows multi-step plans.
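The tool-invocation pattern described here can be sketched as a small dispatch loop. This is a generic illustration, not ChemCrow's implementation: the names `Agent`, `scripted_policy`, and `molecular_weight` are hypothetical, the lookup table is a toy stand-in for a real chemistry tool, and a scripted function plays the role of the language model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical tool type: takes a text argument, returns a text observation.
Tool = Callable[[str], str]
Step = Tuple[str, str, str]  # (tool name, tool input, observation)

# Toy lookup table standing in for a real chemistry tool.
_WEIGHTS = {"CCO": "46.07"}  # ethanol, g/mol


def molecular_weight(smiles: str) -> str:
    """Illustrative tool: return a molecular weight for a known SMILES string."""
    return _WEIGHTS.get(smiles, "unknown")


@dataclass
class Agent:
    tools: Dict[str, Tool]
    # The policy stands in for the LLM: it sees the task and the trace so far
    # and returns either (tool name, tool input) or ("final", answer).
    policy: Callable[[str, List[Step]], Tuple[str, str]]
    max_steps: int = 5
    trace: List[Step] = field(default_factory=list)

    def run(self, task: str) -> str:
        for _ in range(self.max_steps):
            action, arg = self.policy(task, self.trace)
            if action == "final":
                return arg
            if action not in self.tools:
                observation = f"error: unknown tool {action!r}"
            else:
                observation = self.tools[action](arg)
            # Every call is recorded so the plan can be inspected afterwards.
            self.trace.append((action, arg, observation))
        return "stopped: step limit reached"


def scripted_policy(task: str, history: List[Step]) -> Tuple[str, str]:
    """Scripted stand-in for the model: call one tool, then answer from it."""
    if not history:
        return ("molecular_weight", "CCO")
    _, arg, observation = history[-1]
    return ("final", f"{arg} weighs {observation} g/mol")


agent = Agent(tools={"molecular_weight": molecular_weight}, policy=scripted_policy)
answer = agent.run("What is the molecular weight of ethanol?")
print(answer)
```

The step limit and the recorded trace are design choices of this sketch; they make it possible to bound a run and to inspect every tool call after the fact.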
If this is right
- Routine chemical planning and execution become automated across synthesis, discovery, and design workflows.
- Expert chemists receive assistance while non-experts gain access to previously inaccessible capabilities.
- The gap between computational predictions and actual laboratory experiments narrows.
- Scientific progress accelerates because tool-augmented agents handle tasks that once required extensive manual coordination.
Where Pith is reading between the lines
- The same pattern of tool integration could be applied to adjacent domains such as biology or materials science to create parallel autonomous systems.
- Longer-horizon experiments become feasible if the agent can iteratively adjust plans based on intermediate tool results.
- Safety protocols will need explicit design because the agent operates without constant human oversight on real chemical reactions.
Load-bearing premise
The base large language model can reliably interpret tool outputs and avoid hallucinating invalid chemistry when it builds multi-step plans.
What would settle it
Laboratory execution of a synthesis plan produced by the agent yields no product, the wrong product, or an unsafe outcome without any human correction or filtering.
Original abstract
Over the last decades, excellent computational chemistry tools have been developed. Integrating them into a single platform with enhanced accessibility could help reaching their full potential by overcoming steep learning curves. Recently, large-language models (LLMs) have shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 18 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our agent autonomously planned and executed the syntheses of an insect repellent, three organocatalysts, and guided the discovery of a novel chromophore. Our evaluation, including both LLM and expert assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and ChemCrow's performance. Our work not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ChemCrow, an LLM-based agent augmented with 18 expert-designed chemistry tools to address tasks in organic synthesis, drug discovery, and materials design. It reports that the agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts and guided the discovery of a novel chromophore, with supporting evaluations from both LLMs and human experts. The work also claims that a GPT-4 evaluator cannot distinguish ChemCrow outputs from clearly incorrect GPT-4 completions.
Significance. If the autonomy and reliability claims are substantiated, the integration of multiple domain-specific tools into a single agent framework would represent a practical advance in making computational chemistry tools more accessible and bridging experimental and computational workflows. The reported case studies provide concrete demonstrations of multi-step planning, which could lower barriers for non-experts while aiding experts.
major comments (2)
- [Results] Results (autonomous synthesis cases): The central claim of autonomous planning and execution requires evidence that the reported successful trajectories (insect repellent, organocatalysts, chromophore) were produced without human filtering, correction, or post-hoc selection of paths. The manuscript supplies neither the complete tool-call traces nor an explicit statement confirming absence of human intervention at any step; this directly affects the strength of the autonomy assertion.
- [Evaluation] Evaluation section: The claim that GPT-4 cannot distinguish wrong GPT-4 completions from ChemCrow outputs is presented as surprising but lacks sufficient protocol detail, including how the 'clearly wrong' completions were constructed, the exact evaluator prompt, and any error analysis or inter-rater statistics. This weakens support for the evaluation rigor.
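As an illustration of the inter-rater statistics requested in the Evaluation comment, agreement between two raters (for example an LLM judge and a human expert) could be summarized with Cohen's kappa. The sketch below is a plain implementation of the standard formula; the labels are made up for illustration and are not data from the manuscript.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up labels for illustration only; not results from the paper.
llm_judge = ["pass", "pass", "pass", "fail", "pass", "pass"]
expert = ["pass", "fail", "pass", "fail", "fail", "pass"]
kappa = cohens_kappa(llm_judge, expert)
print(round(kappa, 3))
```

Raw agreement here is 4 of 6 items, but because the hypothetical judge labels almost everything "pass", chance agreement is already 0.5, which is why a chance-corrected statistic is more informative than a raw agreement rate.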
minor comments (2)
- [Abstract] Abstract and introduction: The phrase 'autonomously planned and executed' should be qualified with a brief note on the scope of human oversight in tool design and result validation to avoid overstatement.
- [Methods] Methods: A table listing the 18 tools with brief descriptions and input/output formats would improve reproducibility and clarity.
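The tool table suggested in the Methods comment could be generated from a small machine-readable specification, which keeps the documentation and the tool registry in sync. The sketch below assumes a hypothetical `ToolSpec` record; the two entries are placeholders and are not the manuscript's actual tool names or formats.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical record describing one tool for documentation purposes."""
    name: str
    description: str
    input_format: str
    output_format: str


def render_table(specs: List[ToolSpec]) -> str:
    """Render tool specifications as an aligned plain-text table."""
    header = ["Tool", "Description", "Input", "Output"]
    rows = [[s.name, s.description, s.input_format, s.output_format] for s in specs]
    widths = [max(len(row[i]) for row in [header] + rows) for i in range(len(header))]

    def fmt(row: List[str]) -> str:
        return " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    lines = [fmt(header), "-+-".join("-" * w for w in widths)]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


# Placeholder entries; these are NOT the paper's actual tools.
specs = [
    ToolSpec("example_name_to_structure", "Convert a compound name to a structure",
             "compound name (text)", "SMILES string"),
    ToolSpec("example_property_lookup", "Look up a molecular property",
             "SMILES string", "number with unit"),
]
table = render_table(specs)
print(table)
```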
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which highlights important aspects for strengthening the clarity and rigor of our claims regarding autonomy and evaluation. We have carefully considered each major comment and provide point-by-point responses below, along with our plans for revision.
Point-by-point responses
- Referee: [Results] Results (autonomous synthesis cases): The central claim of autonomous planning and execution requires evidence that the reported successful trajectories (insect repellent, organocatalysts, chromophore) were produced without human filtering, correction, or post-hoc selection of paths. The manuscript supplies neither the complete tool-call traces nor an explicit statement confirming absence of human intervention at any step; this directly affects the strength of the autonomy assertion.
  Authors: We agree that explicit documentation is necessary to fully substantiate the autonomy claims. In the revised manuscript, we will add a clear statement in the Results section confirming that the reported trajectories were generated without human filtering, correction, or post-hoc selection of paths. We will also include the complete tool-call traces and interaction logs for all three case studies (insect repellent, organocatalysts, and chromophore) as supplementary material, allowing readers to directly inspect the autonomous execution process.
  Revision: yes
- Referee: [Evaluation] Evaluation section: The claim that GPT-4 cannot distinguish wrong GPT-4 completions from ChemCrow outputs is presented as surprising but lacks sufficient protocol detail, including how the 'clearly wrong' completions were constructed, the exact evaluator prompt, and any error analysis or inter-rater statistics. This weakens support for the evaluation rigor.
  Authors: We acknowledge that additional protocol details are required to support the evaluation claims rigorously. In the revised manuscript, we will expand the Evaluation section to provide: (i) a precise description of how the 'clearly wrong' GPT-4 completions were generated, (ii) the exact prompt template used for the GPT-4 evaluator, and (iii) any available error analysis along with inter-rater statistics where applicable. These additions will improve reproducibility and strengthen the interpretation of the results.
  Revision: yes
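One way the tool-call traces discussed in the first response could be made auditable is a hash-chained log, in which each entry commits to the one before it. The sketch below is an assumption about how such a log might look, not something the manuscript or the rebuttal describes; the field names and the `example_tool` entries are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: Dict[str, str]) -> str:
    """Hash the previous entry's hash together with this entry's content."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_step(trace: List[dict], tool: str, tool_input: str, observation: str) -> None:
    """Append one tool call to the trace, chaining it to the previous entry."""
    payload = {"tool": tool, "input": tool_input, "observation": observation}
    prev_hash = trace[-1]["hash"] if trace else GENESIS
    trace.append({**payload, "prev": prev_hash, "hash": _digest(prev_hash, payload)})


def verify(trace: List[dict]) -> bool:
    """Return True if no entry has been edited, removed, or reordered."""
    prev_hash = GENESIS
    for entry in trace:
        payload = {key: entry[key] for key in ("tool", "input", "observation")}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, payload):
            return False
        prev_hash = entry["hash"]
    return True


# Illustrative entries only; not real ChemCrow tool calls.
trace: List[dict] = []
append_step(trace, "example_tool", "CCO", "46.07")
append_step(trace, "example_tool", "CC(=O)O", "60.05")
intact = verify(trace)

# Dropping the first step breaks the chain and is detected.
tampered = [dict(entry) for entry in trace]
del tampered[0]
detected = not verify(tampered)
print(intact, detected)
```

A chain like this only shows that a given run was not edited after the fact. It cannot show that the run was not selected from several attempts, so the explicit statement about post-hoc selection would still be needed.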
Circularity Check
No circularity: empirical agent evaluation with no derivation chain
full rationale
The manuscript presents an LLM agent (ChemCrow) augmented by 18 chemistry tools and evaluates it via case studies of autonomous synthesis planning and execution. No mathematical derivation, equations, or parameter-fitting procedure exists whose outputs reduce by construction to the inputs. Claims rest on reported experimental trajectories and expert/LLM assessments rather than self-definitional loops, fitted-input predictions, or load-bearing self-citations. The work is therefore self-contained as an empirical demonstration.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can follow multi-step instructions and correctly invoke external tools when given appropriate prompts.
Forward citations
Cited by 22 Pith papers
- Can Agents Price a Reaction? Evaluating LLMs on Chemical Cost Reasoning
  LLM agents reach only 50.6% accuracy on chemical cost estimation within 25% error even with tools, dropping with noise due to parsing, pack selection, and tool-use failures.
- AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents
  AI CFD Scientist autonomously discovers a Spalart-Allmaras runtime correction reducing lower-wall Cf RMSE by 7.89% on the periodic hill at Reh=5600 while using a vision-language gate to detect 14 of 16 silent failures...
- AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents
  AI CFD Scientist autonomously finds a Spalart-Allmaras turbulence correction that lowers wall-friction error by 7.89% versus DNS on the periodic hill case using vision-language physics verification.
- AstroAlertBench: Evaluating the Accuracy, Reasoning, and Honesty of Multimodal LLMs in Astronomical Classification
  AstroAlertBench evaluates multimodal LLMs on astronomical classification accuracy, reasoning, and honesty using real ZTF alerts, revealing that high accuracy often diverges from self-assessed reasoning quality.
- SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources
  SkillFoundry mines heterogeneous scientific resources into a self-evolving library of validated agent skills, with 71.1% novelty versus prior libraries and measurable gains on coding benchmarks plus two genomics tasks.
- The limits of bio-molecular modeling with large language models : a cross-scale evaluation
  LLMs perform adequately on bio-molecular classification tasks but remain weak on regression, with hybrid architectures outperforming others on long sequences and fine-tuning hurting generalization.
- Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
  Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.
- ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery
  ToolMol integrates evolutionary algorithms with agentic LLMs and precise RDKit tools to optimize multi-objective drug properties, yielding ligands with over 10% better predicted binding affinity and 35% gains in absol...
- Towards a Virtual Neuroscientist: Autonomous Neuroimaging Analysis via Multi-Agent Collaboration
  NIAgent uses code-centric multi-agent collaboration and hierarchical verification to build adaptive neuroimaging pipelines that outperform static baselines on ADHD-200 and ADNI data.
- ADKO: Agentic Decentralized Knowledge Optimization
  ADKO is a decentralized framework where agents share compact GP-derived tokens and LM insights to achieve collaborative Bayesian optimization with a decomposed regret bound that includes compression and approximation losses.
- FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution
  FAME models scientific topic trajectories in continuous time to forecast paper impact more accurately than LLMs by aligning manuscripts with field momentum in a dynamic latent space.
- AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents
  An integrated AI agent framework for CFD uses vision-based physics gates to autonomously discover a Spalart-Allmaras runtime correction that cuts lower-wall skin-friction error by 7.89% versus DNS on the periodic hill...
- YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts
  YOTOnet achieves improved zero-shot cross-domain fault diagnosis on bearing datasets by combining a physics-aware invariant feature distiller with domain-conditioned sparse experts, showing performance scaling as more...
- AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
  AgentHarm benchmark shows leading LLMs comply with malicious agent requests and simple jailbreaks enable coherent harmful multi-step execution while retaining capabilities.
- A Survey on Large Language Model based Autonomous Agents
  A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future di...
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
  ReWOO decouples reasoning from tool observations in augmented language models, delivering 5x token efficiency and 4% higher accuracy on multi-step reasoning benchmarks like HotpotQA.
- ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery
  ToolMol is an evolutionary agentic framework that pairs multi-objective genetic algorithms with LLM tool-calling to generate drug-like ligands with over 10% better predicted binding affinity and 35% better ABFE scores...
- The HTC-Claw: Automating Discovery through High-Throughput Computational Campaigns
  HTC-Claw is a new intelligent high-throughput computing platform that decomposes research goals into adaptive task workflows for automated materials discovery.
- EconAI: Dynamic Persona Evolution and Memory-Aware Agents in Evolving Economic Environments
  EconAI adds memory weighting and economic sentiment indexing to LLM agents so they adapt short-term actions to long-term goals inside a single macro/micro simulation loop.
- Bridging Perception and Action: A Lightweight Multimodal Meta-Planner Framework for Robust Earth Observation Agents
  The LMMP framework improves tool-calling accuracy and task success rates for Earth observation agents by grounding plans in multimodal features and remote sensing expert knowledge via a two-stage training process.
- A Scoping Review of Large Language Model-Based Pedagogical Agents
  A scoping review of 52 studies maps four design dimensions for LLM-based pedagogical agents and notes trends such as multi-agent systems and ethical issues.
- Materials Informatics Across the Length Scales
  A survey of data-driven methods for materials modeling at nanoscale, mesoscale, and micro-to-continuum scales that identifies established capabilities, data quality issues, and obstacles to cross-scale integration.