hub Canonical reference

Augmenting large language models with chemistry tools

Bran, Andres M · 2024 · DOI 10.1038/s42256-024-00832-8

Canonical reference. 86% of citing Pith papers cite this work as background.

21 Pith papers citing it

Background 86% of classified citations

open at publisher browse 21 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 6 method 1

citation-polarity summary

background 6 use method 1

representative citing papers

Evaluating Large Language Models in Scientific Discovery

cs.AI · 2025-12-17 · unverdicted · novelty 8.0

The SDE benchmark shows LLMs lag on scientific discovery tasks relative to general science tests, with diminishing scaling returns and shared weaknesses across models.

LAB-Bench: Measuring Capabilities of Language Models for Biology Research

cs.AI · 2024-07-14 · accept · novelty 8.0

LAB-Bench provides over 2,400 multiple-choice questions to measure LLM performance on real biology research tasks like literature recall, figure reading, database access, and sequence manipulation, with initial results compared against human expert biologists.

SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

SCICONVBENCH is a new benchmark evaluating LLMs on multi-turn disambiguation and inconsistency resolution for task formulation in computational science, with frontier models reaching only 52.7% success on fluid mechanics disambiguation cases.

Margin-calibrated Classifier Guidance for Property-driven Synthesis Planning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Margin-calibrated classifier guidance via Sequence Completion Ranking raises multi-step retrosynthesis solve rates from 16.8% to 95.3% on USPTO-190 and unlocks previously unsolvable targets.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

Scaffold-Conditioned Preference Triplets for Controllable Molecular Optimization with Large Language Models

cs.LG · 2026-04-14 · unverdicted · novelty 7.0

SCPT creates similarity-constrained preference triplets from scaffolds to train LLMs as conditional molecular editors that improve properties while keeping scaffolds intact.

MatClaw: An Autonomous Code-First LLM Agent for End-to-End Materials Exploration

cond-mat.mtrl-sci · 2026-04-03 · conditional · novelty 7.0 · 2 refs

MatClaw shows a code-first LLM agent autonomously generating and executing workflows for ML force field training, Curie temperature prediction, and parameter search on CuInP2S6, succeeding on code but requiring interventions for tacit domain knowledge.

Integrating Domain-Specialized Language Models with AI Measurement Tools for Deterministic Atomic-Resolution Experimentation

physics.app-ph · 2026-02-24 · unverdicted · novelty 7.0

Domain-specialized small language models enable deterministic atomic-resolution scanning probe microscopy control with 99.3% command accuracy, lower computational cost, and better domain performance than larger general models.

AlphaEvolve: A coding agent for scientific and algorithmic discovery

cs.AI · 2025-06-16 · unverdicted · novelty 7.0

AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, plus optimizations for Google data centers and hardware.

ColPackAgent: Agent-Skill-Guided Hard-Particle Monte Carlo Workflows for Colloidal Packing

cs.AI · 2026-05-15 · unverdicted · novelty 6.0

ColPackAgent integrates a custom colpack Python package wrapping HOOMD-blue with MCP tools and an agent skill to enable reliable autonomous workflows for colloidal packing simulations across interactive, prompt-driven, and autoresearch modes.

Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

An LLM entity-tagging pipeline plus multi-agent system extracts ~6.3M nuanced records from 22.5M PubMed papers across six tasks with lower measured error than existing curated databases.

BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models

cs.CL · 2026-05-07 · unverdicted · novelty 6.0

BioTool dataset enables fine-tuning a 4B-parameter LLM to outperform GPT-5.1 in biomedical tool calling while improving downstream answer quality per human experts.

In-situ process monitoring for defect detection in wire-arc additive manufacturing: an agentic AI approach

cs.AI · 2026-04-10 · unverdicted · novelty 6.0

A multi-agent AI framework using processing and acoustic agents achieves 91.6% accuracy and 0.821 F1 score for in-situ porosity defect detection in wire-arc additive manufacturing.

Large Language Model Agent for User-friendly Chemical Process Simulations

physics.chem-ph · 2026-01-15 · unverdicted · novelty 6.0

An LLM agent integrated with AVEVA Process Simulation via MCP enables natural language driven flowsheet analysis, optimization, and construction for chemical separation processes.

CodeDistiller: Automatically Generating Code Libraries for Scientific Coding Agents

cs.AI · 2025-11-30 · conditional · novelty 6.0

CodeDistiller distills 250 materials-science GitHub repositories into vetted code libraries that improve the accuracy and scientific soundness of experiments generated by ASD agents.

DynaMate2: Democratization of Agentic AI for Expert-Designed Custom Workflows

physics.chem-ph · 2026-05-20 · unverdicted · novelty 5.0

DynaMate2 is an open-source hierarchical agentic framework that converts expert-defined Python functions into AI-callable tools for supervised multi-agent scientific workflows.

Evidence-Grounded Frontier Mapping and Agentic Hypothesis Generation in Nanomedicine

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

pArticleMap combines article embeddings, graph-based frontier extraction, and agentic LLMs to map nanomedicine literature and generate hypotheses, achieving 10.8% gold recovery and 61% future-neighborhood rate in retrospective benchmarks.

Bolek: A Multimodal Language Model for Molecular Reasoning

cs.LG · 2026-05-04 · unverdicted · novelty 5.0

Bolek injects Morgan fingerprint embeddings into an instruction-tuned text model, then fine-tunes on molecular alignment and synthetic chain-of-thought tasks to improve performance and grounding on 15 TDC binary classification endpoints while generalizing to unseen tasks.

From Perception to Autonomous Computational Modeling: A Multi-Agent Approach

cs.CE · 2026-04-08 · unverdicted · novelty 5.0

A multi-agent LLM framework autonomously completes the full computational mechanics pipeline from a photograph to a code-compliant engineering report on a steel L-bracket example.

Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator

cs.DL · 2025-07-16 · unverdicted · novelty 4.0

The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.

ToolRosella: Translating Code Repositories into Standardized Tools for Scientific Agents

cs.SE · 2026-03-10

citing papers explorer

Showing 21 of 21 citing papers.

Evaluating Large Language Models in Scientific Discovery cs.AI · 2025-12-17 · unverdicted · none · ref 18
The SDE benchmark shows LLMs lag on scientific discovery tasks relative to general science tests, with diminishing scaling returns and shared weaknesses across models.
LAB-Bench: Measuring Capabilities of Language Models for Biology Research cs.AI · 2024-07-14 · accept · none · ref 32
LAB-Bench provides over 2,400 multiple-choice questions to measure LLM performance on real biology research tasks like literature recall, figure reading, database access, and sequence manipulation, with initial results compared against human expert biologists.
SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science cs.AI · 2026-05-18 · unverdicted · none · ref 11
SCICONVBENCH is a new benchmark evaluating LLMs on multi-turn disambiguation and inconsistency resolution for task formulation in computational science, with frontier models reaching only 52.7% success on fluid mechanics disambiguation cases.
Margin-calibrated Classifier Guidance for Property-driven Synthesis Planning cs.LG · 2026-05-13 · unverdicted · none · ref 95
Margin-calibrated classifier guidance via Sequence Completion Ranking raises multi-step retrosynthesis solve rates from 16.8% to 95.3% on USPTO-190 and unlocks previously unsolvable targets.
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment cs.CL · 2026-05-08 · unverdicted · none · ref 167
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
Scaffold-Conditioned Preference Triplets for Controllable Molecular Optimization with Large Language Models cs.LG · 2026-04-14 · unverdicted · none · ref 4
SCPT creates similarity-constrained preference triplets from scaffolds to train LLMs as conditional molecular editors that improve properties while keeping scaffolds intact.
MatClaw: An Autonomous Code-First LLM Agent for End-to-End Materials Exploration cond-mat.mtrl-sci · 2026-04-03 · conditional · none · ref 3 · 2 links
MatClaw shows a code-first LLM agent autonomously generating and executing workflows for ML force field training, Curie temperature prediction, and parameter search on CuInP2S6, succeeding on code but requiring interventions for tacit domain knowledge.
Integrating Domain-Specialized Language Models with AI Measurement Tools for Deterministic Atomic-Resolution Experimentation physics.app-ph · 2026-02-24 · unverdicted · none · ref 10
Domain-specialized small language models enable deterministic atomic-resolution scanning probe microscopy control with 99.3% command accuracy, lower computational cost, and better domain performance than larger general models.
AlphaEvolve: A coding agent for scientific and algorithmic discovery cs.AI · 2025-06-16 · unverdicted · none · ref 10
AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, plus optimizations for Google data centers and hardware.
ColPackAgent: Agent-Skill-Guided Hard-Particle Monte Carlo Workflows for Colloidal Packing cs.AI · 2026-05-15 · unverdicted · none · ref 51
ColPackAgent integrates a custom colpack Python package wrapping HOOMD-blue with MCP tools and an agent skill to enable reliable autonomous workflows for colloidal packing simulations across interactive, prompt-driven, and autoresearch modes.
Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale cs.LG · 2026-05-07 · unverdicted · none · ref 13 · 2 links
An LLM entity-tagging pipeline plus multi-agent system extracts ~6.3M nuanced records from 22.5M PubMed papers across six tasks with lower measured error than existing curated databases.
BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models cs.CL · 2026-05-07 · unverdicted · none · ref 45
BioTool dataset enables fine-tuning a 4B-parameter LLM to outperform GPT-5.1 in biomedical tool calling while improving downstream answer quality per human experts.
In-situ process monitoring for defect detection in wire-arc additive manufacturing: an agentic AI approach cs.AI · 2026-04-10 · unverdicted · none · ref 76
A multi-agent AI framework using processing and acoustic agents achieves 91.6% accuracy and 0.821 F1 score for in-situ porosity defect detection in wire-arc additive manufacturing.
Large Language Model Agent for User-friendly Chemical Process Simulations physics.chem-ph · 2026-01-15 · unverdicted · none · ref 38
An LLM agent integrated with AVEVA Process Simulation via MCP enables natural language driven flowsheet analysis, optimization, and construction for chemical separation processes.
CodeDistiller: Automatically Generating Code Libraries for Scientific Coding Agents cs.AI · 2025-11-30 · conditional · none · ref 2
CodeDistiller distills 250 materials-science GitHub repositories into vetted code libraries that improve the accuracy and scientific soundness of experiments generated by ASD agents.
DynaMate2: Democratization of Agentic AI for Expert-Designed Custom Workflows physics.chem-ph · 2026-05-20 · unverdicted · none · ref 1
DynaMate2 is an open-source hierarchical agentic framework that converts expert-defined Python functions into AI-callable tools for supervised multi-agent scientific workflows.
Evidence-Grounded Frontier Mapping and Agentic Hypothesis Generation in Nanomedicine cs.AI · 2026-05-18 · unverdicted · none · ref 22
pArticleMap combines article embeddings, graph-based frontier extraction, and agentic LLMs to map nanomedicine literature and generate hypotheses, achieving 10.8% gold recovery and 61% future-neighborhood rate in retrospective benchmarks.
Bolek: A Multimodal Language Model for Molecular Reasoning cs.LG · 2026-05-04 · unverdicted · none · ref 6
Bolek injects Morgan fingerprint embeddings into an instruction-tuned text model, then fine-tunes on molecular alignment and synthetic chain-of-thought tasks to improve performance and grounding on 15 TDC binary classification endpoints while generalizing to unseen tasks.
From Perception to Autonomous Computational Modeling: A Multi-Agent Approach cs.CE · 2026-04-08 · unverdicted · none · ref 25
A multi-agent LLM framework autonomously completes the full computational mechanics pipeline from a photograph to a code-compliant engineering report on a steel L-bracket example.
Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator cs.DL · 2025-07-16 · unverdicted · none · ref 120
The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.
ToolRosella: Translating Code Repositories into Standardized Tools for Scientific Agents cs.SE · 2026-03-10 · unreviewed · ref 13

Augmenting large language models with chemistry tools

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer