Are large language models superhuman chemists?arXiv preprint arXiv:2404.01475

Are large language models superhuman chemists? , author= · 2024 · arXiv 2404.01475

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Agentic generation of verifiable rules for deterministic, self-expanding reaction classification

cs.AI · 2026-07-01 · unverdicted · novelty 7.0

Multi-agent LLMs generate and verify 14,073 deterministic reaction rules from 665,901 patents, enabling 97.7% classification of unseen reactions with finer resolution than fixed proprietary systems.

From Answers to States: Verifiable Process-Level Evaluation of Chemical Reasoning in Large Language Models

cs.AI · 2026-06-02 · unverdicted · novelty 7.0

ChemCoTBench-V2 is a new rule-verifiable benchmark with 5,620 samples across 18 tasks that evaluates LLM chemical reasoning traces using deterministic chemistry rules and reference traces rather than final answers alone.

Matter to Mechanism: A Benchmark for AI Co-Scientists in Materials and Battery Research

cs.CE · 2026-06-01 · unverdicted · novelty 7.0

Introduces the Matter to Mechanism benchmark of 2,645 structured instances and a composite metric suite for evaluating AI co-scientists on problem-to-hypothesis reasoning in battery materials research.

PolyReal: A Benchmark for Real-World Polymer Science Workflows

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

PolyReal benchmark shows leading MLLMs perform well on polymer knowledge reasoning but drop sharply on practical tasks like lab safety analysis and raw data extraction.

SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification

cs.AI · 2026-06-03 · unverdicted · novelty 6.0

Sci-PRM is a tool-aware process reward model trained on the SCIPRM70K dataset to provide fine-grained supervision for scientific reasoning and shown to boost foundation models via Best-of-N selection and RL.

From Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

cond-mat.mtrl-sci · 2026-05-04 · unverdicted · novelty 2.0

Hackathon submissions indicate LLMs are moving from general assistants toward composable multi-agent systems for structuring scientific knowledge and automating tasks in materials science and chemistry.

citing papers explorer

Showing 1 of 1 citing paper after filters.

PolyReal: A Benchmark for Real-World Polymer Science Workflows cs.CV · 2026-04-03 · unverdicted · none · ref 35
PolyReal benchmark shows leading MLLMs perform well on polymer knowledge reasoning but drop sharply on practical tasks like lab safety analysis and raw data extraction.

Are large language models superhuman chemists?arXiv preprint arXiv:2404.01475

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer