ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge

Zihan Zhao, Ziping Wan, Lu Chen, Xuanze Lin, Shiyang Yu, Situo Zhang, Da Ma, Zichen Zhu, Danyang Zhang, Huayang Wang, Zhongyang Dai, Liyang Wen, Bo Chen, Xin Chen, Kai Yu


classification cs.CE, cs.AI

keywords chemical, reasoning, atomized, knowledge, chemdfm-r, llms, model, functional



Atomized chemical knowledge, such as functional group information of molecules and reactions, plays a pivotal intermediate role in the reasoning process that connects molecular structures with their properties and reactivities. While large language models (LLMs) have achieved impressive progress, the absence of atomized chemical knowledge leaves them with a superficial understanding of chemistry and limited chemical reasoning capabilities. To tackle this problem, we develop a Chemical Reasoning LLM, ChemDFM-R. We first construct a comprehensive dataset of atomized chemical knowledge, ChemFG, annotating the presence of functional groups in molecules and the changes of functional groups during chemical reactions, to enhance the model's understanding of the fundamental principles and internal logic of chemistry. Then, we propose a mixed-source distillation method that initializes the model's reasoning capability with limited distilled data, and develop a four-stage training pipeline to equip the model with atomized chemical knowledge and chemical reasoning logic. Experiments on diverse chemical benchmarks demonstrate that ChemDFM-R achieves cutting-edge performance while providing interpretable, rationale-driven outputs, surpassing both general-domain LLMs and domain-specific chemical LLMs. Moreover, ChemDFM-R performs comparably to or better than cutting-edge commercial LLMs, such as o4-mini. Further case studies illustrate how explicit reasoning chains significantly improve the model's reliability, transparency, and practicality in real-world human-AI collaboration scenarios.
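To make the idea of "changes of functional groups during chemical reactions" concrete, here is a minimal sketch. It is not the ChemFG annotation pipeline, whose details the abstract does not give. It assumes each side of a reaction has already been labeled with functional-group names (hand-written here), and the helper `functional_group_changes` is a hypothetical name introduced only for illustration.

```python
from collections import Counter


def functional_group_changes(reactant_groups, product_groups):
    """Compare functional-group labels before and after a reaction.

    Both arguments are lists of group names (assumed to be pre-annotated).
    Returns two dicts: groups lost and groups gained, with their counts.
    """
    before = Counter(reactant_groups)
    after = Counter(product_groups)
    lost = before - after    # Counter subtraction keeps only positive counts
    gained = after - before
    return dict(lost), dict(gained)


# Illustrative example: Fischer esterification,
# acetic acid + ethanol -> ethyl acetate (+ water, which carries no label here).
# The labels below are hand-written assumptions, not ChemFG data.
reactants = ["carboxylic acid", "alcohol"]
products = ["ester"]

lost, gained = functional_group_changes(reactants, products)
print("lost:", lost)
print("gained:", gained)
```

A multiset difference is used rather than a set difference so that a molecule carrying two copies of the same group, of which only one reacts, is still reported correctly.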



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MolDeTox: Evaluating Language Model's Stepwise Fragment Editing for Molecular Detoxification
cs.AI 2026-05 unverdicted novelty 6.0

MolDeTox is a new benchmark that shows fragment-level stepwise editing by LLMs and VLMs improves structural validity and detoxification quality over prior toxicity-focused evaluations.

Mol-Debate: Multi-Agent Debate Improves Structural Reasoning in Molecular Design
cs.AI 2026-04 unverdicted novelty 6.0

Mol-Debate applies multi-agent debate in an iterative loop with perspective orchestration to achieve state-of-the-art text-guided molecular design, scoring 59.82% exact match on ChEBI-20 and 50.52% weighted success on...