ChemLLM: A Chemical Large Language Model

Di Zhang, Wei Liu, Qian Tan, Jingdan Chen, Hang Yan, Yuliang Yan, Jiatong Li, Weiran Huang, Xiangyu Yue, Dongzhan Zhou, et al. · 2024 · arXiv 2402.06852

15 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 baseline 1

citation-polarity summary

background 2 baseline 1

representative citing papers

Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

Distribution-Aware Reward optimizes LLM regression by treating rollouts as empirical predictive distributions and rewarding marginal improvements in CRPS quality rather than point accuracy alone.

FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

FORGE reformulates molecular optimization as context-aware fragment ranking and replacement using mined low-to-high edit pairs, outperforming larger language models and graph methods on standard benchmarks.

Can Agents Price a Reaction? Evaluating LLMs on Chemical Cost Reasoning

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

LLM agents reach only 50.6% accuracy on chemical cost estimation within 25% error even with tools, dropping with noise due to parsing, pack selection, and tool-use failures.

Scaffold-Conditioned Preference Triplets for Controllable Molecular Optimization with Large Language Models

cs.LG · 2026-04-14 · unverdicted · novelty 7.0

SCPT creates similarity-constrained preference triplets from scaffolds to train LLMs as conditional molecular editors that improve properties while keeping scaffolds intact.

Speak-to-Structure: Evaluating LLMs in Open-domain Natural Language-Driven Molecule Generation

cs.CL · 2024-12-19 · unverdicted · novelty 7.0

S^2-Bench is a new one-to-many benchmark for natural language-driven molecule generation with three tasks, and OpenMolIns is an instruction dataset enabling Llama3.1-8B to outperform GPT-4o and Claude-3.5 on it.

Large Language Model Agent for User-friendly Chemical Process Simulations

physics.chem-ph · 2026-01-15 · unverdicted · novelty 6.0

An LLM agent integrated with AVEVA Process Simulation via MCP enables natural language driven flowsheet analysis, optimization, and construction for chemical separation processes.

MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts

cs.CL · 2024-11-22 · unverdicted · novelty 6.0

MolReFlect introduces a teacher-student framework that automatically creates fine-grained molecule-text alignments to achieve SOTA results on molecule-caption translation.

SciCore-Mol: Augmenting Large Language Models with Pluggable Molecular Cognition Modules

cs.AI · 2026-05-21 · unverdicted · novelty 5.0

SciCore-Mol augments LLMs with three integrated modules for molecular perception, latent diffusion generation, and reaction reasoning, claiming an 8B open model competes with or exceeds proprietary systems on chemical tasks.

RefiningGPT: Specialized language Models for Automated Refinery Unit-level Process Diagram Synthesis

cs.CE · 2026-05-19 · unverdicted · novelty 5.0

RefineGPT is a hierarchical LLM agent that selects refinery units via a supervised fine-tuned small model and generates topologies via a large model, trained on motifs extracted from legacy diagrams.

ChemVA: Advancing Large Language Models on Chemical Reaction Diagrams Understanding

cs.AI · 2026-05-17 · unverdicted · novelty 5.0

ChemVA framework uses hybrid-granularity visual anchors and entity-name alignment to improve LLM performance on chemical reaction diagrams by ~20 points, reaching 92% structural accuracy on the new OCRD-Bench dataset.

Bolek: A Multimodal Language Model for Molecular Reasoning

cs.LG · 2026-05-04 · unverdicted · novelty 5.0

Bolek injects Morgan fingerprint embeddings into an instruction-tuned text model, then fine-tunes on molecular alignment and synthetic chain-of-thought tasks to improve performance and grounding on 15 TDC binary classification endpoints while generalizing to unseen tasks.

Heterogeneous Scientific Foundation Model Collaboration

cs.AI · 2026-04-30 · unverdicted · novelty 5.0

Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.

AI4S-SDS: A Neuro-Symbolic Solvent Design System via Sparse MCTS and Differentiable Physics Alignment

cs.AI · 2026-03-04 · unverdicted · novelty 5.0

AI4S-SDS uses sparse MCTS and differentiable physics alignment to generate valid solvent mixtures and identifies a competitive photoresist developer formulation.

ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge

cs.CE · 2025-07-29 · unverdicted · novelty 5.0

ChemDFM-R is a chemical reasoning LLM trained via a four-stage pipeline on the ChemFG dataset of functional-group annotations for molecules and reactions, reaching performance comparable to or better than commercial models on chemical benchmarks.

Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning

cs.CL · 2025-02-05 · unverdicted · novelty 2.0

Position paper claims multimodal LLMs can significantly advance scientific reasoning and proposes a four-stage roadmap plus challenges and suggestions.

citing papers explorer

Showing 15 of 15 citing papers.

Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
cs.LG · 2026-05-20 · unverdicted · none · ref 10
Distribution-Aware Reward optimizes LLM regression by treating rollouts as empirical predictive distributions and rewarding marginal improvements in CRPS quality rather than point accuracy alone.

FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization
cs.LG · 2026-05-11 · unverdicted · none · ref 43
FORGE reformulates molecular optimization as context-aware fragment ranking and replacement using mined low-to-high edit pairs, outperforming larger language models and graph methods on standard benchmarks.

Can Agents Price a Reaction? Evaluating LLMs on Chemical Cost Reasoning
cs.AI · 2026-05-08 · unverdicted · none · ref 35
LLM agents reach only 50.6% accuracy on chemical cost estimation within 25% error even with tools, dropping with noise due to parsing, pack selection, and tool-use failures.

Scaffold-Conditioned Preference Triplets for Controllable Molecular Optimization with Large Language Models
cs.LG · 2026-04-14 · unverdicted · none · ref 28
SCPT creates similarity-constrained preference triplets from scaffolds to train LLMs as conditional molecular editors that improve properties while keeping scaffolds intact.

Speak-to-Structure: Evaluating LLMs in Open-domain Natural Language-Driven Molecule Generation
cs.CL · 2024-12-19 · unverdicted · none · ref 40
S^2-Bench is a new one-to-many benchmark for natural language-driven molecule generation with three tasks, and OpenMolIns is an instruction dataset enabling Llama3.1-8B to outperform GPT-4o and Claude-3.5 on it.

Large Language Model Agent for User-friendly Chemical Process Simulations
physics.chem-ph · 2026-01-15 · unverdicted · none · ref 39
An LLM agent integrated with AVEVA Process Simulation via MCP enables natural language driven flowsheet analysis, optimization, and construction for chemical separation processes.

MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts
cs.CL · 2024-11-22 · unverdicted · none · ref 44
MolReFlect introduces a teacher-student framework that automatically creates fine-grained molecule-text alignments to achieve SOTA results on molecule-caption translation.

SciCore-Mol: Augmenting Large Language Models with Pluggable Molecular Cognition Modules
cs.AI · 2026-05-21 · unverdicted · none · ref 60
SciCore-Mol augments LLMs with three integrated modules for molecular perception, latent diffusion generation, and reaction reasoning, claiming an 8B open model competes with or exceeds proprietary systems on chemical tasks.

RefiningGPT: Specialized language Models for Automated Refinery Unit-level Process Diagram Synthesis
cs.CE · 2026-05-19 · unverdicted · none · ref 25
RefineGPT is a hierarchical LLM agent that selects refinery units via a supervised fine-tuned small model and generates topologies via a large model, trained on motifs extracted from legacy diagrams.

ChemVA: Advancing Large Language Models on Chemical Reaction Diagrams Understanding
cs.AI · 2026-05-17 · unverdicted · none · ref 46
ChemVA framework uses hybrid-granularity visual anchors and entity-name alignment to improve LLM performance on chemical reaction diagrams by ~20 points, reaching 92% structural accuracy on the new OCRD-Bench dataset.

Bolek: A Multimodal Language Model for Molecular Reasoning
cs.LG · 2026-05-04 · unverdicted · none · ref 13
Bolek injects Morgan fingerprint embeddings into an instruction-tuned text model, then fine-tunes on molecular alignment and synthetic chain-of-thought tasks to improve performance and grounding on 15 TDC binary classification endpoints while generalizing to unseen tasks.

Heterogeneous Scientific Foundation Model Collaboration
cs.AI · 2026-04-30 · unverdicted · none · ref 68
Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.

AI4S-SDS: A Neuro-Symbolic Solvent Design System via Sparse MCTS and Differentiable Physics Alignment
cs.AI · 2026-03-04 · unverdicted · none · ref 18
AI4S-SDS uses sparse MCTS and differentiable physics alignment to generate valid solvent mixtures and identifies a competitive photoresist developer formulation.

ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge
cs.CE · 2025-07-29 · unverdicted · none · ref 24
ChemDFM-R is a chemical reasoning LLM trained via a four-stage pipeline on the ChemFG dataset of functional-group annotations for molecules and reactions, reaching performance comparable to or better than commercial models on chemical benchmarks.

Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
cs.CL · 2025-02-05 · unverdicted · none · ref 245
Position paper claims multimodal LLMs can significantly advance scientific reasoning and proposes a four-stage roadmap plus challenges and suggestions.
