MolViBench is the first benchmark designed to evaluate LLMs on generating executable programs for molecular tasks in drug discovery.
Cmphysbench: A benchmark for evaluating large language models in condensed matter physics
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 2polarities
background 2representative citing papers
PolyReal benchmark shows leading MLLMs perform well on polymer knowledge reasoning but drop sharply on practical tasks like lab safety analysis and raw data extraction.
AI will evolve from a research tool into a collaborator, fundamentally reshaping scientific collaboration, discovery, publishing, and evaluation while requiring continuous learning and idea diversity for original contributions.
citing papers explorer
-
MolViBench: Evaluating LLMs on Molecular Vibe Coding
MolViBench is the first benchmark designed to evaluate LLMs on generating executable programs for molecular tasks in drug discovery.
-
PolyReal: A Benchmark for Real-World Polymer Science Workflows
PolyReal benchmark shows leading MLLMs perform well on polymer knowledge reasoning but drop sharply on practical tasks like lab safety analysis and raw data extraction.
-
The Agentification of Scientific Research: A Physicist's Perspective
AI will evolve from a research tool into a collaborator, fundamentally reshaping scientific collaboration, discovery, publishing, and evaluation while requiring continuous learning and idea diversity for original contributions.