CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics

Weida Wang, Dongchen Huang, Jiatong Li, Tengchao Yang, Ziyang Zheng, Di Zhang, Dong Han, Benteng Chen, Binzhao Luo, Zhiyu Liu, et al. · 2025 · arXiv 2508.18124

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

MolViBench: Evaluating LLMs on Molecular Vibe Coding

cs.CL · 2026-05-04 · unverdicted · novelty 7.0

MolViBench is the first benchmark designed to evaluate LLMs on generating executable programs for molecular tasks in drug discovery.

PolyReal: A Benchmark for Real-World Polymer Science Workflows

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

PolyReal benchmark shows leading MLLMs perform well on polymer knowledge reasoning but drop sharply on practical tasks like lab safety analysis and raw data extraction.

The Agentification of Scientific Research: A Physicist's Perspective

cs.AI · 2026-04-16 · unverdicted · novelty 3.0

AI will evolve from a research tool into a collaborator, fundamentally reshaping scientific collaboration, discovery, publishing, and evaluation while requiring continuous learning and idea diversity for original contributions.

citing papers explorer

Showing 3 of 3 citing papers.

MolViBench: Evaluating LLMs on Molecular Vibe Coding

cs.CL · 2026-05-04 · unverdicted · none · ref 2

PolyReal: A Benchmark for Real-World Polymer Science Workflows

cs.CV · 2026-04-03 · unverdicted · none · ref 49

The Agentification of Scientific Research: A Physicist's Perspective

cs.AI · 2026-04-16 · unverdicted · none · ref 29
