CritPt benchmark shows state-of-the-art LLMs reach only 5.7% average accuracy on full-scale unpublished physics research tasks, rising to about 10% with coding tools.
2 Pith papers cite this work. Polarity classification is still being indexed.
citation-role summary: background 2
citation-polarity summary
verdicts: UNVERDICTED 2
roles: background 2
polarities: background 2
representative citing papers
A potential-independent dynamical system closes the cosmographic hierarchy, revealing inflationary and radiation-dominated phases as attractors in the early universe expansion history.
citing papers explorer
- Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
  CritPt benchmark shows state-of-the-art LLMs reach only 5.7% average accuracy on full-scale unpublished physics research tasks, rising to about 10% with coding tools.
- Closing the Cosmographic Hierarchy: Dynamical Attractors from Inflation to Reheating
  A potential-independent dynamical system closes the cosmographic hierarchy, revealing inflationary and radiation-dominated phases as attractors in the early universe expansion history.