pith. machine review for the scientific record.

FrontierScience: Evaluating AI’s ability to perform expert-level scientific reasoning

9 Pith papers cite this work. Polarity classification is still being indexed.


citation summary

years: 2026 (9)
roles: dataset (2)
polarities: use dataset (2)

representative citing papers

CITE: Anytime-Valid Statistical Inference in LLM Self-Consistency

stat.ML · 2026-05-07 · unverdicted · novelty 7.0

CITE certifies that a prespecified answer is the unique mode of an LLM response distribution with anytime-valid error control under arbitrary data-driven stopping and without prior knowledge of the answer set.
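The anytime-valid guarantee can be illustrated with a deliberately simpler and more conservative criterion than CITE's: certify that the target answer has probability above 1/2, which already implies it is the unique mode. This sketch is not CITE's procedure. The function name majority_eprocess_stop and the fixed betting fraction lam are hypothetical choices. The function bets on each sampled response matching the target and stops once the accumulated wealth reaches 1/alpha. Ville's inequality bounds the chance of ever crossing that threshold under the null, so stopping at any data-dependent time keeps the error at most alpha. Only equality with the target is checked, so the full answer set is never needed.

```python
from typing import Iterable, Optional


def majority_eprocess_stop(
    responses: Iterable[str],
    target: str,
    alpha: float = 0.05,
    lam: float = 0.5,
) -> Optional[int]:
    """Hypothetical sketch, not CITE's published method.

    Tests H0: P(response == target) <= 1/2. Rejecting H0 certifies that
    target is the unique mode, because any other answer then has
    probability below 1/2. Under H0 the wealth
        K_t = prod_{i<=t} (1 + lam * (X_i - 1/2)),  X_i = 1{response_i == target}
    is a nonnegative supermartingale, so by Ville's inequality
    P(sup_t K_t >= 1/alpha) <= alpha under any data-driven stopping rule.

    Returns the sample count at which H0 is rejected, or None.
    """
    if not 0.0 < lam < 2.0:
        # lam < 2 keeps every factor (1 - lam/2) strictly positive.
        raise ValueError("lam must lie in (0, 2)")
    wealth = 1.0
    for t, response in enumerate(responses, start=1):
        x = 1.0 if response == target else 0.0  # only compares to target
        wealth *= 1.0 + lam * (x - 0.5)          # bet that X_t = 1
        if wealth >= 1.0 / alpha:                # Ville threshold
            return t
    return None
```

For example, 30 identical responses "42" with the defaults certify after 14 samples, because 1.25^13 is below 20 and 1.25^14 is above it. Responses that alternate between two answers never reach the threshold. A data-adaptive (predictable) betting fraction would usually stop sooner. The fixed value keeps the validity argument easy to check.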

COMPOSITE-STEM

cs.AI · 2026-04-10 · conditional · novelty 5.0

COMPOSITE-STEM is a new benchmark of 70 expert-curated STEM tasks where frontier AI agents score at most 21% using flexible exact-match and rubric-based grading.
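The summary does not say how COMPOSITE-STEM's grader works. As an illustration only, "flexible exact match" might mean comparing normalized strings, or comparing numbers within a relative tolerance. "Rubric-based grading" might award the weighted fraction of criteria that a response satisfies. The helpers normalize, flexible_exact_match and rubric_score, and the 1% default tolerance, are hypothetical and do not describe the benchmark's actual API.

```python
import math
import re
from typing import Dict


def normalize(answer: str) -> str:
    """Hypothetical: lowercase, trim, collapse whitespace, drop trailing periods."""
    text = re.sub(r"\s+", " ", answer.strip().lower())
    return text.rstrip(".")


def flexible_exact_match(pred: str, gold: str, rel_tol: float = 1e-2) -> bool:
    """Hypothetical: numeric match within rel_tol if both parse as floats,
    otherwise exact match of the normalized strings."""
    p, g = normalize(pred), normalize(gold)
    try:
        pv, gv = float(p), float(g)
    except ValueError:
        return p == g
    return math.isclose(pv, gv, rel_tol=rel_tol)


def rubric_score(satisfied: Dict[str, bool], weights: Dict[str, float]) -> float:
    """Hypothetical: weighted fraction of rubric criteria marked satisfied."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    earned = sum(w for name, w in weights.items() if satisfied.get(name, False))
    return earned / total
```

Under these assumptions, " 3.140 " matches "3.14" and "Paris." matches "paris". A rubric that satisfies a weight-1 criterion but misses a weight-3 criterion scores 0.25.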
