ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

years

2026 8

representative citing papers

Can AI Agents Synthesize Scientific Conclusions?

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

A new benchmark and clean-room harness show frontier AI agents reach only 0.337 factual F1 when synthesizing conclusions from scientific evidence.

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

cs.AI · 2026-05-25 · unverdicted · novelty 6.0

ScientistOne introduces Chain-of-Evidence and an audit system that achieves zero hallucinated references, perfect score verification, and top method-code alignment while matching or beating human experts on five frontier tasks and generalizing to six more.
