
Canonical reference

ScienceAgentBench: Toward rigorous assessment of language agents for data-driven scientific discovery

Canonical reference. 100% of citing Pith papers cite this work as background.

19 Pith papers citing it
Background: 100% of classified citations


citation-role summary: background 5, dataset 2

citation-polarity summary

years: 2026 (12), 2025 (7)

polarities: background 6

representative citing papers

Neurodata Without Boredom: Benchmarking Agentic AI for Data Reuse

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

AI agents handle individual data-loading and reformatting steps on neuroscience datasets but rarely complete fully error-free end-to-end pipelines, and AI judges are unreliable without ground-truth references.

How Far Are We From True Auto-Research?

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

ResearchArena shows that agent-generated papers fail to meet top-tier acceptance standards, primarily because of fabricated results, underpowered experiments, and plan-execution mismatches, with failure patterns that vary sharply by agent.
