Blade: Benchmarking language model agents for data-driven science

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset: 2 · background: 1

years

2026: 4 · 2025: 2

representative citing papers

Neurodata Without Boredom: Benchmarking Agentic AI for Data Reuse

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

AI agents handle individual data-loading and reformatting steps on neuroscience datasets but rarely complete fully error-free end-to-end pipelines, and AI judges are unreliable without ground-truth references.

Evidence-Informed LLM Beliefs for Continual Scientific Discovery

cs.AI · 2026-06-28 · unverdicted · novelty 6.0

Evidence-informed belief updates make Bayesian surprise non-stationary in LLM hypothesis search, with embedding-based RAG identifying 37.5% spurious static surprisals and modified search (filtering plus diversity) yielding 30.62% higher accumulated non-stationary surprisal across five domains.
