BLADE: Benchmarking Language Model Agents for Data-Driven Science

2024 · arXiv 2408.09667

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset: 2 · background: 1

citation-polarity summary

background: 2 · use dataset: 1

representative citing papers

Neurodata Without Boredom: Benchmarking Agentic AI for Data Reuse

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

AI agents handle individual data-loading and reformatting steps on neuroscience datasets but rarely complete fully error-free end-to-end pipelines, and AI judges are unreliable without ground-truth references.

Agentic-imodels: Evolving agentic interpretability tools via autoresearch

cs.AI · 2026-05-05 · unverdicted · novelty 7.0

Agentic-imodels evolves scikit-learn regressors via an autoresearch loop to jointly boost predictive performance and LLM-simulatability, improving downstream agentic data science tasks by up to 73% on the BLADE benchmark.

From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

cs.AI · 2025-04-28 · accept · novelty 4.0

A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

cs.CL · 2025-03-27 · accept · novelty 3.0

A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.

citing papers explorer

Showing 4 of 4 citing papers.

Neurodata Without Boredom: Benchmarking Agentic AI for Data Reuse cs.LG · 2026-05-12 · unverdicted · none · ref 21 · 2 links
Agentic-imodels: Evolving agentic interpretability tools via autoresearch cs.AI · 2026-05-05 · unverdicted · none · ref 23
From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review cs.AI · 2025-04-28 · accept · none · ref 101
Large Language Model Agent: A Survey on Methodology, Applications and Challenges cs.CL · 2025-03-27 · accept · none · ref 127
