arxiv: 2509.02473 · v2 · pith:YTNZSBJTnew · submitted 2025-09-02 · 💻 cs.DB

FDABench: A Benchmark for Data Agents on Analytical Queries over Heterogeneous Data

classification 💻 cs.DB
keywords data · agents · agent · analytical · benchmark · fdabench · heterogeneous · across

The growing demand for data-driven decision-making has created an urgent need for data agents that can reason over heterogeneous data (databases, documents, web content, images, videos, and audio) to answer complex analytical queries. However, evaluating such agents remains challenging: existing benchmarks often focus on isolated agent capabilities or limited data modalities, lacking comprehensive coverage of heterogeneous data and rigorous evaluation across diverse data agent architectures. To address these challenges, we present FDABench, a benchmark for evaluating data agents' reasoning ability over heterogeneous data in analytical scenarios. Our contributions are threefold: (1) A comprehensive benchmark of 2,007 tasks spanning six data modalities with a unified, multi-granularity evaluation framework. (2) We design PUDDING, an agentic dataset construction framework that leverages LLM generation with iterative expert validation for reliable and scalable benchmark construction. (3) Extensive experiments across diverse data agent architectures, including general analytical agents, semantic operator frameworks, and RAG-based methods, revealing key insights and guidelines for future data agent development. Our data and source code are released at https://github.com/fdabench/FDAbench.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Data Agents Under Attack: Vulnerabilities in LLM-Driven Analytical Systems

    cs.CR 2026-06 unverdicted novelty 7.0

    The paper introduces a layered vulnerability framework and attack taxonomy for LLM-driven data agents and demonstrates attacks on four open-source and two production systems.

  2. Business Utility of Large Language Models as Exploratory Data Analysis Agents

    cs.CY 2026-05 unverdicted novelty 5.0

    Evaluation of 15 LLM configurations across four conditions in a supply chain EDA benchmark finds most lack sufficient repeatability for autonomous deployment, with GPT-5.4 at extra-high reasoning effort scoring highes...