TQA-Bench: Evaluating LLMs for Multi-Table Question Answering

Binhang Yuan; Chen Wang; Chenyue Li; Guangxin He; You Peng; Zipeng Qiu

arxiv: 2411.19504 · v2 · pith:3YI64NYTnew · submitted 2024-11-29 · cs.AI · cs.CL · cs.IR

classification: cs.AI, cs.CL, cs.IR

keywords: llms, multi-table, data, relational, answering, billion, complex, critical

Abstract

The advance of large language models (LLMs) has unlocked great opportunities in complex multi-modal data management tasks, particularly in question answering (QA) over complicated multi-table relational data. Despite significant progress, systematically evaluating LLMs on multi-table QA remains a critical challenge due to the inherent complexity of analyzing the modality of relational data structures and the potentially large scale of serialized tabular data. Existing benchmarks primarily focus on single-table QA, failing to capture the intricacies of connections across multiple relational tables, as required in real-world domains such as finance, healthcare, and e-commerce. We present TQA-Bench, a long-context analytical multi-table QA benchmark derived from real-world public datasets, with a flexible sampling mechanism that varies context length (8K–64K tokens) and symbolic extensions for assessing reasoning beyond retrieval and pattern matching. We systematically evaluate a set of LLMs spanning model scales from 2 billion to 671 billion parameters. Our extensive experiments reveal critical insights into the performance of LLMs in multi-table QA, highlighting both challenges and opportunities for advancing their application in complex, data-driven environments.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Large Language Model-Enhanced Relational Operators: Taxonomy, Benchmark, and Analysis
cs.DB · 2026-03 · unverdicted · novelty 7.0

The authors define a taxonomy for LLM-enhanced relational operators categorized into Select, Match, Impute, Cluster and Order, and release LROBench to evaluate single and multi-operator queries on semantic database pr...