Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation
Pith reviewed 2026-05-18 09:14 UTC · model grok-4.3
The pith
This survey organizes table question answering research by categorizing benchmarks, task setups, and LLM modeling strategies while identifying open problems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The field of table question answering lacks systematic organization of task formulations, core challenges, and methodological trends. This survey addresses the gap by providing a comprehensive categorization of existing benchmarks and task setups, grouping current modeling strategies according to the challenges they target, and highlighting underexplored but timely topics that have not been systematically covered in prior research, thereby offering a consolidated foundation for the TQA community.
What carries the argument
The categorization framework that organizes benchmarks by representation, complexity, modality and domain, and groups modeling strategies by targeted challenges.
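One way to picture this framework is as a record with one field per axis. The sketch below is only an illustration under stated assumptions: the four axis names come from the abstract, while `BenchmarkEntry`, `group_by_axis`, and every benchmark name and axis value in `EXAMPLE_ENTRIES` are hypothetical placeholders, not labels taken from the survey.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four axes are named in the survey's abstract. Everything else in this
# sketch (class name, helper name, example values) is a hypothetical stand-in.
AXES = ("representation", "complexity", "modality", "domain")


@dataclass(frozen=True)
class BenchmarkEntry:
    """One benchmark described along the four categorization axes."""
    name: str
    representation: str
    complexity: str
    modality: str
    domain: str


def group_by_axis(entries, axis):
    """Map each value of the chosen axis to the benchmark names that carry it."""
    if axis not in AXES:
        raise ValueError(f"unknown axis: {axis!r}")
    groups = defaultdict(list)
    for entry in entries:
        groups[getattr(entry, axis)].append(entry.name)
    return dict(groups)


# Placeholder entries; these are not real benchmarks or the survey's categories.
EXAMPLE_ENTRIES = [
    BenchmarkEntry("bench_a", "flat", "single-hop", "text", "general"),
    BenchmarkEntry("bench_b", "hierarchical", "multi-hop", "text", "finance"),
    BenchmarkEntry("bench_c", "flat", "multi-hop", "image", "general"),
]
```

Calling `group_by_axis(EXAMPLE_ENTRIES, "modality")` groups the placeholder benchmarks by modality. An axis value with few or no entries would be a candidate for an underexplored setup.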
If this is right
- Researchers gain a clearer map of existing benchmarks and can more readily identify which task setups remain underexplored.
- Modeling approaches can be compared directly by the challenges they address rather than by isolated papers.
- New work on reinforcement learning or other emerging directions can be situated within the highlighted gaps.
- The survey's analysis of strengths and limitations can steer choices among LLM-based methods for specific TQA settings.
Where Pith is reading between the lines
- The same challenge-based grouping might transfer to other structured-data QA tasks such as knowledge-base question answering.
- Future benchmarks could be designed explicitly to test the boundaries between the categories the survey defines.
- Empirical studies could measure whether models trained under one challenge category generalize to another.
Load-bearing premise
That prior TQA research is fragmented enough for a single survey to unify the threads and deliver a useful organization that researchers can build on.
What would settle it
A follow-up analysis that identifies many recent papers or trends that fall outside the proposed benchmark categories and challenge-based groupings, showing the organization does not capture the field's structure.
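Such a follow-up analysis could be reduced to a simple coverage count. The sketch below assumes someone has already labelled each recent paper with the categories it fits; the function name `coverage_gap`, the label sets, and the paper identifiers are all hypothetical, and neither the survey nor this review prescribes this procedure.

```python
def coverage_gap(paper_labels, taxonomy):
    """Return (fraction, ids) for papers that match no category in the taxonomy.

    paper_labels: dict mapping a paper id to an iterable of assigned labels.
    taxonomy: iterable of category labels the organization defines.
    Both inputs are assumed to come from a manual annotation pass.
    """
    categories = set(taxonomy)
    outside = sorted(
        paper_id
        for paper_id, labels in paper_labels.items()
        if not (set(labels) & categories)
    )
    total = len(paper_labels)
    fraction = len(outside) / total if total else 0.0
    return fraction, outside


# Hypothetical annotation: two of four papers fit no defined category.
example_labels = {
    "p1": {"cat_a"},
    "p2": {"other"},
    "p3": set(),
    "p4": {"cat_b", "other"},
}
fraction, outside = coverage_gap(example_labels, {"cat_a", "cat_b"})
```

A large fraction on a representative sample of recent work would count against the organization. The threshold for "many" is a judgment call that this sketch does not settle.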
Original abstract
Table Question Answering (TQA) aims to answer natural language questions about tabular data, often accompanied by additional contexts such as text passages. The task spans diverse settings, varying in table representation, question/answer complexity, modality involved, and domain. While recent advances in large language models (LLMs) have led to substantial progress in TQA, the field still lacks a systematic organization and understanding of task formulations, core challenges, and methodological trends, particularly in light of emerging research directions such as reinforcement learning. This survey addresses this gap by providing a comprehensive and structured overview of TQA research with a focus on LLM-based methods. We provide a comprehensive categorization of existing benchmarks and task setups. We group current modeling strategies according to the challenges they target, and analyze their strengths and limitations. Furthermore, we highlight underexplored but timely topics that have not been systematically covered in prior research. By unifying disparate research threads and identifying open problems, our survey offers a consolidated foundation for the TQA community, enabling a deeper understanding of the state of the art and guiding future developments in this rapidly evolving area.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey on Table Question Answering (TQA) in the era of large language models. It claims to address the lack of systematic organization in the field by delivering a comprehensive categorization of existing benchmarks and task setups, grouping current LLM-based modeling strategies according to the challenges they target, analyzing their strengths and limitations, and highlighting underexplored topics such as reinforcement learning applications.
Significance. If the categorization proves complete and the analysis balanced, the survey would provide a valuable consolidated foundation for the TQA community. By unifying disparate research threads across modalities, domains, and LLM adaptations, it can enable a deeper understanding of the state of the art and guide future work in this rapidly evolving area. The structured grouping of methods by targeted challenges is a useful contribution for researchers.
minor comments (3)
- [Abstract] The reference to 'emerging research directions such as reinforcement learning' would be more concrete if accompanied by at least one specific citation or example of recent work in that direction.
- [§1] Consider adding a short table early in the introduction that contrasts traditional TQA settings with LLM-adapted variants (e.g., differences in table representation and answer complexity) to improve accessibility for readers unfamiliar with the subfield.
- [Introduction] The survey would benefit from an explicit statement of the time period and search methodology used to collect the reviewed papers, which would strengthen the claim of comprehensive coverage.
Simulated Author's Rebuttal
We thank the referee for the positive summary of our survey and for recommending minor revision. The referee accurately captures the manuscript's focus on systematically organizing TQA benchmarks, LLM-based modeling strategies grouped by targeted challenges, and underexplored directions such as reinforcement learning. We are pleased that the structured overview is viewed as a useful foundation for the community.
Circularity Check
No significant circularity in survey synthesis
full rationale
This is a survey paper whose central contribution is a descriptive categorization of existing TQA benchmarks, task setups, and LLM-based modeling strategies drawn from the published literature. No equations, fitted parameters, predictions, or derivations are present that could reduce to the paper's own inputs by construction. The framing of addressing a 'lack of systematic organization' is a standard meta-claim for survey articles and does not rely on self-citation chains or self-definitional loops for its validity; the work remains self-contained as an external synthesis of prior threads.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Table Question Answering spans diverse settings varying in table representation, question/answer complexity, modality, and domain.
Forward citations
Cited by 2 Pith papers
- PIPER: Content-Based Table Search via profiling and LLM-Generated Pseudoqueries
  PIPER retrieves and ranks tabular datasets by profiling their content and using LLM-generated queries for dense vector search, outperforming metadata baselines and TableQA methods in low-metadata settings.
- OmniTQA: A Cost-Aware System for Hybrid Query Processing over Semi-Structured Data
  OmniTQA integrates LLM semantic reasoning as a first-class query operator with classical relational operators in a cost-aware planner for hybrid structured and semi-structured data.