TableMaster: A Recipe to Advance Table Understanding with Language Models
Pith reviewed 2026-05-23 04:13 UTC · model grok-4.3
The pith
TableMaster improves LM table understanding by extracting relevant data, adding semantic context through verbalization, and switching dynamically between textual and symbolic reasoning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TableMaster first extracts the relevant table content and verbalizes it with enriched semantic context. It then applies adaptive reasoning, which selects between textual and symbolic reasoning paths for each query. Together, these steps address the four identified challenges and raise accuracy on table understanding tasks.
What carries the argument
Adaptive reasoning, a mechanism that lets the model switch between textual reasoning and symbolic reasoning for each individual query after the table has been extracted and verbalized.
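The per-query switch can be pictured with a small sketch. It is an illustration only: this page does not say how the paper makes the routing decision, so the keyword rule `needs_symbolic`, the cue list, the function names, and the toy rows are all assumptions, and `fake_lm` is a placeholder for a real language-model call.

```python
from typing import Callable

# Hypothetical cue words; not taken from the paper.
ARITHMETIC_CUES = ("sum", "total", "average", "difference", "how many")


def needs_symbolic(question: str) -> bool:
    """Stand-in routing rule: send arithmetic-looking questions to code."""
    q = question.lower()
    return any(cue in q for cue in ARITHMETIC_CUES)


def symbolic_answer(rows: list, column: str) -> float:
    """Symbolic path: compute the value exactly instead of reasoning in text."""
    return sum(float(r[column]) for r in rows)


def adaptive_reason(question: str, rows: list, column: str,
                    textual_answer: Callable) -> tuple:
    """Choose one reasoning path for this query and return (path, answer)."""
    if needs_symbolic(question):
        return "symbolic", symbolic_answer(rows, column)
    return "textual", textual_answer(question, rows)


rows = [{"rider": "A", "wins": "3"}, {"rider": "B", "wins": "1"}]


def fake_lm(question, rows):
    """Placeholder for the textual path; a real system would call an LM."""
    return rows[0]["rider"]


path, answer = adaptive_reason("What is the total number of wins?", rows, "wins", fake_lm)
```

In this toy setup the arithmetic question is routed to the symbolic path and a lookup-style question would go to the textual one. A real router would likely be far less brittle than a keyword match.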
If this is right
- Language models can reach higher accuracy on table question answering benchmarks such as WikiTQ while using smaller base models like GPT-4o-mini.
- Numerical inaccuracies that arise during textual reasoning are reduced when the system can fall back to symbolic methods for the same query.
- Semantic gaps in raw tables are filled by first converting extracted cells into enriched natural-language descriptions.
- The same extraction-plus-verbalization pipeline can be reused across different language models without task-specific fine-tuning.
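As a rough picture of what "verbalizing" extracted cells could mean, the sketch below turns one row into a sentence that names every column. It is a template-based stand-in, not the paper's method (which presumably uses a language model to write the description); `verbalize_row`, the table title, and the sample row are hypothetical.

```python
def verbalize_row(headers, row, table_title):
    """Turn one table row into a sentence that names each column explicitly."""
    facts = [f"{header} is {value}" for header, value in zip(headers, row)]
    return f"In the table '{table_title}', " + "; ".join(facts) + "."


# Illustrative inputs; the title is made up for this example.
headers = ["Place", "Rider", "Country"]
row = ["1", "Sylvain Geboers", "Belgium"]
sentence = verbalize_row(headers, row, "Championship standings")
print(sentence)
```

The point of the design is that the header text travels with each cell value, so a model reading the sentence does not have to infer which column a bare number came from.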
Where Pith is reading between the lines
- If the four challenges turn out to be shared across other structured-data formats, the same extraction-verbalization-adaptive sequence could be tested on knowledge graphs or database schemas.
- The reported gains with a compact model suggest the method may lower the compute cost of reliable table reasoning in production settings.
- Re-running the pipeline on table datasets that emphasize different error types, such as those with heavy missing values, would test whether the four challenges remain the dominant ones.
Load-bearing premise
The four listed challenges are the main obstacles to table understanding, and extraction, verbalization, and adaptive reasoning are enough to remove them.
What would settle it
An ablation study on WikiTQ in which removing the adaptive reasoning step leaves accuracy at or above 78.13 percent would show that the switching mechanism is not required for the reported gains.
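The decision rule in this test is simple enough to write down. The sketch below is an assumption-laden illustration: it scores answers by case-insensitive exact match, which is a simplification of WikiTQ's official evaluation, and the function names are hypothetical. Only the 78.13 figure comes from the abstract.

```python
REPORTED_ACCURACY = 78.13  # percent, WikiTQ with GPT-4o-mini, as stated in the abstract


def exact_match_accuracy(predictions, gold):
    """Percentage of predictions equal to the gold answer (case-insensitive)."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and equally long")
    hits = sum(p.strip().lower() == g.strip().lower()
               for p, g in zip(predictions, gold))
    return 100.0 * hits / len(gold)


def switching_looks_unnecessary(ablated_accuracy, reported=REPORTED_ACCURACY):
    """True when the run without adaptive reasoning matches or beats the reported score."""
    return ablated_accuracy >= reported


# Toy data, not real model output.
toy_accuracy = exact_match_accuracy(["Belgium", "7 "], ["belgium", "8"])
```

A single comparison like this ignores run-to-run variance, so in practice the ablated and full runs would need repeated trials or confidence intervals before drawing the conclusion.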
Original abstract
Tables serve as a fundamental format for representing structured relational data. While current language models (LMs) excel at many text-based tasks, they still face challenges in table understanding due to the complex characteristics of tabular data, such as their structured nature. In this paper, we aim to enhance LMs for improved table understanding. We identify four key challenges: 1) difficulty in locating target data, 2) deficiency in table semantics, 3) numerical inaccuracies in textual reasoning, and 4) semantic inflexibility in symbolic reasoning. To address these issues, we propose TableMaster, a recipe and comprehensive framework that integrates multiple solutions to overcome these obstacles. TableMaster first extracts relevant table content and verbalizes it with enriched semantic context. Additionally, we introduce adaptive reasoning, a flexible approach that dynamically adjusts between textual and symbolic reasoning, tailoring the reasoning process to each query. Extensive analyses and experiments demonstrate our findings and the effectiveness of TableMaster. On the WikiTQ dataset, TableMaster achieves an accuracy of 78.13% using GPT-4o-mini, surpassing existing baselines. We hope this work will serve as a practical step toward more robust and reliable table understanding.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper identifies four challenges in LM-based table understanding (locating target data, deficient table semantics, numerical inaccuracies in textual reasoning, and semantic inflexibility in symbolic reasoning) and proposes TableMaster, a prompting recipe that extracts relevant table content, verbalizes it with enriched semantics, and applies adaptive reasoning that dynamically switches between textual and symbolic modes. It reports an accuracy of 78.13% on WikiTQ using GPT-4o-mini, claiming this surpasses existing baselines, and positions the work as a practical framework for more robust table understanding.
Significance. If the empirical gains are shown to be robust through controlled experiments, the work could supply a reusable prompting recipe that improves LM handling of structured data, with potential utility for downstream applications in data analysis and question answering. The contribution is primarily empirical rather than theoretical, and its significance hinges on whether the reported accuracy reflects genuine advances rather than unaccounted implementation details.
Major comments (2)
- [Abstract] The central performance claim (78.13% on WikiTQ with GPT-4o-mini, surpassing baselines) is presented without any baseline scores, ablation results, statistical significance tests, error bars, or implementation specifics for adaptive reasoning, rendering it impossible to evaluate whether the data supports the improvement assertion.
- [Abstract] The assumption that the four listed challenges are the primary barriers and that extraction + verbalization + adaptive reasoning are sufficient to overcome them is stated without supporting analysis, references to prior work quantifying these challenges, or discussion of potential confounding factors in model behavior or evaluation protocols.
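To make the request for error bars concrete, here is a minimal sketch of a Wilson score interval for a single accuracy figure. The evaluation-set size `n_questions` is a hypothetical placeholder, since this page does not state how many questions were scored, and the function name is ours.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


n_questions = 1000                      # hypothetical evaluation-set size
correct = round(0.7813 * n_questions)   # 78.13% accuracy, from the abstract
low, high = wilson_interval(correct, n_questions)
print(f"accuracy interval: [{low:.4f}, {high:.4f}]")
```

An interval like this only covers sampling noise in the test questions. It does not capture variance from prompt wording or decoding randomness, which would need repeated runs.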
Simulated Author's Rebuttal
We thank the referee for highlighting issues in the abstract that affect evaluability of our claims. We address both comments below and will revise the abstract accordingly while preserving its conciseness. The main paper already contains the supporting details referenced in our responses.
- Referee: [Abstract] The central performance claim (78.13% on WikiTQ with GPT-4o-mini, surpassing baselines) is presented without any baseline scores, ablation results, statistical significance tests, error bars, or implementation specifics for adaptive reasoning, rendering it impossible to evaluate whether the data supports the improvement assertion.
  Authors: We agree the abstract should enable direct evaluation of the performance claim. In revision we will insert the strongest baseline scores from the main results table (e.g., the best prior GPT-4o-mini result) and explicitly state that ablations, statistical significance tests, error bars, and adaptive-reasoning implementation details appear in Sections 4 and 5. Space constraints preclude full error bars in the abstract, but we will reference their presence in the body.
  Revision: yes
- Referee: [Abstract] The assumption that the four listed challenges are the primary barriers and that extraction + verbalization + adaptive reasoning are sufficient to overcome them is stated without supporting analysis, references to prior work quantifying these challenges, or discussion of potential confounding factors in model behavior or evaluation protocols.
  Authors: The four challenges are synthesized from prior empirical studies on table reasoning failures; we will add two concise citations in the revised abstract to the works that first quantified numerical inaccuracies in textual reasoning and semantic rigidity in symbolic methods. A dedicated analysis section in the full paper examines confounding factors (model scale, prompt sensitivity, evaluation protocol) with controlled experiments. We will also insert a short clause noting that the sufficiency claim is supported by the ablation study rather than asserted a priori.
  Revision: yes
Circularity Check
No significant circularity
The paper's central claim is an empirical performance result (78.13% accuracy on WikiTQ) obtained via a prompting recipe that extracts table content, verbalizes it, and applies adaptive reasoning. No derivation chain, equations, fitted parameters renamed as predictions, or self-citation load-bearing steps are present in the provided text. The four challenges are stated as observations and addressed through explicit procedural steps whose effectiveness is validated externally on benchmark data rather than by construction or self-reference. The result is self-contained against external benchmarks.
Forward citations
Cited by 2 Pith papers
- TableVision: A Large-Scale Benchmark for Spatially Grounded Reasoning over Complex Hierarchical Tables
  The TableVision benchmark shows that explicit spatial grounding recovers MLLM reasoning on hierarchical tables, delivering a 12.3% accuracy improvement through a decoupled perception-reasoning framework.
- Towards Robust Real-World Spreadsheet Understanding with Multi-Agent Multi-Format Reasoning
  SpreadsheetAgent uses incremental multi-format reading, structural sketching, and verification to raise spreadsheet benchmark accuracy from 35.27% to 38.16%.
Excerpts from the paper
- Target-data localization
- Numerical inaccuracy in textual reasoning
- Semantic rigidity in symbolic reasoning
For each challenge we propose a dedicated, minimal solution, whereas earlier efforts typically address only one point and overlook the others.
Efficient subtable extraction and symbolic reasoning. Prior systems rely on elaborate heuristics for subtable extraction, which often lose information. TableMaster instead co...
Code Execution
Place | Rider | Country | Team | Points | Wins
1 | Sylvain Geboers | Belgium | Suzuki | 3066 | 3
2 | Adolf Weil | Germany | Maico | 2331 | 2
3 | Torlief Hansen | Sweden | Husqvarna | 2052 | 0
4 | Roger De Coster | Belgium | Suzuki | 1865 | 3
5 | Joel Robert | Belgium | Suzuki | 1730 | 1
6 | Heikki Mikkola | Finland | Husqvarna | 1680 | 2
7 | Willy Bauer | Germany | Maico | 1276 | 0
8 | Gaston Rahier | Belgium | ČZ | 1112 | 0
9 | Pierre Karsmakers | Netherlands | Husqvarna | 1110 | 0
0107...
1. Identify the Relevant Column: Locate the “Wins” column (C).
2. Extract the Wins Data: Retrieve the win values for Belgian riders.
3. Convert Wins to Numeric Values: Ensure all values are in numeric format.
4. Sum the Wins: Add up the total number of wins.
5. Calculate the Total: Perform the addition.
6. Verify the Calculation: Double-check for accuracy.
7. Present th...
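The listed steps (locate the wins column, keep Belgian riders, convert to numbers, sum) can be sketched in a few lines. This is not the code the paper executes. It is a minimal illustration that uses only the nine rows fully visible in the excerpt, with the split between the "Wins" and "Points" digits inferred from the flattened text. Because the table is truncated, the total below covers the visible rows only.

```python
# (place, rider, country, team, points, wins) for the nine rows visible in the excerpt.
# Wins are kept as strings first to mirror the "convert to numeric" step.
rows = [
    (1, "Sylvain Geboers", "Belgium", "Suzuki", 3066, "3"),
    (2, "Adolf Weil", "Germany", "Maico", 2331, "2"),
    (3, "Torlief Hansen", "Sweden", "Husqvarna", 2052, "0"),
    (4, "Roger De Coster", "Belgium", "Suzuki", 1865, "3"),
    (5, "Joel Robert", "Belgium", "Suzuki", 1730, "1"),
    (6, "Heikki Mikkola", "Finland", "Husqvarna", 1680, "2"),
    (7, "Willy Bauer", "Germany", "Maico", 1276, "0"),
    (8, "Gaston Rahier", "Belgium", "ČZ", 1112, "0"),
    (9, "Pierre Karsmakers", "Netherlands", "Husqvarna", 1110, "0"),
]

# Steps 1-2: select the wins value for riders whose country is Belgium.
belgian_win_cells = [wins for _, _, country, _, _, wins in rows if country == "Belgium"]

# Step 3: convert the cell text to integers.
belgian_win_values = [int(cell) for cell in belgian_win_cells]

# Steps 4-5: add them up.
belgian_wins = sum(belgian_win_values)

# Step 6: verify by recomputing with a plain loop.
check = 0
for _, _, country, _, _, wins in rows:
    if country == "Belgium":
        check += int(wins)
assert check == belgian_wins

print(belgian_wins)  # 7 for the visible rows
```

Doing the addition in code rather than in free text is the kind of case where a symbolic path avoids the arithmetic slips described under the third challenge.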