
arxiv: 2606.06842 · v1 · pith:WWKQWFAFnew · submitted 2026-06-05 · 💻 cs.CL

CRAFT: A Unified Counterfactual Reasoning Framework for Tabular Question Answering and Fact Verification

Pith reviewed 2026-06-27 22:18 UTC · model grok-4.3

classification 💻 cs.CL
keywords counterfactual reasoning · tabular question answering · fact verification · large language models · table reasoning · bidirectional verification · WikiTQ · TabFact

The pith

CRAFT improves tabular QA and fact verification by constructing counterfactual statement variants and weighting evidence from both original and alternative reasoning paths.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes CRAFT, a unified framework that turns single-direction table reasoning into bidirectional verification. It builds a declarative statement from the input, generates a counterfactual counterpart, extracts evidence along both reasoning paths, and combines the two signals through a weighting step to decide the answer. The authors report higher accuracy on WikiTQ and TabFact, with larger gains on complex questions and smaller gaps between LLM backbones. The claim matters because, per the abstract, existing approaches rely on single-direction reasoning and so cannot explore alternative hypotheses; CRAFT constructs those alternatives explicitly as part of the reasoning process. If the results hold, structured reasoning tasks can move from one-pass inference to explicit hypothesis testing.

Core claim

CRAFT reformulates tabular question answering and fact verification as a general bidirectional verification process. Declarative statements and their counterfactual variants are constructed explicitly; evidence is extracted from reasoning along both the original and counterfactual paths; and the two streams are integrated by a weighted mechanism to produce the final answer. This process is shown to outperform single-direction baselines on WikiTQ and TabFact while narrowing gaps between different backbone LLMs.
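The process described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the callables `rewrite`, `reverse`, and `extract` are hypothetical stand-ins for LLM calls (their names loosely echo the Rewriter, Reverser, and Extractor prompt templates listed among the figures), the support scores are assumed to lie in [0, 1], and the linear weighting rule is one plausible choice, since the abstract does not specify the formula.

```python
from typing import Callable

Table = list[dict]


def bidirectional_verify(
    question: str,
    table: Table,
    rewrite: Callable[[str, Table], str],    # hypothetical: question -> declarative statement
    reverse: Callable[[str], str],           # hypothetical: statement -> counterfactual variant
    extract: Callable[[str, Table], float],  # hypothetical: support score in [0, 1]
    weight: float = 0.5,                     # assumed weighting; not taken from the paper
) -> tuple[bool, float]:
    """Combine evidence from the original and counterfactual reasoning paths."""
    statement = rewrite(question, table)
    counterfactual = reverse(statement)
    support_original = extract(statement, table)
    support_counterfactual = extract(counterfactual, table)
    # Evidence for the counterfactual counts against the original statement.
    combined = weight * support_original + (1 - weight) * (1 - support_counterfactual)
    return combined >= 0.5, combined


# Toy stand-ins so the sketch runs without a model.
toy_table = [{"player": "Ann", "points": 30}, {"player": "Bo", "points": 12}]


def toy_rewrite(question: str, table: Table) -> str:
    return "Ann scored more points than Bo"


def toy_reverse(statement: str) -> str:
    return statement.replace("more", "fewer")


def toy_extract(statement: str, table: Table) -> float:
    points = {row["player"]: row["points"] for row in table}
    if "more" in statement:
        return 1.0 if points["Ann"] > points["Bo"] else 0.0
    return 1.0 if points["Ann"] < points["Bo"] else 0.0


verdict, score = bidirectional_verify(
    "Did Ann score more points than Bo?", toy_table, toy_rewrite, toy_reverse, toy_extract
)
print(verdict, score)
```

In this toy case the two paths agree (the statement is supported, its counterfactual is not), so the combined score is at its maximum. The interesting cases for a real system are those where the paths disagree and the weighting has to arbitrate.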

What carries the argument

Bidirectional verification that generates counterfactual variants of declarative statements and integrates weighted evidence from both original and alternative reasoning paths.

If this is right

  • Accuracy rises on WikiTQ and TabFact, especially for complex multi-step questions.
  • Performance differences shrink across different LLM backbones.
  • Single-direction inference limits are reduced by explicit alternative-hypothesis testing.
  • The same framework applies to both question answering and fact verification without task-specific redesign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The weighting step could be replaced by learned fusion to test whether hand-designed weights are necessary.
  • Counterfactual construction might transfer to non-table structured sources such as knowledge graphs or code repositories.
  • The method could be combined with self-consistency sampling to further increase robustness on long tables.

Load-bearing premise

Constructing and reasoning over counterfactual variants, followed by weighted integration, will produce more accurate answers than single-direction reasoning without adding new errors or biases.

What would settle it

Running the full CRAFT pipeline on WikiTQ or TabFact with a fixed LLM backbone and finding that accuracy is equal to or lower than the single-direction baseline.
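As a rough sketch of that check, assuming exact-match scoring and a single run per system (the function names and the toy labels below are illustrative, not from the paper):

```python
def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Exact-match accuracy; assumes predictions and gold are aligned by index."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("gold must not be empty")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def settles_against_craft(
    craft_preds: list[str], baseline_preds: list[str], gold: list[str]
) -> bool:
    """True when CRAFT is equal to or worse than the single-direction baseline."""
    return accuracy(craft_preds, gold) <= accuracy(baseline_preds, gold)


# Placeholder labels only; these are not results from the paper.
gold = ["a", "b", "c", "d"]
craft = ["a", "b", "c", "x"]
baseline = ["a", "x", "c", "x"]
print(accuracy(craft, gold), accuracy(baseline, gold))
print(settles_against_craft(craft, baseline, gold))
```

A single comparison like this ignores run-to-run variance, so a careful version would repeat it over several seeds with the backbone held fixed.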

Figures

Figures reproduced from arXiv: 2606.06842 by Changzai Pan, Chenshuo Pan, Jiayi Liang, Jie Zhang, Shuangyong Song, Yongxiang Li, Yujie Mao, Yu Zhao, Zhenhe Wu, Zhongjiang He.

Figure 1: Overview of CRAFT, a multi-agent framework for table reasoning. Rewriter forms an initial statement…
Figure 3: WikiTQ and TabFact accuracy as the number…
Figure 2: Repeated-sampling analyses on WikiTQ/TabFact. (a) Accuracy comparison between voting methods and CRAFT. (b) Ideal (Pass@K) accuracy upper bounds compared with our method.
Accompanying text: … their frequencies; and (2) Confidence-Weighted (CW), which also samples N answers but aggregates them by summing exp(score) for each unique candidate and choosing the answer with the highest total weight. Here we choose N = 3 to allow a…
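The Confidence-Weighted rule quoted with Figure 2 is simple enough to sketch directly. The version below assumes each sample is an `(answer, score)` pair and that `score` behaves like a log-probability; the source says only "score". The frequency-based `majority_vote` is an assumption about the truncated first method, included for contrast.

```python
import math
from collections import Counter, defaultdict


def majority_vote(answers: list[str]) -> str:
    """Pick the most frequent answer (assumed form of the truncated first method)."""
    return Counter(answers).most_common(1)[0][0]


def confidence_weighted_vote(samples: list[tuple[str, float]]) -> str:
    """Sum exp(score) per unique candidate and return the heaviest one."""
    totals: dict[str, float] = defaultdict(float)
    for answer, score in samples:
        totals[answer] += math.exp(score)
    return max(totals, key=totals.get)


# Illustrative samples with N = 3; the scores are made up for the example.
samples = [("Paris", -0.1), ("Lyon", -1.2), ("Lyon", -1.0)]
print(majority_vote([answer for answer, _ in samples]))
print(confidence_weighted_vote(samples))
```

With these made-up scores the two rules disagree: "Lyon" wins on frequency, while "Paris" wins on total weight because its single sample carries a much higher score.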
Figure 4: Performance across different table sizes. We partition tables into three size groups with thresholds: for WikiTQ, small (<2000 tokens), medium (2000–4000), and large (>4000); for TabFact, small (<500 tokens), medium (500–800), and large (>800).
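The size grouping in the Figure 4 caption maps onto a small lookup. In this sketch the thresholds come from the caption, while the function name is hypothetical and the token count is left to the caller, because the caption does not say which tokenizer is used.

```python
# (small/medium boundary, medium/large boundary) in tokens, from the Figure 4 caption.
THRESHOLDS = {
    "WikiTQ": (2000, 4000),
    "TabFact": (500, 800),
}


def size_group(dataset: str, n_tokens: int) -> str:
    """Assign a table to small / medium / large by its token count."""
    low, high = THRESHOLDS[dataset]
    if n_tokens < low:
        return "small"
    if n_tokens <= high:  # medium range read as inclusive at both ends
        return "medium"
    return "large"


print(size_group("WikiTQ", 1500))
print(size_group("WikiTQ", 4000))
print(size_group("TabFact", 801))
```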
Figure 5: A Case Study Comparing Self-Critique and Counterfactual Reasoning Paths
Figure 6: A Case Study Showing how CRAFT get correct answer when Both Reasoning Paths start with a wrong…
Figure 7: Rewriter Prompt Template
Figure 8: Reverser Prompt Template
Figure 9: Extractor Prompt Template
Figure 10: Rethinker Prompt Template

Original abstract

Table reasoning remains challenging for large language models (LLMs), particularly in tasks that require multi-step inference over long and structured tables. Existing approaches predominantly rely on single-direction reasoning, which limits their ability to explore alternative hypotheses across tasks. In this work, we propose CRAFT, a unified Counterfactual Reasoning Framework that reformulates Tabular question answering and fact verification into a general bidirectional verification process. Our method explicitly constructs both declarative statements and their counterfactual variants. Evidence is then extracted from reasoning along both the original and counterfactual paths, and integrated via a weighted mechanism to arrive at the final answer. Experimental results show that our approach consistently surpasses representative baselines on table reasoning datasets such as WikiTQ and TabFact, achieving especially large improvements on complex question answering. Our framework also significantly mitigates performance gaps between different backbone LLMs. This indicates that counterfactual reasoning effectively overcomes the limitations of single-direction inference, guiding LLMs toward more discerning reasoning and establishing a more principled paradigm for structured reasoning tasks. Our code will be made publicly available upon acceptance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes CRAFT, a unified counterfactual reasoning framework for tabular question answering and fact verification. It reformulates the tasks as a bidirectional verification process by explicitly constructing declarative statements and their counterfactual variants, extracting evidence along both original and counterfactual reasoning paths, and integrating the evidence via a weighted mechanism to produce the final answer. The central empirical claims are that this approach consistently outperforms representative baselines on WikiTQ and TabFact (with especially large gains on complex QA) and significantly reduces performance gaps across different LLM backbones.

Significance. If the empirical results hold and are supported by proper ablations and error analysis, the work could be moderately significant for table reasoning, as it offers a concrete mechanism to move beyond single-direction inference and potentially improve robustness and consistency across LLMs. The promised public code release would strengthen reproducibility.

major comments (2)
  1. [Abstract] The abstract asserts performance improvements, gap reduction, and superiority on complex QA but supplies no quantitative results, implementation details, ablation studies, or error analysis, so a reader cannot tell from the abstract alone whether the data support the stated claims.
  2. [Abstract] The central claim depends on the premise that bidirectional counterfactual construction plus weighted integration produces more accurate answers without introducing new sources of error or bias; no evidence is provided to evaluate whether this premise holds or whether the weighting step is robust.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their review and comments. We address the two major comments on the abstract below, noting that the abstract provides a high-level summary while the supporting quantitative results, ablations, and analyses appear in the main manuscript.

  1. Referee: [Abstract] The abstract asserts performance improvements, gap reduction, and superiority on complex QA but supplies no quantitative results, implementation details, ablation studies, or error analysis, so a reader cannot tell from the abstract alone whether the data support the stated claims.

    Authors: Abstracts are designed to be concise overviews and conventionally omit detailed numbers, implementation specifics, ablations, and error analyses. The manuscript supplies these in full: Section 4 reports the performance gains on WikiTQ and TabFact (including larger gains on complex questions), Section 5 contains the ablation studies on each component, and Section 6 presents the error analysis and LLM-gap reduction results. These sections directly support the claims summarized in the abstract.

    Revision: no

  2. Referee: [Abstract] The central claim depends on the premise that bidirectional counterfactual construction plus weighted integration produces more accurate answers without introducing new sources of error or bias; no evidence is provided to evaluate whether this premise holds or whether the weighting step is robust.

    Authors: The manuscript evaluates this premise through controlled experiments. Main results show consistent accuracy gains from the bidirectional paths and weighted integration over single-direction baselines. Ablations isolate the contribution of counterfactual construction and the weighting mechanism, while error analysis and cross-LLM consistency metrics demonstrate that the approach reduces rather than introduces errors or bias. These evaluations appear in Sections 4–6.

    Revision: no

Circularity Check

0 steps flagged

No significant circularity detected


The paper introduces CRAFT as an empirical framework that constructs counterfactual variants, extracts bidirectional evidence, and applies weighted integration for tabular QA and fact verification. All central claims rest on reported experimental gains versus baselines on WikiTQ and TabFact rather than any mathematical derivation, self-referential definition, or fitted parameter renamed as a prediction. No equations, uniqueness theorems, or self-citation chains appear in the abstract or method description that would reduce the target result to its own inputs by construction. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No concrete free parameters, axioms, or invented entities are described in the abstract; the ledger is therefore empty.

pith-pipeline@v0.9.1-grok · 5743 in / 1099 out tokens · 25970 ms · 2026-06-27T22:18:27.414047+00:00 · methodology

