ASTRA: Adaptive Semantic Tree Reasoning Architecture for Complex Table Question Answering

Huajun Chen; Songze Li; Wen Zhang; Xiaoke Guo; Yuanxiang Liu; Zhaoyan Gong; Zhiqiang Liu

arXiv: 2604.08999 · v1 · submitted 2026-04-10 · 💻 cs.CL · cs.AI · cs.LG


Xiaoke Guo, Songze Li, Zhiqiang Liu, Zhaoyan Gong, Yuanxiang Liu, Huajun Chen, Wen Zhang

Pith reviewed 2026-05-10 17:42 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG

keywords: table question answering · logical semantic trees · adaptive serialization · dual-mode reasoning · large language models · hierarchical structure


The pith

Reconstructing tables as adaptive logical semantic trees lets LLMs reach state-of-the-art accuracy on complex question answering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that flattening tables into plain text for large language models discards their hierarchies and makes reasoning steps hard to check. It proposes a two-part method. First, an adaptive process has the model itself rebuild the table as a tree with explicit parent-child links, choosing a construction strategy according to table size. Second, a dual reasoning step searches the tree in natural language while also running code for exact verification. If the reconstruction is faithful and the two reasoning modes reinforce each other, models can answer questions over multi-level tables without the usual loss of structure or unverifiable mistakes.

Core claim

ASTRA uses AdaSTR to let LLMs globally reconstruct tables into Logical Semantic Trees that model hierarchical dependencies explicitly and adapt construction strategies to table scale, then applies DuTR to combine tree-search textual navigation for linguistic alignment with symbolic code execution for precise verification, producing state-of-the-art results on complex table benchmarks.

What carries the argument

Logical Semantic Trees, which explicitly encode table hierarchies and are built adaptively by LLMs to close representation gaps before dual-mode reasoning begins.

Load-bearing premise

Large language models can reliably turn tables into logical semantic trees that capture every relevant hierarchy without introducing reconstruction errors.

What would settle it

Run the same benchmark questions on the identical base LLM once with standard table serialization and once with the automatically generated Logical Semantic Trees; a negligible accuracy gap would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.08999 by Huajun Chen, Songze Li, Wen Zhang, Xiaoke Guo, Yuanxiang Liu, Zhaoyan Gong, Zhiqiang Liu.

**Figure 2.** Overview of the Adaptive Semantic Tree Reconstruction process.

**Figure 3.** Overview of the Dual-Mode Tree Reasoning process.

**Figure 4.** Performance breakdown by question type and …

**Figure 6.** Visualization of the Operating Expenses Analysis table.

**Figure 7.** Relationship between evaluation metric scores …

**Figure 8.** Statistics of error categories by reasoning failure mode.

Original abstract

Table serialization remains a critical bottleneck for Large Language Models (LLMs) in complex table question answering, hindered by challenges such as structural neglect, representation gaps, and reasoning opacity. Existing serialization methods fail to capture explicit hierarchies and lack schema flexibility, while current tree-based approaches suffer from limited semantic adaptability. To address these limitations, we propose ASTRA (Adaptive Semantic Tree Reasoning Architecture) including two main modules, AdaSTR and DuTR. First, we introduce AdaSTR, which leverages the global semantic awareness of LLMs to reconstruct tables into Logical Semantic Trees. This serialization explicitly models hierarchical dependencies and employs an adaptive mechanism to optimize construction strategies based on table scale. Second, building on this structure, we present DuTR, a dual-mode reasoning framework that integrates tree-search-based textual navigation for linguistic alignment and symbolic code execution for precise verification. Experiments on complex table benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes ASTRA, an architecture for complex table question answering consisting of two modules: AdaSTR, which uses LLMs' global semantic awareness to reconstruct tables into Logical Semantic Trees with an adaptive mechanism that optimizes construction based on table scale, and DuTR, a dual-mode reasoning framework combining tree-search-based textual navigation for linguistic alignment with symbolic code execution for precise verification. The central claim is that this approach overcomes limitations in table serialization (structural neglect, representation gaps, reasoning opacity) and achieves state-of-the-art performance on complex table benchmarks.

Significance. If the experimental claims hold with proper validation, the work could offer a practical advance in handling hierarchical dependencies in tables for LLMs by combining adaptive tree construction with verifiable dual-mode reasoning. The adaptive scaling in AdaSTR and the integration of textual and symbolic paths in DuTR address real bottlenecks in current serialization methods. However, the absence of any reported metrics, baselines, ablations, or reconstruction-quality checks in the manuscript as described substantially weakens the ability to assess whether these contributions deliver measurable gains.

major comments (2)

[Abstract] The statement that 'Experiments on complex table benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance' is made without any quantitative results, specific benchmark names, baseline comparisons, ablation studies, or error analysis. This leaves the central empirical claim, on which the paper's contribution rests, unsupported.
[AdaSTR] AdaSTR module description: The reconstruction of tables into Logical Semantic Trees is asserted to 'explicitly model hierarchical dependencies' via LLM global awareness and adaptive scaling, yet no fidelity metrics (e.g., tree-edit distance, structural accuracy rates, or human validation scores on complex tables) are provided to confirm that the trees capture all relevant dependencies without hallucinations or omissions. This assumption is load-bearing for both the serialization improvement and the downstream DuTR gains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below and will revise the paper to strengthen the empirical presentation and validation of key components.


Referee: [Abstract] The statement that 'Experiments on complex table benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance' is made without any quantitative results, specific benchmark names, baseline comparisons, ablation studies, or error analysis. This leaves the central empirical claim, on which the paper's contribution rests, unsupported.

Authors: We agree that the abstract would benefit from greater specificity to support the SOTA claim. In the revised version, we will expand the abstract to name the benchmarks (e.g., WikiTableQuestions, TabFact, and others from the complex table QA suite), report key performance deltas against baselines, and briefly reference ablation findings. The full manuscript already contains these quantitative details in the Experiments section, but we will ensure the abstract is self-contained and evidence-based. revision: yes
Referee: [AdaSTR] AdaSTR module description: The reconstruction of tables into Logical Semantic Trees is asserted to 'explicitly model hierarchical dependencies' via LLM global awareness and adaptive scaling, yet no fidelity metrics (e.g., tree-edit distance, structural accuracy rates, or human validation scores on complex tables) are provided to confirm that the trees capture all relevant dependencies without hallucinations or omissions. This assumption is load-bearing for both the serialization improvement and the downstream DuTR gains.

Authors: We acknowledge the need for direct validation of the Logical Semantic Tree quality. The current submission emphasizes end-to-end task performance rather than intermediate reconstruction metrics. In revision, we will add an analysis subsection (or appendix) reporting tree fidelity measures such as structural similarity scores, tree-edit distance on sampled tables, and qualitative examples of hierarchical dependency capture. This will explicitly address potential hallucinations or omissions and better justify the contribution of AdaSTR. revision: yes

Circularity Check

0 steps flagged

No significant circularity; architecture adds independent modules to LLMs


The paper describes ASTRA as a new architecture with AdaSTR (LLM-driven Logical Semantic Tree reconstruction with adaptive scaling) and DuTR (dual-mode textual navigation plus symbolic execution). No equations, fitted parameters, or first-principles derivations appear that could reduce to inputs by construction. Claims rest on experimental SOTA results rather than self-referential predictions or self-citation chains. The method is presented as an additive extension of existing LLMs, with no load-bearing steps that rename fits as predictions or smuggle ansatzes via prior self-work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption that current LLMs already possess sufficient global semantic awareness to build accurate logical semantic trees; no free parameters, invented entities, or additional axioms are identifiable from the abstract alone.

axioms (1)

domain assumption: LLMs possess global semantic awareness sufficient to reconstruct table hierarchies accurately and adaptively
Directly invoked to justify the AdaSTR module in the abstract.

pith-pipeline@v0.9.0 · 5478 in / 1126 out tokens · 113233 ms · 2026-05-10T17:42:17.696965+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: AdaSTR leverages LLMs to reconstruct tables into Logical Semantic Trees... adaptive mechanism to optimize construction strategies based on table scale... DuTR integrates tree-search-based textual navigation... symbolic code execution

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Paper passage: Experiments on complex table benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages

[1]

Large language models (llms) on tabular data: Prediction, generation, and understanding–a survey.arXiv preprint arXiv:2402.17944, 2024

Large language models(llms) on tabular data: Prediction, generation, and understanding – a survey. Preprint, arXiv:2402.17944. Xinyi He, Yihao Liu, Mengyu Zhou, Yeye He, Haoyu Dong, Shi Han, Zejian Yuan, and Dongmei Zhang

[2]

TableLoRA: Low-rank adaptation on table structure understanding for large language models. Preprint, arXiv:2503.04396. Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, and Boris Ginsburg. 2024. RULER: What's the real context size of your long-context language models? Preprint, arXiv:2404.06654. Yannis Kats...

[3]

AIT-QA: Question answering dataset over complex tables in the airline industry. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, pages 305–314, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics. Rohit Khoja, De...

[4]

Integrating table representations into large language models for improved scholarly document comprehension. In Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024), pages 293–306, Bangkok, Thailand. Association for Computational Linguistics. Liyao Li, Chao Ye, Wentao Ye, Yifei Sun, Zhe Jiang, Haobo Wang, Jiaming Tian, Yiming Zha...

[5]

Root2Leaf

Table as a modality for large language models. Preprint, arXiv:2512.00947. Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, and Surajit Chaudhuri. 2023. Table-GPT: Table-tuned GPT for diverse table tasks. Preprint, arXiv:2310.09263. Qianlong Li, Chen Huang, Shuai Li, Yuanxin Xiang, Deng Xiong, a...

[6]

Locate relevant data within the table

[7]

A" or "B

Compare the consistency of both answers against the table data. Output Requirements: Do not output any explanations, punctuation, or analysis processes. Strictly output ONLY a single character: "A" or "B". The Correct Answer: We also conduct preliminary explorations on improving the selector; details are provided in Ap- pendix G. A.3 Implementation of Eva...

work page 2023
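The selector prompt above restricts the model to a one-character verdict. A minimal Python sketch of how a caller might enforce that contract, assuming the two candidates arrive as `answer_a` and `answer_b` (hypothetical names; the source does not say which reasoning mode supplies which candidate, nor what happens when the verdict is malformed):

```python
from typing import Optional


def select_answer(selector_output: str, answer_a: str, answer_b: str) -> Optional[str]:
    """Map a strict "A"/"B" selector verdict to the chosen candidate answer.

    Returns None when the output breaks the single-character contract, so the
    caller can apply its own fallback (the source does not specify one).
    """
    verdict = selector_output.strip()  # tolerate surrounding whitespace only
    if verdict == "A":
        return answer_a
    if verdict == "B":
        return answer_b
    return None
```

For example, `select_answer("B", "-1.2", "-1.4")` yields `"-1.4"`, while an output such as `"B."` violates the "no punctuation" rule and yields `None`.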
[8]

Textual reasoning (End-to-End approaches). Textual reasoning treats TableQA as conditional generation: $(\hat{y}, \hat{a}) = \mathrm{LLM}\big(q \,\|\, \mathrm{Serialize}(T)\big)$ (2), where $\mathrm{Serialize}(\cdot)$ linearizes the table into a token sequence (e.g., Markdown/CSV/TSV, row-wise templates, or hierarchical header strings). The LLM produces a natural-language reasoning trace $\hat{y}$ (optional) and the final ...

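Equation (2) concatenates the question with a flat linearization of the table. As a rough illustration of the Markdown case of $\mathrm{Serialize}(\cdot)$, here is a Python sketch; the function names are hypothetical and this is a generic baseline serializer, not the paper's code. A flat grid like this carries no explicit parent-child links, which is exactly the gap the Logical Semantic Trees are meant to close.

```python
from typing import Any, Sequence


def _cell(value: Any) -> str:
    # Escape pipes so a cell value cannot break the Markdown column layout.
    return str(value).replace("|", "\\|")


def serialize_markdown(header: Sequence[str], rows: Sequence[Sequence[Any]]) -> str:
    """Linearize a table into a Markdown grid (one possible Serialize(T))."""
    lines = [
        "| " + " | ".join(_cell(h) for h in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(_cell(c) for c in row) + " |")
    return "\n".join(lines)


def textual_prompt(question: str, header: Sequence[str], rows: Sequence[Sequence[Any]]) -> str:
    """Build q || Serialize(T): the question followed by the serialized table."""
    return question + "\n\n" + serialize_markdown(header, rows)
```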
[9]

None") and structural corruption in the

Symbolic reasoning (Program-aided ap- proaches).Symbolic reasoning explicitly pro- duces an executable program (e.g., SQL, pan- das/Python) whose execution yields the final an- swer: ˆp=LLM q∥Schema(T) ˆa=Exec(ˆp, T) (3) This paradigm includes classical semantic pars- ing (text-to-SQL) and modern LLM tool-use vari- ants where the model generates code and ...

work page 2023
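To make the $\mathrm{Exec}(\hat{p}, T)$ step in Equation (3) concrete, here is a hedged Python sketch. It assumes a hypothetical contract in which the generated program reads `table` (a list of row dicts) and stores its result in a variable named `answer`; the paper may use SQL or pandas instead. Trimming the builtins only limits accidents and is not a security sandbox, so real LLM-generated code should run in an isolated process.

```python
from typing import Any, Dict, List

# Hypothetical whitelist of builtins exposed to generated programs.
_SAFE_BUILTINS = {"sum": sum, "len": len, "min": min, "max": max, "round": round, "sorted": sorted}


def execute_program(program: str, table: List[Dict[str, Any]]) -> Any:
    """Compute a_hat = Exec(p_hat, T) for a generated Python program.

    Assumed contract (not from the source): the program reads `table` and
    assigns its result to `answer`. This is NOT a security sandbox.
    """
    # One shared namespace, so names the program defines stay visible
    # inside its own comprehensions and generator expressions.
    namespace: Dict[str, Any] = {"__builtins__": _SAFE_BUILTINS, "table": table}
    exec(program, namespace)
    if "answer" not in namespace:
        raise ValueError("generated program did not assign `answer`")
    return namespace["answer"]
```

Passing a single dict as both globals and locals matters here: with separate dicts, a name like `target` defined by the program would be invisible inside its generator expressions.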
[10]

Hybrid reasoning (Textual ⊕ Symbolic). Hybrid systems integrate the semantic flexibility of textual reasoning with the precision of symbolic execution, typically employing paradigms such as adaptive routing for dynamic selection (Liu et al., 2023b; Zhang et al., 2024a), interleaved modularity for step-wise refinement (Khoja et al., 2025) to mitigate halluc...

[11]

DSP (Default): If the estimated token footprint $S$ fits within the context budget $B$, we use direct generation for maximum semantic fidelity: $S \le B$ (9)

[12]

SRE (Density-First): If the table exceeds the budget but the scale is not massive (cell count is manageable), we attribute the overflow primarily to verbose content (e.g., a high long-cell ratio) and switch to Symbolic Reference Encoding for compression via address placeholders: $S > B \land n \le n_{\text{high}} \land (\bar{s} > \mu \lor r_{\text{long}} > \eta)$ (10)

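The source describes Symbolic Reference Encoding only as compression "via address placeholders". One plausible reading, sketched here purely as an assumption (the `@R{row}C{col}` address format, the `max_len` threshold, and both function names are hypothetical), swaps each verbose cell for a short address and keeps a lookup table so the original text can be restored after generation:

```python
from typing import Any, Dict, List, Sequence, Tuple


def symbolic_reference_encode(
    rows: Sequence[Sequence[Any]], max_len: int = 40
) -> Tuple[List[List[str]], Dict[str, str]]:
    """Replace verbose cells with address placeholders (hypothetical SRE sketch)."""
    encoded: List[List[str]] = []
    references: Dict[str, str] = {}
    for r, row in enumerate(rows):
        new_row: List[str] = []
        for c, cell in enumerate(row):
            text = str(cell)
            if len(text) > max_len:
                address = f"@R{r}C{c}"  # hypothetical placeholder format
                references[address] = text
                new_row.append(address)
            else:
                new_row.append(text)
        encoded.append(new_row)
    return encoded, references


def resolve(value: str, references: Dict[str, str]) -> str:
    """Expand a placeholder back to its original cell text; other values pass through."""
    return references.get(value, value)
```

The model would then only emit short addresses for long cells, and decoding restores full fidelity afterwards.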
[13]

PSS (Scale-First): If the table exceeds the budget and the number of cells is massive, we switch to Programmatic Structure Synthesis. PSS is most effective for hyperscale tables because loop-based code expands large structures more reliably than token-by-token enumeration: $S > B \land n > n_{\text{high}}$ (11). If $S > B$ but neither SRE nor PSS conditions are strictly ...

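Conditions (9) to (11) amount to a small decision rule. The Python sketch below encodes it directly. The field and function names are hypothetical, and reading $\bar{s}$ as a mean cell length is an assumption. Threshold values are left to the caller because the source does not fix them here. The fallback for $S > B$ when neither (10) nor (11) holds is truncated in the source, so the function returns `None` in that case rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableStats:
    tokens: int             # S: estimated token footprint of the table
    cells: int              # n: number of cells
    mean_cell_len: float    # s-bar (assumed: mean cell length)
    long_cell_ratio: float  # r_long: fraction of long cells


def select_strategy(
    stats: TableStats, budget: int, n_high: int, mu: float, eta: float
) -> Optional[str]:
    """Pick a construction strategy following conditions (9)-(11)."""
    if stats.tokens <= budget:                  # (9)  S <= B
        return "DSP"
    if stats.cells > n_high:                    # (11) S > B and n > n_high
        return "PSS"
    if stats.mean_cell_len > mu or stats.long_cell_ratio > eta:
        return "SRE"                            # (10) S > B, n <= n_high, verbose cells
    return None  # S > B but neither (10) nor (11): fallback elided in the source
```

Checking (11) before (10) is safe because the two conditions are mutually exclusive on $n$.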
[14]

Information Coverage. This metric measures the completeness of the information transfer from the tabular structure to the tree structure. It is calculated as the ratio of original table cells whose content is successfully represented in the generated tree nodes: $\mathrm{Coverage} = \frac{|C_{\text{mapped}}|}{|C_{\text{total}}|}$ (12), where $C_{\text{mapped}}$ denotes the set of cells found in the tree an...

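Equation (12) can be computed over a nested-dict tree like the JSON trees the AdaSTR prompts request. The Python sketch below rests on assumptions, because the matching criterion is truncated in the source. It treats a cell as mapped when its text equals a tree key, a leaf value, or one side of a "[Header] - [Value]" key. It also skips empty cells. Function names are hypothetical, and numeric formatting differences (e.g. `10` vs `10.0`) would count as misses.

```python
from typing import Any, Iterable, Set


def tree_strings(tree: Any) -> Set[str]:
    """Collect every key and leaf value of a nested-dict tree as text.

    Keys of the form "[Header] - [Value]" are also split on " - " so that
    both the header and the cell value count as represented.
    """
    found: Set[str] = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            for key, child in node.items():
                found.add(str(key))
                found.update(part.strip() for part in str(key).split(" - "))
                stack.append(child)
        else:
            found.add(str(node))
    return found


def information_coverage(cells: Iterable[Any], tree: Any) -> float:
    """Coverage = |C_mapped| / |C_total| over the non-empty original cells."""
    total = [str(c).strip() for c in cells if c is not None and str(c).strip()]
    if not total:
        raise ValueError("table has no non-empty cells")
    found = tree_strings(tree)
    mapped = sum(1 for text in total if text in found)
    return mapped / len(total)
```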
[15]

Structural Integrity. This metric evaluates the correctness of the hierarchical relationships in the generated tree. We employ a bottom-up path verification strategy. For every data leaf node (value) in the tree, we trace the path back to the ROOT. The validity of a path is determined as follows: • Initialization: Start from the Leaf node. Let the Leaf b...

[16]

If the current node is in the same row or same column as the Leaf node, the relationship is valid; continue to the parent

[17]

If not, check if the current node is in the same row or same column as its immediate child node (the node just visited). If yes, the relationship is valid (transitive alignment); continue

[18]

Gender," yet the raw data often represents this as a null value. Conse- quently, during structural evaluation, items (e.g.,

If neither condition is met, the path is deemedStructurally Broken, and veri- fication terminates. • Success: If the traversal reaches the ROOT without error, the path is valid. TheStructural Integrityscore is the percentage of valid paths out of all leaf-to-root paths. Discussion on Evaluation Metrics.(1)Merged Cell Representation:Many table datasets lac...

work page 2023
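The bottom-up verification described in entries [15] to [18] can be sketched in Python as follows. It assumes each tree node carries a single (row, col) coordinate of its source cell and that ROOT is the node with no parent. Both are assumptions, since the source's handling of merged cells and the full initialization step are truncated. Class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(eq=False)
class TreeNode:
    label: str
    row: Optional[int] = None  # source cell coordinates; None for ROOT
    col: Optional[int] = None
    parent: Optional["TreeNode"] = None


def _aligned(a: TreeNode, b: TreeNode) -> bool:
    """True if two nodes share a row or a column in the source table."""
    return (a.row is not None and a.row == b.row) or (a.col is not None and a.col == b.col)


def path_is_valid(leaf: TreeNode) -> bool:
    """Bottom-up check of one leaf-to-ROOT path (ROOT = the node with no parent)."""
    child, current = leaf, leaf.parent
    while current is not None and current.parent is not None:
        # Direct alignment with the leaf, else transitive alignment with the child just visited.
        if not (_aligned(current, leaf) or _aligned(current, child)):
            return False  # Structurally Broken
        child, current = current, current.parent
    return current is not None  # reached ROOT without error


def structural_integrity(leaves: Iterable[TreeNode]) -> float:
    """Percentage of leaf-to-ROOT paths that pass verification."""
    leaves = list(leaves)
    if not leaves:
        raise ValueError("no leaf nodes to verify")
    return 100.0 * sum(path_is_valid(leaf) for leaf in leaves) / len(leaves)
```

For instance, a leaf at (2, 3) under a row header at (2, 0) under a group header at (0, 0) passes: the row header aligns with the leaf by row, and the group header aligns with the row header by column (transitive alignment).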
[19]

Structural Misalignment (36.7%). While this error affects both paradigms, it is predominantly a Symbolic failure (12 cases vs. 3 Textual). The Symbolic module struggles because code generators often hallucinate a flattened schema for deeply nested trees (e.g., missing a parent key like Loan Bank and iterating directly over children). Textual reasoning ...

[20]

Annotation Errors (26.7%). This category represents a mode-agnostic failure, heavily concentrated in the "Both Wrong" intersection (12 cases). Unexpectedly, a quarter of failures were actually correct model predictions penalized by flawed datasets. For instance, given the query "variance rate", both models correctly extracted the percentage (-1.4), but ...

[21]

Arithmetic & Logic Hallucination (20.0%). This error manifests in distinct forms across the two modes. For Textual Mode (11 cases), it is a Calculation Failure: the model retrieves correct numbers but drifts during long-chain floating-point aggregation (e.g., 26.8 + ...) due to the lack of a calculator. However, Symbolic Mode is not immune (4 cases); it s...

[22]

Retrieval Bias (16.7%). This error reflects a "granularity mismatch" where models select high-level summary nodes instead of specific leaves. It is more prevalent in Textual Mode (4 cases) due to "semantic short-circuiting": the model stops reading at the first keyword match (e.g., "Total Expenditure"), ignoring temporal constraints. Symbolic Mode also en...

[23]

**Strict Preservation of Original Wording**: You must preserve the original text from the table cells. It is forbidden to create new names or summaries. (For example: if a cell contains "Total Students", the header must include "Total Students", not "Total Students Metrics".)

[24]

The part before " - " must be the most specific level

**Hierarchical Combination**: If header information is distributed across multiple header rows, combine them using the format: [Lower-Level Header] - [Upper-Level Header]. The part before " - " must be the most specific level

work page
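The header-combination rule, paired with the column-count rule in the next entry, can be approximated by ordinary code. The Python sketch below is an assumption-laden stand-in for what the prompt asks the LLM to do, not the paper's implementation. It treats `header_rows[0]` as the most general level and joins levels from most specific to most general with " - " (generalizing the two-level format). It also assumes horizontally merged header cells arrive as empty strings that can be forward-filled, a heuristic that would mislabel a genuinely blank header cell. The function name is hypothetical.

```python
from typing import List


def combine_headers(header_rows: List[List[str]]) -> List[str]:
    """Merge multi-row headers into one "[Lower] - [Upper]" label per column (sketch)."""
    n_cols = max(len(r) for r in header_rows)
    # Pad rows to equal width (copies, so the input is not mutated).
    rows = [list(r) + [""] * (n_cols - len(r)) for r in header_rows]
    # Forward-fill upper rows to recover horizontally merged header cells (assumption).
    for r in rows[:-1]:
        for j in range(1, n_cols):
            if not r[j].strip():
                r[j] = r[j - 1]
    normalized = []
    for j in range(n_cols):
        parts: List[str] = []
        for r in reversed(rows):  # most specific level first
            label = r[j].strip()
            if label and label not in parts:
                parts.append(label)
        normalized.append(" - ".join(parts))
    assert len(normalized) == n_cols  # mirrors "Strict Column Count Matching"
    return normalized
```

With header rows `["Region", "2023", "", "2024", ""]` and `["", "Q1", "Q2", "Q1", "Q2"]`, this produces `["Region", "Q1 - 2023", "Q2 - 2023", "Q1 - 2024", "Q2 - 2024"]`.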
[25]

Hierarchy Keys

**Strict Column Count Matching**: The number of strings in the output array must exactly match the number of columns in the data rows. The table is provided in JSON array form. [Input Table]: {TABLE_AS_JSON_STRING} Your output must be a single, valid JSON string array representing the normalized headers. Do not provide any explanation. Listing 2: AdaSTR: ...

[26]

total",

**Block headers + detail rows in the same column**: - Some rows in a text column look like high-level categories and have many empty cells or missing numeric values in other columns. - The rows immediately following such a row contain repeated labels such as "total", "weekday", "weekend", etc. with numeric values filled. - The same small set of labels ("t...

work page
[27]

**Strong repetition of small categorical sets**: - A column alternates between "group header" values and a small fixed set of "detail" values in a patterned way. - This usually indicates a hierarchy like: Behaviour Group > Behaviour Detail. In such cases: - You should set "table_type" to "complex", even if there is only one text column. - The **hierarchy_...

[28]

There is no obvious multi-row header structure

[29]

There is no clear row-based block pattern as described in Step 3

[30]

Each row is largely independent

[31]

Typically there is at most one main categorical column and the others are mostly numeric measures. - Classify as **complex** if ANY of the following holds:

[32]

There are multiple categorical columns that naturally form a chain

[33]

Some categorical columns have heavily repeated values and clearly act as grouping keys

[34]

There are apparent aggregation / subtotal / summary rows

[35]

There is a row-based block structure as described in Step 3. ## Output Requirements Your output must be a single, valid JSON object, and must strictly follow the structure below: { "table_type": "simple" or "complex", "analysis_reason": "A brief explanation of the reasoning behind your judgment", "hierarchy_keys": ["header1", "header2", ...], "value_leave...

[36]

Semantic Hierarchy Keys - For each hierarchy_key column, the corresponding key in the JSON must follow the format "[Header Name] - [Cell Value]". - You must use " - " (space, hyphen, space) as the separator. - Example: if the header is "Grade" and the cell value is "1", then the JSON key should be "Grade - 1"

[37]

Strict Preservation of Leaf Node Names - The keys of value_leaves in the final JSON must exactly match the normalized headers

[38]

General Structure Generation Rules - Traverse the data row by row, skipping the header row. - For each data row, use the hierarchy_keys (in order) to create or move into the corresponding nested structure. - At the deepest level for a given row, create an object that stores all value_leaves for that row

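The row-by-row rules above, together with the "[Header Name] - [Cell Value]" key format from entry [36], describe a deterministic nesting that the LLM is asked to perform. For intuition, here is a Python sketch of the same transformation done by ordinary code. The function names are hypothetical. It assumes the hierarchy keys and value leaves are already chosen, and it does not cover the row-based block hierarchies of complex tables. Raising on a duplicate path is a design choice made here so that rows are never silently overwritten, in the spirit of the "No Data Omission" rule.

```python
from typing import Any, Dict, Sequence


def hierarchy_key(header: str, value: Any) -> str:
    """Semantic hierarchy key in the "[Header Name] - [Cell Value]" format."""
    return f"{header} - {value}"


def build_tree(
    headers: Sequence[str],
    rows: Sequence[Sequence[Any]],
    hierarchy_keys: Sequence[str],
    value_leaves: Sequence[str],
) -> Dict[str, Any]:
    """Nest data rows by hierarchy_keys (in order); store value_leaves at the deepest level."""
    tree: Dict[str, Any] = {}
    for row in rows:  # data rows only; the header row is passed separately
        record = dict(zip(headers, row))
        node = tree
        for key in hierarchy_keys:
            # Create or move into the nested object for this row's key value.
            node = node.setdefault(hierarchy_key(key, record[key]), {})
        if any(leaf in node for leaf in value_leaves):
            raise ValueError(f"two rows share the hierarchy path of {row!r}")
        node.update({leaf: record[leaf] for leaf in value_leaves})
    return tree
```

A simple table with one hierarchy key stays flat, e.g. `build_tree(["Grade", "Students"], [["1", 30]], ["Grade"], ["Students"])` gives `{"Grade - 1": {"Students": 30}}`.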
[39]

Handling of Simple Tables (table_type = "simple") For simple tables, you should keep the structure shallow and flat: - Typically there is only one hierarchy_key. - For each data row: - Construct a single key by applying the "[Header Name] - [Cell Value]" rule to that hierarchy_key column. - Store an object under that key with all value_leaves for that row

[40]

Handling of Complex Tables (table_type = "complex") For complex tables, you MUST allow multi-level nesting, even if there is only one hierarchy_key column. 5.1 Column-based hierarchies - When there are multiple hierarchy_keys across different columns, you should: - For each row, create or move into a nested path defined by those columns in order (e.g. Reg...

[41]

No Data Omission - You must not omit any cell content from the input table. - Every non-header row must contribute at least one object into the JSON tree

[42]

Prohibited Behaviors - Do NOT invent new textual labels that do not appear in the original table or headers. - Do NOT rename, summarize, or translate headers or cell values. [Input Table]: {TABLE_AS_JSON_STRING} [Normalized Headers]: {NORMALIZED_HEADERS_FROM_STEP_1} [Hierarchy Definition]: {HIERARCHY_DEFINITION_FROM_STEP_2} Your output must be a single, v...

[43]

Please enclose your final answer in square brackets, e.g., [Answer]

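Since the answer prompt asks for the final answer in square brackets, the caller needs a parser. A hedged Python sketch follows. Taking the last bracketed span is an assumption, on the reasoning that a response may quote table labels in brackets before its final answer. Nested brackets are not handled, and the function name is hypothetical.

```python
import re
from typing import Optional

# A bracketed span with no inner brackets; nesting is not supported in this sketch.
_BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def extract_final_answer(response: str) -> Optional[str]:
    """Return the last [bracketed] span in a model response, or None if absent."""
    matches = _BRACKETED.findall(response)
    return matches[-1].strip() if matches else None
```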
[44]

Which"** or **

**Pay strict attention to the required answer type.** - If the question starts with **"Which"** or **"What"** (e.g., "Which item?", "What category?"), your answer must be a *name or description* (e.g., "Sell Product A"), **NOT** the associated numerical value. - If the question starts with **"How much"**, **"What is the value"**, or **"How many"**, the an...

work page
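The answer-type rule has an overlap worth making explicit: "What is the value" also starts with "What". A small Python sketch of one way to resolve it is to test the numeric phrasings first. This ordering, the "unspecified" default, and the function name are assumptions. The numeric branch also relies on the truncated second clause, which the later "as per rule 2" reference implies is the numerical-answer case.

```python
def expected_answer_type(question: str) -> str:
    """Rough reading of the answer-type rule: "name", "number", or "unspecified"."""
    q = question.strip().lower()
    # Check numeric phrasings first: "What is the value" also starts with "What".
    if q.startswith(("how much", "how many", "what is the value")):
        return "number"
    if q.startswith(("which", "what")):
        return "name"
    return "unspecified"
```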
[45]

units" (e.g.,

**ONLY when** the answer is numerical (as per rule 2), you must check for any associated "units" (e.g., "ten thousand", "million") and provide the final converted numerical result. ## Table: {table_str} ## Question: {question} ## Final Answer:

work page
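The unit rule asks for a converted number, e.g. turning 3.5 with the unit "ten thousand" into 35,000. A minimal Python sketch follows. Only "ten thousand" and "million" come from the prompt. The other multipliers and the function name are illustrative assumptions, and unknown units raise instead of passing through silently.

```python
# "ten thousand" and "million" appear in the prompt; the other entries are assumptions.
UNIT_MULTIPLIERS = {
    "thousand": 1_000,
    "ten thousand": 10_000,
    "million": 1_000_000,
    "hundred million": 100_000_000,
    "billion": 1_000_000_000,
}


def convert_units(value: float, unit: str) -> float:
    """Scale a numeric answer by its unit label, e.g. 3.5 "ten thousand" -> 35000.0."""
    key = " ".join(unit.lower().split())  # normalize case and spacing
    if key not in UNIT_MULTIPLIERS:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * UNIT_MULTIPLIERS[key]
```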

[1] [1]

Large language models (llms) on tabular data: Prediction, generation, and understanding–a survey.arXiv preprint arXiv:2402.17944, 2024

Large language models(llms) on tabular data: Prediction, generation, and understanding – a survey. Preprint, arXiv:2402.17944. Xinyi He, Yihao Liu, Mengyu Zhou, Yeye He, Haoyu Dong, Shi Han, Zejian Yuan, and Dongmei Zhang

work page arXiv

[2] [2]

Preprint, arXiv:2503.04396

Tablelora: Low-rank adaptation on table structure understanding for large language models. Preprint, arXiv:2503.04396. Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shan- tanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, and Boris Ginsburg. 2024. Ruler: What’s the real context size of your long-context language models? Preprint, arXiv:2404.06654. Yannis Kats...

work page arXiv 2024

[3] [3]

AIT-QA: Question answering dataset over complex tables in the airline industry. InProceed- ings of the 2022 Conference of the North American Chapter of the Association for Computational Lin- guistics: Human Language Technologies: Industry Track, pages 305–314, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics. Rohit Khoja, De...

work page 2022

[4] [4]

InProceedings of the Fourth Work- shop on Scholarly Document Processing (SDP 2024), pages 293–306, Bangkok, Thailand

Integrating table representations into large language models for improved scholarly document comprehension. InProceedings of the Fourth Work- shop on Scholarly Document Processing (SDP 2024), pages 293–306, Bangkok, Thailand. Association for Computational Linguistics. Liyao Li, Chao Ye, Wentao Ye, Yifei Sun, Zhe Jiang, Haobo Wang, Jiaming Tian, Yiming Zha...

work page 2024

[5] [5]

Root2Leaf

Table as a modality for large language models. Preprint, arXiv:2512.00947. Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dong- mei Zhang, and Surajit Chaudhuri. 2023. Table-gpt: Table-tuned gpt for diverse table tasks.Preprint, arXiv:2310.09263. Qianlong Li, Chen Huang, Shuai Li, Yuanxin Xiang, Deng Xiong, a...

work page arXiv 2023

[6] [6]

Locate relevant data within the table

work page

[7] [7]

A" or "B

Compare the consistency of both answers against the table data. Output Requirements: Do not output any explanations, punctuation, or analysis processes. Strictly output ONLY a single character: "A" or "B". The Correct Answer: We also conduct preliminary explorations on improving the selector; details are provided in Ap- pendix G. A.3 Implementation of Eva...

work page 2023

[8] [8]

Textual reasoning (End-to-End approaches). Textual reasoning treats TableQA as conditional generation: (ˆy,ˆa) =LLM q∥Serialize(T) ,(2) where Serialize(·) linearizes the table into a token sequence (e.g., Markdown/CSV/TSV , row- wise templates, or hierarchical header strings). The LLM produces a natural-language reasoning trace ˆy(optional) and the final ...

work page 2023

[9] [9]

None") and structural corruption in the

Symbolic reasoning (Program-aided ap- proaches).Symbolic reasoning explicitly pro- duces an executable program (e.g., SQL, pan- das/Python) whose execution yields the final an- swer: ˆp=LLM q∥Schema(T) ˆa=Exec(ˆp, T) (3) This paradigm includes classical semantic pars- ing (text-to-SQL) and modern LLM tool-use vari- ants where the model generates code and ...

work page 2023

[10] [10]

Hybrid reasoning (Textual⊕ Symbolic).Hy- brid systems integrate the semantic flexibility of textual reasoning with the precision of symbolic execution, typically employing paradigms such as adaptive routingfor dynamic selection (Liu et al., 2023b; Zhang et al., 2024a),interleaved modular- ityfor step-wise refinement (Khoja et al., 2025) to mitigate halluc...

work page 2025

[11] [11]

DSP (Default): If the estimated token foot- print fits within the context budget, we use direct generation for maximum semantic fi- delity. S≤B(9)

work page

[12] [12]

S > B∧n≤n high ∧(¯s > µ∨r long > η) (10)

SRE (Density-First): If the table exceeds the budget but the scale is not massive (cell count is manageable), we attribute the overflow pri- marily to verbose content (e.g., high long-cell ratio) and switch to Symbolic Reference En- coding for compression via address placehold- ers. S > B∧n≤n high ∧(¯s > µ∨r long > η) (10)

work page

[13] [13]

PSS is most effective for hyperscale ta- bles because loop-based code expands large structures more reliably than token-by-token enumeration

PSS (Scale-First): If the table exceeds the budget and the number of cells is massive, we switch to Programmatic Structure Synthe- sis. PSS is most effective for hyperscale ta- bles because loop-based code expands large structures more reliably than token-by-token enumeration. S > B∧n > n high (11) If S > B but neither SRE nor PSS conditions are strictly ...

work page

[14] [14]

Information Coverage.This metric measures the completeness of the information transfer from the tabular structure to the tree structure. It is cal- culated as the ratio of original table cells whose content is successfully represented in the generated tree nodes: Coverage= |Cmapped| |Ctotal| (12) where Cmapped denotes the set of cells found in the tree an...


Structural Integrity. This metric evaluates the correctness of the hierarchical relationships in the generated tree. We employ a bottom-up path verification strategy: for every data leaf node (value) in the tree, we trace the path back to the ROOT. The validity of a path is determined as follows:
• Initialization: Start from the Leaf node. Let the Leaf b...


If the current node is in the same row or same column as the Leaf node, the relationship is valid; continue to the parent.


If not, check whether the current node is in the same row or same column as its immediate child node (the node just visited). If yes, the relationship is valid (transitive alignment); continue.


"Gender," yet the raw data often represents this as a null value. Consequently, during structural evaluation, items (e.g.,

If neither condition is met, the path is deemed Structurally Broken, and verification terminates.
• Success: If the traversal reaches the ROOT without error, the path is valid.
The Structural Integrity score is the percentage of valid paths out of all leaf-to-root paths.

Discussion on Evaluation Metrics. (1) Merged Cell Representation: Many table datasets lac...
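The bottom-up path verification used for Structural Integrity can be sketched as follows. This is a minimal illustration, assuming each non-root tree node records the (row, column) of the source cell it came from and a parent id, with the root identified as `"ROOT"`; the `Node` class and function names are hypothetical. The dangling-path and cycle guards are additions for safety that the source does not mention.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    row: Optional[int]       # row of the source cell (None for ROOT)
    col: Optional[int]       # column of the source cell (None for ROOT)
    parent: Optional[str]    # id of the parent node (None for ROOT)


def aligned(a: Node, b: Node) -> bool:
    """Same row or same column in the original table."""
    return a.row == b.row or a.col == b.col


def path_is_valid(leaf_id: str, nodes: Dict[str, Node], root_id: str = "ROOT") -> bool:
    leaf = nodes[leaf_id]
    child = leaf
    current_id = leaf.parent
    visited = set()
    while current_id != root_id:
        if current_id is None or current_id in visited:
            return False                    # dangling or cyclic path never reaches ROOT
        visited.add(current_id)
        current = nodes[current_id]
        # Direct alignment with the Leaf, or transitive alignment with the node just visited.
        if not (aligned(current, leaf) or aligned(current, child)):
            return False                    # Structurally Broken
        child = current
        current_id = current.parent
    return True


def structural_integrity(leaf_ids: List[str], nodes: Dict[str, Node], root_id: str = "ROOT") -> float:
    """Percentage of leaf-to-root paths that pass verification."""
    if not leaf_ids:
        return 0.0
    valid = sum(path_is_valid(leaf, nodes, root_id) for leaf in leaf_ids)
    return 100.0 * valid / len(leaf_ids)
```

On the first step the immediate child is the Leaf itself, so both checks coincide; the transitive check only matters higher up, for example when a group header in row 0 sits above a sub-header in row 2 of the same column.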


Structural Misalignment (36.7%). While this error affects both paradigms, it is predominantly a Symbolic failure (12 cases vs. 3 Textual). The Symbolic module struggles because code generators often hallucinate a flattened schema for deeply nested trees (e.g., missing a parent key like "Loan Bank" and iterating directly over children). Textual reasoning ...


Annotation Errors (26.7%). This category represents a mode-agnostic failure, heavily concentrated in the "Both Wrong" intersection (12 cases). Unexpectedly, a quarter of failures were actually correct model predictions penalized by flawed datasets. For instance, given the query "variance rate", both models correctly extracted the percentage (-1.4), but ...


Arithmetic & Logic Hallucination (20.0%). This error manifests in distinct forms across the two modes. For Textual Mode (11 cases), it is a Calculation Failure: the model retrieves correct numbers but drifts during long-chain floating-point aggregation (e.g., 26.8 + ...) due to the lack of a calculator. However, Symbolic Mode is not immune (4 cases); it s...


Retrieval Bias (16.7%). This error reflects a "granularity mismatch" where models select high-level summary nodes instead of specific leaves. It is more prevalent in Textual Mode (4 cases) due to "semantic short-circuiting": the model stops reading at the first keyword match (e.g., "Total Expenditure"), ignoring temporal constraints. Symbolic Mode also en...


**Strict Preservation of Original Wording**: You must preserve the original text from the table cells. It is forbidden to create new names or summaries. (For example: if a cell contains "Total Students", the header must include "Total Students", not "Total Students Metrics".)


**Hierarchical Combination**: If header information is distributed across multiple header rows, combine them using the format: [Lower-Level Header] - [Upper-Level Header]. The part before " - " must be the most specific level


Hierarchy Keys

**Strict Column Count Matching**: The number of strings in the output array must exactly match the number of columns in the data rows. The table is provided in JSON array form. [Input Table]: {TABLE_AS_JSON_STRING} Your output must be a single, valid JSON string array representing the normalized headers. Do not provide any explanation. Listing 2: AdaSTR: ...
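The Hierarchical Combination and Strict Column Count Matching rules describe a deterministic transformation that can be sketched outside the LLM. The sketch below is a hypothetical illustration, not the prompt's implementation: it assumes header rows are given top (most general) to bottom (most specific), and that a merged upper-level header appears only in the first column of its span with blanks to the right, so it forward-fills upper rows. That assumption fails for a header merged only vertically, which is one reason the paper delegates this step to the model.

```python
from typing import List, Sequence


def forward_fill(row: Sequence[str]) -> List[str]:
    """Spread a merged upper-level header rightward over the blank cells it spans."""
    filled: List[str] = []
    last = ""
    for cell in row:
        last = cell.strip() or last
        filled.append(last)
    return filled


def normalize_headers(header_rows: Sequence[Sequence[str]], n_cols: int) -> List[str]:
    """header_rows run top (most general) to bottom (most specific); returns one header per data column."""
    rows = [forward_fill(r) for r in header_rows[:-1]] + [[c.strip() for c in header_rows[-1]]]
    headers: List[str] = []
    for c in range(n_cols):
        parts: List[str] = []
        for row in reversed(rows):              # most specific level first
            cell = row[c] if c < len(row) else ""
            if cell and (not parts or parts[-1] != cell):
                parts.append(cell)              # skip blanks and vertically repeated labels
        headers.append(" - ".join(parts))
    return headers
```

For header rows `["Region", "2023", "", "2024", ""]` over `["", "Q1", "Q2", "Q1", "Q2"]`, this yields `Region`, `Q1 - 2023`, `Q2 - 2023`, `Q1 - 2024`, `Q2 - 2024`: one string per data column, with the lower-level label before " - ".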


**Block headers + detail rows in the same column**: - Some rows in a text column look like high-level categories and have many empty cells or missing numeric values in other columns. - The rows immediately following such a row contain repeated labels such as "total", "weekday", "weekend", etc. with numeric values filled. - The same small set of labels ("t...


**Strong repetition of small categorical sets**: - A column alternates between "group header" values and a small fixed set of "detail" values in a patterned way. - This usually indicates a hierarchy like: Behaviour Group > Behaviour Detail. In such cases: - You should set "table_type" to "complex", even if there is only one text column. - The **hierarchy_...
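The row-based block pattern described above (a group-header row followed by detail rows whose labels repeat across groups) can be approximated with a simple scan. This is a hypothetical heuristic for illustration, not part of the paper's pipeline, which asks the LLM to make this judgment. It assumes a block header leaves every non-label cell empty, a stricter reading of "many empty cells", and it turns each detail row into a (group, detail, values) triple matching a Behaviour Group > Behaviour Detail hierarchy.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, List[str]]


def split_row_blocks(rows: Sequence[Sequence[str]], label_col: int = 0) -> List[Triple]:
    """Attach each detail row to the most recent block-header row above it."""
    triples: List[Triple] = []
    group = ""
    for row in rows:
        label = row[label_col].strip()
        others = [c.strip() for i, c in enumerate(row) if i != label_col]
        if label and not any(others):           # assumption: a block header leaves every other cell empty
            group = label
            continue
        triples.append((group, label, others))
    return triples


def looks_row_hierarchical(triples: List[Triple]) -> bool:
    """At least two groups, and detail labels repeat across them (e.g. "total", "weekday")."""
    groups = {g for g, _, _ in triples if g}
    details = [d for _, d, _ in triples]
    return len(groups) >= 2 and len(set(details)) < len(details)
```

A table with rows `Sleep`, `total`, `weekday`, `Work`, `total`, `weekday` (numeric values only on detail rows) yields two groups with the repeated labels `total` and `weekday`, so it is flagged as hierarchical; an ordinary flat table yields no groups and is not.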


There is no obvious multi-row header structure


There is no clear row-based block pattern as described in Step 3


Each row is largely independent


Typically there is at most one main categorical column and the others are mostly numeric measures. - Classify as **complex** if ANY of the following holds:


There are multiple categorical columns that naturally form a chain


Some categorical columns have heavily repeated values and clearly act as grouping keys


There are apparent aggregation / subtotal / summary rows


There is a row-based block structure as described in Step 3. ## Output Requirements Your output must be a single, valid JSON object, and must strictly follow the structure below: { "table_type": "simple" or "complex", "analysis_reason": "A brief explanation of the reasoning behind your judgment", "hierarchy_keys": ["header1", "header2", ...], "value_leave...
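The simple/complex criteria in this prompt can be mirrored by a rough rule-based check that emits the same JSON fields. The sketch below is a hypothetical heuristic, not the paper's classifier (the paper relies on the LLM's judgment): a column is treated as categorical if it holds any non-numeric value, "heavily repeated" means at most `repeat_ratio` distinct values per non-empty value, and aggregation rows are detected by an assumed prefix list. The chain criterion is simplified to "two or more categorical columns", and the row-block criterion from Step 3 is not covered here.

```python
from typing import Dict, List, Sequence

AGGREGATE_PREFIXES = ("total", "subtotal", "summary")   # hypothetical keyword list


def is_number(text: str) -> bool:
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def classify_table(headers: Sequence[str], rows: Sequence[Sequence[str]],
                   repeat_ratio: float = 0.5) -> Dict[str, object]:
    columns = list(zip(*rows))
    categorical = [h for h, col in zip(headers, columns)
                   if any(v.strip() and not is_number(v) for v in col)]
    reasons: List[str] = []
    if len(categorical) >= 2:
        reasons.append("multiple categorical columns may form a chain")
    for h, col in zip(headers, columns):
        values = [v.strip() for v in col if v.strip()]
        if h in categorical and values and len(set(values)) <= repeat_ratio * len(values):
            reasons.append(f"'{h}' has heavily repeated values (grouping key)")
    if any(v.strip().lower().startswith(AGGREGATE_PREFIXES) for row in rows for v in row):
        reasons.append("aggregation/subtotal rows present")
    table_type = "complex" if reasons else "simple"
    keys = categorical if table_type == "complex" else categorical[:1]
    return {
        "table_type": table_type,
        "analysis_reason": "; ".join(reasons) or "flat, independent rows",
        "hierarchy_keys": keys,
        "value_leaves": [h for h in headers if h not in keys],
    }
```

A two-column `Name`/`Score` table with unique names comes out simple with `Name` as its single hierarchy key, while a `Region`/`City`/`Sales` table is flagged complex because two categorical columns can form a Region > City chain.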


Semantic Hierarchy Keys - For each hierarchy_key column, the corresponding key in the JSON must follow the format "[Header Name] - [Cell Value]". - You must use " - " (space, hyphen, space) as the separator. - Example: if the header is "Grade" and the cell value is "1", then the JSON key should be "Grade - 1"


Strict Preservation of Leaf Node Names - The keys of value_leaves in the final JSON must exactly match the normalized headers


General Structure Generation Rules - Traverse the data row by row, skipping the header row. - For each data row, use the hierarchy_keys (in order) to create or move into the corresponding nested structure. - At the deepest level for a given row, create an object that stores all value_leaves for that row


Handling of Simple Tables (table_type = "simple") For simple tables, you should keep the structure shallow and flat: - Typically there is only one hierarchy_key. - For each data row: - Construct a single key by applying the "[Header Name] - [Cell Value]" rule to that hierarchy_key column. - Store an object under that key with all value_leaves for that row


Handling of Complex Tables (table_type = "complex") For complex tables, you MUST allow multi-level nesting, even if there is only one hierarchy_key column. 5.1 Column-based hierarchies - When there are multiple hierarchy_keys across different columns, you should: - For each row, create or move into a nested path defined by those columns in order (e.g. Reg...


No Data Omission - You must not omit any cell content from the input table. - Every non-header row must contribute at least one object into the JSON tree


Prohibited Behaviors - Do NOT invent new textual labels that do not appear in the original table or headers. - Do NOT rename, summarize, or translate headers or cell values. [Input Table]: {TABLE_AS_JSON_STRING} [Normalized Headers]: {NORMALIZED_HEADERS_FROM_STEP_1} [Hierarchy Definition]: {HIERARCHY_DEFINITION_FROM_STEP_2} Your output must be a single, v...
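The General Structure Generation Rules above (row-by-row traversal, "[Header Name] - [Cell Value]" keys, value_leaves at the deepest level, no data omission) can be expressed as a short deterministic builder for column-based hierarchies. This is a hypothetical sketch, not the paper's code; in particular, the policy of turning a repeated path into a list, so that every row still contributes an object, is an assumption the prompt does not specify. Row-based hierarchies (section 5.2 onward, truncated in the source) are not covered.

```python
from typing import Any, Dict, Sequence


def build_tree(headers: Sequence[str], rows: Sequence[Sequence[str]],
               hierarchy_keys: Sequence[str], value_leaves: Sequence[str]) -> Dict[str, Any]:
    if not hierarchy_keys:
        raise ValueError("at least one hierarchy key is required")
    tree: Dict[str, Any] = {}
    for row in rows:                                   # data rows only; the header row is excluded
        record = dict(zip(headers, row))
        # Semantic Hierarchy Keys: "[Header Name] - [Cell Value]" with " - " as the separator.
        path = [f"{key} - {record[key]}" for key in hierarchy_keys]
        node = tree
        for step in path[:-1]:                         # create or move into the nested structure
            node = node.setdefault(step, {})
        leaf = {name: record[name] for name in value_leaves}   # leaf names kept verbatim
        last = path[-1]
        if last not in node:
            node[last] = leaf
        elif isinstance(node[last], list):             # assumption: repeated paths become a list,
            node[last].append(leaf)                    # so that no row is dropped
        else:
            node[last] = [node[last], leaf]
    return tree
```

With a single hierarchy key the result stays shallow, as the rules require for simple tables; with `["Region", "City"]` each row lands under a `Region - ...` then `City - ...` path.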


Please enclose your final answer in square brackets, e.g., [Answer]


**Pay strict attention to the required answer type.** - If the question starts with **"Which"** or **"What"** (e.g., "Which item?", "What category?"), your answer must be a *name or description* (e.g., "Sell Product A"), **NOT** the associated numerical value. - If the question starts with **"How much"**, **"What is the value"**, or **"How many"**, the an...


**ONLY when** the answer is numerical (as per rule 2), you must check for any associated "units" (e.g., "ten thousand", "million") and provide the final converted numerical result. ## Table: {table_str} ## Question: {question} ## Final Answer:
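The prompt asks the model itself to wrap its answer in square brackets and to apply unit conversion only to numerical answers. A hypothetical post-processing step that reads such outputs back might look like the sketch below; it is not described in the source. The unit table covers the two units the prompt names ("ten thousand", "million") plus "thousand" and "billion" as assumed extras, and a non-numeric answer such as a product name is returned as `None` rather than converted.

```python
import re
from typing import Optional

UNIT_MULTIPLIERS = {
    "ten thousand": 10_000,
    "thousand": 1_000,
    "million": 1_000_000,
    "billion": 1_000_000_000,
}
_NUMBER_WITH_UNIT = re.compile(
    r"(-?\d[\d,]*(?:\.\d+)?)\s*(ten thousand|thousand|million|billion)?"
)


def extract_answer(output: str) -> Optional[str]:
    """Return the content of the last [...] span in the model output, if any."""
    spans = re.findall(r"\[([^\[\]]*)\]", output)
    return spans[-1].strip() if spans else None


def to_number(answer: str) -> Optional[float]:
    """Convert "3.5 ten thousand" to 35000.0; return None for non-numeric answers such as names."""
    match = _NUMBER_WITH_UNIT.fullmatch(answer.strip().lower())
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    unit = match.group(2)
    return value * UNIT_MULTIPLIERS[unit] if unit else value
```

Taking the last bracketed span tolerates models that echo the "[Answer]" example earlier in their reasoning; `fullmatch` ensures a mixed answer such as "Sell Product A" is never coerced into a number.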
