ASTRA: Adaptive Semantic Tree Reasoning Architecture for Complex Table Question Answering
Pith reviewed 2026-05-10 17:42 UTC · model grok-4.3
The pith
Reconstructing tables as adaptive logical semantic trees lets LLMs reach state-of-the-art accuracy on complex question answering.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ASTRA has two modules. AdaSTR lets LLMs globally reconstruct tables into Logical Semantic Trees that model hierarchical dependencies explicitly, and it adapts the construction strategy to table scale. DuTR then combines tree-search textual navigation (for linguistic alignment) with symbolic code execution (for precise verification). The paper reports state-of-the-art results on complex table benchmarks.
What carries the argument
Logical Semantic Trees, which explicitly encode table hierarchies and are built adaptively by LLMs to close representation gaps before dual-mode reasoning begins.
Load-bearing premise
Large language models can reliably turn tables into logical semantic trees that capture every relevant hierarchy without introducing reconstruction errors.
What would settle it
Run the same benchmark questions on the identical base LLM once with standard table serialization and once with the automatically generated Logical Semantic Trees; a negligible accuracy gap would falsify the central claim.
Original abstract
Table serialization remains a critical bottleneck for Large Language Models (LLMs) in complex table question answering, hindered by challenges such as structural neglect, representation gaps, and reasoning opacity. Existing serialization methods fail to capture explicit hierarchies and lack schema flexibility, while current tree-based approaches suffer from limited semantic adaptability. To address these limitations, we propose ASTRA (Adaptive Semantic Tree Reasoning Architecture) including two main modules, AdaSTR and DuTR. First, we introduce AdaSTR, which leverages the global semantic awareness of LLMs to reconstruct tables into Logical Semantic Trees. This serialization explicitly models hierarchical dependencies and employs an adaptive mechanism to optimize construction strategies based on table scale. Second, building on this structure, we present DuTR, a dual-mode reasoning framework that integrates tree-search-based textual navigation for linguistic alignment and symbolic code execution for precise verification. Experiments on complex table benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes ASTRA, an architecture for complex table question answering consisting of two modules: AdaSTR, which uses LLMs' global semantic awareness to reconstruct tables into Logical Semantic Trees with an adaptive mechanism that optimizes construction based on table scale, and DuTR, a dual-mode reasoning framework combining tree-search-based textual navigation for linguistic alignment with symbolic code execution for precise verification. The central claim is that this approach overcomes limitations in table serialization (structural neglect, representation gaps, reasoning opacity) and achieves state-of-the-art performance on complex table benchmarks.
Significance. If the experimental claims hold with proper validation, the work could offer a practical advance in handling hierarchical dependencies in tables for LLMs by combining adaptive tree construction with verifiable dual-mode reasoning. The adaptive scaling in AdaSTR and the integration of textual and symbolic paths in DuTR address real bottlenecks in current serialization methods. However, the absence of any reported metrics, baselines, ablations, or reconstruction-quality checks in the manuscript as described substantially weakens the ability to assess whether these contributions deliver measurable gains.
major comments (2)
- [Abstract] Abstract: The statement that 'Experiments on complex table benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance' is made without any quantitative results, specific benchmark names, baseline comparisons, ablation studies, or error analysis. This renders the central empirical claim unsupported and load-bearing for the paper's contribution.
- [AdaSTR] AdaSTR module description: The reconstruction of tables into Logical Semantic Trees is asserted to 'explicitly model hierarchical dependencies' via LLM global awareness and adaptive scaling, yet no fidelity metrics (e.g., tree-edit distance, structural accuracy rates, or human validation scores on complex tables) are provided to confirm that the trees capture all relevant dependencies without hallucinations or omissions. This assumption is load-bearing for both the serialization improvement and the downstream DuTR gains.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below and will revise the paper to strengthen the empirical presentation and validation of key components.
-
Referee: [Abstract] Abstract: The statement that 'Experiments on complex table benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance' is made without any quantitative results, specific benchmark names, baseline comparisons, ablation studies, or error analysis. This renders the central empirical claim unsupported and load-bearing for the paper's contribution.
Authors: We agree that the abstract would benefit from greater specificity to support the SOTA claim. In the revised version, we will expand the abstract to name the benchmarks (e.g., WikiTableQuestions, TabFact, and others from the complex table QA suite), report key performance deltas against baselines, and briefly reference ablation findings. The full manuscript already contains these quantitative details in the Experiments section, but we will ensure the abstract is self-contained and evidence-based.
Revision: yes
-
Referee: [AdaSTR] AdaSTR module description: The reconstruction of tables into Logical Semantic Trees is asserted to 'explicitly model hierarchical dependencies' via LLM global awareness and adaptive scaling, yet no fidelity metrics (e.g., tree-edit distance, structural accuracy rates, or human validation scores on complex tables) are provided to confirm that the trees capture all relevant dependencies without hallucinations or omissions. This assumption is load-bearing for both the serialization improvement and the downstream DuTR gains.
Authors: We acknowledge the need for direct validation of the Logical Semantic Tree quality. The current submission emphasizes end-to-end task performance rather than intermediate reconstruction metrics. In revision, we will add an analysis subsection (or appendix) reporting tree fidelity measures such as structural similarity scores, tree-edit distance on sampled tables, and qualitative examples of hierarchical dependency capture. This will explicitly address potential hallucinations or omissions and better justify the contribution of AdaSTR.
Revision: yes
Circularity Check
No significant circularity; architecture adds independent modules to LLMs
The paper describes ASTRA as a new architecture with AdaSTR (LLM-driven Logical Semantic Tree reconstruction with adaptive scaling) and DuTR (dual-mode textual navigation plus symbolic execution). No equations, fitted parameters, or first-principles derivations appear that could reduce to inputs by construction. Claims rest on experimental SOTA results rather than self-referential predictions or self-citation chains. The method is presented as an additive extension of existing LLMs, with no load-bearing steps that rename fits as predictions or smuggle ansatzes via prior self-work.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs possess global semantic awareness sufficient to reconstruct table hierarchies accurately and adaptively
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
AdaSTR leverages LLMs to reconstruct tables into Logical Semantic Trees... adaptive mechanism to optimize construction strategies based on table scale... DuTR integrates tree-search-based textual navigation... symbolic code execution
-
IndisputableMonolith/Foundation/AlexanderDuality.lean, alexander_duality_circle_linking (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Experiments on complex table benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Large language models (llms) on tabular data: Prediction, generation, and understanding – a survey. Preprint, arXiv:2402.17944. Xinyi He, Yihao Liu, Mengyu Zhou, Yeye He, Haoyu Dong, Shi Han, Zejian Yuan, and Dongmei Zhang
-
[2]
Tablelora: Low-rank adaptation on table structure understanding for large language models. Preprint, arXiv:2503.04396. Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, and Boris Ginsburg. 2024. Ruler: What’s the real context size of your long-context language models? Preprint, arXiv:2404.06654. Yannis Kats...
-
[3]
AIT-QA: Question answering dataset over complex tables in the airline industry. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, pages 305–314, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics. Rohit Khoja, De...
-
[4]
Integrating table representations into large language models for improved scholarly document comprehension. In Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024), pages 293–306, Bangkok, Thailand. Association for Computational Linguistics. Liyao Li, Chao Ye, Wentao Ye, Yifei Sun, Zhe Jiang, Haobo Wang, Jiaming Tian, Yiming Zha...
-
[5]
Table as a modality for large language models. Preprint, arXiv:2512.00947. Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, and Surajit Chaudhuri. 2023. Table-gpt: Table-tuned gpt for diverse table tasks. Preprint, arXiv:2310.09263. Qianlong Li, Chen Huang, Shuai Li, Yuanxin Xiang, Deng Xiong, a...
-
[6]
Locate relevant data within the table
-
[7]
Compare the consistency of both answers against the table data. Output Requirements: Do not output any explanations, punctuation, or analysis processes. Strictly output ONLY a single character: "A" or "B". The Correct Answer: We also conduct preliminary explorations on improving the selector; details are provided in Appendix G. A.3 Implementation of Eva...
-
[8]
Textual reasoning (End-to-End approaches). Textual reasoning treats TableQA as conditional generation: $(\hat{y}, \hat{a}) = \mathrm{LLM}\big(q \,\|\, \mathrm{Serialize}(T)\big)$ (2), where $\mathrm{Serialize}(\cdot)$ linearizes the table into a token sequence (e.g., Markdown/CSV/TSV, row-wise templates, or hierarchical header strings). The LLM produces a natural-language reasoning trace $\hat{y}$ (optional) and the final ...
-
[9]
None") and structural corruption in the
Symbolic reasoning (Program-aided approaches). Symbolic reasoning explicitly produces an executable program (e.g., SQL, pandas/Python) whose execution yields the final answer: $\hat{p} = \mathrm{LLM}\big(q \,\|\, \mathrm{Schema}(T)\big)$, $\hat{a} = \mathrm{Exec}(\hat{p}, T)$ (3). This paradigm includes classical semantic parsing (text-to-SQL) and modern LLM tool-use variants where the model generates code and ...
-
[10]
Hybrid reasoning (Textual ⊕ Symbolic). Hybrid systems integrate the semantic flexibility of textual reasoning with the precision of symbolic execution, typically employing paradigms such as adaptive routing for dynamic selection (Liu et al., 2023b; Zhang et al., 2024a), interleaved modularity for step-wise refinement (Khoja et al., 2025) to mitigate halluc...
-
[11]
DSP (Default): If the estimated token footprint fits within the context budget, we use direct generation for maximum semantic fidelity. $S \le B$ (9)
-
[12]
SRE (Density-First): If the table exceeds the budget but the scale is not massive (cell count is manageable), we attribute the overflow primarily to verbose content (e.g., high long-cell ratio) and switch to Symbolic Reference Encoding for compression via address placeholders. $S > B \,\land\, n \le n_{\text{high}} \,\land\, (\bar{s} > \mu \,\lor\, r_{\text{long}} > \eta)$ (10)
-
[13]
PSS (Scale-First): If the table exceeds the budget and the number of cells is massive, we switch to Programmatic Structure Synthesis. PSS is most effective for hyperscale tables because loop-based code expands large structures more reliably than token-by-token enumeration. $S > B \,\land\, n > n_{\text{high}}$ (11) If $S > B$ but neither SRE nor PSS conditions are strictly ...
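The three conditions (9) to (11) can be read as a small routing function. The sketch below is only an illustration under stated assumptions: the function name `select_strategy` and its argument names are hypothetical, `s_bar` is assumed to be a mean cell length and `r_long` the long-cell ratio, and the fallback for the case where neither the SRE nor the PSS condition holds is left as `None` because the excerpt is truncated at that point.

```python
from typing import Optional


def select_strategy(S: float, B: float, n: int, n_high: int,
                    s_bar: float, mu: float,
                    r_long: float, eta: float) -> Optional[str]:
    """Route a table to DSP, SRE, or PSS following conditions (9)-(11).

    S: estimated token footprint, B: context budget, n: cell count.
    s_bar, r_long and the thresholds mu, eta, n_high are assumed inputs.
    """
    if S <= B:                       # condition (9): fits in the budget
        return "DSP"
    if n > n_high:                   # condition (11): over budget, massive scale
        return "PSS"
    if s_bar > mu or r_long > eta:   # condition (10): over budget, verbose cells
        return "SRE"
    return None  # fallback rule is not visible in the truncated excerpt
```

For example, `select_strategy(9000, 4000, 50, 500, 35, 20, 0.1, 0.3)` returns `"SRE"`: the table is over budget, the cell count is manageable, and the assumed mean cell length exceeds its threshold.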
-
[14]
Information Coverage. This metric measures the completeness of the information transfer from the tabular structure to the tree structure. It is calculated as the ratio of original table cells whose content is successfully represented in the generated tree nodes: $\mathrm{Coverage} = \frac{|C_{\text{mapped}}|}{|C_{\text{total}}|}$ (12), where $C_{\text{mapped}}$ denotes the set of cells found in the tree an...
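A rough way to compute this ratio is sketched below. It rests on several assumptions that the excerpt does not confirm: the tree is a nested dict/list structure, empty cells are skipped, header cells are counted, and a cell counts as mapped when its stripped text equals a tree key, a leaf value, or one side of a " - " separated key. The names `coverage` and `tree_strings` are hypothetical.

```python
def _add_text(value, out):
    """Record a string and its ' - ' separated parts (assumed key format)."""
    text = str(value).strip()
    out.add(text)
    for part in text.split(" - "):
        out.add(part.strip())


def tree_strings(node, out=None):
    """Collect every key and leaf value in a nested dict/list tree."""
    if out is None:
        out = set()
    if isinstance(node, dict):
        for key, value in node.items():
            _add_text(key, out)
            tree_strings(value, out)
    elif isinstance(node, list):
        for item in node:
            tree_strings(item, out)
    elif node is not None:
        _add_text(node, out)
    return out


def coverage(table, tree):
    """Fraction of non-empty table cells whose text appears in the tree."""
    cells = [str(c).strip() for row in table for c in row
             if c is not None and str(c).strip()]
    if not cells:
        return 1.0  # assumption: an empty table is trivially covered
    found = tree_strings(tree)
    return sum(1 for c in cells if c in found) / len(cells)
```

Exact string matching is a deliberately crude stand-in; a real evaluation would need to decide how merged cells and numeric formatting are compared.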
-
[15]
Structural Integrity. This metric evaluates the correctness of the hierarchical relationships in the generated tree. We employ a bottom-up path verification strategy. For every data leaf node (value) in the tree, we trace the path back to the ROOT. The validity of a path is determined as follows: • Initialization: Start from the Leaf node. Let the Leaf b...
-
[16]
If the current node is in the same row or same column as the Leaf node, the relationship is valid; continue to the parent
-
[17]
If not, check if the current node is in the same row or same column as its immediate child node (the node just visited). If yes, the relationship is valid (transitive alignment); continue
-
[18]
If neither condition is met, the path is deemed Structurally Broken, and verification terminates. • Success: If the traversal reaches the ROOT without error, the path is valid. The Structural Integrity score is the percentage of valid paths out of all leaf-to-root paths. Discussion on Evaluation Metrics. (1) Merged Cell Representation: Many table datasets lac...
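The bottom-up path check in the three steps above can be sketched as follows. This is a minimal illustration, assuming each tree node can be mapped to a single (row, column) position in the original grid and that a missing parent stands for ROOT; the `Node` class and function names are hypothetical, and merged cells are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    row: int
    col: int
    parent: Optional["Node"] = None  # None means the parent is ROOT


def aligned(a: Node, b: Node) -> bool:
    """True when two nodes share a row or a column in the source table."""
    return a.row == b.row or a.col == b.col


def path_is_valid(leaf: Node) -> bool:
    """Walk from the leaf to ROOT, applying the three validity steps."""
    child, current = leaf, leaf.parent
    while current is not None:
        # Step 1: aligned with the leaf; step 2: aligned with the child just visited.
        if not (aligned(current, leaf) or aligned(current, child)):
            return False  # step 3: structurally broken
        child, current = current, current.parent
    return True  # reached ROOT without error


def structural_integrity(leaves) -> float:
    """Percentage of leaf-to-root paths that are valid."""
    if not leaves:
        return 0.0  # assumption: no paths means nothing to validate
    return 100.0 * sum(path_is_valid(leaf) for leaf in leaves) / len(leaves)
```

A leaf at (3, 2) whose parent is a row label at (3, 0) and whose grandparent is a group header at (1, 0) passes: the grandparent is not aligned with the leaf but shares a column with the row label, which is the transitive case.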
-
[19]
Structural Misalignment (36.7%). While this error affects both paradigms, it is predominantly a Symbolic failure (12 cases vs. 3 Textual). The Symbolic module struggles because code generators often hallucinate a flattened schema for deeply nested trees (e.g., missing a parent key like Loan Bank and iterating directly over children). Textual reasoning ...
-
[20]
Annotation Errors (26.7%). This category represents a mode-agnostic failure, heavily concentrated in the "Both Wrong" intersection (12 cases). Unexpectedly, a quarter of failures were actually correct model predictions penalized by flawed datasets. For instance, given the query "variance rate", both models correctly extracted the percentage (-1.4), but ...
-
[21]
Arithmetic & Logic Hallucination (20.0%). This error manifests in distinct forms across the two modes. For Textual Mode (11 cases), it is a Calculation Failure: the model retrieves correct numbers but drifts during long-chain floating-point aggregation (e.g., 26.8 + ...) due to the lack of a calculator. However, Symbolic Mode is not immune (4 cases); it s...
-
[22]
Retrieval Bias (16.7%). This error reflects a "granularity mismatch" where models select high-level summary nodes instead of specific leaves. It is more prevalent in Textual Mode (4 cases) due to "semantic short-circuiting": the model stops reading at the first keyword match (e.g., "Total Expenditure"), ignoring temporal constraints. Symbolic Mode also en...
-
[23]
**Strict Preservation of Original Wording**: You must preserve the original text from the table cells. It is forbidden to create new names or summaries. (For example: if a cell contains "Total Students", the header must include "Total Students", not "Total Students Metrics".)
-
[24]
**Hierarchical Combination**: If header information is distributed across multiple header rows, combine them using the format: [Lower-Level Header] - [Upper-Level Header]. The part before " - " must be the most specific level
-
[25]
**Strict Column Count Matching**: The number of strings in the output array must exactly match the number of columns in the data rows. The table is provided in JSON array form. [Input Table]: {TABLE_AS_JSON_STRING} Your output must be a single, valid JSON string array representing the normalized headers. Do not provide any explanation. Listing 2: AdaSTR: ...
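The header-normalization rules in the prompt above (keep original wording, join levels as "[Lower-Level Header] - [Upper-Level Header]", and keep one string per column) can be illustrated deterministically. This is only a sketch of the target output format, not the paper's method, which delegates the step to an LLM. The function name `combine_headers` is hypothetical, and the input is assumed to be header rows ordered from top to bottom with merged cells already forward-filled.

```python
def combine_headers(header_rows):
    """Join multi-row headers per column, most specific level first.

    header_rows: list of header rows, top (upper level) to bottom (lower level).
    Returns one string per column, so the column count is preserved.
    """
    if not header_rows:
        return []
    combined = []
    for col in range(len(header_rows[0])):
        parts = []
        for row in reversed(header_rows):  # lowest level comes first
            text = "" if row[col] is None else str(row[col]).strip()
            if text and text not in parts:  # skip blanks and repeated labels
                parts.append(text)
        combined.append(" - ".join(parts))  # original wording is kept verbatim
    return combined
```

With `[["2023", "2023", "2024"], ["Revenue", "Cost", "Revenue"]]` as input, the result is `["Revenue - 2023", "Cost - 2023", "Revenue - 2024"]`.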
-
[26]
**Block headers + detail rows in the same column**: - Some rows in a text column look like high-level categories and have many empty cells or missing numeric values in other columns. - The rows immediately following such a row contain repeated labels such as "total", "weekday", "weekend", etc. with numeric values filled. - The same small set of labels ("t...
-
[27]
**Strong repetition of small categorical sets**: - A column alternates between "group header" values and a small fixed set of "detail" values in a patterned way. - This usually indicates a hierarchy like: Behaviour Group > Behaviour Detail. In such cases: - You should set "table_type" to "complex", even if there is only one text column. - The **hierarchy_...
-
[28]
There is no obvious multi-row header structure
-
[29]
There is no clear row-based block pattern as described in Step 3
-
[30]
Each row is largely independent
-
[31]
Typically there is at most one main categorical column and the others are mostly numeric measures. - Classify as **complex** if ANY of the following holds:
-
[32]
There are multiple categorical columns that naturally form a chain
-
[33]
Some categorical columns have heavily repeated values and clearly act as grouping keys
-
[34]
There are apparent aggregation / subtotal / summary rows
-
[35]
There is a row-based block structure as described in Step 3. ## Output Requirements Your output must be a single, valid JSON object, and must strictly follow the structure below: { "table_type": "simple" or "complex", "analysis_reason": "A brief explanation of the reasoning behind your judgment", "hierarchy_keys": ["header1", "header2", ...], "value_leave...
-
[36]
Semantic Hierarchy Keys - For each hierarchy_key column, the corresponding key in the JSON must follow the format "[Header Name] - [Cell Value]". - You must use " - " (space, hyphen, space) as the separator. - Example: if the header is "Grade" and the cell value is "1", then the JSON key should be "Grade - 1"
-
[37]
Strict Preservation of Leaf Node Names - The keys of value_leaves in the final JSON must exactly match the normalized headers
-
[38]
General Structure Generation Rules - Traverse the data row by row, skipping the header row. - For each data row, use the hierarchy_keys (in order) to create or move into the corresponding nested structure. - At the deepest level for a given row, create an object that stores all value_leaves for that row
-
[39]
Handling of Simple Tables (table_type = "simple") For simple tables, you should keep the structure shallow and flat: - Typically there is only one hierarchy_key. - For each data row: - Construct a single key by applying the "[Header Name] - [Cell Value]" rule to that hierarchy_key column. - Store an object under that key with all value_leaves for that row
-
[40]
Handling of Complex Tables (table_type = "complex") For complex tables, you MUST allow multi-level nesting, even if there is only one hierarchy_key column. 5.1 Column-based hierarchies - When there are multiple hierarchy_keys across different columns, you should: - For each row, create or move into a nested path defined by those columns in order (e.g. Reg...
-
[41]
No Data Omission - You must not omit any cell content from the input table. - Every non-header row must contribute at least one object into the JSON tree
-
[42]
Prohibited Behaviors - Do NOT invent new textual labels that do not appear in the original table or headers. - Do NOT rename, summarize, or translate headers or cell values. [Input Table]: {TABLE_AS_JSON_STRING} [Normalized Headers]: {NORMALIZED_HEADERS_FROM_STEP_1} [Hierarchy Definition]: {HIERARCHY_DEFINITION_FROM_STEP_2} Your output must be a single, v...
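The row-by-row generation rules in the prompt above describe a nested structure keyed by "[Header Name] - [Cell Value]". The sketch below shows what that target structure looks like for column-based hierarchies when built deterministically. It is an illustration only: the paper has an LLM produce the tree, the function name `build_tree` is hypothetical, and row-based block hierarchies are not covered.

```python
def build_tree(headers, rows, hierarchy_keys, value_leaves):
    """Build a nested dict from data rows (header row already removed).

    hierarchy_keys: header names, in nesting order, that define the path.
    value_leaves: header names stored as leaves at the deepest level.
    """
    index = {name: i for i, name in enumerate(headers)}
    tree = {}
    for row in rows:
        node = tree
        for key in hierarchy_keys:
            # Key format from the prompt: "[Header Name] - [Cell Value]".
            label = f"{key} - {row[index[key]]}"
            node = node.setdefault(label, {})  # create or move into the level
        for leaf in value_leaves:
            node[leaf] = row[index[leaf]]      # leaf names match the headers
    return tree
```

Rows that share the same full hierarchy path overwrite each other in this sketch, which would violate the "No Data Omission" rule; a fuller version would need to collect such rows in a list.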
-
[43]
Please enclose your final answer in square brackets, e.g., [Answer]
-
[44]
**Pay strict attention to the required answer type.** - If the question starts with **"Which"** or **"What"** (e.g., "Which item?", "What category?"), your answer must be a *name or description* (e.g., "Sell Product A"), **NOT** the associated numerical value. - If the question starts with **"How much"**, **"What is the value"**, or **"How many"**, the an...
-
[45]
**ONLY when** the answer is numerical (as per rule 2), you must check for any associated "units" (e.g., "ten thousand", "million") and provide the final converted numerical result. ## Table: {table_str} ## Question: {question} ## Final Answer:
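Because the answering prompt above asks for the final answer in square brackets, a downstream evaluator has to pull that span out of the model response. A minimal sketch, assuming the last bracketed span is the intended answer (the excerpt does not say how multiple brackets are resolved) and using a hypothetical function name:

```python
import re
from typing import Optional


def extract_answer(response: str) -> Optional[str]:
    """Return the text of the last [bracketed] span, or None if absent."""
    matches = re.findall(r"\[([^\[\]]+)\]", response)
    return matches[-1].strip() if matches else None
```

Returning `None` rather than the raw response makes format violations visible to the evaluation code instead of silently scoring them.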