FastTab: A Fast Table Recognizer with a Tiny Recursive Module and 1D Transformers
Pith reviewed 2026-05-22 07:09 UTC · model grok-4.3
The pith
FastTab recovers table structure by predicting row and column counts plus separators directly with a tiny recursive module and 1D transformers instead of sequential HTML generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FastTab constructs table structure by predicting the number of rows and columns, identifying header rows, and localizing horizontal and vertical separators with a combination of a Tiny Recursive Module for global reasoning and axial 1D Transformer encoders that model long-range dependencies separately along each axis; cell spans are then inferred from ROI-aligned features extracted after the grid is assembled.
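The grid-assembly step described above can be illustrated with a minimal sketch. It assumes separators arrive as interior pixel coordinates along each axis and that the table occupies an axis-aligned box. The function name `build_grid` and the box layout are hypothetical, chosen for illustration, and are not taken from the paper or its code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


def build_grid(table_box: Box, row_seps: List[float], col_seps: List[float]) -> List[List[Box]]:
    """Turn interior separator coordinates into a row-major grid of cell boxes."""
    x0, y0, x1, y1 = table_box
    # Table borders plus interior separators give the cell boundaries on each axis.
    ys = [y0] + sorted(row_seps) + [y1]
    xs = [x0] + sorted(col_seps) + [x1]
    return [
        [(xs[c], ys[r], xs[c + 1], ys[r + 1]) for c in range(len(xs) - 1)]
        for r in range(len(ys) - 1)
    ]


# Two horizontal and two vertical separators yield a 3 x 3 grid.
grid = build_grid((0, 0, 300, 90), row_seps=[30, 60], col_seps=[100, 200])
```

Each resulting box is the kind of region from which ROI-aligned features could then be pooled for span inference.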
What carries the argument
Tiny Recursive Module (TRM) combined with axial 1D Transformer encoders that process rows and columns independently to predict grid elements before span inference.
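To make the axial idea concrete, here is a rough sketch of how a 2D feature map might be reduced to one sequence per axis before 1D encoding. Mean pooling and one token per row or column position are assumptions made for this example; the summary above does not say how FastTab forms its 1D sequences. The helper `attention_pairs` counts query-key pairs for standard self-attention under that same assumption.

```python
from typing import List, Tuple


def axial_profiles(feature_map: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Collapse an H x W map into a length-H row sequence and a length-W column sequence."""
    height = len(feature_map)
    width = len(feature_map[0])
    # Averaging over the width leaves one value per row.
    row_seq = [sum(row) / width for row in feature_map]
    # Averaging over the height leaves one value per column.
    col_seq = [sum(feature_map[r][c] for r in range(height)) / height for c in range(width)]
    return row_seq, col_seq


def attention_pairs(height: int, width: int) -> Tuple[int, int]:
    """Query-key pairs for two 1D sequences versus one flattened 2D sequence."""
    axial = height * height + width * width
    full_2d = (height * width) ** 2
    return axial, full_2d


fmap = [
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
]
row_seq, col_seq = axial_profiles(fmap)
axial_cost, full_cost = attention_pairs(32, 32)
```

Under these assumptions the pair count grows with the square of each side length separately rather than with the square of their product, which is the usual argument for axial designs on large inputs.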
If this is right
- Table structure can be recovered at inference speeds suitable for real-time document pipelines without sacrificing benchmark accuracy.
- Pixel-level anonymization of cell content does not degrade separator localization when the model relies on grid-level predictions.
- The same grid-construction pipeline extends to camera-captured documents containing curved separators.
- Avoiding sequential tag generation removes a major source of compounding errors in complex multi-span tables.
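The compounding-error point in the last item can be illustrated with a back-of-the-envelope calculation. The sketch assumes every token of a generated tag sequence is correct independently with the same probability, which is a simplification and not a claim made by the paper. The accuracy value 0.999 is an arbitrary illustrative number, not a measured result.

```python
def sequence_success_probability(per_token_accuracy: float, num_tokens: int) -> float:
    """Probability that all tokens are correct, assuming independent per-token errors."""
    return per_token_accuracy ** num_tokens


# Longer tag sequences, as produced for large multi-span tables, are less likely to be error-free.
for n in (50, 200, 800):
    print(n, round(sequence_success_probability(0.999, n), 3))
```

Under this simplified model the chance of a fully correct sequence shrinks geometrically with its length, whereas a grid-first model makes a fixed number of count and separator predictions per axis.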
Where Pith is reading between the lines
- The grid-first strategy could transfer to other layout-heavy tasks such as form field extraction or chart parsing where sequential decoding currently dominates.
- Replacing the Tiny Recursive Module with a larger but still lightweight recurrent unit might improve global coherence on very large tables without losing the speed benefit.
- Because the model separates row-wise and column-wise reasoning, it may scale more gracefully to tables with hundreds of rows than fully 2D attention approaches.
Load-bearing premise
Directly predicting row and column counts, headers, and separators followed by ROI-based span inference is enough to recover accurate structure without the error buildup that occurs in autoregressive HTML decoding.
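As an illustration of the final step in this premise, the sketch below applies predicted spans to an already assembled grid. It assumes spans are given as a mapping from an anchor slot to a (rowspan, colspan) pair; that representation and the function name `apply_spans` are hypothetical and not drawn from the paper.

```python
from typing import Dict, List, Tuple

Slot = Tuple[int, int]  # (row, col) index into the grid


def apply_spans(n_rows: int, n_cols: int, spans: Dict[Slot, Tuple[int, int]]) -> List[List[Slot]]:
    """Return owner[r][c], the anchor slot of the cell that covers grid slot (r, c)."""
    # Start with every slot owning itself, i.e. a plain grid without merged cells.
    owner = [[(r, c) for c in range(n_cols)] for r in range(n_rows)]
    for (r0, c0), (rowspan, colspan) in spans.items():
        # Clip spans to the grid so an oversized prediction cannot index out of range.
        for r in range(r0, min(r0 + rowspan, n_rows)):
            for c in range(c0, min(c0 + colspan, n_cols)):
                owner[r][c] = (r0, c0)
    return owner


# A header cell spanning two columns and a body cell spanning two rows.
owner = apply_spans(3, 3, {(0, 0): (1, 2), (1, 2): (2, 1)})
```

In this sketch overlapping spans are resolved by letting later entries overwrite earlier ones; a real system would need an explicit conflict rule.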
What would settle it
A head-to-head comparison on PubTables-1M or SciTSR showing that FastTab's structure accuracy falls below strong autoregressive baselines on tables with frequent spanning cells while its latency advantage remains.
Original abstract
Table structure recognition (TSR) requires both table-level coherence (row/column counts, headers, spanning cells) and precise separator localization. We introduce FastTab, a grid-centric TSR model that avoids autoregressive HTML decoding by combining (i) a lightweight Tiny Recursive Module (TRM) for global reasoning and (ii) axial 1D Transformer encoders that capture long-range dependencies along rows and columns. The model predicts row/column counts, header rows, and separators to construct a grid, then infers rowspan/colspan using ROI-aligned cell features. Across four benchmarks (PubTabNet, FinTabNet, PubTables-1M, and SciTSR), FastTab achieves competitive structure recovery performance while operating at low-latency inference. We further study robustness under pixel-level anonymisation and show an extension to curved separators for camera-captured documents. The source code will be made publicly available at https://github.com/hamdilaziz/FastTab .
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FastTab, a grid-centric table structure recognition (TSR) model that combines a Tiny Recursive Module (TRM) for global reasoning with axial 1D Transformer encoders to predict row/column counts, header rows, and separators, thereby constructing a grid before performing ROI-aligned inference of rowspan and colspan on cell features. It reports competitive structure recovery on PubTabNet, FinTabNet, PubTables-1M, and SciTSR while emphasizing low-latency inference, robustness under pixel-level anonymization, and an extension to curved separators; source code is promised to be released.
Significance. If the performance claims are substantiated with detailed metrics and ablations, the work could offer a practical advance for high-throughput document pipelines by sidestepping autoregressive HTML decoding and its potential error accumulation. The lightweight TRM plus 1D axial design and the public-code commitment are clear strengths; the robustness and curved-separator experiments add real-world relevance.
major comments (2)
- Abstract: the central performance claim is stated only qualitatively ('competitive structure recovery performance') with no TEDS, F1, or latency numbers, error bars, or ablation tables. Without these data it is impossible to judge whether the grid-first pipeline actually delivers the promised reduction in error accumulation relative to autoregressive baselines.
- Method description (grid construction step): the approach instantiates the table grid from predicted row/column counts and separators before ROI span inference. An off-by-one error in any global count would misalign all subsequent cell features. The manuscript provides no quantitative breakdown of count/separator prediction accuracy versus final structure metrics, leaving open whether competitive results hold only for simple rectangular tables or also for complex spanning cases.
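The off-by-one concern in the second comment can be made concrete with a toy example. The selection rule below, which keeps the highest-scoring positions up to the predicted count, is an assumption made for illustration; the manuscript as summarized here does not state how counts and separator scores are combined. The score values are made-up placeholders.

```python
from typing import List


def pick_separators(scores: List[float], count: int) -> List[int]:
    """Keep the (count - 1) highest-scoring positions as separators, in spatial order."""
    k = max(count - 1, 0)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def row_of(y: int, separators: List[int]) -> int:
    """Index of the row that contains coordinate y."""
    return sum(1 for s in separators if s <= y)


scores = [0.0, 0.1, 0.9, 0.1, 0.4, 0.1, 0.8, 0.0]  # toy separator scores per position
correct = pick_separators(scores, 3)      # predicted count of 3 rows
overcounted = pick_separators(scores, 4)  # count too high by one
```

With the inflated count, a weak peak is promoted to a separator and every position below it lands in a row index one higher, which is the misalignment the referee describes.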
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.
- Referee: Abstract: the central performance claim is stated only qualitatively ('competitive structure recovery performance') with no TEDS, F1, or latency numbers, error bars, or ablation tables. Without these data it is impossible to judge whether the grid-first pipeline actually delivers the promised reduction in error accumulation relative to autoregressive baselines.
  Authors: We agree that the abstract would be strengthened by quantitative claims. In the revised version we will include specific TEDS, F1, and latency figures (with comparisons to the main baselines) so that readers can directly evaluate the claimed advantages of the grid-centric pipeline.
  Revision: yes
- Referee: Method description (grid construction step): the approach instantiates the table grid from predicted row/column counts and separators before ROI span inference. An off-by-one error in any global count would misalign all subsequent cell features. The manuscript provides no quantitative breakdown of count/separator prediction accuracy versus final structure metrics, leaving open whether competitive results hold only for simple rectangular tables or also for complex spanning cases.
  Authors: We acknowledge the importance of quantifying error propagation from the global count and separator predictions. While the final structure metrics already reflect performance on benchmarks containing many spanning cells, we will add a dedicated analysis (new table and discussion) that reports count/separator prediction accuracy separately and correlates it with the end-to-end TEDS and F1 scores. This will explicitly address both simple and complex table cases.
  Revision: yes
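One way such a separate separator-accuracy analysis could be computed is sketched below. The metric, greedy one-to-one matching within a pixel tolerance, is an assumption for illustration and not necessarily what the authors will report. The function name `separator_prf` and the example coordinates are hypothetical.

```python
from typing import List, Tuple


def separator_prf(pred: List[float], gold: List[float], tol: float) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from greedy one-to-one matching within +/- tol."""
    unmatched = sorted(gold)
    hits = 0
    for p in sorted(pred):
        # Nearest still-unmatched gold separator, or None if all are taken.
        best = min(unmatched, key=lambda g: abs(g - p), default=None)
        if best is not None and abs(best - p) <= tol:
            unmatched.remove(best)
            hits += 1
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy example: two well-placed separators plus one spurious prediction.
p, r, f1 = separator_prf(pred=[29, 61, 75], gold=[30, 60], tol=3)
count_correct = len([29, 61, 75]) == len([30, 60])
```

Reporting this score alongside a per-table count-correct flag would let readers see how much of the end-to-end error comes from the count head and how much from localization.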
Circularity Check
No circularity: architecture and evaluation are self-contained against external benchmarks
The paper presents FastTab as a new grid-centric architecture that predicts row/column counts, headers and separators to instantiate a grid, followed by ROI-based span inference. All performance claims rest on evaluation against four independent external benchmarks (PubTabNet, FinTabNet, PubTables-1M, SciTSR) rather than any internal derivation that reduces to fitted parameters or self-citations. No equations, uniqueness theorems, or ansatzes are shown to be smuggled in via prior self-work; the method is described as a direct architectural choice evaluated empirically. This is the standard non-circular pattern for an ML vision paper.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: standard benchmarks (PubTabNet, FinTabNet, PubTables-1M, SciTSR) provide reliable ground truth for table structure.
invented entities (1)
- Tiny Recursive Module (TRM): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean (and Cost/FunctionalEquation.lean): reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Paper passage: FastTab combines (i) a Tiny Recursive Module (TRM) for global reasoning and (ii) axial 1D Transformer encoders that capture long-range dependencies along rows and columns. The model predicts row/column counts, header rows, and separators to construct a grid, then infers rowspan/colspan using ROI-aligned cell features.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.