ConRTF: Edge-Constrained Boundary Distribution Refinement for Realtime TransFormer Table Structure Recognition

Antoine Doucet; Aurelie Joseph; Eliott Thomas; Gaspar Deloin; Jean-Marc Ogier; Mickael Coustaty; Tri-Cong Pham; Vincent Poulain D'Andecy

arxiv: 2607.00734 · v1 · pith:55R2XEVSnew · submitted 2026-07-01 · 💻 cs.CV · cs.AI

Eliott Thomas, Tri-Cong Pham, Mickael Coustaty, Aurelie Joseph, Gaspar Deloin, Vincent Poulain d'Andecy, Jean-Marc Ogier, Antoine Doucet

Pith reviewed 2026-07-02 14:37 UTC · model grok-4.3

keywords: table structure recognition, boundary refinement, real-time detection, edge-constrained loss, document image analysis, transformer detector

The pith

An edge-constrained loss that weights horizontal boundaries for rows and vertical boundaries for columns improves table structure recognition accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard detection losses overlook the distinct structural roles of rows and columns in tables, leading to suboptimal boundary placement. It introduces an Edge-constrained Fine-grained Localization loss that applies table-specific geometric priors during training only. This guides a real-time detector's distribution-based boundary refinement to prioritize the boundaries that matter most for each element type. The result is higher cell assignment accuracy on standard and private datasets, including gains of up to 1.6 GriTS points at unchanged inference speed, while remaining effective with only a few thousand training examples.

Core claim

ConRTF encodes structural asymmetry directly into the training objective through the EFL loss: row-like elements receive stronger supervision on horizontal boundaries and column-like elements on vertical boundaries. When applied inside a real-time detector that already performs distribution-based boundary refinement, this produces more structurally consistent boundaries without altering the inference pipeline or requiring additional data.

What carries the argument

Edge-constrained Fine-grained Localization loss (EFL) that encodes table-specific geometric priors by emphasizing horizontal boundaries for row-like elements and vertical boundaries for column-like elements.
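The weighting idea can be illustrated with a minimal sketch. The paper's actual EFL formulation is not shown in the visible text, so everything here is an assumption made for illustration: the function names `edge_weights` and `edge_weighted_l1`, the class labels `"row"` and `"column"`, the emphasis factor `alpha`, and the use of a plain L1 penalty on box edges instead of a distribution-based term. The sketch also assumes that "horizontal boundaries" means the top and bottom edges of a box and "vertical boundaries" means the left and right edges.

```python
from typing import Sequence, Tuple

# Boxes are (left, top, right, bottom) in pixels.
# Hypothetical per-edge weights: rows emphasize top/bottom, columns left/right.
def edge_weights(kind: str, alpha: float = 2.0) -> Tuple[float, float, float, float]:
    if kind == "row":
        return (1.0, alpha, 1.0, alpha)
    if kind == "column":
        return (alpha, 1.0, alpha, 1.0)
    return (1.0, 1.0, 1.0, 1.0)  # any other class: no emphasis


def edge_weighted_l1(pred: Sequence[float], target: Sequence[float],
                     kind: str, alpha: float = 2.0) -> float:
    """Weighted mean absolute edge error (illustrative, not the paper's EFL)."""
    w = edge_weights(kind, alpha)
    total = sum(wi * abs(p - t) for wi, p, t in zip(w, pred, target))
    return total / sum(w)  # normalize so the scale stays comparable across classes


target = (0.0, 10.0, 100.0, 20.0)
top_off = (0.0, 12.0, 100.0, 20.0)   # top edge misplaced by 2 px
left_off = (2.0, 10.0, 100.0, 20.0)  # left edge misplaced by 2 px

# For a row, the top-edge error costs more than the same error on the left edge.
row_top = edge_weighted_l1(top_off, target, "row")
row_left = edge_weighted_l1(left_off, target, "row")
# For a column, the ordering flips.
col_top = edge_weighted_l1(top_off, target, "column")
col_left = edge_weighted_l1(left_off, target, "column")
```

Because the weights appear only in the loss, a sketch like this changes nothing about how boxes are predicted at inference time, which is consistent with the training-only claim above.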

If this is right

The method delivers consistent accuracy gains over optimized baselines and real-time detectors such as RT-DETRv2 and YOLOv10-11 while preserving identical inference speed.
Performance remains robust when training data is limited to 2k-3k annotated tables on PubTables-1M and private datasets.
Because EFL operates only during training, the approach adds no computational cost at inference time.
Boundary refinements are guided toward table geometry rather than generic object detection objectives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same boundary-type asymmetry principle could be tested on other layout tasks such as form parsing or diagram understanding where horizontal and vertical elements play unequal roles.
If the EFL weighting proves stable across table styles, it may reduce the need for post-processing heuristics that currently correct boundary errors in production TSR systems.
Data efficiency with 2k-3k examples suggests the loss could serve as a regularizer when adapting TSR models to new document domains with limited labels.

Load-bearing premise

Emphasizing horizontal boundaries for row-like elements and vertical boundaries for column-like elements will produce structurally meaningful boundary refinements that improve cell assignment accuracy without introducing new failure modes.

What would settle it

A controlled test on tables with irregular or merged cells where the EFL-trained model shows equal or lower cell assignment accuracy than the baseline detector at the same boundary localization precision.

Figures

Figures reproduced from arXiv: 2607.00734 by Antoine Doucet, Aurelie Joseph, Eliott Thomas, Gaspar Deloin, Jean-Marc Ogier, Mickael Coustaty, Tri-Cong Pham, Vincent Poulain D'Andecy.

**Figure 2.** Overview of ConRTF. Each decoder layer iteratively refines boundary …

**Figure 3.** Qualitative comparison on two Business dataset tables. Colors: …

Original abstract

Table Structure Recognition (TSR) aims to recover the row and column layout of tables from document images, a key step in document understanding pipelines. Accurate TSR depends on precise boundary localization: small errors in row or column boundaries can propagate into incorrect cell assignments and structural inconsistencies. Yet detection-based approaches treat table elements as generic objects, ignoring a fundamental property of table layout: rows and columns play structurally distinct roles and their boundaries carry unequal importance. We propose an Edge-constrained Fine-grained Localization loss (EFL) that formalizes this structural asymmetry by encoding table-specific geometric priors into the training objective: row-like elements are supervised with emphasis on their horizontal boundaries, while column-like elements prioritize vertical boundaries. Implemented within a real-time detector with distribution-based boundary refinement (D-FINE), EFL operates during training only and guides boundary refinement toward structurally meaningful adjustments with no change to the inference pipeline. The proposed approach, ConRTF, is also data-efficient, maintaining robust accuracy with as few as 2k–3k annotated tables. Experiments on PubTables-1M and two private datasets show consistent improvements over the optimized baseline and several real-time detectors including RT-DETRv2 and YOLOv10-11, with gains of up to +1.6 GriTS points at equal inference speed.
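For readers unfamiliar with distribution-based boundary regression, the general idea can be sketched as follows: instead of regressing one number per box edge, the model predicts a discrete distribution over candidate offsets and decodes the edge as the expectation of that distribution. The sketch below follows the generic form popularized by Generalized Focal Loss, with uniformly spaced bins by default. It is not D-FINE's exact refinement scheme, whose bin layout and update rule are not described in the visible text; the function names and the `bin_values` parameter are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def expected_offset(logits: Sequence[float],
                    bin_values: Optional[Sequence[float]] = None) -> float:
    """Decode one edge offset as the expectation over discrete bins.

    bin_values defaults to 0, 1, ..., n-1 (uniform bins, an assumption).
    """
    probs = softmax(logits)
    if bin_values is None:
        bin_values = list(range(len(logits)))
    return sum(p * v for p, v in zip(probs, bin_values))


# A flat distribution over three bins decodes to the middle bin.
flat = expected_offset([0.0, 0.0, 0.0])
# A sharply peaked distribution decodes to (almost exactly) its peak.
peaked = expected_offset([-100.0, 100.0, -100.0])
# Signed bins allow an edge to move in either direction.
signed = expected_offset([0.0, 0.0, 0.0], bin_values=[-1.0, 0.0, 1.0])
```

A per-edge loss such as the one the abstract describes would act on these per-edge distributions during training, leaving the decoding step untouched.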

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper adds a table-specific boundary loss to a real-time detector, but the abstract leaves the supporting experiments and derivations too thin to evaluate the central claim.

ConRTF adds an edge-constrained loss (EFL) that tells the model to weight horizontal boundaries more for row-like elements and vertical ones for column-like elements. It sits inside an existing real-time detector with distribution refinement and changes nothing at inference time.

The targeted prior is the main new piece. Treating rows and columns as structurally asymmetric during training is a reasonable extension for table structure recognition, and the claim of holding accuracy with only 2k-3k examples is practically useful for document pipelines.

The soft spots are the missing pieces that matter most. The abstract states gains up to +1.6 GriTS over baselines like RT-DETRv2 but gives no loss equations, no ablation that turns EFL on and off, and no per-boundary error metrics or checks on irregular tables. Without those, it is impossible to tell whether the reported improvement comes from the asymmetry or from other parts of the pipeline. The assumption that this emphasis produces cleaner cell assignments without new failure modes stays untested in the visible text.

This is the sort of narrow, engineering-focused tweak that might interest teams already running real-time table parsers who want a small lift without speed cost. Readers looking for new architectures or broad theoretical advances will not find them here. The work deserves a serious referee so the authors can supply the controls and analysis that are currently absent.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes ConRTF, a real-time transformer detector for table structure recognition that augments a distribution-based boundary refinement module (D-FINE) with an Edge-constrained Fine-grained Localization (EFL) loss. EFL encodes table layout asymmetry by supervising row-like elements with emphasis on horizontal boundaries and column-like elements with emphasis on vertical boundaries. The loss is applied only at training time. Experiments are reported on PubTables-1M and two private datasets, claiming consistent GriTS gains (up to +1.6 points) over optimized baselines and real-time detectors (RT-DETRv2, YOLOv10-11) at matched inference speed, together with robustness when trained on only 2k–3k annotated tables.

Significance. If the EFL loss demonstrably produces boundary refinements that reduce cell-assignment errors without introducing new structural failure modes, the work would supply a lightweight, inference-neutral way to inject domain-specific geometric priors into detection-based TSR. The reported data-efficiency on small annotated sets would also be a practical strength for document-processing pipelines. The approach builds directly on existing real-time detectors, so any validated gains would be immediately deployable.

major comments (2)

[Abstract] Performance gains of up to +1.6 GriTS are asserted, yet the text supplies neither an ablation that isolates EFL from the D-FINE refinement nor boundary-specific localization metrics (e.g., per-edge error stratified by row vs. column). This omission is load-bearing for the central claim that the structural asymmetry produces the observed cell-assignment improvements.
[Abstract] The weakest link identified in the stress-test note remains unaddressed: without an experiment that measures whether the asymmetric weighting reduces cell-assignment errors on irregular tables or introduces new mis-refinements, it is impossible to confirm that the EFL prior yields structurally meaningful adjustments rather than incidental gains from other components.
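The per-edge metric requested in the first major comment could be computed along the following lines. This is a minimal sketch under stated assumptions: predictions are already matched to ground-truth boxes, boxes are `(left, top, right, bottom)` tuples, and the function name `per_edge_error` and the class labels are hypothetical. It is not taken from the manuscript.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)
EDGES = ("left", "top", "right", "bottom")


def per_edge_error(matches: Iterable[Tuple[str, Box, Box]]) -> Dict[str, Dict[str, float]]:
    """Mean absolute error per edge, grouped by element class.

    `matches` yields (kind, predicted_box, ground_truth_box) for pairs
    that have already been matched by some external procedure.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for kind, pred, gt in matches:
        for i in range(4):
            sums[kind][i] += abs(pred[i] - gt[i])
        counts[kind] += 1
    return {
        kind: {edge: sums[kind][i] / counts[kind] for i, edge in enumerate(EDGES)}
        for kind in sums
    }


matches = [
    ("row", (0.0, 11.0, 100.0, 20.0), (0.0, 10.0, 100.0, 20.0)),
    ("row", (0.0, 13.0, 100.0, 20.0), (0.0, 10.0, 100.0, 20.0)),
    ("column", (2.0, 0.0, 30.0, 50.0), (0.0, 0.0, 30.0, 50.0)),
]
report = per_edge_error(matches)
```

Comparing such a report between the baseline and the EFL-trained model would show whether the emphasized edges (top and bottom for rows, left and right for columns, under the assumption above) are the ones whose error actually drops.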

minor comments (1)

The two private datasets are referenced but neither named nor characterized (size, domain, annotation protocol), hindering reproducibility and assessment of generalizability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point by point below, with a focus on the evidence for the EFL loss contribution.

Referee: [Abstract] Performance gains of up to +1.6 GriTS are asserted, yet the text supplies neither an ablation that isolates EFL from the D-FINE refinement nor boundary-specific localization metrics (e.g., per-edge error stratified by row vs. column). This omission is load-bearing for the central claim that the structural asymmetry produces the observed cell-assignment improvements.

Authors: We agree the abstract would benefit from greater specificity. The manuscript presents ConRTF as an augmentation of the D-FINE baseline, with all reported gains measured against that baseline to isolate the effect of adding EFL. We will revise the abstract to explicitly reference the baseline comparison and the role of EFL. We will also incorporate boundary-specific per-edge localization metrics stratified by row versus column elements in the revised manuscript.

Revision: yes

Referee: [Abstract] The weakest link identified in the stress-test note remains unaddressed: without an experiment that measures whether the asymmetric weighting reduces cell-assignment errors on irregular tables or introduces new mis-refinements, it is impossible to confirm that the EFL prior yields structurally meaningful adjustments rather than incidental gains from other components.

Authors: The GriTS metric directly quantifies cell-assignment accuracy and structural fidelity. Consistent gains on PubTables-1M (which contains irregular tables) and the absence of new failure modes in our qualitative results indicate that EFL produces structurally meaningful boundary adjustments. We will add a targeted breakdown of performance on irregular table subsets in the revision to address this concern more explicitly.

Revision: partial

Circularity Check

0 steps flagged

No circularity: derivation chain self-contained with no self-referential reductions

The provided abstract and context contain no equations, fitted parameters renamed as predictions, self-citations invoked as load-bearing uniqueness theorems, or ansatzes smuggled via prior work. The EFL loss is presented as a training-time encoding of geometric priors without any derivation that reduces to its own inputs by construction. Experimental claims rest on external benchmarks (PubTables-1M, private datasets) rather than internal redefinitions. No load-bearing step collapses to a self-citation chain or fitted-input prediction; the central claim of +1.6 GriTS gains is therefore independent of the patterns that would trigger circularity flags.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no equations, training details, or modeling choices are supplied from which free parameters, axioms, or invented entities can be extracted.

Reference graph

Works this paper leans on

35 extracted references · 5 canonical work pages

[1] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision (ECCV). pp. 213–229 (2020)

[2] Cui, C., Guo, R., Du, Y., He, D., Li, F., Wu, Z., Liu, Q., Wen, S., Huang, J., Hu, X., Yu, D., Ding, E., Ma, Y.: Beyond self-supervision: A simple yet effective network distillation alternative to improve backbones. arXiv preprint arXiv:2103.05959 (2021)

[3] Hou, Q., Wang, J.: TABLET: Table structure recognition using encoder-only transformers. In: Document Analysis and Recognition (ICDAR). pp. 253–278 (2025)

[4] Jocher, G., Qiu, J.: Ultralytics YOLO11 (2024), https://github.com/ultralytics/ultralytics

[5] Kasem, M., Abdallah, A., Berendeyev, A., Elkady, E., Mahmoud, M., Abdalla, M., Hamada, M., Nurseitov, D., Taj-Eddin, I.: Deep learning for table detection and structure recognition: A survey. ACM Computing Surveys 56(12) (2024)

[6] Khang, M., Hong, T.: TFLOP: Table structure recognition framework with layout pointer mechanism. In: International Joint Conference on Artificial Intelligence (IJCAI) (2024)

[7] Li, X., Wang, W., Hu, X., Li, J., Tang, J., Yang, J.: Generalized focal loss v2: Learning reliable localization quality estimation for dense object detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

[8] Li, X., Wang, W., Wu, L., Chen, S., Hu, X., Li, J., Tang, J., Yang, J.: Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. In: International Conference on Neural Information Processing Systems (NeurIPS) (2020)

[9] Lin, W., Sun, Z., Ma, C., Li, M., Wang, J., Sun, L., Huo, Q.: TSRFormer: Table structure recognition with transformers. In: Proceedings of the 30th ACM International Conference on Multimedia (2022)

[10] Long, R., Xing, H., Yang, Z., Zheng, Q., Yu, Z., Huang, F., Yao, C.: LORE++: Logical location regression network for table structure recognition with pre-training. Pattern Recognition 157, 110816 (2025)

[11] Lv, W., Zhao, Y., Chang, Q., Huang, K., Wang, G., Liu, Y.: RT-DETRv2: Improved baseline with bag-of-freebies for real-time detection transformer (2024), https://arxiv.org/abs/2407.17140

[12] Lysak, M., Nassar, A., Livathinos, N., Auer, C., Staar, P.W.J.: Optimized table tokenization for table structure recognition. In: Document Analysis and Recognition (ICDAR) (2023)

[13] Ma, C., Lin, W., Sun, L., Huo, Q.: Robust table detection and structure recognition from heterogeneous document images. Pattern Recognition 133, 109006 (2023)

[14] Nassar, A., Livathinos, N., Lysak, M., Staar, P.: TableFormer: Table structure understanding with transformers. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4614–4623 (2022)

[15] Peng, Y., Li, H., Wu, P., Zhang, Y., Sun, X., Wu, F.: D-FINE: Redefine regression task in DETRs as fine-grained distribution refinement. In: International Conference on Learning Representations (ICLR) (2025)

[16] Sapkota, R., Flores-Calero, M., Qureshi, R., et al.: YOLO advances to its genesis: A decadal and comprehensive review of the you only look once series. Artificial Intelligence Review 58(9), 274 (2025)

[17] Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: DeepDeSRT: Deep learning for detection and structure recognition of tables in document images. In: International Conference on Document Analysis and Recognition (ICDAR). pp. 1162–1167 (2017)

[18] Smock, B., Pesala, R., Abraham, R.: PubTables-1M: Towards comprehensive table extraction from unstructured documents. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4634–4642 (2022)

[19] Smock, B., Pesala, R., Abraham, R.: Aligning benchmark datasets for table structure recognition. In: International Conference on Document Analysis and Recognition (ICDAR). pp. 371–386 (2023)

[20] Smock, B., Pesala, R., Abraham, R.: GriTS: Grid table similarity metric for table structure recognition. In: Document Analysis and Recognition (ICDAR). pp. 535–549 (2023)

[21] Thomas, E., Coustaty, M., Joseph, A., Deloin, G., Carel, E., D’Andecy, V.P., Ogier, J.M.: RAPTOR: Refined approach for product table object recognition. In: Winter Conference on Applications of Computer Vision (WACV) (2025)

[22] Thomas, E., Coustaty, M., Joseph, A., Deloin, G., Carel, E., D’Andecy, V.P., Ogier, J.M.: QUEST: Quality-aware semi-supervised table extraction for business documents. In: International Conference on Document Analysis and Recognition (ICDAR) (2025). https://doi.org/10.1007/978-3-032-04630-7_16

[23] Tian, Y., Ye, Q., Doermann, D.: Yolov12: Attention-centric real-time object detectors. In: Advances in Neural Information Processing Systems (NeurIPS) (2025)

[24] Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., Han, J., Ding, G.: Yolov10: Real-time end-to-end object detection. In: Advances in Neural Information Processing Systems (NeurIPS) (2024)

[25] Xiao, A., Yang, C.: Towards one-stage end-to-end table structure recognition with parallel regression for diverse scenarios. ArXiv abs/2504.17522 (2025)

[26] Xiao, B., Simsek, M., Kantarci, B., Alkheir, A.A.: Rethinking detection based table structure recognition for visually rich document images. Expert Systems with Applications 269, 126461 (2025). https://doi.org/10.1016/j.eswa.2025.126461

[27] Yang, X., Yan, J., Ming, Q., Wang, W., Zhang, X., Tian, Q.: Rethinking rotated object detection with gaussian wasserstein distance loss. In: International Conference on Machine Learning (ICML) (2021)

[28] Yang, X., Yang, X., Yang, J., Ming, Q., Wang, W., Tian, Q., Yan, J.: Learning high-precision bounding box for rotated object detection via kullback-leibler divergence. In: Advances in Neural Information Processing Systems (NeurIPS) (2021)

[29] Zanibbi, R., Blostein, D., Cordy, J.R.: A survey of table recognition: Models, observations, transformations, and inferences. International Journal on Document Analysis and Recognition (IJDAR) 7(1), 1–16 (2004)

[30] Zeng, Y., Yang, X., et al.: ARS-DETR: Aspect ratio-sensitive detection transformer for aerial oriented object detection. IEEE Transactions on Geoscience and Remote Sensing 62 (2024)

[31] Zhang, H., Li, F., Liu, S., Zhang, L., Su, H., Zhu, J., Ni, L.M., Shum, H.Y.: DINO: DETR with improved denoising anchor boxes for end-to-end object detection. In: International Conference on Learning Representations (ICLR) (2023)

[32] Zhang, Z., Liu, S., Hu, P., Ma, J., Du, J., Zhang, J., Hu, Y.: UniTabNet: Bridging vision and language models for enhanced table structure recognition. In: Findings of the Association for Computational Linguistics: EMNLP. pp. 6131–6143 (2024)

[33] Zhao, Y., Lv, W., Xu, S., Wei, J., Wang, G., Dang, Q., Liu, Y., Chen, J.: DETRs beat YOLOs on real-time object detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16965–16974 (2024)

[34] Zhong, X., ShafieiBavani, E., Jimeno Yepes, A.: Image-based table recognition: Data, model, and evaluation. In: European Conference on Computer Vision (ECCV). pp. 564–580 (2020)

[35] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. In: International Conference on Learning Representations (ICLR) (2021)
