ConRTF: Edge-Constrained Boundary Distribution Refinement for Realtime TransFormer Table Structure Recognition
Pith reviewed 2026-07-02 14:37 UTC · model grok-4.3
The pith
An edge-constrained loss that weights horizontal boundaries for rows and vertical boundaries for columns improves table structure recognition accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ConRTF encodes structural asymmetry directly into the training objective through the EFL loss: row-like elements receive stronger supervision on horizontal boundaries and column-like elements on vertical boundaries. When applied inside a real-time detector that already performs distribution-based boundary refinement, this produces more structurally consistent boundaries without altering the inference pipeline or requiring additional data.
What carries the argument
Edge-constrained Fine-grained Localization loss (EFL) that encodes table-specific geometric priors by emphasizing horizontal boundaries for row-like elements and vertical boundaries for column-like elements.
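The asymmetry described above can be illustrated with a small sketch. This is not the paper's EFL: the source gives no equations, and EFL is said to act on D-FINE's distribution-based refinement, whereas the code below swaps in a plain weighted L1 distance on box edges purely to show the weighting idea. The names `edge_weights` and `edge_weighted_l1`, the `EMPHASIS` and `BASE` values, and the `(x1, y1, x2, y2)` box layout are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Boxes are assumed to be (x1, y1, x2, y2): left, top, right, bottom.
# Hypothetical weights; the source does not report the values EFL uses.
EMPHASIS = 2.0  # weight on the structurally important edges
BASE = 1.0      # weight on the remaining edges


def edge_weights(kind: str) -> Tuple[float, float, float, float]:
    """Return per-edge weights in (left, top, right, bottom) order."""
    if kind == "row":
        # A row's horizontal boundaries are its top and bottom edges.
        return (BASE, EMPHASIS, BASE, EMPHASIS)
    if kind == "column":
        # A column's vertical boundaries are its left and right edges.
        return (EMPHASIS, BASE, EMPHASIS, BASE)
    # Any other element kind is weighted uniformly.
    return (BASE, BASE, BASE, BASE)


def edge_weighted_l1(pred: Sequence[float],
                     target: Sequence[float],
                     kind: str) -> float:
    """Weighted mean absolute edge error (a stand-in for EFL, not EFL itself)."""
    weights = edge_weights(kind)
    total = sum(w * abs(p - t) for w, p, t in zip(weights, pred, target))
    return total / sum(weights)


if __name__ == "__main__":
    pred = (0.0, 0.0, 100.0, 10.0)
    target = (0.0, 2.0, 100.0, 10.0)  # only the top edge is off, by 2 units
    print(edge_weighted_l1(pred, target, "row"))
    print(edge_weighted_l1(pred, target, "column"))
```

With these assumed weights, the same top-edge error is penalized more heavily when the element is a row than when it is a column, which is the behavior the core claim attributes to EFL. Because the weighting only enters the loss, nothing in it would need to run at inference time.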
If this is right
- The method delivers consistent accuracy gains over optimized baselines and real-time detectors such as RT-DETRv2 and YOLOv10-11 while preserving identical inference speed.
- Performance remains robust when training data is limited to 2k-3k annotated tables on PubTables-1M and private datasets.
- Because EFL operates only during training, the approach adds no computational cost at inference time.
- Boundary refinements are guided toward table geometry rather than generic object detection objectives.
Where Pith is reading between the lines
- The same boundary-type asymmetry principle could be tested on other layout tasks such as form parsing or diagram understanding where horizontal and vertical elements play unequal roles.
- If the EFL weighting proves stable across table styles, it may reduce the need for post-processing heuristics that currently correct boundary errors in production TSR systems.
- Data efficiency with 2k-3k examples suggests the loss could serve as a regularizer when adapting TSR models to new document domains with limited labels.
Load-bearing premise
Emphasizing horizontal boundaries for row-like elements and vertical boundaries for column-like elements will produce structurally meaningful boundary refinements that improve cell assignment accuracy without introducing new failure modes.
What would settle it
A controlled test on tables with irregular or merged cells where the EFL-trained model shows equal or lower cell assignment accuracy than the baseline detector at the same boundary localization precision.
Original abstract
Table Structure Recognition (TSR) aims to recover the row and column layout of tables from document images, a key step in document understanding pipelines. Accurate TSR depends on precise boundary localization: small errors in row or column boundaries can propagate into incorrect cell assignments and structural inconsistencies. Yet detection-based approaches treat table elements as generic objects, ignoring a fundamental property of table layout: rows and columns play structurally distinct roles and their boundaries carry unequal importance. We propose an Edge-constrained Fine-grained Localization loss (EFL) that formalizes this structural asymmetry by encoding table-specific geometric priors into the training objective: row-like elements are supervised with emphasis on their horizontal boundaries, while column-like elements prioritize vertical boundaries. Implemented within a real-time detector with distribution-based boundary refinement (D-FINE), EFL operates during training only and guides boundary refinement toward structurally meaningful adjustments with no change to the inference pipeline. The proposed approach, ConRTF, is also data-efficient, maintaining robust accuracy with as few as 2k–3k annotated tables. Experiments on PubTables-1M and two private datasets show consistent improvements over the optimized baseline and several real-time detectors including RT-DETRv2 and YOLOv10-11, with gains of up to +1.6 GriTS points at equal inference speed.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes ConRTF, a real-time transformer detector for table structure recognition that augments a distribution-based boundary refinement module (D-FINE) with an Edge-constrained Fine-grained Localization (EFL) loss. EFL encodes table layout asymmetry by supervising row-like elements with emphasis on horizontal boundaries and column-like elements with emphasis on vertical boundaries. The loss is applied only at training time. Experiments are reported on PubTables-1M and two private datasets, claiming consistent GriTS gains (up to +1.6 points) over optimized baselines and real-time detectors (RT-DETRv2, YOLOv10-11) at matched inference speed, together with robustness when trained on only 2k–3k annotated tables.
Significance. If the EFL loss demonstrably produces boundary refinements that reduce cell-assignment errors without introducing new structural failure modes, the work would supply a lightweight, inference-neutral way to inject domain-specific geometric priors into detection-based TSR. The reported data-efficiency on small annotated sets would also be a practical strength for document-processing pipelines. The approach builds directly on existing real-time detectors, so any validated gains would be immediately deployable.
major comments (2)
- [Abstract] Performance gains of up to +1.6 GriTS are asserted, yet the text supplies neither an ablation that isolates EFL from the D-FINE refinement nor boundary-specific localization metrics (e.g., per-edge error stratified by row vs. column). This omission is load-bearing for the central claim that the structural asymmetry produces the observed cell-assignment improvements.
- [Abstract] The weakest link identified in the stress-test note remains unaddressed: without an experiment that measures whether the asymmetric weighting reduces cell-assignment errors on irregular tables or introduces new mis-refinements, it is impossible to confirm that the EFL prior yields structurally meaningful adjustments rather than incidental gains from other components.
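The boundary-specific metric requested in the first major comment could be computed along the following lines. This is a minimal sketch, assuming predictions have already been matched to ground-truth boxes and that boxes are `(x1, y1, x2, y2)`; the function name `per_edge_error` and the sample layout are hypothetical and do not come from the paper or from any GriTS tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

EDGES = ("left", "top", "right", "bottom")

# One sample is (element kind, predicted box, ground-truth box).
Sample = Tuple[str, Sequence[float], Sequence[float]]


def per_edge_error(samples: Iterable[Sample]) -> Dict[str, Dict[str, float]]:
    """Mean absolute error per box edge, stratified by element kind."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for kind, pred, target in samples:
        for i in range(4):
            sums[kind][i] += abs(pred[i] - target[i])
        counts[kind] += 1
    return {
        kind: {edge: sums[kind][i] / counts[kind]
               for i, edge in enumerate(EDGES)}
        for kind in sums
    }


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not results from the paper.
    samples = [
        ("row", (0, 1, 10, 5), (0, 0, 10, 5)),
        ("row", (0, 3, 10, 5), (0, 0, 10, 5)),
        ("column", (2, 0, 5, 10), (0, 0, 5, 10)),
    ]
    for kind, errors in per_edge_error(samples).items():
        print(kind, errors)
```

Reporting this table for the baseline and for the EFL-trained model would show whether the gain concentrates on the top and bottom edges of rows and the left and right edges of columns, as the asymmetry argument predicts.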
minor comments (1)
- The two private datasets are referenced but neither named nor characterized (size, domain, annotation protocol), hindering reproducibility and assessment of generalizability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments point by point below, with a focus on the evidence for the EFL loss contribution.
- Referee: [Abstract] Performance gains of up to +1.6 GriTS are asserted, yet the text supplies neither an ablation that isolates EFL from the D-FINE refinement nor boundary-specific localization metrics (e.g., per-edge error stratified by row vs. column). This omission is load-bearing for the central claim that the structural asymmetry produces the observed cell-assignment improvements.
  Authors: We agree the abstract would benefit from greater specificity. The manuscript presents ConRTF as an augmentation of the D-FINE baseline, with all reported gains measured against that baseline to isolate the effect of adding EFL. We will revise the abstract to explicitly reference the baseline comparison and the role of EFL. We will also incorporate boundary-specific per-edge localization metrics stratified by row versus column elements in the revised manuscript.
  Revision: yes
- Referee: [Abstract] The weakest link identified in the stress-test note remains unaddressed: without an experiment that measures whether the asymmetric weighting reduces cell-assignment errors on irregular tables or introduces new mis-refinements, it is impossible to confirm that the EFL prior yields structurally meaningful adjustments rather than incidental gains from other components.
  Authors: The GriTS metric directly quantifies cell-assignment accuracy and structural fidelity. Consistent gains on PubTables-1M (which contains irregular tables) and the absence of new failure modes in our qualitative results indicate that EFL produces structurally meaningful boundary adjustments. We will add a targeted breakdown of performance on irregular table subsets in the revision to address this concern more explicitly.
  Revision: partial
Circularity Check
No circularity: derivation chain self-contained with no self-referential reductions
The provided abstract and context contain no equations, fitted parameters renamed as predictions, self-citations invoked as load-bearing uniqueness theorems, or ansatzes smuggled via prior work. The EFL loss is presented as a training-time encoding of geometric priors without any derivation that reduces to its own inputs by construction. Experimental claims rest on external benchmarks (PubTables-1M, private datasets) rather than internal redefinitions. No load-bearing step collapses to a self-citation chain or fitted-input prediction; the central claim of +1.6 GriTS gains is therefore independent of the patterns that would trigger circularity flags.