BADAM: A Public Dataset for Baseline Detection in Arabic-script Manuscripts
Pith reviewed 2026-05-25 00:46 UTC · model grok-4.3
The pith
BADAM supplies 400 annotated Arabic manuscript images to train baseline detectors for text line extraction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a dataset of 400 annotated document images from different domains and time periods. A short elaboration on the particular challenges posed by handwriting in Arabic script for layout analysis and subsequent processing steps is given. Lastly, we propose a method based on a fully convolutional encoder-decoder network to extract arbitrarily shaped text line images from manuscripts.
What carries the argument
The BADAM dataset of 400 annotated Arabic-script manuscript images together with a fully convolutional encoder-decoder network for extracting text lines.
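One practical property of a fully convolutional encoder-decoder is that it accepts pages of arbitrary size, as long as the decoder's upsampled maps line up with the encoder's. The sketch below only does the shape arithmetic. It assumes stride-2 downsampling, factor-2 upsampling, and a depth of 4; none of these values are stated in the text above, and the function names are hypothetical.

```python
def pad_to_multiple(height, width, depth):
    """Round (height, width) up to the next multiple of 2**depth."""
    step = 2 ** depth
    padded_h = -(-height // step) * step  # ceiling division
    padded_w = -(-width // step) * step
    return padded_h, padded_w


def encoder_shapes(height, width, depth):
    """Feature-map size at each encoder level, assuming stride-2 downsampling."""
    shapes = [(height, width)]
    for _ in range(depth):
        height, width = height // 2, width // 2
        shapes.append((height, width))
    return shapes


def decoder_shapes(bottleneck, depth):
    """Feature-map size after each decoder level, assuming factor-2 upsampling."""
    height, width = bottleneck
    shapes = []
    for _ in range(depth):
        height, width = height * 2, width * 2
        shapes.append((height, width))
    return shapes


# Assumed depth and an arbitrary page size, for illustration only.
DEPTH = 4
padded = pad_to_multiple(1001, 750, DEPTH)
enc = encoder_shapes(*padded, DEPTH)
dec = decoder_shapes(enc[-1], DEPTH)

# With padding, every decoder level matches the encoder level it would be joined to.
shapes_align = dec == enc[:-1][::-1]

# Without padding, the sizes drift apart and the output no longer matches the input.
raw_enc = encoder_shapes(1001, 750, DEPTH)
raw_dec = decoder_shapes(raw_enc[-1], DEPTH)
raw_shapes_align = raw_dec == raw_enc[:-1][::-1]
```

Padding the input is one common way to handle the mismatch; cropping or resizing the intermediate maps are alternatives, and the text above does not say which the authors use.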
If this is right
- Baseline detection systems can now be trained and evaluated on a public Arabic-script resource instead of private collections.
- Layout analysis pipelines for historical documents gain a concrete starting point for handling non-Latin scripts.
- The encoder-decoder architecture provides one concrete technique that future work can compare against when processing arbitrarily shaped lines.
- Researchers gain a shared benchmark that makes incremental improvements in Arabic manuscript processing measurable.
Where Pith is reading between the lines
- If the dataset proves representative, similar annotation efforts could be applied to other under-resourced scripts to accelerate recognition research.
- Performance gaps on new manuscripts would indicate the need for additional diversity in future releases of the dataset.
- The network architecture could be adapted to related tasks such as word segmentation once line images are reliably extracted.
Load-bearing premise
The 400 selected images capture enough of the variation in Arabic handwriting across domains and eras for models trained on them to work on unseen manuscripts.
What would settle it
Train the proposed network on the BADAM training split and measure its text-line extraction accuracy on a fresh collection of Arabic manuscripts drawn from periods or domains absent from the 400-image set.
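The test above only settles anything if the fresh collection really lies outside what the model saw in training. A minimal sketch of that check follows, assuming each image carries metadata such as a century and a domain. The field names and values are hypothetical and are not taken from the BADAM release.

```python
def unseen_groups(train_records, fresh_records, key):
    """Keep only fresh records whose `key` value never occurs in training."""
    seen = {record[key] for record in train_records}
    return [record for record in fresh_records if record[key] not in seen]


# Hypothetical metadata; BADAM's real field names and values may differ.
badam_train = [
    {"image": "a.jpg", "century": 12, "domain": "legal"},
    {"image": "b.jpg", "century": 15, "domain": "scientific"},
]
fresh_collection = [
    {"image": "x.jpg", "century": 15, "domain": "poetry"},
    {"image": "y.jpg", "century": 18, "domain": "legal"},
    {"image": "z.jpg", "century": 19, "domain": "poetry"},
]

held_out_by_century = unseen_groups(badam_train, fresh_collection, "century")
held_out_by_domain = unseen_groups(badam_train, fresh_collection, "domain")
```

Accuracy would then be reported on `held_out_by_century` or `held_out_by_domain` only, so that a good score cannot be explained by overlap with the training material.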
Original abstract
The application of handwritten text recognition to historical works is highly dependent on accurate text line retrieval. A number of systems utilizing a robust baseline detection paradigm have emerged recently, but the advancement of layout analysis methods for challenging scripts is held back by the lack of well-established datasets including works in non-Latin scripts. We present a dataset of 400 annotated document images from different domains and time periods. A short elaboration on the particular challenges posed by handwriting in Arabic script for layout analysis and subsequent processing steps is given. Lastly, we propose a method based on a fully convolutional encoder-decoder network to extract arbitrarily shaped text line images from manuscripts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the BADAM dataset of 400 annotated Arabic-script manuscript images drawn from different domains and time periods, briefly discusses challenges specific to Arabic handwriting for layout analysis, and proposes a fully convolutional encoder-decoder network for extracting arbitrarily shaped text lines.
Significance. A well-curated public dataset for baseline detection in Arabic manuscripts would address a documented scarcity of resources for non-Latin scripts and could support reproducible progress in historical document analysis. The proposed network architecture is a standard choice whose utility would be strengthened by empirical validation on the released data.
major comments (2)
- [Abstract] The statement that the 400 images come 'from different domains and time periods' supplies no quantitative breakdown (counts per century, per script subtype, or per degradation class). That breakdown is load-bearing for the claim that models trained on BADAM will generalize to unseen manuscripts.
- [Abstract / Method section] The fully convolutional encoder-decoder is described only at a high level, with no training protocol, loss function, hyperparameters, or quantitative results (precision, recall, or pixel-level metrics) on the 400 images. This leaves unsupported the assertion that the method handles Arabic-specific challenges.
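The quantitative breakdown requested in the first major comment amounts to counting images per metadata value. A minimal sketch follows, assuming per-image metadata is available as records. The field names (`century`, `script`, `degradation`) and the sample values are hypothetical placeholders, not the dataset's actual schema or contents.

```python
from collections import Counter


def breakdown(records, field):
    """Count how many records carry each value of `field`."""
    return Counter(record[field] for record in records)


def format_breakdown(counts):
    """Render counts as tab-separated rows: value, count, share of total."""
    total = sum(counts.values())
    rows = []
    for value, count in sorted(counts.items(), key=lambda item: str(item[0])):
        rows.append(f"{value}\t{count}\t{100 * count / total:.1f}%")
    return "\n".join(rows)


# Hypothetical records standing in for the 400 images.
records = [
    {"image": "p1.jpg", "century": 12, "script": "naskh", "degradation": "low"},
    {"image": "p2.jpg", "century": 12, "script": "maghribi", "degradation": "high"},
    {"image": "p3.jpg", "century": 17, "script": "naskh", "degradation": "low"},
    {"image": "p4.jpg", "century": 19, "script": "nastaliq", "degradation": "medium"},
]

century_counts = breakdown(records, "century")
script_counts = breakdown(records, "script")
century_table = format_breakdown(century_counts)
```

One such table per field would show directly whether any century, script subtype, or degradation class is represented by only a handful of pages.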
minor comments (1)
- The abstract would be clearer if it stated the annotation format (e.g., polygon coordinates, pixel masks) and the tool or protocol used to produce the ground truth.
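The two annotation formats named in the minor comment are related: a baseline stored as a list of coordinates can be rasterized into a pixel mask. The sketch below illustrates that conversion, assuming integer `(x, y)` points and a one-pixel-wide line. It is not the dataset's documented format, and the function name is hypothetical.

```python
def rasterize_polyline(points, height, width):
    """Draw a one-pixel-wide polyline into a height x width mask of 0s and 1s."""
    mask = [[0] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # One sample per pixel along the longer axis of the segment.
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            if 0 <= x < width and 0 <= y < height:
                mask[y][x] = 1
    return mask


# A horizontal baseline across a tiny 3 x 5 image.
flat_mask = rasterize_polyline([(0, 1), (4, 1)], height=3, width=5)

# A diagonal baseline on a 3 x 3 image.
diagonal_mask = rasterize_polyline([(0, 0), (2, 2)], height=3, width=3)
```

Stating which of the two forms is the ground truth matters because the mask depends on choices such as line width that the coordinate form leaves open.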
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to incorporate the suggested improvements.
- Referee: [Abstract] The statement that the 400 images come 'from different domains and time periods' supplies no quantitative breakdown (counts per century, per script subtype, or per degradation class), which is load-bearing for the claim that models trained on BADAM will generalize to unseen manuscripts.
  Authors: We agree that a quantitative breakdown is necessary to substantiate the generalization claim. While the manuscript provides qualitative descriptions of the sources, we will add a table or dedicated subsection in the revised version detailing the distribution of the 400 images by century, script subtype, and degradation class.
  Revision: yes
- Referee: [Abstract / Method section] The fully convolutional encoder-decoder is described only at a high level with no training protocol, loss function, hyperparameters, or any quantitative results (precision, recall, or pixel-level metrics) on the 400 images, leaving unsupported the assertion that the method handles Arabic-specific challenges.
  Authors: The primary contribution is the BADAM dataset; the FCN serves as an illustrative baseline. We acknowledge that the current description lacks sufficient detail to support claims about handling Arabic-specific challenges. In the revision we will expand the method section with the training protocol, loss function, hyperparameters, and report quantitative metrics (e.g., precision, recall, pixel-level IoU) evaluated on the released data.
  Revision: yes
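The pixel-level metrics named in the exchange above can be computed from a predicted mask and a ground-truth mask of the same size. The following is a minimal sketch, assuming both masks are nested lists of 0s and 1s. The function name is hypothetical, and baseline-level precision and recall, which require matching detected lines to annotated ones within a tolerance, are not shown.

```python
def pixel_metrics(predicted, truth):
    """Pixel-level precision, recall, and IoU for two equally sized binary masks."""
    tp = fp = fn = 0
    for predicted_row, truth_row in zip(predicted, truth):
        for p, t in zip(predicted_row, truth_row):
            if p and t:
                tp += 1  # predicted foreground, truly foreground
            elif p:
                fp += 1  # predicted foreground, truly background
            elif t:
                fn += 1  # missed foreground pixel
    # Empty denominators are reported as 0.0 rather than raising an error.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "iou": iou}


# Tiny example: one overlapping pixel, one false positive, one false negative.
predicted_mask = [[1, 1, 0], [0, 0, 0]]
truth_mask = [[0, 1, 1], [0, 0, 0]]
scores = pixel_metrics(predicted_mask, truth_mask)
```

For thin structures such as baselines, a one-pixel offset can push all three scores toward zero, which is one reason to report tolerance-based line-level metrics alongside them.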
Circularity Check
No circularity: dataset release with independent method description
The paper presents a new public dataset of 400 images and describes a fully convolutional encoder-decoder network for baseline detection. No equations, parameter fits, predictions, or self-citations appear in the provided text that reduce any claimed result to the inputs by construction. The contribution is self-contained as a data release plus high-level architecture outline; representativeness claims are empirical assumptions, not circular derivations.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: We present a dataset of 400 annotated document images... propose a method based on a fully convolutional encoder-decoder network
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: U-Net architecture... ResNet blocks... group normalization
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.