
arxiv: 1907.04041 · v1 · pith:2QHVSE73new · submitted 2019-07-09 · 💻 cs.CV

BADAM: A Public Dataset for Baseline Detection in Arabic-script Manuscripts

Pith reviewed 2026-05-25 00:46 UTC · model grok-4.3

classification 💻 cs.CV
keywords baseline detection · Arabic manuscripts · document layout analysis · handwritten text recognition · public dataset · convolutional neural network · text line extraction

The pith

BADAM supplies 400 annotated Arabic manuscript images to train baseline detectors for text line extraction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a public dataset called BADAM consisting of 400 annotated document images drawn from varied domains and historical periods in Arabic script. It notes that progress on layout analysis for non-Latin scripts has been limited by the absence of established datasets, which in turn slows handwritten text recognition on historical works. The authors also describe a fully convolutional encoder-decoder network designed to pull out text lines of arbitrary shape from these manuscripts. Accurate baseline detection serves as a prerequisite step for recognition systems, so the new resource is positioned to support method development in this area.

Core claim

We present a dataset of 400 annotated document images from different domains and time periods. A short elaboration on the particular challenges posed by handwriting in Arabic script for layout analysis and subsequent processing steps is given. Lastly, we propose a method based on a fully convolutional encoder-decoder network to extract arbitrarily shaped text line images from manuscripts.

What carries the argument

The BADAM dataset of 400 annotated Arabic-script manuscript images together with a fully convolutional encoder-decoder network for extracting text lines.

If this is right

  • Baseline detection systems can now be trained and evaluated on a public Arabic-script resource instead of private collections.
  • Layout analysis pipelines for historical documents gain a concrete starting point for handling non-Latin scripts.
  • The encoder-decoder architecture provides one concrete technique that future work can compare against when processing arbitrarily shaped lines.
  • Researchers gain a shared benchmark that makes incremental improvements in Arabic manuscript processing measurable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the dataset proves representative, similar annotation efforts could be applied to other under-resourced scripts to accelerate recognition research.
  • Performance gaps on new manuscripts would indicate the need for additional diversity in future releases of the dataset.
  • The network architecture could be adapted to related tasks such as word segmentation once line images are reliably extracted.

Load-bearing premise

The 400 selected images capture enough of the variation in Arabic handwriting across domains and eras for models trained on them to work on unseen manuscripts.

What would settle it

Train the proposed network on the BADAM training split and measure its text-line extraction accuracy on a fresh collection of Arabic manuscripts drawn from periods or domains absent from the 400-image set.

Figures

Figures reproduced from arXiv: 1907.04041 by Benjamin Kiessling, Daniel Stökl Ben Ezra, Matthew Thomas Miller.

Figure 1: Aspects of Arabic-script handwriting. While many Arabic handwritten texts present only a single baseline per logical text line, a large number of documents, especially calligraphic works in Thuluth and Nastaliq style, display per-word slanted baselines (Fig. 1b), multiple baseline levels, and dislocation of fragments into the margins or above other text in the line (heaping) (Fig. 1c and 1a). Most of these…
Figure 2: Examples of annotation guideline application (baseline in …)
Figure 3: Architecture of the baseline labelling network. Dropout and batch/group normalization layers are omitted. (beige: convolutional …)
Figure 4: Four sample pages from the corpus. The backbone model consists of the first three blocks of a 34-layer ResNet in the contracting path, followed by four 3×3 convolution-transposed convolution blocks in the expanding paths, with group normalization [25] (G = 32) and dropout (p = 0.1) employed after each layer and block respectively. A final 1×1 convolutional layer reduces the dimensionality of the input-sized 64-channel …
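The architecture text attached to Figure 4 can be read as a bookkeeping exercise over feature-map shapes. The sketch below traces those shapes in plain Python under several assumptions that are not in the source: the "first 3 blocks" are taken to mean the three residual stages after the standard ResNet-34 stem, each of the four expanding blocks is assumed to double the resolution, the input is assumed to have 3 channels, and the decoder widths in `DECODER_CHANNELS` are hypothetical apart from the final 64 channels named in the caption. The final 1×1 layer is left out because its output size is truncated in the source.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)

# ASSUMPTION: only the last value (64) comes from the caption text;
# the other decoder widths are placeholders chosen for illustration.
DECODER_CHANNELS = [128, 64, 64, 64]


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height: int, width: int) -> List[Tuple[str, Shape]]:
    """List (layer name, shape) pairs for the assumed encoder-decoder."""
    # ASSUMPTION: 3-channel (RGB) input image.
    shapes: List[Tuple[str, Shape]] = [("input", (3, height, width))]

    # Standard ResNet-34 stem: 7x7 conv (stride 2, pad 3),
    # then 3x3 max-pool (stride 2, pad 1).
    h, w = conv_out(height, 7, 2, 3), conv_out(width, 7, 2, 3)
    shapes.append(("stem conv", (64, h, w)))
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    shapes.append(("stem pool", (64, h, w)))

    # First three residual stages of ResNet-34: (channels, stage stride).
    for i, (channels, stride) in enumerate([(64, 1), (128, 2), (256, 2)], start=1):
        h, w = conv_out(h, 3, stride, 1), conv_out(w, 3, stride, 1)
        shapes.append((f"stage {i}", (channels, h, w)))

    # ASSUMPTION: each expanding block doubles height and width.
    for i, channels in enumerate(DECODER_CHANNELS, start=1):
        h, w = 2 * h, 2 * w
        shapes.append((f"up {i}", (channels, h, w)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes(512, 512):
        print(f"{name:10s} {shape}")
```

Under these assumptions the encoder reduces a 512×512 input by a factor of 16, and four doublings restore the full resolution, which is consistent with the caption's mention of an input-sized 64-channel map. Other readings of "blocks" would give a different trace.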
Original abstract

The application of handwritten text recognition to historical works is highly dependent on accurate text line retrieval. A number of systems utilizing a robust baseline detection paradigm have emerged recently, but the advancement of layout analysis methods for challenging scripts is held back by the lack of well-established datasets including works in non-Latin scripts. We present a dataset of 400 annotated document images from different domains and time periods. A short elaboration on the particular challenges posed by handwriting in Arabic script for layout analysis and subsequent processing steps is given. Lastly, we propose a method based on a fully convolutional encoder-decoder network to extract arbitrarily shaped text line images from manuscripts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the BADAM dataset of 400 annotated Arabic-script manuscript images drawn from different domains and time periods, briefly discusses challenges specific to Arabic handwriting for layout analysis, and proposes a fully convolutional encoder-decoder network for extracting arbitrarily shaped text lines.

Significance. A well-curated public dataset for baseline detection in Arabic manuscripts would address a documented scarcity of resources for non-Latin scripts and could support reproducible progress in historical document analysis. The proposed network architecture is a standard choice whose utility would be strengthened by empirical validation on the released data.

major comments (2)
  1. [Abstract] Abstract: the statement that the 400 images come 'from different domains and time periods' supplies no quantitative breakdown (counts per century, per script subtype, or per degradation class), which is load-bearing for the claim that models trained on BADAM will generalize to unseen manuscripts.
  2. [Abstract] Abstract / Method section: the fully convolutional encoder-decoder is described only at a high level with no training protocol, loss function, hyper-parameters, or any quantitative results (precision, recall, or pixel-level metrics) on the 400 images, leaving unsupported the assertion that the method handles Arabic-specific challenges.
minor comments (1)
  1. The abstract would be clearer if it stated the annotation format (e.g., polygon coordinates, pixel masks) and the tool or protocol used to produce the ground truth.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to incorporate the suggested improvements.

  1. Referee: [Abstract] Abstract: the statement that the 400 images come 'from different domains and time periods' supplies no quantitative breakdown (counts per century, per script subtype, or per degradation class), which is load-bearing for the claim that models trained on BADAM will generalize to unseen manuscripts.

    Authors: We agree that a quantitative breakdown is necessary to substantiate the generalization claim. While the manuscript provides qualitative descriptions of the sources, we will add a table or dedicated subsection in the revised version detailing the distribution of the 400 images by century, script subtype, and degradation class. revision: yes

  2. Referee: [Abstract] Abstract / Method section: the fully convolutional encoder-decoder is described only at a high level with no training protocol, loss function, hyper-parameters, or any quantitative results (precision, recall, or pixel-level metrics) on the 400 images, leaving unsupported the assertion that the method handles Arabic-specific challenges.

    Authors: The primary contribution is the BADAM dataset; the FCN serves as an illustrative baseline. We acknowledge that the current description lacks sufficient detail to support claims about handling Arabic-specific challenges. In the revision we will expand the method section with the training protocol, loss function, hyperparameters, and report quantitative metrics (e.g., precision, recall, pixel-level IoU) evaluated on the released data. revision: yes
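The pixel-level metrics named in the exchange above (precision, recall, IoU) can be made concrete with a small sketch. It assumes that both the prediction and the ground truth are available as binary masks of equal size, which the source does not state; the function name `pixel_metrics` and the toy masks are hypothetical. Baseline benchmarks such as READ-BAD define their own evaluation scheme, so this covers only the pixel-level variant.

```python
from typing import Dict, Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixel labels


def pixel_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Pixel-level precision, recall and IoU for two binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of rows")
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # baseline pixel correctly predicted
            elif p:
                fp += 1  # predicted baseline where there is none
            elif t:
                fn += 1  # missed baseline pixel
    # Empty denominators are reported as 0.0 rather than raising.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "iou": iou}


if __name__ == "__main__":
    # Toy example: one horizontal baseline, last pixel predicted a row too low.
    truth = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
    pred = [[0, 0, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
    print(pixel_metrics(pred, truth))
```

In the toy example three of four baseline pixels are matched, with one false positive and one miss, so precision and recall are both 0.75 and IoU is 0.6. A one-pixel vertical offset is penalized as heavily as a gross error here, which is one reason tolerance-based baseline metrics exist.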

Circularity Check

0 steps flagged

No circularity: dataset release with independent method description

full rationale

The paper presents a new public dataset of 400 images and describes a fully convolutional encoder-decoder network for baseline detection. No equations, parameter fits, predictions, or self-citations appear in the provided text that reduce any claimed result to the inputs by construction. The contribution is self-contained as a data release plus high-level architecture outline; representativeness claims are empirical assumptions, not circular derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the creation and utility of a new annotated image collection plus the applicability of an existing fully convolutional architecture; no free parameters, domain-specific axioms, or new entities are introduced in the abstract.




Reference graph

Works this paper leans on


  1. [1]

    Apostolos Antonacopoulos, Christian Clausner, Christos Papadopoulos, and Stefan Pletschacher. 2011. Historical document layout analysis competition. In Document Analysis and Recognition (ICDAR), 2011 11th International Conference on. IEEE, 1516–1520

  2. [2]

    Apostolos Antonacopoulos, Christian Clausner, Christos Papadopoulos, and Stefan Pletschacher. 2013. ICDAR 2013 competition on historical newspaper layout analysis (HNLA 2013). In Document Analysis and Recognition (ICDAR), 2013 12th International Conference on. IEEE, 1454–1458

  3. [3]

    Apostolos Antonacopoulos, Christian Clausner, Christos Papadopoulos, and Stefan Pletschacher. 2015. ICDAR2015 competition on recognition of documents with complex layouts-RDCL2015. In Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 1151–1155

  4. [4]

    Apostolos Antonacopoulos, Stefan Pletschacher, David Bridson, and Christos Papadopoulos. 2009. ICDAR 2009 page segmentation competition. In Document Analysis and Recognition, 2009. ICDAR’09. 10th International Conference on. IEEE, 1370–1374

  5. [5]

    Berat Barakat, Ahmad Droby, Majeed Kassis, and Jihad El-Sana. 2018. Text Line Segmentation for Challenging Handwritten Document Images using Fully Convolutional Network. In 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, 374–379

  6. [6]

    Jean-Christophe Burie, Mickaël Coustaty, Setiawan Hadi, Made Windu Antara Kesiman, Jean-Marc Ogier, Erick Paulus, Kimheng Sok, I Made Gede Sunarya, and Dona Valy. 2016. ICFHR2016 competition on the analysis of handwritten text in images of balinese palm leaf manuscripts. In Frontiers in Handwriting Recognition (ICFHR), 2016 15th International Conference o...

  7. [7]

    Christian Clausner, Apostolos Antonacopoulos, Nora Mcgregor, and Daniel Wilson-Nunn. 2018. ICFHR 2018 Competition on Recognition of Historical Arabic Scientific Manuscripts–RASM2018. In 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, 471–476

  8. [8]

    Markus Diem, Florian Kleber, Stefan Fiel, Tobias Grüning, and Basilis Gatos. 2017. cBAD: ICDAR2017 competition on baseline detection. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, Vol. 1. IEEE, 1355–1360

  9. [9]

    David H Douglas and Thomas K Peucker. 1973. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: The International Journal for Geographic Information and Geovisualization 10, 2 (1973), 112–122

  10. [10]

    Michael Fink, Thomas Layer, Georg Mackenbrock, and Michael Sprinzl. 2018. Baseline Detection in Historical Documents using Convolutional U-Nets. In 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, 37–42

  11. [11]

    Andreas Fischer, Volkmar Frinken, Alicia Fornés, and Horst Bunke. 2011. Transcription alignment of Latin manuscripts using hidden Markov models. In Proceedings of the 2011 Workshop on Historical Document Imaging and Processing. ACM, 29–36

  12. [12]

    Basilis Gatos, Nikolaos Stamatopoulos, and Georgios Louloudis. 2010. ICFHR 2010 handwriting segmentation contest. In 2010 11th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, 737–742

  13. [13]

    Basilios Gatos, Nikolaos Stamatopoulos, and Georgios Louloudis. 2011. ICDAR2009 handwriting segmentation contest. International Journal on Document Analysis and Recognition (IJDAR) 14, 1 (2011), 25–33

  14. [14]

    Tobias Grüning, Roger Labahn, Markus Diem, Florian Kleber, and Stefan Fiel. 2018. READ-BAD: A new dataset and evaluation scheme for baseline detection in archival documents. In 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, 351–356

  15. [15]

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision. 1026–1034

  16. [16]

    Majeed Kassis, Alaa Abdalhaleem, Ahmad Droby, Reem Alaasam, and Jihad El-Sana. 2017. VML-HD: The historical Arabic documents dataset for recognition systems. In Arabic Script Analysis and Recognition (ASAR), 2017 1st International Workshop on. IEEE, 11–14

  17. [17]

    Ta-Chih Lee, Rangasami L Kashyap, and Chong-Nam Chu. 1994. Building skeleton models via 3-D medial surface axis thinning algorithms. CVGIP: Graphical Models and Image Processing 56, 6 (1994), 462–478

  18. [18]

    Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3431–3440

  19. [19]

    Michael Murdock, Shawn Reid, Blaine Hamilton, and Jackson Reese. 2015. ICDAR 2015 competition on text line detection in historical documents. In Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 1171–1175

  20. [20]

    Lorenzo Quirós. 2018. Multi-Task Handwritten Document Layout Analysis. arXiv preprint arXiv:1806.08852 (2018)

  21. [21]

    Veronica Romero, Joan Andreu Sanchez, Vicente Bosch, Katrien Depuydt, and Jesse de Does. 2015. Influence of text line segmentation in handwritten text recognition. In Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 536–540

  22. [22]

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention. Springer, 234–241

  23. [23]

    Jaakko Sauvola and Matti Pietikäinen. 2000. Adaptive document image binarization. Pattern recognition 33, 2 (2000), 225–236

  24. [24]

    Foteini Simistira, Mathias Seuret, Nicole Eichenberger, Angelika Garz, Marcus Liwicki, and Rolf Ingold. 2016. Diva-hisdb: A precisely annotated large dataset of challenging medieval manuscripts. In Frontiers in Handwriting Recognition (ICFHR), 2016 15th International Conference on. IEEE, 471–476

  25. [25]

    Yuxin Wu and Kaiming He. 2018. Group Normalization. CoRR abs/1803.08494 (2018). arXiv: 1803.08494 http://arxiv.org/abs/1803.08494