DKDS: A Benchmark Dataset of Degraded Kuzushiji Documents with Seals for Detection and Binarization
Pith reviewed 2026-05-17 23:42 UTC · model grok-4.3
The pith
A new dataset of degraded historical Japanese documents with seals fills the gap for testing OCR on noisy scans.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce the Degraded Kuzushiji Documents with Seals (DKDS) dataset as a new benchmark for Kuzushiji character and seal detection and document binarization tasks, constructed with the assistance of a trained Kuzushiji expert to address noise types ignored by existing resources.
What carries the argument
The DKDS dataset of annotated degraded Kuzushiji pages containing seals, which supports two benchmark tracks and supplies baseline results from YOLO detectors and cGAN-based binarization.
Load-bearing premise
The collected documents and expert-assisted annotations are sufficiently representative of the full range of real-world degradation and seal types encountered in historical Kuzushiji archives.
What would settle it
A model trained only on clean Kuzushiji data achieves equal or higher detection and recognition accuracy on held-out real degraded documents than any model trained or fine-tuned on the DKDS set.
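One way to make this test concrete is to score both models on the same held-out degraded pages with a single detection metric. The sketch below is illustrative only: the metric (F1 at an IoU threshold of 0.5), the greedy one-to-one matching, and the names `iou` and `detection_f1` are assumptions made for this example, not the paper's evaluation protocol.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_f1(preds, truths, iou_thr=0.5):
    """F1 for one page. `preds` is assumed sorted by descending confidence.

    Each prediction is greedily matched to the unmatched ground-truth box
    with the highest IoU, provided that IoU reaches `iou_thr`.
    """
    matched = set()
    true_pos = 0
    for p in preds:
        best_j, best_iou = -1, iou_thr
        for j, t in enumerate(truths):
            if j in matched:
                continue
            value = iou(p, t)
            if value >= best_iou:
                best_j, best_iou = j, value
        if best_j >= 0:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(truths) if truths else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical usage: the same ground truth scored against two models.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
clean_only_preds = [(0, 0, 10, 10), (50, 50, 60, 60)]  # misses one box
dkds_tuned_preds = [(0, 0, 10, 10), (20, 20, 30, 30)]  # finds both boxes
print(detection_f1(clean_only_preds, truths), detection_f1(dkds_tuned_preds, truths))
```

Under this reading, the claim would be undermined if the clean-only model's score on real degraded pages matched or exceeded the DKDS-trained model's score.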
Original abstract
Kuzushiji, a pre-modern Japanese cursive script, can currently be read and understood by only a few thousand trained experts in Japan. With the rapid development of deep learning, researchers have begun applying Optical Character Recognition (OCR) techniques to transcribe Kuzushiji into modern Japanese. Although existing OCR methods perform well on clean pre-modern Japanese documents written in Kuzushiji, they often fail to consider various types of noise, such as document degradation and seals, which significantly affect recognition accuracy. To the best of our knowledge, no existing dataset specifically addresses these challenges. To address this gap, we introduce the Degraded Kuzushiji Documents with Seals (DKDS) dataset as a new benchmark for related tasks. We describe the dataset construction process, which involves the assistance of a trained Kuzushiji expert, and define two benchmark tracks: (1) Kuzushiji character and seal detection and (2) document binarization. For the Kuzushiji character and seal detection track, we provide baseline results using several recent versions of YOLO to detect Kuzushiji characters and seals. For the document binarization track, we present baseline results from traditional binarization algorithms, traditional algorithms combined with K-means clustering, two state-of-the-art (SOTA) generative adversarial network (GAN) methods, and our improved conditional GAN (cGAN)-based method. The DKDS dataset and the implementation code for baseline methods are available at https://ruiyangju.github.io/DKDS.
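For readers unfamiliar with the "traditional binarization algorithms" the abstract mentions, the following is a minimal NumPy sketch of global thresholding with Otsu's method. The abstract does not say which traditional algorithms are used, so choosing Otsu here is an assumption, and the function names `otsu_threshold` and `binarize` are hypothetical.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the threshold in [0, 255] that maximizes between-class variance.

    `gray` is assumed to be a uint8 grayscale image.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # probability of class "<= t"
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0                      # skip thresholds with an empty class
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Map pixels above the Otsu threshold to 255 (paper) and the rest to 0 (ink)."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)


# Toy page: light paper (200) with a dark stroke (30).
page = np.full((8, 8), 200, dtype=np.uint8)
page[2:6, 3] = 30
print(otsu_threshold(page), np.unique(binarize(page)))
```

A single global threshold like this tends to struggle with stains and uneven fading, which is the kind of degradation the dataset targets.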
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that existing Kuzushiji OCR methods struggle with real-world noise from document degradation and seals, that no prior dataset specifically targets these issues, and that the new DKDS dataset—constructed with assistance from a trained Kuzushiji expert—fills this gap by providing a benchmark for two tracks: (1) Kuzushiji character and seal detection (with YOLO baselines) and (2) document binarization (with traditional algorithms, K-means combinations, SOTA GANs, and an improved cGAN). The dataset and baseline code are released publicly.
Significance. If the dataset proves representative and the annotations reliable, DKDS would supply a much-needed public benchmark for historical document analysis in a low-resource script, enabling systematic evaluation of detection and binarization methods under realistic degradation conditions that current clean-document datasets omit. The provision of multiple baseline implementations and public code release strengthens reproducibility.
major comments (2)
- [abstract and §3] Dataset construction: the process is described only at a high level as 'involves the assistance of a trained Kuzushiji expert', with no enumeration of source archives, counts or distribution per degradation class (fading, stains, bleed, physical damage), seal taxonomy (style, size, overlap, color), or annotation protocol. This directly weakens the central claim that DKDS addresses a genuine gap, because the representativeness of the collected images for the broader historical Kuzushiji corpus cannot be assessed.
- [§4] Benchmark tracks and splits: the manuscript provides no details on train/test/validation splits, total image counts, or per-class statistics for either the detection or binarization track. Without these, it is impossible to determine whether the reported YOLO and cGAN baselines reflect a balanced evaluation or merely performance on a narrow subset.
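The split and per-class statistics requested above could be tabulated with a few lines of code. The sketch below uses only the standard library and assumes a hypothetical annotation format of one `(split, image_id, class_name)` tuple per labeled box; the DKDS release may store annotations differently, and `split_statistics` is an illustrative name.

```python
from collections import Counter


def split_statistics(annotations):
    """Summarize image counts, image ratios, and per-class box counts per split.

    `annotations` is an iterable of (split, image_id, class_name) tuples,
    one per labeled box. Images are assumed to belong to exactly one split.
    """
    images = {}
    boxes = {}
    for split, image_id, class_name in annotations:
        images.setdefault(split, set()).add(image_id)
        boxes.setdefault(split, Counter())[class_name] += 1
    total_images = sum(len(ids) for ids in images.values())
    report = {}
    for split, ids in images.items():
        report[split] = {
            "images": len(ids),
            "image_ratio": len(ids) / total_images,
            "boxes_per_class": dict(boxes[split]),
        }
    return report


# Toy annotations with made-up identifiers.
toy = [
    ("train", "page_001", "kuzushiji"),
    ("train", "page_001", "seal"),
    ("train", "page_002", "kuzushiji"),
    ("test", "page_003", "kuzushiji"),
]
for split, stats in split_statistics(toy).items():
    print(split, stats)
```

A table of this kind would let a reader check whether seals, which are usually far rarer than characters, appear often enough in the test split for the reported scores to be meaningful.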
minor comments (2)
- [abstract] The abstract states 'to the best of our knowledge, no existing dataset specifically addresses these challenges' but does not cite or briefly contrast the closest prior Kuzushiji or historical-document datasets; adding 2–3 sentences of related-work context would clarify novelty.
- [figures and tables] Figure and table captions should explicitly state image resolution, number of samples shown, and whether examples are from train or test portions to aid readers in interpreting the visual results.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our submission. We have reviewed the major comments carefully and provide our responses below. We agree that more detailed information on dataset construction and benchmark splits is necessary to fully substantiate our claims and will revise the manuscript to include these details.
Point-by-point responses
- Referee: [abstract and §3] Dataset construction: the process is described only at a high level as 'involves the assistance of a trained Kuzushiji expert', with no enumeration of source archives, counts or distribution per degradation class (fading, stains, bleed, physical damage), seal taxonomy (style, size, overlap, color), or annotation protocol. This directly weakens the central claim that DKDS addresses a genuine gap, because the representativeness of the collected images for the broader historical Kuzushiji corpus cannot be assessed.
  Authors: We recognize that the description in the abstract and Section 3 is high-level. We will revise the manuscript to give a fuller account of the dataset construction. Specifically, we will enumerate the source archives from which the images were collected, provide counts and distributions for each degradation class (fading, stains, bleed, and physical damage), detail the seal taxonomy (style, size, overlap, and color), and describe the annotation protocol, including the role of the trained Kuzushiji expert in verifying the annotations. These additions will allow a better evaluation of the dataset's representativeness for the historical Kuzushiji corpus.
  Revision: yes
- Referee: [§4] Benchmark tracks and splits: the manuscript provides no details on train/test/validation splits, total image counts, or per-class statistics for either the detection or binarization track. Without these, it is impossible to determine whether the reported YOLO and cGAN baselines reflect a balanced evaluation or merely performance on a narrow subset.
  Authors: We agree with the referee that the lack of details on splits and statistics limits the assessment of the baselines. In the revised manuscript, we will expand Section 4 to include the total number of images in the DKDS dataset, the specific train/test/validation splits with their respective counts and ratios, and per-class statistics for both tracks. For the detection track, this will include breakdowns for Kuzushiji characters and seals; for the binarization track, the relevant image statistics. These details will let readers judge whether the evaluations are conducted on balanced and representative subsets.
  Revision: yes
Circularity Check
No circularity: dataset introduction with independent baselines
full rationale
The paper presents a new benchmark dataset (DKDS) for degraded Kuzushiji documents with seals, describes its construction process involving expert assistance, and reports empirical baseline results on detection (YOLO variants) and binarization (traditional methods, K-means, GANs, and an improved cGAN). No derivations, equations, or predictions exist that reduce outputs to inputs by construction. Novelty claims are standard and non-circular. No self-citation load-bearing steps or ansatz smuggling appear. The work is self-contained against external benchmarks for dataset papers.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We introduce the Degraded Kuzushiji Documents with Seals (DKDS) dataset... two benchmark tracks: (1) text and seal detection... (2) document binarization.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.