Transformer based Urdu Handwritten Text Optical Character Reader

Ali Kamal; Mohammad Daniyal Shaiq; Musa Dildar Ahmed Cheema

arxiv: 2206.04575 · v1 · pith:X3T5W72Snew · submitted 2022-06-09 · cs.CV · cs.AI · cs.IR · cs.LG

keywords: urdu, been, handwriting, handwritten, language, text, very, work

Extracting handwritten text is one of the most important components of digitizing information and making it available at large scale. Handwriting Optical Character Reader (OCR) is a research problem in computer vision and natural language processing. A lot of work has been done for English, but very little has been done for low-resource languages such as Urdu. Urdu script is difficult because it is cursive and each character changes shape depending on its position within a word, so a model is needed that can learn these complex features and generalize across handwriting styles. In this work, we propose a transformer-based model for Urdu handwritten text extraction. Because transformers have been very successful in natural language understanding tasks, we explore them further for understanding complex Urdu handwriting.
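The position-dependent shaping mentioned in the abstract can be illustrated with Unicode alone. The sketch below is not from the paper: it uses Python's standard `unicodedata` module and the Arabic Presentation Forms-B names to list the contextual glyphs of a base letter. The helper `positional_forms` and the choice of the letter BEH are assumptions made purely for illustration.

```python
import unicodedata

# Contextual positions defined for Arabic-script letters in Unicode.
POSITIONS = ("ISOLATED", "INITIAL", "MEDIAL", "FINAL")


def positional_forms(base_char):
    """Hypothetical helper: map each position to its presentation-form glyph."""
    base_name = unicodedata.name(base_char)  # e.g. "ARABIC LETTER BEH"
    forms = {}
    for position in POSITIONS:
        try:
            forms[position] = unicodedata.lookup(f"{base_name} {position} FORM")
        except KeyError:
            # Letters that do not join forward have no initial/medial form.
            continue
    return forms


beh = "\u0628"  # ARABIC LETTER BEH, shared by the Urdu alphabet
forms = positional_forms(beh)
for position, glyph in forms.items():
    print(position, f"U+{ord(glyph):04X}", glyph)
```

One base letter maps to up to four distinct shapes, and a letter such as ALEF (U+0627) has only two. A recognizer therefore has to associate several visually different glyphs with the same underlying character, before handwriting variation is even considered.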

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Urdu Katib Handwritten Dataset: A Historical Document Dataset for Offline Urdu Handwritten Text Recognition with CRNN-Based Baseline Evaluation
cs.CV · 2026-06 · unverdicted · novelty 7.0

Presents UKHD, the first historical offline Urdu handwritten text lines dataset from Katib materials, and benchmarks CRNN-based models with CNN-BGRU-CTC showing lowest CER and WER.