A Novel Approach to OCR using Image Recognition based Classification for Ancient Tamil Inscriptions in Temples

Aishwarya Dharani and; Lalitha Giridhar; Velmathi Guruviah

arxiv: 1907.04917 · v1 · pith:SIBJUF5Znew · submitted 2019-07-04 · 💻 cs.CV · eess.IV

A Novel Approach to OCR using Image Recognition based Classification for Ancient Tamil Inscriptions in Temples

Lalitha Giridhar , Aishwarya Dharani and , Velmathi Guruviah This is my paper

Pith reviewed 2026-05-25 09:34 UTC · model grok-4.3

classification 💻 cs.CV eess.IV

keywords ancient Tamiloptical character recognitionconvolutional neural networkTesseractOtsu thresholdingtemple inscriptionscharacter classificationepigraphy

0 comments

The pith

A 2D CNN linked to Tesseract recognizes 7th-12th century Tamil inscriptions at 77.7 percent combined efficiency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper curates a dataset of cropped character images from selected temple inscriptions of the 7th to 12th centuries. It applies Otsu thresholding to binarize the images and trains a two-dimensional convolutional neural network to classify the characters. The trained network connects to Tesseract OCR through the pytesseract library in Python, and the system adds text-to-speech conversion for audio output of the recognized text. Testing on samples of both modern and ancient Tamil yields the reported efficiency. This targets the practical problem of reading evolved ancient scripts when comprehensive labeled datasets remain difficult to assemble.

Core claim

After binarization with Otsu thresholding, a two-dimensional convolutional neural network is trained on cropped images of characters from certain 7th-12th century temple inscriptions; this network is interfaced with Tesseract via pytesseract to classify and recognize the characters, producing digitized text that is also rendered as audio through Google's text-to-speech engine, for an overall efficiency of 77.7 percent on the inscriptions examined.

What carries the argument

The 2D convolutional neural network trained on cropped temple-inscription character images and linked to Tesseract OCR through pytesseract.

If this is right

The same pipeline processes both modern Tamil and the studied ancient samples.
Recognized text is converted to spoken audio output.
A usable dataset for this period can be assembled from limited temple sources rather than exhaustive collection.
The method supplies a concrete digitization route for inscriptions whose character forms have changed over centuries.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Comparable small-dataset CNN approaches could be tested on other historical scripts that also lack large public corpora.
Accuracy might rise if the same architecture receives additional images from a wider range of temples or if the network depth is adjusted.
The workflow suggests a route for epigraphers to move from physical inscriptions to searchable digital text without first building massive labeled sets.

Load-bearing premise

The small collection of cropped character images from selected temple inscriptions is representative enough for the 2D CNN to learn features that generalize across 7th-12th century Tamil script without overfitting.

What would settle it

Apply the trained system to an independent collection of 7th-12th century Tamil inscriptions not used in training and check whether the measured recognition rate is substantially lower than 77.7 percent.

Figures

Figures reproduced from arXiv: 1907.04917 by Aishwarya Dharani and, Lalitha Giridhar, Velmathi Guruviah.

**Figure 2.** Figure 2: Suggested setup for implementation The model is trained to minimize the distance between images belonging to same font family and to maximize the distance between images belonging to different font family. The recognized images of each letter block are combined to form a single image tile. This is then subjected to father processing to perform OCR. To implement OCR, the tile is fed to the Pytesseract libra… view at source ↗

**Figure 3.** Figure 3: Flowchart for software implementation In the case of the pre-digitized text, the OCR accuracy was close to 99%. The real challenge was to perfect the audio output. It was observed that the audio output after performing text to speech conversion also worked with near 99% accuracy. Since the accuracy of the audio output is dependent on the accuracy of the OCR, the efficiency of the integrated system relies p… view at source ↗

read the original abstract

Recognition of ancient Tamil characters has always been a challenge for epigraphers. This is primarily because the language has evolved over the several centuries and the character set over this time has both expanded and diversified. This proposed work focuses on improving optical character recognition techniques for ancient Tamil script which was in use between the 7th and 12th centuries. While comprehensively curating a functional data set for ancient Tamil characters is an arduous task, in this work, a data set has been curated using cropped images of characters found on certain temple inscriptions, specific to this time as a case study. After using Otsu thresholding method for binarization of the image a two dimensional convolution neural network is defined and used to train, classify and, recognize the ancient Tamil characters. To implement the optical character recognition techniques, the neural network is linked to the Tesseract using the pytesseract library of Python. As an added feature, the work also incorporates Google's text to speech voice engine to produce an audio output of the digitized text. Various samples for both modern and ancient Tamil were collected and passed through the system. It is found that for Tamil inscriptions studied over the considered time period, a combined efficiency of 77.7 percent can be achieved.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a basic case-study application of existing OCR tools to a small set of ancient Tamil inscriptions, with an accuracy claim that cannot be checked from the given details.

read the letter

The main point for you is that nothing new is introduced here. The authors take Otsu binarization, a standard 2D CNN, Tesseract, and Google TTS, run them on cropped character images from some 7th-12th century Tamil temple inscriptions, and report 77.7% combined efficiency. They also add speech output, which is a reasonable practical feature for an end-user tool. They tested on both modern and ancient samples, which shows some attention to the domain shift. That is the extent of what is done well: a working pipeline for one narrow use case. The dataset curation itself is presented as a case study rather than a large-scale effort. The soft spots are straightforward and central. The abstract states the 77.7% figure but gives no dataset size, number of distinct characters, train/test split, CNN architecture, hyperparameters, or even how the efficiency number was computed. The stress-test note is accurate on this point. Without those quantities it is impossible to tell whether the result reflects learned features or just a small memorized collection. No baselines or error analysis appear either. The paper does not engage with prior work on historical script OCR beyond naming the tools, and it offers no analysis of how the script changed over the centuries. This is for someone who needs a quick starting template to digitize inscriptions from that exact period and location. A reader wanting reproducible results, method improvements, or comparisons to other approaches will find nothing of value. It does not deserve peer review in this form. The central empirical claim cannot be evaluated, so I would recommend against sending it out.

Referee Report

3 major / 3 minor

Summary. The manuscript presents a system for OCR of ancient Tamil inscriptions (7th-12th centuries) from temple sources. It curates a dataset of cropped character images, applies Otsu thresholding for binarization, trains a 2D CNN for classification, integrates the output with Tesseract via pytesseract, and adds Google TTS for audio. The central claim is that this pipeline achieves a combined efficiency of 77.7% on the studied inscriptions.

Significance. If the empirical result were reproducible and generalizable, the work would offer a practical tool for digitizing historical Tamil epigraphy. The integration of CNN classification with existing OCR and TTS components is a reasonable engineering approach for a low-resource script, but the absence of any quantitative validation details prevents assessment of whether the 77.7% figure reflects genuine generalization or case-specific performance.

major comments (3)

[Abstract] Abstract: the reported 77.7% combined efficiency is presented without any dataset statistics (total images, number of distinct characters, temporal distribution across 7th-12th century samples), train/test split, or description of how the metric was computed (per-character, per-inscription, or otherwise). This information is required to evaluate whether the CNN learned generalizable features or simply memorized a small curated collection.
[Abstract] Abstract and system description: no architecture details, layer counts, filter sizes, or training hyperparameters are supplied for the 2D CNN, nor is any validation procedure (cross-validation, held-out test set) described. Without these, the classification step that underpins the 77.7% claim cannot be reproduced or compared to standard baselines.
[Abstract] Abstract: the manuscript states that 'various samples for both modern and ancient Tamil were collected and passed through the system' yet supplies no quantitative breakdown of modern vs. ancient performance, error analysis, or comparison against Tesseract alone or other Tamil OCR methods. This omission makes it impossible to isolate the contribution of the CNN component.

minor comments (3)

[Abstract] The term 'efficiency' is used for the 77.7% figure; the manuscript should clarify whether this denotes accuracy, F1, or another metric and provide the exact formula or aggregation method.
The description of dataset curation ('cropped images of characters found on certain temple inscriptions') is too vague for a methods section; explicit counts and selection criteria should be added.
No mention is made of the number of classes (distinct ancient Tamil characters) or class imbalance, both of which are critical for interpreting CNN performance on an evolving script.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the manuscript to supply the missing details where they are available from the original study.

read point-by-point responses

Referee: [Abstract] Abstract: the reported 77.7% combined efficiency is presented without any dataset statistics (total images, number of distinct characters, temporal distribution across 7th-12th century samples), train/test split, or description of how the metric was computed (per-character, per-inscription, or otherwise). This information is required to evaluate whether the CNN learned generalizable features or simply memorized a small curated collection.

Authors: We agree that these statistics are necessary to assess generalization. The original manuscript described the curation process but omitted numerical details for brevity. In revision we will add the available dataset information, the train/test split employed for the CNN, and clarify that the reported figure is the end-to-end pipeline accuracy on the held-out inscriptions. revision: yes
Referee: [Abstract] Abstract and system description: no architecture details, layer counts, filter sizes, or training hyperparameters are supplied for the 2D CNN, nor is any validation procedure (cross-validation, held-out test set) described. Without these, the classification step that underpins the 77.7% claim cannot be reproduced or compared to standard baselines.

Authors: We concur that architecture and training details are required for reproducibility. Although the manuscript refers to a 2D CNN, specific layer counts, filter sizes, and hyperparameters were not listed. We will revise the methods section to document the network architecture and validation procedure used in the original implementation. revision: yes
Referee: [Abstract] Abstract: the manuscript states that 'various samples for both modern and ancient Tamil were collected and passed through the system' yet supplies no quantitative breakdown of modern vs. ancient performance, error analysis, or comparison against Tesseract alone or other Tamil OCR methods. This omission makes it impossible to isolate the contribution of the CNN component.

Authors: The work focused on the pipeline for the 7th-12th century inscriptions; modern samples served only for preliminary checks. A full quantitative breakdown, error analysis, or comparison against Tesseract alone and other methods was not performed. We will add a brief discussion of the CNN's role and performance on the ancient samples studied, but a comprehensive comparative evaluation lies outside the original scope. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical result with no derivations or self-referential predictions.

full rationale

The manuscript reports an empirical accuracy of 77.7% obtained by curating a dataset of cropped temple inscription images, applying Otsu binarization, training a 2D CNN, and piping output through Tesseract. No equations, fitted parameters presented as predictions, uniqueness theorems, or self-citations appear in the provided text. The central claim is a direct measurement on the authors' case-study collection rather than a derivation that reduces to its own inputs by construction. This matches the default expectation of a non-circular empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The reported accuracy rests on the assumption that the authors' small curated image set is adequate for CNN training and that standard off-the-shelf tools will generalize to the target script; no free parameters, new axioms, or invented entities are introduced beyond routine ML practice.

axioms (1)

domain assumption A 2D CNN with unspecified architecture will learn useful features from the curated character images after Otsu binarization.
The abstract invokes the CNN without stating layer count, filter sizes, or regularization, treating its effectiveness as given.

pith-pipeline@v0.9.0 · 5762 in / 1278 out tokens · 32504 ms · 2026-05-25T09:34:34.503068+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1]

Archaeological survey of India

“Archaeological survey of India” [online] Available: http://asi.nic.in/publications/

work page
[2]

Tamil Script

“Tamil Script” [online] Available: https://en.wikipedia.org/wiki/Tamil_script

work page
[3]

Department of Archaeology

"Department of Archaeology" [online] Available: http:// www.tnarch.gov.in/epi/ins2.htm

work page
[4]

Middle Chola Temples

“Middle Chola Temples” S R Balasubramanyam Available: http://ignca.gov.in/Asi_data/62202.pdf

work page
[5]

Optical Character Recognition

“Optical Character Recognition” [online] Available: https://en.wikipedia.org/wiki/Optical_character_recognition

work page
[6]

Tamil character recognition from ancient epigraphical inscription using OCR and NLP,

T. Manigandan, V. Vidhya, V. Dhanalakshmi and B. Nirmala, "Tamil character recognition from ancient epigraphical inscription using OCR and NLP," 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), Chennai, 2017, pp. 1008-1011

work page 2017
[7]

Century Identification and Recognition of Ancient Tamil Character Recognition,

S. Rajakumar and S. V. Bharathi., "Century Identification and Recognition of Ancient Tamil Character Recognition," International Journal of Computer Applications, vol. 26, no. 4, pp. 32-35, July 2011

work page 2011
[8]

A Comparative Study of Optical Character Recognition for Tamil Script

R . J. Kannan and R. Phrabhakar, “A Comparative Study of Optical Character Recognition for Tamil Script”, European Journal of Scientific Research ISSN 1450 -216X Vol.35 No.4 (2009), pp.570-582

work page 2009
[9]

Tamil Handwritten Character Recognition: Progress and Challenges

K. Punitharaja, and P. Elango, “Tamil Handwritten Character Recognition: Progress and Challenges” I J C T A, 9(3), 2016, pp. 143-151

work page 2016
[10]

Unsupervised Transcription of Historical Documents

T. B. Kirkpatrick, G. Durrett and D. Klein, “Unsupervised Transcription of Historical Documents”, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 207–217, Sofia, Bulgaria, August 4-9 2013

work page 2013
[11]

A Comprehensive Guide to Convolutional Neural Networks- the EL15 way

“A Comprehensive Guide to Convolutional Neural Networks- the EL15 way” [online] Available: https://towardsd atascience.com/a-comprehensive-guide-to-convolutional-neural- networks-the-eli5-way-3bd2b1164a53

work page
[12]

Tesseract OCR

“Tesseract OCR” [online] Available: https://opensource. google.com/projects/tesseract

work page
[13]

Comparing the OCR Accuracy Levels of Bitonal and Greyscale Images

“Comparing the OCR Accuracy Levels of Bitonal and Greyscale Images” [online] Available: http://www.dlib.org/dli b/march09/powell/03powell.html

work page
[14]

Otsu Thresholding

“Otsu Thresholding” [online] Available: http://www.labbookpages.co.uk/software/imgProc/otsuThreshold. html

work page
[15]

Transfer Learning

L. Torrey and J. Shavlik, “Transfer Learning”, Appears in the Handbook of Research on Machine Learning Applications, published by IGI Global, edited by E. Soria, J. Martin, R. Magdalena, M. Martinez and A. Serrano, 2009

work page 2009
[16]

Keras: The Python Deep Learning library

“Keras: The Python Deep Learning library”, [online] Available: https://keras.io/

work page
[17]

An end-to-end open source machine learning plat- form

“An end-to-end open source machine learning plat- form”, [online] Available: https://www.tensorflow.org/

work page
[18]

Acceleration and Implementation of JPEG 2000 Encoder on TI DSP platform

L. Chien-Chih, H . Hsueh-Ming, "Acceleration and Implementation of JPEG 2000 Encoder on TI DSP platform" Image Processing, 2007. ICIP 2007. IEEE International Conference on, Vo1. 3, pp. III-329-339, 2005

work page 2000
[19]

Convolutional Neural Networks for Visual recognition

“Convolutional Neural Networks for Visual recognition” A Karpathy, [online] Available: http://cs231n.github.io/convolutional-networks/#pool

work page
[20]

Max -pooling/Pooling

“Max -pooling/Pooling”, [online] Available: https://computersciencewiki.org/index.php/Maxpooling_/_Poolig

work page
[21]

Euclidean Distance Theory

“Euclidean Distance Theory”, [online] Available: https://pythonprogramming.net/euclidean -distance-machine-learnin g-tutorial/

work page
[22]

pytesseract PyPI

“pytesseract PyPI” [online] Available: https://pypi.org/ project/pytesseract/

work page
[23]

gTTS Documentation

“gTTS Documentation”, Pierre -Nick Durette [online] available: https://buildmedia.readthedocs.org/media/pdf/gtts/l atest/gtts.pdf

work page
[24]

Cloud Text to Speech

“Cloud Text to Speech” [online] Available: https://cloud.google.com/text-to-speech/

work page
[25]

Language support

“Language support” [online] Available: https://cloud.go ogle.com/speech-to-text/docs/languages

work page
[26]

Image Slicer Documentation

S . Dobson, “Image Slicer Documentation” [online] Available: https://buildmedia.readthedocs.org/media/pdf/imag e-slicer/latest/image-slicer.pdf

work page
[27]

A Fast and Accurate Dependency Parser using Ne ural Networks

D . Chen and C . D. Manning, “A Fast and Accurate Dependency Parser using Ne ural Networks.” Proceedings of EMNLP 2014. 6 Table 1. Results Obtained Scripture/document/t emple inscription Grayscale Output Binarized Output Digitized text Output in Python shell Case 1: Modern Tamil Case 2: Ancient Tamil Case 3: Ancient Tamil Case 4: Ancient Tamil 7 Inscripti...

work page 2014

[1] [1]

Archaeological survey of India

“Archaeological survey of India” [online] Available: http://asi.nic.in/publications/

work page

[2] [2]

Tamil Script

“Tamil Script” [online] Available: https://en.wikipedia.org/wiki/Tamil_script

work page

[3] [3]

Department of Archaeology

"Department of Archaeology" [online] Available: http:// www.tnarch.gov.in/epi/ins2.htm

work page

[4] [4]

Middle Chola Temples

“Middle Chola Temples” S R Balasubramanyam Available: http://ignca.gov.in/Asi_data/62202.pdf

work page

[5] [5]

Optical Character Recognition

“Optical Character Recognition” [online] Available: https://en.wikipedia.org/wiki/Optical_character_recognition

work page

[6] [6]

Tamil character recognition from ancient epigraphical inscription using OCR and NLP,

T. Manigandan, V. Vidhya, V. Dhanalakshmi and B. Nirmala, "Tamil character recognition from ancient epigraphical inscription using OCR and NLP," 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), Chennai, 2017, pp. 1008-1011

work page 2017

[7] [7]

Century Identification and Recognition of Ancient Tamil Character Recognition,

S. Rajakumar and S. V. Bharathi., "Century Identification and Recognition of Ancient Tamil Character Recognition," International Journal of Computer Applications, vol. 26, no. 4, pp. 32-35, July 2011

work page 2011

[8] [8]

A Comparative Study of Optical Character Recognition for Tamil Script

R . J. Kannan and R. Phrabhakar, “A Comparative Study of Optical Character Recognition for Tamil Script”, European Journal of Scientific Research ISSN 1450 -216X Vol.35 No.4 (2009), pp.570-582

work page 2009

[9] [9]

Tamil Handwritten Character Recognition: Progress and Challenges

K. Punitharaja, and P. Elango, “Tamil Handwritten Character Recognition: Progress and Challenges” I J C T A, 9(3), 2016, pp. 143-151

work page 2016

[10] [10]

Unsupervised Transcription of Historical Documents

T. B. Kirkpatrick, G. Durrett and D. Klein, “Unsupervised Transcription of Historical Documents”, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 207–217, Sofia, Bulgaria, August 4-9 2013

work page 2013

[11] [11]

A Comprehensive Guide to Convolutional Neural Networks- the EL15 way

“A Comprehensive Guide to Convolutional Neural Networks- the EL15 way” [online] Available: https://towardsd atascience.com/a-comprehensive-guide-to-convolutional-neural- networks-the-eli5-way-3bd2b1164a53

work page

[12] [12]

Tesseract OCR

“Tesseract OCR” [online] Available: https://opensource. google.com/projects/tesseract

work page

[13] [13]

Comparing the OCR Accuracy Levels of Bitonal and Greyscale Images

“Comparing the OCR Accuracy Levels of Bitonal and Greyscale Images” [online] Available: http://www.dlib.org/dli b/march09/powell/03powell.html

work page

[14] [14]

Otsu Thresholding

“Otsu Thresholding” [online] Available: http://www.labbookpages.co.uk/software/imgProc/otsuThreshold. html

work page

[15] [15]

Transfer Learning

L. Torrey and J. Shavlik, “Transfer Learning”, Appears in the Handbook of Research on Machine Learning Applications, published by IGI Global, edited by E. Soria, J. Martin, R. Magdalena, M. Martinez and A. Serrano, 2009

work page 2009

[16] [16]

Keras: The Python Deep Learning library

“Keras: The Python Deep Learning library”, [online] Available: https://keras.io/

work page

[17] [17]

An end-to-end open source machine learning plat- form

“An end-to-end open source machine learning plat- form”, [online] Available: https://www.tensorflow.org/

work page

[18] [18]

Acceleration and Implementation of JPEG 2000 Encoder on TI DSP platform

L. Chien-Chih, H . Hsueh-Ming, "Acceleration and Implementation of JPEG 2000 Encoder on TI DSP platform" Image Processing, 2007. ICIP 2007. IEEE International Conference on, Vo1. 3, pp. III-329-339, 2005

work page 2000

[19] [19]

Convolutional Neural Networks for Visual recognition

“Convolutional Neural Networks for Visual recognition” A Karpathy, [online] Available: http://cs231n.github.io/convolutional-networks/#pool

work page

[20] [20]

Max -pooling/Pooling

“Max -pooling/Pooling”, [online] Available: https://computersciencewiki.org/index.php/Maxpooling_/_Poolig

work page

[21] [21]

Euclidean Distance Theory

“Euclidean Distance Theory”, [online] Available: https://pythonprogramming.net/euclidean -distance-machine-learnin g-tutorial/

work page

[22] [22]

pytesseract PyPI

“pytesseract PyPI” [online] Available: https://pypi.org/ project/pytesseract/

work page

[23] [23]

gTTS Documentation

“gTTS Documentation”, Pierre -Nick Durette [online] available: https://buildmedia.readthedocs.org/media/pdf/gtts/l atest/gtts.pdf

work page

[24] [24]

Cloud Text to Speech

“Cloud Text to Speech” [online] Available: https://cloud.google.com/text-to-speech/

work page

[25] [25]

Language support

“Language support” [online] Available: https://cloud.go ogle.com/speech-to-text/docs/languages

work page

[26] [26]

Image Slicer Documentation

S . Dobson, “Image Slicer Documentation” [online] Available: https://buildmedia.readthedocs.org/media/pdf/imag e-slicer/latest/image-slicer.pdf

work page

[27] [27]

A Fast and Accurate Dependency Parser using Ne ural Networks

D . Chen and C . D. Manning, “A Fast and Accurate Dependency Parser using Ne ural Networks.” Proceedings of EMNLP 2014. 6 Table 1. Results Obtained Scripture/document/t emple inscription Grayscale Output Binarized Output Digitized text Output in Python shell Case 1: Modern Tamil Case 2: Ancient Tamil Case 3: Ancient Tamil Case 4: Ancient Tamil 7 Inscripti...

work page 2014