arxiv: 1907.04917 · v1 · pith:SIBJUF5Znew · submitted 2019-07-04 · 💻 cs.CV · eess.IV

A Novel Approach to OCR using Image Recognition based Classification for Ancient Tamil Inscriptions in Temples

Pith reviewed 2026-05-25 09:34 UTC · model grok-4.3

classification 💻 cs.CV eess.IV
keywords ancient Tamil · optical character recognition · convolutional neural network · Tesseract · Otsu thresholding · temple inscriptions · character classification · epigraphy

The pith

A 2D CNN linked to Tesseract recognizes 7th-12th century Tamil inscriptions at 77.7 percent combined efficiency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper curates a dataset of cropped character images from selected temple inscriptions of the 7th to 12th centuries. It applies Otsu thresholding to binarize the images and trains a two-dimensional convolutional neural network to classify the characters. The trained network connects to Tesseract OCR through the pytesseract library in Python, and the system adds text-to-speech conversion for audio output of the recognized text. Testing on samples of both modern and ancient Tamil yields a reported combined efficiency of 77.7 percent for the inscriptions studied. This targets the practical problem of reading evolved ancient scripts when comprehensive labeled datasets remain difficult to assemble.
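The binarization step can be made concrete with a small sketch of Otsu's method, which picks the gray level that maximizes the between-class variance of the two resulting pixel groups. This is a plain-Python illustration, not the authors' code: the function names `otsu_threshold` and `binarize` are hypothetical, and treating bright pixels as foreground is an assumption that may need to be inverted for a given inscription photograph.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # summed intensity of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class still empty
        if w1 == 0:
            break     # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(image):
    """Binarize a 2D list of 8-bit gray values (assumption: bright = foreground)."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p > t else 0 for p in row] for row in image]
```

For a toy image with three dark and three bright pixels, the threshold lands just above the dark group, so the two groups separate cleanly. In practice a library routine would normally be used instead of a hand-written loop.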

Core claim

After binarization with Otsu thresholding, a two-dimensional convolutional neural network is trained on cropped images of characters from certain 7th-12th century temple inscriptions. This network is interfaced with Tesseract via pytesseract to classify and recognize the characters. The resulting digitized text is also rendered as audio through Google's text-to-speech engine, and the authors report an overall efficiency of 77.7 percent on the inscriptions examined.
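How the stages might be chained is sketched below. The paper does not publish its glue code, so the function names, the `run_pipeline` signature, and the output file name are all hypothetical. The two library calls used, `pytesseract.image_to_string(..., lang="tam")` and `gTTS(text=..., lang="ta").save(...)`, assume that pytesseract, the Tesseract Tamil language data, and the gTTS package are installed; they are imported lazily so the sketch loads without them.

```python
def recognize_tamil(tile, lang="tam"):
    """OCR an image tile with Tesseract (assumes Tamil traineddata is installed)."""
    import pytesseract  # lazy import: only needed when this stage actually runs
    return pytesseract.image_to_string(tile, lang=lang)


def speak_tamil(text, out_path="inscription.mp3", lang="ta"):
    """Save spoken audio for the text via gTTS (assumes network access)."""
    from gtts import gTTS  # lazy import, same reason as above
    gTTS(text=text, lang=lang).save(out_path)
    return out_path


def run_pipeline(image, binarize, classify_to_tile,
                 recognize=recognize_tamil, speak=speak_tamil):
    """Chain the stages: binarize -> CNN classification/tiling -> OCR -> audio.

    `binarize` and `classify_to_tile` are supplied by the caller; the latter
    stands in for the trained CNN, which the paper does not specify.
    """
    binary = binarize(image)
    tile = classify_to_tile(binary)
    text = recognize(tile).strip()
    audio_path = speak(text) if text else None  # skip audio for empty OCR output
    return text, audio_path
```

Passing the stages in as callables keeps each one replaceable, which also makes it straightforward to run Tesseract alone as a baseline by supplying an identity function for `classify_to_tile`.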

What carries the argument

The 2D convolutional neural network trained on cropped temple-inscription character images and linked to Tesseract OCR through pytesseract.
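Since the abstract gives no architecture, the sketch below only illustrates the two basic operations any 2D CNN layer stack relies on: a sliding-window convolution (implemented, as in most frameworks, as cross-correlation) and max pooling. The kernel, image, and function names are hypothetical teaching values, not anything taken from the paper.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation of image with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Clamp negative activations to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a square window of the given size."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [max(feature_map[r + i][c + j]
             for i in range(size) for j in range(size))
         for c in cols]
        for r in rows
    ]


# Toy input: a 5x5 binary image containing one vertical stroke.
stroke = [[0, 0, 1, 0, 0] for _ in range(5)]
# Hypothetical kernel that responds to a dark-to-bright step from left to right.
left_edge = [[-1, 1], [-1, 1]]
pooled = max_pool(relu(conv2d(stroke, left_edge)))
```

In a trained network the kernel values are learned rather than hand-picked, and many such filters are stacked before a final classification layer.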

If this is right

  • The same pipeline processes both modern Tamil and the studied ancient samples.
  • Recognized text is converted to spoken audio output.
  • A usable dataset for this period can be assembled from limited temple sources rather than exhaustive collection.
  • The method supplies a concrete digitization route for inscriptions whose character forms have changed over centuries.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Comparable small-dataset CNN approaches could be tested on other historical scripts that also lack large public corpora.
  • Accuracy might rise if the same architecture receives additional images from a wider range of temples or if the network depth is adjusted.
  • The workflow suggests a route for epigraphers to move from physical inscriptions to searchable digital text without first building massive labeled sets.

Load-bearing premise

The small collection of cropped character images from selected temple inscriptions is representative enough for the 2D CNN to learn features that generalize across 7th-12th century Tamil script without overfitting.

What would settle it

Apply the trained system to an independent collection of 7th-12th century Tamil inscriptions not used in training and check whether the measured recognition rate is substantially lower than 77.7 percent.
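One way to judge "substantially lower" is to put a confidence interval around the recognition rate measured on the independent set and check whether 77.7 percent falls inside it. The sketch below uses the Wilson score interval for a binomial proportion; the choice of interval, the 95 percent level, and the function names are assumptions made for illustration, and any counts passed in are placeholders rather than results from the paper.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def consistent_with_reported(correct, total, reported=0.777):
    """True if the reported rate lies inside the interval for the new sample."""
    low, high = wilson_interval(correct, total)
    return low <= reported <= high
```

With a hypothetical 100 held-out characters, 78 correct would be consistent with the reported figure, while 50 correct would not. The interval treats each character as an independent trial, which is optimistic when many characters come from the same inscription.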

Figures

Figures reproduced from arXiv: 1907.04917 by Aishwarya Dharani, Lalitha Giridhar, and Velmathi Guruviah.

Figure 1: Proposed architecture of the work.
Figure 2: Suggested setup for implementation. The model is trained to minimize the distance between images belonging to the same font family and to maximize the distance between images belonging to different font families. The recognized images of each letter block are combined to form a single image tile. This is then subjected to further processing to perform OCR. To implement OCR, the tile is fed to the Pytesseract libra…
Figure 3: Flowchart for software implementation. In the case of the pre-digitized text, the OCR accuracy was close to 99%. The real challenge was to perfect the audio output. It was observed that the audio output after performing text to speech conversion also worked with near 99% accuracy. Since the accuracy of the audio output is dependent on the accuracy of the OCR, the efficiency of the integrated system relies p…
Original abstract

Recognition of ancient Tamil characters has always been a challenge for epigraphers. This is primarily because the language has evolved over the several centuries and the character set over this time has both expanded and diversified. This proposed work focuses on improving optical character recognition techniques for ancient Tamil script which was in use between the 7th and 12th centuries. While comprehensively curating a functional data set for ancient Tamil characters is an arduous task, in this work, a data set has been curated using cropped images of characters found on certain temple inscriptions, specific to this time as a case study. After using Otsu thresholding method for binarization of the image a two dimensional convolution neural network is defined and used to train, classify and, recognize the ancient Tamil characters. To implement the optical character recognition techniques, the neural network is linked to the Tesseract using the pytesseract library of Python. As an added feature, the work also incorporates Google's text to speech voice engine to produce an audio output of the digitized text. Various samples for both modern and ancient Tamil were collected and passed through the system. It is found that for Tamil inscriptions studied over the considered time period, a combined efficiency of 77.7 percent can be achieved.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript presents a system for OCR of ancient Tamil inscriptions (7th-12th centuries) from temple sources. It curates a dataset of cropped character images, applies Otsu thresholding for binarization, trains a 2D CNN for classification, integrates the output with Tesseract via pytesseract, and adds Google TTS for audio. The central claim is that this pipeline achieves a combined efficiency of 77.7% on the studied inscriptions.

Significance. If the empirical result were reproducible and generalizable, the work would offer a practical tool for digitizing historical Tamil epigraphy. The integration of CNN classification with existing OCR and TTS components is a reasonable engineering approach for a low-resource script, but the absence of any quantitative validation details prevents assessment of whether the 77.7% figure reflects genuine generalization or case-specific performance.

major comments (3)
  1. [Abstract] Abstract: the reported 77.7% combined efficiency is presented without any dataset statistics (total images, number of distinct characters, temporal distribution across 7th-12th century samples), train/test split, or description of how the metric was computed (per-character, per-inscription, or otherwise). This information is required to evaluate whether the CNN learned generalizable features or simply memorized a small curated collection.
  2. [Abstract] Abstract and system description: no architecture details, layer counts, filter sizes, or training hyperparameters are supplied for the 2D CNN, nor is any validation procedure (cross-validation, held-out test set) described. Without these, the classification step that underpins the 77.7% claim cannot be reproduced or compared to standard baselines.
  3. [Abstract] Abstract: the manuscript states that 'various samples for both modern and ancient Tamil were collected and passed through the system' yet supplies no quantitative breakdown of modern vs. ancient performance, error analysis, or comparison against Tesseract alone or other Tamil OCR methods. This omission makes it impossible to isolate the contribution of the CNN component.
minor comments (3)
  1. [Abstract] The term 'efficiency' is used for the 77.7% figure; the manuscript should clarify whether this denotes accuracy, F1, or another metric and provide the exact formula or aggregation method.
  2. The description of dataset curation ('cropped images of characters found on certain temple inscriptions') is too vague for a methods section; explicit counts and selection criteria should be added.
  3. No mention is made of the number of classes (distinct ancient Tamil characters) or class imbalance, both of which are critical for interpreting CNN performance on an evolving script.
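The distinction the referee draws between accuracy and F1, and the concern about class imbalance, can be illustrated with a short sketch. It computes plain per-character accuracy and macro-averaged F1 from two label lists. The labels `"ka"` and `"nga"` and the counts are hypothetical, and nothing here reflects how the paper actually computed its 77.7% figure.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of characters whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced sample: 8 instances of one character, 2 of another,
# and a classifier that always predicts the common one.
y_true = ["ka"] * 8 + ["nga"] * 2
y_pred = ["ka"] * 10
```

On this toy sample accuracy is 0.8 while macro F1 is about 0.44, which shows why a single "efficiency" number is hard to interpret without the class counts.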

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the manuscript to supply the missing details where they are available from the original study.

  1. Referee: [Abstract] Abstract: the reported 77.7% combined efficiency is presented without any dataset statistics (total images, number of distinct characters, temporal distribution across 7th-12th century samples), train/test split, or description of how the metric was computed (per-character, per-inscription, or otherwise). This information is required to evaluate whether the CNN learned generalizable features or simply memorized a small curated collection.

    Authors: We agree that these statistics are necessary to assess generalization. The original manuscript described the curation process but omitted numerical details for brevity. In revision we will add the available dataset information, the train/test split employed for the CNN, and clarify that the reported figure is the end-to-end pipeline accuracy on the held-out inscriptions. revision: yes

  2. Referee: [Abstract] Abstract and system description: no architecture details, layer counts, filter sizes, or training hyperparameters are supplied for the 2D CNN, nor is any validation procedure (cross-validation, held-out test set) described. Without these, the classification step that underpins the 77.7% claim cannot be reproduced or compared to standard baselines.

    Authors: We concur that architecture and training details are required for reproducibility. Although the manuscript refers to a 2D CNN, specific layer counts, filter sizes, and hyperparameters were not listed. We will revise the methods section to document the network architecture and validation procedure used in the original implementation. revision: yes

  3. Referee: [Abstract] Abstract: the manuscript states that 'various samples for both modern and ancient Tamil were collected and passed through the system' yet supplies no quantitative breakdown of modern vs. ancient performance, error analysis, or comparison against Tesseract alone or other Tamil OCR methods. This omission makes it impossible to isolate the contribution of the CNN component.

    Authors: The work focused on the pipeline for the 7th-12th century inscriptions; modern samples served only for preliminary checks. A full quantitative breakdown, error analysis, or comparison against Tesseract alone and other methods was not performed. We will add a brief discussion of the CNN's role and performance on the ancient samples studied, but a comprehensive comparative evaluation lies outside the original scope. revision: partial
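A comparison against Tesseract alone would need a common score for both outputs. A standard choice is the character error rate (CER), the edit distance between the recognized text and a reference transcription divided by the reference length. The sketch below is an assumption about how such a comparison could be scored, not something the paper reports, and the function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        cur = [i]
        for j, item_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                       # delete item_a
                cur[j - 1] + 1,                    # insert item_b
                prev[j - 1] + (item_a != item_b),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by reference length; lower is better."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Scoring the full pipeline and a Tesseract-only run against the same reference transcriptions would isolate what the CNN stage contributes. One caveat: Python strings iterate over code points, and a single Tamil character is often several code points, so the inputs may need to be split into grapheme clusters first and passed as lists.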

Circularity Check

0 steps flagged

No circularity: purely empirical result with no derivations or self-referential predictions.

The manuscript reports an empirical accuracy of 77.7% obtained by curating a dataset of cropped temple inscription images, applying Otsu binarization, training a 2D CNN, and piping output through Tesseract. No equations, fitted parameters presented as predictions, uniqueness theorems, or self-citations appear in the provided text. The central claim is a direct measurement on the authors' case-study collection rather than a derivation that reduces to its own inputs by construction. This matches the default expectation of a non-circular empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The reported accuracy rests on the assumption that the authors' small curated image set is adequate for CNN training and that standard off-the-shelf tools will generalize to the target script; no free parameters, new axioms, or invented entities are introduced beyond routine ML practice.

axioms (1)
  • domain assumption A 2D CNN with unspecified architecture will learn useful features from the curated character images after Otsu binarization.
    The abstract invokes the CNN without stating layer count, filter sizes, or regularization, treating its effectiveness as given.

pith-pipeline@v0.9.0 · 5762 in / 1278 out tokens · 32504 ms · 2026-05-25T09:34:34.503068+00:00 · methodology

