pith. sign in

arxiv: 1907.04888 · v1 · pith:HJLR2BO2new · submitted 2019-07-10 · 💻 cs.CV

Fully Convolutional Networks for Handwriting Recognition

Pith reviewed 2026-05-24 23:34 UTC · model grok-4.3

classification 💻 cs.CV
keywords handwriting recognitionfully convolutional networksdual stream architectureIAM datasetRIMES datasetattention mechanismsymbol sequences
0
0 comments X

The pith

A dual-stream fully convolutional network recognizes handwriting of unknown length by outputting symbol streams directly.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a fully convolutional model that takes handwriting images of any length and produces streams of symbols. It uses two streams for local and global context to avoid complex preprocessing and postprocessing steps. The approach works across Latin-based languages with over 100 symbols and matches dictionary-based methods on IAM and RIMES datasets. An attention mechanism handles variations like slant and noise. When a dictionary is available, it applies probabilistic correction to words.

Core claim

The authors demonstrate that a dual stream fully convolutional architecture processes raw handwriting images of unknown length into arbitrary symbol streams using local and global context, achieving competitive results on IAM and RIMES without relying on symbol alignment, connectionist temporal classification, dictionary matching, or language models.

What carries the argument

Dual stream fully convolutional architecture that combines local and global context to process variable-length handwriting inputs into symbol sequences.

Load-bearing premise

The dual-stream convolutional design alone suffices to produce accurate symbol streams from raw images without alignment correction, CTC, dictionaries, or language models.

What would settle it

Evaluating the model on the IAM dataset without any dictionary or language model and obtaining error rates substantially above existing dictionary-free baselines would disprove the claim.

Figures

Figures reproduced from arXiv: 1907.04888 by Dheeraj Peri, Felipe Petroski Such, Frank Brockler, Paul Hutkowski, Raymond Ptucha.

Figure 1
Figure 1. Figure 1: The Symbol CNN is a fully convolutional model comprised of two networks. The bottom part of the network takes an input image of size [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: During training, blank space characters( [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Calculating CER error between words tymme and time using dynamic programming. From left to right: after one step, after finishing “t”, after finishing first “m”, and final CER of 2. quite effective at symbol recognition with or without a word lexicon. IV. CER AND VOCABULARY MATCHING We report our results using Character Error Rate (CER): CER = R + D + C (1) where R is number of characters replaced, D is th… view at source ↗
read the original abstract

Handwritten text recognition is challenging because of the virtually infinite ways a human can write the same message. Our fully convolutional handwriting model takes in a handwriting sample of unknown length and outputs an arbitrary stream of symbols. Our dual stream architecture uses both local and global context and mitigates the need for heavy preprocessing steps such as symbol alignment correction as well as complex post processing steps such as connectionist temporal classification, dictionary matching or language models. Using over 100 unique symbols, our model is agnostic to Latin-based languages, and is shown to be quite competitive with state of the art dictionary based methods on the popular IAM and RIMES datasets. When a dictionary is known, we further allow a probabilistic character error rate to correct errant word blocks. Finally, we introduce an attention based mechanism which can automatically target variants of handwriting, such as slant, stroke width, or noise.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a dual-stream fully convolutional network for handwriting text recognition. It processes raw variable-length handwriting images to output arbitrary symbol streams, using local and global context to avoid symbol alignment correction, CTC, dictionary matching, and language models. The approach is evaluated on IAM and RIMES with over 100 symbols, claiming competitiveness with state-of-the-art dictionary-based methods; an optional probabilistic CER correction is available when a dictionary is known, and an attention mechanism handles variants such as slant, stroke width, or noise.

Significance. If the empirical results hold, the work would be significant for demonstrating that a purely convolutional dual-stream architecture can produce usable symbol streams from raw inputs without traditional alignment or decoding components. The explicit separation of no-dictionary and dictionary-assisted cases, along with the attention module presented as an add-on, provides clear support for the central claim. The language-agnostic design for Latin-based scripts adds practical value if the reported CER/WER numbers prove robust across ablations.

minor comments (3)
  1. Abstract: the competitiveness claim on IAM and RIMES would be easier to assess if the specific CER and WER values (and their comparison to baselines) were stated directly rather than deferred to the experimental section.
  2. The attention mechanism section would benefit from an explicit diagram or equation showing how the attention weights are computed and added to the dual-stream outputs, to clarify integration without implying hidden alignment.
  3. Table presenting the no-dictionary results should include the exact number of runs or standard deviations to allow readers to judge stability of the reported competitiveness.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the work's significance, and recommendation for minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper presents a dual-stream fully convolutional architecture for handwriting recognition and reports empirical CER/WER results on IAM and RIMES. No derivation chain, equations, or fitted parameters are described that reduce a claimed prediction or uniqueness result to the inputs by construction. The central claim is an empirical demonstration that the architecture can operate without CTC, alignment correction, or language models; this is supported by explicit ablations rather than by self-definition or self-citation load-bearing. No self-citation, ansatz smuggling, or renaming of known results is visible in the provided text that would force the outcome.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities. The central claim rests on the unstated assumption that standard convolutional training on the cited datasets is sufficient to realize the stated performance.

pith-pipeline@v0.9.0 · 5681 in / 1206 out tokens · 19280 ms · 2026-05-24T23:34:03.945774+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 2 internal anchors

  1. [1]

    Accessed: 2017

    Nist special database 19, 2nd ed. Accessed: 2017

  2. [2]

    Deep learning based isolated arabic scene character recognition

    Saad Bin Ahmed, Saeeda Naz, Muhammad Imran Razzak, and Ru- biyah Yousaf. Deep learning based isolated arabic scene character recognition. In Arabic Script Analysis and Recognition (ASAR), 2017 1st International Workshop on , pages 46–51. IEEE, 2017

  3. [3]

    Rimes evaluation campaign for handwritten mail processing

    Emmanuel Augustin, Matthieu Carr ´e, Emmanu `ele Grosicki, J-M Brodin, Edouard Geoffrois, and Franc ¸oise Prˆeteux. Rimes evaluation campaign for handwritten mail processing. In International Workshop on Frontiers in Handwriting Recognition , pages 231–235, 2006

  4. [4]

    Handwritten text recognition using deep learning

    Batuhan Balci, Dan Saadati, and Dan Shiferaw. Handwritten text recognition using deep learning. CS231n: Convolutional Neural Networks for Visual Recognition, Stanford University Project , 2017

  5. [5]

    A comparison of sequence-trained deep neural networks and recurrent neural networks optical modeling for handwriting recognition

    Th ´eodore Bluche, Hermann Ney, and Christopher Kermorvant. A comparison of sequence-trained deep neural networks and recurrent neural networks optical modeling for handwriting recognition. In International Conference on Statistical Language and Speech Pro- cessing, pages 199–210. Springer, 2014

  6. [6]

    Structured document segmentation and representation by the modified xy tree

    Francesca Cesarini, Marco Gori, Simone Marinai, and Giovanni Soda. Structured document segmentation and representation by the modified xy tree. In Proceedings of the Fifth International Conference on Document Analysis and Recognition , pages 563–566, 1999

  7. [7]

    DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

    Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv preprint arXiv:1606.00915 , 2016

  8. [8]

    A system for offline character recognition using auto-encoder networks

    Sagar Dewan and Srinivasa Chakravarthy. A system for offline character recognition using auto-encoder networks. In International Conference on Neural Information Processing , pages 91–99. Springer, 2012

  9. [9]

    Fast and robust training of recurrent neural networks for offline handwriting recogni- tion

    Patrick Doetsch, Michal Kozielski, and Hermann Ney. Fast and robust training of recurrent neural networks for offline handwriting recogni- tion. In 14th International Conference on Frontiers in Handwriting Recognition, pages 279–284, 2014

  10. [10]

    Hierarchical hybrid mlp/hmm or rather mlp features for a discrimi- natively trained gaussian hmm: a comparison for offline handwriting recognition

    Philippe Dreuw, Patrick Doetsch, Christian Plahl, and Hermann Ney. Hierarchical hybrid mlp/hmm or rather mlp features for a discrimi- natively trained gaussian hmm: a comparison for offline handwriting recognition. In 18th IEEE International Conference on Image Pro- cessing, pages 3541–3544, 2011

  11. [11]

    Improving offline handwritten text recognition with hybrid hmm/ann models

    Salvador Espana-Boquera, Maria Jose Castro-Bleda, Jorge Gorbe- Moya, and Francisco Zamora-Martinez. Improving offline handwritten text recognition with hybrid hmm/ann models. IEEE transactions on pattern analysis and machine intelligence , 33(4):767–779, 2011

  12. [12]

    Handwritten word recognition with character and inter-character neural networks

    Paul D Gader, Magdi Mohamed, and Jung-Hsien Chiang. Handwritten word recognition with character and inter-character neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cyber- netics), 27(1):158–164, 1997

  13. [13]

    Connectionist temporal classification: labelling unseg- mented sequence data with recurrent neural networks

    Alex Graves, Santiago Fern ´andez, Faustino Gomez, and J ¨urgen Schmidhuber. Connectionist temporal classification: labelling unseg- mented sequence data with recurrent neural networks. In Proceedings of the 23rd international conference on Machine learning , pages 369–

  14. [14]

    Long short-term memory

    Sepp Hochreiter and J ¨urgen Schmidhuber. Long short-term memory. Neural computation , 9(8):1735–1780, 1997

  15. [15]

    Word segmentation of off-line handwritten documents

    Chen Huang and Sargur N Srihari. Word segmentation of off-line handwritten documents. In Document Recognition and Retrieval XV , volume 6815, page 68150E. International Society for Optics and Photonics, 2008

  16. [16]

    Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition

    Max Jaderberg, Karen Simonyan, Andrea Vedaldi, and Andrew Zis- serman. Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227 , 2014

  17. [17]

    Caffe: Convolutional architecture for fast feature embedding

    Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on Multimedia , pages 675–678. ACM, 2014

  18. [18]

    Improvements in rwth’s system for off-line handwriting recognition

    Michał Kozielski, Patrick Doetsch, and Hermann Ney. Improvements in rwth’s system for off-line handwriting recognition. In Document Analysis and Recognition (ICDAR), 2013 12th International Confer- ence on , pages 935–939. IEEE, 2013

  19. [19]

    Fully convo- lutional networks for semantic segmentation

    Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convo- lutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 3431–3440, 2015

  20. [20]

    The iam-database: an english sentence database for offline handwriting recognition

    U-V Marti and Horst Bunke. The iam-database: an english sentence database for offline handwriting recognition. International Journal on Document Analysis and Recognition , 5(1):39–46, 2002

  21. [21]

    The a2ia french handwriting recognition system at the rimes-icdar2011 competition

    Far `es Menasri, J ´erˆome Louradour, Anne-Laure Bianne-Bernard, and Christopher Kermorvant. The a2ia french handwriting recognition system at the rimes-icdar2011 competition. In Document Recognition and Retrieval XIX , volume 8297, page 82970Y . International Society for Optics and Photonics, 2012

  22. [22]

    Dropout improves recurrent neural networks for hand- writing recognition

    Vu Pham, Th ´eodore Bluche, Christopher Kermorvant, and J ´erˆome Louradour. Dropout improves recurrent neural networks for hand- writing recognition. In 14th International Conference on Frontiers in Handwriting Recognition , pages 285–290, 2014

  23. [23]

    Cnn-n-gram for handwriting word recognition

    Arik Poznanski and Lior Wolf. Cnn-n-gram for handwriting word recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 2305–2314, 2016

  24. [24]

    Faster r- cnn: Towards real-time object detection with region proposal networks

    Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r- cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems , pages 91–99, 2015

  25. [25]

    An end-to-end trainable neural network for image-based sequence recognition and its applica- tion to scene text recognition

    Baoguang Shi, Xiang Bai, and Cong Yao. An end-to-end trainable neural network for image-based sequence recognition and its applica- tion to scene text recognition. IEEE transactions on pattern analysis and machine intelligence , 39(11):2298–2304, 2017

  26. [26]

    Robust scene text recognition with automatic rectification

    Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. Robust scene text recognition with automatic rectification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4168–4176, 2016

  27. [27]

    An analysis of sentence boundary detection systems for english and portuguese documents

    Carlos N Silla and Celso AA Kaestner. An analysis of sentence boundary detection systems for english and portuguese documents. In International Conference on Intelligent Text Processing and Com- putational Linguistics , pages 135–141. Springer, 2004

  28. [28]

    Convolutional multi-directional recurrent network for offline handwritten text recognition

    Zenghui Sun, Lianwen Jin, Zecheng Xie, Ziyong Feng, and Shuye Zhang. Convolutional multi-directional recurrent network for offline handwritten text recognition. In 15th International Conference on Frontiers in Handwriting Recognition , pages 240–245, 2016

  29. [29]

    Feature extraction methods for character recognition-a survey

    Øivind Due Trier, Anil K Jain, and Torfinn Taxt. Feature extraction methods for character recognition-a survey. Pattern recognition , 29(4):641–662, 1996

  30. [30]

    Handwriting recognition with large multidimensional long short-term memory re- current neural networks

    Paul V oigtlaender, Patrick Doetsch, and Hermann Ney. Handwriting recognition with large multidimensional long short-term memory re- current neural networks. In 15th International Conference on Frontiers in Handwriting Recognition , pages 228–233, 2016

  31. [31]

    Fully convolutional recurrent network for handwritten chinese text recognition

    Zecheng Xie, Zenghui Sun, Lianwen Jin, Ziyong Feng, and Shuye Zhang. Fully convolutional recurrent network for handwritten chinese text recognition. In 23rd International Conference on Pattern Recog- nition, pages 4011–4016, 2016

  32. [32]

    A deep learning based character recognition system from multimedia document

    Usha Yadav, Satya Verma, Deepak Kumar Xaxa, and Chandrakant Mahobiya. A deep learning based character recognition system from multimedia document. In Power and Advanced Computing Technologies (i-PACT), 2017 Innovations in , pages 1–7. IEEE, 2017