arxiv: 1506.04395 · v2 · pith:PSPFNKCRnew · submitted 2015-06-14 · 💻 cs.CV

Reading Scene Text in Deep Convolutional Sequences

classification 💻 cs.CV
keywords deep, image, model, scene, text, word, character, convolutional

We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances in deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. Then a deep recurrent model, building on long short-term memory (LSTM), is developed to robustly recognise the generated CNN sequences, departing from most existing approaches, which recognise each character independently. Our model has a number of appealing properties in comparison to existing scene text recognition methods: (i) it can recognise highly ambiguous words by leveraging meaningful context information, allowing it to work reliably without either pre- or post-processing; (ii) the deep CNN feature is robust to various image distortions; (iii) it retains the explicit order information in the word image, which is essential to discriminate word strings; (iv) the model does not depend on a pre-defined dictionary, and it can process unknown words and arbitrary strings. Code for the DTRN will be made available.
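
As a rough illustration of the sequence-labelling view described in the abstract, the sketch below cuts a word image into an ordered, left-to-right list of windows and then turns per-window labels into a string. It is not the authors' implementation: the window width, the stride, the `BLANK` symbol, and the collapse rule (a CTC-style greedy decode) are assumptions made for this example, and the CNN and LSTM that would produce the per-window labels are omitted.

```python
from typing import List, Sequence

BLANK = "-"  # hypothetical "no character" label, assumed for illustration


def sliding_windows(image: Sequence[Sequence[float]],
                    width: int, stride: int) -> List[List[List[float]]]:
    """Cut a word image (rows x cols) into windows ordered left to right."""
    n_cols = len(image[0])
    windows = []
    for start in range(0, n_cols - width + 1, stride):
        # Each window keeps every row but only `width` consecutive columns.
        windows.append([list(row[start:start + width]) for row in image])
    return windows


def collapse_labels(frame_labels: Sequence[str]) -> str:
    """Merge consecutive repeated labels, then drop blanks."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)


# A toy 2 x 6 "image"; a real model would feed each window to a CNN and the
# resulting feature sequence to a recurrent network to get per-window labels.
toy_image = [[0, 1, 2, 3, 4, 5],
             [6, 7, 8, 9, 10, 11]]
windows = sliding_windows(toy_image, width=2, stride=2)
decoded = collapse_labels(list("tt-ee-xx-t"))
```

Because the windows stay in left-to-right order, the label sequence keeps the character order of the word, and no explicit character segmentation or dictionary lookup is needed in this sketch.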

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Multitask Network for Localization and Recognition of Text in Images

    cs.CL 2019-06 unverdicted novelty 6.0

    Presents an end-to-end multitask CNN with FPN, dynamic RoI pooling, and convolutional attention for simultaneous lexicon-free text localization and recognition in complex images.