arxiv: 1806.00578 · v1 · pith:IWH644XNnew · submitted 2018-06-02 · 💻 cs.CV

SCAN: Sliding Convolutional Attention Network for Scene Text Recognition

classification 💻 cs.CV
keywords scan, recognition, text, scene, sequence, attention, convolutional, during

Scene text recognition has drawn great attention in the computer vision and artificial intelligence communities due to its challenges and wide applications. State-of-the-art models based on recurrent neural networks (RNNs) map an input sequence to a variable-length output sequence, but they are usually applied in a black-box manner and lack transparency for further improvement, and maintaining the entire history of hidden states prevents parallel computation within a sequence. In this paper, we investigate the intrinsic characteristics of text recognition and, inspired by human cognition mechanisms in reading text, propose a scene text recognition method with a sliding convolutional attention network (SCAN). Similar to eye movement during reading, the process of SCAN can be viewed as an alternation between saccades and visual fixations. Compared to previous recurrent models, computations over all elements of SCAN can be fully parallelized during training. Experimental results on several challenging benchmarks, including the IIIT5k, SVT and ICDAR 2003/2013 datasets, demonstrate the superiority of SCAN over state-of-the-art methods in terms of both model interpretability and performance.
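
The parallelization claim in the abstract can be illustrated with a small sketch. It is not the architecture from the paper: it only assumes, from the word "sliding" in the title, that a text-line image is cut into overlapping windows, and it shows that per-window features with no dependence on a previous hidden state can be computed concurrently. The names `sliding_windows`, `encode_window`, and `encode_all`, the window width, and the stride are all hypothetical, and the mean-intensity "encoder" is a placeholder for a convolutional feature extractor.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def sliding_windows(image: Image, width: int, stride: int) -> List[Image]:
    """Cut a text-line image into overlapping vertical strips.

    `width` and `stride` are illustrative parameters, not values from the paper.
    """
    if width <= 0 or stride <= 0:
        raise ValueError("width and stride must be positive")
    n_cols = len(image[0]) if image else 0
    # Each window keeps every row and a contiguous slice of columns.
    return [
        [row[start:start + width] for row in image]
        for start in range(0, n_cols - width + 1, stride)
    ]


def encode_window(window: Image) -> float:
    """Placeholder encoder: mean pixel intensity of one window.

    Stands in for a convolutional feature extractor (an assumption).
    """
    pixels = [value for row in window for value in row]
    return sum(pixels) / len(pixels)


def encode_all(windows: List[Image]) -> List[float]:
    """Encode every window independently.

    No window needs the result of another, so the work can be dispatched
    concurrently; an RNN would instead need step t-1 before step t.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(encode_window, windows))  # map preserves order


if __name__ == "__main__":
    image = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]]
    windows = sliding_windows(image, width=4, stride=2)
    print(len(windows), encode_all(windows))
```

The thread pool is only there to make the independence visible; in a real model the same property is what allows all positions to be batched through a convolutional layer at once.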

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. 2D-CTC for Scene Text Recognition

    cs.CV 2019-07 unverdicted novelty 6.0

    2D-CTC extends CTC to two dimensions to achieve higher accuracy and speed in recognizing regular and irregular scene text.