Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Cong Yao; Minghui Liao; Pengyuan Lyu; Wenhao Wu; Xiang Bai

arxiv: 1807.02242 · v2 · pith:SC2T7XYSnew · submitted 2018-07-06 · 💻 cs.CV

classification 💻 cs.CV

keywords text, end-to-end, detection, mask, neural, recognition, scene, spotting

Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network model for scene text spotting is proposed. The proposed model, named Mask TextSpotter, is inspired by the newly published work Mask R-CNN. Unlike previous methods that also accomplish text spotting with end-to-end trainable deep neural networks, Mask TextSpotter takes advantage of a simple and smooth end-to-end learning procedure, in which precise text detection and recognition are acquired via semantic segmentation. Moreover, it is superior to previous methods in handling text instances of irregular shapes, for example, curved text. Experiments on ICDAR2013, ICDAR2015 and Total-Text demonstrate that the proposed method achieves state-of-the-art results in both scene text detection and end-to-end text recognition tasks.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TedEval: A Fair Evaluation Metric for Scene Text Detectors
cs.CV 2019-07 unverdicted novelty 6.0

TedEval is a novel evaluation protocol for scene text detectors that performs instance-level matching followed by character-level scoring to provide fairer quality assessment across difficulty levels.

A Multitask Network for Localization and Recognition of Text in Images
cs.CL 2019-06 unverdicted novelty 6.0

Presents an end-to-end multitask CNN with FPN, dynamic RoI pooling, and convolutional attention for simultaneous lexicon-free text localization and recognition in complex images.