Detecting Text in Natural Image with Connectionist Text Proposal Network

Pan He; Tong He; Weilin Huang; Yu Qiao; Zhi Tian

arXiv: 1609.03605 · v1 · pith:7MZIJQPS · submitted 2016-09-12 · cs.CV


classification: cs.CV

keywords: text, CTPN, image, network, proposal, connectionist, convolutional, model


Abstract

We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural images. The CTPN detects a text line as a sequence of fine-scale text proposals directly in convolutional feature maps. We develop a vertical anchor mechanism that jointly predicts the location and the text/non-text score of each fixed-width proposal, considerably improving localization accuracy. The sequential proposals are naturally connected by a recurrent neural network, which is seamlessly incorporated into the convolutional network, resulting in an end-to-end trainable model. This allows the CTPN to exploit rich context information in the image, enabling it to detect extremely ambiguous text. The CTPN works reliably on multi-scale and multi-language text without further post-processing, departing from previous bottom-up methods that require multi-step post-processing. It achieves F-measures of 0.88 and 0.61 on the ICDAR 2013 and 2015 benchmarks, surpassing recent results [8, 35] by a large margin. The CTPN is computationally efficient, running at 0.14 s/image with the very deep VGG16 model [27]. An online demo is available at: http://textdet.com/.
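The vertical anchor mechanism described in the abstract can be illustrated with a short sketch of how fixed-width anchors and their vertical regression targets might be built. The concrete values here are assumptions taken from the full CTPN paper rather than from the abstract: a proposal width of 16 px, ten anchor heights growing from 11 px to about 273 px by dividing by 0.7 at each step, and a (v_c, v_h) parameterization of centre offset and log height. The text/non-text classification branch, the recurrent layer, and side-refinement are omitted, and all function names are hypothetical.

```python
import math

# Assumed CTPN-style settings (from the full paper, not stated in the abstract).
ANCHOR_WIDTH = 16   # fixed proposal width in pixels (assumed equal to the feature-map stride)
NUM_ANCHORS = 10    # vertical anchors per feature-map location


def anchor_heights(k=NUM_ANCHORS, h_min=11.0, ratio=0.7):
    """Anchor heights grow geometrically: 11, 11/0.7, 11/0.7**2, ... (about 273 px at k=10)."""
    return [h_min / ratio ** i for i in range(k)]


def anchors_at(col, row, stride=ANCHOR_WIDTH):
    """All k anchors (x1, y1, x2, y2) centred on one feature-map cell; only the height varies."""
    cx = col * stride + stride / 2.0
    cy = row * stride + stride / 2.0
    half_w = ANCHOR_WIDTH / 2.0
    return [(cx - half_w, cy - h / 2.0, cx + half_w, cy + h / 2.0) for h in anchor_heights()]


def encode(cy, h, cy_a, h_a):
    """Vertical regression targets of a box relative to an anchor: relative centre shift and log height ratio."""
    return (cy - cy_a) / h_a, math.log(h / h_a)


def decode(v_c, v_h, cy_a, h_a):
    """Invert encode(): recover the box's centre y and height from predicted offsets."""
    return v_c * h_a + cy_a, h_a * math.exp(v_h)


if __name__ == "__main__":
    heights = anchor_heights()
    print([round(h, 1) for h in heights])
    # A hypothetical text slice centred at y=120 with height 40, matched to the fifth anchor at y=104.
    targets = encode(120.0, 40.0, 104.0, heights[4])
    print(targets, decode(*targets, 104.0, heights[4]))
```

Because the width of every proposal is fixed, only two vertical quantities need to be regressed per anchor. This is what lets the detector treat a text line as a row of narrow slices instead of predicting one box with an unconstrained aspect ratio.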

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Multitask Network for Localization and Recognition of Text in Images
cs.CL 2019-06 unverdicted novelty 6.0

Presents an end-to-end multitask CNN with FPN, dynamic RoI pooling, and convolutional attention for simultaneous lexicon-free text localization and recognition in complex images.