
arxiv: 1709.00138 · v1 · pith:AO22BDRJnew · submitted 2017-09-01 · 💻 cs.CV

Single Shot Text Detector with Regional Attention

classification 💻 cs.CV
keywords text, detector, accurate, attention, features, inception, multi-scale, results

We present a novel single-shot text detector that directly outputs word-level bounding boxes in a natural image. We propose an attention mechanism which roughly identifies text regions via an automatically learned attentional map. This substantially suppresses background interference in the convolutional features, which is the key to producing accurate inference of words, particularly at extremely small sizes. This results in a single model that essentially works in a coarse-to-fine manner. It departs from recent FCN-based text detectors which cascade multiple FCN models to achieve an accurate prediction. Furthermore, we develop a hierarchical inception module which efficiently aggregates multi-scale inception features. This enhances local details, and also encodes strong context information, allowing the detector to work reliably on multi-scale and multi-orientation text with single-scale images. Our text detector achieves an F-measure of 77% on the ICDAR 2015 benchmark, advancing the state-of-the-art results in [18, 28]. Demo is available at: http://sstd.whuang.org/.
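
As a rough illustration of the background-suppression idea in the abstract, the sketch below scales every feature channel, pixel by pixel, by a text-probability map. It is only a minimal sketch under stated assumptions: the abstract does not specify how the attentional map is computed or combined with the features, so the sigmoid activation, the elementwise multiplication, and the names `sigmoid` and `apply_attention` are all hypothetical choices made here for illustration, not the authors' implementation.

```python
import math


def sigmoid(x):
    """Map a raw attention score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_attention(features, attention_logits):
    """Scale features per pixel by an attention map (illustrative assumption).

    features: nested lists shaped [channels][height][width].
    attention_logits: nested lists shaped [height][width]; a higher value
        means the location is more likely to contain text.
    Returns a new [channels][height][width] structure in which responses
    at low-attention (background) locations are pushed toward zero.
    """
    attention = [[sigmoid(v) for v in row] for row in attention_logits]
    return [
        [
            [f * a for f, a in zip(feature_row, attention_row)]
            for feature_row, attention_row in zip(channel, attention)
        ]
        for channel in features
    ]


# Toy input: one channel over a 2x2 grid.
features = [[[1.0, 2.0], [3.0, 4.0]]]
# Strongly "text" at (0, 0) and (1, 1), strongly "background" at (0, 1),
# undecided at (1, 0).
logits = [[10.0, -10.0], [0.0, 10.0]]
out = apply_attention(features, logits)
```

In this toy input the response at the background location (0, 1) is driven close to zero, while responses at the confident text locations are left almost unchanged.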

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition

    cs.CV 2019-07 unverdicted novelty 7.0

    GA-DAN models cross-domain shifts in geometry and appearance spaces with multi-modal spatial learning and disentangled cycle-consistency loss, yielding superior scene text detection and recognition performance on adap...