Synthetic Data for Text Localisation in Natural Images

Andrea Vedaldi; Andrew Zisserman; Ankush Gupta

arxiv: 1604.06646 · v1 · pith:XBS6RLO2new · submitted 2016-04-22 · 💻 cs.CV

keywords: images, text, detection, natural, synthetic, engine, fcrn, method

In this paper we introduce a new method for text detection in natural images. The method comprises two contributions. First, a fast and scalable engine to generate synthetic images of text in clutter. This engine overlays synthetic text onto existing background images in a natural way, accounting for the local 3D scene geometry. Second, we use the synthetic images to train a Fully-Convolutional Regression Network (FCRN) which efficiently performs text detection and bounding-box regression at all locations and multiple scales in an image. We discuss the relation of FCRN to the recently introduced YOLO detector, as well as other end-to-end object detection systems based on deep learning. The resulting detection network significantly outperforms current methods for text detection in natural images, achieving an F-measure of 84.2% on the standard ICDAR 2013 benchmark. Furthermore, it can process 15 images per second on a GPU.
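
For readers unfamiliar with the F-measure quoted in the abstract, the following is a minimal sketch of how it is conventionally computed as the harmonic mean of precision and recall. The function name `f_measure` and the input values are hypothetical and chosen only for illustration; they are not results from the paper, and the ICDAR 2013 protocol for matching detections to ground truth is not reproduced here.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    # Guard against division by zero when a detector finds nothing correct.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision and recall, for illustration only.
example_score = f_measure(0.9, 0.8)
print(f"F-measure: {example_score:.3f}")
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a detector cannot reach a high F-measure by trading away recall for precision, or the reverse.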

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Do You Need Text Rectification? Soft Attention Mask Embedding for Rectification-Free Scene Text Spotting
cs.CV 2026-05 unverdicted novelty 6.0

SAME-Net adds a differentiable soft attention mask embedding module to achieve rectification-free end-to-end scene text spotting with 84.02% H-mean on Total-Text.