Multi-script Text Extraction from Natural Scenes
Scene text extraction methodologies are usually
based on the classification of individual regions or patches, using
a priori knowledge of a given script or language. Human perception
of text, on the other hand, is based on perceptual organisation
through which text emerges as a perceptually significant group of
atomic objects. Therefore, humans are able to detect text even in
languages and scripts they have never seen before. In this paper, we argue
that the text extraction problem can be posed as the detection of
meaningful groups of regions. We present a method built around
a perceptual organisation framework that exploits collaboration
of proximity and similarity laws to create text-group hypotheses.
Experiments demonstrate that our algorithm is competitive with
state-of-the-art approaches on a standard dataset covering text
in variable orientations and two languages.
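As a rough illustration of the grouping idea (a sketch, not the paper's actual algorithm or feature set), bottom-up agglomerative clustering over hypothetical per-region features that mix proximity cues (centroid position) with similarity cues (grey level, size) can produce text-group hypotheses of the kind described above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical per-region features: centroid (x, y) for proximity,
# plus mean grey value and height for similarity. The values below
# are invented for illustration only.
regions = np.array([
    [10.0, 50.0, 120.0, 20.0],   # three nearby, similar regions
    [35.0, 52.0, 118.0, 21.0],   # (e.g. characters of one word)
    [60.0, 51.0, 122.0, 19.0],
    [300.0, 400.0, 30.0, 80.0],  # a distant, dissimilar region
])

# Normalise each feature so proximity and similarity cues act on a
# comparable scale (an assumption, not the paper's exact weighting).
feats = (regions - regions.mean(axis=0)) / (regions.std(axis=0) + 1e-9)

# Single-linkage agglomerative clustering builds group hypotheses
# bottom-up; cutting the dendrogram yields candidate text groups.
Z = linkage(feats, method="single")
labels = fcluster(Z, t=2.0, criterion="distance")
```

With these toy features the three similar, nearby regions fall into one group hypothesis, while the distant region is left on its own.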
Our method is inspired by the human perception of textual
content, which is largely based on perceptual organisation. The
proposed method requires practically no training, as the
perceptual-organisation-based analysis is parameter free. It is
entirely independent of the language and script in which the text
appears, it deals efficiently with any type of font and text size,
and it makes no assumptions about the orientation of the text.
Qualitative results demonstrate competitive performance and faster
computation.
Gomez L. and Karatzas D., "Multi-script Text Extraction from Natural Scenes", 12th International Conference on Document Analysis and Recognition, 2013.
The source code implementation of the paper can be found at .
On-line text extractor
You can test how our method performs at localising text in an image of your choice. You can upload images (jpg or png) of up to 600 KB using the form below. A few words of caution in case the output is not what you expect: our method assumes that characters are non-overlapping connected components of the image, with a constant colour and a noticeable contrast against their immediate background. Moreover, some parameters of the region decomposition (the MSER algorithm) have been validated on the ICDAR2003 and MSRA-TD500 training sets, and may not be appropriate for your image. To get an idea of the kinds of images on which our method performs well (and on which it does not), you can take a look at the gallery of qualitative results on the KAIST dataset for the task of text segmentation, or on the MSRA-TD500 and ICDAR2003 datasets for the task of text localisation.
A. Desolneux, L. Moisan, and J.-M. Morel, "A grouping principle and four applications", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, pp. 508-513, 2003.
A. Fred and A. Jain, "Combining multiple clusterings using evidence accumulation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 6, pp. 835-850, 2005.
S. Lee, M. S. Cho, K. Jung, and J. H. Kim, "Scene text extraction with edge constraint and text collinearity", in Proc. ICPR, 2010.
C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, "Detecting texts of arbitrary orientations in natural images", in Proc. CVPR, 2012.
S. M. Lucas et al., "ICDAR 2003 robust reading competitions: entries, results, and future directions", IJDAR, vol. 7, no. 2-3, pp. 105-122, 2005.