Text localization and recognition in natural scene images
Text localization and recognition (text spotting) in natural scene images is an interesting task with many practical applications. Text spotting algorithms may be used to help visually impaired people navigate unknown environments; to build autonomous driving systems that automatically avoid collisions with pedestrians, or that identify speed limits and warn the driver about possible infractions; and to ease or solve tedious, repetitive data entry tasks that are still carried out manually. While Optical Character Recognition (OCR) from scanned documents is a solved problem, the same cannot be said for text spotting in natural images. In fact, this latter class of images contains many difficult situations that text spotting algorithms must handle in order to reach acceptable recognition rates. During my PhD research I focused on the development of novel systems for text localization and recognition in natural scene images. This thesis presents the two main works produced during these three years of PhD studies: (i) in my first work I propose a hybrid system which exploits the key ideas of region-based and connected components (CC)-based text localization approaches to localize uncommon fonts and writings in natural images; (ii) in my second work I describe a novel deep learning-based system which exploits Convolutional Neural Networks and enhanced stable CC to achieve good text spotting results on challenging data sets. During the development of both methods, my focus has always been on maintaining an acceptable computational complexity and a high reproducibility of the achieved results.
Combining Textual and Visual Features to Identify Anomalous User-generated Content
Anomaly detection is used in a wide variety of applications; such techniques aim to find patterns in data that do not conform to expected behavior. In this work we apply anomaly detection to the task of discovering anomalies in user-generated commercial product descriptions. While most other works in the literature rely exclusively on textual features, we combine those textual descriptors with visual information extracted from the media resources associated with each product description. Given a large corpus of documents, the proposed system infers the key features describing the behavioral traits of expert users, and automatically reports whenever a newly generated description contains suspicious or low-quality textual/visual elements. We show that the joint use of textual and visual features helps in obtaining a robust detection model that can be employed in an enterprise environment to automatically mark suspicious descriptions for further manual inspection.
Neural 1D Barcode Detection Using the Hough Transform
Mobile barcode-reading applications, which identify products from pictures acquired by mobile devices, are widely used by customers all over the world to perform online price comparisons or to access reviews written by other customers. Most of the currently available 1D barcode reading applications focus on effectively decoding barcodes and treat the underlying detection task as a side problem to be solved using general-purpose object detection methods. However, the majority of mobile devices do not meet the minimum working requirements of those complex general-purpose object detection algorithms, and most of the efficient, specifically designed 1D barcode detection algorithms require user interaction to work properly. In this work, we present a novel method for 1D barcode detection in camera-captured images, based on a supervised machine learning algorithm that identifies the characteristic visual patterns of 1D barcodes' parallel bars in the two-dimensional Hough Transform space of the processed images. The method we propose is angle invariant, requires no user interaction and can be effectively executed on a mobile device; it achieves excellent results on two standard 1D barcode datasets: WWU Muenster Barcode Database and ArTe-Lab 1D Medium Barcode Dataset. Moreover, we show that it is possible to enhance the performance of a state-of-the-art 1D barcode reading library by coupling it with our detection method.
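To illustrate why parallel bars leave a characteristic pattern in Hough space, here is a minimal sketch on synthetic data. It is not the method of the paper: the supervised learner is replaced by a plain vote threshold, and the function names `hough_accumulate` and `dominant_angle`, the bar positions and the `min_votes` value are all assumptions made for this example.

```python
import math
from collections import Counter


def hough_accumulate(points, n_theta=180):
    """Vote in (theta_index, rho) space for every edge point (1-degree steps)."""
    acc = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(t, rho)] += 1
    return acc


def dominant_angle(acc, min_votes):
    """Return the theta index whose strong cells gather the most votes."""
    per_theta = Counter()
    for (t, _rho), votes in acc.items():
        if votes >= min_votes:
            per_theta[t] += votes
    return per_theta.most_common(1)[0][0] if per_theta else None


# Hypothetical synthetic "barcode": five vertical bars, each 30 pixels tall.
bar_positions = (10, 14, 18, 22, 26)
bars = [(x, y) for x in bar_positions for y in range(30)]

acc = hough_accumulate(bars)
angle = dominant_angle(acc, min_votes=30)

# Strong cells at the dominant angle: one rho value per bar.
strong_rhos = sorted(
    rho for (t, rho), votes in acc.items() if t == angle and votes >= 30
)
```

All bars vote for the same angle and differ only in their distance `rho`, so the peaks line up in a single column of the accumulator. A rotated barcode shifts that column to another angle without changing its shape, which is one plausible reason a detector working in Hough space can be angle invariant.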
Augmented text character proposals and convolutional neural networks for text spotting from scene images
In this work we propose a novel method for text spotting from scene images based on augmented Multi-resolution Maximally Stable Extremal Regions and Convolutional Neural Networks. The goal of this work is to augment text character proposals so as to maximize their coverage rate over text elements in scene images, obtaining satisfactory text detection rates without requiring very deep architectures or large amounts of training data. Using simple and fast geometric transformations on multi-resolution proposals, our system achieves good results on several challenging datasets while also being computationally efficient to train and test on a desktop computer.
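The idea of augmenting proposals to raise their coverage rate can be sketched as follows. The abstract does not say which geometric transformations are used, so the scale and shift factors below, the function names `augment` and `coverage`, and the 0.5 IoU threshold are assumptions chosen only for illustration. Boxes are `(x, y, width, height)` tuples.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def augment(box, scales=(0.8, 1.0, 1.25), shifts=(-0.1, 0.0, 0.1)):
    """Generate rescaled and shifted copies of a proposal (assumed transforms)."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    variants = []
    for s in scales:
        nw, nh = w * s, h * s
        for dx in shifts:
            for dy in shifts:
                ncx, ncy = cx + dx * w, cy + dy * h
                variants.append((ncx - nw / 2, ncy - nh / 2, nw, nh))
    return variants


def coverage(proposals, truths, threshold=0.5):
    """Fraction of ground-truth boxes matched by at least one proposal."""
    hits = sum(
        1 for t in truths if any(iou(p, t) >= threshold for p in proposals)
    )
    return hits / len(truths)


# Hypothetical data: one character box and one proposal that is too tight.
truths = [(10, 10, 20, 30)]
raw = [(14, 18, 14, 20)]
augmented = [v for p in raw for v in augment(p)]

before = coverage(raw, truths)
after = coverage(augmented, truths)
```

In this toy case the raw proposal falls just under the threshold while one of its enlarged copies clears it, so coverage rises at the cost of 27 candidates per proposal, each of which a classifier would then have to score.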
Robust Angle Invariant GAS Meter Reading
In this work we propose a novel method for automatic gas meter reading from real-world images. In many countries the existing automatic technology has not been adopted: the reading is usually done manually on site, and a picture is taken with a mobile device as proof of the reading. To confirm the reading, an operator commonly carries out the tedious work of checking the proof images offline. With this contribution we aim to supply an effective system that provides real support to the validation process, reducing human effort and time. We exploit region-based techniques to localize the meter area and Maximally Stable Extremal Regions to detect the meter counter digits. The evaluation has been carried out on every step of our approach, as well as on the overall system. Although the problem is complex, the proposed method leads to good results even when applied to degraded images; it represents an effective solution to the gas meter reading problem and can be used in real applications.
Content extraction from marketing flyers
The rise of online shopping has hurt physical retailers, which struggle to persuade customers to buy products in physical stores rather than online. Marketing flyers are a great means of increasing the visibility of physical retailers, but the unstructured offers appearing in those documents cannot be easily compared with similar online deals, making it hard for a customer to understand whether it is more convenient to order a product online or to buy it from the physical shop. In this work we tackle this problem by introducing a content extraction algorithm that automatically extracts structured data from flyers. Unlike competing approaches that mainly focus on textual content or simply analyze font type, color and text positioning, we propose novel and more advanced visual features that capture the properties of graphic elements typically used in marketing materials to draw readers' attention to specific deals, obtaining excellent results and a high degree of language and genre independence.
Text Localization based on Fast Feature Pyramids and Multi-resolution Maximally Stable Extremal Regions
- …
