Electronic Letters on Computer Vision and Image Analysis (ELCVIA - Universitat Autònoma de Barcelona)
343 research outputs found
A Novel Angular Texture Pattern (ATP) Extraction Method for Crop and Weed Discrimination Using Curvelet Transformation
Weed management is the most significant process in agricultural applications for improving crop productivity and reducing herbicide application cost. Existing weed detection techniques do not perform well because of complex backgrounds and illumination variation, so a more effective weed identification technique is needed. To overcome this drawback, this paper proposes a novel Angular Texture Pattern (ATP) extraction method for crop and weed discrimination using the curvelet transform. In the proposed work, an Adaptive Median Filter (AMF) is used to filter impulse noise from the image. Plant image identification is performed using green pixel extraction and K-means clustering. A wrapping-based curvelet transform is applied to the plant image. Feature extraction is performed to obtain the angular texture pattern of the plant image. A Particle Swarm Optimization (PSO) based Differential Evolution Feature Selection (DEFS) approach is applied to select the optimal features. The selected features are then learned and passed through an RVM-based classifier to detect the weed. Edge detection and contouring are performed to identify the weed in the plant image. A fuzzy rule-based approach is applied to detect low, medium and high levels of weed patchiness. The experimental results show that the accuracy of the proposed approach is higher than that of existing Support Vector Machine (SVM) based approaches. The proposed approach achieves better performance in terms of Hausdorff distance, Jaccard distance, Dice distance, accuracy, sensitivity, and specificity.
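The abstract above names Hausdorff, Jaccard and Dice distances as evaluation measures without defining them. The following is a minimal sketch of their usual textbook definitions for two binary segmentation masks, each represented as a set of (row, column) pixel coordinates. The function names and the set-based representation are assumptions made for illustration; the paper may compute these measures differently.

```python
import math


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of pixel coordinates."""
    union = a | b
    if not union:
        return 0.0  # two empty masks are treated as identical
    return 1.0 - len(a & b) / len(union)


def dice_distance(a, b):
    """1 - 2|A ∩ B| / (|A| + |B|) for two sets of pixel coordinates."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / total


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two non-empty pixel sets."""
    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))


# Hypothetical masks: a predicted weed region and a ground-truth region.
predicted = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(jaccard_distance(predicted, truth))
print(dice_distance(predicted, truth))
print(hausdorff_distance(predicted, truth))
```

Lower values indicate closer agreement between the predicted and reference regions for all three measures.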
New contributions on line-projections in omnidirectional vision
Computer vision is attracting increasing interest in most fields of emerging technology. A challenging topic in this field is how to enlarge the field of view of camera systems to obtain more information about the environment in a single view. In particular, omnidirectional vision can be useful in many applications such as location estimation in robotics, autonomous driving and unmanned aerial vehicles. The wide field of view of omnidirectional cameras makes it possible to describe 3D scenarios using line features. On the one hand, line features represent natural landmarks in man-made environments: they are easy to understand, coincide with the edges of structural elements and are often still present in texture-less scenarios. On the other hand, long segments are especially useful for drift compensation because they are usually completely visible in the omnidirectional projection. However, in omnidirectional cameras line projections are distorted by the projection mapping and become complex curves. This thesis focuses on the geometry of line projections (line-images) in omnidirectional systems. The main topic addressed in this work is line-image extraction on different kinds of central and non-central omnidirectional images. However, due to the nature of projection in omnidirectional cameras, other topics addressed are camera calibration and, in the case of non-central cameras, 3D reconstruction from single images.
Multi-focus image fusion using maximum symmetric surround saliency detection
In digital photography, two or more objects of a scene cannot always be in focus at the same time. If we focus on one object, we may lose information about other objects, and vice versa. Multi-focus image fusion is the process of generating an all-in-focus image from several out-of-focus images. In this paper, we propose a new multi-focus image fusion method based on two-scale image decomposition and saliency detection using maximum symmetric surround. The saliency map used in this method highlights the saliency information present in the source images with well-defined boundaries. A weight map construction method based on this saliency information is developed. The weight map identifies the focused and defocused regions of the image well, so we implement a new fusion algorithm, based on the weight map, that integrates only focused-region information into the fused image. Unlike multi-scale image fusion methods, this method needs only a two-scale image decomposition, which makes it computationally efficient. The proposed method is tested on several multi-focus image datasets and compared with traditional and recently proposed fusion methods using various fusion metrics. The results show that the proposed method outperforms the existing methods.
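The pipeline described above (two-scale decomposition, a saliency-driven weight map, then fusion) can be illustrated with a deliberately simplified sketch. It works on 1-D signals, for example a single image row, uses a box filter for the base layer, and substitutes smoothed absolute detail for the maximum symmetric surround saliency used in the paper. All function names and parameters are hypothetical.

```python
def box_blur(signal, radius=1):
    """Mean filter with a window that shrinks at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def two_scale(signal, radius=1):
    """Split a signal into a smooth base layer and a detail layer."""
    base = box_blur(signal, radius)
    detail = [s - b for s, b in zip(signal, base)]
    return base, detail


def fuse(a, b, radius=1):
    """Fuse two differently focused signals with a binary weight map."""
    base_a, det_a = two_scale(a, radius)
    base_b, det_b = two_scale(b, radius)
    # Stand-in saliency (an assumption): locally averaged detail magnitude.
    sal_a = box_blur([abs(d) for d in det_a], radius)
    sal_b = box_blur([abs(d) for d in det_b], radius)
    fused = []
    for i in range(len(a)):
        w = 1.0 if sal_a[i] >= sal_b[i] else 0.0  # weight map entry
        base = w * base_a[i] + (1.0 - w) * base_b[i]
        detail = w * det_a[i] + (1.0 - w) * det_b[i]
        fused.append(base + detail)
    return fused


# Hypothetical inputs: each source is sharp on one half and flat on the other.
left_focused = [0, 10, 0, 10, 5, 5, 5, 5]
right_focused = [5, 5, 5, 5, 0, 10, 0, 10]
print(fuse(left_focused, right_focused))
```

In this toy case the weight map selects the left source on the left half and the right source on the right half, so the fused signal recovers the sharp pattern on both sides.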
Segmentation and indexation of complex objects in comic book
Born in the 19th century, comics is a visual medium used to express ideas via images, often combined with text or visual information. It is an art form that uses images deployed in sequence for graphic storytelling (sequential art), spread worldwide initially through newspapers, books and magazines. Nowadays, the development of new technologies and the World Wide Web is giving birth to a new form of paperless comics that takes advantage of the freedom of the virtual world. However, traditional comics still represent an important cultural heritage in many countries. They have not yet received the same level of attention as music, cinema or literature regarding their adaptation to the digital format. Using information technologies with digitized comic books would facilitate the exploration of digital libraries, accelerate their translation, and allow augmented reading, speech playback for the visually impaired, etc. Heritage museums such as the CIBDI (French acronym for International City of Comic books and Images), the Kyoto International Manga Museum and The Digital Comic Museum have already digitized several thousand comic albums, some of which are now in the public domain. Despite the growing marketplace for digital comics, little research has been carried out to take advantage of the added value provided by these new media. A particularity of documents is their dependence on the type of document, which often requires specific processing. The challenge of document analysis systems is to propose generic solutions for specific problems. The design process of comics is so specific that their automated analysis may be seen as a niche research field within document analysis, at the intersection of complex background, semi-structured and mixed content documents. Being at the intersection of several fields combines their difficulties. In this thesis, we review, highlight and illustrate the challenges related to comic book image analysis in order to provide a good overview of the latest research
progress in this field and the current issues. In order to cover the widest possible scope of study, we propose three different approaches for comic book image analysis. The three approaches aim to provide an automatic description of the image content. Different levels of description are discussed, from spatial positions (low level) to semantic information (high level). The first approach describes the image in an intuitive way, from simple to complex elements, using previously extracted elements to guide further processing. Simple elements such as panel, text and balloon regions are extracted first, followed by balloon tails and then comic character positions from the direction indicated by the tails. The second approach addresses independent information extraction to overcome the main drawback of the first approach: error propagation. This second method is composed of several specific extractors, one for each type of content, independent of each other. Those extractors can be used in parallel, without needing previous information, which cancels the error propagation effect. Extra processing such as balloon type classification and text recognition is also covered. The third approach introduces a knowledge-driven system that combines low and high level processing to build a scalable system for comics image understanding. This approach is intended to improve the overall precision of content extraction methods. We built an expert system composed of an inference engine and two models, one for the comics domain and another for image processing, stored in an ontology. The first model embeds the knowledge about comic books and the second models the image processing related part. These two models allow consistency analysis of extracted information and inference of the relationships between all the extracted elements, such as the reading order, the type of text (e.g.
spoken, onomatopoeic, illustrative) and the relations between speech balloons and speaking characters. The expert system combines the benefits of the first two approaches and enables high level semantic description such as the reading order, the semantics of the balloon shapes, the relations between the speech balloons and their speakers, and the interaction between the comic characters. In addition, in this thesis we provide the first public comic book image dataset and ground truth to the community, along with an overall experimental comparison of all the proposed methods and some state-of-the-art methods.
Contributions to Gait Recognition Using Multiple-Views
This thesis focuses on identifying people by the way they walk. The problem of gait recognition has been addressed using different approaches, both in the 2D and 3D domains, and using one or multiple views. However, the dependence on camera viewpoint (and therefore the dependence on the trajectory of motion) remains an open problem. This dissertation addresses the problem of dependence on the trajectory through the use of 3D reconstructions of walking humans. The use of 3D models has several advantages that are worth mentioning. First, 3D reconstructions make it possible to exploit a greater amount of information than methods that extract descriptors from 2D images alone. Second, the 3D reconstructions can be aligned along the way as if the subject had walked on a treadmill, thus providing a way to recognize people regardless of the path. Three approaches are proposed to address the dependence on the trajectory: (1) using aligned 3D reconstructions of walking humans, (2) using unaligned 3D reconstructions of walking humans, and (3) extracting a 3D description without using 3D reconstructions. Three gait descriptors are also proposed. The first focuses on describing gait by means of morphological analysis of 3D aligned volumes. The second makes use of the concept of entropy to describe the dynamics of human gait. The third aims to capture the dynamics of gait in a rotation-invariant way, which makes it suitable for recognizing people walking on both straight and curved paths, regardless of direction changes. These approaches have been tested on the "AVA Multi-View Dataset (AVAMVG)" and on the "Kyushu University 4D Gait Database (KY4D)". Both databases are specifically designed to address the problem of dependence on the viewpoint, and therefore the dependence on the trajectory.
Experimental results show that, for the approach based on aligned volumetric reconstructions, the entropy-based gait descriptor achieves the best results compared to other closely related state-of-the-art methods. However, the rotation-invariant gait descriptor achieves a recognition rate that exceeds that of the compared state-of-the-art methods without requiring the alignment of the 3D gait reconstructions.
Automated Analysis of Orthopaedic X-ray Images based on Digital-Geometric Techniques
This thesis reports several methods for automated analysis and interpretation of bone X-ray images. Automatic segmentation of the bone region in a digital X-ray image is a challenging problem because of its low contrast against the surrounding flesh. In this thesis, we propose a fully automated X-ray image segmentation technique, which is based on a variant of the entropy measure of the image. We also analyze the geometric information embedded in the long-bone contour image to identify the presence of abnormalities in the bone and perform fracture detection, fracture classification, and bone cancer diagnosis.
Anytime and Distributed Approaches for Graph Matching
Due to the inherent genericity of graph-based representations, and thanks to the improvement of computer capacities, structural representations have become more and more popular in the field of Pattern Recognition (PR). In a graph-based representation, vertices and their attributes describe objects (or parts of them) while edges represent interrelationships between the objects. Representing objects by graphs turns the problem of object comparison into graph matching (GM), where correspondences between the vertices and edges of two graphs have to be found. In the domain of GM, over the last decade, Graph Edit Distance (GED) has received particular attention due to its flexibility in matching many types of graphs. GED has been applied to a wide range of specific applications, from molecule recognition to image classification. Researchers have shed light on approximate methods that can find suboptimal solutions, hopefully close to the optimal ones, but the gap between optimal and suboptimal solutions has not yet been studied in depth. For that reason, in this thesis, we focus on exact GED algorithms. Unfortunately, exact GED methods have exponential complexity. Thus, coming up with an exact GED algorithm that can be scaled up to match the graphs involved in PR tasks is a great challenge. Two promising ways to cut computational time are search space pruning and distributed algorithms. To this end, we first propose a depth-first GED algorithm which requires less memory and search time. An evaluation of all possible solutions is performed without explicitly enumerating all of them. Candidates are discarded using an upper and lower bound strategy. To find a trade-off between speed and optimality, we describe how to convert the proposed depth-first GED method into an anytime one that is capable of delivering a first solution very quickly.
It can then find a sequence of improved solutions and eventually converge to the optimal solution, instead of providing one and only one solution (i.e., the optimal solution). Given more time, anytime methods can also reach the optimal solution. To illustrate the usage of anytime GM algorithms, we convert our depth-first GED algorithm into an anytime one. We analyze the properties of such methods for solving GM problems and consider their performance in terms of the accuracy of the provided solution compared to the optimal one or the best one found by state-of-the-art methods. This thesis is also considered a first attempt to reduce the run time of exact GED methods using parallel and distributed approaches. Two parallel and distributed GED approaches are put forward; both of them are based on the depth-first GED method. The search space is decomposed into smaller search trees which are solved independently in a parallel or a distributed manner. To benchmark the proposed GED methods, we propose not only assessing GED methods in a classification context but also evaluating them in a graph-level one (i.e., evaluating their distance and matching accuracy). Due to the exponential complexity of exact GED algorithms, and in order to obtain this kind of information about methods, we propose analyzing the behavior of the eight compared methods under time and memory constraints. In addition to the performance evaluation metrics, we propose a graph database repository dedicated to GED. In this repository, we add graph-level information to well-known and publicly used databases. The added information consists of the best found edit distance of each pair of graphs as well as the vertex-to-vertex and edge-to-edge mappings corresponding to the best found distance. This information helps in assessing the feasibility of exact and approximate GED methods.
This thesis calls into question the common belief that it is impossible to use exact error-tolerant GM methods in real-world applications when matching large graphs, or even in a classification context. We argue and show that a new type of GM, referred to as anytime methods, can be successful in a graph-level context as well as a classification one. Anytime videos, pseudo-codes and the publications related to the thesis are publicly available at: http://www.rfai.li.univ-tours.fr/PagesPerso/zabuaisheh/home.html. The thesis is also publicly available at: http://www.rfai.li.univ-tours.fr/Documents/Articles_RFAI/PhD2016zeina.pdf
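The idea of a depth-first, anytime GED search with bound-based pruning can be sketched as follows. This is only an illustrative toy and not the thesis's algorithm: it assumes undirected graphs with labeled vertices, unlabeled edges and unit edit costs, and it uses a very simple lower bound (the cost already incurred plus the difference in the number of unmatched vertices). The names `graph`, `partial_cost` and `anytime_ged` are hypothetical.

```python
def graph(labels, edges):
    """Build a graph as (vertex label list, set of undirected edges)."""
    return list(labels), {frozenset(e) for e in edges}


def partial_cost(g1, g2, mapping):
    """Unit-cost edits already implied by mapping the first len(mapping) vertices."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    k = len(mapping)
    # Vertex deletions (None) and label substitutions cost 1 each.
    cost = sum(1 for i, j in enumerate(mapping)
               if j is None or labels1[i] != labels2[j])
    used = {j for j in mapping if j is not None}
    matched = 0
    for edge in edges1:
        u, v = tuple(edge)
        if u < k and v < k:
            a, b = mapping[u], mapping[v]
            if a is not None and b is not None and frozenset((a, b)) in edges2:
                matched += 1
            else:
                cost += 1  # edge of g1 has no counterpart: deletion
    inner2 = sum(1 for e in edges2 if e <= used)
    return cost + inner2 - matched, used, inner2  # unmatched g2 edges: insertions


def anytime_ged(g1, g2):
    """Yield (cost, mapping) pairs of strictly decreasing cost; the last is optimal."""
    n1, n2 = len(g1[0]), len(g2[0])
    best = float("inf")  # upper bound: cost of the best solution so far
    stack = [[]]         # explicit stack gives a depth-first traversal
    while stack:
        mapping = stack.pop()
        cost, used, inner2 = partial_cost(g1, g2, mapping)
        rest1, rest2 = n1 - len(mapping), n2 - len(used)
        if rest1 == 0:
            # Complete mapping: insert leftover g2 vertices and their edges.
            total = cost + rest2 + (len(g2[1]) - inner2)
            if total < best:
                best = total
                yield best, mapping
            continue
        if cost + abs(rest1 - rest2) >= best:
            continue  # lower bound cannot beat the upper bound: prune
        free = [j for j in range(n2) if j not in used]
        for j in [None] + free[::-1]:  # pushed last, explored first
            stack.append(mapping + [j])


g_a = graph("AB", [(0, 1)])
g_b = graph("BA", [(0, 1)])
for cost, mapping in anytime_ged(g_a, g_b):
    print(cost, mapping)
```

Because the function is a generator, a caller can stop consuming it when a time budget runs out and keep the best solution seen so far, which is the behavior the abstract attributes to anytime methods.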
A Hybrid Particle Swarm Optimization with Affine Transformation Approach for Cloud Free Multi-Temporal Image Registration
Image registration is a major part of image categorization and cluster formation in multi-temporal image processing. The images are affected by different factors such as cloud shadow, water level, building shadows, etc. In this paper, an enhanced registration process and a cloud removal technique are proposed for image enhancement. The Daemons, Combined Registration and Segmentation (CRS), Markov Random Field (MRF) and Mutual Information (MI) based approaches result in high computational complexity and low values of the edge preservation measure (QAB/F) and of Mutual Information in image registration. In order to maximize the edge preservation measure and MI with minimum computational time, this paper proposes a Particle Swarm Optimization (PSO) based affine transformation technique. The computation time of the proposed technique is measured against the number of pixels of an image and compared with the existing CRS and MRF methods for a number of images. A comparative analysis of QAB/F and MI against the traditional Clock Point-Least Square (CP-LS), Multi-Focus Image Fusion (MFIF) and Discrete Wavelet Transform (DWT) methods is presented to confirm its effectiveness. The simulation results confirm that the proposed transformation registers images effectively in multi-temporal image processing.
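To make the idea of PSO-driven affine registration concrete, here is a minimal sketch in which a basic particle swarm searches for the six parameters of a 2-D affine transform. As a simplifying assumption, the objective is the squared error between corresponding control points rather than the mutual information or edge preservation measures discussed in the abstract. The swarm settings (`w`, `c1`, `c2`, particle count) and all function names are illustrative choices, not values taken from the paper.

```python
import random


def apply_affine(params, point):
    """Apply x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    a, b, tx, c, d, ty = params
    x, y = point
    return a * x + b * y + tx, c * x + d * y + ty


def alignment_error(params, src, dst):
    """Sum of squared distances between transformed src points and dst points."""
    err = 0.0
    for p, q in zip(src, dst):
        px, py = apply_affine(params, p)
        err += (px - q[0]) ** 2 + (py - q[1]) ** 2
    return err


def pso(cost, dim, n_particles=20, iters=100, seed=0, w=0.7, c1=1.5, c2=1.5):
    """Basic global-best PSO; returns (best position, best cost, cost history)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-2.0, 2.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
        history.append(gbest_cost)
    return gbest, gbest_cost, history


# Hypothetical control points and a known transform used to generate targets.
src_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
true_params = (1.0, 0.1, 0.5, -0.1, 1.0, -0.3)
dst_pts = [apply_affine(true_params, p) for p in src_pts]
best, best_cost, history = pso(lambda p: alignment_error(p, src_pts, dst_pts), dim=6)
print(best_cost)
```

The recorded global-best cost never increases from one iteration to the next, so `history` gives a simple view of how quickly the swarm settles.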
Image Retrieval: Modelling Keywords via Low-level Features
With the advent of cheap digital recording and storage devices and the rapidly increasing popularity of online social networks that make extensive use of visual information, like Facebook and Instagram, image retrieval has regained considerable attention among researchers in the areas of image indexing and retrieval. Image retrieval methods mainly fall into content-based and text-based frameworks. Although content-based image retrieval has attracted a large amount of research interest, the difficulties in querying by example push end users towards text queries. Searching by text queries yields more effective and accurate results that meet the needs of the users while preserving their familiarity with the way traditional search engines operate. However, text-based image retrieval requires images to be annotated, i.e. to be related to text information. Much effort has been invested in automatic image annotation methods [1], since the manual assignment of keywords (which is necessary for text-based image retrieval) is a time-consuming and labour-intensive procedure [2]. In automatic image annotation, a manually annotated set of data is used to train a system for the identification of the joint or conditional probability of an annotation occurring together with a certain distribution of feature vectors corresponding to image content [3]. Different models and machine learning techniques have been developed to learn the correlation between image features and textual words based on examples of annotated images. Learned models of this correlation are then applied to predict keywords for unseen images [4]. In the literature on automatic semantic image annotation, proposed approaches tend to classify images using only abstract terms or using holistic image features for both abstract terms and object classes. The extraction and selection of low-level features, either holistic or from particular image areas, is of primary importance for automatic image annotation.
This is true for both the content-based and the text-based retrieval paradigm. In the former case, the use of appropriate low-level features leads to accurate and effective object class models used in object detection, while in the latter case, the better the low-level features are, the easier the learning of keyword models is. The intent of image classification is to assign the content of the input image to one of several keyword classes. A proper image annotation may contain more than one keyword that is relevant to the image content, so a reclassification process is required in this case, as well as whenever a new keyword class is added to the classification scheme. The creation of separate visual models for all keyword classes adds significant value in automatic image annotation, since several keywords can be assigned to the input image. As the number of keyword classes increases, the number of keywords assigned to the images also increases, and there is no need for reclassification. However, keyword modeling raises various issues, such as the large amount of manual effort required in developing the training data, the differences in interpretation of image contents, and the inconsistency of the keyword assignments among different annotators. This thesis focuses on image retrieval using keywords from the perspective of machine learning. It covers different aspects of the current research in this area, including low-level feature extraction, creation of training sets and development of machine learning methodologies.
It also proposes the idea of addressing automatic image annotation by creating visual models, one for each available keyword, and presents several examples of the proposed idea by comparing different features and machine learning algorithms in creating visual models for keywords referring to the athletics domain. The idea of automatic image annotation through independent keyword visual models is divided into two main parts: training and automatic image annotation. In the first part, visual models for all available keywords are created using the one-against-all training paradigm, while in the second part, annotations are produced for a given image based on the output of these models once they are fed with a feature vector extracted from the input image. An accurate manually annotated dataset containing pairs of images and annotations is a prerequisite for successful automatic image annotation. Since manual annotations are likely to contain human judgment errors and subjectivity in interpreting the image, the current thesis investigates the factors that influence the creation of manually annotated image datasets [5]. It also proposes the idea of modeling the knowledge of several people by creating visual models using such training data, aiming to significantly improve the ultimate efficiency of image retrieval systems [6]. Moreover, it proposes a new algorithm for the extraction of low-level features. The Spatial Histogram of Keypoints (SHiK) [7] keeps the spatial information of localized keypoints, in an effort to overcome the limitations caused by the non-fixed and huge dimensionality of the SIFT feature vector when used in machine learning frameworks. SHiK partitions the image into a fixed number of ordered sub-regions based on the Hilbert space-filling curve and counts the localized keypoints found inside each sub-region.
The resulting spatial histogram is a compact and discriminative low-level feature vector that shows significantly improved performance on classification tasks.
References
[1] D. Zhang, M. M. Islam, G. Lu, “A review on automatic image annotation techniques”, Pattern Recognition, 45:346-362, 2012.
[2] A. Hanbury, “A survey of methods for image annotation”, Journal of Visual Languages & Computing, 19(5):617-627, 2008.
[3] K. Athanasakos, V. Stathopoulos, J. Jose, “A framework for evaluating automatic image annotation algorithms”, Lecture Notes in Computer Science, 5993:217-228, 2010.
[4] R. Zhang, Z. Zhang, M. Li, H. J. Zhang, “A probabilistic semantic model for image annotation and multimodal image retrieval”, Multimedia Systems, 12(1):27-33, 2006.
[5] Z. Theodosiou, N. Tsapatsoulis, “Semantic Gap Between People: An Experimental Investigation based on Image Annotation”, Proc. of the 7th International Workshop on Semantic Media Adaptation and Personalization, Luxembourg, 73-77, 2012.
[6] Z. Theodosiou, N. Tsapatsoulis, “Modelling Crowdsourcing Originated Keywords within the Athletics Domain”, Artificial Intelligence Applications and Innovations, IFIP Advances in Information and Communication Technology, 381:404-413, 2012.
[7] Z. Theodosiou, N. Tsapatsoulis, “Spatial Histogram of Keypoints (SHiK)”, Proc. of the IEEE International Conference on Image Processing, Melbourne, 2924-2928, 2013.
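The SHiK idea described in the abstract above (partition the image into sub-regions ordered along a Hilbert curve, then count keypoints per sub-region) can be sketched as below. The grid size, the cell assignment rule and the function names are assumptions made for illustration, and the published descriptor [7] may differ in its details. The Hilbert index computation follows the standard iterative coordinate-to-distance conversion for an n x n grid where n is a power of two.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the curve orientation stays consistent.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def spatial_histogram(keypoints, width, height, n=4):
    """Count keypoints per grid cell, with bins ordered along the Hilbert curve."""
    hist = [0] * (n * n)
    for x, y in keypoints:
        cx = min(int(x * n / width), n - 1)   # clamp points on the right border
        cy = min(int(y * n / height), n - 1)  # clamp points on the bottom border
        hist[hilbert_index(n, cx, cy)] += 1
    return hist


# Hypothetical keypoint locations in a 100 x 100 image.
points = [(10, 10), (12, 8), (90, 10), (60, 60)]
print(spatial_histogram(points, 100, 100, n=4))
```

The histogram length depends only on the grid size, not on the number of detected keypoints, which is the fixed-dimensionality property the abstract contrasts with raw SIFT descriptors.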
Image database indexing: Emotional impact assessing
The goal of my PhD was to propose an efficient approach for emotional impact recognition based on CBIR techniques (descriptors, image representation). The main idea is to classify images according to their emotion, which can be "Negative", "Neutral" or "Positive". Emotion is related to the image content and also to personal feelings. To achieve our goal, we first need a correctly assessed image database. Our first contribution concerns this aspect: we proposed a set of 350 diversified images rated by people around the world. In addition to our choice of CBIR methods, we studied the impact of visual saliency on the subjective evaluations and of interest region segmentation on classification. The results are promising and indicate that CBIR methods are useful for emotion recognition. The chosen descriptors are complementary, and their performance is consistent on the database we have built and on IAPS, a reference database for the analysis of the emotional impact of images.