Electronic Letters on Computer Vision and Image Analysis (ELCVIA - Universitat Autònoma de Barcelona)
343 research outputs found
Learning audio and image representations with bio-inspired trainable feature extractors
From a very young age, we can quickly learn new concepts and distinguish between different kinds of objects or sounds. If we see a single object or hear a particular sound, we are then able to recognize that sample, or even different versions of it, in other scenarios. For example, if one sees an iron chair and associates the object with the general concept of “chairs”, one will also be able to detect and recognize wooden or wicker chairs. Similarly, when we hear the sound of a particular event, such as a scream, we are then able to recognize other kinds of scream that occur in different environments. We continuously learn representations of the real world, which we then use to understand new and changing environments. In the field of pattern recognition, traditional methods typically require a careful design of data representations (i.e. features), which involves considerable domain knowledge and effort by experts. Recently, approaches for the automated learning of representations from training data were introduced, based on popular deep learning techniques and convolutional neural networks (CNNs). Representation learning aims at avoiding the engineering of hand-crafted features and at providing automatically learned features suitable for the recognition task. In this work, we proposed novel trainable filters for representation learning in audio and image processing. The structure of these filters is not fixed in the implementation but rather configured directly from single prototype patterns of interest [4]. In the context of audio processing, we focused on the problem of audio event detection and classification in noisy environments, including cases where the signal-to-noise ratio (SNR) is null or negative.
We released two data sets, namely the MIVIA audio events and the MIVIA road events data sets, and obtained baseline results (recognition rate of about 85%) with a real-time method for event detection based on the bag-of-features classification scheme [3, 2]. We designed novel trainable feature extractors, which we call COPE (Combination of Peaks of Energy), that are able to detect specific constellations of energy peak points in time-frequency representations of input audio signals [8]. The particular constellation of energy peaks to be detected by a COPE feature extractor is determined in an automatic configuration process performed on a given prototype sound. The design of COPE feature extractors was inspired by some functions of the cochlea membrane and the inner hair cells in the inner auditory system, which convert sound pressure waves into neural stimuli on the auditory nerve. We proposed a method that uses COPE feature extractors together with a classification system to perform real-time audio event detection and classification, including cases where sounds have null or negative SNR. The recognition rates (over 90%) that we obtained on several benchmark data sets for audio event detection in different contexts are higher than those of state-of-the-art approaches. In the second part of the work, we introduced B-COSFIRE filters for the detection of elongated and curvilinear patterns in images and applied them to the delineation of blood vessels in retinal images [1, 6]. The B-COSFIRE filters are trainable, that is, their structure is automatically configured from prototype elongated patterns. The design of the B-COSFIRE filters is inspired by the functions of some neurons, called simple cells, in area V1 of the visual system, which fire when presented with line or contour stimuli.
A B-COSFIRE filter achieves orientation selectivity by computing the weighted geometric mean of the outputs of a pool of Difference-of-Gaussians (DoG) filters whose supports are aligned in a collinear manner. Rotation invariance is efficiently obtained by appropriately shifting the DoG filter responses. After configuring a large bank of B-COSFIRE filters selective for vessels (i.e. lines) and vessel endings (i.e. line endings) of various thicknesses (i.e. scales), we employed different techniques based on information theory and machine learning to select an optimal subset of B-COSFIRE filters for the vessel delineation task [5, 7]. We used the selected filters as feature extractors to construct a pixel-wise feature vector, which we combined with a classifier to label the pixels of the test image as vessel or non-vessel pixels. We carried out experiments on public benchmark data sets (DRIVE, STARE, CHASE DB1 and HRF) and the results that we achieved are higher than those of many existing methods. We studied the computational requirements of the proposed algorithms in order to evaluate their applicability in real-world applications and their fulfillment of the real-time constraints imposed by the considered problems. The MATLAB implementations of the proposed algorithms are publicly released for research purposes. This work contributes to the development of algorithms for representation learning in audio and image processing and promotes their use in higher-level pattern recognition systems.
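The weighted geometric mean mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the DoG responses at the collinear support points have already been computed (and shifted) for one pixel, so it is not the authors' MATLAB implementation. The function names and the Gaussian weighting by distance from the filter center are assumptions made for illustration only.

```python
import math

def gaussian_weights(distances, sigma):
    """Hypothetical weighting: responses far from the filter center count less."""
    return [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in distances]

def weighted_geometric_mean(responses, weights):
    """Return (prod_i r_i ** w_i) ** (1 / sum_i w_i) for non-negative responses r_i."""
    if not responses or len(responses) != len(weights):
        raise ValueError("responses and weights must be non-empty and of equal length")
    if any(r < 0 for r in responses) or any(w <= 0 for w in weights):
        raise ValueError("responses must be non-negative and weights positive")
    # A single zero response suppresses the whole output (AND-like behaviour).
    if any(r == 0 for r in responses):
        return 0.0
    # Work in the log domain to avoid overflow or underflow of the product.
    log_sum = sum(w * math.log(r) for r, w in zip(responses, weights))
    return math.exp(log_sum / sum(weights))
```

Because the combination is multiplicative, the output is non-zero only when every DoG response in the pool is present, which is what makes the filter respond to a complete line segment rather than to isolated blobs.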
Feature extraction algorithms from MRI to evaluate quality parameters on meat products by using data mining
This thesis proposes a new methodology to determine the quality characteristics of meat products (Iberian loin and ham) in a non-destructive way. To this end, new algorithms have been developed to analyze Magnetic Resonance Imaging (MRI) data, and data mining techniques have been applied to the data obtained from the images. The general procedure consists of obtaining MRI of the meat products and applying different computer vision algorithms (mainly texture and fractal approaches), which allow the extraction of sets of computational features. Figure 1 shows the design of the proposed procedure. To achieve this, different studies have been carried out, based on: high-field and low-field MRI scanners; different acquisition sequences: Spin Echo (SE), Gradient Echo (GE) and Turbo 3D (T3D); different texture approaches: Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM) and Neighboring Gray Level Dependence Matrix (NGLDM); and fractal algorithms: Classical Fractal Algorithm (CFA), Fractal Texture Algorithm (FTA) and One Point Fractal Texture Algorithm (OPFTA). FTA [1] and OPFTA [2] have been developed in this thesis. Both allow MRI images to be analyzed properly, with OPFTA standing out for its simplicity and lower computational cost. At the same time, the meat products, Iberian hams and loins, were also analyzed by means of physico-chemical and sensory techniques. Databases were constructed with all these data. Different data mining techniques have been applied to them: deductive (Multiple Linear Regression (MLR)) [3], classification (Decision Trees (DT) and Rules-based Systems (RBS)) [4], and prediction techniques [5-7]. Figure 2 shows the MRI images of fresh and dry-cured Iberian loins (Figure 2A and 2B) and fresh and dry-cured hams (Figure 2C and 2D). The accuracy of the analysis of the quality parameters of Iberian ham and loin is affected by the MRI acquisition sequence, the algorithm used to analyze the images and the data mining technique applied.
Considering the data mining techniques, MLR and DT are appropriate, respectively, to deduce physico-chemical parameters of hams and to classify hams as a function of their salt content. Regarding the predictive technique, MLR is indicated: it allows obtaining equations that determine the physico-chemical characteristics and sensory attributes of Iberian loins and hams with a high degree of reliability, and thus analyzing the quality of these meat products in a non-destructive, efficient, effective and accurate way.
Uncertainty Theories Based Iris Recognition System
The performance and robustness of iris-based recognition systems still suffer from imperfections in the biometric information. This paper attempts to address these imperfections, which are an important problem for real systems. We propose a new method for iris recognition based on uncertainty theories to treat imperfect iris features. Several factors cause different types of degradation in iris data, such as the poor quality of the acquired pictures, the partial occlusion of the iris region by light spots, lenses, eyeglasses, hair or eyelids, and adverse illumination and/or contrast. All of these factors are open problems in the field of iris recognition; they affect the performance of iris segmentation, feature extraction and decision making, and appear as imperfections in the extracted iris features. The aim of our experiments is to model the variability and ambiguity in the iris data with uncertainty theories. This paper illustrates the importance of these theories for modeling and/or treating the encountered imperfections. Several comparative experiments are conducted on two subsets of the CASIA-V4 iris image database, namely Interval and Synthetic. Compared to a typical iris recognition system relying on the uncertainty theories, experimental results show that our proposed model improves iris recognition in terms of Equal Error Rate (EER), Area Under the receiver operating characteristic Curve (AUC) and Accuracy Recognition Rate (ARR).
MMKK++ algorithm for clustering heterogeneous images into an unknown number of clusters
In this paper we present an automatic clustering procedure whose main aim is to predict the number of clusters in a set of unknown, heterogeneous images. We used the state-of-the-art Fisher vector as the mathematical representation of the images, and these vectors served as input data points for the clustering algorithm. As clustering methods, we implemented a novel variant of K-means, the kernel K-means++, and furthermore the min-max kernel K-means++ (MMKK++). The proposed approach examines several candidate cluster numbers and uses the law of large numbers to choose the optimal one. We conducted experiments on four image sets to demonstrate the efficiency of our solution. The first two image sets are subsets of different popular collections; the third is their union; the fourth is the complete Caltech101 image set.
A New feature extraction method to Improve Emotion Detection Using EEG Signals
Since emotion plays an important role in human life, the demand for and importance of automatic emotion detection have grown with the increasing role of human-computer interface applications. In this research, the focus is on emotion detection from electroencephalogram (EEG) signals. The system derives a mechanism of quantification of basic emotions using. So far, several methods have been reported, which generally use different processing algorithms, evolutionary algorithms, neural networks and classification algorithms. The aim of this paper is to develop a smart method that improves the accuracy of emotion detection by using discrete signal processing techniques and a support vector machine classifier optimized with a genetic evolutionary algorithm. The obtained results show that the proposed method achieves an accuracy of 93.86% in the detection of 4 emotions, which is higher than that of state-of-the-art methods.
Recognition and retrieval of objects in diverse applications
This work proposes and evaluates object description and retrieval techniques in different real applications. It addresses the classification of boar spermatozoa according to acrosome integrity using several methods based on invariant local features. In addition, it provides two new methods for insert localisation and an automatic solution for the recognition of broken inserts in edge profile milling heads that can be set up in-process without delaying any machining operations. Finally, it evaluates different clusterings of keypoints for object retrieval and proposes a new descriptor, named colour COSFIRE, in the scope of the European project Advisory System Against Sexual Exploitation of Children
Semantic Video Concept Detection using Novel Mixed-Hybrid-Fusion Approach for Multi-Label Data
The performance of a semantic concept detection method depends on the selection of the low-level visual features used to represent the key-frames of a shot and on the selection of the feature-fusion method. This paper proposes a set of low-level visual features of considerably smaller size, and also proposes novel ‘hybrid-fusion’ and ‘mixed-hybrid-fusion’ approaches, which are formulated by combining the early-fusion and late-fusion strategies proposed in the literature. In the initially proposed hybrid-fusion approach, the features from the same feature group are combined using early fusion before classifier training, and the concept probability scores from multiple classifiers are merged using late fusion to obtain the final detection scores. A feature group is defined as the features from the same feature family, such as color moment. The hybrid-fusion approach is then refined into the ‘mixed-hybrid-fusion’ approach to further improve the detection rate. This paper presents a novel video concept detection system for multi-label data using the proposed mixed-hybrid-fusion approach. Support Vector Machines (SVMs) are used to build classifiers that produce concept probabilities for a test frame. The proposed approaches are evaluated on the multi-label TRECVID2007 development dataset. Experimental results show that the proposed mixed-hybrid-fusion approach performs better than the proposed hybrid-fusion approach and outperforms all conventional early-fusion and late-fusion approaches by large margins with respect to feature set dimensionality and Mean Average Precision (MAP) values.
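The hybrid-fusion idea described in the abstract above (early fusion within a feature group, late fusion across the per-group classifiers) might be sketched as follows. This is only an illustration under stated assumptions: early fusion is taken to be concatenation, late fusion is taken to be a plain average of the probability scores, and the classifiers are arbitrary callables standing in for trained SVMs. The abstract does not specify the actual merging rule, and all function names are hypothetical.

```python
def early_fusion(feature_vectors):
    """Concatenate the feature vectors of one feature group into a single vector."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused

def late_fusion(scores):
    """Merge per-classifier concept probabilities (assumed rule: arithmetic mean)."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)

def hybrid_fusion(groups, classifiers):
    """Score one key-frame for one concept.

    groups: dict mapping a group name to a list of feature vectors.
    classifiers: dict mapping the same names to callables that take the fused
    vector and return a probability (stand-ins for trained SVMs).
    """
    scores = [classifiers[name](early_fusion(vectors))
              for name, vectors in groups.items()]
    return late_fusion(scores)
```

In this arrangement each classifier only sees the fused vector of its own feature family, so the dimensionality per classifier stays small, while the final score still draws on every group.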
Recognition of Facial Expressions using Local Mean Binary Pattern
In this paper, we propose a novel appearance-based local feature extraction technique called Local Mean Binary Pattern (LMBP), which efficiently encodes the local texture and global shape of the face. The LMBP code is produced by weighting the thresholded neighbor intensity values with respect to the mean of a 3 x 3 patch. LMBP produces a highly discriminative code compared to other state-of-the-art methods. The micro-pattern is derived using the mean of the patch, and hence it is robust against illumination and noise variations. An image is divided into M x N regions, and the feature descriptor is derived by concatenating the LMBP distributions of the regions. We also propose a novel template matching strategy called Histogram Normalized Absolute Difference (HNAD) for comparing LMBP histograms. Rigorous experiments demonstrate the effectiveness and robustness of the LMBP operator, as well as the superiority of the HNAD measure over well-known template matching measures such as the L2 norm and the Chi-Square measure. We also investigated LMBP for facial expression recognition at low resolution. The performance of the proposed approach is tested on the well-known CK, JAFFE and TFEID datasets.
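A mean-thresholded binary pattern of the kind described in the abstract above could look roughly like the sketch below. It assumes that each of the 8 neighbors is compared against the mean of the whole 3 x 3 patch and that the resulting bits are weighted by powers of two in clockwise order, as in the classic LBP operator. The exact weighting and neighbor ordering used by LMBP are not given in the abstract, so these choices are assumptions.

```python
# Clockwise neighbor order starting at the top-left pixel (an assumed convention).
NEIGHBOR_ORDER = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def mean_binary_pattern(patch):
    """Return an 8-bit code for a 3 x 3 patch thresholded at its own mean."""
    if len(patch) != 3 or any(len(row) != 3 for row in patch):
        raise ValueError("patch must be 3 x 3")
    mean = sum(sum(row) for row in patch) / 9.0
    code = 0
    for bit, (r, c) in enumerate(NEIGHBOR_ORDER):
        if patch[r][c] >= mean:      # thresholded neighbor
            code |= 1 << bit         # weighted by 2 ** bit
    return code

def pattern_histogram(image):
    """Histogram (256 bins) of the codes of all interior pixels of a 2-D image."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            patch = [row[x - 1:x + 2] for row in image[y - 1:y + 2]]
            hist[mean_binary_pattern(patch)] += 1
    return hist
```

Thresholding at the patch mean rather than at the center pixel means that a single noisy center value does not flip every bit of the code, which is consistent with the robustness argument made in the abstract.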
Robust Real-Time Gradient-based Eye Detection and Tracking Using Transform Domain and PSO-Based Feature Selection
Despite extensive research on eye detection and tracking, this field of study remains challenging due to the individuality of eyes, occlusion, and variability in scale, location, and lighting conditions. This paper combines a feature extraction technique and a feature selection method to achieve a significant increase in the eye recognition rate. Subspace methods may improve the efficiency and accuracy of eye center detection through dimensionality reduction. In this study, the Histogram of Oriented Gradient (HoG) descriptor is used to lay the ground for BPSO-based feature selection. HoG features are used for the efficient extraction of pose-, translation- and illumination-invariant features. The HoG descriptor relies on the fact that local object appearance and shape within an image can be described by the distribution of intensity gradients or edge directions. The method maintains invariance to geometric and photometric transformations. The performance of the presented method is evaluated using several benchmark datasets, namely BioID and RS-DMV. Experimental results obtained by applying the proposed algorithm to the BioID dataset show that the proposed system outperforms other eye recognition systems. A significant increase in the recognition rate is achieved when using the combination of the HoG descriptor, BPSO and SVM for feature extraction, feature selection and training, respectively. The recognition rate for the BioID dataset was 99.6% and the detection time was 15.24 ms per frame.
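The idea that local appearance can be described by the distribution of gradient directions can be illustrated with a single-cell orientation histogram. This is a simplified sketch and not the full HoG descriptor used in the paper: it assumes central-difference gradients, 9 unsigned orientation bins over 0 to 180 degrees, and hard binning, and it omits block normalization. The function name and parameters are hypothetical.

```python
import math

def orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations in one cell."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle // bin_width), bins - 1)
            hist[index] += magnitude
    return hist
```

For a cell containing a vertical edge, all gradient energy falls into the first bin (orientations near 0 degrees), whereas a horizontal edge fills the bin around 90 degrees. A feature selection step such as the BPSO stage named in the abstract would then operate on the concatenation of many such histograms.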
A block-based background model for moving object detection
Detecting moving objects in a video sequence captured by a stationary camera is an important task for many computer vision applications. This paper proposes a background subtraction approach. As a first step, the background is initialized using a block-based analysis, before being updated with each incoming frame. The background frame is generated by collecting the background block candidates. The block candidate selection is based on probability density function (pdf) computation. After that, the absolute difference between the background frame and each frame of the sequence is computed. A noise filter based on the Structure/Texture decomposition is applied in order to minimize the noise caused by the background subtraction operation. The binary motion mask is formed using an adaptive threshold deduced from the weighted mean and variance. To ensure the correspondence between the current frame and the background frame, the background model is adapted with each incoming frame. A comparison of the results obtained by the proposed method with those of existing ones shows that our approach attains a higher degree of efficac