Electronic Letters on Computer Vision and Image Analysis (ELCVIA - Universitat Autònoma de Barcelona)
    343 research outputs found

    Haar Hybrid Transform Based Melanoma Identification Using Ensemble of Machine Learning Algorithms

    No full text
    Traditional methods of disease diagnosis can be time-intensive, error-prone and invasive to the subject. These methods are also biased by the doctor’s subjectivity. These issues can be addressed by using automated diagnosis methods. There is a considerable dearth of medical experts today, especially in rural areas, and computing technology may assist in the diagnostic process. This paper proposes the use of computers to detect melanoma skin cancer. Melanoma can be fatal, especially in its later stages, but it shows a high recovery rate when it is detected early. Considering the lack of medical professionals, early diagnosis of melanoma may be attempted using machine learning algorithms. This paper explores hybrid wavelet transform based melanoma identification using an ensemble of machine learning algorithms. The hybrid wavelet transform is produced using the Discrete Cosine Transform (DCT) and the Haar Wavelet Transform as its components. The sizes of both components are varied from 4x4 to 128x128 to obtain the hybrid wavelet transform. Experimentation performed on the transformed dermoscopy skin images with machine learning algorithms and their ensembles gives rise to a total of 196 variations. Overall, if the average of the metrics accuracy, sensitivity and specificity is considered, the SVM algorithm using the hybrid transform of Haar 8x8 and DCT 64x64 gives the best performance, followed by the SVM algorithm using the hybrid transform of Haar 128x128 and DCT 4x4.

    Understanding Eye Movements: Psychophysics and a Model of Primary Visual Cortex

    Full text link
    Humans move their eyes in order to learn visual representations of the world. These eye movements depend on distinct factors, driven either by the scene that we perceive or by our own decisions. Selecting what is relevant to attend to is part of our survival mechanisms and of the way we build reality, as we constantly react, both consciously and unconsciously, to all the stimuli that are projected into our eyes. In this thesis we try to explain (1) how we move our eyes, (2) how to build machines that understand visual information and deploy eye movements, and (3) how to make these machines understand tasks in order to decide on eye movements. (1) We provide an analysis of eye movement behavior elicited by low-level feature distinctiveness with a dataset of 230 synthetically-generated image patterns. A total of 15 types of stimuli were generated (e.g. orientation, brightness, color, size, etc.), with 7 feature contrasts for each feature category. Eye-tracking data was collected from 34 participants during the viewing of the dataset, using Free-Viewing and Visual Search task instructions. Results showed that saliency is predominantly and distinctively influenced by: 1. feature type, 2. feature contrast, 3. temporality of fixations, 4. task difficulty and 5. center bias. From this dataset (SID4VAM), we have computed a benchmark of saliency models by testing performance using psychophysical patterns. Model performance has been evaluated considering model inspiration and consistency with human psychophysics.
Our study reveals that state-of-the-art Deep Learning saliency models do not perform well with synthetic pattern images; instead, models with Spectral/Fourier inspiration outperform others in saliency metrics and are more consistent with human psychophysical experimentation. (2) Computations in the primary visual cortex (area V1 or striate cortex) have long been hypothesized to be responsible, among several visual processing mechanisms, for bottom-up visual attention (also named saliency). In order to validate this hypothesis, images from eye tracking datasets have been processed with a biologically plausible model of V1 (named Neurodynamic Saliency Wavelet Model or NSWAM). Following Li's neurodynamic model, we define V1's lateral connections with a network of firing rate neurons, sensitive to visual features such as brightness, color, orientation and scale. Early subcortical processes (i.e. retinal and thalamic) are functionally simulated. The resulting saliency maps are generated from the model output, representing the neuronal activity of V1 projections towards brain areas involved in eye movement control. We point out that our unified computational architecture is able to reproduce several visual processes (i.e. brightness, chromatic induction and visual discomfort) without applying any type of training or optimization and while keeping the same parametrization. The model has been extended (NSWAM-CM) with an implementation of the cortical magnification function to define the retinotopical projections towards V1, processing neuronal activity for each distinct view during scene observation. Novel computational definitions of top-down inhibition (in terms of inhibition of return and selection mechanisms) are also proposed to predict attention in Free-Viewing and Visual Search conditions.
Results show that our model outperforms other biologically-inspired models in saliency prediction as well as in predicting visual saccade sequences, specifically for natural and synthetic images. We also show how temporal and spatial characteristics of inhibition of return can improve prediction of saccades, as well as how distinct search strategies (in terms of feature-selective or category-specific inhibition) predict attention in distinct image contexts. (3) Although previous scanpath models have been able to efficiently predict saccades during Free-Viewing, it is well known that stimulus and task instructions can strongly affect eye movement patterns. In particular, task priming has been shown to be crucial to the deployment of eye movements, involving interactions between brain areas related to goal-directed behavior, working and long-term memory in combination with stimulus-driven eye movement neuronal correlates. In our latest study we proposed an extension of the Selective Tuning Attentive Reference Fixation Controller Model based on task demands (STAR-FCT), describing novel computational definitions of Long-Term Memory, Visual Task Executive and Task Working Memory. With these modules we are able to use textual instructions in order to guide the model to attend to specific categories of objects and/or places in the scene. We have designed our memory model by processing a visual hierarchy of low- and high-level features. The relationship between the executive task instructions and the memory representations has been specified using a tree of semantic similarities between the learned features and the object category labels. Results reveal that by using this model, the resulting object localization maps and predicted saccades have a higher probability of falling inside the salient regions depending on the distinct task instructions compared to saliency

    Covid19 Identification from Chest X-ray Images using Machine Learning Classifiers with GLCM Features

    No full text
    From staying quarantined at home and working from home to moving outside wearing masks and carrying sanitizers, every individual has adapted to the so-called ‘New Normal’ after a series of lockdowns across countries. The situation triggered by the novel Coronavirus has changed the behaviour of every individual towards every other living as well as non-living entity. In the city of Wuhan, China, multiple cases of pneumonia with unknown causes were reported. The concerned medical authorities confirmed the cause to be Coronavirus. The symptoms seen in these cases were not much different from those seen in pneumonia. Earlier research has been carried out on pneumonia identification and classification from chest X-ray images. Identifying Covid19 infection at an early stage is difficult because its symptoms closely resemble those of pneumonia. Hence it is essential to distinguish cases of coronavirus from pneumonia, which may help in saving the lives of patients. The paper uses chest X-ray images to identify Covid19 infection in lungs using machine learning classifiers and ensembles with Gray-Level Cooccurrence Matrix (GLCM) features. The proposed methodology extracts statistical texture features from X-ray images by computing a GLCM for each image. The matrix is computed by considering various stride combinations. These GLCM features are used to train the machine learning classifiers and ensembles. The paper explores both multiclass classification (X-ray images are classified into one of three classes, namely Covid19 affected, Pneumonia affected and normal lungs) and binary classification (Covid19 affected and other). The dataset used for evaluating the performance of the method is open source and can be accessed easily. The proposed method, being simple and computationally efficient, achieves noteworthy performance in terms of Accuracy, F-Measure, MCC, PPV and Sensitivity.
In sum, the best stride combination of GLCM and ensemble of machine learning classifiers is suggested as the vital outcome of the proposed method for effective Covid19 identification from chest X-ray images.
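The GLCM step described in this abstract can be illustrated with a minimal sketch. It assumes that a "stride" corresponds to a pixel offset (dx, dy) between the two pixels of each counted pair, and it uses three common texture statistics (contrast, energy, homogeneity); the abstract does not say which statistics or offsets the authors use, so the function names and feature choices here are hypothetical.

```python
def glcm(image, dx, dy, levels):
    """Normalized gray-level co-occurrence matrix for the offset (dx, dy).

    `image` is a list of rows of integer gray levels in [0, levels).
    Assumption: the paper's "stride" is treated as this pixel offset.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset is larger than the image")
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """A few standard statistics of a normalized GLCM `p`."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


# Toy 4x4 image with 4 gray levels (illustrative data, not from the paper).
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p_horizontal = glcm(toy, dx=1, dy=0, levels=4)
features = texture_features(p_horizontal)
```

Concatenating the feature dictionaries obtained for several offsets would give one feature vector per image, which could then be passed to any classifier.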

    A Novel Method to Improve the Efficiency of Classification Phase of a Decision Tree

    No full text
    So far, most of the research on classification algorithms in machine learning has focused only on improving the training speed and the technical performance evaluation measures of the constructed models. There has been no focus on improving the runtime efficiency of the classification phase, which is much needed in some critical applications. In this paper, we consider the computational complexity of a decision tree’s classification phase as the major criterion. A novel approach is proposed to predict the class label of an unseen instance using the decision tree in less time than the regular tree traversal method. In the proposed method, the constructed decision tree is represented in the form of arrays. Then, the class label is found by performing bitwise operations between the elements of the arrays and the test instance. Empirical results on various UCI data sets show that the proposed method outperforms the standard method and five other benchmark classifiers, and that its classification is at least four times faster than the regular method.
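The abstract does not give the array encoding, so the following is only one possible reading, sketched for a tree over boolean features. Each leaf is stored as a (mask, value) pair: the mask marks the features tested on the path to that leaf and the value holds the outcomes required there. A packed test instance x reaches a leaf exactly when (x & mask) == value. The tuple-based tree format and all function names are hypothetical, and this sketch makes no claim about reproducing the reported speed-up.

```python
def pack(bits):
    """Pack a list of 0/1 feature values into one integer (feature i -> bit i)."""
    x = 0
    for i, b in enumerate(bits):
        if b:
            x |= 1 << i
    return x


def flatten(node, mask=0, value=0, out=None):
    """Turn a nested tree into three parallel arrays: masks, values, labels.

    A node is ("leaf", label) or ("node", feature, zero_branch, one_branch).
    """
    if out is None:
        out = ([], [], [])
    if node[0] == "leaf":
        out[0].append(mask)
        out[1].append(value)
        out[2].append(node[1])
    else:
        _, feature, zero_branch, one_branch = node
        bit = 1 << feature
        flatten(zero_branch, mask | bit, value, out)
        flatten(one_branch, mask | bit, value | bit, out)
    return out


def classify(x, masks, values, labels):
    """Return the label of the leaf whose path conditions match instance x."""
    for m, v, label in zip(masks, values, labels):
        if (x & m) == v:
            return label
    raise ValueError("no leaf matched; the tree does not cover this instance")


# Toy tree: test feature 0 first, then feature 1 on the "1" branch.
tree = ("node", 0,
        ("leaf", "A"),
        ("node", 1, ("leaf", "B"), ("leaf", "C")))
masks, values, labels = flatten(tree)
```

Because the leaves of a decision tree partition the input space, exactly one (mask, value) pair matches any instance, so the scan order does not affect the result.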

    Face Analysis Using Row and Correlation Based Local Directional Pattern

    No full text
    Face analysis, which includes face recognition and facial expression recognition, has been attempted by many researchers, who have proposed a range of solutions. The problem is still active and challenging because of complicating factors such as poor lighting, face occlusion and low-resolution images. Local pattern descriptor methods were introduced to overcome these critical issues and improve the recognition rate. These methods extract the discriminant information from the local features of the face image for recognition. In this paper, two local descriptor based methods, namely the row-based local directional pattern and the correlation-based local directional pattern, are proposed by extending an existing descriptor, the local directional pattern (LDP). Further, the two feature vectors obtained by these methods are concatenated to form a hybrid descriptor. Experimentation has been carried out on benchmark databases, and the results indicate that the proposed hybrid descriptor outperforms the other descriptors in face analysis.
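For orientation, the baseline LDP descriptor that the paper extends can be sketched as follows. The sketch covers only the commonly described base LDP code for a single 3x3 patch (eight Kirsch edge responses, with the k strongest absolute responses set to 1); it does not implement the row-based or correlation-based variants, whose details the abstract does not give. The bit ordering and the tie-breaking rule are assumptions.

```python
# Ring of the 8 neighbours of a 3x3 patch as (row, col), starting at East
# and moving counter-clockwise: E, NE, N, NW, W, SW, S, SE.
RING = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]


def kirsch_responses(patch):
    """Eight Kirsch edge responses of a 3x3 patch.

    Kirsch mask i has weight 5 on ring positions i-1, i, i+1 and -3 on the
    other five, so the response equals 8 * (sum of the three) - 3 * (sum of all).
    """
    ring = [patch[r][c] for r, c in RING]
    total = sum(ring)
    return [
        8 * (ring[i - 1] + ring[i] + ring[(i + 1) % 8]) - 3 * total
        for i in range(8)
    ]


def ldp_code(patch, k=3):
    """Base LDP code: set bit i for the k largest absolute responses.

    Ties are broken in favour of the lower direction index (an assumption).
    """
    responses = kirsch_responses(patch)
    top = sorted(range(8), key=lambda i: (-abs(responses[i]), i))[:k]
    return sum(1 << i for i in top)


# Toy patch with a bright right-hand column (illustrative values only).
patch = [
    [10, 20, 90],
    [15, 50, 95],
    [5, 12, 80],
]
code = ldp_code(patch)
```

A face descriptor is then typically built by computing this code at every pixel and histogramming the codes over image regions.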

    Robust computer vision system for marbling meat segmentation

    Full text link
    In this study, we developed a robust automatic computer vision system for marbling meat segmentation. Our approach can segment muscle fat in various marbled meat samples using images acquired with different quality devices in an uncontrolled environment, where there was external ambient light and artificial light; thus, professionals can apply this method without specialized knowledge in terms of sample treatments or equipment, as well as without disruption to normal procedures, thereby obtaining a robust solution. The proposed approach for marbling segmentation is based on data clustering and dynamic thresholding. Experiments were performed using two datasets that comprised 82 images of 41 longissimus dorsi muscles acquired by different sampling devices. The experimental results showed that the computer vision system performed well with over 98% accuracy and a low number of false positives, regardless of the acquisition device employed

    Fully Convolutional Networks for Text Understanding in Scene Images

    Full text link
    Text understanding in scene images has gained plenty of attention in the computer vision community, and it is an important task in many applications, as text carries semantically rich information about scene content and context. For instance, reading text in a scene can be applied to autonomous driving, scene understanding or assisting visually impaired people. The general aim of scene text understanding is to localize and recognize text in scene images. Text regions are first localized in the original image by a trained detector model and afterwards fed into a recognition module. The tasks of localization and recognition are highly correlated, since an inaccurate localization can affect the recognition task. The main purpose of this thesis is to devise efficient methods for scene text understanding. We investigate how the latest results in deep learning can advance text understanding pipelines. Recently, Fully Convolutional Networks (FCNs) and derived methods have achieved significant performance on semantic segmentation and pixel-level classification tasks. Therefore, we took advantage of the strengths of FCN approaches in order to detect and recognize text in natural scene images.

    Single Sensor Multi-Spectral Imaging

    Full text link
    This dissertation presents the benefits of using a multispectral Single Sensor Camera (SSC) that simultaneously acquires images in the visible and near-infrared (NIR) bands. The principal benefits, when addressing problems related to image bands in the spectral range of 400 to 1100 nanometers, are cost reductions in the hardware setup, because only one SSC is needed instead of two; moreover, camera calibration and image alignment are no longer required. Concerning the NIR spectrum, even though this band is close to the visible band and shares many of its properties, the sensor sensitivity is material dependent due to the different absorption/reflectance behavior when capturing a given scene compared to the visible channels. Many works in the literature have proven the benefits of working with NIR to enhance RGB images (e.g., image enhancement, dehazing, etc.). In spite of the advantages of using an SSC (e.g., low latency), there are some drawbacks to be solved. One of these drawbacks stems from the nature of the silicon-based sensor, which, when the infrared cut-off filter is not installed, captures NIR information in the visible image in addition to the RGB image. This phenomenon is called RGB and NIR crosstalk. This thesis first addresses this problem in challenging images and then shows the benefit of using multispectral images in the edge detection task. Then, three CNN-based methods are proposed for edge detection. While the first one, termed multispectral HED (MS-HED), is based on the most widely used model, holistically-nested edge detection (HED), the other two have been proposed after observing the drawbacks of MS-HED. These two novel architectures have been designed from scratch; after the first architecture is validated in the visible domain, a slight redesign is proposed to tackle the multispectral domain. A dataset is collected to address this problem with SSCs.
Even though edge detection is addressed in the multispectral domain, its qualitative and quantitative evaluation demonstrates generalization to other datasets used for edge detection, improving state-of-the-art results. One of the main properties of this proposal is to show that the edge detection problem can be tackled by training the proposed architecture just once and validating it on other datasets.

    Robust Object Tracking in Infrared Video via Particle Filters

    Full text link
    In this paper we investigate the effectiveness of particle filters for object tracking in infrared videos. Once the user identifies the position and size of the target object to be followed, its most representative feature points are obtained by means of the SURF algorithm. A particle filter is initialized with these feature points, and the location of the object within the video frames is determined by the average value of the particles that have the greatest similarity with the target. Two different field tests were carried out to study the filter behaviour in comparison with methods previously used in the literature. The first one was tracking an unmanned aerial vehicle (UAV) in the open. The second one was identifying a heliport in a noisy infrared zenithal video take. In the first test, the UAV was simultaneously followed by another positioning system, thus allowing the comparison of both systems and the evaluation of the improvement introduced by the particle algorithm.
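A single update of such a tracker can be sketched as a generic bootstrap particle filter. The sketch replaces the SURF-based similarity of the paper with a Gaussian likelihood around a measured position, which is purely an illustrative assumption; the noise levels, the `top_fraction` parameter and the function name are hypothetical as well. As in the abstract, the location estimate is the average of the particles most similar to the target.

```python
import math
import random


def track_step(particles, measurement, rng,
               motion_sigma=2.0, meas_sigma=5.0, top_fraction=0.1):
    """One predict / weight / estimate / resample cycle for 2D particles."""
    mx, my = measurement
    # Predict: diffuse every particle with Gaussian motion noise.
    moved = [(x + rng.gauss(0, motion_sigma), y + rng.gauss(0, motion_sigma))
             for x, y in particles]
    # Weight: stand-in similarity, higher when closer to the measurement.
    weights = [math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2 * meas_sigma ** 2))
               for x, y in moved]
    # Estimate: average the particles with the highest similarity.
    ranked = sorted(zip(weights, moved), reverse=True)
    best = [p for _, p in ranked[:max(1, int(len(moved) * top_fraction))]]
    estimate = (sum(x for x, _ in best) / len(best),
                sum(y for _, y in best) / len(best))
    # Resample: draw the next generation in proportion to the weights.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, estimate


# Toy run: particles start spread over a 100x100 frame; target fixed at (50, 30).
rng = random.Random(0)
particles = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(500)]
for _ in range(10):
    particles, estimate = track_step(particles, (50.0, 30.0), rng)
```

In a real tracker the weight would come from comparing image features at each particle's location with the target's features, and the measurement would change from frame to frame.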

    Transition region based approach for skin lesion segmentation

    Full text link
    Skin melanoma is a skin disease that affects nearly 40% of people globally. Manual detection of the affected area is a time-consuming process and requires expert knowledge. The application of computer vision techniques can simplify this. In this article, a novel unsupervised transition region based approach for skin lesion segmentation for melanoma detection is proposed. The method starts with Gaussian blurring of the green channel of the dermoscopic image. Next, the transition region is extracted using local variance features and a global thresholding operation. The region of interest (binary mask) is then obtained using various morphological operations. Finally, the melanoma regions are separated from normal skin regions using the binary mask. The proposed method is tested on the DermQuest dataset along with the ISIC 2017 dataset, and it achieves better results than other state-of-the-art methods in effectively segmenting the melanoma regions from the normal skin regions.
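The first stages of this pipeline (blur, local variance, global threshold) can be sketched on a small grayscale array. The 3x3 kernel, the 3x3 variance window, the clamped borders and the use of the mean variance as the global threshold are all assumptions made for illustration; the abstract does not state the authors' parameters, and the morphological post-processing is omitted here.

```python
def window(img, r, c):
    """3x3 neighbourhood of (r, c) as a flat list, clamping at the borders."""
    rows, cols = len(img), len(img[0])
    return [img[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


# 3x3 binomial approximation of a Gaussian kernel (weights sum to 16).
KERNEL = [1, 2, 1, 2, 4, 2, 1, 2, 1]


def gaussian_blur(img):
    return [[sum(k * v for k, v in zip(KERNEL, window(img, r, c))) / 16
             for c in range(len(img[0]))] for r in range(len(img))]


def local_variance(img):
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            w = window(img, r, c)
            mean = sum(w) / 9
            row.append(sum((v - mean) ** 2 for v in w) / 9)
        out.append(row)
    return out


def transition_mask(green):
    """True where the local variance of the blurred channel exceeds its mean."""
    var = local_variance(gaussian_blur(green))
    flat = [v for row in var for v in row]
    threshold = sum(flat) / len(flat)  # assumed global threshold
    return [[v > threshold for v in row] for row in var]


# Toy "green channel": dark skin-like region on the left, bright on the right.
green = [[0, 0, 0, 100, 100, 100] for _ in range(6)]
mask = transition_mask(green)
```

On this toy input the mask marks only the two columns straddling the intensity step, which is the band a transition region method would then close and fill to obtain the lesion mask.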
