Electronic Letters on Computer Vision and Image Analysis (ELCVIA - Universitat Autònoma de Barcelona)
    343 research outputs found

    Contributions to Real-time Metric Localisation with Wearable Vision Systems

    With the rapid development of electronics and computer science in recent years, cameras have become omnipresent, to such an extent that almost everybody carries one at all times embedded in their mobile phone. What makes cameras especially appealing is their ability to quickly capture a large amount of information about the environment in a single image or video, allowing us to immortalize special moments in our lives or to share reliable visual information about the environment with others. However, while extracting information from an image may be trivial for us, computers require complex algorithms with a high computational burden to transform a raw image into useful information. In this sense, the same rapid development in computer science that enabled the spread of cameras has also made real-time application of previously infeasible algorithms possible. Among the current fields of research in the computer vision community, this thesis is especially concerned with metric localisation and mapping algorithms. These algorithms are a key component in many practical applications such as robot navigation, augmented reality and 3D reconstruction of the environment. The goal of this thesis is to delve into localisation and mapping from vision, paying special attention to conventional and unconventional cameras that can easily be worn or handled by a human. In this thesis I contribute to the following aspects of the visual odometry and SLAM (Simultaneous Localisation and Mapping) pipeline:
    - Generalised monocular SLAM for catadioptric central cameras
    - Resolution of the scale problem in monocular vision
    - Dense RGB-D odometry
    - Robust place recognition
    - Pose-graph optimisation
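To make the last item concrete, here is a minimal sketch of pose-graph optimisation, not the thesis's method: a toy 1D graph in which poses are scalar positions, edges are relative measurements (odometry plus one loop closure), and the first pose is fixed to remove the global translation freedom. Because the 1D problem is linear, a single linear least-squares solve gives the optimum; real SLAM back-ends optimise over 3D rigid-body poses and iterate (for example with Gauss-Newton). The edge list and its values are made up for illustration.

```python
import numpy as np

def optimise_pose_graph_1d(num_poses, edges):
    """Toy linear pose-graph optimisation on a line.

    Each edge (i, j, z, w) states that x[j] - x[i] was measured as z with
    information weight w (inverse variance). Pose 0 is fixed at the origin
    to remove the global translation (gauge) freedom.
    """
    A = np.zeros((len(edges), num_poses))
    b = np.zeros(len(edges))
    for k, (i, j, z, w) in enumerate(edges):
        s = np.sqrt(w)  # whiten each residual by its standard deviation
        A[k, i] -= s
        A[k, j] += s
        b[k] = s * z
    # Drop the column of the anchored pose (x[0] = 0) and solve for the rest.
    x_rest, *_ = np.linalg.lstsq(A[:, 1:], b, rcond=None)
    return np.concatenate(([0.0], x_rest))

# Odometry says each step is 1.0 m, but a loop closure says pose 3 is only
# 2.7 m from pose 0; the 0.3 m inconsistency is spread over the whole loop.
edges = [(0, 1, 1.0, 1.0), (1, 2, 1.0, 1.0), (2, 3, 1.0, 1.0), (0, 3, 2.7, 1.0)]
poses = optimise_pose_graph_1d(4, edges)
print(poses)  # approximately [0, 0.925, 1.85, 2.775]
```

With equal weights, each of the four edges in the loop absorbs a quarter of the 0.3 m discrepancy; raising the weight of the loop-closure edge would pull the solution closer to 2.7 m.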

    Image Processing Algorithms for Driver Assistance using Wide Angle Cameras

    Modern vehicles are equipped with a large number of sensors in order to provide a rich spectrum of driver assistance functions. These systems enhance the safety and comfort of passengers and other traffic participants alike, and they also pave the way to fully autonomous traffic. To provide this functionality robustly and reliably, numerous specialized sensors are currently used: laser, radar, ultrasound and infrared sensors, as well as different kinds of video cameras. This diversity of sensors comes at a high cost, which currently restricts complex assistance functions to upper-class vehicles. Current research therefore focuses on developing better algorithms that make similar systems possible with inexpensive sensors. This thesis examines the suitability of a new camera system, which has recently become popular in vehicles from most of the large automobile manufacturers, for all major video-based driver assistance functions. This so-called Topview system consists of four wide-angle cameras with a view angle of up to 200 degrees, usually mounted on the front bumper, the two side mirrors and the trunk lid, and thereby provides a view of the entire surroundings of the vehicle. However, the individual camera images are distorted, which necessitates adapted image processing algorithms.
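Why a view angle of up to 200 degrees calls for adapted algorithms can be seen from the projection model. The sketch below compares the ideal pinhole model, where a ray at angle theta from the optical axis lands at image radius r = f * tan(theta), with the equidistant fisheye model r = f * theta, one common idealised fisheye model. The abstract does not say which model the Topview cameras follow, and the focal length is an arbitrary assumption.

```python
import math

def pinhole_radius(theta, f):
    """Image radius of a ray at angle theta (radians) from the optical axis
    under the ideal pinhole model; undefined at or beyond 90 degrees."""
    if theta >= math.pi / 2:
        return math.inf  # the ray never reaches a planar image sensor
    return f * math.tan(theta)

def equidistant_radius(theta, f):
    """Image radius under the equidistant fisheye model r = f * theta."""
    return f * theta

def fisheye_to_pinhole(r_fisheye, f):
    """Map an equidistant-fisheye radius to the pinhole radius of the same
    ray, i.e. the radial remapping used to rectify such images."""
    return pinhole_radius(r_fisheye / f, f)

f = 300.0  # assumed focal length in pixels
for deg in (10, 45, 80, 100):
    th = math.radians(deg)
    print(f"{deg:3d} deg  fisheye r = {equidistant_radius(th, f):7.1f}  "
          f"pinhole r = {pinhole_radius(th, f):9.1f}")
```

At 100 degrees (half of a 200-degree field of view) the pinhole radius is infinite, so such an image cannot be rectified into a single perspective view without discarding part of the field of view.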

    A variational approach to denoising problem

    A digital image can be created by different digital devices, such as digital cameras and X-ray scanners. In practice, such devices can introduce unexpected defects such as noise. Gaussian noise and Poisson noise are both very important, and so is their combination. This mixed noise usually appears in electron microscopy images, aerospace images, etc. Our goal is to combine the ROF model (for Gaussian noise removal) and a modified ROF model (for Poisson noise removal) into a new model that can treat this combination effectively. Our model treats this combination by taking into account the proportion between the two types of noise.
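For reference, the ROF model removes Gaussian noise from an observed image f by minimising total variation plus a quadratic fidelity term, while a modified ROF model for Poisson noise replaces that fidelity term with the Poisson negative log-likelihood (up to terms independent of u). One natural way to combine them, shown here only as an illustrative assumption and not necessarily the authors' model, weights the two fidelity terms by a parameter alpha in [0, 1] reflecting the proportion of Gaussian noise:

```latex
\begin{aligned}
E_{G}(u) &= \int_\Omega |\nabla u|\,dx + \frac{\lambda}{2}\int_\Omega (u - f)^2\,dx, \\
E_{P}(u) &= \int_\Omega |\nabla u|\,dx + \lambda\int_\Omega (u - f\log u)\,dx, \\
E_{\alpha}(u) &= \int_\Omega |\nabla u|\,dx
  + \lambda\int_\Omega \Big[\tfrac{\alpha}{2}(u - f)^2 + (1-\alpha)(u - f\log u)\Big]\,dx, \\
0 &= -\nabla\cdot\Big(\frac{\nabla u}{|\nabla u|}\Big)
  + \lambda\Big[\alpha(u - f) + (1-\alpha)\Big(1 - \frac{f}{u}\Big)\Big].
\end{aligned}
```

Setting alpha = 1 recovers the ROF model and alpha = 0 the Poisson model; the last line is the Euler-Lagrange equation of the combined energy, obtained by differentiating each fidelity term with respect to u.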

    A critical appraisal on wavelet based features from brain MR images for efficient characterization of ischemic stroke injuries

    Ischemic stroke is a severe neurological disorder typically characterized by a blockage in a blood vessel supplying blood to the brain. It remains the third leading cause of death, after heart attack and cancer. Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) are the major imaging techniques used to diagnose this disorder. While CT can be used at the initial stage, MRI has become a standard aid for subsequent diagnostic planning in the treatment of stroke injuries. Developing a fully automatic approach for lesion segmentation is challenging because of the complex nature of lesion structures. This research therefore examines the properties of such complex structures. It analyses the characteristics of normal brain tissue and abnormal lesion structures using a three-level wavelet decomposition. Four wavelet functions, namely Daubechies, Symlet, Coiflet and Meyer, were applied to different datasets, and the results were examined based on the feature statistics obtained. Experiments indicate that the feature statistics obtained from the Daubechies and Meyer wavelets clearly distinguish typical brain tissue from abnormal lesion structures.
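A three-level 2D wavelet decomposition splits an image into horizontal, vertical and diagonal detail subbands at each level plus a final approximation subband, and simple statistics of each subband can serve as texture features. The sketch below uses the Haar wavelet, which is the first-order Daubechies wavelet (db1), written directly in NumPy so that no wavelet library is needed. The chosen statistics (mean, standard deviation, energy) are assumptions, since the abstract does not list the feature statistics used.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the orthonormal 2D Haar (db1) transform.

    Returns (approx, horizontal, vertical, diagonal) subbands, each half the
    size of `img` along both axes. Both image sides must be even.
    """
    a = img[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0
    horizontal = (a + b - c - d) / 2.0  # top row minus bottom row
    vertical = (a - b + c - d) / 2.0    # left column minus right column
    diagonal = (a - b - c + d) / 2.0
    return approx, horizontal, vertical, diagonal

def wavelet_features(img, levels=3):
    """(mean, std, energy) of every subband of a `levels`-level decomposition."""
    approx = np.asarray(img, dtype=float)
    if any(s % (2 ** levels) for s in approx.shape):
        raise ValueError("image sides must be divisible by 2**levels")
    feats = {}
    for level in range(1, levels + 1):
        approx, h, v, d = haar_dwt2(approx)
        for name, band in (("H", h), ("V", v), ("D", d)):
            feats[f"{name}{level}"] = (float(band.mean()), float(band.std()),
                                       float(np.sum(band ** 2)))
    feats[f"A{levels}"] = (float(approx.mean()), float(approx.std()),
                           float(np.sum(approx ** 2)))
    return feats

flat = np.full((64, 64), 100.0)                          # homogeneous patch
rng = np.random.default_rng(0)
textured = flat + rng.normal(0.0, 20.0, size=(64, 64))   # heterogeneous patch
print(wavelet_features(flat)["D1"])            # (0.0, 0.0, 0.0)
print(wavelet_features(textured)["D1"][2] > 0)  # True
```

Because the transform is orthonormal, the subband energies add up to the energy of the input patch, so the energy features describe how a region's intensity variation is distributed across scales and orientations.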

    Speech Recognition Supported by Lip Analysis

    Computers have become more pervasive than ever, with a wide range of devices and multiple modes of interaction. Traditional human-computer interaction through keyboards, mice and display monitors is being replaced by more natural modes such as speech, touch and gesture. Continuous technological progress is bringing about an irreversible change in the paradigms of interaction between humans and machines. These modes are now used daily in many devices and have revolutionized the way users interact with machines. New PCs, tablets and smartphones are moving toward interaction paradigms so advanced that they will soon be completely transparent to users. Among the various modes of human-machine interaction, voice recognition is without doubt one of the most studied. Many attempts have been made in recent years to automate the voice communication through which people interact with one another. A number of researchers have shown that a speech reading system, which uses visual cues from the speaker such as the face, is a beneficial complement to an audio speech recognition system in noisy environments. However, robust and precise extraction of visual features is a challenging object recognition problem because of high variation in pose, lighting and facial makeup. Most existing approaches rely on constraints such as reflective markers on the subject's lips, lip movements recorded with a fixed camera position (a head-mounted camera), and lip segmentation under controlled illumination. Furthermore, there is no consensus on the selection of visual features and their significance for a particular phoneme. Speech is the natural means of communication and is therefore an obvious choice for human-computer interaction.
In recent years, technological development combined with a significant reduction in cost has led to the pervasive use of automatic speech recognition in a variety of systems such as telephony, human-computer interaction and robotics. Visual speech cues are a promising source of speech information, as they are unaffected by noisy acoustic conditions and by cross-talk between speakers. Visual information about the speaker, such as the area around the mouth, mouth gestures and facial expressions, is a key component of such a speech recognition system. The major problem in developing an accurate and robust speech recognition system is finding a precise visual feature extraction method. Listeners sometimes misperceive a speaker when visual features conflict with what is heard, and these visual features play a major role in lip reading. These observations motivated the development of a computer speech recognition system. I propose a speech recognition system using face detection, lip extraction and tracking, with pre-processing techniques to overcome pose and lighting variation. The proposed approach detects and tracks faces and lips in image sequences and augments global facial features to improve recognition performance. It consists of four major parts: first, detecting and localizing human faces and lips and defining the lip region of interest in the first frame; second, applying pre-processing to overcome the interference caused by illumination effects, shadows and the appearance of teeth; third, creating a contour line from sixteen key points under a geometric constraint and storing their coordinates; and finally, tracking the lip contour through these coordinates in the following frames. The proposed method not only adapts to lip movement but is also robust to the appearance of teeth, shadows and low-contrast environments.
Extensive experiments show encouraging results and demonstrate the effectiveness of the proposed method compared with existing methods. However, several factors that may increase the error rate were identified during the experiments. The key challenge for the recognition system is to obtain precise results under different environmental conditions and disturbing visual effects such as illumination, shadows and teeth. Three pre-processing steps were developed, namely illumination equalization, teeth detection and shadow removal, to address edge information and global statistical characteristics that are sensitive to uneven illumination and to the complex appearance of teeth and shadows. In contrast, the proposed method, which relies on local region analysis, successfully avoids such complex appearance (e.g. low contrast, shadows, moustaches and teeth). A high average extraction performance is achieved, although some results were unsatisfactory because of very low contrast and a poor low-resolution camera. A standard video camera (Logitech) is used to record the letters of the English alphabet uttered by users. The proposed method is easy to implement and computationally efficient, and it can locate face, mouth and lip feature points throughout an entire image sequence. The extracted feature parameters are suitable for speech recognition and can greatly improve recognition accuracy. An approach to detect and track lip boundaries precisely is presented. The basic idea of this new approach is that it not only highlights the lips but also rejects other factors (such as false lip pixels) and recovers from failures. The new approach is implemented in the lip tracking module.
Using the lip boundary lines, this lip tracking module builds a feature vector from a 16-point model of the speaker's lips, stores the coordinates of these points and tracks them in every image of the sequence during the utterance. The strength of the new approach was also evaluated by testing the system on noisy real-world facial image sequences. Experiments have shown that detecting outliers and better predicting ROIs can further reduce the number of frames with localization or tracking failures.
Key Words: Computer Vision, Image Analysis, Illumination Equalization, Image Segmentation, Lip Detection and Tracking, Video and Image Sequence Analysis
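The tracking step with outlier detection and ROI prediction can be sketched as follows. This is an illustration under stated assumptions, not the author's tracker: each of the 16 lip key points is predicted with a constant-velocity model, a measured point farther from its prediction than a hypothetical threshold `max_jump` (in pixels) is treated as an outlier and replaced by the prediction, and the ROI for the next frame is the bounding box of the accepted points plus a hypothetical `margin`. How the measurements are obtained from the pre-processed lip contour is left out.

```python
import numpy as np

NUM_POINTS = 16  # size of the lip model described in the abstract

def track_step(prev, prev_prev, measured, max_jump=8.0, margin=10):
    """One tracking update for a (16, 2) array of lip key points (x, y).

    prev, prev_prev: accepted points in the two previous frames.
    measured: points detected in the current frame (may contain outliers).
    Returns (accepted_points, outlier_mask, roi) with roi = (x0, y0, x1, y1).
    """
    predicted = prev + (prev - prev_prev)           # constant-velocity prediction
    dist = np.linalg.norm(measured - predicted, axis=1)
    outliers = dist > max_jump                      # hypothetical gating threshold
    accepted = np.where(outliers[:, None], predicted, measured)
    x0, y0 = accepted.min(axis=0) - margin          # ROI for the next frame
    x1, y1 = accepted.max(axis=0) + margin
    return accepted, outliers, (x0, y0, x1, y1)

# Elliptical lip model drifting right by 1 px per frame.
angles = np.linspace(0, 2 * np.pi, NUM_POINTS, endpoint=False)
lips = np.stack([100 + 30 * np.cos(angles), 80 + 12 * np.sin(angles)], axis=1)
prev_prev, prev = lips, lips + [1.0, 0.0]
measured = lips + [2.0, 0.0]
measured[5] += [25.0, -15.0]                        # simulate a false lip pixel
accepted, outliers, roi = track_step(prev, prev_prev, measured)
print(outliers.sum())  # 1
```

The gate keeps a single spurious detection from dragging the contour away, which is one simple way to reduce the number of frames with tracking failures.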

    Fast Region-based Active Contour Model Driven by Local Signed Pressure Force

    Intensity inhomogeneity is a well-known problem in image segmentation. In this paper, we present a new region-based active contour model for image segmentation that can handle intensity inhomogeneity. The model introduces a new region-based signed pressure force (SPF) function, which uses the local mean values provided by the local binary fitting (LBF) model. In addition, the proposed model uses morphological opening and closing as a new regularization operation for the level set function during evolution. Experimental results on synthetic and real images show that the proposed model gives satisfactory segmentation results, is less sensitive to the initial contour location, and is less time-consuming than the LBF model.
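A minimal sketch of the idea follows, under several assumptions. The SPF is taken as spf(x) = (I(x) - m(x)) / max|I - m|, where m(x) is the average of the local inside mean f1(x) and local outside mean f2(x). A square box window replaces the Gaussian kernel of the LBF model. The level set evolves as phi_t = alpha * spf * |grad phi|, and is then binarized and regularized with 3x3 morphological opening and closing. The window size, alpha and iteration count are illustrative values, not the paper's.

```python
import numpy as np

def box_mean(a, k):
    """Mean of `a` over a k x k window (k odd), with replicated borders."""
    r = k // 2
    p = np.pad(a, r, mode="edge")
    s = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    s[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)  # integral image
    h, w = a.shape
    total = s[k:k + h, k:k + w] - s[:h, k:k + w] - s[k:k + h, :w] + s[:h, :w]
    return total / (k * k)

def local_spf(img, phi, k=15, eps=1e-8):
    """Signed pressure force from local inside/outside means f1 and f2."""
    inside = (phi > 0).astype(float)
    outside = 1.0 - inside
    f1 = box_mean(img * inside, k) / (box_mean(inside, k) + eps)
    f2 = box_mean(img * outside, k) / (box_mean(outside, k) + eps)
    diff = img - (f1 + f2) / 2.0
    return diff / (np.abs(diff).max() + eps)  # normalized to [-1, 1]

def _neighbourhood(mask):
    """Stack the 3x3 neighbourhood of every pixel of a boolean mask."""
    p = np.pad(mask, 1, mode="edge")
    h, w = mask.shape
    return np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def erode(mask):
    return _neighbourhood(mask).all(axis=0)

def dilate(mask):
    return _neighbourhood(mask).any(axis=0)

def evolve(img, phi, iters=60, alpha=20.0, k=15):
    """Evolve a binary level set with phi_t = alpha * spf * |grad phi|."""
    for _ in range(iters):
        spf = local_spf(img, phi, k)
        gy, gx = np.gradient(phi)
        phi = phi + alpha * spf * np.hypot(gx, gy)
        mask = phi > 0
        mask = dilate(erode(mask))  # opening removes small spurs
        mask = erode(dilate(mask))  # closing fills small gaps
        phi = np.where(mask, 1.0, -1.0)
    return phi

# Synthetic image: bright 20x20 square on a dark background.
img = np.zeros((60, 60))
img[20:40, 20:40] = 1.0
phi0 = -np.ones_like(img)
phi0[25:35, 25:35] = 1.0  # initial contour inside the object
seg = evolve(img, phi0) > 0
obj = img > 0.5
print((seg & obj).sum() / (seg | obj).sum())  # intersection over union
```

Because |grad phi| is non-zero only next to the binary contour, each iteration moves the front by at most about one pixel, which keeps the sketch simple at the cost of speed.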

    Efficient Labelling of Pedestrian Supervisions

    Object detection is fundamental to intelligent visual perception by computers, because objects are the basic building blocks of higher-level image understanding. Among the numerous categories of real-world objects, pedestrians are among the most important because of the many potential benefits of successful pedestrian detection. State-of-the-art pedestrian detectors are often trained with supervised machine learning algorithms, which require costly and tedious manual annotation of pedestrians in the form of precise bounding boxes. In this paper, a novel weakly supervised learning algorithm is proposed to train a pedestrian detector that requires only annotations of the estimated centres of pedestrians instead of bounding boxes. The algorithm uses a pedestrian prior learnt in an unsupervised way from the video and fuses this prior with the given weak supervision in a systematic manner. Evaluating on publicly available datasets, we demonstrate that our weakly supervised algorithm reduces the cost of manual pedestrian annotation by more than a factor of four while achieving performance similar to that of a pedestrian detector trained with standard bounding box annotations.
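To illustrate the kind of weak supervision involved, the sketch below turns a single centre annotation into a pseudo bounding box using a simple geometric prior. The linear height-versus-image-row model and its coefficients are hypothetical stand-ins for the prior the paper learns from video. The fixed width/height ratio of 0.41 is a convention used by some pedestrian benchmarks. The IoU function shows how such pseudo boxes could be compared with full annotations.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

def pseudo_box(cx, cy, slope=0.5, intercept=-40.0, aspect=0.41):
    """Hypothetical prior: people lower in the image (larger row index) are
    closer to the camera and appear taller, so height grows linearly with cy."""
    h = max(1.0, slope * cy + intercept)
    w = aspect * h
    return Box(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter = Box(max(a.x0, b.x0), max(a.y0, b.y0), min(a.x1, b.x1), min(a.y1, b.y1))
    ia = inter.area()
    union = a.area() + b.area() - ia
    return ia / union if union > 0 else 0.0

click = pseudo_box(320, 300)          # one centre annotation
ground_truth = Box(295, 240, 345, 360)
print(round(iou(click, ground_truth), 3))
```

A learnt prior would replace the fixed coefficients, but the interface stays the same: a centre point in, a box out.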

    Memory Organization for Invariant Object Recognition and Categorization

    Using distributed representations of objects enables artificial systems to be more versatile with respect to inter- and intra-category variability, improving the appearance-based modeling of visual object understanding. Such representations rest on the hypothesis that object models are structured dynamically from relatively invariant patches of information arranged in visual dictionaries, which can be shared across objects of the same category. However, implementing distributed representations efficiently enough to support the complexity of invariant object recognition and categorization remains a research problem of outstanding significance for the biological, psychological and computational approaches to understanding visual perception. The present work focuses on solutions driven by top-down object knowledge. It is motivated by the idea that, equipped with sensors and processing mechanisms from the neural pathways serving visual perception, biological systems are able to define efficient measures of similarity between properties observed in objects and use these relationships to form natural clusters of object parts that share equivalent properties. By comparing the stimulus-response signatures of these object-to-memory mappings, biological systems are able to identify objects and their kinds. The present work combines biologically inspired mathematical models to develop memory frameworks for artificial systems, in which these invariant patches are represented by regular-shaped graphs whose nodes are labeled with elementary features that capture texture information from object images. It also applies unsupervised clustering techniques to these graph image features to corroborate the existence of natural clusters within their data distribution and to determine their composition.
The properties of this computational theory include self-organization and intelligent matching of these graph image features based on the similarity and co-occurrence of their captured texture information. The performance of feature-based artificial systems equipped with each of the developed memory frameworks in modeling invariant object recognition and categorization is validated by applying standard methodologies to well-known image libraries from the literature. Additionally, these artificial systems are compared with state-of-the-art alternative solutions. In conclusion, the findings of the present work have implications for strategies and experimental paradigms to analyze human object memory, as well as for technical applications in robotics and computer vision.
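As a rough illustration of graph image features, the sketch below places a regular grid graph on an image, labels each node with a small texture feature vector, and compares two graphs by the average cosine similarity of corresponding node labels, in the style of elastic graph matching. The node features (mean absolute horizontal and vertical gradient in windows of two radii) and the grid spacing are hypothetical choices; the abstract does not specify which elementary features are used.

```python
import numpy as np

def grid_graph(img, spacing=8, radii=(2, 4)):
    """Label the nodes of a regular grid with texture features:
    mean |d/dx| and mean |d/dy| in square windows of several radii."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)
    ax, ay = np.abs(gx), np.abs(gy)
    h, w = img.shape
    r_max = max(radii)
    labels = []
    for y in range(r_max, h - r_max, spacing):
        for x in range(r_max, w - r_max, spacing):
            feats = []
            for r in radii:
                win = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
                feats += [ax[win].mean(), ay[win].mean()]
            labels.append(feats)
    return np.array(labels)  # one row of features per node

def graph_similarity(g1, g2, eps=1e-12):
    """Average cosine similarity of corresponding node labels."""
    n1 = g1 / (np.linalg.norm(g1, axis=1, keepdims=True) + eps)
    n2 = g2 / (np.linalg.norm(g2, axis=1, keepdims=True) + eps)
    return float(np.mean(np.sum(n1 * n2, axis=1)))

yy, xx = np.mgrid[0:64, 0:64]
vertical = np.sin(xx / 2.0)    # stripes varying along x
horizontal = np.sin(yy / 2.0)  # stripes varying along y
g_v, g_h = grid_graph(vertical), grid_graph(horizontal)
print(round(graph_similarity(g_v, g_v), 3))  # 1.0
print(round(graph_similarity(g_v, g_h), 3))  # 0.0
```

The node labels of such graphs are also the natural input for the unsupervised clustering step the abstract mentions, since each row is one feature vector.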

    Remote Authentication Using Vaulted Fingerprint Verification

    With the rise of the Internet, remote verification of identity has become an increasingly important part of modern life. From online banking to personal data storage to software as a service, most aspects of modern life require identity verification. Traditional authentication systems rely on the possession of a token, generally a password or smartcard. Token-based identity transactions are relatively easy to repudiate, since unauthorized persons may possess the token. A system that can guarantee a user's presence during authentication would greatly strengthen the non-repudiation of these transactions. Biometrics can provide this strong link between users and their identities: by measuring and comparing a feature of the user, we can increase the assurance that the user is present during authentication. Fingerprint biometrics are increasingly used for identity verification. However, they require a careful balance of accuracy and privacy that is missing in many implementations. This dissertation describes a new biometric matching system, the protection of an existing data type, and a model for matching security with error-correcting codes. My research develops the Vaulted Verification (VV) system into Vaulted Fingerprint Verification (VFV) by implementing VV on a fingerprint minutia triangle representation [1]. The triangle representation contains much information that enhances the accuracy of the system, and the VFV matcher requires no order or relation between the minutia triangles. VFV allows key exchange and remote authentication using a challenge-response protocol, and a protected biometric template is used to perform the authentication. The privacy and security of the user's biometric data are preserved through multiple levels of protection. First, the system uses a protected transmission protocol to transmit the authentication token. Second, minutia triangles are difficult to invert.
Third, VFV is compatible with protected minutia data types. VFV is built on blocks containing several minutia triangles; selecting several minutia triangles within a block provides tolerance to common errors in fingerprint images. The blocks are permuted to store arbitrary data, such as encryption keys or an authenticator challenge, and the data is combined with an error-correcting code to provide tolerance to inter-image errors in fingerprint minutiae. VFV is fully compatible with protected biometric data types. This is demonstrated by including a protected minutia descriptor, the Protected Minutia Cylinder Code (PMCC) [2]. PMCC is known for its ability to enhance the accuracy of fingerprint minutia matching while being difficult to invert, and augmenting VFV features with PMCC enhances the accuracy of the system. A modification of PMCC is developed in this dissertation to enhance the privacy of the system: the PMCCs within a minutia triangle are XORed together. The XOR procedure greatly enhances the non-invertibility of PMCC while having a small impact on VFV accuracy. Because error-correcting codes (ECC) are central to VFV, a model of security with ECC is developed and used to identify non-trivial ways in which an attacker could exploit the ECC bits.
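The XOR step can be sketched with binary codes held as Python integers. The code length, the random stand-in codes and the helper names are assumptions for illustration; real PMCC codes are derived from minutia neighbourhoods, and VFV's actual matching and error correction are not reproduced here. The sketch shows that XOR-combining the three codes of a triangle yields a single descriptor that can still be compared by Hamming distance.

```python
import random

CODE_BITS = 64  # assumed code length for the example

def triangle_descriptor(c1: int, c2: int, c3: int) -> int:
    """XOR the binary codes of the three minutiae of a triangle."""
    return c1 ^ c2 ^ c3

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")

rng = random.Random(42)
codes = [rng.getrandbits(CODE_BITS) for _ in range(3)]  # stand-in PMCC codes
enrolled = triangle_descriptor(*codes)

# A second impression of the same finger: 2 bits differ in one minutia code.
noisy = list(codes)
noisy[0] ^= (1 << 3) | (1 << 17)
probe = triangle_descriptor(*noisy)
print(hamming(enrolled, probe))  # 2
```

Because XOR is linear, a bit flip in any single minutia code shows up as exactly one flipped bit in the triangle descriptor, so Hamming-distance tolerance and error correction still apply. At the same time, the three individual codes cannot be uniquely recovered from the descriptor alone.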

    A Confidence Framework for the Assessment of Optical Flow Performance

    Optical flow (OF) is the input of a wide range of decision support systems such as car driver assistance, UAV guidance or medical diagnosis. In these real situations, the absence of ground truth forces OF quality to be assessed using quantities computed from either the image sequences or the computed optical flow itself. These quantities are generally known as confidence measures (CM). Even with a proper confidence measure, we still need a way to evaluate its ability to discard pixels whose OF is prone to large errors. Current approaches only provide a descriptive evaluation of CM performance and cannot fairly compare different confidence measures and optical flow algorithms. It is therefore of prime importance to define a framework and a general road map for the evaluation of optical flow performance. This thesis provides a framework able to decide which "optical flow - confidence measure" (OF-CM) pairs are best suited for optical flow error bounding, given a confidence level determined by a decision support system. To design this framework we cover the following points: 1) Descriptive scores. As a first step, we summarize and analyze the sources of inaccuracy in the output of optical flow algorithms. Second, we present several descriptive plots that visually assess CM capabilities for OF error bounding. In addition, given a plot representing the capability of an OF-CM pair to bound the error, we provide a numeric score that categorizes the plot according to its decreasing profile, that is, a score assessing CM performance. 2) Statistical framework. We provide a comparison framework that assesses the OF-CM pair best suited for error bounding using a two-stage cascade process. First, we assess the predictive value of the confidence measures by means of a descriptive plot. Then, from a sample of descriptive plots computed over training frames, we obtain a generic curve that will be used for sequences with no ground truth.
As a second step, we evaluate the obtained generic curve and its capability to truly reflect the predictive value of a confidence measure, using ANOVA on its variability across training frames. The presented framework has shown its potential in clinical decision support systems. In particular, we have analyzed the impact of image artifacts such as noise and decay on the output of optical flow in a cardiac diagnosis system, and we have improved navigation inside the bronchial tree in bronchoscopy.
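One common way to build such a descriptive plot is a sparsification-style curve. Pixels are sorted by confidence, the least confident ones are progressively discarded, and the mean error of the remaining pixels is recorded. The sketch below computes such a curve and a simple score for its decreasing profile: the fraction of steps in which the error does not grow. The discard steps, this particular score and the synthetic errors are assumptions, not the score defined in the thesis.

```python
import numpy as np

def error_vs_discarded(errors, confidence, steps=10):
    """Mean error of the pixels kept after discarding the least confident
    fraction d, for d = 0, 1/steps, ..., (steps - 1)/steps."""
    errors = np.asarray(errors, dtype=float)
    order = np.argsort(confidence)[::-1]  # most confident first
    ranked = errors[order]
    n = len(ranked)
    curve = []
    for i in range(steps):
        keep = max(1, int(round((1.0 - i / steps) * n)))
        curve.append(ranked[:keep].mean())
    return np.array(curve)

def decreasing_profile_score(curve):
    """Fraction of steps where the error does not grow as more low-confidence
    pixels are discarded (1.0 = perfectly decreasing profile)."""
    return float(np.mean(np.diff(curve) <= 0))

rng = np.random.default_rng(1)
epe = rng.exponential(scale=1.0, size=10_000)           # synthetic endpoint errors
oracle = -epe                                           # ideal CM: low error = high confidence
noisy_cm = -epe + rng.normal(0.0, 1.0, size=epe.size)   # imperfect CM
random_cm = rng.random(epe.size)                        # uninformative CM
for name, cm in (("oracle", oracle), ("noisy", noisy_cm), ("random", random_cm)):
    curve = error_vs_discarded(epe, cm)
    print(name, round(decreasing_profile_score(curve), 2), round(float(curve[-1]), 3))
```

An informative confidence measure drives the last value of the curve (the error bound after discarding 90% of the pixels) well below the overall mean error, while an uninformative one leaves the curve roughly flat.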

    258 full texts, 343 metadata records. Updated in last 30 days.