    Real-time hand gesture recognition exploiting multiple 2D and 3D cues

    The recent introduction of several 3D applications and stereoscopic display technologies has created the need for novel human-machine interfaces. Traditional input devices, such as the keyboard and mouse, cannot fully exploit the potential of these interfaces and do not offer natural interaction. Hand gestures instead provide a more natural and sometimes safer way of interacting with computers and other machines without touching them. The use cases for gesture-based interfaces range from gaming to automatic sign language interpretation, health care, robotics, and vehicle automation. Automatic gesture recognition is a challenging problem that has attracted growing research interest for several years because of its applications in natural interfaces. The first approaches, based on recognition from 2D color pictures or video only, suffered from the typical problems of this type of data. Inter-occlusions, differences in skin color among users, even within the same ethnic group, and unstable illumination conditions often made the problem intractable. Other approaches avoided these problems by making the user wear sensorized gloves or hold dedicated tools designed to help localize the hand in the scene. The recent introduction to the mass market of novel low-cost range cameras, such as the Microsoft Kinect, Asus XTION, Creative Senz3D, and the Leap Motion, has opened the way to innovative gesture recognition approaches that exploit the geometry of the framed scene. Most methods share a common gesture recognition pipeline: first identify the hand in the framed scene, then extract relevant features from the hand samples, and finally apply suitable machine learning techniques to recognize the performed gesture from a predefined "gesture dictionary".
Following this rationale, this thesis proposes a novel gesture recognition framework exploiting both color and geometric cues from low-cost color and range cameras. The dissertation starts by introducing the automatic hand gesture recognition problem, giving an overview of the state-of-the-art algorithms and of the recognition pipeline employed in this work. It then briefly describes the major low-cost range cameras and setups used in the literature to acquire color and depth data for hand gesture recognition, highlighting their capabilities and limitations. The methods employed for detecting the hand in the framed scene and for segmenting it into its relevant parts are then analyzed in greater detail. The algorithm first exploits skin color information and geometric considerations to discard the background samples, then reliably detects the palm and finger regions and removes the forearm. For palm detection, the method fits the largest circle inscribed in the palm region or, in a more advanced version, an ellipse. A set of robust color and geometric features that can be extracted from the previously segmented finger and palm regions is then described in detail. Geometric features describe properties of the hand contour from its curvature variations, measure the distances of its points from the hand center or from the palm, either in 3D space or in the image plane, or extract relevant information from the palm morphology and from the empty space in the hand convex hull. Color features instead exploit the histogram of oriented gradients (HOG), local phase quantization (LPQ) and local ternary patterns (LTP) algorithms to provide further cues from the hand texture and from the depth map treated as a grayscale image. Additional features extracted from the Leap Motion data complete the gesture characterization for a more reliable recognition.
Moreover, the thesis reports a novel approach that jointly exploits the geometric data provided by the Leap Motion and the depth data from a range camera to extract the same depth features with a significantly lower computational effort. The work then addresses the delicate problem of constructing a robust gesture recognition model from the features previously described, using multi-class Support Vector Machines, Random Forests or more powerful ensembles of classifiers. Feature selection techniques, designed to find the smallest subset of features that allows training a leaner classification model without a significant accuracy loss, are also considered. The proposed recognition method, experimentally validated on subsets of the American Sign Language, reported very high accuracies. The results also showed that higher accuracies can be obtained by combining suitable sets of complementary features and by using ensembles of classifiers. It is also worth noting that the proposed approach is not sensor dependent, that is, the recognition algorithm is not bound to a specific sensor or technology adopted for the depth data acquisition. Finally, the gesture recognition algorithm runs in real time even in the absence of thorough optimization, and may easily be extended in the near future with novel descriptors and support for dynamic gestures.
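The palm detection step mentioned above (fitting the largest circle inscribed in the palm region) can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `largest_inscribed_circle`, the brute-force distance search, the convention that pixels outside the image count as background, and the toy `HAND_MASK` are all assumptions made for illustration.

```python
import math

# Hypothetical toy mask: 1 = hand pixel, 0 = background.
# A thin "finger" (rows 0-2) sits on top of a square "palm" (rows 3-7).
HAND_MASK = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]


def largest_inscribed_circle(mask):
    """Return (row, col, radius) of the largest circle inside the foreground.

    Brute force: for every foreground pixel, measure the Euclidean distance
    to the nearest background pixel and keep the pixel where it is largest.
    """
    rows, cols = len(mask), len(mask[0])
    background = [(r, c) for r in range(rows) for c in range(cols)
                  if not mask[r][c]]
    # Assumption: everything outside the image is background, so add a
    # one-pixel frame of background positions around the mask.
    background += [(-1, c) for c in range(-1, cols + 1)]
    background += [(rows, c) for c in range(-1, cols + 1)]
    background += [(r, -1) for r in range(rows)]
    background += [(r, cols) for r in range(rows)]

    best = (0, 0, 0.0)
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            d = min(math.hypot(r - br, c - bc) for br, bc in background)
            if d > best[2]:
                best = (r, c, d)
    return best


if __name__ == "__main__":
    print(largest_inscribed_circle(HAND_MASK))
```

On a real depth or color segmentation mask, a distance transform would be the usual way to obtain the same quantity efficiently; the quadratic search here is only meant to make the idea explicit.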

    Hand gesture recognition with jointly calibrated Leap Motion and depth sensor

    Novel 3D acquisition devices such as depth cameras and the Leap Motion have recently reached the market. Depth cameras provide a complete 3D description of the framed scene, while the Leap Motion sensor is a device explicitly targeted at hand gesture recognition and provides only a limited set of relevant points. This paper shows how to jointly exploit the two types of sensors for accurate gesture recognition. An ad-hoc solution for the joint calibration of the two devices is presented first. Then a set of novel feature descriptors is introduced both for the Leap Motion and for depth data. Various schemes based on the distances of the hand samples from the centroid, on the curvature of the hand contour and on the convex hull of the hand shape are employed, and the use of Leap Motion data to aid feature extraction is also considered. The proposed feature sets are fed to two different classifiers, one based on multi-class SVMs and one exploiting Random Forests. Different feature selection algorithms have also been tested in order to reduce the complexity of the approach. Experimental results show that the proposed method achieves very high accuracy. The current implementation also runs in real time.
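As a rough illustration of a descriptor based on the distances of the hand samples from the centroid, the sketch below bins 2D hand points by their angle around the centroid and keeps the largest distance in each angular sector. The abstract does not specify how the distances are aggregated, so the angular binning, the normalization by the largest distance, and the name `centroid_distance_features` are assumptions.

```python
import math


def centroid_distance_features(points, num_bins=8):
    """Per-sector maximum distance from the centroid, scaled to [0, 1].

    points: list of (x, y) hand samples in the image plane.
    num_bins: number of angular sectors around the centroid (assumed value).
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)

    bins = [0.0] * num_bins
    for x, y in points:
        # Angle of the sample around the centroid, mapped to [0, 2*pi).
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        idx = min(int(angle / (2 * math.pi) * num_bins), num_bins - 1)
        bins[idx] = max(bins[idx], math.hypot(x - cx, y - cy))

    # Normalizing by the largest distance makes the vector scale invariant.
    peak = max(bins)
    return [b / peak for b in bins] if peak > 0 else bins


if __name__ == "__main__":
    samples = [(2, 2), (-1, 1), (-2, -2), (1, -1)]
    print(centroid_distance_features(samples, num_bins=4))
```

A vector of this kind has a fixed length regardless of how many hand samples were acquired, which is what makes it usable as input to a classifier such as an SVM or a Random Forest.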

    Combining multiple depth-based descriptors for hand gesture recognition

    Depth data acquired by current low-cost real-time depth cameras provide a more informative description of the hand pose that can be exploited for gesture recognition purposes. Following this rationale, this paper introduces a novel hand gesture recognition scheme based on depth information. The hand is first extracted from the acquired data and divided into palm and finger regions. Then four different sets of feature descriptors are extracted, accounting for different cues: the distances of the fingertips from the hand center and from the palm plane, the curvature of the hand contour and the geometry of the palm region. Finally, a multi-class SVM classifier is employed to recognize the performed gestures. Experimental results demonstrate that the proposed scheme achieves very high accuracy both on standard datasets and on more complex ones acquired for experimental evaluation. The current implementation also runs in real time.
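One of the cues listed above, the distance of a fingertip from the palm plane, reduces to a point-to-plane distance once a plane is available. The sketch below is illustrative only: it builds the plane from three palm points, whereas the paper does not state how the palm plane is estimated, and the names `plane_from_points` and `signed_distance` are hypothetical.

```python
import math


def plane_from_points(p0, p1, p2):
    """Return (origin, unit_normal) of the plane through three 3D points."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    # Cross product u x v gives a vector orthogonal to the plane.
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0:
        raise ValueError("the three points are collinear")
    return list(p0), [c / norm for c in n]


def signed_distance(point, plane):
    """Signed distance of a 3D point from the plane (positive on the normal side)."""
    origin, normal = plane
    return sum((p - o) * n for p, o, n in zip(point, origin, normal))


if __name__ == "__main__":
    palm_plane = plane_from_points((0, 0, 0), (1, 0, 0), (0, 1, 0))
    print(signed_distance((0.3, 0.2, 0.05), palm_plane))
```

The sign tells on which side of the palm the fingertip lies, which can help distinguish a finger folded over the palm from one pointing toward the camera. With noisy depth data, a least-squares or robust plane fit over many palm samples would be a more reasonable choice than three points.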

    Hand Gesture Recognition for 3D Interfaces

    The recent introduction of many new three-dimensional applications and display technologies has created the need for new human-computer interfaces that allow a simpler and more natural interaction than traditional devices such as the keyboard and mouse. In this work we describe a novel interface for interactive 3D browsing that relies only on the direct acquisition of hand gestures, exploiting data from depth cameras such as the Microsoft Kinect. We do not aim to introduce a complete working system for this task; instead, we present the various building blocks and available techniques in order to construct a framework into which the various components can be fitted.

    Hand gesture recognition with Leap Motion and Kinect devices

    The recent introduction of novel acquisition devices such as the Leap Motion and the Kinect makes it possible to obtain a very informative description of the hand pose that can be exploited for accurate gesture recognition. This paper proposes a novel hand gesture recognition scheme explicitly targeted at Leap Motion data. An ad-hoc feature set based on the positions and orientations of the fingertips is computed and fed into a multi-class SVM classifier in order to recognize the performed gestures. A set of features is also extracted from the depth data acquired by the Kinect and combined with the Leap Motion features in order to improve the recognition performance. Experimental results compare the accuracy that can be obtained from the two devices on a subset of the American Manual Alphabet and show that combining the two feature sets achieves very high accuracy in real time.

    Human-Robot Interaction with Depth-Based Gesture Recognition

    Human-robot interaction is a very heterogeneous research field that is attracting growing interest. A key building block for a proper interaction between humans and robots is the automatic recognition and interpretation of gestures performed by the user. Consumer depth cameras (such as the MS Kinect) have made an accurate and reliable interpretation of human gestures possible. In this paper a novel framework for gesture-based human-robot interaction is proposed. Both hand gestures and full-body gestures are recognized through the use of depth information, and a human-robot interaction scheme based on these gestures is proposed. In order to assess the feasibility of the proposed scheme, the paper presents a simple application based on the well-known rock-scissors-paper game.

    ToF Cameras and Microsoft Kinect Depth Sensor for Natural Gesture Interfaces

    Natural computer-human interfaces are gaining more and more importance every day. Users would ideally like to communicate with machines in a natural way, i.e., by means of voice or gestures. Concerning the latter, a natural interface should be able to correctly identify gestures without the presence of a physical controller such as a mouse, a track-pad or a Nintendo WiiMote. In practice, interfaces of this kind require the acquisition and analysis of 3D data from dynamic scenes, which has always been a very challenging problem. Until a few years ago this was possible only with complex and expensive setups. Recently, novel acquisition devices such as Time-of-Flight (ToF) range cameras have made this task almost as simple as acquiring a standard video of the scene. Furthermore, the introduction of the Microsoft Kinect device has made this technology available to the mass market at a very low price. After a quick review of the technology behind ToF cameras and the Kinect device, this paper introduces a mathematical model of the errors in the measurement process. The measurement errors and the artefacts in the acquired data are analysed in detail for the two devices through a wide set of experiments showing the performance of these devices in different conditions.

    Effective and precise

    In this work an effective face detector based on the well-known Viola–Jones algorithm is proposed. A common issue in face detection is that maximizing the detection rate requires a low threshold for classifying an input image as a face, but a low threshold also drastically increases the number of false positives. In this paper several criteria are proposed for reducing false positives: (i) a skin detection step is used to reject candidate face regions that do not contain skin color; (ii) the size of the candidate face region is calculated from the depth data, and faces that are too small or too large are removed; (iii) images of flat objects (e.g. a candidate face found on a wall) or uneven objects (e.g. a candidate face found in the leaves of a tree) are removed using the depth map and a segmentation approach based on both color and depth data. These criteria drastically reduce the number of false positives without decreasing the detection rate. The proposed approach has been validated on three datasets composed of 180 samples including both 2D and depth images. The face position inside the samples has been manually labeled for testing. A Matlab version of the system for face detection and the full testing dataset will be freely available from http://www.dei.unipd.it/node/2357
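Criterion (ii) above, rejecting candidate faces whose physical size is implausible given the depth, can be sketched with a pinhole camera model. This is a hypothetical illustration rather than the authors' Matlab code: the function names, the default focal length of 525 pixels, and the 0.10 m to 0.30 m width range are assumed values chosen only for the example.

```python
def physical_width_m(width_px, depth_m, focal_px):
    """Approximate metric width of a region under a pinhole camera model.

    An object of width W metres at distance Z metres spans about
    focal_px * W / Z pixels, so W = width_px * Z / focal_px.
    """
    return width_px * depth_m / focal_px


def is_plausible_face(width_px, depth_m, focal_px=525.0,
                      min_width_m=0.10, max_width_m=0.30):
    """Keep a candidate only if its metric width falls in the assumed face range."""
    if depth_m <= 0:
        # Missing or invalid depth reading: the size cannot be checked.
        return False
    width_m = physical_width_m(width_px, depth_m, focal_px)
    return min_width_m <= width_m <= max_width_m


if __name__ == "__main__":
    # Candidate boxes as (width in pixels, median depth in metres).
    for width_px, depth_m in [(100, 1.0), (100, 5.0), (20, 1.0)]:
        print(width_px, depth_m, is_plausible_face(width_px, depth_m, focal_px=500.0))
```

Rejecting candidates with invalid depth is itself a design choice; a detector that must also work where the depth map has holes could instead fall back to the color-only decision.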