
    Deep learning for scene understanding with color and depth data

    Significant advances have been made in recent years in data acquisition and processing hardware, as well as in optimization and machine learning techniques. On one hand, the introduction of depth sensors in the consumer market has made it possible to acquire 3D data at very low cost, overcoming many of the limitations and ambiguities that typically affect computer vision applications based on color information alone. At the same time, faster GPUs have allowed researchers to run time-consuming experiments even on large datasets. On the other hand, the development of effective machine learning algorithms, including deep learning techniques, has provided a high-performing tool for exploiting the enormous amount of data now available. In light of these encouraging premises, three classical computer vision problems have been selected, and novel approaches to their solution are proposed in this work. Each approach both leverages the output of a deep Convolutional Neural Network (ConvNet) and jointly exploits color and depth data to achieve competitive results. In particular, a novel semantic segmentation scheme for color and depth data is presented that uses the features extracted from a ConvNet together with geometric cues. A method for 3D shape classification is also proposed that uses a deep ConvNet fed with specific 3D data representations. Finally, a ConvNet for ToF and stereo confidence estimation is employed as part of a ToF-stereo fusion algorithm, thus avoiding reliance on complex yet inaccurate noise models for the confidence estimation task.

    Deep learning for 3D shape classification based on volumetric density and surface approximation clues

    This paper proposes a novel approach for the classification of 3D shapes that exploits surface and volumetric cues inside a deep learning framework. The proposed algorithm uses three different data representations. The first is a set of depth maps obtained by rendering the 3D object. The second is a novel volumetric representation obtained by counting the number of filled voxels along each direction. Finally, NURBS surfaces are fitted over the 3D object and surface curvature parameters are selected as the third representation. All three data representations are fed to a multi-branch Convolutional Neural Network. Each branch processes a different data source and produces a feature vector using convolutional layers of progressively reduced resolution. The extracted feature vectors are fed to a linear classifier that combines the outputs to obtain the final predictions. Experimental results on the ModelNet dataset show that the proposed approach achieves state-of-the-art performance.
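The volumetric representation mentioned above, counting filled voxels along each direction, can be illustrated with a minimal sketch. The abstract does not specify the grid format or the set of directions, so the nested-list occupancy grid, the three axis-aligned directions, and the function name `density_maps` are assumptions made only for illustration.

```python
def density_maps(voxels):
    """Count filled voxels along the x, y and z axes of a binary grid.

    `voxels[x][y][z]` is 1 for a filled voxel and 0 otherwise (assumed layout).
    Returns three 2D maps, one per axis-aligned direction.
    """
    nx, ny, nz = len(voxels), len(voxels[0]), len(voxels[0][0])
    # Collapse the x axis: result indexed as [y][z].
    along_x = [[sum(voxels[x][y][z] for x in range(nx)) for z in range(nz)]
               for y in range(ny)]
    # Collapse the y axis: result indexed as [x][z].
    along_y = [[sum(voxels[x][y][z] for y in range(ny)) for z in range(nz)]
               for x in range(nx)]
    # Collapse the z axis: result indexed as [x][y].
    along_z = [[sum(voxels[x][y][z] for z in range(nz)) for y in range(ny)]
               for x in range(nx)]
    return along_x, along_y, along_z


# A tiny 2x2x2 grid with three filled voxels.
grid = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
grid[0][0][0] = 1
grid[1][0][0] = 1
grid[0][1][1] = 1
maps = density_maps(grid)
```

Each resulting map is a 2D image, so it could plausibly be passed to a convolutional branch in the same way as a rendered depth map.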

    Stereo and ToF Data Fusion by Learning from Synthetic Data

    Time-of-Flight (ToF) sensors and stereo vision systems are both capable of acquiring depth information, but they have complementary characteristics and issues. A more accurate representation of the scene geometry can be obtained by fusing the two depth sources. In this paper we present a novel framework for data fusion in which the contribution of each depth source is controlled by confidence measures that are jointly estimated using a Convolutional Neural Network. The two depth sources are fused by enforcing the local consistency of the depth data, taking into account the estimated confidence information. The deep network is trained on a synthetic dataset, and we show that the classifier generalizes to different data, producing reliable estimates not only on synthetic data but also on real-world scenes. Experimental results show that the proposed approach increases the accuracy of the depth estimation on both synthetic and real data and that it outperforms state-of-the-art methods.
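To make the role of the confidence measures concrete, the sketch below fuses two depth maps with a per-pixel confidence-weighted average. This is a deliberate simplification and not the method of the paper, which additionally enforces local consistency of the depth data. The function name `fuse_depth`, the list-of-rows map format, and the fallback for zero confidence are all assumptions.

```python
def fuse_depth(tof, stereo, conf_tof, conf_stereo, eps=1e-9):
    """Per-pixel confidence-weighted average of two depth maps.

    All arguments are lists of rows with identical shape (assumed format).
    Where both confidences are (near) zero, fall back to the plain mean.
    """
    fused = []
    for row_t, row_s, row_ct, row_cs in zip(tof, stereo, conf_tof, conf_stereo):
        out = []
        for dt, ds, ct, cs in zip(row_t, row_s, row_ct, row_cs):
            total = ct + cs
            if total < eps:
                # Neither source is trusted: average the two measurements.
                out.append((dt + ds) / 2.0)
            else:
                out.append((ct * dt + cs * ds) / total)
        fused.append(out)
    return fused


fused = fuse_depth(
    tof=[[1.0, 2.0, 5.0]],
    stereo=[[3.0, 4.0, 7.0]],
    conf_tof=[[1.0, 0.0, 0.0]],
    conf_stereo=[[1.0, 1.0, 0.0]],
)
```

In this toy input the first pixel trusts both sources equally, the second trusts only the stereo measurement, and the third trusts neither.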

    Scene Segmentation Driven by Deep Learning and Surface Fitting

    This paper proposes a joint color and depth segmentation scheme that combines geometric cues with a learning stage. The approach starts from an initial over-segmentation based on spectral clustering. The input data is also fed to a Convolutional Neural Network (CNN), which produces a per-pixel descriptor vector for each scene sample. An iterative merging procedure is then used to recombine the segments into the regions corresponding to the various objects and surfaces. The proposed algorithm starts by considering all the adjacent segments and computing a similarity metric according to the CNN features. The pairs of segments with higher similarity are considered for merging. Finally, the algorithm uses a NURBS surface fitting scheme on the segments to determine whether the selected pairs correspond to a single surface. The comparison with state-of-the-art methods shows that the proposed method provides an accurate and reliable scene segmentation.
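The merging procedure described above can be sketched in a simplified form. The abstract does not name the similarity metric or the acceptance rule, so cosine similarity, the `threshold` value, and the `same_surface` callback (a stand-in for the NURBS surface fitting check) are all hypothetical. The sketch also performs a single greedy pass, whereas the paper describes an iterative procedure.

```python
import math


def cosine(a, b):
    """Cosine similarity between two descriptor vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_candidates(descriptors, adjacency):
    """Sort adjacent segment pairs by decreasing descriptor similarity."""
    scored = [(cosine(descriptors[i], descriptors[j]), i, j) for i, j in adjacency]
    return sorted(scored, reverse=True)


def merge_segments(descriptors, adjacency, same_surface, threshold=0.9):
    """One greedy pass: merge similar adjacent pairs accepted by `same_surface`.

    Returns a mapping from each segment id to the id of its merged region.
    """
    label = {s: s for s in descriptors}

    def find(s):
        # Follow parent links to the representative of the merged region.
        while label[s] != s:
            s = label[s]
        return s

    for score, i, j in rank_candidates(descriptors, adjacency):
        if score < threshold:
            break  # remaining pairs are even less similar
        ri, rj = find(i), find(j)
        if ri != rj and same_surface(ri, rj):
            label[rj] = ri
    return {s: find(s) for s in descriptors}


# Toy example: segments 0 and 1 have similar descriptors, segment 2 does not.
descriptors = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
adjacency = [(0, 1), (1, 2)]
regions = merge_segments(descriptors, adjacency, same_surface=lambda a, b: True)
```

Passing a callback that always returns `False` leaves every segment in its own region, which shows how the geometric check can veto a merge suggested by the learned features.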

    Segmentation and semantic labelling of RGBD data with convolutional neural networks and surface fitting

    We present an approach for segmentation and semantic labelling of RGBD data that combines geometric cues with deep learning techniques. An initial over-segmentation is performed using spectral clustering, and a set of non-uniform rational B-spline surfaces is fitted on the extracted segments. A convolutional neural network (CNN) then receives as input colour and geometry data together with surface fitting parameters. The network is made of nine convolutional stages followed by a softmax classifier and produces a vector of descriptors for each sample. In the next step, an iterative merging algorithm recombines the output of the over-segmentation into larger regions matching the various elements of the scene. The pairs of adjacent segments with higher similarity according to the CNN features are candidates for merging, and the surface fitting accuracy is used to detect which pairs of segments belong to the same surface. Finally, a set of labelled segments is obtained by combining the segmentation output with the descriptors from the CNN. Experimental results show that the proposed approach outperforms state-of-the-art methods and provides an accurate segmentation and labelling.

    Deep Learning for Confidence Information in Stereo and ToF Data Fusion

    This paper proposes a novel framework for the fusion of depth data produced by a Time-of-Flight (ToF) camera and a stereo vision system. The key problem of balancing the two sources of information is solved by extracting confidence maps for both sources using deep learning. We introduce a novel synthetic dataset that accurately represents the data acquired by the proposed setup and use it to train a Convolutional Neural Network architecture. The machine learning framework estimates the reliability of both data sources at each pixel location. The two depth fields are finally fused by enforcing the local consistency of the depth data, taking into account the confidence information. Experimental results show that the proposed approach increases the accuracy of the depth estimation.

    3D hand shape analysis for palm and fingers identification

    This paper proposes a novel scheme for the extraction and identification of the palm and the fingers from a single depth map. The hand is first segmented from the rest of the scene, then it is divided into palm and finger regions. For this task we employ a novel scheme that exploits the idea that fingers have a tubular shape while the palm is more planar. Following this rationale, we apply a contraction guided by the normals in order to reduce the fingers to thinner structures that can be identified by analyzing the changes in the point density. Density-based clustering is then applied, and finally a linear-programming-based approach is employed to identify the various fingers. Experimental results demonstrate the effectiveness of the proposed approach even in complex situations and in the presence of inter-occlusions between the various fingers.

    Exploiting Silhouette Descriptors and Synthetic Data for Hand Gesture Recognition

    This paper proposes a novel real-time hand gesture recognition scheme explicitly targeted to depth data. The hand silhouette is first extracted from the acquired data and then two ad-hoc feature sets are computed from this representation. The first is based on the local curvature of the hand contour, while the second represents the thickness of the hand region close to each contour point using a distance transform. The two feature sets are rearranged in a three-dimensional data structure representing the values of the two features at each contour location, and this representation is then fed into a multi-class Support Vector Machine. The classifier is trained on a synthetic dataset generated with an ad-hoc rendering system developed for the purposes of this work. This approach allows a fast construction of the training set without the need to manually acquire large training datasets. Experimental results on real data show that the approach achieves 90% accuracy on a typical hand gesture recognition dataset with very limited computational resources.

    Face Detection Coupling Texture, Color and Depth Data

    In this chapter, we propose an ensemble of face detectors for maximizing the number of true positives found by the system. Unfortunately, combining different face detectors increases the number of false positives as well as the number of true positives. To overcome this difficulty, several methods for reducing false positives are tested and proposed. The different filtering steps are based on the characteristics of the depth map related to the subwindows of the whole image that contain the candidate faces. The simplest criterion, for instance, is to filter the candidate face region by considering its size in metric units. The experimental section demonstrates that the proposed set of filtering steps greatly reduces the number of false positives without decreasing the detection rate. The proposed approach has been validated on a dataset of 549 images (each including both 2D and depth data) representing 614 upright frontal faces. The images were acquired both outdoors and indoors, with both first and second generation Kinect sensors, in order to simulate a real application scenario. Moreover, for further validation and comparison with the state of the art, our ensemble of face detectors is tested on the widely used BioID dataset, where it obtains a 100% detection rate with an acceptable number of false positives. A MATLAB version of the filtering steps and the dataset used in this paper will be freely available from http://www.dei.unipd.it/node/2357
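The size-based filter mentioned above can be illustrated with a short sketch. It assumes a pinhole camera model to convert a candidate window width from pixels to metres using the depth of the region. The focal length, the accepted width range (`min_m`, `max_m`), and the function names are hypothetical values chosen for illustration, not parameters taken from the chapter.

```python
def metric_width(width_px, depth_m, focal_px):
    """Approximate real-world width (metres) of a window under a pinhole model."""
    return width_px * depth_m / focal_px


def plausible_face(width_px, depth_m, focal_px, min_m=0.10, max_m=0.30):
    """Keep a candidate only if its metric width falls in an assumed face range."""
    if depth_m <= 0:
        # Invalid or missing depth reading: reject the candidate.
        return False
    return min_m <= metric_width(width_px, depth_m, focal_px) <= max_m


# A 100-pixel window at 1 m with a 500-pixel focal length is about 0.2 m wide.
keep = plausible_face(width_px=100, depth_m=1.0, focal_px=500.0)
# A 400-pixel window at the same distance would be about 0.8 m wide.
reject = plausible_face(width_px=400, depth_m=1.0, focal_px=500.0)
```

A filter of this kind needs only the candidate window and the depth map, which is consistent with the chapter's description of it as the simplest criterion.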