    Stereo vision algorithms suited to constrained FPGA cameras

    The advent of cheap RGBD active 3D sensors, such as those based on structured light (e.g., the Microsoft Kinect) or those based on time-of-flight technology, has significantly increased the interest in computer vision applications based on depth data that, in most cases, enables higher robustness compared to solutions based on traditional 2D images. Unfortunately, active techniques are quite noisy or even completely useless in outdoor environments (in particular under sunlight). An effective and well-known technique to infer depth suited to indoor and outdoor environments is passive stereo vision. Nevertheless, despite the frequent deployment of this technology in many research projects since the 1960s, stereo vision is often perceived, especially in consumer applications, as an expensive technology due to its highly demanding computation requirements. In this paper, we review a subset of state-of-the-art stereo vision algorithms that have the potential to fit a basic computing architecture made of a low-cost field-programmable gate array (FPGA), without additional external devices (e.g., FIFOs, DDR memories, etc.) other than a USB or Gigabit Ethernet communication controller. Compared to more complex designs based on expensive FPGAs coupled with additional external memory devices, clear advantages of the outlined simplified computing architecture are the reduced design and manufacturing costs as well as the reduced power consumption. Another significant advantage is better code portability as well as improved robustness with respect to obsolescence of electronic devices, since almost the whole design is self-contained in the FPGA logic. On the other hand, mapping stereo vision algorithms onto such a low-power, low-cost architecture is a very challenging task, and only a subset of existing algorithms, appropriately modified, is suited to this constrained computing platform. Nevertheless, we believe that devices based on the proposed simplified computing architecture would make RGBD sensors based on stereo vision suitable for a wider class of application scenarios not yet fully addressed by this technology.

    Fast stereo matching for the VIDET system using a general purpose processor with multimedia extensions

    The ever-increasing speed of current general purpose processors, together with architectural enhancements such as multimedia-oriented instruction set extensions, allows standard PC-based systems to be deployed in a number of computationally intensive computer vision tasks. This paper describes the PC-based real-time stereo vision system developed within the VIDET project, a research project aimed at the development of a mobility aid for the visually impaired. VIDET's approach consists in converting depth data gathered through a stereo vision system into a 3D model perceivable by the user by means of a wire-actuated haptic interface. The developed stereo matching algorithm makes massive use of recursion and multimedia instructions to achieve the performance figures needed to sustain the user's real-time interaction with the 3D model through the haptic interface.

    Contrastive Learning for Depth Prediction

    Depth prediction is at the core of several computer vision applications, such as autonomous driving and robotics. It is often formulated as a regression task in which depth values are estimated through network layers. Unfortunately, the distribution of values on depth maps is seldom explored. Therefore, this paper proposes a novel framework combining contrastive learning and depth prediction, allowing us to pay more attention to depth distribution and consequently enabling improvements to the overall estimation process. To this end, we propose a window-based contrastive learning module, which partitions the feature maps into non-overlapping windows and constructs a contrastive loss within each one. Forming and sorting positive and negative pairs, then enlarging the gap between the two in the representation space, constrains the depth distribution to fit the features of the depth map. Experiments on the KITTI and NYU datasets demonstrate the effectiveness of our framework.
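
The window partitioning mentioned in this abstract can be illustrated with a minimal sketch. It covers only the split into non-overlapping windows; how pairs are formed and how the contrastive loss is computed is not described in the abstract and is not attempted here. The function name `partition_windows`, the plain list-of-rows representation of a feature map, and the requirement that the map size be a multiple of the window size are all assumptions made for illustration.

```python
def partition_windows(feature_map, window):
    """Split a 2D grid (list of rows) into non-overlapping window x window tiles.

    Hypothetical helper: tiles are returned in row-major order, and the grid
    size is assumed to be an exact multiple of the window size.
    """
    height, width = len(feature_map), len(feature_map[0])
    if height % window or width % window:
        raise ValueError("height and width must be multiples of the window size")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            # Take `window` rows, and from each row `window` columns.
            tile = [row[left:left + window] for row in feature_map[top:top + window]]
            tiles.append(tile)
    return tiles


# A 4x4 grid with values 0..15 splits into four 2x2 tiles.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = partition_windows(grid, 2)
```

A per-window loss would then be computed independently on each entry of `tiles`.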

    A wearable mobility aid for the visually impaired based on embedded 3D vision and deep learning

    In this paper we propose an effective and wearable mobility aid for people suffering from visual impairments, purely based on 3D computer vision and machine learning techniques. By wearing our device, users can perceive, guided by audio messages and tactile feedback, crucial information about the surrounding environment and hence avoid obstacles along the path. Our proposal can work in synergy with the white cane and allows for very effective, real-time obstacle detection on an embedded computer by processing the point cloud provided by a custom RGBD sensor based on passive stereo vision. Moreover, our system, leveraging deep-learning techniques, semantically categorizes the detected obstacles in order to increase awareness of the explored environment. It can optionally work in synergy with a smartphone, wirelessly connected to the proposed mobility aid, exploiting its audio capability and standard GPS-based navigation tools such as Google Maps. The overall system can operate in real time for hours using a small battery, making it suitable for everyday life. Experimental results confirmed that our proposal has excellent obstacle detection performance and a promising semantic categorization capability.

    Good cues to learn from scratch a confidence measure for passive depth sensors

    As reported in the stereo literature, confidence estimation represents a powerful cue to detect outliers as well as to improve depth accuracy. To this end, we proposed a strategy enabling us to achieve state-of-the-art results by learning a confidence measure in the disparity domain only with a CNN. Since this method does not require the cost volume, it is very appealing because it is potentially suited to any depth-sensing technology, including, for instance, those based on deep networks. Following this intuition, in this paper we investigate in depth the performance of confidence estimation methods, both known in the literature and newly proposed here, that neglect the use of the cost volume. Specifically, we estimate confidence measures from scratch by feeding deep networks with raw depth estimates and, optionally, images, and assess their performance using three datasets and three stereo algorithms. We also investigate, for the first time, their performance with disparity maps inferred by deep stereo end-to-end architectures. Moreover, we move beyond the stereo matching context, estimating confidence from depth maps generated by a monocular network. Our extensive experiments with different architectures highlight that inferring confidence from the raw reference disparity only, as proposed in our previous work, is not only the most versatile solution but also the most effective one in most cases.

    Exploring Few-Beam LiDAR Assistance in Self-Supervised Multi-Frame Depth Estimation

    Self-supervised multi-frame depth estimation methods only require unlabeled monocular videos for training. However, most existing methods face challenges, including accuracy degradation caused by moving objects in dynamic scenes and scale ambiguity due to the absence of real-world references. In this field, the emergence of low-cost LiDAR sensors highlights the potential to improve the robustness of multi-frame depth estimation by exploiting accurate sparse measurements at the correct scale. Moreover, the LiDAR ranging points often intersect moving objects, providing more precise depth cues for them. This paper explores the impact of few-beam LiDAR data on self-supervised multi-frame depth estimation, proposing a method that fuses multi-frame matching and sparse depth features. It significantly enhances depth estimation robustness, particularly in scenarios involving moving objects and textureless backgrounds. We demonstrate the effectiveness of our approach through comprehensive experiments, showcasing its potential to address the limitations of existing methods and paving the way for more robust and reliable depth estimation based on this paradigm.

    TemporalStereo: Efficient Spatial-Temporal Stereo Matching Network

    We present TemporalStereo, a coarse-to-fine stereo matching network that is highly efficient and able to effectively exploit past geometry and context information to boost matching accuracy. Our network leverages a sparse cost volume and proves to be effective when a single stereo pair is given. However, its peculiar ability to use spatio-temporal information across stereo sequences allows TemporalStereo to alleviate problems such as occlusions and reflective regions while remaining highly efficient in this case as well. Notably, our model, trained once with stereo videos, can run in both single-pair and temporal modes seamlessly. Experiments show that our network, which relies on camera motion, is robust even to dynamic objects when running on videos. We validate TemporalStereo through extensive experiments on synthetic (SceneFlow, TartanAir) and real (KITTI 2012, KITTI 2015) datasets. Our model achieves state-of-the-art performance on all of these datasets.

    Learning a confidence measure in the disparity domain from O(1) features

    Depth sensing is of paramount importance for countless applications, and stereo represents a popular, effective and cheap solution for this purpose. As highlighted by recent works concerned with stereo, uncertainty estimation can be a powerful cue to improve accuracy in stereo. Most confidence measures rely on features, mainly extracted from the cost volume, fed to a random forest or a convolutional neural network trained to estimate match uncertainty. In contrast, we propose a novel strategy for confidence estimation based on features computed in the disparity domain and in constant time, making our proposal suited to any stereo system, including COTS devices. We exhaustively assess the performance of our proposals, referred to as O1 and O2, on the KITTI and Middlebury datasets with three popular and different stereo algorithms (CENSUS, MC-CNN and SGM), as well as a deep stereo network (PSM-Net). We also evaluate how well confidence measures generalize to different environments/datasets.

    Evaluation of variants of the SGM algorithm aimed at implementation on embedded or reconfigurable devices

    Inferring dense depth from stereo is crucial for several computer vision applications, and stereo cameras based on embedded systems and/or reconfigurable devices such as FPGAs have become quite popular in the past years. In this field, Semi Global Matching (SGM) is, in most cases, the preferred algorithm due to its good trade-off between accuracy and computation requirements. Nevertheless, a careful design of the processing pipeline enables significant improvements in terms of disparity map accuracy, hardware resources and frame rate. In particular, factors such as the number of matching costs and parameters such as the number and selection of scanlines have a great impact on the overall resource requirements. In this paper we evaluate different variants of the SGM algorithm suited for implementation on embedded or reconfigurable devices, looking for the best compromise in terms of resource requirements, accuracy of the disparity estimation and running time. To quantitatively assess the effectiveness of the considered variants we adopt the KITTI 2015 training dataset, a challenging and standard benchmark with ground truth containing several realistic scenes.
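
To make the role of scanlines concrete, the following is a minimal sketch of the standard SGM cost aggregation along a single left-to-right path of one image row. It is not one of the variants evaluated in the paper: the function names, the toy costs and the penalty values `p1` and `p2` are assumptions chosen for illustration. A full SGM implementation sums such aggregated costs over several path directions, which is why the number and selection of scanlines drives memory and logic requirements.

```python
def aggregate_left_to_right(cost_row, p1, p2):
    """Aggregate matching costs along one left-to-right path of an image row.

    cost_row[x][d] is the matching cost of pixel x at disparity d.
    p1 penalises a disparity change of one level, p2 any larger change.
    """
    num_disp = len(cost_row[0])
    aggregated = [list(cost_row[0])]  # the first pixel has no predecessor
    for x in range(1, len(cost_row)):
        prev = aggregated[-1]
        prev_min = min(prev)
        current = []
        for d in range(num_disp):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < num_disp:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the values bounded along the path.
            current.append(cost_row[x][d] + min(candidates) - prev_min)
        aggregated.append(current)
    return aggregated


def winner_takes_all(costs_row):
    """Pick, for every pixel, the disparity with the lowest cost."""
    return [min(range(len(costs)), key=costs.__getitem__) for costs in costs_row]


# Toy row of three pixels and three disparity levels (made-up costs).
costs = [[0, 5, 5], [5, 0, 5], [4, 4, 3]]
smoothed = aggregate_left_to_right(costs, p1=1, p2=4)
print(winner_takes_all(costs))     # raw costs: [0, 1, 2]
print(winner_takes_all(smoothed))  # after aggregation: [0, 1, 1]
```

In the toy row, the last pixel weakly prefers disparity 2 on raw costs, but the smoothness penalties make it agree with its neighbour after aggregation.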

    Learning to Predict Stereo Reliability Enforcing Local Consistency of Confidence Maps

    Confidence measures estimate unreliable disparity assignments performed by a stereo matching algorithm and, as recently shown, can be used for several purposes. This paper aims at increasing, by means of a deep network, the effectiveness of state-of-the-art confidence measures by exploiting the local consistency assumption. We exhaustively evaluated our proposal on 23 confidence measures, including 5 top-performing ones based on random forests and CNNs, training our networks with two popular stereo algorithms and a small subset (25 out of 194 frames) of the KITTI 2012 dataset. Experimental results show that our approach dramatically increases the effectiveness of all 23 confidence measures on the remaining frames. Moreover, without re-training, we report a further cross-evaluation on KITTI 2015 and Middlebury 2014, confirming that our proposal provides remarkable improvements for each confidence measure even when dealing with significantly different input data. To the best of our knowledge, this is the first method to move beyond conventional pixel-wise confidence estimation.