Federated Online Adaptation for Deep Stereo
We introduce a novel approach for adapting deep stereo networks in a collaborative manner. Building on the principles of federated learning, we develop a distributed framework that delegates the optimization process to a number of clients deployed in different environments. This makes it possible for a deep stereo network running on resource-constrained devices to capitalize on the adaptation process carried out by other instances of the same architecture, and thus improve its accuracy in challenging environments even when it cannot carry out adaptation on its own. Experimental results show that federated adaptation performs equivalently to on-device adaptation, and even better when dealing with challenging environments
Contrastive Learning for Depth Prediction
Depth prediction is at the core of several computer vision applications, such as autonomous driving and robotics. It is often formulated as a regression task in which depth values are estimated through network layers. Unfortunately, the distribution of values on depth maps is seldom explored. Therefore, this paper proposes a novel framework combining contrastive learning and depth prediction, allowing us to pay more attention to depth distribution and consequently enabling improvements to the overall estimation process. To this end, we propose a window-based contrastive learning module, which partitions the feature maps into non-overlapping windows and constructs a contrastive loss within each one. Forming and sorting positive and negative pairs, then enlarging the gap between the two in the representation space, constrains the depth distribution to fit the feature of the depth map. Experiments on KITTI and NYU datasets demonstrate the effectiveness of our framework
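The window partitioning step mentioned in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `partition_windows`, the `(C, H, W)` layout, and the requirement that height and width be multiples of the window size are all assumptions made for the example. The contrastive loss itself is not shown, since the abstract does not specify how pairs are formed and sorted.

```python
import numpy as np


def partition_windows(features, window):
    """Split a (C, H, W) feature map into non-overlapping window x window tiles.

    Hypothetical helper: returns an array of shape
    (num_windows, window * window, C), one row of feature vectors per tile.
    """
    c, h, w = features.shape
    if h % window or w % window:
        raise ValueError("H and W must be multiples of the window size")
    # Expose the block structure: (C, H/window, window, W/window, window).
    tiles = features.reshape(c, h // window, window, w // window, window)
    # Reorder to (block_row, block_col, row_in_tile, col_in_tile, C).
    tiles = tiles.transpose(1, 3, 2, 4, 0)
    return tiles.reshape(-1, window * window, c)


if __name__ == "__main__":
    feats = np.arange(2 * 4 * 4).reshape(2, 4, 4)
    print(partition_windows(feats, 2).shape)
```

Each tile then holds the feature vectors among which a per-window loss could be computed.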
A Survey on Deep Stereo Matching in the Twenties
Stereo matching is close to hitting a half-century of history, yet witnessed a rapid evolution in the last decade thanks to deep learning. While previous surveys in the late 2010s covered the first stage of this revolution, the last five years of research brought further ground-breaking advancements to the field. This paper aims to fill this gap in a two-fold manner: first, we offer an in-depth examination of the latest developments in deep stereo matching, focusing on the pioneering architectural designs and groundbreaking paradigms that have redefined the field in the 2020s; second, we present a thorough analysis of the critical challenges that have emerged alongside these advances, providing a comprehensive taxonomy of these issues and exploring the state-of-the-art techniques proposed to address them. By reviewing both the architectural innovations and the key challenges, we offer a holistic view of deep stereo matching and highlight the specific areas that require further investigation. To accompany this survey, we maintain a regularly updated project page that catalogs papers on deep stereo matching in our Awesome-Deep-Stereo-Matching repository
Learning a confidence measure in the disparity domain from O(1) features
Depth sensing is of paramount importance for countless applications, and stereo represents a popular, effective and cheap solution for this purpose. As highlighted by recent works on stereo, uncertainty estimation can be a powerful cue to improve accuracy. Most confidence measures rely on features, mainly extracted from the cost volume, fed to a random forest or a convolutional neural network trained to estimate match uncertainty. In contrast, we propose a novel strategy for confidence estimation based on features computed in constant time in the disparity domain, making our proposal suited to any stereo system, including COTS devices. We exhaustively assess the performance of our proposals, referred to as O1 and O2, on KITTI and Middlebury datasets with three popular and different stereo algorithms (CENSUS, MC-CNN and SGM), as well as a deep stereo network (PSM-Net). We also evaluate how well confidence measures generalize to different environments/datasets
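To illustrate what a constant-time feature in the disparity domain can look like, the sketch below computes the local variance of a disparity map with integral images, so the cost per pixel does not depend on the window size. This is an assumption chosen for illustration: the abstract does not list the features used by O1 and O2, and the function name `local_disparity_variance` and the edge-replicating padding are hypothetical choices.

```python
import numpy as np


def local_disparity_variance(disparity, radius):
    """Per-pixel variance of disparity in a (2*radius+1)^2 window.

    Uses summed-area tables, so each pixel costs O(1) regardless of radius.
    Illustrative feature only, not the O1/O2 feature set.
    """
    d = np.asarray(disparity, dtype=np.float64)
    k = 2 * radius + 1
    # Replicate border values so every pixel has a full window.
    padded = np.pad(d, radius, mode="edge")

    def box_sum(img):
        # Summed-area table with a leading row and column of zeros.
        ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
        ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
        # Four lookups give the sum over each k x k window.
        return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

    n = k * k
    mean = box_sum(padded) / n
    mean_sq = box_sum(padded ** 2) / n
    # Clamp tiny negative values caused by floating-point rounding.
    return np.maximum(mean_sq - mean ** 2, 0.0)
```

High local variance tends to appear at depth discontinuities and noisy regions, which is why statistics of this kind are a plausible input to a confidence estimator that never sees the cost volume.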
TemporalStereo: Efficient Spatial-Temporal Stereo Matching Network
We present TemporalStereo, a coarse-to-fine stereo matching network that is highly efficient and able to effectively exploit past geometry and context information to boost matching accuracy. Our network leverages a sparse cost volume and proves to be effective when a single stereo pair is given. However, its peculiar ability to use spatio-temporal information across stereo sequences allows TemporalStereo to alleviate problems such as occlusions and reflective regions while remaining highly efficient in this latter case as well. Notably, our model, trained once with stereo videos, can run in both single-pair and temporal modes seamlessly. Experiments show that our network, relying on camera motion, is robust even to dynamic objects when running on videos. We validate TemporalStereo through extensive experiments on synthetic (SceneFlow, TartanAir) and real (KITTI 2012, KITTI 2015) datasets. Our model achieves state-of-the-art performance on all of these datasets
Good cues to learn from scratch a confidence measure for passive depth sensors
As reported in the stereo literature, confidence estimation represents a powerful cue to detect outliers as well as to improve depth accuracy. To this end, we proposed a strategy enabling us to achieve state-of-the-art results by learning a confidence measure in the disparity domain only with a CNN. Since this method does not require the cost volume, it is very appealing because it is potentially suited to any depth-sensing technology, including, for instance, those based on deep networks. Following this intuition, in this paper we thoroughly investigate the performance of confidence estimation methods that do not use the cost volume, both those known in the literature and new ones proposed in this paper. Specifically, we estimate confidence measures from scratch by feeding deep networks with raw depth estimates and, optionally, images, and assess their performance on three datasets and three stereo algorithms. We also investigate, for the first time, their performance with disparity maps inferred by deep stereo end-to-end architectures. Moreover, we move beyond the stereo matching context, estimating confidence from depth maps generated by a monocular network. Our extensive experiments with different architectures highlight that inferring confidence from the raw reference disparity only, as proposed in our previous work, is not only the most versatile solution but also the most effective one in most cases
Exploring Few-Beam LiDAR Assistance in Self-Supervised Multi-Frame Depth Estimation
Self-supervised multi-frame depth estimation methods only require unlabeled monocular videos for training. However, most existing methods face challenges, including accuracy degradation caused by moving objects in dynamic scenes and scale ambiguity due to the absence of real-world references. In this field, the emergence of low-cost LiDAR sensors highlights the potential to improve the robustness of multi-frame depth estimation by exploiting accurate sparse measurements at the correct scale. Moreover, the LiDAR ranging points often intersect moving objects, providing more precise depth cues for them. This paper explores the impact of few-beam LiDAR data on self-supervised multi-frame depth estimation, proposing a method that fuses multi-frame matching and sparse depth features. It significantly enhances depth estimation robustness, particularly in scenarios involving moving objects and textureless backgrounds. We demonstrate the effectiveness of our approach through comprehensive experiments, showcasing its potential to address the limitations of existing methods and paving the way for more robust and reliable depth estimation based on this paradigm
A wearable mobility aid for the visually impaired based on embedded 3D vision and deep learning
In this paper we propose an effective, wearable mobility aid for people with visual impairments based purely on 3D computer vision and machine learning techniques. By wearing our device, users can perceive, guided by audio messages and tactile feedback, crucial information about the surrounding environment and hence avoid obstacles along the path. Our proposal can work in synergy with the white cane and allows for very effective, real-time obstacle detection on an embedded computer by processing the point cloud provided by a custom RGBD sensor based on passive stereo vision. Moreover, by leveraging deep-learning techniques, our system can semantically categorize the detected obstacles in order to increase awareness of the explored environment. It can optionally work in synergy with a smartphone, wirelessly connected to the proposed mobility aid, exploiting its audio capability and standard GPS-based navigation tools such as Google Maps. The overall system can operate in real time for hours using a small battery, making it suitable for everyday life. Experimental results confirmed that our proposal has excellent obstacle detection performance and a promising semantic categorization capability
Learning to Predict Stereo Reliability Enforcing Local Consistency of Confidence Maps
Confidence measures estimate unreliable disparity assignments performed by a stereo matching algorithm and, as recently proved, can be used for several purposes. This paper aims at increasing, by means of a deep network, the effectiveness of state-of-the-art confidence measures by exploiting the local consistency assumption. We exhaustively evaluated our proposal on 23 confidence measures, including 5 top-performing ones based on random forests and CNNs, training our networks with two popular stereo algorithms and a small subset (25 out of 194 frames) of the KITTI 2012 dataset. Experimental results show that our approach dramatically increases the effectiveness of all 23 confidence measures on the remaining frames. Moreover, without re-training, we report a further cross-evaluation on KITTI 2015 and Middlebury 2014 confirming that our proposal provides remarkable improvements for each confidence measure even when dealing with significantly different input data. To the best of our knowledge, this is the first method to move beyond conventional pixel-wise confidence estimation
