High-Precision Depth Estimation Using Uncalibrated LiDAR and Stereo Fusion
We address the problem of 3D reconstruction from an uncalibrated LiDAR point cloud and stereo images. Since each sensor used alone for 3D reconstruction has weaknesses in terms of density and accuracy, we propose a deep sensor fusion framework for high-precision depth estimation. The proposed architecture consists of a calibration network and a depth fusion network, both designed with the trade-off between accuracy and efficiency on mobile devices in mind. The calibration network first corrects an initial extrinsic parameter to align the input sensor coordinate systems. The accuracy of calibration is markedly improved by formulating the calibration in the depth domain. In the depth fusion network, the complementary characteristics of sparse LiDAR and dense stereo depth are then encoded in a boosting manner. Since training data for LiDAR and stereo depth fusion are rather limited, we introduce a simple but effective approach to generate pseudo ground truth labels from the raw KITTI dataset. The experimental evaluation verifies that the proposed method outperforms current state-of-the-art methods on the KITTI benchmark. We also collect data using our proprietary multi-sensor acquisition platform and verify that the proposed method generalizes across different sensor settings and scenes.
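As a rough illustration of what working "in the depth domain" can mean, the sketch below projects LiDAR points into a sparse depth map using a candidate extrinsic (rotation and translation) and pinhole intrinsics. The abstract does not give the authors' formulation; the function name `project_to_depth`, the pinhole model, and the nearest-point rule for pixel collisions are all assumptions made for this example.

```python
def project_to_depth(points, rotation, translation, fx, fy, cx, cy, width, height):
    """Project 3D LiDAR points into a sparse depth map (0.0 marks 'no measurement').

    Hypothetical helper: rotation is a 3x3 nested list, translation a length-3 list,
    and (fx, fy, cx, cy) are assumed pinhole intrinsics of the camera.
    """
    depth = [[0.0] * width for _ in range(height)]
    for p in points:
        # Move the point into the camera frame: R * p + t.
        xc, yc, zc = (
            sum(rotation[r][k] * p[k] for k in range(3)) + translation[r]
            for r in range(3)
        )
        if zc <= 0:
            continue  # point lies behind the camera
        u = int(round(fx * xc / zc + cx))
        v = int(round(fy * yc / zc + cy))
        if 0 <= u < width and 0 <= v < height:
            # Keep the nearest point when several land on the same pixel.
            if depth[v][u] == 0.0 or zc < depth[v][u]:
                depth[v][u] = zc
    return depth


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sparse = project_to_depth([(0.0, 0.0, 5.0), (0.0, 0.0, -2.0)], identity,
                          [0.0, 0.0, 0.0], 1.0, 1.0, 1.0, 1.0, 3, 3)
```

A miscalibrated extrinsic shifts where the points land in this map, so comparing it against stereo depth gives a signal for correcting the calibration.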
LAT: Local area transform for cross modal correspondence matching
Establishing correspondences is a fundamental task in many image processing and computer vision applications. In particular, finding the correspondences between a non-linearly deformed image pair induced by different modality conditions is a challenging problem. This paper describes a simple but powerful image transform called local area transform (LAT) for modality-robust correspondence estimation. Specifically, LAT transforms an image from the intensity domain to the local area domain, which is invariant under nonlinear intensity deformations, especially radiometric, photometric, and spectral deformations. Experimental results show that LAT-transformed images remain consistent across nonlinearly deformed images, even under random intensity deformations. LAT reduces the mean absolute difference by approximately 0.20 and the different pixel ratio by approximately 58% on average, as compared to conventional methods. Furthermore, the reformulation of descriptors with LAT shows superiority to conventional methods, which is a promising result for the tasks of cross-spectral and cross-modal correspondence matching. LAT gains an approximately 23% improvement in the correct detection ratio and a 10% improvement in the recognition rate for the tasks of RGB-NIR cross-spectral template matching and cross-spectral feature matching, respectively. LAT reduces the bad pixel percentage by approximately 15% and the root mean squared error by 13.5 in the task of cross-radiation stereo matching. LAT also improves the cross-modal dense flow estimation task in terms of warping error, providing a 50% error reduction.
Memory-guided Image De-raining Using Time-Lapse Data
This paper addresses the problem of single image de-raining, that is, the task of recovering clean and rain-free background scenes from a single image obscured by rain artifacts. Although recent advances adopt real-world time-lapse data to overcome the need for paired rain-clean images, they fall short of fully exploiting the time-lapse data. The main cause is that, in terms of network architectures, they cannot capture long-term rain streak information in the time-lapse data during training owing to the lack of memory components. To address this problem, we propose a novel network architecture based on a memory network that explicitly helps to capture long-term rain streak information in the time-lapse data. Our network comprises encoder-decoder networks and a memory network. The features extracted from the encoder are read and updated in the memory network, which contains several memory items that store rain streak-aware feature representations. With the read/update operation, the memory network retrieves the memory items relevant to the queries, enabling the memory items to represent the various rain streaks included in the time-lapse data. To boost the discriminative power of memory features, we also present a novel background selective whitening (BSW) loss for capturing only rain streak information in the memory network by erasing the background information. Experimental results on standard benchmarks demonstrate the effectiveness and superiority of our approach.
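The read step described above can be sketched as a soft lookup over memory items. The abstract does not specify the similarity measure or weighting, so the cosine similarity and softmax used here, along with the names `cosine` and `read_memory`, are assumptions borrowed from common memory-network designs rather than the authors' exact operation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def read_memory(query, memory_items):
    """Return (weights, read): softmax weights over items and their weighted sum."""
    sims = [cosine(query, item) for item in memory_items]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(memory_items[0])
    read = [sum(w * item[i] for w, item in zip(weights, memory_items))
            for i in range(dim)]
    return weights, read


# A query aligned with the first item draws most of its read from that item.
weights, read = read_memory([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

An update step would move each memory item toward the queries that select it; that part is omitted here because the abstract gives no detail on it.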
Joint learning of semantic alignment and object landmark detection
Convolutional neural network (CNN)-based approaches for semantic alignment and object landmark detection have improved their performance significantly. Current efforts for the two tasks focus on addressing the lack of massive training data through weakly supervised or unsupervised learning frameworks. In this paper, we present a joint learning approach for obtaining dense correspondences and discovering object landmarks from semantically similar images. Based on the key insight that the two tasks can provide supervision to each other, our networks accomplish this through a joint loss function that alternately imposes a consistency constraint between the two tasks, thereby boosting the performance and addressing the lack of training data in a principled manner. To the best of our knowledge, this is the first attempt to address the lack of training data for the two tasks through joint learning. To further improve the robustness of our framework, we introduce a probabilistic learning formulation that allows only reliable matches to be used in the joint learning process. With the proposed method, state-of-the-art performance is attained on several benchmarks for semantic matching and landmark detection.
Unsupervised Stereo Matching Using Confidential Correspondence Consistency
Stereo matching aims to perceive the 3D geometric configuration of scenes and facilitates a variety of computer vision applications in advanced driver assistance systems (ADAS). Recently, deep convolutional neural networks (CNNs) have shown dramatic performance improvements for computing the matching cost in stereo matching. However, the performance of CNN-based approaches relies heavily on datasets, requiring a large amount of ground truth data that takes tremendous effort to collect. To overcome this limitation, we present a novel framework to learn CNNs for matching cost computation in an unsupervised manner. Our method leverages image domain learning combined with stereo epipolar constraints. By exploiting the correspondence consistency between stereo images, our method selects putative positive samples in each training iteration and utilizes them to train the networks. We further propose a positive sample propagation scheme to leverage additional training samples. Our unsupervised learning method is evaluated with two kinds of network architectures, simple and precise CNNs, and shows performance comparable to that of state-of-the-art methods, including both supervised and unsupervised learning approaches, on the KITTI, Middlebury, HCI, and Yonsei datasets. This extensive evaluation demonstrates that the proposed learning framework can be applied to deal with various real driving conditions.
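One common way to exploit correspondence consistency between a stereo pair is a left-right check: a left pixel is kept only if the right-image pixel it maps to points back to it. The sketch below applies that check to a single scanline of integer disparities. It illustrates the general idea only; the function name `select_consistent_pixels` and the tolerance `tol` are assumptions, and the paper's actual selection criterion may differ.

```python
def select_consistent_pixels(disp_left, disp_right, tol=0):
    """Return left-scanline columns whose disparity passes a left-right check.

    disp_left[x] maps left column x to right column x - disp_left[x];
    the pixel is kept if the right disparity there agrees within tol.
    """
    width = len(disp_left)
    selected = []
    for x, d in enumerate(disp_left):
        xr = x - d  # matching column in the right image
        if 0 <= xr < width and abs(d - disp_right[xr]) <= tol:
            selected.append(x)
    return selected


disp_left = [0, 1, 1, 2, 2, 5]
disp_right = [1, 1, 2, 2, 0, 0]
print(select_consistent_pixels(disp_left, disp_right))  # prints [1, 2, 4]
```

Pixels that pass the check could serve as putative positive samples for training, while the rest are left out of that iteration.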
Robust stereo matching based on probabilistic Laplacian propagation with weighted mutual information
Conventional stereo matching methods provide unsatisfactory results for stereo pairs captured under uncontrolled conditions such as illumination distortions and camera device changes. Most efforts to address this problem have been devoted to developing robust cost functions. However, stereo matching results obtained from a cost function alone cannot be freed from false correspondences when radiometric distortions exist. This paper presents a robust stereo matching approach based on probabilistic Laplacian propagation. In the proposed method, reliable ground control points (GCPs) are selected using weighted mutual information and a reliability check. The ground control points are then propagated with the probabilistic Laplacian. Since only reliable matches are propagated, weighted by the reliability of each GCP, the proposed approach is robust to false initial matches. Experimental results demonstrate the effectiveness of the proposed method in stereo matching for image pairs taken under illumination and exposure distortions.
Single Image Deraining Using Time-Lapse Data
Leveraging recent advances in deep convolutional neural networks (CNNs), single image deraining has been studied as a learning task, achieving outstanding performance over traditional hand-designed approaches. Current CNN-based deraining approaches adopt a supervised learning framework that uses massive training data generated with synthetic rain streaks, and therefore have limited generalization ability on real rainy images. To address this problem, we propose a novel learning framework for single image deraining that leverages time-lapse sequences instead of synthetic image pairs. The deraining networks are trained using time-lapse sequences in which both camera and scenes are static except for time-varying rain streaks. Specifically, we formulate a background consistency loss such that the deraining networks consistently generate the same derained images from the time-lapse sequences. We additionally introduce two loss functions: the structure similarity loss, which encourages the derained image to be similar to the input rainy image, and the directional gradient loss, which uses the assumption that the estimated rain streaks are likely to be sparse and have dominant directions. To consider various rain conditions, we leverage a dynamic fusion module that effectively fuses multi-scale features. We also build a novel large-scale time-lapse dataset providing real-world rainy images containing various rain conditions. Experiments demonstrate that the proposed method outperforms state-of-the-art techniques on synthetic and real rainy images both qualitatively and quantitatively. For high-level vision tasks under severe rainy conditions, it is shown that the proposed method can be utilized as a preprocessing step for subsequent tasks.
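The background consistency idea can be illustrated with a small sketch: because the scene is static across a time-lapse sequence, the derained outputs for all frames should agree, so any spread among them can be penalized. The abstract does not give the loss formula; the mean absolute deviation from the per-pixel mean used here, and the name `background_consistency_loss`, are assumptions chosen for simplicity.

```python
def background_consistency_loss(derained_frames):
    """Mean absolute deviation of each derained frame from the per-pixel mean.

    derained_frames is a list of flattened images (lists of floats) of equal
    length, assumed to be network outputs for frames of one static scene.
    """
    n = len(derained_frames)
    num_pixels = len(derained_frames[0])
    mean_img = [sum(f[i] for f in derained_frames) / n for i in range(num_pixels)]
    total = sum(abs(f[i] - mean_img[i])
                for f in derained_frames for i in range(num_pixels))
    return total / (n * num_pixels)


# Identical outputs give zero loss; disagreement between frames is penalized.
print(background_consistency_loss([[0.2, 0.8], [0.2, 0.8]]))  # prints 0.0
print(background_consistency_loss([[0.0, 1.0], [1.0, 1.0]]))  # prints 0.25
```

In a real training setup this would be computed on tensors inside an autodiff framework; plain lists are used here only to keep the example self-contained.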
Unified Confidence Estimation Networks for Robust Stereo Matching
We present a deep architecture that estimates stereo confidence, which is essential for improving the accuracy of stereo matching algorithms. In contrast to existing methods based on deep convolutional neural networks (CNNs) that rely on only one of the matching cost volume or the estimated disparity map, our network estimates the stereo confidence by using the two heterogeneous inputs simultaneously. Specifically, the matching probability volume is first computed from the matching cost volume with residual networks and a pooling module in a manner that yields greater robustness. The confidence is then estimated through a unified deep network that combines confidence features extracted from both the matching probability volume and its corresponding disparity. In addition, our method extracts the confidence features of the disparity map by applying multiple convolutional filters with varying sizes to an input disparity map. To learn our networks in a semi-supervised manner, we propose a novel loss function that uses confident points to compute the image reconstruction loss. To validate the effectiveness of our method in a disparity post-processing step, we employ three post-processing approaches: cost modulation, ground control points-based propagation, and aggregated ground control points-based propagation. Experimental results demonstrate that our method outperforms state-of-the-art confidence estimation methods on various benchmarks.
Dense Cross-Modal Correspondence Estimation With the Deep Self-Correlation Descriptor
We present the deep self-correlation (DSC) descriptor for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions. We encode local self-similar structure in a pyramidal manner that yields both more precise localization ability and greater robustness to non-rigid image deformations. Specifically, DSC first computes multiple self-correlation surfaces with randomly sampled patches over a local support window, and then builds pyramidal self-correlation surfaces through average pooling on the surfaces. The feature responses on the self-correlation surfaces are then encoded through spatial pyramid pooling in a log-polar configuration. To better handle geometric variations such as scale and rotation, we additionally propose the geometry-invariant DSC (GI-DSC) that leverages multi-scale self-correlation computation and canonical orientation estimation. In contrast to descriptors based on deep convolutional neural networks (CNNs), DSC and GI-DSC are training-free (i.e., handcrafted descriptors), are robust to cross-modality, and generalize well to various modality variations. Extensive experiments demonstrate the state-of-the-art performance of DSC and GI-DSC on challenging cases of cross-modal image pairs having photometric and/or geometric variations.
Learning to Find Unpaired Cross-Spectral Correspondences
We present a deep architecture and learning framework for establishing correspondences across cross-spectral visible and infrared images in an unpaired setting. To overcome the unpaired cross-spectral data problem, we design the unified image translation and feature extraction modules to be learned in a joint and boosting manner. Concretely, the image translation module is learned only with the unpaired cross-spectral data, and the feature extraction module is learned with an input image and its translated image. By learning two modules simultaneously, the image translation module generates the translated image that preserves not only the domain-specific attributes with separate latent spaces but also the domain-agnostic contents with feature consistency constraint. In an inference phase, the cross-spectral feature similarity is augmented by intra-spectral similarities between the features extracted from the translated images. Experimental results show that this model outperforms the state-of-the-art unpaired image translation methods and cross-spectral feature descriptors on various visible and infrared benchmarks.
