High-Precision Depth Estimation Using Uncalibrated LiDAR and Stereo Fusion

Authors: Park Kihong, Kim Seungryong, Sohn Kwanghoon
Publication date: 2020

We address the problem of 3D reconstruction from uncalibrated LiDAR point clouds and stereo images. Since each sensor used alone for 3D reconstruction has weaknesses in terms of density and accuracy, we propose a deep sensor fusion framework for high-precision depth estimation. The proposed architecture consists of a calibration network and a depth fusion network, both designed with the trade-off between accuracy and efficiency on mobile devices in mind. The calibration network first corrects an initial extrinsic parameter to align the input sensor coordinate systems. The accuracy of calibration is markedly improved by formulating the calibration in the depth domain. In the depth fusion network, the complementary characteristics of sparse LiDAR and dense stereo depth are then encoded in a boosting manner. Since training data for LiDAR and stereo depth fusion are rather limited, we introduce a simple but effective approach to generate pseudo ground truth labels from the raw KITTI dataset. The experimental evaluation verifies that the proposed method outperforms current state-of-the-art methods on the KITTI benchmark. We also collect data using our proprietary multi-sensor acquisition platform and verify that the proposed method generalizes across different sensor settings and scenes.
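To illustrate what working "in the depth domain" involves, here is a minimal sketch of rendering LiDAR points into a sparse depth map under a given extrinsic estimate. It is not the paper's calibration network. It assumes a pinhole camera model, and the function name `project_lidar_to_depth` and all parameter names are hypothetical, chosen only for this example.

```python
from typing import List, Optional, Sequence


def project_lidar_to_depth(
    points: Sequence[Sequence[float]],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> List[List[Optional[float]]]:
    """Render LiDAR points as a sparse depth map (None marks empty pixels).

    Assumption: pinhole camera; rotation (3x3) and translation (3,) map
    LiDAR coordinates into the camera frame.
    """
    depth: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for p in points:
        # Camera-frame coordinates: X_cam = R * X_lidar + t
        x, y, z = (
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        if z <= 0:
            continue  # point lies behind the camera
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # Keep the nearest point when several fall on the same pixel
            if depth[v][u] is None or z < depth[v][u]:
                depth[v][u] = z
    return depth
```

An error in the extrinsic estimate shifts where the points land in this map, so a misaligned map can be compared against stereo depth to expose the calibration error.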

KAIST Institutional Repository

LAT: Local area transform for cross modal correspondence matching

Authors: Ryu Seungchul, Kim Seungryong, Sohn Kwanghoon
Publication date: 2017

Establishing correspondences is a fundamental task in many image processing and computer vision applications. In particular, finding the correspondences between a non-linearly deformed image pair induced by different modality conditions is a challenging problem. This paper describes a simple but powerful image transform called local area transform (LAT) for modality-robust correspondence estimation. Specifically, LAT transforms an image from the intensity domain to the local area domain, which is invariant under nonlinear intensity deformations, especially radiometric, photometric, and spectral deformations. Experimental results show that LAT-transformed images provide consistency for nonlinearly deformed images, even under random intensity deformations. LAT reduces the mean absolute difference by approximately 0.20 and the different pixel ratio by approximately 58% on average, as compared to conventional methods. Furthermore, the reformulation of descriptors with LAT shows superiority to conventional methods, which is a promising result for the tasks of cross-spectral and modality correspondence matching. LAT gains an approximately 23% improvement in the correct detection ratio and a 10% improvement in the recognition rate for the tasks of RGB-NIR cross-spectral template matching and cross-spectral feature matching, respectively. LAT reduces the bad pixel percentage by approximately 15% and the root mean squared errors by 13.5 in the task of cross-radiation stereo matching. LAT also improves the cross-modal dense flow estimation task in terms of warping error, providing 50% error reduction.

KAIST Institutional Repository

Crossref

Memory-guided Image De-raining Using Time-Lapse Data

Authors: Kim Seungryong, Cho Jaehoon, Sohn Kwanghoon
Publication date: 01/01/2022

This paper addresses the problem of single image de-raining, that is, the task of recovering clean and rain-free background scenes from a single image obscured by rain artifacts. Although recent advances adopt real-world time-lapse data to overcome the need for paired rain-clean images, they fail to fully exploit the time-lapse data. The main cause is that, in terms of network architectures, they cannot capture long-term rain streak information in the time-lapse data during training owing to the lack of memory components. To address this problem, we propose a novel network architecture based on a memory network that explicitly helps to capture long-term rain streak information in the time-lapse data. Our network comprises encoder-decoder networks and a memory network. The features extracted from the encoder are read and updated in the memory network, which contains several memory items to store rain streak-aware feature representations. With the read/update operation, the memory network retrieves relevant memory items in terms of the queries, enabling the memory items to represent the various rain streaks included in the time-lapse data. To boost the discriminative power of memory features, we also present a novel background selective whitening (BSW) loss for capturing only rain streak information in the memory network by erasing the background information. Experimental results on standard benchmarks demonstrate the effectiveness and superiority of our approach.
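The read operation described above can be sketched as attention over memory items. This is only an illustration: the abstract does not specify the similarity measure, so cosine similarity followed by a softmax is an assumption here, and `read_memory` is a hypothetical name rather than the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def read_memory(
    query: Sequence[float],
    memory_items: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, weighted sum of memory items) for one query.

    Assumption: relevance is cosine similarity, normalised with a softmax.
    """
    def cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm > 0 else 0.0

    scores = [cosine(query, item) for item in memory_items]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The read feature is the attention-weighted combination of memory items
    read = [
        sum(w * item[i] for w, item in zip(weights, memory_items))
        for i in range(len(query))
    ]
    return weights, read
```

Calling `read_memory([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` gives the first memory item the larger weight, since it points in the same direction as the query.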

arXiv.org e-Print Archive

KAIST Institutional Repository

Joint learning of semantic alignment and object landmark detection

Authors: Min Dongbo, Jeon Sangryul, Kim Seungryong, Sohn Kwanghoon
Publication date: 02/11/2019

Convolutional neural network (CNN) based approaches for semantic alignment and object landmark detection have improved their performance significantly. Current efforts for the two tasks focus on addressing the lack of massive training data through weakly supervised or unsupervised learning frameworks. In this paper, we present a joint learning approach for obtaining dense correspondences and discovering object landmarks from semantically similar images. Based on the key insight that the two tasks can provide supervision to each other, our networks accomplish this through a joint loss function that alternately imposes a consistency constraint between the two tasks, thereby boosting the performance and addressing the lack of training data in a principled manner. To the best of our knowledge, this is the first attempt to address the lack of training data for the two tasks through joint learning. To further improve the robustness of our framework, we introduce a probabilistic learning formulation that allows only reliable matches to be used in the joint learning process. With the proposed method, state-of-the-art performance is attained on several benchmarks for semantic matching and landmark detection.

KAIST Institutional Repository

Unsupervised Stereo Matching Using Confidential Correspondence Consistency

Authors: Park Kihong, Joung Sunghun, Kim Seungryong, Sohn Kwanghoon
Publication date: 2020

Stereo matching aims to perceive the 3D geometric configuration of scenes and facilitates a variety of computer vision applications in advanced driver assistance systems (ADAS). Recently, deep convolutional neural networks (CNNs) have shown dramatic performance improvements for computing the matching cost in stereo matching. However, the performance of CNN-based approaches relies heavily on datasets, requiring a large amount of ground truth data, which takes tremendous effort to collect. To overcome this limitation, we present a novel framework to learn CNNs for matching cost computation in an unsupervised manner. Our method leverages image domain learning combined with stereo epipolar constraints. By exploiting the correspondence consistency between stereo images, our method selects putative positive samples in each training iteration and utilizes them to train the networks. We further propose a positive sample propagation scheme to leverage additional training samples. Our unsupervised learning method is evaluated with two kinds of network architectures, simple and precise CNNs, and shows performance comparable to that of state-of-the-art methods, including both supervised and unsupervised learning approaches, on the KITTI, Middlebury, HCI, and Yonsei datasets. This extensive evaluation demonstrates that the proposed learning framework can be applied to deal with various real driving conditions.
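One common way to test correspondence consistency between a stereo pair is a left-right check on the two disparity maps. The sketch below shows that generic check as a way of selecting putative positive samples. It is an assumption that the paper's selection rule resembles it; the name `consistent_pixels`, the threshold value, and the convention that left pixel (r, c) matches right pixel (r, c - d) are all choices made for this example.

```python
from typing import List, Sequence, Tuple


def consistent_pixels(
    disp_left: Sequence[Sequence[float]],
    disp_right: Sequence[Sequence[float]],
    threshold: float = 1.0,
) -> List[Tuple[int, int]]:
    """Return (row, col) pixels of the left image that pass a left-right check.

    Assumption: rectified images, so matches lie on the same row and left
    pixel (r, c) with disparity d corresponds to right pixel (r, c - d).
    """
    height, width = len(disp_left), len(disp_left[0])
    keep: List[Tuple[int, int]] = []
    for r in range(height):
        for c in range(width):
            d = disp_left[r][c]
            c_right = int(round(c - d))
            if not 0 <= c_right < width:
                continue  # the match falls outside the right image
            # Consistent if the right map reports about the same disparity
            if abs(d - disp_right[r][c_right]) <= threshold:
                keep.append((r, c))
    return keep
```

Pixels that survive the check can serve as positive training pairs, while the rest are left out of that iteration.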

KAIST Institutional Repository

Robust stereo matching based on probabilistic Laplacian propagation with weighted mutual information

Authors: Ryu Seungchul, Kim Junhyung, Kim Seungryong, Sohn Kwanghoon
Publication date: 10/02/2015

Conventional stereo matching methods provide unsatisfactory results for stereo pairs captured under uncontrolled conditions such as illumination distortions and camera device changes. Most efforts to address this problem have been devoted to developing robust cost functions. However, stereo matching results obtained with a cost function cannot avoid false correspondences when radiometric distortions exist. This paper presents a robust stereo matching approach based on probabilistic Laplacian propagation. In the proposed method, reliable ground control points (GCPs) are selected using weighted mutual information and a reliability check. The ground control points are then propagated with the probabilistic Laplacian. Since only reliable matches are propagated, according to the reliability of each GCP, the proposed approach is robust to false initial matches. Experimental results demonstrate the effectiveness of the proposed method in stereo matching for image pairs taken under illumination and exposure distortions.

KAIST Institutional Repository

Single Image Deraining Using Time-Lapse Data

Authors: Min Dongbo, Kim Seungryong, Cho Jaehoon, Sohn Kwanghoon
Publication date: 01/01/2020

Leveraging recent advances in deep convolutional neural networks (CNNs), single image deraining has been studied as a learning task, achieving outstanding performance over traditional hand-designed approaches. Current CNN-based deraining approaches adopt a supervised learning framework that uses massive training data generated with synthetic rain streaks, and therefore have limited generalization ability on real rainy images. To address this problem, we propose a novel learning framework for single image deraining that leverages time-lapse sequences instead of synthetic image pairs. The deraining networks are trained using time-lapse sequences in which both camera and scenes are static except for time-varying rain streaks. Specifically, we formulate a background consistency loss such that the deraining networks consistently generate the same derained images from the time-lapse sequences. We additionally introduce two loss functions: the structure similarity loss, which encourages the derained image to be similar to the input rainy image, and the directional gradient loss, which uses the assumption that the estimated rain streaks are likely to be sparse and have dominant directions. To consider various rain conditions, we leverage a dynamic fusion module that effectively fuses multi-scale features. We also build a novel large-scale time-lapse dataset providing real-world rainy images containing various rain conditions. Experiments demonstrate that the proposed method outperforms state-of-the-art techniques on synthetic and real rainy images both qualitatively and quantitatively. On high-level vision tasks under severe rainy conditions, it has been shown that the proposed method can be utilized as a preprocessing step for subsequent tasks.
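The background consistency idea can be illustrated with a small loss that penalises disagreement between derained frames of the same static scene. This is a sketch under assumptions, not the paper's formulation: the abstract does not give the exact distance or pairing, so the mean absolute difference over all frame pairs is used here, and `background_consistency_loss` is a hypothetical name.

```python
from itertools import combinations
from typing import Sequence

Image = Sequence[Sequence[float]]  # a grayscale image as rows of pixel values


def background_consistency_loss(derained_frames: Sequence[Image]) -> float:
    """Average, over all frame pairs, of the mean absolute pixel difference.

    Assumption: every frame shows the same static scene, so ideal derained
    outputs are identical and the loss is zero.
    """
    pairs = list(combinations(derained_frames, 2))
    if not pairs:
        return 0.0  # fewer than two frames: nothing to compare
    total = 0.0
    for frame_a, frame_b in pairs:
        diffs = [
            abs(a - b)
            for row_a, row_b in zip(frame_a, frame_b)
            for a, b in zip(row_a, row_b)
        ]
        total += sum(diffs) / len(diffs)
    return total / len(pairs)
```

A network that removes rain perfectly from every frame would drive this value to zero, because only the rain streaks vary across the sequence.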

KAIST Institutional Repository

Unified Confidence Estimation Networks for Robust Stereo Matching

Authors: Min Dongbo, Kim Seungryong, Kim Sunok, Sohn Kwanghoon
Publication date: 2019

We present a deep architecture that estimates a stereo confidence, which is essential for improving the accuracy of stereo matching algorithms. In contrast to existing methods based on deep convolutional neural networks (CNNs) that rely on only one of the matching cost volume or estimated disparity map, our network estimates the stereo confidence by using the two heterogeneous inputs simultaneously. Specifically, the matching probability volume is first computed from the matching cost volume with residual networks and a pooling module in a manner that yields greater robustness. The confidence is then estimated through a unified deep network that combines confidence features extracted both from the matching probability volume and its corresponding disparity. In addition, our method extracts the confidence features of the disparity map by applying multiple convolutional filters with varying sizes to an input disparity map. To learn our networks in a semi-supervised manner, we propose a novel loss function that uses confident points to compute the image reconstruction loss. To validate the effectiveness of our method in a disparity post-processing step, we employ three post-processing approaches: cost modulation, ground control points-based propagation, and aggregated ground control points-based propagation. Experimental results demonstrate that our method outperforms state-of-the-art confidence estimation methods on various benchmarks.

KAIST Institutional Repository

Dense Cross-Modal Correspondence Estimation With the Deep Self-Correlation Descriptor

Authors: Min Dongbo, Kim Seungryong, Lin Stephen, Sohn Kwanghoon
Publication date: 2021

We present the deep self-correlation (DSC) descriptor for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions. We encode local self-similar structure in a pyramidal manner that yields both more precise localization ability and greater robustness to non-rigid image deformations. Specifically, DSC first computes multiple self-correlation surfaces with randomly sampled patches over a local support window, and then builds pyramidal self-correlation surfaces through average pooling on the surfaces. The feature responses on the self-correlation surfaces are then encoded through spatial pyramid pooling in a log-polar configuration. To better handle geometric variations such as scale and rotation, we additionally propose the geometry-invariant DSC (GI-DSC) that leverages multi-scale self-correlation computation and canonical orientation estimation. In contrast to descriptors based on deep convolutional neural networks (CNNs), DSC and GI-DSC are training-free (i.e., handcrafted descriptors), are robust to cross-modality, and generalize well to various modality variations. Extensive experiments demonstrate the state-of-the-art performance of DSC and GI-DSC on challenging cases of cross-modal image pairs having photometric and/or geometric variations.

KAIST Institutional Repository

Learning to Find Unpaired Cross-Spectral Correspondences

Authors: Park Kihong, Kim Seungryong, Jeong Somi, Sohn Kwanghoon
Publication date: 2019

We present a deep architecture and learning framework for establishing correspondences across cross-spectral visible and infrared images in an unpaired setting. To overcome the unpaired cross-spectral data problem, we design the unified image translation and feature extraction modules to be learned in a joint and boosting manner. Concretely, the image translation module is learned only with the unpaired cross-spectral data, and the feature extraction module is learned with an input image and its translated image. By learning two modules simultaneously, the image translation module generates the translated image that preserves not only the domain-specific attributes with separate latent spaces but also the domain-agnostic contents with feature consistency constraint. In an inference phase, the cross-spectral feature similarity is augmented by intra-spectral similarities between the features extracted from the translated images. Experimental results show that this model outperforms the state-of-the-art unpaired image translation methods and cross-spectral feature descriptors on various visible and infrared benchmarks.

KAIST Institutional Repository