    Synergistic change detection and tracking

    Visual tracking in image streams acquired by static cameras is usually based on change detection and recursive Bayesian estimation, an approach that lies at the core of many practical applications. Yet the interaction between the change detector and the Bayesian filter is typically designed heuristically. In contrast, this paper develops a sound framework to model and implement a bidirectional communication flow between the two processes. In our Bayesian loop, change detection provides a well-defined observation likelihood to the recursive filter, and the filter prediction provides an informative prior to the change detector, which likewise relies on Bayesian reasoning. The loop is developed for the two major variants of Bayesian filters used in tracking, namely the Kalman filter and the particle filter. Experiments on publicly available videos and a novel, challenging data set show that the proposed interaction scheme outperforms several state-of-the-art trackers.
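
The bidirectional loop described in this abstract can be illustrated with a toy one-dimensional sketch. It is not the paper's formulation: the constant-position motion model, the uniform likelihood for changed pixels, the Gaussian noise on unchanged pixels, and every name and parameter value (`loop_step`, `change_posterior`, `q`, `r`, `base_prior`) are assumptions chosen only to show the filter prediction acting as a per-pixel prior and the detector output feeding the filter update.

```python
import math


def gaussian(x, mean, var):
    """Density of a 1D normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def change_posterior(diff, prior, sigma_noise=5.0, like_changed=1.0 / 256):
    """Per-pixel P(changed | diff) by Bayes' rule.

    Assumed model: an unchanged pixel has diff ~ N(0, sigma_noise^2);
    a changed pixel has a uniform likelihood over 256 grey levels.
    """
    post = []
    for d, p in zip(diff, prior):
        like_unchanged = gaussian(d, 0.0, sigma_noise ** 2)
        num = like_changed * p
        post.append(num / (num + like_unchanged * (1.0 - p)))
    return post


def loop_step(mean, var, diff, q=4.0, r=2.0, base_prior=0.05):
    """One iteration of the loop on a 1D row of frame differences."""
    # Kalman prediction (assumed constant-position model with process noise q).
    var_pred = var + q
    # Filter prediction -> informative prior for the change detector:
    # the prior rises from base_prior up to 0.5 at the predicted position.
    peak = gaussian(mean, mean, var_pred)
    prior = [
        base_prior + (0.5 - base_prior) * gaussian(x, mean, var_pred) / peak
        for x in range(len(diff))
    ]
    post = change_posterior(diff, prior)
    total = sum(post)
    if total < 1e-9:
        return mean, var_pred  # no evidence of change: keep the prediction
    # Detector output -> observation for the filter (posterior-weighted position).
    z = sum(x * p for x, p in enumerate(post)) / total
    # Kalman update with measurement noise r.
    gain = var_pred / (var_pred + r)
    return mean + gain * (z - mean), (1.0 - gain) * var_pred
```

Calling `loop_step(8.0, 4.0, diff)` on a row where only pixels 10 to 12 differ strongly from the background pulls the estimate from 8 towards 11 and shrinks its variance.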

    Keypoint detection by wave propagation

    We propose to rely on the wave equation for the detection of repeatable keypoints invariant up to image scale and rotation and robust to viewpoint variations, blur, and lighting changes. The algorithm exploits the properties of local spatial–temporal extrema of the evolution of image intensities under wave propagation to highlight salient symmetries at different scales. Although the image structures found by most state-of-the-art detectors, such as blobs and corners, typically occur on highly textured surfaces, salient symmetries are widespread in diverse kinds of images, including those of poorly textured objects, which are handled poorly by current pipelines based on local invariant features. The impact of different numerical wave simulation schemes and their parameters on the overall algorithm is discussed, and a pyramidal approximation to speed up the simulation is proposed and validated. Experiments on publicly available datasets show that the proposed algorithm offers state-of-the-art repeatability on a broad set of different images while detecting regions that can be distinctively described and robustly matched.

    Performance evaluation of learned 3D features

    Matching surfaces is a challenging 3D Computer Vision problem typically addressed by local features. Although a variety of 3D feature detectors and descriptors have been proposed in the literature, they have seldom been proposed together, and it is not yet clear how to identify the most effective detector-descriptor pair for a specific application. A promising solution is to leverage machine learning to learn the optimal 3D detector for any given 3D descriptor [15]. In this paper, we report a performance evaluation of the detector-descriptor pairs obtained by learning a paired 3D detector for the most popular 3D descriptors. In particular, we address experimental settings dealing with object recognition and surface registration.

    Performance Evaluation of 3D Descriptors Paired with Learned Keypoint Detectors

    Matching surfaces is a challenging 3D Computer Vision problem typically addressed by local features. Although a plethora of 3D feature detectors and descriptors have been proposed in the literature, it is quite difficult to identify the most effective detector-descriptor pair for a given application. Yet recent works have shown that machine learning algorithms can be used to learn an effective 3D detector for any given 3D descriptor. In this paper, we present a performance evaluation of the detector-descriptor pairs obtained by learning a 3D detector for the most popular 3D descriptors. To this end, we address experimental settings dealing with object recognition and surface registration. Our results show that pairing a learned detector with a learned descriptor such as CGF leads to effective local features for object recognition (e.g., 0.45 recall at 0.8 precision on the UWA dataset), while there is no clear performance gap between CGF and effective hand-crafted features such as SHOT for surface registration (0.18 average precision for the former versus 0.16 for the latter).

    Learning an Effective Equivariant 3D Descriptor Without Supervision

    Establishing correspondences between 3D shapes is a fundamental task in 3D Computer Vision, typically addressed by matching local descriptors. Recently, a few attempts at applying the deep learning paradigm to the task have shown promising results. Yet the only explored way to learn rotation-invariant descriptors has been to feed neural networks with highly engineered and invariant representations provided by existing hand-crafted descriptors, a path that runs opposite to the end-to-end learning from raw data so successfully deployed for 2D images. In this paper, we explore the benefits of taking a step back in the direction of end-to-end learning of 3D descriptors by disentangling the creation of a robust and distinctive rotation-equivariant representation, which can be learned from unoriented input data, from the definition of a good canonical orientation, required only at test time to obtain an invariant descriptor. To this end, we leverage two recent innovations: spherical convolutional neural networks to learn an equivariant descriptor and plane folding decoders to learn without supervision. The effectiveness of the proposed approach is experimentally validated by outperforming hand-crafted and learned descriptors on a standard benchmark.

    3D Local Descriptors—from Handcrafted to Learned

    Surface matching is a fundamental task in 3D computer vision, typically tackled by describing and matching local features computed from the 3D surface. As a result, the description of local features lays the foundations for a variety of applications processing 3D data, such as 3D object recognition, 3D registration and reconstruction, and SLAM. A variety of algorithms for 3D feature description exist in the scientific literature. The majority of them are based on different, handcrafted ways to encode and exploit the geometric properties of a given surface. Recently, the success of deep neural networks for processing images has also fueled a data-driven approach to learning descriptive features from 3D data. This chapter provides a comprehensive review of the main proposals in the field.

    Learning a descriptor-specific 3D keypoint detector

    Keypoint detection represents the first stage in the majority of modern computer vision pipelines based on automatically established correspondences between local descriptors. However, no standard solution has yet emerged for 3D data such as point clouds or meshes, which exhibit high variability in level of detail and noise. More importantly, existing proposals for 3D keypoint detection rely on geometric saliency functions that attempt to maximize repeatability rather than distinctiveness of the selected regions, which may lead to sub-optimal performance of the overall pipeline. To overcome these shortcomings, we cast 3D keypoint detection as a binary classification between points whose support can be correctly matched by a predefined 3D descriptor and points whose support cannot, thereby learning a descriptor-specific detector that adapts seamlessly to different scenarios. Through experiments on several public datasets, we show that this novel approach to the design of a keypoint detector is a flexible solution that can nonetheless provide state-of-the-art descriptor matching performance.
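
The binary classification framing in this abstract implies a labelling step, which the following minimal sketch illustrates. It assumes that a point counts as "correctly matched" when its descriptor's nearest neighbour in a second view is its ground-truth correspondence; the abstract does not state the actual matching criterion or classifier, and the function names and the `gt_match` input are hypothetical.

```python
import math


def nearest(desc, candidates):
    """Index of the candidate descriptor closest to desc (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda j: math.dist(desc, candidates[j]))


def label_points(desc_a, desc_b, gt_match):
    """Binary training labels for the points of view A.

    desc_a, desc_b: descriptors computed at the points of views A and B.
    gt_match[i]: index in B of the true correspondence of point i in A
                 (hypothetical ground truth, e.g. from known poses).
    A point is labelled 1 when descriptor matching recovers that
    correspondence, 0 otherwise.
    """
    return [
        1 if nearest(d, desc_b) == gt_match[i] else 0
        for i, d in enumerate(desc_a)
    ]
```

Labels of this kind could then be used to train any binary classifier on local geometry, so that the detector favours points the chosen descriptor can actually match.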

    Efficient compact descriptors in visual search systems

    Disclosed embodiments are directed to methods, systems, and circuits for generating compact descriptors for transmission over a communications network. A method according to one embodiment includes receiving an uncompressed descriptor, performing zero-thresholding on the uncompressed descriptor to generate a zero-threshold-delimited descriptor, quantizing the zero-threshold-delimited descriptor to generate a quantized descriptor, and coding the quantized descriptor to generate a compact descriptor for transmission over a communications network. The uncompressed and compact descriptors may be 3D descriptors, for example where the uncompressed descriptor is a SHOT descriptor. The coding operation can be, for example, ZeroFlag coding, ExpGolomb coding, or arithmetic coding.
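
The three steps named in this abstract (zero-thresholding, quantization, coding) can be sketched as follows, using order-0 Exp-Golomb as the coder. This is an illustration under assumptions rather than the disclosed method: descriptor values are assumed to lie in [0, 1], quantization is assumed uniform, and the threshold `tau`, the number of `levels`, and all function names are hypothetical.

```python
def zero_threshold(desc, tau):
    """Set every entry whose magnitude is below tau to zero."""
    return [0.0 if abs(v) < tau else v for v in desc]


def quantize(desc, levels, max_value=1.0):
    """Uniformly map values in [0, max_value] to integers in [0, levels - 1]."""
    out = []
    for v in desc:
        v = min(max(v, 0.0), max_value)  # clamp to the assumed range
        out.append(round(v / max_value * (levels - 1)))
    return out


def exp_golomb(n):
    """Order-0 Exp-Golomb code of a non-negative integer, as a bit string."""
    b = bin(n + 1)[2:]
    return "0" * (len(b) - 1) + b


def encode(desc, tau=0.1, levels=8):
    """Threshold, quantize, and code a descriptor into one bit string."""
    q = quantize(zero_threshold(desc, tau), levels)
    return "".join(exp_golomb(v) for v in q)


def decode(bits):
    """Recover the quantized integers from an Exp-Golomb bit string."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the leading zeros of this codeword
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2) - 1)
        i += zeros + 1
    return values
```

Zero entries cost a single bit under this code, which is why thresholding small values to zero before coding shortens the bit stream for sparse descriptors.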

    Learning Across Tasks and Domains

    Recent works have shown that many relevant visual tasks are closely related to one another. Yet this connection is seldom exploited in practice due to the lack of practical methodologies to transfer learned concepts across different training processes. In this work, we introduce a novel adaptation framework that can operate across both tasks and domains. Our framework learns to transfer knowledge across tasks in a fully supervised domain (e.g., synthetic data) and uses this knowledge in a different domain where we have only partial supervision (e.g., real data). Our proposal is complementary to existing domain adaptation techniques and extends them to cross-task scenarios, providing additional performance gains. We demonstrate the effectiveness of our framework across two challenging tasks (i.e., monocular depth estimation and semantic segmentation) and four different domains (Synthia, Carla, Kitti, and Cityscapes).

    Unsupervised Learning of Local Equivariant Descriptors for Point Clouds

    Correspondences between 3D keypoints generated by matching local descriptors are a key step in 3D computer vision and graphics applications. Learned descriptors are rapidly evolving and outperforming the classical handcrafted approaches in the field. Yet, to learn effective representations, they require supervision through labeled data, which are cumbersome and time-consuming to obtain. Unsupervised alternatives exist, but they lag in performance. Moreover, invariance to viewpoint changes is attained either by relying on data augmentation, which is prone to degrade when generalizing to unseen datasets, or by learning from handcrafted representations of the input which are already rotation invariant but whose effectiveness at training time may significantly affect the learned descriptor. We show how learning an equivariant 3D local descriptor instead of an invariant one can overcome both issues. LEAD (Local EquivAriant Descriptor) combines Spherical CNNs to learn an equivariant representation together with plane-folding decoders to learn without supervision. Through extensive experiments on standard surface registration datasets, we show that our proposal outperforms existing unsupervised methods by a large margin and achieves competitive results against the supervised approaches, especially in the practically very relevant scenario of transfer learning.