    STD2P: RGBD semantic segmentation using spatio-temporal data-driven pooling

    Beyond their success in classification, neural networks have recently shown strong results on pixel-wise prediction tasks such as semantic segmentation of RGBD images. However, the commonly used deconvolutional layers for upsampling intermediate representations to the full-resolution output still exhibit several failure modes, such as imprecise segmentation boundaries and label mistakes, particularly on large, weakly textured objects (e.g. fridge, whiteboard, door). We attribute these errors in part to the rigid way current networks aggregate information, which can be either too local (missing context) or too global (inaccurate boundaries). Therefore, we propose a data-driven pooling layer that integrates with fully convolutional architectures and utilizes boundary detection from RGBD image segmentation approaches. We extend our approach to leverage region-level correspondences across images with an additional temporal pooling stage. We evaluate our approach on the NYU-Depth-V2 dataset, which comprises indoor RGBD video sequences, and compare it to various state-of-the-art baselines. Besides a general improvement over the state of the art, our approach shows particularly good results in terms of the accuracy of the predicted boundaries and in segmenting previously problematic classes.
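
    The idea of pooling over data-driven regions rather than a fixed grid can be illustrated with a minimal sketch. It assumes a per-pixel feature map and a precomputed region map (for example, from a boundary-based segmentation), and simply averages features inside each region. The function name `region_average_pool` and the array layout are assumptions made for illustration; the abstract does not specify the actual layer, and the temporal pooling stage is not shown.

```python
import numpy as np

def region_average_pool(features, regions):
    """Average per-pixel features within each region and broadcast back.

    features: (H, W, C) float array of per-pixel features or class scores.
    regions:  (H, W) int array of region ids in [0, R) (hypothetical input,
              e.g. produced by an external segmentation step).
    Returns an (H, W, C) array where every pixel holds its region's mean.
    """
    h, w, c = features.shape
    flat_feat = features.reshape(-1, c)
    flat_reg = regions.reshape(-1)
    n_regions = int(flat_reg.max()) + 1

    # Sum the features of all pixels that share a region id.
    sums = np.zeros((n_regions, c), dtype=np.float64)
    np.add.at(sums, flat_reg, flat_feat)

    # Count pixels per region; guard against unused ids.
    counts = np.bincount(flat_reg, minlength=n_regions).astype(np.float64)
    counts[counts == 0] = 1.0

    means = sums / counts[:, None]
    # Broadcast each region's mean back to its pixels.
    return means[flat_reg].reshape(h, w, c)
```

    Because every pixel in a region receives the same pooled value, label changes in the output can only occur at region boundaries, which is one plausible reading of why such pooling would help with boundary accuracy and with large, weakly textured objects.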

    A Kinect Based Indoor Navigation System for the Blind

    Team NAVIGATE aims to create a robust, portable navigational aid for the blind. Our prototype uses depth data from the Microsoft Kinect to perform real-time obstacle avoidance in unfamiliar indoor environments. The device augments the white cane by performing two significant functions: detecting overhanging objects and identifying stairs. Based on interviews with blind individuals, we found a combined audio and haptic feedback system to be best for communicating environmental information. Our prototype uses vibration motors to indicate the presence of an obstacle and an auditory command to alert the user to stairs ahead. In multiple trials with sighted and blind participants, the device successfully detected overhanging objects and approaching stairs. The device increased user competency and adaptability across all trials.

    Diffuse2Adapt: Controlled Diffusion for Synthetic-to-Real Domain Adaptation

    Synthetic data generated from graphics engines has been shown to be effective for learning, while also being a cost-effective alternative to annotating data. However, models trained on synthetic data often suffer a drop in performance when evaluated on real data due to the synthetic-to-real domain gap. Unsupervised domain adaptation (UDA) techniques attempt to leverage a set of unlabeled target data in various ways to bridge this domain gap. Recently, conditional image generation models such as Stable Diffusion have shown impressive results in generating realistic images from text and image inputs. In this study, we investigate the utility of Stable Diffusion for translating synthetic images to the target domain for synthetic-to-real UDA. The translated images must accurately represent the class semantics of the source domain data while also exhibiting properties of the target domain. We investigate various strategies to leverage the unlabeled target domain data with Stable Diffusion to guide the generation towards the target distribution.

    View-Invariance in Visual Human Motion Analysis

    This thesis makes contributions towards the solutions to two problems in the area of visual human motion analysis: human action recognition and human body pose estimation. Although there has been a substantial amount of research addressing these two problems in the past, the important issue of viewpoint invariance in the representation and recognition of poses and actions has received relatively little attention, and forms a key goal of this thesis. Drawing on results from 2D projective invariance theory and 3D mutual invariants, we present three different approaches, of varying degrees of generality, for human action representation and recognition. A detailed analysis of the approaches reveals key challenges, which are circumvented by enforcing spatial and temporal coherency constraints. An extensive performance evaluation of the approaches on 2D projections of motion capture data and manually segmented real image sequences demonstrates that, in addition to viewpoint changes, the approaches handle varying speeds of execution of actions (and hence different frame rates of the video), different subjects, and minor variations in the spatiotemporal dynamics of the action. Next, we present a method for recovering the body-centric coordinates of key joints and parts of a canonically scaled human body, given an image of the body and the point correspondences of specific body joints in the image. This problem is difficult to solve because of body articulation and perspective effects. To make the problem tractable, previous researchers have resorted to restricting the camera model or requiring an unrealistic number of point correspondences, both of which are more restrictive than necessary. We present a solution for the general case of a perspective uncalibrated camera. Our method requires that the torso does not twist considerably, an assumption that is satisfied for many poses of the body. We evaluate the quantitative performance of the method on synthetic data and the qualitative performance of the method on real images taken with unknown cameras and viewpoints. Both evaluations show the effectiveness of the method at recovering the pose of the human body.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
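
    The difference between the two counting schemes can be made concrete with a small sketch. It assumes each citing paper is given as a list of cited works, each cited work being an ordered list of author names, and that an author pair is counted once per citing paper. The function name `cocitation_counts`, the `max_authors` parameter, and the once-per-paper rule are assumptions for illustration; the abstract does not state the study's exact counting rules.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs that are co-cited by the same citing paper.

    citing_papers: list of reference lists; each reference is an ordered
                   list of author names (hypothetical input format).
    max_authors:   how many leading authors of each cited work to consider
                   (1 = first-author counting, 5 = first-five counting).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect the considered authors across all works this paper cites.
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts

# Two citing papers; author names are placeholders.
papers = [
    [["A", "B"], ["C"]],
    [["A"], ["B", "C"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

    In this toy input, first-author counting never pairs B with C, because B and C each appear as a first author in only one of the citing papers, whereas first-five counting links all three authors in both papers.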