STD2P: RGBD semantic segmentation using spatio-temporal data-driven pooling
Beyond their success in classification, neural networks have recently shown strong results on pixel-wise prediction tasks such as semantic segmentation of RGBD images. However, the commonly used deconvolutional layers for upsampling intermediate representations to the full-resolution output still show several failure modes, such as imprecise segmentation boundaries and label mistakes, in particular on large, weakly textured objects (e.g. fridge, whiteboard, door). We attribute these errors in part to the rigid way in which current networks aggregate information, which can be either too local (missing context) or too global (inaccurate boundaries). Therefore we propose a data-driven pooling layer that integrates with fully convolutional architectures and utilizes boundary detection from RGBD image segmentation approaches. We extend our approach to leverage region-level correspondences across images with an additional temporal pooling stage. We evaluate our approach on the NYU-Depth-V2 dataset, which comprises indoor RGBD video sequences, and compare it to various state-of-the-art baselines. Besides a general improvement over the state-of-the-art, our approach shows particularly good results in terms of accuracy of the predicted boundaries and in segmenting previously problematic classes.
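The region-based pooling idea in this abstract can be illustrated with a minimal sketch. It assumes a per-pixel feature map and a precomputed integer region map (for example, superpixels), and simply averages the features inside each region; the function name `region_average_pool` and the plain averaging rule are illustrative assumptions, not the layer described in the paper.

```python
def region_average_pool(features, regions):
    """Replace each pixel's feature vector by the mean over its region.

    features: H x W nested list of per-pixel feature vectors (lists of floats).
    regions:  H x W nested list of integer region ids (hypothetical input,
              e.g. from a superpixel segmentation).
    """
    sums, counts = {}, {}
    # Accumulate per-region feature sums and pixel counts.
    for feat_row, reg_row in zip(features, regions):
        for vec, region_id in zip(feat_row, reg_row):
            acc = sums.setdefault(region_id, [0.0] * len(vec))
            for i, value in enumerate(vec):
                acc[i] += value
            counts[region_id] = counts.get(region_id, 0) + 1
    # Average within each region.
    means = {r: [s / counts[r] for s in acc] for r, acc in sums.items()}
    # Broadcast each region mean back to every pixel in that region.
    return [[list(means[r]) for r in reg_row] for reg_row in regions]
```

Because every pixel in a region receives the same pooled vector, predictions derived from it change only at region boundaries, which is one plausible reading of why boundary-aware pooling could sharpen segmentation edges.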
A Kinect Based Indoor Navigation System for the Blind
Team NAVIGATE aims to create a robust, portable navigational aid for the
blind. Our prototype uses depth data from the Microsoft Kinect to perform real-time
obstacle avoidance in unfamiliar indoor environments. The device augments
the white cane by performing two significant functions: detecting overhanging objects
and identifying stairs. Based on interviews with blind individuals, we found a
combined audio and haptic feedback system best for communicating environmental
information. Our prototype uses vibration motors to indicate the presence of an
obstacle and an auditory command to alert the user to stairs ahead. Through multiple
trials with sighted and blind participants, the device was successful in detecting
overhanging objects and approaching stairs. The device increased user competency
and adaptability across all trials.
Diffuse2Adapt: Controlled Diffusion for Synthetic-to-Real Domain Adaptation
Synthetic data generated from graphics engines has been shown to be effective for learning, while also being a cost-effective alternative to annotating data. However, models trained on synthetic data often face a drop in performance when evaluated on real data due to the synthetic-to-real domain gap. Unsupervised domain adaptation (UDA) techniques attempt to leverage a set of unlabeled target data in various ways for bridging this domain gap. Recently, conditional image generation models such as Stable Diffusion have shown impressive results in generating realistic images from text and image inputs. In this study, we investigate the utility of Stable Diffusion for translating the synthetic images to the target domain for synthetic-to-real UDA. The translated images must accurately represent the class semantics of source domain data while also exhibiting properties of the target domain. We investigate various strategies to leverage the unlabeled target domain data with Stable Diffusion to guide the generation towards the target distribution.
View-Invariance in Visual Human Motion Analysis
This thesis makes contributions towards the solutions to
two problems in the area of visual human motion
analysis: human action recognition and human body pose
estimation. Although there has been a substantial
amount of research addressing these two problems in the
past, the important issue of viewpoint invariance in
the representation and recognition of poses and actions
has received relatively little attention, and forms a
key goal of this thesis.
Drawing on results from 2D projective invariance theory
and 3D mutual invariants, we present three different
approaches of varying degrees of generality, for human
action representation and recognition. A detailed
analysis of the approaches reveals key challenges,
which are circumvented by enforcing spatial and
temporal coherency constraints. An extensive
performance evaluation of the approaches on 2D
projections of motion capture data and manually
segmented real image sequences demonstrates that, in
addition to viewpoint changes, the approaches cope
well with varying speeds of execution of actions
(and hence different frame rates of the video),
different subjects and minor variabilities in the
spatiotemporal dynamics of the action.
Next, we present a method for recovering the
body-centric coordinates of key joints and parts of a
canonically scaled human body, given an image of the
body and the point correspondences of specific body
joints in an image. This problem is difficult to solve
because of body articulation and perspective effects.
To make the problem tractable, previous researchers
have resorted to restricting the camera model or
requiring an unrealistic number of point
correspondences, both of which are more restrictive
than necessary. We present a solution for the general
case of a perspective uncalibrated camera. Our method
requires that the torso does not twist considerably, an
assumption that is usually satisfied for many poses of
the body. We evaluate the quantitative performance of
the method on synthetic data and the qualitative
performance of the method on real images taken with
unknown cameras and viewpoints. Both these evaluations
show the effectiveness of the method at recovering the
pose of the human body.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
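The two counting schemes compared in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: each citing paper is given as a list of cited works, each cited work as a list of author names in byline order, and the function name `cocitation_counts` and its `max_authors` parameter are hypothetical. Treating co-authors of the same cited work as co-cited is also an assumption; the study may handle that case differently.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often each pair of authors is cited together.

    citing_papers: list of reference lists; each reference is a list of
                   author names in byline order (hypothetical input format).
    max_authors:   1 gives first-author counting; 5 credits the first
                   five authors of every cited work.
    """
    counts = Counter()
    for references in citing_papers:
        credited = set()
        for authors in references:
            # Credit only the leading authors of each cited work.
            credited.update(authors[:max_authors])
        # Assumption: co-authors of one cited work also count as co-cited.
        for pair in combinations(sorted(credited), 2):
            counts[pair] += 1
    return counts
```

With `max_authors=1` the pair counts reflect only first authors, while `max_authors=5` also credits up to four co-authors per cited work, so the resulting co-citation matrix contains more author pairs with non-zero counts.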
