Robust perception of humans for mobile robots RGB-depth algorithms for people tracking, re-identification and action recognition
Human perception is one of the most important skills for a mobile robot sharing its workspace with humans.
This is not only true for navigation, because people have to be avoided differently than other obstacles, but also because mobile robots must be able to truly interact with humans.
In the near future, we can imagine that robots will be increasingly present in every house and will perform services useful to the well-being of humans.
For this purpose, robust people tracking algorithms must be exploited, and person re-identification techniques play an important role in allowing robots to recognize a person after a full occlusion or after long periods of time.
Moreover, they must be able to recognize what humans are doing, in order to react accordingly, helping them if needed or also learning from them.
This thesis tackles these problems by proposing approaches which combine algorithms based on both RGB and depth information which can be obtained with recently introduced consumer RGB-D sensors.
Our key contribution to people detection and tracking research is a depth-clustering method that allows a robust image-based people detector to be applied only to a small subset of possible detection windows, thus decreasing the number of false detections while achieving high computational efficiency.
We also advance person re-identification research by proposing two techniques exploiting depth-based skeletal tracking algorithms: one is targeted to short-term re-identification and creates a compact, yet discriminative signature of people based on computing features at skeleton keypoints, which are highly repeatable and semantically meaningful; the other extracts long-term features, such as 3D shape, to compare people by matching the corresponding 3D point clouds acquired with an RGB-D sensor. In order to account for the fact that people are articulated and not rigid objects, it exploits 3D skeleton information for warping people point clouds to a standard pose, thus making them directly comparable by means of least-squares fitting.
Finally, we describe an extension of flow-based action recognition methods to the RGB-D domain which computes motion over time of persons' 3D points by exploiting joint color and depth information and recognizes human actions by classifying gridded descriptors of 3D flow.
A further contribution of this thesis is the creation of a number of new RGB-D datasets that allow different algorithms to be compared on data acquired by consumer RGB-D sensors. All these datasets have been publicly released in order to foster research in these fields.
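As a rough illustration of the depth-clustering idea in the abstract above (not the thesis's actual method), the sketch below groups 3D points by Euclidean proximity and keeps only clusters whose top lies at a plausible person height, so that an image-based detector would need to run on those few candidates only. The function names, the 0.3 m radius, the height range and the convention that z is metres above the ground are all assumptions made for this example.

```python
import math
from collections import deque


def euclidean_clusters(points, radius=0.3, min_size=3):
    """Group 3D points (x, y, z) by breadth-first region growing.

    Two points end up in the same cluster when a chain of neighbours,
    each closer than `radius` metres, connects them.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return clusters


def candidate_clusters(points, min_top=1.0, max_top=2.2):
    """Keep clusters whose highest point is at a plausible person height.

    Assumes z is the height above the ground plane, in metres.
    """
    keep = []
    for cluster in euclidean_clusters(points):
        top = max(points[i][2] for i in cluster)
        if min_top <= top <= max_top:
            keep.append(cluster)
    return keep


# A person-like column of points and a low box on the floor.
person = [(0.0, 0.0, 0.2 + 0.25 * k) for k in range(7)]
box = [(3.0, 0.0, 0.1), (3.0, 0.1, 0.2), (3.1, 0.0, 0.3)]
candidates = candidate_clusters(person + box)
```

Only the column of points survives the height filter here; the low box is discarded before any image-based classification would take place.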
A multi-viewpoint feature-based re-identification system driven by skeleton keypoints
Thanks to the increasing popularity of 3D sensors, robotic vision has experienced huge improvements in a wide range of applications and systems in recent years. Besides the many benefits, this migration caused some incompatibilities with those systems that cannot be based on range sensors, like intelligent video surveillance systems, since the two kinds of sensor data lead to different representations of people and objects. This work goes in the direction of bridging the gap, and presents a novel re-identification system that takes advantage of multiple video flows in order to enhance the performance of a skeletal tracking algorithm, which is in turn exploited for driving the re-identification. A new, geometry-based method for joining together the detections provided by the skeletal tracker from multiple video flows is introduced, which is capable of dealing with many people in the scene and of coping with the errors introduced in each view by the skeletal tracker. This method has a high degree of generality, and can be applied to any kind of body pose estimation algorithm. The system was tested on a public dataset for video surveillance applications, demonstrating the improvements achieved by the multi-viewpoint approach in the accuracy of both body pose estimation and re-identification. The proposed approach was also compared with a skeletal tracking system working on 3D data: the comparison confirmed the good performance level of the multi-viewpoint approach. This means that the lack of the rich information provided by 3D sensors can be compensated by the availability of more than one viewpoint.
Fast RGB-D people tracking for service robots
Service robots have to robustly follow and interact with humans. In this paper, we propose a very fast multi-people tracking algorithm designed to be applied on mobile service robots. Our approach exploits RGB-D data and can run in real time at a very high frame rate on a standard laptop without the need for a GPU implementation. It also features a novel depth-based sub-clustering method which makes it possible to detect people within groups or even standing near walls. Moreover, for limiting drifts and track ID switches, an online learning appearance classifier is proposed featuring a three-term joint likelihood. We compared the performance of our system with a number of state-of-the-art tracking algorithms on two public datasets acquired with three static Kinects and a moving stereo pair, respectively. In order to validate the 3D accuracy of our system, we created a new dataset in which RGB-D data are acquired by a moving robot. We made this dataset publicly available; it is not only annotated by hand, but also includes the ground-truth positions of people and robot acquired with a motion capture system, in order to evaluate tracking accuracy and precision in 3D coordinates. Results of experiments on these datasets are presented, showing that, even without the need for a GPU, our approach achieves state-of-the-art accuracy and superior speed.
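The abstract above mentions a three-term joint likelihood but does not name the terms. Purely as an illustration, the sketch below assumes the terms are a motion term, an appearance term and a detector-confidence term, sums them in the log domain, and assigns detections to tracks greedily. The term definitions, weights and function names are hypothetical and are not taken from the paper.

```python
import math


def joint_log_likelihood(motion_dist, appearance_score, detector_conf,
                         weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three log-likelihood terms (all assumed).

    motion_dist      -- normalised distance between prediction and detection
    appearance_score -- classifier score in (0, 1]
    detector_conf    -- people-detector confidence in (0, 1]
    """
    w_m, w_a, w_d = weights
    motion_term = -0.5 * motion_dist ** 2  # Gaussian log-density up to a constant
    appearance_term = math.log(max(appearance_score, 1e-9))
    detector_term = math.log(max(detector_conf, 1e-9))
    return w_m * motion_term + w_a * appearance_term + w_d * detector_term


def greedy_associate(scores):
    """Assign each track to at most one detection, best scores first.

    scores maps (track_id, detection_id) to a joint log-likelihood.
    """
    assignment = {}
    used = set()
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (track, det), _ in ranked:
        if track not in assignment and det not in used:
            assignment[track] = det
            used.add(det)
    return assignment


scores = {
    ("A", 0): joint_log_likelihood(0.2, 0.9, 0.8),
    ("A", 1): joint_log_likelihood(2.0, 0.3, 0.8),
    ("B", 0): joint_log_likelihood(1.5, 0.4, 0.8),
    ("B", 1): joint_log_likelihood(0.3, 0.8, 0.8),
}
assignment = greedy_associate(scores)
```

Combining several cues in one score is what lets an appearance term break ties when two people are close together, which is the situation in which ID switches tend to occur.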
OpenPTrack: Open Source Multi-Camera Calibration and People Tracking for RGB-D Camera Networks
OpenPTrack is open source software for multi-camera calibration and people tracking in RGB-D camera networks. It can track people in large volumes at sensor frame rate and currently supports a heterogeneous set of 3D sensors.
In this work, we describe its user-friendly calibration procedure, which consists of simple steps with real-time feedback that yield accurate estimates of the camera poses, which are then used for tracking people. On top of a calibration based on moving a checkerboard within the tracking space and on a global optimization of camera and checkerboard poses, a novel procedure which aligns people detections coming from all sensors in an x-y-time space is used for refining camera poses.
While people detection is executed locally, on the machines connected to each sensor, tracking is performed by a single node which takes into account detections from all over the network. Here we detail how a cascade of algorithms working on depth point clouds and color, infrared and disparity images is used to perform people detection from different types of sensors and in any indoor light condition.
We present experiments showing that a considerable improvement can be obtained with the proposed calibration refinement procedure that exploits people detections, and we compare Kinect v1, Kinect v2 and Mesa SR4500 performance for people tracking applications. OpenPTrack is based on the Robot Operating System and the Point Cloud Library and has already been adopted in networks composed of up to ten imagers for interactive arts, education, culture and human–robot interaction applications.
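To make the role of the calibrated camera poses concrete, the sketch below shows how a central node could bring detections from several cameras into one world frame before tracking. It is a minimal illustration, not OpenPTrack code: the pose is simplified to a yaw rotation plus a translation, and `make_pose`, `camera_to_world` and `gather_detections` are hypothetical names.

```python
import math


def make_pose(yaw, tx, ty, tz):
    """4x4 homogeneous camera-to-world transform (yaw about z, then translation)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def camera_to_world(pose, point):
    """Apply the homogeneous transform to a 3D point given in the camera frame."""
    homogeneous = (point[0], point[1], point[2], 1.0)
    return tuple(
        sum(pose[row][col] * homogeneous[col] for col in range(4))
        for row in range(3)
    )


def gather_detections(detections_by_camera, poses):
    """Express every camera's detections in the shared world frame."""
    world = []
    for camera_id, detections in detections_by_camera.items():
        for point in detections:
            world.append((camera_id, camera_to_world(poses[camera_id], point)))
    return world


poses = {
    "cam0": make_pose(0.0, 0.0, 0.0, 0.0),
    "cam1": make_pose(math.pi / 2, 2.0, 0.0, 0.0),
}
detections = {"cam0": [(2.0, 1.0, 0.0)], "cam1": [(1.0, 0.0, 0.0)]}
world_detections = gather_detections(detections, poses)
```

In this toy setup both cameras observe the same person, and both detections map to the world point (2, 1, 0). With inaccurate poses the two points would disagree, and this kind of disagreement is what a refinement based on people detections can reduce.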
Cost-efficient RGB-D smart camera for people detection and tracking
We describe a software library we developed for efficiently using the Kinect v2, a time-of-flight RGB-D sensor, with an embedded system, the NVidia Jetson TK1, as a cost-efficient RGB-D smart camera for people detection and tracking. The speed-up needed for achieving real-time operation has been obtained using NVidia CUDA to concurrently generate and process the raw depth and infrared data and to create the three-dimensional point cloud. This library has been released as open source and the smart camera has been tested in real-world scenarios, as a people-detection node in an open-source multinode RGB-D tracking system (OpenPTrack) and onboard a service robot for endowing it with robust people-following capabilities. Moreover, we show that nonembedded computers can also benefit from our library in terms of people-detection frame rate.
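The point-cloud creation step mentioned above can be illustrated with the standard pinhole back-projection. The sketch below is a plain CPU version in Python, not the CUDA implementation of the library, and the intrinsics `fx`, `fy`, `cx`, `cy` are placeholder values rather than Kinect v2 calibration data. Each pixel is processed independently, which is why this step parallelises well on a GPU.

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (rows of metres) to a list of 3D points.

    Uses the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth image with one invalid pixel (placeholder intrinsics).
depth_image = [
    [1.0, 0.0],
    [2.0, 2.0],
]
cloud = depth_to_point_cloud(depth_image, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```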
Towards Cooperative People Re-Identification between 3D Sensors and 2D Camera Networks
Thanks to the increasing popularity of 3D sensors, robotics vision has experienced huge improvements in a wide range of applications and systems in recent years. Besides the many benefits, this migration caused some incompatibilities with those systems that cannot be based on range sensors, like intelligent video surveillance systems, since the two kinds of sensor data lead to different representations of people and objects. This work goes in the direction of bridging the gap, and presents a novel re-identification system that takes advantage of multiple video flows in order to enhance the performance of a skeletal tracking algorithm, which is in turn exploited for driving the re-identification. A new, geometry-based method for joining together the detections provided by the skeletal tracker from multiple video flows is introduced: it is capable of dealing with many people in the scene, and of rejecting the errors introduced in each view by the skeletal tracker. This method has a high degree of generality, and it can be applied to any body pose estimation algorithm. The system was tested on a public dataset for video surveillance applications, demonstrating the improvements achieved by the multi-viewpoint approach in the accuracy of both the body pose estimation and the re-identification. This means that the lack of the rich information provided by 3D sensors can be compensated by the availability of more than one viewpoint.
An evaluation of 3D motion flow and 3D pose estimation for human action recognition
Modern human action recognition algorithms which exploit 3D information mainly classify video sequences by extracting local or global features from the RGB-D domain or classifying the skeleton information provided by a skeletal tracker. In this paper, we propose a comparison between two techniques which share the same classification process, while differing in the type of descriptor which is classified. The former exploits an improved version of a recently proposed approach for 3D motion flow estimation from colored point clouds, while the latter relies on the estimated skeleton joints positions. We compare these methods on a newly created dataset for RGB-D human action recognition which contains 15 actions performed by 12 different people.
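For readers unfamiliar with flow-based descriptors, the following sketch shows one simple way to turn per-point 3D flow vectors into a fixed-length vector that a classifier can consume: the person's bounding box is split into an n x n x n grid and the mean flow of each cell is concatenated. The grid size, the averaging and the function name are assumptions made for this example; the paper's descriptor may differ.

```python
def gridded_flow_descriptor(points, flows, box_min, box_max, n=2):
    """Concatenate the mean 3D flow of each cell of an n x n x n grid.

    points  -- (x, y, z) positions inside the bounding box
    flows   -- (dx, dy, dz) displacement of each point between frames
    Returns a list of length 3 * n**3; empty cells contribute zeros.
    """
    sums = [[0.0, 0.0, 0.0] for _ in range(n ** 3)]
    counts = [0] * (n ** 3)
    for p, f in zip(points, flows):
        index = []
        for axis in range(3):
            extent = box_max[axis] - box_min[axis]
            cell = int((p[axis] - box_min[axis]) / extent * n)
            index.append(min(max(cell, 0), n - 1))  # clamp points on the border
        flat = (index[0] * n + index[1]) * n + index[2]
        counts[flat] += 1
        for axis in range(3):
            sums[flat][axis] += f[axis]
    descriptor = []
    for cell_sum, count in zip(sums, counts):
        descriptor.extend(v / count if count else 0.0 for v in cell_sum)
    return descriptor


points = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)]
flows = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
descriptor = gridded_flow_descriptor(points, flows, (0, 0, 0), (1, 1, 1))
```

The descriptor length depends only on the grid size, not on the number of points, so sequences with different point counts can share one classifier.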
Skeleton estimation and tracking by means of depth data fusion from depth camera networks
Scene specific people detection by simple human interaction
This paper proposes a generic procedure for training a scene-specific people detector by exploiting simple human interaction. This technique works for any kind of scene imaged by a static camera and considerably increases the performance of an appearance-based people detector. The user is requested to validate the results of a basic detector relying on background subtraction and proportion constraints. From this simple supervision it is possible to select new scene-specific examples that can be used for retraining the people detector used in the testing phase. These new examples have the benefit of adapting the classifier to the particular scene imaged by the camera, improving the detection for that particular viewpoint, background, and image resolution. At the same time, the positions and scales at which people can be found are learnt, thus considerably reducing the number of windows that have to be scanned in the detection phase. Experimental results are presented on three different scenarios, showing an improved detection accuracy and a reduced number of false positives even when the ground plane assumption does not hold.
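The idea of learning where, and at which scale, people appear can be sketched with a coarse lookup table built from the user-validated detections. The code below is a hypothetical illustration rather than the paper's procedure: it stores, for each image cell, the range of validated person heights and rejects scanning windows that fall outside it. The cell size, the tolerance margin and the function names are assumptions. A per-cell table is used instead of a single height-versus-row fit so that the sketch does not itself rely on a ground plane.

```python
def learn_scale_map(examples, cell=80):
    """Record the observed person-height range per image cell.

    examples -- (x, y, height) of user-validated detections, in pixels
    """
    scale_map = {}
    for x, y, height in examples:
        key = (int(x) // cell, int(y) // cell)
        low, high = scale_map.get(key, (height, height))
        scale_map[key] = (min(low, height), max(high, height))
    return scale_map


def window_allowed(scale_map, x, y, height, cell=80, margin=0.2):
    """Scan a window only where people were seen, at a similar scale."""
    key = (int(x) // cell, int(y) // cell)
    if key not in scale_map:
        return False
    low, high = scale_map[key]
    return low * (1 - margin) <= height <= high * (1 + margin)


validated = [(100, 300, 120), (110, 310, 140), (400, 200, 60)]
scale_map = learn_scale_map(validated)
```

With this table, a window of height 130 near (105, 305) would be scanned, while a window of height 40 at the same place, or any window in a cell where nobody was ever validated, would be skipped.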
