J-MOD<sup>2</sup>: Joint Monocular Obstacle Detection and Depth Estimation
In this work, we propose an end-to-end deep architecture that jointly learns to detect obstacles and estimate their depth for MAV flight applications. Most existing approaches rely either on Visual SLAM systems or on depth estimation models to build 3D maps and detect obstacles. However, for the task of avoiding obstacles this level of complexity is not required. Recent works have proposed multi-task architectures that perform both scene understanding and depth estimation. We follow their track and propose a specific architecture to jointly estimate depth and obstacles, without the need to compute a global map, but maintaining compatibility with a global SLAM system if needed. The network architecture is devised to combine the joint information of the obstacle detection task, which produces more reliable bounding boxes, with that of the depth estimation task, increasing the robustness of both to scenario changes. We call this architecture J-MOD<sup>2</sup>. We test the effectiveness of our approach with experiments on sequences with different appearance and focal lengths and compare it to state-of-the-art multi-task methods that jointly perform semantic segmentation and depth estimation. In addition, we show the integration in a full system using a set of simulated navigation experiments where a MAV explores an unknown scenario and plans safe trajectories by using our detection model.
Evaluation of non-geometric methods for visual odometry
Visual Odometry (VO) is one of the fundamental building blocks of modern autonomous robot navigation
and mapping. While most state-of-the-art techniques use geometrical methods for camera ego-motion
estimation from optical flow vectors, in the last few years learning approaches have been proposed to
solve this problem. These approaches are emerging and there is still much to explore. This work follows
this track applying Kernel Machines to monocular visual ego-motion estimation. Unlike geometrical
methods, learning-based approaches to monocular visual odometry allow issues like scale estimation and
camera calibration to be overcome, assuming the availability of training data. While some previous works
have proposed learning paradigms for VO, to our knowledge no extensive evaluation of applying kernel-based
methods to Visual Odometry has been conducted. To fill this gap, in this work we consider publicly
available datasets and perform several experiments in order to set a comparison baseline with traditional
techniques. Experimental results show good performance of learning algorithms and establish them as a solid
alternative to the computationally intensive and complex-to-implement geometrical techniques.
A transfer learning approach for multi-cue semantic place recognition
As researchers continuously strive to develop robotic systems able to move into ‘the wild’, the interest in novel learning paradigms for domain adaptation has increased. In the specific application of semantic place recognition from cameras, supervised learning algorithms are typically adopted. However, once learning has been performed, if the robot is moved to another location, the acquired knowledge may not be useful, as the novel scenario can be very different from the old one. The obvious solution would be to retrain the model, updating the robot's internal representation of the environment. Unfortunately, this procedure involves a very time-consuming data-labeling effort on the human side. To avoid these issues, in this paper we propose a novel transfer learning approach for place categorization from visual cues. With our method the robot is able to decide automatically if and how much its internal knowledge is useful in the novel scenario. Unlike previous approaches, we consider the situation where the old and the novel scenario may differ significantly (not only does the visual room appearance change, but different room categories are also present). Importantly, our approach does not require labeling from a human operator. We also propose a strategy for improving the performance of the proposed method by optimally fusing two complementary visual cues. Our extensive experimental evaluation demonstrates the advantages of our approach on several sequences from publicly available datasets.
SmartSEAL: A ROS based home automation framework for heterogeneous devices interconnection in smart buildings
Transferring knowledge across robots: A risk sensitive approach
One of the most impressive characteristics of human perception is its domain adaptation capability. Humans can recognize objects and places simply by transferring knowledge from their past experience. Inspired by that, current research in robotics is addressing a great challenge: building robots able to sense and interpret the surrounding world by reusing information previously collected, gathered by other robots or obtained from the web. But how can a robot automatically understand what is useful among a large amount of information and perform knowledge transfer? In this paper we address the domain adaptation problem in the context of visual place recognition. We consider the scenario where a robot equipped with a monocular camera explores a new environment. In this situation traditional approaches based on supervised learning perform poorly, as no annotated data are provided in the new environment and the models learned from data collected in other places are inappropriate due to the large variability of visual information. To overcome these problems we introduce a novel transfer learning approach. With our algorithm the robot is given only some training data (annotated images collected in different environments by other robots) and is able to decide whether, and how much, this knowledge is useful in the current scenario. At the base of our approach there is a transfer risk measure which quantifies the similarity between the given and the new visual data. To improve the performance, we also extend our framework to take into account multiple visual cues. Our experiments on three publicly available datasets demonstrate the effectiveness of the proposed approach.
Visual-inertial Tracking on Android for Augmented Reality Applications
Augmented Reality (AR) aims to enhance a person’s vision of the real world with useful information about the surrounding environment. Amongst all the possible applications, AR systems can be very useful as visualization tools for structural and environmental monitoring. While the large majority of AR systems run on a laptop or on a head-mounted device, the advent of smartphones has created new opportunities. One of the most important functionalities of an AR system is the ability of the device to self-localize. This can be achieved through visual odometry, a very challenging task for a smartphone. Indeed, in most of the available smartphone AR applications, self-localization is achieved through GPS and/or inertial sensors. Hence, developing an AR system on a mobile phone also poses new challenges due to the limited amount of computational resources. In this paper we describe the development of an egomotion estimation algorithm for an Android smartphone. We also present an approach based on an Extended Kalman Filter for improving localization accuracy by integrating the information from inertial sensors. The implemented solution achieves a localization accuracy comparable to the PC implementation while running on an Android device.
Robust visual semi-semantic loop closure detection by a covisibility graph and CNN features
Visual self-localization in unknown environments is a crucial capability for an autonomous robot. Real-life scenarios often present critical challenges for autonomous vision-based localization, such as robustness to viewpoint and appearance changes. To address these issues, this paper proposes a novel strategy that models the visual scene by preserving its geometric and semantic structure and, at the same time, improves appearance invariance through a robust visual representation. Our method relies on high-level visual landmarks consisting of appearance-invariant descriptors that are extracted by a pre-trained Convolutional Neural Network (CNN) on the basis of image patches. In addition, during the exploration, the landmarks are organized by building an incremental covisibility graph that, at query time, is exploited to retrieve candidate matching locations, improving the robustness in terms of viewpoint invariance. In this respect, through the covisibility graph, the algorithm finds location similarities more effectively by exploiting the structure of the scene, which, in turn, allows the construction of virtual locations, i.e., artificially augmented views from a real location that are useful to enhance the loop closure ability of the robot. The proposed approach has been thoroughly analysed and tested in different challenging scenarios taken from public datasets. The approach has also been compared with a state-of-the-art visual navigation algorithm.
Full-GRU Natural Language Video Description for Service Robotics Applications
Enabling effective human-robot interaction is crucial for any service robotics application. In this context, a fundamental aspect is the development of a user-friendly human-robot interface, such as a natural language interface. In this letter, we investigate the robot side of the interface, in particular the ability to generate natural language descriptions for the scene it observes. We achieve this capability via a deep recurrent neural network architecture completely based on the gated recurrent unit paradigm. The robot is able to generate complete sentences describing the scene, dealing with the hierarchical nature of the temporal information contained in image sequences. The proposed approach has fewer parameters than previous state-of-the-art architectures, so it is faster to train and has a smaller memory footprint. These benefits do not affect the prediction performance. In fact, we show that our method outperforms or is comparable to previous approaches in terms of quantitative metrics and qualitative evaluation when tested on publicly available benchmark datasets and on a new dataset we introduce in this letter.
