1,720,977 research outputs found

    Semiautomatic Labeling for Deep Learning in Robotics

    Full text link
    In this article, we propose an augmented reality semiautomatic labeling (ARS), a semiautomatic method which leverages on moving a 2-D camera by means of a robot, proving precise camera tracking, and an augmented reality pen (ARP) to define initial object bounding box, to create large labeled data sets with minimal human intervention. By removing the burden of generating annotated data from humans, we make the deep learning technique applied to computer vision, which typically requires very large data sets, truly automated and reliable. With the ARS pipeline, we created two novel data sets effortlessly, one on electromechanical components (industrial scenario) and other on fruits (daily-living scenario) and trained two state-of-the-art object detectors robustly, based on convolutional neural networks, such as you only look once (YOLO) and single shot detector (SSD). With respect to conventional manual annotation of 1000 frames that takes us slightly more than 10 h, the proposed approach based on ARS allows to annotate 9 sequences of about 35,000 frames in less than 1 h, with a gain factor of about 450. Moreover, both the precision and recall of object detection is increased by about 15% with respect to manual labeling. All our software is available as a robot operating system (ROS) package in a public repository alongside with the novel annotated data sets

    NeRF-Supervised Deep Stereo

    Full text link
    We introduce a novel framework for training deep stereo networks effortlessly and without any ground-truth. By leveraging state-of-the-art neural rendering solutions, we generate stereo training data from image sequences collected with a single handheld camera. On top of them, a NeRF-supervised training procedure is carried out, from which we exploit rendered stereo triplets to compensate for occlusions and depth maps as proxy labels. This results in stereo networks capable of predicting sharp and detailed disparity maps. Experimental results show that models trained under this regime yield a 30-40% improvement over existing self-supervised methods on the challenging Middle-bury dataset, filling the gap to supervised models and, most times, outperforming them at zero-shot generalization

    Integration of Robotic Vision and Tactile Sensing for Wire-Terminal Insertion Tasks

    Full text link
    This paper reports the development of a manipulation system for electric wires, implemented by means of a commercial gripper installed on an industrial manipulator and equipped with cameras and suitably designed tactile sensors. The purpose of this system is the execution of wire insertion on commercial electromechanical components. The synergy between computer vision and tactile sensing is necessary because, in a real environment, the tight spaces very often prevent the possibility to use the vision system, also when the same task is performed by a human being. A novel technique to speed up the generation of training data sets for convolutional neural networks (CNNs) is proposed. Therefore, this technique is used to train a CNN in order to detect small objects (such as wire terminals). Moreover, aiming to prevent faults during the task and to interact with the environment safely, several machine learning approaches are used to produce an affordable output from the tactile sensor. The proposed approach shows how a cheap sensor embedded with suitable intelligence can provide information comparable to a more expensive force sensor

    SkiMap: An efficient mapping framework for robot navigation

    No full text
    We present a novel mapping framework for robot navigation which features a multi-level querying system capable to obtain rapidly representations as diverse as a 3D voxel grid, a 2.5D height map and a 2D occupancy grid. These are inherently embedded into a memory and time efficient core data structure organized as a Tree of SkipLists. Compared to the well-known Octree representation, our approach exhibits a better time efficiency, thanks to its simple and highly parallelizable computational structure, and a similar memory footprint when mapping large workspaces. Peculiarly within the realm of mapping for robot navigation, our framework supports real-time erosion and re-integration of measurements upon reception of optimized poses from the sensor tracker, so as to improve continuously the accuracy of the map

    RobotFusion: Grasping with a robotic manipulator via multi-view reconstruction

    No full text
    We propose a complete system for 3D object reconstruction and grasping based on an articulated robotic manipulator. We deploy an RGB-D sensor as an end effector placed directly on the robotic arm, and process the acquired data to perform multi-view 3D reconstruction and object grasping. We leverage the high repeatability of the robotic arm to estimate 3D camera poses with millimeter accuracy and control each of the six sensor’s DOF in a dexterous workspace. Thereby, we can estimate camera poses directly by robot kinematics and deploy a Truncated Signed Distance Function (TSDF) to accurately fuse multiple views into a unified 3D reconstruction of the scene. Then, we propose an efficient approach to segment the sought objects out of a planar workbench as well as a novel algorithm to automatically estimate grasping points

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed

    ReLight My NeRF: A Dataset for Novel View Synthesis and Relighting of Real World Objects

    Full text link
    In this paper, we focus on the problem of rendering novel views from a Neural Radiance Field (NeRF) under unobserved light conditions. To this end, we introduce a novel dataset, dubbed ReNe (Relighting NeRF), framing real world objects under one-light-at-time (OLAT) conditions, annotated with accurate ground-truth camera and light poses. Our acquisition pipeline leverages two robotic arms holding, respectively, a camera and an omni-directional point-wise light source. We release a total of 20 scenes depicting a variety of objects with complex geometry and challenging materials. Each scene includes 2000 images, acquired from 50 different points of views under 40 different OLAT conditions. By leveraging the dataset, we perform an ablation study on the relighting capability of variants of the vanilla NeRF architecture and identify a lightweight architecture that can render novel views of an object under novel light conditions, which we use to establish a non-trivial baseline for the dataset. Dataset and benchmark are available at https://eyecan-ai.github.io/rene

    Physical-consistent behavior embodied in B-spline curves for robot path planning

    Full text link
    In this paper, a path planning algorithm for smooth obstacle avoidance of mobile robots is presented and discussed. The idea is based on B-splines, which are trajectories defined by a linear combination of basis functions weighted by constants called control points. Since the overall curve depends locally only on the position of a limited set of control points, B-splines are ideal tools for generating online reference trajectories able to react to local changes of the environment (e.g. mobile or spawning obstacles). The via-points of the reference B-spline are treated as dynamic agents with customized dynamic properties. These agents interact with each other and with the environment in such a way to generate the correct reference for the B-spline generator and thus for the robot. The high flexibility and the low computational burden of the proposed algorithm allow to develop applications in partially unknown or dynamically changing environments since a local modification of the trajectory does not require an entire re-computation. Experimental results performed with a KUKA Youboot mobile robot within a ROS based set-up are reported

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
    corecore