Light deep learning models enriched with Entangled features for RGB-D semantic segmentation
Semantic segmentation is a crucial task in emerging robotic applications like autonomous driving and social robotics. State-of-the-art methods in this field rely on deep learning, with several works in the literature following the trend of using larger networks to achieve higher performance. However, this leads to greater model complexity and higher computational costs, which make it difficult to integrate such models on mobile robots. In this work we investigate how to obtain lighter yet well-performing deep models by introducing additional data at a very low computational cost, instead of increasing the network complexity. We consider the features used in the 3D Entangled Forests algorithm and propose different strategies to integrate this additional information into different deep networks. The new features make it possible to obtain lighter, well-performing segmentation models, either by shrinking the network size or by improving existing networks proposed for real-time segmentation. This result represents an interesting alternative for mobile robotics applications, where computational power and energy are limited.
Boat hunting with semantic segmentation for flexible and autonomous manufacturing
Customized mass production of boats and other vehicles requires highly complex manufacturing processes that need a high degree of automation. To enhance the efficiency of such systems, sensing is of paramount importance to provide robots with detailed information about the working environment. In this paper, we propose the use of semantic segmentation to detect the key elements involved in production, to boost automation in the production process. Our main focus is on the sanding process of these tools by means of a robot. We demonstrate the potential of these techniques in an industrial environment featuring a lower degree of variability than the domestic scenes typically considered in the literature. In the production environment, however, higher performance is required to address challenging manufacturing operations successfully. In this work, we also show that exploiting contextual cues and multiple points of view can further boost the reliability of our system, which provides useful data to the other robot modules in charge of navigation, work station recognition, and other tasks. All the methods have been thoroughly validated on the IASLAB RGB-D COROMA Dataset, which was created for this purpose. It consists of 46589 RGB-D frames, whose annotation was sped up thanks to our optimized annotation pipeline.
Deep features for training support vector machines
Features play a crucial role in computer vision. Initially designed to detect salient elements by means of handcrafted algorithms, features are now often learned using different layers in convolutional neural networks (CNNs). This paper develops a generic computer vision system based on features extracted from trained CNNs. Multiple learned features are combined into a single structure to work on different image classification tasks. The proposed system was derived by testing several approaches for extracting features from the inner layers of CNNs and using them as inputs to support vector machines (SVMs) that are then combined by sum rule. Several dimensionality reduction techniques were tested for reducing the high dimensionality of the inner layers so that they can work with SVMs. The empirically derived generic vision system based on applying a discrete cosine transform (DCT) separately to each channel is shown to significantly boost the performance of standard CNNs across a large and diverse collection of image data sets. In addition, an ensemble of different topologies taking the same DCT approach and combined with global mean thresholding pooling obtained state-of-the-art results on a benchmark image virus data set.
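As a rough illustration of the sum rule mentioned in the abstract above, the sketch below fuses per-class scores from several classifiers by adding them and picking the class with the largest total. The function name sum_rule, the class labels, and the score values are all hypothetical and are not taken from the paper; the sketch also assumes the scores of the individual classifiers are already on a comparable scale.

```python
from typing import Dict, List


def sum_rule(score_sets: List[Dict[str, float]]) -> str:
    """Fuse per-class scores from several classifiers by summing them."""
    totals: Dict[str, float] = {}
    for scores in score_sets:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    # The fused decision is the class with the largest summed score.
    return max(totals, key=totals.get)


# Hypothetical scores from three SVMs, each assumed to be trained on
# features from a different CNN layer (values invented for illustration).
svm_scores = [
    {"cat": 0.6, "dog": 0.4},
    {"cat": 0.3, "dog": 0.7},
    {"cat": 0.8, "dog": 0.2},
]
fused_label = sum_rule(svm_scores)
```

Here the second classifier disagrees with the other two, but the summed scores still favour the majority class.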
Ensemble of convolutional neural networks trained with different activation functions
Activation functions play a vital role in the training of Convolutional Neural Networks. For this reason, developing efficient and well-performing functions is a crucial problem in the deep learning community. The idea behind these approaches is to allow reliable parameter learning while avoiding vanishing gradient problems. The goal of this work is to propose an ensemble of Convolutional Neural Networks trained using several different activation functions. Moreover, a novel activation function is proposed here for the first time. Our aim is to improve the performance of Convolutional Neural Networks on small and medium-sized biomedical datasets. Our results clearly show that the proposed ensemble outperforms Convolutional Neural Networks trained with a standard ReLU as activation function. The proposed ensemble outperforms each tested stand-alone activation function with a p-value of 0.01; for reliable performance comparison we tested our approach on more than 10 datasets, using two well-known Convolutional Neural Networks: Vgg16 and ResNet50.
Enhancing semantic segmentation with detection priors and iterated graph cuts for robotics
To foster human–robot interaction, autonomous robots need to understand the environment in which they operate. In this context, one of the main challenges is semantic segmentation, together with the recognition of important objects, which can aid robots during exploration, as well as when planning new actions and interacting with the environment. In this study, we extend a multi-view semantic segmentation system based on 3D Entangled Forests (3DEF) by integrating and refining two object detectors, Mask R-CNN and You Only Look Once (YOLO), with Bayesian fusion and iterated graph cuts. The new system takes the best of its components, successfully exploiting both 2D and 3D data. Our experiments show that our approach is competitive with the state of the art and leads to accurate semantic segmentations.
Multi-Camera Hand-Eye Calibration for Human-Robot Collaboration in Industrial Robotic Workcells
In industrial scenarios, effective human-robot collaboration relies on multi-camera systems to robustly monitor human operators despite the occlusions that typically show up in a robotic workcell. In this scenario, precise localization of the person in the robot coordinate system is essential, making the hand-eye calibration of the camera network critical. This process presents significant challenges when high calibration accuracy must be achieved in a short time to minimize production downtime, and when dealing with extensive camera networks used for monitoring wide areas, such as industrial robotic workcells. Our paper introduces an innovative and robust multi-camera hand-eye calibration method, designed to optimize each camera's pose relative to both the robot's base and every other camera. This optimization integrates two types of key constraints: i) a single board-to-end-effector transformation, and ii) the relative camera-to-camera transformations. We demonstrate the superior performance of our method through comprehensive experiments employing the METRIC dataset and real-world data collected in industrial scenarios, showing notable advancements over state-of-the-art techniques even when using fewer than 10 images.
Performance Evaluation of Depth Completion Neural Networks for Various RGB-D Camera Technologies in Indoor Scenarios
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
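The two counting schemes compared in the abstract above can be sketched as a single function with a cutoff on how many authors of each cited work are counted. This is a minimal illustration under stated assumptions, not the study's procedure: the function name cocitation_counts, the author names, and the reference lists are hypothetical, and the sketch treats any two counted authors appearing in the same citing paper's reference list as co-cited once, including co-authors of the same cited work.

```python
from collections import Counter
from itertools import combinations
from typing import List


def cocitation_counts(citing_papers: List[List[List[str]]],
                      max_authors: int) -> Counter:
    """Count author pairs co-cited by the same citing paper.

    citing_papers: one reference list per citing paper; each reference
    is the cited work's author list in byline order.
    max_authors: how many leading authors of each cited work to count
    (1 = first-author counting, 5 = first-five-authors counting).
    """
    counts: Counter = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing papers and the works they cite.
papers = [
    [["Adams", "Baker"], ["Chen", "Diaz"]],
    [["Adams"], ["Diaz", "Chen"]],
]
first_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

With first-author counting, Chen and Diaz are never paired with the same partner twice because only the lead name of each cited work is seen; with the five-author cutoff, the same two citing papers yield more pairs and higher counts.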
