Visual-SLAM for Humanoid Robots
In robotics, Simultaneous Localization and Mapping (SLAM) is the problem in which an autonomous robot acquires a map of the surrounding environment while at the same time localizing itself within this map. In recent years, many researchers have devoted great effort to developing new families of algorithms, using a variety of sensors and robotic platforms.
One of the most challenging fields of research in SLAM is the so-called Visual-SLAM problem, in which various types of cameras are used as sensors for navigation. Cameras are inexpensive sensors that can provide rich information about the surrounding environment. On the other hand, the complexity of the computer vision tasks and the strong dependence of current approaches on the characteristics of the environment leave Visual-SLAM far from being a closed problem.
Most SLAM algorithms are tested on wheeled robots. These platforms have become robust and stable; meanwhile, research in robot design is moving toward a new family of robot platforms, the humanoid robots. Just like humans, a humanoid robot can adapt itself to changes in the environment in order to efficiently reach its goals.
Despite this, only a few roboticists have focused their research on stable implementations of SLAM and Visual-SLAM algorithms well suited to humanoid robots.
Humanoid platforms raise issues that can compromise the stability of conventional navigation algorithms, especially vision-based approaches. A humanoid robot moves in 3D, without the usual planar motion assumption that constrains movement to 2D, and usually with quick and complex movements combined with unpredictable vibrations. These compromise the reliability of the acquired sensor data, for example by introducing undesired motion blur in the images grabbed by the camera. Due to strong balance constraints, a humanoid robot usually cannot be equipped with powerful but heavy computer boards, which limits the implementation of complex and computationally expensive algorithms.
Moreover, unlike for wheeled robots, the complex kinematics of a humanoid usually prevents a reliable reconstruction of the motion from the servo-motor encoders.
In this thesis, we focus on studying and developing new techniques that address the Visual-SLAM problem, with particular attention to the issues that arise when small humanoid robots equipped with a single perspective camera are used as the experimental platform.
Most effort in the SLAM and Visual-SLAM research areas has been put into the estimation functionality. However, most of the functionalities involved in Visual-SLAM are perception processes. In this thesis we therefore focus on improving the perceptual processes from a computer vision point of view.
We address issues specific to small humanoid robots, such as low computational capability, low-quality sensor data, and the high number of degrees of freedom of the motion. We cope with the limited computational resources by presenting a new image similarity measure, based on a compact signature, for use in image-based topological SLAM. We address motion blur by proposing a new feature detection and tracking scheme that is robust even to non-uniform motion blur. We also develop a visual odometry framework based on these blur-robust features.
Finally, we propose a homography-based approach to 3D visual SLAM that uses the information provided by a single camera mounted on a humanoid robot and relies on the assumption that the robot moves on a planar surface.
All proposed methods have been validated through experiments and comparative studies, using both standard datasets and images taken by the cameras mounted on walking small humanoid robots.
In robotics, Simultaneous Localization and Mapping (SLAM) is the process by which an autonomous robot is able to build a map of the surrounding environment and, at the same time, localize itself using that map. In recent years, a considerable number of researchers have developed new families of SLAM algorithms, based on various sensors and using various robotic platforms.
One of the most complex areas of SLAM research is the so-called Visual-SLAM, which uses various types of cameras as the navigation sensor. Cameras are inexpensive sensors that gather a great deal of information about the surrounding environment. On the other hand, the complexity of computer vision algorithms and the strong dependence of current approaches on the characteristics of the environment make Visual-SLAM a problem that is far from being considered solved.
Many SLAM algorithms are usually tested on wheeled robots. Although such platforms are by now robust and stable, research on the design of new robotic platforms is partly shifting toward humanoid robotics. Just like human beings, humanoid robots are able to adapt to changes in the environment in order to reach their goals effectively.
Despite this, only a few researchers have focused their efforts on stable implementations of SLAM and Visual-SLAM algorithms suited to humanoid robots.
These robotic platforms introduce new problems that can compromise the stability of conventional navigation algorithms, especially vision-based ones. Humanoid robots have many degrees of freedom of movement and can perform complex movements quickly. These characteristics introduce non-deterministic vibrations into the motion, which can compromise the reliability of the acquired sensor data, for example by introducing undesired effects such as motion blur into the video streams. Furthermore, because of the constraints imposed by body balance, such robots cannot always be equipped with high-performance processing units, which are often bulky and heavy; this limits the use of complex and computationally demanding algorithms. Finally, unlike for wheeled robots, the complex kinematics of a humanoid robot prevents the motion from being reconstructed from the information provided by the encoders on the motors.
This thesis focuses on the study and development of new methodologies for tackling the Visual-SLAM problem, with particular emphasis on the problems related to using small humanoid robots equipped with a single camera as experimental platforms.
Most effort in SLAM and Visual-SLAM research has concentrated on estimating the state of the robot, for example its own position and the map of the environment. On the other hand, most of the problems encountered in Visual-SLAM research are related to the perception process, that is, to the interpretation of the data coming from the sensors. This thesis therefore concentrates on improving the perceptual processes from a computer vision point of view.
The problems that arise from using small humanoid robots as experimental platforms have been addressed, such as low computing power, low-quality sensor data, and the high number of degrees of freedom of the motion. The low computing power led to the creation of a new method for measuring the similarity between images, which uses a compact image description and can be employed in topological SLAM applications. The motion blur problem has been addressed by proposing a new technique for detecting visual features, together with a new tracking scheme that is robust even to non-uniform motion blur. An image-based odometry framework that uses the proposed visual features has also been developed.
Finally, a homography-based approach to Visual-SLAM is proposed, which exploits the information obtained from a single camera mounted on a humanoid robot. This approach relies on the assumption that the robot moves on a planar surface.
All proposed methods have been validated through experiments and comparative studies, using both standard datasets and images acquired by the cameras installed on small humanoid robots.
KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation
Object pose estimation is a fundamental computer vision task exploited in
several robotics and augmented reality applications. Many established
approaches rely on predicting 2D-3D keypoint correspondences using RANSAC
(Random sample consensus) and estimating the object pose using the PnP
(Perspective-n-Point) algorithm. Since RANSAC is non-differentiable,
correspondences cannot be directly learned in an end-to-end fashion. In this
paper, we address the stereo image-based object pose estimation problem by i)
introducing a differentiable RANSAC layer into a well-known monocular pose
estimation network; ii) exploiting an uncertainty-driven multi-view PnP solver
which can fuse information from multiple views. We evaluate our approach on a
challenging public stereo object pose estimation dataset and a custom-built
dataset we call Transparent Tableware Dataset (TTD), yielding state-of-the-art
results against other recent approaches. Furthermore, in our ablation study, we
show that the differentiable RANSAC layer plays a significant role in the
accuracy of the proposed method. With this paper, we release the code of our
method and the TTD dataset.
Comment: Published in IEEE Robotics and Automation Letters
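The abstract's point that RANSAC is non-differentiable can be illustrated with a small sketch. It is not the KVN layer itself: it only contrasts the hard inlier count used when scoring a RANSAC hypothesis with a sigmoid-based soft count, a common relaxation. The function names and the `beta` sharpness parameter are assumptions made for this illustration.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hard_inlier_count(residuals, threshold):
    """Classic RANSAC score: a step function of each residual.

    Its gradient is zero almost everywhere, so it gives no learning signal.
    """
    return sum(1 for r in residuals if r < threshold)


def soft_inlier_count(residuals, threshold, beta=10.0):
    """Smooth stand-in for the hard count (beta is an assumed sharpness).

    Each residual contributes a value in (0, 1) that varies smoothly,
    so the score responds to small changes in the residuals.
    """
    return sum(_sigmoid(beta * (threshold - r)) for r in residuals)
```

Moving a residual from 0.5 to 0.6 with a threshold of 1.0 leaves the hard count unchanged but lowers the soft count slightly, which is the property a gradient-based learner needs.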
Flexible 3D localization of planar objects for industrial bin-picking with monocamera vision system
Panoramic tracking camera
A vision device that integrates a motorized camera (with tilt and/or pan and/or zoom actuation) and a panoramic camera of any type (a group of side-by-side cameras, an omnidirectional camera, etc.), which cooperate actively through an intelligent system (implemented in hardware and/or software) in order to monitor the entire surrounding environment while exploiting the advantages of both camera types. The information extracted from the panoramic images is used to actively control the motorized camera and to frame areas, people, and objects of interest in detail.
The images from the integrated sensors can be broadcast and processed to extract information of interest, such as moving objects or specific actions. By fusing the information from both cameras, it is possible to obtain detailed information about the state, the changes, and the depth of the fixed and moving objects present throughout the environment monitored by the device.
Exploiting Local Features and Range Images for Small Data Real-Time Point Cloud Semantic Segmentation
Semantic segmentation of point clouds is an essential task for understanding the environment in autonomous driving and robotics. Recent range-based works achieve real-time efficiency, while point- and voxel-based methods produce better results but are affected by high computational complexity. Moreover, highly complex deep learning models are often not suited to efficiently learn from small datasets. Their generalization capabilities can easily be driven by the abundance of data rather than the architecture design. In this paper, we harness the information from the three-dimensional representation to proficiently capture local features, while introducing the range image representation to incorporate additional information and facilitate fast computation. A GPU-based KDTree allows for rapid building, querying, and enhancing projection with straightforward operations. Extensive experiments on SemanticKITTI and nuScenes datasets demonstrate the benefits of our modification in a "small data" setup, in which only one sequence of the dataset is used to train the models, but also in the conventional setup, where all sequences except one are used for training. We show that a reduced version of our model not only demonstrates strong competitiveness against full-scale state-of-the-art models but also operates in real-time, making it a viable choice for real-world applications. The code of our method is available at https://github.com/Bender97/WaffleAndRange
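For readers unfamiliar with the range image representation mentioned above, the following is a minimal sketch of the usual spherical projection from 3D points to a 2D range image. It is not taken from the paper's code; the function name and the default image size and vertical field of view are assumptions chosen to resemble a 64-beam LiDAR.

```python
import math


def project_to_range_image(points, width=2048, height=64,
                           fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) points onto a height x width range image.

    Each cell stores the range of the closest point that falls into it,
    or None when no point does. All defaults are illustrative assumptions.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[None] * width for _ in range(height)]
    for x, y, z in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue  # a point at the sensor origin has no direction
        yaw = math.atan2(y, x)        # horizontal angle in [-pi, pi]
        pitch = math.asin(z / rng)    # vertical angle
        u = 0.5 * (1.0 - yaw / math.pi) * width          # column coordinate
        v = (1.0 - (pitch - fov_down) / fov) * height    # row coordinate
        col = min(width - 1, max(0, int(u)))
        row = min(height - 1, max(0, int(v)))
        if image[row][col] is None or rng < image[row][col]:
            image[row][col] = rng     # keep the nearest return per pixel
    return image
```

Because the image is a dense 2D grid, standard 2D convolutions can process it quickly, which is why range-based methods tend to be fast.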
Fast Incremental Objects Identification and Localization using Cross-correlation on a 6 DoF Voting Scheme
In this work, we propose a sparse features-based object recognition and localization system, well suited for online learning of new objects. Our method takes advantage of both depth and ego-motion information, along with salient feature descriptor information, in order to learn and recognize objects with a scalable approach. We extend the conventional probabilistic voting scheme for the object recognition task, proposing a correlation-based approach in which each object-related point feature contributes in a 6-dimensional voting space (i.e., the 6 degrees-of-freedom, DoF, object pose) with a continuous probability density function (PDF) represented by a Mixture of Gaussians (MoG). A global PDF is then obtained by summing the contributions of all features. The object instance and pose are hence inferred by exploiting an efficient mode-finding method for mixtures of Gaussian distributions. The special properties of the convolution operator for the MoG distributions, combined with the sparsity of the exploited data, provide our method with good computational efficiency and limited memory requirements, enabling real-time performance even on robots with limited resources.
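The idea of voting with continuous densities and then searching for a mode can be sketched in one dimension. This is only an illustration under simplifying assumptions: the paper works in a 6-dimensional pose space and does not name its mode-finding method here, so the mean-shift style fixed-point update below, the equal weights, and the shared `sigma` are all choices made for this example.

```python
import math


def mixture_density(x, means, sigma):
    """Unnormalized density at x of an equal-weight 1D Gaussian mixture."""
    return sum(math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in means)


def find_mode(means, sigma, iterations=50):
    """Return the highest-density mode reached by fixed-point updates.

    Each vote (component mean) is used as a starting point. The update
    moves x to the average of the means, weighted by how strongly each
    component responds at x (a mean-shift step for equal variances).
    """
    best_x, best_density = None, -1.0
    for start in means:
        x = start
        for _ in range(iterations):
            weights = [math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in means]
            x = sum(w * m for w, m in zip(weights, means)) / sum(weights)
        density = mixture_density(x, means, sigma)
        if density > best_density:
            best_x, best_density = x, density
    return best_x
```

With votes at 0.9, 1.0, 1.1 and an outlier at 5.0, the three consistent votes reinforce each other and the returned mode lies near 1.0, while the isolated outlier forms only a weak secondary mode.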
Scalable Dense Large-Scale Mapping and Navigation
This paper describes a scalable dense 3D reconstruction and navigation system suitable for real-time operation. The system represents the environment as the back-projection of a Delaunay triangulation of the omnidirectional image, estimated at each instant from two adjacent views. The cost being minimized (i.e., the reprojection error) is photometric, rather than geometric as in the majority of feature-based reconstruction and navigation systems. While temporal integration would enable more accurate reconstruction, this would carry the computational burden of handling topological changes due to occlusion phenomena. We successfully tested our system in a challenging urban scenario along a large loop using an omnidirectional camera mounted on the roof of a car.
ConUDA: Confidence-Guided Pseudo-Label Sampling for Unsupervised Domain Adaptation in 3D LiDAR Semantic Segmentation
Dense annotation of real 3D LiDAR point clouds for mobile robot applications remains challenging. Unsupervised Domain Adaptation (UDA) enables the segmentation of unlabeled real-world point clouds by leveraging labeled synthetic data. However, existing self-training-based UDA methods rely on fixed thresholds for pseudo-label selection, limiting adaptation performance. In this work, we address this limitation. We propose a novel UDA framework for 3D LiDAR semantic segmentation, centered on a confidence-guided pseudo-label sampling strategy (ConSamp). Specifically, ConSamp adopts a probabilistic sampling strategy in which pseudo-labels with higher confidence are more likely to be retained. Meanwhile, the sampling function itself evolves adaptively throughout training to respond to changes in confidence distribution. Experiments show that our model achieves strong performance on synthetic-to-real 3D LiDAR semantic segmentation tasks. In particular, results better than state-of-the-art methods have been achieved on two public 3D point cloud datasets: SemanticKITTI [1] and SemanticPOSS [2]
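The contrast between a fixed confidence threshold and probabilistic sampling can be made concrete with a short sketch. This is not the ConSamp implementation: the retention probability `confidence ** sharpness`, the function name, and the `sharpness` parameter are assumptions for illustration, and the adaptive evolution of the sampling function described in the abstract is not modeled.

```python
import random


def sample_pseudo_labels(predictions, sharpness=2.0, rng=None):
    """Retain pseudo-labels at random, favoring confident predictions.

    predictions: list of (label, confidence) with confidence in [0, 1].
    A prediction is kept with probability confidence ** sharpness, an
    assumed sampling function; higher confidence means higher retention.
    Returns a list of (index, label) for the retained predictions.
    """
    rng = rng or random.Random()
    kept = []
    for index, (label, confidence) in enumerate(predictions):
        keep_probability = confidence ** sharpness
        if rng.random() < keep_probability:
            kept.append((index, label))
    return kept
```

Unlike a hard threshold, moderately confident predictions still have a chance of being used. In this sketch, varying `sharpness` over the course of training would be one hypothetical way to let the sampling function change as the confidence distribution shifts.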
