3D acquisition and analysis with applications in interaction and contactless measurement
In recent years, Computer Vision has become a constant presence in everyday life. Thanks to falling hardware prices and growing computing power, Computer Vision techniques have become an excellent tool for a wide range of scenarios. These range from acquiring the three-dimensional structure of objects and photogrammetry to the automatic analysis of video-surveillance footage and automatic authentication systems based on face recognition. The video-game industry's recent interest in new interaction paradigms based on gesture recognition has further increased the presence of video (and other) acquisition devices in our homes. This thesis is devoted to using this variety of low-cost devices to develop applications that are useful both in industry and in home entertainment.
The first contribution of this thesis is the development of several advanced calibration techniques for acquisition devices. It pays particular attention to low-cost devices which, because of manufacturing imperfections, are more likely to deviate from common parametric models. Beyond two-dimensional images, the recent spread of depth cameras such as the Microsoft Kinect sensor makes it possible to acquire information about the three-dimensional structure of the scene together with the image. These data are particularly useful for recognizing body movements and gestures performed by users; however, their analysis introduces a new problem: finding correspondences between non-rigid shapes. This problem is addressed in the second part of the thesis with two methods that seek both sparse and dense correspondences between two or more shapes in the presence of partiality. The last part proposes several applications that exploit the concepts and techniques introduced earlier to develop new interaction paradigms. This is done through the design of two tracking devices applied in different scenarios: an interactive table, an interactive Whiteboard and a Viewer Dependent display.
GraFix: A Graph Transformer with Fixed Attention Based on the WL Kernel
In this paper we introduce GraFix, a novel graph transformer with fixed structural attention. Our work is inspired by recent works that 1) harness the link between (graph) kernels and the attention mechanism of transformers and 2) favour simple fixed (non-learnable) attentive patterns over the standard attention mechanism. Building on these ideas, we propose to use graph kernels, specifically the WL kernel, to replace the learnable attention mechanism of a transformer with a fixed one that captures the structural similarity between substructures in the input graphs. The resulting graph transformer shows excellent performance on standard graph classification benchmarks. It performs on par with, and in some instances outperforms, a wide variety of alternative graph neural network and graph transformer-based approaches, while benefiting from fewer learnable parameters and a shorter training runtime.
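The idea of replacing learnable attention with a fixed, WL-derived one can be illustrated with a minimal NumPy sketch. The following choices are assumptions and are not taken from the paper. The similarity between two nodes is the number of WL iterations in which they share a label. The attention matrix is that similarity, row-normalised. The layer output is `A @ (X @ W)`, with only `W` learnable. The helper names `wl_node_labels`, `wl_attention` and `fixed_attention_layer` are hypothetical.

```python
import numpy as np


def wl_node_labels(adj, labels, iterations):
    """Return one label array per WL iteration (iteration 0 = initial labels)."""
    adj = np.asarray(adj)
    current = list(labels)
    history = [np.asarray(current)]
    for _ in range(iterations):
        # Signature of a node: its label plus the sorted multiset of neighbour labels.
        signatures = [
            (current[v], tuple(sorted(current[u] for u in np.flatnonzero(adj[v]))))
            for v in range(len(current))
        ]
        # Compress every distinct signature into a fresh integer label.
        table = {}
        current = [table.setdefault(sig, len(table)) for sig in signatures]
        history.append(np.asarray(current))
    return history


def wl_attention(adj, labels, iterations=2):
    """Fixed attention: row-normalised count of WL iterations in which two nodes share a label."""
    history = wl_node_labels(adj, labels, iterations)
    n = len(history[0])
    sim = np.zeros((n, n))
    for lab in history:
        sim += lab[:, None] == lab[None, :]
    # Each node always matches itself, so every row sum is positive.
    return sim / sim.sum(axis=1, keepdims=True)


def fixed_attention_layer(adj, labels, X, W, iterations=2):
    """Attention weights are computed once from structure; only W would be trained."""
    A = wl_attention(adj, labels, iterations)
    return A @ (X @ W)


# Usage: a 3-node path whose nodes all start with the same label.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(wl_attention(adj, [0, 0, 0], iterations=2))
```

On this path the two endpoints are indistinguishable under WL refinement. Each endpoint therefore attends to the other as strongly as to itself and gives less weight to the centre node. No attention parameters are learned.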
Learning disentangled representations via product manifold projection
We propose a novel approach to disentangle the generative factors of variation underlying a given set of observations. Our method builds upon the idea that the (unknown) low-dimensional manifold underlying the data space can be explicitly modeled as a product of submanifolds. This definition of disentanglement gives rise to a novel weakly-supervised algorithm for recovering the unknown explanatory factors behind the data. At training time, our algorithm only requires pairs of non-i.i.d. data samples whose elements share at least one, possibly multidimensional, generative factor of variation. We require no knowledge of the nature of these transformations and make no limiting assumptions about the properties of each subspace. Our approach is easy to implement and can be successfully applied to different kinds of data (from images to 3D surfaces) undergoing arbitrary transformations. In addition to standard synthetic benchmarks, we showcase our method in challenging real-world applications, where we compare favorably with the state of the art.
A 5 degrees of freedom multi-user pointing device for interactive whiteboards
Interactive whiteboards are nowadays common equipment in classrooms, as they offer significant advantages in terms of expressive power. Despite this radical paradigm shift, their interaction model remains firmly tied to the archetypal concept of strokes and gestures over a whiteboard. In this paper we introduce a novel pointing device that makes it possible to escape surface-based interaction by means of robust and occlusion-resilient multi-camera 3D tracking. More precisely, we designed a frequency-based active pen. Using a camera network, the pen can be localized in a 3D frame with the same 5 degrees of freedom exposed by a real whiteboard marker. Our approach allows many pointers to be used at the same time by reliably assigning a unique and permanent identity to each one. By leveraging these capabilities, interaction designers can conceive new and inventive interaction models. A few of them have been implemented within this study and are described in the experimental part of this work.
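One plausible way to turn a frequency-modulated pen into a persistent identity is to track a blob's brightness over time and pick the dominant frequency of its spectrum. The sketch below assumes that each pen blinks at a distinct, known frequency below the camera's Nyquist limit (`fs / 2`). The frequency table and the function names are hypothetical, and the paper's actual modulation and decoding scheme may differ.

```python
import numpy as np


def dominant_frequency(samples, fs):
    """Frequency (Hz) of the strongest non-DC component of a brightness signal sampled at fs Hz."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()                      # drop the constant brightness level
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[0] = 0.0                     # ignore any residual DC
    return freqs[np.argmax(spectrum)]


def identify_pen(samples, fs, pen_frequencies):
    """Return the key of the pen whose nominal blink frequency is closest to the measured one."""
    measured = dominant_frequency(samples, fs)
    return min(pen_frequencies, key=lambda pen: abs(pen_frequencies[pen] - measured))


# Usage: a pen blinking as a 5 Hz square wave, observed for 2 s by a 60 fps camera.
fs = 60.0
blink = ((np.arange(120) // 6) % 2).astype(float)   # 6 frames off, 6 frames on
pens = {"red": 5.0, "green": 12.0, "blue": 20.0}    # hypothetical frequency table
print(identify_pen(blink, fs, pens))                # prints "red"
```

With 2 s of samples at 60 fps the frequency resolution is 0.5 Hz. Pen frequencies should therefore be spaced well apart, and longer observation windows trade latency for finer discrimination.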
GNN-LoFI: a Novel Graph Neural Network through Localized Feature-based Histogram Intersection
Graph neural networks are increasingly becoming the framework of choice for graph-based machine learning. In this paper, we propose a new graph neural network architecture that substitutes classical message passing with an analysis of the local distribution of node features. To this end, we extract the distribution of features in the egonet for each local neighbourhood and compare them against a set of learned label distributions by taking the histogram intersection kernel. The similarity information is then propagated to other nodes in the network, effectively creating a message passing-like mechanism where the message is determined by the ensemble of the features. We perform an ablation study to evaluate the network's performance under different choices of its hyper-parameters. Finally, we test our model on standard graph classification and regression benchmarks, and we find that it outperforms widely used alternative approaches, including both graph kernels and graph neural networks
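The core comparison described above can be sketched in a few lines of NumPy under these assumptions:

- node features are non-negative (for example one-hot labels);
- the egonet is the 1-hop neighbourhood;
- a fixed set of prototype distributions stands in for the learned ones.

The propagation of the similarities to other nodes is omitted, and all function names are hypothetical.

```python
import numpy as np


def egonet_histograms(adj, features):
    """Normalised feature histogram of each node's 1-hop egonet (the node plus its neighbours)."""
    adj = np.asarray(adj, dtype=float)
    ego = adj + np.eye(len(adj))          # add the centre node (assumes no self-loops in adj)
    counts = ego @ np.asarray(features, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def histogram_intersection(hists, prototypes):
    """K(h, p) = sum_i min(h_i, p_i) for every (node, prototype) pair -> (n_nodes, n_prototypes)."""
    hists = np.asarray(hists, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    return np.minimum(hists[:, None, :], prototypes[None, :, :]).sum(axis=2)


# Usage: a 3-node path with one-hot labels [0, 1, 0] and two hypothetical prototypes.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1, 0], [0, 1], [1, 0]]
prototypes = [[1.0, 0.0], [0.5, 0.5]]
sims = histogram_intersection(egonet_histograms(adj, features), prototypes)
print(sims)
```

Each row of `sims` is a node's new feature vector: how closely its neighbourhood's label mix matches each prototype. In a trainable layer, the prototypes would be parameters constrained to remain valid distributions.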
Phase-based spatio-temporal interpolation for accurate 3D localization in camera networks
Many computer vision applications that exploit a network of independent cameras strongly depend on accurate synchronization between them. This is indeed the case for 3D tracking. Even if the intrinsic and extrinsic parameters of each camera are calibrated flawlessly, inaccurate synchronization would still impair triangulation, because it would combine incoherent projective images of the observed features. In many setups, synchronization can be guaranteed with specialized hardware supporting dedicated trigger control lines. This becomes more difficult, however, when dealing with a (possibly dynamic) network of distributed cameras communicating over wireless channels. In this paper we introduce an end-to-end solution to the problem. It includes a very simple hardware design for an easy-to-track device and a practical method that exploits the device's intrinsic properties to obtain precise synchronization among an arbitrary number of cameras. Furthermore, we propose a simple interpolation scheme that naturally handles shots captured at different times. Our approach is highly scalable, since it does not require any kind of direct communication or synchronization between cameras. Moreover, new cameras can be added at any time without any additional configuration. To test our method, we built a specially crafted setup and used it to perform an exhaustive set of experiments.
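Suppose every observation already carries a timestamp on a common clock, which the abstract obtains by exploiting the tracked device's intrinsic properties. Each camera's 2D track can then be resampled at a shared query time before triangulation. The sketch below uses plain linear interpolation and standard linear (DLT) triangulation as stand-ins; it is not the paper's phase-based interpolation scheme. The projection matrices are assumed known from calibration, and all names are hypothetical.

```python
import numpy as np


def interpolate_observation(times, points, t_query):
    """Linearly interpolate a camera's 2D track (ascending timestamps) at t_query."""
    times = np.asarray(times, dtype=float)
    points = np.asarray(points, dtype=float)
    return np.array([np.interp(t_query, times, points[:, k]) for k in range(2)])


def triangulate_dlt(projections, points2d):
    """Linear (DLT) triangulation of one 3D point from two or more 3x4 projection matrices."""
    rows = []
    for P, (u, v) in zip(projections, points2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]                            # null-space direction = homogeneous 3D point
    return X[:3] / X[3]


def project(P, X):
    """Pinhole projection of a 3D point to image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def trajectory(t):
    """Hypothetical target moving along x at constant depth."""
    return np.array([0.2 + 0.1 * t, 0.1, 4.0])


# Usage: two cameras sampling the moving point at interleaved (unsynchronised) instants.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
t1, t2 = [0.0, 1.0, 2.0], [0.5, 1.5, 2.5]
obs1 = [project(P1, trajectory(t)) for t in t1]
obs2 = [project(P2, trajectory(t)) for t in t2]
q = 1.0
X = triangulate_dlt([P1, P2], [interpolate_observation(t1, obs1, q),
                               interpolate_observation(t2, obs2, q)])
print(X)   # should be close to trajectory(1.0) = [0.3, 0.1, 4.0]
```

Linear interpolation in the image plane is exact only when the target moves uniformly at constant depth, as in this scenario. For general motion it introduces an error that grows with the time offset between cameras.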
Adaptive Albedo Compensation for Accurate Phase-Shift Coding
Among structured light strategies, those based on phase shift are considered the most adaptive with respect to the features of the objects to be captured. Inter alia, the theoretical invariance to signal strength and the absence of intensity discontinuities make phase shift an ideal candidate for dealing with complex surfaces of unknown geometry, color and texture. In practical scenarios, however, unexpected artifacts can still arise from the characteristics of real cameras. This is the case, for instance, with high-contrast areas resulting from abrupt changes in the albedo of the captured objects. The non-negligible size of pixels and the presence of blur can cause a single pixel to integrate signal from adjacent areas with different albedo. This, in turn, biases the phase recovery and, consequently, yields an inaccurate 3D reconstruction of the surface. While this problem affects most structured light methods based on phase shift or derived techniques, little effort has been put into addressing it. In this paper we propose a model for the phase corruption and a theoretically sound correction step that compensates for the bias. The practical effectiveness of our approach is demonstrated by a comprehensive set of experimental evaluations.
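The standard N-step estimator and the albedo-mixing effect described above can be reproduced numerically. The model uses shifts δ_k = 2πk/N and intensities I_k = A + B·cos(φ + δ_k). Under this model the phase is φ = atan2(-Σ I_k sin δ_k, Σ I_k cos δ_k), independently of A and B. The mixed-pixel model below is an illustrative assumption: a weighted sum of two regions with different albedo and phase. It shows the bias only, not the paper's correction step.

```python
import numpy as np


def phase_shift_patterns(phi, albedo, n_steps, ambient=0.1):
    """Simulated intensities I_k = albedo * (ambient + 0.5 + 0.5 * cos(phi + delta_k))."""
    deltas = 2 * np.pi * np.arange(n_steps) / n_steps
    return albedo * (ambient + 0.5 + 0.5 * np.cos(phi + deltas))


def recover_phase(intensities):
    """Standard N-step estimator: atan2(-sum I_k sin d_k, sum I_k cos d_k)."""
    I = np.asarray(intensities, dtype=float)
    deltas = 2 * np.pi * np.arange(len(I)) / len(I)
    return np.arctan2(-(I * np.sin(deltas)).sum(), (I * np.cos(deltas)).sum())


def mixed_pixel(phi1, a1, phi2, a2, w, n_steps=4):
    """Hypothetical pixel integrating a fraction w of region 1 and 1 - w of region 2."""
    return (w * phase_shift_patterns(phi1, a1, n_steps)
            + (1 - w) * phase_shift_patterns(phi2, a2, n_steps))


# Usage: a clean pixel, then a pixel straddling an albedo edge (half on each side).
print(recover_phase(phase_shift_patterns(1.0, 0.3, 4)))        # ~1.0 for any albedo
print(recover_phase(mixed_pixel(0.8, 0.5, 1.2, 0.5, w=0.5)))   # equal albedo: midpoint ~1.0
print(recover_phase(mixed_pixel(0.8, 0.9, 1.2, 0.1, w=0.5)))   # bright side dominates: ~0.84
```

The recovered phase of a mixed pixel is the argument of the albedo-weighted sum of the two phasors. A dark region therefore contributes little, and the estimate is pulled toward the brighter side of the edge rather than sitting at the geometric midpoint.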
An Accurate and Robust Artificial Marker based on Cyclic Codes
Artificial markers are successfully adopted to solve several vision tasks, ranging from tracking to calibration. While most designs share the same working principles, many specialized approaches exist to address specific application domains. Some are specially crafted to boost pose recovery accuracy. Others are made robust to occlusion or easy to detect with minimal computational resources. The sheer number of approaches available in recent literature is a testament to the fact that no silver bullet exists. It is also a hint of the level of scholarly interest this research topic still attracts. In this paper we try to add a novel option to the offer by introducing a general-purpose fiducial marker which exhibits many useful properties while being easy to implement and fast to detect. Our approach rests on three key ideas. The first is to exploit the projective invariance of conics to jointly find the marker and set a reading frame for it. The second is that the tag identity is assessed by a redundant cyclic coded sequence implemented using the same circular features used for detection. Finally, the specific design and feature organization of the marker are well suited to several practical tasks, ranging from camera calibration to information payload delivery.
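A small example shows why a cyclic code suits a circular marker. Every cyclic shift of a codeword is again a codeword, so the bits can be read starting anywhere around the ring. A canonical rotation then yields a rotation-independent identity. The (7,4) cyclic Hamming code with generator g(x) = 1 + x + x^3 is a textbook choice used here only for illustration. It is not the code used in the paper, and all function names are hypothetical.

```python
def bits_to_int(bits):
    """Encode a bit list as an int, with bit i = coefficient of x^i."""
    return sum(b << i for i, b in enumerate(bits))


def gf2_mul(a, b):
    """Polynomial product over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(value, generator):
    """Remainder of polynomial division over GF(2)."""
    g_deg = generator.bit_length() - 1
    while value.bit_length() - 1 >= g_deg:
        value ^= generator << (value.bit_length() - 1 - g_deg)
    return value


G = 0b1011   # g(x) = 1 + x + x^3, generator of the (7,4) cyclic Hamming code
N = 7


def encode(message_bits):
    """Non-systematic cyclic encoding c(x) = m(x) g(x) for a 4-bit message."""
    c = gf2_mul(bits_to_int(message_bits), G)
    return [(c >> i) & 1 for i in range(N)]


def is_codeword(bits):
    """A word belongs to the cyclic code iff g(x) divides its polynomial."""
    return gf2_mod(bits_to_int(bits), G) == 0


def canonical_id(bits):
    """Rotation-invariant identity: lexicographically smallest cyclic shift."""
    bits = list(bits)
    return min(tuple(bits[i:] + bits[:i]) for i in range(len(bits)))


# Usage: the same marker read from three different starting positions on the ring.
codeword = encode([1, 1, 0, 0])
readings = [codeword[i:] + codeword[:i] for i in (0, 2, 5)]
print(all(is_codeword(r) for r in readings), {canonical_id(r) for r in readings})
```

Distinct marker identities must then be assigned to distinct rotation classes. A practical design would also use the code's redundancy to correct read errors (this code corrects any single bit flip), not only to detect them.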
Evaluating Stereo Vision and User Tracking in Mixed Reality Tasks
Advances in head tracking and stereoscopic visualization technologies have fostered the implementation of subjective display systems able to render a 3D scene perspective-corrected according to the position of the user. This enables a whole class of mixed reality applications and interaction paradigms, where the user is able to move freely around the scene and to perform tasks involving the interplay between physical and virtual objects. The accuracy and ergonomics of such tasks strongly depend on the ability of the subjective display system to offer not only a convincing 3D visual experience, but also, and mostly, an accurate rendering of the virtual scene in terms of spatial and metric relations between virtual and physical scene components. In this paper we study the role and impact of head tracking and stereo visualization in mixed reality contexts using a set of measuring tasks involving physical rulers and virtual objects, performed under different rendering conditions. Specifically, we analyze to what extent the two features contribute to give the user the correct alignment between the virtual and the real components of a 3D scene. Finally, we draw some conclusions about their impact within different scenarios.
A low cost tracking system for position-dependent 3D visual interaction
In many visual interaction applications, the user needs to explore a scene by moving with respect to the virtual environment. Using a fixed camera viewpoint leads to visual inconsistencies. These can be avoided only if the exact pose of the user's head is known and used to produce a perspective-correct rendering. Tracking devices are often used for this purpose; however, many of them are relatively expensive or require the user to wear special apparel. In this paper we present a tracking system that can be implemented with a simple, very low-cost modification of standard shutter glasses. The accuracy of this approach has been evaluated quantitatively with a specially crafted experimental setup. © 2014 Authors
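Once the head (eye) position is known, a perspective-correct view can be rendered with an off-axis (asymmetric) frustum whose edges pass through the physical screen borders. The sketch makes the following assumptions:

- the screen is centred at the origin in the z = 0 plane;
- the eye is at positive z, expressed in the same metric frame;
- clip-space conventions follow OpenGL.

It covers only the rendering side, not the glasses-based tracker, and the function names are hypothetical.

```python
import numpy as np


def off_axis_projection(eye, screen_w, screen_h, near=0.01, far=100.0):
    """Projection @ view matrix for an eye at (ex, ey, ez) looking at a screen in the z = 0 plane."""
    ex, ey, ez = eye
    scale = near / ez                     # similar triangles: screen edges -> near plane
    left = (-screen_w / 2 - ex) * scale
    right = (screen_w / 2 - ex) * scale
    bottom = (-screen_h / 2 - ey) * scale
    top = (screen_h / 2 - ey) * scale
    proj = np.array([
        [2 * near / (right - left), 0, (right + left) / (right - left), 0],
        [0, 2 * near / (top - bottom), (top + bottom) / (top - bottom), 0],
        [0, 0, -(far + near) / (far - near), -2 * far * near / (far - near)],
        [0, 0, -1, 0],
    ])
    view = np.eye(4)
    view[:3, 3] = -np.asarray(eye, dtype=float)   # move the eye to the origin
    return proj @ view


def to_ndc(matrix, point):
    """Normalised device coordinates of a 3D world point."""
    clip = matrix @ np.append(point, 1.0)
    return clip[:3] / clip[3]


# Usage: a 0.5 m x 0.3 m screen viewed from a hypothetical tracked eye position.
M = off_axis_projection((0.1, -0.05, 0.6), screen_w=0.5, screen_h=0.3)
print(to_ndc(M, [0.25, 0.15, 0.0])[:2])   # top-right screen corner maps to about [1, 1]
```

Points on the physical screen map to the same normalised device coordinates wherever the eye is, while virtual points in front of or behind the screen shift with the viewer. This keeps the virtual scene consistent with the display as the user moves.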
- …
