Identifying anomalies in driver attention and in people's behavior.
As the world becomes increasingly connected and digitized by the day, with sensors and computing devices growing ever more pervasive, new opportunities emerge for artificial intelligence.
In particular, public monitoring emerges as a critical theme, and computer vision has the potential to become the leading technology for building a safer world. In this thesis, we present solutions for public safety in two different application areas.
First, we address safety at the wheel by developing a system that predicts where a driver is likely to focus attention. Such a prediction has great potential to improve driving safety. Nevertheless, it is very complex, since driving a car is a complicated task and attention is highly subjective. To this end, we collect and release DR(eye)VE, a dataset consisting of driver-centric and car-centric clips, annotated with the driver's fixation points on the outer urban scene. Next, we inspect these data in depth to establish which factors most influence a driver's gaze, in terms of both motion and semantics. Guided by this evidence, we finally develop a deep neural network that, given a car-centric urban scene, identifies which regions are likely to capture the driver's attention.
Second, we address surveillance-based safety by introducing an anomaly detection model that learns the traits characterizing normal (safe) situations and therefore raises an alert whenever unexpected events appear. Learning such models without examples of abnormal conditions is the aim of anomaly detection (a.k.a. novelty detection) research. Despite its importance and a plethora of prior work, the unpredictable nature of novel events and their inaccessibility during training severely degrade the effectiveness of existing systems. In this context, we propose a general model consisting of a deep autoencoder equipped with a parametric density estimator, which learns the distribution of the autoencoder's latent representations through an autoregressive procedure. We show that a maximum likelihood objective in latent space effectively regularizes the optimization of the autoencoder's reconstruction error and minimizes the differential entropy of the distribution spanned by latent vectors. Intuitively, such a joint optimization forces the model to describe (and reconstruct) each example in terms of features that frequently appear in the training set (and are therefore more representative of normality).
Extensive experiments and comparisons with the state of the art show the effectiveness of both proposals.
Self-Supervised Optical Flow Estimation by Projective Bootstrap
Dense optical flow estimation is complex and time-consuming, with state-of-the-art methods relying either on large synthetic data sets or on pipelines requiring up to a few minutes per frame pair. In this paper, we address the problem of optical flow estimation in the automotive scenario in a self-supervised manner. We argue that optical flow can be cast as a geometrical warping between two successive video frames and devise a deep architecture to estimate this transformation in two stages. First, a dense pixel-level flow is computed with a projective bootstrap on rigid surfaces. We show how this global transformation can be approximated with a homography and extend spatial transformer layers so that they can compute the flow field implied by the transformation. Subsequently, we refine the prediction by feeding it to a second, deeper network that accounts for moving objects. A final reconstruction loss compares the warping of frame X_t with the subsequent frame X_{t+1} and guides both estimates. The model has the speed advantages of end-to-end deep architectures while achieving competitive performance, both outperforming recent unsupervised methods and generalizing well to new automotive data sets.
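As a rough illustration of the first stage, the flow field implied by a homography can be obtained by mapping each pixel through the 3x3 matrix and subtracting the pixel's original position. The sketch below is plain Python and is not the paper's spatial transformer layer; the function name `homography_flow`, the row-major matrix layout, and the example matrices are assumptions made for this example.

```python
def homography_flow(H, width, height):
    """Return flow[y][x] = (dx, dy) implied by the 3x3 homography H.

    H is a row-major list of three 3-element rows (an assumed layout).
    Each pixel (x, y) is mapped to (x', y') via homogeneous coordinates,
    and the flow is the displacement (x' - x, y' - y).
    """
    flow = []
    for y in range(height):
        row = []
        for x in range(width):
            xh = H[0][0] * x + H[0][1] * y + H[0][2]
            yh = H[1][0] * x + H[1][1] * y + H[1][2]
            w = H[2][0] * x + H[2][1] * y + H[2][2]
            if w == 0:
                raise ValueError("pixel maps to a point at infinity")
            row.append((xh / w - x, yh / w - y))
        flow.append(row)
    return flow


# Hypothetical example matrices: no motion, and a pure translation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [0.0, 0.0, 1.0]]

zero_flow = homography_flow(identity, 3, 2)
shift_flow = homography_flow(shift, 3, 2)
```

With the identity matrix every displacement is zero, and with the pure translation every pixel receives the same displacement; a general homography yields a flow that varies smoothly across the image.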
Latent Space Autoregression for Novelty Detection
Novelty detection is commonly defined as the discrimination of observations that do not conform to a learned model of regularity. Despite its importance in different application settings, designing a novelty detector is highly complex due to the unpredictable nature of novelties and their inaccessibility during the training procedure, factors which expose the unsupervised nature of the problem. In our proposal, we design a general framework where we equip a deep autoencoder with a parametric density estimator that learns the probability distribution underlying its latent representations through an autoregressive procedure.
We show that a maximum likelihood objective, optimized in conjunction with the reconstruction of normal samples, effectively acts as a regularizer for the task at hand by minimizing the differential entropy of the distribution spanned by latent vectors. In addition to providing a very general formulation, our model delivers on-par or superior performance compared to state-of-the-art methods in extensive experiments on publicly available datasets, in both one-class and video anomaly detection settings. Unlike prior works, our proposal makes no assumption about the nature of the novelties, making it readily applicable to diverse contexts.
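The joint objective described in this abstract can be written schematically as follows. This is a sketch with notation assumed for illustration, not taken from the paper: $f$ and $g$ denote the encoder and decoder, $z = f(x)$ is the latent vector of dimension $d$, $q$ is the autoregressive density estimator, and $\lambda$ is a weighting hyperparameter. The squared-error reconstruction term is also an assumption.

```latex
\mathcal{L}(x) = \underbrace{\lVert x - g(f(x)) \rVert_2^2}_{\text{reconstruction}}
  \;-\; \lambda \, \underbrace{\log q(z)}_{\text{log-likelihood}},
\qquad
q(z) = \prod_{i=1}^{d} q(z_i \mid z_1, \dots, z_{i-1}).
```

Averaged over the training set, the negative log-likelihood term is a cross-entropy between the true latent distribution and the estimator, which upper-bounds the differential entropy of the latent vectors. This is one way to read the claim that maximizing the likelihood drives that entropy down.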
Exploring Architectural Details Through a Wearable Egocentric Vision Device
Augmented user experiences in the cultural heritage domain are in increasing demand among the new digital-native tourists of the 21st century. In this paper, we propose a novel solution that assists the visitor during an outdoor tour of a cultural site using the unique first-person perspective of wearable cameras. In particular, the approach exploits computer vision techniques to retrieve architectural details by means of a robust descriptor based on the covariance of local features. Using a lightweight wearable board, the solution can localize the user with respect to the 3D point cloud of the historical landmark and provide information about the details the user is currently looking at. Experimental results validate the method in terms of both accuracy and computational effort. Furthermore, user evaluation based on real-world experiments shows that the proposal is deemed effective in enriching a cultural experience.
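To make the idea of a covariance-based descriptor more concrete, the sketch below computes the sample covariance matrix of a set of local feature vectors extracted from an image region. It shows only the generic construction: the abstract does not say which local features the paper uses or how descriptors are compared, so the function name `covariance_descriptor` and the two-dimensional toy features are assumptions.

```python
def covariance_descriptor(features):
    """Sample covariance matrix (d x d) of n local feature vectors of length d."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two feature vectors")
    d = len(features[0])
    # Per-dimension mean over all feature vectors in the region.
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        centered = [f[k] - mean[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j] / (n - 1)
    return cov


# Hypothetical toy features: three 2-D vectors from one image region.
patch_features = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
descriptor = covariance_descriptor(patch_features)
```

The resulting matrix has a fixed size that depends only on the feature dimension, not on how many local features fall inside the region, which is one reason such descriptors are compact.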
Learning to Map Vehicles into Bird's Eye View
Awareness of the road scene is an essential component of both autonomous vehicles and Advanced Driver Assistance Systems, and is gaining importance for both academia and car companies.
This paper presents a way to learn a semantic-aware transformation that maps detections from a dashboard camera view onto a broader bird's eye occupancy map of the scene. To this end, a large synthetic dataset featuring 1M pairs of frames, taken from both the car dashboard and a bird's eye view, has been collected and automatically annotated. A deep network is then trained to warp detections from the first view to the second. We demonstrate the effectiveness of our model against several baselines and observe that it generalizes to real-world data despite having been trained solely on synthetic data.
Predicting the Driver's Focus of Attention: the DR(eye)VE Project
Andrea Palazzi, Davide Abati, Simone Calderara, Francesco Solera, Rita Cucchiara
(Submitted on 10 May 2017 (v1), last revised 6 Jun 2018 (this version, v3))
In this work we aim to predict the driver's focus of attention. The goal is to estimate what a person would pay attention to while driving, and which parts of the scene around the vehicle are most critical for the task. To this end we propose a new computer vision model based on a multi-branch deep architecture that integrates three sources of information: raw video, motion and scene semantics. We also introduce DR(eye)VE, the largest dataset of driving scenes for which eye-tracking annotations are available. This dataset features more than 500,000 registered frames, matching ego-centric views (from glasses worn by drivers) and car-centric views (from a roof-mounted camera), further enriched by measurements from other sensors. Results highlight that several attention patterns are shared across drivers and can be reproduced to some extent. Indicating which elements in the scene are likely to capture the driver's attention may benefit several applications in the context of human-vehicle interaction and driver attention analysis.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through the non-traditional counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than the one produced through traditional first-author counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
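The two counting schemes compared in this abstract can be sketched in a few lines. The code below is a minimal illustration, not the study's procedure: the function name `cocitation_counts`, the author names, and the input layout are hypothetical. It also assumes that each unordered author pair is counted once per citing paper, and that coauthors of the same cited work count as a pair when more than one author per work is considered.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations.

    citing_papers: list of reference lists; each reference is the ordered
    list of author names of one cited work.
    max_authors: how many leading authors of each cited work to consider
    (1 = traditional first-author counting, 5 = first five authors).
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing papers, each citing two works.
papers = [
    [["Smith", "Lee"], ["Garcia", "Chen"]],
    [["Smith"], ["Chen", "Garcia"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

In this toy data, first-author counting never pairs Chen with Garcia, because only one of them leads each cited work. Counting the first five authors links them in both citing papers, which shows how the choice of counting scheme changes the co-citation data that later analyses rely on.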
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
