Smart cities: connecting the visual viewpoints of driver, vehicle and infrastructure.
The number of interconnected devices in the world is growing rapidly. According to a recent Gartner study, there will be more than 20 billion of them by the end of 2020.
As more and more people move to urban centers, the mobility sector is evolving rapidly and is itself becoming a driving force in this direction. Vehicles themselves are turning into sophisticated computing hubs equipped with an enormous number of sensors that enable ever greater perception capabilities.
A large share of these sensors consists of cameras. Inside the vehicle, cameras make it possible to monitor the driver and passengers; others on the outside are used for scene understanding. At the same time, a large number of cameras are installed at the infrastructure level for many applications, including video surveillance, traffic flow monitoring and automatic license plate reading.
Against this backdrop, this thesis investigates how multiple visual viewpoints on the same urban scene can be related to one another.
First, the driver's point of view is studied. To this end, the DR(eye)VE dataset is collected and made publicly available; it contains the driver's fixation points for more than 500,000 driving frames, integrated over time into saliency maps specific to the act of driving.
An in-depth analysis of the driver's attentional behavior on real data is carried out on this dataset. Building on the results of this analysis, a deep-learning-based computational model of human attention during driving is constructed.
The thesis also investigates whether it is possible to learn to map a first-person visual viewpoint to other views of the scene, such as an aerial view. Since it would be impossible to collect real data for this task, a synthetic dataset of more than 1M pairs of frames is collected and released, depicting the view from the vehicle and the aerial view respectively. With these data, a convolutional neural network is trained to infer the spatial occupancy of the aerial view from the first-person view. Taking a different route to the same goal, a two-branch convolutional encoder based on differentiable rendering is introduced that simultaneously estimates the vehicle's category and its pose in the scene. Once the vehicle's class and pose are known, new viewpoints can be generated that respect the arrangement and mutual pose of the objects in the scene.
Finally, the need to choose a particular viewpoint in advance (e.g. aerial view) is overcome, and a framework is presented for generating new views of a vehicle from arbitrary viewpoints. Unlike parametric methods (based exclusively on learning from data), it is shown how a priori knowledge about the object's geometry and the 3D world can be successfully integrated into the deep-learning-based image generation pipeline. Since these geometric constraints are not learned, this approach is called semi-parametric.
The integration of parametric and non-parametric components makes it possible to i) operate on real data, ii) preserve high-frequency visual information (e.g. textures) during generation and iii) apply arbitrary 3D roto-translations to the input. It is also shown that this approach can be easily extended to other rigid objects, even ones with completely different topology and in the presence of concave structures or holes.
Extensive experimental analyses and comparisons with the state of the art confirm the effectiveness of the proposed methods from both a quantitative and a perceptual point of view.
The number of interconnected devices is growing rapidly around us. According to a recent Gartner report, 20.4 billion connected “things” are expected to be in use by the end of 2020. Cities are no exception. As most of the world population is congregating in urban areas, the sector of smart mobility is growing rapidly and has become a strong driving force in this direction. Vehicles in the first place are mutating into sophisticated data crunchers, featuring a wide range of sensors that enable increasing perception capabilities.
Cameras constitute a large share of these devices. In vehicles, inward-facing cameras make it possible to monitor the state of the driver and passengers, while multiple cameras pointing outwards are devoted to understanding the surrounding scene. At the same time, a massive number of infrastructure cameras are being installed around cities, with applications to surveillance, traffic flow monitoring and license plate recognition, among others.
In this context, this thesis investigates how multiple visual viewpoints on the same urban scene can be put in relation to each other and how novel viewpoints can be generated.
We start from the study of the driver's point of view. To this end, we collect and make publicly available a novel dataset called DR(eye)VE, composed of more than 500,000 frames of driving sequences containing drivers' gaze fixations and their temporal integration providing task-specific saliency maps. On this dataset we perform an in-depth analysis of drivers' attentional patterns on real-world data. Finally, we build upon these findings to design the first deep learning based computational model of human attention during the driving task.
We then investigate whether it is possible to learn a mapping between the aforementioned first person viewpoint and other views of the scene, e.g. a bird's eye view. As collecting real-world data for this purpose would be infeasible, we record and release a photorealistic synthetic dataset featuring 1M pairs of frames, taken from both car dashboard and bird's eye view. On these data we show that a deep convolutional network can indeed be trained to infer the bird's eye spatial occupancy of the scene starting from raw detections on the first person view. Exploring a different path towards the same goal, we introduce a two-branched convolutional encoder network based on differentiable rendering that jointly estimates the vehicle category and its 6-DoF pose in the scene. Once the category and the 6-DoF pose of each vehicle are known, this information suffices to render novel viewpoints in which object arrangement and mutual poses are preserved.
Finally, we overcome the need to decide on a particular viewpoint in advance (e.g. bird's eye), presenting a framework for generating novel views of a vehicle from truly arbitrary 3D viewpoints, given a single monocular image. Unlike parametric (i.e. entirely learning-based) methods, we show how a-priori geometric knowledge about the object and the 3D world can be successfully integrated into a deep learning based image generation framework. As this geometric component is not learnt, we call our approach semi-parametric. This careful blend between parametric and non-parametric components allows us to i) operate in a real-world scenario, ii) preserve high-frequency visual information such as textures and iii) handle truly arbitrary 3D roto-translations of the input. We also show that our approach can be easily extended to other rigid objects with completely different topology, even in the presence of concave structures and holes.
Comprehensive experimental analyses against state-of-the-art competitors show the efficacy of our proposals from both a quantitative and a perceptual point of view.
Warp and Learn: Novel Views Generation for Vehicles and Other Objects
In this work we introduce a new self-supervised, semi-parametric approach for synthesizing novel views of a vehicle starting from a single monocular image. Unlike parametric (i.e. entirely learning-based) methods, we show how a-priori geometric knowledge about the object and the 3D world can be successfully integrated into a deep learning based image generation framework. As this geometric component is not learnt, we call our approach semi-parametric. In particular, we exploit man-made object symmetry and piece-wise planarity to integrate rich a-priori visual information into the novel viewpoint synthesis process. An Image Completion Network (ICN) is then trained to generate a realistic image starting from this geometric guidance. This blend between parametric and non-parametric components allows us to i) operate in a real-world scenario, ii) preserve high-frequency visual information such as textures, iii) handle truly arbitrary 3D roto-translations of the input and iv) perform shape transfer to completely different 3D models. Finally, we show that our approach can be easily complemented with synthetic data and extended to other rigid objects with completely different topology, even in the presence of concave structures and holes. A comprehensive experimental analysis against state-of-the-art competitors shows the efficacy of our method from both a quantitative and a perceptual point of view.
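The "3D roto-translation" mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline (which also involves planar warping and an image completion network); it only shows, under assumed conventions, how a rigid rotation plus translation moves a 3D point and how a pinhole camera then projects it to a pixel. The function names, the camera axes (x right, y down, z forward) and the intrinsics are all hypothetical, and only yaw is shown, whereas a full 6-DoF pose would use a general rotation.

```python
import math


def yaw_rotation(theta):
    """3x3 rotation matrix about the vertical (y) axis, angle in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def roto_translate(point, rotation, translation):
    """Apply p' = R p + t to a single 3D point."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(point, focal, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal * x / z + cx, focal * y / z + cy)


# Hypothetical example: rotate a point by 90 degrees of yaw, push it
# 5 units in front of the camera, then project it (assumed intrinsics).
corner = (1.0, 0.0, 0.0)
moved = roto_translate(corner, yaw_rotation(math.pi / 2), (0.0, 0.0, 5.0))
pixel = project(moved, focal=100.0, cx=50.0, cy=50.0)
```

Applying the same transform to every vertex of a known 3D model is one simple way to place an object at a new viewpoint before any learned component fills in appearance.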
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
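The two counting rules compared in this abstract can be sketched as follows. This is a minimal illustration with hypothetical author names and toy data, not the study's procedure: each citing paper is a list of references, each reference an ordered author list, and `max_authors` (an assumed parameter name) switches between first-author counting and counting the first five authors. As a simplifying assumption, co-authors of the same cited work are also treated as co-cited.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often each pair of authors is cited by the same paper.

    max_authors=1 reproduces first-author counting; max_authors=5
    considers the first five authors of every cited work.
    """
    counts = Counter()
    for references in citing_papers:
        # Authors credited by this citing paper under the chosen rule.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered pair of distinct cited authors is co-cited once.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers with two references each.
papers = [
    [["Adams", "Baker"], ["Chen"]],
    [["Adams", "Diaz"], ["Chen", "Baker"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

On this toy input, first-author counting sees only the Adams/Chen pair, while the five-author rule also surfaces Baker and Diaz, which is the kind of difference in input data that the subsequent statistical analyses and mappings would inherit.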
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who propels discourses rather than expressing opinions; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
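To make the comparison between the Pearson correlation and the cosine concrete, here is a small sketch on hypothetical cocitation profiles. It illustrates one frequently discussed property, namely that appending authors with whom neither profile is cocited (extra zeros) changes the Pearson correlation but leaves the cosine untouched. Whether this is the argument the paper itself makes is an assumption, since the abstract does not say, and the other two measures it mentions are not shown because the abstract does not name them.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, i.e. the cosine of the mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation counts of two authors with three other authors.
a = [10, 5, 1]
b = [2, 6, 9]

# The same two profiles after adding three authors cocited with neither.
a_ext = a + [0, 0, 0]
b_ext = b + [0, 0, 0]

cos_before, cos_after = cosine(a, b), cosine(a_ext, b_ext)
r_before, r_after = pearson(a, b), pearson(a_ext, b_ext)
```

In this toy case the Pearson correlation is negative before the zeros are added and positive afterwards, while the cosine is the same in both cases.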
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how much author rankings produced by different citation counting methods actually differ from each other, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.
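How much a ranking can shift with the counting method is easy to see on a toy example. The sketch below is an illustration under stated assumptions, not the study's method: the abstract does not name the alternative methods, so all-author and fractional counting are assumed here as two common examples, and the author names and citation numbers are hypothetical.

```python
from collections import defaultdict


def citation_counts(cited_works, method="first"):
    """Credit citations to authors under one of three counting rules.

    cited_works: list of (ordered author list, number of citations).
    method: "first" credits only the first author, "all" credits every
    author in full, "fractional" splits each citation equally.
    """
    totals = defaultdict(float)
    for authors, n_citations in cited_works:
        if method == "first":
            totals[authors[0]] += n_citations
        elif method == "all":
            for name in authors:
                totals[name] += n_citations
        elif method == "fractional":
            for name in authors:
                totals[name] += n_citations / len(authors)
        else:
            raise ValueError(f"unknown method: {method}")
    return dict(totals)


def ranking(totals):
    """Author names sorted by descending count, ties broken by name."""
    return sorted(totals, key=lambda name: (-totals[name], name))


# Hypothetical data: three cited works and their citation counts.
works = [
    (["Adams", "Baker"], 10),
    (["Baker"], 4),
    (["Chen", "Baker", "Adams"], 6),
]
rank_first = ranking(citation_counts(works, "first"))
rank_all = ranking(citation_counts(works, "all"))
rank_fractional = ranking(citation_counts(works, "fractional"))
```

Here Baker ranks last under first-author counting but first under both alternatives, because most of Baker's citations come from works on which Baker is not the first author.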
DR(eye)VE: a Dataset for Attention-Based Tasks with Applications to Autonomous and Assisted Driving
Autonomous and assisted driving are undoubtedly hot topics in computer vision. However, the driving task is extremely complex and a deep understanding of drivers' behavior is still lacking. Several researchers are now investigating the attention mechanism in order to define computational models for detecting salient and interesting objects in the scene. Nevertheless, most of these models only refer to bottom-up visual saliency and are focused on still images. Instead, during the driving experience the temporal nature and peculiarity of the task influence the attention mechanisms, leading to the conclusion that real-life driving data is mandatory. In this paper we propose a novel and publicly available dataset acquired during actual driving. Our dataset, composed of more than 500,000 frames, contains drivers' gaze fixations and their temporal integration providing task-specific saliency maps. Geo-referenced locations, driving speed and course complete the set of released data. To the best of our knowledge, this is the first publicly available dataset of this kind and can foster new discussions on better understanding, exploiting and reproducing the driver's attention process in the autonomous and assisted cars of future generations.
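One plausible reading of "temporal integration" of gaze fixations into a saliency map is sketched below. The abstract does not specify the procedure, so everything here is an assumption made for illustration: fixations from a short window of frames are accumulated on a grid, each spread by an isotropic Gaussian, and the result is normalized to sum to one. The function name, the `sigma` parameter and the tiny grid size are hypothetical.

```python
import math


def saliency_map(fixations_per_frame, height, width, sigma=1.5):
    """Integrate fixations from a window of frames into one normalized map.

    fixations_per_frame: one list of (x, y) fixation points per frame.
    Returns a height x width grid of non-negative values summing to 1
    (or all zeros if the window contains no fixation).
    """
    grid = [[0.0] * width for _ in range(height)]
    for fixations in fixations_per_frame:
        for fx, fy in fixations:
            # Spread each fixation with a Gaussian centered on it.
            for y in range(height):
                for x in range(width):
                    dist_sq = (x - fx) ** 2 + (y - fy) ** 2
                    grid[y][x] += math.exp(-dist_sq / (2 * sigma ** 2))
    total = sum(map(sum, grid))
    if total == 0:
        return grid
    return [[value / total for value in row] for row in grid]


# Hypothetical window of three frames on a 6x4 grid; the last frame
# has no recorded fixation.
window = [[(2, 1)], [(2, 1), (3, 1)], []]
smap = saliency_map(window, height=4, width=6)
peak = max(
    ((y, x) for y in range(4) for x in range(6)),
    key=lambda pos: smap[pos[0]][pos[1]],
)
```

Integrating over several frames rather than one makes the map less sensitive to a single noisy gaze sample, which is one reason a windowed formulation is a natural fit for video.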
Learning to Map Vehicles into Bird's Eye View
Awareness of the road scene is an essential component for both autonomous vehicles and Advanced Driver Assistance Systems and is gaining importance for both academia and car companies. This paper presents a way to learn a semantic-aware transformation which maps detections from a dashboard camera view onto a broader bird's eye occupancy map of the scene. To this end, a huge synthetic dataset featuring 1M pairs of frames, taken from both car dashboard and bird's eye view, has been collected and automatically annotated. A deep network is then trained to warp detections from the first to the second view. We demonstrate the effectiveness of our model against several baselines and observe that it is able to generalize to real-world data despite having been trained solely on synthetic data.
