Action recognition in videos through computational, multimedia, and machine learning technologies
Video clips are the most pervasive means of disseminating information nowadays. With their spread, the need for automatic categorization and content understanding has also grown, for both entertainment and professional purposes. In the context of multimedia and deep learning technologies for video comprehension, we explore and devise video-based algorithms and state-of-the-art solutions to tackle action recognition and fine-grained action localization. Our research is not limited to the quantitative evaluation of the proposed approaches for improving performance on specific tasks, since handling video content usually brings some drawbacks. Videos often involve human actors and can raise privacy issues that have not yet been sufficiently investigated by the computer vision community. Moreover, given their complexity and variability, videos are not easy to process and often require large computational resources. In addition to the application scenario, this thesis therefore tackles two main challenges related to automatic video processing, namely privacy and computational cost. In the application part, we investigate the simultaneous detection of multiple actors and the classification of their actions by exploiting interactions between people and surrounding objects, both in space and time. We also explore a more production-oriented application, in collaboration with Metaliquid SRL and in line with the company’s needs, by devising a deep network for salient action spotting in broadcast soccer matches.
Regarding privacy, we propose a novel strategy for masking people’s identities in video clips while preserving the ability of action recognition models to predict the correct class labels. Finally, from the computational perspective, we develop an algorithm for reducing the size and resource utilization of existing deep neural networks while maintaining their performance. These three aspects of video modeling are investigated separately but prove to be generalizable, making it easier to build efficient and privacy-preserving action recognition models. All the alternatives and solutions presented in this work build upon deep learning, which requires a huge amount of data for learning video representations.
What was Monet seeing while painting? Translating artworks to photo-realistic images
State-of-the-art computer vision techniques exploit the availability of large-scale datasets, most of which consist of images captured from the world as it is. This leads to an incompatibility between such methods and digital data from the artistic domain, on which current techniques underperform. A possible solution is to reduce the domain shift at the pixel level, thus translating artistic images into realistic copies. In this paper, we present a model capable of translating paintings into photo-realistic images, trained without paired examples. The idea is to enforce a patch-level similarity between real and generated images, aiming to reproduce photo-realistic details from a memory bank of real images. This constraint is then adopted within an unpaired image-to-image translation framework, which maps each image from one distribution to a new image belonging to the other distribution. Qualitative and quantitative results are presented on the Monet, Cezanne, and Van Gogh painting translation tasks, showing that our approach increases the realism of generated images with respect to CycleGAN.
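The patch-level similarity idea can be illustrated with a minimal, hypothetical Python sketch: each generated patch is matched to its most similar patch in a memory bank of real patches, and the average dissimilarity acts as a realism penalty. Cosine similarity, hard nearest-neighbour matching, the plain-list patch representation, and all function names are assumptions for illustration; this is not the paper's exact formulation, which operates on deep features inside a full translation network.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_real_patch(patch: Sequence[float], memory_bank: Sequence[Sequence[float]]) -> int:
    """Index of the memory-bank patch most similar to `patch` (first one wins on ties)."""
    if not memory_bank:
        raise ValueError("memory bank must contain at least one real patch")
    return max(range(len(memory_bank)), key=lambda i: cosine_similarity(patch, memory_bank[i]))


def patch_similarity_loss(generated: Sequence[Sequence[float]],
                          memory_bank: Sequence[Sequence[float]]) -> float:
    """Mean of (1 - cosine similarity) between each generated patch and its best real match.

    Lower values mean every generated patch has a close counterpart among the real patches.
    """
    if not generated:
        return 0.0
    total = 0.0
    for patch in generated:
        best = memory_bank[nearest_real_patch(patch, memory_bank)]
        total += 1.0 - cosine_similarity(patch, best)
    return total / len(generated)
```

For example, with a memory bank `[[1.0, 0.0], [0.0, 1.0]]`, the generated patches `[[2.0, 0.0], [0.0, 3.0]]` give a loss of 0.0 because each is parallel to some real patch. Hard nearest-neighbour matching keeps the sketch short; a differentiable training objective would typically use a soft assignment instead.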
Matching Faces and Attributes Between the Artistic and the Real Domain: the PersonArt Approach
In this article, we present an approach for retrieving similar faces between the artistic and the real domain. The application we refer to is an interactive exhibition inside a museum, in which visitors can take a photo of themselves and search for a lookalike in the collection of paintings. The task requires not only identifying faces but also extracting discriminative features from artistic and photo-realistic images, tackling a significant domain shift. Our method integrates feature extraction networks that account for the aesthetic similarity of two faces and for their correspondence in terms of semantic attributes. It also addresses the domain shift between realistic images and paintings by translating photo-realistic images into the artistic domain. Notably, by exploiting the same technique, our model does not need to rely on annotated data in the artistic domain. Experiments are conducted on different paired datasets to show the effectiveness of the proposed solution in terms of identity and attribute preservation. The approach is also evaluated in unpaired settings and in combination with an interactive relevance feedback strategy. Finally, we show how the proposed algorithm has been implemented in a real showcase at the Gallerie Estensi museum in Italy, with the participation of more than 1,100 visitors in just three days.
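As a rough illustration of how an aesthetic similarity and an attribute correspondence might be combined for retrieval, the hypothetical Python sketch below scores each painted face as a weighted sum of embedding cosine similarity and the fraction of agreeing binary attributes. The `Face` record, the weight `alpha`, and the attribute names are assumptions. For the relevance feedback step, the sketch swaps in the classic Rocchio query update as a stand-in; the abstract does not say which feedback strategy the authors use.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Face:
    name: str
    embedding: List[float]       # hypothetical aesthetic feature vector
    attributes: Dict[str, bool]  # hypothetical binary semantic attributes


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


def attribute_agreement(query: Dict[str, bool], candidate: Dict[str, bool]) -> float:
    """Fraction of shared attribute keys on which query and candidate agree."""
    keys = query.keys() & candidate.keys()
    if not keys:
        return 0.0
    return sum(query[k] == candidate[k] for k in keys) / len(keys)


def rank(query: Face, gallery: Sequence[Face], alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Sort gallery faces by alpha * embedding similarity + (1 - alpha) * attribute agreement."""
    scored = [
        (
            cand.name,
            alpha * cosine(query.embedding, cand.embedding)
            + (1.0 - alpha) * attribute_agreement(query.attributes, cand.attributes),
        )
        for cand in gallery
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def rocchio(query: Sequence[float], relevant: Sequence[Sequence[float]],
            non_relevant: Sequence[Sequence[float]],
            a: float = 1.0, b: float = 0.75, c: float = 0.15) -> List[float]:
    """Rocchio update: q' = a*q + b*mean(relevant) - c*mean(non_relevant)."""
    def mean(vectors: Sequence[Sequence[float]]) -> List[float]:
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    rel, non = mean(relevant), mean(non_relevant)
    return [a * q + b * r - c * n for q, r, n in zip(query, rel, non)]
```

In an interactive loop, the embeddings of paintings the visitor marks as good or bad matches would be passed to `rocchio`, and `rank` rerun with the updated query embedding.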
Image-to-Image Translation to Unfold the Reality of Artworks: an Empirical Analysis
State-of-the-art computer vision pipelines perform poorly on artworks and data from the artistic domain, which limits the applicability of current architectures to the automatic understanding of cultural heritage. This is mainly due to differences in texture and low-level feature distribution between artistic images and the real images on which state-of-the-art approaches are usually trained. To enhance the applicability of pre-trained architectures on artistic data, we have recently proposed an unpaired domain translation approach that translates artworks into photo-realistic visualizations. Our approach leverages semantically-aware memory banks of real patches, which drive the generation of the translated image while improving its realism. In this paper, we provide additional analyses and experimental results that demonstrate the effectiveness of our approach. In particular, we use automatic distance metrics to evaluate the quality of generated results for the translation of landscapes, portraits, and paintings from four different styles. We also analyze the response of pre-trained architectures for classification, detection, and segmentation, in terms of both feature distribution and prediction entropy, and show that our approach effectively reduces the domain shift of paintings. As an additional contribution, we provide a qualitative analysis of the reduction of the domain shift for detection, segmentation, and image captioning.
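The entropy of a classifier's prediction can be computed as sketched below, a minimal Python example under assumptions: raw logits are turned into probabilities with a softmax, entropy is measured in nats (natural logarithm), and `mean_entropy` averages over a batch. The helper names are hypothetical and the paper's exact protocol may differ. Under this reading, a lower mean entropy on translated paintings than on the original paintings would indicate more confident predictions, which is one way to quantify a reduced domain shift.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats of a categorical distribution; 0 * log 0 is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(batch_logits: Sequence[Sequence[float]]) -> float:
    """Average prediction entropy over a batch of logit vectors."""
    return sum(prediction_entropy(softmax(logits)) for logits in batch_logits) / len(batch_logits)
```

A uniform prediction over four classes reaches the maximum entropy log 4 (about 1.386 nats), while a confident prediction such as `softmax([10, 0, 0, 0])` yields a value close to zero.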
Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation
The applicability of computer vision to real paintings and artworks has rarely been investigated, even though a vast heritage would greatly benefit from techniques that can understand and process data from the artistic domain. This is partially due to the small amount of annotated artistic data, which is not even comparable to the amount available for natural images captured by cameras. In this paper, we propose a semantic-aware architecture that can translate artworks into photo-realistic visualizations, thus reducing the gap between the visual features of artistic and realistic data. Our architecture generates natural images by retrieving and learning details from real photos through a similarity matching strategy that leverages a weakly-supervised semantic understanding of the scene. Experimental results show that the proposed technique leads to increased realism and to a reduction in domain shift, which improves the performance of pre-trained architectures for classification, detection, and segmentation. Code is publicly available at: https://github.com/aimagelab/art2real
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results obtained with two types of co-citation counting: the traditional type, which counts only the first author of each cited work, and a non-traditional type, which takes into account the first five authors of each cited work. Results indicate that the picture produced by this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, when the same number of top-ranked authors is selected and analyzed, this picture represents fewer specialties in the research field under study than the one produced by traditional first-author co-citation counting. Reasons for these effects are discussed.
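The difference between the two counting schemes can be sketched in a few lines of Python. In this hypothetical example, each citing paper is a list of cited works, each cited work is an ordered author list, and the parameter `max_authors` selects how many leading authors per cited work are credited (1 for first-author counting, 5 for the first-five-authors variant). It is an assumption of the sketch that each author pair is counted at most once per citing paper, and that coauthors of the same cited work also form a co-cited pair under the multi-author scheme; the study's exact counting rules may differ.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Sequence, Set

Reference = Sequence[str]  # ordered author list of one cited work


def credited_authors(references: Iterable[Reference], max_authors: int) -> Set[str]:
    """Authors credited by one citing paper: the first `max_authors` of each cited work."""
    credited: Set[str] = set()
    for authors in references:
        credited.update(authors[:max_authors])
    return credited


def cocitation_counts(citing_papers: Iterable[Iterable[Reference]],
                      max_authors: int = 1) -> Counter:
    """For each unordered author pair, count how many citing papers co-cite both authors.

    max_authors=1 gives traditional first-author counting;
    max_authors=5 gives the first-five-authors variant.
    """
    counts: Counter = Counter()
    for references in citing_papers:
        authors = sorted(credited_authors(references, max_authors))
        for pair in combinations(authors, 2):  # sorted input yields canonical pairs
            counts[pair] += 1
    return counts
```

For two citing papers that reference `[["Smith", "Lee"], ["Jones"]]` and `[["Smith"], ["Jones", "Lee"]]`, first-author counting produces only the pair `("Jones", "Smith")` with count 2, whereas the five-author variant also brings "Lee" into the picture, producing three pairs, each with count 2.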
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often appears as an interviewer who, rather than expressing opinions, prompts the discourse of others, an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
