
    Learning with uncertainty via Hyperbolic Neural Networks

    This thesis explores the application of hyperbolic geometry and hyperbolic neural networks across various domains, with a focus on leveraging uncertainty estimation to improve the learning process and performance in complex tasks. We begin with a brief introduction to hyperbolic neural networks, providing the theoretical foundation and key concepts that underpin our subsequent research. This work then spans three main areas: self-supervised representation learning for skeleton-based action recognition, active domain adaptation for semantic segmentation, and multimodal large language models. First, this thesis investigates self-supervised learning in the context of skeleton-based action recognition, where effective representation learning remains challenging due to the hierarchical nature of human motion data. We introduce hyperbolic neural networks to address this challenge through uncertainty-aware learning, developing a novel Hyperbolic Self-Paced learning model (HYSP). This approach leverages the hyperbolic radius as an uncertainty metric to adaptively pace the learning process, scaling the gradient determined by each sample by the norm of the hyperbolic embedding. When evaluated on standard action recognition benchmarks, HYSP demonstrates superior performance while eliminating the need for computationally expensive negative mining procedures. Next, we explore active learning for semantic segmentation under domain shift, where efficient label acquisition is crucial for adapting to new environments while keeping labeling costs down. For this challenge, we develop a hyperbolic approach named HALO (Hyperbolic Active Learning Optimization), which interprets the hyperbolic radius as an indicator of data scarcity. By combining the hyperbolic radius with prediction entropy, we obtain an estimator of epistemic uncertainty, which we use for selective annotation of pixels in the image. HALO achieves state-of-the-art results on domain adaptation benchmarks while requiring only a small fraction of target labels, surpassing even fully supervised domain adaptation methods. Finally, this thesis examines large-scale vision-language modeling, where uncertainty estimation becomes particularly challenging due to the scale and multimodal nature of the data. By developing a novel training strategy for a hyperbolic version of BLIP-2, we demonstrate that hyperbolic learning can be successfully scaled to billion-parameter architectures without compromising stability or performance. Our approach achieves results comparable to its Euclidean counterpart while providing meaningful uncertainty estimates thanks to hyperbolic embeddings, offering a new perspective on uncertainty quantification in large multimodal models. Throughout these studies, we demonstrate that learning in hyperbolic space offers unique advantages in estimating uncertainty and improving model performance and efficiency across diverse machine learning tasks. This work contributes to the broader understanding of hyperbolic neural networks and their potential to advance the field of deep learning.
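
The idea of reading the hyperbolic radius as an uncertainty signal can be illustrated with a minimal sketch. It assumes the Poincaré ball model with curvature -c, which the abstract does not specify. The helper names (`expmap0`, `hyperbolic_radius`, `uncertainty`) and the proxy `1 - sqrt(c) * ||p||` are illustrative assumptions, not the thesis's implementation.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def expmap0(v, c=1.0):
    """Exponential map at the origin: sends a Euclidean vector into the
    Poincare ball of curvature -c (assumed model, not stated in the abstract)."""
    n = norm(v)
    if n == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return [scale * x for x in v]

def hyperbolic_radius(p, c=1.0):
    """Hyperbolic distance of a ball point p from the origin."""
    return 2.0 / math.sqrt(c) * math.atanh(math.sqrt(c) * norm(p))

def uncertainty(p, c=1.0):
    """Hypothetical proxy: 1 at the origin, approaching 0 near the boundary."""
    return 1.0 - math.sqrt(c) * norm(p)

# Embeddings close to the origin are treated as uncertain,
# embeddings close to the boundary as confident.
near = expmap0([0.03, 0.04])
far = expmap0([0.9, 1.2])
print(uncertainty(near), uncertainty(far))
```

Under this reading, a per-sample weight derived from the embedding norm could scale each sample's contribution to the loss, which is one plausible way to realise the pacing the abstract describes.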

    Face-from-Depth for Head Pose Estimation on Depth Images

    Depth cameras make it possible to build reliable solutions for people monitoring and behavior understanding, especially when unstable or poor illumination makes common RGB sensors unusable. We therefore propose a complete framework for estimating head and shoulder pose from depth images only. A head detection and localization module is also included, yielding a complete end-to-end system. The core element of the framework is a Convolutional Neural Network, called POSEidon+, that takes three types of images as input and outputs the 3D angles of the pose. Moreover, a Face-from-Depth component based on a Deterministic Conditional GAN model hallucinates a face from the corresponding depth image. We empirically demonstrate that this improves system performance. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Experimental results show that our method outperforms several recent state-of-the-art works based on both intensity and depth input data, running in real time at more than 30 frames per second.

    Detecting Anomalies in People’s Trajectories using Spectral Graph Analysis

    Video surveillance is becoming the technology of choice for monitoring crowded areas for security threats. While video provides ample information for human inspectors, there is a great need for robust automated techniques that can efficiently detect anomalous behavior in streaming video from single or multiple cameras. In this work we combine two state-of-the-art methodologies. The first is the ability to track and label single-person trajectories in a crowded area using multiple video cameras, and the second is a new class of novelty detection algorithms based on spectral analysis of graphs. When the trajectories are represented as sequences of transitions between nodes in a graph, shared individual trajectories capture only a small subspace of the possible trajectories on the graph. This subspace is characterized by large connected components of the graph, which are spanned by the eigenvectors associated with the low eigenvalues of the graph Laplacian matrix. Using this technique, we develop robust invariant distance measures for detecting anomalous trajectories, and demonstrate their application on real video data.
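
The link between connected components and low Laplacian eigenvalues can be checked on a toy graph. The sketch below is an illustration under assumptions, not the paper's algorithm: trajectories are lists of hypothetical node indices, edge weights count observed transitions, and the Laplacian is the unnormalized L = D - A. It verifies that the indicator vector of a connected component is an eigenvector with eigenvalue 0, while a vector that varies inside a component has positive energy.

```python
def laplacian_from_trajectories(trajectories, n_nodes):
    """Unnormalized Laplacian L = D - A, where A[i][j] counts
    observed transitions between nodes i and j (undirected)."""
    A = [[0.0] * n_nodes for _ in range(n_nodes)]
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            if a != b:
                A[a][b] += 1.0
                A[b][a] += 1.0
    L = [[-A[i][j] for j in range(n_nodes)] for i in range(n_nodes)]
    for i in range(n_nodes):
        L[i][i] = sum(A[i])
    return L

def matvec(M, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def energy(L, x):
    """Quadratic form x^T L x = sum over edges of w_ij * (x_i - x_j)^2."""
    return sum(xi * yi for xi, yi in zip(x, matvec(L, x)))

# Toy data: nodes 0-2 form one component, nodes 3-4 another.
trajectories = [[0, 1, 2, 1, 0], [0, 1, 2], [3, 4, 3]]
L = laplacian_from_trajectories(trajectories, 5)

component = [1.0, 1.0, 1.0, 0.0, 0.0]   # indicator of {0, 1, 2}
print(matvec(L, component))              # all zeros: eigenvalue 0
print(energy(L, [1.0, 0.0, 0.0, 0.0, 0.0]))  # positive: varies inside a component
```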

    A videosurveillance data browsing software architecture for forensics: from trajectories similarities to video fragments

    The information contained in digital video surveillance repositories can provide relevant hints, if not legal evidence, during investigations. As the amount of video data often rules out manual search, tools have been developed in recent years to aid investigators in the lookup process. We propose an application for forensic video analysis that analyses the activities in a given scenario, focusing in particular on the trajectories followed by people and on their visual appearance. Investigators can browse the recorded videos through a user-friendly interface that allows easy information retrieval through the choice of the best mining strategy. The underlying application architecture implements different feature and query models as well as query optimization strategies in order to return the best response in terms of both efficacy and efficiency.

    Predicting the Driver's Focus of Attention: the DR(eye)VE Project

    Andrea Palazzi, Davide Abati, Simone Calderara, Francesco Solera, Rita Cucchiara (Submitted on 10 May 2017 (v1), last revised 6 Jun 2018 (this version, v3)). In this work we aim to predict the driver's focus of attention. The goal is to estimate what a person would pay attention to while driving, and which part of the scene around the vehicle is more critical for the task. To this end we propose a new computer vision model based on a multi-branch deep architecture that integrates three sources of information: raw video, motion and scene semantics. We also introduce DR(eye)VE, the largest dataset of driving scenes for which eye-tracking annotations are available. This dataset features more than 500,000 registered frames, matching ego-centric views (from glasses worn by drivers) and car-centric views (from a roof-mounted camera), further enriched by measurements from other sensors. Results highlight that several attention patterns are shared across drivers and can be reproduced to some extent. The indication of which elements in the scene are likely to capture the driver's attention may benefit several applications in the context of human-vehicle interaction and driver attention analysis.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
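
The difference between the two counting schemes can be sketched on toy data. This is a simplified illustration, not the study's procedure: the function name `cocitation_counts`, the author labels, and the rule that any two distinct authors appearing in the same reference list count once per citing paper (including co-authors of one cited work) are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists, max_authors=1):
    """Count, for each unordered author pair, the number of citing papers
    whose reference list contains both authors. Only the first
    `max_authors` authors of each cited work are considered."""
    counts = Counter()
    for refs in reference_lists:
        authors = set()
        for work_authors in refs:
            authors.update(work_authors[:max_authors])
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts

# Each citing paper is a list of cited works; each work is an ordered author list.
reference_lists = [
    [["A", "B"], ["C"]],
    [["A", "B"], ["D", "C"]],
]

first_author = cocitation_counts(reference_lists, max_authors=1)
first_five = cocitation_counts(reference_lists, max_authors=5)
print(first_author)
print(first_five)
```

In this toy case, author C is the second author of one cited work, so first-author counting sees the A-C pair only once while the five-author variant sees it in both citing papers.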

    Anomaly Detection for Vision-based Railway Inspection

    The automatic inspection of railways for the detection of obstacles is fundamental to guaranteeing the safety of train transport. In this paper, we therefore propose a vision-based framework that is able to detect obstacles during the night, when train circulation is usually suspended, using RGB or thermal images. Acquisition cameras and external light sources are placed in the frontal part of a rail drone and a new dataset is collected. Experiments show the accuracy of the proposed approach and its suitability, in terms of computational load, to be implemented on a self-powered drone.

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
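
One property that separates the two named measures can be shown in a few lines. The sketch below is an illustration only: the cocitation profiles are made-up numbers, and the sensitivity to shared zeros is a commonly raised concern about the Pearson correlation that may or may not be the argument the paper itself makes. Appending authors with whom neither profile is cocited changes the Pearson correlation but leaves the cosine unchanged.

```python
import math

def cosine(x, y):
    """Cosine similarity between two equal-length profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered profiles."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

# Hypothetical cocitation profiles of two authors over four other authors.
x = [4, 0, 2, 1]
y = [0, 3, 2, 1]

# The same profiles after adding six authors cocited with neither.
x_pad = x + [0] * 6
y_pad = y + [0] * 6

print(cosine(x, y), cosine(x_pad, y_pad))    # identical values
print(pearson(x, y), pearson(x_pad, y_pad))  # negative, then positive
```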