
    A generative superpixel method

    Superpixel methods have become popular in recent years because they provide an efficient preprocessing tool for a wide range of computer vision applications. In this work, we propose a method based on a self-adapting and self-growing network, which is grown starting from two random initialization seeds in the image. This network, a modification of the Instantaneous Topological Map (ITM), is inspired by the Growing Neural Gas (GNG) and, like many other self-adapting tools, employs a Hebbian learning framework. A key point in competitive learning is the definition of a suitable distance function, which we analyse in depth in this work. Distance is indeed the notion that links unsupervised competitive learning with segmentation, where cluster formation reduces to node creation and adaptation during the exploration of a suitable multidimensional input space.
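
    The abstract does not state which distance function is used, so the following is only a minimal sketch of how a distance drives competitive learning. It assumes a colour-plus-position feature vector (r, g, b, x, y) and a weighted Euclidean distance, as in many superpixel methods. The names `weighted_distance`, `adapt_winner`, `spatial_weight` and `learning_rate` are hypothetical, and node creation and Hebbian edge updates are left out.

```python
import math

def weighted_distance(sample, node, spatial_weight=0.5):
    """Assumed distance: colour term plus weighted spatial term.

    sample and node are (r, g, b, x, y) tuples. This form is an
    illustration, not the distance analysed in the paper.
    """
    colour = math.dist(sample[:3], node[:3])
    spatial = math.dist(sample[3:], node[3:])
    return colour + spatial_weight * spatial

def adapt_winner(sample, nodes, learning_rate=0.1, spatial_weight=0.5):
    """Find the nearest node and move it a fraction of the way to the sample.

    nodes is modified in place; the index of the winning node is returned.
    """
    winner = min(
        range(len(nodes)),
        key=lambda i: weighted_distance(sample, nodes[i], spatial_weight),
    )
    nodes[winner] = tuple(
        n + learning_rate * (s - n) for s, n in zip(sample, nodes[winner])
    )
    return winner
```

    Changing `spatial_weight` trades colour homogeneity against spatial compactness of the resulting clusters, which is why the choice of distance matters for segmentation.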

    Learning unbiased classifiers from biased data with meta-learning

    It is well known that large deep architectures are powerful models when adequately trained, but they may exhibit undesirable behavior leading to confident incorrect predictions, even when evaluated on slightly different test examples. Test data characterized by distribution shifts (with respect to the training data distribution), outliers, and adversarial samples are among the types of data affected by this problem. The situation worsens whenever data are biased, meaning that predictions are mostly based on spurious correlations present in the data. Unfortunately, since such correlations occur in most of the data, the model is prevented from correctly generalizing over the considered classes. In this work, we tackle this problem from a meta-learning perspective. Considering the dataset as composed of unknown biased and unbiased samples, we first identify these two subsets, even if coarsely, with a pseudo-labeling algorithm. Subsequently, we apply a bi-level optimization algorithm in which, in the inner loop, we look for the best parameters guiding the training on the two subsets, while in the outer loop we train the final model, benefiting from augmented data generated using Mixup. Properly tuning the contributions of biased and unbiased data, together with the regularization introduced by the mixed data, has proved to be an effective training strategy for learning unbiased models with superior generalization capabilities. Experimental results on synthetically and realistically biased datasets surpass the performance of existing state-of-the-art methods.
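
    For readers unfamiliar with Mixup, the sketch below shows the standard formulation: two samples and their one-hot labels are blended with a weight drawn from a Beta(alpha, alpha) distribution. How the paper pairs biased and unbiased samples is not stated in the abstract, so the function name `mixup` and the default `alpha` are assumptions made for illustration only.

```python
import random

def mixup(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two (features, one-hot label) pairs with a Beta(alpha, alpha) weight.

    Returns the mixed features, the mixed label and the sampled weight.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam
```

    Because the label is mixed with the same weight as the features, the mixed label remains a valid probability vector.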

    Hand Detection in First Person Vision

    The emergence of new pervasive wearable technologies (e.g. action cameras and smart glasses) calls attention to so-called First Person Vision (FPV). In the future, more and more everyday-life videos will be shot from a first-person point of view, overturning the classical fixed-camera understanding of vision, specializing the existing knowledge of moving cameras, and bringing new challenges to the field of video processing. Research is expected to move towards a new type of computer vision, centred on moving sensors and driven by the need for new applications for wearable devices. We identify hand tracking and gesture recognition as an essential topic in this field, motivated by the simple realization that we often look at our hands, even while performing the simplest tasks in everyday life. In addition, the next frontier in user interfaces is hands-free devices. In this work we argue that applications based on FPV may involve information fusion at various levels of complexity and abstraction, ranging from pure image processing to inference over patterns. We address the lowest level by proposing a first investigation of hand detection from a first-person point-of-view sensor, together with some preliminary results obtained by fusing colour and optic flow information.

    A modular approach to fire and smoke detection, based on colour features and dynamics analysis

    This work addresses the issue of fire and smoke detection in a scene within a video surveillance framework. Fire and smoke pixels are first detected by means of a motion detection algorithm. In addition, smoke and fire pixels are separated using colour information (within appropriate colour spaces, specifically chosen to enhance particular chromatic features). In parallel, a pixel selection based on the dynamics of the area is carried out in order to reduce false detections. The outputs of the three parallel algorithms are finally fused by means of an MLP. Index Terms: Fire Detection, Smoke Detection, Colour Space, Entropy Estimation, Multi-Layer Perceptron

    Optimizing Superpixel clustering for real-time egocentric-vision applications

    In this work, we propose a strategy for optimizing a superpixel algorithm for video signals, in order to get closer to real-time performance, which is needed for egocentric vision applications and must also be sustainable on wearable devices. Instead of applying the algorithm independently to each frame, we propose a technique, inspired by Bayesian filtering and video coding, that re-initializes superpixels using the information from the previous frame. This results in faster convergence, and we show how performance improves with respect to the standard application of the algorithm from scratch at each frame.
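
    The benefit of re-initializing from the previous frame can be illustrated with a toy stand-in. The sketch below is not the superpixel algorithm from the paper: it uses plain one-dimensional k-means (Lloyd iterations) on scalar values, and the names `kmeans_1d`, `cold_init` and the two "frames" are invented for illustration. It only shows that starting from the previous frame's centres can take fewer iterations than starting from a fixed initialization.

```python
def kmeans_1d(values, centres, max_iter=50, tol=1e-6):
    """Lloyd iterations on scalar values; returns (centres, iterations used)."""
    centres = list(centres)
    for iteration in range(1, max_iter + 1):
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous centre.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
        shift = max(abs(u - c) for u, c in zip(updated, centres))
        centres = updated
        if shift < tol:
            return centres, iteration
    return centres, max_iter

# Two consecutive toy "frames": the second is the first shifted by 0.5.
frame_prev = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0, 14.0]
frame_next = [v + 0.5 for v in frame_prev]
cold_init = [0.0, 2.0]  # fixed initialization, ignores earlier frames

prev_centres, _ = kmeans_1d(frame_prev, cold_init)
cold_centres, cold_iters = kmeans_1d(frame_next, cold_init)
warm_centres, warm_iters = kmeans_1d(frame_next, prev_centres)
print(f"cold start: {cold_iters} iterations, warm start: {warm_iters} iterations")
```

    On this toy data both runs reach the same centres, but the warm start needs fewer iterations because consecutive frames differ only slightly.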

    A Cognitive Control-Inspired Approach to Object Tracking

    Under a tracking framework, the definition of the target state is the basic step for automatic understanding of dynamic scenes. More specifically, far object tracking raises challenges related to the potentially abrupt size changes of the targets as they approach the sensor. If not handled, size changes can cause serious problems in data association and position estimation. This is why adaptability and self-awareness are desirable features of a tracking module. The paradigm of cognitive dynamic systems (CDSs) can provide a framework under which a continuously learning cognitive module can be designed. In particular, CDS theory describes a basic vocabulary of components that can be used as the building blocks of a module capable of learning behavioral rules from continuous active interactions with the environment. This quality is fundamental for dealing with dynamic situations. In this paper we propose a general CDS-based approach to tracking. We show that such a CDS-inspired design can lead to the self-adaptability of a Bayesian tracker in fusing heterogeneous object features, overcoming size change issues. The experimental results on infrared sequences show that the proposed framework is able to outperform other existing far object tracking methods.

    Unsupervised Synthetic Acoustic Image Generation for Audio-Visual Scene Understanding

    Acoustic images are an emergent data modality for multimodal scene understanding. Such images have the peculiarity of distinguishing the spectral signature of the sound coming from different directions in space, thus providing richer information than that derived from single or binaural microphones. However, acoustic images are typically generated by cumbersome and costly microphone arrays, which are not as widespread as ordinary microphones. This paper shows that it is still possible to generate acoustic images from off-the-shelf cameras equipped with only a single microphone, and how they can be exploited for audio-visual scene understanding. We propose three architectures inspired by Variational Autoencoder, U-Net and adversarial models, and we assess their advantages and drawbacks. Such models are trained to generate spatialized audio by conditioning them on the associated video sequence and its corresponding monaural audio track. Our models are trained using the data collected by a microphone array as ground truth. Thus they learn to mimic the output of an array of microphones in the very same conditions. We assess the quality of the generated acoustic images considering standard generation metrics and different downstream tasks (classification, cross-modal retrieval and sound localization). We also evaluate our proposed models on multimodal datasets containing acoustic images, as well as on datasets containing just monaural audio signals and RGB video frames. In all of the addressed downstream tasks we obtain notable performance using the generated acoustic data, when compared to the state of the art and to the results obtained using real acoustic images as input.

    HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior

    We present HAHA, a novel approach for animatable human avatar generation from monocular input videos. The proposed method relies on learning the trade-off between the use of Gaussian splatting and a textured mesh for efficient and high-fidelity rendering. We demonstrate its efficiency in animating and rendering full-body human avatars controlled via the SMPL-X parametric model. Our model learns to apply Gaussian splatting only in areas of the SMPL-X mesh where it is necessary, such as hair and out-of-mesh clothing. As a result, a minimal number of Gaussians is used to represent the full avatar, and rendering artifacts are reduced. This allows us to handle the animation of small body parts, such as fingers, that are traditionally disregarded. We demonstrate the effectiveness of our approach on two open datasets: SnapshotPeople and X-Humans. Our method achieves reconstruction quality on par with the state of the art on SnapshotPeople, while using fewer than a third of the Gaussians. HAHA outperforms the previous state of the art on novel poses from X-Humans both quantitatively and qualitatively.

    Run Length Encoded Dynamic Bayesian Networks for Probabilistic Interaction Modeling

    Approaches to human behavior analysis for Cognitive Surveillance Systems (CSS) mainly share the idea that it is time to extend functionality beyond simple video analytics. In the most recent systems addressed by research, automatic support to human decisions, based on object detection, tracking and situation assessment tools, is integrated as part of a complete cognitive artificial process. In such cases a CSS needs to represent complex situations that describe alternative possible real-time interactions between the dynamic observed situation and operators’ actions. To obtain such knowledge, particular types of Event-based Dynamic Bayesian Networks (E-DBNs) are proposed here. This paper shows how, by means of Run Length Encoding (RLE) of information acquired offline, the cognitive system is able to represent and anticipate possible operators’ actions within the CSS. Results are shown for a crowd monitoring application in a critical infrastructure. A system is presented in which a CSS that embeds RLE E-DBN knowledge in a structured way can interact with an active visual simulator of crowd situations. Outputs from such a simulator can be easily compared with video signals coming from real cameras and processed by typical Bayesian tracking methods.
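
    As a reminder of what Run Length Encoding does to an event stream, the sketch below collapses consecutive repeats of the same label into (label, count) pairs. The event labels and the function names `run_length_encode` and `run_length_decode` are hypothetical, and the abstract does not describe how the encoded runs are embedded in the E-DBNs, so that step is not shown.

```python
from itertools import groupby

def run_length_encode(events):
    """Collapse consecutive equal labels into (label, run length) pairs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(events)]

def run_length_decode(runs):
    """Expand (label, run length) pairs back into the original sequence."""
    return [label for label, count in runs for _ in range(count)]
```

    Encoding keeps the order of events and the duration of each one, so the original sequence can be recovered exactly.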