Learning visual features under motion invariance
Humans are continuously exposed to a stream of visual data with a natural temporal structure. However, most successful computer vision algorithms work at the image level, completely discarding the precious information carried by motion. In this paper, we claim that processing visual streams naturally leads to the formulation of the motion invariance principle, which enables the construction of a new theory of learning that originates from variational principles, just like in physics. Such a principled approach is well suited for discussing a number of interesting questions that arise in vision, and it offers a well-posed computational scheme for the discovery of convolutional filters over the retina. Unlike traditional convolutional networks, which need massive supervision, the proposed theory offers a truly new scenario for the unsupervised processing of video signals, where features are extracted in a multi-layer architecture with motion invariance. While the theory enables the implementation of novel computer vision systems, it also sheds light on the role of information-based principles in driving possible biological solutions.
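As a rough illustration of the motion invariance idea, the sketch below assumes that a feature φ is motion invariant when it stays constant along the trajectory of a moving point, i.e. when ∂φ/∂t + v·∇φ = 0 for the velocity v of that point. It reduces the setting to one spatial dimension and estimates the residual with central finite differences. The function name, the 1D simplification and the step size `h` are our assumptions; the paper's variational treatment is not reproduced here.

```python
import math
from typing import Callable


def motion_invariance_residual(
    phi: Callable[[float, float], float], x: float, t: float, v: float, h: float = 1e-4
) -> float:
    """Estimate dphi/dt + v * dphi/dx at (x, t) with central differences.

    A value close to zero means phi is (locally) constant along a point moving
    with velocity v, which is the 1D form of the assumed motion invariance condition.
    """
    dphi_dt = (phi(x, t + h) - phi(x, t - h)) / (2 * h)
    dphi_dx = (phi(x + h, t) - phi(x - h, t)) / (2 * h)
    return dphi_dt + v * dphi_dx


# A pattern translating to the right with speed 2: phi(x, t) = sin(x - 2t).
def moving_pattern(x: float, t: float) -> float:
    return math.sin(x - 2 * t)


print(motion_invariance_residual(moving_pattern, 0.3, 0.1, v=2.0))  # close to 0
print(motion_invariance_residual(moving_pattern, 0.0, 0.0, v=0.0))  # close to -2
```

Only the velocity that matches the pattern's motion drives the residual to zero; in a learning setting, such a residual would become a penalty to be minimized over the feature extractor.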
Coherence constraints in facial expression recognition
This paper investigates the role of coherence constraints in recognizing facial expressions from images and video sequences. A set of constraints is introduced to bridge a pool of Convolutional Neural Networks (CNNs) during their training stage. The constraints are inspired by practical considerations on the regularity of the temporal evolution of the predictions, and by the idea of connecting the information extracted from multiple representations. We study CNNs with the aim of building a versatile recognizer of expressions in static images that can be further applied to video sequences. First, we study the importance of different face parts in the recognition task, considering both appearance-based and shape-based features. Then we focus on the semi-supervised learning setting, exploiting video data in which only a few frames are supervised. The unsupervised portion of the training data is used to enforce three types of coherence: temporal coherence, coherence among the predictions on the face parts, and coherence between the appearance-based and shape-based representations. Our experimental analysis shows that coherence constraints improve the quality of the expression recognizer, thus offering a suitable basis for profitably exploiting unsupervised video sequences, even when some portions of the input face are not visible.
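The temporal coherence constraint can be pictured as a penalty on unsupervised frames that grows when the predictions on consecutive frames disagree. The sketch below assumes a mean squared difference between consecutive per-frame class-probability vectors; the exact form and weighting of the constraints used in the paper may differ, and `temporal_coherence_penalty` is a hypothetical name.

```python
from typing import Sequence


def temporal_coherence_penalty(predictions: Sequence[Sequence[float]]) -> float:
    """Average squared distance between predictions on consecutive frames.

    predictions[i] is the (assumed) class-probability vector for frame i.
    """
    if len(predictions) < 2:
        return 0.0
    total = 0.0
    for prev, curr in zip(predictions, predictions[1:]):
        total += sum((p - c) ** 2 for p, c in zip(prev, curr))
    return total / (len(predictions) - 1)


# Stable predictions cost nothing; an abrupt switch between two expressions is penalized.
stable = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
switch = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(temporal_coherence_penalty(stable))  # 0.0
print(temporal_coherence_penalty(switch))  # 1.0
```

During training, a term of this kind would be added, with some weight, to the supervised loss computed on the few labeled frames.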
Machine Learning: A Constraint-Based Approach
Machine Learning: A Constraint-Based Approach, Second Edition provides readers with a refreshing look at the basic models and algorithms of machine learning, with an emphasis on current topics of interest that include neural networks and kernel machines. The book presents the information in a truly unified manner that is based on the notion of learning from environmental constraints. It draws a path towards a deep integration with machine learning that relies on adopting multivalued logic formalisms, such as those used in fuzzy systems. Special attention is given to deep learning, which nicely fits the constraint-based approach followed in this book. The book presents a simpler unified notion of regularization, which is strictly connected with the parsimony principle. It also includes many solved exercises, classified according to Donald Knuth's difficulty ranking, which range from warm-up exercises to deeper research problems. A software simulator is also included.
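To give a concrete flavor of learning from constraints expressed in multivalued logic, the sketch below turns a hypothetical rule, "if x is a cat then x is an animal", into a penalty by applying the Łukasiewicz implication min(1, 1 - a + b) to classifier outputs a and b in [0, 1]. The choice of logic, the rule and the function names are assumptions for illustration and are not taken from the book.

```python
from typing import Iterable, Tuple


def lukasiewicz_implication(a: float, b: float) -> float:
    """Truth degree of 'a implies b' in Lukasiewicz logic (a, b in [0, 1])."""
    return min(1.0, 1.0 - a + b)


def rule_penalty(outputs: Iterable[Tuple[float, float]]) -> float:
    """Average violation (1 - truth degree) of the rule A(x) -> B(x).

    Each pair holds the outputs (A(x), B(x)) of two hypothetical predicates,
    e.g. 'cat' and 'animal', on an unlabeled point x.
    """
    pairs = list(outputs)
    if not pairs:
        return 0.0
    return sum(1.0 - lukasiewicz_implication(a, b) for a, b in pairs) / len(pairs)


print(rule_penalty([(0.2, 0.9)]))  # 0.0: the rule is satisfied
print(rule_penalty([(0.9, 0.2)]))  # about 0.7: confident 'cat' but not 'animal'
```

The penalty equals max(0, A(x) - B(x)), so it vanishes whenever B(x) >= A(x) and can be minimized by gradient methods alongside an ordinary supervised loss.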
Toward Improving the Evaluation of Visual Attention Models: A Crowdsourcing Approach
Human visual attention is a complex phenomenon. A computational model of this phenomenon must take into account where people look, in order to evaluate which locations are salient (spatial distribution of the fixations); when they look at those locations, in order to understand the temporal development of the exploration (temporal order of the fixations); and how they move from one location to another with respect to the dynamics of the scene and the mechanics of the eyes (dynamics). State-of-the-art models focus on learning saliency maps from human data, a process that only takes into account the spatial component of the phenomenon and ignores its temporal and dynamical counterparts. In this work, we focus on the evaluation methodology of models of human visual attention. We underline the limits of the current metrics for saliency prediction and scanpath similarity, and we introduce a statistical measure for evaluating the dynamics of the simulated eye movements. While deep learning models achieve astonishing performance in saliency prediction, our analysis shows their limitations in capturing the dynamics of the process. We find that unsupervised gravitational models, despite their simplicity, outperform all competitors. Finally, exploiting a crowdsourcing platform, we present a study aimed at evaluating how plausible the scanpaths generated with the unsupervised gravitational models appear to naive and expert human observers.
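For readers unfamiliar with scanpath similarity metrics, the sketch below shows one classical family: fixations are quantized into a grid of regions, each scanpath becomes a sequence of region labels, and two scanpaths are compared through their normalized edit distance. It is offered only as an example of the kind of metric discussed above; which metrics the paper actually evaluates, and its new statistical measure for dynamics, are not reproduced. The grid size and all function names are assumptions.

```python
from typing import List, Sequence, Tuple

Fixation = Tuple[float, float]  # (x, y) in pixels


def to_regions(scanpath: Sequence[Fixation], width: float, height: float, cells: int = 5) -> List[int]:
    """Map each fixation to the index of its cell in a cells x cells grid."""
    regions = []
    for x, y in scanpath:
        col = min(int(x / width * cells), cells - 1)
        row = min(int(y / height * cells), cells - 1)
        regions.append(row * cells + col)
    return regions


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance (insertions, deletions and substitutions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ai in enumerate(a, start=1):
        curr = [i]
        for j, bj in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ai
                            curr[j - 1] + 1,            # insert bj
                            prev[j - 1] + (ai != bj)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def scanpath_similarity(p: Sequence[Fixation], q: Sequence[Fixation],
                        width: float, height: float, cells: int = 5) -> float:
    """1 for identical region sequences, 0 when every position must be edited."""
    a, b = to_regions(p, width, height, cells), to_regions(q, width, height, cells)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


forward = [(10, 10), (90, 90)]
backward = [(90, 90), (10, 10)]
print(scanpath_similarity(forward, forward, 100, 100))   # 1.0
print(scanpath_similarity(forward, backward, 100, 100))  # 0.0: same places, opposite order
```

The second comparison shows why order-aware metrics matter: a purely spatial comparison of the fixated locations would consider these two scanpaths identical.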
A language modeling-like approach to sketching
Sketching is a universal communication tool that, despite its simplicity, can efficiently express a large variety of concepts and, in some limited contexts, can be even more immediate and effective than natural language. In this paper, we explore the feasibility of using neural networks to approach sketching in the same way they are commonly used in Language Modeling. We propose a novel approach to what we refer to as “Sketch Modeling”, in which a neural network is exploited to learn a probabilistic model that estimates the probability of sketches. We focus on simple sketches and, in particular, on the case in which sketches are represented as sequences of segments. Segments and sequences can either be given (when the sketches are originally drawn in this format) or be automatically generated from the input drawing by means of a procedure that we designed to create short sequences, loosely inspired by human behavior. A Recurrent Neural Network is used to learn the sketch model; afterwards, the network is seeded with an incomplete sketch that it is asked to complete, generating one segment at each time step. We propose a set of measures to evaluate the outcome of a Beam Search-based generation procedure, showing how they can be used to identify the most promising generations. Our experimental analysis assesses the feasibility of this way of modeling sketches, also in the case in which several different categories of sketches are considered.
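The Beam Search-based generation step can be sketched independently of the network: at each step, every partial sketch in the beam is extended with each candidate next segment, scored by its accumulated log-probability, and only the `beam_width` best continuations are kept. In the sketch below, segments are hypothetical direction tokens and `toy_model` is a hand-written stand-in for the trained Recurrent Neural Network; neither comes from the paper, and the paper's evaluation measures are not reproduced.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

NextProbs = Callable[[Sequence[str]], Dict[str, float]]


def beam_search(next_probs: NextProbs, prefix: Sequence[str], steps: int,
                beam_width: int = 3) -> List[Tuple[float, List[str]]]:
    """Extend `prefix` by `steps` segments, keeping the `beam_width` most likely sequences.

    Returns (log-probability of the added segments, sequence) pairs, best first.
    """
    beams = [(0.0, list(prefix))]
    for _ in range(steps):
        candidates = []
        for logp, seq in beams:
            for segment, p in next_probs(seq).items():
                if p > 0:
                    candidates.append((logp + math.log(p), seq + [segment]))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams


# Hypothetical stand-in for the trained RNN: after each segment it prefers a
# left turn, so completions tend to close a square.
TURN = {"R": "U", "U": "L", "L": "D", "D": "R"}
BACK = {"R": "L", "L": "R", "U": "D", "D": "U"}


def toy_model(seq: Sequence[str]) -> Dict[str, float]:
    last = seq[-1]
    return {TURN[last]: 0.7, last: 0.2, BACK[last]: 0.1}


best_logp, best_seq = beam_search(toy_model, ["R"], steps=3, beam_width=2)[0]
print(best_seq)  # ['R', 'U', 'L', 'D']
```

Summing log-probabilities rather than multiplying probabilities avoids numerical underflow on long sequences, which is the usual reason beam search is written this way.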
Generate and Revise: Reinforcement Learning in Neural Poetry
Writers, poets, and singers usually do not create their compositions in just one breath. Text is revisited, adjusted, modified, and rephrased, even multiple times, in order to better convey the meanings, emotions, and feelings that the author wants to express. Amongst the noble written arts, Poetry is probably the one that needs to be elaborated the most, since the composition has to formally respect predefined meter and rhyming schemes. In this paper, we propose a framework to generate poems that are repeatedly revisited and corrected, as humans do, in order to improve their overall quality. We frame the problem of revising poems in the context of Reinforcement Learning and, in particular, of Proximal Policy Optimization. Our model generates poems from scratch and learns to progressively adjust the generated text in order to match a target criterion. We evaluate this approach on the task of matching a rhyming scheme, without having any information on which words are responsible for creating rhymes or on how to coherently alter the poem's words. The proposed framework is general and, with appropriate reward shaping, can be applied to other text generation problems.
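Proximal Policy Optimization, which the paper uses to learn the revision policy, maximizes a clipped surrogate objective: for each action, the probability ratio r between the new and the old policy is multiplied by the advantage A, and the ratio is clipped to [1 - ε, 1 + ε] so that a single update cannot move the policy too far. The sketch below computes the standard per-sample objective min(rA, clip(r, 1 - ε, 1 + ε)A), averaged over a batch. How the paper derives advantages from its rhyme-matching reward is not reproduced, and the inputs shown are made up.

```python
from typing import Sequence


def ppo_clipped_objective(ratios: Sequence[float], advantages: Sequence[float],
                          epsilon: float = 0.2) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A); PPO maximizes this value.

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i); advantages[i] estimates A_i.
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equally long")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        total += min(r * a, clipped * a)
    return total / len(ratios)


# A good action (A > 0) gets no extra credit beyond r = 1.2 ...
print(ppo_clipped_objective([1.5], [1.0]))   # 1.2 (up to rounding)
# ... while a bad action (A < 0) is scored with the more pessimistic term.
print(ppo_clipped_objective([0.5], [-1.0]))  # -0.8 (up to rounding)
```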
Learning in text streams: discovery and disambiguation of entity and relation instances
We consider a scenario where an artificial agent is reading a stream of text composed of a set of narrations, and it is informed about the identity of some of the individuals that are mentioned in the text portion currently being read. The agent is expected to learn to follow the narrations, thus disambiguating mentions and discovering new individuals. We focus on the case in which individuals are entities and relations, and we propose an end-to-end trainable memory network that learns to discover and disambiguate them in an online manner, performing one-shot learning and dealing with a small number of sparse supervisions. Our system builds a knowledge base that is not given in advance, and it improves its skills while reading the unsupervised text. The model deals with abrupt changes in the narration, taking their effects into account when resolving coreferences. We showcase the strong disambiguation and discovery skills of our model on a corpus of Wikipedia documents and on a newly introduced data set that we make publicly available.
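The online discover-or-disambiguate decision can be illustrated with a much simpler mechanism than the paper's end-to-end trainable memory network: keep one prototype vector per known individual, assign a new mention to the most similar prototype when its cosine similarity exceeds a threshold, and otherwise create a new individual. Everything in the sketch below (the `EntityMemory` class, the threshold, the update rate and the toy mention vectors) is a hypothetical stand-in, not the paper's model.

```python
import math
from typing import List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class EntityMemory:
    """Toy online memory: one prototype vector per individual seen so far."""

    def __init__(self, threshold: float = 0.9, rate: float = 0.1):
        self.threshold = threshold  # minimum similarity to reuse a known individual
        self.rate = rate            # how far a prototype moves towards a matched mention
        self.keys: List[List[float]] = []
        self.names: List[str] = []

    def read(self, mention: Sequence[float], name: Optional[str] = None) -> str:
        """Disambiguate `mention` or discover a new individual; `name` is optional supervision."""
        scores = [cosine(mention, key) for key in self.keys]
        if scores and max(scores) >= self.threshold:
            idx = scores.index(max(scores))
            self.keys[idx] = [(1 - self.rate) * k + self.rate * m
                              for k, m in zip(self.keys[idx], mention)]
            if name is not None:  # a supervised name relabels the matched individual
                self.names[idx] = name
        else:
            self.keys.append(list(mention))
            self.names.append(name if name is not None else f"individual_{len(self.keys) - 1}")
            idx = len(self.keys) - 1
        return self.names[idx]


memory = EntityMemory()
print(memory.read([1.0, 0.0], name="Alice"))  # Alice (supervised mention)
print(memory.read([0.95, 0.05]))              # Alice (disambiguated without supervision)
print(memory.read([0.0, 1.0]))                # individual_1 (newly discovered)
```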
Inference, Learning, and Laws of Nature
Although inference and learning traditionally arise from different schools of thought, in the last few years they have been framed in nice unified frameworks, in an attempt to resemble clever human decision mechanisms. In this paper, however, we support the position that a true understanding of human-based inference and learning mechanisms might arise more naturally when the focus on logic and probabilistic reasoning is replaced with a focus on cognitive laws, in the spirit of most variational laws of Nature. To this end, we propose a strong analogy between learning from constraints and analytic mechanics, which suggests that agents living in their own environment obey laws exactly like those of particles subjected to a force field.
Semi-supervised multiclass Kernel machines with probabilistic constraints
The extension of kernel-based binary classifiers to multiclass problems has been approached with different strategies over the last decades. Nevertheless, the most frequently used schemes simply rely on different criteria to combine the decisions of a set of independently trained binary classifiers. In this paper, we propose an approach that aims at establishing a connection among the classifiers during their training stage, using an innovative criterion. Motivated by the increasing interest in the semi-supervised learning framework, we describe a soft-constraining scheme that allows us to include probabilistic constraints on the outputs of the classifiers, using the unlabeled training data. Embedding this knowledge in the learning process can improve the generalization capabilities of the multiclass classifier, and it leads to a more accurate approximation of a probabilistic output without explicit post-processing. We investigate our intuition on a face identification problem with 295 classes.
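One way to picture the probabilistic constraints is as a soft penalty, evaluated on unlabeled points, that is zero only when the per-class outputs behave like a probability distribution: each output lies in [0, 1] and all outputs sum to one. The quadratic form below is an assumption made for illustration; the paper's exact constraint formulation and the way it enters the kernel machines' objective are not reproduced.

```python
from typing import Sequence


def probabilistic_constraint_penalty(outputs_per_point: Sequence[Sequence[float]]) -> float:
    """Average soft violation of 'outputs form a probability distribution'.

    outputs_per_point[i][j] is the output of the (assumed) j-th binary
    classifier on the i-th unlabeled point.
    """
    if not outputs_per_point:
        return 0.0
    total = 0.0
    for f in outputs_per_point:
        total += (sum(f) - 1.0) ** 2                      # outputs should sum to one
        total += sum(max(0.0, -v) ** 2 for v in f)        # no negative outputs
        total += sum(max(0.0, v - 1.0) ** 2 for v in f)   # no outputs above one
    return total / len(outputs_per_point)


print(probabilistic_constraint_penalty([[0.2, 0.3, 0.5]]))  # 0.0: a valid distribution
print(probabilistic_constraint_penalty([[1.0, 1.0]]))       # 1.0: two classes claim the point
```

Because the penalty only needs classifier outputs, it can be evaluated on unlabeled data and added, with a weight, to the supervised loss, which couples the otherwise independent binary classifiers during training.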
Representation of facial features by Catmull-Rom splines
This paper describes a technique for representing the 2D frontal view of faces based on Catmull-Rom splines. It takes advantage of a priori knowledge about the face structure and of the properties of Catmull-Rom splines, such as interpolation, smoothness, and local control, in order to define a set of key points that correspond across different faces. Moreover, it can compactly describe the whole face even if the facial features have not been completely localized. The proposed model has been tested in practical face analysis contexts, and promising qualitative results are included to illustrate its versatility and accuracy.
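The interpolation and local-control properties mentioned above follow from the standard uniform Catmull-Rom formulation, in which the segment between key points P1 and P2 is shaped by their neighbours P0 and P3: P(t) = 0.5 [2 P1 + (-P0 + P2) t + (2 P0 - 5 P1 + 4 P2 - P3) t^2 + (-P0 + 3 P1 - 3 P2 + P3) t^3], with t in [0, 1]. The sketch below assumes this uniform parametrization and duplicates the first and last key points so that the curve passes through all of them; the paper's choice of facial key points and any other parametrization details are not reproduced.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def catmull_rom_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Point at parameter t in [0, 1] on the uniform Catmull-Rom segment from p1 to p2."""
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2 * b + (-a + c) * t + (2 * a - 5 * b + 4 * c - d) * t2
               + (-a + 3 * b - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def catmull_rom_curve(points: Sequence[Point], samples_per_segment: int = 10) -> List[Point]:
    """Sample a curve that passes through every key point (at least two are needed)."""
    pts = [points[0]] + list(points) + [points[-1]]  # duplicate endpoints (assumption)
    curve = []
    for i in range(1, len(pts) - 2):
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            curve.append(catmull_rom_point(pts[i - 1], pts[i], pts[i + 1], pts[i + 2], t))
    curve.append(tuple(points[-1]))
    return curve


# Three hypothetical key points, e.g. corners and top of an eyebrow outline.
curve = catmull_rom_curve([(0, 0), (1, 1), (2, 0)], samples_per_segment=4)
print(curve[0], curve[4], curve[-1])  # the key points themselves are interpolated
```

Each key point enters the formula of at most four segments, so moving one point only reshapes the nearby part of the curve, which is the local-control property that makes partial localization of facial features tolerable.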
