Patch-Based Analysis of Visual Speech From Multiple Views
Obtaining a robust feature representation of visual speech is
of crucial importance in the design of audio-visual automatic
speech recognition systems. In the literature, when visual
appearance based features are employed for this purpose,
they are typically extracted using a "holistic" approach.
Namely, a transformation of the pixel values of the entire
region-of-interest (ROI) is obtained, with the ROI covering
the speaker's mouth and often surrounding facial area. In
this paper, we instead consider a "patch" based visual feature
extraction approach, within the appearance based framework.
In particular, we conduct a novel analysis to determine which
areas (patches) of the mouth ROI are the most informative for visual speech. Furthermore, we extend this analysis beyond
the traditional frontal views, by investigating profile views
as well. Not surprisingly, and for both frontal and profile
views, we conclude that the central mouth patches are the
most informative, but less so than the holistic features of the
entire ROI. Nevertheless, fusion of holistic and the best patch
based features further improves visual speech recognition
performance, compared to either feature set alone. Finally,
we discuss scenarios where the patch based approach may be
preferable to holistic features.
Improving Pain Recognition Through Better Utilisation of Temporal Information
Automatically recognizing pain from video is a very useful application
as it has the potential to alert carers to patients that are
in discomfort who would otherwise not be able to communicate
such emotion (e.g. young children or patients in postoperative
care). In previous work [1], a “pain-no pain” system was
developed which used an AAM-SVM approach to good effect.
However, as with any task involving a large amount of video
data, there are memory constraints that need to be adhered to,
and in the previous work this was achieved by compressing the temporal
signal using K-means clustering in the training phase. In visual
speech recognition, it is well known that the dynamics of the
signal play a vital role in recognition. As pain recognition is
very similar to the task of visual speech recognition (i.e. recognising
visual facial actions), it is our belief that compressing
the temporal signal reduces the likelihood of accurately recognising
pain. In this paper, we show that by compressing the
spatial signal instead of the temporal signal, we achieve better
pain recognition. Our results show the importance of the temporal
signal in recognizing pain; however, we do highlight some
problems associated with doing this due to the randomness of a
patient's facial actions.
Facial feature detection for in-car environment
Acoustically, vehicles are extremely noisy environments
and as a consequence audio-only in-car voice recognition
systems perform very poorly. Seeing that the visual modality
is immune to acoustic noise, using the visual lip information from the driver is seen as a viable strategy in circumventing this problem. However, implementing such an approach requires a system that can accurately locate and track the driver’s face and facial features in real time. In this paper we present such an approach using the Viola-Jones algorithm. Our results show that the Viola-Jones approach is a suitable method of locating and tracking the driver’s lips despite the visual variability of illumination and
head pose.
A Unified Approach to Multi-Pose Audio-Visual ASR
The vast majority of studies in the field of audio-visual automatic
speech recognition (AVASR) assumes frontal images of a
speaker's face, but this cannot always be guaranteed in practice.
Hence our recent research efforts have concentrated on extracting
visual speech information from non-frontal faces, in particular
the profile view. The introduction of additional views to an
AVASR system increases the complexity of the system, as it has
to deal with the different visual features associated with the various
views. In this paper, we propose the use of linear regression
to find a transformation matrix based on synchronous frontal
and profile visual speech data, which is used to normalize the
visual speech in each viewpoint into a single uniform view. In
our experiments for the task of multi-speaker lipreading, we
show that this "pose-invariant" technique reduces train/test mismatch
between visual speech features of different views, and is
of particular benefit when there is more training data for one
viewpoint over another (e.g. frontal over profile).
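The regression step described in this abstract can be illustrated with a minimal sketch. It assumes that synchronous profile and frontal feature vectors are available as rows of two arrays, and that the mapping goes from profile to frontal. The feature dimension, the synthetic data, the bias term and the function names `fit_pose_transform` and `apply_pose_transform` are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def fit_pose_transform(source, target):
    """Least-squares fit of a matrix W such that [source, 1] @ W ~= target."""
    # Append a column of ones so the fit includes a bias term (an assumption).
    source_aug = np.hstack([source, np.ones((len(source), 1))])
    W, *_ = np.linalg.lstsq(source_aug, target, rcond=None)
    return W

def apply_pose_transform(W, source):
    """Map source-view features into the target view using a fitted W."""
    source_aug = np.hstack([source, np.ones((len(source), 1))])
    return source_aug @ W

# Synthetic stand-in for synchronous profile/frontal features (hypothetical data).
rng = np.random.default_rng(0)
n_frames, dim = 200, 4
true_map = rng.normal(size=(dim, dim))
profile = rng.normal(size=(n_frames, dim))
frontal = profile @ true_map + 0.01 * rng.normal(size=(n_frames, dim))

W = fit_pose_transform(profile, frontal)
normalized = apply_pose_transform(W, profile)

# Mean squared mismatch before and after normalization.
mismatch_before = np.mean((profile - frontal) ** 2)
mismatch_after = np.mean((normalized - frontal) ** 2)
```

On this toy data the mismatch after normalization is far smaller than before, which is the effect the abstract attributes to the technique. Real visual speech features would not be related by an exactly linear map, so the gain would be smaller.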
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
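The two counting schemes compared in this abstract can be sketched as follows. This is a toy illustration under stated assumptions: each citing paper is represented as a list of cited works, each cited work as a list of author names in byline order, and a pair of authors is counted at most once per citing paper. The function name `cocitation_counts`, the `max_authors` parameter and the author names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations over a set of citing papers.

    max_authors=1 gives first-author counting; max_authors=5 counts
    the first five authors of each cited work.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper (an assumption).
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts

# Hypothetical toy data: two citing papers, each citing two works.
papers = [
    [["Adams", "Baker"], ["Chen"]],
    [["Diaz", "Baker"], ["Adams"]],
]

first_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

In this toy data Baker never appears as a first author, so Baker is invisible under first-author counting but is co-cited with Adams twice once the first five authors are counted.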
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Patch-Based Representation of Visual Speech
Visual information from a speaker's mouth region is known to
improve automatic speech recognition robustness, especially in the presence of acoustic noise. To date, the vast majority of work in this field has viewed these visual features in a holistic manner, which may not take into account the various changes that occur within articulation (the process of changing the shape of the vocal tract using the articulators, i.e. lips and jaw). Motivated by the work being conducted in the fields of audio-visual automatic speech
recognition (AVASR) and face recognition using articulatory
features (AFs) and patches respectively, we present a
proof of concept paper which represents the mouth region as an ensemble of image patches. Our experiments show that by dealing with the mouth region in this manner, we are able to extract more speech information from the visual domain. For the task of visual-only speaker-independent isolated digit recognition, we were able to achieve a relative improvement in word error rate of more than 23% on the CUAVE audio-visual corpus.
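As a rough illustration of what representing the mouth region as an ensemble of image patches could look like, the sketch below splits a grayscale ROI into a regular grid of non-overlapping patches. The abstract does not give the patch size, the grid layout, or whether patches overlap, so the 4×4 grid, the 32×32 ROI and the function name `split_into_patches` are assumptions.

```python
import numpy as np

def split_into_patches(roi, rows, cols):
    """Split a 2-D ROI into a rows x cols grid of equal, non-overlapping patches.

    Returns an array of shape (rows, cols, patch_height, patch_width).
    Pixels that do not fit an even grid are cropped from the bottom/right.
    """
    height, width = roi.shape
    ph, pw = height // rows, width // cols
    cropped = roi[: ph * rows, : pw * cols]
    # Index order after reshape is (row, y, col, x); swap to (row, col, y, x).
    return cropped.reshape(rows, ph, cols, pw).swapaxes(1, 2)

# Hypothetical 32x32 mouth ROI filled with a ramp so patches are easy to check.
roi = np.arange(32 * 32, dtype=float).reshape(32, 32)
patches = split_into_patches(roi, rows=4, cols=4)

# One feature vector per patch, here simply the flattened pixels.
patch_vectors = patches.reshape(4 * 4, -1)
```

Each patch could then be passed through whatever transform would otherwise be applied to the whole ROI, giving one feature vector per patch instead of a single holistic vector.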
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
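To make the two named measures concrete, the sketch below computes the Pearson correlation and the cosine for a pair of toy cocitation profiles. It also shows one property on which they differ: appending authors with whom neither profile is cocited leaves the cosine unchanged but changes the Pearson correlation. The profiles are invented, and this property is offered only as an illustration; it is not claimed to be the argument made in the paper.

```python
import math

def cosine(x, y):
    """Cosine of the angle between two equal-length profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

# Hypothetical cocitation profiles of two authors over four other authors.
profile_a = [4, 2, 0, 0]
profile_b = [2, 4, 0, 0]

# The same profiles after adding two authors cocited with neither.
padded_a = profile_a + [0, 0]
padded_b = profile_b + [0, 0]

cos_before, cos_after = cosine(profile_a, profile_b), cosine(padded_a, padded_b)
r_before, r_after = pearson(profile_a, profile_b), pearson(padded_a, padded_b)
```

Here the cosine is 0.8 in both cases, while the Pearson correlation moves from 7/11 to 10/14 even though nothing about the two authors' actual cocitations has changed.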
Problems Associated With Current Area-Based Visual Speech Feature Extraction Techniques
Techniques such as principal component analysis (PCA),
linear discriminant analysis (LDA) and the discrete cosine
transform (DCT) have all been used to good effect in face
recognition. As these techniques are able to compactly represent
a set of features, researchers have sought to use these
methods to extract the visual speech content for audio-visual
speech recognition (AVSR). In this paper, we expose the
problems of employing such techniques in AVSR by running
some visual-only speech recognition experiments. The
results of these experiments illustrate that current area-based
feature extraction techniques are heavily dependent on the
visual front-end, as well as being ineffective in decoupling
adequate speech content from a speaker’s mouth. As a potential
solution, we introduce the concept of a free-parts representation,
which may be able to circumvent many of the
problems experienced by current area-based techniques.
