An Information-Extraction Approach to Speech Processing: Analysis, Detection, Verification, and Recognition
The field of automatic speech recognition (ASR) has enjoyed more than 30 years of technology advances due to the extensive utilization of the hidden Markov model (HMM) framework and a concentrated effort by the speech community to make available a vast amount of speech and language resources, known today as the Big Data Paradigm. State-of-the-art ASR systems achieve a high recognition accuracy for well-formed utterances of a variety of languages by decoding speech into the most likely sequence of words among all possible sentences represented by a finite-state network (FSN) approximation of all the knowledge sources required by the ASR task. However, the ASR problem is still far from being solved because not all information available in the speech knowledge hierarchy can be directly integrated into the FSN to improve the ASR performance and enhance system robustness. It is believed that some of the current issues of integrating various knowledge sources in top-down integrated search can be partially addressed by processing techniques that take advantage of the full set of acoustic and language information in speech. It has long been postulated that human speech recognition (HSR) determines the linguistic identity of a sound based on detected evidence that exists at various levels of the speech knowledge hierarchy, ranging from acoustic phonetics to syntax and semantics. This calls for a bottom-up attribute detection and knowledge integration framework that links speech processing with information extraction, by spotting speech cues with a bank of attribute detectors, weighting and combining acoustic evidence to form cognitive hypotheses, and verifying these theories until a consistent recognition decision can be reached. The recently proposed automatic speech attribute transcription (ASAT) framework is an attempt to mimic some HSR capabilities with asynchronous speech event detection followed by bottom-up knowledge integration and verification. 
In the last few years, ASAT has demonstrated good potential and has been applied to a variety of existing applications in speech processing and information extraction
Adaptation to New Microphones Using Artificial Neural Networks With Trainable Activation Functions
Model adaptation is a key technique that enables a modern automatic speech recognition (ASR) system to adjust its parameters, using a small amount of enrolment data, to the nuances in the speech spectrum due to microphone mismatch between the training and test data. In this brief, we investigate four different adaptation schemes for connectionist (also known as hybrid) ASR systems that learn microphone-specific hidden unit contributions, given some adaptation material. This solution is made possible by adopting one of the following schemes: 1) the use of Hermite activation functions; 2) the introduction of bias and slope parameters in the sigmoid activation functions; 3) the injection of an amplitude parameter specific for each sigmoid unit; or 4) the combination of 2) and 3). Such a simple yet effective solution allows the adapted model to be stored in a small-sized storage space, a highly desirable property of adaptation algorithms for deep neural networks that are suitable for large-scale online deployment. Experimental results indicate that the investigated approaches reduce word error rates on the standard Spoke 6 task of the Wall Street Journal corpus compared with unadapted ASR systems. Moreover, the proposed adaptation schemes all perform better than simple multicondition training and compare favorably with conventional linear regression-based approaches while using up to 15 orders of magnitude fewer parameters. The proposed adaptation strategies are also effective when a single adaptation sentence is available
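Schemes 2) to 4) above can be pictured with a minimal sketch. The abstract does not give the exact parameterization, so the form `amplitude * sigmoid(slope * x + bias)` and the names `adaptive_sigmoid` and `count_adaptation_params` are assumptions made for illustration only.

```python
import math


def adaptive_sigmoid(x, amplitude=1.0, slope=1.0, bias=0.0):
    """Sigmoid with per-unit amplitude, slope, and bias.

    Assumed form: amplitude * sigmoid(slope * x + bias). With the default
    values this reduces to the ordinary sigmoid.
    """
    return amplitude / (1.0 + math.exp(-(slope * x + bias)))


def count_adaptation_params(hidden_units, scheme):
    """Number of microphone-specific parameters to store for one layer.

    Assumes one parameter set per hidden unit; the network weights stay fixed.
    """
    per_unit = {"slope_bias": 2, "amplitude": 1, "combined": 3}
    return hidden_units * per_unit[scheme]


if __name__ == "__main__":
    # Unadapted unit versus a unit with a hypothetical adapted amplitude.
    print(adaptive_sigmoid(0.0))
    print(adaptive_sigmoid(0.0, amplitude=2.0))
    print(count_adaptation_params(1000, "combined"))
```

Under these assumptions only a few scalars per hidden unit are adapted and stored, which is consistent with the small storage footprint the abstract emphasizes.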
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
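The two counting types compared above can be sketched as follows. The data layout (each citing paper as a list of cited works, each cited work as an ordered author list) and the function name `cocitation_counts` are hypothetical. Counting each author pair once per citing paper, including co-authors of the same cited work, is also an assumption rather than the study's stated definition.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs co-cited by the same citing paper.

    max_authors=1 gives first-author counting; max_authors=5 takes the
    first five authors of every cited work into account.
    """
    counts = Counter()
    for references in citing_papers:
        authors = set()
        for ref_authors in references:
            authors.update(ref_authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    papers = [
        [["A", "B"], ["C"]],  # citing paper 1 cites two works
        [["A"], ["B", "C"]],  # citing paper 2 cites two works
    ]
    print(cocitation_counts(papers, max_authors=1))
    print(cocitation_counts(papers, max_authors=5))
```

In this toy input, author C is never a first author in the second citing paper, so the pair (A, C) is seen only once under first-author counting but twice when the first five authors are considered.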
Erratum to: Effect of moderate red wine intake on cardiac prognosis after recent acute myocardial infarction of subjects with Type 2 diabetes mellitus (Diabetic Medicine, (2006), 23, 9, (974-981), 10.1111/j.1464-5491.2006.01886.x)
In an article by Marfella et al, the author name C. Saron is incorrect and should be listed as C. Sardu. Therefore the correct author list is: R. Marfella, F. Cacciapuoti, M. Siniscalchi, F. C. Sasso, F. Marchese, F. Cinone, E. Musacchio, M. A. Marfella, L. Ruggiero, G. Chiorazzo, D. Liberti, G. Chiorazzo, G. F. Nicoletti, C. Sardu, F. D'Andrea, C. Ammendola, M. Verza and L. Coppola
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
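As a small illustration of how the Pearson correlation and the cosine can disagree on cocitation profiles, the sketch below compares both on toy vectors. The abstract does not spell out its argument against the Pearson correlation. The property shown here (appending authors who are cocited with neither profile leaves the cosine unchanged but shifts the Pearson correlation) is a mathematical fact chosen for illustration, and the profile values are invented.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(u, v):
    """Pearson correlation, computed as the cosine of the mean-centered profiles."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


if __name__ == "__main__":
    u = [3, 1, 0, 2]  # hypothetical cocitation counts of author X
    v = [1, 3, 2, 0]  # hypothetical cocitation counts of author Y
    zeros = [0, 0, 0, 0]  # four further authors cocited with neither
    print(cosine(u, v), cosine(u + zeros, v + zeros))
    print(pearson(u, v), pearson(u + zeros, v + zeros))
```

With these toy values the cosine is identical for the short and the extended profiles, whereas the Pearson correlation changes sign once the zero entries are added.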
Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems
Model adaptation techniques are an efficient way to reduce the mismatch that typically occurs between the training and test conditions of any automatic speech recognition (ASR) system. This work addresses the problem of increased degradation in performance when moving from speaker-dependent (SD) to speaker-independent (SI) conditions for connectionist (or hybrid) hidden Markov model/artificial neural network (HMM/ANN) systems in the context of large vocabulary continuous speech recognition (LVCSR). Adapting hybrid HMM/ANN systems on a small amount of adaptation data has been proven to be a difficult task, and has been a limiting factor in the widespread deployment of hybrid techniques in operational ASR systems. Addressing the crucial issue of speaker adaptation (SA) for hybrid HMM/ANN systems can thereby have a great impact on the connectionist paradigm, which will play a major role in the design of next-generation LVCSR considering the great success reported by deep neural networks - ANNs with many hidden layers that adopt the pre-training technique - on many speech tasks. Current adaptation techniques for ANNs, based on injecting an adaptable linear transformation network connected to either the input or the output layer, are not effective, especially with a small amount of adaptation data, e.g., a single adaptation utterance. In this paper, a novel solution is proposed to overcome those limits and make adaptation robust to scarce adaptation resources. The key idea is to adapt the hidden activation functions rather than the network weights. The adoption of Hermitian activation functions makes this possible. Experimental results on an LVCSR task demonstrate the effectiveness of the proposed approach
Joint optimization of event detectors and evidence merger for continuous phone recognition
In recent years, different data-driven methods have been proposed to detect articulatory features (AF) from short-term spectral representation. The main motivations for the AF based approach are as follows. First, the AFs in general can more accurately and parsimoniously characterize the acoustic variability associated with conversational speech. Further, while not explored in this work, AFs are more language universal than phones, and therefore they can generalize better and are easier to adapt to new languages. For use in phone based systems the AF scores are input to an evidence merger which produces phone posteriors as outputs.
Several classifiers are usually built, and each classifier is trained for detecting a single articulatory feature (describing manner and/or place). We believe that joint optimization of all the classifiers and the subsequent phone evidence merger may be beneficial for the classification performance. This work is a preliminary study in this direction, and it is validated on the continuous phone recognition task. A bank of articulatory detectors, designed using hidden Markov models (HMMs), learns the mapping from the MFCC space to the articulatory space. The detectors’ outputs are then combined by the evidence merger. The AF based phone posteriors are integrated into an existing ASR engine and applied to N-best rescoring. Experimental results show promising performance on the TIMIT corpus
A Multi-Objective Programming-Based Approach to Language Model Adaptation
In this paper, we present a multi-layer learning approach to the language model (LM) adaptation problem by making use of multi-objective programming (MOP). The overall objective function of conventional MAP-based LM adaptation is implicitly a composition of two objective functions: the first objective is concerned with the maximum likelihood estimation of the model parameters from the in-domain data while the second objective is concerned with an appropriate representation of prior information obtained from a general purpose corpus. In this paper, we separate these individual objective functions, which are at least partially conflicting, and take an MOP approach to LM adaptation. The resulting MOP problem is solved in an iterative manner such that each objective is optimized one after another with constraints on the others. This iterative solution can be represented as a multi-layer learning problem in each layer of which only one objective is minimized with constraints on the others. In estimating an n-gram LM, the number of layers is given by 2 × n with one hidden unit per layer. The inputs to the hidden units are LMs of order up to n that are estimated either from the general purpose corpus or from the in-domain data. When solved this way, the target LM is in the form of a log-linear interpolation of component LMs. In our preliminary experiments with bigram LMs, the proposed approach slightly outperformed linear interpolation. In our ongoing work with trigram LMs, we expect the proposed approach to outperform linear interpolation in terms of both the perplexity and the automatic speech recognition word error rate
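The difference between linear and log-linear interpolation of component LMs can be sketched for a single history as below. This is only an illustration under stated assumptions: it uses two component distributions over a toy vocabulary, the weights are hypothetical fixed values rather than the output of the MOP procedure, and the function names are not from the paper.

```python
import math


def linear_interpolation(p_general, p_domain, lam):
    """Weighted sum of two next-word distributions for the same history."""
    return {w: lam * p_general[w] + (1.0 - lam) * p_domain[w] for w in p_general}


def log_linear_interpolation(p_general, p_domain, w_general, w_domain):
    """Weighted product of two distributions, renormalized over the vocabulary.

    Assumes every word has non-zero probability in both components.
    """
    scores = {
        w: math.exp(w_general * math.log(p_general[w])
                    + w_domain * math.log(p_domain[w]))
        for w in p_general
    }
    z = sum(scores.values())  # normalization constant for this history
    return {w: s / z for w, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical bigram probabilities P(w | "speech") from each corpus.
    general = {"recognition": 0.5, "therapy": 0.3, "bubble": 0.2}
    domain = {"recognition": 0.8, "therapy": 0.1, "bubble": 0.1}
    print(linear_interpolation(general, domain, 0.5))
    print(log_linear_interpolation(general, domain, 0.5, 0.5))
```

Unlike the linear case, the log-linear form needs an explicit normalization over the vocabulary for each history, which is the main practical cost of that choice.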
