An Information-Extraction Approach to Speech Processing: Analysis, Detection, Verification, and Recognition

Author: C.-H. Lee
S. M. SINISCALCHI
Publication date: 01/01/2013

The field of automatic speech recognition (ASR) has enjoyed more than 30 years of technology advances due to the extensive utilization of the hidden Markov model (HMM) framework and a concentrated effort by the speech community to make available a vast amount of speech and language resources, known today as the Big Data Paradigm. State-of-the-art ASR systems achieve a high recognition accuracy for well-formed utterances of a variety of languages by decoding speech into the most likely sequence of words among all possible sentences represented by a finite-state network (FSN) approximation of all the knowledge sources required by the ASR task. However, the ASR problem is still far from being solved because not all information available in the speech knowledge hierarchy can be directly integrated into the FSN to improve the ASR performance and enhance system robustness. It is believed that some of the current issues of integrating various knowledge sources in top-down integrated search can be partially addressed by processing techniques that take advantage of the full set of acoustic and language information in speech. It has long been postulated that human speech recognition (HSR) determines the linguistic identity of a sound based on detected evidence that exists at various levels of the speech knowledge hierarchy, ranging from acoustic phonetics to syntax and semantics. This calls for a bottom-up attribute detection and knowledge integration framework that links speech processing with information extraction, by spotting speech cues with a bank of attribute detectors, weighting and combining acoustic evidence to form cognitive hypotheses, and verifying these theories until a consistent recognition decision can be reached. The recently proposed automatic speech attribute transcription (ASAT) framework is an attempt to mimic some HSR capabilities with asynchronous speech event detection followed by bottom-up knowledge integration and verification. 
In the last few years, ASAT has demonstrated good potential and has been applied to a variety of existing applications in speech processing and information extraction

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Adaptation to New Microphones Using Artificial Neural Networks With Trainable Activation Functions

Author: S. M. SINISCALCHI
V. M. Salerno
Publication date: 2017

Model adaptation is a key technique that enables a modern automatic speech recognition (ASR) system to adjust its parameters, using a small amount of enrolment data, to the nuances in the speech spectrum caused by microphone mismatch between the training and test data. In this brief, we investigate four different adaptation schemes for connectionist (also known as hybrid) ASR systems that learn microphone-specific hidden unit contributions, given some adaptation material. This solution is made possible by adopting one of the following schemes: 1) the use of Hermite activation functions; 2) the introduction of bias and slope parameters in the sigmoid activation functions; 3) the injection of an amplitude parameter specific for each sigmoid unit; or 4) the combination of 2) and 3). Such a simple yet effective solution allows the adapted model to be stored in a small-sized storage space, a highly desirable property of adaptation algorithms for deep neural networks that are suitable for large-scale online deployment. Experimental results indicate that the investigated approaches reduce word error rates on the standard Spoke 6 task of the Wall Street Journal corpus compared with unadapted ASR systems. Moreover, the proposed adaptation schemes all perform better than simple multicondition training and compare favorably against conventional linear regression-based approaches while using up to 15 orders of magnitude fewer parameters. The proposed adaptation strategies are also effective when a single adaptation sentence is available.
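Schemes 2) to 4) in the abstract above can be pictured with a small sketch. The parameterization used here, amplitude * sigmoid(slope * x + bias), and the names `adaptable_sigmoid` and `adapted_layer` are assumptions made for illustration; the abstract does not give the exact form used by the authors. The point is only that each hidden unit carries a few extra scalars, so the adapted model stays small.

```python
import math


def adaptable_sigmoid(x, slope=1.0, bias=0.0, amplitude=1.0):
    """Sigmoid with per-unit slope, bias, and amplitude (assumed form).

    With the default values this reduces to the standard sigmoid.
    """
    return amplitude / (1.0 + math.exp(-(slope * x + bias)))


def adapted_layer(pre_activations, unit_params):
    """Apply one adaptable sigmoid per hidden unit.

    unit_params holds one dict of keyword overrides per unit; during
    adaptation only these small dicts would change, not the weights.
    """
    return [adaptable_sigmoid(x, **p) for x, p in zip(pre_activations, unit_params)]


pre = [0.0, 2.0, -1.0]  # hypothetical pre-activation values
unadapted = adapted_layer(pre, [{}, {}, {}])
adapted = adapted_layer(
    pre,
    [{"amplitude": 1.5}, {"slope": 0.5, "bias": 0.1}, {}],
)
print(unadapted)
print(adapted)
```

In this sketch a microphone-specific model would be stored as at most three numbers per hidden unit, which is one way to read the small-footprint claim in the abstract.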

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Author Instructions

Author: Instructions Author
Publication date: 04/11/2013

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)

Going Beyond Counting First Authors in Author Co-citation Analysis

Author: Zhao Dangzhi
Publication date: 01/01/2005

The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
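The two counting schemes compared in the abstract above can be sketched as follows. The function name `cocitation_counts`, the toy data, and the author names are hypothetical. The sketch also makes two simplifying assumptions that the abstract does not confirm: each pair is counted at most once per citing paper, and co-authors of the same cited work are treated as co-cited.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors):
    """Count author co-citations over a set of citing papers.

    citing_papers: list of reference lists; each reference is the ordered
    author list of one cited work.
    max_authors: 1 for first-author counting, 5 for first-five counting.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Every unordered pair of authors cited by this paper is co-cited once.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing papers, each citing two works.
papers = [
    [["Smith", "Jones"], ["Lee"]],
    [["Lee", "Smith"], ["Kim"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(first_author)
print(first_five)
```

In this toy data, Smith is visible in the second paper only when authors beyond the first are counted, so the Lee and Smith pair gains a co-citation under the first-five scheme.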

E-LIS

Erratum to: Effect of moderate red wine intake on cardiac prognosis after recent acute myocardial infarction of subjects with Type 2 diabetes mellitus (Diabetic Medicine, (2006), 23, 9, (974-981), 10.1111/j.1464-5491.2006.01886.x)

Author: Ammendola C.
MARFELLA RAFFAELE
Marchese F.
D'Andrea F.
Nicoletti G. F.
Sardu C.
Liberti D.
Marfella R.
Verza M.
Cinone F.
Chiorazzo G.
Musacchio E.
Cacciapuoti F.
Sasso F. C.
Ruggiero L.
Marfella M. A.
Siniscalchi M.
Coppola L.
Publication date: 01/01/2017

In an article by Marfella et al, the author name C. Saron is incorrect and should be listed as C. Sardu. Therefore the correct author list is: R. Marfella, F. Cacciapuoti, M. Siniscalchi, F. C. Sasso, F. Marchese, F. Cinone, E. Musacchio, M. A. Marfella, L. Ruggiero, G. Chiorazzo, D. Liberti, G. Chiorazzo, G. F. Nicoletti, C. Sardu, F. D'Andrea, C. Ammendola, M. Verza and L. Coppola.

Archivio della ricerca - Università degli studi di Napoli Federico II

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

Variations on the Author

Author: Sayad Cecilia
Publication date: 01/01/2016

“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

Crossref

Kent Academic Repository

Appropriate Similarity Measures for Author Cocitation Analysis

Author: Waltman L.R.
Eck N.J.P. van

We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors' cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.

Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
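A short sketch can make the contrast between the Pearson correlation and the cosine concrete. It illustrates one property that is often raised in this kind of methodological discussion, namely that appending authors who are cocited with neither of the two compared authors (extra zeros in both profiles) changes the Pearson correlation but leaves the cosine unchanged. The profiles are made-up numbers, and the abstract does not say that this is the argument the authors make.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation profiles of two authors with four other authors.
profile_a = [5, 3, 0, 1]
profile_b = [4, 2, 1, 0]

# The same profiles after adding 20 authors cocited with neither of them.
padded_a = profile_a + [0] * 20
padded_b = profile_b + [0] * 20

print(cosine(profile_a, profile_b), cosine(padded_a, padded_b))
print(pearson(profile_a, profile_b), pearson(padded_a, padded_b))
```

The zeros contribute nothing to the dot product or the norms, so the cosine is stable, whereas they shift both means and therefore the Pearson value.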

Research Papers in Economics

Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems

Author: J. Li
S. M. SINISCALCHI
C.-H. Lee
Publication date: 01/01/2013

Model adaptation techniques are an efficient way to reduce the mismatch that typically occurs between the training and test conditions of any automatic speech recognition (ASR) system. This work addresses the problem of increased degradation in performance when moving from speaker-dependent (SD) to speaker-independent (SI) conditions for connectionist (or hybrid) hidden Markov model/artificial neural network (HMM/ANN) systems in the context of large vocabulary continuous speech recognition (LVCSR). Adapting hybrid HMM/ANN systems on a small amount of adaptation data has proven to be a difficult task, and has been a limiting factor in the widespread deployment of hybrid techniques in operational ASR systems. Addressing the crucial issue of speaker adaptation (SA) for hybrid HMM/ANN systems can thereby have a great impact on the connectionist paradigm, which will play a major role in the design of next-generation LVCSR considering the great success reported by deep neural networks (ANNs with many hidden layers that adopt the pre-training technique) on many speech tasks. Current adaptation techniques for ANNs based on injecting an adaptable linear transformation network connected to either the input or the output layer are not effective, especially with a small amount of adaptation data, e.g., a single adaptation utterance. In this paper, a novel solution is proposed to overcome those limits and make adaptation robust to scarce adaptation resources. The key idea is to adapt the hidden activation functions rather than the network weights. The adoption of Hermitian activation functions makes this possible. Experimental results on an LVCSR task demonstrate the effectiveness of the proposed approach.
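The key idea stated in the abstract above, adapting the hidden activation function rather than the weights, can be sketched as an activation built from a weighted sum of Hermite basis functions whose coefficients are the only adaptable parameters. The use of orthonormal Hermite functions, the basis order, and the names `hermite_functions` and `hermite_activation` are assumptions for illustration; the abstract does not specify the exact construction.

```python
import math


def hermite_functions(x, order):
    """Return [h_0(x), ..., h_order(x)], the orthonormal Hermite functions.

    Uses the physicists' recurrence H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x)
    and h_k(x) = H_k(x) exp(-x^2 / 2) / sqrt(2^k k! sqrt(pi)).
    """
    polys = [1.0, 2.0 * x]
    for k in range(1, order):
        polys.append(2.0 * x * polys[k] - 2.0 * k * polys[k - 1])
    gauss = math.exp(-x * x / 2.0)
    return [
        polys[k] * gauss / math.sqrt(2.0 ** k * math.factorial(k) * math.sqrt(math.pi))
        for k in range(order + 1)
    ]


def hermite_activation(x, coeffs):
    """Hidden-unit activation as a weighted sum of Hermite functions.

    In this sketch, speaker adaptation would update only `coeffs`,
    leaving the network weights that produce x untouched.
    """
    basis = hermite_functions(x, len(coeffs) - 1)
    return sum(c * h for c, h in zip(coeffs, basis))


speaker_independent = [1.0, 0.5, 0.0]   # hypothetical coefficients
speaker_adapted = [0.9, 0.6, 0.1]       # hypothetical values after adaptation
print(hermite_activation(0.3, speaker_independent))
print(hermite_activation(0.3, speaker_adapted))
```

Because only a handful of coefficients per unit change, a sketch like this needs far fewer adaptable parameters than a full linear transformation layer, which matches the motivation given in the abstract for scarce adaptation data.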

Archivio istituzionale della ricerca - Università di Palermo

Joint optimization of event detectors and evidence merger for continuous phone recognition

Author: S. M. SINISCALCHI
M. H. JOHNSEN
T. SVENDSEN
O. BIRKENES
Publication date: 01/01/2008

In recent years, different data-driven methods have been proposed to detect articulatory features (AF) from short-term spectral representations. The main motivations for the AF-based approach are as follows. First, the AFs in general can more accurately and parsimoniously characterize the acoustic variability associated with conversational speech. Further, while not explored in this work, AFs are more language universal than phones, and therefore they can generalize better and are easier to adapt to new languages. For use in phone-based systems, the AF scores are input to an evidence merger, which produces phone posteriors as outputs. Several classifiers are usually built, and each classifier is trained for detecting a single articulatory feature (describing manner and/or place). We believe that joint optimization of all the classifiers and the subsequent phone evidence merger may be beneficial for the classification performance. This work is a preliminary study in this direction, and it is validated on the continuous phone recognition task. A bank of articulatory detectors, designed using hidden Markov models (HMMs), learns the mapping from the MFCC space to the articulatory space. The detectors’ outputs are then combined by the evidence merger. The AF-based phone posteriors are integrated into an existing ASR engine and applied to N-best rescoring. Experimental results show promising performance on the TIMIT corpus.
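The detector-plus-merger pipeline described in the abstract above can be pictured with a minimal sketch in which the evidence merger is a single linear layer followed by a softmax. This merger form, the three attribute detectors, the three phones, and all weights are assumptions for illustration only; the abstract does not state how the merger is implemented.

```python
import math


def softmax(scores):
    """Normalize arbitrary scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_evidence(af_scores, weights, biases):
    """Map articulatory-feature detector scores to phone posteriors.

    weights has one row per phone and one column per detector.
    """
    logits = [
        sum(w * s for w, s in zip(row, af_scores)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


detectors = ["vowel", "nasal", "voiced"]   # hypothetical attribute detectors
phones = ["aa", "m", "s"]                  # hypothetical phone set
weights = [
    [2.0, -1.0, 1.0],    # aa: vowel and voiced
    [-1.0, 2.0, 1.0],    # m: nasal and voiced
    [-1.0, -1.0, -2.0],  # s: none of the three
]
biases = [0.0, 0.0, 0.0]

frame_scores = [0.1, 0.9, 0.8]  # detector outputs for one frame
posteriors = merge_evidence(frame_scores, weights, biases)
print(dict(zip(phones, posteriors)))
```

Joint optimization, as proposed in the abstract, would mean training the detectors and a merger of this kind against a single objective instead of training each detector in isolation.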

Archivio istituzionale della ricerca - Università di Palermo

A Multi-Objective Programming-Based Approach to Language Model Adaptation

Author: C.-H. LEE
S. YAMAN
S. M. SINISCALCHI
Publication date: 01/01/2009

In this paper, we present a multi-layer learning approach to the language model (LM) adaptation problem by making use of multi-objective programming (MOP). The overall objective function of conventional MAP-based LM adaptation is implicitly a composition of two objective functions: the first objective is concerned with the maximum likelihood estimation of the model parameters from the in-domain data, while the second objective is concerned with an appropriate representation of prior information obtained from a general purpose corpus. In this paper, we separate these individual objective functions, which are at least partially conflicting, and take an MOP approach to LM adaptation. The resulting MOP problem is solved in an iterative manner such that each objective is optimized one after another with constraints on the others. This iterative solution can be represented as a multi-layer learning problem in each layer of which only one objective is minimized with constraints on the others. In estimating an n-gram LM, the number of layers is given by 2 × n, with one hidden unit per layer. The inputs to the hidden units are LMs of order up to n that are estimated either from the general purpose corpus or from the in-domain data. When solved this way, the target LM is in the form of a log-linear interpolation of component LMs. In our preliminary experiments with bigram LMs, the proposed approach slightly outperformed linear interpolation. In our ongoing work with trigram LMs, we expect the proposed approach to outperform linear interpolation in terms of both the perplexity and the automatic speech recognition word error rate.
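The difference between the linear interpolation baseline and the log-linear form mentioned in the abstract above can be sketched for a single history. The two toy distributions, the weights, and the function names are hypothetical, and the sketch assumes smoothed component LMs with no zero probabilities; how the weights are obtained through the MOP procedure is not shown.

```python
import math


def linear_interpolation(dists, weights):
    """P(w) = sum_i lambda_i * P_i(w); weights are assumed to sum to 1."""
    vocab = dists[0].keys()
    return {w: sum(lam * d[w] for lam, d in zip(weights, dists)) for w in vocab}


def log_linear_interpolation(dists, weights):
    """P(w) proportional to prod_i P_i(w) ** lambda_i, renormalized over the vocabulary."""
    vocab = dists[0].keys()
    scores = {
        w: math.exp(sum(lam * math.log(d[w]) for lam, d in zip(weights, dists)))
        for w in vocab
    }
    normalizer = sum(scores.values())
    return {w: s / normalizer for w, s in scores.items()}


# Hypothetical next-word distributions for one history.
general = {"the": 0.5, "stock": 0.1, "market": 0.4}
in_domain = {"the": 0.2, "stock": 0.5, "market": 0.3}

linear = linear_interpolation([general, in_domain], [0.5, 0.5])
log_linear = log_linear_interpolation([general, in_domain], [0.5, 0.5])
print(linear)
print(log_linear)
```

Unlike the linear mixture, the log-linear form needs an explicit normalization over the vocabulary for every history, which is the main practical cost of this kind of combination.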

Archivio istituzionale della ricerca - Università di Palermo