Bayesian Estimation Methods for N-Gram Language Model Adaptation
Stochastic n-gram language models have been successfully applied in continuous speech recognition for several years. Such language models provide many computational advantages but also require huge text corpora for parameter estimation. Moreover, the texts must exactly reflect, in a statistical sense, the user’s language. Estimating a language model on a sample that is not representative severely affects speech recognition performance. A solution to this problem is provided by the Bayesian learning framework. Beyond the classical estimates, a Bayes-derived interpolation model is proposed. Empirical comparisons have been carried out on a 10,000-word radiological reporting domain. Results are provided in terms of perplexity and recognition accuracy.
Efficient Language Model Adaptation through MDI Estimation
This paper presents a method for n-gram language model adaptation based on the principle of minimum discrimination information. A background language model is adapted to fit constraints on its marginal distributions that are derived from new observed data. This work gives a different derivation of the model by Kneser et al. (1997) and extends its application to interpolated language models. The proposed method has been evaluated on an Italian 60K-word broadcast news task.
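As a rough illustration of adapting a background model to new marginal constraints, the sketch below uses unigram rescaling, a commonly used closed form for this kind of adaptation. It is an assumption made for illustration, not the derivation given in the paper: the function name `mdi_adapt`, the dictionary layout, and the exponent `gamma` are all hypothetical.

```python
def mdi_adapt(background, background_unigram, adapt_unigram, gamma=0.5):
    """Rescale each conditional distribution of a background model.

    background:         dict mapping history -> {word: P_B(word | history)}
    background_unigram: dict mapping word -> P_B(word)
    adapt_unigram:      dict mapping word -> unigram estimate from adaptation data
    gamma:              hypothetical damping exponent in (0, 1]
    """
    # Scaling factor per word: ratio of adaptation to background unigram.
    alpha = {
        w: (adapt_unigram[w] / background_unigram[w]) ** gamma
        for w in background_unigram
    }
    adapted = {}
    for history, dist in background.items():
        # Rescale, then renormalize so each distribution sums to one.
        scaled = {w: p * alpha[w] for w, p in dist.items()}
        z = sum(scaled.values())
        adapted[history] = {w: s / z for w, s in scaled.items()}
    return adapted


# Toy example: the adaptation data favors "cat" over "dog".
toy_background = {"the": {"cat": 0.5, "dog": 0.5}}
toy_adapted = mdi_adapt(
    toy_background,
    background_unigram={"cat": 0.5, "dog": 0.5},
    adapt_unigram={"cat": 0.8, "dog": 0.2},
    gamma=1.0,
)
```

With `gamma=1.0` the adapted conditional distribution moves fully toward the adaptation unigram; smaller values damp the shift.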
Adaptive Estimation of N-gram Language Models
Stochastic n-gram language models have been successfully applied in continuous speech recognition for several years. Such language models provide many computational advantages but also require huge text corpora for parameter estimation. Moreover, the texts must exactly reflect, in a statistical sense, the user’s language. In practice, even within a single application domain (e.g. medical reporting), people use language in different ways, and consequently with different statistical features. Estimating a language model on a sample that is not representative severely affects speech recognition performance. A solution to this problem is suggested by the techniques employed in acoustic modeling to adapt a speaker-independent model to a speaker-dependent one. A language model could first be estimated on a large user-independent corpus and then incrementally adapted to the user’s language during system usage. In this paper, the Bayesian and the maximum ‘a posteriori’ adaptation methods are presented and an interpolation model is derived. Moreover, an EM-derived algorithm for estimating the latter model is described. Experimental comparisons have been carried out in terms of perplexity and recognition accuracy. The interpolation model outperforms the classical methods with only a few thousand training words, and it is competitive with language model estimation when enough training data are available.
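To make the idea of an EM-estimated interpolation model concrete, here is a minimal sketch of the standard EM update for a single linear interpolation weight between a background model and a user-adapted model. This is the textbook two-component mixture case and only an assumption about the general shape of such an algorithm; the paper's Bayes-derived model may differ. The function name and its inputs (per-token probabilities on held-out text, all strictly positive) are hypothetical.

```python
def em_interpolation_weight(p_background, p_adapt, iterations=100, lam=0.5):
    """Estimate lam for P(w|h) = lam * P_adapt(w|h) + (1 - lam) * P_background(w|h).

    p_background, p_adapt: equal-length lists holding the probability each
    model assigns to every token of a held-out text (assumed > 0).
    """
    for _ in range(iterations):
        # E-step: posterior that each token was generated by the adapted model.
        posteriors = [
            lam * pa / (lam * pa + (1.0 - lam) * pb)
            for pa, pb in zip(p_adapt, p_background)
        ]
        # M-step: the new weight is the average posterior.
        lam = sum(posteriors) / len(posteriors)
    return lam


# If the adapted model explains every token better, the weight drifts toward 1.
weight = em_interpolation_weight(p_background=[0.1, 0.1], p_adapt=[0.2, 0.3])
```

Each iteration cannot decrease the held-out likelihood, which is why this update is a common choice for tuning interpolation weights.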
A System for the Retrieval of Italian Broadcast News
This paper presents a prototype for the retrieval of Italian broadcast news, which has been developed at ITC-irst. The architecture employs a speech recognition engine for the automatic transcription of audio news. Moreover, it features document indexing based on part-of-speech tagging of text coupled with morphological analysis, and query expansion exploiting the Italian WordNet thesaurus. Query-document matching is based on a statistical term weighting scheme. The system was tested on a 203-story collection of audio news, augmented with 9,500 newspaper articles. The evaluation was based on a 'known item' retrieval task and aimed at evaluating the impact of speech recognition errors and query expansion on retrieval performance.
In-the-field evaluation of a speech based data-entry system
This paper reports on the field test of a speech-based data-entry system jointly developed by ITC-Irst within an EC-funded project. Usability and performance measures relative to an extensive period of usage by a significant group of users are presented and discussed.
Usability Evaluation of a Spoken Data-Entry Interface
This paper reports on the field test of a speech-based data-entry system developed as a follow-up of an EC-funded project. The application domain is the data entry of personnel absence records from a huge historical paper file (about 100,000 records). The application was required by the personnel office of a public administration. The tested system proved both sufficiently simple to make a detailed analysis feasible and sufficiently representative of the potential of spoken data-entry.
Language Model Adaptation through Topic Decomposition and MDI Estimation
This work presents a language model adaptation method combining the latent semantic analysis framework with the minimum discrimination information estimation criterion. In particular, an unsupervised topic model decomposition is built, which makes it possible to infer topic-related word distributions from very short adaptation texts. The resulting word distribution is then used to constrain the estimation of a minimum divergence trigram language model. With respect to previous work, implementation details are discussed that make this approach effective for a large-scale application. Experimental results are provided for a digital library indexing task, i.e. the speech transcription of five historical documentary films. By adapting a trigram language model from very terse content descriptions, i.e. at most ten words, available for each film, a relative word error rate reduction of 3.2% was achieved.
Model Selection Criteria for Acoustic Segmentation
Robust acoustic segmentation has become a critical issue in order to apply speech recognition to audio streams with variable acoustic content, e.g. radio programs. Many techniques in the literature base segmentation on statistical model selection, by applying the Bayesian Information Criterion. This work reviews alternative model selection criteria and presents comparative experiments both under controlled conditions and on a broadcast news corpus.
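For readers unfamiliar with BIC-based segmentation, the following sketch scores one candidate boundary in a one-dimensional feature sequence by comparing a single Gaussian against two Gaussians, one on each side of the boundary. It is a simplified illustration under stated assumptions (scalar features, maximum-likelihood variances, non-constant segments), not the setup evaluated in the paper; `delta_bic` and `penalty_weight` are hypothetical names.

```python
import math
from statistics import pvariance


def delta_bic(x, t, penalty_weight=1.0):
    """Score a candidate change point at index t; positive favors a boundary.

    x: list of scalar features; both x[:t] and x[t:] need at least two
    distinct values so that their variances are nonzero.
    """
    left, right = x[:t], x[t:]
    n, n1, n2 = len(x), len(left), len(right)
    # Log-likelihood gain of two Gaussians over one (ML variance estimates).
    gain = 0.5 * (
        n * math.log(pvariance(x))
        - n1 * math.log(pvariance(left))
        - n2 * math.log(pvariance(right))
    )
    # The two-model hypothesis has 2 extra parameters (one mean, one variance).
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty


# Toy signal whose mean jumps from about 0 to about 5 halfway through.
quiet = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1]
shifted = [5.0 + v for v in quiet]
score_change = delta_bic(quiet + shifted, t=8)
score_flat = delta_bic(quiet + quiet, t=8)
```

In this toy case the score is positive at the true change point and negative when both halves share the same statistics; `penalty_weight` trades missed boundaries against false alarms.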
