Experiments on Hiwire database using Denoising and Adaptation with an hybrid HMM-ANN Model

Authors: MANA F; SCANZIO STEFANO; GEMELLO R
Publication date: 01/01/2007

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Word Confidence Using Duration Models

Authors: SCANZIO S; COLIBRO D; LAFACE Pietro; GEMELLO R.
Publication date: 01/01/2009

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Parallel implementation of Artificial Neural Network training for speech recognition

Authors: MANA F; CUMANI SANDRO; SCANZIO S; GEMELLO R; LAFACE Pietro
Publication date: 01/01/2010

In this paper we describe the implementation of a complete ANN training procedure using the block-mode back-propagation learning algorithm for sequential patterns, such as the observation feature vectors of a speech recognition system, exploiting the high-performance SIMD architecture of GPUs through CUDA and its C-like language interface. We also compare the speed-up obtained by implementing the training procedure using only the multi-thread capabilities of multi-core processors. In our implementation we take into account all the peculiar aspects of training on large-scale sequential patterns, in particular the re-segmentation of the training sentences, the block size for the feed-forward and back-propagation steps, and the transfer of huge amounts of data from host memory to the GPU card. Our approach has been tested by training acoustic models for large-vocabulary speech recognition tasks, showing a sixfold reduction of the time required to train real-world large networks with respect to an already optimized implementation using the Intel MKL libraries. Thanks to these optimizations and to the support of the GPU, the training time for languages with a huge set of training sentences (about one million for Italian) can be reduced from approximately a month to 5 days.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)
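
The abstract above names block-mode back-propagation, in which gradients are accumulated over a block of frames and the weights are updated once per block. The following is a minimal CPU-only sketch of that idea in pure Python, not the authors' CUDA implementation: the single sigmoid unit, the toy OR data, and all names (`train_block_mode`, `block_size`, `lr`) are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_block_mode(frames, targets, block_size, lr=0.5, epochs=500, seed=0):
    """Train one sigmoid unit, updating the weights once per block of frames."""
    rng = random.Random(seed)
    n_in = len(frames[0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n_in)]
    b = 0.0
    updates = 0
    for _ in range(epochs):
        for start in range(0, len(frames), block_size):
            block = range(start, min(start + block_size, len(frames)))
            grad_w = [0.0] * n_in
            grad_b = 0.0
            # Feed-forward and gradient accumulation over the whole block,
            # with the weights held fixed for every frame in the block.
            for i in block:
                y = sigmoid(sum(wj * xj for wj, xj in zip(w, frames[i])) + b)
                err = y - targets[i]  # cross-entropy gradient at the pre-activation
                for j in range(n_in):
                    grad_w[j] += err * frames[i][j]
                grad_b += err
            # Single weight update per block, using the averaged gradient.
            n = len(block)
            w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
            b -= lr * grad_b / n
            updates += 1
    return w, b, updates


def predict(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data (logical OR), standing in for acoustic feature frames.
frames = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
targets = [0.0, 1.0, 1.0, 1.0]
w, b, updates = train_block_mode(frames, targets, block_size=2)
```

With `block_size=2` and four frames, each epoch performs two weight updates. A larger block means fewer, larger batched computations per epoch, which is the kind of trade-off the abstract refers to when it mentions choosing the block size.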

Adaptation of Artificial Neural Networks Avoiding Catastrophic Forgetting

Authors: MANA F; SCANZIO STEFANO; GEMELLO R; LAFACE Pietro; ALBESANO D
Publication date: 01/01/2006

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Adaptation of Hybrid ANN/HMM using Weights Interpolation

Authors: SCANZIO STEFANO; MANA F.; GEMELLO R; LAFACE Pietro
Publication date: 01/01/2006

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Speeding-up Neural Network Training Using Sentence and Frame Selection

Authors: SCANZIO STEFANO; MANA F.; GEMELLO R; LAFACE Pietro
Publication date: 01/01/2007

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Multi source neural networks based on fixed and multiple resolution analysis for speech recognition

Authors: Mana F.; Gemello R.; Albesano D; PEGORARO PAOLO ATTILIO
Publication date: 01/01/2001

This paper reports the results obtained by an Automatic Speech Recognition system when MFCCs, J-RASTA Perceptual Linear Prediction coefficients (J-RASTA PLP) and energies from a Multi Resolution Analysis (MRA) tree of filters are used as input features to a hybrid system in which a Neural Network (NN) provides observation probabilities for a network of Hidden Markov Models (HMM). Furthermore, the paper compares the performance of the system when various combinations of these features are used, showing a WER reduction of 20% with respect to J-RASTA PLP coefficients alone when they are combined with the energies computed at the output of the leaves of an MRA filter tree. Such a combination is practically feasible thanks to an NN architecture designed to integrate multiple features, exploiting the NN's capability of mixing several input parameters without any assumption about their stochastic independence. Recognition is performed on a very large test set including many speakers uttering proper names from different locations of the Italian public telephone network.

Archivio istituzionale della ricerca - Università di Cagliari
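
The abstract above reports a WER reduction of 20% against a J-RASTA PLP baseline. As a rough sketch of how such a figure is usually computed, the code below derives word error rate from a word-level edit distance and then a relative reduction between two systems. The abstract does not say whether its 20% is relative or absolute; the relative reading, the function names, and the example numbers are all assumptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical figures, not taken from the paper: a baseline WER of 10%
# lowered to 8% corresponds to a 20% relative reduction.
example = relative_reduction(0.10, 0.08)
```

Under this reading, a 20% reduction means the combined-feature system makes one fifth fewer word errors than the baseline, whatever the baseline's absolute error rate.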