
    Sistemi di interazione vocale per la domotica (Voice interaction systems for home automation)

    One of the open questions in home automation is the design of human-machine interfaces that are not only effective for controlling a system but also easily accessible. Voice is the natural medium for communicating requests and commands, so a voice interface offers considerable advantages over touch screens, switches, and similar solutions. This thesis aims to build a voice interaction system for home automation that can not only recognize individual spoken commands, but also personalize the requested services through speaker recognition and interact through synthesized speech. For each type of voice interaction, solutions are proposed to overcome the limits of the classical approaches in the literature. First, a distributed speech recognition (DSR) system for lighting control is presented, which implements ad hoc optimizations to operate non-invasively in the environment and to address the problems of a real-world scenario. A speaker identification algorithm is integrated into the DSR system so that commands can be personalized according to the recognized user. A speaker identification system must be able to classify the user from utterances shorter than 5 s. To this end, an algorithm based on the truncated Karhunen-Loève transform is proposed which, on short speech sequences (< 3.5 s), performs better than the conventional technique based on Mel-cepstral coefficients. Finally, a Hidden Markov Model/unit-selection speech synthesis framework based on the Modified Discrete Cosine Transform is proposed, which guarantees perfect reconstruction of the signal and overcomes the limits imposed by the Mel-cepstral technique. The proposed algorithms and system are applied to signals acquired under realistic conditions in order to verify their adequacy.

    Machine Learning in Electronic and Biomedical Engineering

    In recent years, machine learning (ML) algorithms have become of paramount importance in computer science research, in both the electronic and biomedical fields [...

    Principal Tensor Embedding for Unsupervised Tensor Learning

    Tensors and multiway analysis aim to explore the relationships between the variables used to represent the data and to summarize the data with models of reduced dimensionality. However, although this problem has received considerable attention, dimension reduction of high-order tensors remains a challenge. The aim of this article is to provide a nonlinear dimensionality reduction approach, named principal tensor embedding (PTE), for unsupervised tensor learning that is able to derive an explicit nonlinear model of the data. As in standard manifold learning (ML) techniques, it assumes that multidimensional data lie close to a low-dimensional manifold embedded in a high-dimensional space. On the basis of this assumption, a local parametrization of the data that accurately captures its local geometry is derived. From this mathematical framework, a nonlinear stochastic model of the data that depends on a reduced set of latent variables is obtained. In this way, the initial problem of unsupervised learning is reduced to the regression of a nonlinear input-output function, i.e., a supervised learning problem. Extensive experiments on several tensor datasets demonstrate that the proposed ML approach gives competitive performance when compared with other techniques used for data reconstruction and classification.

    A machine learning method to determine intrinsic dimension of time series data

    The estimation of the intrinsic dimension (ID) of data is particularly crucial in the unsupervised learning of nonlinear time series, as it essentially represents the minimum number of parameters needed to describe the data. The aim of this paper is to give both a new theoretical contribution and a machine learning algorithm that can be used for the ID estimation of time series. Several experimental results validate the proposed approach.

    HMM speech synthesis based on MDCT representation

    Hidden Markov model (HMM) based text-to-speech (TTS) has become one of the most promising approaches, as it has proven to be a particularly flexible and robust framework for generating synthetic speech. However, several factors, such as the mel-cepstral vocoder and over-smoothing, degrade the quality of the synthetic speech. This paper presents an HMM speech synthesis technique based on the modified discrete cosine transform (MDCT) representation to cope with these two issues. To this end, we use an analysis/synthesis technique based on the MDCT that guarantees perfect reconstruction of the signal frame from the feature vectors and allows for a 50% overlap between frames without increasing the size of the data vector, in contrast to the conventional mel-cepstral spectral parameters, which do not ensure direct speech waveform reconstruction. Experimental results, evaluated using both objective and subjective tests, show that speech of good quality is obtained.
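The perfect-reconstruction property mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sine window, the zero padding at the signal edges, and the function names `mdct`, `imdct`, and `mdct_analysis_synthesis` are assumptions chosen for illustration. Each frame of 2N samples overlaps its neighbour by 50%, yet yields only N coefficients, and windowed overlap-add of the inverse transforms cancels the time-domain aliasing.

```python
import math


def mdct(frame, window):
    """MDCT of one frame of 2N samples -> N coefficients (analysis window applied)."""
    n_half = len(frame) // 2
    return [
        sum(
            window[n] * frame[n]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half)
        )
        for k in range(n_half)
    ]


def imdct(coeffs, window):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (synthesis window applied)."""
    n_half = len(coeffs)
    return [
        2.0 / n_half * window[n]
        * sum(
            coeffs[k]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half)
        )
        for n in range(2 * n_half)
    ]


def mdct_analysis_synthesis(signal, n_half):
    """Analyse a signal frame by frame (hop N, frame 2N) and rebuild it by overlap-add."""
    if len(signal) % n_half:
        raise ValueError("signal length must be a multiple of n_half")
    # Sine window: w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley condition).
    window = [math.sin(math.pi * (n + 0.5) / (2 * n_half)) for n in range(2 * n_half)]
    # Zero padding so that every real sample is covered by two overlapping frames.
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    output = [0.0] * len(padded)
    coeff_frames = []
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        coeffs = mdct(padded[start:start + 2 * n_half], window)
        coeff_frames.append(coeffs)
        for n, value in enumerate(imdct(coeffs, window)):
            output[start + n] += value  # overlap-add cancels the aliasing terms
    return coeff_frames, output[n_half:-n_half]
```

Calling `mdct_analysis_synthesis(signal, 8)` on a signal whose length is a multiple of 8 returns frames of 8 coefficients each and a reconstruction that should match the input up to floating-point rounding.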

    An Investigation on the Accuracy of Truncated DKLT Representation for Speaker Identification With Short Sequences of Speech Frames

    Speaker identification plays a crucial role in biometric person identification, as systems based on human speech are increasingly used for the recognition of people. Mel frequency cepstral coefficients (MFCCs) have been widely adopted for decades in speech processing to capture speech-specific characteristics with a reduced dimensionality. However, although their ability to decorrelate the vocal source and the vocal tract filter makes them suitable for speech recognition, they greatly attenuate speaker variability, the very characteristic that distinguishes different speakers. This paper presents a theoretical framework and an experimental evaluation showing that reducing the dimension of the features by applying the discrete Karhunen-Loève transform (DKLT) to the log-spectrum of the speech signal gives better performance than conventional MFCC features. In particular, with short sequences of speech frames, typically lasting less than 2 s, the truncated DKLT representation always performed better than the MFCCs in the identification of five speakers in the experiments we performed. Additionally, the framework was tested on up to 100 TIMIT speakers with sequences of less than 3.5 s, showing very good recognition capabilities.
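A truncated DKLT of log-spectrum frames, as described in this abstract, can be sketched as follows. This is an illustrative sketch only, assuming NumPy is available; the frame length, hop size, Hann window, number of retained components, and the function names are assumptions, not values or code taken from the paper. The basis is estimated as the leading eigenvectors of the sample covariance of the log-spectra.

```python
import numpy as np


def log_spectrum_frames(signal, frame_len=256, hop=128):
    """Split a 1-D signal into windowed frames and return their log-magnitude spectra."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    # Small constant avoids log(0) for silent bins.
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)


def fit_dklt(features, n_components):
    """Estimate the mean and the truncated DKLT basis (top covariance eigenvectors)."""
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]


def apply_dklt(features, mean, basis):
    """Project log-spectrum frames onto the truncated DKLT basis."""
    return (features - mean) @ basis
```

For example, `mean, basis = fit_dklt(log_spectrum_frames(x), 12)` followed by `apply_dklt(log_spectrum_frames(x), mean, basis)` gives one 12-dimensional, mutually decorrelated feature vector per frame, which could then be passed to whatever classifier is used for speaker identification.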

    Multi-class ECG beat classification based on a Gaussian mixture model of Karhunen-Loève transform

    Cardiovascular diseases are among the main causes of death around the world. Automatic classification of electrocardiogram (ECG) signals is of paramount importance in the unattended detection of a wide range of heartbeat abnormalities. In this paper, an effective multi-class beat classifier, based on the statistical identification of a minimum-complexity model, is presented. This methodology extracts from the ECG signal the multivariate relationships of its natural modes by means of the separation property of the Karhunen-Loève transform (KLT). Then, it exploits an optimized expectation maximization (EM) algorithm to find the optimal parameters of a Gaussian mixture model, with a focus on reducing the number of parameters. The resulting statistical model is thus based on the estimation of the multivariate probability density function (PDF) that characterizes each beat type. Based on this statistical characterization, a multi-class ECG classification was performed. The experiments, conducted on ECG signals from the MIT-BIH arrhythmia database, demonstrated the validity and, considering the reduced model size, the excellent performance of this technique in classifying ECG signals into different disease categories.

    Wireless surface electromyograph and electrocardiograph system on 802.15.4

    This paper presents a flexible, low-cost wireless system specifically designed to acquire fitness metrics from both surface electromyographic (sEMG) and electrocardiographic (ECG) signals. The system, which can easily be extended to capture and process many other biological signals as well as motion-related body signals, consists of several ultralight wireless sensing nodes that acquire, amplify, digitize, and transmit the biological or mechanical signals to one or more base stations through a 2.4 GHz radio link, using a custom-made communication protocol designed on top of the IEEE 802.15.4 physical layer. The number of wireless nodes the base stations can handle depends on the type of signal being acquired. Each base station is connected through a USB link to a control PC running user interface software for viewing, recording, and analyzing the data. Combined with a smartphone application, the system for acquiring signals from wearable nodes provides a complete platform for monitoring fitness metrics extracted from the signals.

    Embedded Real-Time Vehicle and Pedestrian Detection Using a Compressed Tiny YOLO v3 Architecture

    Vehicle and pedestrian detection (VaPD) is one of the most critical tasks in an advanced driver assistance system, which helps the driver to drive safely and protects pedestrians' lives. VaPD is a typical object detection problem that requires a trade-off among accuracy, speed, and memory consumption. Most existing methods focus on improving detection accuracy while ignoring that VaPD requires real-time detection speed with limited computational resources. Thus, it is of primary importance to study lightweight, real-time VaPD methods for embedded devices, that is, hardware platforms with limited computation and memory resources. To deal with these issues, this paper proposes a low-rank (LR) Tiny YOLO v3 architecture that meets the requirements of real-time VaPD on embedded systems. The architecture was developed starting from Tiny YOLO v3 by adopting a convolutional neural network compression technique based on Tucker tensor decomposition, which reduces the computational complexity of the network. Extensive experiments were carried out on two embedded platforms, Raspberry Pi 4 and NVIDIA Jetson Nano 2 GB, and on two datasets commonly used for VaPD, PASCAL VOC and KITTI, showing the superiority of the LR Tiny YOLO v3 over state-of-the-art networks in obtaining the best compromise between inference time, accuracy, and memory occupancy. Moreover, the proposed architecture meets the requirements of VaPD on embedded systems using only 22% of the memory required by the baseline Tiny YOLO v3 Darknet, and always provides better inference time (36.46 FPS) with only a marginal decrease in accuracy (~2%).
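To give a feel for why a Tucker decomposition shrinks a convolutional layer, the sketch below counts weights before and after a Tucker-2 factorization along the two channel modes. The abstract does not say which Tucker variant or ranks were used, so the Tucker-2 form, the layer sizes, the ranks, and the function names are all hypothetical and chosen only for illustration. In this form, a k x k convolution from `c_in` to `c_out` channels is replaced by a 1 x 1 convolution (`c_in` to `r_in`), a k x k core convolution (`r_in` to `r_out`), and another 1 x 1 convolution (`r_out` to `c_out`).

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def tucker2_params(c_in, c_out, k, r_in, r_out):
    """Number of weights after a Tucker-2 factorization along the channel modes."""
    first_1x1 = c_in * r_in          # input-channel factor matrix
    core = r_in * r_out * k * k      # small k x k core convolution
    last_1x1 = r_out * c_out         # output-channel factor matrix
    return first_1x1 + core + last_1x1


def compression_ratio(c_in, c_out, k, r_in, r_out):
    """Fraction of the original weights that remain after factorization."""
    return tucker2_params(c_in, c_out, k, r_in, r_out) / conv_params(c_in, c_out, k)
```

With the hypothetical values `c_in=256`, `c_out=512`, `k=3`, `r_in=64`, `r_out=128`, the layer goes from 1,179,648 weights to 155,648, about 13% of the original. These numbers are illustrative and are not related to the 22% memory figure reported in the abstract.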

    Reduced complexity algorithm for heart rate monitoring from PPG signals using automatic activity intensity classifier

    Photoplethysmography (PPG) is a well-studied and promising technique for detecting heart rate (HR) using cheap, non-invasive, wrist-wearable sensors that measure the amount of light reflected by the skin, which is related to the blood flow beneath. Still, the main issue is its high sensitivity to motion, which produces severe artifacts in the signal, often impeding accurate HR tracking. In this paper we present a method that combines an automatic activity intensity classifier, used to select the proper amount of artifact cleaning to perform on the signal, with a geometric signal subspace approach to estimate the HR component of the PPG signal. Experimental evaluation is performed on a widely available dataset, and the results are compared to an ECG-derived gold standard.