
    Anomalous sound event detection: A survey of machine learning based methods and applications

    With the development of multi-modal man-machine interaction, audio signal analysis is gaining importance in a field traditionally dominated by video. In particular, anomalous sound event detection offers new options for improving audio-based man-machine interaction in many applications, such as surveillance systems, industrial fault detection and, especially, indoor or outdoor safety monitoring. Event detection from audio can usefully complement visual information and can outperform it in some respects, thus representing a complementary perceptual modality. However, it also presents specific issues and challenges. In this paper, a comprehensive survey of anomalous sound event detection is presented, covering feature extraction methods, datasets, evaluation metrics, methods, applications, and some open challenges and improvement ideas recently raised in the literature.

    Duration modelling and evaluation for Arabic statistical parametric speech synthesis

    Sound duration is responsible for rhythm and speech rate. Furthermore, in some languages phoneme length is an important phonetic and prosodic factor. For example, in Arabic, gemination and vowel quantity are two important characteristics of the language. Therefore, accurate duration modelling is crucial for Arabic TTS systems. This paper aims to improve the modelling of phone duration for Arabic statistical parametric speech synthesis using DNN-based models. In recent years, DNNs have frequently been used instead of HMMs for parametric speech synthesis. Therefore, several variants of DNN-based duration models for Arabic are investigated. The novelty consists in training a specific DNN model for each class of sounds, i.e., short vowels, long vowels, simple consonants and geminated consonants. This choice is motivated by the improvement we previously achieved in the quality of Arabic parametric speech synthesis by introducing two features specific to Arabic, i.e., gemination and vowel quantity, into the standard HTS feature set. Both objective and subjective evaluations show that using a specific model for each class of sounds leads to more accurate modelling of phone duration in Arabic parametric speech synthesis, outperforming state-of-the-art duration modelling systems.
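The "one model per class of sounds" idea can be illustrated with a minimal routing sketch. This is not the authors' implementation: the class names `PerClassDurationModel` and `MeanDurationModel` are hypothetical, and `MeanDurationModel` is a deliberately trivial stand-in for a DNN (it ignores the features and predicts the mean duration of its class). The point shown is only the structure: training data is split by sound class, each class gets its own model, and each phone is predicted by the model of its own class.

```python
from collections import defaultdict
from statistics import mean

# The four sound classes named in the abstract.
SOUND_CLASSES = ("short_vowel", "long_vowel", "simple_consonant", "geminated_consonant")


class MeanDurationModel:
    """Hypothetical stand-in for a DNN: ignores features, predicts the class mean."""

    def fit(self, features, durations):
        self.value = mean(durations)
        return self

    def predict(self, features):
        return self.value


class PerClassDurationModel:
    """Trains one duration model per sound class and routes each phone to it."""

    def __init__(self, make_model=MeanDurationModel):
        self.make_model = make_model  # factory producing a fresh per-class model
        self.models = {}

    def fit(self, samples):
        # samples: iterable of (sound_class, features, duration_ms) tuples
        grouped = defaultdict(lambda: ([], []))
        for sound_class, features, duration in samples:
            if sound_class not in SOUND_CLASSES:
                raise ValueError(f"unknown sound class: {sound_class}")
            feats, durs = grouped[sound_class]
            feats.append(features)
            durs.append(duration)
        # Each class is trained only on its own samples.
        for sound_class, (feats, durs) in grouped.items():
            self.models[sound_class] = self.make_model().fit(feats, durs)
        return self

    def predict(self, sound_class, features):
        # A phone is handled only by the model trained on its own class.
        return self.models[sound_class].predict(features)
```

Swapping `make_model` for a factory that builds a real regressor (e.g. a small neural network with the same `fit`/`predict` shape) would keep the routing unchanged; the durations used to exercise such a sketch would be illustrative, not data from the paper.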

    Emotional Content Comparison in Speech Signal Using Feature Embedding

    Expressive speech processing has improved in recent years. However, it is still hard to detect emotion change within the same speech signal or to compare the emotional content of a pair of speech signals, especially using unlabeled data. Therefore, in this work feature embedding is used to enhance emotional content comparison for pairs of speech signals, cast as a classification task. Feature embedding has been shown to reduce the dimensionality and the intra-feature variance of the input space. Moreover, deep autoencoders have recently been used as a feature embedding tool in several applications, such as image, gene and chemical data classification. In this work, a deep autoencoder is used for feature embedding before classifying the emotional content of pairs of speech signals by vector quantization. Autoencoding was performed following two schemes: on all features together and on each group of features separately. The results show that the autoencoder succeeds in (a) revealing a more compact and clearly separated structure of the mapped features, and (b) improving the classification rates for the similarity/dissimilarity of all emotional content aspects that were compared, i.e., neutrality, arousal and valence, which are used to calculate the emotion identity metric.

    Multimodal Emotion Recognition from Voice and Video Signals

    A promising area of research and development that can significantly increase the efficacy and accuracy of mental health assessments is the use of artificial intelligence (AI) and machine learning algorithms to simultaneously analyse voice and facial expressions in a video stream. More studies are required to fully understand the capabilities and limitations of these technologies and to guarantee their ethical and effective use in clinical settings. Collaborative robots (cobots) have the potential to completely change how mental evaluations of autistic children are approached. ChatGPT is an effective language model that can understand and produce human-like text. When used in conjunction with the cobot, this technology enables children with autism to interact and communicate in a way that is natural to them. In this article, we introduce a novel method for emotion detection that combines voice analysis and facial recognition and has been tested on the IEMOCAP database. The paper concludes with a results section that illustrates the tool's potential use in healthcare.

    Feature Analysis for Emotional Content Comparison in Speech

    Emotional content analysis is increasingly present in speech-based human-machine interaction, for example in emotion recognition and expressive speech synthesis. In this framework, this paper aims to compare the emotional content of a pair of speech signals uttered by different speakers and not necessarily having the same text. This exploratory work employs machine learning methods to analyze emotional content in speech from different angles: (a) evaluating the relevance of the features used in the analysis of emotions, and (b) calculating the similarity of the emotional content independently of speakers and text. The final goal is to provide a metric for comparing emotional content in speech. Such a metric would form the basis for higher-level tasks, such as clustering utterances by emotional content or applying kernel methods to expressive speech analysis.

    Audio surveillance of roads using deep learning and autoencoder-based sample weight initialization

    Road safety has always been a major concern, involving a variety of competences, ranging from government and local authorities to medical caregivers and other service providers. Prompt intervention in emergencies is one of the key factors in minimizing damage. Therefore, real-time surveillance is proposed as an efficient means of detecting problems on roads. Video surveillance alone is not enough to detect serious accidents, since any hazardous behavior on the road may be confused with an accident, which may lead to many false alarms. In contrast, audio processing has the potential to recognize sounds coming from different sources, such as crashes, tire skidding and harsh braking. In recent years, deep learning has become the state of the art for audio event detection. However, because road surveillance recordings are usually dominated by the absence of events, the training process becomes biased. Therefore, a novel method is used to balance the data distribution: the neural network's weights are initialized using an autoencoder trained only on event-related data.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first author of each cited work, and a non-traditional type, which takes into account the first five authors of each cited work. Results indicate that the picture produced by this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, when the same number of top-ranked authors is selected and analyzed, this picture represents fewer specialties in the research field studied than the one produced by traditional first-author co-citation counting. Reasons for these effects are discussed.
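The difference between the two counting schemes can be made concrete with a small sketch. It assumes a simplified data model that the abstract does not specify: each citing paper is a list of cited works, each cited work is an ordered list of author names, and a citing paper adds at most 1 to the count of any author pair it co-cites. The function name `cocitation_counts` and the parameter `k` are hypothetical; `k=1` corresponds to first-author counting and `k=5` to the five-author variant.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, k=1):
    """Count author co-citations (hypothetical helper, simplified model).

    citing_papers: list of citing papers; each is a list of cited works,
    and each cited work is an ordered list of author names.
    k: number of leading authors taken from each cited work
       (k=1 is traditional first-author counting, k=5 the five-author variant).
    Assumption: each citing paper contributes at most 1 to any author pair.
    """
    counts = Counter()
    for references in citing_papers:
        # Authors "cited" by this paper under the chosen counting scheme.
        authors = set()
        for cited_authors in references:
            authors.update(cited_authors[:k])
        # Every unordered pair of distinct cited authors is co-cited once.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts
```

With two toy citing papers, `[["A", "B", "C"], ["D", "E"]]` and `[["A", "C"], ["D"]]`, first-author counting sees only the pair (A, D), whereas `k=5` also links co-authors such as A and C, which is one way the five-author scheme can produce more tightly connected author groups.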