
    Reverb and noise as real-world effects in speech recognition models: a study and a proposal of a feature set

    Reverberation and background noise are common and unavoidable real-world phenomena that hinder automatic speaker recognition systems, particularly because these systems are typically trained on noise-free data. Most models rely on fixed audio feature sets. To evaluate how strongly the features depend on reverberation and noise, this study proposes augmenting the commonly used mel-frequency cepstral coefficients (MFCCs) with relative spectral (RASTA) features. The performance of these features was assessed using noisy data generated by applying reverberation and pink noise to the DEMoS dataset, which includes 56 speakers. Verification models were trained on clean data using MFCCs, RASTA features, or their combination as inputs. The models were then validated on augmented data with progressively increasing noise and reverberation levels. The results indicate that MFCCs struggle to identify the main speaker, while the RASTA method has difficulty with the opposite class. The hybrid feature set, derived from their combination, demonstrates the best overall performance as a compromise between the two. Although the MFCC method is the standard and performs well on clean training data, it shows a significant tendency to misclassify the main speaker in real-world scenarios, which is a critical limitation for modern user-centric verification applications. The hybrid feature set therefore proves effective as a balanced solution, optimizing both sensitivity and specificity.
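    The trade-off described in this abstract (one feature set missing the main speaker, the other struggling with the opposite class) can be made concrete with sensitivity and specificity. The following is a minimal sketch only: the helper name `sensitivity_specificity` and the toy labels and predictions are hypothetical illustrations, not data or code from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Label 1 stands for the main (target) speaker, 0 for the opposite class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy predictions, invented purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
misses_target = [1, 0, 0, 1, 0, 0, 0, 0]     # often rejects the main speaker
accepts_impostor = [1, 1, 1, 1, 1, 1, 0, 0]  # often accepts the opposite class

low_sensitivity = sensitivity_specificity(y_true, misses_target)
low_specificity = sensitivity_specificity(y_true, accepts_impostor)
```

    In this toy case the first model has perfect specificity but low sensitivity and the second has the reverse, which is the kind of imbalance a combined feature set would aim to reduce.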

    NMF based system for speaker identification


    Multi-class machine learning detection of Edema, Vocal Paralysis and Vocal Nodules through voice

    This paper aims to differentiate causes of dysphonia, namely Reinke's Edema, Vocal Cord Paralysis, and Vocal Nodules, also including healthy subjects. A proprietary dataset of 245 subjects underwent acoustic feature extraction and selection, and four classifiers were trained for multi-class classification. Loudness/Energy-related features were among the most effective, which is in line with the fact that the three diseases all cause different impairments in terms of voice volume. Cepstrum is also confirmed as an effective domain. The four classifiers obtained comparable performances, with Random Forest having the highest accuracy at 78.4% and Naïve Bayes offering the best compromise in terms of recall. Healthy subjects always lead to a higher recall, which is in line with the fact that identifying dysphonia is an easier task than differentiating among its causes

    AKT-dependent phosphorylation of the adenosine deaminases ADAR-1 and -2 inhibits deaminase activity

    Murine thymoma viral oncogene homolog (AKT) kinases target both cytosolic and nuclear substrates for phosphorylation. Whereas the cytosolic substrates are known to be closely associated with the regulation of apoptosis and autophagy or metabolism and protein synthesis, the nuclear substrates are, for the most part, poorly understood. To better define the role of nuclear AKT, potential AKT substrates were isolated from the nuclear lysates of leukemic cell lines using a phosphorylated AKT substrate antibody and identified by tandem mass spectrometry. Among the proteins identified was adenosine deaminase acting on RNA (ADAR)1p110, the predominant nuclear isoform of the adenosine deaminase acting on double-stranded RNA. Coimmunoprecipitation studies and in vitro kinase assays revealed that AKT-1, -2, and -3 interact with both ADAR1p110 and ADAR2 and phosphorylate these RNA editases. Using site-directed mutagenesis of suspected AKT phosphorylation sites, AKT was found to primarily phosphorylate ADAR1p110 and ADAR2 on T738 and T553, respectively, and overexpression of the phosphomimic mutants ADAR1p110 (T738D) and ADAR2 (T553D) resulted in a 50-100% reduction in editase activity. Thus, activation of AKT has a direct and major impact on RNA editing. (Bavelloni, A., Focaccia, E., Piazzi, M., Raffini, M., Cesarini, V., Tomaselli, S., Orsini, A., Ratti, S., Faenza, I., Cocco, L., Gallo, A., Blalock, W. L. AKT-dependent phosphorylation of the adenosine deaminases ADAR-1 and -2 inhibits deaminase activity.)

    Machine learning- and statistical-based voice analysis of Parkinson's disease patients: A survey

    The preliminary diagnosis and evaluation of the presence and/or severity of Parkinson’s disease is crucial in controlling the progress of the disease. Real-time, non-invasive methodologies based on machine learning-enhanced voice analysis are attracting growing interest as the potential of this field becomes apparent. Specifically, acoustic features are employed in many machine learning techniques, and could also function as indicators of the overall state of the subjects’ voice. This review aims to identify the most widely employed and promising feature-based machine learning methodologies, highlighting baselines and state-of-the-art solutions. A total of 102 works plus 5 review articles were selected from the IEEE Xplore, PubMed, Elsevier, and Web of Science electronic databases. A statistical assessment identifies the most frequently used features as well as those deemed most effective; an overview of algorithms, public datasets, toolboxes, and general metadata is also provided. According to our results, Jitter, Shimmer, Harmonic-to-Noise Ratio, Fundamental Frequency, and Mel Frequency Cepstral Coefficients are the most widely adopted features. In addition, it is worth noting a fair prevalence of glottal-like models and additional filtering options, such as Detrended Fluctuation Analysis.

    Beyond breathalyzers: AI-powered speech analysis for alcohol intoxication detection

    Detecting potential alcohol inebriation or intoxication status holds paramount significance for social prevention and security. Beyond its association with long-term health effects, alcohol consumption can lead to immediate consequences, including reduced control over one's actions, with traffic fatalities representing one of the most tragic outcomes. This study leveraged the Alcohol Language corpus, involving 162 subjects recorded both in sober and inebriated states. Participants provided 60 speech samples while sober and 30 when intoxicated, all within a realistic car setting using head-mounted microphones. Our research endeavors encompassed comprehensive stratified statistical tests to examine the impact of alcohol consumption on speech production while uncovering the influence of covariates such as age, gender, and drinking habits. Additionally, we introduced a speaker-neutral machine learning algorithm, based on the Domain-Adversarial Neural Network architecture. This approach aimed to overcome challenges posed by individual differences that often complicate intoxicated speech analysis. Notably, our findings highlighted the effectiveness of features like the RASTA-filtered auditory spectrum. Nevertheless, the results from statistical tests emphasized the need for techniques that minimize inter-subject variability. As for the automatic classification, the proposed architecture exhibited promising results, yielding a classification accuracy slightly exceeding 70% on an independent test set. Although preliminary, our research demonstrates the potential for detecting alcohol-induced speech changes, benefiting societal well-being and security. It also underscores the importance of developing strategies that account for individual differences while harnessing the power of automatic models to effectively distinguish between sober and intoxicated individuals

    The adaptive potential of RNA editing-mediated miRNA-retargeting in cancer

    A-to-I RNA editing is a post-transcriptional mechanism that converts the genomically coded Adenosine (A) into Inosine (I) at the RNA level. This type of RNA editing is the most frequent in humans and is mediated by the ADAR enzymes. RNA editing can alter the genetic code of mRNAs, but also affect the functions of noncoding RNAs such as miRNAs. Recent studies have identified thousands of microRNA editing events in different cancer types. However, the important role played by miRNA-editing in cancer has been reported for just a few microRNAs. Herein, we recapitulate the current studies on cancer-related microRNA editing and discuss their importance in tumor growth and progression. This article is part of a Special Issue entitled: mRNA modifications in gene expression control edited by Dr. Soller Matthias and Dr. Fray Rupert

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
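    The two counting schemes compared in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: each citing paper is represented as a list of cited works, each cited work as an ordered list of author names, and authors of the same cited work are also treated as co-cited (a simplification). The function name `cocitation_counts`, the `max_authors` parameter, and the toy data are hypothetical and not taken from the study.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations across citing papers.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names. max_authors=1 corresponds to first-author
    counting; max_authors=5 credits the first five authors of each work.
    """
    counts = Counter()
    for references in citing_papers:
        # Collect the distinct authors credited within this citing paper.
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair of distinct authors is co-cited once per paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Toy data (hypothetical): two citing papers and the works they cite.
papers = [
    [["White", "Griffith"], ["Small"]],
    [["White", "McCain"], ["Small", "Griffith"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

    With first-author counting only the White/Small pair appears, whereas counting up to five authors also surfaces pairs involving Griffith and McCain, which illustrates why the two schemes can yield different author maps.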