    Adaptive Algorithms for Intelligent Acoustic Interfaces

    Modern speech communications are evolving in a new direction that involves users in a more perceptive way: the immersive experience, which may be considered the “last-mile” problem of telecommunications. One of the main features of immersive communications is distant talking, i.e. hands-free (in the broad sense) speech communication without body-worn or tethered microphones, which takes place in a multisource environment where interfering signals may degrade the communication quality and the intelligibility of the desired speech source. In order to preserve speech quality, intelligent acoustic interfaces may be used. An intelligent acoustic interface may comprise multiple microphones and loudspeakers, and its distinguishing feature is that it models the acoustic channel in order to adapt to user requirements and to environmental conditions. This is why intelligent acoustic interfaces are based on adaptive filtering algorithms. Acoustic path modelling entails a set of problems that have to be taken into account when designing an adaptive filtering algorithm. Such problems may be generated by a linear or a nonlinear process and can be tackled by linear or nonlinear adaptive algorithms, respectively. In this work we consider such modelling problems and propose novel, effective adaptive algorithms that allow acoustic interfaces to be robust against any interfering signals, thus preserving the perceived quality of the desired speech signals. As regards linear adaptive algorithms, a class of adaptive filters based on the sparse nature of the acoustic impulse response has recently been proposed. We adopt this class, named proportionate adaptive filters, and derive a general framework from which any linear adaptive algorithm can be obtained. Using this framework, we also propose some efficient proportionate adaptive algorithms expressly designed to tackle problems of a linear nature.
    On the other hand, in order to address problems deriving from a nonlinear process, we propose a novel filtering model which performs a nonlinear transformation by means of functional links. Using this nonlinear model, we propose functional link adaptive filters, which provide an efficient solution to the modelling of a nonlinear acoustic channel. Finally, we introduce robust filtering architectures based on adaptive combinations of filters that allow acoustic interfaces to adapt more effectively to environmental conditions, thus providing a powerful means for immersive speech communications.
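
    As a rough illustration of the proportionate idea named in the abstract above, the sketch below applies the textbook PNLMS update to a synthetic sparse channel. It is not the general framework derived in the thesis. The function name `pnlms_identify`, the parameter values, and the noise-free white-noise input are all assumptions made for this example.

```python
import random


def pnlms_identify(h_true, n_samples=4000, mu=0.5, rho=0.01,
                   delta_p=0.01, eps=1e-6, seed=0):
    """Identify a sparse FIR channel with a textbook PNLMS update.

    All parameter values are illustrative assumptions, not taken from
    the thesis summarized above.
    """
    rng = random.Random(seed)
    n_taps = len(h_true)
    w = [0.0] * n_taps        # adaptive filter coefficients
    x_buf = [0.0] * n_taps    # most recent input samples, newest first
    for _ in range(n_samples):
        x_buf = [rng.gauss(0.0, 1.0)] + x_buf[:-1]
        d = sum(h * x for h, x in zip(h_true, x_buf))       # desired signal
        e = d - sum(wi * x for wi, x in zip(w, x_buf))      # a-priori error
        # Proportionate gains: larger coefficients get larger step sizes.
        w_max = max(delta_p, max(abs(wi) for wi in w))
        gamma = [max(rho * w_max, abs(wi)) for wi in w]
        mean_gamma = sum(gamma) / n_taps
        g = [gi / mean_gamma for gi in gamma]
        # Normalized update weighted by the per-tap gains.
        norm = sum(gi * x * x for gi, x in zip(g, x_buf)) + eps
        w = [wi + mu * gi * e * x / norm
             for wi, gi, x in zip(w, g, x_buf)]
    return w


# Hypothetical sparse impulse response: two active taps out of sixteen.
h = [0.0] * 16
h[3], h[10] = 1.0, -0.5
w_hat = pnlms_identify(h)
```

    Because the gains are proportional to the current coefficient magnitudes, the few active taps of a sparse response adapt faster than they would under a uniform step size, which is the property the abstract attributes to this class of filters.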

    Nonlinear spline adaptive filtering

    In this paper a new class of nonlinear adaptive filters, consisting of a linear combiner followed by a flexible memoryless function, is presented. The nonlinear function involved in the adaptation process is based on a spline function that can be modified during learning. The spline control points are adaptively changed using gradient-based techniques. B-splines and Catmull-Rom splines are used because they allow simple constraints to be imposed on the control parameters. This new kind of adaptive function is then applied to the output of a linear adaptive filter and is used for the identification of Wiener-type nonlinear systems. In addition, we derive a simple form of the adaptation algorithm and an upper bound on the choice of the step size. Some experimental results are also presented to demonstrate the effectiveness of the proposed method. (c) 2012 Elsevier B.V. All rights reserved.
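
    The memoryless spline nonlinearity described above can be sketched as a Catmull-Rom interpolation over uniformly spaced control points, with an LMS-style gradient step on the four control points that are active for a given input. This is a minimal sketch under assumptions: the helper names (`spline_eval`, `spline_update`), the uniform grid centred on zero, and the step size are hypothetical, and the linear combiner that precedes the spline in the paper is omitted.

```python
import math

# Catmull-Rom basis matrix; rows multiply the powers [u^3, u^2, u, 1].
CR = [[-0.5, 1.5, -1.5, 0.5],
      [1.0, -2.5, 2.0, -0.5],
      [-0.5, 0.0, 0.5, 0.0],
      [0.0, 1.0, 0.0, 0.0]]


def _span(s, n_points, dx):
    """Return the span index j and local abscissa u for input s.

    Control point k sits at (k - (n_points - 1) / 2) * dx. The index is
    clamped so that four control points are always available, which
    means inputs outside the grid are extrapolated.
    """
    pos = s / dx + (n_points - 1) / 2.0
    j = min(max(int(math.floor(pos)), 1), n_points - 3)
    return j, pos - j


def _weights(u):
    """Blend weights of the four active control points."""
    powers = [u ** 3, u ** 2, u, 1.0]
    return [sum(powers[r] * CR[r][c] for r in range(4)) for c in range(4)]


def spline_eval(s, q, dx):
    """Evaluate the spline nonlinearity at s."""
    j, u = _span(s, len(q), dx)
    b = _weights(u)
    return sum(b[c] * q[j - 1 + c] for c in range(4))


def spline_update(s, d, q, dx, mu=0.1):
    """One gradient step on the active control points (modifies q in place).

    Returns the a-priori error d - spline_eval(s).
    """
    j, u = _span(s, len(q), dx)
    b = _weights(u)
    e = d - sum(b[c] * q[j - 1 + c] for c in range(4))
    for c in range(4):
        q[j - 1 + c] += mu * e * b[c]
    return e


# 21 control points initialised on the identity line, spacing 0.2.
dx = 0.2
q = [(k - 10) * dx for k in range(21)]
y = spline_eval(0.33, q, dx)   # close to 0.33 while q is the identity
```

    Only four control points change per sample, so each update is local: the shape of the nonlinearity is adjusted near the current operating point and left untouched elsewhere.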

    Frequency domain quaternion adaptive filters: Algorithms and convergence performance

    Recently, adaptive filtering was extended to quaternion-valued systems. Quaternion-valued algorithms exhibit improved geometrical properties compared with real- and complex-valued algorithms. Moreover, working in the frequency domain allows fast execution along with good convergence performance. In this work, we propose three different quaternion-valued adaptive algorithms operating in the frequency domain. Convergence properties are also analyzed: in particular, the step-size stability range is obtained in relation to the eigenvalues of the input autocorrelation matrix, and the Excess Mean Square Error (EMSE) is expressed in relation to the algorithm parameters. Finally, simulations support the proposal.
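
    Quaternion-valued filters differ from real- and complex-valued ones mainly in that the underlying product is not commutative. The sketch below shows only that algebra (the Hamilton product, the conjugate, and a time-domain quaternion FIR output); it does not implement the frequency-domain algorithms of the paper. The tuple representation `(a, b, c, d)` for `a + bi + cj + dk`, the function names, and the choice to place the weight on the left of each product are assumptions for illustration.

```python
def hamilton(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def conjugate(q):
    """Quaternion conjugate: negate the three imaginary parts."""
    a, b, c, d = q
    return (a, -b, -c, -d)


def quat_fir_output(weights, inputs):
    """Output of a quaternion FIR filter, y = sum_k w_k * x_k.

    The weight is placed on the left of each product here; since the
    product is not commutative, the opposite convention gives a
    different filter.
    """
    y = (0.0, 0.0, 0.0, 0.0)
    for w, x in zip(weights, inputs):
        term = hamilton(w, x)
        y = tuple(yi + ti for yi, ti in zip(y, term))
    return y


i_unit = (0, 1, 0, 0)
j_unit = (0, 0, 1, 0)
ij = hamilton(i_unit, j_unit)   # equals k
ji = hamilton(j_unit, i_unit)   # equals -k, so the order matters
```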

    Intelligent acoustic interfaces for immersive audio

    Emerging audio technologies prioritize the perceptual quality of audio signals, thus offering users an immersive audio experience that involves both listening to and acquiring audio signals. In such a scenario, a fundamental role is played by intelligent acoustic interfaces, which aim at acquiring audio information, processing it, and returning the processed information while fulfilling the quality requirements demanded by users. In this work we introduce intelligent acoustic interfaces for an immersive audio experience and we demonstrate their effectiveness within the context of immersive speech communications. In particular, we introduce an intelligent acoustic interface composed of a combined adaptive beamforming scheme in conjunction with a microphone array, which is able to enhance the processed signals in immersive scenarios.

    Combined adaptive beamforming schemes for nonstationary interfering noise reduction

    This paper introduces new adaptive beamforming methods for nonstationary noise reduction, designed to be robust against broadband interfering signals. In particular, we propose combined beamforming schemes within a standard adaptive beamforming system, such as the generalized sidelobe canceller (GSC). The novelty of such combined adaptive beamformers lies in the use of different adaptive sidelobe cancelling structures, which allow the system to achieve robustness in nonstationary noisy environments. The combined structures are based on the convex combination of two multiple-input single-output (MISO) adaptive systems with complementary capabilities. The whole beamformer benefits from this combination and is able to preserve the best properties of each system. We introduce two different adaptive schemes, whose difference lies in the way the MISO systems are combined. Moreover, we present a further adaptive beamforming scheme which generalizes the previous techniques, thus improving the robustness against nonstationary interfering signals in multisource environments. The effectiveness of the proposed systems is also assessed in a nonstationary dense multipath environment. The experiments show that the proposed combined beamforming schemes are capable of enhancing the desired signal even in the presence of nonstationary interfering signals. (c) 2013 Elsevier B.V. All rights reserved.
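
    The convex combination at the core of the schemes above can be illustrated on a much simpler problem: two single-channel LMS filters, one fast and one slow, mixed by a parameter that is itself adapted. This is a simplified sketch and not the GSC-based MISO beamformer of the paper. The sigmoid parameterization of the mixing weight, the clipping range, the function name `combined_lms`, and all numeric values are assumptions chosen for the example.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def combined_lms(h_true, n_samples=3000, mu_fast=0.05, mu_slow=0.005,
                 mu_a=10.0, noise_std=0.05, seed=1):
    """Convex combination of a fast and a slow LMS filter.

    Returns the last mixing weight and the mean squared error of each
    component filter and of the combination. Parameter values are
    illustrative assumptions.
    """
    rng = random.Random(seed)
    n_taps = len(h_true)
    w_fast = [0.0] * n_taps
    w_slow = [0.0] * n_taps
    a = 0.0                   # auxiliary variable; lam = sigmoid(a)
    lam = 0.5
    x_buf = [0.0] * n_taps
    sq_err = {"fast": 0.0, "slow": 0.0, "combined": 0.0}
    for _ in range(n_samples):
        x_buf = [rng.gauss(0.0, 1.0)] + x_buf[:-1]
        d = dot(h_true, x_buf) + rng.gauss(0.0, noise_std)
        y_fast = dot(w_fast, x_buf)
        y_slow = dot(w_slow, x_buf)
        lam = 1.0 / (1.0 + math.exp(-a))
        y = lam * y_fast + (1.0 - lam) * y_slow     # convex combination
        e_fast, e_slow, e = d - y_fast, d - y_slow, d - y
        # Each component adapts independently on its own error.
        w_fast = [w + mu_fast * e_fast * x for w, x in zip(w_fast, x_buf)]
        w_slow = [w + mu_slow * e_slow * x for w, x in zip(w_slow, x_buf)]
        # Gradient step on the combined error with respect to a,
        # clipped so that lam never saturates at 0 or 1.
        a += mu_a * e * (y_fast - y_slow) * lam * (1.0 - lam)
        a = max(-4.0, min(4.0, a))
        sq_err["fast"] += e_fast ** 2
        sq_err["slow"] += e_slow ** 2
        sq_err["combined"] += e ** 2
    mse = {name: total / n_samples for name, total in sq_err.items()}
    return lam, mse


# Hypothetical 8-tap system to identify.
h = [0.8, -0.4, 0.2, 0.1, 0.0, -0.05, 0.02, 0.01]
lam_final, mse = combined_lms(h)
```

    The mixing weight moves toward whichever component currently has the smaller error, which is how a combination of filters with complementary capabilities can track the better one as conditions change.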

    Hierarchical hypercomplex network for multimodal emotion recognition

    Emotion recognition is relevant in various domains, ranging from healthcare to human-computer interaction. Physiological signals, being beyond voluntary control, offer reliable information for this purpose, unlike speech and facial expressions, which can be controlled at will. They reflect genuine emotional responses, devoid of conscious manipulation, thereby enhancing the credibility of emotion recognition systems. Nonetheless, multimodal emotion recognition with deep learning models remains a relatively unexplored field. In this paper, we introduce a fully hypercomplex network with a hierarchical learning structure to fully capture correlations. Specifically, at the encoder level, the model learns intra-modal relations among the different channels of each input signal. Then, a hypercomplex fusion module learns inter-modal relations among the embeddings of the different modalities. The main novelty is in exploiting intra-modal relations by endowing the encoders with parameterized hypercomplex convolutions (PHCs) that, thanks to hypercomplex algebra, can capture inter-channel interactions within single modalities. The fusion module, in turn, comprises parameterized hypercomplex multiplications (PHMs) that can model inter-modal correlations. The proposed architecture surpasses state-of-the-art models on the MAHNOB-HCI dataset for emotion recognition, specifically in classifying valence and arousal from electroencephalograms (EEGs) and peripheral physiological signals. The code of this study is available at https://github.com/ispamm/MHyEEG
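
    A parameterized hypercomplex multiplication is commonly written as a weight matrix built from a sum of Kronecker products, W = A_1 ⊗ F_1 + ... + A_n ⊗ F_n, where the small matrices A_i encode the algebra rules and the F_i hold the filter weights. The sketch below builds such a matrix in plain Python. It is not taken from the linked repository: the function names, the list-of-lists matrix representation, and the fixed A_i used in the usage example are assumptions (in a PHM layer the A_i are learned).

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]


def phm_weight(a_mats, f_mats):
    """Assemble W as the sum over i of kron(A_i, F_i)."""
    w = None
    for a, f in zip(a_mats, f_mats):
        term = kron(a, f)
        if w is None:
            w = term
        else:
            w = [[wi + ti for wi, ti in zip(w_row, t_row)]
                 for w_row, t_row in zip(w, term)]
    return w


def phm_forward(x, a_mats, f_mats):
    """Apply the assembled weight matrix to an input vector x."""
    w = phm_weight(a_mats, f_mats)
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def phm_param_count(n, k, d):
    """Parameters of a k-by-d PHM layer: n A-matrices of size n-by-n
    plus n F-matrices of size (k/n)-by-(d/n)."""
    return n ** 3 + (k * d) // n


# With n = 2 and these fixed A matrices, the layer reproduces complex
# multiplication: (3 + 4i) * (1 + 2i) = -5 + 10i.
a_mats = [[[1, 0], [0, 1]], [[0, -1], [1, 0]]]
f_mats = [[[3]], [[4]]]
y = phm_forward([1, 2], a_mats, f_mats)
```

    For a 64-by-64 layer with n = 4, `phm_param_count` gives 1088 parameters against 4096 for a dense real-valued matrix, which is the usual motivation for this construction.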

    Novel decorrelation approach for an advanced multichannel acoustic echo cancellation system

    A multichannel sound reproduction system aims at offering an immersive experience by exploiting multiple microphones and loudspeakers. In the case of multichannel acoustic echo cancellation, a suitable solution for overcoming the well-known non-uniqueness problem and an appropriate choice of the adaptive algorithm become essential to improve the audio reproduction quality. In this paper, an advanced system is proposed based on the introduction of a multichannel decorrelation solution exploiting the missing-fundamental phenomenon and a combined multiple-input multiple-output architecture updated using the multichannel affine projection algorithm. Experimental results demonstrate the effectiveness of the presented framework in terms of objective and subjective measures, providing a suitable solution for echo cancellation. © 2014 EURASIP.

    Nonlinear system identification using IIR spline adaptive filters

    The aim of this paper is to extend our previous work on a recent class of nonlinear filters called Spline Adaptive Filters (SAFs) by implementing the linear part of the Wiener architecture with an IIR filter instead of an FIR one. The new learning algorithm is derived using an LMS approach, and a bound on the choice of the learning rate is also proposed. Some experimental results show the effectiveness of the proposed idea.

    Online sequential extreme learning machine with kernels

    The extreme learning machine (ELM) was recently proposed as a unifying framework for different families of learning algorithms. The classical ELM model consists of a linear combination of a fixed number of nonlinear expansions of the input vector. Learning in ELM is hence equivalent to finding the optimal weights that minimize the error on a dataset. The update works in batch mode, either with explicit feature mappings or with implicit mappings defined by kernels. Although an online version has been proposed for the former, no work has been done up to this point for the latter, and whether an efficient learning algorithm for online kernel-based ELM exists remains an open problem. By making explicit some connections between nonlinear adaptive filtering and ELM theory, in this brief, we present an algorithm for this task. In particular, we propose a straightforward extension of the well-known kernel recursive least-squares algorithm, belonging to the kernel adaptive filtering (KAF) family, to the ELM framework. We call the resulting algorithm the kernel online sequential ELM (KOS-ELM). Moreover, we consider two different criteria used in the KAF field to obtain sparse filters and extend them to our context. We show that KOS-ELM, with their integration, can result in a highly efficient algorithm, both in terms of obtained generalization error and training time. Empirical evaluations demonstrate interesting results on some benchmark datasets.

    User-driven quality enhancement for audio signal processing

    Classical methods for audio and speech enhancement are often based on error-driven optimization strategies, such as mean-square error minimization. However, these approaches do not always satisfy the quality requirements demanded by users of the system. In order to meet subjective specifications, we put forward the idea of a user-driven approach to audio enhancement through the inclusion of an interactive evolutionary algorithm (IEA) in the optimization stage. In this way, the performance of the system can be adapted to any user in a principled and systematic way, thus reflecting the desired subjective quality. Experiments in the context of echo cancellation support the proposed methodology, showing a statistically significant advantage of the proposed framework with respect to classical approaches.