HRTF selection by anthropometric regression for improving horizontal localization accuracy
This work focuses on objective Head-Related Transfer Function (HRTF) selection from anthropometric measurements, with the aim of minimizing localization error in the frontal half of the horizontal plane. Localization predictions for every pair of 90 subjects in the HUTUBS database are first computed through an auditory model based on interaural time differences, and an error metric based on the predicted lateral error is derived. A multiple stepwise linear regression model for predicting error from intersubject anthropometric differences is then built on a subset of subjects and evaluated on a complementary test set. Results show that, using just three anthropometric parameters of the head and torso (head width, head depth, and shoulder circumference), the model is able to identify non-individual HRTFs whose predicted horizontal localization error generally lies below the localization blur. With fewer anthropometric parameters, this result is not guaranteed.
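The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the regression coefficients or the exact form of the anthropometric differences, so the weights, the use of absolute differences, the measurement values, and the names `predicted_error` and `select_hrtf` are all hypothetical.

```python
def predicted_error(target, candidate, weights, intercept=0.0):
    """Linear prediction of localization error from anthropometric differences.

    Assumption: the predictors are absolute differences between the target
    listener and a candidate subject (head width, head depth, shoulder
    circumference). The weights and intercept are hypothetical placeholders.
    """
    return intercept + sum(
        w * abs(t - c) for w, t, c in zip(weights, target, candidate)
    )


def select_hrtf(target, candidates, weights, intercept=0.0):
    """Return the name of the candidate with the lowest predicted error."""
    return min(
        candidates,
        key=lambda name: predicted_error(target, candidates[name], weights, intercept),
    )


# Illustrative measurements in cm: (head width, head depth, shoulder circumference).
target = (15.0, 19.0, 110.0)
candidates = {
    "A": (15.5, 19.5, 112.0),
    "B": (14.0, 21.0, 100.0),
}
weights = (1.0, 1.0, 0.1)  # hypothetical, not taken from the paper

print(select_hrtf(target, candidates, weights))  # prints "A"
```

In a real setting the weights would come from fitting the regression on a training subset of subjects, as the abstract describes.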
Are spectral elevation cues in head-related transfer functions distance-independent?
As its title indicates, this paper addresses one of the still-open questions in sound localization: is our perception of the elevation of a sound source affected by the distance of the source? The problem is addressed through the analysis of a recently published distance-dependent head-related transfer function (HRTF) database, which includes the responses of a single subject on a spatial grid spanning 14 elevation angles, 72 azimuth angles, and 8 distances between 20 and 160 cm. HRTFs sharing the same angular coordinates but measured at different distances are compared through spectral distortion and notch frequency deviation measurements. Results indicate that, even though spectral elevation cues can be assumed independent of source distance for the majority of source directions, near-field HRTFs for sources close to the contralateral ear or around the horizontal plane on the ipsilateral side of the head are significantly affected by distance-dependent pinna reflections and spectral modifications.
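As an illustration of the spectral distortion comparison mentioned above, the sketch below uses one common definition: the root-mean-square difference, in dB, between two magnitude responses sampled at the same frequency bins. The abstract does not state the exact formula or frequency range used in the paper, so this definition, the function name, and the sample values are assumptions.

```python
import math


def spectral_distortion(mag_a, mag_b):
    """RMS log-magnitude difference in dB between two magnitude responses.

    Both inputs are sequences of positive linear magnitudes sampled at the
    same frequency bins.
    """
    if len(mag_a) != len(mag_b) or not mag_a:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_db = [(20.0 * math.log10(a / b)) ** 2 for a, b in zip(mag_a, mag_b)]
    return math.sqrt(sum(squared_db) / len(squared_db))


# Illustrative magnitudes for the same direction at two distances.
near = [1.0, 0.5, 0.1, 0.8]
far = [1.0, 0.5, 0.2, 0.8]

print(round(spectral_distortion(near, far), 2))  # prints 3.01
```

Only the third bin differs here (by a factor of two, about 6 dB), and averaging over four bins gives roughly 3 dB.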
Auditory model based subsetting of Head-Related Transfer Function datasets
The rising availability of public head-related transfer function (HRTF) data, measured on hundreds of different individuals, offers a user the possibility to select the best-matching non-individual HRTF from a wide catalogue. To this end, reducing the number of alternatives to a small subset of candidate HRTFs is the first step towards an efficient selection process. This article outlines a novel HRTF subset selection algorithm based on auditory-model vertical localization predictions and a greedy heuristic, designed to identify a representative HRTF subset from a catalogue including the three biggest public datasets currently available (373 HRTFs overall). The resulting subset (6 HRTFs) is then evaluated on a fourth, independent dataset. Auditory model predictions show that, for over 95% of the subjects in this dataset, at least one HRTF in the representative subset deviates only minimally in vertical localization error from the best available non-individual HRTF in the catalogue.
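A greedy subset selection of this kind might look like the sketch below. It is not the article's algorithm: the abstract does not specify the heuristic, so a set-cover-style greedy rule is assumed, where a subject counts as "covered" by an HRTF whose predicted error is within a tolerance `tol` of that subject's best error in the catalogue. The error matrix, `tol`, and the function name are hypothetical.

```python
def greedy_subset(err, tol):
    """Pick HRTF indices so that every subject is covered by at least one.

    err[s][h] is the predicted localization error of subject s with HRTF h.
    A subject is covered by HRTF h when err[s][h] is within `tol` of that
    subject's best (lowest) error over the whole catalogue.
    """
    if tol < 0:
        raise ValueError("tol must be non-negative")
    n_subjects, n_hrtfs = len(err), len(err[0])
    best = [min(row) for row in err]
    covers = [
        {s for s in range(n_subjects) if err[s][h] - best[s] <= tol}
        for h in range(n_hrtfs)
    ]
    uncovered = set(range(n_subjects))
    chosen = []
    while uncovered:
        # Greedy step: take the HRTF that covers the most uncovered subjects.
        h = max(range(n_hrtfs), key=lambda j: len(covers[j] & uncovered))
        chosen.append(h)
        uncovered -= covers[h]
    return chosen


# Illustrative error matrix: 4 subjects (rows) by 3 HRTFs (columns).
err = [
    [10, 12, 30],
    [11, 10, 25],
    [40, 38, 20],
    [35, 36, 22],
]

print(greedy_subset(err, tol=2))  # prints [0, 2]
```

The loop always terminates because each subject's own best HRTF covers it, so every greedy step removes at least one subject from the uncovered set.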
Individual Three-dimensional Spatial Auditory Displays for Immersive Virtual Environments
Estimation of pinna notch frequency from anthropometry: An improved linear model based on principal component analysis and feature selection
In this paper, anthropometric data from a database of Head-Related Transfer Functions (HRTFs) are used to estimate the frequency of the first pinna notch in the frontal part of the median plane. Given the presence of high correlations between some of the anthropometric features, as well as repeated values for observations of the same subject, we propose the introduction of Principal Component Analysis (PCA) to project the features onto a space where they are more separated. We then construct a regression model employing forward stepwise feature selection to choose the principal components most capable of predicting notch frequencies. Our results show that, by using a linear regression model with as few as three principal components, we can predict notch frequencies with a cross-validation mean absolute error of about 600 Hz.
A hybrid approach to structural modeling of individualized HRTFs
We present a hybrid approach to individualized head-related transfer function (HRTF) modeling which requires only 3 anthropometric measurements and an image of the pinna. A prediction algorithm based on variational autoencoders synthesizes a pinna-related response from the image, which is used to filter a measured head-and-torso response. The interaural time difference is then manipulated to match that of the HUTUBS dataset subject minimizing the predicted localization error. The results are evaluated using spectral distortion and an auditory localization model. While the latter is inconclusive regarding the efficacy of the structural model, the former metric shows promising results with encoding HRTFs. Index Terms: Hardware - Digital signal processing; Computing methodologies - Neural networks; Applied computing - Sound and music computing
Musiche liquide. XX Colloquio di Informatica Musicale. 20Th Colloquium on Music Informatics. Liquid music
A Music Programming Course for Undergraduate Music Conservatory Students: Evaluation and Lessons Learnt
This paper introduces the content and organisation of a music programming course offered to undergraduate Conservatory students in the spring of 2022. A number of evaluation procedures, including pre- and post-course questionnaires and exercises, as well as a final assignment, were administered by the teacher. Results indicate an increased confidence in the use of computers and programming, although some aspects of creativity and computational thinking need further revision. The authors examine the course content in light of the results obtained, discuss the approach followed, and make assumptions for the improvement of both course content and assessment methods.
Techniques for customized binaural audio rendering with applications to virtual rehabilitation
Multimodal interfaces represent a key factor in enabling an inclusive use of new technologies by everyone. To achieve this, realistic models that describe our environment are of primary importance, in particular models that accurately describe the acoustics of the environment and communication through the auditory modality. Models for spatial (or 3-D) audio can provide accurate information about the relation between the sound source and the surrounding environment, and this information cannot be substituted by any other modality. However, since multimedia systems are currently focused mostly on graphics processing and integrated with simple stereo or surround sound, today’s spatial representation of audio tends to be simplistic and to have poor interaction potential. Furthermore, current auralization technologies rely on invasive and/or expensive reproduction devices (e.g. head-mounted displays, loudspeakers), which cause the user to perceive a non-integrated experience due to an unbridged gap between the real and virtual worlds.
Binaural sound rendering approaches (i.e. those based on headphone reproduction) lie on a very different level. Most of the binaural rendering techniques currently used in research rely on so-called Head-Related Transfer Functions (HRTFs), i.e. specific filters that capture the transformations undergone by a sound wave on its path from the source to the eardrum, typically due to reflection and diffraction effects on the torso, head, shoulders and pinnae of the listener. Such a characterization allows virtual positioning of sound sources in the surrounding space by filtering the desired signals through a pair of HRTFs, thus creating left and right ear signals to be delivered by headphones. In this way, three-dimensional sound fields with a high sense of immersion can be simulated and integrated within multimodal frameworks.
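The filtering operation described above can be sketched in the time domain: a monophonic signal is convolved with the left and right head-related impulse responses (HRIRs, the time-domain counterparts of the HRTFs) for the desired direction. This is a minimal illustration with made-up sample values, not the thesis implementation; the function names are hypothetical, and a practical renderer would typically use FFT-based convolution on measured responses.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal through a left/right HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Illustrative values only: the right ear gets a delayed, attenuated copy,
# as would happen for a source on the listener's left.
mono = [1.0, 0.5]
hrir_left = [0.9, 0.1]
hrir_right = [0.0, 0.6, 0.2]

left, right = render_binaural(mono, hrir_left, hrir_right)
print(left, right)
```

The two output sequences are the signals that would be sent to the left and right headphone channels.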
However, such techniques have significant limitations. First, they may require considerable computational resources, especially when several sound sources in the surrounding space need to be simulated. Second, and most important, HRTF filters are usually provided in the form of acoustic signals recorded through dummy heads: this means that anthropometric differences among subjects are not taken into account. On the contrary, along with the critical relative position between listener and sound source, anthropometric features of the human body play a key role in HRTF characterization: while non-individualized HRTFs represent a cheap and straightforward means of providing 3-D perception in headphone reproduction, listening to non-individualized spatialized sounds is likely to result in evident sound localization errors such as incorrect perception of source elevation, front-back reversals, and lack of externalization, especially in static conditions. On the other hand, measuring individual HRTFs on a significant number of subjects is often expensive in terms of both time and resources.
Structural modeling of HRTFs ultimately represents an attractive solution to these shortcomings. As a matter of fact, if one isolates the contributions of the listener’s head, pinnae, ear canals, shoulders, and torso to the HRTF in different subcomponents - each accounting for some well-defined physical phenomenon - then, thanks to linearity, one can reconstruct the global HRTF from a proper combination of all the considered effects.
This thesis presents one such model that can be employed for immersive sound reproduction, with a particular focus on the pinna contribution to the HRTF. The pinna plays a primary part in the perception of source elevation by introducing major spectral modifications, yet the relation between acoustic phenomena due to the pinna - mainly resonances and sound reflections - and anthropometry has not been fully understood to date. Instead, a promising correspondence between reflection points on pinna surfaces and frequencies of notches occurring in the high-frequency range of the HRTF spectrum is formally found here. This result allows for an interesting form of content adaptation and customization of the structural model, as it includes parameters related to the user’s anthropometry in addition to the spatial ones.
The proposed approach also has implications in terms of delivery, since it operates by processing a monophonic signal exclusively at the receiver side (e.g., on a terminal or mobile device) by means of low-order filters, allowing for reduced computational costs. Thanks to its low complexity, the model can be used to render scenes with multiple audiovisual objects in a number of contexts such as computer games, cinema, edutainment, and any other scenario where realistic sound spatialization and personalized sound reproduction are major requirements.
Notably, the specific areas for which the proposed model was designed are those of virtual rehabilitation and rehabilitation robotics, two of the most potentially interesting application fields for research in sonic interaction design today. The final goal of research in these areas is to facilitate the re-integration of patients with neurological disorders into social and domestic life by helping them regain the ability to autonomously perform activities of daily living (ADLs, e.g., eating or walking); however, much work is still needed to address challenges related to hardware, software, and control system design, as well as effective approaches for delivering treatment. As a matter of fact, ADLs embody complex motor tasks for which current rehabilitation systems lack the sophistication needed to assist patients during their performance. In particular, it is recognized that a large number of degrees of freedom ought to be used in robot-assisted rehabilitation, and that multimodal feedback often plays a key role in both aforementioned application fields.
Although several rehabilitation systems which make use of multimodal virtual environments with visual and haptic feedback already exist, the consistent use of auditory feedback is less investigated. A thorough analysis of the literature reported in this thesis confirms this impression, showing that the potential of auditory feedback is largely underestimated in such systems. Five different proposed experiments allow investigation of the role that novel forms of auditory feedback presented during gait training and tracking movements play in improving performance in healthy participants, providing a basis for a future comparison with neurologically injured patients. In particular, the usefulness of task-related sound feedback and sound spatialization in coordinating the user’s movements during simple target-following tasks is attested. Results thus suggest that constructive and well-designed multimodal feedback can be used to improve performance and learning in complex motor tasks, thanks to the high level of attention, engagement, and presence provided to the user. Such studies represent a novelty in the current literature on virtual rehabilitation and rehabilitation robotics, especially concerning the use of sonification techniques to convey information in a rehabilitation scenario.
Multimodal interfaces are nowadays a key factor in enabling an inclusive use of new technologies. In this context, realistic models that describe our environment are of fundamental importance, in particular models that accurately represent acoustic phenomena and communication through the auditory modality. Among these, models for spatial (or 3-D) audio can offer accurate information about the relation between the sound source and the surrounding environment, information that cannot be replaced by any other modality.
However, since multimedia systems currently focus mainly on graphics processing and are integrated only with stereo or surround audio, today's spatial representation of sound tends to be simplistic and to have little interactive potential. Moreover, auralization technologies currently rely on invasive and/or expensive reproduction devices (e.g. head-mounted displays and loudspeakers), which are responsible for a non-integrated perceptual experience because of a gap that has never been bridged between the real and virtual worlds.
Binaural audio approaches (i.e. those based on headphone reproduction) lie on a different level. Most binaural rendering techniques currently used in research rely on so-called Head-Related Transfer Functions (HRTFs), i.e. particular filters that capture the transformations undergone by a sound wave along its path from the source to the eardrum, generally due to reflection and diffraction effects on the listener's torso, head, shoulders, and pinnae. This characterization makes it possible to virtually position one or more sound sources in the surrounding space simply by filtering the desired signals through a pair of HRTFs, thereby creating a pair of signals to be presented to the left and right channels of a pair of headphones. In this way, three-dimensional sound fields with a high sense of immersion can be simulated and integrated into multimodal frameworks.
Unfortunately, important limitations lie behind such techniques. First, they may require large computational resources when several sound sources in space need to be simulated. Second, HRTF filters are usually provided in the form of acoustic signals recorded through dedicated dummy heads: this means that anthropometric differences among subjects are not taken into account. On the contrary, alongside the relative position between listener and sound source, the subject's anthropometry plays a key role in characterizing the HRTF: although non-individualized HRTFs are a direct and inexpensive means of offering a semblance of 3-D perception in headphone reproduction, listening to the resulting signal may frequently lead to evident localization errors such as distorted perception of source elevation, front-back reversals, and lack of externalization, especially in static conditions. On the other hand, individually measuring the HRTFs of a significant number of subjects would entail a large expenditure of resources and time.
Structural modeling of HRTFs instead represents an attractive solution to all the limitations mentioned above. Specifically, by isolating the contributions of the listener's head, pinnae, ear canals, shoulders, and torso to the HRTF into separate components, each modeling a well-defined acoustic phenomenon, the global HRTF can be reconstructed through a suitable combination of all the effects considered, thanks to the linearity of the decomposition.
This thesis presents a structural model that can be used for immersive sound reproduction, focusing in particular on the contribution of the outer ear (pinna) to the HRTF. The pinna plays a fundamental role in the perception of source elevation thanks to the substantial spectral modifications it introduces in the sound reaching the eardrum. However, the relation between the acoustic phenomena due to the pinna, mainly resonances and reflections, and anthropometry has not yet found a convincing representation in the literature. A promising correspondence between the theoretical reflection points on the pinna surface and the frequencies of a triplet of spectral notches in the HRTF is instead discussed in this thesis: this result, certainly new of its kind, opens the door to an interesting form of customization of the structural model, which includes parameters related to the user's anthropometry in addition to parameters more strictly related to the source position.
The proposed approach also has implications in terms of content delivery, since it operates by processing a monophonic signal exclusively at the receiver side (e.g. on a terminal or mobile device) by means of low-order filters, thus allowing a reduction in computational costs. Thanks to its reduced complexity, the model can therefore be used to render scenes with multiple audiovisual objects in a variety of contexts such as computer games, cinema, edutainment, and any other scenario in which realistic sound spatialization and personalized sound reproduction are important requirements.
Among these, the specific research areas for which the model was conceived are virtual rehabilitation and rehabilitation robotics, potentially two of the most interesting application fields for research in sonic interaction design. The ultimate goal of research in these two areas is to facilitate the reintegration of patients with neurological disorders (caused, for example, by stroke) into social and domestic life by helping them regain the ability to autonomously perform activities of daily living (ADLs, e.g. eating or walking); nevertheless, a great deal of work is still required to meet needs related to hardware, software, and control system design, as well as to define effective approaches to treatment. ADLs indeed involve complex motor tasks for which current rehabilitation systems lack the sophistication required to assist patients while they perform them. In particular, it is well known that a large number of degrees of freedom must be used in robot-assisted rehabilitation, and that multimodal feedback often plays a central role.
Despite the existence of a variety of rehabilitation systems that exploit multimodal virtual environments with visual and haptic feedback, the consistent use of auditory feedback is still rare. An accurate analysis of the literature confirms this hypothesis, showing that the potential of auditory feedback is largely underestimated in this context. Five different experiments, described in this thesis, allow the study of the role that new types of auditory feedback, presented during walking or during tracking movements, play in improving performance in healthy subjects, providing a basis for a future comparison with neurologically impaired patients. In particular, the usefulness of task-related sound feedback and of sound spatialization in coordinating the user's movements during simple target-following tasks is attested here. The results therefore suggest that constructive and well-designed multimodal feedback can be used systematically to improve performance and learning in complex motor tasks, thanks to the high level of attention, engagement, and presence offered to the user. These studies represent a novelty in the literature on virtual and/or robot-assisted rehabilitation, especially regarding the use of sonification techniques to convey information in a rehabilitation scenario.
Integrating computational thinking with the curriculum of future professional musicians
The purpose of this study is to look at how a music programming course affects the development of computational thinking in undergraduate music conservatory students. In addition to teaching the fundamentals of computational thinking, music programming, and logic, the course addresses the Four C's of education. The change in students' attitudes toward computer and algorithmic skills, creativity, communication, and collaboration is measured using a pre- and post-test experimental design. Additionally, computational thinking abilities are assessed through music analysis, procedural, graphical, and logic quizzes, while creativity is evaluated through a qualitative grading of the students' final music projects. Results show a general perceived improvement in the students' attitudes toward the Four C's as well as a good ability to convert learned computational models into musical creativity. However, more effort is needed to guarantee an overall improvement in the students' actual computational thinking abilities.
