Uses of the pitch-scaled harmonic filter in speech processing
The pitch-scaled harmonic filter (PSHF) is a technique for decomposing speech signals into their periodic and aperiodic constituents, during periods of phonation. In this paper, the use of the PSHF for speech analysis and processing tasks is described. The periodic component can be used as an estimate of the part attributable to voicing, and the aperiodic component can act as an estimate of that attributable to turbulence noise, i.e., from fricative, aspiration and plosive sources. Here we present the algorithm for separating the periodic and aperiodic components from the pitch-scaled Fourier transform of a short section of speech, and show how to derive signals suitable for time-series analysis and for spectral analysis. These components can then be processed in a manner appropriate to their source type, for instance, extracting zeros as well as poles from the aperiodic spectral envelope. A summary of tests on synthetic speech-like signals demonstrates the robustness of the PSHF's performance to perturbations from additive noise, jitter and shimmer. Examples are given of speech analysed in various ways: power spectrum, short-time power and short-time harmonics-to-noise ratio, linear prediction and mel-frequency cepstral coefficients. Besides being valuable for speech production and perception studies, the latter two analyses show potential for incorporation into speech coding and speech recognition systems. Further uses of the PSHF are revealing normally-obscured acoustic features, exploring interactions of turbulence-noise sources with voicing, and pre-processing speech to enhance subsequent operations
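The core idea of separating components in a pitch-scaled Fourier transform can be illustrated with a small sketch. It is not the published PSHF algorithm: it assumes a rectangular window spanning exactly `periods` pitch periods (the function names and the default of 4 are assumptions made here), so that harmonics of the fundamental fall on every `periods`-th DFT bin. Those bins are kept as the periodic estimate and the remainder is treated as aperiodic. Windowing, pitch tracking and the power corrections described in the paper are omitted.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def pitch_scaled_split(frame, periods=4):
    """Split a frame covering exactly `periods` pitch periods.

    Assumption: the frame length is a whole multiple of `periods`, so the
    harmonics of the fundamental sit on bins 0, periods, 2*periods, ...
    The DC bin is kept with the periodic part in this sketch.
    """
    n = len(frame)
    if n % periods != 0:
        raise ValueError("frame length must be a whole multiple of `periods`")
    spectrum = dft(frame)
    # Keep only the bins aligned with harmonics of the fundamental.
    harmonic_bins = [spectrum[k] if k % periods == 0 else 0.0 for k in range(n)]
    periodic = idft_real(harmonic_bins)
    # Whatever is left over is taken as the aperiodic estimate.
    aperiodic = [s - p for s, p in zip(frame, periodic)]
    return periodic, aperiodic


# Usage on a toy signal: two harmonics plus one off-harmonic sinusoid
# standing in for noise (all values are illustrative).
period = 20
n = 4 * period
voiced = [math.sin(2 * math.pi * t / period) + 0.5 * math.sin(4 * math.pi * t / period)
          for t in range(n)]
noise = [0.1 * math.sin(2 * math.pi * 7 * t / n) for t in range(n)]
periodic, aperiodic = pitch_scaled_split([v + w for v, w in zip(voiced, noise)])
```

With a rectangular window, keeping every `periods`-th bin is equivalent to averaging the frame over its pitch periods, which is why the periodic estimate repeats exactly once per period.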
Modelling vocal-tract acoustics validated by flow experiments
Modelling the acoustic response of the vocal tract is a complex task, both from the point of view of acquiring details of its internal geometry and of accounting for the acoustic-flow interactions. A VOcal-tract ACoustics program (VOAC) has been developed [P. Davies, R. McGowan & C. Shadle, Vocal Fold Phys., ed. I. Titze, San Diego: Singular Pub., 93-142 (1993)], which uses a more realistic, aeroacoustic model of the vocal tract than classic electrical-analogue representations. It accommodates area and hydraulic radius profiles, smooth and abrupt area changes, incorporating end-corrections, side-branches, and net fluid flows, including turbulence losses incurred through jet formation. Originally, VOAC was tested by comparing vowel formant frequencies (i) uttered by subjects, (ii) predicted using classic electrical analogues, and (iii) predicted by VOAC. In this study, VOAC is further validated by comparing the predicted frequency response functions for a range of flow rates with measurements of the radiated sound from a series of mechanical models of unvoiced fricatives [C. Shadle, PhD thesis, MIT-RLE Tech. Rpt. 506 (1985)]. Results show VOAC is more accurate in predicting the complete spectrum at a range of flow rates. Finally, preliminary work is presented with VOAC used to simulate the sound generated at a sequence of stages during the release of a plosive
Aero-acoustic modelling of voiced and unvoiced fricatives based on MRI data
We would like to develop a more realistic production model of unvoiced speech sounds, namely fricatives, plosives and aspiration noise. All three involve turbulence noise generation, with place-dependent source characteristics that vary with time (rapidly, in plosives). In this study, we aimed to produce, using an aero-acoustic model of the vocal-tract filter and source, voiced as well as unvoiced fricatives that provide a good match to analyses of speech recordings. The vocal-tract transfer function (VTTF) was computed by the vocal-tract acoustics program, VOAC [Davies, McGowan and Shadle. Vocal Fold Physiology: Frontiers in Basic Science, ed. Titze, Singular Pub., CA, 93-142, 1993], using geometrical data, in the form of cross-sectional area and hydraulic radius functions, along the length of the tract. VOAC incorporates the effects of net flow into the transmission of plane waves through a tubular representation of the tract, and relaxes assumptions of rigid walls and isentropic propagation. The geometry functions were derived from multiple-slice, dynamic, magnetic resonance images (MRI) [Mohammad. PhD thesis, Dept. ECS, U. Southampton, UK, 1999; Shadle, Mohammad, Carter, and Jackson. Proc. ICPhS, S.F. CA, 1:623-626, 1999], using a method of converting from the pixel outlines that was improved over earlier efforts on vowels. A coloured noise source signal was combined with the VTTF and radiation characteristic to synthesize the unvoiced fricative [s]. For its voiced counterpart [z], many researchers have noted that the noise source appears to be modulated by voicing. Furthermore, the phase of the modulation has been shown to be perceptually significant. Based on our analysis [Jackson and Shadle. Proc. IEEE-ICASSP, Istanbul, 2000] of recordings by the same subject, the frication source of [z] was varied periodically according to fluctuations in the flow velocity at the constriction exit, and the modulation phase was governed by the convection time for the flow perturbation to travel from the constriction to the obstacle. The synthesized fricatives were compared to the speech recordings in a simple listening test, and comparisons of the predicted and measured time series suggested that the model, which brings together physical, aerodynamic and acoustic information, can replicate characteristics of real speech, such as the modulation in voiced fricatives [http://www.isis.ecs.soton.ac.uk/research/projects/nephthys/]
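The link between convection time and modulation phase can be sketched as follows. This is an illustrative simplification rather than the authors' synthesis model: it assumes a sinusoidal gain at the fundamental frequency, a constant convection velocity, and white Gaussian noise in place of the coloured source. The function names and all numeric values are hypothetical.

```python
import math
import random


def convection_phase_lag(f0, distance, convection_velocity):
    """Phase lag (radians) at f0 from the travel time between constriction and obstacle."""
    delay = distance / convection_velocity  # seconds
    return 2 * math.pi * f0 * delay


def modulation_gain(t, f0, depth, phase_lag):
    """Assumed sinusoidal gain applied to the noise source at time t (seconds)."""
    return 1 + depth * math.cos(2 * math.pi * f0 * t - phase_lag)


def modulated_noise(n, fs, f0, depth, phase_lag, seed=0):
    """White Gaussian noise whose amplitude is modulated at f0 (sketch only)."""
    rng = random.Random(seed)
    return [modulation_gain(t / fs, f0, depth, phase_lag) * rng.gauss(0.0, 1.0)
            for t in range(n)]


# Illustrative values: 100 Hz voicing, 1 cm from constriction to obstacle,
# 10 m/s convection velocity, giving a 1 ms delay.
lag = convection_phase_lag(f0=100.0, distance=0.01, convection_velocity=10.0)
source = modulated_noise(n=800, fs=8000, f0=100.0, depth=0.5, phase_lag=lag)
```

Under these assumptions the gain peaks one convection delay after each cycle of the voicing reference, so a longer constriction-to-obstacle distance shifts the noise pulses later in the pitch period.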
Application of Active Noise Control in Corporate Aircraft
Following the successful introduction of Active Noise Control (ANC) systems as standard production fits on commuter aircraft (Saab2000, Saab340B and Dash8Q series 100, 200 & 300), recent efforts have focused on developing low-cost, low-weight systems for smaller corporate aircraft. This paper describes the approach taken by Ultra to the new technical challenges and the resulting improvements to the design methodology. A review of system performance on corporate (King Air & Twin Commander) turboprop aircraft shows repeatable global Tonal Noise Reductions (TNRs) of >8 dBA throughout the whole cabin, achieving reductions >20 dB in some locations at the blade-pass frequency (BPF), and major comfort benefits throughout the flight envelope with a weight penalty of less than 20 kg
Pitch-synchronous Decomposition of Mixed-source Speech Signals
As part of a study of turbulence-noise sources in speech production, a method has been developed for decomposing an acoustic signal into harmonic (voiced) and anharmonic (unvoiced) components, based on a hoarseness metric (Muta et al., 1988, J. Acoust. Soc. Am. 84, pp.1292-1301). Their pitch-synchronous harmonic filter (PSHF) has been extended (to EPSHF) to yield time histories of both harmonic and anharmonic components. Our corpus includes many examples of turbulence noise, including aspiration, voiced and unvoiced fricatives, and a variety of voice qualities (e.g. breathy, whispered). The EPSHF algorithm plausibly decomposed breathy vowels, but the harmonic component of voiced fricatives still contained significant noise, similar in shape to (though weaker than) the ensemble-averaged anharmonic spectrum. In general the algorithm performed best on sustained sounds. Tracking errors at rapid transitions, and due to jitter and shimmer, were spuriously attributed to the anharmonic component. However, the extracted anharmonic component clearly exhibited modulation in voiced fricatives. While such modulation has been previously reported (and also in hoarse voice), it was verified by tests on synthetic signals, where constant and modulated noise signals were extracted successfully. The results suggest that the EPSHF will continue to enable exploration of the interaction of phonation and turbulence noise
Performance of the pitch-scaled harmonic filter and applications in speech analysis
The pitch-scaled harmonic filter (PSHF) is a technique for decomposing speech signals into their voiced and unvoiced constituents. In this paper, we evaluate its ability to reconstruct the time series of the two components accurately using a variety of synthetic, speech-like signals, and discuss its performance. These results determine the degree of confidence that can be expected for real speech signals: typically, 5 dB improvement in the signal-to-noise ratio of the harmonic component and approximately 5 dB more than the initial harmonics-to-noise ratio (HNR) in the anharmonic component. A selection of the analysis opportunities that the decomposition offers is demonstrated on speech recordings, including dynamic HNR estimation and separate linear prediction analyses of the two components. These new capabilities provided by the PSHF can facilitate discovering previously hidden features and investigating interactions of unvoiced sources, such as frication, with voicing
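Once the two time series are available, a dynamic HNR estimate follows from their short-time powers. The sketch below assumes the harmonic and anharmonic components are already given as equal-length lists of samples; the function names, frame length and hop size are choices made here for illustration, not values from the paper.

```python
import math


def hnr_db(harmonic, anharmonic):
    """Harmonics-to-noise ratio in dB from the energies of the two components."""
    harmonic_energy = sum(v * v for v in harmonic)
    anharmonic_energy = sum(v * v for v in anharmonic)
    if anharmonic_energy == 0:
        return math.inf
    if harmonic_energy == 0:
        return -math.inf
    return 10 * math.log10(harmonic_energy / anharmonic_energy)


def short_time_hnr(harmonic, anharmonic, frame_len, hop):
    """HNR per frame, stepping `hop` samples between frames of `frame_len` samples."""
    last_start = len(harmonic) - frame_len
    return [
        hnr_db(harmonic[s:s + frame_len], anharmonic[s:s + frame_len])
        for s in range(0, last_start + 1, hop)
    ]


# Usage with constant-amplitude toy components (amplitude ratio 10:1).
track = short_time_hnr([1.0] * 100, [0.1] * 100, frame_len=40, hop=20)
```

An amplitude ratio of 10:1 corresponds to a power ratio of 100, so each frame in this toy example sits at about 20 dB.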
Frication noise modulated by voicing, as revealed by pitch-scaled decomposition
A decomposition algorithm that uses a pitch-scaled harmonic filter was evaluated using synthetic signals and applied to mixed-source speech, spoken by three subjects, to separate the voiced and unvoiced parts. Pulsing of the noise component was observed in voiced frication, which was analyzed by complex demodulation of the signal envelope. The timing of the pulsation, represented by the phase of the anharmonic modulation coefficient, showed a step change during a vowel-fricative transition corresponding to the change in location of the noise source within the vocal tract. Analysis of fricatives /B, v, dh, z, zh, gh, q/ demonstrated a relationship between steady-state phase and place, and f0 glides confirmed that the main cause was a place-dependent delay
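Complex demodulation of an envelope at the fundamental frequency can be sketched as below. This is one plausible form, assuming the envelope is analysed over a whole number of pitch periods with a known, constant f0. The normalisation by the mean envelope and the function name are assumptions made here, and the paper's exact definition of the modulation coefficient may differ.

```python
import cmath
import math


def modulation_coefficient(envelope, f0, fs):
    """Complex modulation coefficient of `envelope` at frequency f0.

    For an envelope of the form mean * (1 + m*cos(2*pi*f0*t + phi)) observed
    over a whole number of periods, the result is m * exp(1j * phi).
    """
    n = len(envelope)
    mean = sum(envelope) / n
    # Shift the f0 component down to 0 Hz and average it out.
    demodulated = sum(
        envelope[t] * cmath.exp(-2j * math.pi * f0 * t / fs) for t in range(n)
    ) / n
    return 2 * demodulated / mean


# Usage: synthetic envelope with depth 0.5 and phase 0.7 rad, 10 periods long.
fs, f0 = 8000, 100
envelope = [1 + 0.5 * math.cos(2 * math.pi * f0 * t / fs + 0.7) for t in range(800)]
coefficient = modulation_coefficient(envelope, f0, fs)
depth, phase = abs(coefficient), cmath.phase(coefficient)
```

The magnitude of the coefficient gives the modulation depth and its argument gives the timing of the pulsation relative to the analysis origin, which is the quantity whose step change is described above.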
Aerodynamically-based parametric description of the noise envelope in voiced fricatives
In voiced fricatives, the radiated sound is composed of a harmonic component associated with the vibrating larynx and a noise component generated at a constriction in the oral cavity. The sound from the two sources interacts in a nonlinear way to produce a noise signal with an amplitude envelope modulated at the fundamental frequency of voicing. While voiced fricatives synthesized as a linear combination of harmonic and noise components are identifiable, it is recognized that the inclusion of the modulation improves the perceived naturalness of a synthesized token. The depth of modulation of the radiated noise, for a range of aerodynamic and acoustic variables, was measured experimentally using a dynamic mechanical model of the larynx and vocal tract. Glottal excitation arose from driven shutters representing the vocal folds; frication noise was produced by an orifice plate with a sharp-edged obstacle downstream. Based on the empirical data, a parametric description was developed to predict the depth and phase of amplitude modulation of the noise from the aerodynamic and acoustic conditions
Analysis of mixed-source speech sounds: aspiration, voiced fricatives and breathiness
Our initial goal was to model the source characteristics of aspiration more accurately. The term is used inconsistently in the literature, but there is general agreement that aspiration is produced by turbulence noise generated in the vicinity of the glottis. Thus, in order to model aspiration, we must refine its concept, and in particular define its relation to other kinds of noise produced near the glottis, such as breathiness and hoarseness. For instance, do similar aeroacoustic processes operate transiently during a plosive release and steadily during a breathy vowel? In unvoiced fricatives, localized sources produce well-defined spectral troughs. We have therefore developed a series of analysis methods that generate spectra for transient and voice-and-noise-excited sounds. These methods include pitch-synchronous decomposition into harmonic and anharmonic components (based on a hoarseness metric of Muta et al., 1988), short-time spectra, ensemble averaging, and short-time harmonics-to-noise ratios (Jackson and Shadle, 1998). These have been applied to a corpus of repeated nonsense words consisting of aspirated stops in three vowel contexts and voiced and unvoiced fricatives, spoken in four voice qualities, thus providing multiple examples of mixed-source and transient-source speech sounds. Ensemble-averaged spectra derived throughout a stop release show evidence of a highly-localized noise source becoming more distributed. Variations by place are also apparent, complementing and extending previous work (Stevens and Blumstein, 1978; Stevens, 1993). The coordination of glottal and supraglottal articulation, described and modelled for aspiration by Scully and Mair (1995), is in a sense reversed for voiced fricatives. Use of the decomposition algorithm on voiced fricatives revealed greater complexity than expected: the anharmonic component appears sometimes to be modulated by the harmonic component, sometimes to be independent of it, and tends to change from one case to the other in the course of the fricative. In sum, we have made some progress in describing not only spectral but time-varying properties of an aspiration model, and in so doing, have improved our descriptions of other mixed-source, time-varying speech sounds
