Source-filter model of speech production
From Wikipedia, the free encyclopedia
The source-filter model of speech production models speech as the combination of a sound source, such as the vocal cords, and a linear acoustic filter, the vocal tract (together with the radiation characteristic at the mouth opening).
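In its usual linear, time-invariant idealization (real vocal tracts change shape continuously, so this holds only over short stretches of speech), the speech signal is the source convolved with the impulse responses of the vocal tract and the radiation characteristic, so their spectra multiply. The symbols below ($g$, $h$, $r$ for source, vocal tract and radiation; $T_0$ and $F_0$ for the glottal period and fundamental frequency) are notation chosen here for illustration. The second line shows the standard Fourier pair behind the statement that a periodic source is an impulse train in time and a set of harmonics in frequency.

```latex
s(t) = (g * h * r)(t)
\quad\Longleftrightarrow\quad
S(f) = G(f)\,H(f)\,R(f)

g(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT_0)
\quad\Longleftrightarrow\quad
G(f) = F_0 \sum_{k=-\infty}^{\infty} \delta(f - kF_0),
\qquad F_0 = \frac{1}{T_0}
```

With this idealized source, the spectrum of a voiced sound consists of harmonics at multiples of $F_0$ whose amplitudes trace out $|H(f)\,R(f)|$.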
While only an approximation, the model is widely used in a number of applications because of its relative simplicity. To varying degrees, different phonemes can be distinguished by the properties of their source(s) and their spectral shape. Voiced sounds (e.g., vowels) have at least one source, the (mostly) periodic excitation at the glottis, which can be approximated by an impulse train in the time domain and by harmonics in the frequency domain, and a filter that depends on, e.g., tongue position and lip protrusion. Fricatives, on the other hand, have at least one source due to turbulent noise produced at a constriction in the oral cavity (e.g., the sounds represented orthographically by "s" and "f"). So-called voiced fricatives (such as "z" and "v") have two sources: one at the glottis and one at the supra-glottal constriction.
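The distinction between periodic and noise sources can be illustrated with a minimal digital sketch. Assuming a fixed sample rate, the Python below passes an impulse train (standing in for glottal excitation) through a cascade of two-pole resonators (standing in for the vocal-tract filter), and passes white noise through a single broad resonance to mimic a voiceless fricative. The sample rate, formant frequencies and bandwidths, the first-difference approximation of the radiation characteristic, and all function names (`impulse_train`, `resonator`, `synthesize_vowel`, `synthesize_fricative`) are illustrative assumptions, not taken from the article.

```python
import math
import random

FS = 16000  # sample rate in Hz (assumed for illustration)


def impulse_train(f0, n_samples, fs=FS):
    """Idealized periodic glottal source: one unit impulse every fs/f0 samples."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def white_noise(n_samples, seed=0):
    """Turbulent (frication) source approximated by Gaussian white noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def resonator(x, freq, bandwidth, fs=FS):
    """Two-pole resonator y[n] = g*x[n] + a1*y[n-1] + a2*y[n-2], unity gain at DC."""
    r = math.exp(-math.pi * bandwidth / fs)  # pole radius set by the bandwidth
    theta = 2.0 * math.pi * freq / fs        # pole angle set by the centre frequency
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    gain = 1.0 - a1 - a2                     # normalizes H(z) to 1 at z = 1
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = gain * sample + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def radiation(x):
    """Radiation characteristic approximated by a first difference (a high-pass tilt)."""
    return [x[i] - (x[i - 1] if i > 0 else 0.0) for i in range(len(x))]


def synthesize_vowel(f0, formants, n_samples, fs=FS):
    """Voiced sound: impulse-train source -> cascade of formant resonators -> radiation."""
    signal = impulse_train(f0, n_samples, fs)
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return radiation(signal)


def synthesize_fricative(centre, bandwidth, n_samples, seed=0, fs=FS):
    """Voiceless fricative: noise source -> one broad resonance -> radiation."""
    return radiation(resonator(white_noise(n_samples, seed), centre, bandwidth, fs))
```

For example, `synthesize_vowel(100, [(700, 130), (1200, 70), (2600, 160)], FS // 2)` yields half a second of a vowel-like sound with illustrative formant values, while `synthesize_fricative(4000, 1000, FS // 2)` yields a hiss-like sound; a voiced fricative could be approximated by adding the two. The cascade of DC-normalized resonators is one common design choice for the filter; parallel resonator banks are another.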
The source-filter model is used in both speech synthesis and speech analysis, and is closely related to linear prediction, which models the spectral envelope of speech as the response of an all-pole filter whose coefficients are estimated from the speech signal itself. The development of the model is due, in large part, to the early work of Gunnar Fant, although others, notably Ken Stevens, have also contributed substantially to the models underlying acoustic analysis of speech and speech synthesis.
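The link to linear prediction can be sketched as follows: if the filter is assumed to be all-pole, $H(z) = G / (1 - \sum_{k=1}^{p} a_k z^{-k})$, its coefficients can be estimated from a frame of samples with the autocorrelation method and the Levinson-Durbin recursion. The Python below is a minimal, stdlib-only sketch under that assumption; the function names are hypothetical, taking the gain as the square root of the residual energy is one common convention, and practical analysis would add windowing and pre-emphasis, which are omitted here.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a_1..a_p in x[n] ~ sum_k a_k * x[n - k].

    Returns (a, error): a[j] holds a_{j+1}, error is the residual energy.
    Assumes r[0] > 0 (a non-silent frame).
    """
    a = []
    error = r[0]  # residual energy of the order-0 predictor
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / error  # reflection coefficient for order i
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        error *= 1.0 - k * k
    return a, error


def lpc(frame, order):
    """All-pole estimate of the filter: (coefficients, gain), gain = sqrt(residual energy)."""
    a, error = levinson_durbin(autocorrelation(frame, order), order)
    return a, math.sqrt(error)


def all_pole_magnitude(a, gain, freq, fs):
    """|G / A(e^{jw})| at frequency freq (Hz), with A(z) = 1 - sum_k a_k z^{-k}."""
    z_inv = cmath.exp(-2j * math.pi * freq / fs)
    denom = 1.0 - sum(coef * z_inv ** (k + 1) for k, coef in enumerate(a))
    return abs(gain / denom)
```

Evaluating `all_pole_magnitude` over a grid of frequencies gives a smooth spectral envelope whose peaks, for vowels, tend to lie near the formants; in this view the prediction residual plays the role of the source.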