What are Lx, Tx, Fx, Dx, Cx, etc.?
These terms originated at University College London (UCL) to describe the various graphs used when analysing voice with the Laryngograph.
- Lx is the name given to the current-flow waveform generated by the Laryngograph. The vertical axis of an Lx waveform represents current flow through the larynx, which is related to vocal fold contact area.
- Tx is the name given to the sequence of pitch period durations that can be generated from the Lx waveform. Tx data is used as the basis for the calculation of instantaneous fundamental frequency (= the fundamental frequency associated with a single voicing cycle), which is used to generate fundamental frequency distributions.
- Fx is the name given to the graph of fundamental frequency against time. You will see this described as F0 elsewhere in the literature. We prefer the term Fx because it reminds us that this is the "frequency of excitation" of the vocal tract, and is not to be confused with F1, F2, etc., which are the resonant frequencies of the vocal tract.
- Dx is the name given to distributions of fundamental frequency, that is, histograms of fundamental frequency usage. These histograms tell us how much time a speaker spends at each fundamental frequency. From these we can estimate the speaker's modal frequency (= most commonly used frequency) and fundamental frequency range (typically the range in which the speaker spends 90% of the time spent voicing). Sometimes we distinguish first-order histograms (Dx1), which include all voicing cycles, from second-order histograms (Dx2), which include only voicing cycles occurring in regular speech. The difference between Dx1 and Dx2 allows us to quantify the amount of irregularity in the speech.
- Cx is the name given to a kind of scatterplot graph in which adjacent pairs of Tx values are plotted against one another on a frequency scale. This graph shows us the degree of irregularity in the voicing. In regular voicing, the scattering of points on the Cx graph is along the diagonal, whereas for irregular voicing, many points occur off the diagonal.
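As a rough illustration of how Tx relates to Fx and Cx, the sketch below converts a list of pitch period durations into per-cycle frequencies and then pairs adjacent cycles. It is not the Laryngograph software's own method. The function names and the 10% tolerance used to decide whether a point lies "off the diagonal" are assumptions made for this example only.

```python
def tx_to_fx(tx_seconds):
    """Convert each pitch period (seconds) to a per-cycle frequency (Hz)."""
    return [1.0 / t for t in tx_seconds]


def cx_points(tx_seconds):
    """Pair each cycle's frequency with that of the following cycle."""
    fx = tx_to_fx(tx_seconds)
    return list(zip(fx, fx[1:]))


def off_diagonal_fraction(tx_seconds, tolerance=0.1):
    """Fraction of Cx points lying away from the diagonal.

    A point counts as off-diagonal when the two frequencies differ by more
    than `tolerance` (an assumed, illustrative threshold) relative to the
    larger of the two.
    """
    points = cx_points(tx_seconds)
    if not points:
        return 0.0
    off = sum(1 for a, b in points if abs(a - b) / max(a, b) > tolerance)
    return off / len(points)
```

For perfectly regular voicing every pair lies on the diagonal and the fraction is zero. A single doubled period in an otherwise steady sequence produces two off-diagonal points, one entering and one leaving the odd cycle.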
What is the best way to measure the "average" fundamental frequency?
There are basically three ways to obtain an average from a probability distribution: use the mean, the median or the mode. Distributions of fundamental frequency have some odd characteristics that affect the decision about which of these is most useful. Among these are:
- Perception of fundamental frequency is known to be related more closely to the logarithm of Fx than to its linear value in Hertz. So should we plot Fx or log(Fx) on our histogram?
- Usage of fundamental frequency could be measured in terms of the number of vocal fold cycles used by a speaker at each frequency, or by the total amount of time spent by the speaker at each frequency.
- Not all speech is voiced, and there are regions where the voicing is starting up or stopping which may not be typical of normal vibration.
- Instruments for measuring fundamental frequency are prone to measurement error: pitch halving and pitch doubling are common. Even the Laryngograph has poor performance on some speakers.
- Some speakers use a great deal of creakiness in their phonation, and this can give odd fundamental frequency values.
- Fundamental frequency distributions are often far from normal (Gaussian) in shape, with many outlier values and sometimes more than one peak.
Together, these considerations suggest that the mode is the most useful measure. It is unaffected by the log/linear consideration or by the shape of the distribution. Its weakness is distributions with more than one peak; such cases should be noted explicitly. Both the mean and the median can be strongly affected by the odd shape of distributions.
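The following sketch shows one way the mode might be estimated from Tx data. It bins cycles on a logarithmic frequency axis and weights each cycle either by its duration (time spent) or by a count of one (cycles used), which are the two weighting options mentioned above. The function name, the bin resolution of 24 bins per octave, and the choice to report the geometric centre of the winning bin are all assumptions made for illustration.

```python
import math
from collections import Counter


def modal_fx(tx_seconds, bins_per_octave=24, weight_by_time=True):
    """Estimate the modal fundamental frequency (Hz) from pitch periods.

    Each cycle is placed in a log-frequency bin. It contributes its own
    duration when weight_by_time is True, otherwise a count of 1.
    """
    if not tx_seconds:
        raise ValueError("no voicing cycles supplied")
    hist = Counter()
    for t in tx_seconds:
        fx = 1.0 / t
        bin_index = math.floor(bins_per_octave * math.log2(fx))
        hist[bin_index] += t if weight_by_time else 1
    # The mode is the bin holding the greatest total weight.
    best_bin = max(hist, key=hist.get)
    # Report the geometric centre of that bin in Hz.
    return 2.0 ** ((best_bin + 0.5) / bins_per_octave)
```

With 50 cycles at 100 Hz and 5 pitch-halved cycles at 50 Hz, the arithmetic mean of the per-cycle frequencies is pulled down to roughly 95 Hz, while the estimate above stays within one bin of 100 Hz. A distribution with two peaks of similar height would still need to be inspected by eye.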
What is the best way to measure the "range" of fundamental frequency?
There are basically three ways to measure the breadth of a distribution: the standard deviation, the interquartile range, and the total range. Distributions of fundamental frequency have some odd characteristics, some of which are listed in the answer to the previous question. Because the distribution often has a large number of outliers, the standard deviation is not satisfactory: it would suggest a spread much broader than the one the speaker typically uses. Similarly, the total range is set by only two values from a distribution containing possibly thousands: the very highest and the very lowest. It is therefore also unsatisfactory. Measures based on percentiles, like the interquartile range, seem to be our best bet.
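To see how differently the three measures react to outliers, a small sketch using Python's standard `statistics` module is given below. The function name `spread_measures` is hypothetical, and the sketch counts every cycle equally rather than weighting by time.

```python
from statistics import pstdev, quantiles


def spread_measures(fx_values):
    """Return three measures of spread for per-cycle frequencies (Hz)."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(fx_values, n=4)
    return {
        "standard_deviation": pstdev(fx_values),
        "interquartile_range": q3 - q1,
        "total_range": max(fx_values) - min(fx_values),
    }
```

Given 96 cycles spread evenly between 100 Hz and 195 Hz, adding just two stray values at 50 Hz and 400 Hz widens the total range from 95 Hz to 350 Hz and inflates the standard deviation noticeably, while the interquartile range changes by only about 1 Hz.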
It is worth asking ourselves what we require of a measure of range. We want a measure that is reliable, in the sense that if we repeat the measurement on a different recording of the same speaker we would hope to get a similar answer. On the other hand, we want a measure that is sensitive to differences in fundamental frequency use: between one speaker and another, between one style of text and another, before and after therapy, etc. Thus we have to come to some compromise. At UCL we have settled on the 90% range as our preferred measure. This is the range of fundamental frequency that the speaker stays within for 90% of the time spent voicing. Not only is this measure fairly reliable, it is also easy to understand. The 90% range discards the bottom 5% and the top 5% of the distribution, making it less sensitive to outliers. On the other hand, the measure does not deal adequately with very irregular voicing. It may be better to use the second-order Dx in these circumstances.
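A minimal sketch of a 90% range computed from Tx data follows. Because the measure is defined in terms of time, each cycle is weighted by its own duration. The function name, the use of Tx durations as weights, and the simple "first cycle to cross the threshold" rule (with no interpolation) are assumptions of this example rather than a description of the UCL software.

```python
def range_90(tx_seconds, lower=0.05, upper=0.95):
    """Return (low_hz, high_hz) bounding the central 90% of voiced time."""
    if not tx_seconds:
        raise ValueError("no voicing cycles supplied")
    # Sort cycles by frequency, keeping each cycle's duration as its weight.
    cycles = sorted((1.0 / t, t) for t in tx_seconds)
    total_time = sum(t for _, t in cycles)
    low_hz = high_hz = None
    elapsed = 0.0
    for fx, t in cycles:
        elapsed += t
        if low_hz is None and elapsed >= lower * total_time:
            low_hz = fx
        if elapsed >= upper * total_time:
            high_hz = fx
            break
    if high_hz is None:
        # Guard against rounding error leaving the upper bound unset.
        high_hz = cycles[-1][0]
    return low_hz, high_hz
```

Note that time weighting gives low-frequency cycles more influence than high-frequency ones, since each low-frequency cycle lasts longer. A burst of pitch-halved cycles can therefore still shift the lower bound if it accounts for more than 5% of the voiced time.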