OF TWO ACOUSTIC FEATURES IN
(A Spanish version of this paper was published in R.L.A. 36, 1998, pp. 113-126)
The aim of this research is to determine how much the features that, according to descriptive studies, oppose the /p-t-k/ and /b-d-g/ series influence their discrimination. The operational validity of each feature has been considered and discussed in terms of its relative simplicity, its independence from other parameters, and its suitability for speech analysis and synthesis methods. Voicing and length were artificially modified separately, and the results of these manipulations were tested by listeners.
In Spanish there is a correlation between two series of phonemes, /b-d-g/ and /p-t-k/, at the phonological level, i.e. there is at least one common feature that separates /b/ from /p/, /d/ from /t/, and /g/ from /k/. The discussion about the acoustic nature of this feature is not yet settled. Phonetic studies of this problem have been based on linguistic production. Through the analysis of acoustic data, these studies have thoroughly characterised and widely assessed the relevance of at least six features: voicing (Alarcos Llorach, 1954; Harris, 1975; Cressey, 1978; Soto-Barba, 1995), tenseness (Gili Gaya, 1958; Martínez Celdrán, 1984; Canellada and Kuhlman, 1987), length (Borzone and Gurlekian, 1980; Cepeda, 1989), V.O.T. (Borzone and Gurlekian, 1980; Castañeda, 1986), formant transitions (Delattre, 1962) and IREDUS (Soto-Barba, 1994).
Considering that descriptive phonetic studies have characterised a series of different distinctive features, is it possible to measure the real influence of such features on the discrimination between /b-d-g/ and /p-t-k/? In order to give a positive answer to this question, a prior question must be answered first: which features can be manipulated so that their influence on the discrimination between /p-t-k/ and /b-d-g/ can later be verified? If the methodology used to alter the features is the analysis and synthesis of acoustic parameters, then the answer to both questions is that absolute length and voicing, the latter shown as the presence or absence of a low-frequency band, can be manipulated by these means.
In the phonetic tradition, there is consensus that the feature defined in articulatory terms as the function of the vocal cords, and expressed in segment classification by the voiced/voiceless dichotomy, is manifested acoustically by the presence or absence of a periodic wave. In the spectrographic representation this appears as a low-frequency band, also called the voicing band. This voicing band can be modified by speech analysis and synthesis procedures. Adding or removing it is therefore what makes it possible to determine whether voicing is relevant to the discrimination between /b-d-g/ and /p-t-k/.
Length is a fully acoustic feature because it corresponds acoustically to the duration of the segment, which is read from the spectrogram in milliseconds. Articulatory tenseness is not suitable for physical description and manipulation because of its multiple acoustic manifestations; thus, tenseness cannot be identified by strictly acoustic procedures. As for formant transitions, even though they are a spectrographic representation of an articulatory effect, Soto-Barba (1995) found that this feature was the most erratic in its acoustic manifestation and thus the least reliable for establishing oppositions. Borzone and Gurlekian (1980) stated that formant transitions are relevant for identifying the place of articulation of the studied series, but not for identifying the function of the vocal cords. V.O.T. is not a proper acoustic feature but a unit of measurement, which combines in one index the length of the voicing that precedes the explosion bar (a spaced vertical striation in the spectrogram) of /b-d-g/ and the length of the silence that follows the explosion bar of /p-t-k/. Likewise, IREDUS is not a proper acoustic feature but, like V.O.T., a unit of measurement: a different way of expressing the length feature.
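The description of V.O.T. as a single index can be made concrete with a small sketch. It assumes the conventional sign convention (negative when voicing precedes the explosion bar, positive when a silent lag follows it); the function name and the example timings are hypothetical and are not taken from the study.

```python
def vot_ms(burst_time_ms: float, voicing_onset_ms: float) -> float:
    """Voice onset time in milliseconds: voicing onset minus burst time.

    Negative values mean voicing starts before the explosion bar
    (as described for /b-d-g/); positive values mean a voiceless lag
    follows the explosion bar (as described for /p-t-k/).
    """
    return voicing_onset_ms - burst_time_ms


# Hypothetical timings, for illustration only.
prevoiced = vot_ms(burst_time_ms=120.0, voicing_onset_ms=50.0)  # voicing leads
lagged = vot_ms(burst_time_ms=120.0, voicing_onset_ms=135.0)    # voicing lags
print(prevoiced, lagged)
```

Both duration measures named in the text collapse into the sign and magnitude of one number, which is why the index expresses length rather than a separate acoustic property.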
Since voicing and length can be isolated and manipulated, the results can be submitted to the judgement of many Spanish speakers by means of a perception test. It is then possible to measure quantitatively how likely it is that altering a feature changes the discrimination of the studied segments.
The purpose of this study is to determine the degree of influence in the discrimination between /b/ and /p/, /d/ and /t/, and /k/ and /g/ of the acoustic features which oppose these phonemes, i.e. length, and voicing (presence or absence of periodic wave).
2. MANIPULATION OF ACOUSTIC FEATURES
Firstly, a group of four male speakers, between 30 and 40 years old, with standard pronunciation and voice, is asked to read a series of logatoms containing the consonants of the studied series in the context NC_V (NC = nasal consonant, V = vowel). A sheet of paper with the written logatoms is given to each speaker, to be read three times with a pause of about 3 seconds between readings. The signal is input directly to the Kay Elemetrics™ DSP Sonograph 5500 by means of a unidirectional microphone at a constant intensity of 35 dB. Once in the DSP, the signal is stored in a digitised sound file with the Kay Elemetrics™ ASL software. This process generates a natural utterance (NU) for each consonant. Once all the NUs are filed, the subject whose utterances clearly show the following acoustic features in the spectrogram is chosen: formants 1, 2, and 3; a low-frequency band for the /b-d-g/ series; and an explosion bar.
Each NU of the chosen subject is then analysed with the ASL software. First, the waveform is generated and the glottal pulses of the whole utterance are detected. Then, an analysis with LPC techniques is run, which generates a spectrogram of the utterance and a spreadsheet with the numerical data of the fundamental tone (F0), intensity (PK), length (LEN), and the frequency (F1, F2, etc.) and bandwidth (B1, B2, etc.) of each formant in each of the glottal pulses.
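The per-pulse spreadsheet described above can be pictured as a list of records, one per glottal pulse. The following is a minimal sketch only: the class name, field names, and units are hypothetical, and the actual ASL export layout is not specified in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PulseFrame:
    """One row of the analysis table: the parameters of one glottal pulse."""
    f0_hz: float                # fundamental tone (F0)
    pk: float                   # intensity (PK), in whatever unit is exported
    len_ms: float               # length of the pulse (LEN), assumed milliseconds
    formants_hz: List[float]    # F1, F2, ...
    bandwidths_hz: List[float]  # B1, B2, ...


def total_length_ms(frames: List[PulseFrame]) -> float:
    """Length of a stretch of speech as the sum of its pulse lengths."""
    return sum(frame.len_ms for frame in frames)


# Two made-up pulses, for illustration only.
frames = [
    PulseFrame(120.0, 60.0, 8.0, [300.0, 1200.0, 2400.0], [60.0, 90.0, 120.0]),
    PulseFrame(121.0, 61.0, 8.0, [310.0, 1210.0, 2390.0], [60.0, 90.0, 120.0]),
]
print(total_length_ms(frames))
```

Holding the analysis as a sequence of such records is what makes both manipulations possible: voicing is changed by replacing the values in a run of records, and length by changing how many records the run contains.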
To synthesise voicing (the presence of the fundamental), the numerical data corresponding to F0, BK, F1, F2, F3, F4, B1, B2, B3, and B4 are isolated and extracted from each of the glottal pulses in the voicing band of the consonants of the /b-d-g/ series. Then, the same data are isolated and extracted from the silence that precedes the explosion bar of the /p-t-k/ series, and the two sets of values are exchanged. Since /p/, /t/, and /k/ are always longer than /b/, /d/, and /g/, glottal pulses of the inserted set of values are removed or repeated (depending on the case) until a length similar to that of the opposed segment, measured in milliseconds, is reached.
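The exchange of parameter sets can be sketched as follows. This is an illustration under stated assumptions, not the procedure run in the ASL software: frames are plain dictionaries with hypothetical keys, repetition is done by cycling through the source frames, and the inserted set is fitted to the length of the slot it fills, which is one reading of the description above.

```python
from typing import Dict, List

Frame = Dict[str, float]  # hypothetical layout, e.g. {"f0_hz": 120.0, "len_ms": 8.0}


def duration_ms(frames: List[Frame]) -> float:
    return sum(f["len_ms"] for f in frames)


def fit_to_duration(frames: List[Frame], target_ms: float) -> List[Frame]:
    """Copy frames, dropping or repeating them, until the total is closest to target_ms."""
    if not frames:
        raise ValueError("need at least one source frame")
    out: List[Frame] = []
    total = 0.0
    i = 0
    while True:
        nxt = frames[i % len(frames)]  # cycle so that short sources can be repeated
        # Stop as soon as adding another frame no longer brings us closer.
        if abs(total + nxt["len_ms"] - target_ms) >= abs(total - target_ms):
            break
        out.append(dict(nxt))
        total += nxt["len_ms"]
        i += 1
    return out


def exchange(voiced: List[Frame], voiceless: List[Frame]):
    """Return (frames for the voiced slot, frames for the voiceless slot) after the swap."""
    into_voiced_slot = fit_to_duration(voiceless, duration_ms(voiced))
    into_voiceless_slot = fit_to_duration(voiced, duration_ms(voiceless))
    return into_voiced_slot, into_voiceless_slot


# Made-up closures: 40 ms of voicing for /b/, 72 ms of silence for /p/.
b_closure = [{"f0_hz": 120.0, "len_ms": 8.0} for _ in range(5)]
p_closure = [{"f0_hz": 0.0, "len_ms": 8.0} for _ in range(9)]
devoiced_b, voiced_p = exchange(b_closure, p_closure)
print(len(devoiced_b), len(voiced_p))
```

Because each inserted set is fitted to the slot it replaces, this manipulation changes voicing while leaving the length of the host segment approximately unchanged.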
To synthesise length, each of the segments is measured in milliseconds from the end of the nasal consonant to the beginning of the following vowel. Then, the frames of the opposed segment are duplicated or removed (depending on the case) until it reaches the same length as its correlative in the series.
The frame that contains the explosion bar is excluded from any manipulation in all cases and in both procedures.
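The length manipulation, with the explosion-bar frame left untouched, might look like the sketch below. It is an assumption-laden illustration: frames are dictionaries with hypothetical keys `len_ms` and `burst`, duplication and removal are spread evenly over the segment, and the target frame count is estimated from the mean frame length. The text does not say how the duplicated or removed frames were chosen.

```python
from typing import List


def resample(frames: List[dict], n_target: int) -> List[dict]:
    """Pick n_target frames evenly from frames, repeating or skipping as needed."""
    if n_target <= 0 or not frames:
        return []
    n_src = len(frames)
    return [dict(frames[(i * n_src) // n_target]) for i in range(n_target)]


def match_length(frames: List[dict], target_ms: float) -> List[dict]:
    """Stretch or shrink a segment to about target_ms, never touching the burst frame."""
    burst_idx = next(i for i, f in enumerate(frames) if f["burst"])
    burst = frames[burst_idx]
    before, after = frames[:burst_idx], frames[burst_idx + 1:]
    rest = before + after
    if not rest:
        return [dict(burst)]
    mean_ms = sum(f["len_ms"] for f in rest) / len(rest)
    # Number of non-burst frames needed to fill the remaining time.
    n_target = max(0, round((target_ms - burst["len_ms"]) / mean_ms))
    # Keep the proportion of frames before and after the burst.
    n_before = round(n_target * len(before) / len(rest))
    n_after = n_target - n_before
    return resample(before, n_before) + [dict(burst)] + resample(after, n_after)


# Made-up /b/ segment: four 10 ms closure frames plus a 10 ms burst frame (50 ms).
b_segment = [{"len_ms": 10.0, "burst": False} for _ in range(4)]
b_segment.append({"len_ms": 10.0, "burst": True})
lengthened = match_length(b_segment, target_ms=90.0)  # e.g. the length of /p/
print(sum(f["len_ms"] for f in lengthened))
```

Applying the same function with a shorter target removes frames instead of duplicating them, so one routine covers both directions of the manipulation.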
3. ELABORATION OF THE LINGUISTIC PERCEPTION TEST (LPT)
After the synthesis, 12 synthesised utterances (SU) are obtained. These have the following characteristics:
Each of these SUs is stored in a digitised sound file and given a number. The utterances are then randomised, and all the SUs and NUs are recorded on a tape, producing a recording of approximately 10 minutes that contains 18 utterances. Finally, a form is prepared which includes instructions and an answer sheet on which the listener-judges must write, in ordinary spelling, what they hear in the recording.
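The composition of the 18-item recording (12 SUs plus 6 NUs) can be reproduced with a short sketch. It assumes that the 12 SUs are the six consonants under each of the two manipulations and that NUs and SUs are shuffled together; the text does not state whether the NUs were part of the randomisation, and the tuple layout and seed are hypothetical.

```python
import random

CONSONANTS = ["p", "t", "k", "b", "d", "g"]
MANIPULATIONS = ["voicing", "length"]


def build_playlist(seed: int = 0):
    """Return a numbered, shuffled list of (kind, consonant, manipulation) items."""
    natural = [("NU", c, None) for c in CONSONANTS]
    synthesised = [("SU", c, m) for c in CONSONANTS for m in MANIPULATIONS]
    items = natural + synthesised
    random.Random(seed).shuffle(items)  # fixed seed keeps the order reproducible
    return list(enumerate(items, start=1))


playlist = build_playlist()
for number, (kind, consonant, manipulation) in playlist:
    print(number, kind, consonant, manipulation or "-")
```

A fixed seed means the same presentation order can be regenerated when the answer sheets are scored.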
4. DATA ANALYSIS
The LPT is given to 120 subjects and the data are recorded in the following three categories:
5. NATURAL UTTERANCES RECOGNITION
In the /b-d-g/ series, 96.67% of the subjects correctly identified the natural utterances, while in the /p-t-k/ series 95% did so. On average, 95.83% of the subjects correctly identified the natural utterances. Even though the overall average of natural utterance recognition across all the subjects in the sample is above 95%, the data analysis considers only those subjects who recognised 100% of the natural utterances, in order to give greater validity to the LPT results.
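The selection criterion (keep only listeners who recognised every natural utterance) amounts to a simple filter. The sketch below uses invented listener identifiers and answers; only the two series percentages come from the text.

```python
from typing import Dict, List


def listeners_with_perfect_nu(nu_answers: Dict[str, Dict[str, str]]) -> List[str]:
    """Keep listeners whose written answer matches the target for every NU."""
    return [
        listener
        for listener, answers in nu_answers.items()
        if all(heard == target for target, heard in answers.items())
    ]


# Invented answer sheets: target consonant -> what the listener wrote.
nu_answers = {
    "L001": {"p": "p", "t": "t", "k": "k", "b": "b", "d": "d", "g": "g"},
    "L002": {"p": "p", "t": "t", "k": "t", "b": "b", "d": "d", "g": "g"},
    "L003": {"p": "p", "t": "t", "k": "k", "b": "b", "d": "d", "g": ""},
}
kept = listeners_with_perfect_nu(nu_answers)
print(kept)

# Mean of the two series percentages given in the text.
series_mean = (96.67 + 95.0) / 2
print(series_mean)
```

A listener with a wrong answer or a blank on any NU is dropped from the analysis, which trades sample size for confidence that the remaining listeners could hear the unaltered contrast.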
6. RESULTS WHEN ALTERING THE VOICING FEATURE
Once the voicing feature is modified in the /b-d-g/ series, an average of 95.34% of the subjects recognise the original phoneme, 3.23% recognise it as its correlative in the series, and only 1.43% recognise it as another phoneme or do not answer [IMAGE 01].
When the voicing in the /p-t-k/ series is modified, an average of 68.46% of the subjects recognise the original phoneme, 2.15% recognise it as its correlative in the series, and 29.39% recognise it as another phoneme or do not answer [IMAGE 02]. In this series, a high percentage of subjects recognise the phoneme as another one when the voicing is modified. In the particular case of /k/, 76.34% of the subjects recognise it as another phoneme (Table 1), and in the case of /t/ the same happens in 10.75% of the cases (Table 1). The analysis of these results with the Chi-square test shows that these two values differ significantly from the corresponding values obtained when length is altered. The analysis of the average percentage of the series with the same statistical test shows that it also differs highly significantly from the corresponding value in phoneme recognition with modified length.
In both series with altered voicing, an average of 81.9% of the subjects recognise the phoneme as the original, 2.69% recognise it as the opposed phoneme, and 15.41% recognise it as another phoneme or do not answer [IMAGE 03].
Table 1. Altered voicing confusion matrix.
7. RESULTS WHEN ALTERING THE LENGTH FEATURE
When the length is altered in the /b-d-g/ series, an average of 54.84% of the subjects recognise the original phoneme, 40.86% recognise it as the opposed phoneme in the series, and 4.3% recognise it as another phoneme or do not answer [IMAGE 04]. Compared with the percentages for the voicing feature, altering the length increases the percentage of subjects who recognise a segment as its opposite in the correlation. In the particular case of /b/, 73.12% of the subjects recognise /b/ as /p/ when this feature is modified (Table 2). In the case of /d/, the same phenomenon is observed in 46.24% of the cases (Table 2). The analysis of these data using the Chi-square test shows that these percentages differ significantly from the corresponding ones for utterances with altered voicing. These two phenomena account for the average percentage of recognition as the opposed phoneme in the /b-d-g/ series being 40.86%, as already mentioned. This index is also significant according to the Chi-square test when compared with the corresponding percentages in the recognition of natural utterances and in the recognition of utterances with modified voicing. However, it must be said that in the case of /g/ with altered length only 3.23% of the subjects recognise it as its opposite in the series, a non-significant percentage according to the Chi-square test.
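A comparison of this kind can be sketched as a Pearson Chi-square test on a 2x2 table of counts. The code below is only an illustration: the text does not say which variant of the test was used (no continuity correction is applied here), and the counts are invented to resemble the reported proportions rather than taken from the study's tables.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson Chi-square (1 degree of freedom) for the table [[a, b], [c, d]].

    Returns (chi2, p_value). Raises ValueError if a row or column sums to zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: rows = altered length / altered voicing,
# columns = heard as the opposed phoneme / not heard as the opposed phoneme.
chi2, p = chi_square_2x2(68, 25, 3, 90)
print(round(chi2, 2), p < 0.05)
```

With one degree of freedom, a statistic above 3.841 corresponds to significance at the 0.05 level.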
When the length is altered in the /p-t-k/ series, an average of 83.87% of the subjects recognise the altered utterances as the original, 14.7% recognise them as the opposed phoneme in the series, and only 1.43% recognise them as another phoneme or do not answer [IMAGE 05]. A high percentage of the subjects thus recognise the original utterance. Considered separately, the three segments of the series behave similarly to the average percentage. In this series, the percentage of recognition as the opposed phoneme is higher when length is modified than when voicing is modified, although the highest value does not exceed 21.51%.
When length is altered in both series, an average of 69.35% of the subjects recognise the phoneme as the original, 27.78% recognise it as the opposed phoneme, and 2.87% recognise it as another phoneme or do not answer [IMAGE 06].
It should be noted that when the recognition percentages of the phoneme /g/ for altered-voicing and altered-length utterances are compared by means of the Chi-square test, the results are non-significant; i.e. in the case of /g/ both groups behave similarly.
Table 2. Altered length confusion matrix.
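The averages over both series can be checked against the per-series figures reported above. The sketch assumes the two series are weighted equally (each has three consonants judged by the same listeners); the variable and function names are hypothetical.

```python
# Percentages per series: (original, opposed, other or no answer), as reported.
voicing = {"b-d-g": (95.34, 3.23, 1.43), "p-t-k": (68.46, 2.15, 29.39)}
length = {"b-d-g": (54.84, 40.86, 4.30), "p-t-k": (83.87, 14.70, 1.43)}


def mean_over_series(results):
    """Unweighted mean of each response category across the series."""
    return tuple(sum(column) / len(column) for column in zip(*results.values()))


voicing_mean = mean_over_series(voicing)
length_mean = mean_over_series(length)
print([round(x, 3) for x in voicing_mean])
print([round(x, 3) for x in length_mean])
```

Up to rounding in the second decimal, the results match the 81.9%, 2.69%, and 15.41% reported for altered voicing and the 69.35%, 27.78%, and 2.87% reported for altered length.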
When the acoustic features of the phonological correlation /p-t-k/ vs. /b-d-g/ were manipulated and the synthesised utterances were submitted to listener-judges for discrimination, the results were not uniform. Voicing, expressed by the absence or presence of a low-frequency band in the spectrogram, has no major influence on the discrimination of the correlation /p-t-k/ vs. /b-d-g/. Absolute length does have an influence on the discrimination of the correlation /p-t-k/ vs. /b-d-g/. However, it mainly affects the confusion of /b/ with /p/ (73%) and of /d/ with /t/ (46%). Its influence is moderate in the opposite direction, i.e. the confusion of /p/ with /b/ (16%) and of /t/ with /d/ (22%). Moreover, it has practically no influence on the confusion of /g/ with /k/ (3%), or vice versa (6%).
If the absence or presence of a low-frequency band is taken as the acoustic manifestation of the articulatory feature of voicing, then the results do not agree with the traditional academic view that voicing is the feature that influences the discrimination between /p/ and /b/, /t/ and /d/, and /k/ and /g/. In this type of experiment, once the low-frequency band has been removed or added, a change in the perception of the segments would be expected; strictly speaking, each altered realisation should be perceived as its opposite in the series. The results instead support the opinions stated by Alarcos Llorach (1954) and Cressey (1978) that voicing is merely a functional feature and that the idea of voiced has only a proportional meaning in the relations of /p/ to /b/, of /t/ to /d/, and of /k/ to /g/, regardless of the acoustic realities which differentiate these pairs of phonemes.
It is the altered length that produces the greater modification in the subjects' perception. However, the percentage is not extremely high. There is also a slight tendency to confuse the segment with its correlative. It is striking that the perceptual change is considerably marked when the length of the labial and post-dental voiced segments is altered. This is not the case with their correlatives /p/ and /t/, and the effect is very low in velar segments. It is also interesting to note what occurs when the voicing feature is altered in the segment /k/. In this case 73% of the subjects identified it as /t/ and not as /g/, as would be expected from the logical behaviour of the series. These two facts could indicate that these segments do not really behave as a series and that the place of articulation does influence the way in which each of the minimal pairs is differentiated. These results also support what was stated by Guirao (1980). She found that, since an objective response can be caused by any one of various parameters, the stimulus can appear as a single variable or as a combination of variables. Probably, each segment has its own acoustic identity, and this is based not on just one feature but on an interaction of various features, which can even compensate for each other in such a way that when one is weakened another fulfils the distinctive function. However, none of this can be fully confirmed until it has been thoroughly tested by experimental procedures.
The physical event-perception relationship is so complex that it would be naive to think that a single piece of research could fully account for it. This first approach points towards some aspects which should be researched in further studies. The influence of the length of the silence after the explosion bar, mainly on the identification of /p-t-k/, should be studied. A suitable methodology to measure, isolate, and modify the explosion bar in both series should be defined. Likewise, the reaction of the subjects to each of the segments separately should be studied, because including all of them together in a single test can influence the perception of the listener-judges. Finally, a procedure should be designed to study experimentally the acoustic features in interaction and their influence, as a set, on perception.