<?xml version="1.0" standalone="yes"?> <Paper uid="W94-0201"> <Title>AUTOMATED TONE TRANSCRIPTION</Title> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> TONE TRANSCRIPTION Generation and Recognition </SectionTitle> <Paragraph position="0"> A promising way of generating contours from tone sequences is to specify one or more pitch targets per tone and then to interpolate between the targets; the task then becomes one of providing a suitable sequence of targets (Pierrehumbert &amp; Beckman, 1988). It is perhaps less clear how we should go about recognising tone sequences from pitch contours. Hidden Markov Models (HMMs) (Huang et al., 1990) offer a powerful statistical approach to this problem, though it is unclear how they could be used to recognise the units of interest to phonologists. HMMs do not encode timing information in a way that would allow them to output, say, one tone per syllable (or vowel).</Paragraph> <Paragraph position="1"> Moreover, the same section of a pitch contour may correspond to either H or L tones. For example, a H between two Hs looks just like an L between two Ls. There is no principled upper bound on the amount of context that needs to be inspected in order to resolve the ambiguity, leading to a multiplication of state information required by the HMM and problems for training it.</Paragraph> <Paragraph position="2"> In the present context, the emphasis is not on automatic speech recognition but on a tool to support phonologists working with tone. As we shall see in the next section, once the phonologist has identified the salient location to measure the 'F0 value' of a syllable (or some other phonological unit), the task will be to automatically map a string of these values to a string of tones.</Paragraph> <Paragraph position="3"> A Tool for Phonologists Connell and Ladd have devised a set of heuristics for identifying key points in an F0 contour to record F0 values (Connell &amp; Ladd, 1990, 21ff).
In the absence of a program which enshrines these heuristics, it was decided to develop a system for producing a tone transcription from a sequence of F0 values. Apart from the obvious benefits of automating the process, such as speed and accuracy, it would show up cases where there is more than one possible tone transcription, possibly with different parameter settings for the F0 scaling function. Having the set of tone transcriptions that are compatible with an utterance has considerable value to an analyst searching for invariances in the tonal assignments to individual morphemes.</Paragraph> <Paragraph position="4"> To exemplify this point, it is worth considering a recent example where an alternative transcription of some data proved valuable in providing a fresh analysis of the data. In their analyses of tone in Bamileke Dschang, Hyman gives the transcription in (1a) while Stewart gives the one in (1b), for the phrase meaning 'machete of dogs'.</Paragraph> <Paragraph position="5"> (1) a. flJai mSmSbhd -- (Hyman, 1985, 50) b. J~Jai't' SmSmbh6- (Stewart, 1993, 200) These two possibilities exist because of different F0 scaling parameters. These parameters determine the way in which the different tones are scaled relative to each other and to the speaker's pitch range. This is illustrated in (2), adapting Hyman's earlier notation (Hyman, 1979).</Paragraph> <Paragraph position="6"> (2) a. Hyman: flJli m~m,l.bhti .fl pl f mo $ mbhfi</Paragraph> <Paragraph position="8"> Example (2) displays a kind of phonetic interpretation function. Immediately below the two rows of tones we see a row of numbers corresponding to the tones. For Hyman, L=3 and H=1, while for Stewart, L=2 and H=1. Observe in Hyman's example that a rising tone, symbolised by a wedge above the i, is modelled as an LH sequence, in keeping with standard practice in African tone analysis.</Paragraph> <Paragraph position="9"> The second row of numbers corresponds to downstep (↓)
and upstep (↑). For Hyman's model, this row begins at 0 and is increased by 1 for each downstep encountered. For Stewart's model, this row begins at 1 and is increased by 1 for each downstep encountered and decreased by 1 for each upstep encountered. The two rows are summed vertically to give the last row of numbers. Observe that the last rows of Stewart's and Hyman's models are identical.</Paragraph> <Paragraph position="10"> The parameter which distinguishes the two approaches is partial vs. total downstep. Hyman treats Dschang as a partial downstep language, i.e. where ↓H appears as a mid tone (with respect to the material to its left). Stewart treats it as a total downstep language, i.e. where ↓H appears as an L tone (with respect to the material to its left). While Hyman and Stewart present rather different analyses of rather different looking transcriptions, we can see that they are really analyzing the same data, given the above interpretation function. Therefore, phonologists who do not wish to limit themselves to the transcriptions which result from certain parameter settings in the phonetic interpretation function would be better off working directly with number sequences like the last row in (2). This paper describes a tool which lets them do just that.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> F0 SCALING </SectionTitle> <Paragraph position="0"> Consider again the F0 contour in Figure 1. In particular, note that the F0 decay seems to be to a non-zero asymptote, and that H and L appear to have different asymptotes which we symbolise as h and l respectively.
These observations are clearer in Figure 2, which (roughly speaking) displays the peaks and valleys from Figure 1.</Paragraph> <Paragraph position="1"> Although this is admittedly a rather artificial example, it remains true that there is no principled upper limit on the number of downsteps that can occur in an utterance (Clements, 1979, 540), and so the asymptotic behaviour of F0 scaling still needs to be addressed.</Paragraph> <Paragraph position="2"> Now suppose that we have a sequence T of tones where ti is the ith tone (H or L) and a sequence X of F0 values where xi is the F0 value corresponding to ti. Then we would like a formula which predicts xi given xi-1, ti and ti-1 (i > 1).</Paragraph> <Paragraph position="3"> We express this as follows:</Paragraph> <Paragraph position="5"> The question, now, is what should this function look like? Suppose for sake of argument that the ratio of L to the immediately preceding H in Figure 2 is constant, with respect to the baselines for H and L, namely h and l. Then we have:</Paragraph> <Paragraph position="7"> More generally, suppose that we have a sequence of two arbitrary tones. Ignoring the possibility of downstep for the present, we have a static two-tone system where HH and LL sequences are level and sequences like HLHLHL are realised as simple oscillation between two pitches. We can write the following formula, where t̄i = h if ti = H and t̄i = l if ti = L:</Paragraph> <Paragraph position="9"> The situation becomes more interesting when we allow for downdrift and downstep. Downdrift is the automatic lowering of the second of two H tones when an L intervenes, so HLH is realised as [--] rather than as [-_-], while downstep is the lowering of the second of two tones when an intervening L is lost, so H↓H is realised as [ ] (Hyman &amp; Schuh, 1974). Bamileke Dschang has downstep but not downdrift while Igbo has downdrift but only very limited downstep.
Now we define t̄i = h if ti = H, ↓H and t̄i = l if ti = L, ↓L. Generalising our equation once more, we have the following, where R is a factor called the transition ratio.</Paragraph> <Paragraph position="11"> Now I shall show how this general equation relates to the equations for Igbo (Liberman et al., 1993,</Paragraph> <Paragraph position="13"> It will be helpful to introduce one more level of generality. P relates adjacent F0 values, but we would also like to relate non-adjacent values, given the sequence of intervening tones. Suppose that T = t0 ... tn is a tone sequence where the F0 value of t0 is x. Then we shall write the F0 value of tn as PT(x). By repeated applications of P we can write down the following expression for PT:</Paragraph> <Paragraph position="15"> where \(R_T = \prod_{k=1}^{n} R_{t_{k-1}t_k}\), n > 2. Now, suppose that S = s0 ... sm and T = t0 ... tn are tone sequences and that s̄0 = t̄0, s̄m = t̄n and PS = PT.</Paragraph> <Paragraph position="16"> Then it is straightforward to show that RS = RT.</Paragraph> <Paragraph position="17"> Notice also that if PT(x) = x for all x and if t̄0 = t̄n then RT = 1. These results will be useful in the next section.</Paragraph> <Paragraph position="18"> Finally, it is worth comparing P with Hyman's and Stewart's interpretation functions which were illustrated in (2). As pointed out already, Hyman's is a partial downstep model while Stewart's is a total downstep model. Partial and total downstep can be visualised as follows, where the dotted lines indicate the abstract register inside which tones are scaled, and where downstep corresponds to lowering of the register.</Paragraph> <Paragraph position="20"> Observe that for partial downstep, it is necessary to have two downsteps before a high tone is at the level of a preceding low, while for total downstep, it is only necessary to have a single downstep for a high tone to be at the same level as the preceding low.
We can express these observations about partial and total downstep in the model as follows. For partial downstep, we have PL↓H↓H(x) = x while for total downstep we have PL↓H(x) = x. For both of these equations we are forced to have h = l, which does not seem to be empirically justifiable in view of the data in Figure 1. It might be argued that this indicates a flaw in the model being presented here, since partial and total downstep are widely attested in the literature on tone languages. Unfortunately, it is not possible in general to provide a model for partial or total downstep which permits distinct asymptotes for H and L.¹ Therefore, to the extent that Figure 1 is typical of tone languages in having different H and L asymptotes, one must conclude that total and partial downstep are qualitative terms only. However, they may yet re-emerge in the model under a different guise, as we shall see later.</Paragraph> <Paragraph position="21"> The effect of the distinction between partial and total downstep is to allow different transcriptions of the same string, as we saw in (2). In general, we have the following mapping between transcriptions under the two views of downstep: (4) partial total</Paragraph> <Paragraph position="23"> It is clear that changing from one view of downstep to the other amounts to adding and deleting ↓ and ↑ while leaving the tones themselves unchanged. Thus, the model admits both transcription schemes that result from the two views of downstep, and another besides, as shown later in (7).</Paragraph> <Paragraph position="24"> This concludes the discussion of the F0 prediction function. In the next section I shall investigate the phonetic interpretation of tone in Bamileke Dschang, and determine the values of R for this language.</Paragraph> <Paragraph position="25"> ¹To see why this is so for the case of total downstep, suppose that such a model did exist, and so l &lt; h. Let x ∈ [l, h), a valid F0 value for a low tone.
Now, whatever interpretation function P' we use, we still require that P'L↓H(x) = x by definition of total downstep, which means that there is now a high tone with a F0 value less than h. But h is the asymptote below which no high tones should ever be realised, and so we have a contradiction. The case for partial downstep follows similarly.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> TONE AND F0 IN BAMILEKE DSCHANG </SectionTitle> <Paragraph position="0"> In a recent field trip to Western Cameroon to study the Bamileke Dschang² noun associative construction, I was able to collect a small amount of data relating to F0 scaling throughout a particular informant's pitch range. Following Liberman et al., voice pitch was varied by getting the informant to speak at different volumes and by adjusting the recording level appropriately. However, rather than asking the informant to imagine speaking to a subject at different distances, I controlled the volume by having the informant wear headphones and playing white noise from a detuned radio. Thus, I could set the informant's voice pitch by using the volume control on my radio. My hypothesis is that this technique produces more consistent volume (and hence, pitch scaling) over long utterances and may make informants less self-conscious about speaking loudly than simply asking them to imagine speaking to subjects at various distances away. Measurements were taken from the following data.</Paragraph> <Paragraph position="1"> (5) HH d 3u5 sS1) t6 VI~U:5 t5 o t6 n3t~5 tdO t6 nSu3 kd.p tC/~ nStt3 kip He saw the bird before he saw the hat before he saw the b~r~k'et before he saw the pipe before he saw the cup</Paragraph> <Paragraph position="3"> jealousy and jealousy and ... jealousy 15~p5 mb5 155p5 mb5 ... 15.l.pa breast and breast and ... breast mb.l.vt~t rob5 mbSvt~t rnb5 ... mbSv~t oil and oil and ... oil $m5 rob5 ,~m5 rob5 ...
$m5 child and child and ... child Regrettably, the LL data was only available from isolated disyllables, and other sequences such as IAI and 115It were not available at all. However, from the F0 data for the above utterances we can hypothesise the behaviour of these unseen sequences, and this can be tested in subsequent empirical work. The results for utterances involving HH and LL sequences are displayed in Figure 3, while results for L↓H and HL are displayed in Figure 4. The regression equations obtained from these data are displayed in (6), where the number of occurrences of each tone sequence is given in parentheses after the sequence. The third column gives the standard error for the gradient and intercept.</Paragraph> <Paragraph position="4"> ²Bamileke Dschang is a Grassfields Bantu language spoken in the Western Province of Cameroon. The name 'Bamileke' (pron: [ba'mileke]) represents both an ethnic grouping and a language cluster; Dschang (pron: [tʃaŋ]) is an important town around which one of the Bamileke languages is spoken. The data here is from the Bafou dialect. (6) Tone Regression Standard</Paragraph> <Paragraph position="6"> From this, we conclude that HL is the only sequence with an intercept significantly different from zero, and that xi = xi-1 for HH and LL sequences. We also conclude that RHH = RLL =</Paragraph> <Paragraph position="8"> value will be referred to as the quantity d. We also see that l = 88 Hz and h = 96 Hz. Fortunately, these figures are sufficient to determine the R values for all other pairs of tones in Bamileke Dschang.</Paragraph> <Paragraph position="9"> A further observation is that Bamileke Dschang does not have downdrift, and so there is no F0 difference across HLH and LHL sequences. This is evident in Figure 5. Therefore, we can write PHLH(x) = x, and by a result we showed above, RHL·RLH = 1.
Given that RHL = d it follows that RLH = 1/d.</Paragraph> <Paragraph position="10"> Concerning downstep, I shall assume that the magnitude of downstep is independent of the tones on either side, and so PHL↓H = PH↓H and PL↓L = PLH↓L. A separate instrumental study supports this hypothesis (Bird &amp; Stegen, 1993). Therefore, we have Rs↓t = 7Pt,s.Lt = dRst, where s is any tone and t is H or L.</Paragraph> <Paragraph position="11"> Finally, it is important to briefly consider upstep, since it has been used in some analyses of Bamileke Dschang (e.g. Stewart's). Given that upstep and downstep are intended as inverses of each other, we have the identities Ps↓t↑t = Pst = Ps↑t↓t, with s, t as before. We now have a complete table</Paragraph> <Paragraph position="13">
H, ↓H, ↑H | 1 | d | d | d² | d⁻¹ | 1
L, ↓L, ↑L | d⁻¹ | 1 | 1 | d | d⁻² | d⁻¹
Observe the symmetries in this table. The configuration of four R values that we find when ti is not downstepped or upstepped (the first two columns) is reproduced in the columns for downstep (multiplied by d) and in the columns for upstep (divided by d).</Paragraph> <Paragraph position="14"> Note also that the above table is dependent upon how the data in (5) was transcribed. Suppose that we had not used repetitions of HL↓H (a transcription scheme based on partial downstep) but H↓LH (a scheme based on total downstep). Then we would have had RH↓L = d and RLH = 1. Accordingly, the table for R would be as follows:
ti-1 | H | L | ↓H | ↓L | ↑H | ↑L
H, ↓H, ↑H | 1 | 1 | d | d | d⁻¹ | d⁻¹
L, ↓L, ↑L | 1 | 1 | d | d | d⁻¹ | d⁻¹
The fact that we have two possible tables for R is no cause for alarm. Recall that the transition between two tones ti-1 and ti also involves the factor t̄i/t̄i-1. This factor is manifested in tone transitions according to the following pattern:</Paragraph> <Paragraph position="16"> I therefore conclude that the presence of more than one table for R indicates an interplay between R values and the ratio h/l.
This raises an interesting question. Suppose we have two tone sequences T = t0 ... tn and T' = t'0 ... t'n, and two interpretation functions P and P' based on R and R' respectively. Then under what circumstances is the phonetic interpretation of both sequences the same under their respective interpretation functions? A sufficient condition for them to be the same is that t̄i = t̄'i and that \(R_{t_{i-1}t_i} = R'_{t'_{i-1}t'_i}\).</Paragraph> <Paragraph position="17"> The reader can check that these conditions are met by the mapping in (4) and the two tables for R given above. Note that this observation holds for the model in general, not just for the specialised version of the model as applied to Bamileke Dschang.</Paragraph> <Paragraph position="18"> It can also be shown that R is completely determined once RHL is specified. A possible characterisation of total vs. partial downstep now arises: if RHL = 1 then we have total downstep, but if RHL = d &lt; 1 then we have partial downstep.</Paragraph> <Paragraph position="19"> However, the interpretation of these terms must necessarily be different from the standard interpretation, since I have shown that the standard interpretation is not compatible with the present model.</Paragraph> <Paragraph position="20"> This concludes the discussion of F0 scaling in Bamileke Dschang. I shall now present the implementations.</Paragraph> </Section> </Paper>