<?xml version="1.0" standalone="yes"?> <Paper uid="W94-0201"> <Title>AUTOMATED TONE TRANSCRIPTION</Title> <Section position="7" start_page="0" end_page="11" type="evalu"> <SectionTitle> IMPLEMENTATIONS </SectionTitle> <Paragraph position="0"> In this section, I show how it is possible to get two programs to produce a sequence of tones T (i.e. a tone transcription) given a sequence of n F0 values X. The programs make crucial use of the prediction function P in evaluating candidate tone transcriptions.</Paragraph> <Paragraph position="1"> Both programs involve search, and in general, the aim in searching is to discover the values for x1, ..., xn so as to optimise the value of a specified evaluation function f(x1, ..., xn). When f has many local optima, deterministic methods such as hill-climbing perform poorly. This is because they terminate in a local optimum, the particular one found depends heavily on the starting point in the search, and there is usually no way of choosing a good starting point.</Paragraph> <Paragraph position="2"> Exhaustive search for the global optimum is not an option when the search space is prohibitively large. In the present context, say for a sequence of 20 tones, the search space contains 6^20 ≈ 10^15 possible tone transcriptions, and for each of these there are thousands of possible parameter settings, too large a search space for exhaustive search in a reasonable amount of computation time.</Paragraph> <Paragraph position="5"> Non-deterministic search methods have been devised as a way of tackling large-scale combinatorial optimisation problems, problems that involve finding optima of functions of discrete variables. These methods are only designed to yield an approximate solution, but they do so in a reasonable amount of computation time. The best known such methods are genetic search (Goldberg, 1989) and annealing search (van Laarhoven & Aarts, 1987). 
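The search-space figure quoted above is easy to verify; a quick sketch (the six-tone inventory comes from the gene description in the next section):

```python
# Size of the transcription search space: 6 candidate tones at each of
# n positions, before even counting the parameter settings h, l and d.
def transcription_space(n_tones, inventory_size=6):
    return inventory_size ** n_tones

# For a sequence of 20 tones the space holds about 3.7e15 transcriptions,
# far too many to enumerate exhaustively.
print(transcription_space(20))  # 3656158440062976
```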
Recently, annealing search has been successfully applied to the learning of phonological constraints expressed as finite-state automata (Ellison, 1993). In the following sections I describe a genetic algorithm and an annealing algorithm for the tone transcription problem.</Paragraph> <Paragraph position="6"> A Genetic Algorithm For a cogent introduction to genetic search and an explanation of why it works, the reader is referred to (South et al., 1993). Before presenting the version of the algorithm used in the implementation, I shall informally define the key data types it uses along with the standard operations on those types.</Paragraph> <Paragraph position="7"> gene A linear encoding of a solution. In the present setting, it is an array of n tones, where each tone is one of H, ↑H, ↓H, L, ↑L or ↓L. A gene also contains 16-bit encodings of the parameters h, l and d. These encodings were scaled to be floating point numbers in the range [90, 110] for h, [70, 100] for l and [0.6, 0.9] for d.</Paragraph> <Paragraph position="8"> gene pool An array of genes, P. One of the search parameters is the size of P, known as the population. The gene pool is renewed each generation, and the number of generations is another search parameter.</Paragraph> <Paragraph position="9"> evaluation A measure of the fitness of a gene as a solution to the problem. Suppose that X is the sequence of F0 values we wish to transcribe.</Paragraph> <Paragraph position="10"> Suppose also that T is a particular gene. Then the evaluation function is as follows:</Paragraph> <Paragraph position="12"> crossover This is an operation which takes two genes and produces a single gene as the result.</Paragraph> <Paragraph position="13"> Suppose that A = a1···an and B = b1···bn.</Paragraph> <Paragraph position="14"> Then the crossover function Cr is defined as follows, where r is the (randomly selected) crossover point (0 < r < n).</Paragraph> <Paragraph position="15"> Cr(a1···ar ar+1···an, b1···br br+1···bn) = a1···ar br+1···bn In other words, the genes A and B are cut at a position determined by r and the first part of A is spliced with the second part of B to create a new gene. Crossover builds in the idea that good genes tend to produce good offspring. To see why this is so, suppose that the transcription contained in the first part of A is relatively good while the rest is poor, while the transcription contained in the first part of B is poor and the rest is relatively good. Then the offspring containing the first part of A and the second part of B will be an improvement on both A and B; other possible offspring from A and B will be significantly worse and may not survive to the next generation. The program performs this kind of crossover for the parameters h, l and d, employing independent crossover points for each, and randomising the argument order in Cr so that the high order bits in the offspring are equally likely to come from either parent.</Paragraph> <Paragraph position="16"> An extension to crossover allows more than one crossing point. The current model permits an arbitrary number of crossing points for crossover on the transcription string. The resulting gene is optimal since we choose the crossing points in such a way as to minimise (P_{t_{i-1} t_i}(x_{i-1}) - x_i)^2 at each position. In developing the system, exploiting the decomposability of the evaluation function in this way caused a significant improvement in system performance over the version which used simple crossover.</Paragraph> <Paragraph position="17"> breeding For each generation, we create a new gene pool from the previous one. 
Each new gene is created by mating the best of three randomly chosen genes with the best of three other randomly chosen genes.</Paragraph> <Paragraph position="18"> mutation In order to maintain some genetic diversity and an element of randomness throughout the search (rather than just in the initial configuration), a further operation is applied to each gene in every generation. With a certain probability (known as the mutation probability), for each gene T and each tone in T, the tone is randomly set to any of the six possible tones.</Paragraph> <Paragraph position="19"> Likewise, the parameter encodings are mutated.</Paragraph> <Paragraph position="20"> The mutation rate is set to 0.005 but raised to 0.5 for a single generation if the evaluation of the best gene is no improvement on the evaluation of the best gene ten generations earlier. The best gene is never mutated.</Paragraph> <Paragraph position="21"> The building blocks of genetic search discussed above are structured into the following algorithm, expressed in pseudo-Pascal:</Paragraph> <Paragraph position="23"> The main loop is executed for each generation.</Paragraph> <Paragraph position="24"> Each time through this loop, the program checks performance over the last ten generations and if performance has been good, the mutation rate stays low; otherwise it is changed to high. Then it copies the best gene to the new pool. Now we reach the inner loop, which selects two genes, performs crossover, and mutates the result. Next, the current pool is updated, an evaluation is performed, and the program continues with the next generation. 
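The pseudo-Pascal listing itself is missing from this copy, but the loop just described can be sketched in Python. The evaluation function below is a stand-in for the paper's squared-error measure against predicted F0 values, and the tone names, population size and generation count are illustrative assumptions; the tournament-of-three breeding, single-point crossover, elitism, and the 0.005/0.5 mutation rates follow the text:

```python
import random

TONES = ["H", "^H", "!H", "L", "^L", "!L"]   # six-tone inventory (names illustrative)

def evaluate(gene):
    # Stand-in evaluation: mismatches against an arbitrary target sequence.
    # The paper instead averages squared differences between predicted and
    # observed F0 values; lower is better in both cases.
    target = (["H", "L"] * len(gene))[:len(gene)]
    return sum(1 for g, t in zip(gene, target) if g != t)

def crossover(a, b):
    # Single-point crossover Cr: splice a[0:r] with b[r:n], with 0 < r < n.
    r = random.randint(1, len(a) - 1)
    return a[:r] + b[r:]

def best_of_three(pool):
    # Tournament selection: the best of three randomly chosen genes.
    return min(random.sample(pool, 3), key=evaluate)

def mutate(gene, rate):
    # Each tone is reset to a random tone with the given probability.
    return [random.choice(TONES) if random.random() < rate else t
            for t in gene]

def genetic_search(n=10, pop_size=30, generations=60):
    pool = [[random.choice(TONES) for _ in range(n)] for _ in range(pop_size)]
    best_history = []
    for g in range(generations):
        best = min(pool, key=evaluate)
        best_history.append(evaluate(best))
        # Mutation jumps to 0.5 for one generation when the best gene has
        # not improved on the best gene of ten generations earlier.
        stuck = g >= 10 and best_history[-1] >= best_history[-11]
        rate = 0.5 if stuck else 0.005
        new_pool = [best]                 # the best gene is never mutated
        while len(new_pool) < pop_size:
            child = crossover(best_of_three(pool), best_of_three(pool))
            new_pool.append(mutate(child, rate))
        pool = new_pool
    return min(pool, key=evaluate)

solution = genetic_search()
print(evaluate(solution))   # small values indicate a good transcription
```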
Once all the generations have been completed, the program displays the best gene from the final population and terminates.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> An Annealing Algorithm </SectionTitle> <Paragraph position="0"> As with genetic algorithms, simulated annealing (van Laarhoven & Aarts, 1987) is a combinatorial optimisation technique based on an analogy with a natural process. Annealing is the heating and slow cooling of a solid which allows the formation of regular crystalline structure having a minimum of excess energy. In its early stages when the temperature is high, annealing search resembles random search. There is so much free energy in the system that a transition to a higher energy state is highly probable. As the temperature decreases, the search begins to resemble hill-climbing. Now there is much less free energy and so transitions to higher energy states are less and less likely. In what follows, I explain some of the parameters of annealing search as used in the current implementation. temperature At the start of the search the temperature t is set to 1. During the search, the temperature is reduced at a rate set by the 'cooling rate' parameter, until it reaches a value less than 10^-6.</Paragraph> <Paragraph position="1"> perturbation At each step of the search, the current state is perturbed by an amount which depends on the temperature. The temperature determines the fraction of the search space that is covered by a single perturbation step. For a tone sequence of length n, we randomly reset the worst n·t tones according to (P_{t_{i-1} t_i}(x_{i-1}) - x_i)^2. For the parameters we proceed as follows, here exemplified for h. First, set p = t(h_max - h_min). 
Now, add to h a random number in the range [-p, p] and check that the result is still in the range [h_min, h_max].</Paragraph> <Paragraph position="2"> equilibrium At each temperature, the system is required to reach 'thermal equilibrium' before the temperature is lowered. In the present context, equilibrium is reached if no more than one of the last eight perturbations yielded a new state that was accepted.</Paragraph> <Paragraph position="3"> free energy function This is the amount of available energy for transitions to higher energy states. In the current system, it is the distribution -1000·t·log(p), where p is a uniform random variable in the range (0, 1]. If the energy difference Δ between an old and a new state is less than the available energy, then the transition is accepted. The factor of 1000 is intended to scale the energy distribution to typical values of the evaluation function.</Paragraph> <Paragraph position="4"> Now the algorithm itself is presented: procedure annealing_search begin</Paragraph> <Paragraph position="6"> The program is made up of two loops. The outer loop simply iterates through the temperature range, beginning with a temperature of 1 and steadily decreasing it until it gets very close to zero. The nested loop performs the task of reaching thermal equilibrium at each temperature. The first step is to perturb the previous transcription to make a new one. Notice that the temperature t is a parameter of the perturb function. Next, the difference Δ between the old and new evaluations is calculated. If the new transcription has a better evaluation than the old one, then Δ is negative. Next, the program accepts the new transcription if (i) Δ is negative or (ii) Δ is positive and there is sufficient free energy in the system to allow the worse transcription to be accepted. 
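The perturbation and acceptance steps just described can be sketched as follows; the [90, 110] range comes from the gene description earlier, the 1e-12 floor on p merely guards against log(0), and the rest follows the text:

```python
import math
import random

def perturb_param(value, t, lo=90.0, hi=110.0):
    # Temperature-scaled step: p = t * (hi - lo) bounds the move, and the
    # result is clamped back into [lo, hi] (shown here for h in [90, 110]).
    p = t * (hi - lo)
    return max(lo, min(hi, value + random.uniform(-p, p)))

def accept(delta, t):
    # A better state (delta negative) is always accepted; a worse one is
    # accepted only when the free energy -1000 * t * log(p), with p uniform
    # in (0, 1], exceeds the energy difference delta.
    if delta <= 0:
        return True
    free_energy = -1000.0 * t * math.log(random.uniform(1e-12, 1.0))
    return delta < free_energy

assert 90.0 <= perturb_param(100.0, t=1.0) <= 110.0
assert accept(-5.0, t=0.001)      # improvements are always accepted
assert not accept(5.0, t=0.0)     # no free energy once t reaches zero
```

At t = 1 a single perturbation can traverse the whole parameter range and most uphill moves are accepted; as t falls the search degenerates into hill-climbing, matching the description above.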
Finally, we check if the new transcription is better than the best transcription found so far (BestTrans) and if so, we set BestTrans to be the new transcription.</Paragraph> <Paragraph position="7"> Once equilibrium is reached, the current transcription is set to be the best transcription found so far, and the search continues.</Paragraph> <Paragraph position="8"> Both the genetic and annealing search algorithms have been implemented in C++. In this section, the performance of the two implementations is compared. Performance statistics are based on 1,200 executions. Both programs generated random sequences of tones, then computed the corresponding F0 sequence using P, then set about transcribing the F0 sequence. Since these sequences were ideal, the best possible evaluation for a transcription was zero.</Paragraph> <Paragraph position="9"> The performance of the programs could then be measured to see how close they came to finding the optimal solution. Each program was tested on F0 sequences of length 5, 10, 15 and 20. For each length, each program transcribed 100 randomly generated sequences. In the figure, the left member of each pair of bars is for the genetic search program, while the right member is for the annealing search program.</Paragraph> <Paragraph position="10"> The heavily shaded bars corresponding to evaluations less than 1 are the most important. These indicate the number of times out of 100 that the programs found a transcription with an evaluation less than 1. This evaluation means that the average of the squared difference between the predicted F0 values and the actual F0 values was less than 1 Hz. Observe that the annealing search program performs significantly better in all cases.</Paragraph> <Paragraph position="11"> Note that the mutation operation in the genetic search program treats each bit in the parameter encodings equally, while the perturbation operation in the annealing search program is sensitive to the distinction between more significant vs. less significant bits. 
This may explain the better convergence behaviour of the annealing search.</Paragraph> <Paragraph position="12"> Observe that performance does not degrade with transcription length as the length doubles from 10 to 20. This is probably because a randomly generated sequence will contain downsteps on every second tone (on average), causing a general downtrend in the F0 values and severely limiting the combinatorial explosion of possible transcriptions.</Paragraph> </Section> <Section position="2" start_page="0" end_page="10" type="sub_section"> <SectionTitle> Trial 2: Artificial Data with Upstep. </SectionTitle> <Paragraph position="0"> Trial 2 was the same as trial 1 except that this time upstep was permitted as well. The results are displayed in Figure 7. Again the annealing program fares better than the genetic program. Consider again the bars corresponding to evaluations less than 1. For both programs, however, observe that the performance degrades more uniformly than in trial 1, probably because the inclusion of upstep greatly increases the number of possible transcriptions (and hence, the number of local optima). Trial 3: Actual Data. The final trial involved real data, including data from the utterance given in Figure 1. This trial involved four subtrials. The first and second had F0 sequences of length 10, while the third and fourth had lengths 18 and 19. The first and second sequences were taken by extracting the initial 10 F0 values from the third and fourth sequences, thereby avoiding the asymptotic behaviour of the longer sequences.</Paragraph> <Paragraph position="1"> The data is tabulated below, and it comes from the sentences in (5).</Paragraph> <Paragraph position="2"> Performance results are given in Figure 8. Notice that the interpretation of the shading in this figure is different from that in previous figures. This is because evaluations near zero were less likely with real data. 
In fact, the annealing program never found an evaluation less than 3, while the genetic program never found an evaluation less than 4.</Paragraph> <Paragraph position="3"> Since the programs performed about equally on finding transcriptions with an evaluation less than 7, I shall display these transcriptions along with an indication of how many times each program found the transcription (G = genetic, A = annealing). I give transcriptions which occurred at least twice in one of the programs. The results from trial 1 deserve special attention. In trial 1, three transcriptions were found by both programs. The best evaluations found are given below: It is striking to note that the first two transcriptions above are what Hyman and Stewart (respectively) would have given as transcriptions for the abstract F0 sequence 1 3 2 4 3 5 4 6 5 7. This is demonstrated in (7a,b). The third transcription points to another possibility, given in (7c).</Paragraph> <Paragraph position="4"> Therefore, there are encouraging signs that the program is living up to its promise of producing alternative, equally acceptable transcriptions, as desired from an analytical standpoint.</Paragraph> </Section> <Section position="3" start_page="10" end_page="11" type="sub_section"> <SectionTitle> Multiple Solutions </SectionTitle> <Paragraph position="0"> Although we have seen more than one transcription for a given F0 sequence, it is inconvenient to be required to run the programs several times in order to see if more than one solution can be found. Furthermore, the programs are designed not to get caught in local optima, which is a problem since interesting alternative transcriptions may actually be local optima. Therefore, both programs are set up to report the k best solutions, where the user specifies the number of solutions desired. 
The program ensures that the same area of the search space is not re-explored by subsequent searches.</Paragraph> <Paragraph position="1"> This is done by defining a distance metric on transcriptions which counts the number of tones in one transcription that have to be changed in order to make it identical to the other transcription. That part of the search space within a distance of n/3 from any previously found solution is not explored again. The programs give up before finding k solutions if 5 randomly generated transcriptions all fall within distance n/3 of previous solutions.</Paragraph> <Paragraph position="2"> Now, consider the following randomly generated sequence of F0 values: 201 215 201 173 163 201 173. The annealing program was set the task of finding ten transcriptions of this F0 sequence. The program was run only twice, and it reported the following solutions with evaluations less than or equal to 1. Both runs of the program found the same solutions, and in the same order. (Note that two transcriptions are taken to be the same if they differ only in an initial upstep or downstep; this has no effect on the phonetic interpretation.) In the following displays, the predicted F0 values are given below each solution to facilitate comparison with the input sequence.</Paragraph> <Paragraph position="3"> Since all executions to this point have been based on the first table of R values, it was decided to try a test with the second table of R values to see if the performance was different. Interestingly, the third solution in both of the above executions was not found, though two new solutions were found.</Paragraph> <Paragraph position="4"> Observe that the value of d in the above solutions clusters around 0.66 and 0.87. Similar clustering may be occurring with the ratio h/l. However, an analysis of the relationship between the kinds of solutions found, the two R tables and the parameter values h, l and d has not been attempted. 
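The distance metric and the re-exploration test described above can be sketched directly (whether a distance of exactly n/3 counts as 'within' is not specified, so this sketch treats it as too close):

```python
def distance(t1, t2):
    # Number of tone positions at which the two transcriptions differ.
    assert len(t1) == len(t2)
    return sum(1 for a, b in zip(t1, t2) if a != b)

def worth_exploring(candidate, previous_solutions):
    # A candidate is explored only if it lies further than n/3 from
    # every previously found solution.
    n = len(candidate)
    return all(distance(candidate, s) > n / 3 for s in previous_solutions)

previous = [["H", "L", "H", "L", "H", "L"]]
near = ["H", "L", "L", "L", "H", "L"]    # distance 1, within n/3 = 2
far = ["L", "H", "L", "H", "L", "H"]     # distance 6, well outside n/3
print(worth_exploring(near, previous), worth_exploring(far, previous))  # False True
```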
</Paragraph> </Section> <Section position="4" start_page="11" end_page="11" type="sub_section"> <SectionTitle> Areas for Further Improvement </SectionTitle> <Paragraph position="0"> It is rather unsatisfying that the performance of the two programs is heavily dependent on the setting of several search parameters, and it seems to be a combinatorial optimisation problem in itself to find good parameter settings. My trial-and-error approach will not necessarily have found optimal parameter values, and so it would be premature to conclude from the performance comparison that annealing search is better than genetic search for the problem of tone transcription. A more thoroughgoing comparison of these two approaches to the problem needs to be undertaken.</Paragraph> <Paragraph position="1"> Since the parameters are continuous variables, and since the evaluation function, which we could write as C_{T,X}(h, l, d), is a smoothly continuous function in h, l and d, it would be worthwhile to try other (deterministic) search methods for optimising h, l and d, once a candidate tone transcription T has been found.</Paragraph> <Paragraph position="2"> Finally, it would be interesting to integrate a system like either of the ones presented here into a speech workstation. As the phonologist identifies salient points with a cursor, the system would do the transcription, incrementally and interactively.</Paragraph> </Section> </Section> </Paper>