<?xml version="1.0" standalone="yes"?> <Paper uid="J91-3001"> <Title>Algorithm for High Accuracy Name Pronunciation</Title> <Section position="5" start_page="271" end_page="274" type="concl"> <SectionTitle> 6. Testing and Evaluation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="271" end_page="271" type="sub_section"> <SectionTitle> 6.1 Performance </SectionTitle> <Paragraph position="0"> The performance goal of the software developed around this algorithm was real-time processing. We benchmarked the performance on a Digital Equipment Corporation VAX 8800 running VMS V5.1. A total of 34,337 names were processed in 25 minutes and 27 seconds, or equivalently 22.65 names per second. After some code optimization and streamlining of the filter rules, we later ran similar tests using the same databases on a 33 MHz PC running MS-DOS V5.0. While these tests were run on the identification portion only, we were able to process several thousand names per second.</Paragraph> <Paragraph position="1"> Large commercial applications will have similar compute power, and thus real-time processing is not a problem. Moreover, many applications do not require real-time processing, since the processed name and address can simply be stored in a separate field in the database. The routines can thus be used to create a database of phonemicized names by preprocessing the name, storing the phonemic equivalent of the name in some field, and sending that field to the synthesizer at some later time.</Paragraph> </Section> <Section position="2" start_page="271" end_page="274" type="sub_section"> <SectionTitle> 6.2 Pronunciation Accuracy </SectionTitle> <Paragraph position="0"> A number of different tests were conducted for accuracy of pronunciation. 
Accuracy here was measured in terms of the level of segmental and suprasegmental (i.e., stress placement) output determined by a linguist to be reasonable behavior. A more elaborate (and possibly more practical) criterion for accuracy might include the transcription (by a linguist) of a number of pronunciation tokens provided by nonlinguists.</Paragraph> <Paragraph position="1"> Our reasoning was that the output should minimally model human behavior. However, because the algorithm contains more linguistic information than is known by the average person, the software has the potential to be more accurate than a person (i.e., make fewer gross pronunciation errors). Testing of human vs. computer pronunciation of names from a test database is being conducted independently at the present time within the artificial intelligence community (Golding and Rosenbloom 1991) as well as within the telephone industry.</Paragraph> <Paragraph position="2"> One of the problems we faced is a definition of what constitutes correctness. Very often, more than one pronunciation is acceptable and many readers of this paper have had their own names pronounced differently by other individuals. Even professional linguists faced with names such as MOUDRY, FUCHS, SOUTO, D'ANGELO, BADKE, DUJMUCH, SMYTHE, and others cannot say definitively whether one pronunciation is correct or not (Hochberg et al. 1990). For the purposes of our evaluation, in cases like these, a pronunciation was accepted if linguists felt that the segmental phonemic output and stress placement were reasonable. Again, another and possibly more realistic approach might be to phonemicize a set of names from the pronunciation of a group of individuals who are not owners of the name. These pronunciations could then be phonemicized by a linguist and correctness could then be evaluated by a simple matching of the majority pronunciation. 
In any case, both [fyuks] and [fu6] were considered correct for FUCHS but [ffi~iz] and [f^ks] were not; [smaYO] and [smIO] for SMYTHE but not [smIOi]; [diY~njelo] and [degnjelo] for D'ANGELO but not [daenj61o], and so on. Similarly, because of homographic variation (Section 5.4.1) and the attempt to make errors replicate what humans might say, we would accept certain pronunciations for names that we knew came from two very different sources as long as one was reasonable. For example, [p6s] would be an acceptable pronunciation for PACE even if the first name were Antonio. In fact, often one cannot say definitively that one pronunciation or the other is the one used without asking the person who owns the name. Again, with the loading strategy factoring in first name (above), one might increase the probability of a reasonable pronunciation.</Paragraph> <Paragraph position="3"> Testing was done with several databases that were not used to compile the trigrams. Some degradation was expected when using a new (test) database. However, as shown in Figure 1, after testing, a new database could be merged with the reference database, and new and more complete trigram statistics calculated.</Paragraph> <Paragraph position="4"> Table 3 shows the error rate with and without a dictionary over different subsets of a corpus. The dictionary covered all functors and the 2000 most common surnames.</Paragraph> <Paragraph position="5"> The complex polysyllabic test (see fn. 3) is simply a benchmark for the generic letter-to-sound rules without use of a dictionary. 
The last line of the table suggests the improvement possible in name pronunciation (in this case, Japanese names were used).</Paragraph> <Paragraph position="6"> Note the degradation in performance (without the name pronunciation software) from common names to Japanese names.</Paragraph> <Paragraph position="7"> In a second test, we had a subject randomly choose two sets of 100 names from our reference database and two further sets of 100 names from each of two telephone books. One telephone book was hardcopy from a large region in the East and the second was an on-line directory from a large region in the Midwest. In the case of the hardcopy listings, the data were put on line to be analyzed. The softcopy was edited to remove unwanted material (see footnote 19). We included the softcopy database to minimize any bias, conscious or otherwise, that the subject may have had; these names were chosen with a simple program that pulled out the required number of names from the name field. Although trigrams tend to be repeated over a database (above), we nevertheless expected some degradation in going to new test lists, as the data in Table 4 illustrate. 
The error rate was calculated with and without the identification algorithm on four databases of 100 names each using no dictionary lookup.</Paragraph> <Paragraph position="8"> Because of the high functional load of dictionary entries (see Section 2), scores were expected to be considerably higher when the dictionary lookup module was included.</Paragraph> <Paragraph position="9"> We tested this hypothesis and found that when the dictionary was included in the softcopy test-database analysis (above), the error rate was reduced from 12% to 7%.</Paragraph> <Paragraph position="10"> Other tests also indicated that the use of a dictionary cuts the error rate approximately in half.</Paragraph> <Paragraph position="11"> Because many applications written around this software will require accurate pronunciation of first names and street names as well as last names, we decided to examine the accuracy for each of these categories as well. We anticipated that the accuracy rate for first names (using a dictionary) would be slightly higher than that of last names and that the accuracy rate for street names would be slightly lower. 
This is because of the higher frequency of occurrence of first names (above) as well as the fact that the pronunciation of street names tends to be extremely variable and, like place names, has been observed to vary between local and non-local population groups.</Paragraph> <Paragraph position="12"> Table 5 indicates the error rates of the first name and street name tests compared with the last name tests mentioned above, run over the test database with and without a dictionary.</Paragraph> <Paragraph position="13"> 19 Telephone listings typically contain a variety of information including, inter alia, the telephone number and street number, demarcation of upper case (e.g., MC*ADOO), special symbols for unlisted numbers, and so on.</Paragraph> <Paragraph position="14"> Vitale Algorithm for High Accuracy Name Pronunciation Note that both street names and first names have much lower accuracy than last names without the use of a dictionary. First names, like functors and irregular verb forms, exhibit unusual behavior in terms of the canonical segmental phonology of the language, e.g., THOMAS, where the first segment is /t/ rather than /θ/, MICHAEL, where orthographic CH is /k/ and not /č/, and so on. In the case of street names, many are the same as place names (OTTAWA BLVD), first names (JOYCE ST.), or last names (EISENHOWER AVE). In any event, note that the use of the dictionary with these name fields is crucial to the success of the algorithm, much more so than in the case of surnames. In fact, the non-Anglo-Saxon surnames (LUELLA, LEONARDO, etc.) are handled quite adequately without use of a dictionary lookup. 
In the case of first names, the error rate is extremely low since the vast majority of these would be found in the dictionary.</Paragraph> <Paragraph position="15"> Naturally, the final and most crucial test of accuracy is the overall intelligibility of the name, that is, whether an individual on the receiving end of a telephone line (with its reduced bandwidth) can hear, repeat, and correctly transcribe (in normal orthography) a person's name and address. These tests and others remain for future research. We set out simply to attempt to improve pronunciation accuracy of proper names by creating a more intelligent front-end processor and a more complex letter-to-sound rule set that would take into account the variability of the text to be processed. Tests indicate that an algorithm can be successfully implemented to significantly increase accuracy of name pronunciation. This helps make possible applications in which proper names are output intelligibly using a speech synthesizer, as well as text-processing functions such as the construction of a name dictionary for automatic speech recognition. The algorithm has, in fact, been implemented for speech synthesis and is currently being used in a commercially available product within the telecommunications industry.</Paragraph> </Section> </Section> </Paper>