<?xml version="1.0" standalone="yes"?> <Paper uid="J00-1004"> <Title>Learning Dependency Translation Models as Collections of Finite-State Head Transducers Hiyan Alshawi*</Title> <Section position="6" start_page="56" end_page="57" type="evalu"> <SectionTitle> 5. Experiments </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="56" end_page="56" type="sub_section"> <SectionTitle> 5.1 Evaluation Method </SectionTitle> <Paragraph position="0"> In order to reduce the time required to carry out training evaluation experiments, we have chosen two simple, string-based evaluation metrics that can be calculated automatically. These metrics, simple accuracy and translation accuracy, are used to compare the target string produced by the system against a reference human translation from held-out data.</Paragraph> <Paragraph position="1"> Simple accuracy is computed by first finding a transformation of one string into another that minimizes the total weight of insertions, deletions, and substitutions. (We use the same weights for these operations as in the NIST ASR evaluation software \[National Institute of Standards and Technology 1997\].) Translation accuracy includes transpositions (i.e., movement) of words as well as insertions, deletions, and substitutions. We regard the latter metric as more appropriate for evaluation of translation systems because the simple metric would count a transposition as two errors: an insertion plus a deletion. (This issue does not arise for speech recognizers because these systems do not normally make transposition errors.) For the lowest edit-distance transformation between the reference translation and system output, if we write I for the number of insertions, D for deletions, S for substitutions, and R for number of words in the reference translation string, we can express simple accuracy as simple accuracy = 1 - (I + D + S)/R.</Paragraph> <Paragraph position="2"> Similarly, if T is the number of transpositions in the lowest weight transformation including transpositions, we can express translation accuracy as translation accuracy = 1 - (I ~ + D ~ + S + T)/R.</Paragraph> <Paragraph position="3"> Since a transposition corresponds to an insertion and a deletion, the values of I ~ and D ~ for translation accuracy will, in general, be different from I and D in the computation of simple accuracy. For Spanish, the units for string operations in the evaluation metrics are words, whereas for Japanese they are Japanese characters.</Paragraph> </Section> <Section position="2" start_page="56" end_page="56" type="sub_section"> <SectionTitle> 5.2 English-to-Spanish </SectionTitle> <Paragraph position="0"> The training and test data for the English-to-Spanish experiments were taken from a set of transcribed utterances from the Air Travel Information System (ATIS) corpus together with a translation of each utterance to Spanish. An utterance is typically a single sentence but is sometimes more than one sentence spoken in sequence. Alignment search and transduction training was carried out only on bitexts with sentences up to length 20, a total of 13,966 training bitexts. The test set consisted of 1,185 held-out bitexts at all lengths. Table 1 shows the word accuracy percentages (see Section 5.1) for the trained model, e2s, against the original held-out translations at various source sentence lengths. 
<Section position="3" start_page="56" end_page="57" type="sub_section">
<SectionTitle> 5.3 English-to-Japanese </SectionTitle>
<Paragraph position="0"> The training and test data for the English-to-Japanese experiments was a set of transcribed utterances of telephone service customers talking to AT&T operators. These utterances, collected from real customer-operator interactions, tend to include fragmented language, restarts, etc. Both the training and test partitions were restricted to bitexts with at most 20 English words, giving 12,226 training bitexts and 3,253 held-out test bitexts. In the Japanese text, we introduce "word" boundaries that are convenient for the training process. These word boundaries are parasitic on the word boundaries in the English transcriptions: the translators are asked to insert such a word boundary between any two Japanese characters that are taken to have arisen from the translation of distinct English words. This results in bitexts in which the number of multicharacter Japanese "words" is at most the number of English words. However, as noted above, evaluation of the Japanese output is done with Japanese characters, i.e., with the Japanese text in its natural format. Table 2 shows the Japanese character accuracy percentages for the trained English-to-Japanese model, e2j, and a baseline model, jww, which gives each English word its most highly correlated translation.</Paragraph>
</Section>
<Section position="4" start_page="57" end_page="57" type="sub_section">
<SectionTitle> 5.4 Note on Experimental Setting </SectionTitle>
<Paragraph position="0"> The vocabularies in these English-Spanish and English-Japanese experiments contain only a few thousand words; the utterances are fairly short (an average of 7.3 words per utterance) and often contain errors typical of spoken language. So while the domains may be representative of task-oriented dialogue settings, further experimentation would be needed to assess the effectiveness of our method in situations such as translating newspaper articles. In terms of the training data required, Tsukada et al. (1999) provide indirect empirical evidence suggesting that accuracy can be further improved by increasing the size of our training sets, though also suggesting that the learning curve is relatively shallow beyond the current corpus size.</Paragraph>
</Section>
</Section>
</Paper>