<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1067">
  <Title>Automatic Identification of Word Translations from Unrelated English and German Corpora</Title>
  <Section position="7" start_page="523" end_page="524" type="evalu">
    <SectionTitle>
4 Results and Evaluation
</SectionTitle>
    <Paragraph position="0"> Table 1 shows the results for 20 of the 100 German test words. For each of these test words, the top five automatically generated translations are listed. In addition, for each word its expected English translation from the test set is given, together with its position in the ranked list of computed translations. The positions in the ranked lists are a measure of the quality of the predictions: a 1 means that the prediction is correct, and a high value means that the program was far from predicting the correct word.</Paragraph>
    <Paragraph position="1"> If we look at the table, we see that in many cases the program predicts the expected word, with other possible translations immediately following. For example, for the German word Häuschen, the correct translations bungalow, cottage, house, and hut are listed. In other cases, typical associates follow the correct translation. For example, the correct translation of Mädchen, girl, is followed by boy, man, brother, and lady. This behavior can be expected from our associationist approach. Unfortunately, in some cases the correct translation and one of its strong associates are mixed up, as for example with Frau, where its correct translation, woman, is listed only second, after its strong associate man. Another example of this typical kind of error is pfeifen, where the correct translation whistle is listed third, after linesman and referee. Let us now look at some cases where the program did particularly badly. For Kohl we had expected its dictionary translation cabbage, but, given that a substantial part of our newspaper corpora consists of political texts, we do not need to further explain why our program lists Major, Kohl, Thatcher, Gorbachev, and Bush, state leaders who were in office during the period in which the texts were written. In other cases, such as Krankheit and Whisky, the simulation program simply preferred the British usage of the Guardian over the American usage in our test set: instead of sickness, the program predicted disease and illness, and instead of whiskey it predicted whisky.</Paragraph>
    <Paragraph position="2"> A much more severe problem is that our current approach cannot properly handle ambiguities: for the German word weiß it does not predict white but instead know. The reason is that weiß can also be the third-person singular of the German verb wissen (to know), which in newspaper texts is more frequent than the color white. Since our lemmatizer is not context-sensitive, this word was left unlemmatized, which explains the result.</Paragraph>
    <Paragraph position="3"> To be able to compare our results with other work, we also carried out a quantitative evaluation. For each test word we checked whether the predicted translation (the first word in the ranked list) was identical to our expected translation. This was true for 65 of the 100 test words. However, in some cases the choice of the expected translation in the test set had been somewhat arbitrary. For example, for the German word Straße we had expected street, but the system predicted road, which is an equally good translation.</Paragraph>
    <Paragraph position="4"> Therefore, as a better measure of the accuracy of our system, we counted the number of times an acceptable translation of the source word was ranked first. This was true for 72 of the 100 test words, giving an accuracy of 72%. In another test, we checked whether an acceptable translation appeared among the top 10 of the ranked lists. This was true in 89 cases.5 For comparison, Fung &amp; McKeown (1997) report an accuracy of about 30% when only the top candidate is counted. However, it must be emphasized that their result was achieved under very different circumstances. On the one hand, their task was more difficult because they worked on a pair of unrelated languages (English/Japanese) using smaller corpora and a random selection of test words, many of which were multi-word terms. Also, they predetermined a single translation as being correct. On the other hand, when conducting their evaluation, Fung &amp; McKeown limited the vocabulary they considered as translation candidates to a few hundred terms, which obviously facilitates the task.</Paragraph>
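The top-1 and top-10 accuracies described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the word lists and acceptability judgments below are hypothetical stand-ins for the paper's 100-word test set.

```python
# Sketch of the evaluation in Section 4: a prediction for a source word counts
# as a hit if any acceptable translation appears among its top-k candidates.
# The data here is hypothetical, modeled loosely on examples from the text.

def top_k_accuracy(ranked, acceptable, k):
    """Fraction of source words whose top-k candidates contain an acceptable translation."""
    hits = sum(
        1
        for word, candidates in ranked.items()
        if any(c in acceptable[word] for c in candidates[:k])
    )
    return hits / len(ranked)

# Illustrative ranked candidate lists (system output) and gold judgments.
ranked = {
    "Frau": ["man", "woman", "wife"],
    "pfeifen": ["linesman", "referee", "whistle"],
}
acceptable = {"Frau": {"woman"}, "pfeifen": {"whistle"}}

print(top_k_accuracy(ranked, acceptable, 1))  # 0.0: neither top candidate is acceptable
print(top_k_accuracy(ranked, acceptable, 3))  # 1.0: both acceptable words are in the top 3
```

With the paper's real data this counting yields 72/100 for k=1 and 89/100 for k=10.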
    <Paragraph position="5"> 5 We did not check for the completeness of the translations found (recall), since this measure depends very much on the size of the dictionary used as the standard.</Paragraph>
    <Paragraph position="6"> [Table 1, flattened by extraction: for each of the 20 German test words, the expected English translation, its rank in the computed list, and the top five automatically generated translations.] </Paragraph>
  </Section>
</Paper>