File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/02/w02-0902_evalu.xml
Size: 3,454 bytes
Last Modified: 2025-10-06 13:58:53
<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0902"> <Title>Learning a Translation Lexicon from Monolingual Corpora</Title> <Section position="5" start_page="0" end_page="0" type="evalu"> <SectionTitle> 3.3 Results </SectionTitle> <Paragraph position="0"> The results are summarized in Table 5. Recall that for each word we are trying to map to the other language, a thousand possible target words exist, but only one is correct. The baseline for this task, choosing words at random, yields on average only 1 correct mapping in the entire lexicon (1000 words, each with a 1-in-1000 chance of being mapped correctly). A perfect lexicon, of course, contains 1000 correct entries.</Paragraph> <Paragraph position="1"> The starting point for the corpus score is the 15.8% already achieved with the seed lexicon from Section 2.1. In an experiment where we identified the best lexical entries using a very large parallel corpus, we were able to achieve 89% accuracy on this test corpus.</Paragraph> <Paragraph position="2"> many correct lexicon entries were added (Entries), and how well the resulting translation lexicon performs compared to the actual word-level translations in a parallel corpus (Corpus). For all experiments, the starting point was the seed lexicon of 1339 identically spelled words described in Section 2.1, which achieves a 15.8% Corpus score.</Paragraph> <Paragraph position="3"> Taken alone, both the context and the spelling clues correctly learn over a hundred lexicon entries. 
The similarity and frequency clues, however, appear too imprecise to narrow the search down to the correct translations.</Paragraph> <Paragraph position="4"> A closer look at the spelling and context scores reveals that while the spelling clue learns more correct lexicon entries (140 as opposed to 107), the context clue does better on the more frequently used lexicon entries, as measured on the test corpus (accuracy of 31.9% as opposed to 25.4%).</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Combining Clues </SectionTitle> <Paragraph position="0"> Combining different clues is quite simple: we add up the matching scores, which can be weighted. Initially we weighted all clues equally. We then varied the weights to see whether we could obtain better results.</Paragraph> <Paragraph position="1"> We found that there is generally a broad range of weights that result in similar performance.</Paragraph> <Paragraph position="2"> When using the spelling clue in combination with others, we found it useful to define a cutoff.</Paragraph> <Paragraph position="3"> If two words agree in 30% of their letters, this is generally as bad as if they do not agree in any; such agreements are purely coincidental.</Paragraph> <Paragraph position="4"> Therefore, we counted all spelling scores below 0.3 as 0.3.</Paragraph> <Paragraph position="5"> Combining the context and the spelling clues yields a significantly better result than using either clue by itself. A total of 185 correct lexical entries are learned, with a corpus score of 38.6%. Adding in the other scores, however, does not seem to be beneficial: only adding the frequency clue to the spelling clue provides some improvement. 
In all other cases, these scores are not helpful.</Paragraph> <Paragraph position="6"> Besides this linear combination of scores from the different clues, more sophisticated combination methods may be possible [Koehn, 2002].</Paragraph> </Section> </Section> </Paper>
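The clue combination described in Section 3.4 can be sketched in Python as follows. It is a weighted sum of per-clue matching scores, with spelling scores below 0.3 counted as 0.3. This is a minimal illustration under stated assumptions, not the authors' implementation. The clue names, the toy scores, and the `combined_score` and `best_translation` helpers are hypothetical. All scores are assumed to be oriented so that higher means a better match, and the spelling score is assumed to lie in [0, 1], as the 30% threshold suggests. Picking the single highest-scoring candidate per source word is also an assumed selection rule, since this section does not say how combined scores become lexicon entries.

```python
from typing import Mapping

# Spelling scores below this value are counted as this value (Section 3.4).
SPELLING_FLOOR = 0.3


def combined_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of per-clue matching scores for one (source, target) pair.

    Assumes every clue named in `weights` is present in `scores` (a missing
    clue raises KeyError) and that higher scores mean a better match.
    """
    total = 0.0
    for clue, weight in weights.items():
        score = scores[clue]
        if clue == "spelling":
            # Agreement in at most 30% of letters is treated as coincidental,
            # so all such spelling scores get the same floor value.
            score = max(score, SPELLING_FLOOR)
        total += weight * score
    return total


def best_translation(candidates: Mapping[str, Mapping[str, float]],
                     weights: Mapping[str, float]) -> str:
    """Return the candidate target word with the highest combined score.

    Hypothetical selection rule: a simple per-word argmax over the candidates.
    """
    return max(candidates, key=lambda target: combined_score(candidates[target], weights))


if __name__ == "__main__":
    # Equal weights, as in the initial setting of Section 3.4; the clue scores
    # for this hypothetical source word are toy numbers, not data from the paper.
    weights = {"context": 1.0, "spelling": 1.0}
    candidates = {
        "house": {"context": 0.7, "spelling": 0.5},
        "hose": {"context": 0.2, "spelling": 0.8},
        "tree": {"context": 0.5, "spelling": 0.0},  # spelling counted as 0.3
    }
    print(best_translation(candidates, weights))  # prints house
```

Because of the floor, candidates with 10% and 25% letter agreement receive the same spelling contribution, so among such spelling-dissimilar candidates the other clues alone decide the ranking.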