<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1041">
<Title>Machine Translation vs. Dictionary Term Translation - a Comparison for English-Japanese News Article Alignment</Title>
<Section position="9" start_page="265" end_page="266" type="evalu">
<SectionTitle> 7 Experiment </SectionTitle>
<Paragraph position="0"> In order to evaluate fractional recall and precision automatically, it was necessary to construct a representative set of Japanese articles with their correct English article alignments. We call this a judgement set. Although it is a significant effort to evaluate alignments by hand, this is possibly the only way to obtain an accurate assessment of alignment performance. Once alignment had taken place, we compared the threshold-filtered set of English-Japanese aligned articles with the judgement set to obtain recall-precision statistics.</Paragraph>
<Paragraph position="1"> The judgement set consisted of 100 Japanese queries with 454 relevant English documents. Some 24 Japanese queries had no corresponding English document at all. This large percentage of irrelevant queries can be thought of as 'distractors' and is a particular feature of this alignment task.</Paragraph>
<Paragraph position="2"> This set was then given to a bilingual checker who was asked to score each aligned article pair as one of: (1) the two articles are translations of each other; (2) the two articles are strongly contextually related; (3) no match. We removed type 3 correspondences, so that the judgement set contained only pairs of articles which at least shared the same context, i.e. referred to the same news event.</Paragraph>
<Paragraph position="3"> Following inspection of matching articles, we used the heuristic that the search space for each Japanese query was one day either side of its day of publication. On average this was 135 articles. This is small by the standards of conventional IR tasks, but given the large number of distractor queries, the requirement for high precision and the need to translate queries, the task is challenging.</Paragraph>
<Paragraph position="4"> We define recall and precision in the usual way as follows:

recall = (no. of relevant items retrieved) / (no. of relevant items in the collection)    (3)

precision = (no. of relevant items retrieved) / (no. of items retrieved)    (4)

Results for the model with MT and DTL are shown in Figure 3. We see that in the basic tf.idf model, machine translation provides significantly better article matching performance at medium and low levels of recall; at high recall levels DTL is better. Lexical transfer disambiguation appears to be important for high precision, but synonym choice is crucial for good recall.</Paragraph>
<Paragraph position="5"> Figure 3: Model 1: Recall and precision for English-Japanese article alignment. +: DTL, x: MT.</Paragraph>
<Paragraph position="6"> Overall, the MT method obtained an average precision of 0.72 in the 0.1 to 0.9 recall range, while DTL obtained an average precision of 0.67. This 5 percent overall improvement can partly be attributed to the fact that the Japanese news articles provided sufficient surrounding context for word sense disambiguation to be effective. It may also show that synonym selection is not so detrimental where a large number of other terms exist in the query. However, given these advantages, we still see that DTL performs almost as well as MT overall, and better at higher recall levels. In order to maximise recall, the synonym lists provided by DTL seem to be important.</Paragraph>
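To make the evaluation procedure concrete, the following short Python sketch (illustrative only, not the authors' code) shows how threshold-filtered alignments could be scored against a hand-built judgement set using equations (3) and (4). The identifiers (judgement_set, system_alignments, within_window, recall_precision), the article ids and the similarity scores are all invented; the one-day publication window described above is included as a helper.

from datetime import date

# Judgement set: each Japanese query id maps to the set of English article ids
# judged relevant (types 1 and 2 above; type 3 pairs removed). Queries mapping
# to an empty set are the "distractor" queries with no correct alignment.
judgement_set = {
    "jp001": {"en010", "en011"},
    "jp002": set(),              # distractor: no relevant English article
    "jp003": {"en031"},
}

# System output: (japanese_id, english_id, similarity) triples produced by the
# alignment model, here with invented scores.
system_alignments = [
    ("jp001", "en010", 0.83),
    ("jp001", "en099", 0.41),
    ("jp003", "en031", 0.77),
]

def within_window(query_date, doc_date, days=1):
    """Search-space heuristic: keep candidate English articles published
    within one day either side of the Japanese article's publication date."""
    return days >= abs((query_date - doc_date).days)

def recall_precision(alignments, judgements, threshold):
    """Recall = relevant retrieved / relevant in collection (eq. 3);
    precision = relevant retrieved / retrieved (eq. 4)."""
    retrieved = [(q, d) for q, d, score in alignments if score >= threshold]
    relevant_retrieved = sum(1 for q, d in retrieved if d in judgements.get(q, set()))
    relevant_total = sum(len(docs) for docs in judgements.values())
    recall = relevant_retrieved / relevant_total if relevant_total else 0.0
    precision = relevant_retrieved / len(retrieved) if retrieved else 0.0
    return recall, precision

if __name__ == "__main__":
    print(within_window(date(1997, 3, 5), date(1997, 3, 6)))  # True: inside the one-day window
    # Sweeping the threshold traces out a recall-precision curve as in Figure 3.
    for t in (0.3, 0.5, 0.8):
        r, p = recall_precision(system_alignments, judgement_set, t)
        print("threshold=%.1f  recall=%.2f  precision=%.2f" % (t, r, p))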
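A second sketch, with an invented mini-dictionary and invented Japanese terms (none taken from the paper), illustrates why the two query-construction strategies trade off as observed above: DTL places every dictionary translation, synonyms included, into the English query, which favours recall, while an MT-style single disambiguated translation per term keeps the query focused, which favours precision.

from collections import Counter

# Toy bilingual dictionary: each Japanese term maps to all of its English
# dictionary translations (synonym choice left unresolved).
bilingual_dict = {
    "kawa": ["river", "stream"],
    "kin'yuu": ["finance", "financial"],
    "ginkou": ["bank"],
}

def dtl_query(jp_terms):
    """Dictionary term lookup: take every candidate translation of every term."""
    return Counter(en for jp in jp_terms for en in bilingual_dict.get(jp, []))

def mt_query(jp_terms, chosen_sense):
    """MT stand-in: one disambiguated translation per term."""
    return Counter(chosen_sense[jp] for jp in jp_terms if jp in chosen_sense)

jp_terms = ["kawa", "kin'yuu", "ginkou"]
print(dtl_query(jp_terms))  # 5 query terms: more chances to match (recall)
print(mt_query(jp_terms, {"kawa": "river", "kin'yuu": "finance", "ginkou": "bank"}))  # 3 terms: focused query (precision)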
<Paragraph position="7"> Moreover, on inspection of the results we found that, for some weakly matching document-query pairs in the judgement set, a mistranslation of an important or rare term may significantly bias the matching score.</Paragraph>
</Section>
</Paper>