<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1042"> <Title>An Experiment in Hybrid Dictionary and Statistical Sentence Alignment</Title> <Section position="7" start_page="271" end_page="272" type="evalu"> <SectionTitle> 5 Experiments </SectionTitle> <Paragraph position="0"> In this section we present the results of using different combinations of the three basic methods. We combined the basic methods into hybrid models simply by taking the product of the scores of the models given above. Although this is simplistic, we felt that in the first stage of our investigation it was better to give equal weight to each method.</Paragraph> <Paragraph position="1"> The seven methods we tested are coded as follows: DICE: sentence alignment using a bilingual dictionary and Dice's coefficient scores; LEN: sentence alignment using sentence length ratios; OFFSET: sentence alignment using offset probabilities.</Paragraph> <Paragraph position="2"> We performed sentence alignment on our test set of 380 English sentences and 453 Japanese sentences. The results are reported as recall and precision, which we define in the usual way as follows: recall = #correctly matched sentences retrieved / #matched sentences in the test collection (3); precision = #correctly matched sentences retrieved / #matched sentences retrieved (4). The results are shown in Table 1. We see that the baseline method using lexical matching with a bilingual lexicon, DICE, performs better than either of the two statistical methods LEN or OFFSET used separately. Offset probabilities in particular performed poorly, showing that we cannot expect the correctly matching sentence to appear consistently in the same highest-probability position.</Paragraph> <Paragraph position="4"> Considering the hybrid methods, we see that DICE+LEN provides a clearly better result, in both recall and precision, than either DICE or LEN used separately. 
On inspection we found that DICE by itself could not distinguish clearly between many candidate sentences. This occurred for two reasons. First, as a result of the limited domain in which news articles report, there was a strong lexical overlap between candidate sentences within a news article. Second, where the lexical overlap between the English sentence and the Japanese translation was poor, the DICE scores were low.</Paragraph> <Paragraph position="5"> The second reason can be attributed to the low coverage of the bilingual lexicon in the domain of the news articles. Had we set a minimum threshold on overlap frequency, we would have ruled out many correct matches that were in fact found.</Paragraph> <Paragraph position="6"> In both cases LEN provides a decisive clue and enables us to find the correct result more reliably. Furthermore, we found that LEN was particularly effective at identifying multi-sentence correspondences compared to DICE, possibly because some sentences are very short and provide only weak evidence for lexical matching, whereas when they are combined with their neighbours they provide significant evidence for the LEN model.</Paragraph> <Paragraph position="7"> Using all three methods together in DICE+LEN+OFFSET, however, seems less promising, and we believe that the offset probabilities do not form a reliable model. Possibly this is due to a lack of data in the training stage, when we calculated the mean and standard deviation, or the data set may not in fact be normally distributed, as indicated by Figure 7.</Paragraph> <Paragraph position="8"> Finally, we noticed that a consistent factor in the English and Japanese text pairs was that the first two lines of the English were always matched to the first line of the Japanese. This was because the English text separated the title and first line, whereas our sentence segmenter could not do this for the Japanese. 
This factor was consistent across all 50 article pairs in our test collection and may have led to a small deterioration in the results, so the figures we present are a lower bound on what we can expect when sentence segmentation is performed correctly.</Paragraph> </Section> </Paper>
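The scoring scheme described in this section (Dice's coefficient over bilingual-lexicon matches, a sentence-length score, and a hybrid formed as the equal-weight product of the two) can be sketched as below. The toy lexicon, the particular functional form of the length score, and all function names are illustrative assumptions, not the authors' implementation.

```python
def dice_score(src_tokens, tgt_tokens, lexicon):
    """Dice's coefficient over bilingual-lexicon matches:
    2 * |matches| / (|src| + |tgt|)."""
    matches = sum(1 for w in src_tokens
                  if any(t in tgt_tokens for t in lexicon.get(w, [])))
    total = len(src_tokens) + len(tgt_tokens)
    return 2.0 * matches / total if total else 0.0

def length_score(src_len, tgt_len, expected_ratio=1.0):
    """Toy length-ratio score: the closer the target/source length
    ratio is to the expected ratio, the higher the score."""
    ratio = tgt_len / src_len if src_len else 0.0
    return 1.0 / (1.0 + abs(ratio - expected_ratio))

def hybrid_score(src_tokens, tgt_tokens, lexicon):
    """DICE+LEN hybrid: product of the individual scores,
    giving each method equal weight as in the paper."""
    return (dice_score(src_tokens, tgt_tokens, lexicon)
            * length_score(len(src_tokens), len(tgt_tokens)))
```

A candidate Japanese sentence (or concatenation of neighbouring sentences, for multi-sentence correspondences) would be ranked against each English sentence by `hybrid_score`, with the highest-scoring candidate taken as the alignment.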