<?xml version="1.0" standalone="yes"?> <Paper uid="I05-2012"> <Title>Automatic Extraction of English-Korean Translations for Constituents of Technical Terms</Title> <Section position="5" start_page="70" end_page="71" type="evalu"> <SectionTitle> 4 Experiments </SectionTitle> <Paragraph position="0"> For the experiments we use three technical dictionaries, covering biology, chemistry, and physics, in which the Korean term constituents have been manually analyzed. The characteristics of the experimental data are summarized in Table 1 (Ministry, 2002).</Paragraph> <Paragraph position="1"> [Table 1: characteristics of the experimental data (number of bilingual term pairs)] We compare our model with IBM Model 2 (IBM-2) and IBM Model 4 (IBM-4), as implemented in GIZA++ (Och et al., 2003). We evaluate the results with the alignment error rate (AER) of Och and Ney (Och et al., 2003), which measures agreement at the level of pairs of term constituents.</Paragraph> <Paragraph position="2"> where A is the set of term constituent pairs aligned by the automatic system, and G is the set aligned in the gold standard.</Paragraph> <Section position="1" start_page="70" end_page="71" type="sub_section"> <SectionTitle> 4.1 Experimental results </SectionTitle> <Paragraph position="0"> Table 2 shows the evaluation results for IBM-2, IBM-4, and our proposed method. The precision and AER of our proposed method are higher than those of IBM-4, but its recall is lower than that of IBM-4.</Paragraph> <Paragraph position="1"> IBM-4 is strong at handling cross-alignment and null alignment, while our model is strong at handling n:1 alignment. This difference between our model and IBM-4 causes the performance gap. Because most alignments in the gold standard are 1:1 or 1:n alignments rather than cross-alignments, null alignments, or n:1 alignments, as described in Table 3, the performance gap between our method and IBM-4 is small. 
IBM-2 shows the worst performance because it cannot deal with 1:n alignment. In other words, IBM-2 does not use fertility as a parameter when estimating the translation probability. (While Och et al. (2003) differentiate sure and possible hand-annotated alignments, our gold standard comes in only one variety.) Note that 1:n alignment in the gold standard is about</Paragraph> <Paragraph position="2"> When we analyze the errors made by our method, we find that they are mainly caused by n:1 alignment and cross-alignment. To produce relevant alignment results for n:1 alignment, we need information indicating that more than one English term constituent is used as a single conceptual unit. Lacking this information, our model is limited in its ability to recover from errors caused by n:1 alignment. Using a domain-specific corpus is necessary as a way of relaxing the problem. Cross-alignment, which our model does not allow because of constraint 1, also causes errors. Because of cross-alignment, the performance of our method in chemistry and biology is lower than in physics, where the gold standard contains few cross-alignments.</Paragraph> </Section> </Section> </Paper>