File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/00/c00-1064_evalu.xml

Size: 3,296 bytes

Last Modified: 2025-10-06 13:58:33

<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1064">
  <Title>Structural Feature Selection For English-Korean Statistical Machine Translation</Title>
  <Section position="7" start_page="442" end_page="443" type="evalu">
    <SectionTitle>
6 Experimental results
</SectionTitle>
    <Paragraph position="0"> The total samples consist of 3,000 aligned English-Korean sentence pairs, which were extracted from news on the web site of 'Korea Times' and a magazine for English learning.</Paragraph>
    <Paragraph position="1"> In the initial step, we manually constructed the correspondences of tag sequences with 700 POS-tagged sentence pairs. In the supervision step, we extracted l.,d83 correct tag sequence correspondences, as shown in Table 2, and they work as active features. As a feature pool, 3,172 disjoint features of tag sequence mappings were retrieved. It is very important to make atomic features.</Paragraph>
    <Paragraph position="2"> We maximized λ of the active features with respect to the total samples using the improved iterative scaling algorithm. Figure 3 shows λ_i of each feature f(Q31,:P+.m,ttO C A. There are many correspondence patterns with respect to the English tag string, 'BEP+JJ'.</Paragraph>
    <Paragraph position="3"> Note that p(tt~lQ) is computed by the exponential model of (4), and the conditional probability is the same as the empirical probability in (7). Since the value of p(y|x) shows the maximum likelihood, this confirms that each λ converged correctly.</Paragraph>
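    <Paragraph> The empirical conditional probability referred to in (7) is a ratio of counts, which can be sketched as follows. This is a minimal illustration only: the function name empirical_conditional, the toy sample, and the Korean-side labels K_A, K_B and K_C are hypothetical placeholders, not data or tag names from the experiment.

```python
from collections import Counter


def empirical_conditional(pairs):
    """Return {(x, y): count(x, y) / count(x)} for the observed (x, y) pairs."""
    pairs = list(pairs)  # allow any iterable, since we iterate twice
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    return {(x, y): c / x_counts[x] for (x, y), c in pair_counts.items()}


# Hypothetical toy sample of (English tag string, Korean tag string) pairs.
# K_A, K_B and K_C are placeholder labels, not real Korean tags.
sample = [
    ("DT+NN", "K_A"),
    ("DT+NN", "K_A"),
    ("DT+NN", "K_B"),
    ("BEP+JJ", "K_C"),
]

probs = empirical_conditional(sample)
for (x, y), p in sorted(probs.items()):
    print(f"p({y} | {x}) = {p:.3f}")
```

    For each English tag string x, the estimates over all observed y sum to one, which is the property a correctly converged conditional model is compared against.</Paragraph>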
    <Paragraph position="4"> p(y|x) = (# of times (x, y) occurs in sample) / (number of times of x)   (7) In the feature selection step, we chose useful features with a gain threshold of 0.008. Figure 4 shows some features with a large gain. Among them, tag sequence mappings including 'RB' are erroneous. This means that the position of an adverb in Korean is very complicated to handle. Also, proper nouns in English were aligned with common nouns in Korean because of tagging errors. Note that in the case of 'PN+PPCA2+PPAD+VBMA', it is not an adjacent string but an interrupted string. This means that a verb in English generally maps to a verb taking as arguments the accusative and adverbial postpositions in Korean.</Paragraph>
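    <Paragraph> The thresholding part of the feature selection step can be sketched as a simple filter. This is only an illustration under stated assumptions: the gains are assumed to be already computed per candidate feature, the comparison is assumed to be inclusive (gain at least 0.008), and the feature names and gain values are hypothetical, not results from Figure 4.

```python
GAIN_THRESHOLD = 0.008  # threshold reported in the text


def select_features(gains, threshold=GAIN_THRESHOLD):
    """Keep features whose gain is at least the threshold, largest gain first.

    Whether the original comparison was inclusive is an assumption here.
    """
    kept = [(name, gain) for name, gain in gains.items() if gain >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical candidate features with made-up gain values.
candidate_gains = {
    "feat_a": 0.0312,
    "feat_b": 0.0079,
    "feat_c": 0.0080,
    "feat_d": 0.0004,
}

selected = select_features(candidate_gains)
for name, gain in selected:
    print(name, gain)
```

    Sorting by gain mirrors the way the features with a large gain are the ones singled out for inspection.</Paragraph>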
    <Paragraph position="5"> One way of testing the usefulness of our method is to construct structurally aligned bilingual sentences. Table 3 shows lexical alignments using tag sequence alignments drawn from our algorithm for a given sentence, 'you usually have to take regular seating - dangsineun dachcw ilbanscokc anjayaman handa', and Figure 5 shows the best lexical alignment of the sentence.</Paragraph>
    <Paragraph position="6"> We conducted the experiment on 100 sentences of length 14 words or less and simply chose the most likely paths. As a result, the accuracy was about 71.1%. This shows that we can partly use the tag sequence alignments for lexical alignments. We will extend the structural mapping model to take lexical information into account. The parameters, that is, the conditional probabilities of structural mappings, will be embedded in a statistical model. Table 4 shows conditional probabilities of some features according to 'DT+NN'. In general, a determiner is translated into NULL or an adnominal word in Korean.</Paragraph>
  </Section>
</Paper>