<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1011">
  <Title>Base Noun Phrase Translation Using Web Data and the EM Algorithm</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
EM Algorithm
</SectionTitle>
    <Paragraph position="0"> We define a relation between E and C as CER x[?] , which represents the links in a translation dictionary. We further define</Paragraph>
    <Paragraph position="2"> efefef are independently generated according to the distribution defined as:</Paragraph>
    <Paragraph position="4"> We estimate the parameters of the distribution by using the Expectation and Maximization (EM) Algorithm (Dempster et al., 1977).</Paragraph>
    <Paragraph position="5"> Initially, we set for all Cc [?]</Paragraph>
    <Paragraph position="7"> Next, we estimate the parameters by iteratively updating them, until they converge (cf., Figure 3).</Paragraph>
    <Paragraph position="8"> Finally, we calculate )(cf  where 1[?]a is an additional parameter used to emphasize the prior information. If we ignore the first term in Equation (4), then the use of one EM-NBC turns out to select the candidate whose frequency vector is the closest to the transformed vector D in terms of KL divergence (cf., Cover and Tomas 1991).</Paragraph>
    <Paragraph position="9"> EM-NBC-Ensemble To further improve performance, we use an ensemble (i.e., a linear combination) of EM-NBCs (EM-NBC-Ensemble), while the classifiers are constructed on the basis of the data in different contexts with different window sizes. More specifically, we calculate</Paragraph>
    <Paragraph position="11"> D denotes the data in different contexts.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Translation Selection -- EM-TF-IDF
</SectionTitle>
      <Paragraph position="0"> We view the translation selection problem as that of calculating similarities between context vectors and use as context vectors TF-IDF vectors constructed with the EM Algorithm.</Paragraph>
      <Paragraph position="1"> Figure 4 describes the algorithm in which we use the same notations as those in EM-NBC-Ensemble.</Paragraph>
      <Paragraph position="2"> The idf valueofaChinesewordc is calculated in advance and as )/)(log()( Fcdfcidf [?]= (6) where )cdf( denotes the document frequency of</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Advantage of Using EM Algorithm
</SectionTitle>
      <Paragraph position="0"> The uses of EM-NBC-Ensemble and EM-TF-IDF can be viewed as extensions of existing methods for word or phrase translation using non-parallel corpora. Particularly, the use of the EM Algorithm can help to accurately transform a frequency vector from one language to another.</Paragraph>
      <Paragraph position="1"> Suppose that we are to determine if 'G5b5G1643G1bca G4b7' is a translation of 'information age' (actually it is). The frequency vectors of context words for 'information age' and 'G5b5G1643G1bcaG4b7' are given in A and D in Figure 5, respectively. If for each English word we only retain the link connecting to the Chinese translation with the largest frequency (a link represented as a solid line) to establish a many-to-one mapping and transform vector A from English to Chinese, we obtain vector B. It turns out, however, that vector B is quite different from vector D, although they should be similar to each other. We will refer to this method as 'Major Translation' hereafter.</Paragraph>
      <Paragraph position="2"> With EM, vector A in Figure 5 is transformed into vector C, which is much closer to vector D, as expected. Specifically, EM can split the frequency of a word in English and distribute them into its translations in Chinese in a theoretically sound way (cf., the distributed frequencies of 'internet'). Note that if we assume  relationship, then the use of EM turns out to be equivalent to that of Major Translation.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.5 Combination
</SectionTitle>
      <Paragraph position="0"> In order to further boost the performance of translation, we propose to also use the translation method proposed in Nagata et al. Specifically, we combine our method with that of Nagata et al by using a back-off strategy.</Paragraph>
      <Paragraph position="1"> Figure 6 illustrates the process of collecting Chinese translation candidates for an English Base NP 'information asymmetry' with Nagata et al's method.</Paragraph>
      <Paragraph position="2"> In the combination of the two methods, we first use Nagata et al's method to perform translation; if we cannot find translations, we next use our method. We will denote this strategy 'Back-off'.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="2" type="metho">
    <SectionTitle>
4. Experimental Results
</SectionTitle>
    <Paragraph position="0"> We conducted experiments on translation of the Base NPs from English to Chinese.</Paragraph>
    <Paragraph position="1"> We extracted Base NPs (noun-noun pairs) from  Five translation experts evaluated the translation results by judging whether or not they were acceptable. The evaluations reported below are all based on their judgements.</Paragraph>
    <Section position="1" start_page="0" end_page="2" type="sub_section">
      <SectionTitle>
4.1 Basic Experiment
</SectionTitle>
      <Paragraph position="0"> In the experiment, we randomly selected 1000 Base NPs from the 3000 Base NPs. We next used our method to perform translation on the 1000 phrases. In translation selection, we employed  Table 1 shows the results in terms of coverage and top n accuracy. Here, coverage is defined as the percentage of phrases which have translations selected, while top n accuracy is defined as the percentage of phrases whose selected top n translations include correct translations.</Paragraph>
      <Paragraph position="1"> For EM-NBC-Ensemble, we set the a !in (4) to be 5 on the basis of our preliminary experimental results. For EM-TF-IDF, we used the non-web datadescribedinSection4.4toestimateidf values of words. We used contexts with window sizes of +-1, +-3, +-5, +-7, +-9, +-11.</Paragraph>
      <Paragraph position="2">  The dictionary is created by the Harbin Institute of Technology.  1. Input 'information asymmetry'; 2. Search the English Base NP on web sites in Chinese and obtain documents as follows (i.e., using partial parallel  EM-NBC-Ensemble and EM-TF-IDF, in which for EM-NBC-Ensemble 'window size' denotes that of the largest within an ensemble. Table 1 summarizes the best results for each of them. 'Prior' and 'MT-TF-IDF' are actually baseline methods relying on the existing technologies. In Prior, we select candidates whose prior probabilities are the largest, equivalently, document frequencies obtained in translation candidate collection are the largest. In MT-TF-IDF, we use TF-IDF vectors transformed with Major Translation.</Paragraph>
      <Paragraph position="3"> Our experimental results indicate that both EM-NBC-Ensemble and EM-TF-IDF significantly outperform Prior and MT-TF-IDF, when appropriate window sizes are chosen. The p-values of the sign tests are 0.00056 and 0.00133 for EM-NBC-Ensemble, 0.00002 and 0.00901 for EM-TF-IDF, respectively.</Paragraph>
      <Paragraph position="4"> We next removed each of the key components of EM-NBC-Ensemble and used the remaining components as a variant of it to perform translation selection. The key components are (1) distance calculation by KL divergence (2) EM, (3) prior probability, and (4) ensemble. The variants, thus, respectively make use of (1) the baseline method 'Prior', (2) an ensemble of Naive Bayesian Classifiers based on Major Translation (MT-NBC-Ensemble), (3) an ensemble of EM-based KL divergence calculations (EM-KL-Ensemble), and (4) EM-NBC. Figure 7 and Table 1 show the results. We see that EM-NBC-Ensemble outperforms all of the variants, indicating that all the components within EM-NBC-Ensemble play positive roles.</Paragraph>
      <Paragraph position="5"> We removed each of the key components of EM-TF-IDF and used the remaining components as a variant of it to perform translation selection. The key components are (1) idf value and (2) EM. The variants, thus, respectively make use of (1) EM-based frequency vectors (EM-TF), (2) the baseline method MT-TF-IDF. Figure 7 and Table 1 show the results. We see that EM-TF-IDF outperforms both variants, indicating that all of the components within EM-TF-IDF are needed.</Paragraph>
      <Paragraph position="6"> Comparing the results between MT-NBC-Ensemble and EM-NBC-Ensemble and the results between MT-TF-IDF and EM-TF-IDF, we see that the uses of the EM Algorithm can indeed help to improve translation accuracies.</Paragraph>
      <Paragraph position="7">  Table 2 shows translations of five Base NPs as output by EM-NBC-Ensemble, in which the translations marked with * were judged incorrect by human experts. We analyzed the reasons for incorrect translations and found that the incorrect translations were due to: (1) no existence of dictionary entry (19%), (2) non-compositional translation (13%), (3) ranking error (68%).</Paragraph>
    </Section>
    <Section position="2" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.2 Our Method vs. Nagata et al.'s Method
</SectionTitle>
      <Paragraph position="0"> We next used Nagata et al's method to perform translation. From Table 3, we can see that the accuracy of Nagata et al's method is higher than that of our method, but the coverage of it is lower.</Paragraph>
      <Paragraph position="1"> The results indicate that our proposed Back-off strategy for translation is justifiable.</Paragraph>
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.3 Combination
</SectionTitle>
      <Paragraph position="0"> In the experiment, we tested the Back-off strategy, Table 4 shows the results. The Back-off strategy  helps to further improve the results whether EM-NBC-Ensemble or EM-TF-IDF is used.</Paragraph>
    </Section>
    <Section position="4" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.4 Web Data vs. Non-web Data
</SectionTitle>
      <Paragraph position="0"> To test the effectiveness of the use of web data, we conducted another experiment in which we performed translation by using non-web data.</Paragraph>
      <Paragraph position="1"> The data comprised of the Wall Street Journal corpus in English (1987-1992, 500MB) and the People's Daily corpus in Chinese (1982-1998, 700MB). We followed the Back-off strategy as in Section 4.3 to translate the 1000 Base NPs.</Paragraph>
      <Paragraph position="2">  The results in Table 5 show that the use of web data can yield better results than non-use of it, although the sizes of the non-web data we used were considerably large in practice. For Nagata et al's method, we found that it was almost impossible to find partial-parallel corpora in the non-web data.</Paragraph>
    </Section>
  </Section>
</Paper>