<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3248">
  <Title>A New Approach for English-Chinese Named Entity Alignment</Title>
  <Section position="6" start_page="11" end_page="11" type="evalu">
    <SectionTitle>
4 Experimental Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
4.1 Experimental Setup
</SectionTitle>
      <Paragraph position="0"> We perform experiments to investigate the performance of the above framework. We take the LDC Xinhua News with aligned English-Chinese sentence pairs as our corpus.</Paragraph>
      <Paragraph position="1"> The incremental testing strategy is to investigate the system's performance as more and more data are added into the data set. Initially, we take 300  http://www.isi.edu/~och/YASMET.html sentences as the standard testing set, and we repeatedly add 5k more sentences into the data set and process the new data. After iterative re-ranking, the performance of alignment models over the 300 sentence pairs is calculated. The learning curves are drawn from 5k through 30k sentences with the step as 5k every time.</Paragraph>
    </Section>
    <Section position="2" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
4.2 Baseline System
</SectionTitle>
      <Paragraph position="0"> A translated Chinese NE may appear at a different position from the corresponding English NE in the sentence. IBM Model 4 (Brown et al., 1993) integrates a distortion probability, which is complete enough to account for this tendency. The HMM model (Vogel et al., 1996) conducts word alignment with a strong tendency to preserve localization from one language to another.</Paragraph>
      <Paragraph position="1"> Therefore we extract NE alignments based on the results of these two models as our baseline systems. For the alignments of IBM Model 4 and HMM, we use the published software package, GIZA++  (Och and Ney, 2003) for processing.</Paragraph>
      <Paragraph position="2"> Some recent research has proposed to extract phrase translations based on the results from IBM Model (Koehn et al., 2003). We extract English-Chinese NE alignments based on the results from IBM Model 4 and HMM. The extraction strategy takes each of the continuous aligned segments as one possible candidate, and finally the one with the highest frequency in the whole corpus wins.</Paragraph>
      <Paragraph position="3">  strategy. &amp;quot;China&amp;quot; here is aligned to either &amp;quot;Zhong Guo &amp;quot; or &amp;quot;Zhong &amp;quot;. Finally the one with a higher frequency in the whole corpus, say, &amp;quot;Zhong Guo &amp;quot;, will be viewed as the final alignment for &amp;quot;China&amp;quot;.</Paragraph>
    </Section>
    <Section position="3" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
4.3 Results Analysis
</SectionTitle>
      <Paragraph position="0"> Our approach first uses NLPWIN to conduct NER. Suppose S' is the set of identified NE with NLPWIN. S is the alignment set we compute with our models based on S', and T is the set consisting of all the true alignments based on S'. We define the evaluation metrics of precision, recall, and F-score as follows:  2. Calculate all the feature scores to get the N-best list of the Chinese NE candidates; 3. Candidates with their values over a given threshold are considered to be correct and put into the re-ranking training set; 4. Retrain the parameters</Paragraph>
      <Paragraph position="2"> l converge, and take the current ranking as the final result. [China] hopes to further economic ... [EU].  Based on the testing strategies discussed in Section 4.1, we perform all the experiments on data without word segmentation and get the performance for NE alignment with IBM Model 4, the HMM model, and the maximum entropy model. Figure 4, 5, and 6 give the learning curves for precision, recall, and F-score, respectively, with these experiments.</Paragraph>
      <Paragraph position="3">  From these curves, we see that HMM generally works a little better than IBM Model 4, both for precision and for recall. NE alignment with the maximum entropy model greatly outperforms IBM Model 4 and HMM in precision, recall, and F-Score. Since with this framework, we first use NLPWIN to recognize NEs in English, we have NE identification error. The precision of NLPWIN on our task is about 77%. Taking this into account, we know our precision score has actually been reduced by this rate. In Figure 4, this causes the upper bound of precision to be 77%.</Paragraph>
      <Paragraph position="4">  Segmentation To justify that our approach of NE alignment without word segmentation really reduces the error propagations from word segmentation and thereafter NER, we also perform all the experiments upon the data set with word segmentation. The segmented data is directly taken from published LDC Xinhua News corpus.</Paragraph>
      <Paragraph position="5">  and F-score for the experiments with word segmentation and without word segmentation when the size of the data set is 30k sentences. For HMM and IBM Model 4, performance without word segmentation is always better than with word segmentation. For maximum entropy model, the scores without word segmentation are always 6 to 9 percent better than those with word segmentation. This owes to the reduction of error propagation from word segmentation and NER. For example, in the following sentence pair with word segmentation, the English NE &amp;quot;United States&amp;quot; can no longer be correctly aligned to &amp;quot;Mei Guo &amp;quot;. Since in the Chinese sentence, the incorrect segmentation takes &amp;quot;Fang Wen Mei Guo &amp;quot; as one unit. But if we conduct alignment without word segmentation,  Similar situations exist when HMM and IBM Model 4 are used for NE alignment. When compared with IBM Model 4 and HMM with word segmentation, our approach with word segmentation also has a much better performance than them. This demonstrates that in any case our approach outperforms IBM Model 4 and HMM significantly.</Paragraph>
      <Paragraph position="6">  Huang et al.'s (2003) approach investigated transliteration cost and translation cost, based on IBM Model 1, and NE tagging cost by an NE identifier. In our approach, we do not have an NE tagging cost. We use a different type of translation and transliteration score, and add a distortion score that is important to distinguish identical NEs in the same sentence.</Paragraph>
      <Paragraph position="7"> Experimental results prove that in our approach the selected features that characterize NE translations from English to Chinese help much for NE alignment. The co-occurrence score uses the knowledge from the whole corpus to help NE alignment. And the transliteration score addresses the problem of data sparseness. For example, English person name &amp;quot;Mostafizur Rahman&amp;quot; only appears once in the data set. But with the transliteration score, we get it aligned to the Chinese NE &amp;quot;Mu Si Ta Fei Zi La He Man &amp;quot; correctly. Since in ME training we use iterative bootstrapping to help supervised learning, the training data is not completely clean and brings some errors into the final results. But it avoids the acquisition of large annotated training set and the performance is still much better than traditional alignment models. The performance is also impaired by the English NER tool. Another possible reason for alignment errors is the inconsistency of NE translation in English and Chinese. For example, usually only the last name of foreigners is translated into Chinese and the first name is ignored. This brings some trouble for the alignment of person names.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>