<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3248">
  <Title>A New Approach for English-Chinese Named Entity Alignment</Title>
  <Section position="3" start_page="1" end_page="1" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Translation knowledge can be acquired via word and phrase alignment. Considerable research has been conducted in machine translation and knowledge acquisition, including both statistical approaches (Cherry and Lin, 2003; Probst and Brown, 2002; Wang et al., 2002; Och and Ney, 2000; Melamed, 2000; Vogel et al., 1996) and symbolic approaches (Huang and Choi, 2000; Ker and Chang, 1997).</Paragraph>
    <Paragraph position="1"> However, these approaches do not work well on the task of NE alignment. Traditional approaches following the IBM Models (Brown et al., 1993) cannot produce satisfactory results because of their inherent inability to handle many-to-many alignments. They align only individual words and do not consider complex phrases such as multi-word NEs. Moreover, the IBM Models allow at most one word in the source language to correspond to a word in the target language (Koehn et al., 2003; Marcu, 2001). Therefore, they cannot handle many-to-many word alignments within NEs well.</Paragraph>
    <Paragraph position="2"> Another well-known word alignment approach, HMM (Vogel et al., 1996), makes the alignment probabilities depend on the alignment position of the previous word. It does not explicitly consider many-to-many alignment either.</Paragraph>
    <Paragraph position="3"> Huang et al. (2003) proposed extracting Named Entity translingual equivalences by minimizing a linearly combined multi-feature cost. However, their method requires Named Entity Recognition on both the source side and the target side.</Paragraph>
    <Paragraph position="4"> Moore's (2003) approach is based on a sequence of cost models. However, this approach relies heavily on linguistic information, such as strings repeated on both sides and clues from capitalization, which are not suitable for language pairs that do not belong to the same family. In addition, complete lexical compounds are already identified on the target side, and these account for a large part of the final results.</Paragraph>
    <Paragraph position="5"> During the alignment, Moore does not hypothesize that translations of phrases would require splitting predetermined lexical compounds on the target set.</Paragraph>
    <Paragraph position="6"> These methods are not suitable for our task, since we have NEs identified only on the source side and no extra knowledge from the target side. Considering the inherent characteristics of NE translation, we identify several features that can help NE alignment, and we use a maximum entropy model to integrate these features and carry out NE alignment.</Paragraph>
  </Section>
</Paper>