<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2401">
  <Title>Named Entities Translation Based on Comparable Corpora</Title>
  <Section position="3" start_page="1" end_page="1" type="intro">
    <SectionTitle>
2 Related Works
</SectionTitle>
    <Paragraph position="0"> Despite the difficulty of obtaining bilingual parallel corpora, most research on NE translation has been carried out on parallel data sets. Moreover, these bilingual corpora are usually aligned at the paragraph or even the phrase level. For example, Moore's work (Moore, 2003) uses an aligned English-French parallel corpus and, by applying several statistical techniques, obtains a French form for each English entity.</Paragraph>
    <Paragraph position="1"> Although comparable corpora have been explored less, some known systems are designed to work with them as well. Most of these systems deal with language pairs that use different writing systems. Examples include the Chinese-English translation tool presented at ACL 2003 (Chen et al., 2003) and the system for translating entity names from Arabic into English published at ACL 2002 (Al-Onaizan et al., 2002a). Both systems aim to obtain the English form of an entity, taking Chinese and Arabic, respectively, as source languages. Both also distinguish two kinds of translation: direct (simple) translation and transliteration (Al-Onaizan et al., 2002b).</Paragraph>
    <Paragraph position="2"> However, the two tools use different techniques for both kinds of translation. The Chinese-English system relies on frequency-based methods, whereas the Arabic-English system applies a more complex process that combines several kinds of techniques.</Paragraph>
    <Paragraph position="3"> In this paper, we present our research on translating entity names from Basque into Spanish. As a first step, we based our approach on the system presented by Y. Al-Onaizan and K. Knight at ACL 2002. In this system, the authors first obtain a list of candidate translations for the entity in the target language, using both monolingual and bilingual resources. They then rank the candidates by applying several methods (such as statistical measures and web counts). Finally, if they judge that the correct translation does not appear in the list, they build an extended version of the list using the web and apply the ranking step again.</Paragraph>
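    <Paragraph position="4"> The generate, rank, and extend procedure described above can be illustrated with a minimal Python sketch. It is not Al-Onaizan and Knight's implementation: the candidate sources, the scoring functions, the acceptance threshold, and the web-extension step are all hypothetical stand-ins (here, toy dictionaries with made-up counts), and summing the scores of several ranking methods is only one assumed way of combining them.

```python
from typing import Callable, Optional

# Hypothetical signatures: a candidate source maps an entity to candidate
# translations; a scorer maps (entity, candidate) to a score (higher is better).
CandidateSource = Callable[[str], list[str]]
Scorer = Callable[[str, str], float]


def rank(entity: str, candidates: list[str], scorers: list[Scorer]) -> list[tuple[str, float]]:
    """Score each distinct candidate by summing all scorers; best first."""
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep first-seen order
    scored = [(c, sum(s(entity, c) for s in scorers)) for c in unique]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def translate(entity: str, sources: list[CandidateSource], scorers: list[Scorer],
              extend: CandidateSource, threshold: float) -> Optional[str]:
    # Step 1: gather candidates from every (monolingual or bilingual) source.
    candidates = [c for source in sources for c in source(entity)]
    # Step 2: rank them.
    ranked = rank(entity, candidates, scorers)
    # Step 3: if the best score falls below the (assumed) threshold, treat the
    # correct translation as missing, extend the list, and rank again.
    if not ranked or threshold > ranked[0][1]:
        ranked = rank(entity, candidates + extend(entity), scorers)
    return ranked[0][0] if ranked else None


# Toy, made-up resources for illustration only.
lexicon = {"Donostia": ["San Sebastián"]}  # hypothetical bilingual entries
counts = {"San Sebastián": 8.0, "Donostia": 3.0, "Bilbao": 9.0, "Bilbo": 1.0}


def from_lexicon(e: str) -> list[str]:
    return lexicon.get(e, [])


def keep_source_form(e: str) -> list[str]:
    return [e]  # the entity may be written the same way in Spanish


def frequency(e: str, c: str) -> float:
    return counts.get(c, 0.0)  # stand-in for corpus or web counts


def web_extension(e: str) -> list[str]:
    return ["Bilbao"] if e == "Bilbo" else []  # stand-in for a web query


sources = [from_lexicon, keep_source_form]
print(translate("Donostia", sources, [frequency], web_extension, 5.0))  # San Sebastián
print(translate("Bilbo", sources, [frequency], web_extension, 5.0))     # Bilbao
```

In the second call, the only initial candidate scores below the threshold, so the sketch falls back to the extension step before choosing a translation.</Paragraph>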
  </Section>
</Paper>