File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/p06-2011_intro.xml
Size: 4,610 bytes
Last Modified: 2025-10-06 14:03:41
<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2011"> <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics A High-Accurate Chinese-English NE Backward Translation System Combining Both Lexical Information and Web Statistics</Title> <Section position="4" start_page="0" end_page="82" type="intro"> <SectionTitle> 2 Background </SectionTitle> <Paragraph position="0"> Translating NEs, unlike translating common words, is an &quot;asymmetric&quot; translation. Translations of an NE in various languages can be organized as a tree according to the relations between translation language pairs, as shown in the accompanying figure. The root of the tree is the NE in its original language, i.e., the language in which it was initially denominated. We call translating an NE downward along the tree &quot;forward translation&quot;. Conversely, &quot;backward translation&quot; translates an NE upward along the tree.</Paragraph> <Paragraph position="1"> Generally speaking, forward translation is easier than backward translation. On the one hand, forward translation has no unique answer: an NE can be forward translated from one language to another in many alternative ways.</Paragraph> <Paragraph position="2"> For example, &quot;Jordan&quot; can be translated into &quot;Qiao Dan (Qiao-Dan)&quot;, &quot;Qiao Deng (Qiao-Deng)&quot;, &quot;Yue Dan (Yue-Dan)&quot;, and so on. On the other hand, backward translation generally has a single corresponding term, especially when the target language is the root of the translating tree.</Paragraph> <Paragraph position="3"> In addition, in forward translation, when the original NE appears in documents written in the target language, it is often accompanied by a corresponding translation in the target language (Cheng et al., 2004). This makes forward translation less challenging. 
In this paper, we focus on Chinese-English backward translation, i.e., English is both the original language of the NE and the target language of translation, and Chinese is the source language to be translated.</Paragraph> <Paragraph position="4"> Two important issues must be addressed in the backward translation of NEs or OOV words.</Paragraph> <Paragraph position="5"> * Where to find the corresponding translation? * How to identify the correct translation? NEs seldom appear in multilingual or even monolingual dictionaries, i.e., they are OOV or unknown words. For unknown words, where can we find their corresponding translations? A bilingual corpus might be a possible solution. However, NEs appear in a vast range of contexts, and the available bilingual corpora cover only a small proportion of them. Most text resources are monolingual. Can we find translations of NEs in monolingual corpora? When mentioning a translated name in writing, authors sometimes annotate it with its original name in the original foreign language, especially when the name is less commonly known. But how often does this happen? In our test data, which is introduced in Section 4, over 97% of translated NEs have their original NE appear in the first 100 snippets returned by Google. Figure 2 shows several snippets returned by Google that contain the original NE of the given foreign NE.</Paragraph> <Paragraph position="6"> When translations can be found in snippets, the next task is to identify which name is the correct translation of the NE. First we should know how NEs are translated. The most common case is translation by phonetic values, i.e., transliteration. Most personal names and location names are transliterated. NEs may also be translated by meaning; this is how most titles and nicknames and some organization names are translated. Another common case is translating some parts by phonetic values and the others by meaning. 
For example, &quot;Sears Tower&quot; is translated into &quot;Xi Er Si (Xi-Er-Si) Da Sha (tower)&quot; in Chinese. NEs are sometimes translated according to the semantics or content of the entity they denote, especially for movies. Table 1 summarizes the possible ways of translating NEs. From the above discussion, we may use similarities in phonetic values, meanings of constituent words, semantics, and so on to identify corresponding translations. Besides these linguistic features, non-linguistic features such as statistical information may also help. We discuss how to combine these features to identify the corresponding translation in detail in the next section.</Paragraph> </Section></Paper>
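The idea of looking for the original NE in returned snippets, and of using simple statistics as a non-linguistic feature, can be illustrated with a minimal sketch. This is not the system described in the paper: the snippet strings, the `CANDIDATE` pattern, and the `rank_candidates` function are all hypothetical, and snippet frequency stands in for whatever statistical information the authors actually combine.

```python
import re
from collections import Counter

# Runs of capitalised Latin-script words, e.g. "Michael Jordan".
# Illustrative assumption only; not the extraction rule used in the paper.
CANDIDATE = re.compile(r"[A-Z][A-Za-z.'-]*(?: [A-Z][A-Za-z.'-]*)*")


def rank_candidates(snippets):
    """Rank candidate English names by the number of snippets containing them."""
    counts = Counter()
    for snippet in snippets:
        # A set counts each candidate at most once per snippet.
        counts.update(set(CANDIDATE.findall(snippet)))
    # Highest snippet count first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical snippets in which a Chinese name is annotated with its original.
snippets = [
    "喬丹 (Michael Jordan)",
    "喬丹 Michael Jordan",
    "喬丹 Jordan",
]

ranked = rank_candidates(snippets)
print(ranked)
```

A frequency ranking like this only narrows the candidate list. Choosing the correct translation would still need the phonetic and semantic similarity features discussed above.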