<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1010"> <Title>Named Entity Transliteration with Comparable Corpora</Title> <Section position="4" start_page="73" end_page="73" type="intro"> <SectionTitle> 2 Previous Work </SectionTitle> <Paragraph position="0"> In previous work on Chinese named-entity transliteration, e.g. (Meng et al., 2001; Gao et al., 2004), the problem has been cast as that of producing, for a given Chinese name, an English equivalent such as one might need in a machine translation system. For example, for the name wei wei-lian-mu-si, one would like to arrive at the English name V(enus) Williams. Common approaches include source-channel methods, following (Knight and Graehl, 1998), or maximum-entropy models.</Paragraph> <Paragraph position="1"> Comparable corpora have been studied extensively in the literature (e.g., (Fung, 1995; Rapp, 1995; Tanaka and Iwasaki, 1996; Franz et al., 1998; Ballesteros and Croft, 1998; Masuichi et al., 2000; Sadat et al., 2003)), but transliteration in the context of comparable corpora has not been well addressed.</Paragraph> <Paragraph position="2"> The general idea of exploiting frequency correlations to acquire word translations from comparable corpora has been explored in several previous studies (e.g., (Fung, 1995; Rapp, 1995; Tanaka and Iwasaki, 1996)). Recently, a method based on Pearson correlation was proposed to mine word pairs from comparable corpora (Tao and Zhai, 2005), an idea similar to the method used in (Kay and Roscheisen, 1993) for sentence alignment. In our work, we adopt the method proposed in (Tao and Zhai, 2005) and apply it to the problem of transliteration.
We also study several variations of the similarity measures.</Paragraph> <Paragraph position="3"> Mining transliterations from multilingual web pages was studied in (Zhang and Vines, 2004). Our work differs in that we use comparable corpora (in particular, news data) and leverage the time-correlation information naturally available in such corpora.</Paragraph> </Section></Paper>