File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/w06-1630_intro.xml
Size: 1,788 bytes
Last Modified: 2025-10-06 14:04:01
<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1630"> <Title>Unsupervised Named Entity Transliteration Using Temporal and Phonetic Correlation</Title> <Section position="4" start_page="250" end_page="250" type="intro"> <SectionTitle> 2 Previous Work </SectionTitle> <Paragraph position="0"> Named entity transliteration is the problem of producing, for a name in a source language, a set of one or more transliteration candidates in a target language. Previous work -- e.g. (Knight and Graehl, 1998; Meng et al., 2001; Al-Onaizan and Knight, 2002; Gao et al., 2004) -- has mostly assumed that one has a training lexicon of transliteration pairs, from which one can learn a model, often a source-channel or MaxEnt-based model.</Paragraph> <Paragraph position="1"> Comparable corpora have been studied extensively in the literature -- e.g.,(Fung, 1995; Rapp, 1995; Tanaka and Iwasaki, 1996; Franz et al., 1998; Ballesteros and Croft, 1998; Masuichi et al., 2000; Sadat et al., 2004), but transliteration in the context of comparable corpora has not been well addressed. The general idea of exploiting time correlations to acquire word translations from comparable corpora has been explored in several previous studies -- e.g., (Fung, 1995; Rapp, 1995; Tanaka and Iwasaki, 1996). Recently, a Pearson correlation method was proposed to mine word pairs from comparable corpora (Tao and Zhai, 2005); this idea is similar to the method used in (Kay and Roscheisen, 1993) for sentence alignment. In our work, we adopt the method proposed in (Tao and Zhai, 2005) and apply it to the problem of transliteration; note that (Tao and Zhai, 2005) compares several different metrics for time correlation, as we also note below -- and see (Sproat et al., 2006).</Paragraph> </Section> class="xml-element"></Paper>