<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1024">
  <Title>Finding Ideographic Representations of Japanese Names Written in Latin Script via Language Identification and Corpus Validation</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Multilingual applications frequently need to handle proper names, but names are often missing from bilingual lexicons. This problem is exacerbated for applications that translate between Latin-scripted languages and Asian languages such as Chinese, Japanese and Korean (CJK), where simple string copying is not a solution. We present a novel approach for generating the ideographic representations of a CJK name written in a Latin script. The approach first identifies the origin of the name and then back-transliterates the name to all possible Chinese characters using language-specific mappings. To reduce the massive number of possibilities that must be computed, we apply a three-tier filtering process: first against a set of attested bigrams, then against a set of attested terms, and lastly against the WWW for final validation. We illustrate the approach with English-to-Japanese back-transliteration.</Paragraph>
    <Paragraph position="1"> Against test sets of Japanese given names and surnames, we have achieved average precisions of 73% and 90%, respectively.</Paragraph>
  </Section>
</Paper>