<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1024"> <Title>Finding Ideographic Representations of Japanese Names Written in Latin Script via Language Identification and Corpus Validation</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Multilingual applications frequently involve dealing with proper names, but names are often missing in bilingual lexicons. This problem is exacerbated for applications involving translation between Latin-scripted languages and Asian languages such as Chinese, Japanese and Korean (CJK) where simple string copying is not a solution. We present a novel approach for generating the ideographic representations of a CJK name written in a Latin script. The proposed approach involves first identifying the origin of the name, and then back-transliterating the name to all possible Chinese characters using language-specific mappings. To reduce the massive number of possibilities for computation, we apply a three-tier filtering process by filtering first through a set of attested bigrams, then through a set of attested terms, and lastly through the WWW for a final validation. We illustrate the approach with English-to-Japanese back-transliteration.</Paragraph> <Paragraph position="1"> Against test sets of Japanese given names and surnames, we have achieved average precisions of 73% and 90%, respectively.</Paragraph> </Section> </Paper>
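<!--
Illustrative note: the generate-then-filter idea described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The mapping table, the bigram and term sets, the hit counts, and all function names are hypothetical toy data chosen for illustration. The language-identification step is omitted, the input is assumed to be already segmented into romanized units, and the web validation is replaced by a dictionary lookup.

```python
from itertools import product

# Toy data, all hypothetical: real mapping tables and corpora are far larger.
ROMAJI_TO_KANJI = {
    "ta": ["田", "多", "太"],
    "naka": ["中", "仲"],
}
ATTESTED_BIGRAMS = {"田中", "田仲", "多仲"}
ATTESTED_TERMS = {"田中", "田仲"}
WEB_HITS = {"田中": 1000, "田仲": 10}  # stand-in for a web search hit count


def generate_candidates(segments):
    """Expand every romanized segment to all of its candidate characters."""
    options = [ROMAJI_TO_KANJI[s] for s in segments]
    return ["".join(chars) for chars in product(*options)]


def bigrams_attested(candidate):
    """Tier 1: every adjacent character pair must be an attested bigram."""
    return all(candidate[i:i + 2] in ATTESTED_BIGRAMS
               for i in range(len(candidate) - 1))


def back_transliterate(segments, min_hits=1):
    """Apply the three filters in order and rank survivors by hit count."""
    candidates = generate_candidates(segments)
    tier1 = [c for c in candidates if bigrams_attested(c)]
    tier2 = [c for c in tier1 if c in ATTESTED_TERMS]          # Tier 2: terms
    tier3 = [c for c in tier2 if WEB_HITS.get(c, 0) >= min_hits]  # Tier 3: web
    return sorted(tier3, key=lambda c: WEB_HITS.get(c, 0), reverse=True)
```

With this toy data, back_transliterate(["ta", "naka"]) expands to six candidates, keeps three after the bigram filter, two after the term filter, and returns the two survivors ordered by hit count. Running the cheap filters first means the expensive validation step only sees a small remainder of the candidate set.
-->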