<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1809"> <Title>Term Extraction from Korean Corpora via Japanese</Title> <Section position="6" start_page="2" end_page="2" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> We proposed a method that extracts foreign words, such as technical terms and proper nouns, from Korean corpora and produces a Japanese-Korean bilingual dictionary. Words that have been imported into multiple countries are usually spelled out in special phonetic alphabets, such as Katakana in Japanese and Hangul in Korean.</Paragraph> <Paragraph position="1"> Because foreign words spelled out in Katakana can be extracted from Japanese lexicons and corpora with high accuracy, our method extracts words in Korean corpora that are phonetically similar to Japanese Katakana words. Our method requires neither parallel or comparable bilingual corpora nor human annotation of such corpora.</Paragraph> <Paragraph position="2"> We also performed experiments in which we extracted foreign words from Korean newspaper articles and used the resulting dictionary for morphological analysis. We found that our method did not correctly extract compound Korean words consisting of both conventional and foreign words. Future work includes larger-scale experiments to further investigate the effectiveness of our method.</Paragraph> </Section> </Paper>