File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/01/w01-1412_abstr.xml

Size: 1,002 bytes

Last Modified: 2025-10-06 13:42:12

<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1412">
  <Title>A Comparative Study on Translation Units for Bilingual Lexicon Extraction</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper presents ongoing research on the automatic extraction of bilingual lexicons from English-Japanese parallel corpora. Its main objective is to examine various N-gram models that generate translation units for bilingual lexicon extraction. Three N-gram models are compared: a baseline model (Bound-length N-gram) and two new models (Chunk-bound N-gram and Dependency-linked N-gram). An experiment on 10,000 English-Japanese parallel sentences shows that Chunk-bound N-gram produces the best results in both accuracy (83%) and coverage (60%), improving on the previously proposed baseline model by approximately 13% in accuracy and 5-9% in coverage.</Paragraph>
  </Section>
</Paper>