<?xml version="1.0" standalone="yes"?>
<Paper uid="W96-0107">
  <Title>Automatic Extraction of Word Sequence Correspondences in Parallel Corpora</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper proposes a method for finding correspondences between arbitrary-length word sequences in aligned parallel corpora of Japanese and English. Translation candidates for word sequences are evaluated by a similarity measure between the sequences, defined in terms of the co-occurrence frequency and the independent frequencies of the word sequences. The similarity measure is an extension of the Dice coefficient. An iterative method with gradual threshold lowering is proposed for obtaining a high-quality translation dictionary. The method was tested on parallel corpora from three distinct domains and achieved over 80% accuracy.</Paragraph>
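The abstract states only that candidates are scored by an extension of the Dice coefficient built from co-occurrence and independent frequencies, and that extraction iterates while a threshold is gradually lowered. The following is a minimal sketch under stated assumptions, not the paper's method. It uses the plain Dice coefficient 2 f(x, y) / (f(x) + f(y)) as a stand-in (the paper's extension is not reproduced). Frequencies count aligned sentence pairs. The threshold schedule, the minimum co-occurrence count min_joint, and the rule that accepted sequences are removed from sentence pairs in which both sides occur before the next round is recounted are all hypothetical. The function names (ngrams, dice, score_candidates, find, extract) are illustrative.

```python
from collections import Counter
from itertools import product


def ngrams(tokens, max_len):
    """Contiguous word sequences of length 1..max_len; windows containing a consumed token (None) are skipped."""
    seqs = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if None not in window:
                seqs.add(tuple(window))
    return seqs


def dice(f_xy, f_x, f_y):
    """Plain Dice coefficient 2 * f(x, y) / (f(x) + f(y)); a stand-in for the paper's extension."""
    return 2.0 * f_xy / (f_x + f_y)


def score_candidates(pairs, max_len, min_joint):
    """Score co-occurring (source, target) sequences; frequencies count sentence pairs, not tokens."""
    f_src, f_tgt, f_joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_seqs, tgt_seqs = ngrams(src, max_len), ngrams(tgt, max_len)
        f_src.update(src_seqs)
        f_tgt.update(tgt_seqs)
        f_joint.update(product(src_seqs, tgt_seqs))
    scored = [
        (dice(joint, f_src[s], f_tgt[t]), joint, s, t)
        for (s, t), joint in f_joint.items()
        if joint >= min_joint
    ]
    return sorted(scored, reverse=True)  # best score first, ties broken by co-occurrence count


def find(tokens, seq):
    """Index of the first occurrence of seq in tokens, or -1 if absent."""
    n = len(seq)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == seq:
            return i
    return -1


def extract(pairs, thresholds, max_len=3, min_joint=2):
    """Iterative extraction over a descending threshold schedule (hypothetical parameters)."""
    pairs = [(list(src), list(tgt)) for src, tgt in pairs]  # mutable copies; input is untouched
    dictionary = []
    for threshold in thresholds:
        accepted, used_src, used_tgt = [], set(), set()
        for score, _joint, s, t in score_candidates(pairs, max_len, min_joint):
            if threshold > score:
                break  # list is sorted, so nothing further passes this round
            if s not in used_src and t not in used_tgt:
                accepted.append((s, t))
                used_src.add(s)
                used_tgt.add(t)
                dictionary.append((" ".join(s), " ".join(t), round(score, 3)))
        # Assumed rule: consume each accepted pair where both sides occur, then recount next round.
        for src, tgt in pairs:
            for s, t in accepted:
                i, j = find(src, s), find(tgt, t)
                if i != -1 and j != -1:
                    src[i:i + len(s)] = [None] * len(s)
                    tgt[j:j + len(t)] = [None] * len(t)
    return dictionary


if __name__ == "__main__":
    toy = [  # romanized Japanese / English, one aligned sentence pair per entry
        (["hon", "o", "yomu"], ["read", "the", "book"]),
        (["hon", "o", "kau"], ["buy", "the", "book"]),
        (["shinbun", "o", "yomu"], ["read", "the", "paper"]),
        (["shinbun", "ga", "aru"], ["the", "paper", "is", "here"]),
    ]
    for entry in extract(toy, thresholds=[0.9, 0.8], max_len=1):
        print(entry)
```

On this toy corpus the first round (threshold 0.9) accepts the three pairs with a Dice score of 1.0: yomu/read, shinbun/paper and hon/book. The pair o/the, with a score of 6/7 ≈ 0.857, is accepted only after the threshold drops to 0.8. It is a spurious particle-to-article match, the kind of noise a lower threshold can let in. This illustrates why a schedule that starts high and descends gradually is attractive.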
  </Section>
</Paper>