<?xml version="1.0" standalone="yes"?>
<Paper uid="P93-1004">
  <Title>STRUCTURAL MATCHING OF PARALLEL TEXTS</Title>
  <Section position="3" start_page="0" end_page="23" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Bilingual (or parallel) texts are useful resources for the acquisition of linguistic knowledge as well as for applications such as machine translation. Intensive research has been done on aligning bilingual texts at the sentence level using statistical techniques based on sentence lengths measured in words or in characters (Brown 91), (Gale 91a). These methods are quite successful: well over 90% of the sentences in bilingual corpora are aligned correctly.</Paragraph>
    <Paragraph position="1"> Although such parallel texts have been shown to be useful in real applications such as machine translation (Brown 90) and word sense disambiguation (Dagan 91), structured bilingual sentences are undoubtedly more informative and more important for future natural language research. Structured bilingual or multilingual corpora serve as richer sources for extracting linguistic knowledge (Kaji 92), (Klavans 90), (Sadler 91), (Utsuro 92).</Paragraph>
    <Paragraph position="2"> Several researchers have also worked on phrase-level or word-level alignment. The Textual Knowledge Bank Project (Sadler 91) is building monolingual and multilingual text bases structured by linking their elements with grammatical (dependency), referential, and bilingual relations. (Kaji 92) reports a method for obtaining phrase-level correspondences in parallel texts by coupling the phrases of the two languages obtained during CKY parsing.</Paragraph>
    <Paragraph position="3"> This paper presents another method for obtaining the structural matching of bilingual texts. Sentences in both languages are parsed to produce (disjunctive) feature structures, from which dependency structures are extracted; ambiguities are represented as disjunctions. The two structures are then matched to establish a one-to-one correspondence between their substructures. The result of the match is a set of pairs of minimal corresponding substructures of the two dependency structures. Examples of the results are shown in Figures 1, 2 and 3. A dependency structure is represented as a tree in which ambiguity is specified by a disjunctive node (OR node). In the figures, circles mark substructures and bidirectional arrows link corresponding substructures.</Paragraph>
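The overall idea can be illustrated with a small Python sketch. It is not the authors' implementation: the tuple encoding of trees, the toy English-Japanese lexicon and the functions readings, edges, score and best_match are all hypothetical, and the paper's matching of minimal substructures is replaced here by a much simpler criterion, namely counting dependency edges whose translations are also edges in the other tree. The sketch expands OR nodes into alternative readings and keeps the pair of readings that agree best, which shows one plausible way a match against the other language can select among ambiguous analyses.

```python
from itertools import product

# Tree encoding assumed for this sketch (hypothetical, not from the paper):
#   ("word", [children])        ordinary dependency node headed by "word"
#   ("OR", [alt1, alt2, ...])   disjunctive (OR) node listing alternative subtrees


def readings(node):
    """Yield every unambiguous dependency tree obtained by expanding OR nodes."""
    head, children = node
    if head == "OR":
        for alternative in children:
            yield from readings(alternative)
        return
    # Choose one reading per child independently (Cartesian product).
    for combo in product(*(list(readings(child)) for child in children)):
        yield (head, list(combo))


def edges(tree):
    """Return the set of (head, dependent) word pairs of an unambiguous tree."""
    head, children = tree
    result = set()
    for child in children:
        result.add((head, child[0]))
        result |= edges(child)
    return result


def score(src_tree, tgt_tree, lexicon):
    """Count source edges whose translations also form an edge in the target tree."""
    tgt_edges = edges(tgt_tree)
    return sum(
        1
        for h, d in edges(src_tree)
        if any((th, td) in tgt_edges
               for th in lexicon.get(h, ())
               for td in lexicon.get(d, ()))
    )


def best_match(src, tgt, lexicon):
    """Pick the pair of readings that preserves the most dependency edges."""
    pairs = [(s, t) for s in readings(src) for t in readings(tgt)]
    return max(pairs, key=lambda pair: score(pair[0], pair[1], lexicon))


# Toy example (determiners omitted): PP attachment in "I saw the man with a telescope".
lexicon = {"saw": ["mita"], "I": ["watashi"], "man": ["otoko"],
           "telescope": ["boenkyo"], "with": ["de"]}
src = ("OR", [
    ("saw", [("I", []), ("man", []), ("telescope", [("with", [])])]),  # instrument
    ("saw", [("I", []), ("man", [("telescope", [("with", [])])])]),    # man holds it
])
tgt = ("mita", [("watashi", []), ("otoko", []), ("boenkyo", [("de", [])])])

best = best_match(src, tgt, lexicon)
print(best[0])  # the instrument reading, which preserves all 4 edges
```

Enumerating every reading is exponential in the number of OR nodes, so this brute-force search is only practical for toy inputs; a realistic matcher would score substructures without fully expanding the disjunctions.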
    <Paragraph position="4"> Our technique and its results differ from those of the other methods mentioned above. (Kaji 92) identifies corresponding phrases and aims at producing translation templates by abstracting those corresponding phrases. In the Bilingual Knowledge Bank (Sadler 91), correspondence is shown by links between words in the two sentences, equating the two whole subtrees headed by those words. We instead prefer correspondences between minimal substructures, together with the relationships between those substructures. Such a minimal substructure represents the smallest meaningful component of a sentence, which we believe is very useful for our target application of extracting lexical knowledge from bilingual corpora.</Paragraph>
  </Section>
</Paper>