<?xml version="1.0" standalone="yes"?>
<Paper uid="A94-1007">
  <Title>Symmetric Pattern Matching Analysis for English Coordinate Structures</Title>
  <Section position="2" start_page="0" end_page="41" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper presents a model for analyzing English sentences that include coordinate conjunctions such as &quot;and&quot;, &quot;or&quot;, &quot;but&quot; and equivalent words. Syntactic analysis of English coordinate sentences is one of the most difficult problems for machine translation (MT) systems. The problem is selecting, from all possible candidates, the correct syntactic structure formed by an individual coordinate conjunction, i.e., determining which constituents are conjoined by the conjunction. Although the conjunction appears to have a simple function in English, it has been studied as a conjunct scope problem by both theoretical and computational linguists. Theoretically, it is possible to describe the syntactic and semantic constraints that govern the acceptability of a structure in which two constituents are conjoined by the conjunction (Lesmo and Torasso, 1985; Gazdar, 1981; Schachter, 1977).</Paragraph>
    <Paragraph position="1"> Computationally, it is possible to describe the grammar and heuristic rules for these constraints with ATN networks, logic grammars, HPSG, and categorial grammars (Kosy, 1986; Fong and Berwick, 1985; Huang, 1984; Boguraev et al., 1983; Blackwell, 1981; Niimi et al., 1986). However, it is not easy to apply these techniques to large-scale MT systems, because in real environments a variety of conjoined patterns, many word ambiguities, unknown words, and ellipses occur simultaneously. Also, a single sentence may contain several conjunctions and equivalent words, such as commas. Typically, these methods produce so many possible structures that MT systems cannot select the correct one, even if the grammars allow the rules to be written in simple notation.</Paragraph>
    <Paragraph position="2"> Conjunctions often cause reading difficulty even for human readers. However, they also give readers a kind of symmetry as a reading cue. They tend to conjoin the same kind of syntactic patterns, a tendency that has been called &quot;parallelism&quot; (Beaugrande and Dressler, 1981; Shinoda, 1981). In Japanese, this similarity has been used for analyzing conjunctive structures, and the method has been found effective (Kurohashi and Nagao, 1992). While Japanese has several coordinate conjunctions that differ according to the syntactic level (a noun phrase or a predicative clause), English coordinate conjunctions are used at any level of structure. More robust methods are therefore necessary for dealing with English conjunctive structures.</Paragraph>
    <Paragraph position="3"> We propose here an English coordinate structure analysis model, which can determine the correct syntactic structure in real environments by taking advantage of the symmetric patterns of the parallelism.</Paragraph>
    <Paragraph position="4"> The model is based on a balance matching operation for two lists of feature sets. The lists represent the left-side and right-side structures of the coordinate conjunction, including word ambiguities. The operation determines the most symmetric structure by comparing both sides of the conjunction.</Paragraph>
    <Paragraph position="5"> First, the problems of English coordinate sentences are explained. Second, we argue that parallelism provides effective information for top-down analysis of such sentences. Third, the balance matching analysis model is presented for solving the problems. This model has been implemented in the PIVOT English-Japanese MT system, which has a dictionary of 100,000 words, and has been working in its analysis module. Finally, the results in the MT system are reported together with the MT system configuration.</Paragraph>
    <Paragraph position="6"> 2 Problems of the conjunctions Coordinate conjunctions present three difficulties for MT systems (Kosy, 1986; Huang, 1983; Niimi et al., 1986; Okumura et al., 1987).</Paragraph>
    <Paragraph position="7"> 1. Analysis cost: English coordinate conjunctions have a variety of linguistic functions. The conjunctions can syntactically conjoin any parts of speech (nouns, adjectives, verbs, etc.) and all sorts of constituents (words, phrases and clauses). They produce so many possible structures that MT systems cannot select the correct one, even if some grammars provide simple notations for the rules. The complexity of the rules imposes a burden on the analysis process.</Paragraph>
    <Paragraph position="8"> 2. Ambiguities of the words: Most English words are ambiguous in their parts of speech, such as Noun and Verb, Preposition and Conjunction, Auxiliary Verb and others, and Adjective and Noun. The complex rules governing the conjunctions make word disambiguation more complicated, which reduces analysis precision.</Paragraph>
    <Paragraph position="9"> 3. Ellipses of the words: It is possible for a conjunct to contain &quot;gaps&quot; (elided elements) which are not allowed if the conjunction is removed.</Paragraph>
    <Paragraph position="10"> These gaps must be filled with elements from the other conjunct for a proper interpretation, as in (1) and (1)'.</Paragraph>
    <Paragraph position="11"> (1) NO.2 landing gear selector valves should be closed to the full position, and NO.3 to the half. (1)' NO.2 landing gear selector valves should be closed to the full position, and NO.3 [landing gear selector valves should be closed] to the half [position].</Paragraph>
  </Section>
</Paper>