<?xml version="1.0" standalone="yes"?> <Paper uid="J94-4001"> <Title>A Syntactic Analysis Method of Long Japanese Sentences Based on the Detection of Conjunctive Structures</Title> <Section position="2" start_page="0" end_page="510" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Machine translation systems are gradually being accepted by a wider range of people, and accordingly improving these systems is becoming an urgent requirement for manufacturers. There are still many difficult problems that the current efforts of many researchers have not been able to solve, and the analysis of long Japanese sentences is one of them. It is difficult to obtain a proper analysis of a sentence longer than 50 Japanese characters, and almost all current analysis methods fail for sentences of more than 80 characters. By analysis failure we mean either of the following: none of the multiple analysis results, which arise from the intrinsic ambiguity of a sentence or from inaccurate grammatical rules, is correct; or the analysis fails midway because an unacceptably large number of parses is produced for the sentence.</Paragraph> <Paragraph position="1"> * Department of Electrical Engineering, Kyoto University, Kyoto 606, Japan. (c) 1994 Association for Computational Linguistics. Figure 1: Comparison between a conventional method and our method.</Paragraph> <Paragraph position="2"> Some researchers have attributed these difficulties to the numerous possible head-dependent relations between phrases in long sentences. However, no deeper consideration has been given to the reasons for these analysis failures.</Paragraph> <Paragraph position="3"> A long sentence, particularly in Japanese, very often contains conjunctive structures. These may be either conjunctive noun phrases or conjunctive predicative clauses.
Among the latter, those formed with the renyoh forms of predicates (the inflected forms that indicate a connection to a following predicate) are called renyoh chuushi-ho (see example sentence (iv) of Table 1). A renyoh chuushi-ho appears in an embedded sentence that modifies a noun and is also used to connect two or more sentences. This form is used frequently in Japanese and is a major cause of structural ambiguity. Many major sentential components are omitted from the posterior part of renyoh chuushi expressions, which complicates the analysis. For the successful analysis of long sentences, these conjunctive phrases and clauses, including renyoh chuushi-ho, must be recognized correctly. Nevertheless, most work in this area (e.g., Dahl and McCord 1983; Fong and Berwick 1985; Hirschman 1986; Kaplan and Maxwell 1988; Sag et al. 1985; Sedogbo 1985; Steedman 1990; Woods 1973) has addressed how to generate candidate conjunctive structures or how to account for correct ones, not how to select the correct structure from among many candidates. In outline, the method proposed by some researchers (Agarwal and Boggess 1992; Nagao et al. 1983) for selecting the correct structure detects the two most similar components, one to the left and one to the right of a conjunction, and takes them as the two conjoined heads of the conjunctive structure. For example, in &quot;John enjoyed the book and liked the play&quot; we call the verbs &quot;enjoyed&quot; and &quot;liked&quot; conjoined heads; &quot;enjoyed&quot; is the pre-head, and &quot;liked&quot; is the post-head. We also call &quot;enjoyed the book&quot; the pre-conjunct and &quot;liked the play&quot; the post-conjunct. In Japanese, the word preceding a conjunction is the pre-head, and the method searches for the post-head that is most similar to it (Nagao et al. 1983) (see the upper part of Figure 1).
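A minimal Python sketch of the conventional head-matching search just described, for illustration only. The `Word` record, its part-of-speech tags, and the toy scoring in `head_similarity` are hypothetical assumptions; they are not the similarity measure used by Nagao et al. (1983). The search takes the word before the conjunction as the pre-head and keeps the most similar word to its right.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    surface: str  # romanized surface form
    pos: str      # hypothetical part-of-speech tag, e.g. 'noun', 'verb'


def head_similarity(a: Word, b: Word) -> int:
    """Toy similarity (an assumption): 2 for identical surface forms,
    1 for the same part of speech, 0 otherwise."""
    if a.surface == b.surface:
        return 2
    if a.pos == b.pos:
        return 1
    return 0


def find_post_head(words: list[Word], conj_index: int) -> int:
    """Take the word just before the conjunction as the pre-head and return
    the index of the most similar word to the right of the conjunction.
    Ties go to the nearer candidate; returns -1 if there is no candidate."""
    pre_head = words[conj_index - 1]
    best_index, best_score = -1, -1
    for i in range(conj_index + 1, len(words)):
        score = head_similarity(pre_head, words[i])
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# RINGO TO MIKAN WO KATTA (bought apples and mandarin oranges)
sentence = [Word('RINGO', 'noun'), Word('TO', 'conj'), Word('MIKAN', 'noun'),
            Word('WO', 'particle'), Word('KATTA', 'verb')]
print(find_post_head(sentence, 1))  # 2, i.e. MIKAN
```

Only the two heads are compared, so the rest of the conjuncts has no influence on the choice.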
In English, conversely, the phrase following the conjunction is the post-head, and the pre-head is searched for in the same way (Agarwal and Boggess 1992). Sadao Kurohashi and Makoto Nagao Syntactic Analysis Method However, the two conjoined heads are sometimes far apart in a long sentence, and this simple method is then clearly inadequate.</Paragraph> <Paragraph position="4"> Human beings can recognize conjunctive structures because of a certain, sometimes subtle, similarity between conjuncts. Not only the conjoined heads but also the other components of the conjuncts are similar to some degree, and, furthermore, the pre- and post-conjuncts are structurally parallel. A computational method needs to recognize this subtle similarity in order to detect the correct conjunctive structures. In this investigation, we have developed an algorithm that calculates a similarity measure between two arbitrary series of words, one to the left and one to the right of a conjunction, and selects the two most similar series that can reasonably be considered to compose a conjunctive structure (see the lower part of Figure 1). This procedure is implemented with a dynamic programming technique.</Paragraph> <Paragraph position="5"> In our syntactic analysis method, the first step is the detection of conjunctive structures by the algorithm described above. Since a sentence sometimes contains two or more conjunctive structures with very complex interrelations, the second step is to adjust any tangled relations among them. In this step, conjunctive structures with incorrect overlapping relations, if any, are found, and their scopes are detected again. The third step of our syntactic analysis is a very common operation. Japanese sentences are best described in terms of kakari-uke, which is essentially a dependency structure.
Therefore, our third step, after all the conjunctive structures have been identified, is to perform a dependency analysis of each phrase/clause of the conjunctive structures and then of the whole sentence, once all the conjunctive structures have been reduced to single nodes. The dependency analysis of Japanese is rather simple. A component depends on a component to its right (not necessarily the adjacent one), and the suffix (postposition) of a component indicates what kind of element it can depend on. More than one head-dependent relation may be possible for a component, but by introducing some heuristics we can easily obtain a unique dependency analysis result that is correct in a high percentage of cases. In addition to the ambiguity of their scopes, a serious problem regarding conjunctive structures is the ellipsis of some of their components. Through the dependency analysis process outlined above, we can detect the ellipses in the conjunctive structures and supplement them with the omitted components.</Paragraph> <Paragraph position="6"> 2. Types of Conjunctive Structures and Their Ambiguities In Japanese, a bunsetsu is the smallest meaningful sequence; it consists of an independent word (IW; nouns, verbs, adjectives, etc.) and accompanying words (AW; copulas, postpositions, auxiliary verbs, and so on). A bunsetsu whose IW is a verb or an adjective, or whose AW is a copula, functions as a predicate and is thus called a predicative bunsetsu (PB). A bunsetsu whose IW is a noun is called a nominal bunsetsu (NB).</Paragraph> <Paragraph position="7"> The conjunctive structures (CSs) that appear in Japanese are classified into three types (Shudo et al. 1986). The first type is the conjunctive noun phrase. These phrases can be found by the words listed in Table 1-a.
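The bunsetsu definitions and the dependency rule described above can be illustrated with a small Python sketch. Everything beyond those definitions is an assumption. That includes the copula set, the postposition table in `can_depend_on`, and the heuristic of attaching each bunsetsu to the nearest compatible bunsetsu to its right, all of which are hypothetical stand-ins rather than the paper's heuristics. The sketch also assumes that dependencies do not cross, so it only tries heads on the chain that starts at the adjacent bunsetsu.

```python
from __future__ import annotations

from dataclasses import dataclass

COPULAS = {'DA', 'DEARU'}                     # hypothetical copula forms
PREDICATE_MARKERS = {'GA', 'WO', 'NI', 'WA'}  # assumed to require a PB head
NOMINAL_MARKERS = {'NO'}                      # assumed to require an NB head


@dataclass
class Bunsetsu:
    iw: str        # independent word
    iw_pos: str    # hypothetical tag: 'noun', 'verb' or 'adjective'
    aw: str = ''   # accompanying word(s), e.g. a postposition

    @property
    def is_predicative(self) -> bool:
        # PB: the IW is a verb or adjective, or the AW is a copula
        return self.iw_pos in ('verb', 'adjective') or self.aw in COPULAS

    @property
    def is_nominal(self) -> bool:
        # NB: the IW is a noun
        return self.iw_pos == 'noun'


def can_depend_on(dep: Bunsetsu, head: Bunsetsu) -> bool:
    """The suffix of the dependent restricts the kind of head (toy table)."""
    if dep.aw in NOMINAL_MARKERS:
        return head.is_nominal
    if dep.aw in PREDICATE_MARKERS:
        return head.is_predicative
    return True  # no constraint modeled for other suffixes


def analyze(bs: list[Bunsetsu]) -> list[int]:
    """Return head indices; the sentence-final bunsetsu is the root (-1).
    Works right to left and tries only heads on the chain starting at the
    adjacent bunsetsu, so no two dependencies cross."""
    n = len(bs)
    heads = [-1] * n
    for i in range(n - 2, -1, -1):
        heads[i] = n - 1          # fallback: depend on the final bunsetsu
        j = i + 1
        while j != -1:
            if can_depend_on(bs[i], bs[j]):
                heads[i] = j      # nearest compatible head wins
                break
            j = heads[j]
    return heads


# WATASHI-NO HON-WO YONDA (read my book)
print(analyze([Bunsetsu('WATASHI', 'noun', 'NO'),
               Bunsetsu('HON', 'noun', 'WO'),
               Bunsetsu('YONDA', 'verb')]))  # [1, 2, -1]
```

Processing right to left means every candidate head already has its own head fixed. This is what makes the non-crossing check a simple walk up the chain.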
Each conjunctive noun can have adjectival modifiers (Table 1-ii) or clausal modifiers (Table 1-iii).</Paragraph> <Paragraph position="8"> The second type is the conjunctive predicative clause, in which two or more predicates in a sentence form a coordination. These clauses can be found by the renyoh forms of predicates (Table 1-iv) or by predicates accompanied by one of the words in Table 1-b (Table 1-v).</Paragraph> <Paragraph position="9"> The third type is a CS consisting of parts of conjunctive predicative clauses. We call this type an incomplete conjunctive structure. These structures can be found by the correspondence of case-marking postpositions (Table 1-vi: &quot;.. WO .. NI, .. WO .. NI&quot;).</Paragraph> <Paragraph position="10"> However, sometimes the last bunsetsu of the pre-conjunct has no case-marking postposition (e.g., &quot;NI&quot; can be omitted from the bunsetsu &quot;KAISEKI-NI&quot; in Table 1-vi) and is simply followed by one of the words listed in Table 1-c. In such cases we cannot distinguish this type of CS from conjunctive noun phrases by examining the last bunsetsu of the pre-conjunct. This does not matter, however, because our method handles the three types of CSs in almost the same way when detecting their scopes, and it distinguishes incomplete conjunctive structures exactly at the dependency analysis stage. (Notes to Table 1: characters in '//' are optional; the Japanese postposition &quot;WO&quot; marks the object case; a noun directly followed by a comma indicates a conjunctive noun phrase or an incomplete conjunctive structure.)</Paragraph> <Paragraph position="11"> For all of these types, it is relatively easy to detect the presence of a CS by looking for a distinctive key bunsetsu (we call this a KB) that is accompanied by one of the CS-indicating words listed in Table 1 or that has a renyoh form (the underlined bunsetsus in Table 1 are KBs). A KB is the last bunsetsu of the pre-conjunct and serves as the pre-head.
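Because a KB is marked locally, detecting it is simple. The harder task, discussed next, is fixing the scope. The Python sketch below finds KBs and lists the candidate ending bunsetsus of a conjunctive noun phrase (every NB to the right of the KB). The marker set `NOUN_CONJ_MARKERS` (the coordinating particles TO and YA) is an illustrative assumption, not the contents of Table 1. The `renyoh` flag is a hypothetical stand-in for the morphological analysis that would identify renyoh forms.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative stand-in for Table 1 (an assumption, not the paper's list):
# coordinating particles that can mark a conjunctive noun phrase.
NOUN_CONJ_MARKERS = {'TO', 'YA'}


@dataclass
class Bunsetsu:
    iw: str               # independent word
    iw_pos: str           # hypothetical tag: 'noun', 'verb' or 'adjective'
    aw: str = ''          # accompanying word(s)
    renyoh: bool = False  # True if the predicate is in a renyoh form


def key_bunsetsus(bs: list[Bunsetsu]) -> list[int]:
    """Indices of KBs: bunsetsus carrying a CS-marking word or a renyoh form."""
    return [i for i, b in enumerate(bs) if b.aw in NOUN_CONJ_MARKERS or b.renyoh]


def candidate_ending_bunsetsus(bs: list[Bunsetsu], kb: int) -> list[int]:
    """CBs for a conjunctive noun phrase: every NB to the right of the KB.
    Choosing among them (and finding the SB) is not attempted here."""
    return [j for j in range(kb + 1, len(bs)) if bs[j].iw_pos == 'noun']


# RINGO-TO MIKAN-NO KAWA-WO MUITA (peeled apples and mandarins ... skins)
phrase = [Bunsetsu('RINGO', 'noun', 'TO'), Bunsetsu('MIKAN', 'noun', 'NO'),
          Bunsetsu('KAWA', 'noun', 'WO'), Bunsetsu('MUITA', 'verb')]
kb = key_bunsetsus(phrase)[0]                  # 0, i.e. RINGO-TO
print(candidate_ending_bunsetsus(phrase, kb))  # [1, 2]: MIKAN or KAWA
```

Both candidates are plausible in this example. Taking MIKAN as the EB yields the skins of the apples and mandarins. Taking KAWA yields the apples and the mandarin skins. Choosing between them requires the similarity-based scope search, which this sketch does not implement.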
However, it is difficult to determine which bunsetsu sequences on either side of the KB constitute the pre- and post-conjuncts. That is, it is not easy to determine which bunsetsu to the left of a KB is the leftmost bunsetsu of the pre-conjunct (we call this the starting bunsetsu, SB) and which bunsetsu to the right of a KB is the rightmost bunsetsu of the post-conjunct (we call this the ending bunsetsu, EB; it is the post-head). The bunsetsus from the SB through the EB constitute the scope of the CS. In detecting a CS, the most important task is to find the post-head (that is, the EB) among the many candidates in a sentence; e.g., in a conjunctive noun phrase, all NBs after a KB are candidates (we call such a candidate bunsetsu a CB). However, our method searches not only for the most plausible EB, but also for the most plausible scope of the CS.</Paragraph> </Section> </Paper>