<?xml version="1.0" standalone="yes"?> <Paper uid="J88-1002"> <Title>A Common Parsing Scheme for Left- and Right-Branching Languages</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> A COMMON PARSING SCHEME FOR LEFT- AND RIGHT-BRANCHING LANGUAGES </SectionTitle> <Paragraph position="0"> This paper presents some results of an attempt to develop a common parsing scheme that works systematically and realistically for typologically varied natural languages. The scheme is bottom-up, and the parser scans the input text from left to right. However, unlike the standard LR(k) parser or Tomita's extended LR(1) parser, the one presented in this paper is not a pushdown automaton based on shift-reduce transition that uses a parsing table. Instead, it uses integrated data bases containing information about phrase patterns and parse tree nodes, retrieval of which is triggered by features contained in individual entries of the lexicon. Using this information, the parser assembles a parse tree by attaching input words (and sometimes also partially assembled parse trees and tree fragments popped from the stack) to empty nodes of the specified tree frame, until the entire parse tree is completed. This scheme, which works effectively and realistically for both left-branching languages and right-branching languages, is deterministic in that it does not use backtracking or parallel processing. 
In this system, unlike in ATN or in LR(k), the grammatical sentences of a language are not determined by a set of rewriting rules, but by a set of patterns in conjunction with procedures and the meta rules that govern the system's operation.</Paragraph> <Paragraph position="1"> This paper presents some results of an attempt to develop a common parsing scheme that works systematically and realistically for typologically varied natural languages. When this project was started in 1982, the algorithm based on augmented transition networks (ATNs) codified by Woods (1970, 1973) was not only the most commonly used approach to parsing natural languages in computer systems, but also the achievement of computational linguistics most influential on other branches of linguistics. For example, researchers in psycholinguistics like Kaplan (1972) and Wanner and Maratsos (1978) used ATN-based parsers as simulation models of human language processing. Bresnan (1978) used an ATN model, among others, to test whether her version of transformational grammar was &quot;realistic&quot;. Fodor's (1979) theory of &quot;superstrategy&quot; was also strongly influenced by the standard ATN algorithm. Indeed, as Berwick and Weinberg (1982) contend, parsing efficiency or computational complexity by itself may not provide reliable criteria for the evaluation of grammatical theories. 
It is evident, however, that computers can be used as an effective means of simulation in linguistics, as they have proved to be in other branches of science.</Paragraph> <Paragraph position="2"> Nevertheless, as a simulation model of the human faculty of language processing, the standard ATN mechanism has an intrinsic drawback: unless some ad hoc, unrealistic, and efficiency-robbing operations are added, or unless one comes up with a radically different grammatical framework, it cannot be used to parse left-branching languages like Japanese in which the beginning of embedded clauses is not regularly marked.</Paragraph> <Paragraph position="3"> One may try to cope with this problem by developing a separate parsing algorithm for left-branching languages, leaving the ATN formalism to specialize in right-branching languages like English. However, this solution contradicts our intuition that the core of the human faculty of language processing is universal.</Paragraph> <Paragraph position="4"> Another possible alternative, an ATN-type parser which processes left-branching languages' sentences backward from right to left, is also unrealistic. If computational linguistics is to provide a simulation model for theoretical linguistics and psycholinguistics, it must develop an alternative parsing scheme which can effectively and realistically process both left-branching and right-branching languages. (Copyright 1988 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided that the copies are not made for direct commercial advantage and the CL reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission. Paul T. Sato, Computational Linguistics, Volume 14, Number 1, Winter 1988.) 
Even for purely practical purposes, such a scheme is desirable because it will facilitate the development of machine translation systems which can handle languages with different typological characteristics.</Paragraph> <Paragraph position="5"> Some limitations of ATN-based parsers for handling left-branching languages are illustrated in section 1. The rest of this paper describes and illustrates my alternative parsing scheme, called Pattern Oriented Parser (POP), which can be used for both left-branching and right-branching languages. (POP is a descendant of its early prototype called Pattern-Stack Parser, which was introduced in Sato (1983a).) A general outline of POP is given in section 2, and its operation is illustrated in section 3, using both English and Japanese examples.</Paragraph> <Paragraph position="6"> Some characteristics of POP are highlighted in section 4, after which brief concluding remarks are made in section 5.</Paragraph> <Paragraph position="7"> The present version of POP is a syntactic analyzer, and it does not take semantics into consideration.</Paragraph> <Paragraph position="8"> However, the system could be readily augmented with procedures that build up semantic interpretations along with syntactic analysis. 
One such model was presented in Sato (1983b).</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 LIMITATIONS OF ATN-BASED PARSERS </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.1 CASE ASSIGNMENT </SectionTitle> <Paragraph position="0"> One of the greatest obstacles faced when attempting to develop an ATN-based parser for a language like Japanese is the unpredictability caused by the relatively free word order and by the left-branching subordinate clauses which have no beginning-of-clause marker.</Paragraph> <Paragraph position="1"> Indeed, Japanese word order is not completely free.</Paragraph> <Paragraph position="2"> For example, modifiers always precede the modified, and the verb complex (a verbal root plus one or more ordered suffixes marking tense, aspect, modality, voice, negativity, politeness level, question, etc.) is always placed at the end of the sentence. Moreover, almost all nouns and noun phrases occurring in Japanese sentences have one or more suffixes marking case relationships. 1 However, Japanese postnominal suffixes, by themselves, do not always provide all the necessary information for case assignment. For example, the direct object of a nonstative verb complex is marked by -o, while the direct object of a stative verb complex is usually marked by -ga, which also marks the subject.</Paragraph> <Paragraph position="3"> Compare the two sentences in (1).</Paragraph> <Paragraph position="4"> (1) a. Mary-wa John-ga nagusame-ta. 'As for Mary, John consoled her.' (-wa = TOPIC, nagusame- 'console' <-STATIVE>, -ta = PAST) b. Mary-wa John-ga wakar-ta. 'As for Mary, she understood John.' 
(wakar- 'understand' <+STATIVE>)</Paragraph> <Paragraph position="6"> An ATN-based parser cannot positively identify the functions of the two noun phrases of these sentences until it processes the verb complex at the end of the sentence.</Paragraph> <Paragraph position="7"> Examples like (2) also illustrate how little can be deduced from postnominal suffixes before the sentence-final verb complex is processed.</Paragraph> <Paragraph position="8"> (2) a. Mary-ga hon-o kaw-ta. 'Mary bought a book.' (kaw- 'buy') b. John-ga Mary-ni hon-o kaw-sase-ta. 'John made Mary buy a book.' (-sase- = CAUSE) c. Mary-ga John-ni hon-o kaw-sase-rare-ta. 'Mary was made by John to buy a book.' (-rare- = PASSIVE) The agent of the embedded sentence is marked by -ni in (2b), but by -ga in (2c).</Paragraph> <Paragraph position="9"> The relatively free word order of Japanese further complicates the situation, as in the six sentences listed in (3), which are all grammatical and all mean &quot;Mary was made by John to buy a book&quot;, but each with a different noun phrase given prominence.</Paragraph> <Paragraph position="10"> (3) a. Mary-ga John-ni hon-o kaw-sase-rare-ta. = (2c) b. Mary-ga hon-o John-ni kaw-sase-rare-ta. c. John-ni Mary-ga hon-o kaw-sase-rare-ta. d. John-ni hon-o Mary-ga kaw-sase-rare-ta.</Paragraph> <Paragraph position="11"> e. Hon-o Mary-ga John-ni kaw-sase-rare-ta.</Paragraph> <Paragraph position="12"> f. Hon-o John-ni Mary-ga kaw-sase-rare-ta.</Paragraph> </Section> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1.2 EMBEDDED SENTENCES </SectionTitle> <Paragraph position="0"> Embedded sentences in languages like Japanese pose more serious problems because they do not normally carry any sign to mark their beginning. As a result, the beginning of a deeply embedded sentence can look exactly like the beginning of a simple top-level sentence, as illustrated in (4).</Paragraph> <Paragraph position="1"> (4) a. Mary-ga sotugyoo-si-ta. 
'Mary was graduated (from school).' (sotugyoo-si- 'be graduated') b. Mary-ga sotugyoo-si-ta kookoo-ga zensyoo-si-ta. 'The high school from which Mary was graduated was burnt down.' (kookoo 'high school', zensyoo-si- 'be burnt down') c. Mary-ga sotugyoo-si-ta kookoo-ga zensyoo-si-ta to iw-ru. 'It is reported that the high school from which Mary was graduated was burnt down.' (to = END-OF-QUOTE, iw- 'say', -ru = NONPAST) d. Mary-ga sotugyoo-si-ta kookoo-ga zensyoo-si-ta to iw-ru sirase-o uke-ta. '(I/we/you/he/she/they) received news (which says) that the high school from which Mary was graduated was burnt down.' (sirase 'news', uke- 'receive') e. Mary-ga sotugyoo-si-ta kookoo-ga zensyoo-si-ta to iw-ru sirase-o uke-ta Cindy-ga nak-te i-ru. 'Cindy, who received news that the high school from which Mary was graduated was burnt down, is crying.' (nak-te i- 'be crying, be weeping') In order to process the sentences listed in (4), the NP network of an ATN-based parser must be expanded by prefixing to it another state with two arcs leaving from it: a PUSH SENTENCE arc that processes a relative clause, and a JUMP arc that processes noun phrases that do not include a relative clause.</Paragraph> <Paragraph position="2"> However, as (4) illustrates, there is no systematic way to determine which of the two arcs leaving the first state of this expanded NP network should be taken when the parser encounters the first word of the input. The parser cannot predict the correct path until it has completed processing the entire sentence or the entire relative clause and has seen what followed it. 
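The arc-choice problem can be made concrete with a small illustrative sketch (not part of the paper's system, and written here in Python rather than the paper's LISP): if each of n noun phrases can begin either a relative clause (PUSH SENTENCE) or a plain noun phrase (JUMP), and the input gives no advance clue, an ATN-style parser must entertain every sequence of choices.

```python
# Illustration only: enumerate the arc-choice sequences an ATN-style
# parser would have to consider when each of n noun phrases may start
# either a relative clause (PUSH SENTENCE) or a plain NP (JUMP), with
# nothing in the input to decide between the two in advance.
from itertools import product

def arc_sequences(n):
    """Every way to choose PUSH SENTENCE or JUMP at n noun phrases."""
    return list(product(("PUSH SENTENCE", "JUMP"), repeat=n))

for n in (1, 2, 3, 10):
    print(n, len(arc_sequences(n)))  # the count doubles with each NP
```

Since embedding depth is unbounded, n is unbounded, which is the sense in which the number of possible arc combinations is theoretically infinite.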
Because there is theoretically no limit to the number of levels of relative clause embedding, the number of combinations of possible arcs to be traversed is theoretically infinite.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 OVERVIEW OF PATTERN ORIENTED PARSER (POP) </SectionTitle> <Paragraph position="0"> This section presents a quick overview of Pattern Oriented Parser (POP), which I have developed in order to cope with the kind of difficulties mentioned in the previous section.</Paragraph> <Paragraph position="1"> POP is a left-to-right, bottom-up parser consisting of three data bases, a push-down STACK, a buffer, a register, and a set of LISP programs collectively called here the PROCESSOR that builds the parse tree of the input sentence. The relationship of these components is shown schematically in (5).</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> (5) Schematic of POP components: LEXICON, SNP, PHP, STACK, INPUT BUFFER, LNP REGISTER, PROCESSOR </SectionTitle> <Paragraph position="0"> The SNP (Sentence Pattern data base) contains a set of parse tree frames, each of which is associated with one class of verbs or verbal derivational suffixes and includes information about the syntactic subcategorization of the members of that class and information about the thematic roles of their arguments. For example, the SNP entry for a class of English verbs which includes buy and sell looks like (6).</Paragraph> <Paragraph position="2"> The PHP (Phrase Pattern data base) contains information about the internal structure of noun phrases and adverbial phrases and the procedures for building the parse trees of such phrases. For example, (7) is an English translation of the PHP entry for a Japanese noun phrase which contains a relative clause. 
(7) If the CWS is an NP and the TOS is an S, then construct the following noun phrase and push it to the STACK: (NP (HEAD CWS) (MOD (rep_emn TOS CWS))) - CWS is the word or phrase on which the PROCESSOR is currently working.</Paragraph> <Paragraph position="3"> - TOS is the word or phrase at the top of the STACK.</Paragraph> <Paragraph position="4"> - (rep_emn TOS X) means &quot;pop the TOS and attach X to its first matching empty node&quot;.</Paragraph> <Paragraph position="5"> - Each non-empty NP node is given a new index number when it is constructed.</Paragraph> <Paragraph position="6"> Details of how (7) works will be illustrated in section 3. The push-down STACK of POP stores partially assembled parse trees and tree fragments, while LNP or the &quot;Last NP&quot; REGISTER temporarily stores a copy of the noun phrase most recently attached to a node in the sentence tree. LNP is necessary to process a noun phrase with a modifier that follows the head noun (e.g., English noun phrases which contain relative clauses). The present version of POP for Japanese does not use an LNP; however, it will prove useful when we try to process parenthetical phrases. 
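The PHP rule in (7) and its rep_emn helper can be rendered as a minimal Python sketch; the nested-list tree encoding and the plain-append attachment are assumptions of this illustration (the actual system is written in LISP and merges elements with UNION), and the sample sentence fragment is hypothetical.

```python
# Sketch of PHP rule (7), with trees as nested Python lists echoing
# the paper's LISP notation.  An empty node is a list whose first
# element is "*".  Attachment here is a plain append; the original
# uses the LISP function UNION to avoid duplicating shared elements.

def rep_emn(tree, item):
    """(rep_emn TREE ITEM): attach ITEM to the first empty node found
    by a depth-first search for "*", then remove the asterisk."""
    if isinstance(tree, list):
        if tree and tree[0] == "*":
            tree.pop(0)        # the node is no longer empty
            tree.append(item)  # attach the element
            return True
        return any(rep_emn(child, item) for child in tree)
    return False

def php_rule_7(stack, cws):
    """If the CWS is an NP and the TOS is an S, build
    (NP (HEAD CWS) (MOD (rep_emn TOS CWS))) and push it."""
    tos = stack.pop()
    rep_emn(tos, cws)
    np = ["NP", ["HEAD", cws], ["MOD", tos]]
    stack.append(np)
    return np

# Hypothetical example: a relative-clause S with one empty NP node on
# the stack, and a freshly assembled head NP as the CWS.
stack = [["S", ["*", "NP"], ["V", "console"]]]
php_rule_7(stack, ["NP2", "teacher"])
print(stack[-1])
```

The depth-first search mirrors the matching procedure described in section 2: the node itself is checked for a leading asterisk before its children, left to right.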
The INPUT BUFFER stores the input sentence.</Paragraph> <Paragraph position="7"> The three data bases of POP are stored on disk and can be updated independently of each other and of the PROCESSOR, while the buffer, the stack and the register are created by the PROCESSOR each time it is invoked.</Paragraph> <Paragraph position="8"> The major program modules (functions) that constitute the PROCESSOR and their hierarchical calling paths are presented in (8), where the parameters are enclosed in parentheses.</Paragraph> <Paragraph position="9"> (8) Major Functions of the PROCESSOR</Paragraph> <Paragraph position="11"> where CWS = the word or the phrase which the PROCESSOR is currently working on SNA = address of a sentence pattern stored in the SNP The PROCESSOR is activated when its top-level function, PARSE-SENTENCE, is called with the input sentence as its parameter. PARSE-SENTENCE then creates the STACK, the INPUT BUFFER and the LNP-REGISTER in the memory, puts the input sentence into the INPUT BUFFER, and calls PARSE-WORD. PARSE-WORD searches the LEXICON for an entry which matches the first word in the INPUT BUFFER and, when it is found, calls either ASSEMBLE-NP or ASSEMBLE-SENTENCE, depending on the word type of the entry it finds in the LEXICON, assembles a sub-tree, and pushes the result to the STACK. After that, PARSE-WORD removes the first word from the INPUT BUFFER and repeats the same process with the next word. In the course of assembling sub-trees, ASSEMBLE-NP uses the PHP, and ASSEMBLE-SENTENCE uses the SNP and the PHP as their data bases. 
This process continues until the INPUT BUFFER contains only the end-of-sentence mark (EOS), when PARSE-WORD returns control to PARSE-SENTENCE, which pops the assembled sentence from the STACK and sends it to the output device, removes the stack, the buffer and the register from memory, and exits successfully.</Paragraph> <Paragraph position="12"> As shown in section 3, POP assembles a parse tree primarily by attaching terminal elements (copies of lexical entries) or tree fragments popped from the STACK to the first matching empty node of the matrix tree. All empty nodes of tree frames have an asterisk as their first element, followed by various specifications for matching requirements: (* ga (NP <+HUMAN>)) is an empty node for an NP which has a feature specification <+HUMAN> and is flagged with ga. To find the first matching empty node, the PROCESSOR conducts a depth-first search for &quot;*&quot; followed by other conditions, and when the first matching empty node is found, it attaches the specified element to that node using the LISP function UNION, thus preventing overlapping elements from being duplicated in the resultant branch. After the attachment is completed, the asterisk is removed from the node.</Paragraph> <Paragraph position="13"> The use of the LNP REGISTER will be illustrated in subsection 3.3.</Paragraph> </Section> <Section position="6" start_page="0" end_page="23" type="metho"> <SectionTitle> 3 OPERATION OF POP </SectionTitle> <Paragraph position="0"> This section illustrates the operation of POP in more detail. Subsection 3.1 is a quick walk-through of the overall operation, using a simple yes/no question in English as an example, while subsection 3.2 illustrates how POP handles the inherent problems of left-branching languages discussed in section 1, using the Japanese examples presented in that section. 
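As a bridge to the walk-throughs, the top-level control flow described in section 2 (PARSE-SENTENCE driving PARSE-WORD until only EOS remains) can be sketched in Python; the data shapes and the toy PARSE-WORD below are assumptions of this illustration, not the original LISP code.

```python
# Sketch of the PROCESSOR's top-level control flow: PARSE-SENTENCE
# creates the STACK and INPUT BUFFER, drives PARSE-WORD until only the
# end-of-sentence mark remains, then pops the finished parse tree.

def parse_sentence(words, parse_word):
    stack = []
    buffer = list(words) + ["EOS"]
    while buffer[0] != "EOS":
        word = buffer.pop(0)     # remove the first word from the buffer
        parse_word(word, stack)  # assemble a sub-tree and push it
    return stack.pop()           # the completed parse

# A toy PARSE-WORD (hypothetical) that just tags and pushes each word,
# enough to show the loop structure without the LEXICON machinery:
def toy_parse_word(word, stack):
    stack.append(("WORD", word))

print(parse_sentence(["John"], toy_parse_word))  # ('WORD', 'John')
```

In the full system, PARSE-WORD would consult the LEXICON and dispatch to ASSEMBLE-NP or ASSEMBLE-SENTENCE instead of pushing a bare tagged word.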
Then we turn our attention to English again in subsection 3.3 and illustrate POP's handling of English wh-questions and relative clauses.</Paragraph> <Section position="1" start_page="0" end_page="23" type="sub_section"> <SectionTitle> 3.1 SIMPLE ENGLISH EXAMPLE </SectionTitle> <Paragraph position="0"> Our first example is (9).</Paragraph> <Paragraph position="1"> (9) Did John buy a good book in Boston? When PARSE-SENTENCE calls PARSE-WORD and the latter finds did in the LEXICON, it makes a copy of the matching lexical entry, (V <+PAST>), and pushes it to the STACK. The next word that PARSE-WORD finds in the INPUT BUFFER is John. Therefore, PARSE-WORD searches the LEXICON and gets a copy of the entry that matches this word, (&quot;John&quot;), which is a noun. 3 Whenever PARSE-WORD encounters a noun, it calls ASSEMBLE-NP with a copy of the lexical entry as its argument. ASSEMBLE-NP assembles a new noun phrase (NP1 &quot;John&quot;), and then it calls CHECK-PHP with the newly assembled NP1 as its argument. CHECK-PHP then examines the PHP data base, and returns NIL to ASSEMBLE-NP because it finds no pattern that matches the string {<V, +PAST> NP} (i.e., the TOS followed by the CWS). Because CHECK-PHP failed to find any matching entry of the PHP, ASSEMBLE-NP pushes NP1 to the STACK without conducting any further assembling operation, and returns control to PARSE-WORD. The contents of the STACK at this time are shown in (10).
Therefore, ASSEMBLE-SENTENCE retrieves a copy of the sentence pattern from the address matching the verb's SNA and attaches the verb's remaining lexical entry to its first empty V node (i.e., the first node whose CAR is &quot;*&quot; and whose second member is &quot;V&quot;). It then removes the &quot;*&quot; from that node. As mentioned in section 2, the SNP entry for the class of verbs like buy and sell is (6). Therefore, by attaching (V <&quot;buy&quot;>) to the V node of its copy, ASSEMBLE-SENTENCE constructs (11).</Paragraph> <Paragraph position="5"> ASSEMBLE-SENTENCE then pops the TOS, attaches it to the first empty node matching its specifications, and removes the asterisk at the beginning of that node. The result is (12).</Paragraph> <Paragraph position="7"> ASSEMBLE-SENTENCE pops the TOS again. This time, it is (V <+PAST>). ASSEMBLE-SENTENCE then examines the PHP and finds two entries, (13) and (14), whose conditions match the current state. (13) If the element popped is a V and if it contains no feature other than tense, number, and/or person, attach it to the V node of the S tree which ASSEMBLE-SENTENCE is currently building.</Paragraph> <Paragraph position="8"> (14) If there is a tense feature in the element that is popped immediately after the AGNT node (or the OBJ node if the tree has no AGNT node) is filled, attach feature <Q> (i.e., &quot;question&quot;) to the main verb of the matrix S.</Paragraph> <Paragraph position="9"> ASSEMBLE-SENTENCE executes (13) and (14). The result is (15).</Paragraph> <Paragraph position="11"> The STACK is now empty. Therefore, ASSEMBLE-SENTENCE pushes (15) to the STACK and returns control to PARSE-WORD.</Paragraph> <Paragraph position="12"> PARSE-WORD removes buy from the INPUT BUFFER, encounters the indefinite article a, gets a copy of the matching lexical entry (DET <-DEF>) from the LEXICON, and pushes it to the STACK. The next word that PARSE-WORD sees is good. 
So a copy of its matching lexical entry (ADJ &quot;good&quot;) is pushed to the STACK and good is removed from the INPUT BUFFER.</Paragraph> <Paragraph position="13"> PARSE-WORD then finds book in the INPUT BUFFER. Because it is a noun, PARSE-WORD calls ASSEMBLE-NP, which assembles a single-word NP and routinely calls CHECK-PHP. This time, CHECK-PHP finds (16) in the PHP.</Paragraph> <Paragraph position="14"> (16) If the CWS is an NP and if the TOS is an ADJ, assemble: (NP (HEAD CWS) (MOD (pop TOS))) At this time, the TOS is (ADJ &quot;good&quot;). Therefore, ASSEMBLE-NP pops it, assembles a new noun phrase in accordance with (16), and calls CHECK-PHP again. The new TOS is (DET <-DEF>). CHECK-PHP finds (17) in the PHP, which matches this situation. (17) If the CWS is an NP and if the TOS is a DET, assemble:</Paragraph> </Section> </Section> <Section position="7" start_page="23" end_page="24" type="metho"> <SectionTitle> (DET <-DEF>)) </SectionTitle> <Paragraph position="0"> Because (18) is an NP, ASSEMBLE-NP calls CHECK-PHP again. This time, the TOS is (15), which is an S tree. CHECK-PHP finds a matching entry in the PHP again, which is (19).</Paragraph> <Paragraph position="1"> (19) If CWS = NP and TOS = S, pop the TOS and attach the CWS to its first matching empty node.</Paragraph> <Paragraph position="2"> What is involved here is the assembly of an S, which is outside the domain of ASSEMBLE-NP's responsibility.</Paragraph> <Paragraph position="3"> Therefore, before popping the S from the STACK, ASSEMBLE-NP returns the symbol &quot;AS&quot; to PARSE-WORD. PARSE-WORD then calls ASSEMBLE-SENTENCE, substituting (18) for the parameter CWS and &quot;TOS&quot; for the parameter SNA. ASSEMBLE-SENTENCE then builds (20) in the manner explained earlier. 
The STACK is now empty, and there is no matching entry in the PHP. Therefore, ASSEMBLE-SENTENCE pushes (20) to the STACK and returns control to PARSE-WORD.</Paragraph> </Section> <Section position="8" start_page="24" end_page="25" type="metho"> <SectionTitle> (DET <-DEF>)))) </SectionTitle> <Paragraph position="0"> The next thing PARSE-WORD sees in the INPUT BUFFER is EOS (end-of-sentence symbol). Therefore, it returns control to PARSE-SENTENCE, which pops (20) from the STACK, and sends it to the output device. Nothing is left in the STACK now. Therefore, PARSE-SENTENCE removes the stack, the buffer and the register from memory and exits successfully.</Paragraph> <Section position="1" start_page="24" end_page="25" type="sub_section"> <SectionTitle> 3.2 JAPANESE EXAMPLES </SectionTitle> <Paragraph position="0"> This section illustrates how POP handles the problems discussed in section 1, beginning with sentence (21) Mary-wa John-ga nagusame-ta (= (1a)). When the first word, Mary-wa, is read, PARSE-WORD retrieves from the LEXICON a copy of the entry which matches the stem of this word, and calls ASSEMBLE-NP because Mary is a noun. ASSEMBLE-NP assembles (NP1 &quot;Mary&quot;), and places its suffix -wa in front of the newly assembled NP as its flag. Then CHECK-PHP is called, but it returns NIL because the STACK is still empty. Therefore, ASSEMBLE-NP pushes (wa (NP1 &quot;Mary&quot;)) to the STACK. The second word, John-ga, is processed in the same way, and (ga (NP2 &quot;John&quot;)) is also pushed to the STACK.</Paragraph> <Paragraph position="1"> PARSE-WORD then encounters nagusame-ta and identifies it as the verb &quot;console&quot; with a past tense suffix. Therefore, PARSE-WORD retrieves a copy of its SNP pattern using the SNA included in the lexical entry, and attaches the lexical entry of nagusame-ta to its empty V node. The result is (22).</Paragraph> <Paragraph position="3"> ASSEMBLE-SENTENCE then pops the TOS (ga (NP2 &quot;John&quot;)) and attaches it to the first matching empty node, namely, the AGNT node. The case flag ga, which is no longer necessary, is removed.</Paragraph> <Paragraph position="4"> The next TOS is (wa (NP1 &quot;Mary&quot;)). 
As mentioned in section 1, wa is a suffix that marks the sentence topic. However, there is no sentence pattern stored in the SNP which includes a topic (TPIC) node. Instead, it is created by the following instructions (23) retrieved from the PHP.</Paragraph> <Paragraph position="5"> (23) If the TOS has the flag wa: a. Create a TPIC node which is directly dominated by the topmost S node and attach a &quot;copy&quot; (i.e., the category symbol and its index) of the TOS to this node.</Paragraph> <Paragraph position="6"> b. Attach the TOS to the first matching empty node.</Paragraph> <Paragraph position="7"> As is evident from (1a, 1b), the topic marker wa absorbs both ga and o: i.e., the topicalized NP without any other case flag can match both an NP node which is flagged with o and an NP node which is flagged with ga. Therefore, following (23b), (NP1 &quot;Mary&quot;) is attached to the first (and the only) empty node (PTNT) after (23a) is executed. The result is (24), which is the correct parse tree of (21).</Paragraph> <Paragraph position="9"> 'As for Maryi, John consoled Maryi.' Example (1b) is processed in the same way, producing the correct parse tree (25b), although both the PTNT node and the AGNT node of the SNP pattern associated with the stative verb wakar- 'understand' are flagged by ga, as shown in (25a).</Paragraph> <Paragraph position="10"> (25) a. 
SNP pattern associated with wakar- The parsing of (26b) is a little more complex because it involves the causative suffix -sase-, with which another SNP pattern (29) is associated (simplified here for the sake of legibility).</Paragraph> <Paragraph position="12"> where ACTN = action, Sk = embedded S.</Paragraph> <Paragraph position="13"> When the PROCESSOR processing (26b) encounters the verb kaw-sase-ta 'made to buy', it first retrieves (27) and attaches &quot;buy&quot; to its empty V node to construct the tree frame (30).</Paragraph> <Paragraph position="15"> This tree is then incorporated into (29) to obtain the complex tree frame (31). (There is a meta-rule that removes the case flag of a node in the embedded sentence if the node is co-indexed with a node in the matrix sentence.)</Paragraph> <Paragraph position="17"> By the time the PROCESSOR encounters the verb complex kaw-sase-ta 'caused to buy' and constructs the complex tree frame (31), all three noun phrases of the sentence have already been processed and stored in the STACK, as shown in (32).</Paragraph> <Paragraph position="19"> Therefore, when the tree frame (31) is completed, ASSEMBLE-SENTENCE begins to pop elements from the STACK and to attach them to empty nodes of the tree. First, (o (NP3 &quot;book&quot;)) is popped. The PTNT node of the embedded sentence is the only empty node that matches it, so the popped NP is attached there.</Paragraph> <Paragraph position="20"> Next, (ni (NP2 &quot;Mary&quot;)) is popped, which is attached to the PTNT node of the matrix sentence, and its copy is attached to the co-indexed AGNT node of the embedded sentence. Finally, (ga (NP1 &quot;John&quot;)) is popped and attached to the AGNT node of the matrix sentence. The result is (33), which is the correct parse tree of (26b).</Paragraph> <Paragraph position="21"> At this stage, the contents of the STACK are the same as (32). 
So when they are popped and attached to the matching nodes according to the principle explained above, we obtain the correct parse tree (36).</Paragraph> <Paragraph position="23"> 'Mary was made by John to buy a book.' As mentioned in section 2, Japanese noun phrases containing a relative clause are processed by the PHP entry presented in (7), repeated here in (37). (37) If the CWS is an NP and the TOS is an S, then construct the following noun phrase and push it to the STACK:</Paragraph> <Paragraph position="25"> To illustrate how (37) works, we will trace the noun phrase (38), which is included in all sentences cited in (4b) through (4e).</Paragraph> <Paragraph position="26"> (38) Mary-ga sotugyoo-si-ta kookoo-ga 'The high school from which Mary was graduated' (sotugyoo-si- 'be graduated', -ta = PAST, kookoo 'high school', -ga = case suffix) The SNP pattern associated with sotugyoo-si- is (39).</Paragraph> <Paragraph position="28"> where ABL = ablative and DEF = default.</Paragraph> <Paragraph position="29"> Therefore, when the first two words of (38) are processed, (40) is assembled and pushed to the STACK.</Paragraph> <Paragraph position="31"> If the next item in the INPUT BUFFER were EOS (as in (4a)), the system pops (40) and, finding that the STACK is now empty, attaches the default value &quot;school&quot; to the empty ABL node, and sends the result to the output device. However, what follows the verb in (38) is a noun. 
Therefore, ASSEMBLE-NP assembles (ga (NP2 &quot;high school&quot;)) and calls CHECK-PHP, which finds (37) because the CWS is the noun phrase just assembled and the TOS is (40).</Paragraph> <Paragraph position="32"> In accordance with (37), (40) is popped from the STACK, and a new noun phrase (41) is assembled and pushed to the STACK.</Paragraph> <Paragraph position="34"> There is no backtracking involved here and, by repeating the same process, POP can process nested relative clauses like those cited in (4) from left to right, without facing any combinatorial explosion.</Paragraph> </Section> <Section position="2" start_page="25" end_page="25" type="sub_section"> <SectionTitle> 3.3 WH-QUESTION AND RELATIVE CLAUSE IN ENGLISH </SectionTitle> <Paragraph position="0"> The ATN strategy for parsing wh-questions and relative clauses in English attracted the special attention of many linguists, including Bresnan (1978) and Fodor (1979), because it seemed to support the trace theory and the theory of wh-movement transformation. Therefore, we will conclude the illustration of POP by explaining how it handles them.</Paragraph> </Section> <Section position="3" start_page="25" end_page="25" type="sub_section"> <SectionTitle> 3.3.1 WH-QUESTIONS </SectionTitle> <Paragraph position="0"> No special mechanism is necessary for processing English wh-questions like (42) by POP.</Paragraph> <Paragraph position="1"> (42) a. Who praised John? b. Who did John praise? The SNP pattern associated with the verb praise is (43).</Paragraph> <Paragraph position="3"> First, we will trace the parse of (42a). The first word, who, is processed and the result, (NP1 <+HUMAN, WH, Q>), is pushed to the STACK before the PROCESSOR encounters praised and retrieves a copy of (43) from the SNP. 
Then &quot;praised&quot; is attached to the empty V node of the tree frame, and the TOS is popped and attached to the first matching empty node. Since that NP has the features <WH, Q>, and because the STACK is now empty, the feature <Q> is moved from the NP1 node to the V node. The result is (44).</Paragraph> <Paragraph position="5"> Then John is processed in the normal way, and it is attached to the first (and only) matching node (PTNT), following the ordinary procedure illustrated in section 3.1. The result is the correct parse tree (45).</Paragraph> <Paragraph position="7"> At first sight, parsing (42b) by POP may seem difficult because the object is placed before the subject in this sentence. However, POP processes the sentence using the auxiliary did as a clue, just as humans do. In the same way as POP handled the first word of (42a), it processes who in (42b) by assembling (NP1 <+HUMAN, WH, Q>) and pushing it to the STACK. And in the same way as it handled did in (9), POP assembles (V <+PAST>) and pushes it on top of NP1, after which it processes John and pushes (NP2 &quot;John&quot;) to the STACK.</Paragraph> <Paragraph position="8"> The system then encounters praise and retrieves (43) from the SNP, pops (NP2 &quot;John&quot;) from the STACK, and attaches it to the first matching empty node, which is the AGNT node. Next, (V <+PAST>) is popped, and it is attached to the V node in accordance with (13).</Paragraph> <Paragraph position="9"> Because (V <+PAST>) is an element that is popped immediately after the AGNT node is filled and because it contains a tense feature, the feature <Q> is added to this node in accordance with (14). The result is (46).</Paragraph> <Paragraph position="11"> The TOS is now (NP1 <+HUMAN, WH, Q>), which is popped and attached to the remaining matching node, and its feature <Q> is moved to the V node.
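This movement of the feature <Q> from the wh-NP to the V node when the STACK empties can likewise be sketched as a toy. Feature bundles are modelled as Python sets, and the dictionary layout of the tree is an illustrative assumption of ours, not the paper's representation:

```python
# Toy sketch of <Q>-migration: when the popped NP carries the features
# WH and Q and the STACK is empty, Q migrates from the NP node to the
# V node, marking the clause as a question (cf. trees (44) and (47)).

def attach_wh_np(tree, np_feats, stack):
    """Attach a wh-NP's features, moving Q to V if the STACK is empty."""
    feats = set(np_feats)
    if "Q" in feats and not stack:
        feats.discard("Q")       # Q leaves the NP node ...
        tree["V"].add("Q")       # ... and lands on the V node
    tree["NP1"] = feats
    return tree

tree = {"V": {"+PAST"}, "NP1": set()}
tree = attach_wh_np(tree, {"+HUMAN", "WH", "Q"}, stack=[])
# tree["V"] now carries Q; tree["NP1"] retains +HUMAN and WH.
```

The same step serves both (42a) and (42b): only the order in which the NP is popped differs, not the feature-movement rule itself.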
The result is the correct parse tree (47).</Paragraph> </Section> </Section> <Section position="9" start_page="25" end_page="25" type="metho"> <SectionTitle> 3.3.2 RELATIVE CLAUSE </SectionTitle> <Paragraph position="0"> As an example of English sentences which include relative clauses, we will examine (48).</Paragraph> <Paragraph position="1"> (48) Joan loves the brilliant linguist who the students respect.</Paragraph> <Paragraph position="2"> The first two words are processed and the partial tree (49) is constructed in the usual way, and it is pushed to the STACK.</Paragraph> <Paragraph position="3"> (49) (S (V <&quot;love&quot;, -PAST>)</Paragraph> <Paragraph position="5"> The next three words (the, brilliant, linguist) are processed in the ordinary way, and, following the PHP instructions cited in (16) and (17), they are assembled into noun phrase (50) and attached to the empty PTNT node of (49). The result is (51), and NP4 is the content of the LNP REGISTER: The next word (who) is read in. Its lexical entry includes the feature <WH>, and the TOS is (51). Therefore, CHECK-PHP finds (52), which matches these conditions. (52) If the CWS has a feature <WH> and if the TOS is an S, then (mark TOS) and (setq CWS (list (copyi MARKED) '<REL>)) where - (mark TOS) marks the constituent of the TOS that is equal to the content of the LNP REGISTER - MARKED represents the constituent of the TOS thus marked - (copyi X) returns the category index of X.</Paragraph> <Paragraph position="1"> When (52) is applied, the CWS becomes (53), which is pushed to the STACK.</Paragraph> <Paragraph position="2"> (53) (NP4 <REL>) The next two words, the and students, are processed, and the result (54) is pushed to the STACK in accordance with (17).</Paragraph> <Paragraph position="3"> Then remove the mark from MARKED and remove feature <REL> from the CWS.</Paragraph> <Paragraph position="4"> Before (58) is applied, the CWS is (57) and the TOS is (51), of which NP4 is marked in accordance with (52). Following (58), therefore, the daughter of the PTNT node of (51) is replaced by (59).</Paragraph> <Paragraph position="5"> The next element found in the INPUT BUFFER is EOS (end-of-sentence). So the PROCESSOR pops (60) and sends it to the output device.</Paragraph> </Section></Paper>