<?xml version="1.0" standalone="yes"?> <Paper uid="J87-1004"> <Title>AN EFFICIENT AUGMENTED-CONTEXT-FREE PARSING ALGORITHM 1</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 THE CONTEXT-FREE PARSING ALGORITHM </SectionTitle> <Paragraph position="0"> The LR parsing algorithms (Aho and Ullman 1972, Aho and Johnson 1974) were developed originally for programming languages. An LR parsing algorithm is a Copyright 1987 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided that the copies are not made for direct commercial advantage and the CL reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission.</Paragraph> <Paragraph position="1"> 0362-613X/87/010031-46503.00 Computational Linguistics, Volume 13, Numbers 1-2, January-June 1987 31 Masaru Tomita An Efficient Augmented-Context-Free Parsing Algorithm shift-reduce parsing algorithm deterministically guided by a parsing table indicating what action should be taken next. The parsing table can be obtained automatically from a context-free phrase structure grammar, using an algorithm first developed by DeRemer (1969, 1971). We do not describe the algorithms here, referring the reader to chapter 6 in Aho and Ullman (1977). We assume that the reader is familiar with the standard LR parsing algorithm (not necessarily with the parsing table construction algorithm).</Paragraph> <Paragraph position="2"> The LR paring algorithm is one of the most efficient parsing algorithms. It is totally deterministic, and no backtracking or search is involved. Unfortunately, we cannot directly adopt the LR parsing technique for natural languages, because it is applicable only to a small subset of context-free grammars called LR grammars, and it is almost certain that any practical natural language grammars are not LR. If a grammar is non-LR, its parsing table will have multiple entries; 1 one or more of the action table entries will be multiply defined (Shieber 1983). Figures 2.1 and 2.2 show an example of a non-LR grammar and its parsing table. Grammar symbols starting with &quot;*&quot; represent pre-terminals. Entries &quot;sh n&quot; in the action table (the left part of the table) indicate the action &quot;shift one word from input buffer onto the stack, and go to state n&quot;. Entries &quot;re n&quot; indicate the action &quot;reduce constituents on the stack using rule n&quot;. The entry &quot;ace&quot; stands for the action &quot;accept&quot;, and blank spaces represent &quot;error&quot;. The goto table (the right part of the table) decides to what state the parser should go after a reduce action. These operations shall become clear when we trace the algorithm with example sentences in section 4. The exact definition and operation of the LR parser can be found in Aho and Ullman (1977).</Paragraph> <Paragraph position="3"> We can see that there are two multiple entries in the action table; on the rows of state 11 and 12 at the (1) S --> NP VP (2) S --> S PP (3) NP --> *n (4) NP --> *det *n (5) NP --> NP PP (6) PP --> *prep NP (7) VP --> *v NP column labeled &quot;*prep&quot;. Roughly speaking, this is the situation where the parser encounters a preposition of a PP right after a NP. If this PP does not modify the NP, then the parser can go ahead to reduce the NP into a higher nonterminal such as PP or VP, using rule 6 or 7, respectively (re6. 
<Paragraph position="4"> It has been thought that multiple entries are fatal for LR parsing, because once a parsing table has multiple entries, deterministic parsing is no longer possible and some kind of non-determinism is necessary. We handle multiple entries with a special technique, a graph-structured stack. In order to introduce the concept, we first give a simpler form of non-determinism, and then refine it step by step. Subsection 2.1 describes a simple and straightforward non-deterministic technique, pseudo-parallelism (breadth-first search), in which the system maintains a number of stacks simultaneously, called the stack list. A disadvantage of the stack list is then described. The next subsection describes the idea of stack combination, which was introduced in the author's earlier research (Tomita 1984), and which makes the algorithm much more efficient. With this idea, stacks are represented as trees (or a forest). Finally, a further refinement, the graph-structured stack, is described, which makes the algorithm even more efficient; efficient enough to run in polynomial time.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2.1 HANDLING MULTIPLE ENTRIES WITH STACK LIST </SectionTitle> <Paragraph position="0"> The simplest idea would be to handle multiple entries non-deterministically. We adopt pseudo-parallelism (breadth-first search), maintaining a list of stacks (the Stack List). The pseudo-parallelism works as follows.</Paragraph> <Paragraph position="1"> A number of processes are operated in parallel. Each process has a stack and behaves basically the same way as in standard LR parsing. When a process encounters a multiple entry, the process is split into several processes (one for each entry), by replicating its stack. When a process encounters an error entry, the process is killed, by removing its stack from the stack list. All processes are synchronized; they shift a word at the same time, so that they always look at the same word. Thus, if a process encounters a shift action, it waits until all other processes also encounter a (possibly different) shift action.</Paragraph>
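The pseudo-parallel scheme can be made concrete with a small sketch. The fragment below is illustrative only: it assumes the table encoding given earlier, a grammar table mapping each rule number to its left-hand side and right-hand-side length, and stacks simplified to lists of state numbers (the symbol vertices are omitted).

    # Illustrative breadth-first simulation of non-deterministic LR
    # parsing with a stack list. Each stack is one process; a multiple
    # entry splits a process, an empty action list kills it, and all
    # surviving processes shift the same word together.
    def parse_with_stack_list(tokens, ACTION, GOTO, grammar):
        stacks = [[0]]                        # the stack list
        for tok in list(tokens) + ['$']:
            frontier, shifted = stacks, []
            while frontier:
                stack = frontier.pop()
                for act in ACTION.get(stack[-1], {}).get(tok, []):
                    if act[0] == 'shift':
                        shifted.append(stack + [act[1]])
                    elif act[0] == 'reduce':
                        lhs, rhs_len = grammar[act[1]]
                        rest = stack[:len(stack) - rhs_len]
                        frontier.append(rest + [GOTO[rest[-1]][lhs]])
                    else:                     # ('accept',)
                        return True
            stacks = shifted                  # synchronized shift
            if not stacks:
                return False                  # every process was killed
        return False

Because identical stacks are not recognized as such, the same work may be repeated in several processes; this is exactly the inefficiency discussed next.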
<Paragraph position="2"> Figure 2.3 shows a snapshot of the stack list right after shifting the word with in the sentence I saw a man on the bed in the apartment with a telescope, using the grammar in Figure 2.1 and the parsing table in Figure 2.2. For the sake of convenience, we denote a stack with vertices and edges. The leftmost vertex is the bottom of the stack, and the rightmost vertex is the top of the stack. Vertices represented by a circle are called state vertices, and they represent a state number. Vertices represented by a square are called symbol vertices, and they represent a grammar symbol. Each stack is exactly the same as a stack in the standard LR parsing algorithm. The distance between vertices (the length of an edge) does not have any significance, except that it may help the reader understand the status of the stacks. In the figures, &quot;*p&quot; stands for *prep, and &quot;*d&quot; stands for *det, throughout this paper. Since the sentence is 14-way ambiguous, the stack has been split into 14 stacks. For example, the sixth stack (0 S 1 *p 6 NP 11 *p 6) is in the status where I saw a man on the bed has been reduced into S, and the apartment has been reduced into NP. From the LR parsing table, we know that the top of the stack, state 6, is expecting *det or *n and eventually an NP. Thus, after a telescope comes in, a PP with a telescope will be formed, the PP will modify the NP the apartment, and in the apartment will modify the S I saw a man on the bed. We notice that some stacks in the stack list appear to be identical. This is because they have reached the current state in different ways. For example, the sixth and seventh stacks are identical, because I saw a man on the bed has been reduced into S in two different ways.</Paragraph> <Paragraph position="3"> A disadvantage of the stack list method is that there are no interconnections between stacks (processes), and there is no way for a process to utilize what other processes have already done. The number of stacks in the stack list grows exponentially as ambiguities are encountered. For example, the 14 processes in Figure 2.3 parse the last word telescope 14 times in exactly the same way. (Figure 2.3. The stack list after shifting with in I saw a man on the bed in the apartment with a telescope, with the grammar and the table in Figures 2.1 and 2.2.) This can be avoided by using a tree-structured stack, which is described in the following subsection.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 WITH A TREE-STRUCTURED STACK </SectionTitle> <Paragraph position="0"> If two processes are in a common state, that is, if two stacks have a common state number at the rightmost vertex, they will behave in exactly the same manner until the vertex is popped from the stacks by a reduce action.</Paragraph> <Paragraph position="1"> To avoid this redundant operation, these processes are unified into one process by combining their stacks.</Paragraph> <Paragraph position="2"> Whenever two or more processes have a common state number on the top of their stacks, the top vertices are unified, and these stacks are represented as a tree, where the top vertex corresponds to the root of the tree. We call this a tree-structured stack. When the top vertex is popped, the tree-structured stack is split into the original number of stacks. In general, the system maintains a number of tree-structured stacks in parallel, so stacks are represented as a forest. Figure 2.4 shows a snapshot of the tree-structured stack immediately after shifting the word with. In contrast to the previous example, the word telescope will be parsed only once.</Paragraph> <Paragraph position="3"> Although the amount of computation is significantly reduced by the stack combination technique, the number of branches of the tree-structured stack (the number of bottoms of the stack) that must be maintained still grows exponentially as ambiguities are encountered. In the next subsection, we describe a further modification, in which stacks are represented as a directed acyclic graph, in order to avoid this inefficiency.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 WITH A GRAPH-STRUCTURED STACK </SectionTitle> <Paragraph position="0"> So far, when a stack is split, a copy of the whole stack is made. However, we do not necessarily have to copy the whole stack: even after different parallel operations on the tree-structured stack, the bottom portion of the stack may remain the same. Only the necessary portion of the stack should therefore be split. When a stack is split, the stack is thus represented as a tree, where the bottom of the stack corresponds to the root of the tree. With the stack combination technique described in the previous subsection, stacks are then represented as a directed acyclic graph. Figure 2.5 shows a snapshot of the graph-structured stack.</Paragraph>
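A minimal sketch of the vertex structure may help; the class names and list representation below are assumptions of this sketch, not the paper's implementation.

    # Illustrative graph-structured stack vertices. Each vertex keeps a
    # list of predecessor links instead of exactly one, so a split adds
    # links rather than copying the stack, and tops with the same state
    # number are merged into a single vertex.
    class StateVertex:
        def __init__(self, state):
            self.state = state        # LR state number (circle vertex)
            self.preds = []           # symbol vertices immediately below

    class SymbolVertex:
        def __init__(self, forest_node):
            self.node = forest_node   # parse forest pointer (square vertex)
            self.preds = []           # state vertices immediately below

    def find_or_create_top(tops, state):
        # Stack combination: reuse an existing top with the same state.
        for v in tops:
            if v.state == state:
                return v
        v = StateVertex(state)
        tops.append(v)
        return v

Since a vertex may now have several predecessors, a reduce action must enumerate every path of the appropriate length below the top vertex and perform the reduction once per path.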
<Paragraph position="1"> It is easy to show that the algorithm with the graph-structured stack does not parse any part of an input sentence more than once in the same way. This is because, if two processes had parsed a part of a sentence in the same way, they would have been in the same state, and they would have been combined into one process.</Paragraph> <Paragraph position="2"> The graph-structured stack looks very similar to a chart in chart parsing. In fact, one can also view our algorithm as an extended chart parsing algorithm that is guided by LR parsing tables. The major extension is that nodes in the chart contain more information (LR state numbers) than in conventional chart parsing. In this paper, however, we describe the algorithm as a generalized LR parsing algorithm only.</Paragraph> <Paragraph position="3"> So far, we have focused on how to accept or reject a sentence. In practice, however, the parser must not only accept or reject sentences but also build the syntactic structure(s) of the sentence (the parse forest). The next section describes how to represent the parse forest and how to build it with our parsing algorithm.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 AN EFFICIENT REPRESENTATION OF A PARSE FOREST </SectionTitle> <Paragraph position="0"> Our parsing algorithm is an all-path parsing algorithm; that is, it produces all possible parses in case an input sentence is ambiguous. Such all-path parsing is often needed in natural language processing to manage temporarily or absolutely ambiguous input sentences. The ambiguity (the number of parses) of a sentence may grow exponentially as the length of the sentence grows (Church and Patil 1982). Thus, even with an efficient parsing algorithm such as the one we have described, the parser would take exponential time, because exponential time would be required merely to print out all parse trees (the parse forest). We must therefore provide an efficient representation, so that the size of the parse forest does not grow exponentially.</Paragraph> <Paragraph position="1"> This section describes two techniques for providing an efficient representation: subtree sharing and local ambiguity packing. It should be mentioned that these two techniques are not completely new ideas; some existing systems (e.g., Earley's (1970) algorithm) have already adopted them, either implicitly or explicitly.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 SUB-TREE SHARING </SectionTitle> <Paragraph position="0"> If two or more trees have a common subtree, the subtree should be represented only once. For example, the parse forest for the sentence I saw a man in the park with a telescope should be represented as in Figure 3.1. (Figure 3.1. Unpacked shared forest.)</Paragraph> <Paragraph position="1"> To implement this, we no longer push grammar symbols on the stack; instead, we push pointers to nodes of the shared forest. When the parser &quot;shifts&quot; a word, it creates a leaf node labeled with the word and the pre-terminal, and, instead of the pre-terminal symbol, a pointer to the newly created leaf node is pushed onto the stack. If the exact same leaf node (i.e., a node labeled with the same word and the same pre-terminal) already exists, a pointer to this existing node is pushed onto the stack, without creating another node. When the parser &quot;reduces&quot; the stack, it pops pointers from the stack, creates a new node whose successor nodes are those pointed to by the popped pointers, and pushes a pointer to the newly created node onto the stack.</Paragraph>
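In sketch form (the Node class and leaf table are assumptions of this sketch, not the paper's code), the shift-time and reduce-time node operations look like this:

    # Illustrative subtree sharing. Leaf nodes are kept in a table so
    # that an identical (word, pre-terminal) leaf is created only once;
    # the parser's stack holds pointers (here, Node references), never
    # grammar symbols themselves.
    class Node:
        def __init__(self, label, children=()):
            self.label = label
            self.children = tuple(children)

    leaf_table = {}

    def shift_node(word, preterminal):
        key = (word, preterminal)
        if key not in leaf_table:                   # create at most once
            leaf_table[key] = Node(preterminal, [Node(word)])
        return leaf_table[key]                      # pointer pushed on the stack

    def reduce_node(lhs, popped_pointers):
        # The successors of the new node are the nodes popped off the stack.
        return Node(lhs, popped_pointers)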
<Paragraph position="2"> Using this relatively simple procedure, our parsing algorithm can produce the shared forest as its output without any other special book-keeping mechanism, because it never does the same reduce action twice in the same manner.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 LOCAL AMBIGUITY PACKING </SectionTitle> <Paragraph position="0"> We say that two or more subtrees represent local ambiguity if they have common leaf nodes and their top nodes are labeled with the same non-terminal symbol. That is to say, a fragment of a sentence is locally ambiguous if the fragment can be reduced to a certain non-terminal symbol in two or more ways. If a sentence has many local ambiguities, the total ambiguity grows exponentially. To avoid this, we use a technique called local ambiguity packing, which works in the following way. The top nodes of subtrees that represent local ambiguity are merged and treated by higher-level structures as if there were only one node. Such a node is called a packed node, and the nodes before packing are called subnodes of the packed node. An example of a shared-packed forest is shown in Figure 3.2. Packed nodes are represented by boxes. We have three packed nodes in Figure 3.2: one with three subnodes and two with two subnodes.</Paragraph> <Paragraph position="1"> Local ambiguity packing can be implemented easily with our parsing algorithm, as follows. In the graph-structured stack, if two or more symbol vertices have a common state vertex immediately on their left and a common state vertex immediately on their right, they represent local ambiguity. The nodes pointed to by these symbol vertices are to be packed as one node. In Figure 2.5, for example, we see one 5-way local ambiguity and two 2-way local ambiguities. The algorithm is made clear by the example in the following section.</Paragraph>
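A sketch of the packing step follows; the left_state and right_state attributes, and the Node class from the previous sketch, are assumptions of this illustration, not the paper's implementation.

    # Illustrative local ambiguity packing. Symbol vertices that share
    # both the state vertex on their left and the state vertex on their
    # right describe the same fragment reduced to the same non-terminal,
    # so the forest nodes they point to are merged under one packed node.
    def pack_local_ambiguities(symbol_vertices):
        groups = {}
        for v in symbol_vertices:
            key = (id(v.left_state), id(v.right_state), v.node.label)
            groups.setdefault(key, []).append(v)
        for group in groups.values():
            if len(group) > 1:
                packed = Node(group[0].node.label)
                packed.subnodes = [v.node for v in group]  # the packed node
                for v in group:
                    v.node = packed      # higher levels see a single node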
<Paragraph position="2"> Recently, the author (Tomita 1986) suggested a technique to disambiguate a sentence out of the shared-packed forest representation by asking the user a minimal number of questions in natural language (without showing any tree structures).</Paragraph> </Section> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 EXAMPLES </SectionTitle> <Paragraph position="0"> This section presents three examples. The first example, using the sentence I saw a man in the apartment with a telescope, is intended to help the reader understand the algorithm more clearly.</Paragraph> <Paragraph position="1"> The second example, with the sentence That information is important is doubtful, is presented to demonstrate that our algorithm is able to handle multi-part-of-speech words without any special mechanism. In the sentence, that is a multi-part-of-speech word, because it could also be a determiner or a pronoun.</Paragraph> <Paragraph position="2"> The third example is provided to show that the algorithm is also able to handle unknown words, by considering an unknown word as a special multi-part-of-speech word whose part of speech can be anything. We use the example sentence I * a *, where each * represents an unknown word.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 THE EXAMPLE </SectionTitle> <Paragraph position="0"> This subsection gives a trace of the algorithm with the grammar in Figure 2.1, the parsing table in Figure 2.2, and the sentence I saw a man in the park with a telescope.</Paragraph> <Paragraph position="1"> At the very beginning, the stack contains only one vertex, labeled 0, and the parse forest contains nothing. By looking at the action table, the next action, &quot;shift 4&quot;, is determined, as in standard LR parsing.</Paragraph> <Paragraph position="2"> When shifting the word I, the algorithm creates a leaf node in the parse forest labeled with the word I and its preterminal *n, and pushes a pointer to the leaf node onto the stack. The next action, &quot;reduce 3&quot;, is determined from the action table. [Stack and parse forest snapshot; Next Word = 'saw'.]</Paragraph> <Paragraph position="3"> We reduce the stack basically in the same manner as in standard LR parsing. The algorithm pops the top vertex &quot;4&quot; and the pointer &quot;0&quot; from the stack, and creates a new node in the parse forest whose successor is the node pointed to by that pointer. The newly created node is labeled with the left-hand side symbol of rule 3, namely &quot;NP&quot;. The pointer to this newly created node, namely &quot;1&quot;, is pushed onto the stack. The action &quot;shift 7&quot; is determined as the next action. [Stack and parse forest snapshot; Next Word = 'saw'.]</Paragraph> <Paragraph position="4"> After executing &quot;shift 7&quot;, the next action is &quot;reduce 4&quot;. It pops the pointers &quot;3&quot; and &quot;4&quot;, and creates a new node in the parse forest such that node 3 and node 4 are its successors. The newly created node is labeled with the left-hand side symbol of rule 4, i.e., &quot;NP&quot;. The pointer to this newly created node, &quot;5&quot;, is pushed onto the stack. [Stack and parse forest snapshot; Next Word = 'in'.]</Paragraph> <Paragraph position="5"> At this point, we encounter a multiple entry, &quot;reduce 7&quot; and &quot;shift 6&quot;, and both actions are to be executed. Reduce actions are always executed first, and shift actions are executed only when there is no remaining reduce action to execute. In this way, the parser works strictly from left to right; it does everything that can be done before shifting the next word. After executing &quot;reduce 7&quot;, the top vertex labeled &quot;12&quot; is not popped away, because it still has an action not yet executed. Such a top vertex, or more generally a vertex with one or more actions yet to be executed, is called &quot;active&quot;. [Stack and parse forest snapshot.] Thus, we have two active vertices in the stack: one labeled &quot;12&quot;, and the other labeled &quot;8&quot;. The action &quot;reduce 1&quot; is determined from the action table, and is associated with the latter vertex.</Paragraph> <Paragraph position="6"> Because reduce actions have a higher priority than shift actions, the algorithm next executes &quot;reduce 1&quot; on the vertex labeled &quot;8&quot;. The action &quot;shift 6&quot; is then determined from the action table. Now we have two &quot;shift 6&quot;s. The parser, however, creates only one new leaf node in the parse forest. After executing the two shift actions, it combines vertices in the stack wherever possible, and &quot;shift 3&quot; is determined from the action table as the next action. [Stack and parse forest snapshot.]</Paragraph> </Section> </Section>
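The control regime just traced, in which every pending reduce action is executed before any shift, can be sketched as follows; execute_reduce and the vertex representation are assumed helpers of this sketch, not the paper's code.

    # Illustrative scheduling of actions on active vertices: every
    # pending reduce is executed before any shift, so the parser does
    # all it can before consuming the next word.
    def process_word(active_vertices, tok, ACTION):
        pending_shifts = []
        while active_vertices:
            v = active_vertices.pop()
            for act in ACTION[v.state].get(tok, []):
                if act[0] == 'reduce':
                    # May create or re-activate vertices (assumed helper).
                    active_vertices.extend(execute_reduce(v, act[1]))
                elif act[0] == 'shift':
                    pending_shifts.append((v, act[1]))
        # The shifts are then executed together, merging new tops that
        # share a state number, as in the trace above.
        return pending_shifts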
<Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 4.2 MANAGING MULTI-PART-OF-SPEECH WORDS </SectionTitle> <Paragraph position="0"> This subsection gives a trace of the algorithm with the sentence That information is important is doubtful, to demonstrate that our algorithm can handle multi-part-of-speech words (in this sentence, that) just like multiple entries, without any special mechanism. We use the following grammar and its parsing table (not shown):
(1) S --> NP VP
(2) NP --> *det *n
(3) NP --> *n
(4) NP --> *that S
(5) VP --> *be *adj
</Paragraph> <Paragraph position="1"> At the very beginning, the parse forest contains nothing, and the stack contains only one vertex, labeled 0. The first word of the sentence is that, which can be categorized as *that, *det, or *n. The action table tells us that all of these categories are legal. Thus, the algorithm behaves as if a multiple entry were encountered: three actions, &quot;shift 3&quot;, &quot;shift 4&quot;, and &quot;shift 5&quot;, are to be executed. [Stack and parse forest snapshot; Next Word = 'that'.]</Paragraph> <Paragraph position="2"> Note that three different leaf nodes have been created in the parse forest. One of the three possibilities, that as a noun, is discarded immediately after the parser sees the next word, information. [Stack and parse forest snapshot; Next Word = 'is'.]</Paragraph> <Paragraph position="3"> After executing the shift actions, only one leaf node has been created in the parse forest this time, because both shift actions regarded the word as belonging to the same category, i.e., noun. Now we have two active vertices, and &quot;reduce 3&quot; is arbitrarily chosen as the next action to execute. [Stack and parse forest snapshot; Next Word = 'is'.]</Paragraph> <Paragraph position="4"> An error action is finally found for the remaining possibility, that as a determiner. After executing &quot;reduce 4&quot;, we have the final configuration. [Stack and parse forest snapshot.] The parser accepts the sentence, and returns &quot;15&quot; as the top node of the parse forest. The forest consists of only one tree, which is the desired structure for That information is important is doubtful.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 MANAGING UNKNOWN WORDS </SectionTitle> <Paragraph position="0"> In the previous subsection, we saw the parsing algorithm handle a multi-part-of-speech word just like multiple entries, without any special mechanism. That capability can also be applied to handle unknown words (words whose categories are unknown). An unknown word can be thought of as a special type of multi-part-of-speech word whose categories can be anything. In the following, we present another trace of the parser, with the sentence I * a *, where each * represents an unknown word. We use the same grammar and parsing table as in the first example (Figures 2.1 and 2.2).</Paragraph>
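Both cases reduce to the same mechanism, sketched below; the category set and lexicon format are assumptions of this sketch.

    # Illustrative handling of multi-part-of-speech and unknown words:
    # each word carries a set of candidate pre-terminals, and a shift
    # is attempted for every candidate, exactly as for a multiple entry
    # in the table. An unknown word simply carries all categories.
    ALL_CATEGORIES = {'*n', '*det', '*v', '*prep', '*that', '*be', '*adj'}

    def categories(word, lexicon):
        return lexicon.get(word, ALL_CATEGORIES)    # unknown word: anything

    def shift_candidates(state, word, lexicon, ACTION):
        candidates = []
        for cat in categories(word, lexicon):
            for act in ACTION[state].get(cat, []):
                if act[0] == 'shift':
                    candidates.append((cat, act[1]))  # one leaf per category
        return candidates

Categories whose shifts lead only to error entries die out exactly as ordinary processes do, which is how the trace below narrows the second * down to a noun.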
<Paragraph position="1"> At the very beginning, we have the initial stack with the single vertex 0. At this point, the parser is looking at the first unknown word, *; in other words, a word whose categories are *det, *n, *v, and *prep. On row 4 of the action table, we have only one kind of action, &quot;reduce 3&quot;. Thus the algorithm executes only the action &quot;reduce 3&quot;, after which we have the next configuration. [Stack and parse forest snapshot; Next Word = '*'.] On row 2 of the action table, there are two kinds of actions, &quot;shift 6&quot; and &quot;shift 7&quot;; this means the unknown word can be shifted only as a preposition or as a verb. [Stack and parse forest snapshot.] At this point, the parser is again looking at the unknown word, *. However, since there is only one kind of entry on row 3 of the action table, we can uniquely determine the category of this unknown word: a noun. After shifting the unknown word as a noun, we have the following configuration. [Stack and parse forest snapshot; Next Word = '$'.] The possibility of the first unknown word being a preposition has now disappeared. The parser accepts the sentence in only one way, and returns &quot;10&quot; as the root node of the parse forest.</Paragraph> <Paragraph position="2"> We have shown that our parsing algorithm can handle unknown words without any special mechanism.</Paragraph> </Section> </Section> <Section position="9" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 EMPIRICAL RESULTS </SectionTitle> <Paragraph position="0"> In this section, we present some empirical results on the algorithm's practical performance. Since space is limited, we show only the highlights of the results, referring the reader to chapter 6 of Tomita (1985) for more detail.</Paragraph> <Paragraph position="1"> Figure 5.1 shows the relationship between the parsing time of the Tomita algorithm and the length of the input sentence, and Figure 5.2 shows a comparison with Earley's algorithm (or active chart parsing), using a sample English grammar that consists of 220 context-free rules and 40 sample sentences taken from actual publications. All programs were run on a DEC-20 and written in MacLisp, but not compiled. Although the experiment is informal, the results show that the Tomita algorithm is about 5 to 10 times faster than Earley's algorithm, due to the pre-compilation of the grammar into the LR table. The Earley/Tomita ratio seems to increase as the size of the grammar grows, as shown in Figure 5.3.</Paragraph> <Paragraph position="2"> Figure 5.4 shows the relationship between the size of a produced shared-packed forest representation (in terms of the number of nodes) and the ambiguity of its input sentence (the number of possible parses). The sample sentences are created from the following schema:
noun verb det noun (prep det noun)^(n-1)
An example sentence with this structure is I saw a man in the park on the hill with a telescope .... The results show that all possible parses can be represented in almost O(log n) space, where n is the number of possible parses of a sentence. Figure 5.5 shows the relationship between the parsing time and the ambiguity of a sentence. Recall that within the given time the algorithm produces all possible parses in the shared-packed forest representation.</Paragraph>
<Paragraph position="3"> It is concluded that our algorithm can parse (and produce a forest for) a very ambiguous sentence with a million possible parses in a reasonable time.</Paragraph> </Section> <Section position="10" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 AUGMENTED CONTEXT-FREE GRAMMARS </SectionTitle> <Paragraph position="0"> So far, we have described the algorithm as a pure context-free parsing algorithm. In practice, it is often desirable for each grammar nonterminal to have attributes, and for each grammar rule to have an augmentation to define, pass, and test the attribute values. It is also desirable to produce a functional structure (in the sense of the functional grammar formalisms (Kay 1984, Bresnan and Kaplan 1982)) rather than the context-free forest.</Paragraph> <Paragraph position="1"> Subsection 6.1 describes the augmentation, and subsection 6.2 discusses the shared-packed representation for functional structures.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.1 THE AUGMENTATION </SectionTitle> <Paragraph position="0"> We attach a Lisp function to each grammar rule for this augmentation. Whenever the parser reduces constituents into a higher-level nonterminal using a phrase structure rule, the Lisp function associated with the rule is evaluated. The Lisp function handles such aspects as construction of a syntactic/semantic representation of the input sentence, passing attribute values among constituents at different levels, and checking syntactic/semantic constraints such as subject-verb agreement.</Paragraph> <Paragraph position="1"> If the Lisp function returns NIL, the parser does not do the reduce action with the rule. If the Lisp function returns a non-NIL value, then this value is given to the newly created non-terminal. The value includes attributes of the nonterminal and a partial syntactic/semantic representation constructed thus far. Notice that these Lisp functions can be precompiled into machine code by the standard Lisp compiler.</Paragraph> </Section>
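The paper's augmentations are Lisp functions; purely as an illustration, the same reduce-time hook can be sketched in the Python used in the earlier fragments (the rule representation and attribute names are assumptions of this sketch).

    # Illustrative augmentation hook at reduce time. The function
    # attached to a rule receives the values of the right-hand-side
    # constituents; returning None (the paper's NIL) vetoes the
    # reduction, and any other value becomes the value of the newly
    # created non-terminal.
    def reduce_with_augmentation(rule, rhs_values):
        value = rule.augmentation(*rhs_values)
        if value is None:            # constraint failed: no reduce action
            return None
        return value                 # attributes plus partial representation

    # Example: an augmentation for S --> NP VP checking subject-verb
    # agreement (attribute names are assumptions).
    def s_np_vp(np, vp):
        if np['agreement'] != vp['agreement']:
            return None
        return {'cat': 'S', 'agreement': vp['agreement'],
                'subj': np, 'pred': vp}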
<Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.2 SHARING AND PACKING FUNCTIONAL STRUCTURES </SectionTitle> <Paragraph position="0"> A functional structure, as used in the functional grammar formalisms (Kay 1984, Bresnan and Kaplan 1982, Shieber 1985), is in general a directed acyclic graph (dag) rather than a tree. This is because some value may be shared by two different attributes in the same sentence (e.g., the &quot;agreement&quot; attributes of the subject and the main verb). Pereira (1985) introduced a method to share dag structures. However, the dag structure sharing method is much more complex and computationally expensive than tree structure sharing. Therefore, we handle only tree-structured functional structures, for the sake of efficiency and simplicity. In the example, the &quot;agreement&quot; attributes of the subject and the main verb may thus have two different values; the identity of these two values is tested explicitly by a test in the augmentation. Sharing tree-structured functional structures requires only a minor modification of the subtree sharing method for the shared-packed forest representation described in subsection 3.1.</Paragraph> <Paragraph position="1"> Local ambiguity packing for augmented context-free grammars is not as easy. Suppose two nodes have been packed into one packed node. Although these two nodes have the same category name (e.g., NP), they may have different attribute values. When a test in the Lisp function refers to an attribute of the packed node, its value may not be uniquely determined. In this case, the parser can no longer treat the packed node as one node, and it will unpack the packed node into two individual nodes again. The question, then, is how often this unpacking needs to take place in practice. The more frequently it takes place, the less significant it is to do local ambiguity packing. However, most sentence ambiguity comes from such phenomena as PP-attachment and conjunction scoping, and these cases are unlikely to require unpacking. For instance, consider the noun phrase a man in the park with a telescope, which is locally ambiguous (whether telescope modifies man or park). Two NP nodes (one for each interpretation) will be packed into one node, but it is unlikely that the two NP nodes have different attribute values that are referred to later by some test in the augmentation. The same argument holds for the noun phrases pregnant women and children and large file equipment. Although more comprehensive experiments are desired, it is expected that only a few packed nodes will need to be unpacked in practical applications.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.3 THE LFG COMPILER </SectionTitle> <Paragraph position="0"> It is in general very painful to create, extend, and modify augmentations written in Lisp. The Lisp functions should be generated automatically from more abstract specifications. We have implemented an LFG compiler that compiles augmentations in a higher-level notation into Lisp functions. The notation is similar to the Lexical Functional Grammar (LFG) formalism (Bresnan and Kaplan 1982) and to PATR-II (Shieber 1984). An example grammar rule in the LFG-like notation and its compiled Lisp function are shown in Figures 6.1 and 6.2. We generate only non-destructive functions with no side effects, to make sure that a process never alters other processes or the parser's control flow. A generated function takes a list of arguments, each of which is a value associated with each right-hand side symbol, and returns a value to be associated with the left-hand side symbol. Each value is a list of f-structures, in case of disjunction and local ambiguity.</Paragraph> <Paragraph position="1"> That a semantic grammar in the LFG-like notation can also be generated automatically from a domain semantics specification and a purely syntactic grammar is discussed further in Tomita and Carbonell (1986). The discussion is, however, beyond the scope of this paper.</Paragraph> </Section> </Section> <Section position="11" start_page="0" end_page="0" type="metho"> <SectionTitle> 7 THE ON-LINE PARSER </SectionTitle> <Paragraph position="0"> Our parsing algorithm parses a sentence strictly from left to right. This characteristic makes on-line parsing possible; i.e., parsing a sentence as the user types it in, without waiting for completion of the sentence. An example session of on-line parsing is presented in Figure 7.1 for the sample sentence I saw a man with a telescope.</Paragraph> <Paragraph position="1"> As in this example, the user often wants to hit the &quot;backspace&quot; key to correct previously input words. In the case in which these words have already been processed by the parser, the parser must be able to &quot;unparse&quot; the words, without parsing the sentence from the beginning all over again. To implement unparsing, the parser needs to store the system status each time a word is parsed. Fortunately, this can be done nicely with our parsing algorithm; only pointers to the graph-structured stack and the parse forest need to be stored.</Paragraph>
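A sketch of the checkpointing this requires follows; the helper names are assumptions of this sketch, and the point is only that non-destructive operations make a checkpoint a pair of pointers rather than a copy.

    # Illustrative on-line parsing with 'unparse': after each word the
    # parser records pointers to the current graph-structured stack tops
    # and to the parse forest, so a backspace restores the previous
    # checkpoint instead of reparsing from the beginning.
    class OnlineParser:
        def __init__(self, initial_tops):
            self.checkpoints = [(initial_tops, 0)]   # (stack tops, forest mark)

        def parse_word(self, word):
            tops, _ = self.checkpoints[-1]
            new_tops = process_word_gss(tops, word)  # assumed GLR step
            self.checkpoints.append((new_tops, forest_size()))  # assumed helper

        def unparse_word(self):
            if len(self.checkpoints) > 1:
                self.checkpoints.pop()               # restore previous status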
<Paragraph position="2"> It should be noted that our parsing algorithm is not the only algorithm that parses a sentence strictly from left to right; other left-to-right algorithms include Earley's (1970) algorithm, the active chart parsing algorithm (Winograd 1983), and a breadth-first version of ATN (Woods 1970). Despite the availability of left-to-right algorithms, surprisingly few on-line parsers exist. NLMenu (Tennant et al. 1983) adopted on-line parsing for a menu-based system, but not for typed inputs.</Paragraph> <Paragraph position="3"> In the rest of this section, we discuss two benefits of on-line parsing: quicker response time and early error detection. One obvious benefit of on-line parsing is that it reduces the parser's response time significantly. When the user finishes typing a whole sentence, most of the input sentence has already been processed by the parser. Although this does not affect CPU time, it can significantly reduce response time from the user's point of view. On-line parsing is therefore useful in interactive systems in which input sentences are typed in by the user on-line; it is not particularly useful in batch systems in which input sentences are provided in a file.</Paragraph> <Paragraph position="4"> Another benefit of on-line parsing is that it can detect an error almost as soon as the error occurs, and it can warn the user immediately. In this way, on-line parsing could provide better man-machine communication. Further studies on human factors are necessary.</Paragraph> </Section> </Paper>