<?xml version="1.0" standalone="yes"?> <Paper uid="J92-4001"> <Title>The Acquisition and Use of Context-Dependent Grammars for English</Title> <Section position="5" start_page="398" end_page="399" type="metho"> <SectionTitle> Starting with an Emacs editor, it was fairly easy to read in a file of sentences and to assign each word </SectionTitle> <Paragraph position="0"> its syntactic class according to its context. Then the asterisk was inserted at the beginning of the syntactic string, the string was copied to the next line, the asterisk was moved if a shift operation was indicated, or the top two symbols on the stack were rewritten if a reduce was required--just as we constructed the example in the preceding section. Naturally enough, we soon made Emacs macros to help us, and then escalated to a Lisp program that would print the stack-*-string and interpret our shift/reduce commands to produce a new state of the parse.</Paragraph> <Paragraph position="1"> (Table 1: Characteristics of a sample of the text corpus.)</Paragraph> <Paragraph position="2"> A second acquisition program translates sentences directly to case structures with no intermediate stage of phrase structure trees. It has the same functionality as GRAMAQ but allows the linguist user to specify a case argument and value as the transformation of syntactic elements on the stack, and to rename the head of such a constituent by a syntactic label. Figure 9 in Section 7.3 illustrates the acquisition of case grammar.</Paragraph> </Section> <Section position="6" start_page="399" end_page="401" type="metho"> <SectionTitle> 4. Experiments with CDG </SectionTitle> <Paragraph position="0"> There are a number of critical questions that need to be answered if the claim that CDG grammars are useful is to be supported.</Paragraph> <Paragraph position="1"> * Can they be used to obtain accurate parses for real texts?
* Do they reduce ambiguity in the parsing process?
* How well do the rules generalize to new texts?
* How large must a CFG be to encompass the syntactic structures for most newspaper text?</Paragraph> <Section position="1" start_page="399" end_page="401" type="sub_section"> <SectionTitle> 4.1 Parsing and Ambiguity with CDG </SectionTitle> <Paragraph position="0"> Over the course of this study we accumulated 345 sentences, mainly from newswire texts. The first two articles were brief disease descriptions from a youth encyclopedia; the remaining fifteen were newspaper articles from February 1989 using the terms &quot;star wars,&quot; &quot;SDI,&quot; or &quot;Strategic Defense Initiative.&quot; Table 1 characterizes typical articles by the number of CDG rules or states, the number of sentences, the range of sentence lengths, and the average number of words per sentence.</Paragraph> <Paragraph position="1"> We developed our approach to acquiring and parsing context-sensitive grammars on the first two simple texts, and then used GRAMAQ to redo those texts and to construct productions for the news stories. The total text numbered 345 sentences, which accumulated 16,275 context-sensitive rules--an average of 47 per sentence.</Paragraph> <Paragraph position="2"> The parser embodying the algorithm illustrated earlier in Figure 1 was augmented to compare the constituents it constructed with those prescribed during grammar acquisition by the linguist. In parsing the 345 sentences, 335 parses exactly matched the linguist's original judgment. In nine of the ten cases in which differences occurred, the parses were judged correct, but slightly different sequences of parse states occurred.
The tenth case clearly contained an attachment error: an introductory adverbial phrase in the sentence &quot;Hours later, Baghdad announced ...&quot; was mistakenly attached to &quot;Baghdad.&quot; This evaluation shows that the grammar was in precise agreement with the linguist 97% of the time and completed correct parses in 99.7% of the 345 sentences from which it was derived. Since our primary interest was in evaluating the effectiveness of the CDG, all these evaluations were based on using correct syntactic classes for the words in the sentences. The context-sensitive dictionary lookup procedure described in Section 7.3 is 99.5% accurate, but it assigns 40 word classes incorrectly. As a consequence, using this procedure would result in a reduction of about 10% in parsing accuracy.</Paragraph> <Paragraph position="3"> An output of a sentence from the parser is displayed as a tree in Figure 4. Since the whole mechanism is coded in Lisp, the actual output of the system is a nested list that is then printed as a tree.</Paragraph> <Paragraph position="4"> (Figure 4: Sentence parse. The sentence is: &quot;Another mission soon scheduled that also would have priority over the shuttle is the first firing of a trident two intercontinental range missile from a submerged submarine.&quot;)</Paragraph> <Paragraph position="5"> Notice in this figure that the PP at the bottom modifies the NP composed of &quot;the first firing of a trident two intercontinental range missile,&quot; not just the word &quot;firing.&quot; Since the parsing is bottom-up, left-to-right, the constituents are formed in the natural order of the words encountered in the sentence, and the terminals of the tree can be read top-to-bottom to give their ordering in the sentence.</Paragraph> <Paragraph position="6"> Although 345 sentences totaling 8,594 words is a small selection from the infinite set of possible English sentences, it is large enough to assure us that the CDG is a reasonable form of grammar. Since the deterministic parsing algorithm selects a single interpretation, which we have seen almost perfectly agrees with the linguist's parsings, it is apparent that, at least for this size text sample, there is little difficulty with ambiguous interpretations.</Paragraph> </Section> </Section>
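To make the shift/reduce mechanics described above concrete, here is a minimal Common Lisp sketch of the stack-*-string state and its two operations. The names and the representation are our illustration, not the authors' code; the reduce shown rewrites the top two stack symbols as a single phrase symbol, just as in the hand-constructed examples.

    (defstruct pstate
      stack   ; phrase symbols, top of stack first (left of the *)
      input)  ; syntactic classes not yet shifted (right of the *)

    (defun shift-op (state)
      "Move the next input class onto the stack; the * moves one word right."
      (make-pstate :stack (cons (first (pstate-input state)) (pstate-stack state))
                   :input (rest (pstate-input state))))

    (defun reduce-op (state name)
      "Rewrite the top two stack symbols as the single phrase symbol NAME."
      (make-pstate :stack (cons name (cddr (pstate-stack state)))
                   :input (pstate-input state)))

    ;; First steps of a parse over the classes art n vao n:
    ;; (setq s (make-pstate :stack nil :input '(art n vao n)))
    ;; (setq s (shift-op (shift-op s)))  ; stack (N ART), input (VAO N)
    ;; (reduce-op s 'np)                 ; stack (NP),    input (VAO N)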
<Section position="7" start_page="401" end_page="410" type="metho"> <SectionTitle> 5. Generalization of CDG </SectionTitle> <Paragraph position="0"> The purpose of accumulating sample rules from texts is to achieve a grammar general enough to analyze new texts it has never seen. To be useful, the grammar must generalize. There are at least three aspects of generalization to be considered.</Paragraph> <Paragraph position="1"> * How well does the grammar generalize at the sentence level? That is, how well does the grammar parse new sentences that it has not previously experienced?
* How well does the grammar generalize at the operation level? That is, how well does the grammar predict the correct shift/reduce operation during acquisition of new sentences?
* How much does the rule retention strategy affect generalization? For instance, when the grammar predicts the same output as a new rule does, and the new rule is not saved, how well does the resulting grammar parse?</Paragraph> <Paragraph position="2"> 5.1 Generalization at the Sentence Level
The complete parse of a sentence is a sequence of states recognized by the grammar (whether it be CDG or any other). If all the constituents of the new sentence can be recognized, the new sentence can be parsed correctly. It will be seen in a later paragraph that with 16,275 rules, the grammar predicts the output of new rules correctly about 85% of the time. For the average sentence with 47 states, only 85%, or about 40 states, can be expected to be predicted correctly; consequently the deterministic parse will frequently fail. In fact, only 5 of 14 new sentences parsed correctly in a brief experiment that used a grammar based on 320 sentences to attempt to parse the new, 20-sentence text. Considering that only a single path was followed by the deterministic parser, we predicted that a multiple-path parser would perform somewhat better for this aspect of generalization. In fact, our initial experiments with a beam search parser resulted in successful parses of 15 of the 20 new sentences using the same grammar based on the 320 sentences.</Paragraph> <Section position="1" start_page="401" end_page="402" type="sub_section"> <SectionTitle> 5.2 Generalization at the Operation Level </SectionTitle> <Paragraph position="0"> This level of generalization is of central significance to the grammar acquisition system.</Paragraph> <Paragraph position="1"> When GRAMAQ looks up a state in the grammar it finds the best matching state with the same top two elements on the stack, and offers the right half of this rule as its suggestion to the linguist. How often is this prediction correct? To answer this question we compiled the grammar of 16,275 rules in cumulative increments of 1,017 rules using a procedure, union-grammar, that would only add a rule to the grammar if the grammar did not already predict its operation. We call the result a &quot;minimal-grammar,&quot; and it contains 3,843 rules. The black line of Figure 5 shows that with the first 1,000 rules 40% were new; with an accumulation of 5,000, 18% were new rules. By the time 16,000 rules have been accumulated, the curve has flattened to an average of 16% new rules added. This means that the acquisition system will make correct prompts about 84% of the time, and the linguist will only need to correct the system's suggestions about 3 or 4 times in 20 context presentations.</Paragraph> <Paragraph position="2"> (Figure 5: Percentage of new rules as rules accumulate; the black line plots new CDG rules and the shaded line new CFG rules.)</Paragraph> </Section>
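The lookup behind these prompts can be sketched as follows, assuming a rule is stored as a pair of its ten-symbol left half (the top five stack symbols, then the next five input symbols) and its operation. The weights below are an invented placeholder, since the paper's scoring formula is not reproduced in this text; the sketch only preserves the stated requirements that the top two stack symbols must match exactly and that the best-scoring rule supplies the suggested operation.

    ;; A rule is (context . operation); CONTEXT is the ten-symbol left half.

    (defun match-score (rule-context context weights)
      "Sum of WEIGHTS where the contexts agree, or NIL unless the top two
    stack symbols match exactly."
      (and (eq (first rule-context) (first context))
           (eq (second rule-context) (second context))
           (loop for r in rule-context
                 for c in context
                 for w in weights
                 when (eq r c) sum w)))

    (defun best-rule (grammar context
                      &optional (weights '(5 4 3 2 1 3 2 1 1 1)))
      "Return the rule in GRAMMAR whose left half best matches CONTEXT."
      (let ((best nil) (best-score 0))
        (dolist (rule grammar best)
          (let ((s (match-score (car rule) context weights)))
            (when (and s (> s best-score))
              (setq best rule best-score s))))))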
<Section position="2" start_page="402" end_page="405" type="sub_section"> <SectionTitle> 5.3 Rule Retention and Generalization </SectionTitle> <Paragraph position="0"> If two parsing grammars account equally well for the same sentences, the one with fewer rules is less redundant, more abstract, and the one to be preferred. We used the union-grammar procedure to produce and study the minimal grammar for the 16,275 rules (rule-examples) derived from the sample text. Union-grammar records a new rule for a rule-example:
1. if the best matching rule has an operation that doesn't match;
2. if the best matching rule ties with another rule whose operation does not match;
3. if 2 is true and the score = 21, we have a full contradiction and list the rule as an error.</Paragraph> <Paragraph position="1"> Six contradictions occurred in the grammar; five were inconsistent treatments of &quot;SNT&quot; followed by one or more punctuation marks, while the sixth offered both a shift and a &quot;pp&quot; for a preposition-noun followed by a preposition. The latter case is an attachment ambiguity not resolvable by syntax.</Paragraph> <Paragraph position="2"> (Table 2: Four passes with minimal grammar.)</Paragraph> <Paragraph position="3"> In the first pass, as shown in Table 2, the text resulted in 3,194 rules compared with 16,275 possible rules. That is, 13,081 possible CDG rules were not retained because already existing rules would match and predict the operation. However, using those rules to parse the same text gave very poor results: zero correct parses at the sentence level. Therefore, the process of compiling a minimal grammar was repeated starting with those 3,194 rules. This time only 619 new rules were added. The purpose of this repetition is to remove the effect that rules added later change the predictions made earlier. Finally, in a fourth repetition of the process no rules were new. The resulting grammar of 3,843 rules succeeds in parsing the text with only occasional minor errors in attaching constituents. It is to be emphasized that the unretained rules are similar but not identical to those in the minimal grammar.</Paragraph> <Paragraph position="4"> We can observe that this technique of minimal retention by &quot;unioning&quot; new rules to the grammar results in a compression on the order of 16,275/3,843, or 4.2 to 1, without increase in error. If this ratio holds for larger grammars, then if the linguist accumulates 40,000 training-example rules to account for the syntax of a given subset of language, that grammar can be compressed automatically to about 10,000 rules that will accomplish the same task.</Paragraph>
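Under the same assumed representation, the retention procedure is a one-pass filter over the rule-examples, repeated until a pass adds no new rules, as in the four passes of Table 2. A sketch reusing the hypothetical best-rule of the previous sketch (tie handling is omitted):

    (defun union-grammar-pass (rule-examples &optional seed)
      "Retain from RULE-EXAMPLES only those whose operation the grammar
    does not already predict, starting from the rules in SEED."
      (let ((grammar (copy-list seed)))
        (dolist (example rule-examples grammar)
          (let ((best (best-rule grammar (car example))))
            (unless (and best (equal (cdr best) (cdr example)))
              (push example grammar))))))

    (defun minimal-grammar (rule-examples)
      "Iterate retention passes until a pass adds no new rules."
      (loop for grammar = (union-grammar-pass rule-examples)
              then (union-grammar-pass rule-examples grammar)
            for previous = nil then size
            for size = (length grammar)
            until (eql size previous)
            finally (return grammar)))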
<Paragraph position="5"> 6. Predicting the Size of CDGs
When any kind of acquisition system is used to accumulate knowledge, one very interesting question is, when will the knowledge be complete enough for the intended application? In our case, how many CDG rules will be sufficient to cover almost all newswire stories? To answer this question, an extrapolation can be used to find the point where the solid line of Figure 5 intersects the y-axis. However, the CDG curve is descending too slowly to make a reliable extrapolation.</Paragraph> <Paragraph position="6"> Therefore, another question was investigated instead: when will the CDG rules include a complete set of CFG rules? Note that a CDG rule is equivalent to a CFG rule if the context is limited to the top two elements of the stack. What the other elements in the context accomplish is to make one rule preferable to another that has the same top two elements of the stack, but a different context.</Paragraph> <Paragraph position="7"> We allow 64 symbols in our phrase structure analysis. That means there are 64^2 possible combinations for the top two elements of the stack. For each combination, there are 65 possible operations: a shift or a reduction to another symbol. Among the 16,275 CDG rules, we studied how many different CFG rules can be derived by eliminating the context. We found 844 different CFG rules that used 600 different left-side pairs of symbols. This shows that a given context-free pair of symbols averages 1.4 different operations. Then, as we did with CDG rules, we measured how many new CFG rules were added in an accumulative fashion. The shaded line of Figure 5 shows the result.</Paragraph> <Paragraph position="8"> (Figure 6: Log-log plot of new CFG rules.)</Paragraph> <Paragraph position="9"> Notice that the line has descended to about 1.5% at 16,000 rules. To make an extrapolation easier, a log-log graph shows the same data in Figure 6. From this graph, it can be predicted that, after about 25,000 CDG rules are accumulated, the grammar will encompass a CFG component that is 99% complete. Beyond this point, additional CDG rules will add almost no new CFG rules, but only fine-tune the grammar so that it can resolve ambiguities more effectively.</Paragraph> <Paragraph position="10"> Also, it is our belief that, after the CDG reaches that point, a multi-path, beam-search parser will be able to parse most newswire stories very reliably. This belief is based on our initial experiment that used a beam search parser to test generalization of the grammar and found parses for fifteen out of twenty new sentences.</Paragraph> </Section>
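The context-free core discussed in this section can be extracted mechanically from the CDG rules: strip each rule to its top two stack symbols plus its operation and count the distinct results. A sketch under the same assumed rule representation as above:

    (defun cfg-core (cdg-rules)
      "Distinct (symbol1 symbol2 operation) triples among CDG-RULES."
      (remove-duplicates
       (mapcar (lambda (rule)
                 (list (first (car rule)) (second (car rule)) (cdr rule)))
               cdg-rules)
       :test #'equal))

    ;; (length (cfg-core rules))                        ; 844 in the sample
    ;; (length (remove-duplicates
    ;;          (mapcar #'butlast (cfg-core rules))
    ;;          :test #'equal))                          ; 600 left-side pairs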
<Paragraph position="11"> 7. Acquiring Case Grammar
Explicating the phrase structure constituents of sentences is an essential aspect of computer recognition of meaning. Case analysis organizes the constituents into a hierarchical structure of labeled propositions. The propositions can be used directly to answer questions and are the basis of the schemas, scripts, and frames that are used to add meaning to otherwise inexplicit texts. As a result of the experiments with acquiring CDG and exploring its properties for parsing phrase structures, we became fairly confident that we could generalize the system to acquisition and parsing based on a grammar that would compute syntactic case structures directly from syntactic strings. Direct translation from string to structure is supported by neural network experiments such as those by McClelland and Kawamoto (1986), Miikkulainen and Dyer (1989), Yu and Simmons (1990), and Leow and Simmons (1990). We reasoned that if we could acquire case grammar with something approaching the simplicity of acquiring phrase structure rules, the result could be of great value for NL applications.</Paragraph> <Section position="3" start_page="405" end_page="406" type="sub_section"> <SectionTitle> 7.1 Case Structure </SectionTitle> <Paragraph position="0"> Cook (1989) reviewed twenty years of linguistic research on case analysis of natural language sentences. He synthesized the various theories into a system that depends on the subclassification of verbs into twelve categories, and it is apparent from his review that with a fine subcategorization of verbs and nominals, case analysis can be accomplished as a purely syntactic operation--subject to the limitations of attachment ambiguities that are not resolvable by syntax. This conclusion is somewhat at variance with those AI approaches that require a syntactic analysis to be followed by a semantic operation that filters and transforms syntactic constituents to compute case-labeled propositions (e.g. Rim 1990), but it is consistent with the neural network experience of directly mapping from sentence to case structure, and with the AI research that seeks to integrate syntactic and semantic processing while translating sentences to propositional structures.</Paragraph> <Paragraph position="1"> Linguistic theories of case structure have been concerned only with single propositions headed by verb predications; they have been largely silent with regard to the structure of noun phrases and the relations among embedded and sequential propositions. Additional conventions for managing these complications have been developed in Simmons (1984) and Alterman (1985) and are used here.</Paragraph> <Paragraph position="2"> The central notion of a case analysis is to translate sentence strings into a nested structure of case relations (or predicates) where each relation has a head term and an indefinite number of labeled arguments. An argument may itself be a case relation. Thus a sentence, as in the examples below, forms a tree of case relations.</Paragraph> <Paragraph position="3"> The old man from Spain ate fish.
(ate Agt (man Mod old Det the From spain) Obj fish)</Paragraph> <Paragraph position="4"> Another mission soon scheduled that also would have priority over the shuttle is the first firing of a trident two intercontinental range missile from a submerged submarine.
[case structure truncated in extraction; only its tail survives: ... From (submarine Mod submerged Det a)))]</Paragraph> <Paragraph position="5"> Note that mission is in Obj* relation to scheduled. This means the object of scheduled is mission, and the expression can be read as &quot;another mission such that mission is scheduled soon.&quot; An asterisk as a suffix to a label always signals the reverse direction for the label.</Paragraph> <Paragraph position="6"> There is a small set of case relations for verb arguments, such as verbmodifier, agent, object, beneficiary, experiencer, location, state, time, direction, etc. For nouns there are determiner, modifier, quantifier, amount, nounmodifier, preposition, and the reverse verb relations, agt*, obj*, ben*, etc. Prepositions and conjunctions are usually used directly as argument labels, while sentence conjunctions such as because, while, before, after, etc. are represented as heads of propositions that relate two other propositions with the labels preceding, post, antecedent, and consequent. For example, &quot;Because she ate fish and chips earlier, Mary was not hungry.&quot;
(because Ante (ate Agt she Obj (fish And chips) Vmod earlier) Conse (was Vmod not Obj1 mary State hungry))</Paragraph> <Paragraph position="7"> Verbs are subcategorized as vao, vabo, vo, va, vhav, vbe, where a is agent, o is object, b is beneficiary, vhav is a form of have, and vbe a form of be. So far, only the subcategory of time has been necessary in subcategorizing nouns to accomplish this form of case analysis, but in general, a lexical semantics is required to resolve syntactic attachment ambiguities. The complete set of case relations is presumed to be small, but no one has yet claimed a complete enumeration of them.</Paragraph> <Paragraph position="8"> Other case systems, such as those taught by Schank (1980) and Jackendoff (1983), classify predicate names into such primitives as Do, Event, Thing, Mtrans, Ptrans, Go, Action, etc., to approximate some form of &quot;language of thought,&quot; but the present approach is less ambitious, proposing merely to represent in a fairly formal fashion the organization of the words in a sentence.</Paragraph>
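Since the system is coded in Lisp, case structures like those above read naturally as nested lists: a head term followed by alternating argument labels and values, where a value may itself be such a list. The accessors below are our illustrative sketch of that reading, not the authors' code:

    (defun case-head (prop)
      "The head term of a case proposition."
      (first prop))

    (defun case-arg (prop label)
      "The value of argument LABEL in PROP, or NIL if absent."
      (getf (rest prop) label))

    ;; (case-arg '(ate Agt (man Mod old Det the From spain) Obj fish) 'Obj)
    ;;   => FISH
    ;; (case-head (case-arg '(ate Agt (man Mod old Det the From spain) Obj fish)
    ;;                      'Agt))
    ;;   => MAN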
<Paragraph position="9"> Subsequent operations on this admittedly superficial class of case structures, when augmented with a system of shallow lexical semantics, have been shown to accomplish question answering, focus tracking of topics throughout a text, automatic outlining, and summarization of texts (Seo 1990; Rim 1990). One strong constraint on this type of analysis is that the resulting case structure must maintain all information present in the text so that the text may be exactly reconstituted from the analysis.</Paragraph> </Section> <Section position="4" start_page="406" end_page="409" type="sub_section"> <SectionTitle> 7.2 Syntactic Analysis of Case Structure </SectionTitle> <Paragraph position="0"> We've seen earlier that a shift/reduce-rename operation is sufficient to parse most sentences into phrase structures. Case structure, however, requires transformations in addition to these operations. To form a case structure it is frequently necessary to change the order of constituents and to insert case labels. Following Jackendoff's principle of grammatical constraint, which argues essentially that semantic interpretation is frequently reflected in the syntactic form, case transformations are accomplished as each syntactic constituent is discovered. Thus when a verb, say throw, and an NP, say coconuts, are on top of the stack, one must not only create a VP but also decide the case, Obj, and form the constituent (throw Obj coconuts). This can be accomplished in customary approaches to parsing by using augmented context-free recognition rules of the form
VP -> VP NP / 1 obj 2
where the numbers following the slash refer to the text dominated by the syntactic class in the referenced position (ordered left-to-right) in the right half of the rule. The resulting constituents can be accumulated to form the case analysis of a sentence (Simmons 1984).</Paragraph> <Paragraph position="1"> We develop augmented context-sensitive rules following the same principle. Let us look again at the example &quot;The old man from Spain ate fish,&quot; this time to develop its case analysis. In the traces that follow, the case transformation immediately follows the semicolon, and the result of the transformation is shown in parentheses further to the right. The result in the final constituent is: (ate Agt (man Mod old Det the From spain) Obj fish).</Paragraph> <Paragraph position="2"> Note that we did not rename the syntactic constituents as NP or VP in this example, because we were not interested in showing the phrase structure tree. Renaming in case analysis need only be done when it is necessary to pass on information accumulated from an earlier constituent.</Paragraph> <Paragraph position="3"> For example, in &quot;fish were eaten by birds,&quot; the CS parse is as follows:
* n vbe ppart by n ; shift
n * vbe ppart by n ; shift
n vbe * ppart by n ; shift
n vbe ppart * by n ; 1 vbe 2, vpasv (eaten Vbe were)
n vpasv * by n ; 1 obj 2 (eaten Vbe were Obj fish)
vpasv * by n ; shift
vpasv by * n ; shift
vpasv by n * ; 1 prep 2 (birds Prep by)
vpasv n * ; 2 agt 1 (eaten Vbe were Obj fish Agt (birds Prep by))
Here, it was necessary to rename the combination of a past participle and its auxiliary as a passive verb, vpasv, so that the syntactic subject and object could be recognized as Obj and Agent, respectively.
We also chose to use the argument name Prep to form (birds Prep by) so that we could then call that constituent Agent.</Paragraph> <Paragraph position="4"> We can see that the reduce operation has become a reduce-transform-rename operation where the numbers refer to elements of the stack, the second term provides a case argument label, the ordering provides a transformation, and an optional fourth element may rename the constituent. (Table 3: Some typical case transformations for syntactic constituents.) A sample of typical case transformations is shown associated with the top elements of the stack in Table 3. In this table, the first element of the stack is in the third position on the left side of the table, and the number 1 refers to that position, 2 to the second, and 3 to the first. As an aid to the reader, the first two entries in the table refer literally by symbol rather than by reference to the stack. The symbols vao and vabo are subclasses of verbs that take, respectively, agent and object; and agent, beneficiary, and object. The symbol v.. refers to any verb. Forms of the verb be are referred to as vbe, and passivization is marked by relabeling a verb by adding the suffix -pasv.</Paragraph> <Paragraph position="5"> (Figure 7: Algorithm for case parse. CS-CASE-Parser(input, cdg), where input is a string of syntactic classes for the given sentence and cdg is the given CDG grammar rules.)</Paragraph> <Paragraph position="6"> Parsing case structures. From the discussion above we may observe that the flow of control in accomplishing a case parse is identical to that of a phrase structure parse. The difference lies in the fact that when a constituent is recognized (see Figure 7): in phrase structure, a new name is substituted for its stack elements, and a constituent is formed by listing the name and its elements; in case analysis, a case transformation is applied to designated elements on the stack to construct a constituent, and the head (i.e. the first element of the transformation) is substituted for its elements--unless a new name is provided for that substitution.</Paragraph> <Paragraph position="7"> Consequently the algorithm used in phrase structure analysis is easily adapted to case analysis. The difference lies in interpreting and applying the operation to make a new constituent and a new stack.</Paragraph> <Paragraph position="8"> In the algorithm shown in Figure 7, we revise the stack by attaching either the head of the new constituent, or its new name, to the stack resulting from the removal of all elements in the new constituent. The function select chooses either a new name if present, or the first element, the head of the operation. Makeconstituent applies the transformation rule to form a new constituent from the output stack and pushes the constituent onto the output stack, which is first reduced by removing the elements used in the constituent. Again, the algorithm is a deterministic, first (best) path parser with behavior essentially the same as the phrase structure parser. But this version accomplishes transformations to construct a case structure analysis.</Paragraph> </Section>
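A sketch of the reduce-transform-rename step may help. It assumes an operation of the form (1 obj 2) or (1 vbe 2 vpasv), with positions indexing the stack from the top, and it treats the consumed stack elements as contiguous, which is a simplification of the Figure 7 algorithm rather than the authors' implementation:

    (defun as-prop (x)
      "Coerce a bare head symbol to a one-element proposition."
      (if (listp x) x (list x)))

    (defun apply-case-op (op stack constituents)
      "Apply (HEAD-POS LABEL ARG-POS &optional NEW-NAME) to the parse.
    STACK holds syntactic symbols and CONSTITUENTS the structures built
    so far, both top-first.  Returns the new stack and constituent list."
      (destructuring-bind (i label j &optional new-name) op
        (let* ((head (as-prop (nth (1- i) constituents)))
               (new  (append head (list label (nth (1- j) constituents))))
               (depth (max i j)))
          (values (cons (or new-name (nth (1- i) stack)) (nthcdr depth stack))
                  (cons new (nthcdr depth constituents))))))

    ;; With coconuts (an n) on top of the stack and throw (a vao) below:
    ;; (apply-case-op '(2 obj 1) '(n vao) '(coconuts throw))
    ;;   => stack (VAO), constituents ((THROW OBJ COCONUTS))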
<Section position="5" start_page="409" end_page="410" type="sub_section"> <SectionTitle> 7.3 Acquisition System for Case Grammar </SectionTitle> <Paragraph position="0"> The acquisition system, like the parser, required only minor revisions to accept case grammar. It must apply a shift or any transformation to construct the new stack-string for the linguist user, and it must record the shift or transformation as the right half of a context-sensitive rule--still composed of a ten-symbol left half and an operation as the right half. Consequently, the system will be illustrated in Figure 9 rather than described in detail.</Paragraph> <Paragraph position="1"> Earlier we mentioned the context-sensitive dictionary. This is compiled by associating with each word the linguist's in-context assignments of each syntactic word class in which it is experienced. When the dictionary is built, the occurrence frequencies of each word class are accumulated for each word. A primitive grammar of four-tuples terminating with each word class is also formed and hashed in a table of syntactic paths. The procedure to determine a word class in context:
* first obtains the candidates from the dictionary.
* For each candidate wc, it forms a four-tuple, vec, by adding it to the cdr of each immediately preceding vec, stored in IPC.
* Each such vec is tested against the table of syntactic paths; if it has been seen previously, it is added to the list of IPCs, otherwise it is eliminated.
* If the union of first elements of the IPC list is a single word class, that is the choice. If not, the word's most frequent word class among the union of surviving classes for the word is chosen.</Paragraph> <Paragraph position="2"> The effect of this procedure is to examine a context of plus and minus three words to determine the word class in question. Although a larger context based on five-tuple paths is slightly more effective, there is a tradeoff between accuracy and storage requirements.</Paragraph> <Paragraph position="3"> The word class selection procedure was tested on the 8,310 words of the 345-sentence sample of text. A score of 99.52% correct was achieved, with 8,270 words correctly assigned. As a comparison, using the most frequent category for a word resulted in 8,137 correct assignments, for a score of 97.52%. Although there are only 3,298 word types, with an average of 3.7 tokens per type, the occurrence of single word class usages for words in this sample is very high, thus accounting for the effectiveness of the simpler heuristic of assigning the most frequent category. However, since the effect of misassignment of a word class can often ruin the parse, the use of the more complex procedure is amply justified. Analysis of the 40 errors in word class assignment showed 7 confusions of nouns and verbs that will certainly cause errors in parsing; other confusions of adjective/noun and adverb/preposition are less devastating, but still serious enough to require further improvements in the procedure.</Paragraph> <Paragraph position="4"> The word class selection procedure is adequate to form the prompts in the lexical acquisition phase, but the statistics on parsing effectiveness given earlier depend on perfect word class assignments.</Paragraph>
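A hedged sketch of this selection procedure follows. The data layout is our assumption: the path table is an equal-keyed hash table of four-tuples of word classes seen in training, the IPCs are the surviving tuples carried from word to word, and per-class frequencies come from the dictionary.

    ;; PATH-TABLE should be (make-hash-table :test #'equal), keyed on
    ;; four-tuples of word classes.  IPCS must be seeded at the start of a
    ;; sentence (how the paper seeds it is not stated; an assumption here).

    (defun extend-ipcs (candidates ipcs path-table)
      "Extend each surviving path with each candidate class of the next
    word, keeping only four-tuples seen during training."
      (let ((survivors '()))
        (dolist (wc candidates survivors)
          (dolist (vec ipcs)
            (let ((new-vec (append (cdr vec) (list wc))))
              (when (gethash new-vec path-table)
                (push new-vec survivors)))))))

    (defun choose-word-class (survivors frequencies)
      "Unanimous first element of the surviving paths, else the most
    frequent class among the survivors."
      (let ((classes (remove-duplicates (mapcar #'first survivors))))
        (if (rest classes)
            (first (sort (copy-list classes) #'>
                         :key (lambda (wc) (gethash wc frequencies 0))))
            (first classes))))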
<Paragraph position="5"> Shown in Figure 8 is the system's presentation of a sentence and its requests for each word's syntactic class. The protocol in Figure 9 shows the acquisition of shift and transformation rules for the sentence.</Paragraph> <Paragraph position="6"> Lexical Acquisition: The system's prompts for syntactic classes are in capitals. The user accepts the system's prompt with a carriage return, cr, or types in a syntactic class in lower case. We show the user's responses in boldface, using cr for carriage return. Other abbreviations are wc for word class, y or n for yes or no, and b for backup.</Paragraph> </Section> </Section> <Section position="8" start_page="410" end_page="411" type="metho"> <SectionTitle> (THE LAUNCH OF DISCOVERY AND ITS FIVE ASTRONAUTS HAS BEEN DELAYED AT-LEAST TWO DAYS UNTIL MARCH ELEVENTH BECAUSE-OF A CRUSHED ELECTRICAL PART ON A MAIN ENGINE COMMA OFFICIALS SAID) </SectionTitle> <Paragraph position="0"> process this one? - y or n y
THE cr for default else wc or b default is: ART cr
LAUNCH cr for default else wc or b cr ;user made an error since there was no default
LAUNCH cr for default else wc or b n ;system repeated the question
OF cr for default else wc or b default is: OF cr
DISCOVERY cr for default else wc or b n
AND cr for default else wc or b default is: CONJ cr
ITS cr for default else wc or b b ;user decided to redo &quot;and&quot;
AND cr for default else wc or b default is: CONJ and
ITS cr for default else wc or b ppron
... skipping most of the sentence ...
A cr for default else wc or b default is: ART cr
MAIN cr for default else wc or b n
ENGINE cr for default else wc or b n
COMMA cr for default else wc or b default is: COMMA cr
OFFICIALS cr for default else wc or b n
SAID cr for default else wc or b vao
Figure 8 Illustration of dictionary acquisition.</Paragraph> <Paragraph position="1"> What we notice in this second protocol is that the stack shows syntactic labels but the input string presented to the linguist is in English. As the system constructs a CS rule, however, the vector containing five elements of stack and five of input string is composed entirely of syntactic classes. The English input string better enables the linguist to maintain the meaningful context he or she uses to analyze the sentence. About five to ten minutes were required to make the judgments for this sentence. Appendix A shows the rules acquired in the session. When rules for the sentence were completed, the system added the new syntactic classes and rules to the grammar, then offered to parse the sentence. The resulting parse is shown in Figure 10.</Paragraph> <Paragraph position="2"> The case acquisition system was used on the texts described earlier in Table 1 to accumulate 3,700 example CDG case rules. Because the case transformations refer to three stack elements and the number of case labels is large, we expected and found that a much larger sample of text would be required to obtain the levels of generalization seen in the phrase structure experiments.</Paragraph> <Paragraph position="3"> Accumulated in increments of 400 rules, the case curve flattens at about 2,400 rules with an average of 33% error in prediction, compared to the 20% found in analysis of the same number of phrase structure rules. The compressed or minimal grammar for this set of case rules reduces the 3,700 rules to 1,633, a compression ratio in this case of 2.3 examples accounted for by each rule. The resulting compressed grammar parses the texts with 99% accuracy.
These statistics are from our initial study of a case grammar, and they should be taken only as preliminary estimates of what a more thorough study may show.</Paragraph> <Paragraph position="4"> Case-Grammar Acquisition: The options are h for a help message, b for backup one state, s for shift, case-trans for a case transformation, and cr for carriage return to accept a system prompt. System prompts are capitalized in parentheses; user responses are in lower case. Where no apparent response is shown, the user did a carriage return to accept the prompt. The first line shows the syntactic classes for the words in the sentence.</Paragraph> </Section> <Section position="9" start_page="411" end_page="414" type="metho"> <SectionTitle> (ART N OF N AND PPRON ADJ N VHAV VBE VAO AT-LEAST ADJ N UNTIL N N BECAUSE-OF ART PPART ADJ N ON ART N N COMMA N VAO) (* THE LAUNCH OF DISCOVERY AND ITS FIVE ASTRONAUTS HAS BEEN DELAYED AT-LEAST TWO DAYS UNTIL MARCH ELEVENTH BECAUSE-OF A CRUSHED ELECTRICAL PART ON A MAIN ENGINE COMMA OFFICIALS SAID) </SectionTitle> <Paragraph position="0"> Figure 9 Illustration of case grammar acquisition.</Paragraph> <Paragraph position="1"> (Figure 10: Case analysis of a sentence. The sentence is: &quot;The launch of discovery and its five astronauts has been delayed at-least two days until march eleventh because-of a crushed electrical part on a main engine comma officials said.&quot;)</Paragraph> <Paragraph position="2"> 8. Discussion and Conclusions
It seems remarkable that although the theory of context-sensitive grammars appeared in Chomsky (1957), formal context-sensitive rules seem not to have been used previously in computational parsing. As researchers we seem simply to have assumed, without experimentation, that context-sensitive grammars would be too large and cumbersome to be a practical approach to automatic parsing. In fact, context-sensitive, binary phrase structure rules with a context composed of the preceding three stack symbols and the next five input symbols,
stack(1-3) binary-rule input(1-5) -> operation
provide several encouraging properties.</Paragraph> <Paragraph position="3"> * The linguist uses the full context of the sentence to make a simple decision: either shift a new element onto the stack or combine the top two elements into a phrase category.
* The system compiles a CS rule composed of ten symbols, the top five elements of the stack and the next five elements of the input string. The context of the embedded binary rule specializes that rule for use in similar environments, thus providing selection criteria to the parser for the choice of shift or reduce, and for assigning the phrase name that has most frequently been used in similar environments.
The context provides a simple but powerful approach to preference parsing.
* As a result, a deterministic bottom-up parser is notably successful in finding precisely the parse tree that the linguist who constructed the analysis of a sentence had in mind--and this is true whether the grammar is stored as a trained neural network or in the form of hash-table entries.
* Despite the large combinatoric space for selecting 1 of 64 symbols in each of 10 slots in the rules--64^10 possible patterns--experiments in accumulating phrase structure grammar suggest that a fairly complete grammar will require only about 25,000 CS rules.
* It is also the case that when redundant rules are removed the CS grammar is reduced by a factor of four and still maintains its accuracy in parsing.
* Because of the simplicity and regular form of the rule structure, it has proved possible to construct an acquisition system that greatly facilitates the accumulation of grammar. The acquisition system presents contexts and suggests operations that have previously been used with similar contexts; thus it helps the linguist to maintain consistency of judgments.
* Parsing with context-sensitive rules generalizes from phrase structure rewriting rules to the transformational rules required by case analysis. Since the case analysis rules retain a regular, simple form, the acquisition system also generalizes to case grammar.</Paragraph> <Paragraph position="4"> Despite such advantageous properties, a few cautions should be noted. First, the deterministic parsing algorithm is sufficient to apply the CDG to the sentences from which the grammar was derived, but to accomplish effective generalization to new sentences, a bandwidth parsing algorithm that follows multiple parsing paths is superior. Second, the 99% accuracy of the parsing will deteriorate markedly if the dictionary lookup makes errors in word assignment. Third, the shift/reduce parsing is unable to give correct analyses for such embedded discontinuous constituents as &quot;I saw the man yesterday who ....&quot; Finally, the actual parsing structures that we have presented here are skeletal. We did not mark mood, aspect, or tense of verbs, or number for nouns, or deal with long-distance dependencies. We do not resolve pronoun references, and we do not complete ellipses in conjunctive and other constructions.</Paragraph> <Paragraph position="5"> Each of these shortcomings is the subject of continuing research. For the present, the output of the case parser provides the nested, labeled, propositional structures which, supported by a semantic knowledge base, we have customarily used to accomplish focus-tracking of topics through a continuous text and to compute labeled outlines and other forms of discourse structure (Seo 1990; Rim 1990; Alterman 1985). During this process of discourse analysis, some degapping, completion of ellipsis, and pronoun resolution is accomplished.</Paragraph> </Section> <Section position="1" start_page="413" end_page="414" type="sub_section"> <SectionTitle> 8.1 Conclusions </SectionTitle> <Paragraph position="0"> From the studies presented in this paper we conclude:
1. Context-Dependent Grammars (CDGs) are computationally and conceptually tractable formalisms that can be composed easily by a linguist and effectively used by a deterministic parser to compute phrase structures and case analyses for subsets of newspaper English.
2. The contextual portions of the CDG rules and the scoring formula that selects the rule that best matches the parsing context allow a deterministic parser to provide preferred parses, reflecting the linguist's meaning-based judgments.
3. The CDG acquisition system described earlier simplifies linguistic judgments and greatly improves a linguist's ability to construct relatively large grammars rapidly.
4. Although a deterministic, bottom-up parser has been sufficient to provide highly accurate parses for the 345-sentence sample of news text studied here, we believe that a multi-path parser will prove superior in its ability to analyze sentences beyond the sample on which the grammar was developed.
5. With 3,843 compressed CDG rules, the acquisition system is about 85% accurate in suggesting the correct parsing for constituents from texts it has not experienced.
6. For phrase structure analysis, the context-free core of the CS rules will be 99% complete when we have accumulated about 25,000 CS rules. At that point it should be possible for a multi-path parser to find a satisfactory analysis for almost all news story sentences.</Paragraph> <Paragraph position="1"> We have shown that the acquisition and parsing techniques apply also to CDG grammars for computing structures of case propositions to represent sentences. In this application, however, much more research is needed to better define linguistic systems for case analysis, and for their application to higher levels of natural language understanding.</Paragraph> </Section> </Section> <Section position="10" start_page="414" end_page="414" type="metho"> <SectionTitle> Acknowledgments </SectionTitle> <Paragraph position="0"> This work was partially supported by the Army Research Office under contract DAAG29-84-K-0060.</Paragraph> </Section> </Paper>