<?xml version="1.0" standalone="yes"?>
<Paper uid="C67-1009">
  <Title>EXPERIMENTS WITH A POWERFUL PARSER</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
A B → B A
B A → A B
</SectionTitle>
    <Paragraph position="0"> and the string to be parsed contains either &amp;quot;A B y' or &amp;quot;B A,&amp;quot; then the program will continue substituting these sub-strings for one another until the space available for intermediate results is exhausted. This may hot seem to present any particularly severe problem because a pair of rules such as these would never appear in any properly constructed grammar. But, as we shall shortly see, entirely plausible grammars can be constructed for which this problem does arise.</Paragraph>
    <Paragraph position="1"> i. THE FORM OF RULES In order to get a general idea of the capabilities of the program, it will be useful first to consider the notation used for presenting rules to it and the way this is interpreted by the machine. In what follows, we shall assume that the reader is familiar with the terminology and usual conventions of phrase-structure and transformational grammar. An example of the simplest kind of rewrite rule is</Paragraph>
    <Paragraph position="3"> The Y'equals&amp;quot; sign is used in place of the more familiar arrow to separate the left and right-hand sides of the rule. The symbols on which the rules operate are words consisting of between one and six alphabetic characters. The above rule will replace the symbol &amp;quot;VPRSG&amp;quot; by a string of three symbols &amp;quot;PRES SG VERB&amp;quot; whenever it occurs. The following rule will invert the order of the symbols &amp;quot;VERB&amp;quot; and &amp;quot;ING&amp;quot;</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="6" type="metho">
    <SectionTitle>
VERB ING = ING VERB
</SectionTitle>
    <Paragraph position="0"> The simplest way to represent a context free phrase structure rule is as in the ' following example:</Paragraph>
    <Paragraph position="2"> Notice that the normal order of the left and right-hand sides of the rule is reversed because the recognition process consists in rewriting strings as single symbols; the rules must therefore take the form of reductions rather than productions.</Paragraph>
    <Paragraph position="3"> The program will accept phrase structure rules in the form we have shown, but, in applying them, it will not keep a record of the total sentence structure to which they contribute. In other words, it will cause a new string to be constructed, but will not relate this string in any way to the string which was rewritten. One way to cause this relationship to be preserved is to write the rule in the following form : NP.I AUX. 2 VP.3 = S(I 2 3) The number following the symbols on the left,hand side of the rule function very much like the numbers frequently associated with structural indices in transformational rules. When the\]eft-hand side of the rule is found to match a particular sub-string, ~ the number associated with a given symbol in the rule becomes a pointer to, or a temporary name for, that symbol. With this interpretation, the left-hand side of the above rule can be read somewhat as follows &amp;quot;Find an NP and call it i; Find an AUX following this and call it 2; Find a VP following this and call it 3.&amp;quot; The numbers in parentheses after a symbol on the right-hand side of a rule are pointers to items-identified by the left-hand side, and which the new symbol must dominate. In the example, the symbol &amp;quot;S&amp;quot; is to dominate all the symbols mentioned on the left-hand side.</Paragraph>
    <Paragraph position="4"> A pointer may refer to a single symbol, as we :have shown, or to a string of symbols. The following rule is equivalent to the one just described: NP.I AUX. I VP.I = S(1) Furthermore, the string to which a pointer refers need not be continuous. Consider the following</Paragraph>
    <Paragraph position="6"> This will cause any string I'NP AUX VP&amp;quot; to be re-written as &amp;quot;S&amp;quot;, but the &amp;quot;S&amp;quot; will dominate only &amp;quot;NP&amp;quot; and &amp;quot;VP.&amp;quot; There will be no evidence of the intervening &amp;quot;AUX&amp;quot; in the ~nal P-marker which will contain the following phrase:</Paragraph>
    <Paragraph position="8"> Consider now the following pairs of rules:</Paragraph>
    <Paragraph position="10"> If these rules are applied to the string &amp;quot;A B C D&amp;quot; the following P-marker will be formed: /\/\o Notice that the first rule in the pair not only reorders the symbols in the P-marker but forms two phrases simultaneously.</Paragraph>
    <Paragraph position="11"> A different way of using pointer numbers on the right-hand side can be illustrated by comparing the effects of the following two rules:</Paragraph>
    <Paragraph position="13"> What is required, we assume, is a context sensitive phrase structure rule which will rewrite &amp;quot;N SG&amp;quot; as &amp;quot;NOUN&amp;quot; in the environment before &amp;quot;V SG&amp;quot;. The first rule achieves this effect but also introduces a new &amp;quot;J~&amp;quot; dominating the old one, and a new &amp;quot;SG&amp;quot;. The second rule does what it really wanted: It constructs phrase labeled &amp;quot;NOUN&amp;quot; as required, and leaves the symbols referred to by pointer number 2 unchanged.</Paragraph>
    <Paragraph position="14"> The context sensitive rule just considered is pre I sumably intended to insure that singular verbs have only singular subjects. A second rule in which &amp;quot;SG&amp;quot; is replaced by &amp;quot;PL&amp;quot; would be required for plura~ verbs. But, ~nce agreements of this kind may well have to be specified in other parts of the grammar, the situation might better be described by the following three rules:</Paragraph>
    <Paragraph position="16"> N.I NUM. 2 V.3 2 = NOUN(I 2) 3 2 The first two rules introduce a node labeled &amp;quot;NUM'! into the structure above the singular and plural morphemes. The third rule checks for agreement and forms the subject noun phrase. Pointer number 2 is associated with the symbol &amp;quot;NUM&amp;quot; in the second place on the left-hand side, and occurs by itself in the fourth place. This means that the fourth symbol matched by the rule must be &amp;quot;NUM,&amp;quot; and also that it must dominate exactly the same sub-tree as the second.</Paragraph>
    <Paragraph position="17"> In the example we are assuming that &amp;quot;NUM&amp;quot; governs a single node which will be labeled either &amp;quot;SG&amp;quot; or &amp;quot;PL&amp;quot; and the rule will ensure that whichever of these is dominated by the first occurrence of &amp;quot;NUM&amp;quot; will also be dominated by the second occurrence. Notice that noun and verb phrases could be formed simultaneously by the following rule: N.I NUM. 2 V.3 2 = NOUN(I 2) VERB(3 2) The symbols &amp;quot;ANY&amp;quot; and &amp;quot;NULL&amp;quot; are treated in a special way by this program and should not occur in strings to be analyzed. The use of the symbol &amp;quot;NULL&amp;quot; is illustrated in the rule:</Paragraph>
    <Paragraph position="19"> This will cause the symbol &amp;quot;PPH&amp;quot; to be deleted from any string in which occurs. The program is non-deterministic in its treatment of rules of this kind, as elsewhere, so that it zwill consider analyses in which the symbol is deleted, as well as any which can be made by retaining it. The symbol &amp;quot;NULL&amp;quot; is used only on the right-hand sides of rules.</Paragraph>
    <Paragraph position="20"> The symbol &amp;quot;ANY&amp;quot; is used only on the left-hand sides of rules and has the property that the word implies, namely that it will match any symbol in a string. The use of this special symbol is illustrated in the following rule:  VERB.I ANY. I NP.I = VP(1)  This will form a verb phrase from a verb and a noun phrase, With one intervening word or phrase, whose grmmnatical category is irrelevant.</Paragraph>
    <Paragraph position="21"> Elements on the left-hand sides of rules can be specified as optional by writing a dollar sign to the left or right of the symbol as in the following rules:</Paragraph>
    <Paragraph position="23"> The first of these forms a noun phrase from a determiner and a noun, with or without an intervening adjective. The second is a new version of a rule already considered. A verb phrase is formed from a verb and a noun phrase, with or without an intervening word or phrase of some other type.</Paragraph>
    <Paragraph position="24"> Elements can also be specified as repeatable by writing an asterisk against the symbol, as in the following example: VERB. i *NP. i = VP(1) This says that a verb phrase may consist of a verb followed by one or more noun phrases. It is often convenient to be able to specify that a given element may occur zero or more times. This is done in the obvious way by combining the dollar sign and the asterisk as in the following rule: SDET.I *$ADJ. I N.I *PP$.I = NP(1) According to this, a noun may constitute a noun phrase by itself. However the noun may be preceeded by a determiner and any number of adjectives, and followed by a prepositional phrase, and all of these will be embraced by the new noun phrase that is formed. Notice that the asterisk and the dollar sign can be placed before or after the symbol they refer to.</Paragraph>
    <Paragraph position="25"> The combination is often useful with symbol &amp;quot;ANY&amp;quot; in rules of the following kinds N.I NUM.2 *$ANY.3 V.4 2 = NOUN(I 2) 3 VERB(4 2) This is similar to an earlier example. It combines the number morpheneme with a subject noun and with a  verb, provided that the two agree, and ~a~ows for-any number of other symbols to intervene. The symbol &amp;quot;ANY&amp;quot; with an asterisk and a dollar s~g n cor L responds in this system to the so called variables in the familiar notation of transformational grammar. .,.? Consider now the following rule:</Paragraph>
    <Paragraph position="27"> This will form a noun phrase from a subordinating conjunction followed bya nou~phrase, provided that this dominates only the &amp;quot;~ymbol &amp;quot;S.&amp;quot; Any symbol on the left-hand side of the rule may be followed by an expression in parentheses specifying the string of characters that this symbol must directly dominate. This expression is constructed exactly like the left-hand sides of rules. In particular, it may contain symbols followed by expressions in parentheses. The following rule will serve as an illustration of this, and of another new feature:</Paragraph>
    <Paragraph position="29"> This rule calls for a noun phrase consisting of a noun, a preceding adjective which dominates a ~resent participle and, optionally, a number of other elements. This noun phrase is replaced by the determiner from the original noun phrase, if there is one, the elements preceding the noun except for the present participle, the noun itself, the symbol '~H,&amp;quot; the symbol &amp;quot;DEF~&amp;quot; another Copy of the noun, the symbol f~E~&amp;quot; the symbol &amp;quot;ADJ&amp;quot; dominating exactly those elements originally dominated by '~RPRT&amp;quot; and, finally, any following prepositional phrases the original noun phrase may have contained.</Paragraph>
    <Paragraph position="30"> The number &amp;quot;2&amp;quot; in double parentheses following &amp;quot;ADJ&amp;quot; on the right-hand side of this rule specifies that this symbol is to dominate, not the present participle itself, but the elements, if any, that it dominates. This device turns out to have wide utility.</Paragraph>
    <Paragraph position="31"> Double parentheses can also be used following a symbol on the left-hand side of a rule, but with a different interpretation. We have seen how single parentheses are used to specify the strin~ in~nediately dominated by a given symbol. DouSle' parantheses enclose a string which must be a proper analysis of the sub-tree dominated by the given symbol. A string is said to be a proper analysis of a sub-tree if each terminal symbol of the.subtree is dominated by some member of the string. As usual, a symbol is taken to dominate itself. As an example of this, consider the following rule: ART.I S((ART N.2 ANY*)).I 2 = DET(1) 2 This rule applies to a string consisting of an article, a sentence, and a noun. The sentence must be analysable, at some level, as an article followed by a noun, followed by at least one other word or phrase. The noun in the embeded sentence, and the sub-tree it dominates, must be exactly matched by the noun corresponding to the last element on the left-hand side of the rule. The initial article and the embeded sentence will be collected as a phrase under the symbol &amp;quot;DET&amp;quot; and the final noun will be left unchanged.</Paragraph>
    <Paragraph position="32"> The principal facilities available for writing rules have now been exemplified. Another kind of rule is also available which has a left-hand side like those already described but no equal sign or right-hand side. However it will be in the best interests of clarity to defer an explanation of how these rules are interpreted.</Paragraph>
    <Paragraph position="33"> The user of the program may write rules in exactly the form we have described or may addinformation to control the order in which the rules are applied.</Paragraph>
    <Paragraph position="34"> This additional information takes the form of an expression written before the rule and separated from it by a comma. This expression, in its turn, takes one of the following forms:</Paragraph>
    <Paragraph position="36"> n I in an integer which orde~ this rule relative to the others. Since the same integer can be assigned to more than one rule, the ordering is partial. Rules to which no number is explicitly assigned are given the number 0 by the program.</Paragraph>
    <Paragraph position="37"> n 2 and nx, when present, are interpreted as fol- J lows: Egery symbol in the sub-string matched by the left-hand side of the rule must have been produced by a rule with number i, where ng) i~ nq.</Paragraph>
    <Paragraph position="38"> For these purposes the symbols in the 5riginal family of strings offerred for analysis are treated as though they had been produced by a rule with number O.</Paragraph>
  </Section>
  <Section position="5" start_page="6" end_page="6" type="metho">
    <SectionTitle>
2. PHRASE-STRUCTURE GRAMMAR
</SectionTitle>
    <Paragraph position="0"> It will be clear from what has been said already that this program is an exceedingly powerful device capable of operating on strings and trees in a wide variety of ways. It would clearly be entirely adequate for analyzing sentences with a context-free phrase-structure grammar. ~ut this problem has been solved before, and much more simply. We have seen how the notation can be used to write context-sensitive rules, and we should therefore expect the program to be able to analyze sentences with a context-sensitive grammar. However in the design of parsing algorithms, as elsewhere, context-sensitive grammars turn out to be surprisingly more complicated than context-free grammars. The problem that context-sensitive grammars pose for this program can be shown.~with a simple example. I Consider the following in grammar:</Paragraph>
    <Paragraph position="2"> This grammar, though trivial, is well behaved in all important ways. The language generated, though regular and unambigious, is infinite.</Paragraph>
    <Paragraph position="3"> ii am indebted for this example, as for other ideas too numerous to document individually, to Susumu Kuno of Harvard University.</Paragraph>
    <Paragraph position="4"> Furthermore, every rule is useful for some derivation. Since the language generated is unambigious, thegrammar is necessarily cycle-free, in otherwords, it produces no derivation in which the same line occurs more than once. Suppose, however, that the gr~nmar is used for analysis and is presented With the string&amp;quot;A D E&amp;quot; -not a sentence of the language. The attempt to analyze this string using rules of the grammar resuits in a rewriting operation that begins as follows and continues indefinitely:  It would clearly be possible, in principal, to equip the program with a procedure for detecting cycles of this sort, but the timerequired by such a procedure, and the complexity that it would introduce into the program as a whole, are sufficient to rule it out of all practical consideration. It might be argued that the strings which have to be analyzed in practical situations come from real texts and can be assumed to be sentences. The problem of distinguishing sentences from nonsentences is of academic interest. But, in natural languages, the assignment of words to grammatical categories is notoriously ambigious and for this problem to arise it is enough for suitably ambigious words to come together in the sentence. A sentence which would be accepted by the above gram~nar, but which would also give rise to cycles in the analysis, might consist of words with the following grammatical categories:  The program, as it stands, contains no mechanism which automatically guards against cycles. However, if the user knows where they are likely to occur or discovers them as a result of his experience with the program, he can include some special rules in his grammar which will prevent them from occurring. These rules, which we have already eluded to, are formally similiar to all others except that they contain no equals sign and no right-hand side. When a P-marker is found to contain a string which matches the left-hand side of one of these rules, the program arranges that, thence forward, no other rule shall be allowed to apply to the whole string. The cycle in this latest example could not occur if the grammar contained the rule:</Paragraph>
  </Section>
class="xml-element"></Paper>