<?xml version="1.0" standalone="yes"?>
<Paper uid="J82-3003">
  <Title>Using Semantics in Non-Context-Free Parsing of Montague Grammar 1</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
2. Input-Refined Grammars
</SectionTitle>
    <Paragraph position="0"> We now switch our point of view and examine equivalence parsing not in algorithmic terms but in formal grammatical terms. This will then lead into showing how equivalence parsing relates to Universal 134 American Journal of Computational Linguistics, Volume 8, Number 3-4, July-December 1982 David Scott Warren and Joyce Friedman Using Semantics in Non-Context-Free Parsing Grammar (UG) (Montague 1970). The basic concept to be used is an input-refined grammar. We begin by defining this concept for context-free grammars and using it to relate the tabular context-free recognition algorithms of Earley 1970, Cocke-Kasami-Younger (Kasami 1965), and Sheil 1976 to each other and eventually to our algorithm.</Paragraph>
    <Paragraph position="1"> Given a context-free grammar G and a string s over the terminal symbols of G, we define from G and s a new grammar Gs, called an input-refinement of G.</Paragraph>
    <Paragraph position="2"> This new grammar G s will bear a particular relationship to G: L(Gs) = {s}nL(G), i.e., L(Gs) is the singleton set {s} if s is in L(G), and empty otherwise. Furthermore, there is a direct one-to-one relationship between the derivations of s in G and the derivations of s in G s. Thus the problem of recognizing s in G is reduced to the problem of determining emptiness for the grammar G s. Also, the problem of parsing s with respect to the grammar G reduces to the problem of exhaustive generation of the derivations of G s (there is at most one string). Each of the tabular context-free recognition algorithms can be viewed as implicitly defining this grammar G s and testing it for emptiness.</Paragraph>
    <Paragraph position="3"> Emptiness testing is essentially done by reducing the grammar, that is by eliminating useless symbols and productions. The table-constructing portion of a tabular recognition algorithm, in effect, constructs and reduces the grammar Gs, thus determining whether or not it is empty. The tabular methods differ in the construction and reduction algorithm used.</Paragraph>
    <Paragraph position="4"> In each case, to turn a tabular recognition method into a parsing algorithm, the table must first be constructed and then reprocessed to generate all the parses. This corresponds to reprocessing the grammar Gs t, the result of reducing the grammar Gs, and using it to exhaustively generate all derivations in G s.</Paragraph>
    <Paragraph position="5"> Rather than formally defining G s from a context-free grammar G and a string s in the general case, we illustrate the definition by example. The general definition should be clear.</Paragraph>
    <Paragraph position="6"> Let G be the following context-free grammar:  These productions for G s were constructed by beginning with a production of G, adding a subscript or a superscript to the nonterminal on the LHS to obtain a nonterminal of Gs, adding single subscripts to all terminals and sequence subscripts to some nonterminals on the RHS so that the concatenation of all subscripts on the RHS equals the subscript on the LHS. For the RHS nonterminals without subscripts, add the appropriate subscript. Also, to handle the terminals, for each t i add the production Ti~t where t is the i th symbol in s.</Paragraph>
    <Paragraph position="7"> It is straightforward to show inductively that if a nonterminal symbol generates any string at all it generates exactly the substring of s that its subscript determines. Symbols with superscripts generate the empty string. Also a parse tree of G s can be converted to a parse tree of G by first deleting all terminals (each is dominated by the same symbol with a subscript) and then erasing all superscripts and subscripts on all symbols in the tree. Conversely, any parse tree for s in G can be converted to a parse tree of s in G s by adding appropriate subscripts and superscripts to all the symbols of the tree and then adding the terminal symbols at the leaves.</Paragraph>
    <Paragraph position="8"> American Journal of Computational Linguistics, Volume 8, Number 3-4, July-December 1982 135 David Scott Warren and Joyce Friedman Using Semantics in Non-Context-Free Parsing It is clear that G s is not in general a reduced grammar. G s can be reduced to Gs ~ by eliminating unproductive and unreachable symbols and the rules involving them. Reducing the grammar will determine whether or not L(Gs) is empty. By the above discussion, this will determine whether s is in L(G), and thus an algorithm for constructing and reducing the refined grammar G s from G and s yields a recognition algorithm. Also, given the reduced grammar Gs I, it is straightforward, in light of the above discussion, to generate all parses of s in G: simply exhaustively generate the parse trees of Gs ~ and delete subscripts and superscripts.</Paragraph>
    <Paragraph position="9"> The tabular context-free recognition methods of Cocke-Kasami-Younger, Earley, and Sheil can all be understood as variations of this general approach. The C-K-Y recognition algorithm uses the standard bottom-up method to determine emptiness of G s. It starts with the terminals and determines which G s nonterminals are productive, eventually finding whether or not the start symbol is productive. The matrix it constructs is essentially the set of productive nonterminals of G s.</Paragraph>
    <Paragraph position="10"> Sheil's well-formed substring table algorithm is the most obviously and directly related. His simplest algorithm constructs the refined grammar and reduces it top-down. It uses a top-down control mechanism to determine the productivity only of nonterminals that are reachable from the start symbol. The well-formed substring table again consists essentially of the reachable, productive nonterminals of G s.</Paragraph>
    <Paragraph position="11"> Earley's recognition algorithm is more complicated because it simultaneously constructs and reduces the refined grammar. It can be viewed as manipulating sets of subscripted nonterminals and sets of productions of G s. The items on the item lists, however, correspond quite directly to reachable, productive nonterminals of G s.</Paragraph>
    <Paragraph position="12"> The concept of input-refined grammar provides a unified view of the tabular context-free recognition methods. Equivalence parsing as described in Part I above is also a tabular method, although it is not context-free. It applies to context-free grammars and also to some grammars such as PTQ that are not context-free. We next relate it to the very general class of grammars defined by Montague in UG.</Paragraph>
    <Paragraph position="13"> Universal Grammar and Equivalence Parsing In the following discussion of the problem of parsing in the general context of Montague's definitions of a language (which might more naturally be called a grammar) and an interpretation, we assume the reader is familiar with the definitions in UG (Montague 1970). We begin with a formal definition of a refinement of a general disambiguated language. A particular type of refinement, input-refinement, leads to an equivalence parsing algorithm. This generalizes the procedure for input-refining a grammar shown above for the special case of a context-free grammar. We then discuss the implications for equivalence parsing of using the formal interpretation of the language. Finally we show how the ATN for PTQ and semantic equivalence parsing fit into this general framework.</Paragraph>
    <Paragraph position="14"> Recall that a disambiguated language f~ = &lt;A, Fv, X 8, S, 80&gt;v~r,~a can be regarded as consisting of an algebra &lt;A,F~,&gt;~,eF, with proper expressions A and operations Fv, basic expressions X 8 for each category index d eA, a set of syntactic rules S, and a sentence category index 80EA. A language is a pair &lt;~2,R&gt; where ~2 is a disambiguated language and R is a binary relation with domain included in A. Given a disambig- null (Note that the proper expressions A, the operation indexing set F, and the operations Fy of ~ and 12 ~ are the same.) The word refinement refers to the fact that the catgories of lZ are split into finer categories. Condition 1 requires that the basic expressions of a refined category come from the basic expressions of the category it refines. Condition 2 requires that the new syntactic rules be consistent with the old ones. Note that Condition 2 is not a biconditional.</Paragraph>
    <Paragraph position="15"> If 12 t is a refinement of ~2 with refinement function d, &lt;C'8,&gt;~,,~, is the family of syntactic categories of ~2' and &lt;C0&gt;0E a is the family of syntactic categories of ~2, then C'~,-cCd(~, ).</Paragraph>
    <Paragraph position="16"> As a simple example of a refinement, consider an arbitrary disambiguated language ~2 t = &lt;A, Fy, Xts,, d0w&gt;yEr,8, Ea,. NOW let ~2 be the disambiguated language &lt;A, Fy, Xa, S, a&gt;yEi-, in which the set of category names is the singleton set {a}. X a = O~,EA, X~,. Let S be {&lt;Fr, &lt;a,a ..... a&gt;, a&gt; : yeF and the number of a's agrees with the arity of F}. Then f~ is a refinement of ~, with refinement function d:At-~{a}, d(8 ~) = a for all d~C/A ~. Note that the disambiguated language ~2 is completely determined by the algebra &lt;A,Fy&gt;yeF, and is the natural disambiguated language to associate with it. Thus in a formal sense, we can view a disambiguated language as a refinement of its algebra.</Paragraph>
    <Paragraph position="17"> 136 American Journal of Computational Linguistics, Volume 8, Number 3-4, July-December 1982 David Scott Warren and Joyce Friedman Using Semantics in Non-Context-Free Parsing As a more intuitive example of refinement, consider an English-like language with categories term (TE) and intransitive verb phrase (IV) that both include singular and plural forms. The language generated would then allow subject-verb disagreement (assuming the ambiguating relation R does not filter them out). By refining category TE to TEsing and TEpl and category IV to IVsing and IVpl, and having syntactic rules that combine category TEsing with IVsing and TEpl with IVpl only, we obtain a refined language that has subject-verb agreement. A similar kind of refinement could eliminate such combinations as &amp;quot;colorless green ideas&amp;quot;, if so desired.</Paragraph>
    <Paragraph position="18"> With this definition of refinement, we return to the problem of parsing a language L = &lt;~, R&gt;. The problem can now be restated: find an algorithm that, given a string ~, constructs a disambiguated language ~2~ that is an input-refinement of fL That is, f~ is a refinement in which the sentence category Cts, is exactly the set of parses of ~ in L. Finding this algorithm is equivalent to solving the parsing problem. For given such an algorithm, the parsing problem reduces to the problem of generating all members of C'80,.</Paragraph>
    <Paragraph position="19"> In the case of a general language &lt;~, R&gt;, it may be the case that for ~ a string, the input-refined language f~ has finitely many categories. In this case the reduced grammar can be computed and a recursive parsing algorithm exists. If the reduced grammar has infinitely many categories, then the string has infinitely many parses and we are not, in general, interested in trying to parse such languages. It may happen, however, that ~2~ has infinitely many categories, even though its reduction has only finitely many. In this case, we are not guaranteed a recursive parsing algorithm. However, if this reduced language can be effectively constructed, a recursive parsing algorithm still exists.</Paragraph>
    <Paragraph position="20"> The ATN for PTQ represents the disambiguated language for PTQ in the UG sense. The categories of this disambiguated language correspond to the set of possible triples: PTQ category name, contents of SENDR registers at a PUSH to that subnet, contents of the LIFTR registers at the corresponding POP. The input-refined categories include the remainder of the input string at the PUSH and POP. Thus the buckets in the recall table are exactly the input-refined categories. The syntactic execution method is thus an exhaustive generation of all expressions in the sentence category of the input-refined disambiguated language.</Paragraph>
    <Paragraph position="21"> Semantic Equivalence Parsing in OG In UG, Montague inclues a theory of meaning by providing a definition of interpretation for a language.</Paragraph>
    <Paragraph position="22"> Let L = &lt;&lt;A,F,r,Xs,S,t~0&gt;.rEF,SEA,R&gt; be a language.</Paragraph>
    <Paragraph position="23"> An interpretation ,t' for L is a system &lt;B,G~,,f&gt;3,EF such that &lt;B,Gv&gt;v~ r is an algebra similar to &lt;A,F./&gt;3,eF; i.e., for each ~, E F, Fy and G./ have the same number of arguments, and f is a function from O,EAX 8 into B. Note that the algebra &lt;B,G~,&gt;.rE F need not be a free algebra (even though &lt;A,Fy&gt;vC/ r must be). B is the set of meanings of the interpretation ,I,; Gv is the semantic rule corresponding to syntactic rule Fv; f assigns meanings to the basic expressions Xv. The meaning assignment for L determined by if' is the unique homomorphism g from &lt;A,F.r&gt;~,EF into &lt;B,Gy&gt;,/E F that is an extension of f.</Paragraph>
    <Paragraph position="24"> There are two ways to proceed in order to find all the meanings of a sentence ~ in a language L = &lt;f~, R&gt; with interpretation ~. The first method is to generate all members of the sentence category Cts0 , of the input-refined language ~2~. As discussed above, this is done in the algebra &lt;A,F./&gt;~,cF of ~, using the syntactic functions Fv to inductively construct members of A from the basic categories of f~ and members of A constructed earlier and then applying g. The second method is to use the fact that g is a homomorphism from &lt;A,F.~&gt;~,EF into &lt;B,G./&gt;~ F. Because g is a homomorphism, we can carry out the construction of the image of the sentence category entirely in the algebra &lt;B,G~,&gt;~,eF of the interpretation q'. We may use the G functions to construct inductively members of B from the basic semantic categories, that is, the images under g (and f) of the basic syntactic categories, and members of B already constructed. The advantage of carrying out the construction in the algebra of ,t, is that this algebra may not be free, i.e., some element of B may have multiple construction sequences. By carrying out the construction there, such instances can be noticed and used to advantage, thus eliminating some redundant search. There are additional costs, however, associated with parsing in the interpretation algebra q'. Usually, the cost of evaluating a G function in the semantic algebra is greater than the cost of the corresponding F function in the syntactic algebra. Also in semantic parsing, each member of B as it is constructed is compared to the other members of the same refined category that were previously constructed.</Paragraph>
    <Paragraph position="25"> In the PTQ parsing system discussed above, the interpretation algebra is the set of reduced translations. The semantic functions are those obtained from the functions given in the T-rules in PTQ, and reducing and extensionalizing their results. The directed process version of the parser finds the meanings in this algebra by the first method, generating all parses in the syntactic algebra and then taking their images under the interpretation homomorphism. Semantic equivalence parsing for PTQ uses the second method, carrying out the construction of the meaning entirely within the semantic algebra. The savings in the example sentence Bill walks and Mary runs comes about American Journal of Computational Linguistics, Volume 8, Number 3-4, July-December 1982 137 David Scott Warren and Joyce Friedman Using Semantics in Non-Context-Free Parsing because the algebra of reduced translations is not a free algebra, and the redundant search thus eliminated more than made up for the increase in the cost of translating and comparing formulas.</Paragraph>
    <Paragraph position="26"> Summary We have described a parsing algorithm for the language of PTQ viewed as consisting of two parts, a nondeterministic program and an execution method.</Paragraph>
    <Paragraph position="27"> We showed how, with only a change to an equivalence relation used in the execution method, the parser becomes a recognizer. We then discussed the addition of the semantic component of PTQ to the parser. With again only a change to the equivalence relation of the execution method, the semantic parser is obtained.</Paragraph>
    <Paragraph position="28"> The semantic equivalence relation is equality (to within change of bound variable) of reduced extensionalized translations. Examples were given to compare the two parsing methods.</Paragraph>
    <Paragraph position="29"> In the finalportion of the paper we described how the parsing method initially presented in procedural terms can be viewed in formal grammatical terms.</Paragraph>
    <Paragraph position="30"> The notion of input-refinement for context-free grammars was introduced by example, and the tabular context-free recognition algorithms were described in these terms. We then indicated how this notion of refinement can be extended to the UG theory of language and suggested how our semantic parser is essentially parsing in the algebra of an interpretation for the PTQ language.</Paragraph>
  </Section>
class="xml-element"></Paper>