<?xml version="1.0" standalone="yes"?> <Paper uid="W91-0113"> <Title>COMPILING TRACE &amp; UNIFICATION GRAMMAR FOR PARSING AND GENERATION</Title> <Section position="4" start_page="102" end_page="105" type="metho"> <SectionTitle> 3 PROCESSING TRACE &amp; UNIFICATION GRAMMAR </SectionTitle> <Paragraph position="0"> TUG can be processed by a parser and a generator. Before parsing and generation, the grammar is compiled to a more efficient form.</Paragraph> <Paragraph position="1"> 5Currently, only conjunction of equations is allowed in the definition of bounding nodes.</Paragraph> <Paragraph position="2"> The first compilation step is common to generation and parsing. The attribute-value-pair structure is transformed to (PROLOG) term structure by a TUG-to-DCG converter. This transformation makes use of the type definitions. As an example consider the transformation of the grammar</Paragraph> <Paragraph position="4"> It is transformed to the following grammar in a DCG-like format6.</Paragraph> <Paragraph position="6"> The compilation steps following the TUG-to-DCG converter are different for parsing and generation.</Paragraph> <Section position="1" start_page="103" end_page="105" type="sub_section"> <SectionTitle> 3.1 THE PARSER GENERATOR </SectionTitle> <Paragraph position="0"> In the LKP, a TUG is processed by a Tomita parser (Tomita 1986). For usage in that parser the result of the TUG-to-DCG converter is compiled in several steps: First, head movement rules are eliminated and the grammar is expanded by introducing slash rules for the head path by the head movement expander. Suppose the TUG-to-DCG converter has produced the following fragment: 6Note that a goal in curly braces is interpreted as a constraint and not as a PROLOG goal as in DCGs. 
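The term conversion described above can be pictured as mapping each feature of an attribute-value structure onto a fixed argument position determined by the category's type definition. A minimal Python sketch of this idea (the categories, feature names, and dictionary representation are our own illustration, not the LKP implementation):

```python
# Sketch: compile attribute-value structures to fixed-arity tuples ("terms"),
# using a type definition that fixes the argument order per category.
# The categories and features below are invented for illustration.

TYPE_DEFS = {
    "np": ["agr", "case"],      # np(Agr, Case)
    "vp": ["agr"],              # vp(Agr)
}

def avm_to_term(cat, avm):
    """Map a feature dictionary onto the argument positions fixed by the
    category's type definition; features not mentioned stay as anonymous
    variables (represented here as None)."""
    return (cat,) + tuple(avm.get(f) for f in TYPE_DEFS[cat])

print(avm_to_term("np", {"case": "nom"}))   # ('np', None, 'nom')
print(avm_to_term("vp", {"agr": "3sg"}))    # ('vp', '3sg')
```

The point of the fixed-arity encoding is that feature access becomes plain term unification at a known argument position, with no attribute lookup at parse time.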
See Block/Schmid (1991) for the evaluation of the constraints.</Paragraph> <Paragraph position="1"> v(_) is_head_of vk(_).</Paragraph> <Paragraph position="2"> vk(_) is_head_of vp(_).</Paragraph> <Paragraph position="3"> vp(_) is_head_of s(_).</Paragraph> <Paragraph position="5"> Then, the head movement expander introduces slash rules7 along the head-path, thereby introducing the empty nonterminals push(X) and pop(X).</Paragraph> <Paragraph position="6"> rules so far</Paragraph> <Paragraph position="8"> v_v(V) ---> \[pop(v(V))\].</Paragraph> <Paragraph position="9"> empty productions for push and pop push(X) ---> \[\].</Paragraph> <Paragraph position="10"> pop(X) ---> \[\].</Paragraph> <Paragraph position="11"> push(X) and pop(X) are &quot;marker rules&quot; (Aho/Sethi/Ullman 1986) that invoke the parser to push and pop their argument onto and off a left-to-right stack. This treatment of head movement leads to a twofold prediction in the Tomita parser. First, the new slash categories will lead to LR parsing tables that predict that the verb will be missing if rule s1 ---> ... has applied. Second, the feature structure of the verb is transported to the right on the left-to-right stack. Therefore, as soon as a v_v is expected, the whole information of the verb, e.g. its subcategorization frame, is available. This strategy leads to a considerable increase in parsing efficiency.</Paragraph> <Paragraph position="12"> In the next compilation phase, argument movement rules are transformed to the internal format. For the control of gaps a gap-threading mechanism is introduced. Following Chen e.a. 
(1988), the gap features are designed as multisets, thus allowing crossing binding relations as mentioned in section 2.</Paragraph> <Paragraph position="13"> 7A slashed category X|Y is represented using the underscore character: X_Y.</Paragraph> <Paragraph position="15"> To see the effect of this compilation step, take the following fragment as output of the head movement expander.</Paragraph> <Paragraph position="16"> bounding_node(s(_)).</Paragraph> <Paragraph position="18"> trace(np(_)).</Paragraph> <Paragraph position="19"> The argument movement expander transforms this to the following grammar.</Paragraph> <Paragraph position="21"> The predicates cut_trace/3 and bound/1 are defined as in Chen e.a. (1988).</Paragraph> <Paragraph position="22"> The next step, the empty production eliminator, eliminates all empty productions except those for push and pop. This transforms the output of the argument movement expander to the following grammar.</Paragraph> <Paragraph position="24"> v(Gi,\[trace(_,np(SP))|Go\],V).</Paragraph> <Paragraph position="25"> Elimination of empty productions allows for a simpler implementation of the Tomita parser, which again leads to increased efficiency.</Paragraph> <Paragraph position="26"> The next step, the DCG-to-LRK converter, splits the grammar rules into a context-free and a DCG part. A context-free rule is represented as rule(No,LHS,RHS), a DCG rule as dcg_rule(No,LHS,RHS,Constraint). Rules are synchronized by their numbers. 
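The split into rule/3 and dcg_rule/4 facts synchronized by rule numbers, with rules sharing a context-free backbone merged into a single rule whose constraints are disjoined, can be sketched roughly as follows (the representation and rule names are invented for illustration):

```python
# Sketch: split DCG-style rules into a context-free backbone plus a
# constraint table, synchronized by rule number. Rules that share the
# same backbone are merged into one CF rule with disjoined constraints.

def split_rules(dcg_rules):
    cf_rules, constraints, index = [], {}, {}
    for lhs, rhs, constraint in dcg_rules:
        backbone = (lhs, tuple(rhs))
        if backbone in index:                 # same CF backbone: disjoin
            no = index[backbone]
            constraints[no] = ("or", constraints[no], constraint)
        else:
            no = len(cf_rules) + 1            # fresh synchronized number
            index[backbone] = no
            cf_rules.append((no, lhs, list(rhs)))
            constraints[no] = constraint
    return cf_rules, constraints

cf, cons = split_rules([
    ("vp", ["v"], "intrans(V)"),
    ("vp", ["v"], "trace_obj(V)"),    # same backbone, different constraint
    ("s",  ["np", "vp"], "agree(NP,VP)"),
])
# cf   == [(1, 'vp', ['v']), (2, 's', ['np', 'vp'])]
# cons[1] == ('or', 'intrans(V)', 'trace_obj(V)')
```

The merge is what lets the LR table treat several feature-distinct rules as one context-free production, deferring the feature disjunction to constraint evaluation.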
After this step the above grammar fragment is represented in the following format.</Paragraph> <Paragraph position="27"> rule(1, s1, \[np, s\]).</Paragraph> <Paragraph position="28"> rule(2, s1, \[s\]).</Paragraph> <Paragraph position="29"> rule(3, s, \[np, vp\]).</Paragraph> <Paragraph position="30"> rule(4, s, \[vp\]).</Paragraph> <Paragraph position="31"> rule(5, vp, \[v\]).</Paragraph> <Paragraph position="32"> true).</Paragraph> <Paragraph position="33"> Note that during this step, different rules that share the same context-free backbone are transformed to a single context-free rule. The difference in their feature structure is expressed in a disjunction in the Constraint (e.g. rule 5). As traces very often occur in optional positions (e.g. objects, as in vp ---> v. vp ---> v, np), the elimination of empty productions (traces) considerably reduces the number of edges the parser has to build.</Paragraph> <Paragraph position="34"> After these compilation steps the context-free rules are transformed to YACC format and YACC is used to compute the LR parsing table. Finally, YACC's y.output file is transformed to PROLOG.</Paragraph> </Section> </Section> <Section position="5" start_page="105" end_page="107" type="metho"> <SectionTitle> 3.2 THE GENERATOR GENERATOR </SectionTitle> <Paragraph position="0"> For generation with TUG an improved version of the semantic-head-driven generator (SHDG) (see Shieber e.a. 1990) is used. Before being used for generation, the grammar is transformed in the following steps: * expansion of head movement rules * transformation to the semantic-head-driven generator format * expansion of movement rules * elimination of nonchain rules with uninstantiated semantics * goal reordering and transformation to executable PROLOG code First, the head movement expander transforms the head movement rules. As in the parser generator, slashed categories are generated along the head path, but no push and pop categories are introduced. 
Instead, the head movement rule and the trace are treated similarly to argument movement. The resulting relevant new rules from the example above are: newly introduced slash rules</Paragraph> <Paragraph position="2"> vk_v(VZ) ---> \[...,v_v(V),...\].</Paragraph> <Paragraph position="3"> trace(_,v_v(V)).</Paragraph> <Paragraph position="4"> In the next step rule symbols are transformed to the node(Cat,S,S0) format needed by the semantic-head-driven generator. Thereby disjunctions on the semantic argument as in the following example</Paragraph> <Paragraph position="6"> are unfolded (multiplied out) to different rules.</Paragraph> <Paragraph position="7"> The output of this step for the above rule is:</Paragraph> <Paragraph position="9"> node(b(SSem),S,S1), node(c(Sem),S1,S0).</Paragraph> <Paragraph position="10"> Obviously, unfolding of semantic disjunctions is necessary for a correct choice of the semantic head. The next compilation cycle expands the movement rules. As in the parser generator, two arguments for gap threading are introduced. The filling of the arguments and the transformation of the movement rules is different from the parser generator. It is a rather complicated operation which is sensitive to the semantics control flow. Given a rule a(A) ---> b(B)<trace(var,b(BT)), c(C) we can distinguish two cases: 1) The rule is a nonchain rule in the sense of Shieber e.a. (1990), or it is a chain rule and the antecedent of the trace is the semantic head. In this case the antecedent has to be generated prior to the trace. A typical example is a predicate logic analysis as in:</Paragraph> <Paragraph position="12"> S,S1), node(s(SemIn),S1,S0). As the antecedent carries the semantic information, it is expanded at the landing site. 2) The rule is a chain rule and the antecedent of the trace is not the semantic head; then this head has to be generated prior to the antecedent. As the head might contain the trace, it also has to be generated prior to its antecedent. 
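The unfolding of semantic disjunctions mentioned above can be sketched as producing one copy of a rule per disjunct, so that each resulting rule has a single, matchable semantics on its left-hand side. A toy Python sketch (the tuple representation of rules is our own simplification):

```python
# Sketch: unfold a disjunction on the semantic argument of a rule's
# left-hand side into several rules, one per disjunct. A disjunction
# is represented here as a list of alternative semantics terms.

def unfold_sem_disjunction(rule):
    (cat, sem), rhs = rule
    if not isinstance(sem, list):        # no disjunction: keep rule as is
        return [rule]
    # one copy of the rule per disjunct of the LHS semantics
    return [((cat, alt), rhs) for alt in sem]

rule = (("a", ["decl(S)", "imp(S)"]), [("b", "S")])
for r in unfold_sem_disjunction(rule):
    print(r)
# (('a', 'decl(S)'), [('b', 'S')])
# (('a', 'imp(S)'), [('b', 'S')])
```

After unfolding, each rule's left-hand-side semantics has a single functor, which is what the semantic-head-driven generator needs to index rules and pick the semantic head correctly.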
Consider the rule: node(s1(Sem),S,S0) ---> node(np(NPSem)<trace(var,np(NPSem)), S,S1), node(s(Sem),S1,S0).</Paragraph> <Paragraph position="13"> In this rule s is generated prior to np. Within s, the trace of np will be generated. Following the suggestion in Shieber e.a. (1990), rules like this are compiled in such a way that an antecedent is generated in the trace position without linking it to the input string. This antecedent is then added to the set of gaps together with its starting and ending position (coded as a difference list). When generation comes to the landing site, the antecedent is cut out of the trace set. Thereby its starting and ending position is unified with the landing site's start and end positions. The translation of the above rule is: node(np(G,G,NPSem),S,S0).</Paragraph> <Paragraph position="14"> In the next step, a certain class of nonchain rules is eliminated from the grammar. One of the basic inefficiencies of the semantic-head-driven generator in Shieber e.a. (1990) has its origin in nonchain rules whose left-hand-side semantics is a variable. This kind of nonchain rule often results from empty productions or lexicon entries of semantically empty words. For instance, in a grammar and lexicon fragment like vk(SC)/Sem ---> aux(VKSem,SC,VKSC)/Sem, vk(VKSC)/VKSem. aux(VKSem,SC,SC)/past(VKSem) ---> \[has\]. aux(Sem,SC,\[Subj|SC\])/Sem ---> \[is\]. the rule introducing is is a nonchain rule whose semantics is a variable and thus cannot be indexed properly. Rules like this one are eliminated by a partial evaluation technique. For each grammar rule that contains the left-hand side of the rule on its right-hand side, a copy of the rule is produced where the variables are unified with the left-hand side of the nonchain rule and the corresponding right-hand-side element is replaced with the right-hand side of the nonchain rule. E.g. 
insertion of the rule for is into the vk-rule above leads to vk(SC)/Sem ---> \[is\], vk(\[Subj|SC\])/Sem. which is a normal chain rule.</Paragraph> <Paragraph position="15"> A final compilation transforms the rules to executable PROLOG code and sorts the right-hand side to achieve a proper semantics information flow. Suppose that, in the following nonchain rule, the first argument of a category is its semantics argument. node(a(f(Sem)),S,S0) ---> node(b(BSem),S,S1), node(c(CSem,BSem),S1,S2), node(d(Sem,CSem),S2,S0).</Paragraph> <Paragraph position="16"> The right-hand side has to be ordered in such a way that all semantics arguments have a chance to be instantiated when the corresponding category is expanded, as in the following rule: node(a(f(Sem)),S,S0) ---> node(d(Sem,CSem),S2,S0), node(c(CSem,BSem),S1,S2), node(b(BSem),S,S1).</Paragraph> <Paragraph position="17"> This ordering is achieved by a bubble-sort-like mechanism. Elements of the right-hand side are sorted into the new right-hand side from right to left. To insert a new element e_new into an (already sorted) list e_1 ... e_i, e_new is inserted into e_1 ... e_i-1 if the semantics argument of e_new is not equal to some argument of e_i; otherwise it is sorted after e_i.</Paragraph> <Paragraph position="18"> In the final PROLOG code nonchain rules are indexed by the functor of their left-hand side's semantics as in the following example.</Paragraph> <Paragraph position="19"> s1,s2)), generate(BSem,node(b(BSem),S,S1)), a(node(a(Sem),S,S0),Exp).</Paragraph> <Paragraph position="20"> The auxiliary predicates needed for the generator then can be reduced to bottom-up termination rules C(X,X) for all syntactic category symbols C and the predicate for generate/2:</Paragraph> </Section> </Paper>