<?xml version="1.0" standalone="yes"?>
<Paper uid="P91-1015">
  <Title>Head Corner Parsing for Discontinuous Constituency</Title>
  <Section position="3" start_page="115" end_page="117" type="metho">
    <SectionTitle>
2 A sample grammar
</SectionTitle>
    <Paragraph position="0"> In this section I present a simple F-LCFR grammar for a (tiny) fragment of Dutch. As a caveat I want to stress that the purpose of the current section is to provide an example of possible input for the parser to be defined in the next section, rather than to provide an account of phenomena that is completely satisfactory from a linguistic point of view.</Paragraph>
    <Paragraph position="1"> Grammar rules are written as (pure) Prolog clauses. 1 Heads select arguments using a subcat list. Argument structures are specified lexically and are percolated from head to head. Syntactic features are shared between heads (hence I make the simplifying assumption that head functor, which may have to be revised in order to treat modification). In this grammar I use revised versions of Pollard's head wrapping operations to analyse cross serial dependency and verb second constructions. For a linguistic background of these constructions and analyses, cf.</Paragraph>
    <Paragraph position="2"> Evers (1975), Koster (1975) and many others.</Paragraph>
    <Paragraph position="4"> (for lexical entries), where Head represents the designated head daughter, Mother the mother category and Other a list of the other daughters.</Paragraph>
    <Paragraph position="5"> Each category is a term x(Syn,Subcat,Phon,Sem,Rule) where Syn describes the part of speech, Subcat 1 It should be stressed though that other unification grammar formalisms can be extended quite easily to encode the same grammar. I implemented the algorithm for several grammars written in a version of PATR II without built-in string concate~aation.</Paragraph>
    <Paragraph position="6"> is a list of categories a category subcategorizes for, Phon describes the string that is dominated by this category, and Sere is the argument structure associated with this category. Rule indicates which rule (i.e. version of the combine predicate eb to be defined below) should be applied; it generalizes the 'Order' feature of UCG. The value of Phon is a term p(Left,Head,RPSght) where the fields in this term are difference lists of words.</Paragraph>
    <Paragraph position="7"> The first argument represents the string left of the head, the second argument represents the head and the third argument represents the string right of the head. Hence, the string associated with such a term is the concatenation of the three arguments from left to right. There is only one parameterized, binary branching, rule in the grammar: null</Paragraph>
    <Paragraph position="9"> cb(R, PI, P2, P).</Paragraph>
    <Paragraph position="10"> In this rule the first element of the subcategorization list of the head is selected as the (only) other daughter of the mother of the rule. The syntactic and semantic features of the mother and the head are shared. Furthermore, the strings associated with the two daughters of the rule are to be combined by the cb predicate. For simple (left or right) concatenation this predicate is defined as follows: cb(left, p(L4-L.H,R),</Paragraph>
    <Paragraph position="12"> p(L,H,RI-R)).</Paragraph>
    <Paragraph position="13"> Although this looks horrible for people not familiar with Prolog, the idea is really very simple. In the first case the string associated with the argument is appended to the left of the string left of the head; in the second case this string is appended to the right of the string right of the head. In a friendlier notation the examples may look like:</Paragraph>
    <Paragraph position="15"> Lexical entries for the intransitive verb 'slaapt' (sleeps) and the transitive verb 'kust' (kisses) are defined as follows:</Paragraph>
    <Paragraph position="17"> kiss(A,B),_)).</Paragraph>
    <Paragraph position="18"> Proper nouns are defined as: rule( x(n, \[\] ,p(P-P, \[pier \[T\]-T,R-R), pete,_)).</Paragraph>
    <Paragraph position="19"> and a top category is defined as follows (complementizers that have selected all arguments, i.e. sentences): top(x(comp,\[\] ...... )).</Paragraph>
    <Paragraph position="20"> Such a complementizer, eg. 'dat' (that) is defined as:</Paragraph>
    <Paragraph position="22"> The choice of datastructure for the value of Phon allows a simple definition of the verb raising (vr) version of the combine predicate that may be used for Dutch cross serial dependencies: cb(vr, p(L1-L2,H,R3-R), p(L2-L,R1-R2,R2-R3), p(L1-L,H,R1-R)).</Paragraph>
    <Paragraph position="23"> Here the head and right string of the argument are appended to the right, whereas the left string of the argument is appended to the left. Again, an illustration might help:</Paragraph>
    <Paragraph position="25"> A raising verb, eg. 'ziet' (sees) is defined as: rule(x(v,\[x(n, \[\] ,_,InfSubj,left),</Paragraph>
    <Paragraph position="27"> see(A,B),_)).</Paragraph>
    <Paragraph position="28"> In this entry 'ziet' selects -- apart from its npsubject -- two objects, a np and a VP (with category inf). The inf still has an element in its subcat list; this element is controlled by the np (this is performed by the sharing of InfSubj). To derive the subordinate phrase 'dat jan piet marie ziet kussen' (that john sees pete kiss mary), the main verb 'ziet' first selects its rip-object 'piet' resulting in the string 'piet ziet'. Then it selects the infinitival 'marie kussen'. These two strings are combined into 'piet marie ziet kussen' (using the vr version of the cb predicate). The subject is selected resulting in the string 'jan pier marie ziet kussen'. This string is selected by the complementizer, resulting in 'dat jan piet marie ziet kussen'. The argument structure will be instantiated as that (sees (j elm, kiss (pete, mary))). In Dutch main clauses, there usually is no overt complementizer; instead the finite verb occupies the first position (in yes-no questions), or the second position (right after the topic; ordinary declarative sentences). In the following analysis an empty complementizer selects an ordinary (finite) v; the resulting string is formed by the following definition of C/b: cb(v2, p(A-A,B-B,C-C), p(R1-R2,H,R2-R), p(A-A,H,RI-R)).</Paragraph>
    <Paragraph position="29"> which may be illustrated with:</Paragraph>
    <Paragraph position="31"> The finite complementizer is defined as: xatle(xCcomp, \[xCv, FI ,_,A,v2)\], p(B-B,C-C,D-D), that (A),_)).</Paragraph>
    <Paragraph position="32"> Note that this analysis captures the special relationship between complementizers and (fronted) finite verbs in Dutch. The sentence 'ziet jan piet marie kussen' is derived as follows (where the head of a string is represented in capitals): inversion: ZIET jan piet marie kussen /\ e left: jan piet marie ZIET kussen /\ raising: piet marie ZIET kussen JAN /\ left: piet ZIET left: marie KUSSEN /\ /\</Paragraph>
  </Section>
  <Section position="4" start_page="117" end_page="119" type="metho">
    <SectionTitle>
3 The head corner parser
</SectionTitle>
    <Paragraph position="0"> This section describes the head-driven parsing algorithm for the type of grammars described above. The parser is a generalization of a left-corner parser. Such a parser, which may be called a 'head-corner' parser, ~ proceeds in a bottom-up way. Because the parser proceeds from head to head it is easy to use powerful top-down predictions based on the usual head feature percolations, and subcategorization requirements that heads require from their arguments.</Paragraph>
    <Paragraph position="1"> In left-corner parsers (Matsumoto et aL, 1983) the first step of the algorithm is to select the left2This name is due to Pete White.lock.</Paragraph>
    <Paragraph position="2"> most word of a phrase. The parser then proceeds by proving that this word indeed can be the left-corner of the phrase. It does so by selecting a rule whose leftmost daughter unifies with the category of the word. It then parses other daughters of the rule recursively and then continues by connecting the mother category of that rule upwards, recursively. The left-corner algorithm can be generalized to the class of grammars under consideration if we start with the seed of a phrase, instead of its leftmost word. Furthermore the connect predicate then connects smaller categories upwards by unifying them with the head of a rule. The first step of the algorithm consists of the prediction step: which lexical entry is the seed of the phrase? The first thing to note is that the words introduced by this lexical entry should be part of the input string, because of the nonerasure requirement (we use the string as a 'guide' (Dymetman ef al., 1990) as in a left-corner parser, but we change the way in which lexical entries 'consume the guide'). Furthermore in most linguistic theories it is assumed that certain features are shared between the mother and the head. I assume that the predicate head/2 defines these feature percolations; for the grammar of the foregoing section this predicate may be defined as: head(x(Syn ..... Sent,_), x(ST- ..... Sn,_)).</Paragraph>
    <Paragraph position="3"> As we will proceed from head to head these features will also be shared between the seed and the top-goal; hence we can use this definition to restrict lexical lookup by top-down prediction. 3 The first step in the algorithm is defined as:</Paragraph>
    <Paragraph position="5"> string(SmallCat,Words), subset(Words,PO,P).</Paragraph>
    <Paragraph position="6"> Instead of taking the first word from the current input string, the parser may select a lexical en3In the general case we need to compute the transitive closure of (restrictions of) pcesible mother-head relationships. The predicate 'head may also be used to compile rules into the format adopted here (i.e. using the definition the compiler will identify the head of a rule).  try dominating a subset of the words occuring in the input string, provided this lexical entry can be the seed of the current goal. The predicate subset(L1,L2,L3) is true in case L1 is a subset of L2 with complement L3. 4 The second step of the algorithm, the connect part, is identical to the connect part of the left-corner parser, but instead of selecting the left-most daughter of a rule the head-corner parser selects the head of a rule: connect(X,X,P,P).</Paragraph>
    <Paragraph position="7"> connect(Small,Big,PO,P) :rule(Small, Mid, Others), parse_rest(Others,PO,Pl), connect(Mid,Big,PI,P).</Paragraph>
    <Paragraph position="8"> parse_rest( \[\] ,P,P).</Paragraph>
    <Paragraph position="9"> parse_rest(\[HlT\],PO,P) :parse(H,PO,P1), null parse_rest(T,P1,P).</Paragraph>
    <Paragraph position="10"> The predicate 'start_parse' starts the parse process, and requires furthermore that the string associated with the category that has been found spans the input string in the right order.</Paragraph>
    <Paragraph position="11"> start_parse (String, Cat) : top(Cat), null parse (Cat, String, \[\] ), string(Cat, String).</Paragraph>
    <Paragraph position="12"> The definition of the predicate 'string' depends on the way strings are encoded in the grammar. The predicate relates linguistic objects and the string they dominate (as a list of words). I assume that each grammar provides a definition of this predicate. In the current grammar string/2 is defined as follows:  selectchk(H, P0,Pl), subset(T, PI,P).</Paragraph>
    <Paragraph position="13"> select.chk (El, \[El IP\] ,P) :!. null select_chk (El, \[HIP0\], \[HIP\] ) :select.chk (El, P0, P) .</Paragraph>
    <Paragraph position="14">  The cut in select.chkls necessary in case the same word occurs twice in the input string; without it the parser would not be 'minima/'; this could be changed by indexins words w.r.t, their position, hut I will not assume this complication here, string(x( .... Phon .... ),Str):copy_term(Phon,Phon2), null str(Phon2,Str).</Paragraph>
    <Paragraph position="15"> str(p(P-P1,P1-P2,P2-\[\]),P).</Paragraph>
    <Paragraph position="16"> This predicate is complicated using the predicate copy_term/2 to prevent any side-effects to happen in the category. The parser thus needs two grammar specific predicates: head/2 and string/2.</Paragraph>
    <Paragraph position="17"> Example. To parse the sentence 'dat jan slaapt', the head corner parser will proceed as follows. The first call to 'parse' will look like: parse (x(colap, \[\] ...... ), \[dat, j an, slaapt\], \[\] ) The prediction step selects the lexical entry 'dat'. The next goal is to show that this lexical entry is the seed of the top goal; furthermore the string that still has to be covered is now \[jan,slaapt\]. Leaving details out the connect clause looks as : connect ( x(comp, Ix(v,.. ,right)\],.. ), x(comp, 17,.. ), \[jan, slaapt\], \[\] ) The category of dat has to be matched with the head of a rule. Notice that dat subcategorises for a v with rule feature right. Hence the right version of the cb predicate applies, and the next goal is to parse the v for which this complementizer subcategorizes, with input 'jan, slaapt'. Lexical lookup selects the word slaapt from this string. The word slaapt has to be shown to be the head of this v node, by the connect predicate. This time the left combination rule applies and the next goal consists in parsing a np (for which slaapt subcategorizes) with input string jan. This goal succeeds with an empty output string. Hence the argument of the rule has been found successfully and hence we need to connect the mother of the rule up to the v node. This succeeds trivially, and therefore we now have found the v for which dat subcategorizes. Hence the next goal is to connect the complementizer with an empty subcat list up to the topgoal; again this succeeds trivially. Hence we obtain the instantiated version of the parse call:  parse(x(comp, \[\] ,p(P-P, \[dat IT\]-T, \[jan,slaapt \[q\]-q), that (sleeps (j ohn) ),_), \[dat, j an, slaapt\], O ) and the predicate start_parse will succeed, yielding: Cat = x(comp, \[\] ,p(P-P, \[dat \[T\]-T, \[jan, slaapt IQ\]-q), that (sleeps (john) ), _)</Paragraph>
  </Section>
class="xml-element"></Paper>