<?xml version="1.0" standalone="yes"?>
<Paper uid="E87-1034">
  <Title>DISCONTINUOUS CONSTITUENTS IN TREES, RULES, AND PARSING</Title>
  <Section position="4" start_page="206" end_page="208" type="metho">
    <SectionTitle>
4. DPSG and parsing
</SectionTitle>
    <Paragraph position="0"> From a parser's point of view, a definition of adjacency as given in (24) is not sufficient, since it only applies to nodes within the context of a tree. A parser has the job of constructing such a tree from a collection of substructures that may or may not fit together to form one or more trees for the entire sentence. Whether a number of subtrees fit together is not easy to determine if the end product may be a tree with discontinuities, since the adjacency relation defined by (20) and (24) allows neighbouring nodes to have common daughters. This is clearly undesirable.</Paragraph>
    <Paragraph position="1"> We therefore modify the definition (20) of adjacency by adding the requirement that two substructures (or their top nodes) can only have a precedence relation if they do not share any constituents: (29) A node x in a collection of substructures for a potential tree (possibly with discontinuities) is to the left of a node y in the same collection if and only if x's leftmost daughter is left of y's leftmost daughter, and there is no node z which is shared by x and y.</Paragraph>
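Definition (29) can be illustrated with a minimal Python sketch. The `Node` class, its field names, and the way constituents are named (category plus start vertex) are assumptions made for illustration only; they are not taken from the paper. The position of the leftmost daughter is represented by a start vertex, and "sharing" is approximated by a non-empty intersection of the sets of dominated constituents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical substructure: a label, the start vertex of its leftmost
    daughter, and the names of all constituents it dominates."""
    label: str
    leftmost: int
    dominated: frozenset


def left_of(x, y):
    """Sketch of (29): x is left of y iff x's leftmost daughter precedes
    y's leftmost daughter and x and y share no constituent."""
    if not x.dominated.isdisjoint(y.dominated):
        return False  # a shared node z rules out any precedence relation
    return y.leftmost > x.leftmost


# Substructures for the input "V DET N PART" discussed later in this section;
# the NP is internal context of the VP and is therefore not dominated by it.
vp = Node("VP", 1, frozenset({"V@1", "PART@4"}))
np_ = Node("NP", 2, frozenset({"DET@2", "N@3"}))
# An invented substructure that shares PART with the VP.
clash = Node("X", 2, frozenset({"N@3", "PART@4"}))
```

Under these assumptions `left_of(vp, np_)` holds although the VP spans the NP, whereas `left_of(vp, clash)` does not, because the two substructures share a constituent.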
    <Paragraph position="2"> If the nodes x and y in this definition belong to the same tree, the additional requirement that x and y do not share any constituent is automatically satisfied, due to the &quot;single mother&quot; condition. A parser for DPSG meets certain complications which do not arise in context-free parsing. To see these complications, we consider what would happen when a chart parser for context-free parsing (see Winograd, 1983) is applied to DPSG.</Paragraph>
    <Paragraph position="3"> Context-free chart parsing is a matter of fitting adjoining pieces together in a chart. For example, consider the grammar:  Given the arc V(1,2) in the chart, we look up all those rules which have a &quot;free&quot; V as the first constituent. These rules are placed in a separate list, the &quot;active-rule list&quot;. We &quot;bind&quot; the V's in these rules to the V(1,2) arc, i.e. we establish links between them. When all constituents in a rule are bound, the rule is applied. In this case, the VP(1,2) will be built. This procedure is repeated for the new VP node. When nothing more can be done, we move on in the chart. The final result in this example is the chart (32).</Paragraph>
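The context-free procedure just described can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the example grammar did not survive in this text, so `GRAMMAR` below is a hypothetical stand-in, and the names `chart_parse` and `advance` are assumptions. Arcs are triples (category, start vertex, end vertex), and an entry on the active-rule list records how many constituents of a rule are bound and which span they cover.

```python
def chart_parse(grammar, tags):
    """Bottom-up context-free chart parsing with an active-rule list.
    Vertices are numbered from 1, so the first word spans (1, 2)."""
    chart, active = set(), set()
    agenda = [(cat, i + 1, i + 2) for i, cat in enumerate(tags)]

    def advance(lhs, rhs, dot, start, end):
        """Record a rule whose first `dot` constituents are bound."""
        if dot == len(rhs):
            agenda.append((lhs, start, end))  # all bound: apply the rule
        elif (lhs, rhs, dot, start, end) not in active:
            active.add((lhs, rhs, dot, start, end))
            # Bind the next free constituent to adjoining arcs already built.
            for cat, s, e in list(chart):
                if s == end and cat == rhs[dot]:
                    advance(lhs, rhs, dot + 1, start, e)

    while agenda:
        arc = agenda.pop(0)
        if arc in chart:
            continue
        chart.add(arc)
        cat, s, e = arc
        # Rules with a free `cat` as first constituent go on the active list.
        for lhs, rhs in grammar:
            if rhs[0] == cat:
                advance(lhs, rhs, 1, s, e)
        # Active rules ending where this arc starts may bind it next.
        for lhs, rhs, dot, a_start, a_end in list(active):
            if a_end == s and rhs[dot] == cat:
                advance(lhs, rhs, dot + 1, a_start, e)
    return chart


# Hypothetical grammar, chosen only so that V(1,2) yields VP(1,2).
GRAMMAR = [("VP", ("V",)), ("VP", ("V", "NP")), ("NP", ("DET", "N"))]
chart = chart_parse(GRAMMAR, ["V", "DET", "N"])
```

With this stand-in grammar the chart ends up containing VP(1,2), NP(2,4) and VP(1,4). Note that the binding test relies on `a_end == s`, i.e. on successive vertex numbers, which is exactly the assumption that discontinuous constituents violate.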
    <Paragraph position="5"> When we use DPSG rules and follow the same procedure, we run into difficulties.</Paragraph>
    <Paragraph position="6"> Consider the example grammar (33).</Paragraph>
    <Paragraph position="8"> For the input &quot;V DET N PART&quot; the first constituent that can be built is NP(2,4); the second is VP(1,5). The VP will activate the S rule, but this rule will not be applied since the NP does not have a binding. And even if it did, the rule would not be applicable as the VP(1,5) and the NP(2,4) are not adjoining in the traditional sense.</Paragraph>
    <Paragraph position="9"> In the next section we describe the provisions added to a standard chart parser in order to deal with these difficulties.</Paragraph>
    <Paragraph position="10">  5. A modified chart parser for DPSG</Paragraph>
    <Section position="1" start_page="207" end_page="208" type="sub_section">
      <SectionTitle>
5.1 Finding all applicable rules
</SectionTitle>
      <Paragraph position="0"> To make sure that the parser finds all applicable rules of a DPSG, the following addition was made to the parsing algorithm.</Paragraph>
      <Paragraph position="1"> If a rule with internal context is applied, we first follow the standard procedure; subsequently we go through all those rules that appear on the active-rule list as the result of applying the standard procedure, giving bindings to those free constituents that correspond in category to the context-element(s) in the rule that was applied.</Paragraph>
      <Paragraph position="2"> In the case of (33), this means that just before application of the VP rule (after the PART has been bound), we have the active-rule list (34). (Underlining indicates that a constituent is bound).</Paragraph>
      <Paragraph position="4"> We now apply the rule building the VP.</Paragraph>
      <Paragraph position="5"> The standard procedure will add one rule to this list, namely S --&gt; VP + NP. The VP is given a binding, so we obtain the following active-rule list:</Paragraph>
      <Paragraph position="7"> Since the VP-building rule contained an internal context element, the additional procedure mentioned above is now applied; a binding is given to the NP in (a copy of) the S rule. The S arc is now built in the chart, which does not cause any new rules to be added to the active-rule list. Nor are there any free S's in the old active-rule list that should be given a binding. So, we can look for other rules containing a free NP.</Paragraph>
      <Paragraph position="8"> There is one such rule, the second in (35), but this one will be neglected because it was already present in the rule list before; see (34). Note that it is essential that this rule is neglected, as there is already a version of the VP-rule on the active-rule list containing an NP with the same binding as the context-element.</Paragraph>
      <Paragraph position="9"> It may also be noted that we have combined constituents in this example that are not adjoining in the traditional sense (i.e., in the sense of successive vertex numbers). In particular, we have applied the rule S --&gt; VP(1,5) + NP(2,4). In a case like this, where the vertex numbers indicate that the constituents in a rule are overlapping, we must test whether these constituents form an adjacency sequence. This test is described below.</Paragraph>
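The additional binding step of this subsection might be sketched as follows. The sketch is an assumption-laden illustration: `ActiveRule`, `bind_free` and `activate` are hypothetical names, grammar (33) is not reproduced in this text, so only the S rule mentioned above (VP followed by NP) is used, and the adjacency sequence test that must follow is left out.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

Arc = Tuple[str, int, int]  # (category, start vertex, end vertex)


@dataclass(frozen=True)
class ActiveRule:
    lhs: str
    rhs: Tuple[str, ...]
    bindings: Tuple[Optional[Arc], ...]  # None marks a free constituent

    def is_complete(self) -> bool:
        return all(b is not None for b in self.bindings)


def bind_free(rule, arc):
    """Return a copy of `rule` in which the first free constituent of the
    arc's category is bound to `arc`, or None if there is none."""
    for i, (cat, bound) in enumerate(zip(rule.rhs, rule.bindings)):
        if bound is None and cat == arc[0]:
            new = rule.bindings[:i] + (arc,) + rule.bindings[i + 1:]
            return replace(rule, bindings=new)
    return None


def activate(new_arc, context_arcs, grammar, active_list):
    """Standard procedure followed by the extra step for internal context."""
    # Standard procedure: rules whose first constituent matches the new arc.
    fresh = []
    for lhs, rhs in grammar:
        if rhs[0] == new_arc[0]:
            rule = ActiveRule(lhs, rhs, (new_arc,) + (None,) * (len(rhs) - 1))
            if rule not in active_list:  # rules already present are neglected
                fresh.append(rule)
    # Extra step: in copies of the newly added rules, bind free constituents
    # that correspond in category to the context elements of the applied rule.
    copies = []
    for rule in fresh:
        for ctx in context_arcs:
            copy = bind_free(rule, ctx)
            if copy is not None:
                copies.append(copy)
    return fresh + copies


# VP(1,5) has just been built by a rule with NP(2,4) as internal context.
rules = activate(("VP", 1, 5), [("NP", 2, 4)], [("S", ("VP", "NP"))], [])
```

Here `rules` holds the S rule with a free NP and a copy in which the NP is bound to NP(2,4); only the copy is complete, so only the copy would be passed on to the adjacency sequence test.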
    </Section>
    <Section position="2" start_page="208" end_page="208" type="sub_section">
      <SectionTitle>
5.2 The adjacency sequence test
</SectionTitle>
      <Paragraph position="0"> To make sure that only constituents forming an adjacency sequence are combined, the parser keeps track of daughter nodes and internal context in a so-called &quot;construction list&quot;, which is added to each arc in the chart; internal context nodes are marked as such in these lists. Whether two (or more) nodes share a constituent, in the sense of common domination, is easily detected with the help of these lists.</Paragraph>
      <Paragraph position="1"> By organizing these lists in a particular way, moreover, they can also be used to determine whether a sequence of constituents is an adjacency sequence in the sense of definition (28). This is achieved by ordering the elements in construction lists in such a way that an element is always either dominated by its predecessor in the list, or is internal context of it, or is a right neighbour of it. For instance, in the above example (25), P and Q have the construction lists (36): (36) P:(A, [B], C) Q:(B, [C], D).</Paragraph>
      <Paragraph position="2"> The rule S --&gt; P + Q + E is now applicable, since the construction list for S would be the result of merging P's and Q's lists with that of E, which is simply E:(), with the result S:(A, B, C, D, E). From this list, it can be concluded that the triple (P, Q, E) is an adjacency sequence: (P, Q) is an adjacency pair (P's leftmost daughter, i.e. A, is adjacent to Q's leftmost daughter, i.e. B, as can also be seen in the construction lists), and Q and E are separated in S's construction list by the adjacency pair (C, D), whose elements are both daughters of P.</Paragraph>
      <Paragraph position="3"> An example where the adjacency sequence test would give a negative result is the case where the rule Y --&gt; X + B + E is considered for a constituent X with construction list X:(A, [B], [C], D). The rule is not applicable, since the triple (X, B, E) would not form an adjacency sequence according to the construction list that the node Y would get, namely: (37) Y:(A, B, [C], D, E).</Paragraph>
      <Paragraph position="4"> The constituents B and E are separated in (37) by the sequence ([C], D), where C is marked as internal context; therefore, C is not dominated by either X or B, and hence the test correctly fails.</Paragraph>
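The two construction-list examples above can be reproduced with a deliberately simplified Python sketch. Several points are assumptions rather than the paper's method: elements are merged in order of first appearance, a constituent with an empty list (such as E) contributes itself, and the test is reduced to the rule of thumb that a sequence fails when some element of the merged list is still marked as internal context. The function names are hypothetical, and rules that carry internal context of their own are not covered.

```python
def merge_construction_lists(constituents):
    """Merge the construction lists of a rule's constituents.

    `constituents` is a list of (name, construction_list) pairs, where a
    construction list is a list of (element, is_context) pairs. An element
    stays marked as context only if no constituent dominates it."""
    order = []         # element names in order of first appearance
    dominated = set()  # elements occurring unmarked in at least one list
    for name, clist in constituents:
        entries = clist if clist else [(name, False)]
        for elem, is_context in entries:
            if elem not in order:
                order.append(elem)
            if not is_context:
                dominated.add(elem)
    return [(elem, elem not in dominated) for elem in order]


def is_adjacency_sequence(constituents):
    """Simplified test: fail if any merged element is still context."""
    merged = merge_construction_lists(constituents)
    return not any(is_context for _, is_context in merged)


# (36): P:(A, [B], C) and Q:(B, [C], D), combined with E:().
P = ("P", [("A", False), ("B", True), ("C", False)])
Q = ("Q", [("B", False), ("C", True), ("D", False)])
E = ("E", [])
# X:(A, [B], [C], D), combined with B:() and E:().
X = ("X", [("A", False), ("B", True), ("C", True), ("D", False)])
B = ("B", [])

s_list = merge_construction_lists([P, Q, E])
y_list = merge_construction_lists([X, B, E])
```

Under these assumptions `s_list` corresponds to S:(A, B, C, D, E) with no element marked, so (P, Q, E) passes, while `y_list` corresponds to (37) with only C still marked, so (X, B, E) is rejected.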
      <Paragraph position="5"> The currently implemented version of the DPSG parser is in fact based on a more restricted notion of adjacency sequence, where two constituents are viewed as sharing a constituent z not only if they both dominate z, but also if one of them dominates z and the other has an internal context node that dominates z (see note 1). This means that structures like (38) are not generated, since P and T would share node B, and T and R would share node C.</Paragraph>
      <Paragraph position="6"> (38) [Tree diagram lost in extraction; surviving node labels: T, A, B, C, D, E.] Note that a structure like (38) would be an ill-formed tree, since the nodes B and C violate the single-mother condition, and the nodes Q and R, moreover, are not connected to the root node.</Paragraph>
      <Paragraph position="7"> To deal with this more restricted notion of adjacency sequence, the administration in the construction lists is actually slightly more complicated than described above.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="208" end_page="209" type="metho">
    <SectionTitle>
6. Conclusions
</SectionTitle>
    <Paragraph position="0"> Our findings concerning the use of discontinuous constituents in syntactic representations, phrase-structure rules, and parsers may be summarized as follows.</Paragraph>
    <Paragraph position="1"> 1. Tree-like structures with discontinuities can be given a precise definition, which makes them formally as acceptable for use in syntactic representation as the familiar ordinary tree structures.</Paragraph>
    <Paragraph position="2"> 2. Discontinuous constituents can be allowed in phrase-structure rules generating trees with discontinuities, provided we give a suitable generalization to the notion of adjacency.</Paragraph>
    <Paragraph position="3"> 3. Trees with discontinuities are generalizations of ordinary tree structures, and phrase-structure rules with discontinuous constituents are generalizations of ordinary phrase-structure rules. Both concepts can be added to ordinary phrase-structure grammars, including GPSG, with the effect that such grammars generate trees with discontinuities for sentences with discontinuous constituents, while everything else remains the same.</Paragraph>
    <Paragraph position="4"> 4. Phrase-structure rules with discontinuities can be handled by a chart parser for context-free grammar by making two additions in the administration; one in the active-rule list for rules containing a discontinuous element to make sure that no parse is overlooked, and one in the arcs in the chart to check the generalized adjacency relation.</Paragraph>
  </Section>
</Paper>