<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0311">
  <Title>Dynamic Dependency Parsing</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Basic Dependency Parsing
</SectionTitle>
    <Paragraph position="0"> Using constraint satisfaction techniques for natural language parsing was introduced rst in (Maruyama, 1990) by de ning a constraint dependency grammar (CDG) that maps nicely on the notion of a CSP. A CDG is a quadruple</Paragraph>
    <Paragraph position="2"> words, R is a set of roles of a word. A role represents a level of language like 'SYN' or 'SEM'.</Paragraph>
    <Paragraph position="3"> L is a set of labels for each role (e.g. f'SUBJ', 'OBJ'g, f'AGENT','PATIENT'g), and C is a constraint grammar consisting of atomic logical formulas. Now, the only thing that is left in order to match a CDGs to a CSPs is to de ne variables and their possible values. For each word of an utterance and for each role we allocate one variable that can take values of the form ei;j = hr; wi; l; wji with r 2 R, wi; wj 2 and l 2 L. ei;j is called the dependency edge between word form wi and wj labeled with l on the description level r. A dependency edge of the form ei;root is called the root edge. Hence a dependency tree of an utterance of length n is a set of dependency edges s = fei;j j i 2 f1; : : :; ng ; j 2 f1; : : :; ng [ frootg ; i 6= jg. From this point on parsing natural language has become a matter of constraint processing as can be found in the CSP literature (Dechter, 2001).</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Weighted Dependency Parsing
</SectionTitle>
    <Paragraph position="0"> In (Schr oder, 2002) the foundations of dependency parsing have been carried over to COPs using weighted constraint dependency grammars (WCDG), a framework to model language using all-quanti ed logical formulas on dependency structures. Penalties for constraint violations aren't necessarily static once, but can be lexicalized or computed arithmetically on the basis of the structure under consideration. The following constraints are rather typical once restricting the properties of subject edges:</Paragraph>
    <Paragraph position="2"> Both constraints have a scope of one dependency edge on the syntax level ({X:SYN}). The constraint SUBJ-init is a hard constraint stating that every dependency edge labeled SUBJ shall have a nominal modi er and a nite verb as its modi ee. The second constraint SUBJ-dist is a soft one, such as every edge with label SUBJ attached more than two words away induces a penalty calculated by the term 2.9 / X.length. Note, that the maximal edge length in SUBJ-dist is quite arbitrary and should be extracted from a corpus automatically as well as the grade of increasing penalization. A realistic grammar consists of about 500 such handwritten constraints like the currently developed grammar for German (Daum et al., 2003).</Paragraph>
    <Paragraph position="3"> The notation used for constraints in this paper is expressing valid formulas interpretable by the WCDG constraint system. The following de nitions explain some of the primitives that are part of the constraint language: X is a variable for a dependency edge of the form ei;j = hr; wi; l; wji,  root(X^id) true i wj = root X.length is de ned as ji jj.</Paragraph>
    <Paragraph position="4"> A complete de nition can be found in (Schr oder et al., 1999).</Paragraph>
    <Paragraph position="5"> Figure (1) outlines the overall architecture of the system consisting of a lexicon component, ontologies and other external shallow parsing components, all of which are accessed via constraints a ecting the internal constraint network as far as variables are in their scope. While a static parsing model injects all variables into the ring in Figure (1) once and then waits for all constraints to let the variable instantiations settle in a state of equilibrium, a dynamic optimization process will add and remove variables from the current scope repeatedly.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Dynamic Dependency Parsing
</SectionTitle>
    <Paragraph position="0"> As indicated, the basic parsing model in WCDG is a two stage process: rst building up a constraint network given an utterance and second constructing an optimal dependency parse. In a dynamic system like an incremental dependency parser these two steps are repeated in a loop while consuming all bits from the input that complete a sentence over time. In principle, the problem of converting the static parsing model into a dynamic one should only be a question of repetitive updating the constraint network in a subtle way. Additionally, information about the internal state of the 'constraint optimizer' itself, which is not stored in the constraint net, shall not get lost during consecutive iterations as it (a) might participate in the update heuristics of the rst phase and (b) the parsing e ort during all previous loops might a ect the current computation substantially. We will come back to this argument in Section 8.</Paragraph>
    <Paragraph position="1"> Basically, changes to the constraint network are only allowed after the parser has emitted a parse tree. This is acceptable if the parser itself is interruptible providing the best parse tree found so far. An interrupt may occur either from 'outside' or from 'inside' by the parser itself taking into account the number of pending new words not yet added. So it either may integrate a new word as soon as it arrives or wait until further hypotheses have been checked. As transformation based parsing has strong any-</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.1 Motivation
</SectionTitle>
      <Paragraph position="0"> Analyzing sentence pre xes with a static parser, that i.e. is not aware of the sentence being a prex, will yield at least a penalty for a fragmentary representation. To get such a result at all, the parser must allow partial parse trees. The constraints S-init and frag illustrate modeling of normal and fragmentary dependency trees.</Paragraph>
      <Paragraph position="2"> Constraint S-init restricts all edges with label S to be nite verbs pointing to root. But if some dependency edge is pointing to root and is not labeled with S then constraint frag is violated and induces a penalty of 0.001. So every fragment in sentence (2a) that can not be integrated into the rest of the dependency tree will increase the penalty of the structure by three orders of magnitude. A constraint optimizer will try to avoid an analysis with an overall penalty of at least 1 12 and will search for another structure better than that. Modeling language in a way that (2a) in fact turns out as the optimal solution is therefore di cult. Moreover, the computational e ort could be avoided if a partial tree is promised to be integrated later with fewer costs.</Paragraph>
      <Paragraph position="3"> The only way to prevent a violation of frag in WCDG is either by temporarily switching it o completely or, preferably, by replacing the root attachment with a nonspec dependency as shown in (2b), thereby preventing the prerequisites of frag in the rst place while remaining relevant for 'normal' dependency edges.</Paragraph>
      <Paragraph position="4"> A pre x-analysis like (2a) might turn out to be cognitively implausible as well, as humans expect a proper NP-head to appear as long as no other evidence forces an adjective to be nominalized. Such a thesis can be modeled using nonspec dependency edges.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.2 De nition
</SectionTitle>
      <Paragraph position="0"> We now extend the original de nition of WCDG, so that a dependency tree is devised as s = fei;j j i 2 f1; : : :; ng [ f g; j 2 f1; : : :; ng [ froot; g; i 6= jg. We will use the notation w to denote any unseen word. A dependency edge modifying w is written as ei; , and an edge of the form e ;i denotes a dependency edge of w modifying an already seen word. ei; and e ;i are called nonspec dependency edges.</Paragraph>
      <Paragraph position="1"> Selective changes to the semantics of the constraint language have been made to accomplish nonspec dependency edges. So given two</Paragraph>
      <Paragraph position="3"> nonspec(X^id) true i wi2 = w spec(X^id) true i wi2 2</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.3 Properties
</SectionTitle>
      <Paragraph position="0"> Although every nonspec dependency in Figure (2b) points to the same word w , two nonspec dependency edges are not taken to be connected at the top (X^id = Y^id false) as we don't know yet whether wi and wj will be modifying the same word in the future.</Paragraph>
      <Paragraph position="1"> In general, the above extension to the constraint language is reasonable enough to t into the notion of static parsing, that is a grammar tailored for incremental parsing can still be used for static parsing. An unpleasant consequence of nonspec is, that more error-cases might occur in an already existing constraint grammar for static parsing that was not written with nonspec dependency edges in mind. Therefore we introduced guard-predicates nonspec() and spec() that complete those guard-predicates already part of the language (e.g. root() and exists()).</Paragraph>
      <Paragraph position="2"> These must be added by the grammarian to prevent logical error-cases in a constraint formula. Looking back at the constraints we've discussed so far, the constraints SUBJ-init and S-init have to be adjusted to become nonspecaware because referring to the POS-tag is not possible if the dependency edge under consideration is of the form ei; or e ;i. Thus a pre x-analysis like (2c) is inducing a hard violation of SUBJ-init. We have to rewrite SUBJ-init to allow (2c) as follows:</Paragraph>
      <Paragraph position="4"> When all constraints have been checked for logical errors due to a possible nonspec dependency edge, performance of the modi ed grammar will not have changed for static parsing but will accept nonspec dependency edges.</Paragraph>
      <Paragraph position="5"> Using the nonspec() predicate, we are able to write constraints that are triggered by nonspec edges only being not pertinent for 'normal' edges. For example we like to penalize nonspec dependency edges the older they become during the incremental course and thereby allow a cheaper structure to take over seamlessly. This can easily be achieved with a constraint like nonspec-dist, similar to SUBJ-dist:</Paragraph>
      <Paragraph position="7"> The e ect of nonspec-dist is, that a certain amount of penalty is caused by hSYN, the, DET, w i and hSYN, big, ADJ, w i in (2b). Figure (2c) illustrates the desired pre x-analysis in the next loop when nonspec edges become pricey due to their increased attachment length. In a real-life constraint grammar (2c) will be optimal basically because the head of the NP occurred, therefore overruling every alternative nonspec dependency edges that crosses the head. The latter alternative structure will either cause a projectivity violation with all other non-head components of the NP that are still linked to the head or cause an alternative head to be elected when becoming available.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
7 Dynamic Constraint Networks
</SectionTitle>
    <Paragraph position="0"> nonspec dependency edges play an important role when updating a constraint network to reect the problem change Pi. Maintaining the constraint network in the rst phase is crucial for the overall performance as a more sophisticated strategy to prune edges might compensate computational e ort in the second phase.</Paragraph>
    <Paragraph position="1"> Figure (3) illustrates a sentence of three words being processed one word wi per timepoint ti as follows:  1. for each edge e of the form ej; or e ;j, (j &lt; i) recompute penalty f(hei). If its penalty drops below , then remove e. Otherwise derive edge e0 on the basis of e 2. add new edges ei; and e ;i to the CN as far as f(hei; i) &lt; and f(he ;ii) &lt; 3. remove each edge e from the CN if it's lo- null cal penalty is lower than the penalty of the best parse so far.</Paragraph>
    <Paragraph position="2"> The parameter is a penalty threshold that determines the amount of nonspec edges being pruned. Any remaining nonspec edge indicates where the constraint network remains extensible  and provides an upper estimate of any future edge derived from it. This holds only if some prerequisites of monotony are guaranteed: The penalty of a parse will always be lower than each of the penalties on its dependency edges (guaranteed by the multiplicative cost function).</Paragraph>
    <Paragraph position="3"> Each nonspec edge must have a penalty that is an upper boundary of each dependency edge that will be derived from it:</Paragraph>
    <Paragraph position="5"> Only then will pruning of nonspec dependency edges be correct.</Paragraph>
    <Paragraph position="6"> As a consequence the overall penalties of pre x-analyses degrade monotonically over time: f(si) &gt;= f(si+1) Note, that the given strategy to update the constraint network does not take the structure of the previous pre x-analysis into account but only works on the basis of the complete constraint network. Nevertheless, the previous parse tree is used as a starting point for the next optimization step, so that near-by parse trees will be constructed within a few transformation steps using the alternatives licensed by the constraint network.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
8 The Optimizer
</SectionTitle>
    <Paragraph position="0"> So far we discussed the rst phase of a dynamic dependency parser building up a series of problems P0; P1; : : : changing Pi using Pi+1 in terms of maintaining a dynamic constraint network.</Paragraph>
    <Paragraph position="1"> In the second phase 'the optimizer' tries to accommodate to those changes by constructing Si+1 on the basis of Si and Pi+1.</Paragraph>
    <Paragraph position="2"> WCDG o ers a decent set of methods to compute the second phase, one of which implements a guided local search (Daum and Menzel, 2002).</Paragraph>
    <Paragraph position="3"> The key idea of GLS is to add a heuristics sitting on top of a local search procedure by introducing weights for each possible dependency edge in the constraint network. Initially being zero, weights are increased steadily if a local search settles in a local optimum. By augmenting the cost function f with these extra weights, further transformations are initiated along the gradient of f. Thus every weight of a dependency edge resembles an custom-tailored constraint whose penalty is learned during search.</Paragraph>
    <Paragraph position="4"> The question now to be asked is, how weights acquired during the incremental course of parsing in uence GLS. The interesting property is that the weights of dependency edges integrated earlier will always tend to be higher than weights of most recently introduced dependency edges as a matter of saturation. Thus keeping old weights will prevent GLS from changing old dependency edges and encourage transforming newer dependency edges rst. Old dependency edges will not be transformed until more recent constraint violations have been removed or old structures are strongly deprecated recently.</Paragraph>
    <Paragraph position="5"> This is a desirable behavior as it stabilizes former dependency structures with no extra provisions to the base mechanism. Transformations will be focused on the most recently added dependency edges. This approach is comparable to a simulated annealing heuristics where transformations are getting more infrequent due to a declining 'temperature'.</Paragraph>
    <Paragraph position="6"> Another very successful implementation of 'the optimizer' in WCDG is called Frobbing (Foth et al., 2000) which is a transformation based parsing technique similar to taboo search.</Paragraph>
    <Paragraph position="7"> One interesting feature of Frobbing is its ability to estimate an upper boundary of the penalty of any structure using a certain dependency edge and a certain word form. In an incremental parsing mode the penalty limit of a nonspec dependency edge will then be an estimate of any structure derived from it and thereby provide a good heuristics to prune nonspec edges falling beyond during the maintenance of the constraint network.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML