
<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0307">
  <Title>A Statistical Constraint Dependency Grammar (CDG) Parser</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 CDG Parsing
</SectionTitle>
    <Paragraph position="0"> CDG (Harper and Helzerman, 1995) represents syntactic structures using labeled dependencies between words. Consider an example CDG parse for the sentence What did you learn depicted in the white box of Figure 1. Each word in the parse has a lexical category, a set of feature values, and a set of roles that are assigned role values, each comprised of a label indicating the grammatical role of the word and its modifiee (i.e., the position of the word it is modifying when it takes on that role). Consider the role value assigned to the governor role (denoted G) of you, np-2. The label np indicates the grammatical function of you when it is governed by its head in position 2. Every word in a sentence must have a governor role with an assigned role value.</Paragraph>
    <Paragraph position="1"> Need roles are used to ensure that the grammatical requirements of a word are met (e.g., subcategorization). null  ARV of the word did in the sentence what did you learn. PX and MX([R]) represent the position of a word and its modifiee (for role R), respectively.</Paragraph>
    <Paragraph position="2"> Note that CDG parse information can be easily lexicalized at the word level. This lexicalization is able to include not only lexical category and syntactic constraints, but also a rich set of lexical features to model subcategorization and wh-movement without a combinatorial explosion of the parametric space (Wang and Harper, 2002). CDG can distinguish between adjuncts and complements due to the use of need roles (Harper and Helzerman, 1995), is more powerful than CFG, and has the ability to model languages with crossing dependencies and free word ordering (hence, this research could be applicable to a wide variety of languages).</Paragraph>
    <Paragraph position="3"> An almost-parsing LM based on CDG has been developed in (Wang and Harper, 2002). The underlying hidden event of this LM is a SuperARV.</Paragraph>
    <Paragraph position="4"> A SuperARV is formally defined as a four-tuple for a word, hC;F, (R;L;UC;MC)+;DCi, where C is the lexical category of the word, F = fFname1 = Fvalue1, :::;FNamef = FV aluefg is a feature vector (where Fnamei is the name of a feature and Fvaluei is its corresponding value), DC represents the relative ordering of the positions of a word and all of its modifiees, (R, L, UC, MC)+ is a list of one or more four-tuples, each representing an abstraction of a role value assignment, where R is a role variable, L is a functionality label, UC represents the relative position relation of a word and its dependent, and MC encodes some modifiee constraints, namely, the lexical category of the modifiee for this dependency relation. The gray box of Figure 1 presents an example of a SuperARV for the word did. From this example, it is easy to see that a SuperARV is a join on the role value assignments of a word, with explicit position information replaced by a relation that expresses whether the modifiee points to the current word, a previous word, or a subsequent word. The SuperARV structure provides an explicit way to organize information concerning one consistent set of dependency links for a word that can be directly derived from a CDG parse. SuperARVs encode lexical information as well as syntactic and semantic constraints in a uniform representation that is much more fine-grained than part-of-speech (POS). A sentence tagged with SuperARVs is an almost-parse since all that remains is to specify the precise position of each modifiee. SuperARV LMs have been effective at reducing word error rate (WER) on wide variety of continuous speech recognition (CSR) tasks, including Wall Street Journal (Wang and Harper, 2002), Broadcast News (Wang et al., 2003), and Switchboard tasks (Wang et al., 2004).</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 SCDG Parser
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 The Basic Parsing Algorithm
</SectionTitle>
      <Paragraph position="0"> Our SCDG parser is a probabilistic generative model. It can be viewed as consisting of two components: SuperARV tagging and modifiee determination. These two steps can be either loosely or tightly integrated. To simplify discussion, we describe the loosely integrated version, but we implement and evaluate both strategies. The basic parsing algorithm for the loosely integrated case is summarized in Figure 2, with the algorithm's symbols defined in Table 1. In the first step, the top N-best SuperARV assignments are generated for an input sentence w1;:::;wn using token-passing (Young et al., 1997) on a Hidden Markov Model with tri-gram probabilistic estimations for both transition and emission probabilities. Each SuperARV sequence for the sentence is represented as a sequence of tuples: hw1;s1i;:::;hwn;sni, where hwk;ski represents the word wk and its SuperARV assignment sk. These assignments are stored in a stack ranked in non-increasing order by tag assignment probability.</Paragraph>
      <Paragraph position="1"> During the second step, the modifiees are statistically specified in a left-to-right manner. Note that the algorithm utilizes modifiee lexical category constraints to filter out candidates with mismatched lexical categories. When processing the word wk;k = 1;:::;n, the algorithm attempts to determine the left dependents of wk from the closest to the farthest. The dependency assignment probability when choosing the (c+ 1)th left dependent (with its position denoted dep(k;!(c + 1))) is defined as:</Paragraph>
      <Paragraph position="3"> where H = hw;sik;hw;sidep(k;!(c+1));hw;sidep(k;!c)dep(k;!1).</Paragraph>
      <Paragraph position="4"> The dependency assignment probability is conditioned on the word identity and SuperARV assignment of wk and wdep(k;!(c+1)) as well as all of the c previously chosen left dependents hw;sidep(k;!c)dep(k;!1) for wk. A Boolean random variable syn is used to model the synergistic relationship between certain role pairs. This mechanism allows us to elevate, for example, the probability that the subject of a sentence wi is governed by a tensed verb wj when the need role value of wj points to wi as its subject. The syn value for a dependency relation is determined heuristically based on the lexical category, role name, and label information of the two dependent words. After the algorithm statistically specifies the left dependents for wk, it must also determine whether wk could be the (d+1)th right dependent of a previously seen word wp;p = 1;:::;k ! 1 (where d denotes the number of already assigned right dependents of wp), as shown in Figure 2.</Paragraph>
      <Paragraph position="5"> After processing word wk in each partial parse on the stack, the partial parses are re-ranked according to their updated probabilities. This procedure is iterated until the top parse in the stack covers the entire sentence. For the tightly coupled parser, the SuperARV assignment to a word and specification of its modifiees are integrated into a single step. The parsing procedure, which is completely incremental, is implemented as a simple best-first stack-based search. To control time and memory complexity, we used two pruning thresholds: maximum stack depth and maximum difference between the log probabilities of the top and bottom partial parses in the stack. These pruning thresholds are tuned based on the tradeoff of time/memory complexity and parsing accuracy on a heldout set, and they both have hard limits.</Paragraph>
      <Paragraph position="6"> Note the maximum likelihood estimation of dependency assignment probabilities in the basic loosely coupled parsing algorithm presented in Figure 2 is likely to suffer from data sparsity, and the estimates for the tightly coupled algorithm are likely to suffer even more so. Hence, we smooth the probabilities using Jelinek-Mercer smoothing (Jelinek, 1997), as described in (Wang and Harper, 2003; Wang, 2003).</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Additions to the Basic Model
</SectionTitle>
      <Paragraph position="0"> Some additional features are added to the basic model because of their potential to improve SCDG parsing accuracy. Their efficacy is evaluated in Section 5.</Paragraph>
      <Paragraph position="1"> Modeling crossing dependencies: The basic parsing algorithm was implemented to preclude crossing dependencies; however, it is important to allow them in order to model wh-movement in some cases (e.g., wh-PPs).</Paragraph>
      <Paragraph position="2"> Distance and barriers between dependents: Because distance between two dependent words is an important factor in determining the modifiees of a word, we evaluate an alternative model that adds distance, C/dep(k;SS(c+1));k to H in Figure 2. Note that C/dep(k;SS(c+1));k represents the distance between position dep(k;SS(c + 1)) and k. To avoid data sparsity problems, distance is bucketed and a discrete random variable is used to model it. We also model punctuation and verbs based on prior work. Like (Collins, 1999), we also found that verbs appear to act as barriers that impact modifiee links. Hence, a Boolean random variable that represents whether there is a verb between the dependencies is added to condition the probability estimations. Punctuation is treated similarly to coordination constructions with punctuation governed by the headword of the following phrase, and heuristic questions on punctuation were used to provide additional constraints on dependency assignments (Wang, 2003).</Paragraph>
      <Paragraph position="3"> Modifiee lexical features: The SuperARV structure employed in the SuperARV LM (Wang and Harper, 2002) uses only lexical categories of modifiees as modifiee constraints. In previous work (Harper et al., 2001), modifiee lexical features were central to increasing the selectivity of a CDG.</Paragraph>
      <Paragraph position="4"> Hence, we have developed methods to add additional relevant lexical features to modifiee constraints of a SuperARV structure (Wang, 2003).</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Parsing Evaluation Metric
</SectionTitle>
    <Paragraph position="0"> To evaluate our parser, which generates CDG analyses rather than CFG constituent bracketing, we</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Table 1: Symbols used in the parsing algorithm of Figure 2 (Term : Denotes)
</SectionTitle>
      <Paragraph position="0"> L(sk), R(sk) all dependents of sk to the left and right of wk, respectively N(L(sk)), N(R(sk)) the number of left and right dependents of sk, respectively dep(k;!c), dep(k; c) cth left dependent and right dependent of sk, respectively dep(k;!1), dep(k; 1) the position of the closest left dependent and right dependent of sk, respectively dep(k;!N(L(sk))), dep(k; N(L(sk))) the position of the farthest left dependent and right dependent of sk, respectively Cat(sk) the lexical category of sk ModCat(sk;!c), ModCat(sk; c) the lexical category of sk's cth left and right dependent (encoded in the SuperARV structure), respectively link(si; sj; k) the dependency relation between SuperARV si and sj with wi assigned as the kth dependent of sj, e.g., link(sdep(k;!(c+1)); sk;!(c + 1)) indicates that wdep(k;!(c+1)) is the (c + 1)th left dependent of sk.</Paragraph>
      <Paragraph position="1"> D(L(sk)), D(R(sk))) the number of left and right dependents of sk already assigned, respectively hw; sidep(k;!c)dep(k;!1) words and SuperARVs of sk's closest left dependent up to its cth left dependent hw; sidep(k;c)dep(k;1) words and SuperARVs of sk's closest right dependent up to its cth right dependent syn a random variable denoting the synergistic relation between some dependents can either convert the CDG parses to CFG bracketing and then use PARSEVAL, or convert the CFG bracketing generated from the gold standard CFG parses to CDG parses and then use a metric based on dependency links. Since our parser is trained using a CFG-to-CDG transformer (Wang, 2003), which maps a CFG parse tree to a unique CDG parse, it is sensible to evaluate our parser's accuracy using gold standard CDG parse relations. Furthermore, in the 1998 Johns Hopkins Summer workshop final report (Hajic et al., 1998), Collins et al.</Paragraph>
      <Paragraph position="2"> pointed out that in general the mapping from dependencies to tree structures is one-to-many: there are many possible trees that can be generated for a given dependency structure since, although generally trees in the Penn Treebank corpus are quite flat, they are not consistently &amp;quot;flat.&amp;quot; This variability adds a non-deterministic aspect to the mapping from CDG dependencies to CFG parse trees that could cause spurious PARSEVAL scoring errors. Additionally, when there are crossing dependencies, then no tree can be generated for that set of dependencies. Consequently, we have opted to use a transformer to convert CFG trees to CDG parses and define a new dependency-based metric adapted from (Eisner, 1996). We define role value labeled precision (RLP) and role value labeled recall (RLR) on dependency links as follows: RLP = correct modifiee assignmentsnumber of modifiees our parser found RLR = correct modifiee assignmentsnumber of modifiess in the gold test set parses where a correct modifiee assignment for a word wi in a sentence means that a three-tuple hrole id; role label; modifiee word positioni (i.e., a role value) for wi is the same as the three-tuple role value for the corresponding role id of wi in the gold test parse. This differs from Eisner's (1996) precision and recall metrics which use no label information and score only parent (governor) assignments, as in traditional dependency grammars. We will evaluate role value labeled precision and recall on all roles of the parse, as well as the governoronly portion of a parse. Eisner (Eisner, 1996) and Lin (Lin, 1995) argued that dependency link evaluation metrics are valuable for comparing parsers since they are less sensitive than PARSEVAL to single misattachment errors that may cause significant error propagation to other constituents. This, together with the fact that we must train our parser using CDG parses generated in a lossy manner from a CFG treebank, we chose to use RLP and RLR to compare our parsing accuracy with several state-of-the-art parsers.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>