<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1060">
  <Title>Parse Forest Computation of Expected Governors</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Expected Governors
</SectionTitle>
    <Paragraph position="0"> Suppose that a probabilistic grammar licenses headed tree analyses a53a55a54 a51a57a56a58a56a58a56a58a51 a53 a21 for a sentence a59 , and assigns them probabilistic weights a60a61a54 a51a57a56a58a56a58a56a58a51 a60 a21 .  deprive all beginning students of their high school lunches.</Paragraph>
    <Paragraph position="1"> For a label a70 in column 2, column 3 gives a71a73a72a58a70a75a74 as computed with a PCFG weighting of trees, and column 4 gives a71a73a72a58a70a75a74 as computed with a head-lexicalized weighting of trees. Values below 0.1 are omitted. According to the lexicalized model, the PP headed by of probably attaches to VFP (finite verb phrase) rather than NP.</Paragraph>
    <Paragraph position="2"> Let a76a77a54 a51a57a56a58a56a58a56a58a51 a76 a21 be the governor labels for word position a78 determined by a53a55a54 a51a57a56a58a56a58a56a58a51 a53 a21 respectively. We define a scheme which divides a count of 1 among the different governor labels.</Paragraph>
    <Paragraph position="3"> For a given governor tuple a76 , let</Paragraph>
    <Paragraph position="5"> The definition sums the probabilistic weights of trees with markup a76 , and normalizes by the sum of the probabilities of all tree analyses of a59 .</Paragraph>
    <Paragraph position="6"> The definition may be justified as follows. We work with a markup space a91 a80a93a92a95a94a96a92a97a94a96a98 , where a92 is the set of category labels and a98 is the set of lemma labels. For a given markup triple a76 , let a99 a16a101a100 a91 a102a104a103a105 be the function which maps a76 to 1, and a76a106a31 to 0 for a76a107a31a109a108a80 a76 . We define a random variate  a16 , where a76 is the governor markup for word position a78 which is determined by tree a53 . The random variate a110 is defined on labeled trees licensed by the probabilistic grammar. Note that a111a91 a102a82a103a105a114a113 is a vector space (with pointwise sums and scalar products), so that expectations and conditional expectations may be defined. In these terms, a79 is the conditional expectation of a110 , conditioned on the yield being a59 . This definition, instead of a single governor label for a given word position, gives us a set of pairs of a markup a76 and a real number a79 a23 a76 a27 in [0,1], such that the real numbers in the pairs sum to 1. In our implementation (which is based on Schmid (2000a)), we use a cutoff of 0.1, and print only indices a76 where a79 a23 a76 a27 is above the cutoff. Figure 3 is an example.</Paragraph>
    <Paragraph position="7"> A direct implementation of the above definition using an iteration over trees to compute a79 would be unusable because in the robust grammar of English we work with, the number of tree analyses for a sentence is frequently large, greater than a115a57a116a28a117 for about 1/10 of the sentences in the British National Corpus. We instead calculate a79 in a parse forest representation of a set of tree analyses.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Parse Forests
</SectionTitle>
    <Paragraph position="0"> A parse forest (see also Billot and Lang (1989)) in labeled grammar notation is a tuple a118  nals in a128 . By using a124 a7 on symbols on the left hand and right hand sides of a parse forest rule, a124 a7 can be extended to map the set of parse forest rules a105a75a7 to the set of underlying grammar rules</Paragraph>
    <Paragraph position="2"> a7 is also extended to map trees licensed by the parse forest grammar to trees licensed by the underlying grammar. An example is given in figure 4.</Paragraph>
    <Paragraph position="3"> Where a129a87a130a125a119a131a7a48a132 a121 a7a133a132a48a105a122a7 , let a103a134a7 a23 a129 a27 be the set of trees licensed by a49a38a119a131a7 a51a19a121 a7 a51 a105a122a7 a51a19a123 a7a135a50 which have root symbol a129 in the case of a symbol, and the set of trees which have a129 as the rule expanding the root in the case or a rule. a103 a23 a129 a27 is defined to be the multiset image of a103 a7 a23 a129 a27 under a124 a7 . a103 a23 a129 a27 is the multiset of inside trees represented by parse  resenting two tree analyses of John reads every paper on markup. The labeling function drops subscripts, so that a124 a7 a23 VPa54 a27 a80 VP.</Paragraph>
    <Paragraph position="4"> forest symbol or rule a129 .3 Let a92 a7 a23 a129 a27 be the set of trees in a103a134a7 a23a145a123 a7 a27 which contain a129 as a symbol or use a129 as a rule. a92 a23 a129 a27 is defined to be the multiset image of a92 a7 a23 a129 a27 under a124 a7 . a92 a23 a129 a27 is the multiset of complete trees represented by the parse forest symbol or rule a129 .</Paragraph>
    <Paragraph position="5"> Where a60 is a probability function on trees licensed by the underlying grammar and a129 is a symbol or rule in a118 ,</Paragraph>
    <Paragraph position="7"> is called the flow for a129 .4 Parse forests are often constructed so that all inside trees represented by a parse forest nonterminal a161a162a130a6a119a120a7 have the same span, as well as the same parent category. To deal with headedness and lexicalization of a probabilistic grammar, we construct parse forests so that, in addition, all inside trees represented by a parse forest nonterminal have the same lexical head. We add to the labeled grammar a function a163a46a7 which labels parse forest symbols with lexical heads. In our implementation, an ordinary context free parse forest is 3We use multisets rather than set images to achieve correctness of the inside algorithm in cases where a164 represents some tree more than once, something which is possible given the definition of labeled grammars. A correct parser produces a parse forest which represents every parse for the input sentence exactly once.</Paragraph>
    <Paragraph position="8"> 4These quantities can be given probabilistic interpretations and/or definitions, for instance with reference to conditionally expected rule frequencies for flow.</Paragraph>
    <Paragraph position="9">  first constructed by tabular parsing, and then in a second pass parse forest symbols are split according to headedness. Such an algorithm is shown in appendix B. This procedure gives worst case time and space complexity which is proportional to the fifth power of the length of the sentence.</Paragraph>
    <Paragraph position="10"> See Eisner and Satta (1999) for discussion and an algorithm with time and space requirements proportional to the fourth power of the length of the input sentence in the worst case. In practical experience with broad-coverage context free grammars of several languages, we have not observed super-cubic average time or space requirements for our implementation. We believe this is because, for our grammars and corpora, there is limited ambiguity in the position of the head within a given category-span combination.</Paragraph>
    <Paragraph position="11"> The governor algorithm stated in the next section refers to headedness in parse forest rules.</Paragraph>
    <Paragraph position="12"> This can be represented by constructing parse forest rules (as well as ordinary grammar rules) with headed tree domains of depth one.5 Where a30 is a parse forest symbol on the right hand side of a parse forest rule a110 , we will simply state the condition &amp;quot;a30 is the head of a110 &amp;quot;.</Paragraph>
    <Paragraph position="13"> The flow and governor algorithms stated below call an algorithm PF-INSIDEa23 a118 a51a2a165a122a27 which computes inside probabilities in a118 , where a165 is a function giving probability parameters for the underlying grammar. Any probability weighting of trees may be used which allows inside probabilities to be computed in parse forests. The inside 5See footnote 1. Constructed in this way, the first rule in parse forest in Figure 4 has domain a177a6a37a12a178a12a179a135a68a65a178 a42a57a180 , and labeling function a177a57a181a10a37a12a178 Sa182a9a183a11a178a12a181a38a179a135a68a65a178 NPa182a9a183a11a178a2a181a42 a178 VPa182a11a183 a180 . When parse forest rules are mapped to underlying grammar rules, the domain is preserved, so that a184 a159 applied to the parse forest rule just described is the tree with domain a177a125a37a2a178a4a179a135a68a65a178 a42a57a180 and label function  algorithm for ordinary PCFGs is given in figure 5. The parameter a165 maps the set of underlying grammar rules a105 which is the image of a124 a7 on a124a44a188 to reals, with the interpretation of rule probabilities. In step 5, a124 a7 maps the parse forest rule a110 to a grammar rule</Paragraph>
    <Paragraph position="15"> of a165 . The functions lhs and rhs map rules to their left hand and right hand sides, respectively.</Paragraph>
    <Paragraph position="16"> Given an inside algorithm, the flow a154 may be computed by the flow algorithm in Figure 6, or by the inside-outside algorithm.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Governors Algorithm
</SectionTitle>
    <Paragraph position="0"> The governor algorithm annotates parse forest symbols and rules with functions from governor labels to real numbers. Let a53 be a tree in the parse forest grammar, let a25 be a symbol in a53 , let a30 be the maximal symbol in a53 of which a25 is a head, or a25 itself if a25 is a non-head child of its parent in a53 , and let a30a52a31 be the parent of a30 in a53 . Recall that</Paragraph>
    <Paragraph position="2"> is a vector mapping the markup triple</Paragraph>
    <Paragraph position="4"> to 0. We have constructed parse forests such that a49 a124 a7 a23 a30 a27a19a51a125a124 a7 a23 a30a52a31 a27a19a51 a163 a7 a23 a30a32a31a27 a50 agrees with the governor label for the lexical head of the node corresponding to a25 in a124 a7 a23 a53 a27 .</Paragraph>
    <Paragraph position="5"> A parse forest tree a53 and symbol a25 in a53 thus determine the vector (4), where a30 and a30 a31 are defined as above. Call the vector determined in this waya99</Paragraph>
    <Paragraph position="7"> Assuming that a118 a80 a49a38a119 a7 a51a19a121 a7 a51 a105 a7 a51a19a123 a7 a51a125a124 a7 a50 is a parse forest representing each tree analysis for a sentence exactly once, the quantity a79 for terminal position a78 (as defined in section 1) is found by summing a194 a23a26a25a39a27 for terminal symbols a25 in a121 a7 which have string position a78 .6 The algorithm PF-GOVERNORS is stated in Figure 3. Working top down, if fills in an array a194 a111a152a198a113 which is supposed to agree with the quantity a194 a23 a198a27 defined above. Scaled governor vectors are created for non-head children in step 10, and summed down the chain of heads in step 9. In step 6, vectors are divided in proportion to inside probabilities (just as in the flow algorithm), because the set of complete trees for the left hand side of a110 are partitioned among the parse forest rules which expand the left hand side of a110 .</Paragraph>
    <Paragraph position="8"> Consider a parse forest rule a110 , and a parse forest symbol a30 on its right hand side which is not the head of a110 . In each tree in a92 a7 a23 a110 a27 , a30 is the top of a chain of heads, because a30 is a non-head child in rule a110 . In step 10, the governor tuple describing the syntactic environment of a30 in trees in a92 a7 a23 a110 a27 (or rather, their images under a124 a7 ) is constructed</Paragraph>
    <Paragraph position="10"> to a unique string position, something which is not enforced by our definition of parse forests. Indeed, such cases may arise if parse forest symbols are constructed as pairs of grammar symbols and strings (Tendeau, 1998) rather than pairs of grammar symbols and spans. Our parser constructs parse forests organized according to span.</Paragraph>
    <Paragraph position="11"> as</Paragraph>
    <Paragraph position="13"> the relative weight of trees in a92 a7 a23 a110 a27 . This is appropriate because a194 a23 a30 a27 as defined in equation (5) is to be scaled by the relative weight of trees in</Paragraph>
    <Paragraph position="15"> In line 9 of the algorithm, a194 is summed into the head child a30 . There is no scaling, because every tree in a92 a7 a23 a110 a27 is a tree in a92 a7 a23 a30 a27 . A probability parameter vector a165 is used in the inside algorithm. In our implementation, we can use either a probabilistic context free grammar, or a lexicalized context free grammar which conditions rules on parent category and parent lexical head, and conditions the heads of non-head children on child category, parent category, and parent head (Eisner, 1997; Charniak, 1995; Carroll and Rooth, 1998). The requisite information is directly represented in our parse forests by a92 a7 and a163a127a7 . Thus the call to PF-INSIDE in line 1 of PF-GOVERNORS may involve either a computation of PCFG inside probabilities, or head-lexicalized inside probabilities. However, in both cases the algorithm requires that the parse forest symbols be split according to heads, because of the reference to a163 a7 in line 10. Construction of head-marked parse forests is presented in the appendix.</Paragraph>
    <Paragraph position="16"> The LoPar parser (Schmid, 2000a) on which our implementation of the governor algorithm is based represents the parse forest as a graph with at most binary branching structure. Nodes with more than two daughter nodes in a conventional parse forest are replaced with a right-branching tree structure and common sub-trees are shared between different analyses. The worst-case space complexity of this representation is cubic (cmp.</Paragraph>
    <Paragraph position="17"> Billot and Lang (1989)).</Paragraph>
    <Paragraph position="18"> LoPar already provided functions for the computation of the head-marked parse forest, for the flow computation and for traversing the parse forest in depth-first and topologically-sorted order (see Cormen et al. (1994)). So it was only necessary to add functions for data initialization, for the computation of the governor vector at each node and for printing the result.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Pooling of grammatical relations
</SectionTitle>
    <Paragraph position="0"> The governor labels defined above are derived from the specific symbols of a context free grammar. In contrast, according to the general markup methodology of current computational linguistics, labels should not be tied to a specific grammar and formalism. The same markup labels should be produced by different systems, making it possible to substitute one system for another, and to compare systems using objective tests.</Paragraph>
    <Paragraph position="1"> Carroll et al. (1998) and Carroll et al. (1999) propose a system of grammatical relation markup to which we would like to assimilate our proposal.</Paragraph>
    <Paragraph position="2"> As grammatical relation symbols, they use atomic labels such as dobj (direct object) an ncsubj (nonclausal subject). The labels are arranged in a hierarchy, with for instance subj having subtypes ncsubj, xsubj, and csubj.</Paragraph>
    <Paragraph position="3"> There is another problem with the labels we have used so far. Our grammar codes a variety of features, such as the feature VFORM on verb projections. As a result, instead of a single object grammatical relation a49 NP,VPa50 , we have grammatical relations a49 NP,VP.Na50 , a49 NP,VP.FINa50 , a49 NP,VP.TOa50 , a49 NP,VP.BASEa50 , and so forth. This may result in frequency mass being split among different but similar labels. For instance, a verb phrase will have read every paper might have some analyses in which read is the head of a base form VP and paper is the head of the object of read, and others where read is a head of a finite form VP, and paper is the head of the object of read.</Paragraph>
    <Paragraph position="4"> In this case, frequencies would be split between a49 NP,VP.BASE,reada50 and a49 NP,VP.FIN,reada50 as governor labels for paper.</Paragraph>
    <Paragraph position="5"> To address these problems, we employ a pooling function a128 a105 which maps pairs of categories to symbols such as ncsubj or obj. The governor tuple a49a38a29 a23 a30 a27a19a51 a29 a23 a30a32a31 a27a19a51a6a22a24a23 a30a52a31 a27 a50 is then replaced by  More flexibility could be gained by using a rule and the address of a constituent on the right hand side as arguments of a128 a105 . This would allow the following assignments.</Paragraph>
    <Paragraph position="6">  The head of a rule is marked with a prime. In the first pair, the objects in double object construction are distinguished using the address. In each case, the child-parent category pair is a49 NP,VP.FINa50 , so that the original proposal could not distinguish the grammatical relations. In the second pair, a VP.TO argument is distinguished from a VP.TO modifier using the category of the head. In each case, the child-parent category pair is a49 VP.TO,VP.FINa50 . Notice that in Line 10 of PF-GOVERNORS, the rule</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Discussion
</SectionTitle>
    <Paragraph position="0"> The governor algorithm was designed as a component of Spot, a free-text question answering system. Current systems usually extract a set of candidate answers (e.g. sentences), score them and return the n highest-scoring candidates as possible answers. The system described in Harabagiu et al. (2000) scores possible answers based on the overlap in the semantic representations of the question and the answer candidates. Their semantic representation is basically identical to the head-head relations computed by the governor algorithm. However, Harabagiu et al. extract this information only from maximal probability parses whereas the governor algorithm considers all analyses of a sentence and returns all possible relations weighted with estimated frequencies. Our application in Spot works as follows: the question is parsed with a specialized question grammar, and features including the governor of the trace are extracted from the question. Governors are among the features used for ranking sentences, and answer terms within sentences. In collaboration with Pranav Anand and Eric Breck, we have incorporated governor markup in the question answering prototype, but not debugged or evaluated it.</Paragraph>
    <Paragraph position="1"> Expected governor markup summarizes syntactic structure in a weighted parse forest which is the product of exhaustive parsing and inside-outside computation. This is a strategy of dumbing down the product of computationally intensive statistical parsing into unstructured markup. Estimated frequency computations in parse forests have previously been applied to tagging and chunking (Schulte im Walde and Schmid, 2000). Governor markup differs in that it is reflective of higher-level syntax. The strategy has the advantage, in our view, that it allows one to base markup algorithms on relatively sophisticated grammars, and to take advantage of the lexically sensitive probabilistic weighting of trees which is provided by a lexicalized probability model.</Paragraph>
    <Paragraph position="2"> Localizing markup on the governed word increases pooling of frequencies, because the span of the phrase headed by the governed item is ignored. This idea could be exploited in other markup tasks. In a chunking task, categories and heads of chunks could be identified, rather than categories and boundaries.</Paragraph>
    <Paragraph position="3"> A Relation Between Flow and</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Inside-Outside Algorithm
</SectionTitle>
      <Paragraph position="0"> The inside-outside algorithm computes inside probabilities a146 a111a25 a113 and outside probabilities a203a39a111a25 a113 . We will show that these quantities are related to the flow a154a61a23a26a25a28a27 by the equation a154 a111a25 a113 a80  a7a24a113 is the inside probability of the root symbol, which is also the sum of the probabilities of all parse trees.</Paragraph>
      <Paragraph position="1"> According to Charniak (1993), the outside probabilities in a parse forest are computed by:  The outside probability of the start symbol is 1. We prove by induction over the depth of the parse forest that the following relationship holds:  according to the definition of a203a39a111a25 a113 . So, the induction hypothesis is generally true.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>