<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2157">
  <Title>Prefix Probabilities from Stochastic Tree Adjoining Grammars*</Title>
  <Section position="5" start_page="953" end_page="954" type="metho">
    <SectionTitle>
3 Overview
</SectionTitle>
    <Paragraph position="0"> The approach we adopt in the next section to derive a method for computing prefix probabilities for TAGs is based on transformations of equations. Here we informally discuss the general ideas underlying these transformations. Let $w = a_1 a_2 \cdots a_n \in \Sigma^*$ be a string and let $N \in V^{\perp}$. We use the following representation, which is standard in tabular methods for TAG parsing. An item is a tuple $[N, i, j, f_1, f_2]$ representing the set of all trees $t$ such that (i) $t$ is a subtree rooted at $N$ of some derived elementary tree; and (ii) $t$'s root spans from position $i$ to position $j$ in $w$, and $t$'s foot node spans from position $f_1$ to position $f_2$ in $w$. In case $N$ does not dominate the foot, we set $f_1 = f_2 = -$. We generalize in the obvious way to items $[t, i, j, f_1, f_2]$, where $t$ is an elementary tree, and $[\alpha, i, j, f_1, f_2]$, where $\mathit{cdn}(N) = \alpha\beta$ for some $N$ and $\beta$.</Paragraph>
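    <Paragraph> As an illustration only, an item can be held in a small record type. The sketch below is in Python; the class name Item, its field names, and the use of None for the symbol "-" are assumptions made for this example and are not part of the formalism.</Paragraph>
    <Paragraph><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Item:
    """Tabular item [N, i, j, f1, f2]; all names here are illustrative."""
    node: str                 # node N (or the name of an elementary tree)
    i: int                    # left boundary of the root's span
    j: int                    # right boundary of the root's span
    f1: Optional[int] = None  # left boundary of the foot span, None stands for "-"
    f2: Optional[int] = None  # right boundary of the foot span, None stands for "-"

    def __post_init__(self) -> None:
        # Either both foot positions are given or both are "-".
        if (self.f1 is None) != (self.f2 is None):
            raise ValueError("f1 and f2 must both be set or both be '-'")
        if self.i > self.j:
            raise ValueError("i must not exceed j")
        # When present, the foot span must lie inside the root span.
        if self.f1 is not None and (
            self.i > self.f1 or self.f1 > self.f2 or self.f2 > self.j
        ):
            raise ValueError("foot span must lie inside the root span")

    def span(self) -> Tuple[int, int, Optional[int], Optional[int]]:
        """Return the four input positions of the item."""
        return (self.i, self.j, self.f1, self.f2)


with_foot = Item("N", 1, 4, 2, 3)  # N dominates the foot, which spans 2..3
without_foot = Item("M", 0, 2)     # M does not dominate the foot
```
]]></Paragraph>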
    <Paragraph position="1"> To introduce our approach, let us start with some considerations concerning the TAG parsing problem. When parsing $w$ with a TAG $G$, one usually composes items in order to construct new items spanning a larger portion of the input string. Assume there are instances of auxiliary trees $t$ and $t'$ in $G$, where the yield of $t'$, apart from its foot, is the empty string. If $\phi(t, N) &gt; 0$ for some node $N$ on the spine of $t'$, and we have recognized an item $[R_t, i, j, f_1, f_2]$, then we may adjoin $t$ at $N$ and hence deduce the existence of an item $[R_{t'}, i, j, f_1, f_2]$ (see Fig. 1(a)). Similarly, if $t$ can be adjoined at a node $N$ to the left of the spine of $t'$ and $f_1 = f_2$, we may deduce the existence of an item $[R_{t'}, i, j, j, j]$ (see Fig. 1(b)). Importantly, one or more other auxiliary trees with empty yield could wrap the tree $t'$ before $t$ adjoins. Adjunctions in this situation are potentially nonterminating. One may argue that situations where auxiliary trees have empty yield do not occur in practice, and are even excluded by definition in the case of lexicalized TAGs. However, in the computation of the prefix probability we must take into account trees with non-empty yield which behave like trees with empty yield because their lexical nodes fall to the right of the right boundary of the prefix string. For example, the two cases previously considered in Fig. 1 now generalize to those in Fig. 2.</Paragraph>
    <Paragraph position="2"> To derive a method for the computation of prefix probabilities, we give some simple recursive equations. Each equation decomposes an item into other items in all possible ways, in the sense that it expresses the probability of that item as a function of the probabilities of items associated with equal or smaller portions of the input.</Paragraph>
    <Paragraph position="3"> In specifying the equations, we exploit techniques used in the parsing of incomplete input (Lang, 1988). This allows us to compute the prefix probability as a by-product of computing the inside probability.</Paragraph>
    <Paragraph position="4"> In order to avoid the problem of nontermination outlined above, we transform our equations to remove infinite recursion, while preserving the correctness of the probability computation. The transformation of the equations is explained as follows. For an item $I$, the span of $I$, written $\sigma(I)$, is the 4-tuple representing the 4 input positions in $I$. We will define an equivalence relation on spans that relates to the portion of the input that is covered. The transformations that we apply to our equations produce two new sets of equations. The first set of equations is concerned with all possible decompositions of a given item $I$ into a set of items of which one has a span equivalent to that of $I$ and the others have an empty span. Equations in this set represent endless recursion. The system of all such equations can be solved independently of the actual input $w$. This is done once for a given grammar.</Paragraph>
    <Paragraph position="5"> The second set of equations has the property that, when evaluated, recursion always terminates. The evaluation of these equations computes the probability of the input string modulo the computation of some parts of the derivation that do not contribute to the input itself. Combining the second set of equations with the solutions obtained from the first set allows the effective computation of the prefix probability.</Paragraph>
  </Section>
  <Section position="6" start_page="954" end_page="958" type="metho">
    <SectionTitle>
4 Computing Prefix Probabilities
</SectionTitle>
    <Paragraph position="0"> This section develops an algorithm for the computation of prefix probabilities for stochastic TAGs.</Paragraph>
    <Section position="1" start_page="954" end_page="954" type="sub_section">
      <SectionTitle>
4.1 General equations
</SectionTitle>
      <Paragraph position="0"> The prefix probability is given by:</Paragraph>
      <Paragraph position="2"> The term $P([t, i, j, f_1, f_2])$ gives the inside probability of all possible trees derived from elementary tree $t$, having the indicated span over the input.</Paragraph>
      <Paragraph position="3"> This is decomposed into the contribution of each single node of t in equations (1) through (6).</Paragraph>
      <Paragraph position="4"> In equations (5) and (6) the contribution of a node $N$ is determined by the combination of the inside probabilities of $N$'s children and by all possible adjunctions at $N$. In (7) we recognize some terminal symbol if it occurs in the prefix, or ignore its contribution to the span if it occurs after the last symbol of the prefix. Crucially, this step allows us to reduce the computation of prefix probabilities to the computation of inside probabilities.</Paragraph>
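      <Paragraph> The treatment of terminal symbols described for (7) can be sketched in Python. The function below is one possible reading of that step, under the assumption that positions run from 0 to n (the length of the prefix) and that a terminal lying beyond the prefix is given the empty span from n to n. The name terminal_inside and this exact case split are assumptions for illustration and do not reproduce equation (7) verbatim.</Paragraph>
      <Paragraph><![CDATA[
```python
from typing import Sequence


def terminal_inside(symbol: str, i: int, j: int, prefix: Sequence[str]) -> float:
    """Value of an item [symbol, i, j, -, -] for a terminal (illustrative reading)."""
    n = len(prefix)
    # Case 1: the terminal is read from the prefix between positions i and i + 1.
    if j == i + 1 and i >= 0 and n >= j and prefix[i] == symbol:
        return 1.0
    # Case 2: the terminal falls after the last prefix symbol, so it is
    # accepted whatever its label and contributes nothing to the span.
    if i == n and j == n:
        return 1.0
    return 0.0


prefix = ["a", "b"]
inside_match = terminal_inside("a", 0, 1, prefix)    # matches the first symbol
beyond_prefix = terminal_inside("c", 2, 2, prefix)   # lies after the prefix
```
]]></Paragraph>
      <Paragraph> Under this reading, any terminal is accepted at the right boundary of the prefix, which is what lets an inside-style computation sum over all possible continuations of the prefix.</Paragraph>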
    </Section>
    <Section position="2" start_page="954" end_page="957" type="sub_section">
      <SectionTitle>
4.2 Terminating equations
</SectionTitle>
      <Paragraph position="0"> In general, the recursive equations (1) to (9) are not directly computable. This is because the value of $P([A, i, j, f, f'])$ might indirectly depend on itself, giving rise to nontermination.</Paragraph>
      <Paragraph position="1"> We therefore rewrite the equations.</Paragraph>
      <Paragraph position="2"> We define an equivalence relation over spans that expresses when two items are associated with equivalent portions of the input:</Paragraph>
      <Paragraph position="4"> We introduce two new functions $P_{\mathit{low}}$ and $P_{\mathit{split}}$. When evaluated on some item $I$, $P_{\mathit{low}}$ recursively calls itself as long as some other item $I'$ with a given elementary tree as its first component can be reached, such that $\sigma(I) \approx \sigma(I')$.</Paragraph>
      <Paragraph position="5"> $P_{\mathit{low}}$ returns 0 if the current branch of recursion cannot eventually reach such an item $I'$, thus removing the contribution of that branch to the prefix probability. If item $I'$ is reached, then $P_{\mathit{low}}$ switches to $P_{\mathit{split}}$. Complementary to $P_{\mathit{low}}$, function $P_{\mathit{split}}$ tries to decompose an argument item $I$ into items $I'$ such that $\sigma(I) \not\approx \sigma(I')$. If this is not possible through the current branch of recursion, $P_{\mathit{split}}$ returns 0. If decomposition is indeed possible, then we start again with $P_{\mathit{low}}$ at the items produced by the decomposition. The effect of this intermixing of function calls is the simulation of the original function $P$, with $P_{\mathit{low}}$ being called only on potentially nonterminating parts of the computation, and $P_{\mathit{split}}$ being called on parts that are guaranteed to terminate.</Paragraph>
      <Paragraph position="6"> Consider some derived tree spanning some portion of the input string, and the associated derivation tree $\tau$. There must be a unique elementary tree which is represented by a node in $\tau$ that is the &quot;lowest&quot; one that entirely spans the portion of the input of interest. (This node might be the root of $\tau$ itself.) Then, for each $t \in \mathcal{A}$ and for each $i, j, f_1, f_2$ such that $i &lt; j$ and $i \leq f_1 \leq f_2 \leq j$, we must have:</Paragraph>
      <Paragraph position="8"> Similarly, for each $t \in \mathcal{I}$ and for each $i, j$ such that $i &lt; j$, we must have:</Paragraph>
      <Paragraph position="10"> The reason why $P_{\mathit{low}}$ keeps a record of indices $f_1'$ and $f_2'$, i.e., the span of the foot node of the lowest tree (in the above sense) on which $P_{\mathit{low}}$ is called, will become clear later, when we introduce equations (29) and (30).</Paragraph>
      <Paragraph position="11"> We define $P_{\mathit{low}}([t, i, j, f_1, f_2], [t', f_1', f_2'])$ and $P_{\mathit{low}}([\alpha, i, j, f_1, f_2], [t', f_1', f_2'])$ for $i &lt; j$ and</Paragraph>
      <Paragraph position="13"> The definition of $P_{\mathit{low}}$ parallels the one of $P$ given in §4.1. In (12), the second term in the right-hand side accounts for the case in which the tree we are visiting is the &quot;lowest&quot; one on which $P_{\mathit{low}}$ should be called. Note how in the above equations $P_{\mathit{low}}$ must be called also on nodes that do not dominate the foot node of the elementary tree they belong to (cf. the definition of $\approx$). Since no call to $P_{\mathit{split}}$ is possible through the terms in (18), (19) and (20), we must set the right-hand side of these equations to 0.</Paragraph>
      <Paragraph position="14"> The specification of $P_{\mathit{split}}([\alpha, i, j, f_1, f_2])$ is given below. Again, the definition parallels the one of $P$ given in §4.1.</Paragraph>
      <Paragraph position="16"> if $N \in V \wedge \mathit{dft}(N)$; (25) $P_{\mathit{split}}([N, i, j, -, -]) = \phi(\mathbf{nil}, N) \cdot P_{\mathit{split}}([\mathit{cdn}(N), i, j, -, -]) +$</Paragraph>
      <Paragraph position="18"> We can now separate those branches of recursion that terminate on the given input from the cases of endless recursion. We assume below that $P_{\mathit{split}}([R_{t'}, i, j, f_1', f_2']) &gt; 0$. Even if this is not always valid, for the purpose of deriving the equations below, this assumption does not lead to invalid results. We define a new function $P_{\mathit{outer}}$, which accounts for probabilities of sub-derivations that do not derive any words in the prefix, but contribute structurally to its derivation:</Paragraph>
      <Paragraph position="20"> $P_{\mathit{split}}([R_{t'}, i, j, f_1', f_2'])$ We can now eliminate the infinite recursion that arises in (10) and (11) by rewriting</Paragraph>
      <Paragraph position="22"> $P_{\mathit{split}}([R_{t'}, i, j, f, f])$.</Paragraph>
      <Paragraph position="23"> Equations for $P_{\mathit{outer}}$ will be derived in the next subsection.</Paragraph>
      <Paragraph position="24"> In summary, terminating computation of prefix probabilities should be based on equations (31) and (32), which replace (1), along with equations (2) to (9) and all the equations for $P_{\mathit{split}}$.</Paragraph>
    </Section>
    <Section position="3" start_page="957" end_page="958" type="sub_section">
      <SectionTitle>
4.3 Off-line Equations
</SectionTitle>
      <Paragraph position="0"> In this section we derive equations for function $P_{\mathit{outer}}$ introduced in §4.2 and deal with all remaining cases of equations that cause infinite recursion.</Paragraph>
      <Paragraph position="1"> In some cases, function $P$ can be computed independently of the actual input. For any $i &lt; n$ we can consistently define the following quantities, where $t \in \mathcal{I} \cup \mathcal{A}$ and $\alpha \in V^{\perp}$ or $\mathit{cdn}(N) = \alpha\beta$ for some $N$ and $\beta$:</Paragraph>
      <Paragraph position="3"> where $f = i$ if $t \in \mathcal{A}$, $f = -$ otherwise, and $f' = i$ if $\mathit{dft}(\alpha)$, $f' = -$ otherwise. Thus, $H_t$ is the probability of all derived trees obtained from $t$, with no lexical node at their yields. Quantities $H_t$ and $H_\alpha$ can be computed by means of a system of equations which can be directly obtained from equations (1) to (9). Similar quantities as above must be introduced for the case $i = n$.</Paragraph>
      <Paragraph position="4"> For instance, we can set $H'_t = P([t, n, n, f, f])$, with $f$ specified as above, which gives the probability of all derived trees obtained from $t$ (with no restriction at their yields).</Paragraph>
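      <Paragraph> Systems of equations for input-independent quantities such as $H_t$ are in general nonlinear, since the value of a tree multiplies the values of its parts. One common way to solve such a system numerically, offered here as a sketch and not as the procedure used in this paper, is to iterate the equations starting from zero; when all coefficients are non-negative and a finite solution exists, the iterates increase towards the least non-negative solution. The function name least_fixed_point, the unknowns H_t and H_u, and all numbers below are hypothetical.</Paragraph>
      <Paragraph><![CDATA[
```python
from typing import Callable, Dict

Values = Dict[str, float]
Equations = Dict[str, Callable[[Values], float]]


def least_fixed_point(equations: Equations, tol: float = 1e-12,
                      max_iter: int = 10_000) -> Values:
    """Iterate x := F(x) from the all-zero assignment until the update is below tol."""
    values = {name: 0.0 for name in equations}
    for _ in range(max_iter):
        # Evaluate every right-hand side on the previous assignment.
        updated = {name: rhs(values) for name, rhs in equations.items()}
        change = max(abs(updated[name] - values[name]) for name in equations)
        values = updated
        if tol > change:
            return values
    raise RuntimeError("no convergence within max_iter iterations")


# Toy system with made-up coefficients:
#   H_t = 0.3 + 0.4 * H_t * H_u
#   H_u = 0.5 * H_t
toy = {
    "H_t": lambda v: 0.3 + 0.4 * v["H_t"] * v["H_u"],
    "H_u": lambda v: 0.5 * v["H_t"],
}
solution = least_fixed_point(toy)
```
]]></Paragraph>
      <Paragraph> Because the system does not depend on the input string, a computation of this kind would be carried out once per grammar and its results stored.</Paragraph>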
      <Paragraph position="5"> Function $P_{\mathit{outer}}$ is also independent of the actual input. Let us focus here on the case $f_1, f_2 \notin \{i, j, -\}$ (this enforces $(f_1, f_2) = (f_1', f_2')$ below). For any $i, j, f_1, f_2 &lt; n$, we can consistently define the following quantities.</Paragraph>
      <Paragraph position="7"> In the case at hand, $L_{t,t'}$ is the probability of all derived trees obtained from $t$ such that (i) no lexical node is found at their yields; and (ii) at some 'unfinished' node dominating the foot of $t$, the probability of the adjunction of $t'$ has already been accounted for, but $t'$ itself has not been adjoined.</Paragraph>
      <Paragraph position="8"> It is straightforward to establish a system of equations for the computation of $L_{t,t'}$ and $L_{\alpha,t'}$, by rewriting equations (12) to (20) according to (29) and (30). For instance, combining (12) and (29) gives (using the above assumptions on $f_1$ and $f_2$): $L_{t,t'} = L_{R_t,t'} + \delta(t = t')$.</Paragraph>
      <Paragraph position="9"> Also, if $\alpha \neq \epsilon$ and $\mathit{dft}(N)$, combining (14) and (30) gives (again, using the previous assumptions on $f_1$ and $f_2$; note that the $H_\alpha$'s are known terms here): $L_{\alpha N,t'} = H_\alpha \cdot L_{N,t'}$.</Paragraph>
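      <Paragraph> Once the $H$ terms are known, equations of the two forms shown above are linear in the unknown $L$ terms. As a sketch under that assumption, a system written as x = b + A x can be solved exactly by eliminating on (I - A) x = b. The helper solve_linear_recursion and the coefficients below are hypothetical and stand in for whatever values a concrete grammar would produce.</Paragraph>
      <Paragraph><![CDATA[
```python
from fractions import Fraction


def solve_linear_recursion(a, b):
    """Solve x = b + a x exactly, by Gauss-Jordan elimination on (I - a) x = b."""
    n = len(b)
    # Augmented matrix [I - a | b] with exact rational arithmetic.
    m = [
        [Fraction(int(r == c)) - Fraction(a[r][c]) for c in range(n)] + [Fraction(b[r])]
        for r in range(n)
    ]
    for col in range(n):
        # Pick a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: the recursion has no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [value / pivot_value for value in m[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


# Two mutually dependent unknowns with made-up coefficients:
#   x0 = 1/2 + 1/4 * x0 + 1/4 * x1
#   x1 = 1/2 + 1/2 * x0
a = [[Fraction(1, 4), Fraction(1, 4)], [Fraction(1, 2), Fraction(0)]]
b = [Fraction(1, 2), Fraction(1, 2)]
solution_l = solve_linear_recursion(a, b)
```
]]></Paragraph>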
      <Paragraph position="10"> For any $i, f_1, f_2 &lt; n$ and $j = n$, we also need to define:</Paragraph>
      <Paragraph position="12"> Here $L'_{t,t'}$ is the probability of all derived trees obtained from $t$ with a node dominating the foot node of $t$, that is an adjunction site for $t'$ and is 'unfinished' in the same sense as above, and with lexical nodes only in the portion of the tree to the right of that node. When we drop our assumption on $f_1$ and $f_2$, we must (pre)compute in addition terms of the form $P_{\mathit{outer}}([t, i, j, i, i], [t', i, i])$ and $P_{\mathit{outer}}([t, i, j, i, i], [t', j, j])$ for $i &lt; j &lt; n$, $P_{\mathit{outer}}([t, i, n, f_1, n], [t', f_1', f_2'])$ for $i &lt; f_1 &lt; n$, $P_{\mathit{outer}}([t, i, n, n, n], [t', f_1', f_2'])$ for $i &lt; n$, and similar. Again, these are independent of the choice of $i$, $j$ and $f_1$. Full treatment is omitted due to length restrictions.</Paragraph>
    </Section>
  </Section>
</Paper>