<?xml version="1.0" standalone="yes"?>
<Paper uid="J96-3007">
  <Title>A Probabilistic Recursive Transition Network is an elevated version of a Recursive Transition Network used to model and process context-free languages in stochastic parameters. We present</Title>
  <Section position="3" start_page="0" end_page="422" type="metho">
    <SectionTitle>
2. A Probabilistic Recursive Transition Network
</SectionTitle>
    <Paragraph position="0"> A PRTN, denoted by $\lambda$, is a 6-tuple:</Paragraph>
    <Paragraph position="1"> $\lambda = (A, B, S, \mathcal{F}, \Gamma, \delta)$.</Paragraph>
    <Paragraph position="3"> $A$ is a transition matrix containing transition probabilities, and $B$ is a word matrix containing the probability distribution of the words observable at each terminal transition. $\Gamma$ specifies the types of transitions, and $\delta$ represents a stack. $S$ and $\mathcal{F}$ denote the start and final states, respectively.</Paragraph>
    <Paragraph position="4"> Stack operations are associated with transitions, and transitions are classified into three types according to the stack operation. The first type is the nonterminal transition, in which a state identification is pushed onto the stack. The second type is the pop transition, in which the transition is determined by the content of the stack. The third type comprises transitions that involve no stack operation; these are terminal and empty transitions. In general, a grammar expressed as a PRTN consists of layers. A layer is a fragment of the network that corresponds to a nonterminal. A table of the probability distribution of words is defined at each terminal transition. Pop transitions represent the return of a layer to one of its (possibly multiple) higher layers.</Paragraph>
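    <Paragraph> The stack behaviour of the three transition types can be sketched in Python. This is a minimal illustration, not the authors' implementation: the names TransitionType, Transition, and walk are hypothetical, probabilities are ignored, and the sketch assumes that a nonterminal transition pushes the state to return to and that a pop transition must target the state found on top of the stack.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class TransitionType(Enum):
    NONTERMINAL = "nonterminal"  # pushes a state identification
    POP = "pop"                  # determined by the content of the stack
    TERMINAL = "terminal"        # no stack operation, emits a word
    EMPTY = "empty"              # no stack operation, emits nothing


@dataclass(frozen=True)
class Transition:
    source: int
    target: int
    kind: TransitionType
    return_to: Optional[int] = None  # assumed: state pushed by a nonterminal


def walk(path: List[Transition]) -> List[int]:
    """Apply the stack operation of each transition; return the final stack."""
    stack: List[int] = []
    for tr in path:
        if tr.kind is TransitionType.NONTERMINAL:
            stack.append(tr.return_to)
        elif tr.kind is TransitionType.POP:
            # The pop is valid only if it returns to the pushed state.
            if not stack or stack.pop() != tr.target:
                raise ValueError("pop transition does not match the stack")
        # Terminal and empty transitions leave the stack untouched.
    return stack


# Hypothetical path: enter a layer from state 0, read one word, pop back to 1.
path = [
    Transition(0, 10, TransitionType.NONTERMINAL, return_to=1),
    Transition(10, 11, TransitionType.TERMINAL),
    Transition(11, 1, TransitionType.POP),
]
print(walk(path))
```

Walking a path that enters a layer and pops back leaves the stack empty, while a pop whose target disagrees with the stack is rejected.</Paragraph>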
    <Paragraph position="5"> In this paper, parses are assumed to be sequences of dark-headed transitions (see the accompanying figure). first(l) returns the first state of layer l.</Paragraph>
    <Paragraph position="6"> last(l) returns the last state of layer l.</Paragraph>
    <Paragraph position="7"> layer(s) returns the layer that state s belongs to.</Paragraph>
    <Paragraph position="8"> bout(l) returns the states from which layer l branches out.</Paragraph>
    <Paragraph position="9"> bin(l) returns the states to which layer l returns.</Paragraph>
    <Paragraph position="10"> terminal(l) returns the set of terminal edges in layer l.</Paragraph>
    <Paragraph position="11"> nonterminal(l) returns the set of nonterminal edges in layer l.</Paragraph>
    <Paragraph position="12"> $\overrightarrow{ij}$ denotes the edge between states i and j.</Paragraph>
    <Paragraph position="13"> [i,j] denotes the network segment between states i and j.</Paragraph>
    <Paragraph position="14"> $W_{a \sim b}$ is the word sequence covering the a-th to the b-th word.</Paragraph>
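    <Paragraph> The helper functions listed above can be made concrete on a toy network. The following Python sketch is only an illustration under assumed data structures: LAYERS and CALLS are hypothetical tables, and bin is spelled bin_ to avoid shadowing the Python built-in.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy network: layer name -> its states in left-to-right order.
LAYERS: Dict[str, List[int]] = {"S": [0, 1, 2], "NP": [10, 11, 12]}
# Hypothetical call sites: (state branching out, called layer, return state).
CALLS: List[Tuple[int, str, int]] = [(0, "NP", 1)]


def first(l: str) -> int:
    """First state of layer l."""
    return LAYERS[l][0]


def last(l: str) -> int:
    """Last state of layer l."""
    return LAYERS[l][-1]


def layer(s: int) -> str:
    """Layer that state s belongs to."""
    for name, states in LAYERS.items():
        if s in states:
            return name
    raise KeyError(s)


def bout(l: str) -> Set[int]:
    """States from which layer l branches out."""
    return {u for (u, called, _) in CALLS if called == l}


def bin_(l: str) -> Set[int]:
    """States to which layer l returns."""
    return {j for (_, called, j) in CALLS if called == l}


print(first("NP"), last("NP"), layer(11), bout("NP"), bin_("NP"))
```

In this toy network the states in bout("NP") and bin_("NP") both lie in layer S, which mirrors the constraint layer(x) = layer(y) used later in the text.</Paragraph>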
    <Paragraph position="15">  Han and Choi A Chart Re-estimation Algorithm</Paragraph>
  </Section>
  <Section position="4" start_page="422" end_page="424" type="metho">
    <SectionTitle>
3. Re-estimation Algorithm
</SectionTitle>
    <Paragraph position="0"> The task of a re-estimation algorithm is to assign probabilities to transitions and to the word symbols defined at each terminal transition. The Inside-Outside algorithm provides a formal basis for estimating the parameters of context-free languages so that the probabilities of the word sequences (sample sentences) are maximized. The re-estimation algorithm for PRTN uses a variation of the Inside-Outside algorithm customized for PRTN.</Paragraph>
    <Paragraph position="1"> Let a word sequence of length N be denoted by:</Paragraph>
    <Paragraph position="2"> $W = W_1 W_2 \cdots W_N$.</Paragraph>
    <Paragraph position="3"> Now define the Inside probability.</Paragraph>
    <Paragraph position="4"> Definition 1 The Inside probability of state i, denoted by $P_I(i)_{s \sim t}$, is the probability that layer(i) generates the string positioned from s to t, starting at state i, given a model $\lambda$.</Paragraph>
    <Paragraph position="6"> Figure 2 gives a pictorial view of the Inside probability. A valid sequence can begin only at state $S$; thus, to be strict, $P_I(S)$ has an additional factor, $P(S)$. When the immediate transition $\overrightarrow{ij}$ is of terminal type, the transition probability $a_{ij}$ and the probability of the s-th word at the transition, $b(\overrightarrow{ij}, W_s)$, are multiplied together with the Inside probability of the rest of the sequence, $W_{s+1 \sim t}$.</Paragraph>
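    <Paragraph> The terminal case described above can be written as a single term of the Inside recursion. This is a sketch of that one term only, reconstructed from the sentence above; the nonterminal, pop, and empty cases and the outer summation are not shown.

```latex
\underbrace{a_{ij}\; b(\overrightarrow{ij}, W_s)\; P_I(j)_{s+1 \sim t}}_{\text{contribution of a terminal transition } \overrightarrow{ij} \text{ to } P_I(i)_{s \sim t}}
```

Here $a_{ij}$ pays for taking the transition, $b(\overrightarrow{ij}, W_s)$ pays for emitting the s-th word on it, and $P_I(j)_{s+1 \sim t}$ accounts for the remaining words from state j.</Paragraph>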
    <Paragraph position="7"> Now define the Outside probability.</Paragraph>
    <Paragraph position="8"> Definition 2 The Outside probability, denoted by $P_O(i,j)_{s \sim t}$, is the probability that the partial sequences $W_{1 \sim s-1}$ and $W_{t+1 \sim N}$ are generated, provided that the partial sequence $W_{s \sim t}$ is generated by [i,j], given a model $\lambda$.</Paragraph>
    <Paragraph position="9"> And by definition:</Paragraph>
    <Paragraph position="11"> Computational Linguistics Volume 22, Number 3</Paragraph>
    <Paragraph position="13"> Illustration of Outside probability.</Paragraph>
    <Paragraph position="14"> where x $\in$ bout(layer(i)), y $\in$ bin(layer(i)), f = first(layer(i)), e = last(layer(i)), layer(i) = layer(j), and layer(x) = layer(y).</Paragraph>
    <Paragraph position="15"> The summation on x is defined only when $a \neq 1$ or $b \neq N$ (i.e., there are words left to be generated). A nonterminal transition and its corresponding pop transition are defined to be 1 when a = 1 and b = N.</Paragraph>
    <Paragraph position="16"> For a boundary case of the Outside probability where f is the first state of a layer in the above equation:</Paragraph>
    <Paragraph position="18"> Figure 3 shows the network configuration in computing the Outside probability. In equation 2, $P_I(f, i)_{a \sim s-1}$ is the probability that the sequence $W_{a \sim s-1}$ is generated by layer(i) to the left of state i, and $P_I(j)_{t+1 \sim b}$ is the probability that the sequence $W_{t+1 \sim b}$ is generated by layer(i) to the right of state j.</Paragraph>
    <Paragraph position="19"> The computation of $P_I(f, i)_{s \sim t}$, a slight variation of the Inside probability in which the $P_I(f)_{a \sim b}$ terms in equation 1 are replaced by $P_I(f, i)_{a \sim b}$, is done as follows:</Paragraph>
    <Paragraph position="21"> It is basically the same as the Inside probability except that it carries an i that indicates a stop state.</Paragraph>
    <Paragraph position="22"> Now we can derive the re-estimation algorithm for $A$ and $B$ using the Inside and Outside probabilities. As the result of constrained maximization of Baum's auxiliary function, we have the following form of re-estimation for each transition (Rabiner 1989).</Paragraph>
    <Paragraph position="23"> $\bar{a}_{ij} = \frac{\text{expected number of transitions from state } i \text{ to state } j}{\text{expected number of transitions from state } i}$ The expectation of each transition type is computed as follows. For a terminal transition:</Paragraph>
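    <Paragraph> The ratio above amounts to normalizing expected transition counts per source state. The following Python sketch illustrates only that normalization step; the function name reestimate and the example counts are hypothetical, and the expected counts themselves would come from the Inside and Outside probabilities.

```python
from collections import defaultdict
from typing import Dict, Tuple

Edge = Tuple[int, int]  # (source state i, target state j)


def reestimate(expected: Dict[Edge, float]) -> Dict[Edge, float]:
    """Divide each expected count by the total expected count leaving its source."""
    totals: Dict[int, float] = defaultdict(float)
    for (i, _), count in expected.items():
        totals[i] += count
    return {
        (i, j): count / totals[i]
        for (i, j), count in expected.items()
        if totals[i] > 0  # states never visited keep no estimate
    }


# Hypothetical expected counts for three transitions.
expected = {(0, 1): 3.0, (0, 2): 1.0, (1, 2): 2.0}
new_a = reestimate(expected)
print(new_a)
```

After normalization, the re-estimated probabilities leaving each visited state sum to one.</Paragraph>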
    <Paragraph position="25"> where u = last(layer(j)), v $\in$ bin(layer(j)), layer(i) = layer(v), layer(j) = layer(u), and uv is a pop transition. For a pop transition: $\sum_{s=1}^{N} \sum_{t=s}^{N} a_{uv}\, P_I(v)_{s \sim t}\, a_{ij}\, P_O(u, j)_{s \sim t}$</Paragraph>
    <Paragraph position="27"> where u $\in$ bout(layer(i)), j $\in$ bin(layer(i)), v = first(layer(i)), layer(u) = layer(j), layer(v) = layer(i), and uv is a nonterminal transition.</Paragraph>
    <Paragraph position="28"> Since transitions of terminal and nonterminal types can occur together at a state, terminal transitions are estimated as follows: [Figure caption: Outside computation with chart. Inside computation builds a table of computed Insides.]</Paragraph>
  </Section>
  <Section position="5" start_page="424" end_page="426" type="metho">
    <SectionTitle>
4. Chart Re-estimation Algorithm
</SectionTitle>
    <Paragraph position="0"> It can be shown that the complexity of the Inside algorithm is $O(N^3 G^3)$ and that of the Outside algorithm is $O(N^4 G^3)$, where N is the input size and G is the number of states.</Paragraph>
    <Paragraph position="1"> This complexity is too high for current workstations when either N or G grows beyond a few tens. A basic implementation of the algorithm uses a chart to avoid doing the same computation more than once. For instance, the table for storing Inside computations takes $O(N^2 G^2 C)$ storage, where C is the number of terminal and nonterminal categories. A chart item is a function of five parameters, and returns an Inside probability.</Paragraph>
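    <Paragraph> To give a feel for these bounds, the following worked arithmetic plugs in illustrative values. The values N = G = 30 and C = 20 are assumptions chosen only for the example, and constant factors are ignored.

```latex
N = G = 30,\; C = 20:\qquad
N^3 G^3 = 27000 \times 27000 = 7.29 \times 10^{8},\qquad
N^4 G^3 = 810000 \times 27000 \approx 2.19 \times 10^{10},\qquad
N^2 G^2 C = 900 \times 900 \times 20 = 1.62 \times 10^{7}.
```

Even at this modest size the Outside bound is about thirty times the Inside bound, while the chart needs on the order of ten million entries in the worst case.</Paragraph>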
    <Paragraph position="2"> $I(i, j, s, t, c) = P_I(i, j)_{s \sim t}$.</Paragraph>
    <Paragraph position="3"> A chart item is associated with categories, implying that the item is valid on the specified categories that begin the net fragment of the item. Suppose a net fragment [i,j] begins with NP and ADJP; then, given a sentence fragment $W_{s \sim t}$, ADJP may not participate in generating $W_{s \sim t}$, while NP may. The information on valid categories is useful when the chart is used in computing Outside probabilities.</Paragraph>
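    <Paragraph> A chart of this kind can be sketched as a table keyed by the five parameters of an item. The following Python sketch is an assumption about one possible layout, not the authors' data structure: the class Chart, its method inside, and the example key are hypothetical, and the actual Inside computation is passed in as a callable.

```python
from typing import Callable, Dict, Tuple

# Chart key: (i, j, s, t, c), following the five parameters of a chart item.
ChartKey = Tuple[int, int, int, int, str]


class Chart:
    """Table of Inside probabilities that computes each item at most once."""

    def __init__(self) -> None:
        self.items: Dict[ChartKey, float] = {}
        self.computed = 0  # how many items were actually computed

    def inside(self, key: ChartKey, compute: Callable[[], float]) -> float:
        """Return the stored item, computing and storing it on first use."""
        if key not in self.items:
            self.items[key] = compute()
            self.computed += 1
        return self.items[key]


chart = Chart()
key = (0, 2, 1, 3, "NP")  # hypothetical item: fragment [0,2], words 1 to 3, category NP
first_value = chart.inside(key, lambda: 0.5 * 0.4)
second_value = chart.inside(key, lambda: 0.0)  # served from the chart, not recomputed
print(first_value, second_value, chart.computed)
```

The second request for the same key returns the stored value without invoking its callable, which is the saving the chart is meant to provide.</Paragraph>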
    <Paragraph position="4"> An Outside probability is the result of computing many Inside probabilities. Computing an Inside probability can be impractical even in an application of moderate size. A naive implementation of the Outside computation requires numerous Inside computations, so estimating even a single parameter is not realistic on a serial workstation (Lari and Young 1990).</Paragraph>
    <Paragraph position="5"> The proposed estimation algorithm aims at reducing the redundant Inside computations in computing an Outside probability. The idea is to identify the Inside probabilities used in generating an input sentence and to compute an Outside probability using mainly those Insides. This is done first by computing an Inside probability of the input sentence, which can return a table of the Insides used in the computation. Note that the Insides at the deepest level are produced first, as the recursion unwinds; thus there can be many Insides that are not relevant to the given sentence. The Insides that participate in generating the input sentence can be identified by running the Inside algorithm one more time, top-down. Figure 4 illustrates the steps of the revised Outside computation.</Paragraph>
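    <Paragraph> One way to picture the top-down identification step is as a reachability search over the chart. The following Python sketch rests on an assumption that is not stated in the paper: that the first Inside pass recorded, for each chart item, the sub-items it was built from (the hypothetical table used_by). Items reachable from the item covering the whole sentence are then treated as relevant.

```python
from typing import Dict, Hashable, List, Set


def relevant_items(root: Hashable,
                   used_by: Dict[Hashable, List[Hashable]]) -> Set[Hashable]:
    """Collect the chart items reachable top-down from the root item."""
    seen: Set[Hashable] = set()
    pending = [root]
    while pending:
        item = pending.pop()
        if item in seen:
            continue
        seen.add(item)
        # Descend into the sub-items this item was computed from.
        pending.extend(used_by.get(item, []))
    return seen


# Hypothetical dependency table: "d" and "e" were computed but never used
# in any derivation of the whole sentence.
used_by = {"root": ["a", "b"], "a": ["c"], "d": ["e"]}
kept = relevant_items("root", used_by)
print(sorted(kept))
```

Items such as "d" and "e" stay in the full table but are skipped when the Outside probability is computed.</Paragraph>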
    <Paragraph position="6"> The identified Insides, however, do not cover all the Insides needed in computing an Outside probability. This is because the Inside algorithm works on a network from left to right and one transition at a time. Many Insides that are missed in the table are compositions of smaller Insides.</Paragraph>
    <Paragraph position="7"> Once charts of selected Insides are prepared, an Outside probability is computed as follows:</Paragraph>
    <Paragraph position="9"> where x $\in$ bout(layer(i)), y $\in$ bin(layer(i)), f = first(layer(i)), e = last(layer(i)), c $\in$ {nonterminal}, layer(i) = layer(j), and layer(x) = layer(y).</Paragraph>
    <Paragraph position="10"> The function $\sigma(f, e, s, t)$ returns a set of (a, b) pairs for which there are Inside items $I(f, e, a, b)$ defined in the chart such that $a \le s$ and $b \ge t$. In short, the items for state f indicate the possible combinations of sentence segments inclusive of the given fragment $W_{s \sim t}$, because the chart contains items for all the valid sentence segments that were generated through the layer [f, e]. When the current layer [f, e] is completed with the two Insides computed, the computation extends to the Outside.</Paragraph>
    <Paragraph position="11"> Useless advancements into higher layers that do not lead to the successful completion of a given sentence can be avoided by making sure that [x, y] generates $W_{a \sim b}$ and that the category c of the current layer is defined, which can be checked by consulting the chart items for state x.</Paragraph>
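    <Paragraph> The lookup performed by this function can be sketched in a few lines of Python. The sketch assumes that the chart is a dictionary keyed by (f, e, a, b), with the category parameter dropped for brevity, and that a pair (a, b) qualifies when its segment encloses the given fragment, that is, when $a \le s$ and $b \ge t$. The function name sigma and the example entries are hypothetical.

```python
from typing import Dict, Set, Tuple

ItemKey = Tuple[int, int, int, int]  # (f, e, a, b)


def sigma(chart: Dict[ItemKey, float], f: int, e: int,
          s: int, t: int) -> Set[Tuple[int, int]]:
    """Return the (a, b) pairs of chart items for [f, e] that enclose s to t."""
    return {
        (a, b)
        for (i, j, a, b) in chart
        if i == f and j == e and s >= a and b >= t
    }


# Hypothetical chart entries for the layer [0, 2] and one unrelated layer.
chart = {
    (0, 2, 1, 3): 0.1,
    (0, 2, 2, 3): 0.2,
    (0, 2, 1, 2): 0.3,  # ends before t = 3, so it does not enclose the fragment
    (5, 6, 1, 3): 0.4,  # belongs to a different layer
}
pairs = sigma(chart, 0, 2, 2, 3)
print(sorted(pairs))
```

Only the segments that contain the fragment from word 2 to word 3 within the layer [0, 2] are returned, so the Outside computation never extends over a segment the chart has not seen.</Paragraph>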
  </Section>
</Paper>