<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0107"> <Title>Reestimation and Best-First Parsing Algorithm for Probabilistic Dependency Grammars</Title> <Section position="3" start_page="41" end_page="45" type="metho"> <SectionTitle> 2 PDG Best First Parsing Algorithm </SectionTitle> <Paragraph position="0"> Dependency grammar describes a language with a set of head-dependent relations between any two words in the language. Head-dependent relations represent specific relations such as modifiee-modifier, predicate-argument, etc. In general, a functional role is assigned to a dependency link and specifies the syntactic/semantic relation between the head and the dependent. In this paper, however, we use the minimal definition of dependency grammar, with head-dependent relations only. In the future we will extend our dependency grammar into one with functions on dependency links.</Paragraph> <Paragraph position="2"> A dependency tree of an n-word sentence is always composed of n-1 dependency links.</Paragraph> <Paragraph position="3"> Every word in the sentence must have its head, except the word which is the head of the sentence. In a dependency tree, crossing links are not allowed.</Paragraph> <Paragraph position="4"> Salesperson sold the dog biscuits Figure 1 shows a dependency tree as a hierarchical representation and a link representation respectively. In both, the word &quot;sold&quot; is the head of the sentence. Here, we define the non-constituent objects, complete-link and complete-sequence, which are used in the PDG reestimation and BFP algorithms. A set of dependency links constructed for a word sequence wi,j is defined as a complete-link if the set satisfies the following conditions: A complete-link has directionality, determined by the direction of the outermost dependency relation. If the complete-link has (wi -> wj), it is rightward, and if the complete-link has (wi <- wj), then it is leftward. 
The basic complete-link is a dependency link between two adjacent words.</Paragraph> <Paragraph position="5"> A complete-sequence is defined as a sequence of null or more adjacent complete-links of the same direction. The basic complete-sequence is the null sequence of complete-links, which is defined on one word, the smallest word sequence. The direction of a complete-sequence is determined by the direction of its component complete-links. If the complete-sequence is composed of leftward complete-links, the complete-sequence is leftward, and vice versa.</Paragraph> <Paragraph position="6"> Figure 2 shows an abstract rightward complete-link for wi,j, a rightward complete-sequence for wi,m, and a leftward complete-sequence for wm+1,j. A double-slashed line means a complete-sequence. Whatever the direction is, a complete-link for wi,j is always constructed with a dependency link between wi and wj, a rightward complete-sequence from i to m, and a leftward complete-sequence from j to m+1, for an m between i and j-1. A rightward complete-sequence is always composed of a combination of a rightward complete-sequence and a rightward complete-link. On the contrary, a leftward complete-sequence is always composed of a combination of a leftward complete-link and a leftward complete-sequence. These restrictions on the composition of complete-sequences are for ease of description of the algorithm. The basic complete-link and complete-sequence are also shown in Figure 2. The following notations are used to represent the four kinds of objects for a word sequence wi,j and for an m from i to j-1.</Paragraph> <Paragraph position="7"> To generalize the structure of a dependency tree, we assume that there are marking tags, BOS (Begin Of Sentence) before w1 and EOS (End Of Sentence) after wn, and that there are always the dependency links (wBOS -> wEOS) and (wk <- wEOS) when wk is the head word of the sentence. 
Then, by definition, any dependency tree of a sentence w1,n can be uniquely represented with either an Lr(BOS, EOS) or an Sl(1, EOS), as depicted in Figure 3. This is because Lr(BOS, EOS) for any sentence is always composed of a null Sr(BOS, BOS) and Sl(1, EOS). The head of a dependency tree, wk, can be found in the rightmost Ll(k, EOS) of Sl(1, EOS).</Paragraph> <Paragraph position="8"> The probability of each object is defined as follows.</Paragraph> <Paragraph position="10"> The m varies from i to j-1 for Ll, Lr and Sr, and from i+1 to j for Sl. The best Ll and the best Lr always share the same m. This is because both are composed of the same sub-Sr and sub-Sl with maximum probabilities. Basis probabilities are as follows:</Paragraph> <Paragraph position="12"> Thus, the probability of a dependency tree is defined either by p(Lr(BOS, EOS)) or by p(Sl(1, EOS)).</Paragraph> <Paragraph position="13"> The PDG best-first parsing algorithm constructs the best dependency tree in bottom-up manner, with a dynamic programming method using a CYK-style chart. It is based on the non-constituent concepts of complete-link and complete-sequence. The parsing algorithm constructs the complete-links and complete-sequences for substrings, and incrementally merges complete-links into larger complete-sequences and complete-sequences into larger complete-links until the Lr(BOS, EOS) with maximum probability is constructed. Eisner (Eisner, 1996) proposed an O(n^3) parsing algorithm for PDG. In that work, the basic unit of chart entry is the span, which is also a non-constituent concept. But the span differs slightly from our complete-sequence and complete-link. When two adjacent spans are merged into a larger span, some conditional tests must be satisfied. 
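The chart recurrences just described (a complete-link over wi,j built from Sr(i,m), Sl(m+1,j) and the outermost dependency; sequences built by appending a link to a sequence) can be sketched as a CYK-style Viterbi loop. This is an illustrative reconstruction, not the authors' code; the probability tables `p_right[i][j]` ~ p(wi -> wj) and `p_left[i][j]` ~ p(wi <- wj) are an assumed interface.

```python
def best_parse(words, p_right, p_left):
    """Best-first PDG parsing sketch over the four chart items.

    Lr/Ll hold the best complete-link probabilities, Sr/Sl the best
    complete-sequence probabilities, indexed [i][j] over positions.
    """
    n = len(words)
    Lr = [[0.0] * n for _ in range(n)]
    Ll = [[0.0] * n for _ in range(n)]
    Sr = [[0.0] * n for _ in range(n)]
    Sl = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # basic (null) complete-sequences on a single word
        Sr[i][i] = Sl[i][i] = 1.0
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # A complete-link joins Sr(i,m) and Sl(m+1,j); the best Lr
            # and best Ll share the same m, so compute the inner max once.
            best = max(Sr[i][m] * Sl[m + 1][j] for m in range(i, j))
            Lr[i][j] = p_right[i][j] * best
            Ll[i][j] = p_left[i][j] * best
            # Sequences: Sr = Sr + Lr (m in i..j-1), Sl = Ll + Sl (m in i+1..j)
            Sr[i][j] = max(Sr[i][m] * Lr[m][j] for m in range(i, j))
            Sl[i][j] = max(Ll[i][m] * Sl[m][j] for m in range(i + 1, j + 1))
    return Lr, Ll, Sr, Sl
```

With BOS at position 0 and EOS at position n-1, the best parse probability is `Lr[0][n-1]`, i.e. Lr(BOS, EOS); backpointers for tree recovery and the paper's EOS-column restriction are omitted for brevity.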
In our work, best-first parsing is done by inscribing the four entries with maximum probabilities, Ll(i,j), Lr(i,j), Sl(i,j), and Sr(i,j), into each chart position in bottom-up/left-to-right manner without any extra condition checking.</Paragraph> <Paragraph position="14"> Figure 4 depicts the possible combinations of chart entries into a larger Lr, Ll, Sl, and Sr each. The sub-entries under the white-headed arrow and the sub-entries under the black-headed arrow are merged into a larger entry. The larger entries are inscribed into the bold box.</Paragraph> <Paragraph position="15"> There is an exception for chart entries of the (n+1)th column. In the (n+1)th column, only the Ll(k, EOS) whose sub-Sl is null can be inscribed. This is because there can be only one head word for a tree structure. If an Ll(k, EOS) whose sub-Sl is not null is inscribed into the chart, the overall tree structure will have two or more heads.</Paragraph> <Paragraph position="17"> The best parse is the maximum Lr(BOS, EOS) in the chart position (0, n+1). The best parse can also be found by the maximum Sl(1, EOS), because the Lr(BOS, EOS) is always composed of Sr(BOS, BOS) and Sl(1, EOS).</Paragraph> <Paragraph position="18"> The chart size is (n^2+4n+3)/2 for an n-word sentence. For each of the four items (Lr, Ll, Sr, and Sl) in each chart position, there can be maximally n searches. Thus, the time complexity of the best-first parsing algorithm is O(n^3).</Paragraph> </Section> <Section position="4" start_page="45" end_page="49" type="metho"> <SectionTitle> 3 PDG Reestimation Algorithm </SectionTitle> <Paragraph position="0"> For reestimation of the dependency probabilities of PDG, eight kinds of chart entries are defined based on three factors: inside/outside, complete-link/complete-sequence, and leftward/rightward. In the following definitions, β is for inside probability and α is for outside probability. 
Superscripts represent whether the object is a complete-link or a complete-sequence: l for complete-link and s for complete-sequence. Subscripts of β and α are for the directionality: r for rightward and l for leftward.</Paragraph> <Paragraph position="1"> Complete-link Inside Probabilities: β^l_r, β^l_l The inside probability of a complete-link is the probability that word sequence wi,j will be generated when there is a dependency relation between wi and wj.</Paragraph> <Paragraph position="3"> In Figure 5, β^l_r(i,j), the inside probability of Lr(i,j), is depicted. In the left part of the</Paragraph> <Paragraph position="5"> figure, the gray partitions indicate all the possible constructions of Sr(i,m) and all the possible constructions of Sl(m+1,j) respectively. Double-slashed links depict the complete-sequences which compose the Lr together with the outermost dependency (wi -> wj). The right part of the figure represents the chart. The bold box is the position where the β^l_r is to be inscribed. The inside probability of a complete-link is the sum of the probabilities of all the possible constructions of the complete-link. As explained in the previous section, a complete-link over wi,j is composed of the dependency link between word i and word j (either (wi -> wj) or (wi <- wj)), Sr(i,m), and Sl(m+1,j) for an m from i to j-1. The inside probability of Ll(i,j) can be computed the same as that of Lr(i,j) except for the direction of the dependency link between wi and wj. The outermost dependency (wi -> wj) must be replaced with (wi <- wj). Lr and Ll are not defined on a word sequence of length 1, wi. The unit probabilities for β^l_r and β^l_l are as follows:</Paragraph> <Paragraph position="7"> Any dependency tree of a sentence always has the dependency (wBOS -> wEOS) as the outermost dependency. 
So β^l_r(BOS, EOS) is the same as the sentence probability, which is the sum of the probabilities of all the possible parses.</Paragraph> <Paragraph position="8"> Complete-sequence Inside Probabilities: β^s_r, β^s_l The inside probability of a complete-sequence is the probability that word sequence wi,j is generated when there is an Sl(i,j) or Sr(i,j).</Paragraph> <Paragraph position="10"> In Figures 6 and 7, the double-slashed link means a complete-sequence, a sequence of null or more adjacent complete-links of the same direction. A complete-sequence is composed of a sub-complete-sequence and a sub-complete-link. Figure 6 depicts a rightward complete-sequence for an m. The value of m varies from i to j-1. In Figure 7, Sl is composed of a sub-Ll and a sub-Sl. The inside probability of a complete-sequence is the sum of the probabilities of all the ways the complete-sequence can be constructed. The basis for the inside probabilities of complete-sequences is as follows.</Paragraph> <Paragraph position="11"> β^s_r(i,i) = β^s_l(i,i) = 1</Paragraph> <Paragraph position="13"> Because the (n+1)th word, wEOS, cannot be a dependent of any other word, β^l_r(k, EOS) and β^s_r(k, EOS) for k from 1 to n are not computed. And because there can be only one head of a tree, wEOS must be the head of only one word. Thus, in the computation of β^l_l(x, EOS) and β^s_l(x, EOS), only the Ll's whose sub-Sl is null are considered.</Paragraph> <Paragraph position="14"> Complete-link Outside Probabilities: α^l_r, α^l_l This is the probability of producing the words before i and after j of a sentence while complete-link(i,j) generates wi,j.</Paragraph> <Paragraph position="16"> Figures 8 and 9 depict the outside probabilities of Lr and Ll respectively. In Figure 8, the outside probability of Lr, which is inscribed in the bold box, is computed by summing the products of the inside probabilities in the boxes under the white-headed arrow and the outside probabilities in the boxes under the black-headed</Paragraph> <Paragraph position="18"> arrow. 
Likewise, in Figure 9, the outside probability of Ll in the bold box is computed by summing all the products of the inside probabilities under the white-headed arrow and the outside probabilities under the black-headed arrow. This is because, in parsing, the sub-entries under the white-headed arrows and the Lr/Ll in the bold boxes are merged into larger entries which are to be inscribed in the boxes under the black-headed arrows. The basis probability for complete-link outside probabilities is as follows.</Paragraph> <Paragraph position="19"> α^l_r(BOS, EOS) = 1. α^l_r(k, EOS) is always 0 for k = 1 to n, because wEOS cannot be a dependent of any other word.</Paragraph> <Paragraph position="20"> Complete-sequence Outside Probabilities: α^s_r, α^s_l This is the probability of producing the word sequences w1,i-1 and wj+1,n while words i through j construct a complete-sequence.</Paragraph> <Paragraph position="22"> In the above expression, the first term is for the construction of a larger Sr(i,h) from the combination of Sr(i,j) and its adjacent Lr(j,h). The second term means the construction of a larger Lr(i,h) from the combination of Sr(i,j), Sl(j+1,h), and the dependency link from wi to wh. The third term is for the larger Ll(i,h) from the combination of Sr(i,j), Sl(j+1,h), and the dependency link from wh to wi. The three terms in the expression are depicted in Figure 10.</Paragraph> <Paragraph position="24"> α^s_l is the sum of all the probabilities that Sl is to become a sub-entry of larger entries: Sl, Lr, and Ll. The first term in the above expression is for the construction of Sl(v,j) from Ll(v,i) and Sl(i,j). The second is for the construction of Lr(v,j) from Sr(v,i-1), Sl(i,j),</Paragraph> <Paragraph position="26"> and the dependency link from wv to wj. The third term is for the construction of Ll(v,j) from Sr(v,i-1), Sl(i,j), and the dependency relation from wj to wv. The three cases are depicted in Figure 11. 
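The complete-link and complete-sequence inside probabilities defined above are the sum analogues of the best-first parser's max recurrences. A minimal sketch (illustrative names; the paper's special handling of the EOS column is omitted for brevity):

```python
def inside(n, p_right, p_left):
    """Inside probabilities for the four chart items (sketch).

    bl_r[i][j] is β^l_r(i,j): the total probability of all ways to build a
    rightward complete-link over positions i..j; bs_r/bs_l are the
    complete-sequence analogues.
    """
    bl_r = [[0.0] * n for _ in range(n)]  # β^l_r: rightward links
    bl_l = [[0.0] * n for _ in range(n)]  # β^l_l: leftward links
    bs_r = [[0.0] * n for _ in range(n)]  # β^s_r: rightward sequences
    bs_l = [[0.0] * n for _ in range(n)]  # β^s_l: leftward sequences
    for i in range(n):
        bs_r[i][i] = bs_l[i][i] = 1.0  # basis: null sequences on one word
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # sum over split points m instead of taking the max
            inner = sum(bs_r[i][m] * bs_l[m + 1][j] for m in range(i, j))
            bl_r[i][j] = p_right[i][j] * inner
            bl_l[i][j] = p_left[i][j] * inner
            bs_r[i][j] = sum(bs_r[i][m] * bl_r[m][j] for m in range(i, j))
            bs_l[i][j] = sum(bl_l[i][m] * bs_l[m][j]
                             for m in range(i + 1, j + 1))
    return bl_r, bl_l, bs_r, bs_l
```

With BOS at position 0 and EOS at position n-1, `bl_r[0][n-1]` corresponds to β^l_r(BOS, EOS), the sentence probability.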
The basis probabilities for complete-sequence outside probabilities are as follows.</Paragraph> <Paragraph position="27"> α^s_r(BOS, EOS) = α^s_l(BOS, EOS) = α^s_l(1, EOS) = 1 The reestimation algorithm computes the inside probabilities (β^l_r, β^l_l, β^s_r, and β^s_l), inscribing them into the chart bottom-up and left-to-right. The outside probabilities (α^l_r, α^l_l, α^s_r, and α^s_l) are computed and inscribed into the chart top-down and right-to-left.</Paragraph> <Paragraph position="28"> Training The training process is as follows: 1. Initialize the probabilities of dependency relations between all the possible word pairs. 2. Compute the initial entropy.</Paragraph> <Paragraph position="29"> 3. Analyze the training corpus using the known probabilities, and recalculate the frequency of each dependency relation based on the analysis result. 4. Compute the new probabilities based on the newly counted frequencies. 5. Compute the new entropy.</Paragraph> <Paragraph position="30"> 6. Repeat 3 through 5 until the new entropy approximately equals the previous entropy.</Paragraph> <Paragraph position="32"> The above iteration is continued until all the probabilities settle down, that is, until the training corpus entropy converges to its minimum. The new usage count of a dependency relation is calculated as follows. In the following expression, the indicator is 1 if the dependency relation</Paragraph> <Paragraph position="34"> Similarly, the usage count of (wi <- wj), c(wi <- wj), is the sum over all i, j of α^l_l(i,j)β^l_l(i,j).</Paragraph> <Paragraph position="35"> The chart has (n^2+4n+3)/2 boxes. The reestimation algorithm computes eight items for each chart box, and the computation of each item needs maximally n products and summations respectively. So the time complexity of the algorithm is O(n^3). The algorithm can be used for class-based (or tag-based) dependency grammars. With the concept of word class/tag, the complexity is affected by the class/tag size, due to the class/tag ambiguities of each word. 
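The training steps above form a standard EM loop. A minimal sketch, assuming hypothetical helpers `expected_counts` (an inside-outside pass over one sentence, returning expected dependency-link counts) and `compute_entropy`; the global normalization shown is a simplification of the paper's reestimation formula:

```python
def reestimate(corpus, probs, compute_entropy, expected_counts, tol=1e-4):
    """EM loop sketch for PDG reestimation (steps 1-6 above).

    `probs` maps dependency links (e.g. word pairs) to probabilities;
    iteration stops once the corpus entropy stops decreasing.
    """
    prev_entropy = compute_entropy(corpus, probs)
    while True:
        # E-step: accumulate expected usage counts over the corpus
        counts = {}
        for sentence in corpus:
            for link, c in expected_counts(sentence, probs).items():
                counts[link] = counts.get(link, 0.0) + c
        # M-step: renormalize counts into new probabilities
        # (global normalization here; the actual scheme is an assumption)
        total = sum(counts.values())
        probs = {link: c / total for link, c in counts.items()}
        entropy = compute_entropy(corpus, probs)
        if prev_entropy - entropy < tol:  # entropy no longer decreasing
            return probs
        prev_entropy = entropy
```

In practice `expected_counts` would be implemented with the α·β products defined in this section, e.g. c(wi <- wj) accumulated from α^l_l(i,j)β^l_l(i,j).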
In the worst case, the time needed is 8 x t^2 x n x (n^2+4n+3)/2, so the complexity will be O(t^2 n^3) with respect to t, the number of classes, and n, the length of the input string.</Paragraph> </Section> </Paper>