<?xml version="1.0" standalone="yes"?> <Paper uid="P96-1024"> <Title>Parsing Algorithms and Metrics</Title> <Section position="4" start_page="178" end_page="179" type="metho"> <SectionTitle> 3 Labelled Recall Parsing </SectionTitle> <Paragraph position="0"> Consider writing a parser for a domain such as machine-assisted translation. One could use the Labelled Tree Algorithm, which would maximize the expected number of exactly correct parses. However, since the number of correct constituents is a better measure of application performance for this domain than the number of correct trees, perhaps one should use an algorithm which maximizes the Labelled Recall criterion rather than the Labelled Tree criterion.</Paragraph> <Paragraph position="1"> The Labelled Recall Algorithm finds the tree T_G which has the highest expected value for the Labelled Recall Rate, L/N_C (where L is the number of correct labelled constituents, and N_C is the number of nodes in the correct parse). This can be written as follows:

    T_G = \arg\max_{T_G} E(L / N_C)    (2)

</Paragraph> <Paragraph position="2"> It is not immediately obvious that the maximization of expression (2) is in fact different from the maximization of expression (1), but a simple example illustrates the difference. The following grammar generates four trees with equal probability:

    S \to A C   (1/4)        A \to x        C \to x        E \to x
    S \to A D   (1/4)        B \to x        D \to x        F \to x
    S \to E B   (1/4)
    S \to F B   (1/4)

For the first tree, (S (A x) (C x)), the probabilities of being correct are S: 100%; A: 50%; and C: 25%. Similar counting holds for the other three. Thus, the expected value of L for any of these trees is 1.75.</Paragraph> <Paragraph position="3"> On the other hand, the optimal Labelled Recall parse is

    (S (A x) (B x))

This tree has probability 0 according to the grammar, and thus is non-optimal according to the Labelled Tree Rate criterion. However, for this tree the probabilities of each node being correct are S: 100%; A: 50%; and B: 50%. The expected value of L is 2.0, the highest of any tree. This tree therefore optimizes the Labelled Recall Rate.</Paragraph> <Section position="1" start_page="178" end_page="179" type="sub_section"> <SectionTitle> 3.1 Algorithm </SectionTitle> <Paragraph position="0"> We now derive an algorithm for finding the parse that maximizes the expected Labelled Recall Rate. We do this by expanding expression (2) out into a probabilistic form, converting this into a recursive equation, and finally creating an equivalent dynamic programming algorithm.</Paragraph> <Paragraph position="1"> We begin by rewriting expression (2), expanding out the expected value operator and removing the factor 1/N_C, which is the same for all T_G and so plays no role in the maximization:

    T_G = \arg\max_{T_G} E(L) = \arg\max_{T_G} \sum_{T_C} P(T_C \mid w_1 \cdots w_n) \, \bigl| \{ (s,t,X) \in T_G : (s,t,X) \in T_C \} \bigr|    (5)

</Paragraph> <Paragraph position="2"> Now, given a PCFG with start symbol S, the following equality holds (writing w_a^b for w_a \cdots w_b):

    \sum_{T_C \,:\, (s,t,X) \in T_C} P(T_C \mid w_1^n) = \frac{P(S \Rightarrow^* w_1^{s-1} X w_{t+1}^n \Rightarrow^* w_1^n)}{P(w_1^n)}

By rearranging the summation in expression (5) and then substituting this equality, we get

    T_G = \arg\max_{T_G} \sum_{(s,t,X) \in T_G} \frac{P(S \Rightarrow^* w_1^{s-1} X w_{t+1}^n \Rightarrow^* w_1^n)}{P(w_1^n)}

</Paragraph> <Paragraph position="3"> At this point, it is useful to introduce the Inside and Outside probabilities, due to Baker (1979), and explained by Lari and Young (1990). The Inside probability is defined as e(s,t,X) = P(X \Rightarrow^* w_s^t), and the Outside probability is f(s,t,X) = P(S \Rightarrow^* w_1^{s-1} X w_{t+1}^n). Note that while Baker and others have used these probabilities for inducing grammars, here they are used only for parsing.</Paragraph>
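<Paragraph position="4"> To make the Inside and Outside quantities concrete, here is a minimal Python sketch of their computation for a toy PCFG in Chomsky normal form. The sketch is ours, not the paper's: the grammar, sentence, and function names are invented for illustration, and a real implementation would work in log space.

from collections import defaultdict

# Toy CNF PCFG, invented for this sketch: binary rules (X, Y, Z) -> prob
# meaning X -> Y Z, and lexical rules (X, word) -> prob meaning X -> word.
binary = {("S", "NP", "VP"): 1.0, ("NP", "Det", "N"): 1.0, ("VP", "V", "NP"): 1.0}
lexical = {("Det", "the"): 1.0, ("N", "dog"): 0.5, ("N", "cat"): 0.5, ("V", "saw"): 1.0}

def inside(words):
    # e[(s, t, X)] = P(X =>* w_s ... w_t); spans are 1-based and inclusive.
    n = len(words)
    e = defaultdict(float)
    for s in range(1, n + 1):
        for (X, w), p in lexical.items():
            if w == words[s - 1]:
                e[(s, s, X)] += p
    for length in range(2, n + 1):
        for s in range(1, n - length + 2):
            t = s + length - 1
            for (X, Y, Z), p in binary.items():
                for r in range(s, t):
                    e[(s, t, X)] += p * e[(s, r, Y)] * e[(r + 1, t, Z)]
    return e

def outside(words, e):
    # f[(s, t, X)] = P(S =>* w_1^{s-1} X w_{t+1}^n), computed top-down.
    n = len(words)
    f = defaultdict(float)
    f[(1, n, "S")] = 1.0
    for length in range(n, 1, -1):  # parents before children
        for s in range(1, n - length + 2):
            t = s + length - 1
            for (X, Y, Z), p in binary.items():
                for r in range(s, t):
                    # Y covers (s, r) and Z covers (r+1, t) under parent X.
                    f[(s, r, Y)] += p * f[(s, t, X)] * e[(r + 1, t, Z)]
                    f[(r + 1, t, Z)] += p * f[(s, t, X)] * e[(s, r, Y)]
    return f

words = "the dog saw the cat".split()
e = inside(words)
f = outside(words, e)
print(e[(1, len(words), "S")])  # P(w_1 ... w_n), the sentence probability
</Paragraph>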
<Paragraph position="5"> Let us define a new function, g(s, t, X):

    g(s,t,X) = P((s,t,X) \in T_C \mid w_1^n) = \frac{e(s,t,X) \cdot f(s,t,X)}{e(1,n,S)}

(note that e(1,n,S) = P(w_1^n)). Now, the definition of a Labelled Recall Parse can be rewritten as

    T_G = \arg\max_{T_G} \sum_{(s,t,X) \in T_G} g(s,t,X)

</Paragraph> <Paragraph position="6"> Given the matrix g(s, t, X), it is a simple matter of dynamic programming to determine the parse that maximizes the Labelled Recall criterion. Define

    MAXC(s,t) = \max_X g(s,t,X) + \max_{s \le r < t} \bigl[ MAXC(s,r) + MAXC(r+1,t) \bigr]

with MAXC(s,s) = \max_X g(s,s,X). It is clear that MAXC(1,n) contains the score of the best parse according to the Labelled Recall criterion. This equation can be converted into the dynamic programming algorithm shown in Figure 1.</Paragraph> <Paragraph position="7"> For a grammar with r rules and k nonterminals, the run time of this algorithm is O(n^3 + kn^2), since there are two layers of outer loops, each with run time at most n, and an inner loop over nonterminals and n. However, this is dominated by the computation of the Inside and Outside probabilities, which takes time O(rn^3).</Paragraph> <Paragraph position="8"> By modifying the algorithm slightly to record the actual split used at each node, we can recover the best parse. The entry maxc[1, n] contains the expected number of correct constituents, given the model.</Paragraph> </Section> </Section> <Section position="5" start_page="179" end_page="180" type="metho"> <SectionTitle> 4 Bracketed Recall Parsing </SectionTitle> <Paragraph position="0"> The Labelled Recall Algorithm maximizes the expected number of correct labelled constituents. However, many commonly used evaluation metrics, such as the Consistent Brackets Recall Rate, ignore labels. Similarly, some grammar induction algorithms, such as those used by Pereira and Schabes (1992), do not produce meaningful labels. In particular, the Pereira and Schabes method induces a grammar from the brackets in the treebank, ignoring the labels. While the induced grammar has labels, they are not related to those in the treebank. Thus, although the Labelled Recall Algorithm could be used in these domains, perhaps maximizing a criterion that is more closely tied to the domain will produce better results. Ideally, we would maximize the Consistent Brackets Recall Rate directly. However, since it is time-consuming to deal with Consistent Brackets, we instead use the closely related Bracketed Recall Rate.</Paragraph> <Paragraph position="1"> For the Bracketed Recall Algorithm, we find the parse that maximizes the expected Bracketed Recall Rate, B/N_C. (Remember that B is the number of brackets that are correct, and N_C is the number of constituents in the correct parse.)

    T_G = \arg\max_{T_G} E(B / N_C)

Following a derivation similar to that used for the Labelled Recall Algorithm, we can rewrite this as

    T_G = \arg\max_{T_G} \sum_{(s,t) \in T_G} \sum_X g(s,t,X)

</Paragraph> <Paragraph position="2"> The algorithm for Bracketed Recall parsing is extremely similar to that for Labelled Recall parsing. The only required change is that we sum over the symbols X to calculate max_g, rather than maximizing over them; both variants appear in the sketch below.</Paragraph>
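<Paragraph position="3"> The following Python sketch is ours rather than the paper's Figure 1 (the function name, data layout, and tie-breaking are assumptions). Given the table g(s,t,X), which can be filled from the Inside/Outside sketch above as g[(s,t,X)] = e[(s,t,X)] * f[(s,t,X)] / e[(1,n,"S")], it runs the MAXC recurrence, records the split used at each node, and recovers the best parse. Passing labelled=False sums over labels instead of maximizing, yielding the Bracketed Recall variant.

def recall_parse(g, nonterminals, n, labelled=True):
    # g maps (s, t, X) to g(s, t, X); spans are 1-based and inclusive.
    maxc = {}  # maxc[(s, t)]: best expected-recall score for span (s, t)
    best = {}  # best[(s, t)]: (label, split point) achieving maxc[(s, t)]
    for length in range(1, n + 1):
        for s in range(1, n - length + 2):
            t = s + length - 1
            scores = [(g.get((s, t, X), 0.0), X) for X in nonterminals]
            if labelled:
                max_g, label = max(scores)                      # Labelled Recall
            else:
                max_g, label = sum(p for p, _ in scores), None  # Bracketed Recall
            if s == t:
                maxc[(s, t)], best[(s, t)] = max_g, (label, None)
            else:
                split, r = max((maxc[(s, r)] + maxc[(r + 1, t)], r)
                               for r in range(s, t))
                maxc[(s, t)], best[(s, t)] = max_g + split, (label, r)

    def tree(s, t):  # follow the recorded splits to rebuild the parse
        label, r = best[(s, t)]
        if r is None:
            return (label, s, t)
        return (label, s, t, tree(s, r), tree(r + 1, t))

    # maxc[(1, n)] is the expected number of correct constituents.
    return maxc[(1, n)], tree(1, n)
</Paragraph> </Section>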
<Section position="6" start_page="180" end_page="181" type="metho"> <SectionTitle> 5 Experimental Results </SectionTitle> <Paragraph position="0"> We describe two experiments for testing these algorithms. The first uses a grammar without meaningful nonterminal symbols and compares the Bracketed Recall Algorithm to the traditional Labelled Tree (Viterbi) Algorithm. The second uses a grammar with meaningful nonterminal symbols and performs a three-way comparison between the Labelled Recall, Bracketed Recall, and Labelled Tree Algorithms. These experiments show that use of an algorithm matched appropriately to the evaluation criterion can lead to as much as a 10% reduction in error rate.</Paragraph> <Paragraph position="1"> In both experiments the grammars could not parse some sentences (0.5% and 9%, respectively). The unparsable sentences were assigned a right-branching structure with their rightmost element attached high. Since all three algorithms fail on the same sentences, all algorithms were affected equally.</Paragraph> <Section position="1" start_page="180" end_page="180" type="sub_section"> <SectionTitle> 5.1 Experiment with Grammar Induced by Pereira and Schabes Method </SectionTitle> <Paragraph position="0"> The experiment of Pereira and Schabes (1992) was duplicated. In that experiment, a grammar was trained from a bracketed form of the TI section of the ATIS corpus using a modified form of the Inside-Outside Algorithm. Pereira and Schabes then used the Labelled Tree Algorithm to select the best parse for sentences in held-out test data. The experiment was repeated here, except that both the Labelled Tree and Bracketed Recall Algorithms were run for each sentence. In contrast to previous research, we repeated the experiment ten times, with a different training set, test set, and initial conditions each time.</Paragraph> <Paragraph position="1"> Table 1 (Labelled Tree versus Bracketed Recall for the Pereira and Schabes grammar) shows the results of running this experiment, giving the minimum, maximum, mean, and standard deviation for three criteria: Consistent Brackets Recall, Consistent Brackets Tree, and Bracketed Recall. We also display these statistics for the paired differences between the algorithms.</Paragraph> <Paragraph position="2"> The only statistically significant difference is that for the Consistent Brackets Recall Rate, which is significant at the 2% level (paired t-test). Thus, use of the Bracketed Recall Algorithm leads to a 10% reduction in error rate.</Paragraph> <Paragraph position="3"> In addition, the performance of the Bracketed Recall Algorithm was qualitatively more appealing. Figure 2 shows typical results. Notice that the Bracketed Recall Algorithm's Consistent Brackets Rate (versus iteration) is smoother and more nearly monotonic than the Labelled Tree Algorithm's. The Bracketed Recall Algorithm also gets off to a much faster start, and is generally (although not always) above the Labelled Tree level. For the Labelled Tree Rate, the two are usually very comparable.</Paragraph> </Section>
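<Paragraph position="2"> The 2% figure above comes from a paired t-test on the per-run differences between the two algorithms. As a minimal illustration of that computation (the ten rates below are invented placeholders, not the paper's data):

from scipy import stats

# Per-run Consistent Brackets Recall rates for the two algorithms.
# These numbers are placeholders for illustration only.
labelled_tree = [0.88, 0.85, 0.90, 0.87, 0.86, 0.89, 0.84, 0.88, 0.87, 0.86]
bracketed_recall = [0.89, 0.87, 0.91, 0.88, 0.88, 0.90, 0.86, 0.89, 0.88, 0.88]

# Paired test: each run contributes one difference.
t_stat, p_value = stats.ttest_rel(bracketed_recall, labelled_tree)
print(f"t = {t_stat:.3f}, two-sided p = {p_value:.4f}")
</Paragraph>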
<Section position="2" start_page="180" end_page="181" type="sub_section"> <SectionTitle> 5.2 Experiment with Grammar Induced by Counting </SectionTitle> <Paragraph position="0"> The replication of the Pereira and Schabes experiment was useful for testing the Bracketed Recall Algorithm. However, since that experiment induces a grammar with nonterminals not comparable to those in the training data, a different experiment is needed to evaluate the Labelled Recall Algorithm: one in which the nonterminals in the induced grammar are the same as the nonterminals in the test set.</Paragraph> <Paragraph position="1"> For this experiment, a very simple grammar was induced by counting, using a portion of the Penn Treebank, version 0.5. In particular, the trees were first made binary branching by removing epsilon productions, collapsing singleton productions, and converting n-ary productions (n > 2) as in Figure 3; a code sketch of such a conversion appears at the end of this section. The resulting trees were treated as the "correct" trees in the evaluation. Only trees with forty or fewer symbols were used in this experiment.</Paragraph> <Paragraph position="2"> (Figure 3: Conversion of productions to binary branching.)</Paragraph> <Paragraph position="3"> A grammar was then induced in a straightforward way from these trees, simply by giving one count for each observed production. No smoothing was done. There were 1805 sentences and 38610 nonterminals in the test data.</Paragraph>
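<Paragraph position="4"> The following Python sketch illustrates the preprocessing and induction just described. It is ours, not the paper's: the right-branching conversion with primed auxiliary labels is one common scheme and may differ in detail from Figure 3, and all names are invented.

from collections import Counter

def binarize(tree):
    # Trees are (label, child, child, ...); leaves are strings.
    # Fold extra children under primed auxiliary labels, right to left,
    # so that n-ary productions (n > 2) become binary ones.
    if isinstance(tree, str):
        return tree
    label, children = tree[0], [binarize(c) for c in tree[1:]]
    while len(children) > 2:
        children = children[:-2] + [(label + "'", children[-2], children[-1])]
    return (label,) + tuple(children)

def count_productions(tree, counts):
    # One count per observed production, as in the experiment above.
    if isinstance(tree, str):
        return
    rhs = tuple(c if isinstance(c, str) else c[0] for c in tree[1:])
    counts[(tree[0], rhs)] += 1
    for c in tree[1:]:
        count_productions(c, counts)

counts = Counter()
t = binarize(("NP", ("Det", "the"), ("Adj", "big"), ("Adj", "red"), ("N", "dog")))
count_productions(t, counts)

# Relative-frequency rule probabilities; no smoothing is done.
totals = Counter()
for (lhs, rhs), c in counts.items():
    totals[lhs] += c
probs = {(lhs, rhs): c / totals[lhs] for (lhs, rhs), c in counts.items()}
</Paragraph> </Section> </Section> </Paper>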