<?xml version="1.0" standalone="yes"?>
<Paper uid="W96-0112">
  <Title>A Probabilistic Disambiguation Method Based on Psycholinguistic Principles</Title>
  <Section position="8" start_page="148" end_page="151" type="evalu">
    <SectionTitle>
6 Experimental Results
</SectionTitle>
    <Paragraph position="0"> We have conducted experiments to test the effectiveness of our proposed method. This section describes the results. In the experiments, we considered only the resolution of pp-attachment ambiguities and coordinate structure ambiguities. These two kinds of ambiguities are typical, and other ambiguities can be resolved in the same way (Hobbs and Bear, 1990).</Paragraph>
    <Paragraph position="1"> We first defined 12 CFG rules as our grammar, to be used by a parser that calculates a preference for each partial interpretation and always retains the N most preferable partial interpretations. We have not yet actually constructed such a parser, however, and instead use a parser called 'SAX,' previously developed by Matsumoto &amp; Sugimura (Matsumoto and Sugimura, 1986), which calculates a preference for each interpretation after it obtains each interpretation.</Paragraph>
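The pruning step described above (always retaining the N most preferable partial interpretations) can be sketched in a few lines of Python. The names `prune_to_top_n` and `preference` are hypothetical, not from the paper; this is a minimal illustration of beam-style retention, assuming each parser state can be scored by a numeric preference function.

```python
import heapq

def prune_to_top_n(partial_interps, n, preference):
    # Keep only the N highest-scoring partial interpretations;
    # heapq.nlargest returns them sorted from most to least preferable.
    return heapq.nlargest(n, partial_interps, key=preference)

# Usage: beam = prune_to_top_n(beam, 10, lambda state: state_score(state))
```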
    <Paragraph position="2"> We then trained the parameters of probability models. We extracted 181,250 case frames from the WSJ (Wall Street Journal) bracketed corpus of the Penn Tree Bank (Marcus et al., 1993).</Paragraph>
    <Paragraph position="3"> We used these data to estimate three-word probabilities and two-word probabilities. Furthermore, we extracted 963 sentences from the WSJ tagged corpus of the Penn Tree Bank. We used SAX to analyze the sentences and selected the correct syntactic trees by hand. We then employed the Maximum Likelihood Estimator to estimate length probabilities using the selected syntactic trees: e.g., if CFG rule NP → NP PP is applied x times, and among the attachments obtained by applying this rule, x_i of them have lengths 2 and 3, then the length probability P(2, 3 | NP → NP PP) is estimated as x_i/x. It is known in statistics that the number of samples required for accurate estimation of a probabilistic model is roughly proportional to the number of parameters in the target model, and thus the data used for training length probabilities were nearly sufficient. Figure 3 plots the estimated length probabilities versus the lengths for two CFG rules. The result indicates that in the training data more attachments are made to nearby phrases than to distant ones. Moreover, the length probabilities for CFG rule VP → VP PP and those for CFG rule NP → NP PP show different distribution patterns, suggesting that syntactic preference is a function of the CFG rule.</Paragraph>
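The maximum-likelihood estimate described above is a ratio of counts. A minimal Python sketch, with a hypothetical input format (a list of (rule, lengths) observations read off the hand-corrected trees; the function name is not from the paper):

```python
from collections import Counter

def estimate_length_probabilities(observations):
    # observations: list of (rule, lengths) pairs, e.g.
    # ("NP -> NP PP", (2, 3)) meaning the rule attached phrases of
    # lengths 2 and 3. MLE: P(lengths | rule) = count(rule, lengths) / count(rule).
    rule_counts = Counter(rule for rule, _ in observations)
    joint_counts = Counter(observations)
    return {(rule, lengths): c / rule_counts[rule]
            for (rule, lengths), c in joint_counts.items()}
```

With the example from the text, if NP → NP PP is applied x = 4 times and x_i = 3 of the attachments have lengths 2 and 3, the estimate is 3/4.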
    <Paragraph position="4"> We then extracted 249 sentences from a part of the tagged WSJ corpus which was not used in training as our test data and analyzed the sentences. (Restricting the number of retained interpretations is necessary, as the number of ambiguities increases drastically with the length of an input sentence (Church and Patil, 1982).)</Paragraph>
    <Paragraph position="5"> When analyzing a sentence, we rank the obtained interpretations as follows: interpretation I1 is preferred over interpretation I2 if Plex3(I1) &gt; Plex3(I2); when the Plex3 values are equal, if Plex2(I1) &gt; Plex2(I2); and when those are also equal, if Psyn(I1) &gt; Psyn(I2), where I1 and I2 denote any two interpretations. Plex3() denotes the lexical likelihood value of an interpretation calculated as the geometric mean of three-word probabilities, Plex2() the lexical likelihood value of an interpretation calculated as the geometric mean of two-word probabilities, and Psyn() the syntactic likelihood value of an interpretation. The average number of interpretations obtained in the analysis of a sentence was 2.4.</Paragraph>
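The back-off ranking (Plex3 first, then Plex2, then Psyn) maps naturally onto lexicographic tuple comparison. A minimal Python sketch, assuming hypothetical scorer callables for the three likelihoods:

```python
from math import prod

def geometric_mean(probs):
    # Geometric mean of a sequence of probabilities; 0 if any factor is 0,
    # which is the case that triggers the back-off in the paper.
    return prod(probs) ** (1.0 / len(probs))

def rank_interpretations(interps, p_lex3, p_lex2, p_syn):
    # Back-off ranking: compare by Plex3; on a tie (typically both 0),
    # fall back to Plex2, then to Psyn. Python's tuple comparison is
    # lexicographic, which gives exactly this behavior.
    return sorted(interps,
                  key=lambda i: (p_lex3(i), p_lex2(i), p_syn(i)),
                  reverse=True)
```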
    <Paragraph position="8"> The number 1 accuracy obtained was 89.2% (Table 1 represents this result as 'Lex3+Lex2+Syn'), where the number n accuracy is defined as the fraction of the test sentences whose preferred interpretation is successfully ranked among the first n candidates. We feel that this result is very encouraging. Table 2 shows the breakdown of the result, in which 'Lex3' stands for the proportion determined by using lexical likelihood Plex3, 'Lex2' that by using lexical likelihood Plex2, and 'Syn' that by using syntactic likelihood Psyn. The accuracies of 'Lex3,' 'Lex2,' and 'Syn' were 95.7%, 87.0%, and 66.7%, respectively. Furthermore, 'Lex3,' 'Lex2,' and 'Syn' formed 47.0%, 43.4%, and 9.6% of the disambiguation results, respectively.</Paragraph>
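The number n accuracy defined above can be computed directly from ranked candidate lists. A small sketch with hypothetical names (`ranked_candidates` is one ranked list per test sentence, `gold` the correct interpretation for each):

```python
def number_n_accuracy(ranked_candidates, gold, n):
    # Fraction of test sentences whose correct interpretation appears
    # among the first n ranked candidates.
    hits = sum(1 for cands, g in zip(ranked_candidates, gold)
               if g in cands[:n])
    return hits / len(gold)
```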
    <Paragraph position="9"> We further examined the types of mistakes made by our method. First, there were some mistakes by 'Syn.' For example, in Rain washes the fertilizers off the land, (25) there are two interpretations. The lexical likelihood values Plex3 of the two interpretations were calculated as 0, and the lexical likelihood values Plex2 of the two interpretations were calculated as 0 as well. The interpretations were therefore ranked by the syntactic likelihood Psyn, and the interpretation attaching the 'off' phrase to 'fertilizers' was mistakenly preferred. We also found some mistakes by 'Lex2.' For example, in The parents reclaimed the child under the circumstances, (26) there are two interpretations. The lexical likelihood values Plex3 of the two interpretations were calculated as 0. The lexical likelihood value Plex2 of the interpretation attaching the 'under' phrase to 'child' was higher than that of attaching it to 'reclaimed,' as there were many expressions like 'a child under five' observed in the training data, and thus the former interpretation was mistakenly preferred. It is obvious that these kinds of mistakes could be avoided if more data were available. We conclude that the most effective way of improving disambiguation results is to increase the data for training lexical preference.</Paragraph>
    <Paragraph position="10"> We further checked the disambiguation decisions made by 'Syn' when 'Lex3' and 'Lex2' fail to work, and found that all of the prepositional phrases in these sentences were attached to nearby phrases by 'Syn,' indicating that using the syntactic likelihood can help to realize the functioning of RAP. One may argue that we could obtain the same number 1 accuracy if we were to employ a deterministic approach in implementing RAP. As we pointed out earlier, however, if we are to obtain the N most preferred interpretations, we need to use the syntactic likelihood. To verify that the syntactic likelihood is indeed useful, we conducted the following additional experiment. We ranked the interpretations of each of the 249 test sentences using only the syntactic likelihood. We also selected the interpretations with phrases always attached to nearby phrases as the most preferred ones, and randomly ordered the remaining interpretations as the nth most preferred ones. We evaluated both results on the basis of the number n accuracy. Figure 4 shows the top 5 accuracies of the stochastic approach and the deterministic approach. The results indicate that the former outperforms the latter. (The number 2 accuracy for both methods increases drastically, as many test sentences have only two interpretations.) The improvement is not significant, however. We expect the effect of the use of the syntactic likelihood to become more significant when longer sentences are used in future analyses.</Paragraph>
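The deterministic baseline in the comparison above can be sketched as a hard rule: nearest-attachment interpretations first, the rest in random order. All names here are hypothetical, and `attaches_nearby` is an assumed predicate over interpretations; this is only an illustration of the baseline, not the paper's implementation.

```python
import random

def deterministic_ranking(interps, attaches_nearby, rng=None):
    # Deterministic RAP baseline: interpretations whose phrases all attach
    # to the nearest phrase are ranked first; the remaining interpretations
    # are ordered randomly (seeded here for reproducibility).
    rng = rng or random.Random(0)
    nearest = [i for i in interps if attaches_nearby(i)]
    rest = [i for i in interps if not attaches_nearby(i)]
    rng.shuffle(rest)
    return nearest + rest
```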
    <Paragraph position="11"> As an alternative to the length probability model, we used a PCFG to calculate syntactic preference. We employed the Maximum Likelihood Estimator to estimate the parameters of the PCFG (we did not use the so-called 'inside-outside algorithm' (Jelinek et al., 1990; Lari and Young, 1990)), making use of the same training data as those used for the length probability model. Table 1 represents this result as 'Lex3+Lex2+PCFG.' Our experimental results indicate that our method of using a length probability model outperforms that of using a PCFG.</Paragraph>
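Maximum-likelihood estimation of PCFG parameters from fully bracketed trees (the supervised setting used here, as opposed to the inside-outside algorithm) is again a ratio of counts: P(A → α) = count(A → α) / count(A). A minimal sketch with a hypothetical input format:

```python
from collections import Counter

def estimate_pcfg(rule_applications):
    # rule_applications: list of (lhs, rhs) pairs read off the
    # hand-corrected trees, e.g. ("NP", ("NP", "PP")).
    # MLE: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs).
    lhs_counts = Counter(lhs for lhs, _ in rule_applications)
    rule_counts = Counter(rule_applications)
    return {rule: c / lhs_counts[rule[0]]
            for rule, c in rule_counts.items()}
```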
    <Paragraph position="12"> As an alternative to the back-off method, we used the product of lexical likelihood values and syntactic likelihood values to rank interpretations. When using lexical likelihood, we use the lexical likelihood value calculated from three-word probabilities, provided that it is not 0; otherwise we use the lexical likelihood value calculated from two-word probabilities. Table 1 represents this result as 'Lex3(Lex2)×Syn.' When the preference values of all of the interpretations obtained are calculated as 0, we rank the interpretations at random. Our results indicate that it is preferable to employ the back-off method.</Paragraph>
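The product-based alternative ('Lex3(Lex2)×Syn') differs from the back-off in one line: the lexical and syntactic likelihoods are multiplied rather than compared lexicographically. A hedged sketch, with the same hypothetical scorer callables as before:

```python
def product_score(interp, p_lex3, p_lex2, p_syn):
    # 'Lex3(Lex2)xSyn': use the three-word lexical likelihood unless it
    # is 0, otherwise fall back to the two-word one; then multiply by
    # the syntactic likelihood. Interpretations are ranked by this score.
    lex = p_lex3(interp) or p_lex2(interp)
    return lex * p_syn(interp)
```

Note that when both lexical likelihoods are 0 the score is 0 regardless of Psyn, which is why all-zero cases must be ranked at random; under the back-off method, by contrast, Psyn still discriminates in that case.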
  </Section>
</Paper>