<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1147">
  <Title>Fast Computation of Lexical Affinity Models</Title>
  <Section position="5" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> We use the empirical and the parametric affinity distributions in two applications. In both, the independence model is used as a baseline.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Log-Likelihood Ratio
</SectionTitle>
      <Paragraph position="0"> The co-occurrence distributions assign a probability to each pair at every distance. We can compare point estimates from the distributions, and assess how unlikely they are, by means of a log-likelihood ratio test: -2 log λ = 2 (log L(θ1) - log L(θ2)), where θ1 and θ2 are the parameters for f_k(a, b) under the empirical-distribution and independence models, respectively. It is also possible to use the cumulative F_k instead of f_k. Figure 3 shows log-likelihood ratios using the asymmetric empirical distribution and Figure 4 depicts log-likelihood ratios using the symmetric distribution.</Paragraph>
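The ratio test above can be sketched in a few lines. This is an illustrative implementation, not the authors' code: it assumes the empirical model is a simple binomial estimate from co-occurrence counts, and that the independence model supplies the null probability p0 (e.g. P(a)·P(b)).

```python
import math

def binomial_llr(c, n, p0):
    """Log-likelihood ratio statistic for observing c co-occurrences in n
    opportunities against a null probability p0 (independence model).
    Returns -2 log(lambda); large values mean 'unlikely under p0'."""
    p1 = c / n                      # empirical estimate

    def ll(p):
        # clamp to avoid log(0) at the boundaries
        p = min(max(p, 1e-12), 1 - 1e-12)
        return c * math.log(p) + (n - c) * math.log(1 - p)

    return 2.0 * (ll(p1) - ll(p0))
```

When the empirical estimate agrees with the independence model the statistic is zero; it grows as the observed co-occurrence rate diverges from the null.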
      <Paragraph position="1"> A set of fill-in-the-blank questions taken from GRE general tests was answered using the log-likelihood ratio. For each question, a sentence with one or two blanks was given along with a set of options O, as shown in Figure 5.</Paragraph>
      <Paragraph position="2"> The correct alternative maximizes the likelihood of the complete sentence s: L(s) = ∏_{(a,b) ∈ s} F_{d(a,b)}(a, b), where d(a, b) is the distance between a and b in the sentence. Since only the blanks change from one alternative to another, the remaining pairs are treated as constants and can be ignored for the purpose of ranking: it suffices to compute ∏_{b ∈ s} F_{d(o,b)}(o, b) for every o ∈ O.</Paragraph>
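The ranking step can be sketched as follows. The code is a hypothetical illustration: `pair_prob(a, b, d)` stands in for the cumulative affinity probability F_d(a, b), whose actual computation is described elsewhere in the paper.

```python
import math

def score_option(sentence, blanks, option_words, pair_prob, cutoff=7):
    """Score one alternative: fill the blanks, then sum log-probabilities
    of pairs involving a filled word, up to `cutoff` distance.
    `pair_prob(a, b, d)` is a stand-in for F_d(a, b)."""
    words = list(sentence)
    for pos, w in zip(blanks, option_words):
        words[pos] = w
    total = 0.0
    for pos, w in zip(blanks, option_words):
        for j, other in enumerate(words):
            d = abs(j - pos)
            if j != pos and d <= cutoff:
                total += math.log(pair_prob(w, other, d))
    return total

def best_option(sentence, blanks, options, pair_prob, cutoff=7):
    # only pairs touching a blank differ between options, so scoring
    # those pairs alone is enough for ranking
    return max(options,
               key=lambda o: score_option(sentence, blanks, o,
                                          pair_prob, cutoff))
```

Working in log space keeps the product numerically stable for longer sentences.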
      <Paragraph position="3"> It is not necessary to compute the likelihood for all pairs in the whole sentence; instead, a cut-off on the maximum distance can be specified. If the cut-off is two, the resulting behavior is similar to a word bigram language model (with different estimates). Increasing the cut-off has two immediate implications. First, it incorporates more of the word's surroundings as context. Second, it has an indirect smoothing effect, since we use cumulative probabilities to compute the likelihood. As with any distance model, this approach has the drawback of allowing constructions that are not syntactically valid.</Paragraph>
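The effect of the cut-off can be made concrete with a small helper that enumerates the pairs a given window admits; this is a sketch of the windowing idea, not code from the paper.

```python
def pairs_within_cutoff(tokens, cutoff):
    """Enumerate (left, right, distance) pairs with distance <= cutoff.
    Distance is the difference of token positions, so adjacent words
    have distance 1; a small cutoff approximates an n-gram window,
    while a larger one pulls in more surrounding context."""
    for i, a in enumerate(tokens):
        for j in range(i + 1, min(i + cutoff, len(tokens) - 1) + 1):
            yield a, tokens[j], j - i
```

With `cutoff=1` only adjacent pairs survive (bigram-like behavior); raising the cut-off adds longer-range pairs at the cost of admitting syntactically invalid combinations.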
      <Paragraph position="4"> The tests used are GRE practice tests extracted from the websites gre.org (9 questions), PrincetonReview.com (11 questions), Syvum.com (15 questions) and Microedu.com (28 questions). Table 2 shows the results for a cut-off of seven words. Every question has five options, so selecting an answer at random gives an expected score of 20%. Our framework answers 55% of the questions correctly.</Paragraph>
      <Paragraph position="5"> The science of seismology has grown just ____ enough so that the first overly bold theories have been ____.</Paragraph>
      <Paragraph position="6"> a) magnetic. . . accepted b) predictive . . . protected c) fledgling. . . refuted d) exploratory . . . recalled e) tentative. . . analyzed</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Skew
</SectionTitle>
      <Paragraph position="0"> Our second evaluation uses the parametric affinity model. We use the skew of the fitted model to evaluate the degree of affinity between two terms, validating our hypothesis that a greater positive skew corresponds to greater affinity. A list of pairs from word association norms and a list of randomly picked pairs are used. Word association is a common test in psychology (Nelson et al., 2000): a person responds to a stimulus word by giving an associated word in return. The set of words used in the test is called the "norms". Many word association norms are available in the psychology literature; we chose the Minnesota word association norms for our experiments (Jenkins, 1970). The set is composed of 100 stimulus words and the most frequent answer given by the 1,000 individuals who took the test. We also use 100 word pairs generated by randomly choosing words from a small dictionary.</Paragraph>
      <Paragraph position="1"> The skew of the gamma distribution is 2/√α, and Table 3 shows the normalized skew for the association and random pair sets. Note that the set of 100 random pairs includes some non-independent ones.</Paragraph>
      <Paragraph position="2"> The value of the skew was then tested on a set of TOEFL synonym questions. Each question in this test set is composed of one target word and a set of four alternatives. The TOEFL synonym test set has been used by several other researchers. It was first used in the context of Latent Semantic Analysis (LSA) (Landauer and Dumais, 1997), where 64.4% of the questions were answered correctly. Turney (Turney, 2001) and Terra et al. (Terra and Clarke, 2003) used different similarity measures to answer the questions, achieving 73.75% and 81.25% correct answers, respectively. Jarmasz (Jarmasz and Szpakowicz, 2003) used a thesaurus to compute the distance between the alternatives and the target word, answering 78.75% correctly. Turney (Turney et al., 2003) trained a system to answer the questions with an approach based on combined components, including modules for LSA, PMI, a thesaurus, and heuristics based on patterns of synonyms. This combined approach answered 97.50% of the questions correctly after being trained on 351 examples. With the exception of (Turney et al., 2003), the previous approaches were not exclusively designed for the task of answering TOEFL synonym questions.</Paragraph>
      <Paragraph position="3"> In order to estimate α and β, we compute the empirical distribution. This distribution provides us with the right-hand side of Equation 4, and we can solve for α numerically. The calculation of β is then straightforward. Using only skew, we were able to answer 78.75% of the TOEFL questions correctly.</Paragraph>
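The exact form of Equation 4 is lost in this extraction. As an illustration only, the sketch below assumes the standard gamma maximum-likelihood condition, log(α) − ψ(α) = log(mean(x)) − mean(log x), solves it for α by bisection, recovers the scale β from the mean, and computes the 2/√α skew:

```python
import math
import random

def digamma(x, h=1e-6):
    # numerical derivative of log-gamma; adequate for x well above h
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def fit_gamma(xs):
    """Fit shape alpha and scale beta of a gamma distribution.
    Assumes (not from the paper) the standard ML condition
        log(alpha) - digamma(alpha) = log(mean(x)) - mean(log x),
    solved by bisection; beta then follows from mean = alpha * beta."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.log(mean) - sum(math.log(x) for x in xs) / n
    lo, hi = 1e-3, 1e3
    for _ in range(200):
        mid = (lo + hi) / 2
        # log(a) - digamma(a) decreases in a, so move toward the root
        if math.log(mid) - digamma(mid) > s:
            lo = mid
        else:
            hi = mid
    alpha = (lo + hi) / 2
    return alpha, mean / alpha

def gamma_skew(alpha):
    # skewness of a gamma distribution depends only on the shape
    return 2.0 / math.sqrt(alpha)
```

The left-hand side of the ML condition is strictly decreasing in α, so bisection over a wide bracket converges reliably.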
      <Paragraph position="4"> Since skew represents the degree of asymmetry of the affinity model, this result suggests that skew and synonymy are strongly related.</Paragraph>
      <Paragraph position="5"> We also used the log-likelihood ratio to answer the TOEFL synonym questions. For each target-alternative pair, we calculated the log-likelihood for every distance in the range four to 750. The lower cut-off discards the affinity caused by phrases containing both the target and the alternative word; the upper cut-off of 750 represents the average document size in the collection. The cumulative log-likelihood was then used as the score for each alternative, and we considered the best alternative to be the one with the highest accumulated log-likelihood. With this approach, we answer 86.25% of the questions correctly, a substantial improvement over similar methods that do not require training data.</Paragraph>
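The accumulation-and-ranking step can be sketched as follows; `llr(a, b, d)` is a hypothetical stand-in for the per-distance log-likelihood score described above.

```python
def cumulative_llr(target, alt, llr, lo=4, hi=750):
    """Sum the log-likelihood score of a (target, alternative) pair over
    all distances in [lo, hi]. The lower cut-off skips phrase-level
    co-occurrence; the upper cut-off matches the average document size."""
    return sum(llr(target, alt, d) for d in range(lo, hi + 1))

def pick_synonym(target, alternatives, llr):
    # the alternative with the highest accumulated log-likelihood wins
    return max(alternatives, key=lambda a: cumulative_llr(target, a, llr))
```

Accumulating over a range of distances, rather than scoring a single distance, acts as the smoothing the section describes.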
    </Section>
  </Section>
</Paper>