<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1146">
<Title>Characterising Measures of Lexical Distributional Similarity</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle>2 Distributional similarity measures</SectionTitle>
<Paragraph position="0"> In this section, we introduce some basic concepts and then discuss the ten distributional similarity measures used in this study.</Paragraph>
<Paragraph position="1"> The co-occurrence types of a target word are the contexts, c, in which it occurs, and these have associated frequencies which may be used to form probability estimates. In our work, the co-occurrence types are always grammatical dependency relations. For example, in Sections 3 to 5, similarity between nouns is derived from their co-occurrences with verbs in the direct-object position. In Section 6, similarity between verbs is derived from their subjects and objects.</Paragraph>
<Paragraph position="2"> The k nearest neighbours of a target word w are the k words for which similarity with w is greatest. Our use of the term similarity measure encompasses measures which should strictly be referred to as distance, divergence or dissimilarity measures. An increase in distance correlates with a decrease in similarity. However, either type of measure can be used to find the k nearest neighbours of a target word.</Paragraph>
<Paragraph position="3"> Table 1 lists ten distributional similarity measures. The cosine measure (Salton and McGill, 1983) returns the cosine of the angle between two vectors.</Paragraph>
<Paragraph position="4"> The Jensen-Shannon (JS) divergence measure (Rao, 1983) and the α-skew divergence measure (Lee, 1999) are based on the Kullback-Leibler (KL) divergence measure. The KL divergence, or relative entropy, D(p||q), between two probability distribution functions p and q is defined (Cover and Thomas, 1991) as the "inefficiency of assuming that the distribution is q when the true distribution is p": D(p||q) = Σ_c p(c) log (p(c)/q(c)). However, D(p||q) = ∞ if there are any contexts c for which p(c) > 0 and q(c) = 0. Thus, this measure cannot be used directly on maximum likelihood estimate (MLE) probabilities.</Paragraph>
<Paragraph position="5"> One possible solution is to use the JS divergence measure, which measures the cost of using the average distribution in place of each individual distribution. Another is the α-skew divergence measure, which uses the p distribution to smooth the q distribution. The value of the parameter α controls the extent to which the KL divergence is approximated. We use α = 0.99 since this provides a close approximation to the KL divergence and has been shown to provide good results in previous research (Lee, 2001).</Paragraph>
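<Paragraph> A minimal sketch (ours, not the paper's) of how these three divergences can be computed in Python; the function names are hypothetical, and inputs are dicts mapping contexts to MLE probabilities:

import math

def kl(p, q):
    # KL divergence D(p||q); infinite if q(c) = 0 where p(c) > 0,
    # which is why it is not applied directly to MLE distributions.
    return sum(pc * math.log(pc / q[c]) for c, pc in p.items() if pc > 0)

def js(p, q):
    # JS divergence: the cost of using the average distribution m
    # in place of each individual distribution. m covers the support
    # of both p and q, so both KL terms stay finite.
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in set(p) | set(q)}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def alpha_skew(p, q, alpha=0.99):
    # alpha-skew divergence: smooth q with p, then take D(p||smoothed).
    # With alpha = 0.99 it closely approximates D(p||q) while remaining
    # finite on contexts where q assigns zero probability.
    s = {c: alpha * q.get(c, 0.0) + (1 - alpha) * p.get(c, 0.0)
         for c in set(p) | set(q)}
    return kl(p, s)
</Paragraph>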
<Paragraph position="6"> The confusion probability (Sugawara et al., 1985) is an estimate of the probability that one word can be substituted for another. Words w1 and w2 are completely confusable if we are equally likely to see w2 in a given context as we are to see w1 in that context.</Paragraph>
<Paragraph position="7"> Jaccard's coefficient (Salton and McGill, 1983) calculates the proportion of features belonging to either word that are shared by both words. In the simplest case, the features of a word are defined as the contexts in which it has been seen to occur. sim_ja+mi is a variant (Lin, 1998) in which the features of a word are those contexts for which the pointwise mutual information (MI) between the word and the context is positive, where MI can be calculated using I(c, w) = log (P(c|w) / P(c)). The related Dice coefficient (Frakes and Baeza-Yates, 1992) is omitted here since it has been shown (van Rijsbergen, 1979) that the Dice and Jaccard coefficients are monotonic in each other.</Paragraph>
<Paragraph position="8"> Lin's measure (Lin, 1998) is based on his information-theoretic similarity theorem, which states: "the similarity between A and B is measured by the ratio between the amount of information needed to state the commonality of A and B and the information needed to fully describe what A and B are." The final three measures are settings in the additive MI-based Co-occurrence Retrieval Model (AMCRM) (Weeds and Weir, 2003; Weeds, 2003). We can measure the precision and the recall of a potential neighbour's retrieval of the co-occurrences of the target word, where the sets of required and retrieved co-occurrences (F(w1) and F(w2) respectively) are those co-occurrences for which MI is positive.</Paragraph>
<Paragraph position="9"> Neighbours with both high precision and high recall retrieval can be obtained by computing the harmonic mean of the two (see the sketch below).</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle>Measure Function</SectionTitle>
<Paragraph position="0">[Table 1: each measure and its defining function; the formula column did not survive extraction.]</Paragraph>
</Section>
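<Paragraph> A sketch of the final three measures, under our reading of the additive MI-based model; the function names, the dictionary layout, and the MI weighting are our assumptions rather than the paper's exact formulas, which appeared in Table 1:

import math

def positive_mi_features(count_wc, count_c, count_w, total):
    # F(w): contexts whose pointwise MI with the word is positive,
    # I(c, w) = log( P(c|w) / P(c) ). These are also the features
    # used by sim_ja+mi. count_wc maps context -> co-occurrence
    # count with w; count_c and count_w are marginal counts.
    feats = {}
    for c, n in count_wc.items():
        mi = math.log((n / count_w) / (count_c[c] / total))
        if mi > 0:
            feats[c] = mi
    return feats

def precision(f1, f2):
    # MI-weighted proportion of the neighbour's retrieved features
    # (f2) that the target word (f1) requires.
    shared = set(f1) & set(f2)
    return sum(f2[c] for c in shared) / sum(f2.values())

def recall(f1, f2):
    # MI-weighted proportion of the target's required features
    # that the neighbour retrieves; precision with roles swapped.
    return precision(f2, f1)

def sim_hm(f1, f2):
    # Harmonic mean, rewarding neighbours whose retrieval has both
    # high precision and high recall.
    p, r = precision(f1, f2), recall(f1, f2)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)
</Paragraph>
</Section>
</Paper>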