<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1004">
  <Title>Measures of Distributional Similarity</Title>
  <Section position="6" start_page="30" end_page="30" type="evalu">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> In this paper, we empirically evaluated a number of distributional similarity measures, including the skew divergence, and analyzed their information sources. We observed that the ability of a similarity function f(q, r) to select useful nearest neighbors appears to be correlated with its focus on the intersection Vqr of the supports of q and r. This is of interest from a computational point of view because Vqr tends to be a relatively small subset of V, the set of all verbs.</Paragraph>
    <Paragraph position="1"> Furthermore, it suggests downplaying the role of negative information, which is encoded by verbs appearing with exactly one of the two nouns being compared, although the Jaccard coefficient does take this type of information into account.</Paragraph>
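    <Paragraph> The support regions discussed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the paper: q and r are assumed to be dicts mapping verbs to probabilities, the function names (support, support_regions, jaccard) and the toy data are hypothetical, and the Jaccard coefficient is taken here as the size of the intersection of the supports divided by the size of their union.

```python
def support(dist):
    """Return the set of verbs that have nonzero probability under dist."""
    return {v for v, p in dist.items() if p > 0}


def support_regions(q, r, vocabulary):
    """Split the vocabulary into three regions relative to q and r.

    Returns (both, exactly_one, neither):
      both        -- verbs in the supports of q and of r (the intersection)
      exactly_one -- verbs in the support of only one distribution
                     (the "negative information" region)
      neither     -- verbs that occur with neither noun
    """
    sq, sr = support(q), support(r)
    both = sq.intersection(sr)
    exactly_one = sq.symmetric_difference(sr)
    neither = set(vocabulary).difference(sq.union(sr))
    return both, exactly_one, neither


def jaccard(q, r):
    """Support-based Jaccard coefficient: intersection size over union size."""
    sq, sr = support(q), support(r)
    union = sq.union(sr)
    if not union:
        return 0.0  # convention chosen here for two empty supports
    return len(sq.intersection(sr)) / len(union)


# Toy data (hypothetical): verb distributions for two nouns.
q = {"eat": 0.5, "peel": 0.3, "buy": 0.2}
r = {"eat": 0.4, "buy": 0.4, "drive": 0.2}
vocabulary = {"eat", "peel", "buy", "drive", "sleep"}

both, exactly_one, neither = support_regions(q, r, vocabulary)
print(sorted(both))         # ['buy', 'eat']
print(sorted(exactly_one))  # ['drive', 'peel']
print(sorted(neither))      # ['sleep']
print(jaccard(q, r))        # 0.5
```

    In this toy case the union of the supports has four verbs, two of which are shared, so the denominator of the Jaccard coefficient is where the verbs seen with only one noun enter the score.</Paragraph>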
    <Paragraph position="2"> Our explicit division of V-space into various support regions has been implicitly considered in other work. Smadja et al. (1996) observe that for two potential mutual translations X and Y, the fact that X occurs with translation Y indicates association; X's occurring with a translation other than Y decreases one's belief in their association; but the absence of both X and Y yields no information. In essence, Smadja et al. argue that information from the union of supports, rather than just the intersection, is important. D. Lin (1997; 1998a) takes an axiomatic approach to determining the characteristics of a good similarity measure. Starting with a formalization (based on certain assumptions) of the intuition that the similarity between two events depends on both their commonality and their differences, he derives a unique similarity function schema. The definition of commonality is left to the user (several different definitions are proposed for different tasks).</Paragraph>
    <Paragraph position="3"> We view the empirical approach taken in this paper as complementary to Lin's. That is, we are working in the context of a particular application, and, while we have no mathematical certainty of the importance of the &quot;common support&quot; information, we did not assume it a priori; rather, we let the performance data guide our thinking.</Paragraph>
    <Paragraph position="4"> Finally, we observe that the skew divergence seems quite promising. We conjecture that appropriate values for α may inversely correspond to the degree of sparseness in the data, and intend in the future to test this conjecture on larger-scale prediction tasks. We also plan to evaluate skewed versions of the Jensen-Shannon divergence proposed by Rao (1982) and J. Lin (1991).</Paragraph>
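    <Paragraph> For readers who want to experiment with the skew parameter, the following is a minimal sketch of the skew divergence, assuming the form D(r || alpha*q + (1 - alpha)*r), where D is the KL divergence. The dict-based representation of q and r, the function name skew_divergence, the default value of alpha, and the toy data are assumptions made for illustration only.

```python
import math


def skew_divergence(q, r, alpha=0.99):
    """Sketch of the skew divergence D(r || alpha*q + (1 - alpha)*r).

    q and r are dicts mapping verbs to probabilities. Restricting alpha
    to [0, 1) keeps the mixture positive wherever r is positive, so the
    result is finite even when the supports of q and r differ.
    """
    if alpha >= 1.0 or 0.0 > alpha:
        raise ValueError("alpha must lie in [0, 1)")
    total = 0.0
    for v, r_v in r.items():
        if r_v > 0:
            # Smooth q toward r so the denominator is never zero.
            mix = alpha * q.get(v, 0.0) + (1.0 - alpha) * r_v
            total += r_v * math.log(r_v / mix)
    return total


# Toy data (hypothetical): "drive" has zero probability under q, so the
# plain KL divergence D(r || q) would be infinite here.
q = {"eat": 0.5, "peel": 0.3, "buy": 0.2}
r = {"eat": 0.4, "buy": 0.4, "drive": 0.2}

for alpha in (0.5, 0.9, 0.99):
    print(alpha, skew_divergence(q, r, alpha))
```

    Sweeping alpha as in the loop above is one simple way to see how strongly the score depends on the smoothing weight for a given pair of distributions.</Paragraph>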
  </Section>
</Paper>