<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0604"> <Title>New Experiments in Distributional Representations of Synonymy</Title> <Section position="7" start_page="30" end_page="31" type="concl"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> Drawing specific conclusions about the &quot;Optimal&quot; measure is problematic. We do not know whether, or to what extent, this particular parameter setting is universally best, best only for English, best only for newswire English, or best only for the specific test we have devised. We have restricted our attention to a relatively small space of similarity measures, excluding many previously proposed measures of lexical affinity (but see Weeds et al. (2004) and Lee (1999) for empirical comparisons). Lee observed that measures from the space of invariant divergences (particularly the Jensen-Shannon (JS) and skew divergences) perform at least as well as any of a wide variety of alternatives. As noted, we experimented with the JS divergence and observed accuracies that closely tracked those of the Hellinger measure. This result provides a point of comparison with the measures Lee investigated, and it suggests that both Ehlert's measure and what we have called &quot;Optimal&quot; are credible, perhaps superior, alternatives. More generally, our results argue for some form of feature importance weighting.</Paragraph> <Paragraph position="1"> Empirically, the strength of Optimal on the WBST stems from its robustness in the presence of polysemy. Both Ehlert and Optimal are expressed as a sum of ratios, in which the numerator is a product of some function of the conditional context probabilities and the denominator is some function of the marginal probability. Relative to Ehlert, the exponents Optimal applies to both the numerator and the denominator have the effect of favoring lower-probability events. In our test, WordNet senses are sampled uniformly at random.
Perhaps its emphasis on lower-probability events allows Optimal to sacrifice some fidelity on high-frequency senses in exchange for greater sensitivity to low-frequency ones.</Paragraph> <Paragraph position="2"> It is clear, however, that polysemy is a critical hurdle for distributional approaches to lexical semantics. Figure 1 shows that, in the absence of polysemy, distributional comparisons detect synonymy quite well. Much of the advantage humans hold over machines on this task may be attributable to their awareness of polysemy. To achieve performance comparable to that of humans, therefore, it is probably not enough to optimize context policies or to rely on larger text collections. Instead, we require strategies for detecting and resolving latent word senses.</Paragraph> <Paragraph position="3"> Pantel and Lin (2002) propose one such method, which they evaluate by measuring the overlap between sense clusters and WordNet synsets. The above considerations suggest that a more pertinent test of such approaches may be to evaluate their utility in detecting semantic similarity between specific polysemous terms. We expect to undertake such an evaluation in future work.</Paragraph> <Paragraph position="4"> Acknowledgments. This material is based on work funded in whole or in part by the U.S. Government. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the U.S. Government.</Paragraph> </Section> </Paper>