<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-3026">
  <Title>A Practical Solution to the Problem of Automatic Word Sense Induction</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusions and prospects
</SectionTitle>
    <Paragraph position="0"> From the observations described above we conclude that avoiding the mixture of senses, i.e.</Paragraph>
    <Paragraph position="1"> clustering local context vectors instead of global co-occurrence vectors, is a good way to deal with the problem of word sense induction. However, there is a pitfall, as the matrices of local vectors are extremely sparse. Fortunately, our simulations suggest that computing the main dimensions of a matrix through SVD solves the problem of sparseness and greatly improves clustering results.</Paragraph>
    <Paragraph position="2"> Although the results that we presented in this paper seem useful even for practical purposes, we cannot claim that our algorithm is capable of finding all the fine-grained distinctions that are listed in manually created dictionaries such as the Longman Dictionary of Contemporary English (LDOCE), or in lexical databases such as WordNet.</Paragraph>
    <Paragraph position="3"> For future improvement of the algorithm we see two main possibilities: 1) Considering all context words instead of only the top 30 associations would further reduce the sparse data problem. However, this requires finding an appropriate association function. This is difficult because, for example, the log-likelihood ratio, although it delivers almost perfect rankings, has an inappropriate value characteristic: the computed strengths increase disproportionately for stronger associations. This prevents the SVD from finding optimal dimensions.</Paragraph>
    <Paragraph position="4"> 2) The principle of avoiding mixtures can be applied more consistently if, in addition to using local instead of global vectors, the parts of speech of the context words are also considered. By operating on a part-of-speech tagged corpus, those sense distinctions that have an effect on part of speech can be taken into account.</Paragraph>
  </Section>
</Paper>