<?xml version="1.0" standalone="yes"?> <Paper uid="P04-3026"> <Title>A Practical Solution to the Problem of Automatic Word Sense Induction</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Approach </SectionTitle> <Paragraph position="0"> The basic idea is that we do not cluster the global co-occurrence vectors of the words (based on an entire corpus) but local ones, which are derived from the contexts of a single word. That is, our computations are based on the concordance of a word. Also, we consider not a term/term but a term/context matrix. This means that for each word we want to analyze, we obtain an entire matrix.</Paragraph> <Paragraph position="1"> Let us exemplify this using the ambiguous word palm with its tree and hand senses. If we assume that our corpus has six occurrences of palm, i.e.</Paragraph> <Paragraph position="2"> there are six local contexts, then we can derive six local co-occurrence vectors for palm. Considering only strong associations to palm, these vectors could, for example, look as shown in table 1.</Paragraph> <Paragraph position="3"> The dots in the matrix indicate whether the respective word occurs in a context. We use binary vectors since we assume short contexts, in which words usually occur only once. By looking at the matrix it is easy to see that contexts c1, c3, and c6 seem to relate to the hand sense of palm, whereas contexts c2, c4, and c5 relate to its tree sense. These intuitions can be reproduced by a measure of vector similarity, for example the cosine coefficient or the (binary) Jaccard measure. If we then apply an appropriate clustering algorithm to the context vectors, we should obtain the two expected clusters. 
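The binary context vectors and their similarity-based grouping can be sketched as follows. This is a minimal illustration, not the paper's implementation: the six context word sets and the similarity threshold are invented for the example, and a simple greedy single-link grouping stands in for whatever clustering algorithm is actually used.

```python
# Sketch of binary local context vectors for the ambiguous word "palm"
# and their grouping by (binary) Jaccard similarity. Contexts c1..c6 and
# their content words are hypothetical, chosen to mirror the table 1 example.

def jaccard(a, b):
    """Binary Jaccard similarity of two sets of context words."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical local contexts of "palm" (function words removed);
# a set of words is equivalent to a binary co-occurrence vector.
contexts = {
    "c1": {"hand", "finger", "skin"},
    "c2": {"tree", "coconut", "beach"},
    "c3": {"hand", "wrist", "skin"},
    "c4": {"tree", "leaf", "beach"},
    "c5": {"coconut", "tree", "leaf"},
    "c6": {"finger", "hand", "wrist"},
}

def cluster(contexts, threshold=0.2):
    """Greedy single-link grouping: join the first cluster containing a
    sufficiently similar member, else start a new cluster."""
    clusters = []
    for name, words in contexts.items():
        for c in clusters:
            if any(jaccard(words, contexts[m]) >= threshold for m in c):
                c.append(name)
                break
        else:
            clusters.append([name])
    return clusters

print(cluster(contexts))  # → [['c1', 'c3', 'c6'], ['c2', 'c4', 'c5']]
```

With these invented contexts, the two resulting clusters correspond to the hand and tree senses, matching the intuition described above.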
Each of the two clusters corresponds to one of the senses of palm, and the words closest to the geometric centers of the clusters should be good descriptors of each sense.</Paragraph> <Paragraph position="4"> However, as matrices of the above type can be extremely sparse, clustering is a difficult task, and common algorithms often deliver sub-optimal results. Fortunately, the problem of matrix sparseness can be minimized by reducing the dimensionality of the matrix. An appropriate algebraic method that can reduce the dimensionality of a rectangular or square matrix in an optimal way is singular value decomposition (SVD). As shown by Schütze (1997), reducing the dimensionality achieves a generalization effect that often improves the results. The approach we suggest in this paper involves reducing the number of columns (contexts) and then applying a clustering algorithm to the row vectors (words) of the resulting matrix. This works well because it is a strength of SVD to reduce the effects of sampling errors and to close gaps in the data.</Paragraph> <Paragraph position="6"/> </Section> </Paper>
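The SVD-based column reduction can be sketched as follows. This is an assumed illustration, not the paper's exact setup: the term/context matrix, the word list, and the choice of two retained dimensions are invented for the "palm" example.

```python
# Sketch: reduce the column (context) dimensionality of a sparse binary
# term/context matrix with SVD, then inspect the reduced row (word) vectors.
# The matrix entries are hypothetical, following the "palm" example.
import numpy as np

# rows: context words; columns: six local contexts c1..c6 of "palm"
words = ["hand", "finger", "skin", "wrist", "tree", "coconut", "beach", "leaf"]
A = np.array([
    [1, 0, 1, 0, 0, 1],  # hand    (hand sense: c1, c3, c6)
    [1, 0, 0, 0, 0, 1],  # finger
    [1, 0, 1, 0, 0, 0],  # skin
    [0, 0, 1, 0, 0, 1],  # wrist
    [0, 1, 0, 1, 1, 0],  # tree    (tree sense: c2, c4, c5)
    [0, 1, 0, 0, 1, 0],  # coconut
    [0, 1, 0, 0, 0, 0],  # beach
    [0, 0, 0, 1, 1, 0],  # leaf
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                               # keep the two strongest dimensions
reduced = U[:, :k] * s[:k]          # reduced row (word) vectors

# Words belonging to the same sense now point in similar directions,
# so clustering these dense vectors is easier than clustering rows of A.
for w, v in zip(words, reduced.round(2)):
    print(f"{w:8s} {v}")
```

In this toy matrix the hand-sense and tree-sense words occupy disjoint context columns, so after truncation to two dimensions the row vectors of the two senses are orthogonal, which is the generalization effect the text attributes to SVD.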