<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1654">
  <Title>Random Indexing using Statistical Weight Functions</Title>
  <Section position="4" start_page="457" end_page="457" type="intro">
    <SectionTitle>
2 Random Indexing
</SectionTitle>
    <Paragraph position="0"> Random Indexing is an approximating technique proposed by Kanerva et al. (2000) as an alternative to Singular Value Decomposition (SVD) for Latent Semantic Analysis (LSA, Landauer and Dumais, 1997). In LSA, it is assumed that there is some underlying dimensionality in the data, so that the attributes of two or more terms that have similar meanings can be folded onto a single axis.</Paragraph>
    <Paragraph position="1"> Sahlgren (2005) criticise LSA for being both computationally inefficient and requiring the formation of a full co-occurrence matrix and its decomposition before any similarity measurements can be made. Random Indexing avoids both these by creating a short index vector for each unique context, and producing the context vector for each term by summing index vectors for each context as it is read, allowing an incremental building of the context space.</Paragraph>
    <Paragraph position="2"> Hecht-Nielsen (1994) observed that there are many more nearly orthogonal directions in high-dimensional space than there are truly orthogonal directions. The random index vectors are nearly-orthogonal, resulting in an approximate description of the context space. The approximation comes from the Johnson-Lindenstrauss lemma (Johnson and Lindenstrauss, 1984), which states that if we project points in a vector space into a randomly selected subspace of sufficiently high dimensionality, the distances between the points are approximately preserved. Random Projection (Papadimitriou et al., 1998) and Random Mapping (Kaski, 1998) are similar techniques that use this lemma. Achlioptas (2001) showed that most zero-mean distributions with unit variance, including very simple ones like that used in Random Indexing, produce a mapping that satisfies the lemma. The following description of Random Indexing is taken from Sahlgren (2005) and Sahlgren and Karlgren (2005).</Paragraph>
    <Paragraph position="3"> We allocate a d length index vector to each unique context as is it found. These vectors consist of a large number of 0s and a small number (epsilon1) of 1s. Each element is allocated one of these values with the following probability:</Paragraph>
    <Paragraph position="5"> Context vectors are generated on-the-fly. As the corpus is scanned, for each term encountered, its contexts are extracted. For each new context, an index vector is produced for it as above. The context vector is the sum of the index vectors of all the contexts in which the term appears.</Paragraph>
    <Paragraph position="6"> The context vector for a term t appearing in one each in the contexts c1 = [1,0,0, 1] and c2 = [0,1,0, 1] would be [1,1,0, 2]. If the context c1 encountered again, no new index vector would be generated and the existing index vector for c1 would be added to the existing context vector to produce a new context vector for t of [2,1,0, 3].</Paragraph>
    <Paragraph position="7"> The distance between these context vectors can then be measured using any vector space distance measure. Sahlgren and Karlgren (2005) use the</Paragraph>
    <Paragraph position="9"> Random Indexing allows for incremental sampling. This means that the entire data set need not be sampled before similarity between terms can be measured. It also means that additional context information can be added at any time without invalidating the information already produced. This is not feasible with most other word-space models. The approach used by Grefenstette (1994) and Curran (2004) requires the re-computation of all non-linear weights if new data is added, although some of these weights can be approximated when adding new data incrementally. Similarly, new data can be folded into a reduced LSA space, but there is no guarantee that the original smoothing will apply correctly to the new data (Sahlgren, 2005).</Paragraph>
  </Section>
class="xml-element"></Paper>