<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1013">
  <Title>Generalized Hebbian Algorithm for Incremental Singular Value Decomposition in Natural Language Processing</Title>
  <Section position="2" start_page="0" end_page="98" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Dimensionality reduction techniques are of great relevance within the field of natural language processing. A persistent problem within language processing is the over-specificity of language, and the sparsity of data. Corpus-based techniques depend on a sufficiency of examples in order to model human language use, but the Zipfian nature of frequency behaviour in language means that this approach has diminishing returns with corpus size. In short, there are a large number of ways to say the same thing, and no matter how large your corpus is, you will never cover all the things that might reasonably be said. Language is often too rich for the task being performed; for example it can be difficult to establish that two documents are discussing the same topic.</Paragraph>
    <Paragraph position="1"> Likewise no matter how much data your system has seen during training, it will invariably see something new at run-time in a domain of any complexity. Any approach to automatic natural language processing will encounter this problem on several levels, creating a need for techniques which compensate for this.</Paragraph>
    <Paragraph position="2"> Imagine we have a set of data stored as a matrix. Techniques based on eigen decomposition allow such a matrix to be transformedinto a set of orthogonal vectors, each with an associated &amp;quot;strength&amp;quot;, or eigenvalue. This transformation allows the data contained in the matrix to be compressed; by discarding the less significant vectors (dimensions) the matrix can be approximated with fewer numbers. This is what is meant by dimensionality reduction.</Paragraph>
    <Paragraph position="3"> The technique is guaranteed to return the closest (least squared error) approximation possible for a given number of numbers (Golub and Reinsch, 1970). In certain domains, however, the technique has even greater significance. It is effectively forcing the data through a bottleneck; requiring it to describe itself using an impoverished construct set. This can allow the critical underlying features to reveal themselves. In language, for example, these features might be semantic constructs. It can also improve the data, in the case that the detail is noise, or richness not relevant to the task.</Paragraph>
    <Paragraph position="4"> Singular value decomposition (SVD) is a near relative of eigen decomposition, appropriate to domains where input is asymmetrical. The best known application of singular value decomposition within natural language processing is Latent Semantic Analysis (Deerwester et al., 1990). Latent Semantic Analysis (LSA) allows passages of text to be compared to each other in a reduced-dimensionality semantic space, based on the wordsthey contain.</Paragraph>
    <Paragraph position="5">  The technique has been successfully applied to information retrieval, where the overspecificity of language is particularly problematic; text searches often miss relevant documents where different vocabulary has been chosen in the search terms to that used in the document (for example, the user searches on &amp;quot;eigen decomposition&amp;quot; and fails to retrieve documents on factor analysis). LSA has also been applied in language modelling (Bellegarda, 2000), where it has been used to incorporate long-span semantic dependencies.</Paragraph>
    <Paragraph position="6"> Much research has been done on optimising eigen decomposition algorithms, and the extent to which they can be optimised depends on the area of application. Most natural language problems involve sparse matrices, since there are many words in a natural language and the great majority do not appear in, for example, any one document. Domains in which matrices are less sparse lend themselves to such techniques as Golub-Kahan-Reinsch (Golub and Reinsch, 1970) and Jacobi-like approaches. Techniques such as those described in (Berry, 1992) are more appropriate in the natural language domain.</Paragraph>
    <Paragraph position="7"> Optimisation is an important way to increase the applicability of eigen and singular value decomposition. Designing algorithms that accommodate different requirements is another. For example, another drawback to Jacobi-like approaches is that they calculate all the singular triplets (singular vector pairs with associated values) simultaneously, which may not be practical in a situation where only the top few are required. Consider also that the methods mentioned so far assume that the entire matrix is available from the start. There are many situations in which data may continue to become available.</Paragraph>
    <Paragraph position="8"> (Berry et al., 1995) describe a number of techniques for including new data in an existing decomposition. Their techniques apply to a situation in which SVD has been performed on a collection of data, then new data becomes available. However, these techniques are either expensive, or else they are approximations which degrade in quality over time. They are useful in the context of updating an existing batch decomposition with a second batch of data, but are less applicable in the case where data are presented serially, for example, in the context of a learning system.</Paragraph>
    <Paragraph position="9"> Furthermore, there are limits to the size of matrix that can feasibly be processed using batch decomposition techniques. This is especially relevant within natural language processing, where very large corpora are common. Random Indexing (Kanerva et al., 2000) provides a less principled, though very simple and efficient, alternative to SVD for dimensionality reduction over large corpora.</Paragraph>
    <Paragraph position="10"> This paper describes an approach to singular value decomposition based on the Generalized Hebbian Algorithm (Sanger, 1989). GHA calculates the eigen decomposition of a matrix based on single observations presented serially. The algorithm presented here differs in that where GHA produces the eigen decomposition of symmetrical data, our algorithm produces the singular value decomposition of asymmetrical data. It allows singular vectors to be learned from paired inputs presented serially using no more memory than is required to store the singular vector pairs themselves.</Paragraph>
    <Paragraph position="11"> It is therefore relevant in situations where the size of the dataset makes conventional batch approaches infeasible. It is also of interest in the context of adaptivity, since it has the potential to adapt to changing input. The learning update operation is very cheap computationally. Assuming a stable vector length, each update operation takes exactly as long as each previous one; there is no increase with corpus size to the speed of the update. Matrix dimensions may increase during processing. The algorithm produces singular vector pairs one at a time, starting with the most significant, which means that useful data becomes available quickly; many standard techniques produce the entire decomposition simultaneously.</Paragraph>
    <Paragraph position="12"> Since it is a learning technique, however, it differs from what would normally be considered an incremental technique, in that the algorithm converges on the singular value decomposition of the dataset, rather than at any one point having the best solution possible for the data it has seen so far. The method is potentially most appropriate in situations where the dataset is very large or unbounded: smaller, bounded datasets may be more efficiently processed by other methods. Furthermore, our  approach is limited to cases where the final matrix is expressible as the linear sum of outer products of the data vectors. Note in particular that Latent Semantic Analysis, as usually implemented, is not an example of this, because LSA takes the log of the final sums in each cell (Dumais, 1990). LSA, however, does not depend on singular value decomposition; Gorrell and Webb (Gorrell and Webb, 2005) discuss using eigen decomposition to perform LSA, and demonstrate LSA using the Generalized Hebbian Algorithm in its unmodified form. Sanger (Sanger, 1993) presents similar work, and future work will involve more detailed comparison of this approach to his.</Paragraph>
    <Paragraph position="13"> The next section describes the algorithm.</Paragraph>
    <Paragraph position="14"> Section 3 describes implementation in practical terms. Section 4 illustrates, using word n-gram and letter n-gram tasks as examples and section 5 concludes.</Paragraph>
  </Section>
class="xml-element"></Paper>