<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1014">
  <Title>Improving Probabilistic Latent Semantic Analysis with Principal Component Analysis</Title>
  <Section position="3" start_page="105" end_page="105" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"> A number of different methods have been proposed for handling the non-globally optimal solution when using EM. These include the use of Tempered EM (Hofmann, 1999), combining models from different initializations in postprocessing (Hofmann, 1999; Brants et al., 2002), and trying to find good initial values. For their segmentation task, Brants et al. (2002) found overfitting, which Tempered EM helps address, was not a problem and that early stopping of EM provided good performance and faster learning. Computing and combining different models is computationally expensive, so a method that reduces this cost is desirable. Different methods for initializing EM include the use of random initialization e.g., (Hofmann, 1999), k-means clustering, and an initial cluster refinement algorithm (Fayyad et al., 1998). K-means clustering is not a good fit to the PLSA model in several ways: it is sensitive to outliers, it is a hard clustering, and the relation of the identified clusters to the PLSA parameters is not well defined. In contrast to these other initialization methods, weknow that the LSAreduces noise in the data and handles synonymy, and so should be a good initialization. The trick is in trying to relate the LSA parameters to the PLSA parameters.</Paragraph>
    <Paragraph position="1"> LSA is based on singular value decomposition (SVD) of a term by document matrix and retaining the top K singular values, mapping documents and terms to a new representation in a latent semantic space. It has been successfully applied in different domains including automatic indexing.</Paragraph>
    <Paragraph position="2"> Text similarity is better estimated in this low dimension space because synonyms are mapped to nearby locations and noise is reduced, although handling of polysemy is weak. In contrast, the PLSA model distributes the probability mass of a term over the different latent classes corresponding to different senses of a word, and thus better handles polysemy (Hofmann, 1999). The LSA model has two additional desirable features. First, the word document co-occurrence matrix can be weighted by any weight function that reflects the relative importance of individual words (e.g., tfidf). The weighting can therefore incorporate external knowledge into the model. Second, the SVD algorithm is guaranteed to produce the matrix of rank a0 that minimizes the distance to the original word document co-occurrence matrix.</Paragraph>
    <Paragraph position="3"> As noted in Hofmann (1999), an important difference between PLSA and LSA is the type of objective function utilized. In LSA, this is the L2 or Frobenius norm on the word document counts.</Paragraph>
    <Paragraph position="4"> In contrast, PLSA relies on maximizing the likelihood function, which is equivalent to minimizing the cross-entropy or Kullback-Leibler divergence between the empirical distribution and the predicted model distribution of terms in documents.</Paragraph>
    <Paragraph position="5"> A number of methods for deriving probabilities from LSA have been suggested. For example, Coccaro and Jurafsky (1998) proposed a method based on the cosine distance, and Tipping and Bishop (1999) give a probabilistic interpretation of principal component analysis that is formulated within a maximum-likelihood framework based on a specific form of Gaussian latent variable model. In contrast, we relate the LSA parameters to the PLSA model using a probabilistic interpretation of dimensionality reduction proposed by Ding (1999) that uses an exponential distribution to model the term and document distribution conditioned on the latent class.</Paragraph>
  </Section>
class="xml-element"></Paper>