<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-5009">
  <Title>Evaluating Contextual Dependency of Paraphrases using a Latent Variable Model</Title>
  <Section position="2" start_page="0" end_page="66" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper proposes amethodto evaluatewhether a paraphrasing pair is contextually independent.</Paragraph>
    <Paragraph position="1"> Evaluating a paraphrasing pair is important when we extract paraphrases from a corpus or apply a paraphrase to asentence, sincewemust guarantee that the paraphrase carries almost the same meaning. However, the meaning carried by a sentence is affected by its context. Thus, we focus on the contextual dependency of paraphrases.</Paragraph>
    <Paragraph position="2"> A thing can be expressed by various expressions, and a single idea can be paraphrased in many ways to enrich its expression or to increase understanding. Paraphrasing plays a very important role in natural language expressions. However, it is very hard for machines to handle different expressions that carry the same meaning.</Paragraph>
    <Paragraph position="3"> The importance of paraphrasing has been widely acknowledged, and many paraphrasing studies have been carried out. Using only surface similarity is insufficient for evaluating paraphrases because there are not only surface differences but many other kinds of differences between paraphrased sentences. Thus, it is not easy to evaluate whether two sentences carry almost the same meaning.</Paragraph>
    <Paragraph position="4"> Some studies have constructed and evaluated hand-made rules (Takahashi et al., 2001; Ohtake and Yamamoto, 2001). Others have tried to extract paraphrases from corpora (Barzilay and McKeown, 2001; Lin and Pantel, 2001), which are very useful because they enable us to constructparaphrasingrules. Inaddition, wecanconstruct an example-based or a Statistical Machine Translation (SMT)-like paraphrasing system that utilizes paraphrasing examples. Thus, collecting paraphrased examples must be continued to achieve high-performance paraphrasing systems.</Paragraph>
    <Paragraph position="5"> Several methods of acquiring paraphrases have been proposed (Barzilay and McKeown, 2001; Shimohata and Sumita, 2002; Yamamoto, 2002).</Paragraph>
    <Paragraph position="6"> Some use parallel corpora as resources to obtain paraphrases, which seems a promising way to extract high-quality paraphrases.</Paragraph>
    <Paragraph position="7"> However, unlike translation, there is no obvious paraphrasing direction. Given paraphrasing pair E1:E2, we have to know the paraphrasing direction to paraphrase from E1 to E2 and vice versa. When extracting paraphrasing pairs from corpora, whether the paraphrasing pairs are con- null textually dependent paraphrases is a serious problem, and thus there is a specific paraphrase direction for each pair. In addition, it is also important to evaluate a paraphrasing pair not only when extracting but also when applying a paraphrase.</Paragraph>
    <Paragraph position="8"> Consider this example, automatically extracted from a corpus: Can I pay by traveler's check? / Do you take traveler's checks? This example seems contextually independent. On the other hand, here is another example: I want to buy a pair of sandals. / I'm looking for sandals. This example seems to be contextually dependent, because we don't know whether the speaker is only looking for a single pair of sandals. In some contexts, the latter sentence means that the speaker is seeking or searching for sandals. In other words, the former sentence carries specific meaning, but the latter carries generic meaning. Thus, the paraphrasing sentences are contextually dependent, and although the paraphrasing direction from specific to generic might be acceptable, the opposite direction may not be.</Paragraph>
    <Paragraph position="9"> We can solve part of this problem by inferring the contexts of the paraphrasing sentences. A text model with latent variables can be used to infer the topic of a text, since latent variables correspond to the topics indicated by texts. We assume that a topic indicated by a latent variable of a text model can be used as an approximation of context. Needless to say, however, such an approximation is very rough, and a more complex model or more powerful approach must be developed to achieve performances that match human judgement in evaluating paraphrases.</Paragraph>
    <Paragraph position="10"> The final goal of this study is the evaluation of paraphrasing pairs based on the following two factors: contextual dependency and paraphrasing direction. In this paper, however, as a first step to evaluate paraphrasing pairs, we focus on the evaluation of contextual dependency by usingprobabilisticLatentSemanticIndexing(pLSI) null (Hofmann, 1999) and Latent Dirichlet Allocation  (LDA) (Blei et al., 2003) as text models with latent variables.</Paragraph>
    <Paragraph position="11"> 2 Latent Variable Models and Topic</Paragraph>
    <Section position="1" start_page="65" end_page="65" type="sub_section">
      <SectionTitle>
Inference
</SectionTitle>
      <Paragraph position="0"> In this section, we introduce two latent variable models, pLSI and LDA, and also explain how to infer a topic with the models.</Paragraph>
      <Paragraph position="1"> InadditiontopLSIandLDA,there areotherlatent variable models such as mixture of unigrams. We used pLSI and LDA because Blei et al. have already demonstrated that LDA outperforms mixture of unigrams and pLSI (Blei et al., 2003), and a toolkit has been developed for each model.</Paragraph>
      <Paragraph position="2"> From a practical viewpoint, we want to determinehow muchperformance differenceexists between pLSI and LDA through evaluations of contextual paraphrase dependency. The time complexity required to infer a topic by LDA is larger than that by pLSI, and thus it is valuable to know the performance difference.</Paragraph>
    </Section>
    <Section position="2" start_page="65" end_page="66" type="sub_section">
      <SectionTitle>
2.1 Probabilistic LSI
</SectionTitle>
      <Paragraph position="0"> PLSI is a latent variable model for general co-occurrence data that associates an unobserved topic variable z [?] Z = {z1,***,zK} with each observation, i.e., with each occurrence of word w [?] W = {w1,***,wM} in document d [?] D = {d1,***dN}.</Paragraph>
      <Paragraph position="1"> PLSI gives joint probability for a word and a document as follows:</Paragraph>
      <Paragraph position="3"> However, to infer a topic indicated by a document, wehavetoobtain P(z|d). From (Hofmann, 1999), we can derive the following formulas:</Paragraph>
      <Paragraph position="5"> where n(d,w) denotes term frequency, which is the number of times w occurs in d. Assuming that P(d|z) =producttextw[?]d P(w|z), the probability of a topic under document (P(z|d)) is proportional to the following formula:</Paragraph>
      <Paragraph position="7"> After a pLSI model is constructed with a learning corpus, we can infer topic z [?] Z indicated  by given document d = w1,***,wM(d) with Formula 5. A topic z that maximizes Formula 5 is inferred as the topic of document d.</Paragraph>
    </Section>
    <Section position="3" start_page="66" end_page="66" type="sub_section">
      <SectionTitle>
2.2 Latent Dirichlet Allocation
</SectionTitle>
      <Paragraph position="0"> Latent Dirichlet Allocation (LDA) is a generative probabilistic model of a corpus. The basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words.</Paragraph>
      <Paragraph position="1"> LDA gives us the marginal distribution of a document (p(d|a,b),d = (w1,w2,***wN)) by the following formula:</Paragraph>
      <Paragraph position="3"> where a parameterizes Dirichlet random variable th and b parameterizes the word probabilities, and zn indicates a topic variable zn [?] Z = {z1,z2,***,zN}. To obtain the probability of a corpus, we take the product of the marginal probabilities of single documents.</Paragraph>
      <Paragraph position="4"> Here, we omit the details of parameter estimation and the inference of LDA due to space limitations. However, the important point is that the Dirichlet parameters used to infer the probability of a document can be seen as providing a representation of the document in the topic simplex.</Paragraph>
      <Paragraph position="5"> In other words, these parameters indicate a point in the topic simplex. Thus, in this paper, we use the largest elements of the parameters to infer the topic (as an approximation of context) to which a given text belongs.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>