XML Viewer - p05-1074

File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/p05-1074_metho.xml
Size: 21,146 bytes
Last Modified: 2025-10-06 14:09:46
<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1074">
  <Title>Paraphrasing with Bilingual Parallel Corpora</Title>
  <Section position="3" start_page="597" end_page="598" type="metho">
    <SectionTitle>
2 Extracting paraphrases
</SectionTitle>
    <Paragraph position="0"> Much previous work on extracting paraphrases (Barzilay and McKeown, 2001; Barzilay and Lee, 2003; Pang et al., 2003) has focused on finding identifying contexts within aligned monolingual sentences from which divergent text can be extracted, and treated as paraphrases. Barzilay and McKeown (2001) gives the example shown in Figure 1 of how identical surrounding substrings can be used to extract the paraphrases of burst into tears as cried and comfort as console.</Paragraph>
    <Paragraph position="1"> While monolingual parallel corpora often have identical contexts that can be used for identifying paraphrases, bilingual parallel corpora do not. Instead, we use phrases in the other language as pivots: we look at what foreign language phrases the English translates to, find all occurrences of those foreign phrases, and then look back at what other English phrases they translate to. We treat the other English phrases as potential paraphrases. Figure 2 illustrates how a German phrase can be used as a point of identification for English paraphrases in this way.</Paragraph>
    <Paragraph position="2"> Section 2.1 explains which statistical machine translation techniques are used to align phrases within sentence pairs in a bilingual corpus.</Paragraph>
    <Paragraph position="3"> A significant difference between the present work and that employing monolingual parallel corpora, is that our method frequently extracts more than one possible paraphrase for each phrase. We assign a probability to each of the possible paraphrases. This is a mechanism for ranking paraphrases, which can be utilized when we come to select the correct paraphrase for a given context . Section 2.2 explains how we calculate the probability of a paraphrase.</Paragraph>
    <Section position="1" start_page="597" end_page="598" type="sub_section">
      <SectionTitle>
2.1 Aligning phrase pairs
</SectionTitle>
      <Paragraph position="0"> We use phrase alignments in a parallel corpus as pivots between English paraphrases. We find these alignments using recent phrase-based approaches to statistical machine translation.</Paragraph>
      <Paragraph position="1"> The original formulation of statistical machine translation (Brown et al., 1993) was defined as a word-based operation. The probability that a foreign sentence is the translation of an English sentence is calculated by summing over the probabilities of all possible word-level alignments, a, between the sentences: null</Paragraph>
      <Paragraph position="3"> Thus Brown et al. decompose the problem of determining whether a sentence is a good translation of another into the problem of determining whether there is a sensible mapping between the words in the sentences.</Paragraph>
      <Paragraph position="4"> More recent approaches to statistical translation calculate the translation probability using larger blocks of aligned text. Koehn (2004), Tillmann (2003), and Vogel et al. (2003) describe various heuristics for extracting phrase alignments from the Viterbi word-level alignments that are estimated using Brown et al. (1993) models. We use the heuristic for phrase alignment described in Och and Ney (2003) which aligns phrases by incrementally building longer phrases from words and phrases which have adjacent alignment points.1  what is more, the relevant cost dynamic is completelyunder control im ubrigen ist die diesbezugliche kostenentwicklung vollig unter kontrolle we owe it to the taxpayers to keep in checkthe costs wir sind es den steuerzahlern die kosten zu habenschuldig unter kontrolle Figure 2: Using a bilingual parallel corpus to extract paraphrases</Paragraph>
    </Section>
    <Section position="2" start_page="598" end_page="598" type="sub_section">
      <SectionTitle>
2.2 Assigning probabilities
</SectionTitle>
      <Paragraph position="0"> We define a paraphrase probability p(e2|e1) in terms of the translation model probabilities p(f|e1), that the original English phrase e1 translates as a particular phrase f in the other language, and p(e2|f), that the candidate paraphrase e2 translates as the foreign language phrase. Since e1 can translate as multiple foreign language phrases, we sum over f:</Paragraph>
      <Paragraph position="2"> The translation model probabilities can be computed using any standard formulation from phrase-based machine translation. For example, p(e|f) can be calculated straightforwardly using maximum likelihood estimation by counting how often the phrases e and f were aligned in the parallel corpus:</Paragraph>
      <Paragraph position="4"> Note that the paraphrase probability defined in Equation 2 returns the single best paraphrase, ^e2, irrespective of the context in which e1 appears. Since the best paraphrase may vary depending on information about the sentence that e1 appears in, we extend the paraphrase probability to include that sentence S:</Paragraph>
      <Paragraph position="6"> word-level alignments in this paper, direct estimation of phrasal translations (Marcu and Wong, 2002) would also suffice for extracting paraphrases from bilingual corpora.</Paragraph>
      <Paragraph position="7"> a million, as far as possible, at work, big business, carbon dioxide, central america, close to, concentrate on, crystal clear, do justice to, driving force, first half, for the first time, global warming, great care, green light, hard core, horn of africa, last resort, long ago, long run, military action, military force, moment of truth, new world, noise pollution, not to mention, nuclear power, on average, only too, other than, pick up, president clinton, public transport, quest for, red cross, red tape, socialist party, sooner or later, step up, task force, turn to, under control, vocational training, western sahara, world bank  S allows us to re-rank the candidate paraphrases based on additional contextual information. The experiments in this paper employ one variety of contextual information. We include a simple language model probability, which would additionally rank e2 based on the probability of the sentence formed by substiuting e2 for e1 in S. A possible extension which we do not evaluate might be permitting only paraphrases that are the same syntactic type as the original phrase, which we could do by extending the translation model probabilities to count only phrase occurrences of that type.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="598" end_page="600" type="metho">
    <SectionTitle>
3 Experimental Design
</SectionTitle>
    <Paragraph position="0"> We extracted 46 English phrases to paraphrase (shown in Table 1), randomly selected from those multi-word phrases in WordNet which also occured multiple times in the first 50,000 sentences of our bilingual corpus. The bilingual corpus that we used  was the German-English section of the Europarl corpus, version 2 (Koehn, 2002). We produced automatic alignments for it with the Giza++ toolkit (Och and Ney, 2003). Because we wanted to test our method independently of the quality of word alignment algorithms, we also developed a gold standard of word alignments for the set of phrases that we wanted to paraphrase.</Paragraph>
    <Section position="1" start_page="599" end_page="599" type="sub_section">
      <SectionTitle>
3.1 Manual alignment
</SectionTitle>
      <Paragraph position="0"> The gold standard alignments were created by highlighting all occurrences of the English phrase to paraphrase and manually aligning it with its German equivalent by correcting the automatic alignment, as shown in Figure 3a. All occurrences of its German equivalents were then highlighted, and aligned with their English translations (Figure 3b).</Paragraph>
      <Paragraph position="1"> The other words in the sentences were left with their automatic alignments.</Paragraph>
    </Section>
    <Section position="2" start_page="599" end_page="600" type="sub_section">
      <SectionTitle>
3.2 Paraphrase evaluation
</SectionTitle>
      <Paragraph position="0"> We evaluated the accuracy of each of the paraphrases that was extracted from the manually aligned data, as well as the top ranked paraphrases from the experimental conditions detailed below in Section 3.3. Because the acccuracy of paraphrases can vary depending on context, we substituted each Under control This situation is in check in terms of security.</Paragraph>
      <Paragraph position="1"> This situation is checked in terms of security.</Paragraph>
      <Paragraph position="2"> This situation is curbed in terms of security.</Paragraph>
      <Paragraph position="3"> This situation is curb in terms of security.</Paragraph>
      <Paragraph position="4"> This situation is limit in terms of security.</Paragraph>
      <Paragraph position="5"> This situation is slow down in terms of security.</Paragraph>
      <Paragraph position="6">  shows the paraphrases for under control substituted into one of the sentences in which it occurred. We created a total of 289 such evaluation sets, with a total of 1366 unique sentences created through substitution. null We had two native English speakers produce judgments as to whether the new sentences preserved the meaning of the original phrase and as to whether they remained grammatical. Paraphrases that were judged to preserve both meaning and grammaticality were considered to be correct, and examples which failed on either judgment were considered to be incorrect.</Paragraph>
      <Paragraph position="7"> In Figure 4 in check, checked, and curbed were  under control checked, curb, curbed, in check, limit, slow down sooner or later at some point, eventually military force armed forces, defence, force, forces, military forces, peace-keeping personnel long ago a little time ago, a long time, a long time ago, a lot of time, a while ago, a while back, far, for a long time, for some time, for such a long time, long, long period of time, long term, long time, long while, overdue, some time, some time ago green light approval, call, go-ahead, indication, message, sign, signal, signals, formal go-ahead great care a careful approach, greater emphasis, particular attention, special attention, specific attention, very careful first half first six months crystal clear absolutely clear, all clarity, clear, clearly, in great detail, no mistake, no uncertain, obvious, obviously, particularly clear, perfectly clear, quite clear, quite clearly, quite explicitly, quite openly, very clear, very clear and comprehensive, very clearly, very sure, very unclear, very well carbon dioxide co2 at work at the workplace, employment, held, holding, in the work sphere, operate, organised, taken place, took place, working  judged to be correct and curb, limit and slow down were judged to be incorrect. The inter-annotator agreement for these judgements was measured at k = 0.605, which is conventionally interpreted as &amp;quot;good&amp;quot; agreement.</Paragraph>
    </Section>
    <Section position="3" start_page="600" end_page="600" type="sub_section">
      <SectionTitle>
3.3 Experiments
</SectionTitle>
      <Paragraph position="0"> We evaluated the accuracy of top ranked paraphrases when the paraphrase probability was calculated using: null  1. The manual alignments, 2. The automatic alignments, 3. Automatic alignments produced over multiple corpora in different languages, 4. All of the above with language model reranking. null 5. All of the above with the candidate paraphrases limited to the same sense as the original phrase.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="600" end_page="602" type="metho">
    <SectionTitle>
4 Results
</SectionTitle>
    <Paragraph position="0"> We report the percentage of correct translations (accuracy) for each of these experimental conditions. A summary of these can be seen in Table 3. This section will describe each of the set-ups and the score reported in more detail.</Paragraph>
    <Section position="1" start_page="600" end_page="601" type="sub_section">
      <SectionTitle>
4.1 Manual alignments
</SectionTitle>
      <Paragraph position="0"> Table 2 gives a set of example paraphrases extracted from the gold standard alignments. The italicized paraphrases are those that were assigned the highest probability by Equation 2, which chooses a single best paraphrase without regard for context. The 289 sentences created by substituting the italicized paraphrases in for the original phrase were judged to be correct an average of 74.9% of the time.</Paragraph>
      <Paragraph position="1"> Ignoring the constraint that the new sentences remain grammatically correct, these paraphrases were judged to have the correct meaning 84.7% of the time. This suggests that the context plays a more important role with respect to the grammaticality of substituted paraphrases than with respect to their meaning.</Paragraph>
      <Paragraph position="2"> In order to allow the surrounding words in the sentence to have an influence on which paraphrase was selected, we re-ranked the paraphrase probabilities based on a trigram language model trained on the entire English portion of the Europarl corpus. Paraphrases were selected from among all those in Table 2, and not constrained to the italicized phrases. In the case of the paraphrases extracted from the manual word alignments, the language model re-ranking had virtually no influence, and resulted in a slight dip in accuracy to 71.7%</Paragraph>
    </Section>
    <Section position="2" start_page="601" end_page="601" type="sub_section">
      <SectionTitle>
4.2 Automatic alignments
</SectionTitle>
      <Paragraph position="0"> In this experimental condition paraphrases were extracted from a set of automatic alignments produced by running Giza++ over a set of 1,036,000 German-English sentence pairs (roughly 28,000,000 words in each language). When the single best paraphrase (irrespective of context) was used in place of the original phrase in the evaluation sentence the accuracy reached 48.9% which is quite low compared to the 74.9% of the manually aligned set.</Paragraph>
      <Paragraph position="1"> As with the manual alignments it seems that we are selecting phrases which have the correct meaning but are not grammatical in context. Indeed our judges thought the meaning of the paraphrases to be correct in 64.5% of cases. Using a language model to select the best paraphrase given the context reduces the number of ungrammatical examples and gives an improvement in quality from 48.9% to 55.3% correct.</Paragraph>
      <Paragraph position="2"> These results suggest two things: that improving the quality of automatic alignments would lead to more accurate paraphrases, and that there is room for improvement in limiting the paraphrases by their context. We address these points below.</Paragraph>
    </Section>
    <Section position="3" start_page="601" end_page="601" type="sub_section">
      <SectionTitle>
4.3 Using multiple corpora
</SectionTitle>
      <Paragraph position="0"> Work in statistical machine translation suggests that, like many other machine learning problems, performance increases as the amount of training data increases. Och and Ney (2003) show that the accuracy of alignments produced by Giza++ improve as the size of the training corpus increases.</Paragraph>
      <Paragraph position="1"> Since we used the whole of the German-English section of the Europarl corpus, we could not try improving the alignments by simply adding more German-English training data. However, there is nothing that limits our paraphrase extraction method to drawing on candidate paraphrases from a single target language. We therefore re-formulated the paraphrase probability to include multiple corpora, as follows:</Paragraph>
      <Paragraph position="3"> where C is a parallel corpus from a set of parallel corpora.</Paragraph>
      <Paragraph position="4"> For this condition we used Giza++ to align the French-English, Spanish-English, and Italian-English portions of the Europarl corpus in addition to the German-English portion, for a total of around 4,000,000 sentence pairs in the training data.</Paragraph>
      <Paragraph position="5"> The accuracy of paraphrases extracted over multiple corpora increased to 55%, and further to 57.4% when the language model re-ranking was included.</Paragraph>
    </Section>
    <Section position="4" start_page="601" end_page="602" type="sub_section">
      <SectionTitle>
4.4 Controlling for word sense
</SectionTitle>
      <Paragraph position="0"> As mentioned in Section 1, the way that we extract paraphrases is the converse of the methodology employed in word sense disambiguation work that uses parallel corpora (Diab and Resnik, 2002). The assumption made in the word sense disambiguation work is that if a source language word aligns with different target language words then those words may represent different word senses. This can be observed in the paraphrases for at work in Table 2.</Paragraph>
      <Paragraph position="1"> The paraphrases at the workplace, employment, and in the work sphere are a different sense of the phrase than operate, held, and holding, and they are aligned with different German phrases.</Paragraph>
      <Paragraph position="2"> When we calculate the paraphrase probability we sum over different target language phrases. Therefore the English phrases that are aligned with the different German phrases (which themselves maybe indicative of different word senses) are mingled. Performance may be degraded since paraphrases that reflect different senses of the original phrase, and which therefore have a different meaning, are included in the same candidate set.</Paragraph>
      <Paragraph position="3">  We therefore performed an experiment to see whether improvement could be had by limiting the candidate paraphrases to be the same sense as the original phrase in each test sentence. To do this, we used the fact that our test sentences were drawn from a parallel corpus. We limited phrases to the same word sense by constraining the candidate paraphrases to those that aligned with the same target language phrase. Our basic paraphrase calculation was therefore:</Paragraph>
      <Paragraph position="5"> Using the foreign language phrase to identify the word sense is obviously not applicable in monolingual settings, but acts as a convenient stand-in for a proper word sense disambiguation algorithm here.</Paragraph>
      <Paragraph position="6"> When word sense is controlled in this way, the accuracy of the paraphrases extracted from the automatic alignments raises dramatically from 48.9% to 57% without language model re-ranking, and further to 61.9% when language model re-ranking was included.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="602" end_page="602" type="metho">
    <SectionTitle>
5 Related Work
</SectionTitle>
    <Paragraph position="0"> Barzilay and McKeown (2001) extract both singleand multiple-word paraphrases from a monolingual parallel corpus. They co-train a classifier to identify whether two phrases were paraphrases of each other based on their surrounding context. Two disadvantages of this method are that it requires identical bounding substrings, and has bias towards single words. For an evaluation set of 500 paraphrases, they report an average precision of 86% at identifying paraphrases out of context, and of 91% when the paraphrases are substituted into the original context of the aligned sentence. The results of our systems are not directly comparable, since Barzilay and McKeown (2001) evaluated their paraphrases with a different set of criteria (they asked judges whether to judge paraphrases based on &amp;quot;approximate conceptual equivalence&amp;quot;). Furthermore, their evaluation was carried out only by substituting the paraphrase in for the phrase with the identical context, and not in for arbitrary occurrences of the original phrase, as we have done.</Paragraph>
    <Paragraph position="1"> Lin and Pantel (2001) use a standard (nonparallel) monolingual corpus to generate paraphrases, based on dependancy graphs and distributional similarity. One strong disadvantage of this method is that their paraphrases can also have opposite meanings.</Paragraph>
    <Paragraph position="2"> Ibrahim et al. (2003) combine the two approaches: aligned monolingual corpora and parsing. They evaluated their system with human judges who were asked whether the paraphrases were &amp;quot;roughly interchangeable given the genre&amp;quot;, scored an average of 41% on a set of 130 paraphrases, with the judges all agreeing 75% of the time, and a correlation of 0.66. The shortcomings of this method are that it is dependent upon parse quality, and is limited by the rareness of the data.</Paragraph>
    <Paragraph position="3"> Pang et al. (2003) use parse trees over sentences in monolingual parallel corpus to identify paraphrases by grouping similar syntactic constituents. They use heuristics such as keyword checking to limit the over-application of this method. Our alignment method might be an improvement of their heuristics for choosing which constituents ought to be grouped.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML