<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-5008">
  <Title>Automatic generation of paraphrases to be used as translation references in objective evaluation measures of machine translation</Title>
  <Section position="5" start_page="0" end_page="57" type="metho">
    <SectionTitle>
3 The linguistic resource used
</SectionTitle>
    <Paragraph position="0"> The linguistic resource used in the experiment presented in this paper relies on the C-STAR collection of utterances called Basic Traveler's Expressions1. This is a multilingual resource of expressions from the travel and tourism domain that contains 162,318 aligned translations in several languages, among which English. The items are quite short as the following examples show (one line is one item in the corpus), and as the figures in Table 1 show.</Paragraph>
    <Paragraph position="1">  Number of Avg. size +- std. dev.</Paragraph>
    <Paragraph position="2"> negationslash= sentences in characters in words  Thank you so much. Keep the change.</Paragraph>
    <Paragraph position="3"> Bring plenty of lemon, please.</Paragraph>
    <Paragraph position="4"> Please tell me about some interesting places [near here.</Paragraph>
    <Paragraph position="5"> Thank you. Please sign here.</Paragraph>
    <Paragraph position="6"> How do you spell your name? The quality of this resource is of at least 99% correct sentences (p-value = 1.92%). The few incorrect sentences contain spelling errors or slight syntactical mistakes.</Paragraph>
  </Section>
  <Section position="6" start_page="57" end_page="59" type="metho">
    <SectionTitle>
4 Our paraphrasing methodology
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="57" end_page="57" type="sub_section">
      <SectionTitle>
4.1 Our algorithm
</SectionTitle>
      <Paragraph position="0"> The proposed method consists in two phases: firstly, paraphrase detection through equality of translation and secondly, paraphrase generation through linguistic commutations based on the data produced in the first phase: * Detection: find sentences which share a same translation in the multilingual resource (4.2); * Generation: produce new sentences by exploiting commutations (4.3); limit combinatorics by contiguity constraints (4.4). Each of the steps of the previous algorithm is explained in details in the following sections.</Paragraph>
    </Section>
    <Section position="2" start_page="57" end_page="57" type="sub_section">
      <SectionTitle>
4.2 Initialisation by paraphrase detection
</SectionTitle>
      <Paragraph position="0"> In a first phase we initialise our data by paraphrase detection. By definition, paraphrase is an equivalence in meaning, thus, different sentences having the same translation ought to be considered equivalent in meaning, i.e., they are paraphrases2. As the linguistic resource used 2This is basically the same approach as (OHTAKE and YAMAMOTO, 2003, p. 3 and 4).</Paragraph>
      <Paragraph position="1"> in the present experiment is a multilingual corpus, we have at our disposal the corresponding translations in different languages for each of its sentences. For instance, the following English sentences share a common Japanese translation shown in bold face below. Therefore, they are paraphrases.</Paragraph>
    </Section>
    <Section position="3" start_page="57" end_page="58" type="sub_section">
      <SectionTitle>
4.3 Commutation in proportional analogies for paraphrase generation
</SectionTitle>
      <Paragraph position="0"> for paraphrase generation In a second phase, we implement paraphrase generation. Any given sentence may share commutations with other sentences of the corpus. Such commutations are best seen in analogical relations that explicit syntagmatic and paradigmatic variations (de SAUSSURE, 1995, part 3, chap 4). For instance, the seed sentence A slice of pizza, please.</Paragraph>
      <Paragraph position="1">  I'd like a beer, please. : A beer, please. :: I'd like a slice of pizza, please. : A slice of pizza, please.</Paragraph>
      <Paragraph position="2"> I'd like a twin, please. : A twin, please. :: I'd like a slice of pizza, please. : A slice of pizza, please.</Paragraph>
      <Paragraph position="3"> I'd like a bottle of red wine, please. : A bottle of red wine, please. :: I'd like a slice of pizza, please. : A slice of pizza, please.</Paragraph>
      <Paragraph position="4">  the sentence A slice of pizza, please. (i) I'd like a beer, please. : A beer, please. :: I'd like a slice of pizza, please. : A slice of pizza, please.</Paragraph>
      <Paragraph position="5"> (ii) I'd like a beer, please. : Can I have a beer? :: I'd like a slice of pizza, please. : x (iii) I'd like a beer, please. : Can I have a beer? :: I'd like a slice of pizza, please. : Can I have a slice of pizza?  beer, please. with one of its paraphrases acquired during the detection phase: Can I have a beer? The last sentence of the proportional analogy becomes unknown. (iii) Solving the analogical equation, i.e., generating a paraphrase of A slice of pizza, please. enters in the analogies of Table 3. The replacement of some sentences with known paraphrases in such analogies allows us to produce new sentences. This explains why we needed some paraphrases to start with. For instance, by replacing the sentence: A beer, please.</Paragraph>
      <Paragraph position="6"> with the sentence: Can I have a beer? in the first analogy of Table 3, one gets the following analogical equation, that is solved as indicated. null I'd like a beer, please. : Can I have a beer? :: I'd like a slice of pizza, please. : x = x = Can I have a slice of pizza? It is then legitimate to say that the produced sentence: null Can I have a slice of pizza? is a paraphrase of the seed sentence (see Table 4). Such a method alleviates the problem of creating templates from examples which would be used in an ulterior phase of generation (BARZI-LAY and LEE, 2003). Here, all examples in the corpus are potential templates in their actual raw form, with the advantage that the choice of the places where commutations may occur is left to proportional analogy.</Paragraph>
    </Section>
    <Section position="4" start_page="58" end_page="59" type="sub_section">
      <SectionTitle>
4.4 Limitation of combinatorics by contiguity constraints
</SectionTitle>
      <Paragraph position="0"> contiguity constraints During paraphrase generation, spurious sentences may be produced. For instance, the replacement in the previous analogy, of the sentence: A beer, please.</Paragraph>
      <Paragraph position="1"> by the following paraphrase detected during the first phase: A bottle of beer, please.</Paragraph>
      <Paragraph position="2"> produces the unfortunate sentence: [?]A bottle of slice of pizza, please. Moreover, as no complete and valid formalisation of linguistic analogies has yet been proposed, the algorithm used (LEPAGE, 1998) may deliver such unacceptable strings as:  43 Could we have a table in the corner? 43 I'd like a table in the corner.</Paragraph>
      <Paragraph position="3"> 43 We would like a table in the corner. 28 Can we have a table in the corner? 5 Can I get a table in the corner? 5 In the corner, please.</Paragraph>
      <Paragraph position="4"> 4 We'd like to sit in the corner.</Paragraph>
      <Paragraph position="5"> 2 I'd like to sit in the corner.</Paragraph>
      <Paragraph position="6"> 2 I would like a table in the corner. 2 We'd like a table in the corner.</Paragraph>
      <Paragraph position="7"> 1 I'd prefer a table in the corner.</Paragraph>
      <Paragraph position="8"> 1 I prefer a table in the corner.</Paragraph>
      <Paragraph position="9">  N-sequences (N = 20) are struck out. Notice that the seed sentence itself has been generated again by the method (4th sentence from the top). The figures on the left are the frequencies with which the sentence has been generated.</Paragraph>
      <Paragraph position="10"> [?]A slice of pizzthe, pleaset for tha, please. In order to ensure a very high rate of well-formedness among the sentences produced, we require a method that extracts well-formed sentences from the set of generated sentences with a very high precision (to the possible prejudice of the recall).</Paragraph>
      <Paragraph position="11"> To this end, we eliminate all sentences containing sequences of characters of a given length unseen in the original data3. It is clear that, by adequately tuning the given length, such a method will be able to retain a satisfactory number of sentences that will be undoubtedly correct, at least in the sense of the linguistic resource.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="59" end_page="62" type="metho">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> During the first phase of paraphrase detection, 26,079 sentences (out of 97,769) got at least one possibly incorrect paraphrase candidate with an average of 5.35 paraphrases by sentence. However, the distribution is not uniform: 60 sentences get more than 100 paraphrases.</Paragraph>
    <Paragraph position="1"> The maximum is reached with 529 paraphrases for the sentence Sure. Such a sentence has a variety of meanings depending on the context, which explains the high number of its possible paraphrases as illustrated below.</Paragraph>
    <Paragraph position="2"> 3This is conform to the trend of using N-sequences to assess the quality of outputs of various NLP systems like (LIN and HOVY, 2003) for summary generation, (DODDINGTON, 2002) for machine translation, etc..</Paragraph>
    <Paragraph position="3"> Sure. Here you are.</Paragraph>
    <Paragraph position="4"> Sure. This way, please.</Paragraph>
    <Paragraph position="5"> Certainly, go ahead, please.</Paragraph>
    <Paragraph position="6"> I'm sure I will.</Paragraph>
    <Paragraph position="7"> No, I don't mind a bit.</Paragraph>
    <Paragraph position="8"> Okay. I understand quite well, thank you.</Paragraph>
    <Paragraph position="9"> Sounds fine to me.</Paragraph>
    <Paragraph position="10"> Yes, I do.</Paragraph>
    <Paragraph position="11"> . . .</Paragraph>
    <Paragraph position="12"> However, such an example shows also that the more the paraphrases obtained by this method, the less reliable their quality.</Paragraph>
    <Paragraph position="13"> During the second phase of paraphrase generation, the method generated 4,495,266 English sentences on our linguistic resource. An inspection of a sample of 400 sentences shows that the quality lies around 23.6% of correct sentences (pvalue = 1.19%) in syntax and meaning. The set of paraphrase candidates obtained on an example sentence are shown in Table 5.</Paragraph>
    <Paragraph position="14"> To ensure fluency of expression and adequacy of meaning, the method then filtered out any sentence containing an N-sequence unseen in the corpus (see Section 4.4). The best value for N that allowed us to obtain a quality rate at the same level to that of the original linguistic resource was 20.</Paragraph>
    <Paragraph position="15"> As a final result, the number of seed sentences for which we obtained at least one paraphrase is 16,153. With a total number of 147,708 para- null phrases generated4, the average number of paraphrases per sentence is 8.65 with a standard deviation of 16.98 which means that the distribution is unbalanced. The graph on the left of Figure 1 shows the number of seed sentences with the same number of paraphrases. while the graph on the right shows the number of paraphrases against the length of the seed sentence in words.</Paragraph>
    <Paragraph position="16"> 6 Quality of the generated paraphrases</Paragraph>
    <Section position="1" start_page="60" end_page="60" type="sub_section">
      <SectionTitle>
6.1 Well-formedness of the generated paraphrases
</SectionTitle>
      <Paragraph position="0"> paraphrases The grammatical quality of the paraphrase candidates obtained was evaluated on a sample of 400 sentences: at least 99% of the paraphrases may be considered grammatically correct (p-value = 2.22%). This quality is approximately the same as that of the original resource: at least 99% (pvalue = 1.92%).</Paragraph>
      <Paragraph position="1"> An overview of the errors in the generated paraphrases suggests that they do not differ from the ones in the original data. For instance, one notes that an article is lacking before the noun phrase tourist area in the following sentence: Where is tourist area? Although we are not able to trace the error back to its origin, such a mistake is certainly due to a commutation with a sentence like: Where is information office? that contains a similar mistake and that is found in the original linguistic resource.</Paragraph>
    </Section>
    <Section position="2" start_page="60" end_page="62" type="sub_section">
      <SectionTitle>
6.2 Equivalence in content between generated paraphrases and seed sentences
</SectionTitle>
      <Paragraph position="0"> generated paraphases and seed sentence The semantic quality of the paraphrases produced was also checked by hand on a sample of 470 paraphrases that were compared with their corresponding seed sentence. We not only checked for strict equivalence, but also for meaning entail- null The following three paraphrases on the left with their corresponding seed sentences on the right are examples that were judged to be strict equivalences.</Paragraph>
      <Paragraph position="1"> Can I see some ID? Could you show me some ID? Please exchange this. Could you exchangethis, please. Please send it to Japan.</Paragraph>
      <Paragraph position="2"> Send it to Japan, please.</Paragraph>
      <Paragraph position="3"> The following are examples in which there is a lack of information either in the paraphrase produced or in the seed sentence. This is precisely what entailment is.</Paragraph>
      <Paragraph position="4"> Coke, please. Miss, could I have acoke? I want to change money. Please exchange this.</Paragraph>
      <Paragraph position="5"> Sunny-side up, please. Fried eggs, sunny-sideup, please. The result of the sampling is that the paraphrase candidates can be considered valid paraphrases in at least 94% of the cases either by equivalence or entailment (p-value = 3.05%). The following sentences exemplify the remaining cases where two sentences were not judged valid paraphrases of one another.</Paragraph>
      <Paragraph position="6"> Do you charge extra if I drop it off? There will be a drop off charge.</Paragraph>
      <Paragraph position="7"> Here's one for you, sir. You can get one here.</Paragraph>
      <Paragraph position="8"> There it is. Yes, please sit down.</Paragraph>
      <Paragraph position="9"> Table 6 summarises the distribution of paraphrase candidates according to the abovementionned classification.</Paragraph>
      <Paragraph position="10">  phrases per seed sentence (lower graphs). In these graphs, each point is the score of a set of paraphrases against the seed sentence they were produced for. Lower scores indicate a greater lexical and syntactical variation in paraphrases. The connected points show mean values along the axis of abscissae.</Paragraph>
    </Section>
    <Section position="3" start_page="62" end_page="62" type="sub_section">
      <SectionTitle>
7.1 Objective measures
</SectionTitle>
      <Paragraph position="0"> We assessed the lexical and syntactical variation of our paraphrases on a sample of 400 seed sentences using BLEU and NIST. On the contrary to evaluation of machine translation where the goal is to obtain high scores in BLEU and NIST, our goal here, when comparing a paraphrase to the seed sentence it has been produced for, is to get low scores. Indeed, high scores reflect some high correlation with translation references that is a lesser variation. As our goal is precisely to prepare data for evaluation with BLEU and NIST, it is thus to generate sets of paraphrases that would contain as much variation as possible to express the same meaning as the seed sentences, i.e. we look for low scores in BLEU and NIST.</Paragraph>
      <Paragraph position="1"> Again, all this can be done safely as long as one is sure that the sentences compared are valid sentences and valid paraphrases. This is the case of our data, as we have already shown that the paraphrases produced are 99% grammatically and semantically correct sentences and that they are paraphrases of their corresponding seed sentences in 94% of the cases.</Paragraph>
      <Paragraph position="2"> As for the meaning of BLEU and NIST, they are supposed to measure complementary characteristics of translations: namely fluency and adequacy (AKIBA et al., 2004, p. 7). BLEU tends to measure the quality in form of expression (fluency), while NIST6 tends to measure quality in meaning (adequacy).</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>