<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1612">
  <Title>Explorations in Sentence Fusion</Title>
  <Section position="6" start_page="5" end_page="5" type="evalu">
    <SectionTitle>
4 Merging and generation
</SectionTitle>
    <Paragraph position="0"> The remaining two steps in the sentence fusion process are merging and generation. In general, merging amounts to deciding which information from either sentence should be preserved, whereas generation involves producing a grammatically correct surface representation. To get an idea of the baseline performance, we explored a simple, somewhat naive string-based approach. Below, pseudocode is shown for merging two dependency trees in order to obtain restatements. Given a labeled alignment A between dependency graphs D and Dprime, if there is a restates relation between node v from D and node vprime from Dprime, we add the string realization of vprime as an alternative to those of v.</Paragraph>
    <Paragraph position="1">  where MOD-DEP-REL is the set of dependency relations between a node and a modifier (e.g. head/mod and head/predm).</Paragraph>
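The pseudocode itself did not survive extraction; the following is a minimal Python sketch of the restatement-merging step as the text describes it. The node class, the alignment format, and the set name `MOD_DEP_REL` are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a minimal node and alignment representation
# is assumed here; it does not reproduce the authors' actual code.

MOD_DEP_REL = {"head/mod", "head/predm"}  # modifier relations named in the text

class DepNode:
    """A dependency-tree node carrying alternative surface strings."""
    def __init__(self, realization):
        self.realizations = [realization]

def merge_restatements(alignment):
    """For each 'restates' edge between node v in D and v' in D',
    add the counterpart's realization as an alternative to v's.
    (Per the text, the procedure is then repeated in the other
    direction, from D' into D.)"""
    for v, v_prime, label in alignment:
        if label == "restates":
            for s in v_prime.realizations:
                if s not in v.realizations:
                    v.realizations.append(s)
```
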
    <Paragraph position="2"> Each procedure is repeated twice, once adding substrings from D into Dprime and once the other way around. Next, we traverse the dependency trees and generate all string realizations, extending the list of variants for each node that has multiple realizations. Finally, we filter out multiple copies of the same string, as well as strings that are identical to the input sentences.</Paragraph>
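The traversal-and-filtering step above can be sketched as follows. The dict-based tree encoding, the left-to-right space-joining of substrings, and the function names are simplifying assumptions made for illustration.

```python
from itertools import product

# Illustrative sketch: nodes are dicts with alternative string
# realizations (as produced by the merging step) and ordered children.

def realize(node):
    """Yield every surface variant of a subtree: each head realization
    combined with each combination of child variants, space-joined
    left-to-right (a simplifying assumption about word order)."""
    child_variants = [list(realize(c)) for c in node["children"]]
    for head in node["realizations"]:
        for combo in product(*child_variants):
            yield " ".join([head, *combo])

def generate(tree, inputs):
    """Collect distinct variants, dropping copies of the same string
    and strings identical to the input sentences."""
    seen = set(inputs)
    out = []
    for s in realize(tree):
        if s not in seen:
            seen.add(s)
            out.append(s)
    return out
```
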
    <Paragraph position="3"> This procedure for merging and generation was applied to the 35 sentence pairs from the consensus alignment of chapter one of &quot;Le Petit Prince&quot;. Overall this gave rise to 194 restatements, 62 specifications and 177 generalizations, with some sentence pairs leading to many variants and others to none at all. Some output showed only minor variations, for instance the substitution of a synonym. However, other output revealed surprisingly adequate generalizations or specifications. Examples of good and bad output are given in Figure 2.</Paragraph>
    <Paragraph position="4"> As expected, many of the resulting variants are ungrammatical, because constraints on word order, agreement or subcategorisation are violated. Following work on statistical surface generation [Langkilde and Knight, 1998] and other work on sentence fusion [Barzilay, 2003], we tried to filter ungrammatical variants with an n-gram language model. The Cambridge-CMU Statistical Modeling Toolkit v2 was used to train a 3-gram model on over 250M words from the Twente Newscorpus, using back-off and Good-Turing smoothing.</Paragraph>
    <Paragraph position="5"> Variants were ranked in order of increasing entropy. We found, however, that the ranking was often inadequate, showing ungrammatical variants at the top and grammatical variants in the lower regions.</Paragraph>
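Ranking by per-word entropy can be sketched as follows. A toy add-one-smoothed trigram model stands in here for the paper's toolkit-trained model with back-off and Good-Turing smoothing; the counting scheme and sentence markers are assumptions for illustration.

```python
import math
from collections import Counter

def train_trigrams(corpus_sents):
    """Count trigrams and their bigram prefixes over padded sentences.
    (Toy stand-in for a toolkit-trained model; add-one smoothing below
    replaces the paper's back-off / Good-Turing setup.)"""
    tri, bi, vocab = Counter(), Counter(), set()
    for sent in corpus_sents:
        toks = ["<s>", "<s>", *sent.split(), "</s>"]
        vocab.update(toks)
        for i in range(len(toks) - 2):
            tri[tuple(toks[i:i + 3])] += 1
            bi[tuple(toks[i:i + 2])] += 1
    return tri, bi, len(vocab)

def entropy(sent, tri, bi, V):
    """Per-word cross-entropy in bits under the add-one-smoothed
    trigram model; lower means the variant looks more fluent."""
    toks = ["<s>", "<s>", *sent.split(), "</s>"]
    n = len(toks) - 2
    logp = 0.0
    for i in range(n):
        t = tuple(toks[i:i + 3])
        logp += math.log2((tri[t] + 1) / (bi[t[:2]] + V))
    return -logp / n

def rank(variants, tri, bi, V):
    """Order variants by increasing entropy, as in the text."""
    return sorted(variants, key=lambda s: entropy(s, tri, bi, V))
```
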
    <Paragraph position="6"> To gain some insight into the general performance of the merging and generation strategy, we performed a small evaluation in which the two authors independently judged all generated variants in terms of three categories: 1. Perfect: no problems in either semantics or syntax; 2. Acceptable: understandable, but with some minor flaws in semantics or grammar; 3. Nonsense: serious problems in semantics or grammar. [A table lost in extraction reports the output as the number of sentences in each of the three categories (perfect, acceptable and nonsense) per judge (J1 and J2), broken down into restatements, generalizations and specifications.] Inter-judge agreement for this task is .75, indicating moderate to good agreement between the judges. Roughly half of the generated restatements and generalizations are perfect, while this is not the case for specifications; we have no plausible explanation for this yet. We think we can conclude from this evaluation that sentence fusion is a viable and interesting approach for producing restatements, generalizations and specifications. However, there is certainly further work to do: the procedure for merging dependency graphs should be extended, and the realization model clearly requires more linguistic sophistication, in particular to deal with word order, agreement and subcategorisation constraints.</Paragraph>
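The .75 agreement figure can be reproduced mechanically. Assuming the statistic is Cohen's kappa for two judges over the three categories (the section does not name the measure, so this is an assumption, and the example judgments below are invented for illustration):

```python
from collections import Counter

def cohen_kappa(j1, j2):
    """Cohen's kappa for two annotators labeling the same items:
    observed agreement corrected for chance agreement estimated
    from each annotator's marginal label distribution."""
    assert len(j1) == len(j2) and j1
    n = len(j1)
    po = sum(a == b for a, b in zip(j1, j2)) / n          # observed
    c1, c2 = Counter(j1), Counter(j2)
    pe = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)  # chance
    return (po - pe) / (1 - pe)
```
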
  </Section>
</Paper>