<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1069">
  <Title>Probabilistic Text Structuring: Experiments with Sentence Ordering</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Structuring a set of facts into a coherent text is a non-trivial task which has received much attention in the area of concept-to-text generation (see Reiter and Dale 2000 for an overview). The structured text is typically assumed to be a tree (i.e., to have a hierarchical structure) whose leaves express the content being communicated and whose nodes specify how this content is grouped via rhetorical or discourse relations (e.g., contrast, sequence, elaboration).</Paragraph>
    <Paragraph position="1"> For domains with large numbers of facts and rhetorical relations, there can be more than one possible tree representing the intended content. These different trees will be realized as texts with different sentence orders or even paragraph orders and different levels of coherence. Finding the tree that yields the best possible text is effectively a search problem. One way to address it is by narrowing down the search space either exhaustively or heuristically.</Paragraph>
    <Paragraph position="2"> Marcu (1997) argues that global coherence can be achieved if constraints on local coherence are satisfied. The latter are operationalized as weights on the ordering and adjacency of facts and are derived from a corpus of naturally occurring texts. A constraint satisfaction algorithm is used to find the tree with maximal weights from the space of all possible trees. Mellish et al. (1998) advocate stochastic search as an alternative to exhaustively examining the search space. Rather than requiring a global optimum to be found, they use a genetic algorithm to select a tree that is coherent enough for people to understand (local optimum).</Paragraph>
    <Paragraph position="3"> The problem of finding an acceptable ordering does not arise solely in concept-to-text generation but also in the emerging field of text-to-text generation (Barzilay, 2003). Examples of applications that require some form of text structuring, are single- and multidocument summarization as well as question answering. Note that these applications do not typically assume rich semantic knowledge organized in tree-like structures or communicative goals as is often the case in concept-to-text generation. Although in single document summarization the position of a sentence in a document can provide cues with respect to its ordering in the summary, this is not the case in multidocument summarization where sentences are selected from different documents and must be somehow ordered so as to produce a coherent summary (Barzilay et al., 2002). Answering a question may also involve the extraction, potentially summarization, and ordering of information across multiple information sources.</Paragraph>
    <Paragraph position="4"> Barzilay et al. (2002) address the problem of information ordering in multidocument summarization and show that naive ordering algorithms such as majority ordering (selects most frequent orders across input documents) and chronological ordering (orders facts according to publication date) do not always yield coherent summaries although the latter produces good results when the information is eventbased. Barzilay et al. further conduct a study where subjects are asked to produce a coherent text from the output of a multidocument summarizer. Their results reveal that although the generated orders differ from subject to subject, topically related sentences always appear together. Based on the human study they propose an algorithm that first identifies topically related groups of sentences and then orders them according to chronological information.</Paragraph>
    <Paragraph position="5"> In this paper we introduce an unsupervised probabilistic model for text structuring that learns ordering constraints from a large corpus. The model operates on sentences rather than facts in a knowledge base and is potentially useful for text-to-text generation applications. For example, it can be used to order the sentences obtained from a multidocument summarizer or a question answering system.</Paragraph>
    <Paragraph position="6"> Sentences are represented by a set of informative features (e.g., a verb and its subject, a noun and its modifier) that can be automatically extracted from the corpus without recourse to manual annotation.</Paragraph>
    <Paragraph position="7"> The model learns which sequences of features are likely to co-occur and makes predictions concerning preferred orderings. Local coherence is thus operationalized by sentence proximity in the training corpus. Global coherence is obtained by greedily searching through the space of possible orders. As in the case of Mellish et al. (1998) we construct an acceptable ordering rather than the best possible one.</Paragraph>
    <Paragraph position="8"> We propose an automatic method of evaluating the orders generated by our model by measuring closeness or distance from the gold standard, a collection of orders produced by humans.</Paragraph>
    <Paragraph position="9"> The remainder of this paper is organized as follows. Section 2 introduces our model and an algorithm for producing a possible order. Section 3 describes our corpus and the estimation of the model parameters. Our experiments are detailed in Section 4. We conclude with a discussion in Section 5.</Paragraph>
  </Section>
class="xml-element"></Paper>