<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1069">
  <Title>Probabilistic Text Structuring: Experiments with Sentence Ordering</Title>
  <Section position="6" start_page="3" end_page="3" type="concl">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> In this paper we proposed a data-intensive approach to text coherence in which constraints on sentence ordering are learned from a corpus of domain-specific texts. (The summaries as well as the human data are available from http://www.cs.columbia.edu/~noemie/ordering/.)</Paragraph>
    <Paragraph position="1"> We experimented with different feature encodings and showed that lexical and syntactic information is important for the ordering task. Our results indicate that the model can successfully generate orders for texts taken from the corpus on which it is trained. The model also compares favorably with human performance on single- and multiple-document ordering tasks.</Paragraph>
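The kind of corpus-based ordering model summarized above can be illustrated as feature-pair transition counts over adjacent sentences. The following is a minimal sketch, not the paper's exact formulation: the function names, the `features` extractor, and the add-alpha smoothing are assumptions introduced here for illustration.

```python
import math
from collections import Counter

def train_transitions(ordered_texts, features):
    """Estimate feature-transition counts from adjacent sentences in
    correctly ordered training texts. `features(sent)` extracts e.g.
    lemmas or verbs (a hypothetical stand-in for the paper's encodings)."""
    pair_counts, prev_counts = Counter(), Counter()
    for text in ordered_texts:
        for prev, nxt in zip(text, text[1:]):
            for fp in features(prev):
                for fn in features(nxt):
                    pair_counts[(fp, fn)] += 1
                    prev_counts[fp] += 1
    return pair_counts, prev_counts

def order_logprob(text, features, pair_counts, prev_counts, alpha=0.5):
    """Score a candidate order: sum of smoothed log transition
    probabilities over the feature pairs of adjacent sentences."""
    logp = 0.0
    vocab = len(pair_counts) + 1  # crude smoothing denominator
    for prev, nxt in zip(text, text[1:]):
        for fp in features(prev):
            for fn in features(nxt):
                p = (pair_counts[(fp, fn)] + alpha) / (prev_counts[fp] + alpha * vocab)
                logp += math.log(p)
    return logp
```

Under this sketch, a correctly ordered text should receive a higher score than a scrambled version of the same sentences whenever the training corpus exhibits consistent ordering regularities.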
    <Paragraph position="2"> Our model operates on the surface level rather than on logical forms and is therefore suitable for text-to-text generation systems; it acquires ordering constraints automatically and can be easily ported to different domains and text genres. The model is particularly relevant for multidocument summarization, since it could provide an alternative to chronological ordering, especially for documents where publication date information is unavailable or uninformative (e.g., all documents carry the same date). We proposed Kendall's τ as an automated method for evaluating the generated orders.</Paragraph>
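Kendall's τ between a reference and a generated ordering can be computed directly from pairwise inversions. A minimal sketch (the function name and the list-of-sentences representation are illustrative):

```python
from itertools import combinations

def kendall_tau(reference, predicted):
    """Kendall's tau between two orderings of the same items:
    tau = 1 - 2 * inversions / (n choose 2), ranging from -1
    (inverse order) to 1 (identical order)."""
    rank = {item: i for i, item in enumerate(predicted)}
    n = len(reference)
    inversions = sum(
        1
        for a, b in combinations(reference, 2)
        if rank[a] > rank[b]  # pair appears in the opposite relative order
    )
    return 1.0 - 2.0 * inversions / (n * (n - 1) / 2)
```

Because the metric needs only the two orderings, it can score every generated order against the original author's order without running a comprehension experiment, which is exactly the role the paragraph above assigns to it.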
    <Paragraph position="3"> There are a number of issues that must be addressed in future work. So far our evaluation metric measures only order similarities or dissimilarities. This enables us to assess the importance of particular feature combinations automatically and to evaluate whether the model and the search algorithm generate potentially acceptable orders without having to run comprehension experiments each time. Such experiments, however, are crucial for determining how coherent the generated texts are and whether they convey the same semantic content as the originally authored texts. For multidocument summarization, comparisons between our model and alternative ordering strategies are important if we want to pursue this approach further.</Paragraph>
    <Paragraph position="4"> Several improvements can be made to the model itself. An obvious question is whether a trigram model would perform better than the model presented here. The greedy algorithm implements a search procedure with a beam of width one. In the future we plan to experiment with larger beam widths (e.g., two or three) and to take into account features that express semantic similarities across documents, either by relying on WordNet or on automatic clustering methods.</Paragraph>
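The greedy search with beam width one, and its proposed generalization to wider beams, can be sketched as follows. This is an illustrative implementation under assumptions: `trans_prob(prev, nxt)` stands in for the paper's feature-based transition model, and the hypothesis representation is invented here.

```python
import math

def beam_order(sentences, trans_prob, beam_width=1):
    """Order sentences left-to-right, keeping the `beam_width` best
    partial orders at each step (width 1 reduces to greedy search).
    `trans_prob(prev, nxt)` is assumed to return P(nxt | prev),
    with prev=None at the start of the text."""
    # each hypothesis: (log-probability, ordered prefix, remaining set)
    beam = [(0.0, [], frozenset(sentences))]
    for _ in range(len(sentences)):
        candidates = []
        for logp, prefix, remaining in beam:
            for s in remaining:
                p = trans_prob(prefix[-1] if prefix else None, s)
                candidates.append(
                    (logp + math.log(p), prefix + [s], remaining - {s})
                )
        # prune to the best `beam_width` partial orders
        candidates.sort(key=lambda h: h[0], reverse=True)
        beam = candidates[:beam_width]
    return beam[0][1]
```

A wider beam trades search time for the chance to recover from a locally attractive but globally poor first transition, which is the motivation for trying widths of two or three.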
  </Section>
</Paper>