<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1632">
  <Title>Using Linguistically Motivated Features for Paragraph Boundary Identification</Title>
  <Section position="8" start_page="272" end_page="272" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> In this paper, we proposed a novel approach to paragraph boundary identification based on linguistic features such as pronominalization, discourse cues and information structure. Our results are significantly better than all baselines and than a reimplementation of Sporleder &amp; Lapata's (2006) system, reaching an F-measure of about 59%.</Paragraph>
    <Paragraph position="1"> We investigated to what extent paragraph structure is determined by each of the three factors and concluded that it depends crucially on the use of pronouns and on information structure. Surprisingly, discourse cues did not prove useful for this task and even degraded the results, which we attribute to the extreme sparseness of the cues in our data.</Paragraph>
    <Paragraph position="2"> The best results were achieved by combining surface features (rel-Pos, word1, word2) with features capturing text cohesion. This indicates that paragraph boundary identification requires both features typically used for style analysis and features describing cohesive relations. Paragraph boundary identification is therefore a task that crosses the border between content and style.</Paragraph>
    <Paragraph position="3"> An obvious limitation of our study is that we trained and tested the algorithm on a single-genre domain in which pronouns are used extensively. Experimenting with other genres should show whether our features are in fact domain-dependent. In future work, we also want to experiment with a larger data set to determine whether discourse cues really do not correlate with paragraph boundaries. We then plan to move on to multi-document summarization, the application that motivates the research described here.</Paragraph>
    <Paragraph position="4"> Acknowledgments: This work has been funded by the Klaus Tschira Foundation, Heidelberg, Germany. The first author has been supported by a KTF grant (09.009.2004). We would also like to thank the three anonymous reviewers for their comments.</Paragraph>
  </Section>
</Paper>