<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0504">
  <Title>Summarization of Noisy Documents: A Pilot Study</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> In this paper, we have discussed some of the challenges in summarizing noisy documents. In particular, we broke down the summarization process into four steps: sentence boundary detection, preprocessing (part-of-speech tagging and syntactic parsing), extraction, and editing.</Paragraph>
    <Paragraph position="1"> We tested each step on noisy documents and analyzed the errors that arose. We also studied how the quality of summarization is affected by the noise level and the errors made at each stage of processing.</Paragraph>
    <Paragraph position="2"> To improve the performance of noisy document summarization, we suggest extracting keywords or phrases rather than full sentences, especially when summarizing documents with high levels of noise. We also propose combining text with other sources of information, such as document layout cues, when summarizing noisy documents. In certain cases, it will be important to assess the noise level in a document; we have begun exploring this question as well. Our future plans include developing robust techniques to address the issues outlined in this paper.</Paragraph>
    <Paragraph position="3"> Lastly, we regard presentation and user interaction as crucial components of real-world summarization systems.</Paragraph>
    <Paragraph position="4"> Given that noisy documents, and hence their summaries, may contain errors, it is important to find the best ways of displaying such information so that users can proceed with confidence, knowing that the summary truly represents the document(s) in question.</Paragraph>
  </Section>
</Paper>