<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0212">
  <Title>Annotation and Data Mining of the Penn Discourse TreeBank</Title>
  <Section position="5" start_page="0" end_page="0" type="concl">
    <SectionTitle>
4 Summary
</SectionTitle>
    <Paragraph position="0"> In this paper we have presented the Penn Discourse TreeBank (PDTB), a large-scale discourse-level annotated corpus that is being developed towards the creation of a multi-layered annotated corpus integrating the Penn TreeBank, PropBank and the PDTB.</Paragraph>
    <Paragraph position="1"> The PDTB encodes low-level discourse structure information, marking discourse connectives as indicators of discourse relations, together with their arguments. We have reported high inter-annotator agreement for the PDTB annotation. Our data mining experience and preliminary results show that the multi-layered corpus is a rich source of information that can be exploited towards the development of powerful and efficient natural language understanding and generation systems, as well as towards large-scale corpus-based research. [Footnote 11: We thank an anonymous reviewer for pointing this out.]</Paragraph>
  </Section>
</Paper>