<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0212"> <Title>Annotation and Data Mining of the Penn Discourse TreeBank</Title> <Section position="5" start_page="0" end_page="0" type="concl"> <SectionTitle> 4 Summary </SectionTitle> <Paragraph position="0"> In this paper we have presented the Penn Discourse TreeBank (PDTB), a large-scale discourse-level annotated corpus that is being developed towards the creation of a multi-layered annotated corpus integrating the Penn TreeBank, PropBank and the PDTB. (Footnote 11: We thank an anonymous reviewer for pointing this out.)</Paragraph> <Paragraph position="1"> The PDTB encodes low-level discourse structure information, marking discourse connectives as indicators of discourse relations, together with their arguments. We have reported high inter-annotator agreement for the PDTB annotation. Our data mining experience and preliminary results show that the multi-layered corpus is a rich source of information that can be exploited towards the development of powerful and efficient natural language understanding and generation systems as well as towards large-scale corpus-based research.</Paragraph> </Section> </Paper>