<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0214"> <Title>Discourse Annotation in the Monroe Corpus</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Discourse information plays an important part in natural language systems performing tasks such as text summarization, question answering, and collaborative planning. But the type of discourse information that is relevant varies widely depending on domain, genre, number of participants, whether the language is written or spoken, etc. Therefore, empirical analysis is necessary to determine commonalities in the variations of discourse and to develop general-purpose algorithms for discourse analysis.</Paragraph> <Paragraph position="1"> The heightened interest in human language technologies in the last decade has sparked several discourse annotation projects. Although there has been a lot of research, many of the projects focus on a few specific areas of discourse relevant to their respective systems. For example, a text summarization system working on texts from the web would not need to know about dialogue modeling, grounding, or prosody. In contrast, for a spoken dialogue system that collaborates with a user, such information is crucial but the organization of web pages is not.</Paragraph> <Paragraph position="2"> In this paper we describe our work in the Monroe Project, an effort targeting the production and use of a linguistically rich annotated corpus of a series of task-oriented spoken dialogues in an emergency rescue domain. Our project differs from past projects involving reference annotation and discourse segmentation in that the semantic and discourse information is generated automatically. Most other work in this area has had minimal semantic or speech act tagging, if any at all, since such information can be quite labor intensive to annotate. 
In addition, our domain is spoken language, which is rarely annotated for the information we are providing. We describe our research on reference resolution and discourse segmentation using the annotated corpus, as well as the software tools we have developed to support different aspects of the annotation tasks.</Paragraph> </Section> </Paper>