<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0214"> <Title>Discourse Annotation in the Monroe Corpus</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 4 Results </SectionTitle> <Paragraph position="0"> Spoken dialogue is a very difficult domain to work with because utterances are often marred by disfluencies and speech repairs, and are frequently incomplete or ungrammatical. Speakers also interrupt each other. As a result, many empirical methods that work well in formal, structured domains such as newspaper texts or manuals tend to suffer. For example, many leading pronoun resolution methods achieve around 80% accuracy on a corpus of syntactically parsed Wall Street Journal articles (e.g., Tetreault, 2001; Ge et al., 1998), but in spoken dialogue the performance of these algorithms drops significantly (Byron, 2002).</Paragraph> <Paragraph position="1"> However, including semantic and discourse information improves performance.</Paragraph> <Paragraph position="2"> Our preliminary results show that using the semantic feature lists associated with each entity as a filter for reference increases performance from 44% to 59%. Adding discourse segmentation boosts that figure to 66% over some parts of the corpus.</Paragraph> </Section> </Paper>