<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1022"> <Title>A Machine Learning Approach to Pronoun Resolution in Spoken Dialogue</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Corpus-based methods and machine learning techniques have been applied to anaphora resolution in written text with considerable success (Soon et al., 2001; Ng &amp; Cardie, 2002, among others). It has been demonstrated that systems based on these approaches achieve performance comparable to that of hand-crafted systems. Since such systems can easily be applied to new domains, it also seems feasible to port a given corpus-based anaphora resolution system from written text to spoken dialogue. This paper describes the extensions and adaptations needed to apply our anaphora resolution system (Müller et al., 2002; Strube et al., 2002) to pronoun resolution in spoken dialogue.</Paragraph> <Paragraph position="1"> There are important differences between written text and spoken dialogue that have to be accounted for. The most obvious difference is that spoken dialogue contains an abundance of (personal and demonstrative) pronouns with non-NP-antecedents or with no antecedent at all. Corpus studies have shown that a significant number of pronouns in spoken dialogue have non-NP-antecedents: Byron &amp; Allen (1998) report that about 50% of the pronouns in the TRAINS93 corpus have non-NP-antecedents. Eckert &amp; Strube (2000) note that only about 45% of the pronouns in a set of Switchboard dialogues have NP-antecedents. The remaining 55% consist of 22% with non-NP-antecedents and 33% with no antecedent. 
These studies suggest that the performance of a pronoun resolution algorithm can be improved considerably by enabling it to also resolve pronouns with non-NP-antecedents.</Paragraph> <Paragraph position="2"> Because of the difficulties a pronoun resolution algorithm encounters in spoken dialogue, previous approaches were applied only to tiny domains, required deep semantic analysis and discourse processing, and relied on hand-crafted knowledge bases.</Paragraph> <Paragraph position="3"> In contrast, we build on our existing anaphora resolution system and incrementally add new features specifically devised for spoken dialogue. In this way, we are able to identify relatively powerful yet computationally cheap features. To our knowledge, the work presented here describes the first implemented system for corpus-based anaphora resolution that also handles non-NP-antecedents.</Paragraph> </Section> </Paper>