<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1007"> <Title>Automatic Detection of Nonreferential It in Spoken Multi-Party Dialog</Title> <Section position="2" start_page="0" end_page="49" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> This paper describes an implemented system for the detection of nonreferential it in spoken multi-party dialog. The system has been developed on the basis of meeting transcriptions from the ICSI Meeting Corpus (Janin et al., 2003), and it is intended as a preprocessing component for a coreference resolution system in the DIANA-Summ dialog summarization project. Consider the following utterance: MN059: Yeah. Yeah. Yeah. I'm sure I could learn a lot about um, yeah, just how to - how to come up with these structures, cuz it's - it's very easy to whip up something quickly, but it maybe then makes sense to to me, but not to anybody else, and - and if we want to share and integrate things, they must - well, they must be well designed really. (Bed017) In this example, only one of the three instances of it is a referential pronoun: The first it appears in the reparandum part of a speech repair (Heeman & Allen, 1999). It is replaced by a subsequent alteration and is thus not part of the final utterance. The second it is the subject of an extraposition construction and serves as the placeholder for the postposed infinitive phrase to whip up something quickly. Only the third it is a referential pronoun which anaphorically refers to something.</Paragraph> <Paragraph position="1"> The task of the system described in the following is to identify and filter out nonreferential instances of it, like the first and second one in the example. Preventing these instances from triggering the search for an antecedent improves the precision of a coreference resolution system.</Paragraph> <Paragraph position="2"> Up to the present, coreference resolution has mostly been done on written text. 
In this domain, the detection of nonreferential it has by now become a standard preprocessing step (e.g. Ng & Cardie (2002)). In the few works that exist on coreference resolution in spoken language, on the other hand, the problem could be ignored, because almost none of these aimed at developing a system that could handle unrestricted input. Eckert & Strube (2000) focus on an unimplemented algorithm for determining the type of antecedent (mostly NP vs. non-NP), given an anaphoric pronoun or demonstrative. The system of Byron (2002) is implemented, but deals mainly with how referents for already identified discourse-deictic anaphors can be created. Finally, Strube & Müller (2003) describe an implemented system for resolving third-person pronouns in spoken dialog, but they also exclude nonreferential it from consideration. In contrast, the present work is part of a project to develop a coreference resolution system that, in its final implementation, can handle unrestricted multi-party dialog. In such a system, no a priori knowledge is available about whether an instance of it is referential or not.</Paragraph> <Paragraph position="3"> The remainder of this paper is structured as follows: Section 2 describes the current state of the art for the detection of nonreferential it in written text. Section 3 describes our corpus of transcribed spoken dialog. It also reports on the annotation that we performed in order to collect training and test data for our machine learning experiments. The annotation also offered interesting insights into how reliably humans can identify nonreferential it in spoken language, a question that, to our knowledge, has not been addressed before.</Paragraph> <Paragraph position="4"> Section 4 describes the setup and results of our machine learning experiments, and Section 5 contains conclusions and future work.</Paragraph> </Section> </Paper>