<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1048"> <Title>Corpus-Based Identification of Non-Anaphoric Noun Phrases</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Most automated approaches to coreference resolution attempt to locate an antecedent for every potentially coreferent discourse entity (DE) in a text. The problem with this approach is that a large number of DEs may not have antecedents. While some discourse entities, such as pronouns, are almost always referential, definite descriptions may not be. (In this work, we define a definite description to be a noun phrase beginning with &quot;the.&quot;) Earlier work found that nearly 50% of definite descriptions had no prior referents (Vieira and Poesio, 1997), and we found that number to be even higher, 63%, in our corpus. Some non-anaphoric definite descriptions can be identified by looking for syntactic clues like attached prepositional phrases or restrictive relative clauses. But other definite descriptions are non-anaphoric because readers understand their meaning due to common knowledge.</Paragraph> <Paragraph position="1"> For example, readers of this paper will probably understand the real-world referents of &quot;the F.B.I.,&quot; &quot;the White House,&quot; and &quot;the Golden Gate Bridge.&quot; These are instances of definite descriptions that a coreference resolver does not need to resolve because they each fully specify a cognitive representation of the entity in the reader's mind.</Paragraph> <Paragraph position="2"> One way to address this problem is to create a list of all non-anaphoric NPs that could be used as a filter prior to coreference resolution, but hand-coding such a list is a daunting and intractable task. We propose a corpus-based mechanism to identify non-anaphoric NPs automatically. We will refer to non-anaphoric definite noun phrases as existential NPs (Allen, 1995). 
Our algorithm uses statistical methods to generate lists of existential noun phrases and noun phrase patterns from a training corpus.</Paragraph> <Paragraph position="3"> These lists are then used to recognize existential NPs in new texts.</Paragraph> </Section> </Paper>