<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1048">
  <Title>Corpus-Based Identification of Non-Anaphoric Noun Phrases</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Most automated approaches to coreference resolution attempt to locate an antecedent for every potentially coreferent discourse entity (DE) in a text. The problem with this approach is that a large number of DEs may not have antecedents. While some discourse entities, such as pronouns, are almost always referential, definite descriptions may not be. (In this work, we define a definite description to be a noun phrase beginning with the word &quot;the&quot;.) Earlier work found that nearly 50% of definite descriptions had no prior referents (Vieira and Poesio, 1997), and we found that number to be even higher, 63%, in our corpus. Some non-anaphoric definite descriptions can be identified by looking for syntactic clues such as attached prepositional phrases or restrictive relative clauses. Other definite descriptions, however, are non-anaphoric because readers understand their meaning from common knowledge.</Paragraph>
    <Paragraph position="1"> For example, readers of this paper will probably understand the real-world referents of &quot;the F.B.I.,&quot; &quot;the White House,&quot; and &quot;the Golden Gate Bridge.&quot; These are instances of definite descriptions that a coreference resolver does not need to resolve, because each one fully specifies a cognitive representation of the entity in the reader's mind.</Paragraph>
    <Paragraph position="2"> One way to address this problem is to create a list of all non-anaphoric NPs that could be used as a filter prior to coreference resolution, but hand-coding such a list is a daunting and intractable task. We propose a corpus-based mechanism to identify non-anaphoric NPs automatically. We will refer to non-anaphoric definite noun phrases as existential NPs (Allen, 1995). Our algorithm uses statistical methods to generate lists of existential noun phrases and noun phrase patterns from a training corpus.</Paragraph>
    <Paragraph position="3"> These lists are then used to recognize existential NPs in new texts.</Paragraph>
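The filtering idea described above (lists of existential NPs and NP patterns applied before coreference resolution) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' statistical method: the names EXISTENTIAL_NPS, EXISTENTIAL_PATTERNS, is_definite, is_existential, and filter_candidates are hypothetical, and the lists are hand-written stand-ins for the lists the algorithm would learn from a training corpus.

```python
# Minimal sketch (hypothetical, not the authors' algorithm): a list of
# existential NPs plus NP patterns is used as a filter so that a coreference
# resolver only looks for antecedents of the remaining NPs.
import re

# Hypothetical stand-ins; in the described approach these lists would be
# generated statistically from a training corpus.
EXISTENTIAL_NPS = {"the f.b.i.", "the white house", "the golden gate bridge"}
# Hypothetical pattern: a definite NP with an attached "of" prepositional phrase.
EXISTENTIAL_PATTERNS = [re.compile(r"^the .* of .*$")]


def is_definite(phrase):
    """A definite description here is a noun phrase beginning with 'the'."""
    return phrase.lower().split()[:1] == ["the"]


def is_existential(phrase):
    """Return True if the phrase is on the list or matches a pattern."""
    key = phrase.lower().strip()
    if key in EXISTENTIAL_NPS:
        return True
    return any(p.match(key) for p in EXISTENTIAL_PATTERNS)


def filter_candidates(phrases):
    """Keep only the phrases that still need antecedent resolution."""
    return [ph for ph in phrases if not (is_definite(ph) and is_existential(ph))]


nps = ["the White House", "the senator", "the president of the company", "he"]
print(filter_candidates(nps))
```

Running it prints ['the senator', 'he']: "the White House" is on the list and "the president of the company" matches the hypothetical prepositional-phrase pattern, so neither is passed on to the resolver, while the pronoun and the unmatched definite NP are kept.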
  </Section>
</Paper>