<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0319">
  <Title>Probabilistic Coreference in Information</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Natural language information extraction (IE) systems take natural language texts as input and produce database templates populated with the information relevant to a particular application.</Paragraph>
    <Paragraph position="1"> These templates may be fed as input to a downstream system for which the IE system is only one of several sources of information. In such a scenario, the downstream system must fuse the incoming information from each of its sources, which requires resolving conflicts among them. To do so, the fusion system must know how reliable the information from each source is, so that unreliable information from one source can be disregarded in favor of highly reliable information from another.</Paragraph>
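One simple way a downstream fusion system might use per-source reliability is sketched below in Python. The `Report` structure, the `fuse_slot` function, the source names, and the reliability scores are all hypothetical; the rule it applies (sum the reliability behind each candidate value and keep the best-supported one) is an assumed illustration of reliability-based conflict resolution, not a method described in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One source's claim about a single template slot (hypothetical structure)."""

    source: str
    value: str
    reliability: float  # assumed: the source's reliability, a score in [0, 1]


def fuse_slot(reports):
    """Return the value backed by the greatest total reliability.

    Assumed fusion rule for illustration: reliabilities of reports that agree
    on a value are summed, so weakly supported values are overridden by
    better supported ones.
    """
    support = {}
    for r in reports:
        support[r.value] = support.get(r.value, 0.0) + r.reliability
    # max() returns the first maximal key, so ties go to the first value seen.
    return max(support, key=support.get)


if __name__ == "__main__":
    # Hypothetical conflicting reports about a company-name slot.
    reports = [
        Report("ie_system", "Acme Corp.", 0.35),
        Report("wire_feed", "Acme Inc.", 0.9),
    ]
    print(fuse_slot(reports))  # prints Acme Inc.
```

Summing rather than simply trusting the single most reliable report lets two moderately reliable sources that agree outweigh one more reliable source that disagrees; whether that is desirable depends on whether the sources are independent of one another.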
    <Paragraph position="2"> Figure 1 shows this scenario with a typical IE system, such as SRI's FASTUS system (Hobbs et al., 1996). The IE system has two components. The first consists of a series of phases that recognize domain-relevant patterns in the text and create from them templates representing event and entity descriptions. The second merges templates created from different phrases in the text that overlap in reference. The resulting set of templates constitutes a formal description of the state of affairs described in the text with respect to the application specification, and this description is then fed to the downstream system.</Paragraph>
    <Paragraph position="3"> As part of determining this state of affairs, the IE system must create templates describing the relevant entities reported in the text. This requires determining when two or more templates describe the same entity, since templates created from coreferring phrases need to be merged. An informal study we performed of FASTUS's processing of a set of texts indicates that the merging phase is where most of the ambiguities (as well as most of the errors) lie. However, most IE systems, including FASTUS, have pursued a deterministic merging strategy and report only a single possible state of affairs. This limitation makes it difficult for a downstream system to fuse the information with possibly contradictory information from other sources, because the IE system passes along no information about its certainty in the results, nor about possible alternative states of affairs and their associated levels of certainty.</Paragraph>
    <Paragraph position="4"> In this paper, we consider the problem of assigning a probability distribution to alternative sets of coreference relationships among entity descriptions.</Paragraph>
    <Paragraph position="5"> We present the results of initial experiments with several approaches to estimating such distributions in an application using FASTUS.</Paragraph>
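To make the problem concrete, an alternative set of coreference relationships among n entity descriptions can be viewed as a partition of those descriptions into clusters that each denote one entity, so a distribution over such sets is a distribution over partitions. The Python sketch below enumerates every partition and scores it with a hypothetical pairwise model `pair_prob`, treating pairwise decisions as independent and renormalizing. That scoring rule, the function names, and the example probabilities are assumptions for illustration; they are not the estimation approaches evaluated in this paper.

```python
from itertools import combinations


def partitions(items):
    """Yield every partition of the list `items` as a list of clusters (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing cluster in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or into a cluster of its own.
        yield [[first]] + smaller


def partition_distribution(mentions, pair_prob):
    """Return (partition, probability) pairs covering every partition of `mentions`.

    Assumption (not from the paper): each pair's coreference decision is
    scored independently with pair_prob(a, b), and the scores are
    renormalized over all partitions so the probabilities sum to 1.
    Mentions are assumed to be distinct, hashable values.
    """
    mentions = list(mentions)
    scored = []
    for part in partitions(mentions):
        cluster_of = {m: i for i, cluster in enumerate(part) for m in cluster}
        score = 1.0
        for a, b in combinations(mentions, 2):
            p = pair_prob(a, b)
            # Same cluster means the pair corefers; otherwise it does not.
            score *= p if cluster_of[a] == cluster_of[b] else 1.0 - p
        scored.append((part, score))
    total = sum(s for _, s in scored)
    return [(part, s / total) for part, s in scored]


if __name__ == "__main__":
    # Hypothetical pairwise coreference probabilities for three descriptions.
    probs = {
        frozenset({"Acme Corp.", "the company"}): 0.9,
        frozenset({"Acme Corp.", "its rival"}): 0.1,
        frozenset({"the company", "its rival"}): 0.2,
    }
    dist = partition_distribution(
        ["Acme Corp.", "the company", "its rival"],
        lambda a, b: probs[frozenset({a, b})],
    )
    for part, p in sorted(dist, key=lambda x: -x[1]):
        print(f"{p:.3f}  {part}")
```

The number of partitions grows as the Bell numbers (5 for three descriptions, 15 for four, 52 for five), so exhaustive enumeration like this is only practical for small sets of descriptions.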
  </Section>
</Paper>