<?xml version="1.0" standalone="yes"?>
<Paper uid="X98-1010">
  <Title>COREFERENCE RESOLUTION STRATEGIES FROM AN APPLICATION PERSPECTIVE</Title>
  <Section position="3" start_page="45" end_page="46" type="intro">
    <SectionTitle>
2. ENTITY COREFERENCE
</SectionTitle>
    <Paragraph position="0"> Some coreference resolution techniques can be applied with only slight modifications across entity types. Identifying names and their variations is the first step in sorting out the person and organization entities. The NLToolset stores each newly recognized named entity, along with its computed variations and acronyms. The variations and acronyms are algorithmically generated, based on entity type, without reference to the text. For example, person names typically have nicknames, whereas organization names typically have acronyms. (Persons sometimes also have acronyms, e.g., JFK, but these are exceptions which must be stored as world knowledge.) Generated variations are stored in a temporary lexicon so that naturally occurring variations in the text can be recognized and linked to the original occurrence.</Paragraph>
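    <Paragraph> The generation step described above might be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the designator list, and the nickname table are hypothetical and are not taken from the NLToolset, whose actual generation rules are not given here.
<![CDATA[
```python
# Assumed, illustrative data; the real NLToolset tables are not described.
DESIGNATORS = {"Corporation", "Corp", "Inc", "Co", "Ltd"}
NICKNAMES = {"William": ["Bill"], "Robert": ["Bob"]}


def organization_variations(name):
    """Toy rule: drop a trailing corporate designator, then build an acronym."""
    words = name.split()
    variations = set()
    if not words:
        return variations
    if len(words) >= 2 and words[-1].rstrip(".") in DESIGNATORS:
        words = words[:-1]
        variations.add(" ".join(words))
    acronym = "".join(word[0] for word in words if word[0].isupper())
    if len(acronym) >= 2:
        variations.add(acronym)
    return variations


def person_variations(name):
    """Toy rule: surname alone, plus known nicknames with the surname."""
    parts = name.split()
    variations = set()
    if len(parts) >= 2:
        first, last = parts[0], parts[-1]
        variations.add(last)
        for nickname in NICKNAMES.get(first, []):
            variations.add(nickname + " " + last)
    return variations


def register(lexicon, name, entity_type):
    """Store each generated variation in the temporary lexicon (a dict)."""
    if entity_type == "person":
        generated = person_variations(name)
    else:
        generated = organization_variations(name)
    for variation in generated:
        # Keep the first entity registered for a variation.
        lexicon.setdefault(variation, name)


lexicon = {}
register(lexicon, "Lockheed Martin Corporation", "organization")
register(lexicon, "William Clinton", "person")
```
]]>
    A later occurrence of a generated variation can then be looked up in the lexicon and linked back to the original name. Note that the variations are produced from the name and its entity type alone, without consulting the text.</Paragraph>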
    <Paragraph position="1"> In linking noun phrases with named entities, the NLToolset has rule packages which find noun phrases of specific types: organization, person, vehicle, drugs. This allows the NLToolset to limit the search space for referents.</Paragraph>
    <Paragraph position="2"> During context-based name recognition, entities are directly linked, via variable bindings within the patterns, with descriptive phrases that make up their context. These will be found in a set of four syntactic forms which are universal across entity types: appositives, predicate nominatives, prenominals, and name-modified head nouns.</Paragraph>
    <Paragraph position="3"> APPOSITIVE: Lockheed Martin, the aerospace giant, PREDICATE NOMINATIVE: Lockheed Martin is a leader in information technology.</Paragraph>
    <Paragraph position="4"> PRENOMINAL: the defense contractor, Lockheed Martin Corporation, NAME-MODIFIED HEAD NOUN: the Lockheed Martin conglomerate. These descriptive phrases can make up a document-specific ontology, or semantic filter, for the named entity, which can be used to link isolated noun phrase references. This semantic filter had its origins in our TIPSTER II research on linking organization names with descriptive noun phrases.</Paragraph>
    <Paragraph position="5"> Organizations. During our TIPSTER II research, it was found that organization names sometimes contain embedded semantic information which can be useful in resolving noun phrase coreferences. An experiment with the NLToolset's MUC6 performance, as reported in the TIPSTER II Proceedings, showed that using this information contributed five points of recall and seven of precision to the organization descriptor score. The technique used was to devise a semantic filter for an organization noun phrase and compare it to previous organization names to see if they can be linked. In the following example, the noun phrase and named organization have jewel references in common, which would be enough to link them.</Paragraph>
    <Paragraph position="6"> Semantic Filters: the jewelry chain =&gt; ( jewelry jewel chain ); Smith Jewelers =&gt; ( smith jewelers jeweler jewel ). If there is more than one candidate named entity, file position is considered as a factor, the closest name being the most likely referent.</Paragraph>
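    <Paragraph> The filter comparison can be illustrated with a short Python sketch. It is not the NLToolset implementation: the suffix list, the stopword list, and the function names are assumptions chosen only so that the two example filters come out as shown, and "closest name" is simplified to the most recent earlier name that shares a term.
<![CDATA[
```python
STOPWORDS = {"the", "a", "an", "of"}  # assumed stopword list
SUFFIXES = ("s", "ry", "er")          # toy suffix list, an assumption


def variants(word):
    """Return the word plus each shorter form reached by stripping suffixes."""
    forms = {word}
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 4:
                word = word[: -len(suffix)]
                forms.add(word)
                stripped = True
                break
    return forms


def semantic_filter(phrase):
    """Build the set of filter terms for a noun phrase or a name."""
    terms = set()
    for token in phrase.lower().split():
        if token not in STOPWORDS:
            terms |= variants(token)
    return terms


def link(noun_phrase, earlier_names):
    """Return the closest earlier name whose filter shares a term, else None.

    earlier_names is in document order, so the last item is the closest.
    """
    phrase_terms = semantic_filter(noun_phrase)
    for name in reversed(earlier_names):
        if not phrase_terms.isdisjoint(semantic_filter(name)):
            return name
    return None


names = ["Acme Steel", "Smith Jewelers", "First Bank"]
referent = link("the jewelry chain", names)
```
]]>
    Here the shared term jewel links the noun phrase to Smith Jewelers, while First Bank, although closer, is skipped because its filter has no term in common.</Paragraph>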
    <Paragraph position="7"> As the NLToolset's coreference resolution techniques were expanded to other types of entities, it was found that previous methods would not always be applicable. Person names do not generally contain semantic information. For example, John Smith would not automatically be recognized as a toilet manufacturer. For this reason, the semantic filter must rely solely on syntactically linked semantic information. For persons, however, the standard set of four forms (appositive, prenominal, predicate nominative, and name-modified head noun) can be expanded to include person-specific information, such as titles, as in the following example.</Paragraph>
    <Paragraph position="8"> The Judiciary Committee voted today on the impeachment of President Nixon. The president has announced that he will resign.</Paragraph>
    <Section position="1" start_page="46" end_page="46" type="sub_section">
      <SectionTitle>
Vehicles
</SectionTitle>
      <Paragraph position="0"> The vehicle category is problematic because entities are often referred to by the type of vehicle, rather than by a specific name. For example, an airplane name might be Boeing 747 or F-14. Since it is possible to have several vehicles of the same type discussed in a document, all with the same &quot;name,&quot; the NLToolset's standard name linking algorithm does not apply. The decision to link names must come later, at the event level, when more information is known.</Paragraph>
      <Paragraph position="1"> Once the air vehicle names have been identified, airplane noun phrases are found and coreference resolution is performed, using the following algorithm. Assume that a noun phrase match belongs with the most recently seen entity, unless there is some contradictory information. If there is, then the current match is compared to the next most recently seen entity. If a match contradicts all previously seen entities, then it represents a new entity. The possible types of contradictory information currently are model information, manufacturer, military branch, airline, and flight number. The variable binding feature of the NLToolset pattern language allows the developer to extract type information during the name recognition process. For example, when the pattern for F-14 is constructed, the developer can inject the knowledge that plane types beginning with the letter F are considered fighter planes. This knowledge will allow the NLToolset to link the phrase &quot;the fighter&quot; to the named plane; moreover, it will prevent the phrase &quot;the helicopter&quot; from being linked. The algorithm for person and organization coreference resolution assumes that a noun phrase is not related unless there is some evidence to prove it, in direct contrast to that for vehicles.</Paragraph>
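      <Paragraph> The link-unless-contradicted rule above can be sketched in Python as follows. This is a minimal illustration, not the NLToolset code: entities and mentions are represented as plain attribute dictionaries, the attribute key "kind" is a hypothetical stand-in for the type information bound during name recognition, and merging a linked mention's attributes into the entity is an added assumption.
<![CDATA[
```python
def contradicts(mention, entity):
    """True if both give a value for the same attribute and the values differ."""
    return any(
        key in entity and entity[key] != value
        for key, value in mention.items()
    )


def resolve(mention, entities):
    """Link a mention to the most recently seen non-contradicting entity.

    entities is ordered from oldest to most recently seen and is updated
    in place. A mention that contradicts every entity becomes a new one.
    """
    for index in range(len(entities) - 1, -1, -1):
        entity = entities[index]
        if not contradicts(mention, entity):
            entity.update(mention)                # assumed: merge attributes
            entities.append(entities.pop(index))  # now the most recently seen
            return entity
    new_entity = dict(mention)
    entities.append(new_entity)
    return new_entity


entities = []
f14 = resolve({"model": "F-14", "kind": "fighter"}, entities)
fighter = resolve({"kind": "fighter"}, entities)        # "the fighter"
helicopter = resolve({"kind": "helicopter"}, entities)  # "the helicopter"
```
]]>
      In this run "the fighter" is linked to the F-14 entity, while "the helicopter" contradicts it on type and therefore starts a new entity. The default is to link, which is the opposite of the evidence-required default used for persons and organizations.</Paragraph>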
    </Section>
    <Section position="2" start_page="46" end_page="46" type="sub_section">
      <SectionTitle>
Quantified Artifacts
</SectionTitle>
      <Paragraph position="0"> Quantified artifacts, such as drug amounts, are handled with a straightforward algorithm that is usually successful, having achieved accuracy above 90% in the prototype application.</Paragraph>
      <Paragraph position="1"> All measured amounts of drugs are identified as unique entities. Generic noun phrases then can refer to the last mention of a drug, based on the specificity of the drug type. For example, the drugs would refer to the last drug entity regardless of type, while the cocaine would refer to the last cocaine entity. An exception to the rule is the case where the noun phrase is actually referring to a group of drug amounts. In that case, context clues would need to be considered in order to handle that occurrence. This is an area that has been identified for improvement.</Paragraph>
      <Paragraph position="2"> Measurement terms alone can indicate a drug amount within an ellipsis, as in the following example.</Paragraph>
      <Paragraph position="3"> 17 kg. of cocaine was found in the trunk of the car, while 2 kg. were found in the glove compartment.</Paragraph>
      <Paragraph position="4"> To resolve this coreference to a common drug type, cocaine, the algorithm picks up the last mention of the drug from a drug stack, which keeps track of which drug was mentioned last.</Paragraph>
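      <Paragraph> A drug stack of this kind might look like the following Python sketch. The class and method names are hypothetical, and the representation of an entity as a small dictionary is an assumption made for illustration only.
<![CDATA[
```python
class DrugStack:
    """Tracks drug mentions so that later references can be resolved."""

    def __init__(self):
        self._mentions = []  # most recent mention is last

    def push(self, drug_type, amount):
        """Record a measured amount as its own unique entity."""
        entity = {"drug": drug_type, "amount": amount}
        self._mentions.append(entity)
        return entity

    def last(self, drug_type=None):
        """Return the last entity, optionally restricted to one drug type.

        With no type ("the drugs") any drug qualifies; with a type
        ("the cocaine") only entities of that type qualify.
        """
        for entity in reversed(self._mentions):
            if drug_type is None or entity["drug"] == drug_type:
                return entity
        return None

    def push_elliptical(self, amount):
        """Record a bare measurement, taking the type from the last mention."""
        previous = self.last()
        if previous is None:
            return None
        return self.push(previous["drug"], amount)


stack = DrugStack()
trunk = stack.push("cocaine", "17 kg")
glove_compartment = stack.push_elliptical("2 kg")
```
]]>
      The bare "2 kg." inherits the type cocaine from the previous mention but remains a separate entity from the 17 kg. amount, in keeping with the rule that every measured amount is unique.</Paragraph>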
      <Paragraph position="5"> A problematic case arises when a drug seizure is first described in general terms, giving the total amount of drugs seized, and is then broken down into individual amounts. The NLToolset will identify all measured drug amounts as unique, so the total and its component amounts are treated as separate entities. Currently, there are no heuristics to check for redundancy in seizure information based on the quantity captured. This will be an area to explore in future work, as the prototype is brought to an operational level.</Paragraph>
    </Section>
  </Section>
</Paper>