<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2055">
  <Title>Analysis and Repair of Name Tagger Errors</Title>
  <Section position="5" start_page="420" end_page="421" type="metho">
    <SectionTitle>
3 Baseline Name Tagger
</SectionTitle>
    <Paragraph position="0"> We apply a multi-lingual (English / Chinese) bigram HMM tagger to identify four named entity types: Person, Organization, GPE ('geopolitical entities' - locations which are also political units, such as countries, counties, and cities) and Location. The HMM tagger generally follows the Nymble model (Bikel et al, 1997), and uses best-first search to generate N-Best hypotheses for each input sentence.</Paragraph>
    <Paragraph position="1"> In mixed-case English texts, most proper names are capitalized. So capitalization provides a crucial clue for name boundaries.</Paragraph>
    <Paragraph position="2"> In contrast, a Chinese sentence is composed of a string of characters without any word boundaries or capitalization. Even after word segmentation there are still no obvious clues for the name boundaries. However, we can apply the following coarse &amp;quot;usable-character&amp;quot; restrictions to reduce the search space.</Paragraph>
    <Paragraph position="3"> Standard Chinese family names are generally single characters drawn from a set of 437 family names (there are also 9 two-character family names, although they are quite infrequent) and given names can be one or two characters (Gao et al., 2005). Transliterated Chinese person names usually consist of characters in three relatively fixed character lists (Begin character list, Middle character list and End character list). Person abbreviation names and names including title words match a few patterns. The suffix words (if there are any) of Organization and GPE names belong to relatively fixed lists too.</Paragraph>
    <Paragraph position="4"> However, this &amp;quot;usable-character&amp;quot; restriction is not as reliable as the capitalization information for English, since each of these special characters can also be part of common words.</Paragraph>
    <Section position="1" start_page="420" end_page="420" type="sub_section">
      <SectionTitle>
3.1 Identification and Classification Errors
</SectionTitle>
      <Paragraph position="0"> We begin our error analysis with an investigation of the English and Chinese baseline taggers, decomposing the errors into identification and classification errors. In Figure 1 we report the identification F-Measure for the baseline (the first hypothesis), and the N-best upper bound, the best of the N hypotheses  , using different models: English MonoCase (EN-Mono, without capitalization), English Mixed Case (EN-Mix, with capitalization), Chinese without the usable character restriction (CH-NoRes) and Chinese with the usable character restriction (CH-WithRes).</Paragraph>
    </Section>
    <Section position="2" start_page="420" end_page="421" type="sub_section">
      <SectionTitle>
Name Identification
</SectionTitle>
      <Paragraph position="0"> Figure 1 shows that capitalization is a crucial clue in English name identification (increasing the F measure by 7.6% over the monocase score).</Paragraph>
      <Paragraph position="1"> We can also see that the best of the top N (N &lt;= 30) hypotheses is very good, so reranking a small number of hypotheses has the potential of producing a very good tagger.</Paragraph>
      <Paragraph position="2"> The &amp;quot;usable&amp;quot; character restriction plays a major role in Chinese name identification, increasing the F-measure 4%. With this restriction, the performance of the best-of-N-best is again very good. However, it is evident that, even with this restriction, identification is more challenging for Chinese, due to the absence of capitalization and word boundaries.</Paragraph>
      <Paragraph position="3"> Figure 2 shows the classification accuracy of the above four models. We can see that capitalization does not help English name classification;  These figures were obtained using training and test corpora described later in this paper, and a value of N ranging from 1 to 30 depending on the margin of the HMM tagger, as also described below. All figures are with respect to the official ACE keys prepared by the Linguistic Data Consortium.</Paragraph>
    </Section>
    <Section position="3" start_page="421" end_page="421" type="sub_section">
      <SectionTitle>
Name Classification
3.2 Identification Errors in Chinese
</SectionTitle>
      <Paragraph position="0"> For the remainder of this paper we shall focus on the more difficult problems of Chinese tagging, using the HMM system with character restrictions as our baseline. The name identification errors of this system can be divided into missed names (21%), spurious names (29%), and boundary errors, where there is a partial overlap between the names in the key and the system response (50%). Confusion between names and nominals (phrases headed by a common noun) is a major source of both missed and spurious names (56% of missed, 24% of spurious). In a language without capitalization, this is a hard task even for people; one must rely largely on world knowledge to decide whether a phrase (such as the &amp;quot;criminal-processing team&amp;quot;) is an organization name or merely a description of an organization. The other major source of missed names is words not seen in the training data, generally representing minor cities or other locations in China (28%). For spurious names, the largest source of error is names of a type not included in the key (44%) which are mistakenly tagged as one of the known name types.</Paragraph>
      <Paragraph position="1">  As we shall see, different types of knowledge are required for correcting different types of errors.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="421" end_page="423" type="metho">
    <SectionTitle>
4 Mutual Inferences between Information Extraction Stages
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="421" end_page="422" type="sub_section">
      <SectionTitle>
4.1 Extraction Pipeline
</SectionTitle>
      <Paragraph position="0"> Name tagging is typically one of the first stages  If the key included an 'other' class of names, these would be classification errors; since it does not -- since these names are not tagged in the key -- the automatic scorer treats them as spurious names.</Paragraph>
      <Paragraph position="1"> in an information extraction pipeline. Specifically, we will consider a system which was developed for the ACE (Automatic Content Extraction) task  and includes the following stages: name structure parsing, coreference, semantic relation extraction and event extraction (Ji et al., 2006). All these stages are performed after name tagging since they take names as input &amp;quot;objects&amp;quot;. However, the inferences from these subsequent stages can also provide valuable constraints to identify and classify names.</Paragraph>
      <Paragraph position="2"> Each of these stages connects the name candidate to other linguistic elements in the sentence, document, or corpus, as shown in Figure 3.</Paragraph>
      <Paragraph position="3">  The baseline name tagger (HMM) uses very local information; feedback from later extraction stages allows us to draw from a wider context in making final name tagging decisions.</Paragraph>
      <Paragraph position="4"> In the following we use two related (translated) texts as examples, to give some intuition of how these different types of linguistic evidence improve name tagging.</Paragraph>
      <Paragraph position="5">  on the morning of the 6 th formed a &lt;crisis-handling  The ACE task description can be found at  Rather than offer the most fluent translation, we have provided one that more closely corresponds to the Chinese text in order to more clearly illustrate the linguistic issues. Transliterated names are rendered phonetically, character by character.</Paragraph>
      <Paragraph position="6"> supporting inference information  , to deal with transfer-of-power issues. null This crisis committee includes police, supply, economics and other important departments. In such a crisis, people cannot think through this question: has the &lt;yugoslav&gt;  president &lt;mi lo se vi c&gt;  used up his skills? According to the official voting results in the first round of elections, &lt;mi lo se vi c&gt;  . [...] Document 2: Biography of these two leaders [...]&lt;ke shi tu ni cha&gt;  used to pursue an academic career, until 1974, when due to his opposition position he was fired by &lt;bei er ge le&gt;  &lt;law school&gt;  and left the academic community.</Paragraph>
      <Paragraph position="7"> &lt;ke shi tu ni cha&gt;  also at the beginning of the 1990s joined the opposition activity, and in 1992 founded &lt;sai er wei ya&gt;  &lt;opposition party&gt;  .</Paragraph>
      <Paragraph position="8"> This famous new leader and his previous classmate at law school, namely his wife &lt;zuo li ka&gt;  live in an apartment in &lt;bei er ge le&gt;  .</Paragraph>
      <Paragraph position="9"> The vanished &lt;mi lo se vi c&gt;  was born in &lt;sai er wei ya&gt;  's central industrial city. [...]</Paragraph>
    </Section>
    <Section position="2" start_page="422" end_page="423" type="sub_section">
      <SectionTitle>
4.2 Inferences for Correcting Name Errors
</SectionTitle>
      <Paragraph position="0"> Constraints and preferences on the structure of individual names can capture local information missed by the baseline name tagger. They can correct several types of identification errors, including in particular boundary errors. For example, &amp;quot;&lt;ke shi tu ni cha&gt;  &amp;quot; is more likely to be correct than &amp;quot;&lt;shi tu ni cha&gt;  &amp;quot; since &amp;quot;shi&amp;quot; (Shi ) cannot be the first character of a transliterated name.</Paragraph>
      <Paragraph position="1"> Name structures help to classify names too. For example, &amp;quot;anti-democracy committee  &amp;quot; is parsed as &amp;quot;[Org-Modifier anti-democracy] [Org-Suffix committee]&amp;quot;, and the first character is not a person last name or the first character of a transliterated person name, so it is more likely to be an organization than a person name.  Information about expected sequences of constituents surrounding a name can be used to correct name boundary errors. In particular, event extraction is performed by matching patterns involving a &amp;quot;trigger word&amp;quot; (typically, the main verb or nominalization representing the event) and a set of arguments. When a name candidate is involved in an event, the trigger word and other arguments of the event can help to determine the name boundaries. For example, in the sentence &amp;quot;The vanished mi lo se vi c was born in sai er wei ya 's central industrial city&amp;quot;, &amp;quot;mi lo se vi c&amp;quot; is more likely to be a name than &amp;quot;mi lo se&amp;quot;, &amp;quot;sai er wei ya&amp;quot; is more likely be a name than &amp;quot;er wei&amp;quot;, because these boundaries will allow us to match the event pattern &amp;quot;[Adj] [PER-NAME] [Trigger word for 'born' event] in [GPE-NAME]'s [GPENominal]&amp;quot;. null  Any context which can provide selectional constraints or preferences for a name can be used to correct name classification errors. Both semantic relations and events carry selectional constraints and so can be used in this way.</Paragraph>
      <Paragraph position="2"> For instance, if the &amp;quot;Personal-Social/Business&amp;quot; relation (&amp;quot;opponent&amp;quot;) between &amp;quot;his&amp;quot; and &amp;quot;&lt;ke shi tu ni cha&gt;  &amp;quot; is correctly identified, it can help to classify &amp;quot;&lt;ke shi tu ni cha&gt;  &amp;quot; as a person name.</Paragraph>
      <Paragraph position="3"> Relation information is sometimes crucial to classifying names. &amp;quot;&lt;mi lo se vi c&gt;  &amp;quot; and &amp;quot;&lt;ke shi tu ni cha&gt;  &amp;quot; are likely person names because they are &amp;quot;employees&amp;quot; of &amp;quot;&lt;yugoslav&gt;  as a person name.</Paragraph>
      <Paragraph position="4"> Events, like relations, can provide effective selectional preferences to correctly classify names. For example, &amp;quot;&lt;mi lo se vi c&gt;  &amp;quot; are likely person names because they are involved in the following events: &amp;quot;claim&amp;quot;, &amp;quot;escape&amp;quot;, &amp;quot;built&amp;quot;, &amp;quot;beat&amp;quot;, &amp;quot;born&amp;quot;, while &amp;quot;&lt;sai er wei ya&gt;  &amp;quot;can be easily tagged as GPE because it's a &amp;quot;birth-place&amp;quot; in the event &amp;quot;born&amp;quot;.</Paragraph>
      <Paragraph position="5">  Names which are introduced in an article are likely to be referred to again, either by repeating the same name or describing it with nominal mentions (phrases headed by common nouns). These mentions will have the same spelling (though if a name has several parts, some may be dropped) and same semantic type. So if the boundary or type of one mention can be determined with some confidence, coreference can be used to disambiguate other mentions.</Paragraph>
      <Paragraph position="6"> For example, if &amp;quot;&lt; mi lo se vi c&gt;  refering to &amp;quot;&lt; mi lo se vi c&gt;  &amp;quot; as an organization name in preference to the alternative name candidate &amp;quot;&lt;crisis-handling&gt;  &amp;quot;.</Paragraph>
      <Paragraph position="7"> For a name candidate, high-confidence information about the type of one mention can be used to determine the type of other mentions. For example, for the repeated person name &amp;quot;&lt; mi lo se</Paragraph>
      <Paragraph position="9"> &amp;quot; type information based on the event context of one mention can be used to classify or confirm the type of the others. The person nominal &amp;quot;This famous new leader&amp;quot; confirms &amp;quot;&lt;ke shi tu ni cha&gt;  &amp;quot; as a person name.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="423" end_page="425" type="metho">
    <SectionTitle>
5 Incremental Re-Ranking Algorithm
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="423" end_page="423" type="sub_section">
      <SectionTitle>
5.1 Overall Architecture
</SectionTitle>
      <Paragraph position="0"> In this section we will present the algorithms to capture the intuitions described in Section 4. The overall system pipeline is presented in Figure 4.</Paragraph>
      <Paragraph position="1">  The baseline name tagger generates N-Best multiple hypotheses for each sentence, and also computes the margin - the difference between the log probabilities of the top two hypotheses. This is used as a rough measure of confidence in the top hypothesis. A large margin indicates greater confidence that the first hypothesis is correct. null  It generates name structure parsing results too, such as the family name and given name of person, the prefixes of the abbreviation names, the modifiers and suffixes of organization names. Then the results from subsequent components are exploited in four incremental re-rankers. From each re-ranking step we output the best name hypothesis directly if the re-ranker has high confidence in its decisions. Otherwise the sentence is forwarded to the next re-ranker, based on other features. In this way we can adjust the ranking of multiple hypotheses and select the best tagging for each sentence gradually.</Paragraph>
      <Paragraph position="2"> The nominal mention tagger (noun phrase chunker) uses a maximum entropy model. Entity type assignment for the nominal heads is done by table look-up. The coreference resolver is a combination of high-precision heuristic rules and maximum entropy models. In order to incorporate wider context we use cross-document coreference for the test set. We cluster the documents using a cross-entropy metric and then treat the entire cluster as a single document.</Paragraph>
      <Paragraph position="3"> The relation tagger uses a K-nearest-neighbor algorithm.</Paragraph>
      <Paragraph position="4"> We extract event patterns from the ACE05 training corpus for personnel, contact, life, business, and conflict events. We also collect additional event trigger words that appear frequently in name contexts, from a syntactic dictionary, a synonym dictionary and Chinese PropBank V1.0.</Paragraph>
      <Paragraph position="5"> Then the patterns are generalized and tested semi-automatically.</Paragraph>
    </Section>
    <Section position="2" start_page="423" end_page="425" type="sub_section">
      <SectionTitle>
5.2 Supervised Re-Ranking Model
</SectionTitle>
      <Paragraph position="0"> In our name re-ranking model, each hypothesis is an NE tagging of the entire sentence, for example, &amp;quot;The vanished &lt;PER&gt;mi lo se vi c&lt;/PER&gt; was born in &lt;GPE&gt;sai er wei ya&lt;/GPE&gt;'s central industrial city&amp;quot;; and each pair of hypotheses (h</Paragraph>
      <Paragraph position="2"> The margin also determines the number of hypotheses (N) generated by the baseline tagger. Using cross-validation on the training data, we determine the value of N required to include the best hypothesis, as a function of the margin. We then divide the margin into ranges of values, and set a value of N for each range, with a maximum of 30.</Paragraph>
      <Paragraph position="4"> is tagged as PER without family name, and it does not consist entirely of transliterated person name characters; otherwise 0</Paragraph>
      <Paragraph position="6"> the voting rate among all the candidate hypotheses  The method of counting the voting rate refers to (Zhai, 04) and (Ji and Grishman, 05)  Extracted from the high-frequency name lists from the training corpus, and country/province/state/ city lists from Chinese wikipedia.</Paragraph>
      <Paragraph position="7">  The goal of each re-ranker is to learn a ranking function f of the following form: for each pair of</Paragraph>
      <Paragraph position="9"> is worse than h j . In this way we are able to convert ranking into a classification problem. And then a maximum entropy model for re-ranking these hypotheses can be trained and applied. During training we use F-measure to measure the quality of each name hypothesis against the key. During test we get from the MaxEnt classifier the probability (ranking confidence) for each pair: Prob (f (h</Paragraph>
      <Paragraph position="11"> ) = 1). Then we apply a dynamic decoding algorithm to output the best hypothesis. More details about the re-ranking algorithm are presented in (Ji et al., 2006).</Paragraph>
    </Section>
    <Section position="3" start_page="425" end_page="425" type="sub_section">
      <SectionTitle>
5.3 Re-Ranking Features
</SectionTitle>
      <Paragraph position="0"/>
      <Paragraph position="2"> ), we construct a feature set for assessing the ranking of h i and h j . Based on the information obtained from inferences, we compute (for each property) the property score</Paragraph>
      <Paragraph position="4"> some of these properties depend also on the corresponding name tags in h j . Then we sum over all names in each hypothesis h  summarizes the property scores PS ik used in the different re-rankers; space limitations prevent us from describing them in further detail.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="425" end_page="425" type="metho">
    <SectionTitle>
6 Experimental Results and Analysis
</SectionTitle>
    <Paragraph position="0"> Table 4 shows the data used to train each stage, drawn from the ACE training data and other sources. The training samples of the re-rankers are obtained by running the name tagger in crossvalidation. 100 ACE 04 documents were held out for use as test data.</Paragraph>
    <Paragraph position="1"> In the following we evaluate the contributions of re-rankers in name identification and classification separately.</Paragraph>
    <Paragraph position="2">  Tables 5 and 6 show the performance on identification, classification, and the combined task as we add each re-ranker to the system.</Paragraph>
    <Paragraph position="3"> The gain is greater for classification (2.7%) than for identification (1.2%). Furthermore, we can see that the gain in identification is produced primarily by the name structure and coreference components. As we noted earlier, the name structure analysis can correct boundary errors by preferring names with complete internal components, while coreference can resolve a boundary ambiguity for one mention of a name if another mention is unambiguous. The greatest gains were therefore obtained in boundary errors: the stages together eliminated over 1/3 of boundary errors and about 10% of spurious names; only a few missing names were corrected, and some correct names were deleted.</Paragraph>
    <Paragraph position="4"> Both relations and events contribute substantially to classification performance through their selectional constraints. The lesser contribution of events is related to their lower frequency. Only 11% of the sentences in the test data contain instances of the original ACE event types. To increase the impact of the event patterns, we broadened their coverage to include additional frequent event types, so that finally 35% of sentences contain event &amp;quot;trigger words&amp;quot;. We used a simple cross-document coreference method in which the test documents were clustered based on their cross-entropy and documents in the same cluster were treated as a single document for coreference. This produced small gains in both identification (0.6% vs. 0.4%) and classification (0.8% vs. 0.4%) over single-document coreference.</Paragraph>
  </Section>
  <Section position="9" start_page="425" end_page="426" type="metho">
    <SectionTitle>
7 Discussion
</SectionTitle>
    <Paragraph position="0"> The use of 'feedback' from subsequent stages of analysis has yielded substantial improvements in name tagging accuracy, from F=87.5 with the baseline HMM to F=91.2. This performance compares quite favorably with the performance of the human annotators who prepared the ACE  2005 training data. The annotator scores (when measured against a final key produced by review and adjudication of the two annotations) were F=92.5 for one annotator and F=92.7 for the other.</Paragraph>
    <Paragraph position="1"> As in the case of the automatic tagger, human classification accuracy (97.2 - 97.6%) was better than identification accuracy (F = 95.0 - 95.2%). In Figure 5 we summarize the error rates for the baseline system, the improved system without coreference based re-ranker, the final system with re-ranking, and a single annotator.</Paragraph>
    <Paragraph position="2">  Figure 5 shows that the performance improvement reflects a reduction in classification and boundary errors. Compared to the system, the human annotator's identification accuracy was much more skewed (52.3% missing, 13.5% spurious), suggesting that a major source of identification error was not difference in judgement but rather names which were simply overlooked by one annotator and picked up by the other.</Paragraph>
    <Paragraph position="3"> This further suggests that through an extension of our joint inference approach we may soon be able to exceed the performance of a single manual annotator.</Paragraph>
    <Paragraph position="4"> Our analysis of the types of errors, and the performance of our knowledge sources, gives some indication of how these further gains may be achieved. The selectional force of event extraction was limited by the frequency of event patterns - only about 1/3 of sentences had a pattern  Here spurious errors are names in the system response which do not overlap names in the key; missing errors are names in the key which do not overlap names in the system response; and boundary errors are names in the system response which partially overlap names in the key plus names in the key which partially overlap names in the system response. null instance. Even with this limitation, we obtained a gain of 0.5% in name classification. Capturing a broader range of selectional patterns should yield further improvements. Nearly 70% of the spurious names remaining in the final output were in fact instances of 'other' types of names, such as book titles and building names; creating explicit models of such names should improve performance. Finally, our cross-document coreference is currently performed only within the (small) test corpus. Retrieving related articles from a large collection should increase the likelihood of finding a name instance with a disambiguating context.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML