<?xml version="1.0" standalone="yes"?>
<Paper uid="M98-1009">
  <Title>ALGORITHMS THAT LEARN TO EXTRACT INFORMATION BBN: DESCRIPTION OF THE SIFT SYSTEM AS USED FOR MUC-7</Title>
  <Section position="3" start_page="10" end_page="10" type="metho">
    <SectionTitle>
TE/TR Results
</SectionTitle>
    <Paragraph position="0"> The SIFT system worked by first applying the sentence-level model to each sentence in the message and then extracting entities, descriptors, and relations from the resulting trees, heuristically merging TE elements, applying the cross-sentence model to identify non-local relations, and finally filtering and formatting TE and TR templates for output. The system's score on the TE task was 83% recall with 84% precision, for an F of 83.49%. Its score on TR was 64% recall with 81% precision, for an F of 71.23%.</Paragraph>
  </Section>
  <Section position="4" start_page="10" end_page="11" type="metho">
    <SectionTitle>
IDENTIFINDER(TM): A STATISTICAL NAME-FINDER
</SectionTitle>
    <Paragraph position="0"> Overview of the IdentiFinder(TM) HMM Model For the Named Entity task, we used the IdentiFinder(TM) trained named entity extraction system (Bikel, et. al., 1997), which utilizes an HMM to recognize the entities present in the text.</Paragraph>
    <Paragraph position="1"> The HMM labels each word either with one of the desired classes (e.g., person, organization, etc.) or with the label NOT-A-NAME (to represent &amp;quot;none of the desired classes&amp;quot;). The states of the HMM fall into regions, one region for each desired class plus one for NOT-A-NAME. (See Figure 4.) The HMM thus has a model of each desired class and of the other text. Note that the implementation is not confined to the seven name classes used in the NE task; the particular classes to be recognized can be easily changed via a parameter.</Paragraph>
    <Paragraph position="2">  Within each of the regions, we use a statistical bigram language model, and emit exactly one word upon entering each state. Therefore, the number of states in each of the name-class regions is equal to the vocabulary size, V . Additionally, there are two special states, the START-OF-SENTENCE and END-OF-SENTENCE states. In addition to generating the word, states may also generate features of that word. Features used in the MUC-7 version of the system include several features pertaining to numeric expressions, capitalization, and membership in lists of important words (e.g. known corporate designators). The generation of words and name-classes proceeds in the following steps:  1. Select a name-class NC, conditioning on the previous name-class and the previous word. 2. Generate the first word inside that name-class, conditioning on the current and previous name-classes.</Paragraph>
    <Paragraph position="3"> 3. Generate all subsequent words inside the current name-class, where each subsequent word is conditioned on its immediate predecessor.</Paragraph>
    <Paragraph position="4"> 4. If not at the end of a sentence, go to 1.</Paragraph>
    <Paragraph position="5">  Whenever a person or organization name is recognized, the vocabulary of the system is dynamically updated to include possible aliases for that name. Using the Viterbi algorithm, we search the entire space of all possible name-class assignments, maximizing Pr(W,F,NC), the joint probability of words, features, and name classes.</Paragraph>
    <Paragraph position="6"> This model allows each type of &amp;quot;name&amp;quot; to have its own language, with separate bigram probabilities for generating its words. This reflects our intuition that: * There is generally predictive internal evidence regarding the class of a desired entity. Consider the following evidence: Organization names tend to be stereotypical for airlines, utilities, law firms, insurance companies, other corporations, and government organizations. Organizations tend to select names to suggest the purpose or type of the organization. For person names, first person names are stereotypical in many cultures; in Chinese, family names are stereotypical. In Chinese and Japanese, special characters are used to transliterate foreign names. Monetary amounts typically include a unit term, e.g., Taiwan dollars, yen, German marks, etc.</Paragraph>
    <Paragraph position="7"> * Local evidence often suggests the boundaries and class of one of the desired expressions. Titles signal beginnings of person names. Closed class words, such as determiners, pronouns, and prepositions often signal a boundary. Corporate designators (Inc., Ltd., Corp., etc.) often end a corporation name.</Paragraph>
    <Paragraph position="8"> While the number of word-states within each name-class is equal to V , this &amp;quot;interior&amp;quot; bigram language model is ergodic, i.e., there is a non-zero probability associated with every one of the V 2 transitions. As a parameterized, trained model, for transitions that were never observed, the model &amp;quot;backs off&amp;quot; to a lesspowerful model which allows for the possibility of unknown words.</Paragraph>
    <Paragraph position="9"> Training The model as used for the MUC-7 NE evaluation was trained on a total of approximately 790,000 words of NYT newswire data, annotated with approximately 65,500 named entities. In order to increase the size of our training set beyond the 90,000 words of training of airline crash documents provided by the Government, we selected additional training data from the North American News Text corpus. We annotated full articles before discovering a more effective annotation strategy. Since the test domain would be similar to the dry-run domain of air crashes, we used the University of Massachusetts INQUERY system to select 2000 articles which were similar to the 200 dry run training and test documents. About half of our training data consisted of full messages; this portion included the 200 messages provided by the Government as well as 319 messages from the 2000 retrieved by INQUERY. The second half of the data consisted of sample sentences selected from the remainder of the 2000 messages with the hope of increasing the variety of training data. This sampling strategy proved more effective than annotating full messages. Improvement in performance on the (dry run) airline crash test set is shown in Figure 6.</Paragraph>
  </Section>
  <Section position="5" start_page="11" end_page="12" type="metho">
    <SectionTitle>
NE Results
</SectionTitle>
    <Paragraph position="0"> Our F-measure for the official evaluation condition, 90.44, is shown as &amp;quot;Text Baseline&amp;quot; in Figure 5. In addition to the baseline condition, we performed some unofficial experiments to measure the accuracy of the system under more difficult conditions. Specifically, we evaluated the system on the test data modified to remove all case information (&amp;quot;Upper Case&amp;quot; in Figure 5), and also on the test data in SNOR (Speech Normalized Orthographic Representation) format (&amp;quot;SNOR&amp;quot; in Figure 5). By converting the text to all upper case characters, information useful for recognizing names in English is removed. Automatically transcribed speech, even with no recognition errors, is harder due to the lack of punctuation, spelling numbers out as words, and upper case in SNOR format.</Paragraph>
    <Paragraph position="1"> The degradation in performance from mixed case to all upper case is somewhat greater than that previously observed in similar tests run on generic newswire data (about 2 points). One possible explanation is that  case information is more useful in instances where the test domain is different than the domain of the training set. The degradation from all upper case to SNOR is similar to that previously observed.</Paragraph>
    <Paragraph position="2">  We also measured the effect of the training set size on the performance of the system in the air crash domain of the dry run. As is to be expected, increasing the amount of training data results in improved system performance. Figure 6 shows an almost two point increase in F-measure as the training set size was doubled from 91,000 words to 176,000 words. However, the next doubling of the number of words in the training set only resulted in a one point increase in F-measure. This is most likely due to the fact that as training set size increases, the likelihood of seeing a unique name or construction decreases. Though performance might not have peaked, adding more training data will have a progressively smaller effect since the system will not be seeing many constructions which it has not already seen in previous training.</Paragraph>
  </Section>
  <Section position="6" start_page="12" end_page="15" type="metho">
    <SectionTitle>
SYSTEM WALKTHROUGHS
NE Walkthrough
</SectionTitle>
    <Paragraph position="0"> BBN's Identifinder(TM) HMM-based approach to named entity recognition did well overall, and it scored 94% on the NE walkthrough article. Of the 7 errors, some can be related directly to choices made in marking our training data. For example, two cases were TV network names, which our annotators typically marked in training as organizations, but which the answer keys did not mark as such in the context where they occurred in the walkthrough article. One error can be attributed to the bigram nature of the current HMM model; while the phrase &amp;quot;Thursday morning&amp;quot; is to be tagged as a date and time, &amp;quot;early Thursday  morning&amp;quot; should instead be tagged as a single time, but the bigram model does not remember &amp;quot;early&amp;quot; when processing &amp;quot;morning&amp;quot;. Two other errors were unfamiliar organization names (seen no more than twice in training data) that Identifinder(TM) guessed were persons, since that guess is more frequently correct in the absence of other clues.</Paragraph>
    <Paragraph position="1"> TE and TR Walkthrough In an integrated system of the sort we used for the TE and TR tasks, the main determinant of performance is the sentence-level model, and the semantic structures that it produces. Secondary but still significant effects on performance come from the post-processing steps that derive TE and TR output from the sentence-level decoder tree:  This section will follow through selected portions of the walkthrough message, giving examples of the different effects that applied.</Paragraph>
    <Paragraph position="2"> Example 1 from paragraph 16 shows a case where everything worked as planned.</Paragraph>
    <Paragraph position="3">  Here the decoder correctly recognized a person name (PER/NPA) bound to a person descriptor (PER-DESC/NP-R). That descriptor contains an organization (ORG/NP) which in turn is linked to a location. The LINK and PTR nodes connect the descriptor with the person, the organization with the person descriptor (and thus indirectly with the person), and the location with the organization. In the post-processing, the person name is extracted, with the descriptor text is linked to it, the organization name is extracted, and the employment relationship noted. The organization is also linked to the nested location; of the two location elements in the LOC phrase, the first is taken as the LOCALE field filler, while the second is looked up in the gazetteer to identify a country in which the locale value is then looked up.</Paragraph>
    <Paragraph position="4"> Example 2 from the last paragraph of the message shows the effect of a decoder error.</Paragraph>
    <Paragraph position="5">  Here the sentence-level decoder linked both organization descriptors back to the top-level named organization, while the correct reading would have attached the second descriptor to the nested &amp;quot;Bloomberg L.P.&amp;quot;. The post-processing also therefore links both descriptor phrases to &amp;quot;Bloomberg Information Television&amp;quot; internally. Only the longest descriptor, however, is actually output, which in this case results in output of only the mistaken value.</Paragraph>
    <Paragraph position="6"> Not surprisingly, a number of the decoder errors that affected output stemmed from conjunctions. In paragraph 19, for example, the manufacturer organization name &amp;quot;Lockheed Space and Strategic Missiles&amp;quot; was incorrectly broken at the conjunction, causing the location relation with Bethesda to be missed. The cross sentence model is the system component that tries to find further relations beyond those identified by the sentence-level model. In the walk-through article, that component did not happen to succeed in finding any such relations. Example 3 shows the sort of relation that we would like that model to be able to get. There the sentence-level decoder did link Rubenstein to the organization descriptor &amp;quot;company&amp;quot;, but since that descriptor was never linked to &amp;quot;News Corporation&amp;quot;, the employee relation was  missed. However, since News Corporation is mentioned both in that sentence and the following sentence, an improved cross sentence model would be one way of attacking such examples.</Paragraph>
    <Paragraph position="7">  Here the decoder correctly identified both the artifact descriptors &amp;quot;A Chinese rocket&amp;quot; and &amp;quot;an Intelsat satellite&amp;quot;, but the output filter chose not to include them. That choice was made because of frequent cases where an indefinite artifact descriptor not linked to any named artifact should not be output; an example is &amp;quot;the last rocket I'd recommend&amp;quot; in paragraph 16. But this example shows that this decision not to output such cases cost us some points.</Paragraph>
  </Section>
class="xml-element"></Paper>