<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3325">
  <Title>The Difficulties of Taxonomic Name Extraction and a Solution</Title>
  <Section position="6" start_page="130" end_page="132" type="evalu">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"> A combining approach gives rise to many questions, e.g.: How does a word-level classifier perform with training data automatically generated? How does rule-based filtering affect precision, recall, and coverage? What is the effect to dynamic lexicons? Which kinds of errors remain? We run two series of experiments: We first process individual documents. We then process the documents incrementally, i.e., we do neither clear the sets of known positives and negatives after each document, nor the statistics of the word-level language recognizer. This is to measure the benefit of reusing data obtained from one document in the processing of subsequent ones. Finally, we take a closer look at the effects of the individual steps and heuristics from Section 4.</Paragraph>
    <Paragraph position="1"> The platform is implemented in JAVA 1.4.2.</Paragraph>
    <Paragraph position="2"> We use the java.util.regex package to represent the rules. All tests are based on 20 issues of the American Museum Novitates, a natural science periodical published by the American Museum of Natural History. The documents contain about 260.000 words, including about 2.500 taxonomic names. The latter consist of about 8.400 words.</Paragraph>
    <Section position="1" start_page="131" end_page="131" type="sub_section">
      <SectionTitle>
5.1 Tests with Individual Documents
</SectionTitle>
      <Paragraph position="0"> First, we test the combined classifier with individual documents. The Docs column in Table 3 contains the results. The combination of rules and word-level classification provides very high precision and recall. The former is 99.7% on average, the latter 98.2%. The manual effort is very low: The average coverage is 99.7%.</Paragraph>
    </Section>
    <Section position="2" start_page="131" end_page="131" type="sub_section">
      <SectionTitle>
5.2 Tests with Entire Corpus
</SectionTitle>
      <Paragraph position="0"> In the first test the classifier did not transfer any experience from one document to later ones. We now process the documents one after another. The Corp column of Table 3 shows the results. As expected, the classifier performs better than with individual documents. The average recall is 99.2%, coverage is 99.8% on average. Only precision is a little less, 99.1% on average.</Paragraph>
      <Paragraph position="1">  The effect of the incremental learning is obvious. The false positives are less than half of those in the first test. A comparison of Line False Positives in Table 3 shows this. The same is true for the number feedback requests (Line User Feedbacks). The slight decrease in precision (Line False Negatives) results from the propagation of misclassifications between documents. The reason for the improvement becomes clear for documents where the number of word sequences in &lt;preciseTaxName&gt; is low: experience from previous documents compensates the lack of positive examples. This reduces both false positives and manual classifications.</Paragraph>
    </Section>
    <Section position="3" start_page="131" end_page="131" type="sub_section">
      <SectionTitle>
5.3 The Data Rules
</SectionTitle>
      <Paragraph position="0"> The exclusion of word sequences containing a sure negative turns out to be effective to filter the matches of &lt;taxName&gt;. Lines &lt;taxName&gt; and SN excluded of Tables 3 show this. On average, this step excludes about 20% of the word sequences matching &lt;taxName&gt;. Lines &lt;taxName&gt; and Names excluded tell us that the rule based on the names of scientists is even more effective. On average, it excludes about 40% of the matches of &lt;taxName&gt;.</Paragraph>
      <Paragraph position="1"> Both data rules decrease the number of words the language recognizer has to deal with and eventually the manual effort. This is because they reduce the number of words classified uncertain.</Paragraph>
    </Section>
    <Section position="4" start_page="131" end_page="131" type="sub_section">
      <SectionTitle>
5.4 Comparison to Word-Level Classifier and TaxonGrab
</SectionTitle>
      <Paragraph position="0"> A word-level classifier (WLC) is the core component of the combining technique. We compare it in standalone use to the combining technique (Comb) and to the TaxonGrab (T-Grab) approach (Koning 2005). See Table 4. The combining technique is superior to both TaxonGrab and stand-alone word-level classification. The reason for better precision and recall is that it uses more different evidence. The better coverage results from the lower number of words that the word-level classifier has to deal with. On average, it has to classify only 2.5% of the words in a document.</Paragraph>
      <Paragraph position="1"> This reduces the classification effort, leading to less manual feedback. It also decreases the number of potential errors of the word-level classifier. All these positive effects result in about 99% f-Measure and 99.7% coverage. This means the error is reduced by 75% compared to word-level classification, and by 80% compared to Taxon-Grab. The manual effort decreases by 94% compared to the standalone word-level classifier.</Paragraph>
    </Section>
    <Section position="5" start_page="131" end_page="132" type="sub_section">
      <SectionTitle>
5.5 Misclassified Words
</SectionTitle>
      <Paragraph position="0"> Despite all improvements, there still are word sequences that are misclassified.</Paragraph>
      <Paragraph position="1"> False Negatives. The regular expressions in &lt;preciseTaxName&gt; are intended to be 100% precise. There are, however, some (rare) exceptions. Consider the following phrase: &amp;quot;... In Guadeloup (Mexico) another subspecies killed F. Smith.&amp;quot; Except for the word In, this sentence matches the  regular expression from &lt;preciseTaxName&gt; where &lt;subSpecies&gt; is mandatory. Similar pathologic cases could occur for the variety part. Another class of false negatives contains two word sequences, and the first one is the name of a genus. For instance, &amp;quot;Xenomyrmex varies ...&amp;quot; falls into this category. The classifier (correctly) recognizes the first word as a part of a taxonomic name. The second one is not typical enough to change the overall classification of the sequence. To recognize these false negatives, one might use POS-tagging. We could exclude word sequences containing words whose meaning does not fit into a taxonomic name.</Paragraph>
      <Paragraph position="2"> False Positives. Though &lt;taxName&gt; matches any taxonomic name, the subsequent exclusion mechanisms may misclassify a sequence of words. In particular, the word-level classifier has problems recognizing taxonomic names containing proper names of persons. The problem is that these words consist of N-Grams that are typical for common English. &amp;quot;Wheeleria rogersi Smith&amp;quot;, for instance, is a fictitious but valid taxonomic name. A solution to this problem might be to use the scientist names for constructing and recognizing the genus and species names derived from them.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>