<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1305">
  <Title>Two-Phase Biomedical NE Recognition based on SVMs</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Experimental Environments
</SectionTitle>
      <Paragraph position="0"> Experiments were conducted on the GENIA corpus (v3.0p) (GENIA, 2003), which consists of 2,000 MEDLINE abstracts annotated with Penn Treebank (PTB) POS tags. The corpus contains 36 distinct semantic classes. However, we used only 22 of them: all classes except the subclasses of protein, DNA, and RNA in the GENIA ontology.</Paragraph>
      <Paragraph position="1"> The corpus was transformed into a B/I/O-annotated corpus to represent entity boundaries and semantic classes.</Paragraph>
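The B/I/O transformation can be sketched as follows; the function and the span format are our own illustration, not the authors' conversion code:

```python
def bio_tags(tokens, entities):
    """Encode entity spans as B/I/O tags.

    `entities` holds (start, end, cls) token spans, end exclusive.
    This is an illustrative encoding, not the paper's corpus code.
    """
    tags = ["O"] * len(tokens)
    for start, end, cls in entities:
        tags[start] = "B-" + cls          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + cls          # continuation tokens
    return tags
```

Words outside every entity keep the default O tag, which is what makes the identification phase a binary in/out decision per word.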
      <Paragraph position="2"> We divided the 2,000 abstracts into 10 collections for 10-fold cross validation. Each collection contains not only abstracts but also paper titles. The vocabularies for lexical features and the prefix/suffix lists were constructed from the 10,000 most frequent words of the training part only.</Paragraph>
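Restricting the lexical vocabulary to the training part can be sketched as below; `build_vocab` and its parameter names are ours, a minimal illustration of the frequency cutoff:

```python
from collections import Counter

def build_vocab(training_tokens, size=10_000):
    # Keep only the `size` most frequent word types from the training
    # part, so lexical features never leak information from the test fold.
    counts = Counter(training_tokens)
    return {w for w, _ in counts.most_common(size)}
```

At classification time, any word outside this set would be mapped to an unknown-word feature rather than extending the vocabulary.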
      <Paragraph position="3"> We also built another experimental setting to compare with the previous work of (Kazama, 2002). From the GENIA corpus, 590 abstracts (4,808 sentences; 20,203 entities; 128,463 words) were taken as the training part, and 80 abstracts (761 sentences; 3,327 entities; 19,622 words) were selected as the test part. Because we could not reproduce Kazama's experimental setting exactly, we aimed for a comparable one.</Paragraph>
      <Paragraph position="4"> We implemented our method using the SVM-light package (Joachims, 2002). Although various learning parameters can significantly affect the performance of the resulting classifiers, we used the SVM system with a linear kernel and default options.</Paragraph>
      <Paragraph position="5"> The performance was evaluated by precision, recall, and Fβ=1. The overall Fβ=1 scores for the two models and the ten collections were calculated using 10-fold cross validation over the total test collection.</Paragraph>
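For reference, Fβ=1 is the harmonic mean of precision and recall; a small sketch of the standard formula:

```python
def f_beta(precision, recall, beta=1.0):
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).
    # With beta = 1 this reduces to the harmonic mean of P and R.
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```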
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Effect of Training Data Size
</SectionTitle>
      <Paragraph position="0"> In this experiment, we varied the size of the training set and observed the change of Fβ=1 in entity identification and semantic classification. We fixed the test data at 200 abstracts (1,921 sentences; 50,568 words). Figure 4 shows that performance improved as the training set grew. As the performance of the identification increases, the gap between the performance of the identification and that of the semantic classification gradually narrows.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Computational Efficiency
</SectionTitle>
      <Paragraph position="0"> When the one-vs-rest method is used, the number of negative samples is critical to training, in terms of both training time and required resources. The SVM classifier for entity identification determines whether each word is part of an entity or not. Figure 5 shows that there are far more negative samples than positive samples in the identification phase. Once entities are identified, non-entity words are not considered in the subsequent semantic classification phase. The proposed method therefore effectively removes unnecessary samples, allowing us to save training costs.</Paragraph>
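The saving can be pictured as a simple filter: only words that the phase-1 identifier accepts ever reach the semantic classifier. A toy sketch, where the predicate stands in for the trained identification SVM:

```python
def phase2_inputs(tokens, is_entity_word):
    # Phase 1 makes a binary in-entity/out-of-entity decision per word;
    # phase 2 then classifies only the surviving words, so the dominant
    # mass of negative (non-entity) samples is discarded up front.
    return [t for t in tokens if is_entity_word(t)]
```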
      <Paragraph position="1"> Furthermore, the proposed method effectively decreases the degree of class imbalance by simplifying the classes. Figure 6 shows how much the proposed method alleviates the unbalanced class distribution problem compared with the one-phase complicated classification model. However, even though the imbalance is alleviated in the identification phase, it persists in the semantic classification phase as long as we use the one-vs-rest method. This indicates that another classification method, such as a pairwise method, may be needed for the semantic classification (Krebel, 1999).</Paragraph>
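The trade-off between the two decompositions can be made concrete by counting classifiers: one-vs-rest trains K highly imbalanced binary problems, while pairwise trains K(K-1)/2 small, roughly balanced ones. A quick sketch for the 22 semantic classes used here:

```python
def decomposition_sizes(num_classes):
    # One-vs-rest: one classifier per class, each seeing all other
    # classes as negatives (imbalanced). Pairwise: one classifier per
    # unordered class pair, each trained on only two classes (balanced).
    one_vs_rest = num_classes
    pairwise = num_classes * (num_classes - 1) // 2
    return one_vs_rest, pairwise
```

For 22 classes this gives 22 versus 231 classifiers, so pairwise trades many more (but far smaller) training problems for balance.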
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.4 Discriminative Feature Selection
</SectionTitle>
      <Paragraph position="0"> We subsequently examined several alternatives to the feature sets described in Sections 3.1 and 4.1.</Paragraph>
      <Paragraph position="1"> Column (A) in Table 2 shows the identification cases. The base feature set consisted of only the designated word and the context words in the range from two words to the left to two words to the right. Several alternative feature sets were constructed by adding different combinations of features to the base set. (Table 2 caption: training with ... abstracts, test with 100 abstracts; (A) identification phase, (B) semantic classification phase.) From Table 2, we can see that part-of-speech information clearly improves the identification accuracy (about +2.8). Prefix and suffix features had a positive, but only modest, effect (about +1.2 on average).</Paragraph>
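The base-plus-additions feature scheme can be sketched roughly as below; the feature names, padding token, and affix length are our assumptions, not the paper's exact design:

```python
def identification_features(words, pos_tags, i, window=2, affix_len=3):
    # Base features: the designated word plus context words from two
    # positions left to two positions right. Added features: the word's
    # POS tag and its prefix/suffix strings.
    feats = {"w0": words[i]}
    for d in range(-window, window + 1):
        if d == 0:
            continue
        j = i + d
        feats["w%+d" % d] = words[j] if 0 <= j < len(words) else "<PAD>"
    feats["pos0"] = pos_tags[i]
    feats["prefix"] = words[i][:affix_len]
    feats["suffix"] = words[i][-affix_len:]
    return feats
```

Each alternative feature set in Table 2 then corresponds to switching subsets of these additions on or off before vectorizing for the SVM.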
      <Paragraph position="2"> Column (B) in Table 2 shows the semantic classification cases, using the best-performing identification phase. We took the feature set composed of the inside words of an entity as the base feature set, and constructed several alternatives by adding other features. The experimental results show that functional words and left-context features are useful, but right-context features are not. Furthermore, part-of-speech information was not effective in the semantic classification, although it was useful for entity identification: when we used the part-of-speech tags of the inside context words instead of the words themselves, the performance of the semantic classification was very low (Fβ=1 was 25.1).</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.5 Effect of Post-Processing by Dictionary Lookup
</SectionTitle>
      <Paragraph position="0"> Our two-phase model has the problem that identification errors propagate to the semantic classification. For this reason, it is necessary to ensure high boundary-identification accuracy, for example by post-processing the identified entities. Table 3 shows that post-processing by dictionary lookup effectively improves not only the boundary identification accuracy (79.2 vs. 79.9) but also the semantic classification accuracy (66.1 vs. 66.5).</Paragraph>
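One plausible form of such post-processing is to check an identified span against an entity dictionary and try small boundary adjustments when the lookup fails; this sketch is our guess at the mechanism, not the paper's exact rule:

```python
def adjust_boundary(tokens, start, end, dictionary, max_extend=2):
    # If the identified span misses the dictionary, widen the boundary
    # a little (left and/or right) and accept the first span that hits.
    # Illustrative only; the paper's lookup rule may differ.
    for left in range(max_extend + 1):
        for right in range(max_extend + 1):
            s, e = start - left, end + right
            if s < 0 or e > len(tokens):
                continue
            if " ".join(tokens[s:e]) in dictionary:
                return s, e
    return start, end  # no dictionary match; keep the original span
```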
      <Paragraph position="1"> Compared with (Kazama, 2002), even though the environments are not identical, the proposed two-phase model showed much better performance in both entity identification (73.6 vs. 81.4) and entity classification (54.4 vs. 68.0).</Paragraph>
      <Paragraph position="2"> One reason for the performance improvement is that separating the task into two subtasks allowed us to select discriminative features for each subtask.</Paragraph>
    </Section>
  </Section>
</Paper>