<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1308"> <Title>Bio-Medical Entity Extraction using Support Vector Machines</Title> <Section position="5" start_page="0" end_page="0" type="evalu"> <SectionTitle> 4 Experiment and Discussion </SectionTitle> <Paragraph position="0"> Results are given as F-scores (van Rijsbergen, 1979) using the CoNLL evaluation script and are defined as F = 2PR/(P+R), where P denotes Precision and R Recall. P is the ratio of the number of correctly found NE chunks to the number of found NE chunks, and R is the ratio of the number of correctly found NE chunks to the number of true NE chunks.</Paragraph> <Paragraph position="1"> All results are calculated using 10-fold cross validation.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Experiment 1: Effect of Training Set Size </SectionTitle> <Paragraph position="0"> The effect of training set size is shown in Tables 3 and 4. It can be seen that, without exception, more training data results in higher overall F-scores, except at 10 per cent., where the result seems to be biased by the small sample, perhaps because one abstract is partly included in both the training and testing sets. As we would expect, larger training sets reduce the effects of data sparseness and allow more accurate models to be induced.</Paragraph> <Paragraph position="1"> The rate of improvement, however, is not uniform across feature sets. For surface word and head noun features the improvement in performance increases consistently, whereas for orthographic and part-of-speech features it is quite erratic. This may be an effect of the small sample of training data that we used; we could not find a consistent explanation for why this occurred.</Paragraph> <Paragraph position="2"> As we observed before, the best overall result comes from using Or hd, i.e.
surface words, orthographic and head features. However, the total score hides the fact that three classes, i.e.</Paragraph> <Paragraph position="3"> SOURCE.mo, SOURCE.mu and SOURCE.ti, actually perform worse when using anything but surface word forms (shown in Table 5). One possible explanation is that all of these classes have very small numbers of samples, and the effect of adding features may be to blur the distinction between them and other, more numerous classes in the model. However, it is interesting to note that this does not happen with the RNA class, which is also very small.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Experiment 2: Effect of Feature Sets </SectionTitle> <Paragraph position="0"> The effect of feature sets is of major importance in modelling named entities. In general we would like to identify only those features that are required and to remove those that do not contribute to an increase in performance. This also saves time in training and testing.</Paragraph> <Paragraph position="1"> The results from Tables 3 and 4 at 100 per cent.</Paragraph> <Paragraph position="2"> training data are summarized in Table 5 and clearly illustrate the value of surface word level features combined with orthographic and head noun features.</Paragraph> <Paragraph position="3"> Orthographic features allow us to capture many generalities that are not obvious at the surface word level, such as IkappaB alpha and IkappaB beta both being PROTEINs, and IL-10 and IL-2 both being PROTEINs.</Paragraph> <Paragraph position="4"> The orthographic-head noun feature combination (Or hd) gives the best combined-class performance of 74.23 at 100 per cent. training data on a -2+2 window. Overall, orthographic features combined with surface word features gave an improvement of between 4.9 and 22.0 per cent. at 100 per cent.
data, depending on window size, over surface words alone.</Paragraph> <Paragraph position="5"> This was the biggest contribution by any feature except the surface words themselves. Head information, for example, allowed us to correctly capture the fact that in the phrase NF-kappaB consensus site the whole phrase is a DNA, whereas using orthographic information alone the SVM could only say that NF-kappaB was a PROTEIN, ignoring consensus site. We see a similar case in the phrase primary NK cells, which is correctly classified as SOURCE.ct using head noun and orthographic features, but only NK cells is found using orthographic features alone. This mistake is a natural consequence of a limited contextual view, which the head noun feature helped to rectify.</Paragraph> <Paragraph position="6"> Part-of-speech (POS) features, when combined with surface word features, gave an improvement of between 7.9 and 11.7 per cent. at 100 per cent. data. The influence of POS, though, does not appear to be sustained when combined with other features, and we found that it actually degraded performance slightly in many cases. This may be due either to overlapping knowledge or, more likely, to subtle inconsistencies between POS features and, say, orthographic features. This could have arisen because the POS tagger was trained on an out-of-domain (news) text collection. It is possible that if the POS tagger were trained on in-domain texts it would make a greater and more consistent contribution. An example where orthographic features allowed correct classification but adding POS features resulted in failure is p50 in the phrase consisting of 50 (p50) - and 65 (p65) -kDa proteins. Similarly, in the phrase c-Jun transactivation domain, where only c-Jun should be tagged as a PROTEIN, the model using orthographic and POS features tags the whole phrase as a PROTEIN. This is probably because POS tagging gives an NN feature value (common noun) to each word.
This is very general and does not allow the model to discriminate between them.</Paragraph> <Paragraph position="7"> The fourth feature we investigated is related to syntactic rather than lexical knowledge. We felt, though, that there should exist a strong semantic relation between a word in a term and the head noun of that term. The results in Table 5 show that while the overall contribution of the head noun feature is quite small, it is consistent across almost all classes.</Paragraph> </Section> </Section> </Paper>
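The chunk-level F-score defined at the start of Section 4 can be made concrete with a few lines of Python. This is an illustrative sketch, not the CoNLL evaluation script the authors actually used; the (start, end, label) chunk encoding and the function name are assumptions introduced here.

```python
def chunk_prf(gold, pred):
    """Precision, recall, and F-score over named-entity chunks.

    gold, pred: sets of (start, end, label) tuples -- a hypothetical
    chunk encoding; the paper itself relies on the CoNLL script.
    P = correct / found, R = correct / true, F = 2PR / (P + R).
    """
    correct = len(gold & pred)  # chunks whose span and label both match
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f

# Toy example: one exact match, one boundary error, one missed chunk.
gold = {(0, 2, "PROTEIN"), (4, 6, "DNA"), (8, 9, "RNA")}
pred = {(0, 2, "PROTEIN"), (4, 7, "DNA")}
p, r, f = chunk_prf(gold, pred)
# p = 1/2, r = 1/3, f = 0.4
```

Note that a chunk with a wrong boundary counts as both a false positive and a false negative, which is why chunk-level F-scores are stricter than token-level accuracy.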
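The orthographic generalisations discussed in Section 4.2 (e.g. IL-10 and IL-2 sharing a surface shape) can be sketched as simple shape predicates over a token. The specific predicate names below are assumptions for illustration only, not the paper's actual feature inventory.

```python
import re

def ortho_features(token):
    """Illustrative orthographic predicates of the kind described in
    Section 4.2; the exact feature set used in the paper differs."""
    return {
        "init_cap": token[:1].isupper(),
        "all_caps": token.isalpha() and token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "has_hyphen": "-" in token,
        "letters_and_digits": bool(re.search(r"[A-Za-z]", token))
                              and bool(re.search(r"[0-9]", token)),
        "greek_letter": token.lower() in {"alpha", "beta", "gamma", "kappa"},
    }

# IL-10 and IL-2 map to identical feature values, so a model can
# generalise from one to the other even for unseen surface forms.
assert ortho_features("IL-10") == ortho_features("IL-2")
```

Because such features abstract away the literal string, they help with sparse surface forms, but, as noted above for the small SOURCE classes, they can also blur distinctions between classes.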