<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1316">
  <Title>Selecting Text Features for Gene Name Classification: from Documents to Terms</Title>
  <Section position="5" start_page="3" end_page="3" type="evalu">
    <SectionTitle>
4 Experiments and discussions
</SectionTitle>
    <Paragraph position="0"> An experimental environment was set up by using the following resources: a) corpus: a set of documents has been obtained by collecting Medline abstracts (NLM, 2003) related to the baker's yeast (S. cerevisiae), resulting in 52,845 abstracts; this set, containing almost 5 million word occurrences, was used as both training and testing corpus.</Paragraph>
    <Paragraph position="1"> b) classification entities: a set of 5007 S. cerevisiae gene names has been retrieved from the</Paragraph>
    <Paragraph position="3"> , which also provided synonyms and aliases of genes; 2975 gene names appearing in the corpus have been used for the classification task.</Paragraph>
    <Paragraph position="4"> c) classification scheme: each gene name has been classified according to a classification scheme based on eleven categories (see Table 1) of the up- null d) training and testing sets: positive examples for each class were split evenly between the training and testing sets, and, also, the number of negative examples in the training set was set equal to the number of positive examples within each class.</Paragraph>
    <Paragraph position="5"> The only exception was the metabolism class, which had far more positive than negatives examples. Therefore, in this case, we have evenly split negative examples between the training and testing sets. Table 1 presents the distribution of positive and negative examples for each class.</Paragraph>
    <Paragraph position="6"> d) SVM engine: for training the multi-class SVM, we used SVM Light package v3.50 (Joachims, 1998) with a linear kernel function with the regulation parameter calculated as avg(&lt;x,x&gt;)  The January 2003 release of the GO ontology was used. A similar classification scheme was used in (Raychaudhuri et al., 2002).</Paragraph>
    <Paragraph position="7"> Features have been generated according to the methods explained in Section 3 (Table 2 shows the number of features generated). As indicated earlier, the experiments have been performed by using either all features or by selecting only those that appeared in at least two documents. As a rule, there were no significant differences in the classification performance between the two.</Paragraph>
    <Paragraph position="8"> feature no. of all features no. of features appearing in &gt;1 docs  To evaluate the classification performance we have firstly generated precision/recall plots for each class. In the majority of classes, terms have demonstrated the best performance (cf. Figures 1 and 2). However, the results have shown a wide disparity in performance across the classes, depending on the size of the training set. The classes with fairly large number of training entities (e.g. metabolism) have been predicted quite accurately (regardless of the features used), while, on the other hand, under-represented classes (e.g. sporulation) performed quite modestly (cf. Figure 1).  using words and terms Comparison between performances on different classes is difficult if the classes contain fairly different ratios of positive/negative examples in the testing sets, as it was the case in our experiments (see Table 1, column testing 1). Therefore, we re-evaluated the results by selecting - for each class the same number of positive and negative examples (see Table 1, column testing 2), so that we could compare relative performance across classes. The results shown in Figure 2 actually indicate which classes are &amp;quot;easier&amp;quot; to learn (only the performance of single-words and terms are presented). To assess the global performance of classification methods, we employed micro-averaging of the precision/recall data presented in Figure 2. In micro-averaging (Yang, 1997), the precision and recall are averaged over the number of entities that are classified (giving, thus, an equal weight to the performance on each gene). In other words, micro-average shows the performance of the classification system on a gene selected randomly from the testing set.</Paragraph>
    <Paragraph position="9"> The comparison of micro-averaging results for words, lemmas and stems has shown that there was no significant difference among them. This outcome matches the results previously reported for the document classification task (Leopold and Kindermann, 2002), which means that there is no need to pre-process documents.</Paragraph>
    <Paragraph position="10"> Figure 3 shows the comparison of micro-averaging plots for terms and lemmas. Terms perform generally much better at lower recall points, while there is just marginal difference between the two at the higher recall points. Very high precision points at lower recall mean that terms may be useful classification features for precise predictions for genes classified with the highest confidence.  lemmas and terms The results obtained by combining terms and words have not shown any improvements over using only terms as classification features. We believe that adding more features has introduced additional noise that derogated the overall performance of terms.</Paragraph>
    <Paragraph position="11"> Finally, Figure 4 presents the comparison of classification results using terms and abstract identifiers. Although PMIDs outperformed terms, we reiterate that - while other features allow learning more general properties that can be applied on other corpora - PMIDs can be only used to classify new terms that appear in a closed training/testing</Paragraph>
  </Section>
class="xml-element"></Paper>