<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2202">
  <Title>Simple Information Extraction (SIE): A Portable and Effective IE System</Title>
  <Section position="8" start_page="12" end_page="14" type="evalu">
    <SectionTitle>
7 Evaluation
</SectionTitle>
    <Paragraph position="0"> In order to demonstrate that SIE is domain and language independent we tested it on several tasks using exactly the same configuration. The tasks and the experimental settings are described in Section 7.1. The results (Section 7.2) show that the adopted filtering technique decreases drastically the computation time while preserving (and sometimes improving) the overall accuracy of the system. null</Paragraph>
    <Section position="1" start_page="12" end_page="13" type="sub_section">
      <SectionTitle>
7.1 The Tasks
</SectionTitle>
      <Paragraph position="0"> SIE was tested on the following IE benchmarks: JNLPBA Shared Task This shared task (Kim et al., 2004) is an open challenge task proposed  al., 2003), annotated with five entity types: DNA, RNA, protein, cell-line, and cell-type. The GE-NIA corpus is split into two partitions: training (492,551 tokens), and test (101,039 tokens). The fraction of positive examples with respect to the total number of tokens in the training set varies from 0.2% to 6%.</Paragraph>
      <Paragraph position="1"> CoNLL 2002 &amp; 2003 Shared Tasks These shared tasks (Tjong Kim Sang, 2002; Tjong Kim Sang and De Meulder, 2003)7 concern language-independent named entity recognition. Four types of named entities are considered: persons (PER), locations (LOC), organizations (ORG) and names of miscellaneous (MISC) entities that do not belong to the previous three groups. SIE was applied to the Dutch and English data sets. The Dutch corpus is divided into three partitions: training and validation (on the whole 258,214 tokens), and test (73,866 tokens). The fraction of positive examples with respect to the total number of tokens in the training set varies from 1.1% to 2%. The English corpus is divided into three partitions: training and validation (on the whole 274,585 tokens), and test (50,425 tokens). The fraction of positive examples with respect to the total number of tokens in the training set varies from 1.6% to 3.3%.</Paragraph>
      <Paragraph position="2"> TERN 2004 The TERN (Time Expression Recognition and Normalization) 2004 Evaluation8 requires systems to detect and normalize temporal expressions occurring in English text (SIE did not address the normalization part of the task). The TERN corpus is divided into two partitions: training (249,295 tokens) and test (72,667 tokens). The fraction of positive examples with respect to the total number of tokens in the training set is about 2.1%.</Paragraph>
      <Paragraph position="3"> Seminar Announcements The Seminar Announcements (SA) collection (Freitag, 1998) consists of 485 electronic bulletin board postings. The purpose of each document in the collection is to announce or relate details of an upcoming talk or seminar. The documents were annotated for four entities: speaker, location, stime, and etime. The corpus is composed by 156,540 tokens. The fraction of positive examples varies from about 1% to  domly partitioned five times into two sets of equal size, training and test (Lavelli et al., 2004). For each partition, learning is performed on the training set and performance is measured on the corresponding test set. The resulting figures are averaged over the five test partitions.</Paragraph>
    </Section>
    <Section position="2" start_page="13" end_page="14" type="sub_section">
      <SectionTitle>
7.2 Results
</SectionTitle>
      <Paragraph position="0"> The experimental results in terms of filtering rate, recall, precision, F1, and computation time for JNLPBA, CoNLL-2002, CoNLL-2003, TERN and SA are given in Tables 5, 6, 7, 8 and 9 respectively.</Paragraph>
      <Paragraph position="1"> To show the differences among filtering strategies for JNLPBA, CoNLL-2002, TERN 2004 we used CC, OR and IC filters, while the results for SA and CoNLL-2003 are reported only for OR filter (which usually produces the best performance).</Paragraph>
      <Paragraph position="2"> For all filters we report results obtained by setting four different values for parameterepsilon1, the maximum value allowed for the Filtering Rate of positive examples. epsilon1 = 0 means that no filter is used.  Precision, F1 and total computation time for TERN.</Paragraph>
      <Paragraph position="3"> The results indicate that both CC and OR do exhibit good performance and are far better than IC in all the tasks. For example, in the JNLPBA data set, OR allows to remove more than 70% of the instances, losing less than 1% of the positive examples. These results pinpoint the importance of using a supervised metric to collect stop words. The results also highlight that both CC and OR are robust against overfitting, because the difference between the filtering rates in the training and test sets is minimal. We also report a significant reduction ofthedataskewness. Table10showsthatalltheIF techniques reduce sensibly the skewness ratio, the ratio between the number of negative and positive examples, on the JNLPBA data set9. As expected, both CC and OR consistently outperform IC.</Paragraph>
      <Paragraph position="4"> The computation time10 reported includes the time to perform the overall process of training and testing the boundary classifiers for each entity11.</Paragraph>
      <Paragraph position="5"> The results indicate that both CC and OR are far superior to IC, allowing a drastic reduction of the  JNLPBA.</Paragraph>
      <Paragraph position="6"> larly convenient when dealing with large data sets. For example, using the CC metric the time required by SIE to perform the JNLPBA task is reduced from 615 to 109 minutes (see Table 5).</Paragraph>
      <Paragraph position="7"> Both OR and CC allow to drastically reduce the computation time and maintain the prediction accuracy12 with small values of epsilon1. Using OR, for example, with epsilon1 = 2.5% on JNLPBA, F1 increases from 66.7% to 67.9%. On the contrary, for CoNLL-2002 and TERN, for epsilon1 &gt; 2.5% and epsilon1 &gt; 1% respectively, the performance of all the filters rapidly declines. The explanation for this behavior is that, for the last two tasks, the difference between the filtering rates on the training and test sets becomes much larger for epsilon1 &gt; 2.5% and epsilon1 &gt; 1%, respectively. That is, the data skewness changes significantly from the training to the test set. It is not surprising that an extremely aggressive filtering step reduces too much the information available to the classifiers, leading the overall 12For JNLPBA, CoNLL 2002 &amp; 2003 and Tern 2004, results are obtained using the official evaluation software made available by the organizers of the tasks.</Paragraph>
      <Paragraph position="8"> performance to decrease.</Paragraph>
      <Paragraph position="9"> SIE achieves results close to the best systems in all tasks13. It is worth noting that state-of-the-art IE systems often exploit external, domain-specific information (e.g. gazetteers (Carreras et al., 2002) and lexical resources (Zhou and Su, 2004)) while SIE adopts exactly the same feature set and does not use any external or task dependent knowledge source.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>