<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2074">
  <Title>ARE: Instance Splitting Strategies for Dependency Relation-based Information Extraction</Title>
  <Section position="8" start_page="576" end_page="577" type="evalu">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"> In order to evaluate the efficiency of our method, we conduct our experiments in 2 domains: MUC4 (Kaufmann, 1992) and MUC6 (Kaufmann, 1995).</Paragraph>
    <Paragraph position="1"> The official corpus of MUC4 is released with MUC3; it covers terrorism in the Latin America region and consists of 1,700 texts. Among them, 1,300 documents belong to the training corpus.</Paragraph>
    <Paragraph position="2"> Testing was done on 25 relevant and 25 irrelevant texts from TST3, plus 25 relevant and 25 irrelevant texts from TST4, as is done in Xiao et al. (2004).</Paragraph>
    <Paragraph position="3"> MUC6 covers news articles in Management Succession domain. Its training corpus consists of 1201 instances, whereas the testing corpus consists of 76 person-ins, 82 person-outs, 123 positions, and 79 organizations. These slots we extracted in order to fill templates on a sentence-by-sentence basis, as is done by Chieu et al. (2002) and Soderland (1999).</Paragraph>
    <Paragraph position="4"> Our experiments were designed to test the effectiveness of both case splitting and action verb promotion. The performance of ARE is compared to both the state-of-art systems and our baseline approach. We use 2 state-of-art systems for MUC4 and 1 system for MUC6. Our baseline system, Anc+rel, utilizes only anchors and relations without category splitting as described in Section 3. For our ARE system with case splitting, we present the results on Overall corpus, as well as separate results on Simple, Average and Hard categories.</Paragraph>
    <Paragraph position="5"> The Overall performance of ARE represents the result for all the categories combined together.</Paragraph>
    <Paragraph position="6"> Additionally, we test the impact of the action promotion (in the right column) for the average and hard categories.</Paragraph>
    <Paragraph position="7">  The comparative results are presented in Table 4 and Table 5 for MUC4 and MUC6, respectively.</Paragraph>
    <Paragraph position="8"> First, we review our experimental results on MUC4 corpus without promotion (left column) before proceeding to the right column.</Paragraph>
    <Paragraph position="9"> a) From the results on Table 4 we observe that our baseline approach Anc+rel outperforms all the state-of-art systems. It demonstrates that both anchors and relations are useful. Anchors allow us to group entities according to their semantic meanings and thus to select of the most prominent candidates. Relations allow us to capture more invariant representation of instances. However, a sentence may contain very few high-quality relations. It implies that the relations ranking step is fuzzy in nature. In addition, we noticed that some anchor cues may be missing, whereas the other anchor types may be represented by several anchor cues. All these factors lead only to moderate improvement in performance, especially in comparison with GRID system.</Paragraph>
    <Paragraph position="10"> b) Overall, the splitting of instances into categories turned out to be useful. Due to the application of specific strategies the performance increased by 1% over the baseline. However, the large dominance of the hard cases (65%) made this improvement modest. null c) We notice that the amount of variations for connecting anchor cues in the Simple category is relatively small. Therefore, the overall performance for this case reaches F  =82%. The main errors here come from missing anchors resulting partly from mistakes in such component as NE detection.</Paragraph>
    <Paragraph position="11"> d) The performance in the Average category is F  =67%. It is lower than that for the simple category because of higher variability in relations and negative influence of support verbs. For example, for excerpt such as &amp;quot;X investigated murder of Y&amp;quot;, the processing tends to make mistake without the analysis of semantic value of support verb 'investigated'. null e) Hard category achieves the lowest performance of F  =51% among all the categories. Since for this category we have to rely mostly on anchors, the problem arises if these anchors provide the wrong clues. It happens if some of them are missing or are wrongly extracted. The other cause of mistakes is when ARE finds several anchor cues which belong to the same type.</Paragraph>
    <Paragraph position="12"> Additional usage of promotion strategies allowed us to improve the performance further.</Paragraph>
    <Paragraph position="13"> f) Overall, the addition of promotion strategy enables the system to further boost the performance to F  =60%. It means that the promotion strategy is useful, especially for the average case. The improvement in comparison to the state-of-art system GRID is about 3%.</Paragraph>
    <Paragraph position="14"> g) It achieved an F  =69%, which is an improvement of 2%, for the Average category. It implies that the analysis of support verbs helps in revealing the differences between the instances such as &amp;quot;X was involved in kidnapping of Y&amp;quot; and &amp;quot;X reported kidnapping of Y&amp;quot;.</Paragraph>
    <Paragraph position="15"> h) The results in the Hard category improved moderately to F  =52%. The reason for the improvement is that more anchor cues are captured after the promotion. Still, there are 2 types of common mis- null takes: 1) multiple or missing anchor cues of the same type and 2) anchors can be spread across several sentences or several clauses in the same sentence. null  For the MUC6 results given in Table 5, we observe that the overall improvement in performance of ARE system over Chieu et al.'02 is 6%. The trends of results for MUC6 are similar to that in MUC4. However, there are few important differences. First, 45% of instances in MUC6 fall into the Simple category, therefore this category dominates. The reason for this is that the terminologies used in Management Succession domain are more stable in comparison to the Terrorism domain. Second, there are more anchor types for this case and therefore the promotion strategy is applicable also to the simple case. Third, there is no improvement in performance for the Hard category. We believe the primary reason for it is that more stable language patterns are used in MUC6. Therefore, dependency relations are also more stable in MUC6 and the promotion strategy is not very important. Similar to MUC4, there are problems of missing anchors and mistakes in dependency parsing.</Paragraph>
  </Section>
class="xml-element"></Paper>