<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2038">
  <Title>A Comparison of Tagging Strategies for Statistical Information Extraction</Title>
  <Section position="3" start_page="0" end_page="149" type="metho">
    <SectionTitle>
2 Modeling Information Extraction as a Token Classification Task
</SectionTitle>
    <Paragraph position="0"> There are multiple approaches that model IE as a token classification task, employing standard classification algorithms. These systems split a text into a series of tokens and invoke a trainable classifier to decide for each token whether or not it is part of a slot filler of a certain type. To re-assemble the classified tokens into multi-token slot fillers, various tagging strategies can be used. The trivial (Triv) strategy would be to use a single class for each slot type and an additional &amp;quot;O&amp;quot; class for all other tokens. However, this causes problems if two entities of the same type immediately follow each other, e.g. if the names of two speakers are separated by a linebreak only. In such a case, both names would be collapsed into a single entity, since the trivial strategy lacks a way to mark the beginning of the second entity.</Paragraph>
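As an illustrative sketch (not the paper's implementation), the following Python decoder shows why Triv tagging collapses adjacent same-type entities: contiguous non-O tokens of one type can only be read back as a single filler. Token and tag values are hypothetical.

```python
def decode_triv(tokens, tags):
    """Re-assemble slot fillers from Triv tags (one class per slot type plus O).
    Adjacent same-type tokens merge, since nothing marks a filler boundary."""
    fillers, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag == "O":
            if current is not None:
                fillers.append(current)
            current = None
        elif current is not None and current[0] == tag:
            current[1].append(tok)          # same type continues: merged!
        else:
            if current is not None:
                fillers.append(current)
            current = (tag, [tok])
    if current is not None:
        fillers.append(current)
    return [(typ, " ".join(words)) for typ, words in fillers]

# Two different speakers, adjacent because only a linebreak separated them:
print(decode_triv(["John", "Smith", "Mary", "Jones"], ["speaker"] * 4))
# [('speaker', 'John Smith Mary Jones')] -- one merged entity instead of two
```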
    <Paragraph position="1"> For this reason (as well as for improved classification accuracy), various more complex strategies are employed that use distinct classes to mark the first and/or last token of a slot filler. The two variations of IOB tagging are probably the most common: the variant usually called IOB2 classifies each token as the beginning of a slot filler of a certain type (B-type), as a continuation of the previously started slot filler, if any (I-type), or as not belonging to any slot filler (O). The IOB1 strategy differs from IOB2 in using B-type only if necessary to avoid ambiguity (i.e. if two same-type entities immediately follow each other); otherwise I-type is used even at the beginning of slot fillers. While the Triv strategy uses only n+1 classes for n slot types, IOB tagging requires 2n+1 classes.</Paragraph>
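The two IOB variants can be sketched as follows (hypothetical helper functions; slot fillers are given as half-open (start, end, type) spans):

```python
def encode_iob2(length, spans):
    """IOB2: every filler starts with B-type; inner tokens are I-type."""
    tags = ["O"] * length
    for start, end, typ in spans:          # end is exclusive
        tags[start] = "B-" + typ
        for i in range(start + 1, end):
            tags[i] = "I-" + typ
    return tags

def encode_iob1(length, spans):
    """IOB1: B-type only where needed, i.e. directly after a same-type filler."""
    tags = encode_iob2(length, spans)
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            prev = tags[i - 1] if i > 0 else "O"
            same_type_before = prev != "O" and prev[2:] == tag[2:]
            if not same_type_before:
                tags[i] = "I-" + tag[2:]   # no ambiguity, so I-type suffices
    return tags

spans = [(0, 2, "speaker"), (2, 4, "speaker")]   # two adjacent speaker names
print(encode_iob2(4, spans))  # ['B-speaker', 'I-speaker', 'B-speaker', 'I-speaker']
print(encode_iob1(4, spans))  # ['I-speaker', 'I-speaker', 'B-speaker', 'I-speaker']
```

Note that IOB1 still keeps the B-type on the third token, since dropping it would merge the two adjacent fillers.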
    <Paragraph position="2"> BIE tagging differs from IOB in using an additional class for the last token of each slot filler. One class is used for the first token of a slot filler (B-type), one for inner tokens (I-type) and another one for the last token (E-type). A fourth class BE-type is used to mark slot fillers consisting of a single token (which is thus both begin and end). BIE requires 4n+1 classes.</Paragraph>
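The BIE scheme can be sketched analogously (hypothetical encoder; spans are half-open (start, end, type) triples):

```python
def encode_bie(length, spans):
    """BIE: B-type, I-type and E-type for the first, inner and last tokens
    of a filler; BE-type for fillers consisting of a single token."""
    tags = ["O"] * length
    for start, end, typ in spans:          # end is exclusive
        if end - start == 1:
            tags[start] = "BE-" + typ      # single token: both begin and end
        else:
            tags[start] = "B-" + typ
            tags[end - 1] = "E-" + typ
            for i in range(start + 1, end - 1):
                tags[i] = "I-" + typ
    return tags

print(encode_bie(4, [(0, 3, "stime"), (3, 4, "etime")]))
# ['B-stime', 'I-stime', 'E-stime', 'BE-etime']
```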
    <Paragraph position="3"> A disadvantage of the BIE strategy is the high number of classes it uses (twice as many as IOB1/IOB2). This can be addressed by introducing a new strategy, BIA (or Begin/After tagging). Instead of using a separate class for the last token of a slot filler, BIA marks the first token after a slot filler as A-type (unless it is the beginning of a new slot filler). Beginning (B-type) and continuation (I-type) of slot fillers are marked in the same way as by IOB2. BIA requires 3n + 1 classes, n fewer than BIE, since no special treatment of single-token slot fillers is necessary. The strategies discussed so far require only a single classification decision for each token. Another option is to use two separate classifiers, one for finding the beginning and another one for finding the end of slot fillers. Begin/End (BE) tagging requires n + 1 classes for each of the two classifiers (B-type + O for the first, E-type + O for the second). In this case, there is no distinction between inner and outer (other) tokens. Complete slot fillers are found by combining the most suitable begin/end pairs of the same type, e.g. by taking the length distribution of slots into account. Table 1 lists the properties of all strategies side by side.</Paragraph>
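A sketch of BIA encoding, plus the per-strategy class counts stated above (hypothetical helpers; spans are half-open (start, end, type) triples):

```python
def encode_bia(length, spans):
    """BIA: B-type/I-type as in IOB2, plus A-type on the first token
    after a filler, unless that token itself begins a new filler."""
    tags = ["O"] * length
    for start, end, typ in spans:          # end is exclusive
        tags[start] = "B-" + typ
        for i in range(start + 1, end):
            tags[i] = "I-" + typ
    for start, end, typ in spans:
        if end != length and tags[end] == "O":
            tags[end] = "A-" + typ         # marks the filler end implicitly
    return tags

def num_classes(strategy, n):
    """Total classes for n slot types; BE sums its two classifiers."""
    return {"Triv": n + 1, "IOB1": 2 * n + 1, "IOB2": 2 * n + 1,
            "BIE": 4 * n + 1, "BIA": 3 * n + 1, "BE": 2 * (n + 1)}[strategy]

print(encode_bia(5, [(0, 2, "speaker"), (2, 4, "speaker")]))
# ['B-speaker', 'I-speaker', 'B-speaker', 'I-speaker', 'A-speaker']
```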
  </Section>
  <Section position="4" start_page="149" end_page="150" type="metho">
    <SectionTitle>
3 Classification Algorithm and
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="149" end_page="150" type="sub_section">
      <SectionTitle>
Experimental Setup
</SectionTitle>
      <Paragraph position="0"> Our generalized IE system allows employing any classification algorithm with any tagging strategy and any context representation, provided that a suitable implementation or adapter exists. For this paper, we have used the Winnow (Littlestone, 1988) classification algorithm and the context representation described in (Siefkes, 2005), varying only the tagging strategy. An advantage of Winnow is that it supports incremental training as well as batch training. For many &amp;quot;real-life&amp;quot; applications, automatic extractions will be checked and corrected by a human revisor, as automatically extracted data will always contain errors and gaps that can be detected by human judgment only. This correction process continually provides additional training data, but the usual batch-trainable algorithms are not well suited to integrating new data, since full retraining takes a long time.</Paragraph>
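The paper does not spell out Winnow's details, but a minimal sketch of the classic algorithm (Littlestone, 1988) shows why incremental training is cheap: each mistake triggers only a multiplicative update of the weights of the currently active features. Parameter values here are illustrative defaults, not those of the actual system.

```python
class Winnow:
    """Minimal binary Winnow over sparse boolean features."""
    def __init__(self, n_features, alpha=2.0):
        self.alpha = alpha
        self.threshold = n_features / 2.0  # a common default; a free parameter
        self.w = [1.0] * n_features

    def predict(self, active):
        # active: indices of the features that fire for this token
        return sum(self.w[i] for i in active) >= self.threshold

    def train_one(self, active, label):
        """One incremental step: promote on a missed positive, demote on a
        false positive; correct predictions leave the weights untouched."""
        pred = self.predict(active)
        if label and not pred:
            for i in active:
                self.w[i] *= self.alpha    # promotion
        elif pred and not label:
            for i in active:
                self.w[i] /= self.alpha    # demotion
```

Because an update touches only the weights of the active features, corrections from a human revisor can be folded in on the fly, without the full retraining a batch learner would need.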
      <Paragraph position="1"> We have compared the described tagging strategies on two corpora that are frequently used to evaluate IE systems, CMU Seminar Announcements and Corporate Acquisitions.1 For both corpora, we used the standard setup: 50/50 training/evaluation split, averaging results over five (Seminar) or ten (Acquisitions) random splits, &amp;quot;one answer per slot&amp;quot; (cf. Lavelli et al. (2004)). Extraction results are evaluated in the usual way by calculating precision P and recall R of the extracted slot fillers and combining them in the F-measure, the harmonic mean of precision and recall: F = (2 x P x R) / (P + R).2 For significance testing, we applied a paired two-tailed Student's T-test on the F-measure results, without assuming the variance of the two samples to be equal.</Paragraph>
      <Paragraph position="2"> 2 [Footnote, beginning lost:] ... classification accuracy due to the very unbalanced class distribution among tokens. In the Seminar Announcements corpus, our tokenization schema yields 139,021 tokens, only 9820 of which are part of slot fillers. Thus most strategies could already reach an accuracy of 93% by always predicting the O class. Also, correctly extracting slot fillers is the goal of IE--a higher token classification accuracy won't be of any use if information extraction performance suffers.</Paragraph>
      <Paragraph position="3"> [Misplaced results-table row; strategy columns IOB1, Triv, BIE, BIA, BE; slot etime: o(81.6%,-), o(85.3%,-), -(98.4%,-), o(68.6%,+), o(90.6%,-)]</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="150" end_page="151" type="metho">
    <SectionTitle>
4 Comparison Results
</SectionTitle>
    <Paragraph position="0"> Table 2 lists the F-measure results (in percent) reached for both corpora using batch training.</Paragraph>
    <Paragraph position="1"> Incremental results have been omitted due to lack of space--they are generally slightly worse than batch results, but in many cases the difference is small. For the Corporate Acquisitions, the batch results of the best strategies (IOB2 and BIA) are better than any other published results we are aware of; for the Seminar Announcements, they are only beaten by the ELIE system (Finn and Kushmerick, 2004).3 [Footnote 3: cf. Siefkes and Siniakov, 2005, Sec. 6.5]</Paragraph>
    <Paragraph position="2"> Tables 3 and 4 analyze the performance of each tagging strategy for both training regimes, using the popular IOB2 strategy as a baseline.</Paragraph>
    <Paragraph position="3"> The first item in each cell indicates whether the strategy performs significantly better (&amp;quot;+&amp;quot;) or worse (&amp;quot;-&amp;quot;) than IOB2 or whether the performance difference is not significant at the 95% level (&amp;quot;o&amp;quot;). In brackets, we show the significance of the comparison and whether the results are better or worse when significance is ignored.</Paragraph>
    <Paragraph position="4"> Considering these results, we see that the IOB2 and BIA strategies are best. No strategy is able to significantly beat the IOB2 strategy on any slot, with neither incremental nor batch training. The newly introduced BIA strategy is the only one that is able to compete with IOB2 on all slots. The IOB1 and Triv strategies come close, being significantly worse than IOB2 only for one or two slots. The two-classifier BE strategy is weaker, being significantly outperformed on three (incremental) or four (batch) slots. The worst results are reached by the BIE strategy, where the difference is significant in about half of all cases. The good performance of BIA is notable, since this strategy is new and, to our knowledge, has not been used before.</Paragraph>
    <Paragraph position="5"> We would have expected the Triv strategy to be weaker, considering how simple it is.</Paragraph>
  </Section>
</Paper>