<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-2010">
  <Title>Named Entity Recognition as a House of Cards: Classifier Stacking</Title>
  <Section position="3" start_page="0" end_page="70" type="intro">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> al., 1998), an important feature for the NER task is information relative to word capitalization. In an approach similar to Zhou and Su (2002), we extracted for each word a 2-byte code, as summarized in Table 1. The first byte specifies the capitalization of the word (first letter capital, etc), while the second specifies whether the word is present in the dictionary in lower case, upper case, both or neither forms. These two codes are extracted in order to offer both a way of backing-off in sparse data cases (unknown words) and a way of encouraging generalization. Table 2 shows the performance of the fnTBL (Ngai and Florian, 2001) and Snow systems when using the capitalization information, both systems displaying considerably better performance.</Paragraph>
    <Section position="1" start_page="0" end_page="1" type="sub_section">
      <SectionTitle>
2.2 Transformation-Based Learning
</SectionTitle>
      <Paragraph position="0"> Transformation-based learning (TBL henceforth) is an error-driven machine learning technique which works by first assigning an initial classification to the data, and then automatically proposing, evaluating and selecting the transformations that maximally decrease the number of errors. Each such transformation, or rule, consists of a predicate and a target. In our implementation of TBL - fnTBL predicates consist of a conjunction of atomic predicates, such as feature identity (e.g. DBD3D6CS</Paragraph>
      <Paragraph position="2"> CV), etc.</Paragraph>
      <Paragraph position="3"> TBL has some attractive qualities that make it suitable for the language-related tasks: it can automatically integrate heterogenous types of knowledge, without the need for explicit modeling (similar to Snow, Maximum Entropy, decision trees, etc); it is error-driven, therefore directly minimizes the  Spanish development data ultimate evaluation measure: the error rate; and it has an inherently dynamic behavior  . TBL has been previously applied to the English NER task (Aberdeen et al., 1995), with good results. The fnTBL-based NER system is designed in the same way as Brill's POS tagger (Brill, 1995), consisting of a morphological stage, where unknown words' chunks are guessed based on their morphological and capitalization representation, followed by a contextual stage, in which the full interaction between the words' features is leveraged for learning. The feature templates used are based on a combination of word, chunk and capitalization information of words in a 7-word window around the target word. The entire template list (133 templates) will be made available from the author's web page after the conclusion of the shared task.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="70" type="sub_section">
      <SectionTitle>
2.3 Snow
</SectionTitle>
      <Paragraph position="0"> Snow (Sparse Network of Winnows) is an architecture for error-driven machine learning, consisting of a sparse network of linear separator units over a common predefined or incrementally learned feature space. The system assigns weights to each feature, and iteratively updates these weights in such a way that the misclassification error is minimized.</Paragraph>
      <Paragraph position="1"> For more details on Snow's architecture, please refer to Munoz et al. (1999).</Paragraph>
      <Paragraph position="2"> Table 2 presents the results obtained by Snow on the NER task, when using the same methodology from Munoz et al. (1999), with the their templates  and with the same templates as fnTBL.</Paragraph>
      <Paragraph position="3">  The quality of chunk tags evolves as the algorithm progresses; there is no mismatch between the quality of the surrounding chunks during training and testing.</Paragraph>
      <Paragraph position="4">  In this experiment, we used the feature patterns described in Munoz et al. (1999): a combination of up to 2 words in a 3-word window around the target word and a combination of up to 4 chunks in a 7-word window around the target word. All throughout the paper, Snow's default parameters were used.</Paragraph>
    </Section>
    <Section position="3" start_page="70" end_page="70" type="sub_section">
      <SectionTitle>
2.4 Stacking Classifiers
</SectionTitle>
      <Paragraph position="0"> Both the fnTBL and the Snow methods have strengths and weaknesses: AF fnTBL's strength is represented by its dynamic modeling of chunk tags - by starting in a simple state and using complex feature interactions, it is able to reach a reasonable end-state. Its weakness consists in its acute myopia: the optimization is done greedily for the local context, and the feature interaction is observed only in the order in which the rules are selected. null AF Snow's strength consists in its ability to model interactions between the all features associated with a sample. However, in order to obtain good results, the system needs reliable contextual information. Since the approach is not dynamic by nature, good initial chunk classifications are needed.</Paragraph>
      <Paragraph position="1"> One way to address both weaknesses is to combine the two approaches through stacking, by applying Snow on fnTBL's output. This allows Snow to have access to reasonably reliable contextual information, and also allows the output of fnTBL to be corrected for multiple feature interaction. This stacking approach has an intuitive interpretation: first, the corpus is dynamically labeled using the most important features through fnTBL rules (coarse-grained optimization), and then is fine-grained tuned through a few full-feature-interaction iterations of Snow.</Paragraph>
      <Paragraph position="2"> Table 2 contrasts stacking Snow and fnTBL with running either fnTBL or Snow in isolation - an improvement of 1.6 F-measure points is obtained when stacking is applied. Interestingly, as shown in Figure 1, the relation between performance and Snowiteration number is not linear: the system initially takes a hit as it moves out of the local fnTBL maximum, but then proceeds to increase its performance,  on the development sets finally converging after 10 iterations to a F-measure value of 73.49.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>