<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0421">
  <Title>A Simple Named Entity Extractor using AdaBoost</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Learning the Decisions
</SectionTitle>
    <Paragraph position="0"> We use AdaBoost with confidence rated predictions as learning algorithm for the classifiers involved in the system. More particularly, the basic binary version has been used to learn the I, O, and B classifiers for the NER module, and the multiclass multilabel extension (namely AdaBoost.MH) has been used to perform entity classification. null The idea of these algorithms is to learn an accurate strong classifier by linearly combining, in a weighted voting scheme, many simple and moderately-accurate base classifiers or rules. Each base rule is learned sequentially by presenting the base learning algorithm a weighting over the examples, which is dynamically adjusted depending on the behavior of the previously learned rules.</Paragraph>
    <Paragraph position="1"> AdaBoost has been applied, with significant success, to a number of problems in different areas, including NLP tasks (Schapire, 2002). We refer the reader to (Schapire and Singer, 1999) for details about the general algorithms (for both the binary and multiclass variants), and (Carreras and M`arquez, 2001; Carreras et al., 2002b) for particular applications to NLP domains.</Paragraph>
    <Paragraph position="2"> In our setting, the boosting algorithm combines several small fixed-depth decision trees, as base rules. Each branch of a tree is, in fact, a conjunction of binary features, allowing the strong boosting classifier to work with complex and expressive rules.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Feature Representation
</SectionTitle>
    <Paragraph position="0"> A window W anchored in a word w represents the local context of w used by a classifier to make a decision on that word. In the window, each word around w is codified with a set of primitive features, together with its relative position to w. Each primitive feature with each relative position and each possible value forms a final binary feature for the classifier (e.g., &amp;quot;the word form at position(-2) is street&amp;quot;). The kind of information coded in those features may be grouped in the following kinds:  * Lexical: Word forms and their position in the window (e.g., W(3)=&amp;quot;bank&amp;quot;). When available, word lemmas and their position in the window.</Paragraph>
    <Paragraph position="1"> * Syntactic: Part-of-Speech tags and Chunk tags.</Paragraph>
    <Paragraph position="2"> * Orthographic: Word properties with regard to how is it capitalized (initial-caps, all-caps), the kind  of characters that form the word (contains-digits, all-digits, alphanumeric, roman-number), the presence of punctuation marks (contains-dots, containshyphen, acronym), single character patterns (lonelyinitial, punctuation-mark, single-char), or the membership of the word to a predefined class (functionalword1), or pattern (URL).</Paragraph>
    <Paragraph position="3">  * Affixes: The prefixes and suffixes of the word (up to 4 characters).</Paragraph>
    <Paragraph position="4"> * Word Type Patterns: Type pattern of consecutive  words in the context. The type of a word is either functional (f), capitalized (C), lowercased (l), punctuation mark (.), quote (') or other (x). For instance, the word type pattern for the phrase &amp;quot;John Smith payed 3 euros&amp;quot; would be CClxl.</Paragraph>
    <Paragraph position="5"> * Left Predictions: The {B,I,O} tags being predicted in the current classification (at recognition stage), or the predicted category for entities in left context (at classification stage).</Paragraph>
    <Paragraph position="6"> * Bag-of-Words: Form of the words in the window, without considering positions (e.g., &amp;quot;bank&amp;quot;[?] W). * Trigger Words: Triggering properties of window words. An external list is used to determine whether a word may trigger a certain Named Entity (NE) class (e.g., &amp;quot;president&amp;quot; may trigger class PER). * Gazetteer Features: Gazetteer information for window words. An external gazetteer is used to determine possible classes for each word.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 The NER Module
</SectionTitle>
    <Paragraph position="0"> The Named Entity recognition task is performed as a combination of local classifiers which test simple decisions on each word in the text.</Paragraph>
    <Paragraph position="1"> According to a BIO labelling scheme, each word is tagged as either the beginning of a NE (B tag), a word inside a NE (I tag), or a word outside a NE (O tag).</Paragraph>
    <Paragraph position="2"> We use three binary classifiers for the tagging, one corresponding to each tag. All the words in the train set are used as training examples, applying a one-vs-all binarization. When tagging, the sentence is processed from left to right, greedily selecting for each word the tag with maximum confidence that is coherent with the current solution (e.g., O tags cannot be followed by I tags). Despite its simplicity, the greedy BIO tagging performed very well for the NER task. Other more sophisticated representations and tagging schemes, studied in the past edition (Carreras et al., 2002a), did not improve the performance at all.</Paragraph>
    <Paragraph position="3"> The three classifiers use the same information to codify examples. According to the information types introduced in section 3, all the following features are considered for each target word: lexical, syntactic, orthographic, and affixes in a {-3,+3} window; left predictions in a {-3,-1} 1Functional words are determiners and prepositions which typically appear inside NEs.</Paragraph>
    <Paragraph position="4"> window; and all the word type patterns that cover the 0 position in a {-3,+3} window.</Paragraph>
    <Paragraph position="5"> The semantic information represented by the rest of features, namely bag-of-words, trigger words, and gazetteer features, did not help the recognition of Named Entities, and therefore was not used.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 The NEC Module
</SectionTitle>
    <Paragraph position="0"> NEC is regarded as a classification task, consisting of assigning the NE type to each already recognized NE. In contrast to the last year system, the problem has not been binarized and treated in an ECOC (error correcting output codes) combination scheme. Instead, the multiclass multilabel AdaBoost.MH algorithm has been used. The reason is that although ECOC provides slightly better results, its computational cost is also much higher than the required for AdaBoost.MH.</Paragraph>
    <Paragraph position="1"> The algorithm has been employed with different parameterizations, by modeling NEC either as a three-class classification problem (in which MISC is selected only when the entity is negatively classified as PER, ORG and LOC) or as a four-class problem, in which MISC is just one more class. The latter turned out to be the best choice (with very significant differences).</Paragraph>
    <Paragraph position="2"> The window information described in section 3 is used in the NEC module computing all features for a {-3,+3} window around the NE being classified, except for the bag-of-words group, for which a {-5,+5} window is used. Information relative to orthographic, left predictions, and bag-of-words features is straight-forwardly coded as described above, but other requires further detail: * Lexical features: Apart from word form and lemma for each window position, two additional binary features are used: One is satisfied when the focus NE form and lemma coincide exactly, and the other when they coincide after turning both of them into lowercase.</Paragraph>
    <Paragraph position="3">  words. Prefixes and suffixes of the NE being classified and of its internal components (e.g., considering the entity &amp;quot;People 's Daily&amp;quot;, &amp;quot;ly&amp;quot; is taken as a suffix of the NE, &amp;quot;ple&amp;quot; is taken as a suffix of the first internal word, etc.).</Paragraph>
    <Paragraph position="4"> * Trigger Words: Triggering properties of window words (e.g., W(3).trig=PER). Triggering properties of components of the NE being classified (e.g., for the entity &amp;quot;Bank of England&amp;quot; we could have a feature NE(1).trig=ORG). Context patterns to the left of the NE, where each word is marked with its triggering properties, or with a functional-word tag if appropriate (e.g., the phrase &amp;quot;the president of United States&amp;quot;, would produce the pattern f ORG f for the NE &amp;quot;United States&amp;quot;, assuming that the word &amp;quot;president&amp;quot; is listed as a possible trigger for ORG). * Gazetteer Features: Gazetteer information for the NE being classified and for its components (e.g., for the entity &amp;quot;Bank of England&amp;quot;, features NE(3).gaz=LOC and NE.gaz=ORG would be activated if &amp;quot;England&amp;quot; is found in the gazetteer as LOC and &amp;quot;Bank of England&amp;quot; as ORG, respectively.</Paragraph>
    <Paragraph position="5"> * Additionally, binary features encoding the length in words of the NE being classified are also used.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Experimental Setting
</SectionTitle>
    <Paragraph position="0"> The list of functional words for the task has been automatically constructed using the training set. The lowercased words inside a NE that appeared more than 3 times were selected as functional words for the language.</Paragraph>
    <Paragraph position="1"> Similarly, a gazetteer was constructed with the NEs in the training set. When training, only a random 40% of the entries in the gazetteer were considered. Moreover, we used external knowledge in the form of a list of trigger words for NEs and an external gazetteer. These knowledge sources are the same that we used in the last year competition for Spanish NEE. The entries of the triggerword list were linked to the Spanish WordNet, so they have been directly translated by picking the corresponding synsets of the English WordNet. The gazetteer has been left unchanged, assuming interlinguality of most of the entries. The gazetteer provided by the CoNLL-2003 organization has not been used in the work reported in this paper.</Paragraph>
    <Paragraph position="2"> In all cases, a preprocess of attribute filtering was performed in order to avoid overfitting and to speed-up learning. All features that occur less than 3 times in the training corpus were discarded.</Paragraph>
    <Paragraph position="3"> For each classification problem we trained the corresponding AdaBoost classifiers, learning up to 4,000 base decision trees per classifier, with depths ranging from 1 (decision stumps) to 4. The depth of the base rules and the number of rounds were directly optimized on the development set. The set of unlabelled examples provided by the organization was not used in this work.</Paragraph>
  </Section>
class="xml-element"></Paper>