<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-2007">
  <Title>Noun Phrase Recognition by System Combination</Title>
  <Section position="4" start_page="0" end_page="51" type="intro">
    <SectionTitle>
2 Methods and experiments
</SectionTitle>
    <Paragraph position="0"> In this section we start with a description of our task: recognizing noun phrases. After this we introduce the different data representations we use and our machine learning algorithms. We conclude with an outline of techniques for combining classifier results.</Paragraph>
    <Section position="1" start_page="0" end_page="50" type="sub_section">
      <SectionTitle>
2.1 Task description
</SectionTitle>
      <Paragraph position="0"> Noun phrase recognition can be divided in two tasks: recognizing base noun phrases and recognizing arbitrary noun phrases. Base noun phrases (baseNPs) are noun phrases which do not contain another noun phrase. For example, the sentence In \[ early trading \] in \[ Hong Kong \] \[ Monday \] , \[ gold \] was quoted at \[ $ 366.50 \] \[ an ounce \] .</Paragraph>
      <Paragraph position="1"> contains six baseNPs (marked as phrases between square brackets). The phrase $ 366.50 an ounce is a noun phrase as well. However, it is not a baseNP since it contains two other noun phrases.</Paragraph>
      <Paragraph position="2"> Two baseNP data sets have been put forward by (Ramshaw and Marcus, 1995). The main data set consist of four sections (15-18) of the Wall Street Journal (WSJ) part of the Penn Treebank (Marcus et al., 1993) as training material and one section (20) as test material 1. The baseNPs in this data are slightly different from the ones that can be derived from the Treebank, most notably in the attachment of genitive markers.</Paragraph>
      <Paragraph position="3"> The recognition task involving arbitrary noun phrases attempts to find both baseNPs and noun phrases that contain other noun phrases. A standard data set for this task was put forward at the CoNLL-99 workshop. It consist on the same parts of the Penn Treebank as the main baseNP data set: WSJ sections 15-18 as training data and section 20 as test data 2. The noun phrases in this data set are the same as in the Treebank and therefore the baseNPs in this data set are slightly different from the ones in the (Ramshaw and Marcus, 1995) data sets.</Paragraph>
      <Paragraph position="4"> In both tasks, performance is measured with three scores. First, with the percentage of detected noun phrases that are correct (precision). Second, with the percentage of noun phrases in the data that were found by the classifier (recall). And third,  with the FZ=I rate which is equal to (2*precision*recall)/(precision+recall). The latter rate has been used as the target for optimization.</Paragraph>
    </Section>
    <Section position="2" start_page="50" end_page="50" type="sub_section">
      <SectionTitle>
2.2 Data representation
</SectionTitle>
      <Paragraph position="0"> In our example sentence in section 2.1, noun phrases are represented by bracket structures. Both (Mufioz et al., 1999) and (Tjong Kim Sang and Veenstra, 1999) have shown how classifiers can process bracket structures. One classifier can be trained to recognize open brackets (O) while another will process close brackets (C). Their results can be converted to baseNPs by making pairs of open and close brackets with large probability scores (Mufioz et al., 1999) or by regarding only the shortest phrases between open and close brackets as baseNPs (Tjong Kim Sang and Veenstra, 1999). We have used the bracket representation (O+C) in combination with the second baseNP construction method.</Paragraph>
      <Paragraph position="1"> An alternative representation for baseNPs has been put forward by (Ramshaw and Marcus, 1995).</Paragraph>
      <Paragraph position="2"> They have defined baseNP recognition as a tagging task: words can be inside a baseNP (1) or outside of baseNPs (O). In the case that one baseNP immediately follows another baseNP, the first word in the second baseNP receives tag B. Example: Ino earlyi tradingr ino Hongl Kongz MondayB ,o goldz waso quotedo ato $r 366.50z anB ounce/ -o This set of three tags is sufficient for encoding baseNP structures since these structures are non-recursive and nonoverlapping.</Paragraph>
      <Paragraph position="3"> (Tjong Kim Sang and Veenstra, 1999) have presented three variants of this tagging representation. First, the B tag can be used for the first word of every noun phrase (IOB2 representation). Second, instead of the B tag an E tag can be used to mark the last word of a baseNP immediately before another baseNP (IOE1). And third, the E tag can be used for every noun phrase final word (IOE2). They have used the (Ramshaw and Marcus, 1995) representation as well (IOB1). We will use these four tagging representations as well as the O+C representation.</Paragraph>
    </Section>
    <Section position="3" start_page="50" end_page="50" type="sub_section">
      <SectionTitle>
2.3 Machine learning algorithms
</SectionTitle>
      <Paragraph position="0"> We have used the memory-based learning algorithm IBI-IG which is part of TiMBL package (Daelemans et al., 1999b). In memory-based learning the training data is stored and a new item is classified by the most frequent classification among training items which are closest to this new item. Data items are represented as sets of feature-value pairs. In IBI-IG each feature receives a weight which is based on the amount of information which it provides for computing the classification of the items in the training data. These feature weights are used for computing the distance between a pair of data items (Daelemans et al., 1999b). ml-IG has been used successfully on a large variety of natural language processing tasks.</Paragraph>
      <Paragraph position="1"> Beside IBI-IG, we have used IGTREE in the combination experiments. IGTREE is a decision tree variant of II31-IG (Daelemans et al., 1999b). It uses the same feature weight method as IBI-IG. Data items are stored in a tree with the most important features close to the root node. A new item is classified by traveling down from the root node until a leaf node is reached or no branch is available for the current feature value. The most frequent classification of the current node will be chosen.</Paragraph>
    </Section>
    <Section position="4" start_page="50" end_page="51" type="sub_section">
      <SectionTitle>
2.4 Combination techniques
</SectionTitle>
      <Paragraph position="0"> Our experiments will result in different classifications of the data and we need to find out how to combine these. For this purpose we have evaluated different voting mechanisms, effectively the voting methods as described in (Van Halteren et al., 1998).</Paragraph>
      <Paragraph position="1"> All combination methods assign some weight to the results of the individual classifier. For each input token, they pick the classification score with the highest total score. For example, if five classifiers have weights 0.9, 0.4, 0.8, 0.6 and 0.6 respectively and they classify some token as npstart, null, npstart, null and null, then the combination method will pick npstart since it has a higher total score (1.7) than null (1.6). The values of the weights are usually estimated by processing a part of the training data, the tuning data, which has been kept separate as training data for the combination process.</Paragraph>
      <Paragraph position="2"> In the first voting method, each of the five classitiers receives the same weight (majority). The second method regards as the weight of each individual classification algorithm its accuracy on the tuning data (TotPrecision). The third voting method computes the precision of each assigned tag per classifier and uses this value as a weight for the classifier in those cases that it chooses the tag (TagPrecision).</Paragraph>
      <Paragraph position="3"> The fourth method uses the tag precision weights as well but it subtracts from them the recall values of the competing classifier results. Finally, the fifth method uses not only a weight for the current classification but it also computes weights for other possible classifications. The other classifications are determined by examining the tuning data and registering the correct values for every pair of classifier results (pair-wise voting).</Paragraph>
      <Paragraph position="4"> Apart from these five voting methods we have also processed the output streams with two classifiers: IBI-IG (memory-based) and IGTREE (decision tree).</Paragraph>
      <Paragraph position="5"> This approach is called classifier stacking. Like (Van Halteren et al., 1998), we have used different input versions: one containing only the classifier output and another containing both classifier output and a compressed representation of the classifier input.</Paragraph>
      <Paragraph position="6">  five classifiers applied to the baseNP training data after conversion to the open bracket (O) and the close bracket representation (C).</Paragraph>
      <Paragraph position="7"> For the latter purpose we have used the part-of-speech tag of the current word.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>