<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0627">
  <Title>Semantic Role Labeling System Using Maximum Entropy Classifier</Title>
  <Section position="4" start_page="0" end_page="190" type="metho">
    <SectionTitle>
2 System Description
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="189" type="sub_section">
      <SectionTitle>
2.1 Constituent-by-Constituent
</SectionTitle>
      <Paragraph position="0"> We use syntactic constituent as the unit of labeling.</Paragraph>
      <Paragraph position="1"> However, it is impossible for each argument to nd its matching constituent in all auto parsing trees. According to statistics, about 10% arguments have no matching constituents in the training set of 245,353  constituents. The top ve arguments with no matching constituents are shown in Table 1. Here, Charniak parser got 10.08% no matching arguments and Collins parser got 11.89%.</Paragraph>
      <Paragraph position="2">  Therefore, we can see that Charniak parser got a better result than Collins parser in the task of SRL. So we use the full analysis results created by Charniak parser as our classi er's inputs. Assume that we could label all AM-MOD and AM-NEG arguments correctly with simple post processing rules, the upper bound of performance could achieve about 95% recall.</Paragraph>
      <Paragraph position="3"> At the same time, we can see that for some arguments, both parsers got lots of no matchings such as AM-MOD, AM-NEG, and so on. After analyzing the training data, we can recognize that the performance of these arguments can improve a lot after using some simple post processing rules only, however other arguments' no matching are caused primarily by parsing errors. The comparison between using and not using post processing rules is shown in Section 3.2.</Paragraph>
      <Paragraph position="4"> Because of the high speed and no affection in the number of classes with ef ciency of maximum entropy classi er, we just use one stage to label all arguments of predicates. It means that the NULL tag of constituents is regarded as a class like ArgN and ArgM .</Paragraph>
    </Section>
    <Section position="2" start_page="189" end_page="190" type="sub_section">
      <SectionTitle>
2.2 Features
</SectionTitle>
      <Paragraph position="0"> The following features, which we refer to as the basic features modi ed lightly from Pradhan et al. (2005), are provided in the shared task data for each constituent.</Paragraph>
      <Paragraph position="1">  * Predicate lemma * Path: The syntactic path through the parse tree from the parse constituent to the predicate.</Paragraph>
      <Paragraph position="2"> * Phrase type * Position: The position of the constituent with respect to its predicate. It has two values, before and after ,  for the predicate. For the situation of cover , we use a heuristic rule to ignore all of them because there is no chance for them to become an argument of the predicate. * Voice: Whether the predicate is realized as an active or passive construction. We use a simple rule to recognize passive voiced predicates which are labeled with part of speech VBN and sequences with AUX.</Paragraph>
      <Paragraph position="3"> * Head word stem: The stemming result of the constituent's syntactic head. A rule based stemming algorithm (Porter, 1980) is used. Collins Ph.D thesis (Collins,  1999)[Appendix. A] describs some rules to identify the head word of a constituent. Especially for prepositional phrase (PP) constituent, the normal head words are not very discriminative. So we use the last noun in the PP replacing the traditional head word.</Paragraph>
      <Paragraph position="4"> * Sub-categorization We also use the following additional features. * Predicate POS * Predicate suf x: The suf x of the predicate. Here, we use the last 3 characters as the feature.</Paragraph>
      <Paragraph position="5"> * Named entity: The named entity's type in the constituent if it ends with a named entity. There are four types: LOC, ORG, PER and MISC.</Paragraph>
      <Paragraph position="6"> * Path length: The length of the path between a constituent and its predicate.</Paragraph>
      <Paragraph position="7"> * Partial path: The part of the path from the constituent to the lowest common ancestor of the predicate and the constituent.</Paragraph>
      <Paragraph position="8"> * Clause layer: The number of clauses on the path between a constituent and its predicate.</Paragraph>
      <Paragraph position="9">  We also use some combinations of the above features to build some combinational features. Lots of combinational features which were supposed to contribute the SRL task of added one by one. At the same time, we removed ones which made the performance decrease in practical experiments. At last, we keep the following combinations:</Paragraph>
    </Section>
    <Section position="3" start_page="190" end_page="190" type="sub_section">
      <SectionTitle>
2.3 Classi er
</SectionTitle>
      <Paragraph position="0"> Le Zhang's Maximum Entropy Modeling Toolkit 1, and the L-BFGS parameter estimation algorithm with gaussian prior smoothing (Chen and Rosenfeld, 1999) are used as the maximum entropy classi er.</Paragraph>
      <Paragraph position="1"> We set gaussian prior to be 2 and use 1,000 iterations in the toolkit to get an optimal result through some comparative experiments.</Paragraph>
    </Section>
    <Section position="4" start_page="190" end_page="190" type="sub_section">
      <SectionTitle>
2.4 No Embedding
</SectionTitle>
      <Paragraph position="0"> The system described above might label two constituents even if one embeds in another, which is not allowed by the SRL rule. So we keep only one argument when more arguments embedding happens.</Paragraph>
      <Paragraph position="1"> Because it is easy for maximum entropy classi er to output each prediction's probability, we can label the constituent which has the largest probability among the embedding ones.</Paragraph>
    </Section>
    <Section position="5" start_page="190" end_page="190" type="sub_section">
      <SectionTitle>
2.5 Post Processing Stage
</SectionTitle>
      <Paragraph position="0"> After labeling the arguments which are matched with constituents exactly, we have to handle the arguments, such as AM-MOD, AM-NEG and AM-DIS, which have few matching with the constituents described in Section 2.1. So a post processing is given by using some simply rules:</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="190" end_page="190" type="metho">
    <SectionTitle>
3 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="190" end_page="190" type="sub_section">
      <SectionTitle>
3.1 Data and Evaluation Metrics
</SectionTitle>
      <Paragraph position="0"> The data provided for the shared task is a part of PropBank corpus. It consists of the sections from the Wall Street Journal part of Penn Treebank. Sections 02-21 are training sets, and Section 24 is development set. The results are evaluated for precision, recall and Fb=1 numbers using the srl-eval.pl script provided by the shared task organizers.</Paragraph>
    </Section>
    <Section position="2" start_page="190" end_page="190" type="sub_section">
      <SectionTitle>
3.2 Post Processing
</SectionTitle>
      <Paragraph position="0"> After using post processing rules, the nal Fb=1 is improved from 71.02% to 75.27%.</Paragraph>
    </Section>
    <Section position="3" start_page="190" end_page="190" type="sub_section">
      <SectionTitle>
3.3 Performance Curve
</SectionTitle>
      <Paragraph position="0"> Because the training corpus is substantially enlarged, this allows us to test the scalability of learning-based SRL systems to large data set and compute learning curves to see how many data are necessary to train. We divide the training set, 20 sections Penn Treebank into 5 parts with 4 sections in each part. There are about 8,000 sentences in each part. Figure 1 shows the change of performance as a function of training set size. When all of training data are used, we get the best system performance as described in Section 3.4.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="190" end_page="191" type="metho">
    <SectionTitle>
Figure 1
</SectionTitle>
    <Paragraph position="0"> [Figure 1: The performance (Fβ=1) as affected by the training set size; x-axis: sections in the training set.]</Paragraph>
    <Paragraph position="1"> We can see that as the training set becomes larger and larger, so does the performance of SRL system.</Paragraph>
    <Paragraph position="2"> However, the rate of increase slackens. So we can say that at present state, the larger training data has favorable effect on the improvement of SRL system performance.</Paragraph>
    <Section position="1" start_page="190" end_page="191" type="sub_section">
      <SectionTitle>
3.4 Best System Results
</SectionTitle>
      <Paragraph position="0"> In all the experiments, all of the features and their combinations described above are used in our system. Table 2 presents our best system performance on the development and test sets.</Paragraph>
      <Paragraph position="1"> From the test results, we can see that our system gets much worse performance on Brown corpus than WSJ corpus. The reason is easy to be understood for the dropping of automatic syntactic parser performance on new corpus but WSJ corpus.</Paragraph>
      <Paragraph position="2"> The training time on PIV 2.4G CPU and 1G Mem machine is about 20 hours on all 20 sections, 39,832 null the WSJ test (bottom).</Paragraph>
      <Paragraph position="3"> sentences training set with 1,000 iterations and more than 1.5 million samples and 2 million features. The predicting time is about 160 seconds on 1,346sentences development set.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>