<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2422">
  <Title>Learning Transformation Rules for Semantic Role Labeling</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 Introduction to Transformation-Based
Error-Driven Learning
</SectionTitle>
    <Paragraph position="0"> For the 2004 Conference on Computational Natural Language Learning (CoNLL), our team has applied the methodology popularized by Eric Brill for part-of-speech tagging and linguistic parsing (Brill, 1995; Brill, 1993).</Paragraph>
    <Paragraph position="1"> In this methodology, illustrated in Figure 1, a system learns a sequence of rules that best labels training data.</Paragraph>
    <Paragraph position="2"> These rules are then used to annotate previously unseen data.</Paragraph>
    <Paragraph position="3"> According to (Brill, 1995), a Transformation-Based Error-Driven learning application is defined by:
      1. The initial annotation scheme
      2. The space of allowable transformations
      3. The iterative algorithm for choosing a transformation sequence
      The initial annotation may be extremely simple. For example, in a part-of-speech tagging task, the initial annotation may assign each token its most likely tag without any regard to context (Brill, 1995).</Paragraph>
    <Paragraph position="4"> The iterative learning algorithm typically consists of simply searching for a rule that maximizes the increase in some objective function using a greedy hill-climbing strategy. For the CoNLL shared task, since participants are evaluated by their F1 scores, it is reasonable to use the F1 score as an objective function. We also implemented some extensions to the hill-climbing strategy that we describe in Section 2.3.</Paragraph>
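    <Paragraph position="5"> The greedy search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in our system: annotations are modeled as sets of hypothetical (start, end, label) spans, candidate rules are plain functions supplied by the caller, and the names learn_rules, f1_score, min_gain, and the two toy rules are illustrative only.
<![CDATA[
```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical representation: a labeled region is (start, end, label),
# with half-open token offsets, i.e. the region covers tokens start..end-1.
Span = Tuple[int, int, str]
Rule = Callable[[Set[Span]], Set[Span]]


def f1_score(predicted: Set[Span], gold: Set[Span]) -> float:
    """F1 over exactly matching labeled spans."""
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def learn_rules(initial: Set[Span], gold: Set[Span],
                candidates: Iterable[Rule],
                min_gain: float = 0.0) -> Tuple[List[Rule], Set[Span]]:
    """Greedy hill climbing: repeatedly keep the single best-scoring rule."""
    candidates = list(candidates)
    current = set(initial)
    current_score = f1_score(current, gold)
    learned: List[Rule] = []
    while True:
        best_rule, best_output, best_score = None, current, current_score
        for rule in candidates:
            output = rule(current)
            score = f1_score(output, gold)
            if score > best_score:
                best_rule, best_output, best_score = rule, output, score
        # Stop when no rule improves the objective by at least min_gain.
        if best_rule is None or best_score - current_score < min_gain:
            break
        learned.append(best_rule)
        current, current_score = best_output, best_score
    return learned, current


# Toy candidate rules (made up for this example).
def lengthen_v(spans):
    """Lengthen the end of a one-token V region by one token."""
    return {(s, e + 1, lab) if lab == "V" and e - s == 1 else (s, e, lab)
            for (s, e, lab) in spans}


def add_a1(spans):
    """Mark tokens 3..4 as A1."""
    return spans | {(3, 5, "A1")}


gold = {(0, 2, "V"), (3, 5, "A1")}
rules, final = learn_rules({(0, 1, "V")}, gold, [add_a1, lengthen_v])
print([r.__name__ for r in rules], final == gold)
```
]]>
      In this sketch every candidate rule is re-scored against the whole training annotation at each iteration, which is simple but slow; the order in which rules are learned is the order in which they would later be applied to unseen data.</Paragraph>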
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Experimental Setting
</SectionTitle>
    <Paragraph position="0"> In our approach, we used three successive learning stages: the first stage tags the verb region V, the second tags the A0 and A1 arguments, and the third tags all remaining arguments. The output of each stage becomes the initial annotation for the following stage. Therefore, our system only defines an explicit initial annotation for the verb-tagging phase: for each proposition, we initially tag only the single token containing the verb as V.</Paragraph>
    <Paragraph position="1"> The search for new transformations is terminated when no transformation can be found that would improve the objective function by at least 0.03%.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Transformation templates
</SectionTitle>
      <Paragraph position="0"> For the first stage, transformations are generated from the following eight transformation templates:
        Lengthen [shorten] the end of region¹ V by one token if:
        a,b) followed by chunk with tag=X
        c,d) followed by token with POS²=X
        e,f) followed by chunk with tag=X and token with POS=Y
        g,h) the verb token's lemma is X
        In this formulation, "chunk" refers to the IOB2 chunks, and "clause" refers to the nested clause structure (S regions) given as task input. "Lemma" refers to the infinitive form of the verb, identified in the task input and coreferenced in the PropBank data. X and Y are variables that range over all types of chunks, POS tags, or lemmas. The rule-learning system must determine which values for these variables will produce the most effective transformations. For example, a rule that the system might produce from template 'e' is: Lengthen the end of region V by one token if the region is followed by chunk with tag=PRT and token with POS=RP.</Paragraph>
      <Paragraph position="1"> Based on the observation that all V regions in the training data were either one or two tokens in length, an additional constraint was added to the first stage, requiring that lengthening rules only apply to regions of length one, and shortening rules only apply to regions of length two. The second and third stages use a common set of eleven transformation templates, but in the second stage the learner is restricted to adding or altering only A0 and A1 regions. The transformation templates are as follows: A,B) If chunk with tag=X is followed [preceded] directly by region with tag=Y, mark chunk as Z.</Paragraph>
      <Paragraph position="2"> C,D) If token with POS=X is followed [preceded] directly by region with tag=Y, mark token as Z.</Paragraph>
      <Paragraph position="3"> E,F) If chunk with tag=X is followed [preceded] (perhaps indirectly) by region with tag=Y, mark chunk as Z.</Paragraph>
      <Paragraph position="4"> G,H) If region with tag=X is followed [preceded] by chunk with tag=PP, which is in turn followed [preceded] by chunk with tag=Y, extend X forward [backward] through Y.</Paragraph>
      <Paragraph position="5"> I,J) If verb's first token has POS=X [and is preceded by POS=Y], switch A0 and A1.
        ¹ In this paper, we use the term "region" to refer to a section of corpus text that has been labeled in the output as a verb or as a verb argument. We also use the term in rule definitions to refer to the type of label assigned to that section of text.
        ² part-of-speech</Paragraph>
      <Paragraph position="6"> K) If region with tag=X is contained in a clause-starting verb phrase, and this is preceded by a clause-starting token with POS=Y, mark token Y as Z.</Paragraph>
      <Paragraph position="7"> Templates "A-H" are meant to capture structural relationships among arguments, such as the fact that A1 regions usually follow V regions, or that arguments may consist of several NP chunks joined by PP chunks. Templates "I" and "J" were written to discover passive verb relationships. Template "K" was an explicit (admittedly ad hoc) attempt to recognize R-A0 and R-A1 arguments. To avoid creating tagged regions that overlap, we use a first-tag-wins strategy: if a transformation would tag a new region that overlaps an existing tagged region, the new region is trimmed until any overlaps vanish.</Paragraph>
      <Paragraph position="8"> Notice that unlike the templates in the first stage, these templates make no reference to lexical information. In particular, no rule takes advantage of PropBank data in its tagging process³. We anticipate that using PropBank data could improve performance, but we have not yet experimented with it. Also, without any lexical information in these templates, we are capturing only general patterns of argument structure within the training corpus, not the statistical patterns of particular verb frames. In future experiments we expect to incorporate lexical data into transformation rules.</Paragraph>
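      <Paragraph position="9"> The first-tag-wins strategy mentioned above can be sketched as follows. The text only states that a new region is trimmed until overlaps vanish, so the trimming direction here is an assumption: the side of the new region that collides with an existing region is cut back, and the region is dropped if nothing remains. The function name trim_to_fit and the (start, end, label) representation with half-open token offsets are hypothetical.
<![CDATA[
```python
from typing import List, Optional, Tuple

Region = Tuple[int, int, str]  # (start, end, label), half-open token offsets


def trim_to_fit(new: Region, existing: List[Region]) -> Optional[Region]:
    """Trim `new` so it overlaps no region in `existing` (first tag wins).

    Assumes the existing regions do not overlap one another.
    Returns None when trimming leaves the new region empty.
    """
    start, end, label = new
    for s, e, _ in sorted(existing):
        if e <= start or s >= end:
            continue  # no overlap with this existing region
        if s <= start:
            # Existing region covers the new region's left edge: trim left.
            start = max(start, e)
        else:
            # Existing region begins inside the new region: trim right.
            end = s
        if start >= end:
            return None
    return (start, end, label)


trimmed = trim_to_fit((3, 7, "A1"), [(2, 4, "V"), (6, 8, "A0")])
print(trimmed)
```
]]>
      When an existing region lies strictly inside the new one, this sketch keeps only the part to its left; keeping the longer remaining side would be an equally plausible reading of the strategy.</Paragraph>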
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Arguments and clause structure
</SectionTitle>
      <Paragraph position="0"> In order to gain some traction on the problem, we analyzed the relationship between semantic arguments and clause boundaries. To investigate this, we labeled each argument with the smallest clause containing it as a proper subset. We then tallied the number of each type of argument labeled with the same clause as its verb, and the number labeled with a different clause. The results are shown in Table 1.</Paragraph>
      <Paragraph position="1"> Note that for almost all argument types, the overwhelming majority of arguments are found in the same clause as the verb. This motivated us to add a constraint to the transformation templates A-J: only create arguments in the same clause as the verb. This simplification will necessarily miss any legitimate arguments outside the clause (most notably 20% of A0 arguments).</Paragraph>
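      <Paragraph position="2"> The tally described in this section can be sketched as follows. This is an illustration under assumptions rather than the analysis script itself: clauses and arguments are represented as hypothetical half-open token spans, the verb is assigned a clause by the same smallest-enclosing-clause rule as the arguments, and the function names are illustrative.
<![CDATA[
```python
from collections import Counter
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open token offsets (hypothetical representation)


def smallest_enclosing_clause(span: Span, clauses: List[Span]) -> Optional[Span]:
    """Return the shortest clause that contains `span` as a proper subset."""
    start, end = span
    enclosing = [(cs, ce) for (cs, ce) in clauses
                 if cs <= start and end <= ce and (cs, ce) != (start, end)]
    if not enclosing:
        return None
    return min(enclosing, key=lambda c: c[1] - c[0])


def tally_clause_agreement(verb: Span,
                           arguments: List[Tuple[int, int, str]],
                           clauses: List[Span]) -> Counter:
    """Count, per argument label, arguments in the same or a different clause."""
    verb_clause = smallest_enclosing_clause(verb, clauses)
    tally: Counter = Counter()
    for start, end, label in arguments:
        arg_clause = smallest_enclosing_clause((start, end), clauses)
        key = "same" if arg_clause == verb_clause else "different"
        tally[(label, key)] += 1
    return tally


# Toy sentence of ten tokens with one embedded clause starting at token 4.
clauses = [(0, 10), (4, 10)]
counts = tally_clause_agreement((5, 6), [(0, 2, "A0"), (6, 10, "A1")], clauses)
print(counts)
```
]]>
      Summing such counters over all propositions in the training data would give one row per argument type with a same-clause and a different-clause column.</Paragraph>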
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Reordering of learned rules
</SectionTitle>
      <Paragraph position="0"> In observing the sequence of transformations learned by the system, it became apparent that the system's strict greedy hill-climbing strategy often learned a non-optimal ordering of rules.
        ³ We actually do use PropBank in a limited way: no transformation will assign an argument A0-A5 to a verb unless that argument is listed for one of the verb's senses in PropBank.</Paragraph>
      <Paragraph position="1"> This is because the system has no look-ahead capability to check whether a sequence of multiple rules applied in succession might produce a good final result despite providing little or no initial improvement. The addition of a look-ahead searcher has been suggested (Brill, 1995), but we have not seen it implemented in a research context, likely because a straightforward implementation of the concept would at minimum square the amount of time required for training.
        [Table caption fragment: "... data (arguments with fewer than 50 examples omitted)"]</Paragraph>
      <Paragraph position="2"> Instead, we implemented a look-behind search strategy, which allows rules to be reordered after discovery. It is meant to address the case in which the system learns a set of rules that each improve the objective function, but interact with each other in a non-optimal way. Whenever our system discovers a new rule, rather than simply applying it and searching for the next rule, it is allowed to try all permutations of the last n discovered rules to see whether performance would improve by using a different ordering. If so, the rules are reordered.</Paragraph>
      <Paragraph position="3"> To our knowledge, this strategy has not been employed in Transformation-Based Error-Driven learning settings.</Paragraph>
      <Paragraph position="4"> In our experiments, the strategy discovered transformation sequences that better annotated the input data without using more rules, and therefore seems to produce a labeler less likely to overfit the training data. In our testing, the technique seems to have increased the overall F1 score by between 0.5% and 1.0%; we caution, however, that we have not undertaken a rigorous comparative study of the technique.
        [Table caption fragment: "... with all-zero entries omitted)"]</Paragraph>
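      <Paragraph position="5"> The look-behind reordering described in this section can be sketched as follows. This is a simplified illustration under assumptions, not our system's code: rules are plain functions over an annotation, the scoring function is supplied by the caller (it stands in for the F1 objective), and the names reorder_last_n and apply_rules, along with the two toy rules, are hypothetical.
<![CDATA[
```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Tags = Tuple[str, ...]            # one label per token (toy representation)
Rule = Callable[[Tags], Tags]


def apply_rules(rules: Sequence[Rule], annotation: Tags) -> Tags:
    """Apply rules in order, each to the output of the previous one."""
    for rule in rules:
        annotation = rule(annotation)
    return annotation


def reorder_last_n(rules: List[Rule], initial: Tags,
                   score: Callable[[Tags], float], n: int = 3) -> List[Rule]:
    """Try every ordering of the last n rules; keep the best-scoring one."""
    if n < 1:
        return list(rules)
    head, tail = rules[:-n], rules[-n:]
    base = apply_rules(head, initial)      # earlier rules stay fixed
    best_order = list(tail)
    best_score = score(apply_rules(tail, base))
    for perm in permutations(tail):
        candidate = score(apply_rules(perm, base))
        if candidate > best_score:         # reorder only on strict improvement
            best_order, best_score = list(perm), candidate
    return head + best_order


# Toy rules whose effect depends on the order of application.
def x_to_a1(tags: Tags) -> Tags:
    return tuple("A1" if t == "X" else t for t in tags)


def before_verb_to_a0(tags: Tags) -> Tags:
    out = list(tags)
    for i in range(len(tags) - 1):
        if tags[i] == "A1" and tags[i + 1] == "V":
            out[i] = "A0"
    return tuple(out)


gold = ("A0", "V", "A1")
accuracy = lambda tags: sum(a == b for a, b in zip(tags, gold)) / len(gold)
ordered = reorder_last_n([before_verb_to_a0, x_to_a1], ("X", "V", "X"),
                         accuracy, n=2)
print([r.__name__ for r in ordered])
```
]]>
      Because only the last n rules are permuted, each reordering step costs n! evaluations, so n must stay small; this contrasts with a look-ahead search, whose cost grows with the size of the candidate rule space.</Paragraph>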
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Results
</SectionTitle>
    <Paragraph position="0"> The quality of our transformation rules on the training set is shown in Table 2, and the results on the test set are shown in Table 3. The rules that generated these results are shown in Table 4, along with the iterative F1 scores on the training set as the rules are learned.</Paragraph>
  </Section>
</Paper>