<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0635">
  <Title>Semantic Role Labeling Using Complete Syntactic Analysis</Title>
  <Section position="4" start_page="0" end_page="222" type="metho">
    <SectionTitle>
2 System Description
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="221" type="sub_section">
      <SectionTitle>
2.1 Mapping Arguments to Syntactic
Constituents
</SectionTitle>
      <Paragraph position="0"> Our approach maps each argument label to one syntactic constituent, using a strategy similar to (Surdeanu et al., 2003). Using a bottom-up approach, we map each argument to the first phrase that has the exact same boundaries and climb as high as possible in the syntactic tree across unary production chains.</Paragraph>
      <Paragraph position="1"> Unfortunately, this one-to-one mapping between semantic arguments and syntactic constituents is not always possible. One semantic argument may be mapped to many syntactic constituents due to: (a) intrinsic differences between the syntactic and semantic representations, and (b) incorrect syntactic structure. Figure 1 illustrates each one of these situations: Figure 1 (a) shows a sentence where each semantic argument correctly maps to one syntactic constituent; Figure 1 (b) illustrates the situation where one semantic argument correctly maps to two syntactic constituents; and Figure 1 (c) shows a one-to-many mapping caused by an incorrect syntactic structure: argument A0 maps to two phrases, the terminal &amp;quot;by&amp;quot; and the noun phrase &amp;quot;Robert Goldberg&amp;quot;, due to the incorrect attachment of the last prepositional phrase, &amp;quot;at the University of California&amp;quot;. Using the above observations, we separate one- null ing to their mapping to syntactic constituents obtained with the Charniak parser: (a) one-to-one, (b) one-to-many, all syntactic constituents have same parent, (c) one-to-many, syntactic constituents have different parents.</Paragraph>
      <Paragraph position="2"> to-many mappings in two classes: (a) when the syntactic constituents mapped to the semantic argument have the same parent (Figure 1 (b)) the mapping is correct and/or could theoretically be learned by a sequential SRL strategy, and (b) when the syntactic constituents mapped to the same argument have different parents, the mapping is generally caused by incorrect syntax. Such cases are very hard to be learned due to the irregularities of the parser errors.</Paragraph>
      <Paragraph position="3"> Table 1 shows the distribution of semantic arguments into one of the above classes, using the syntactic trees provided by the Charniak parser. For the results reported in this paper, we model only one-to-one mappings between semantic arguments and syntactic constituents. A subset of the one-to-many mappings are addressed with a simple heuristic, described in Section 2.4.</Paragraph>
    </Section>
    <Section position="2" start_page="221" end_page="221" type="sub_section">
      <SectionTitle>
2.2 Features
</SectionTitle>
      <Paragraph position="0"> The features incorporated in the proposed model are inspired from the work of (Gildea and Jurafsky, 2002; Surdeanu et al., 2003; Pradhan et al., 2005; Collins, 1999) and can be classified into five classes: (a) features that capture the internal structure of the candidate argument, (b) features extracted The syntactic label of the candidate constituent.</Paragraph>
      <Paragraph position="1"> The constituent head word, suffixes of length 2, 3, and 4, lemma, and POS tag.</Paragraph>
      <Paragraph position="2"> The constituent content word, suffixes of length 2, 3, and 4, lemma, POS tag, and NE label. Content words, which add informative lexicalized information different from the head word, were detected using the heuristics of (Surdeanu et al., 2003).</Paragraph>
      <Paragraph position="3"> The first and last constituent words and their POS tags. NE labels included in the candidate phrase.</Paragraph>
      <Paragraph position="4"> Binary features to indicate the presence of temporal cue words, i.e. words that appear often in AM-TMP phrases in training.</Paragraph>
      <Paragraph position="5"> For each TreeBank syntactic label we added a feature to indicate the number of such labels included in the candidate phrase.</Paragraph>
      <Paragraph position="6"> The sequence of syntactic labels of the constituent  from the argument context, (c) features that describe properties of the target predicate, (d) features generated from the predicate context, and (e) features that model the distance between the predicate and the argument. These five feature sets are listed in Tables 2, 3, 4, 5, and 6.</Paragraph>
    </Section>
    <Section position="3" start_page="221" end_page="222" type="sub_section">
      <SectionTitle>
2.3 Classifier
</SectionTitle>
      <Paragraph position="0"> The classifiers used in this paper were developed using AdaBoost with confidence rated predictions (Schapire and Singer, 1999). AdaBoost combines many simple base classifiers or rules (in our case decision trees of depth 3) into a single strong classifier using a weighted-voted scheme. Each base classifier is learned sequentially from weighted examples and the weights are dynamically adjusted every learning iteration based on the behavior of the  The predicate word and lemma.</Paragraph>
      <Paragraph position="1"> The predicate voice. We currently distinguish five voice types: active, passive, copulative, infinitive, and progressive. A binary feature to indicate if the predicate is frequent - i.e. it appears more than twice in the training partition - or not.  Sub-categorization rule, i.e. the phrase structure rule that expands the predicate immediate parent, e.g.</Paragraph>
      <Paragraph position="2"> NP-VBGNNNNS for the predicate in Figure 1 (b).</Paragraph>
      <Paragraph position="3">  The path in the syntactic tree between the argument phrase and the predicate as a chain of syntactic labels along with the traversal direction (up or down).</Paragraph>
      <Paragraph position="4"> The length of the above syntactic path.</Paragraph>
      <Paragraph position="5"> The number of clauses (S* phrases) in the path.</Paragraph>
      <Paragraph position="6"> The number of verb phrases (VP) in the path.</Paragraph>
      <Paragraph position="7"> The subsumption count, i.e. the difference between the depths in the syntactic tree of the argument and predicate constituents. This value is 0 if the two phrases share the same parent.</Paragraph>
      <Paragraph position="8"> The governing category, which indicates if NP arguments are dominated by a sentence (typical for subjects) or a verb phrase (typical for objects).</Paragraph>
      <Paragraph position="9"> We generalize syntactic paths with more than 3 elements using two templates:  (a) Arg|Ancestor|Ni |Pred, where Arg is the argument label, Pred is the predicate label, Ancestor is the label of the common ancestor, and Ni is instantiated with all the labels between Pred and Ancestor in the full path; and (b) Arg|Ni |Ancestor|Pred, where Ni is  instantiated with all the labels between Arg and Ancestor in the full path.</Paragraph>
      <Paragraph position="10"> The surface distance between the predicate and the argument phrases encoded as: the number of tokens, verb terminals (VB*), commas, and coordinations (CC) between the argument and predicate phrases, and a binary feature to indicate if the two constituents are adjacent.</Paragraph>
      <Paragraph position="11"> A binary feature to indicate if the argument starts with a predicate particle, i.e. a token seen with the RP* POS tag and directly attached to the predicate in training.  previously learned rules.</Paragraph>
      <Paragraph position="12"> We trained one-vs-all classifiers for the top 24 most common arguments in training (including R-A* and C-A*). For simplicity we do not label predicates. Following the strategy proposed by (Carreras et al., 2004) we select training examples (both positive and negative) only from: (a) the first S* phrase that includes the predicate, or (b) from phrases that appear to the left of the predicate in the sentence. More than 98% of the arguments fall into one of these classes.</Paragraph>
      <Paragraph position="13"> At prediction time the classifiers are combined using a simple greedy technique that iteratively assigns to each predicate the argument classified with the highest confidence. For each predicate we consider as candidates all AM attributes, but only numbered attributes indicated in the corresponding PropBank frame.</Paragraph>
    </Section>
    <Section position="4" start_page="222" end_page="222" type="sub_section">
      <SectionTitle>
2.4 Argument Expansion Heuristics
</SectionTitle>
      <Paragraph position="0"> We address arguments that should map to more than one terminal phrase with the following post-processing heuristic: if an argument is mapped to one terminal phrase, its boundaries are extended to the right to include all terminal phrases that are not already labeled as other arguments for the same predicate. For example, after the system tags &amp;quot;consumer&amp;quot; as the beginning of an A1 argument in Figure 1, this heuristic extends the right boundary of the A1 argument to include the following terminal, &amp;quot;prices&amp;quot;.</Paragraph>
      <Paragraph position="1"> To handle inconsistencies in the treatment of quotes in parsing we added a second heuristic: arguments are expanded to include preceding/following quotes if the corresponding pairing quote is already included in the argument constituent.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>