<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1002">
  <Title>Using Predicate-Argument Structures for Information Extraction</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Learning to Recognize Predicate-Argument Structures
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 The Data
</SectionTitle>
      <Paragraph position="0"> Proposition Bank or PropBank is a one million word corpus annotated with predicate-argument structures. The corpus consists of the Penn Treebank 2 Wall Street Journal texts (www.cis.upenn.edu/a0 treebank). The PropBank annotations, performed at University of Pennsylvania (www.cis.upenn.edu/a0 ace) were described in (Kingsbury et al., 2002). To date PropBank has addressed only predicates lexicalized by verbs, proceeding from the most to the least common verbs while annotating verb predicates in the corpus. For any given predicate, a survey was made to determine the predicate usage and if required, the usages were divided in major senses. However, the senses are divided more on syntactic grounds than  semantic, under the fundamental assumption that syntactic frames are direct reflections of underlying semantics.</Paragraph>
      <Paragraph position="1"> The set of syntactic frames are determined by diathesis alternations, as defined in (Levin, 1993). Each of these syntactic frames reflect underlying semantic components that constrain allowable arguments of predicates. The expected arguments of each predicate are numbered sequentially from Arg0 to Arg5. Regardless of the syntactic frame or verb sense, the arguments are similarly labeled to determine near-similarity of the predicates. The general procedure was to select for each verb the roles that seem to occur most frequently and use these roles as mnemonics for the predicate arguments. Generally, Arg0 would stand for agent, Arg1 for direct object or theme whereas Arg2 represents indirect object, benefactive or instrument, but mnemonics tend to be verb specific. For example, when retrieving the argument structure for the verb-predicate assail with the sense &amp;quot;to tear attack&amp;quot; from www.cis.upenn.edu/a0 cotton/cgibin/pblex fmt.cgi, we find Arg0:agent, Arg1:entity assailed and Arg2:assailed for. Additionally, the argument may include functional tags from Treebank, e.g. ArgM-DIR indicates a directional, ArgM-LOC indicates a locative, and ArgM-TMP stands for a temporal.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 The Model
</SectionTitle>
      <Paragraph position="0"> In previous work using the PropBank corpus, (Gildea and Palmer, 2002) proposed a model predicting argument roles using the same statistical method as the one employed by (Gildea and Jurafsky, 2002) for predicting semantic roles based on the FrameNet corpus (Baker et al., 1998). This statistical technique of labeling predicate argument operates on the output of the probabilistic parser reported in (Collins, 1997). It consists of two tasks: (1) identifying the parse tree constituents corresponding to arguments of each predicate encoded in PropBank; and (2) recognizing the role corresponding to each argument. Each task can be cast a separate classifier.</Paragraph>
      <Paragraph position="1"> For example, the result of the first classifier on the sentence illustrated in Figure 2 is the identification of the two NPs as arguments. The second classifier assigns the specific roles ARG1 and ARG0 given the predicate &amp;quot;assailed&amp;quot;.</Paragraph>
      <Paragraph position="2"> [?] POSITION (pos) [?] Indicates if the constituent appears before or after the the predicate in the sentence.</Paragraph>
      <Paragraph position="3"> [?] VOICE (voice) [?] This feature distinguishes between active or passive voice for the predicate phrase.</Paragraph>
      <Paragraph position="4"> are preserved.</Paragraph>
      <Paragraph position="5"> of the evaluated phrase. Case and morphological information [?] HEAD WORD (hw) [?] This feature contains the head word [?] PARSE TREE PATH (path): This feature contains the path in the parse tree between the predicate phrase and the argument phrase, expressed as a sequence of nonterminal labels linked by direction symbols (up or down), e.g.</Paragraph>
      <Paragraph position="6"> [?] PHRASE TYPE (pt): This feature indicates the syntactic NP for ARG1 in Figure 2.</Paragraph>
      <Paragraph position="7"> type of the phrase labeled as a predicate argument, e.g. noun phrases only, and it indicates if the NP is dominated by a sentence phrase (typical for subject arguments with active[?]voice predicates), or by a verb phrase (typical  for object arguments).</Paragraph>
      <Paragraph position="8"> [?] GOVERNING CATEGORY (gov) [?] This feature applies to [?] PREDICATE WORD [?] In our implementation this feature consists of two components: (1) VERB: the word itself with the case and morphological information preserved; and (2) LEMMA which represents the verb normalized to lower case and infinitive form.</Paragraph>
      <Paragraph position="9"> NP S VP VP for ARG1 in Figure 2.</Paragraph>
      <Paragraph position="10">  Statistical methods in general are hindered by the data sparsity problem. To achieve high accuracy and resolve the data sparsity problem the method reported in (Gildea and Palmer, 2002; Gildea and Jurafsky, 2002) employed a backoff solution based on a lattice that combines the model features. For practical reasons, this solution restricts the size of the feature sets. For example, the backoff lattice in (Gildea and Palmer, 2002) consists of eight connected nodes for a five-feature set. A larger set of features will determine a very complex backoff lattice. Consequently, no new intuitions may be tested as no new features can be easily added to the model.</Paragraph>
      <Paragraph position="11"> In our studies we found that inductive learning through decision trees enabled us to easily test large sets of features and study the impact of each feature</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>