<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-2030">
  <Title>A Novel Use of Statistical Parsing to Extract Information from Text</Title>
  <Section position="4" start_page="226" end_page="227" type="metho">
    <SectionTitle>
3 Integrated Sentential Processing
</SectionTitle>
    <Paragraph position="0"> Almost all approaches to information extraction - even at the sentence level - are based on the divide-and-conquer strategy of reducing a complex problem to a set of simpler ones. Currently, the prevailing architecture for dividing sentential processing is a four-stage pipeline consisting of:  1. part-of-speech tagging 2. name finding 3. syntactic analysis, often limited to noun and verb group chunking 4. semantic interpretation, usually based on  pattern matching Since we were interested in exploiting recent advances in parsing, replacing the syntactic analysis stage of the standard pipeline with a modem statistical parser was an obvious possibility. However, pipelined architectures suffer from a serious disadvantage: errors accumulate as they propagate through the pipeline. For example, an error made during part-of-speech-tagging may cause a future error in syntactic analysis, which may in turn cause a semantic interpretation failure. There is no opportunity for a later stage, such as parsing, to influence or correct an earlier stage such as part-of-speech tagging.</Paragraph>
    <Paragraph position="1"> An integrated model can limit the propagation of errors by making all decisions jointly. For this reason, we focused on designing an integrated model in which tagging, namefinding, parsing, and semantic interpretation decisions all have the opportunity to mutually influence each other.</Paragraph>
    <Paragraph position="2"> A second consideration influenced our decision toward an integrated model. We were already using a generative statistical model for part-of-speech tagging (Weischedel et al.</Paragraph>
    <Paragraph position="3"> 1993), and more recently, had begun using a generative statistical model for name finding (Bikel et al. 1997). Finally, our newly constructed parser, like that of (Collins 1997), was based on a generative statistical model. Thus, each component of what would be the first three stages of our pipeline was based on  the same general class of statistical model. Although each model differed in its detailed probability structure, we believed that the essential elements of all three models could be generalized in a single probability model.</Paragraph>
    <Paragraph position="4"> If the single generalized model could then be extended to semantic anal);sis, all necessary sentence level processing would be contained in that model. Because generative statistical models had already proven successful for each of the first three stages, we were optimistic that some of their properties - especially their ability to learn from large amounts of data, and their robustness when presented with unexpected inputs - would also benefit semantic analysis.</Paragraph>
  </Section>
  <Section position="5" start_page="227" end_page="227" type="metho">
    <SectionTitle>
4 Representing Syntax and Semantics Jointly
</SectionTitle>
    <Paragraph position="0"> Jointly Our integrated model represents syntax and semantics jointly using augmented parse trees. In these trees, the standard TREEBANK structures are augmented to convey semantic information, that is, entities and relations. An example of an augmented parse tree is shown in Figure 3. The five key facts in this example  are: * &amp;quot;Nance&amp;quot; is the name of a person. * &amp;quot;A paid consultant to ABC News&amp;quot; describes a person.</Paragraph>
    <Paragraph position="1"> t &amp;quot;ABC News&amp;quot; is the name of an organization.</Paragraph>
    <Paragraph position="2"> * The person described as &amp;quot;a paid consultant  to ABC News&amp;quot; is employed by ABC News. * The person named &amp;quot;Nance&amp;quot; and the person described as &amp;quot;a paid consultant to ABC News&amp;quot; are the same person.</Paragraph>
    <Paragraph position="3"> Here, each &amp;quot;reportable&amp;quot; name or description is identified by a &amp;quot;-r&amp;quot; suffix attached to its semantic label. For example, &amp;quot;per-r&amp;quot; identifies &amp;quot;Nance&amp;quot; as a named person, and &amp;quot;per-desc-r&amp;quot; identifies &amp;quot;a paid consultant to ABC News&amp;quot; as a person description. Other labels indicate relations among entities. For example, the co-reference relation between &amp;quot;Nance&amp;quot; and &amp;quot;a paid consultant to ABC News&amp;quot; is indicated by &amp;quot;per-desc-of.&amp;quot; In this case, because the argument does not connect directly to the relation, the intervening nodes are labeled with semantics &amp;quot;-ptr&amp;quot; to indicate the connection. Further details are discussed in the section Tree Augmentation.</Paragraph>
  </Section>
  <Section position="6" start_page="227" end_page="229" type="metho">
    <SectionTitle>
5 Creating the Training Data
</SectionTitle>
    <Paragraph position="0"> To train our integrated model, we required a large corpus of augmented parse trees. Since it was known that the MUC-7 evaluation data would be drawn from a variety of newswire sources, and that the articles would focus on rocket launches, it was important that our training corpus be drawn from similar sources and that it cover similar events. Thus, we did not consider simply adding semantic labels to the existing Penn TREEBANK, which is drawn from a single source - the Wall Street Journal - and is impoverished in articles about rocket launches.</Paragraph>
    <Paragraph position="1"> Instead, we applied an information retrieval system to select a large number of articles from the desired sources, yielding a corpus rich in the desired types of events. The retrieved articles would then be annotated with augmented tree structures to serve as a training corpus.</Paragraph>
    <Paragraph position="2"> Initially, we tried to annotate the training corpus by hand marking, for each sentence, the entire augmented tree. It soon became painfully obvious that this task could not be performed in the available time. Our annotation staff found syntactic analysis particularly complex and slow going. By necessity, we adopted the strategy of hand marking only the semantics.</Paragraph>
    <Paragraph position="3"> Figure 4 shows an example of the semantic annotation, which was the only type of manual annotation we performed.</Paragraph>
    <Paragraph position="4"> To produce a corpus of augmented parse trees, we used the following multi-step training procedure which exploited the Penn  1. The model (see Section 7) was first trained  on purely syntactic parse trees from the TREEBANK, producing a model capable of broad-coverage syntactic parsing.</Paragraph>
    <Paragraph position="5"> parses that were consistent with the semantic annotation. A parse was considered consistent if no syntactic constituents crossed an annotated entity or description boundary.</Paragraph>
    <Paragraph position="6"> 2. Next, for each sentence in the semantically annotated corpus: a. The model was applied to parse the sentence, constrained to produce only b. The resulting parse tree was then augmented to reflect semantic structure in addition to syntactic structure.</Paragraph>
    <Paragraph position="7">  Applying this procedure yielded a new version of the semantically annotated corpus, now annotated with complete augmented trees like that in Figure 3.</Paragraph>
  </Section>
  <Section position="7" start_page="229" end_page="229" type="metho">
    <SectionTitle>
6 Tree Augmentation
</SectionTitle>
    <Paragraph position="0"> In this section, we describe the algorithm that was used to automatically produce augmented trees, starting with a) human-generated semantic annotations and b) machine-generated syntactic parse trees. For each sentence, combining these two sources involved five steps. These steps are given below:</Paragraph>
    <Section position="1" start_page="229" end_page="229" type="sub_section">
      <SectionTitle>
Tree Augmentation Algorithm
</SectionTitle>
      <Paragraph position="0"> . Nodes are inserted into the parse tree to distinguish names and descriptors that are not bracketed in the parse. For example, the parser produces a single noun phrase with no internal structure for &amp;quot;Lt. Cmdr.</Paragraph>
      <Paragraph position="1"> David Edwin Lewis&amp;quot;. Additional nodes must be inserted to distinguish the description, &amp;quot;Lt. Cmdr.,&amp;quot; and the name, &amp;quot;David Edwin Lewis.&amp;quot; . Semantic labels are attached to all nodes that correspond to names or descriptors.</Paragraph>
      <Paragraph position="2"> These labels reflect the entity type, such as person, organization, or location, as well as whether the node is a proper name or a descriptor.</Paragraph>
      <Paragraph position="3"> . For relations between entities, where one entity is not a syntactic modifier of the other, the lowermost parse node that spans both entities is identified. A semantic tag is then added to that node denoting the relationship. For example, in the sentence &amp;quot;Mary Fackler Schiavo is the inspector general of the U.S. Department of Transportation,&amp;quot; a co-reference semantic label is added to the S node spanning the name, &amp;quot;Mary Fackler Schiavo,&amp;quot; and the descriptor, &amp;quot;the inspector general of the U.S. Department of Transportation.&amp;quot; . Nodes are inserted into the parse tree to distinguish the arguments to each relation.</Paragraph>
      <Paragraph position="4"> In cases where there is a relation between two entities, and one of the entities is a syntactic modifier of the other, the inserted node serves to indicate the relation as well as the argument. For example, in the phrase &amp;quot;Lt. Cmdr. David Edwin Lewis,&amp;quot; a node is inserted to indicate that &amp;quot;Lt.</Paragraph>
      <Paragraph position="5"> Cmdr.&amp;quot; is a descriptor for &amp;quot;David Edwin Lewis.&amp;quot; . Whenever a relation involves an entity that is not a direct descendant of that relation in the parse tree, semantic pointer labels are attached to all of the intermediate nodes. These labels serve to form a continuous chain between the relation and its argument.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="229" end_page="229" type="metho">
    <SectionTitle>
7 Model Structure
</SectionTitle>
    <Paragraph position="0"> In our statistical model, trees are generated according to a process similar to that described in (Collins 1996, 1997). The detailed probability structure differs, however, in that it was designed to jointly perform part-of-speech tagging, name finding, syntactic parsing, and relation finding in a single process.</Paragraph>
    <Paragraph position="1"> For each constituent, the head is generated first, followed by the modifiers, which are generated from the head outward. Head words, along with their part-of-speech tags and features, are generated for each modifier as soon as the modifier is created. Word features are introduced primarily to help with unknown words, as in (Weischedel et al. 1993).</Paragraph>
    <Paragraph position="2"> We illustrate the generation process by walking through a few of the steps of the parse shown in Figure 3. At each step in the process, a choice is made from a statistical distribution, with the probability of each possible selection dependent on particular features of previously generated elements. We pick up the derivation just after the topmost S and its head word, said, have been produced.</Paragraph>
    <Paragraph position="3"> The next steps are to generate in order:  1. A head constituent for the S, in this case a VP.</Paragraph>
    <Paragraph position="4"> 2. Pre-modifier constituents for the S. In this case, there is only one: a PER/NP.</Paragraph>
    <Paragraph position="5"> 3. A head part-of-speech tag for the PER/NP, in this case PER/NNP.</Paragraph>
    <Paragraph position="6"> 230 4. A head word for the PER/NP, in this case nance.</Paragraph>
    <Paragraph position="7"> 5. Word features for the head word of the PER/NP, in this case capitalized.</Paragraph>
    <Paragraph position="8"> 6. A head constituent for the PER/NP, in this case a PER-R/NP.</Paragraph>
    <Paragraph position="9"> 7. Pre-modifier constituents for the PER/NP.  In this case, there are none.</Paragraph>
    <Paragraph position="10"> . Post-modifier constituents for the PER/NP. First a comma, then an SBAR structure, and then a second comma are each generated in turn.</Paragraph>
    <Paragraph position="11"> This generation process is continued until the entire tree has been produced.</Paragraph>
    <Paragraph position="12"> We now briefly summarize the probability structure of the model. The categories for head constituents, ch, are predicted based solely on the category of the parent node, cp: e(c h Icp), e.g. P(vpls ) Modifier constituent categories, Cm, are predicted based on their parent node, cp, the head constituent of their parent node, Chp, the previously generated modifier, Cm-1, and the head word of their parent, wp. Separate probabilities are maintained for left (pre) and right (post) modifiers:</Paragraph>
    <Paragraph position="14"> Part-of-speech tags, tin, for modifiers are predicted based on the modifier, Cm, the part-of-speech tag of the head word, th, and the head word itself, wh:</Paragraph>
    <Paragraph position="16"> Head words, win, for modifiers are predicted based on the modifier, cm, the part-of-speech tag of the modifier word , t,,, the part-of-speech tag of the head word, th, and the head word itself, Wh: P(W m ICm,tmth,Wh), e.g.</Paragraph>
    <Paragraph position="17"> P(nance I per / np, per / nnp, vbd, said) Finally, word features, fro, for modifiers are predicted based on the modifier, cm, the part-of-speech tag of the modifier word , tin, the part-of-speech tag of the head word , th, the head word itself, Wh, and whether or not the modifier head word, w,,, is known or unknown.</Paragraph>
    <Paragraph position="18"> P(fm \[Cm,tm,th,Wh,known(Wm)), e.g.</Paragraph>
    <Paragraph position="19"> P( cap I per I np, per / nnp, vbd, said, true) The probability of a complete tree is the product of the probabilities of generating each element in the tree. If we generalize the tree components (constituent labels, words, tags, etc.) and treat them all as simply elements, e, and treat all the conditioning factors as the history, h, we can write:</Paragraph>
    <Paragraph position="21"/>
  </Section>
  <Section position="9" start_page="229" end_page="231" type="metho">
    <SectionTitle>
8 Training the Model
</SectionTitle>
    <Paragraph position="0"> Maximum likelihood estimates for the model probabilities can be obtained by observing frequencies in the training corpus. However, because these estimates are too sparse to be relied upon, we use interpolated estimates consisting of mixtures of successively lower-order estimates (as in Placeway et al. 1993). For modifier constituents, components are:  Finally, for word features, the mixture components are:</Paragraph>
  </Section>
  <Section position="10" start_page="231" end_page="231" type="metho">
    <SectionTitle>
9 Searching the Model
</SectionTitle>
    <Paragraph position="0"> Given a sentence to be analyzed, the search program must find the most likely semantic and syntactic interpretation. More precisely, it must find the most likely augmented parse tree. Although mathematically the model predicts tree elements in a top-down fashion, we search the space bottom-up using a chart-based search. The search is kept tractable through a combination of CKY-style dynamic programming and pruning of low probability elements.</Paragraph>
    <Section position="1" start_page="231" end_page="231" type="sub_section">
      <SectionTitle>
9.1 Dynamic Programming
</SectionTitle>
      <Paragraph position="0"> Whenever two or more constituents are equivalent relative to all possible later parsing decisions, we apply dynamic programming, keeping only the most likely constituent in the chart. Two constituents are considered  equivalent if: 1. They have identical category labels. 2. Their head constituents have identical labels.</Paragraph>
      <Paragraph position="1"> 3. They have the same head word.</Paragraph>
      <Paragraph position="2"> 4. Their leftmost modifiers have identical labels.</Paragraph>
      <Paragraph position="3"> . Their rightmost modifiers have identical labels.</Paragraph>
    </Section>
    <Section position="2" start_page="231" end_page="231" type="sub_section">
      <SectionTitle>
9.2 Pruning
</SectionTitle>
      <Paragraph position="0"> Given multiple constituents that cover identical spans in the chart, only those constituents with probabilities within a threshold of the highest scoring constituent are maintained; all others are pruned. For purposes of pruning, and only for purposes of pruning, the prior probability of each constituent category is multiplied by the generative probability of that constituent (Goodman, 1997). We can think of this prior probability as an estimate of the probability of generating a subtree with the constituent category, starting at the topmost node. Thus, the scores used in pruning can be considered as the product of: . The probability of generating a constituent of the specified category, starting at the topmost node.</Paragraph>
      <Paragraph position="1"> . The probability of generating the structure beneath that constituent, having already generated a constituent of that category.</Paragraph>
      <Paragraph position="2"> Given a new sentence, the outcome of this search process is a tree structure that encodes both the syntactic and semantic structure of the sentence. The semantics - that is, the entities and relations - can then be directly extracted from these sentential trees.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>