<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0202">
  <Title>Comparing Information Extraction Pattern Models</Title>
  <Section position="4" start_page="12" end_page="13" type="metho">
    <SectionTitle>
2 Pattern Models
</SectionTitle>
    <Paragraph position="0"> In dependency analysis (Mel'Vcuk, 1987) the syntax of a sentence is represented by a set of directed binary links between a word (the head) and one of its modifiers. These links may be labelled to indicate the grammatical relation between the head and modifier (e.g. subject, object). In general cyclical paths are disallowed so that the analysis forms a tree structure. An example dependency analysis for the sentence &amp;quot;Acme Inc. hired Mr Smith as their new CEO, replacing Mr Bloggs.&amp;quot; is shown Figure 1.</Paragraph>
    <Paragraph position="1"> The remainder of this section outlines four models for representing extraction patterns which can be derived from dependency trees.</Paragraph>
    <Paragraph position="2"> Predicate-Argument Model (SVO): A simple approach, used by Yangarber (2003) and Stevenson and Greenwood (2005), is to use subject-verb-object tuples from the dependency parse as extraction patterns. These consist of a verb and its sub-ject and/or direct object1. An SVO pattern is extracted for each verb in a sentence. Figure 2 shows the two SVO patterns2 which are produced for the dependency tree shown in Figure 1.</Paragraph>
    <Paragraph position="3"> This model may be motivated by the assumption that many IE scenarios involve the extraction 1Yangarber et al. (2000) and Sudo et al. (2003) used a slightly extended version of this model in which the pattern also included certain phrases which referred to either the sub-ject or object.</Paragraph>
    <Paragraph position="4"> 2The formalism used for representing dependency patterns is similar to the one introduced by Sudo et al. (2003). Each node in the tree is represented in the format a[b/c] (e.g. subj[N/bomber]) where c is the lexical item (bomber), b its grammatical tag (N) and a the dependency relation between this node and its parent (subj). The relationship between nodes is represented as X(A+B+C) which indicates that nodes A,B and C are direct descendents of node X.</Paragraph>
    <Paragraph position="5"> of participants in specific events. For example, the MUC-6 (MUC, 1995) management succession scenario concerns the identification of individuals who are changing job. These events are often described using a simple predicate argument structure, e.g. &amp;quot;Acme Inc. fired Smith&amp;quot;. However, the SVO model cannot represent information described using other linguistic constructions such as nominalisations or prepositional phrases. For example, in the MUC6 texts it is common for job titles to be mentioned within prepositional phrases, e.g. &amp;quot;Smith joined Acme Inc. as CEO&amp;quot;.</Paragraph>
    <Paragraph position="6"> Chains: A pattern is defined as a path between a verb node and any other node in the dependency tree passing through zero or more intermediate nodes (Sudo et al., 2001). Figure 2 shows the eight chains which can be extracted from the tree in Figure 1.</Paragraph>
    <Paragraph position="7"> Chains provide a mechanism for encoding information beyond the direct arguments of predicates and includes areas of the dependency tree ignored by the SVO model. For example, they can represent information expressed as a nominalisation or within a prepositional phrase, e.g. &amp;quot;The resignation of Smith from the board of Acme ...&amp;quot; However, a potential shortcoming of this model is that it cannot represent the link between arguments of a verb. Patterns in the chain model format are unable to represent even the simplest of sentences containing a transitive verb, e.g. &amp;quot;Smith left Acme Inc.&amp;quot;.</Paragraph>
    <Paragraph position="8"> Linked Chains: The linked chains model (Greenwood et al., 2005) represents extraction patterns as a pair of chains which share the same verb but no direct descendants. This model generates 14 patterns for the verb hire in Figure 1, examples of which are shown in Figure 2. This pattern representation encodes most of the information in the sentence with the advantage of being able to link together event participants which neither of the SVO or chain model can, for example the relation between &amp;quot;Smith&amp;quot; and &amp;quot;Bloggs&amp;quot;. Subtrees: The final model to be considered is the subtree model (Sudo et al., 2003). In this model any subtree of a dependency tree can be used as an extraction pattern, where a subtree is any set of nodes in the tree which are connected to one another. Single nodes are not considered to be subtrees. The subtree model is a richer representation than those discussed so far and can represent any part of a dependency tree. Each of the previ- null ous models form a proper subset of the subtrees.</Paragraph>
    <Paragraph position="9"> By choosing an appropriate subtree it is possible to link together any pair of nodes in a tree and consequently this model can represent the relation between any set of items in the sentence.</Paragraph>
  </Section>
  <Section position="5" start_page="13" end_page="14" type="metho">
    <SectionTitle>
3 Pattern Enumeration and Complexity
</SectionTitle>
    <Paragraph position="0"> In addition to encoding different parts of the dependency analysis, each pattern model will also generate a different number of potential patterns.</Paragraph>
    <Paragraph position="1"> A dependency tree, T, can be viewed as a set of N connected nodes. Assume that V , such that V [?] N, is the set of nodes in the dependency tree labelled as a verb.</Paragraph>
    <Paragraph position="2">  Predicate-Argument Model (SVO): The number of SVO patterns extracted from T is: Nsvo (T) = |V  |(1) Chain Model: A chain can be created between any verb and a node it dominates (directly or indirectly). Now assume that d(v) denotes the count of a node v and all its descendents then the number of chains is given by:  Subtrees: Now assume that sub(n) is a function denoting the number of subtrees, including single nodes, rooted at node n. This can be defined recursively as follows:</Paragraph>
    <Paragraph position="4"> The dependency tree shown in Figure 1 generates 2, 8, 14 and 42 possible SVO, chain, linked chain and subtree patterns respectively. The number of SVO patterns is constant on the number of verbs in the tree. The number of chains is generally a linear function on the size of the tree but, in the worst case, can be polynomial. The linked chain model generates a polynomial number of patterns while the subtree model is exponential.</Paragraph>
    <Paragraph position="5"> There is a clear tradeoff between the complexity of pattern representations and the practicality of computation using them. Some pattern representations are more expressive, in terms of the amount of information from the dependency tree they make use of, than others (Section 2) and are therefore more likely to produce accurate extraction patterns. However, the more expressive models will add extra complexities during computation since a greater number of patterns will be generated. This complexity, both in the number of patterns produced and the computational effort required to produce them, limits the algorithms that can reasonably be applied to learn useful extraction patterns.</Paragraph>
    <Paragraph position="6"> For a pattern model to be suitable for an extraction task it needs to be expressive enough to encode enough information from the dependency parse to accurately identify the items which need to be extracted. However, we also aim for the  model to be as computationally tractable as possible. The ideal model will then be one with sufficient expressive power while at the same time not including extra information which would make its use less practical.</Paragraph>
  </Section>
  <Section position="6" start_page="14" end_page="16" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> We carried out experiments to determine how suitable the pattern representations detailed in Section 2 are for encoding the information of interest to IE systems. We chose a set of IE corpora annotated with the information to be extracted (detailed in Section 4.1), generated sets of patterns using a variety of dependency parsers (Section 4.2) which were then examined to discover how much of the target information they contain (Section 4.3).</Paragraph>
    <Section position="1" start_page="14" end_page="14" type="sub_section">
      <SectionTitle>
4.1 Corpora
</SectionTitle>
      <Paragraph position="0"> Corpora representing different genres of text were chosen for these experiments; one containing newspaper text and another composed of biomedical abstracts. The first corpus consisted of Wall Street Journal texts from the Sixth Message Understanding Conference (MUC, 1995) IE evaluation. These are reliably annotated with details about the movement of executives between jobs.</Paragraph>
      <Paragraph position="1"> We make use of a version of the corpus produced by Soderland (1999) in which events described within a single sentence were annotated.</Paragraph>
      <Paragraph position="2"> Events in this corpus identify relations between up to four entities: PersonIn (the person starting a new job), PersonOut (person leaving a job), Post (the job title) and Organisation (the employer). These events were broken down into a set of binary relationships. For example, the sentence &amp;quot;Smith was recently made chairman of Acme.&amp;quot; contains information about the new employee (Smith), post (chairman) and organisation (Acme). Events are represented as a set of binary relationships, Smith-chairman, chairman-Acme and Smith-Acme for this example.</Paragraph>
      <Paragraph position="3"> The second corpus uses documents taken from the biomedical domain, specifically the training corpus used in the LLL-05 challenge task (N'edellec, 2005), and a pair of corpora (Craven and Kumlien, 1999) which were derived from the Yeast Proteome Database (YPD) (Hodges et al., 1999) and the Online Mendelian Inheritance in Man database (OMIM) (Hamosh et al., 2002).</Paragraph>
      <Paragraph position="4"> Each of these corpora are annotated with binary relations between pairs of entities. The LLL-05 corpora contains interactions between genes and proteins. For example the sentence &amp;quot;Expression of the sigma(K)-dependent cwlH gene depended on gerE&amp;quot; contains relations between sigma(K) and cwlH and between gerE and cwlH. The YPD corpus is concerned with the subcellular compartments in which particular yeast proteins localize. An example sentence &amp;quot;Uba2p is located largely in the nucleus&amp;quot; relates Uba2p and the nucleus. The relations in the OMIM corpora are between genes and diseases, for example &amp;quot;Most sporadic colorectal cancers also have two APC mutations&amp;quot; contains a relation between APC and colorectal cancer. null The MUC6 corpus contains a total of six possible binary relations. Each of the three biomedical corpora contain a single relation type, giving a total of nine binary relations for the experiments.</Paragraph>
      <Paragraph position="5"> There are 3911 instances of binary relations in all corpora.</Paragraph>
    </Section>
    <Section position="2" start_page="14" end_page="15" type="sub_section">
      <SectionTitle>
4.2 Generating Dependency Patterns
</SectionTitle>
      <Paragraph position="0"> Three dependency parsers were used for these experiments: MINIPAR3 (Lin, 1999), the Machinese Syntax4 parser from Connexor Oy (Tapanainen and J&amp;quot;arvinen, 1997) and the Stanford5 parser (Klein and Manning, 2003). These three parsers represent a cross-section of approaches to producing dependency analyses: MINIPAR uses a constituency grammar internally before converting the result to a dependency tree, Machinese Syntax uses a functional dependency grammar, and the Stanford Parser is a lexicalized probabilistic parser.</Paragraph>
      <Paragraph position="1"> Before these parsers were applied to the various corpora the named entities participating in relations are replaced by a token indicating their class. For example, in the MUC6 corpus &amp;quot;Acme hired Smith&amp;quot; would become &amp;quot;Organisation hired PersonIn&amp;quot;. Each parser was adapted to deal with these tokens correctly. The parsers were applied to each corpus and patterns extracted from the dependency trees generated.</Paragraph>
      <Paragraph position="2"> The analyses produced by the parsers were post-processed to make the most of the information they contain and ensure consistent structures from which patterns could be extracted. It was found  that the parsers were often unable to generate a dependency tree which included the whole sentence and instead generate an analysis consisting of sentence fragments represented as separate tree structures. Some fragments did not include a verb so no patterns could be extracted. To take account of this we allowed the root node of any tree fragment to take the place of a verb in a pattern (see Section 2). This leads to the generation of more chain and linked chain patterns but has no effect on the number of SVO patterns or subtrees.</Paragraph>
      <Paragraph position="3"> Table 1 shows the number of patterns generated from the dependency trees produced by each of the parsers. The number of subtrees generated from the MINIPAR parses is several orders of magnitude higher than the others because MINIPAR allows certain nodes to be the modifier of two separate nodes to deal with phenomena such as conjunction, anaphora and VP-coordination. For example, in the sentence &amp;quot;The bomb caused widespread damage and killed three people&amp;quot; the bomb is the subject of both the verbs cause and kill. We made use of this information by duplicating any nodes (and their descendants) with more than one head.6 Overall the figures in Table 1 are consistent with the analysis in Section 3 but there is great variation in the number of patterns produced by the different parsers. For example, the Stanford parser produces more chains and linked chains than the other parsers. (If we did not duplicate portions of the MINIPAR parses then the Stanford parser would also generate the most subtrees.) We found that the Stanford parser was the most likely to generate a single dependency tree for each sentence while the other two produced a set of tree fragments. A single dependency analysis contains a greater number of patterns, and possible subtrees, than a fragmented analysis. One reason for this may be that the Stanford parser is unique in allowing the use of an underspecified dependency relation, dep, which can be applied when the role of the dependency is unclear. This allows the Stan6One dependency tree produced by MINIPAR, expanded in this way, contained approximately 1x1064 subtrees. These are not included in the total number of subtrees for the MINIPAR parses shown in the table.</Paragraph>
      <Paragraph position="4"> ford parser to generate analyses which span more of the sentence than the other two.</Paragraph>
    </Section>
    <Section position="3" start_page="15" end_page="16" type="sub_section">
      <SectionTitle>
4.3 Evaluating Pattern Models
</SectionTitle>
      <Paragraph position="0"> Patterns from each of the four models are examined to check whether they cover the information which should be extracted. In this context &amp;quot;cover&amp;quot; means that the pattern contains both elements of the relation. For example, an SVO pattern extracted from the dependency parse of &amp;quot;Smith was recently made chairman of Acme.&amp;quot; would be [V/make](subj[N/Smith]+obj[N/chairman]) which covers the relation between Smith and chairman but not the relations between Smith and Acme or chairman and Acme. The coverage of each model is computed as the percentage of relations in the corpus for which at least one of the patterns contains both of the participating entities. Coverage is related to the more familiar IE evaluation metric of recall since the coverage of a pattern model places an upper bound on the recall of any system using that model. The aim of this work is to determine the proportion of the relations in a corpus that can be represented using the various pattern models rather than their performance in an IE system and, consequently, we choose to evaluate models in terms of their coverage rather than precision and recall.7 For practical applications parsers are required to generate the dependency analysis but these may not always provide a complete analysis for every sentence. The coverage of each model is influenced by the ability of the parser to produce a tree which connects the elements of the event to be extracted. To account for this we compute the coverage of each model relative to a particular parser. The subtree model covers all events whose entities are included in the dependency tree and, consequently, the coverage of this model represents the maximum number of events that the model can 7The subtree model can be used to cover any set of items in a dependency tree. So, given accurate dependency analyses, this model will cover all events. The coverage of the subtree model can be determined by checking if the elements of the event are connected in the dependency analysis of the sentence and, for simplicity, we chose to do this rather than enumerating all subtrees.</Paragraph>
      <Paragraph position="1">  represent for a given dependency tree. The coverage of other models relative to a dependency analysis can be computed by dividing the number of events it covers by the number covered by the sub-tree model (i.e. the maximum which can be covered). This measure is refered to as the bounded coverage of the model. Bounded coverage for the subtree model is always 100%.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>