<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2074"> <Title>ARE: Instance Splitting Strategies for Dependency Relation-based Information Extraction</Title>
<Section position="5" start_page="571" end_page="572" type="metho"> <SectionTitle> (LP) </SectionTitle>
<Paragraph position="0"> is developed for semi-structured textual domains, where consistent lexical patterns can be found at the surface text level. This is not the case for free text, in which a different word order or an extra clause in a sentence may cause paraphrasing and alignment problems respectively, as in the example excerpts &quot;terrorists attacked peasants&quot; and &quot;peasants were attacked 2 months ago by terrorists&quot;. Classification-based approaches such as that of Chieu and Ng (2002) tend to outperform rule-based approaches. However, Ciravegna (2001) argued that it is difficult to examine the results obtained by classifiers. Thus, interpretability of the learned knowledge is a serious bottleneck of the classification approach. Additionally, Zhou and Su (2002) trained classifiers for Named Entity extraction and reported that performance degrades rapidly if the training corpus size is below 100KB. This implies that human experts have to spend long hours annotating a sufficiently large training corpus.</Paragraph>
<Paragraph position="1"> Several recent studies have focused on the extraction of relationships using classifiers. Roth and Yih (2002) learned entities and relations together. The joint learning improves the performance of NE recognition in cases such as &quot;X killed Y&quot;. It also prevents the propagation of mistakes in NE extraction to the extraction of relations. However, long-distance relations between entities are likely to cause mistakes in relation extraction. A possible approach for modeling relations of different complexity is the use of dependency-based kernel trees in support vector machines by Culotta and Sorensen (2004). The authors reported that non-relation instances are very heterogeneous, and hence they suggested the additional step of extracting candidate relations before classification.</Paragraph> </Section>
<Section position="6" start_page="572" end_page="574" type="metho"> <SectionTitle> 3 Our approach </SectionTitle>
<Paragraph position="0"> Differing from previous systems, the language model in ARE is based on dependency relations obtained from Minipar by Lin (1997). In the first stage, ARE tries to identify possible candidates for filling slots in a sentence. For example, words such as 'terrorist' or 'guerrilla' can fill the slot for Perpetrator in the terrorism domain. We refer to these candidates as anchors or anchor cues. In the second stage, ARE defines the dependency relations that connect anchor cues. We exploit dependency relations to provide more invariant structures for similar sentences with different syntactic structures.</Paragraph>
<Paragraph position="1"> After extracting the possible relations between anchor cues, we form several possible parsing paths and rank them. Based on the ranking, we choose the optimal filling of slots.</Paragraph>
<Paragraph position="2"> The ranking strategy may be unnecessary in cases where entities are represented in the SVO form.</Paragraph>
<Paragraph position="3"> The ranking strategy may also fail in situations involving long-distance relations. To handle such problems, we categorize sentences into 3 categories: simple, average and hard, depending on the complexity of their dependency relations. We then apply different strategies to tackle sentences in each category effectively. The following subsections discuss the details of our approach.</Paragraph>
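As a rough, self-contained illustration of the two-stage pipeline described above, the following Python sketch spots anchor cues from small hand-made lexicons (stage 1) and links them through a toy dependency parse to fill a template (stage 2). The lexicons, token lists and parse are invented for illustration; this is not the ARE implementation.

# Toy end-to-end sketch of the two-stage idea (not the authors' code).
ANCHOR_LEXICONS = {                      # hypothetical anchor-cue lexicons
    'Perpetrator': {'terrorists', 'guerrillas'},
    'Action': {'attacked', 'kidnapped'},
    'Victim': {'peasants', 'priests'},
}

def find_anchors(tokens):
    """Stage 1: map each token to an anchor type if it appears in a lexicon."""
    return {tok: t for tok in tokens for t, lex in ANCHOR_LEXICONS.items() if tok in lex}

def fill_template(tokens, parent):
    """Stage 2: keep only anchors that are directly linked to the action cue."""
    anchors = find_anchors(tokens)
    action = next((tok for tok, t in anchors.items() if t == 'Action'), None)
    template = {'Action': action}
    for tok, t in anchors.items():
        if t != 'Action' and parent.get(tok) == action:
            template[t] = tok
    return template

tokens = ['terrorists', 'attacked', 'peasants']
parent = {'terrorists': 'attacked', 'peasants': 'attacked'}   # toy dependency links
print(fill_template(tokens, parent))
# {'Action': 'attacked', 'Perpetrator': 'terrorists', 'Victim': 'peasants'}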
<Paragraph position="4"> Every token in ARE may be represented at several levels of representation, including: Lexical, Part-of-Speech, Named Entities, Synonyms and Concept classes. The synonym set and concept classes are mainly obtained from WordNet. We use NLProcessor from Infogistics Ltd for the extraction of part-of-speech tags, noun phrases and verb phrases (we refer to them as phrases). Named Entities are extracted with the program used in Yang et al. (2003).</Paragraph>
<Paragraph position="5"> Additionally, we employed a co-reference module for the extraction of meaningful pronouns. It is used for linking entities across clauses or sentences, for example in &quot;John works in XYZ Corp. He was appointed as a vice-president a month ago&quot;, and achieves an accuracy of 62%. After preprocessing and feature extraction, we obtain the linguistic features in Table 1.</Paragraph>
<Section position="1" start_page="572" end_page="573" type="sub_section"> <SectionTitle> 3.1 Mining of anchor cues </SectionTitle>
<Paragraph position="0"> In order to extract possible anchors and relations from every sentence, we need to select features to support the generalization of words. This generalization may be different for different classes of words. For example, person names may be generalized as the Named Entity PERSON, whereas for 'murder' and 'assassinate', the optimal generalization would be the concept class 'kill' in the WordNet hypernym tree. To support several generalizations, we need to store multiple representations of every word or token.</Paragraph>
<Paragraph position="1"> Mining of anchor cues, or anchors, is crucial in order to unify meaningful entities in a sentence, for example the words 'terrorists', 'individuals' and 'soldiers' from Table 1. In the terrorism domain, we consider 4 types of anchor cues: Perpetrator, Action, Victim, and Target of destruction. For the management succession domain, we have 6 types, including Post, Person In, Person Out, Action and Organization. Each set of anchor cues may be seen as a pre-defined semantic type whose tokens are mined automatically. The anchor cues are further classified into two categories: general type A and action type D.</Paragraph>
<Paragraph position="2"> Action type anchor cues are those with verbs or verb phrases describing a particular action or movement. The general type encompasses any predefined type that does not fall under the action type cues.</Paragraph>
<Paragraph position="3"> In the first stage, we need to extract anchor cues of every type. Let P be an input phrase, and A j be the anchor of type j that we want to match. The similarity score of P for A j in sentence S is given by a weighted sum over the representation levels: Score(P, A j) = Σ k δ k · Sim k (P, A j), where δ k is the importance weight of representation level k for A j. In order to estimate the score function, we use entities from the slots in the training instances. The weights δ 1 , ..., δ d are learned automatically using Expectation Maximization (Dempster et al., 1977). Using anchors from the training instances as ground truth, we iteratively input different sets of weights into EM to maximize the overall score.
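A minimal Python sketch of the anchor scoring described above, under the assumption that the score is a weighted sum of matches at the different representation levels, with the weights δ learned elsewhere (e.g., by EM on the training anchors). The lexicon, feature names and weight values below are hypothetical.

def anchor_score(token_features, anchor_lexicon, weights):
    """Score a token or phrase P against one anchor type A_j (e.g. Perpetrator)."""
    score = 0.0
    for level, delta in weights.items():       # level: 'lexical', 'ne', 'concept', ...
        value = token_features.get(level)
        if value is not None and value in anchor_lexicon.get(level, set()):
            score += delta                      # add the importance weight of this level
    return score

# Hypothetical lexicon and weights for the Perpetrator anchor type.
perp_lexicon = {'lexical': {'terrorists', 'guerrillas'},
                'ne': {'ORGANIZATION'},
                'concept': {'person'}}
weights = {'lexical': 0.5, 'ne': 0.3, 'concept': 0.2}   # assumed EM-learned deltas
features = {'lexical': 'individuals', 'concept': 'person'}
print(anchor_score(features, perp_lexicon, weights))      # 0.2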
Consider the excerpts &quot;Terrorists attacked victims&quot;, &quot;Peasants were murdered by unidentified individuals&quot; and &quot;Soldiers participated in massacre of Jesuit priests&quot;. Let W i denote the position of token i in the instances. After mining of anchors, we are able to extract meaningful anchor cues in these sentences, as shown in Table 2:</Paragraph> </Section>
<Section position="2" start_page="573" end_page="574" type="sub_section"> <SectionTitle> 3.2 Relationship extraction and ranking </SectionTitle>
<Paragraph position="0"> In the next stage, we need to find meaningful relations to unify instances using the anchor cues. This unification is done using the dependency trees of sentences. The dependency relations for the first sentence are given in Figure 1.</Paragraph>
<Paragraph position="1"> From the dependency tree, we need to identify the SVO relations between anchor cues. In cases where there are multiple relations linking many potential subjects, verbs or objects, we need to select the best relations under the circumstances. Our scheme for relation ranking is as follows. First, we rank each single relation individually based on the probability that it appears in the respective context template slot in the training data. We capture the quality of a relation Rel with a relative-frequency measure that gives higher weight to more frequently occurring relations: Qual(Rel) = |X Rel| / |X all|, where X Rel is the set of occurrences of Rel in the respective slot in the training data, X all is the set of all relation occurrences in that slot, and |X| denotes the cardinality of the set X.</Paragraph>
<Paragraph position="2"> Second, we need to take into account the entity height in the dependency tree. We calculate height as the distance to the root node. Our intuition is that nodes at the higher levels of the dependency tree are more important, because they may be linked to more nodes or entities. In the following example, the node 'terrorists' is the most representative in the whole tree, and thus relations nearer to 'terrorists' should have higher weight. Therefore, we give a slightly higher weight to the links that are closer to the root node: Height(Rel) = Const - Dist(Rel, Root), where Const is set to be larger than the depth of nodes in the tree.</Paragraph>
<Paragraph position="3"> Third, we need to calculate the score of a relation path R i->j connecting anchors A i and A j that belong to different anchor cue types. The path score of R i->j depends on both the quality and the height of the participating relations: it combines Qual(Rel) and Height(Rel) over all relations Rel on the path, and it tends to give higher scores to shorter paths. Therefore, the path ending with 'terrorist' will be preferred in the previous example to the equivalent path ending with 'MRTA'.</Paragraph>
<Paragraph position="4"> Finally, we need to find the optimal filling of a template T. Let C = {C 1 , ..., C K } be the set of slot types in T and A = {A 1 , ..., A N } be the set of extracted anchors. First, we regroup the anchors in A according to their respective types into candidate fillings F i , where K is the number of slot types and M denotes the number of relation paths between anchors in F i ; the relation score of F i , computed over these M paths, is used for ranking all possible template fillings. The next step is to join entity and relation scores. We define the entity score of F i as the average of the scores of the participating anchors, and we combine the entity and relation scores of F i to obtain its final rank. The first 2 instances are unified correctly. The only exception is the slot in the third case, which is missing because the target is not an object of 'participated'.</Paragraph> </Section> </Section>
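The following Python sketch illustrates how the quality, height and path scores could interact, assuming Qual is a relative frequency over training slots, Height decreases with distance from the root, and the path score multiplies normalized per-link scores so that shorter paths win. The exact formulas of the paper are not reproduced here; all counts, labels and thresholds are toy values.

CONST = 10  # assumed to exceed the depth of any dependency tree considered here

def qual(rel, slot_rel_counts, total):
    """Relative frequency of a relation label in the training context-template slots."""
    return slot_rel_counts.get(rel, 0) / total

def height(depth_from_root):
    """Relations nearer to the root receive larger values."""
    return CONST - depth_from_root

def path_score(path, slot_rel_counts, total):
    """Combine quality and height over a relation path; shorter paths score higher."""
    score = 1.0
    for rel, depth in path:
        score *= qual(rel, slot_rel_counts, total) * height(depth) / CONST
    return score                              # each factor <= 1, so long paths are penalised

counts = {'subj': 30, 'obj': 25, 'pcomp-n': 10}   # toy counts from training slots
total = 100
short_path = [('subj', 1)]                        # e.g. terrorists -subj-> attacked
long_path = [('subj', 1), ('pcomp-n', 2)]         # an equivalent but longer path
print(path_score(short_path, counts, total) > path_score(long_path, counts, total))  # True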
<Section position="7" start_page="574" end_page="576" type="metho"> <SectionTitle> 4 Category Splitting </SectionTitle>
<Paragraph position="0"> Through our experiments, we found that the combination of relations and anchors is essential for improving IE performance. However, relations alone are not applicable across all situations because of long-distance relations and possible dependency parsing errors, especially for long sentences. Since the relations in long sentences are often complicated, parsing errors are very difficult to avoid. Furthermore, applying dependency relations to long sentences may lead to incorrect extractions and decrease the performance.</Paragraph>
<Paragraph position="1"> Through the analysis of instances, we noticed that dependency trees have different complexity for different sentences. Therefore, we decided to classify sentences into 3 categories based on the complexity of the dependency relations between the action cues (V) and the likely subject (S) and object cues (O). Category 1 is when the potential S, V and O are connected directly to each other (simple category); Category 2 is when S or O is one link away from V in terms of nouns or verbs (average category); and Category 3 is when the path distances between the potential S, V and O are longer (hard category). The parse trees for the simple and average categories are derived respectively from the sentences &quot;50 peasants of have been kidnapped by terrorists&quot; and &quot;a colonel was involved in the massacre of the Jesuits&quot;. These trees represent 2 common structures in the MUC4 domain. By taking advantage of this commonality, we can further improve the performance of extraction. We notice that in the simple category, the perpetrator cue ('terrorists') is always a subject, the action cue ('kidnapped') a verb, and the victim cue ('peasants') an object. For the average category, perpetrator and victim commonly appear under 3 relations: subject, object and pcomp-n. The most difficult category is the hard category, since in this category relations can be distant. We thus primarily rely on anchors for extraction and have to give less importance to dependency parsing.</Paragraph>
<Paragraph position="2"> In order to process the different categories, we utilize specific strategies for each category. As an example, the instance &quot;X murdered Y&quot; requires only the analysis of the context verb 'murdered' in the simple category. It is different from the instances &quot;X investigated murder of Y&quot; and &quot;X conducted murder of Y&quot; in the average category, in which replacing the word 'investigated' with 'conducted' makes X a perpetrator. We refer to the anchor 'murder' in the first and second instances as non-promotable and promotable respectively. Additionally, we note that the token 'conducted' is the optimal node for promotion of 'murder', whereas the token 'investigated' is not. This example illustrates the importance of support verb analysis specifically for the average category.</Paragraph>
<Paragraph position="3"> The main steps of our algorithm for performing IE in the different categories are given in Figure 5. Although some steps are common to every category, the processing strategies are different.</Paragraph>
<Paragraph position="5"> Figure 5. Main steps of the algorithm: 1) perform token reordering based on anchors; 2) use linguistic, syntactic and semantic features of the head noun (e.g. capitalization, 'subj', etc.); 3) find the optimal linking node for the action anchor in every F i ; 4) select the filling F i for the template according to its Rank.</Paragraph>
<Paragraph position="4"> Simple category. For the simple category, we reorder tokens according to their slot types. Based on this reordering, we fill the template.</Paragraph>
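A small self-contained sketch of the category split and the simple-category strategy, assuming the split is decided by the number of dependency links between the subject/object cues and the action cue (1 link = simple, 2 = average, more = hard). The toy parse and the link-count thresholds are illustrative only.

from collections import deque

def link_distance(parent, a, b):
    """Number of dependency links between nodes a and b, given child -> parent links."""
    adj = {}
    for child, par in parent.items():            # build an undirected adjacency list
        adj.setdefault(child, set()).add(par)
        adj.setdefault(par, set()).add(child)
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float('inf')

def category(parent, subj, verb, obj):
    """'simple' if S and O attach directly to V, 'average' if one node away, else 'hard'."""
    dist = max(link_distance(parent, subj, verb), link_distance(parent, obj, verb))
    return 'simple' if dist <= 1 else 'average' if dist <= 2 else 'hard'

# Toy parse of "terrorists kidnapped peasants": both anchors attach directly to the verb.
parent = {'terrorists': 'kidnapped', 'peasants': 'kidnapped'}
print(category(parent, 'terrorists', 'kidnapped', 'peasants'))   # simple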
<Paragraph position="7"> Average category. For the average category, our strategy consists of 4 steps. First, in the case of a missing anchor type, we try to find it in the nearest previous sentence. Consider an example from MUC-6: &quot;Look at what happened to John Sculley, Apple Computer's former chairman. Earlier this month he abruptly resigned as chairman of troubled Spectrum Information Technologies.&quot; In this example, the noisy cue 'he' needs to be substituted with &quot;John Sculley&quot;, which is a strong anchor cue. Second, we need to find an optimal promotion of a support verb. For example, in &quot;X conducted murder of Y&quot; the verb 'murder' should be linked with X, whereas in the excerpt &quot;X investigated murder of Y&quot; it should not be promoted. Thus, promotion involves 2 steps: (a) calculate the importance of every word connecting the action cue 'murder' (e.g., 'distributed') and (b) find the optimal promotion for the word 'murder'.</Paragraph>
<Paragraph position="8"> Third, using the predefined threshold l, we cut off the instances with irrelevant support verbs (e.g., 'investigated'). Fourth, we reorder the tokens in order to group them according to the anchor types.</Paragraph>
<Paragraph position="9"> The algorithm in Figure 6 estimates the importance of a token W for type D in the support verb structure. The input of the algorithm consists of sentences S in which the verbs V neg and V pos are automatically tagged as irrelevant and relevant respectively, based on the preliminary marked keys in the training instances. The algorithm outputs an importance value between 0 and 1.</Paragraph>
<Paragraph position="10"> Figure 6. Evaluation of word importance. We use the linguistic features for W and D as given in Table 1 to form the instances.</Paragraph>
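Figure 6 itself is not reproduced in the text, so the sketch below is only a guessed stand-in for the word-importance estimate: it scores a connecting word W by the smoothed fraction of training instances in which it links a relevant (V pos) rather than an irrelevant (V neg) action cue, yielding a value between 0 and 1 as stated above. The training lists are toy data.

from collections import Counter

def word_importance(word, relevant_links, irrelevant_links):
    """Importance of a connecting word for an action type, as a value in (0, 1)."""
    pos = Counter(relevant_links)[word]
    neg = Counter(irrelevant_links)[word]
    return (pos + 1) / (pos + neg + 2)        # add-one smoothing keeps the value in (0, 1)

# Toy training data: verbs that link a relevant vs. an irrelevant action cue.
relevant_links = ['conducted', 'committed', 'conducted']
irrelevant_links = ['investigated', 'reported', 'investigated']
print(word_importance('conducted', relevant_links, irrelevant_links))     # 0.75
print(word_importance('investigated', relevant_links, irrelevant_links))  # 0.25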
<Paragraph position="11"> Hard category. In the hard category, we have to deal with long-distance relations: at least 2 anchors are more than 2 links away in the dependency tree. Consequently, the dependency tree alone is not reliable for connecting nodes. To find an optimal connection, we primarily rely on a comparison between several possible fillings of slots based on the previously extracted anchor cues. Depending on the results of this comparison, we choose the filling that has the highest score. As an example, consider the hard category in the excerpt &quot;MRTA today distributed leaflets claiming responsibility for the murder of former defense minister Enrique Lopez Albujar&quot;. The dependency tree for this instance is given in Figure 7.</Paragraph>
<Paragraph position="12"> Although the words 'MRTA', 'murder' and 'minister' might be correctly extracted as anchors, the challenging problem is to decide whether 'MRTA' is a perpetrator. The anchors 'MRTA' and 'minister' are connected via the verb 'distributed'. However, the word 'murder' belongs to another branch of the dependency tree. Processing of such categories is challenging.</Paragraph>
<Paragraph position="13"> Since relations are not reliable, we first need to rely on the anchor extraction stage. Nevertheless, the promotion strategy for the anchor cue 'murder' is still possible, although the corresponding branch in the dependency tree is long. Hence, we try to replace the verb 'distributed' by promoting the anchor 'murder'. To do so, we need to evaluate whether the nodes in between may be eliminated.</Paragraph>
<Paragraph position="14"> For example, such elimination is possible in the pair 'conducted' -> 'murder' but not possible in the pair 'investigated' -> 'murder', since in the excerpt &quot;X investigated murder&quot; X is not a perpetrator. If the elimination is possible, we apply the promotion algorithm given in Figure 8, in which candidate nodes are added to the set Z; finally, the top node of the set Z is chosen as the optimal node for the promotion. In the example of Figure 7, the optimal node for promotion of the word 'murder' is the node 'distributed'.</Paragraph>
<Paragraph position="15"> Another important difference between the hard and average cases is in the calculation of Rank.</Paragraph> </Section> </Paper>