<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1052">
  <Title>Extracting Relations with Integrated Information Using Kernel Methods</Title>
  <Section position="5" start_page="420" end_page="420" type="metho">
    <SectionTitle>
4 Kernel Relation Detection
4.1 ACE Relation Detection Task
ACE (Automatic Content Extraction)
</SectionTitle>
    <Paragraph position="0"> is a research and development program in information extraction sponsored by the U.S. Government. The 2004 evaluation defined seven major types of relations between seven types of entities. The entity types are PER (Person), ORG (Organization), FAC (Facility), GPE (Geo-Political Entity: countries, cities, etc.), LOC (Location), WEA (Weapon) and VEH (Vehicle). Each mention of an entity has a mention type: NAM (proper name), NOM (nominal) or  Kambhatla also evaluated his system on the ACE relation detection task, but the results are reported for the 2003 task, which used different relations and different training and test data, and did not use hand-annotated entities, so they cannot be readily compared to our results.</Paragraph>
  </Section>
  <Section position="6" start_page="420" end_page="420" type="metho">
    <SectionTitle>
ART (Agent-Artifact) and DISC (Discourse).
</SectionTitle>
    <Paragraph position="0"> There are also 27 relation subtypes defined by ACE, but this paper only focuses on detection of relation types. Table 1 lists examples of each relation type.</Paragraph>
    <Paragraph position="1">  heads of the two entity arguments in a relation are marked. Types are listed in decreasing order of frequency of occurrence in the ACE corpus. Figure 1 shows a sample newswire sentence, in which three relations are marked. In this sentence, we expect to find a PHYS relation between Hezbollah forces and areas, a PHYS relation between Syrian troops and areas and an EMP-ORG relation between Syrian troops and Syrian. In our approach, input text is preprocessed by the Charniak sentence parser (including tokenization and POS tagging) and the GLARF (Meyers et al., 2001) dependency analyzer produced by NYU. Based on treebank parsing, GLARF produces labeled deep dependencies between words (syntactic relations such as logical subject and logical object). It handles linguistic phenomena like passives, relatives, reduced relatives, conjunctions, etc.</Paragraph>
    <Section position="1" start_page="420" end_page="420" type="sub_section">
      <SectionTitle>
4.2 Definitions
</SectionTitle>
      <Paragraph position="0"> In our model, kernels incorporate information from</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="420" end_page="423" type="metho">
    <SectionTitle>
Figure 1: A sample newswire sentence with three relations marked (PHYS, PHYS and EMP-ORG)
</SectionTitle>
    <Paragraph position="0"> That's because Israel was expected to retaliate against Hezbollah forces in areas controlled by Syrian troops.</Paragraph>
    <Paragraph position="1">  tokenization, parsing and deep dependency analysis. A relation candidate R is defined as</Paragraph>
    <Paragraph position="3"> are the two entity arguments</Paragraph>
    <Paragraph position="5"> ) is a token vector that covers the arguments and intervening</Paragraph>
    <Paragraph position="7"> ) is also a token vector, generated from seq and the parse tree; path is a dependency path connecting arg  and arg  in the dependency graph produced by GLARF. path can be empty if no such dependency path exists. The difference between link and seq is that link only retains the &amp;quot;important&amp;quot; words in seq in terms of syntax. For example, all noun phrases occurring in seq are replaced by their heads. Words and constituent types in a stop list, such as time expressions, are also removed.</Paragraph>
    <Paragraph position="8"> A token T is defined as a string triple,</Paragraph>
    <Paragraph position="10"> where word, pos and base are strings representing the word, part-of-speech and morphological base form of T. Entity is a token augmented with other</Paragraph>
    <Paragraph position="12"> where tk is the token associated with E; type, sub-type and mtype are strings representing the entity type, subtype and mention type of E. The subtype contains more specific information about an entity.</Paragraph>
    <Paragraph position="13"> For example, for a GPE entity, the subtype tells whether it is a country name, city name and so on.</Paragraph>
    <Paragraph position="14"> Mention type includes NAM, NOM and PRO.</Paragraph>
    <Paragraph position="15"> It is worth pointing out that we always treat an entity as a single token: for a nominal, it refers to its head, such as boys in the two boys; for a proper name, all the words are connected into one token, such as Bashar_Assad. So in a relation example R</Paragraph>
    <Paragraph position="17"> . For names, the base form of an entity is its ACE type (person, organization, etc.). To introduce dependencies, we define a dependency token to be a token augmented with a vector of dependency arcs, DT=(word, pos, base, dseq), where dseq = (arc</Paragraph>
    <Paragraph position="19"> where w is the current token; dw is a token connected by a dependency to w; and label and e are the role label and direction of this dependency arc respectively. From now on we upgrade the type of</Paragraph>
    <Paragraph position="21"> to be dependency tokens. Finally, path is a vector of dependency arcs,</Paragraph>
    <Paragraph position="23"> where l is the length of the path and arc  .tk. So path is a chain of dependencies connecting the two arguments in R. The arcs in it do not have to be in the same direction.  link sequence is generated from seq by removing some unimportant words based on syntax. The dependency links are generated by GLARF. Figure 2 shows a relation example generated from the text &amp;quot;... in areas controlled by Syrian troops&amp;quot;. In this relation example R, arg  0), (SBJ, troops, controlled, 1)). path is ((OBJ, areas, controlled, 1), (SBJ, controlled, troops, 0)). The value 0 in a dependency arc indicates forward direction from w to dw, and 1 indicates backward direction. The seq and link sequences of R are shown in Figure 2.</Paragraph>
    <Paragraph position="24"> Some relations occur only between very restricted types of entities, but this is not true for every type of relation. For example, PER-SOC is a relation mainly between two person entities, while PHYS can happen between any type of entity and a GPE or LOC entity.</Paragraph>
    <Section position="1" start_page="421" end_page="422" type="sub_section">
      <SectionTitle>
4.3 Syntactic Kernels
</SectionTitle>
      <Paragraph position="0"> In this section we will describe the kernels designed for different syntactic sources and explain the intuition behind them.</Paragraph>
      <Paragraph position="1"> We define two kernels to match relation examples at surface level. Using the notation just defined, we can write the two surface kernels as follows:</Paragraph>
      <Paragraph position="3"> is a kernel that matches two tokens. I(x, y) is a binary string match operator that gives 1 if x=y and 0 otherwise. Kernel Ps  matches attributes of two entity arguments respectively, such as type, subtype and lexical head of an entity. This is based on the observation that there are type constraints on the two arguments. For instance PER-SOC is a relation mostly between two person entities. So the attributes of the entities are crucial clues. Lexical information is also important to distinguish relation types. For instance, in the phrase U.S. president there is an EMP-ORG relation between president and U.S., while in a U.S. businessman there is a  is a kernel that simply matches unigrams and bigrams between the seq sequences of two relation examples. The information this kernel provides is faithful to the text.</Paragraph>
      <Paragraph position="4"> 3) Link sequence kernel where min_len is the length of the shorter link se- null is a kernel that matches token by token between the link sequences of two relation examples. Since relations often occur in a short context, we expect many of them have similar link sequences.</Paragraph>
      <Paragraph position="6"> Intuitively the dependency path connecting two arguments could provide a high level of syntactic regularization. However, a complete match of two dependency paths is rare. So this kernel matches the component arcs in two dependency paths in a pairwise fashion. Two arcs can match only when they are in the same direction. In cases where two paths do not match exactly, this kernel can still tell us how similar they are. In our experiments we placed an upper bound on the length of dependency paths for which we computed a non-zero ker-</Paragraph>
      <Paragraph position="8"> This kernel matches the local dependency context around the relation arguments. This can be helpful especially when the dependency path between arguments does not exist. We also hope the dependencies on each argument may provide some useful clues about the entity or connection of the entity to the context outside of the relation example.</Paragraph>
    </Section>
    <Section position="2" start_page="422" end_page="423" type="sub_section">
      <SectionTitle>
4.4 Composite Kernels
</SectionTitle>
      <Paragraph position="0"> Having defined all the kernels representing shallow and deep processing results, we can define composite kernels to combine and extend the individual kernels.</Paragraph>
      <Paragraph position="1">  covers the most important clues for this task: information about the two arguments and the word link between them. The polynomial extension is equivalent to adding pairs of features as  new features. Intuitively this introduces new features like: the subtype of the first argument is a country name and the word of the second argument is president, which could be a good clue for an EMP-ORG relation. The polynomial kernel is down weighted by a normalization factor because we do not want the high order features to overwhelm the original ones. In our experiment, using polynomial kernels with degree higher than 2 does not produce better results.</Paragraph>
      <Paragraph position="2"> 2) Full kernel This is the final kernel we used for this task, which is a combination of all the previous kernels. In our experiments, we set all the scalar factors to 1. Different values were tried, but keeping the original weight for each kernel yielded the best results for this task.</Paragraph>
      <Paragraph position="3"> All the individual kernels we designed are explicit. Each kernel can be seen as a matching of features and these features are enumerable on the given data. So it is clear that they are all valid kernels. Since the kernel function set is closed under linear combination and polynomial extension, the composite kernels are also valid. The reason we propose to use a feature-based kernel is that we can have a clear idea of what syntactic clues it represents and what kind of information it misses. This is important when developing or refining kernels, so that we can make them generate complementary information from different syntactic processing results.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="423" end_page="2541212" type="metho">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> Experiments were carried out on the ACE RDR (Relation Detection and Recognition) task using hand-annotated entities, provided as part of the ACE evaluation. The ACE corpora contain documents from two sources: newswire (nwire) documents and broadcast news transcripts (bnews). In this section we will compare performance of different kernel setups trained with SVM, as well as different classifiers, KNN and SVM, with the same kernel setup. The SVM package we used is SVM light . The training parameters were chosen using cross-validation. One-against-all classification was applied to each pair of entities in a sentence. When SVM predictions conflict on a relation example, the one with larger margin will be selected as the final answer.</Paragraph>
    <Section position="1" start_page="423" end_page="423" type="sub_section">
      <SectionTitle>
5.1 Corpus
</SectionTitle>
      <Paragraph position="0"> The ACE RDR training data contains 348 documents, 125K words and 4400 relations. It consists of both nwire and bnews documents. Evaluation of kernels was done on the training data using 5-fold cross-validation. We also evaluated the full kernel setup with SVM on the official test data, which is about half the size of the training data. All the data is preprocessed by the Charniak parser and GLARF dependency analyzer. Then relation examples are generated based these results.</Paragraph>
    </Section>
    <Section position="2" start_page="423" end_page="2541212" type="sub_section">
      <SectionTitle>
5.2 Results
</SectionTitle>
      <Paragraph position="0"> Table 2 shows the performance of the SVM on different kernel setups. The kernel setups in this experiment are incremental. From this table we can see that adding kernels continuously improves the performance, which indicates they provide additional clues to the previous setup. The argument kernel treats the two arguments as independent entities. The link sequence kernel introduces the syntactic connection between arguments, so adding it to the argument kernel boosted the performance. Setup F shows the performance of adding only dependency kernels to the argument kernel. The performance is not as good as setup B, indicating that dependency information alone is not as crucial as the link sequence.</Paragraph>
      <Paragraph position="1">  setups. Each setup adds one level of kernels to the previous one except setup F. Evaluated on the ACE training data with 5-fold cross-validation. F-scores marked by * are significantly better than the previous setup (at 95% confidence level).</Paragraph>
      <Paragraph position="2">  Another observation is that adding the bigram kernel in the presence of all other level of kernels improved both precision and recall, indicating that it helped in both correcting errors in other processing results and providing supplementary information missed by other levels of analysis. In another experiment evaluated on the nwire data only (about half of the training data), adding the bigram kernel improved F-score 0.5% and this improvement is statistically significant.</Paragraph>
      <Paragraph position="3">  different kernel setups. Types are ordered in decreasing order of frequency of occurrence in the ACE corpus. In SVM training, the same parameters were used for all 7 types.</Paragraph>
      <Paragraph position="4"> Table 3 shows the performance of SVM and KNN (k Nearest Neighbors) on different kernel setups. For KNN, k was set to 3. In the first setup of KNN, the two kernels which seem to contain most of the important information are used. It performs quite well when compared with the SVM result. The other two tests are based on the full kernel setup. For the two KNN experiments, adding more kernels (features) does not help. The reason might be that all kernels (features) were weighted equally in the composite kernel Ph  and this may not be optimal for KNN. Another reason is that the polynomial extension of kernels does not have any benefit in KNN because it is a monotonic transformation of similarity values. So the results of KNN on kernel (Ps  would be exactly the same. We also tried different k for KNN and k=3 seems to be the best choice in either case. For the four major types of relations SVM does better than KNN, probably due to SVM's generalization ability in the presence of large numbers of features. For the last three types with many fewer examples, performance of SVM is not as good as KNN. The reason we think is that training of SVM on these types is not sufficient. We tried different training parameters for the types with fewer examples, but no dramatic improvement obtained.</Paragraph>
      <Paragraph position="5"> We also evaluated our approach on the official ACE RDR test data and obtained very competitive scores.</Paragraph>
      <Paragraph position="6">  The primary scoring metric  for the ACE evaluation is a 'value' score, which is computed by deducting from 100 a penalty for each missing and spurious relation; the penalty depends on the types of the arguments to the relation. The value scores produced by the ACE scorer for nwire and bnews test data are 71.7 and 68.0 repectively. The value score on all data is 70.1.</Paragraph>
      <Paragraph position="7">  The scorer also reports an F-score based on full or partial match of relations to the keys. The unweighted F-score for this test produced by the ACE scorer on all data is 76.0%. For this evaluation we used nearest neighbor to determine argument ordering and relation subtypes.</Paragraph>
      <Paragraph position="8"> The classification scheme in our experiments is one-against-all. It turned out there is not so much confusion between relation types. The confusion matrix of predictions is fairly clean. We also tried pairwise classification, and it did not help much.</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="2541212" end_page="2541212" type="metho">
    <SectionTitle>
6 Discussion
</SectionTitle>
    <Paragraph position="0"> In this paper, we have shown that using kernels to combine information from different syntactic sources performed well on the entity relation detection task. Our experiments show that each level of syntactic processing contains useful information for the task. Combining them may provide complementary information to overcome errors arising from linguistic analysis. Especially, low level information obtained with high reliability helped with the other deep processing results. This design feature of our approach should be best employed when the preprocessing errors at each level are independent, namely when there is no dependency between the preprocessing modules.</Paragraph>
    <Paragraph position="1"> The model was tested on text with annotated entities, but its design is generic. It can work with  As ACE participants, we are bound by the participation agreement not to disclose other sites' scores, so no direct comparison can be provided.</Paragraph>
    <Paragraph position="2">  No comparable inter-annotator agreement scores are available for this task, with pre-defined entities. However, the agreement scores across multiple sites for similar relation tagging tasks done in early 2005, using the value metric, ranged from about 0.70 to 0.80.</Paragraph>
    <Paragraph position="3">  noisy entity detection input from an automatic tagger. With all the existing information from other processing levels, this model can be also expected to recover from errors in entity tagging.</Paragraph>
  </Section>
  <Section position="10" start_page="2541212" end_page="2541212" type="metho">
    <SectionTitle>
7 Further Work
</SectionTitle>
    <Paragraph position="0"> Kernel functions have many nice properties. There are also many well known kernels, such as radial basis kernels, which have proven successful in other areas. In the work described here, only linear combinations and polynomial extensions of kernels have been evaluated. We can explore other kernel properties to integrate the existing syntactic kernels. In another direction, training data is often sparse for IE tasks. String matching is not sufficient to capture semantic similarity of words.</Paragraph>
    <Paragraph position="1"> One solution is to use general purpose corpora to create clusters of similar words; another option is to use available resources like WordNet. These word similarities can be readily incorporated into the kernel framework. To deal with sparse data, we can also use deeper text analysis to capture more regularities from the data. Such analysis may be based on newly-annotated corpora like PropBank (Kingsbury and Palmer, 2002) at the University of Pennsylvania and NomBank (Meyers et al., 2004) at New York University. Analyzers based on these resources can generate regularized semantic representations for lexically or syntactically related sentence structures. Although deeper analysis may even be less accurate, our framework is designed to handle this and still obtain some improvement in performance.</Paragraph>
  </Section>
</Paper>