<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1017">
  <Title>Relation Extraction Using Label Propagation Based Semi-supervised Learning</Title>
  <Section position="3" start_page="130" end_page="133" type="intro">
    <SectionTitle>
3 Experiments and Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="130" end_page="131" type="sub_section">
      <SectionTitle>
3.1 Feature Set
</SectionTitle>
      <Paragraph position="0"> Following (Zhang, 2004), we used lexical and syntactic features in the contexts of entity pairs, which are extracted and computed from the parse trees derived from the Charniak Parser (Charniak, 1999) and the Chunklink script written by Sabine Buchholz from Tilburg University.</Paragraph>
      <Paragraph position="1"> Words: Surface tokens of the two entities and words in the three contexts.</Paragraph>
      <Paragraph position="2"> Entity Type: the entity type of both entity mentions, which can be PERSON, ORGANIZATION, FACILITY, LOCATION or GPE.</Paragraph>
      <Paragraph position="3"> POS features: Part-Of-Speech tags corresponding to all tokens in the two entities and words in the three contexts.</Paragraph>
      <Paragraph position="4"> Chunking features: This category of features is extracted from the chunklink representation, which includes:
* Chunk tag information of the two entities and words in the three contexts. The &amp;quot;O&amp;quot; tag means that the word is not in any chunk. The &amp;quot;I-XP&amp;quot; tag means that the word is inside an XP chunk. The &amp;quot;B-XP&amp;quot; tag means that the word is at the beginning of an XP chunk.</Paragraph>
      <Paragraph position="5"> * Grammatical function of the two entities and words in the three contexts. The last word in each chunk is its head, and the function of the head is the function of the whole chunk. &amp;quot;NP-SBJ&amp;quot; means an NP chunk serving as the subject of the sentence. The other words in a chunk, which are not the head, have &amp;quot;NOFUNC&amp;quot; as their function.
* IOB-chains of the heads of the two entities. The so-called IOB-chain notes the syntactic categories of all the constituents on the path from the root node of the tree to this leaf node.</Paragraph>
      <Paragraph position="6"> The position information is also specified in the description of each feature above. For example, word features with position information include:
1) WE1 (WE2): all words in e1 (e2)
2) WHE1 (WHE2): head word of e1 (e2)
3) WMNULL: no words in Cmid
4) WMFL: the only word in Cmid
5) WMF, WML, WM2, WM3, ...: first word, last word, second word, third word, ... in Cmid when there are at least two words in Cmid</Paragraph>
      <Paragraph position="8"> We combine the above lexical and syntactic features with their position information in the contexts to form context vectors. Before doing so, we filter out low-frequency features that appear only once in the dataset.</Paragraph>
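As a concrete illustration of the feature scheme above, here is a minimal Python sketch of the position-encoded word features and the singleton-frequency filter. The feature names follow the paper's notation (WE1, WHE1, WMNULL, ...), but the head-word rule (last token of the entity) and the exact string encoding are assumptions made for illustration, not the authors' code.

```python
from collections import Counter

def word_features(e1_words, e2_words, mid_words):
    """Position-encoded word features for one entity-pair instance.
    Assumption: the head word of an entity is its last token."""
    feats = []
    feats += [f"WE1_{w}" for w in e1_words]        # all words in e1
    feats += [f"WE2_{w}" for w in e2_words]        # all words in e2
    feats.append(f"WHE1_{e1_words[-1]}")           # head word of e1
    feats.append(f"WHE2_{e2_words[-1]}")           # head word of e2
    if not mid_words:
        feats.append("WMNULL")                     # no words in Cmid
    elif len(mid_words) == 1:
        feats.append(f"WMFL_{mid_words[0]}")       # the only word in Cmid
    else:
        feats.append(f"WMF_{mid_words[0]}")        # first word in Cmid
        feats.append(f"WML_{mid_words[-1]}")       # last word in Cmid
        for i, w in enumerate(mid_words[1:-1], start=2):
            feats.append(f"WM{i}_{w}")             # second, third, ... word
    return feats

def filter_singletons(instances):
    """Drop low-frequency features that appear only once in the dataset."""
    counts = Counter(f for feats in instances for f in feats)
    return [[f for f in feats if counts[f] > 1] for feats in instances]
```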
    </Section>
    <Section position="2" start_page="131" end_page="131" type="sub_section">
      <SectionTitle>
3.2 Similarity Measures
</SectionTitle>
      <Paragraph position="0"> The similarity sij between two occurrences of entity pairs is important to the performance of the LP algorithm. In this paper, we investigated two similarity measures: the cosine similarity measure and Jensen-Shannon (JS) divergence (Lin, 1991). Cosine similarity is a commonly used semantic distance, which measures the angle between two feature vectors. JS divergence has been used as a distance measure for document clustering, where it outperforms cosine-similarity-based document clustering (Slonim et al., 2002). JS divergence measures the distance between two probability distributions, if a feature vector is considered as a probability distribution over features. JS divergence is defined as follows:</Paragraph>
      <Paragraph position="2"> where p = (q + r)/2 is the mean distribution, and JS(q,r) represents the JS divergence between the probability distributions q(y) and r(y) (y is a random variable), which is defined in terms of KL-divergence.</Paragraph>
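The JS divergence referred to above can be sketched directly from its standard definition in terms of KL-divergence, with the mean distribution p = (q + r)/2. This is a generic illustration of the measure, not the authors' implementation; the normalization helper assumes nonnegative feature vectors.

```python
import math

def kl(q, r):
    """KL-divergence D(q || r); assumes r[i] > 0 wherever q[i] > 0."""
    return sum(qi * math.log(qi / ri) for qi, ri in zip(q, r) if qi > 0)

def js(q, r):
    """Jensen-Shannon divergence:
    JS(q, r) = (1/2) D(q || p) + (1/2) D(r || p),  p = (q + r)/2."""
    p = [(qi + ri) / 2 for qi, ri in zip(q, r)]
    return 0.5 * kl(q, p) + 0.5 * kl(r, p)

def normalize(v):
    """Treat a nonnegative feature vector as a probability distribution."""
    s = sum(v)
    return [x / s for x in v]
```

JS divergence is symmetric and bounded by log 2, which makes it convenient as a graph-edge similarity after a suitable transformation.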
    </Section>
    <Section position="3" start_page="131" end_page="133" type="sub_section">
      <SectionTitle>
3.3 Experimental Evaluation
3.3.1 Experiment Setup
</SectionTitle>
      <Paragraph position="0"> We evaluated this label propagation based relation extraction method on the relation subtype detection and characterization task on the official ACE 2003 corpus. It contains 519 files from sources including broadcast, newswire, and newspaper. We dealt with only intra-sentence explicit relations and assumed that all entities had been detected beforehand in the EDT sub-task of ACE. Table 1 lists the types and subtypes of relations for the ACE Relation Detection and Characterization (RDC) task, along with their frequency of occurrence in the ACE training set and test set. We constructed labeled data by randomly sampling some examples from the ACE training data and additionally sampling the same number of examples from the pool of unrelated entity pairs for the &amp;quot;NONE&amp;quot; class. We used the remaining examples in the ACE training set and the whole ACE test set as unlabeled data. The test set was used for the final evaluation.</Paragraph>
      <Paragraph position="1"> 3.3.2 LP vs. SVM
Support Vector Machine (SVM) is a state-of-the-art technique for the relation extraction task. In this experiment, we use the LIBSVM tool with a linear kernel function.</Paragraph>
      <Paragraph position="2"> For the comparison between SVM and LP, we ran SVM and LP with different sizes of labeled data and evaluated their performance on the unlabeled data using precision, recall and F-measure. First, we ran the SVM or LP algorithm to detect possible relations from the unlabeled data: if an entity mention pair is classified not into the &amp;quot;NONE&amp;quot; class but into one of the other 24 subtype classes, it has a relation. We then constructed labeled datasets with different sampling set sizes l: 1%×Ntrain, 10%×Ntrain, 25%×Ntrain, 50%×Ntrain, 75%×Ntrain and 100%×Ntrain, where Ntrain is the number of examples in the ACE training set. If any relation subtype was absent from the sampled labeled set, we redid the sampling. For each size, we performed 20 trials and calculated average scores on the test set over these 20 random trials. (LIBSVM, a library for support vector machines, is available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.)</Paragraph>
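The sampling protocol described above (draw l labeled examples, redo the draw if any relation subtype is missing, repeat over several random trials) can be sketched as follows. The data layout and function names are illustrative assumptions, not the authors' code.

```python
import random

def sample_labeled(train, fractions, trials=20, seed=0):
    """Draw labeled subsets of `train` (a list of (features, subtype)
    pairs) at each requested fraction, rejecting any draw in which some
    relation subtype is absent; repeat over `trials` random trials."""
    rng = random.Random(seed)
    subtypes = {y for _, y in train}
    samples = {}
    for frac in fractions:
        l = max(1, int(frac * len(train)))
        draws = []
        for _ in range(trials):
            while True:
                draw = rng.sample(train, l)
                if {y for _, y in draw} == subtypes:   # every subtype present
                    break                              # accept this draw
            draws.append(draw)                         # else: redo the sampling
        samples[frac] = draws
    return samples
```

Note that rejection sampling of this kind is only practical when l is not much smaller than the number of subtypes, which holds for the sampling fractions used here.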
      <Paragraph position="3"> Table 2 reports the performance of SVM and LP with different sizes of labeled data for the relation detection task. We used the same sampled labeled data in LP as the training data for the SVM model.</Paragraph>
      <Paragraph position="4"> From Table 2, we see that both LPCosine and LPJS achieve higher Recall than SVM. Specifically, with a small labeled dataset (percentage of labeled data ≤ 25%), the performance improvement by LP is significant. When the percentage of labeled data increases from 50% to 100%, LPCosine is still comparable to SVM in F-measure, while LPJS achieves a slightly better F-measure than SVM. Moreover, LPJS consistently outperforms LPCosine.</Paragraph>
      <Paragraph position="5"> Table 3 reports the performance of relation classification using SVM and LP with different sizes of labeled data. The reported figures are the average values of Precision, Recall and F-measure over the major relation subtypes.</Paragraph>
      <Paragraph position="6"> From Table 3, we see that LPCosine and LPJS outperform SVM in F-measure in almost all settings of labeled data, which is due to the increase in Recall. With a smaller labeled dataset (percentage of labeled data ≤ 50%), the gap between LP and SVM is larger. When the percentage of labeled data increases from 75% to 100%, the performance of the LP algorithm is still comparable to SVM. Moreover, the LP algorithm based on JS divergence consistently outperforms the LP algorithm based on Cosine similarity. Figure 1 visualizes the accuracy of the three algorithms.</Paragraph>
      <Paragraph position="7"> As shown in Figure 1, the gap between the SVM curve and the LPJS curve is large when the percentage of labeled data is relatively low.</Paragraph>
      <Paragraph position="8"> In Figure 2, we selected 25 instances from the training set and 15 instances from the test set of the ACE corpus, which cover five relation types. Using the Isomap tool, the 40 instances with 229 feature dimensions are visualized in a two-dimensional space in the figure. We randomly sampled only one labeled example for each relation type from the 25 training examples as labeled data. Figures 2(a) and 2(b) show the initial state and the ground truth result, respectively. Figure 2(c) reports the classification result on the test set by SVM (accuracy = 4/15 = 26.7%), and Figure 2(d) gives the classification result on both the training set and the test set by LP (accuracy = 11/15 = 73.3%).</Paragraph>
      <Paragraph position="9"> Comparing Figure 2(b) and Figure 2(c), we find that many examples are misclassified from the diamondmath class to other class symbols. This may be because the SVM method ignores the intrinsic structure of the data. In Figure 2(d), the labels of the unlabeled examples are determined not only by nearby labeled examples, but also by nearby unlabeled examples. [Figure 2 caption fragment: ... algorithm on a data set from the ACE corpus. * and triangle denote the unlabeled examples in the training set and test set respectively, and the other symbols (diamondmath, x, a50, + and triangleinv) represent the labeled examples, with their respective relation types, sampled from the training set.]</Paragraph>
      <Paragraph position="10"> Hence, the LP strategy achieves better performance than the local consistency based SVM strategy when the size of labeled data is quite small.</Paragraph>
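The propagation behavior contrasted with SVM above can be illustrated with a minimal sketch of a label propagation step in the spirit of Zhu and Ghahramani's algorithm: labels are repeatedly averaged over all neighbors, weighted by the similarity matrix, with the seed examples clamped after each iteration. The toy graph and parameters below are illustrative assumptions, not the paper's experimental setup.

```python
def label_propagation(W, Y0, labeled, iters=200):
    """Minimal label propagation sketch.
    W[i][j]: similarity between instances i and j (e.g. from cosine
    similarity or a JS-divergence-based kernel);
    Y0[i]: initial per-class label scores (one-hot for seed examples);
    labeled: indices of the seed examples, clamped each iteration."""
    n, c = len(W), len(Y0[0])
    Y = [row[:] for row in Y0]
    for _ in range(iters):
        newY = []
        for i in range(n):
            s = sum(W[i])                      # row-normalization constant
            newY.append([sum(W[i][j] * Y[j][k] for j in range(n)) / s
                         for k in range(c)])   # average over all neighbors
        for i in labeled:
            newY[i] = Y0[i][:]                 # clamp the labeled data
        Y = newY
    return [max(range(c), key=lambda k: Y[i][k]) for i in range(n)]
```

On a toy two-cluster graph, an unlabeled node ends up with the label of its cluster's seed even though it never sees that seed's features directly, which is the global-consistency effect discussed above.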
      <Paragraph position="11"> 3.3.4 LP vs. Bootstrapping
In (Zhang, 2004), relation classification is performed on the ACE corpus with bootstrapping on top of SVM. To compare with their bootstrapped SVM algorithm, we used the same feature stream setting and randomly selected 100 instances from the training data as the initial labeled data.</Paragraph>
      <Paragraph position="12"> Table 4 lists the performance of the bootstrapped SVM method from (Zhang, 2004) and the LP method with 100 seed labeled examples on the relation type classification task. We can see that the LP algorithm outperforms the bootstrapped SVM algorithm on four relation type classification tasks, and performs comparably on the &amp;quot;SOC&amp;quot; relation classification task.</Paragraph>
    </Section>
  </Section>
</Paper>