<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2045">
<Title>Unsupervised Feature Selection for Relation Extraction</Title>
<Section position="4" start_page="264" end_page="266" type="evalu">
<SectionTitle> 3 Experiments and Results </SectionTitle>
<Paragraph position="0"/>
<Section position="1" start_page="264" end_page="264" type="sub_section">
<SectionTitle> 3.1 Data </SectionTitle>
<Paragraph position="0"> We constructed three subsets of the ACE corpus for the domains PER-ORG, ORG-GPE and ORG-ORG respectively. The details of these subsets, broken down by relation type, are given in Table 3. To verify our proposed method, we extracted only those pairs of entity mentions that have been tagged with relation types.</Paragraph>
<Paragraph position="1"> The relation type tags were used as ground-truth classes for evaluation.</Paragraph>
</Section>
<Section position="2" start_page="264" end_page="264" type="sub_section">
<SectionTitle> 3.2 Evaluation method for clustering result </SectionTitle>
<Paragraph position="0"> Since there were no relation type tags for the clusters in our clustering results, we adopted a permutation procedure to assign relation type tags to min(|EC|, |TC|) clusters, where |EC| is the estimated number of clusters and |TC| is the number of ground-truth classes (relation types). This procedure aims to find a one-to-one mapping function $\Omega$ from TC to EC. To perform the mapping, we construct a contingency table $T$, where each entry $t_{i,j}$ gives the number of instances that belong to both the $i$-th cluster and the $j$-th ground-truth class. The mapping procedure can then be formulated as $\hat{\Omega} = \arg\max_{\Omega} \sum_{j=1}^{|TC|} t_{\Omega(j),j}$, where $\Omega(j)$ is the index of the estimated cluster associated with the $j$-th class (a brief sketch of this procedure is given after Section 3.3 below).</Paragraph>
<Paragraph position="1"> Given the result of the one-to-one mapping, we can define the evaluation measure as $\mathrm{Accuracy} = \frac{1}{N} \sum_{j=1}^{|TC|} t_{\hat{\Omega}(j),j}$, where $N$ is the total number of instances. Intuitively, it reflects the accuracy of the clustering result.</Paragraph>
</Section>
<Section position="3" start_page="264" end_page="264" type="sub_section">
<SectionTitle> 3.3 Evaluation method for relation labelling </SectionTitle>
<Paragraph position="0"> For the evaluation of relation labelling, we need to explore the relatedness between the identified labels and the pre-defined relation names. To do this, we use an information-content based measure (Lin, 1997), provided in the WordNet::Similarity package (Pedersen et al., 2004), to evaluate the similarity between two concepts in WordNet. Intuitively, the relatedness between two concepts in WordNet is captured by the information content of their lowest common subsumer (lcs) and the information content of the two concepts themselves, which can be formalized as $\mathrm{sim}(c_1, c_2) = \frac{2 \cdot IC(\mathrm{lcs}(c_1, c_2))}{IC(c_1) + IC(c_2)}$, where $IC(c) = -\log P(c)$. This measure depends upon a corpus to estimate information content. We carried out the experiments using the British National Corpus (BNC) as the source of information content.</Paragraph>
</Section>
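The mapping-and-scoring procedure of Section 3.2 can be made concrete with a short sketch. This is illustrative code, not the authors' implementation: the function name, the brute-force search over permutations, and the toy labels are our own, and for brevity the sketch assumes at least as many estimated clusters as ground-truth classes.

    from itertools import permutations

    def best_mapping_accuracy(est_labels, true_labels):
        # Contingency table: t[i][j] = number of instances that fall in
        # estimated cluster i and ground-truth class j.
        clusters = sorted(set(est_labels))
        classes = sorted(set(true_labels))
        ci = {c: i for i, c in enumerate(clusters)}
        gj = {g: j for j, g in enumerate(classes)}
        t = [[0] * len(classes) for _ in clusters]
        for c, g in zip(est_labels, true_labels):
            t[ci[c]][gj[g]] += 1
        # argmax over one-to-one mappings Omega: class j -> cluster Omega(j).
        # Brute force is fine for the handful of relation types involved here.
        best = max(
            sum(t[perm[j]][j] for j in range(len(classes)))
            for perm in permutations(range(len(clusters)), len(classes))
        )
        return best / len(true_labels)

    # Toy usage: accuracy is 5/7 once clusters are optimally relabelled.
    print(best_mapping_accuracy([0, 0, 1, 1, 2, 2, 2],
                                ['a', 'a', 'b', 'a', 'c', 'c', 'b']))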
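Similarly, the Lin relatedness of Section 3.3 is available off the shelf. The paper used the Perl WordNet::Similarity package with BNC counts; the sketch below substitutes NLTK's WordNet interface and its bundled BNC information-content file, which is our assumed stand-in rather than the authors' setup, and the function name is our own.

    # One-time setup for the assumed NLTK stand-in:
    #   import nltk; nltk.download('wordnet'); nltk.download('wordnet_ic')
    from nltk.corpus import wordnet as wn, wordnet_ic

    # Information content estimated from the British National Corpus.
    bnc_ic = wordnet_ic.ic('ic-bnc.dat')

    def lin_relatedness(word1, word2, pos=wn.NOUN):
        # sim(c1, c2) = 2 * IC(lcs(c1, c2)) / (IC(c1) + IC(c2)),
        # maximized over all synset pairs of the two words (same POS).
        best = 0.0
        for s1 in wn.synsets(word1, pos):
            for s2 in wn.synsets(word2, pos):
                best = max(best, s1.lin_similarity(s2, bnc_ic))
        return best

    # e.g., an estimated label against a pre-defined relation name:
    print(lin_relatedness('president', 'management'))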
<Section position="4" start_page="264" end_page="266" type="sub_section">
<SectionTitle> 3.4 Experiments and Results </SectionTitle>
<Paragraph position="0"> To compare the effects of the outer contexts and the intervening context of entity pairs, we used five different settings of context window size (WINpre-WINmid-WINpost) for each domain.</Paragraph>
<Paragraph position="1"> Table 4 shows the results of model order identification without feature selection (Baseline) and with feature selection based on different feature ranking criteria (χ2, frequency and entropy).</Paragraph>
<Paragraph position="2"> The results show that the model order identification algorithm with entropy-based feature selection achieves the best results: it estimates cluster numbers that are very close to the true values. In addition, with the context setting 0-10-0, the estimated number of clusters is equal or close to the ground-truth value. This demonstrates that up to 10 intervening words are appropriate features to reflect the structure behind the contexts, while 5 intervening words are not enough to infer that structure. The contextual words beyond the entities (before or after them) tend to be noisy features for relation estimation, as can be seen from the performance deterioration when they are taken into consideration, especially in the case without feature selection.</Paragraph>
<Paragraph position="3"> Table 5 gives a comparison of the average accuracy over the five context window size settings for different clustering settings. For each domain, we conducted five clustering procedures: Hasegawa's method, RLBaseline, RLFS_χ2, RLFS_Freq and RLFS_Entropy. For Hasegawa's method (Hasegawa et al., 2004), we set the cluster number to be identical to the number of ground-truth classes. For RLBaseline, we use the estimated cluster number to cluster contexts without feature selection. For RLFS_χ2, RLFS_Freq and RLFS_Entropy, we use the selected feature subset and the estimated cluster number to cluster the contexts, where the feature subset comes from the χ2, frequency and entropy criteria respectively. Comparing the average accuracy of these clustering methods, we find that the feature selection methods perform better than, or comparably to, the baseline system without feature selection. Furthermore, RLFS_Entropy achieves the highest average accuracy in all three domains, which indicates that entropy-based feature pre-ranking provides useful heuristic information for selecting an important feature subset (an illustrative sketch of such a ranking follows this paragraph).</Paragraph>
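As an illustration of entropy-based feature pre-ranking, the sketch below scores each context word by the entropy of its frequency distribution across contexts. This is one simple reading of the criterion and everything in it (function name, scoring choice, toy data) is our assumption; the paper's exact formulation may differ.

    from collections import Counter
    from math import log2

    def entropy_rank(contexts):
        # contexts: a list of token lists, one per entity-mention pair.
        # Count, for each word, how often it occurs in each context.
        counts_per_word = {}
        for i, ctx in enumerate(contexts):
            for w in ctx:
                counts_per_word.setdefault(w, Counter())[i] += 1
        # Score each word by the entropy of its normalized distribution
        # over contexts, then rank from highest to lowest entropy.
        scores = {}
        for w, cnts in counts_per_word.items():
            total = sum(cnts.values())
            scores[w] = -sum((c / total) * log2(c / total)
                             for c in cnts.values())
        return sorted(scores, key=scores.get, reverse=True)

    # Toy usage on three tiny intervening contexts:
    print(entropy_rank([["head", "of"], ["president", "of"], ["join", "the"]]))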
<Paragraph position="4"> Table 6 gives the automatically estimated labels for the relation types in the PER-ORG domain. We select two features as labels for each relation type according to their DCM scores, and calculate the average (and maximum) relatedness between our selected labels (E) and the pre-defined labels (H).</Paragraph>
<Paragraph position="5"> Following the same strategy, we also extracted relation labels (T) from the ground-truth classes and provide the relatedness between T and H. From the relatedness column (E-H), we can see that it is not easy to recover the hand-tagged relation labels exactly; furthermore, even the labels identified from the ground-truth classes are not always comparable to the pre-defined labels (T-H). The reason may be that the pre-defined relation names tend to be abstract labels over the features, e.g., 'management' vs. 'president', 'head' or 'control'; 'member' vs. 'join', 'become', etc., and the abstract words and the features are located far apart in WordNet. Table 6 also lists the relatedness between (E) and (T). We can see that these labels are comparable by their maximum relatedness (E-T).</Paragraph>
Table 6. Estimated relation labels for the PER-ORG domain. (H) is the pre-defined relation name; (T) is the pair of labels identified from the ground-truth classes; (E) is the pair of labels identified from our estimated clusters. 'Ave' and 'Max' denote the average and maximum relatedness between the corresponding label pairs.
H                 | T                    | E                  | Ave(T-H) | Max(T-H) | Ave(E-H) | Max(E-H) | Ave(E-T) | Max(E-T)
management        | head, president      | president, control | 0.3703   | 0.4515   | 0.3148   | 0.3406   | 0.7443   | 1.0000
general-staff     | work, fire           | work, charge       | 0.6254   | 0.7823   | 0.6411   | 0.7823   | 0.6900   | 1.0000
member            | join, communist      | become, join       | 0.3940   | 0.4519   | 0.1681   | 0.3360   | 0.3366   | 1.0000
owner             | bond, bought         | belong, house      | 0.1351   | 0.2702   | 0.0804   | 0.1608   | 0.2489   | 0.4978
located           | appear, include      | lobby, appear      | 0.0000   | 0.0000   | 0.1606   | 0.3213   | 0.2500   | 1.0000
client            | hire, reader         | bought, consult    | 0.4378   | 0.8755   | 0.0000   | 0.0000   | 0.1417   | 0.5666
affiliate-partner | affiliate, associate | assist, affiliate  | 0.9118   | 1.0000   | 0.5000   | 1.0000   | 0.5000   | 1.0000
founder           | form, found          | invest, set        | 0.1516   | 0.3048   | 0.3437   | 0.6875   | 0.4376   | 0.6932
</Section>
</Section>
</Paper>