<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1667"> <Title>Unsupervised Relation Disambiguation with Order Identification Capabilities</Title> <Section position="5" start_page="570" end_page="574" type="evalu"> <SectionTitle> 3 Experiments and Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="570" end_page="571" type="sub_section"> <SectionTitle> 3.1 Data Setting </SectionTitle> <Paragraph position="0"> Our proposed unsupervised relation extraction is evaluated on the ACE corpus, which contains 519 files from sources including broadcast, newswire, and newspaper. We deal only with intra-sentence explicit relations and assume that all entities have been detected beforehand in the EDT sub-task of ACE. To verify our proposed method, we collect only those pairs of entity mentions which are tagged with relation types in the given corpus. The relation type tags were then removed to test the unsupervised relation disambiguation. During the evaluation procedure, the relation type tags were used as ground truth classes. A breakdown of the data by 24 relation subtypes is given in Table 2.</Paragraph> </Section> <Section position="2" start_page="571" end_page="571" type="sub_section"> <SectionTitle> 3.2 Evaluation method for clustering result </SectionTitle> <Paragraph position="0"> When assessing the agreement between the clustering result and the manually annotated relation types (ground truth classes), we face the problem that there are no relation type tags for the clusters in our clustering results.</Paragraph> <Paragraph position="1"> To resolve this problem, we construct a contingency table T, where each entry ti,j gives the number of instances that belong to both the i-th estimated cluster and the j-th ground truth class. 
Moreover, to ensure that no two clusters share the same relation type label, we adopt a permutation procedure to find a one-to-one mapping function Ohm from the ground truth classes (relation types) TC to the estimated clustering result EC.</Paragraph> <Paragraph position="2"> There are at most |TC| clusters which are assigned relation type tags. If the number of estimated clusters is less than the number of ground truth classes, empty clusters are added so that |EC| = |TC| and the one-to-one mapping can be performed, which can be formulated as the function:</Paragraph> <Paragraph position="3"> Ohm = argmax_Ohm sum_{j=1}^{|TC|} t_{Ohm(j),j}</Paragraph> <Paragraph position="4"> where Ohm(j) is the index of the estimated cluster associated with the j-th class.</Paragraph> <Paragraph position="5"> Given the result of the one-to-one mapping, we adopt Precision, Recall and F-measure to evaluate the clustering result.</Paragraph> </Section> <Section position="3" start_page="571" end_page="571" type="sub_section"> <SectionTitle> 3.3 Experimental Design </SectionTitle> <Paragraph position="0"> We perform our unsupervised relation extraction on the devtest set of the ACE corpus and evaluate the algorithm at the relation subtype level. Firstly, we observe the influence of various variables, including the distance parameter s2, different features, and the context window size. Secondly, to verify the effectiveness of our method, we further compare it with a supervised method based on SVM and two other unsupervised methods.</Paragraph> </Section> <Section position="4" start_page="571" end_page="571" type="sub_section"> <SectionTitle> 3.3.1 Choice of Distance Parameter s2 </SectionTitle> <Paragraph position="0"> We simply search over s2 and pick the value that yields the best aligned set of clusters in the transformed space. Here, the scattering criterion trace(P_W^-1 P_B) is used to compare cluster quality for different values of s2; it measures the ratio of between-cluster to within-cluster scatter. 
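The permutation procedure above can be sketched as follows. This is our own minimal illustration (the function and variable names are ours, not the paper's): it builds the contingency table T and finds the one-to-one mapping Ohm by exhaustive permutation search, which is feasible only for toy-sized problems — for the paper's 24 subtypes an optimal assignment algorithm (e.g. Hungarian) would be needed instead.

```python
from itertools import permutations

def contingency_table(est, truth, n_clusters, n_classes):
    """t[i][j] = number of instances in estimated cluster i AND truth class j."""
    t = [[0] * n_classes for _ in range(n_clusters)]
    for i, j in zip(est, truth):
        t[i][j] += 1
    return t

def best_mapping(t):
    """One-to-one mapping Ohm maximizing sum_j t[Ohm(j)][j].

    If there are fewer estimated clusters than truth classes, empty
    (all-zero) clusters are padded in, as the paper describes.
    """
    n_classes = len(t[0])
    t = t + [[0] * n_classes for _ in range(max(0, n_classes - len(t)))]
    best, best_score = None, -1
    for perm in permutations(range(len(t)), n_classes):
        score = sum(t[perm[j]][j] for j in range(n_classes))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score

# toy example: 6 instances, 2 truth classes, 2 estimated clusters
est = [0, 0, 1, 1, 1, 0]
truth = [1, 1, 0, 0, 0, 1]
t = contingency_table(est, truth, 2, 2)
mapping, correct = best_mapping(t)
accuracy = correct / len(est)
```

Given the mapping, Precision/Recall/F-measure can then be computed per class from the matched contingency entries.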
The higher the trace(P_W^-1 P_B), the higher the cluster quality.</Paragraph> <Paragraph position="1"> In Table 3 and Table 4, for different settings of the feature set and context window size, we report the value of s2 and the cluster number that maximize the trace value when searching over a range of s2 values.</Paragraph> </Section> <Section position="5" start_page="571" end_page="572" type="sub_section"> <SectionTitle> 3.3.2 Contribution of Different Features </SectionTitle> <Paragraph position="0"> As the previous section presented, we incorporate various lexical and syntactic features to extract relations. (In the scattering criterion, P_B = sum_{j=1}^{c} (m_j - m)(m_j - m)^t is the between-cluster scatter matrix, where m is the total mean vector and m_j is the mean vector for the j-th cluster; the within-cluster scatter matrix P_W is built analogously from the terms (X_j - m_j)(X_j - m_j)^t, where (X_j - m_j)^t is the matrix transpose of the column vector (X_j - m_j).)</Paragraph> <Paragraph position="1"> To measure the contribution of different features, we report the performance obtained by gradually increasing the feature set, as Table 3 shows.</Paragraph> <Paragraph position="2"> Table 3 shows that all four categories of features contribute more or less to the improvement in performance. Firstly, the addition of the entity type feature is very useful, improving F-measure by 6.6%. 
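The scattering criterion could be computed as in the following numpy sketch (our own illustration, not the paper's code). Following the definition above, P_B sums (m_j - m)(m_j - m)^t over clusters and P_W accumulates within-cluster scatter; a pseudo-inverse guards against a singular P_W.

```python
import numpy as np

def scattering_criterion(X, labels):
    """trace(P_W^-1 P_B): ratio of between- to within-cluster scatter.

    Higher values indicate tighter, better-separated clusters.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    m = X.mean(axis=0)                      # total mean vector
    d = X.shape[1]
    P_W = np.zeros((d, d))
    P_B = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)                # mean vector of cluster c
        P_W += (Xc - mc).T @ (Xc - mc)      # within-cluster scatter
        P_B += np.outer(mc - m, mc - m)     # between-cluster scatter
    return float(np.trace(np.linalg.pinv(P_W) @ P_B))

# well-separated clusters should score higher than a mixed-up labeling
tight = scattering_criterion([[0, 0], [0.1, 0], [5, 5], [5, 5.1]], [0, 0, 1, 1])
mixed = scattering_criterion([[0, 0], [5, 5], [0.1, 0], [5, 5.1]], [0, 0, 1, 1])
```

Searching over s2 then amounts to recomputing this score for the clustering produced at each candidate value and keeping the maximizer.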
Secondly, adding POS features increases the F-measure score, but the improvement is small.</Paragraph> <Paragraph position="3"> Thirdly, chunking features also prove very useful, increasing Precision/Recall/F-measure by 5.7%/2.5%/4.5%.</Paragraph> <Paragraph position="4"> We combine all these features in all other evaluations in our experiments.</Paragraph> </Section> <Section position="6" start_page="572" end_page="572" type="sub_section"> <SectionTitle> 3.3.3 Setting of Context Window Size </SectionTitle> <Paragraph position="0"> We mentioned in Section 2 that the context vectors of entity pairs are derived from the contexts before, between and after the entity mention pairs.</Paragraph> <Paragraph position="1"> Hence, we first have to specify the three context window sizes. In this paper, we set the mid-context window to everything between the two entity mentions.</Paragraph> <Paragraph position="2"> For the pre- and post-context windows, we could make different choices. For example, if we specify the outer context window size as 2, the pre-context (post-context) includes two words before (after) the first (second) entity.</Paragraph> <Paragraph position="3"> To compare the effect of the outer context of entity mention pairs, we evaluated three settings of the context window size (0, 2, 5), as Table 4 shows. From this table we find that with a context window size of 2, the algorithm achieves the best performance of 43.5%/49.4%/46.3% in Precision/Recall/F-measure. 
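The windowing scheme just described can be sketched as a small helper; this is our own illustration (the function name, span convention, and example sentence are ours). Mentions are given as half-open (start, end) token spans with the first mention preceding the second; the mid-context is everything between the mentions and the outer windows take `outer_size` tokens on each side.

```python
def context_windows(tokens, e1_span, e2_span, outer_size):
    """Split a sentence into pre-, mid- and post-contexts for an entity pair.

    e1_span/e2_span are half-open (start, end) token index spans, with the
    e1 mention occurring before e2 in the sentence.
    """
    (s1, e1), (s2, e2) = e1_span, e2_span
    pre = tokens[max(0, s1 - outer_size):s1]    # outer_size tokens before e1
    mid = tokens[e1:s2]                         # everything between mentions
    post = tokens[e2:e2 + outer_size]           # outer_size tokens after e2
    return pre, mid, post

tokens = "the president of the company visited a factory in Texas".split()
# entity mentions: "the president" (0, 2) and "the company" (3, 5)
pre, mid, post = context_windows(tokens, (0, 2), (3, 5), 2)
```

With outer_size = 0 only the mid-context survives, matching the smallest setting in Table 4.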
With a context window size of 5, the performance becomes worse, because extending the context too far includes more features but, at the same time, also more noise.</Paragraph> </Section> <Section position="7" start_page="572" end_page="573" type="sub_section"> <SectionTitle> 3.3.4 Comparison with Supervised methods and other Unsupervised methods </SectionTitle> <Paragraph position="0"> To explore the effectiveness of our unsupervised method compared to a supervised method, we apply the SVM technique with the same feature set defined for our proposed method. The LIBSVM tool is used in this test (LIBSVM is a library for support vector machines, available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, and supports multi-class classification). The kernel function we used is linear, and the SVM models are trained on the training set of the ACE corpus.</Paragraph> <Paragraph position="1"> In (Hasegawa et al., 2004), they performed unsupervised relation extraction based on hierarchical clustering, using only the word features between entity mention pairs to construct context vectors. We report the clustering results obtained using the same clustering strategy as Hasegawa et al. (2004) proposed. In Table 5, Hasegawa's Method1 denotes the test using the word features of Hasegawa et al. (2004), while Hasegawa's Method2 denotes the test using the same feature set as our method.</Paragraph> <Paragraph position="2"> In both tests, we specified the cluster number as the number of ground truth classes.</Paragraph> <Paragraph position="3"> We also approached the relation extraction problem using the standard clustering technique, K-means, where we adopted the same feature set defined for our proposed method to cluster the context vectors of entity mention pairs and pre-specified the cluster number as the number of ground truth classes.</Paragraph> <Paragraph position="4"> Table 5 reports the performance of our proposed method compared with the SVM-based supervised method and the other two unsupervised methods. As the results show, the SVM-based method, using the same feature set as our proposed method, achieves 61.2%/49.6%/54.8% in Precision/Recall/F-measure. Table 5 also shows that our proposed spectral-based method clearly outperforms the other two unsupervised methods, by 12.5% and 9.5% in F-measure respectively. Moreover, the incorporation of various lexical and syntactic features into Hasegawa et al. (2004)'s Method2 makes it outperform Hasegawa et al. (2004)'s Method1, which uses only word features.</Paragraph> </Section> <Section position="8" start_page="573" end_page="574" type="sub_section"> <SectionTitle> 3.4 Discussion </SectionTitle> <Paragraph position="0"> In this paper, we have shown that the modified spectral clustering technique, with various lexical and syntactic features derived from the context of entity pairs, performs well on the unsupervised relation disambiguation problem. Our experiments show that through the choice of the distance parameter s2, we can estimate the cluster number that provides the tightest clusters. We notice that the estimated cluster number is less than the number of ground truth classes in most cases. The reason for this phenomenon may be that some relation types cannot be easily distinguished using the context information alone. 
For example, the relation subtypes &quot;Located&quot;, &quot;Based-In&quot; and &quot;Residence&quot; are difficult to disambiguate even for human experts. The results also show that various lexical and syntactic features contain useful information for the task. In particular, although we did not consider the dependency tree and full parse tree information used by other supervised methods (Miller et al., 2000; Culotta and Soresen, 2004; Kambhatla, 2004; Zhou et al., 2005), the incorporation of simple features, such as words and chunking information, can still provide complementary information for capturing the characteristics of entity pairs. Another observation from the results is that extending the outer context window of entity mention pairs too far may not improve performance, since the process may incorporate more noise and affect the clustering result. As regards the clustering technique, spectral-based clustering performs better than direct clustering with K-means. Since the spectral-based algorithm works in a transformed space of low dimensionality, the data can be clustered more easily, so the algorithm can be implemented with better efficiency and speed. Moreover, the performance of spectral-based clustering is improved because it overcomes a drawback of K-means (proneness to local minima) and may find non-convex clusters consistent with human intuition. Currently, most work on the RDC task of ACE has focused on supervised learning methods. Table 6 lists a comparison of these methods on relation detection and relation classification. Zhou et al. (2005) reported the best result, 63.1%/49.5%/55.5% in Precision/Recall/F-measure, on the extraction of ACE relation subtypes using a feature-based method, which outperforms the tree kernel based method of Culotta and Soresen (2004). 
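The spectral pipeline contrasted with K-means in the discussion can be sketched as follows. This is our own generic illustration of spectral clustering in the Ng-Jordan-Weiss style, not the paper's modified algorithm: a Gaussian affinity with distance parameter sigma2, embedding via the top-k eigenvectors of the normalized affinity, and a tiny deterministic k-means run in the low-dimensional transformed space.

```python
import numpy as np

def spectral_cluster(X, k, sigma2):
    """Cluster rows of X into k groups in a spectral embedding."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2.0 * sigma2))        # Gaussian affinity
    np.fill_diagonal(A, 0.0)
    D = A.sum(1)
    L = A / np.sqrt(np.outer(D, D))         # D^-1/2 A D^-1/2
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    Y = vecs[:, -k:]                        # top-k eigenvectors as columns
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)  # row-normalize
    # deterministic farthest-point init, then a small k-means loop
    idx = [0]
    for _ in range(1, k):
        dist = np.min(((Y[:, None] - Y[idx][None]) ** 2).sum(-1), axis=1)
        idx.append(int(dist.argmax()))
    centers = Y[idx]
    for _ in range(20):
        labels = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        centers = np.array([Y[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels

# two well-separated toy blobs should be recovered cleanly
pts = [[0, 0], [0.1, 0.1], [0.2, 0], [5, 5], [5.1, 5], [5, 5.2]]
labels = spectral_cluster(pts, k=2, sigma2=1.0)
```

Running k-means on the embedded rows Y rather than on the raw context vectors is what lets the method recover non-convex clusters that direct K-means misses.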
Although our unsupervised method still cannot outperform these supervised methods, from the point of view of unsupervised resolution for relation extraction, our approach already achieves the best performance, 43.5%/49.4%/46.3% in Precision/Recall/F-measure, compared with other clustering methods.</Paragraph> </Section> </Section> </Paper>