<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1667">
  <Title>Unsupervised Relation Disambiguation with Order Identification Capabilities</Title>
  <Section position="3" start_page="0" end_page="568" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The task of relation extraction is to identify various semantic relations between named entities in text. Prior work on automatic relation extraction falls into three kinds: supervised learning algorithms (Miller et al., 2000; Zelenko et al., 2002; Culotta and Sorensen, 2004; Kambhatla, 2004; Zhou et al., 2005), semi-supervised learning algorithms (Brin, 1998; Agichtein and Gravano, 2000; Zhang, 2004), and unsupervised learning algorithms (Hasegawa et al., 2004).</Paragraph>
    <Paragraph position="1"> Among these methods, supervised learning is usually preferred when a large amount of labeled training data is available. However, manually tagging a large amount of training data is time-consuming and labor-intensive. Semi-supervised learning methods have been put forward to minimize the corpus annotation requirement. Most semi-supervised methods employ a bootstrapping framework, which only needs a few initial seeds to be pre-defined for a particular relation and then bootstraps from those seeds to acquire the relation. However, it is often quite difficult to enumerate all class labels in the initial seeds and to decide on an &quot;optimal&quot; number of them.</Paragraph>
    <Paragraph position="2"> Compared with supervised and semi-supervised methods, the unsupervised approach of Hasegawa et al. (2004) to relation extraction overcomes the need for a large amount of labeled data and for enumerating all class labels. Their method uses hierarchical clustering to group pairs of named entities according to the similarity of the context words intervening between the named entities. However, the drawback of hierarchical clustering is that it requires the user to provide the number of clusters. Furthermore, clustering is performed in the original high-dimensional space, which may induce non-convex clusters that are hard to identify.</Paragraph>
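    <!--
The clustering of entity pairs by intervening context words, as described in the paragraph above, can be sketched roughly as follows. This is an illustration only, not the implementation of Hasegawa et al. (2004): the bag-of-words cosine similarity, the complete-linkage rule, the helper names `cosine` and `cluster_pairs`, the toy entity pairs, and the threshold value are all assumptions, since the introduction does not specify them.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_pairs(contexts, threshold):
    """Agglomerative clustering of entity pairs by context similarity.

    contexts maps an entity pair to the words found between the two entities.
    Merging stops once no two clusters are at least `threshold` similar
    (hypothetical stopping rule chosen for this sketch).
    """
    vectors = {pair: Counter(words) for pair, words in contexts.items()}
    clusters = [[pair] for pair in vectors]

    def linkage(c1, c2):
        # Complete linkage: similarity of the least similar pair of members.
        return min(cosine(vectors[p], vectors[q]) for p in c1 for q in c2)

    while len(clusters) > 1:
        # Find the most similar pair of clusters (i is always smaller than j).
        best, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy, made-up entity pairs and context words.
contexts = {
    ("CompanyA", "CompanyB"): ["agreed", "to", "acquire"],
    ("CompanyC", "CompanyD"): ["will", "acquire"],
    ("PersonX", "CompanyA"): ["chief", "executive", "of"],
    ("PersonY", "CompanyC"): ["executive", "of"],
}
clusters = cluster_pairs(contexts, threshold=0.3)
for members in clusters:
    print(members)
```

The grouping depends directly on a hand-picked stopping criterion (here the similarity threshold), which is the kind of user-supplied setting the paragraph identifies as a drawback.
    -->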
    <Paragraph position="3"> This paper presents a novel application of the spectral clustering technique to the unsupervised relation extraction problem. It works by calculating eigenvectors of an adjacency graph's Laplacian to recover a submanifold of the data from a high-dimensional space, and then performing cluster number estimation in the transformed space defined by the first few eigenvectors. This method may help us find non-convex clusters. Unlike the method of Hasegawa et al. (2004), it also does not need the number of context clusters to be pre-defined or a similarity threshold for the clusters to be pre-specified.</Paragraph>
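    <!--
A minimal sketch of the kind of pipeline outlined in the paragraph above: build an affinity graph, take the leading eigenvectors of its Laplacian, estimate the number of clusters, and cluster in the transformed space. It is not the authors' procedure. The Gaussian affinity, the `sigma` and `max_k` parameters, the eigengap heuristic (used here as a stand-in for the paper's cluster number estimation, which the introduction does not detail), the small k-means routine, and the toy points are all assumptions. The toy data are two compact, well-separated groups so that the result is easy to check by hand.

```python
import numpy as np


def spectral_embedding(points, sigma=1.0, max_k=5):
    """Return (estimated cluster count, row-normalised spectral embedding)."""
    # Gaussian affinity between every pair of points (assumed similarity).
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dist / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalised Laplacian: L = I - D^(-1/2) W D^(-1/2).
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(points)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(L)

    # Eigengap heuristic: k is the count of small eigenvalues that come
    # before the largest jump among the first max_k + 1 eigenvalues.
    gaps = np.diff(eigvals[: max_k + 1])
    k = int(np.argmax(gaps)) + 1

    # Embed each point as a unit-length row of the first k eigenvectors.
    Y = eigvecs[:, :k]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return k, Y


def kmeans(Y, k, n_iter=20):
    """Plain k-means with farthest-point initialisation (deterministic)."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Y[labels == c].mean(axis=0)
    return labels


# Toy data: two tight groups of four points each (made up for illustration).
points = np.array([
    [0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5],
    [10.0, 10.0], [10.5, 10.0], [10.0, 10.5], [10.5, 10.5],
])
k, Y = spectral_embedding(points, sigma=1.0)
labels = kmeans(Y, k)
print(k, labels.tolist())
```

Note that the number of clusters is read off the eigenvalue spectrum rather than supplied by the user; on less cleanly separated data the eigengap can be ambiguous, so a more careful estimation step would be needed.
    -->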
    <Paragraph position="4"> The rest of this paper is organized as follows. Section 2 formulates unsupervised relation extraction and presents how to apply the spectral clustering technique to the task. Section 3 then reports experiments and results. Finally, Section 4 concludes our work.</Paragraph>
  </Section>
</Paper>