<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1667"> <Title>Unsupervised Relation Disambiguation with Order Identification Capabilities</Title> <Section position="4" start_page="568" end_page="570" type="metho"> <SectionTitle> 2 Unsupervised Relation Extraction Problem </SectionTitle>
<Paragraph position="0"> We assume that two occurrences of entity pairs with similar contexts tend to hold the same relation type. The unsupervised relation extraction problem can thus be formulated as partitioning collections of entity pairs into clusters according to the similarity of their contexts, with each cluster containing only entity pairs labeled by the same relation type. Then, in each cluster, the most representative words are identified from the contexts of the entity pairs to induce the label of the relation type. Here we focus only on the clustering subtask and do not address the relation type labeling subtask.</Paragraph>
<Paragraph position="1"> In the next subsections we describe our proposed method for unsupervised relation extraction, which includes: 1) collecting the context vectors in which the entity mention pairs co-occur; 2) clustering these context vectors.</Paragraph>
<Section position="1" start_page="568" end_page="568" type="sub_section"> <SectionTitle> 2.1 Context Vector and Feature Design </SectionTitle>
<Paragraph position="0"> Let X = {x_i}, i = 1, ..., n, be the set of context vectors of occurrences of all entity mention pairs, where x_i represents the context vector of the i-th occurrence, and n is the total number of occurrences of all entity pairs.</Paragraph>
<Paragraph position="1"> Each occurrence of an entity mention pair can be denoted as the tuple (Cpre, e1, Cmid, e2, Cpost), where e1 and e2 represent the entity mentions, and Cpre, Cmid, and Cpost are the contexts before, between and after the entity pair, respectively.</Paragraph>
<Paragraph position="4"> We extracted features from e1, e2, Cpre, Cmid and Cpost to construct the context vectors; the features are computed from the parse trees derived from Charniak's parser. Words: words in the two entities and the three context windows.</Paragraph>
<Paragraph position="5"> Entity Type: the entity type of both entity mentions, which can be PERSON, ORGANIZATION, FACILITY, LOCATION or GPE.</Paragraph>
<Paragraph position="6"> POS features: part-of-speech tags corresponding to all words in the two entities and the three context windows.</Paragraph>
<Paragraph position="7"> Chunking features: this category of features is extracted from the chunklink representation and includes: * Chunk tag information of the two entities and the three context windows. The "O" tag means that the word is outside of any chunk; the "I-XP" tag means that the word is inside an XP chunk; the "B-XP" tag means that the word is at the beginning of an XP chunk.</Paragraph>
<Paragraph position="8"> * Grammatical function of the two entities and the three context windows. The last word in each chunk is its head, and the function of the head is the function of the whole chunk; "NP-SBJ" denotes an NP chunk serving as the subject of the sentence. The other words in a chunk that are not the head have "NO-FUNC" as their function.</Paragraph>
<Paragraph position="9"> * IOB-chains of the heads of the two entities. The so-called IOB-chain records the syntactic categories of all the constituents on the path from the root node of the parse tree down to the leaf node.</Paragraph>
<Paragraph position="10"> We combine the above lexical and syntactic features with their position information in the context to form the context vector. Before that, we filter out low-frequency features that appeared only once in the entire set.</Paragraph> </Section>
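To make the feature design above concrete, the following Python sketch maps one occurrence of an entity mention pair to a sparse context vector. The Occurrence record, the feature-naming scheme, and the assumption that POS tags and chunk tags have already been produced (e.g. by a parser and the chunklink script) are illustrative choices, not the authors' implementation.

```python
# Illustrative sketch: assemble a sparse context vector for one occurrence
# of an entity mention pair, following the (Cpre, e1, Cmid, e2, Cpost)
# decomposition described above.  Feature names are hypothetical.
from dataclasses import dataclass
from collections import Counter

@dataclass
class Occurrence:
    e1: str        # first entity mention
    e2: str        # second entity mention
    e1_type: str   # e.g. "PERSON"
    e2_type: str   # e.g. "GPE"
    c_pre: list    # tokens before the pair, each as (word, pos_tag, chunk_tag)
    c_mid: list    # tokens between the two mentions
    c_post: list   # tokens after the pair

def build_context_vector(occ: Occurrence) -> Counter:
    """Collect lexical and syntactic features keyed by their position."""
    feats = Counter()
    feats[f"ET1={occ.e1_type}"] += 1                      # entity type features
    feats[f"ET2={occ.e2_type}"] += 1
    for w in occ.e1.split() + occ.e2.split():             # words in the entities
        feats[f"W_ENT={w.lower()}"] += 1
    for window_name, window in (("PRE", occ.c_pre),
                                ("MID", occ.c_mid),
                                ("POST", occ.c_post)):
        for word, pos_tag, chunk_tag in window:
            feats[f"W_{window_name}={word.lower()}"] += 1    # word features
            feats[f"POS_{window_name}={pos_tag}"] += 1       # POS features
            feats[f"CHK_{window_name}={chunk_tag}"] += 1     # chunk tag features
    return feats
```

Vectors built this way would then be pruned of features that occur only once in the whole collection before clustering, as described above.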
<Section position="2" start_page="568" end_page="569" type="sub_section"> <SectionTitle> 2.2 Context Clustering </SectionTitle>
<Paragraph position="0"> Once the context vectors of entity pairs are prepared, we come to the second stage of our method: clustering these context vectors automatically.</Paragraph>
<Paragraph position="1"> In recent years, spectral clustering techniques have received increasing attention as a powerful approach to a range of clustering problems. Among the efforts on spectral clustering techniques (Weiss, 1999; Kannan et al., 2000; Shi et al., 2000; Ng et al., 2001; Zha et al., 2001), we adopt a modified version (Sanguinetti et al., 2005) of the algorithm by Ng et al. (2001) because it provides model order selection capability.</Paragraph>
<Paragraph position="2"> Since we do not know the number of relation types in advance and do not have any labeled relation training examples at hand, the problem of model order selection arises, i.e. estimating the "optimal" number of clusters. Formally, let k be the model order; we need to find the k satisfying k = argmax_k criterion(k), where the criterion is defined on the result of spectral clustering. Table 1 shows the details of the whole algorithm for context clustering, which contains two main stages: 1) transformation of the clustering space (Steps 1-4); 2) clustering in the transformed space using the elongated K-means algorithm (Steps 5-6).</Paragraph>
<Paragraph position="3"> Table 1: Spectral-based clustering technique. Input: a set of context vectors X = {x1, x2, ..., xn}, X ⊂ R^(n×d); Output: clustered data and the number of clusters. 1. Construct an affinity matrix A with A_ij = exp(−s_ij/σ²) if i ≠ j and A_ij = 0 if i = j, where s_ij is the similarity between x_i and x_j calculated by the cosine similarity measure, and the free distance parameter σ² is used to scale the weights. 2. Normalize the affinity matrix A to create the matrix L = D^(−1/2) A D^(−1/2), where D is a diagonal matrix whose (i, i)-th element is the sum of A's i-th row. 3. Set q = 2. 4. Compute the q eigenvectors of L with the greatest eigenvalues and arrange them as the columns of a matrix Y. 5. Perform elongated K-means with q + 1 centers on the rows of Y, initializing the (q+1)-th mean at the origin. 6. If the (q+1)-th cluster contains any data points, there must be at least one extra cluster; set q = q + 1 and go back to Step 4. Otherwise, the algorithm stops and outputs the clustered data and the number of clusters.</Paragraph> </Section>
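As a rough illustration of Steps 1-4 of Table 1, the numpy sketch below builds the affinity matrix from cosine similarities, normalizes it into L = D^(−1/2) A D^(−1/2), and returns the top-q eigenvectors as the clustering space. It is only a sketch under the stated assumptions (dense matrices, user-chosen σ²); the elongated K-means of Steps 5-6 is sketched after Section 2.4.

```python
import numpy as np

def spectral_embedding(X, sigma2=0.05, q=2):
    """Sketch of Steps 1-4 of Table 1.

    X: (n, d) array of context vectors.  sigma2 is the free scale parameter
    (0.05 is only the value quoted for the toy example in Figure 1).
    Returns the n x q matrix Y whose columns are the q eigenvectors of L
    with the largest eigenvalues.
    """
    # Step 1: affinity matrix A_ij = exp(-s_ij / sigma^2), zero diagonal,
    # where s_ij is the cosine similarity between x_i and x_j.
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    S = Xn @ Xn.T
    A = np.exp(-S / sigma2)
    np.fill_diagonal(A, 0.0)

    # Step 2: L = D^(-1/2) A D^(-1/2), with D diagonal holding row sums of A.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1) + 1e-12)
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Steps 3-4: take the q eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(L)          # symmetric; ascending order
    return eigvecs[:, np.argsort(eigvals)[::-1][:q]]
```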
<Section position="3" start_page="569" end_page="569" type="sub_section"> <SectionTitle> 2.3 Transformation of Clustering Space </SectionTitle>
<Paragraph position="0"> We represent each context vector of an entity pair as a node in an undirected graph. Each edge (i, j) in the graph is assigned a weight that reflects the similarity between the two context vectors i and j. Hence, the relation extraction task for entity pairs can be defined as a partition of the graph such that entity pairs that are more similar to each other, e.g. labeled by the same relation type, belong to the same cluster. As a relaxation of this NP-hard discrete graph partitioning problem, the spectral clustering technique computes eigenvalues and eigenvectors of a Laplacian matrix related to the given graph and constructs data clusters based on this spectral information.</Paragraph>
<Paragraph position="1"> Thus the starting point of context clustering is to construct an affinity matrix A from the data, which is an n x n matrix encoding the distances between the various points. The affinity matrix is then normalized to form a matrix L by conjugating with the diagonal matrix D^(−1/2), whose entries are the square roots of the sums of the rows of A. This takes into account the different spread of the various clusters (points belonging to more rarefied clusters will have lower sums in the corresponding rows of A). It is straightforward to prove that L is positive definite and has eigenvalues smaller than or equal to 1, with equality holding in at least one case.</Paragraph>
<Paragraph position="2"> Let K be the true number of clusters present in the dataset. If K is known beforehand, the first K eigenvectors of L are computed and arranged as columns in a matrix Y. Each row of Y corresponds to the context vector of an entity pair, and the above process can be considered as transforming the original context vectors in a d-dimensional space to new context vectors in a K-dimensional space. Therefore, the rows of Y will cluster around mutually orthogonal points on the K-dimensional sphere, rather than on the coordinate axes.</Paragraph> </Section>
<Section position="4" start_page="569" end_page="570" type="sub_section"> <SectionTitle> 2.4 The Elongated K-means algorithm </SectionTitle>
<Paragraph position="0"> As Step 5 of Table 1 shows, the result of the elongated K-means algorithm is used to detect whether the selected number of clusters q is less than the true number K, and it allows one to obtain the number of clusters iteratively.</Paragraph>
<Paragraph position="1"> Consider the case when the number of clusters q is less than the true cluster number K present in the dataset. In such a situation, by taking the first q < K eigenvectors we select a q-dimensional subspace of the clustering space. As the rows of the K eigenvectors cluster along mutually orthogonal vectors, their projections onto this lower-dimensional space will cluster along radial directions. Therefore, the general picture will be of q clusters elongated in the radial direction, with possibly some clusters lying very near the origin (when the subspace is orthogonal to some of the discarded eigenvectors).</Paragraph>
<Paragraph position="2"> Hence, the K-means algorithm is modified into the elongated K-means algorithm, which downweights distances along radial directions and penalizes distances along transversal directions. The elongated K-means algorithm computes the distance of a point x from the center c_i as follows: * If the center is not very near the origin, i.e. c_i^T c_i > ε (ε is a parameter to be fixed by the user), the distances are calculated with the elongated distance measure, where λ is the sharpness parameter that controls the elongation (the smaller λ, the more elongated the clusters). * If the center is very near the origin, i.e. c_i^T c_i < ε, the distances are measured using the Euclidean distance. In each iteration of the procedure in Table 1, elongated K-means is initialized with q centers corresponding to data points in different clusters and one center at the origin. The algorithm will then drag the center at the origin towards one of the clusters not yet accounted for. Another eigenvector is then computed (thus increasing the dimension of the clustering space to q + 1) and the procedure is repeated. Eventually, when one reaches as many eigenvectors as there are clusters present in the data, no points will be assigned to the center at the origin, leaving that cluster empty; this is the signal to terminate the algorithm.</Paragraph> </Section>
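The following Python sketch shows one way the elongated K-means step could look, with q data-driven centers plus one center initialized at the origin. The elongated metric (scaling the transversal component by 1/λ and the radial component by λ) and the farthest-point initialization follow the verbal description above and in Sanguinetti et al. (2005); the exact formula and the default values of λ and ε here are assumptions, not taken from the paper. Plugged into the Table 1 loop, q would be incremented whenever label q (the origin-initialized center) is assigned any points.

```python
import numpy as np

def elongated_kmeans(Y, q, lam=0.2, eps=1e-4, n_iter=50):
    """Sketch of elongated K-means in the q-dimensional clustering space.

    Y: (n, q) matrix whose rows are points in the clustering space.
    Uses q "real" centers plus one center initialized at the origin.
    Returns labels in {0..q-1}, with label q for the origin-initialized
    center.  lam and eps are illustrative defaults, not values from the paper.
    """
    n = Y.shape[0]
    # Initialization in the spirit of Section 2.5: first center is the point
    # farthest from the origin; each further center is the point farthest
    # from both the origin and the centers chosen so far.
    centers = [Y[np.argmax((Y ** 2).sum(axis=1))]]
    for _ in range(q - 1):
        d = np.min([np.linalg.norm(Y - c, axis=1) for c in centers], axis=0)
        d += np.linalg.norm(Y, axis=1)
        centers.append(Y[np.argmax(d)])
    centers.append(np.zeros(Y.shape[1]))           # (q+1)-th center at the origin

    def dist(x, c):
        cc = c @ c
        if cc < eps:                                # center near the origin: Euclidean
            return ((x - c) ** 2).sum()
        P = np.outer(c, c) / cc                     # projector onto the radial direction
        M = (1.0 / lam) * (np.eye(len(c)) - P) + lam * P
        return (x - c) @ M @ (x - c)                # downweight radial, penalize transversal

    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        labels = np.array([np.argmin([dist(x, c) for c in centers]) for x in Y])
        for j in range(q + 1):                      # empty clusters keep their center
            pts = Y[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels
```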
[Figure 1 (caption fragment): ... elongated clusters in the 2D clustering space using spectral clustering (two dominant eigenvectors); (d) the clustering result using spectral-based clustering (σ² = 0.05); triangle, * and + denote examples in different clusters.]
<Section position="5" start_page="570" end_page="570" type="sub_section"> <SectionTitle> 2.5 An example </SectionTitle>
<Paragraph position="0"> Figure 1 visualizes the clustering results on the three-circle dataset using K-means and spectral-based clustering. From Figure 1(b), we can see that K-means cannot separate the non-convex clusters of the three-circle dataset successfully, since it is prone to local minima. For spectral-based clustering, following the algorithm described above, we initially took the two eigenvectors of L with the largest eigenvalues, which gave us a two-dimensional clustering space. Then, to ensure that the two centers are initialized in different clusters, one center is set to the point that is farthest from the origin, while the other is set to the point that is simultaneously farthest from both the first center and the origin. Figure 1(c) shows the three elongated clusters in the 2D clustering space, and the corresponding clustering result on the dataset is visualized in Figure 1(d), which exploits the manifold structure (cluster structure) in the data.</Paragraph> </Section> </Section> </Paper>