<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1016"> <Title>Spectral Clustering for German Verbs</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Standard multivariate clustering technology (such as k-means) can be applied to the problem of inferring verb classes from information about the estimated prevalence of verb frame patterns (Schulte im Walde and Brew, 2002). However, one problem with multivariate clustering is that it remains something of a black art when applied to high-dimensional natural language data: the search space is very large, and the available techniques for searching it do not offer guarantees of global optimality.</Paragraph> <Paragraph position="1"> In response to this problem, the present work applies a spectral clustering technique (Ng et al., 2002) to the verb frame patterns. At the heart of this approach is a transformation of the original input into a set of orthogonal eigenvectors. We work in the space defined by the first few eigenvectors, using standard clustering techniques in the reduced space. The spectral clustering technique has been shown to handle difficult clustering problems in image processing, offers principled methods for initializing cluster centers, and (in the version that we use) has no random component.</Paragraph> <Paragraph position="2"> The clustering results are evaluated according to their alignment with a gold standard. Alignment, the Pearson correlation between corresponding elements of two Gram matrices, has been suggested as a measure of agreement between a clustering and a distance measure (Cristianini et al., 2002). We can also use this measure to quantify the fit between a clustering result and the distance matrix that serves as input to clustering. 
The evidence suggests that the spectral technique is more effective than the methods that have previously been tried.</Paragraph> </Section> </Paper>
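<!--
The two techniques named in this introduction, spectral clustering (Ng et al., 2002) and Gram-matrix alignment, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the Gaussian affinity width sigma and the toy data are arbitrary, the top eigenvectors are obtained by plain orthogonal iteration instead of a library eigensolver, the deterministic k-means seeding (first row, then the row least aligned with the centers already chosen) is an assumption, and the Gram matrix of a clustering is assumed to be the 0/1 co-membership matrix.

```python
import math

# Sketch of Ng, Jordan and Weiss style spectral clustering plus a Pearson
# "alignment" score. Names, sigma, the eigensolver and the k-means seeding
# are illustrative assumptions, not taken from the paper.


def affinity(points, sigma):
    """A_ij = exp(-|s_i - s_j|^2 / (2 sigma^2)) for i != j, and A_ii = 0."""
    n = len(points)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((x - y) ** 2 for x, y in zip(points[i], points[j]))
                a[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return a


def top_eigenspace(m, k, iters=500):
    """Orthonormal n x k basis of the top-k eigenspace of a symmetric matrix m
    with nonnegative spectrum, found by orthogonal (subspace) iteration."""
    n = len(m)
    q = [[math.cos((i + 1) * (j + 1)) for j in range(k)] for i in range(n)]  # fixed, non-random start
    for _ in range(iters):
        z = [[sum(m[i][t] * q[t][j] for t in range(n)) for j in range(k)] for i in range(n)]
        for j in range(k):  # Gram-Schmidt on the columns of z
            for p in range(j):
                dot = sum(z[i][j] * z[i][p] for i in range(n))
                for i in range(n):
                    z[i][j] -= dot * z[i][p]
            norm = math.sqrt(sum(z[i][j] ** 2 for i in range(n)))
            for i in range(n):
                z[i][j] /= norm
        q = z
    return q


def kmeans(rows, k, iters=100):
    """k-means with a deterministic, orthogonality-based start (no randomness)."""
    centers = [rows[0][:]]  # assumption: seed with the first row
    while len(centers) < k:  # next center: the row least aligned with chosen centers
        far = min(rows, key=lambda r: max(abs(sum(u * v for u, v in zip(r, c))) for c in centers))
        centers.append(far[:])
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sum((u - v) ** 2 for u, v in zip(r, centers[c])))
               for r in rows]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def spectral_cluster(points, k, sigma=1.0):
    a = affinity(points, sigma)
    n = len(a)
    d = [sum(row) for row in a]
    # L = D^-1/2 A D^-1/2 has eigenvalues in [-1, 1]; adding I makes them
    # nonnegative without changing the eigenvectors.
    m = [[a[i][j] / math.sqrt(d[i] * d[j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    x = top_eigenspace(m, k)
    y = [[v / math.sqrt(sum(w * w for w in row)) for v in row] for row in x]  # unit-length rows
    return kmeans(y, k)


def cluster_gram(labels):
    """Assumed Gram matrix of a clustering: 1 if two items share a cluster, else 0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def alignment(g1, g2):
    """Pearson correlation between corresponding elements of two Gram matrices."""
    xs = [v for row in g1 for v in row]
    ys = [v for row in g2 for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]  # toy data
    labels = spectral_cluster(pts, 2)
    gold = [0, 0, 0, 1, 1, 1]
    print(labels, alignment(cluster_gram(labels), cluster_gram(gold)))
```

The k-dimensional embedding is only determined up to a rotation of the top-k eigenspace, and both the row normalization and the k-means distances are rotation-invariant, so any orthonormal basis of that eigenspace serves. On the toy data the two tight groups come out as [0, 0, 0, 1, 1, 1], and their alignment with the matching gold partition is 1 up to rounding; a coarser mismatch such as [0, 0, 1, 1, 2, 2] scores sqrt(2)/3, about 0.47.
-->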