<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1194">
<Title>Discovering word senses from a network of lexical cooccurrences</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle> 2 Overview </SectionTitle>
<Paragraph position="0"> The starting point of the method we present in this article is a network of lexical cooccurrences, that is, a graph whose vertices are the significant words of a corpus and whose edges represent the cooccurrences between these words in the corpus. The discovery of word senses is performed word by word, and the processing of a word relies only on the subgraph that contains its cooccurrents. The first step of the method consists in building a similarity matrix between these cooccurrents by exploiting their relations in the subgraph. An unsupervised clustering algorithm is then applied to group these cooccurrents, giving rise to the senses of the considered word. This method, like those presented in (Veronis, 2003), (Dorow and Widdows, 2003) and (Rapp, 2003), relies on the following hypothesis: in the subgraph gathering the cooccurrents of a word, the number of relations between the cooccurrents defining a sense is higher than the number of relations that these cooccurrents have with those defining the other senses of the considered word. The clustering algorithm that we use is an adaptation of the Shared Nearest Neighbors (SNN) algorithm presented in (Ertoz et al., 2001). This algorithm is particularly well suited to our problem, as it automatically determines the number of clusters, in our case the number of senses of a word, and discards the elements that are not representative of the clusters it builds. This last point is especially important for our application, as there is a lot of &quot;noise&quot; among the cooccurrents of a word.</Paragraph>
</Section>
</Paper>
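
To make the per-word processing concrete, here is a minimal, self-contained Python sketch of the pipeline described above: restricting attention to the cooccurrents of a target word, computing a shared-nearest-neighbour similarity between them, and grouping them into senses while discarding unlinked cooccurrents as noise. The toy words, counts, and the k and link_threshold parameters are hypothetical, and the clustering is a simplified Jarvis-Patrick-style variant, not the exact SNN adaptation of (Ertoz et al., 2001) used in the paper.

from collections import defaultdict
from itertools import combinations

# Toy cooccurrence network: vertices are words, edge weights are
# cooccurrence counts (all words and counts are hypothetical).
edges = [
    ("bank", "money", 12), ("bank", "loan", 9), ("money", "loan", 7),
    ("bank", "river", 8), ("bank", "water", 6), ("river", "water", 10),
    ("bank", "account", 5), ("money", "account", 4),
]

graph = defaultdict(dict)
for u, v, w in edges:
    graph[u][v] = w
    graph[v][u] = w


def discover_senses(word, graph, k=3, link_threshold=2):
    """Group the cooccurrents of `word` into candidate senses.

    Simplified shared-nearest-neighbour scheme: each cooccurrent keeps
    its k strongest neighbours inside the subgraph of cooccurrents of
    `word`; two cooccurrents are linked when their neighbour lists
    share at least `link_threshold` elements; connected components of
    these links are the senses, and unlinked cooccurrents are treated
    as noise and discarded.
    """
    cooccurrents = set(graph[word])

    # Nearest neighbours of each cooccurrent, restricted to the
    # subgraph of cooccurrents (the point itself is included).
    nearest = {}
    for c in cooccurrents:
        inside = {n: w for n, w in graph[c].items() if n in cooccurrents}
        top_k = sorted(inside, key=inside.get, reverse=True)[:k]
        nearest[c] = {c} | set(top_k)

    # Similarity = number of shared nearest neighbours.
    links = defaultdict(set)
    for a, b in combinations(cooccurrents, 2):
        if len(nearest[a] & nearest[b]) >= link_threshold:
            links[a].add(b)
            links[b].add(a)

    # Connected components of the link graph are the candidate senses;
    # cooccurrents without any link are dropped as noise.
    senses, seen = [], set()
    for c in cooccurrents:
        if c in seen or not links[c]:
            continue
        stack, component = [c], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(links[node] - component)
        seen |= component
        senses.append(component)
    return senses


# Two clusters are expected for "bank": {money, loan, account} and
# {river, water}.
print(discover_senses("bank", graph))

On this toy data the two groups emerge because money, loan and account share neighbours among themselves but not with river and water, which is exactly the hypothesis stated above: cooccurrents defining one sense are more densely connected to each other than to the cooccurrents defining the other senses of the word.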