<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1669">
  <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. Two graph-based algorithms for state-of-the-art WSD</Title>
  <Section position="3" start_page="0" end_page="585" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Word sense disambiguation (WSD) is a key enabling technology. Supervised WSD techniques are the best performing in public evaluations, but they need large amounts of hand-tagged data. Existing hand-annotated corpora like SemCor (Miller et al., 1993), which is annotated with WordNet senses (Fellbaum, 1998), allow for only a small improvement over the simple most-frequent-sense heuristic, as attested in the all-words track of the last Senseval competition (Snyder and Palmer, 2004). In theory, larger amounts of training data (SemCor has approx. 700K words) would improve the performance of supervised WSD, but no current project exists to provide such an expensive resource. Supervised WSD is based on the fixed-list-of-senses paradigm, where the senses for a target word are a closed list coming from a dictionary or lexicon. Lexicographers and semanticists have long warned about the problems of such an approach, where senses are listed separately as discrete entities, and have argued in favor of more complex representations where, for instance, senses are dense regions in a continuum (Cruse, 2000).</Paragraph>
    <Paragraph position="1"> Unsupervised WSD has followed this line of thinking, and tries to induce word senses directly from the corpus. Typical unsupervised WSD systems involve clustering techniques, which group together similar examples. Given a set of induced clusters (which represent word uses or senses [1]), each new occurrence of the target word will be compared to the clusters and the most similar cluster will be selected as its sense.</Paragraph>
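This assign-to-nearest-cluster step can be sketched as follows (a toy illustration with hypothetical clusters and bag-of-words features, not the actual system described in this paper):

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two sparse bag-of-words vectors.
    num = sum(a[w] * b.get(w, 0) for w in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def assign_sense(context_words, clusters):
    """Pick the induced cluster (sense) whose centroid is most similar
    to the bag-of-words vector of the new occurrence's context."""
    vec = Counter(context_words)
    return max(clusters, key=lambda label: cosine(vec, clusters[label]))

# Toy induced clusters for the target word "bank" (invented counts).
clusters = {
    "finance": Counter({"money": 3, "loan": 2, "account": 2}),
    "river":   Counter({"water": 3, "shore": 2, "fish": 1}),
}
print(assign_sense(["loan", "money", "interest"], clusters))  # finance
```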
    <Paragraph position="2"> Most of the unsupervised WSD work has been based on the vector space model, where each example is represented by a vector of features (e.g. the words occurring in the context), and the induced senses are either clusters of examples (Schütze, 1998; Purandare and Pedersen, 2004) or clusters of words (Pantel and Lin, 2002). Recently, Véronis (2004) has proposed HyperLex, an application of graph models to WSD based on the small-world properties of cooccurrence graphs. Graph-based methods have gained attention in several areas of NLP, including knowledge-based WSD (Mihalcea, 2005; Navigli and Velardi, 2005) and summarization (Erkan and Radev, 2004; Mihalcea and Tarau, 2004).</Paragraph>
    <Paragraph position="3"> The HyperLex algorithm presented in (Véronis, 2004) is entirely corpus-based. It builds a cooccurrence graph for all pairs of words cooccurring in the context of the target word. Véronis shows that this kind of graph fulfills the properties of small-world graphs, and thus possesses highly connected components (hubs) in the graph. [1: Unsupervised WSD approaches prefer the term 'word uses' to 'word senses'. In this paper we use them interchangeably to refer both to the induced clusters and to the word senses from some reference lexicon.]</Paragraph>
    <Paragraph position="4"> These hubs eventually identify the main word uses (senses) of the target word, and can be used to perform word sense disambiguation. The hubs serve as a representation of the senses induced by the system, in the same way that clusters of examples represent senses in clustering approaches to WSD (Purandare and Pedersen, 2004).</Paragraph>
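The graph construction and hub selection described above can be sketched minimally as follows. This is greatly simplified: the real HyperLex algorithm weights edges by word association rather than raw counts and iteratively removes each hub's neighbourhood; the contexts here are invented.

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence_graph(contexts):
    """Build an edge-weighted cooccurrence graph: one node per word,
    one edge per pair of words that co-occur in a target-word context."""
    weight = defaultdict(int)   # (u, v) -> co-occurrence count
    degree = defaultdict(int)   # node -> number of distinct neighbours
    for ctx in contexts:
        for u, v in combinations(sorted(set(ctx)), 2):
            weight[(u, v)] += 1
    for (u, v) in weight:
        degree[u] += 1
        degree[v] += 1
    return weight, degree

def find_hubs(degree, max_hubs=2):
    """Greedy hub selection: take the highest-degree nodes.
    (HyperLex additionally removes each hub's neighbourhood and
    applies frequency thresholds; this only shows the core idea.)"""
    return sorted(degree, key=degree.get, reverse=True)[:max_hubs]

contexts = [
    ["money", "loan", "account"],
    ["money", "loan", "interest"],
    ["water", "shore", "fish"],
    ["water", "shore", "boat"],
]
weights, degree = build_cooccurrence_graph(contexts)
print(find_hubs(degree))
```

On this toy data the two highest-degree nodes land in the two distinct "senses" of the contexts (financial vs. river vocabulary), which is the intuition behind using hubs as induced senses.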
    <Paragraph position="5"> One of the problems of unsupervised systems is that of performing a fair evaluation.</Paragraph>
    <Paragraph position="6"> Most current unsupervised systems are evaluated in-house, with a brief comparison to a re-implementation of a former system, leading to a proliferation of unsupervised systems with little common ground for comparison.</Paragraph>
    <Paragraph position="7"> In preliminary work (Agirre et al., 2006), we have shown that HyperLex compares favorably to other unsupervised systems. We defined a semi-supervised setting for optimizing the free parameters of HyperLex on the Senseval-2 English Lexical Sample task (S2LS), which consisted of mapping the induced senses onto the official sense inventory using the training part of S2LS. The best parameters were then used on the Senseval-3 English Lexical Sample task (S3LS), where a similar semi-supervised method was used to output the official sense inventory.</Paragraph>
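This kind of mapping step can be illustrated roughly as follows: each induced cluster is mapped to the official sense that is most frequent among the tagged training examples it contains. The example and sense identifiers below are invented, and the paper's actual mapping procedure may differ in detail.

```python
from collections import Counter, defaultdict

def map_clusters_to_senses(cluster_of, sense_of):
    """Map each induced cluster to the official sense most frequent
    among the hand-tagged training examples assigned to it
    (a simplified sketch of the S2LS/S3LS-style mapping step)."""
    votes = defaultdict(Counter)
    for example, cluster in cluster_of.items():
        votes[cluster][sense_of[example]] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}

# Hypothetical induced clusters (h1, h2) and gold training tags.
cluster_of = {"e1": "h1", "e2": "h1", "e3": "h2"}
sense_of = {"e1": "bank%1", "e2": "bank%1", "e3": "bank%2"}
print(map_clusters_to_senses(cluster_of, sense_of))
```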
    <Paragraph position="8"> This paper extends the previous work in several respects. First of all, we adapt the PageRank graph-based method (Brin and Page, 1998) to WSD and compare it with HyperLex.</Paragraph>
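For reference, the core of PageRank is a simple power iteration over a graph. The sketch below runs it on a small undirected word graph; it is illustrative only, and the paper's WSD adaptation differs in how the graph is built and how the resulting ranks are used.

```python
def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank on an adjacency dict {node: [neighbours]}.
    Generic textbook version (Brin and Page, 1998), not the paper's
    exact WSD adaptation."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {}
        for v in nodes:
            # Mass flowing into v from every node that links to it.
            incoming = sum(rank[u] / len(graph[u]) for u in nodes if v in graph[u])
            new[v] = (1 - damping) / n + damping * incoming
        rank = new
    return rank

# Tiny symmetric word graph: "money" is the best-connected node.
graph = {
    "money": ["loan", "account", "interest"],
    "loan": ["money", "account"],
    "account": ["money", "loan"],
    "interest": ["money"],
}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # money
```

The best-connected node accumulates the highest rank, which is why, on a cooccurrence graph, high-rank nodes play a role analogous to HyperLex hubs.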
    <Paragraph position="9"> We also extend the previous evaluation scheme, using measures from the clustering community which only require a gold-standard clustering and no mapping step. This allows for a purely unsupervised WSD system, and at the same time makes it possible to compare supervised and unsupervised systems according to clustering criteria.</Paragraph>
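Purity is one such measure: it scores an induced clustering directly against gold-standard sense labels, with no mapping step. The sketch below is illustrative and not necessarily the exact measure used later in the paper.

```python
from collections import Counter

def purity(induced, gold):
    """Purity of an induced clustering against gold sense labels:
    each induced cluster votes for its majority gold sense, and we
    report the fraction of examples covered by those majorities."""
    total = sum(len(members) for members in induced.values())
    correct = 0
    for members in induced.values():
        counts = Counter(gold[x] for x in members)
        correct += counts.most_common(1)[0][1]
    return correct / total

# Invented clustering of five examples against two gold senses.
induced = {"c1": ["e1", "e2", "e3"], "c2": ["e4", "e5"]}
gold = {"e1": "s1", "e2": "s1", "e3": "s2", "e4": "s2", "e5": "s2"}
print(purity(induced, gold))  # 0.8
```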
    <Paragraph position="10"> We also include the Senseval-3 English All-words testbed (S3AW), where, in principle, unsupervised and semi-supervised systems have an advantage over purely supervised systems due to the scarcity of training data. We show that our system is competitive with supervised systems, ranking second.</Paragraph>
    <Paragraph position="11"> This paper is structured as follows. We first present the two graph-based algorithms, HyperLex and PageRank. Section 3 presents the two evaluation frameworks. Section 4 introduces parameter optimization. Section 5 shows the experimental setting and results. Section 6 analyzes the results and presents related work. Finally, we draw conclusions and outline future work.</Paragraph>
  </Section>
</Paper>