<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0851">
  <Title>Regularized Least-Squares Classification for Word Sense Disambiguation</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Word sense disambiguation can be viewed as a classification problem and one way to obtain a classifier is by machine learning methods. Unfortunately, there is no single one universal good learning procedure. The No Free Lunch Theorem assures us that we can not design a good learning algorithm without any assumptions about the structure of the problem. So, we start by trying to find out what are the particular characteristics of the learning problem posed by the word sense disambiguation.</Paragraph>
    <Paragraph position="1"> In our opinion, one of the most important particularities of the word sense disambiguation learning problem, seems to be the dimensionality problem, more specifically the fact that the number of features is much greater than the number of training examples. This is clearly true about data in Senseval-1, Senseval-2 and Senseval-3. One can argue that this happens because of the small number of training examples in these data sets, but we think that this is an intrinsic propriety of learning task in the case of word sense disambiguation.</Paragraph>
    <Paragraph position="2"> In word sense disambiguation one important knowledge source is the words that co-occur (in local or broad context) with the word that had to be disambiguated, and every different word that appears in the training examples will become a feature. Increasing the number of training examples will increase also the number of different words that appear in the training examples, and so will increase the number of features. Obviously, the rate of growth will not be the same, but we consider that for any reasonable number of training examples (reasonable as the possibility of obtaining these training examples and as the capacity of processing, learning from these examples) the dimension of the feature space will be greater.</Paragraph>
    <Paragraph position="3"> Actually, the high dimensionality of the feature space with respect to the number of examples is a general scenario of learning in the case of Natural Language Processing tasks and word sense disambiguation is one of these examples.</Paragraph>
    <Paragraph position="4"> In such situations, when the dimension of the feature space is greater than the number of training examples, the potential for over-fitting is huge and some form of regularization is needed. This is the reason why we chose to use Regularized Least-Squares Classification (RLSC) (Rifkin, 2002; Poggio and Smale, 2003), a method of learning based on kernels and Tikhonov regularization.</Paragraph>
    <Paragraph position="5"> In the next section we explain what source of information we used and how this information is transformed into features. In section 3 we briefly describe the RLSC learning algorithm and in section 4, how we applied this algorithm for word sense disambiguation and what results we have obtained. Finally, in section 5, we discuss some possible improvements.</Paragraph>
  </Section>
class="xml-element"></Paper>