<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1131">
  <Title>Word sense disambiguation criteria: a systematic study</Title>
  <Section position="3" start_page="0" end_page="1" type="metho">
    <SectionTitle>
2 Methodology
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="1" type="sub_section">
      <SectionTitle>
2.1 Corpus
</SectionTitle>
      <Paragraph position="0"> The corpus we worked on is composed of different types of texts and comprises 6 468 522 words. It was assembled within the framework of the SyntSem project, which aims at producing a French corpus that is morphologically and syntactically tagged and lemmatised, and that includes a light syntactic tagging as well as a lexical tagging of 60 target words selected for their strongly polysemous nature (Veronis, 1998).</Paragraph>
      <Paragraph position="1"> These 60 target words are evenly divided into 20 nouns, 20 adjectives and 20 verbs, having a total of 53 796 occurrences in the corpus.</Paragraph>
      <Paragraph position="2"> The inadequacy of standard dictionaries (Veronis, 2001) and computational lexicons (Palmer, 1998) for natural language processing is presently one of the major difficulties encountered in word sense disambiguation. For instance, by using these dictionaries, the inter-annotator agreement may sometimes reach only 57% (Ng, Lee, 1996) or may simply be equivalent to random sense allocation (Veronis, 1998). To overcome this weakness, a dictionary more specific to natural language processing is being developed in our team (Reymond, 2002). It has been used to tag the occurrences of the 60 target words of the SyntSem corpus.</Paragraph>
      <Paragraph position="3"> Table 8 in the appendix gives quantitative information for each target word. The number of senses per word may be very large, as it includes idioms and phrasal verbs such as &lt;&lt; mettre sur pied &gt;&gt;, &lt;&lt; mettre a pied &gt;&gt;, &lt;&lt; pied de nez &gt;&gt;, etc. A general agreement seems to emerge according to which morpho-syntactic disambiguation and sense disambiguation can be disentangled (Kilgarriff, 1997; Ng, Zelle, 1997). We have entrusted the part-of-speech tagging of our corpus to the Cordial software (developed by the Synapse Developpement company), as it offers lemmatisation and part-of-speech tagging with satisfactory accuracy (Valli, Veronis, 1999).</Paragraph>
      <Paragraph position="4"> Table 1 displays an extract of the SyntSem corpus, showing all the tags of each word. We use the information provided by these tags in our lexical disambiguation criteria.

  mform       lemma      ems      cgems  sense
  mettre      mettre     VINF     VINF   1.12.7
  fin         fin        NCFS     NCOM
  a           a          PREP     PREP
  la          le         DETDFS   DET
  pratique    pratique   NCFS     NCOM
  des         de         DETDPIG  DET
  detentions  detention  NCFP     NCOM

  (1) These words are those used in the French part of the Senseval-1 evaluation exercise (Segond, 2000), but the corpus and dictionary are different in the present study.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.2 Criteria
</SectionTitle>
      <Paragraph position="0"> The aim of our study is to evaluate a large variety of homogeneous criteria (i.e. sets of features). The name of each criterion specifies its nature and takes the following form: [par1|par2|par3|par4].</Paragraph>
      <Paragraph position="1"> Parameter par1 indicates whether the criterion takes into account unigrams (par1=1gr), bigrams (par1=2gr) or trigrams (par1=3gr), knowing that an n-gram represents the juxtaposition of n words.</Paragraph>
      <Paragraph position="2"> Parameter par2 indicates which word tag is considered: morphological form (par2=mform), lemma (par2=lemma), part-of-speech (par2=ems) or coarse-grained part-of-speech (par2=cgems).</Paragraph>
      <Paragraph position="3"> Parameter par3 indicates whether we take word positions into account (par3=position), only distinguish left from right context (par3=leftright), or simply consider an unordered set of surrounding words (par3=unordered). Lastly, parameter par4 indicates whether the criterion takes into account all words (par4=all) or content words only (par4=content). We call these criteria &amp;quot;homogeneous criteria&amp;quot; as the four parameters together determine the nature of all pieces of contextual evidence selected by the criterion.</Paragraph>
      <Paragraph position="4"> For contexts within a range of +-1 to +-8 words, the combination of all parameters generates 576 (3 x 4 x 3 x 2 x 8) distinct criteria. We have systematically evaluated each of these criteria, as well as additional criteria, in order to answer specific questions and to validate or invalidate certain hypotheses.</Paragraph>
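The parameter space described above can be enumerated directly. This is an illustrative sketch, not the authors' code; the naming scheme follows the [par1|par2|par3|par4] convention from the text, and the "@r" suffix for the context width is our own notation.

```python
from itertools import product

# Hypothetical enumeration of the criterion space. Parameter values follow
# the paper; the eight context widths are the +-1 to +-8 word ranges.
par1 = ["1gr", "2gr", "3gr"]                   # n-gram size
par2 = ["mform", "lemma", "ems", "cgems"]      # word tag considered
par3 = ["position", "leftright", "unordered"]  # positional information
par4 = ["all", "content"]                      # word filtering

criteria = [f"[{a}|{b}|{c}|{d}]@{r}"
            for a, b, c, d in product(par1, par2, par3, par4)
            for r in range(1, 9)]
print(len(criteria))  # 3 * 4 * 3 * 2 * 8 = 576
```

Enumerating all combinations confirms the count of 576 distinct criteria.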
      <Paragraph position="5"> Within the framework of this study, we have developed an application used to model these criteria and to further apply them to the corpus in order to generate feature vectors used by our classifiers (Audibert, 2001).</Paragraph>
    </Section>
    <Section position="3" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.3 Classifiers
</SectionTitle>
      <Paragraph position="0"> We have selected two complementary classifiers.</Paragraph>
      <Paragraph position="1"> We have chosen the Naive-Bayes classifier (NB) for its simplicity and widespread use, as well as for its well-known state-of-the-art accuracy on supervised WSD (Domingos, Pazzani, 1997; Mooney, 1996; Ng, 1997a). The NB classifier assumes the features are independent given the sense. During classification, it chooses the sense with the highest posterior probability. We have also selected a decision list classifier (DL), which is similar to the classifier used by (Yarowsky, 1994) for words having two senses, and extended to more senses by (Golding, 1995). The DL classifier is further described in (Audibert, 2003). In DL, features are sorted in order of decreasing strength, where strength reflects a feature's reliability for decision-making. The DL classifier distinguishes itself clearly from the NB classifier in that it does not combine features, but bases its classification solely on the single most reliable feature identified in the target context selected by the criterion. We make use of this decision-making transparency several times in this article. Other advantages of the DL classifier are its simplicity and its ease of implementation.</Paragraph>
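The contrast between the two classifiers can be sketched in a few lines. These are minimal illustrative implementations under simplifying assumptions (add-alpha smoothing rather than the paper's m-estimation, toy strength scores), not the authors' code: NB multiplies evidence from all features, while DL decides on the single strongest feature alone.

```python
import math
from collections import Counter, defaultdict

def train_counts(data):
    """Count senses and (sense, feature) co-occurrences from
    (features, sense) training pairs."""
    sense_counts = Counter()
    feat_counts = defaultdict(Counter)  # feat_counts[sense][feature]
    for feats, sense in data:
        sense_counts[sense] += 1
        for f in feats:
            feat_counts[sense][f] += 1
    return sense_counts, feat_counts

def naive_bayes(feats, sense_counts, feat_counts, alpha=0.1):
    # NB combines all features: argmax_s P(s) * prod_f P(f | s),
    # computed in log space with simple add-alpha smoothing.
    n = sum(sense_counts.values())
    best, best_lp = None, float("-inf")
    for s, c in sense_counts.items():
        lp = math.log(c / n)
        for f in feats:
            lp += math.log((feat_counts[s][f] + alpha) / (c + alpha * len(feats)))
        if lp > best_lp:
            best, best_lp = s, lp
    return best

def decision_list(feats, sense_counts, feat_counts):
    # DL bases its decision on the single most reliable feature: each
    # (feature, sense) pair is scored by a smoothed log-likelihood ratio
    # against the best competing sense.
    best, best_strength = None, float("-inf")
    for f in feats:
        for s in sense_counts:
            p = (feat_counts[s][f] + 0.1) / (sense_counts[s] + 0.2)
            q = max((feat_counts[t][f] + 0.1) / (sense_counts[t] + 0.2)
                    for t in sense_counts if t != s)
            strength = math.log(p / q)
            if strength > best_strength:
                best, best_strength = s, strength
    return best
```

The transparency mentioned in the text is visible here: the DL prediction can always be traced back to exactly one feature, whereas the NB score aggregates all of them.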
      <Paragraph position="2"> Both of the classifiers we used require probability estimates. Given the sparseness of the data, we have to deal with zero or low frequency counts.</Paragraph>
      <Paragraph position="3"> For this reason, we have decided to use m-estimation (Cussens, 1993) rather than classical relative-frequency estimates or Laplace (&amp;quot;add one&amp;quot;) smoothing.</Paragraph>
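The m-estimate replaces the raw relative frequency count/total with (count + m * prior) / (total + m), blending the observed counts with a prior weighted by an equivalent sample size m. A minimal sketch (the parameter values below are illustrative, not the paper's settings):

```python
def m_estimate(count, total, prior, m=2.0):
    """m-estimate of a conditional probability such as P(feature | sense).

    count: joint occurrences of the feature with the sense
    total: occurrences of the sense
    prior: prior probability of the feature (e.g. uniform 1/V)
    m:     equivalent sample size; Laplace "add one" smoothing is the
           special case m = V, prior = 1/V for a vocabulary of size V
    """
    return (count + m * prior) / (total + m)

# A zero-frequency feature gets a small nonzero probability pulled
# toward the prior instead of an impossible zero:
print(m_estimate(0, 50, prior=0.01, m=2.0))  # 0.02 / 52
```

This is what makes the probability estimates usable despite the zero and low frequency counts mentioned above.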
      <Paragraph position="4"> When a classifier is not able to disambiguate a target word, which is very rare, it selects the most frequent sense from the training data. Thus, all occurrences are tagged. As precision then equals recall, the present article reports precision only.</Paragraph>
      <Paragraph position="5"> To evaluate a criterion on the corpus, we use k-fold cross-validation (following common practice, k=10 in our experiments). Although this method is computationally costly, it enables the evaluation of each criterion on the whole corpus.</Paragraph>
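The k-fold scheme can be sketched as follows. This is a generic illustration, not the authors' implementation; it only shows why every occurrence ends up evaluated exactly once.

```python
def kfold_indices(n, k=10):
    """Partition n instance indices into k contiguous folds.

    Each fold is held out once for testing while the remaining k-1 folds
    serve as training data, so every instance is evaluated exactly once.
    """
    folds = []
    base, extra = divmod(n, k)  # distribute the remainder over early folds
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# With k=10 over the 53 796 target-word occurrences, the ten held-out
# folds together cover the whole corpus:
folds = kfold_indices(53796, k=10)
print(sum(len(f) for f in folds))  # 53796
```

The computational cost noted in the text comes from retraining the classifier k times, once per held-out fold.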
      <Paragraph position="6"> Throughout the tests, the two classifiers have generally obtained comparable accuracy, although the NB classifier has almost systematically outperformed the DL classifier.</Paragraph>
    </Section>
  </Section>
</Paper>