<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0859">
  <Title>The University of Alicante systems at SENSEVAL-3</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
[Table residue: fragment of the DLSI-UA systems results table (columns: System, Method, Combined); recoverable rows: ALL-NOSU uses RD, not combined; LS-ENG-SU uses Re-training, not combined]
</SectionTitle>
    <Paragraph position="0"> Most of these methods are relatively new and our goal when participating at SENSEVAL-3 is to evaluate for the first time such approaches. At the moment of writing this paper we can conclude that these are promising contributions in order to improve current WSD systems.</Paragraph>
    <Paragraph position="1"> In the following section each method is described briefly. Then, details of how the SENSEVAL-3 train and testing data were processed are shown. Next, the scores obtained by each system are explained.</Paragraph>
    <Paragraph position="2"> Finally, some conclusions and future work are presented. null</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Association for Computational Linguistics
</SectionTitle>
      <Paragraph position="0"> for the Semantic Analysis of Text, Barcelona, Spain, July 2004 SENSEVAL-3: Third International Workshop on the Evaluation of Systems</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Methods and Algorithms
</SectionTitle>
    <Paragraph position="0"> In this section we describe the set of methods and techniques that we used to build the four systems that had participated in SENSEVAL-3.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Re-training and Maximum Entropy
</SectionTitle>
      <Paragraph position="0"> In this section, we describe our bootstrapping method, which we call re-training. Our method is derived from the co-training method. Our re-training system is based on two different views of the data (as is also the case for co-training), defined using several groups of features from those described in Figure 1, with several filters that ensure a high confidence sense labelling.</Paragraph>
      <Paragraph position="1">  (+1;+2) + lemmas of nouns at any position in context, occurring at least m% times with a sense + grammatical relation of the target word + the word that the target word depends on + the verb that the target word depends on + the target word belongs to a multi-word, as identified by the parser + ANPA codes (Spanish only) + IPTC codes (Spanish only)  These two views consist of two weak ME learners, based on different sets of linguistic features, for every possible sense of a target word. We decided to use ME as the core of our bootstrapping method because it has shown to be competitive in WSD when compared to other machine learning approaches (Su'arez and Palomar, 2002; M`arquez et al., 2003).</Paragraph>
      <Paragraph position="2"> The main difference with respect co-training is that the two views are used in parallel in order to get a consensus of what label to assign to a particular context. Additional filters will ultimately determine which contexts will then be added to the next training cycle.</Paragraph>
      <Paragraph position="3"> Re-training performs several binary partial trainings with positive and negative examples for each sense. These classifications must be merged in a unique label for such contexts with enough evidence of being successfully classified. This &amp;quot;evidence&amp;quot; relies on values of probability assigned by the ME module to positive and negative labels, and the fact that the unlabeled example is classified as positive for a unique sense only. The set of new labeled examples feeds the training corpora of the next iteration with positive and negative examples. The stopping criteria is a certain number of iterations or the failure to obtain new examples from the unlabeled corpus.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Specification Marks
</SectionTitle>
      <Paragraph position="0"> Specification Marks is an unsupervised WSD method over nouns. Its context is the group of words that co-occur with the word to be disambiguated in the sentence and their relationship to the noun to be disambiguated. The disambiguation is resolved with the use of the WordNet lexical knowledge base.</Paragraph>
      <Paragraph position="1"> The underlying hypothesis of the method we present here is that the higher the similarity between two words, the larger the amount of information shared by two concepts. In this case, the information commonly shared by two concepts is indicated by the most specific concept that subsumes them both in the taxonomy.</Paragraph>
      <Paragraph position="2"> The input for the WSD module is a group of nouns W =fw1;w2;:::;wng in a context. Each word wi is sought in WordNet, each having an associated set of possible senses Si =fSi1;Si2;:::;Sing, and each sense having a set of concepts in the IS-A taxonomy (hypernymy/hyponymy relations). First, the common concept to all the senses of the words that form the context is gathered. This concept is marked by the initial specification mark (ISM). If this initial specification mark does not resolve the ambiguity of the word, we then descend through the WordNet hierarchy, from one level to another, assigning new specification marks. The number of concepts contained within the subhierarchy is then counted for each specification mark. The sense that corresponds to the specification mark with the highest number of words is the one chosen as the sense disambiguated within the given context We define six heuristics for our system: Heuristic of Hypernym, Heuristic of Definition, Heuristic of Common Specification Mark, Heuristic of Gloss Hypernym, Heuristic of Hyponym and Heuristic of Gloss Hyponym.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Relevant Domains
</SectionTitle>
      <Paragraph position="0"> This is an unsupervised WSD method based on the WordNet Domains lexical resource (Magnini and Cavaglia, 2000). The underlying working hypothesis is that domain labels, such as ARCHITEC-TURE, SPORT and MEDICINE provide a natural way to establish semantic relations between word senses, that can be used during the disambiguation process. This resource has already been used on  rava, 2000), but it has not made use of glosses information. So our approach make use of a new lexical resource obtained from glosses information named Relevant Domains.</Paragraph>
      <Paragraph position="1"> First step is to obtain the Relevant Domains resource from WordNet glosses. For this task is necessary a previous part-of-speech tagging of Word-Net glosses (each gloss has associated a domain label). So we extract all nouns, verbs, adjectives and adverbs from glosses and assign them their associated domain label. With this information and using the Association Ratio formula (w=word,D=domain label), in (1), we obtain the Relevant Domains resource. null</Paragraph>
      <Paragraph position="3"> The final result is for each word, a set of domain labels sorted by Association Ratio, for example, for word plant&amp;quot; its Relevant Domains are: genetics 0.177515, ecology 0.050065, botany 0.038544 . . . .</Paragraph>
      <Paragraph position="4"> Once obtained Relevant Domains the disambiguation process is carried out. We obtain from the text source the context words that co-occur with the word to be disambiguated (context could be a sentence or a window of words). We obtain a context vector from Relevant Domains and context words (in case of repeated domain labels, they are weighted). Furthermore we need a sense vector obtained in the same way as context vector from words of glosses of each word sense. We select the correct sense using the cosine measure between context vector and sense vectors. So the selected sense is that for which the cosine with the context vector is closer to one.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Pattern-Nica
</SectionTitle>
      <Paragraph position="0"> This is an unsupervised method only for Spanish nouns exploiting both EuroWordNet and corpus.</Paragraph>
      <Paragraph position="1"> In this method we adopt a different approach to WSD: the occurrence to be disambiguated is considered not separately, but integrated into a syntactic pattern, and its disambiguation is carried out in relation to this pattern. A syntactic pattern is a triplet X-R-Y, formed by two lexical content units X and Y and an eventual relational element R, which corresponds to a syntactic relation between X and Y. Examples: [X=canal-noun R=de-preposition Y=televisi'on-noun], [X=pasajenoun R=O Y=a'ereo-adjective]. The strategy is based on the hypothesis that syntactic patterns in which an ambiguous occurrence participates have decisive influence on its meaning. We also assume that inside a syntactic pattern a word will tend to have the same sense: the &amp;quot;quasi one sense per syntactic pattern&amp;quot; hypothesis. The method works as follows: null Step 1, the identification of the syntactic patterns of the ambiguous occurrence; Step 2, the extraction of information related to it: from corpus and from the sentential context; Step 3, the application of the WSD algorithm on the different information previously obtained; Step 4, the final sense assignment by combining the partial sense proposals from step 3.</Paragraph>
      <Paragraph position="2"> For step 1, we POS-tag the test sentence and extract the sequences that correspond to previously defined combinations of POS tags. We only kept the patterns with frequency 5 or superior.</Paragraph>
      <Paragraph position="3"> In step 2, we use a search corpus previously POStagged. For every syntactic pattern of the ambiguous occurrence X, we obtain from corpus two sets of words: the substitutes of X into the pattern (S1) and the nouns that co-occur with the pattern in any sentence from the corpus (S2), In both cases, we keep only the element with frequency 5 or superior.</Paragraph>
      <Paragraph position="4"> We perform step 3 by means of the heuristics defined by the Commutative Test (CT) algorithm applied on each set from 2. The algorithm is related to the Sense Discriminators (SD) lexical device, an adaptation of the Spanish WordNet, consisting in a set of sense discriminators for every sense of a given noun in WordNet. The Commutative Test algorithm lays on the hypothesis that if an ambiguous occurrence can be substituted in a syntactic pattern by a sense discriminator, then it can have the sense corresponding to that sense discriminator.</Paragraph>
      <Paragraph position="5"> For step 4, we first obtain a sense assignment in relation with each syntactic pattern, by intersecting the sense proposals from the two heuristics corresponding to a pattern; then we choose the most frequent sense between those proposed by the different syntactic patterns; finally, if there are more final proposed senses, we choose the most frequent sense on the base of sense numbers in WordNet.</Paragraph>
      <Paragraph position="6"> The method we propose for nouns requires only a large corpus, a minimal preprocessing phase (POStagging) and very little grammatical knowledge, so it can easily be adapted to other languages. Sense assignment is performed exploiting information extracted from corpus, thus we make an intensive use of sense untagged corpora for the disambiguation process.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Tasks Processing
</SectionTitle>
    <Paragraph position="0"> At this point we explain for each task the systems processing. The results of each system are shown in</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 DLSI-UA-LS-SPA-SU
</SectionTitle>
      <Paragraph position="0"> Our system, based on re-training and maximum entropy methods, processed both sense labelled and unlabelled Spanish Lexical Sample data in three consecutive steps: Step 1, analyzing the train corpus: words which most frequent sense is under 70% were selected. For each one of these words, each feature was used in a 3-fold cross-validation in order to determine the best set of features for re-training.</Paragraph>
      <Paragraph position="1"> Step 2, feeding training corpora: for these selected words, based on the results of the previous step, each training corpus was enriched with new examples from the unlabelled data using re-training. Step 3, classifying the test data: for the selected words, re-training was used again to obtain a first set of answers with, a priori, a label with a high level of confidence; the remaining contexts that re-training could not classify were processed with the ME system using a unique set of features for all words.</Paragraph>
      <Paragraph position="2"> The lemmatization and POS information supplied into the SENSEVAL-3 Spanish data were the information used for defining the features of the system. 0ur system obtained an accuracy of 0.84 for the Spanish lexical sample task. Unfortunately, a shallow analysis of the answers revealed that the UA.5 system performed slightly worse than if only the basic ME system were used1. This fact means that the new examples extracted from the unlabelled data introduced too much noise into the classifiers. Because this anomalous behavior was present only on some words, a complete study of such new examples must be done. Probably, the number of iterations done by re-training over unlabelled data were too low and the enrichment of the training corpora not large enough.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 DLSI-UA-LS-ENG-SU
</SectionTitle>
      <Paragraph position="0"> In the English Lexical Sample task our system goal was to prove that the re-training method ensures a high level of precision.</Paragraph>
      <Paragraph position="1"> By means of a 3-fold cross-validation of the train data, the features were ordered from higher to lower precision. Based on this information, four executions of re-training over the test data were done with different selections of features for the two views of the method. Each execution feed the learning corpora of the next one with new examples, those that re-training considered as the most probably correct.</Paragraph>
      <Paragraph position="2"> For this system Minipar parser (Lin, 1998)was used to properly add syntactic information to the training and testing data.</Paragraph>
      <Paragraph position="3"> Almost 40% of the test contexts were labelled by our system, obtaining these scores (for &amp;quot;fine-grained&amp;quot; and &amp;quot;coarse-grained&amp;quot;, respectively): 0.782/0.828 precision and 0.310/0.329 recall. In our opinion, such results must be interpreted as very positive because the re-training method is able to satisfy a high level of precision if the parameters of the system are correctly set.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 DLSI-UA-ALL-NOSU and DLSI-UA-LS-ENG-NOSU
</SectionTitle>
      <Paragraph position="0"> DLSI-UA-LS-ENG-NOSU In the English All Words and English Lexical Sample tasks RD system was performed with information obtained from Relevant Domains resource using for the disambiguation process all the 165 domain labels.</Paragraph>
      <Paragraph position="1"> For All Words task we used as input information all nouns, verbs, adjectives and adverbs present in a 100 words window around the word to be disambiguated. So our system obtained a 34% of precision and a reduced recall around 28%.</Paragraph>
      <Paragraph position="2"> For Lexical Sample task we used all nouns, verbs, adjectives and adverbs present in the context of each instance obtaining around 32% precision.</Paragraph>
      <Paragraph position="3"> We obtained a reduced precision due to we use all the domains label hierarchy. In some experiments realized on SENSEVAL-2 data, our system obtained a more high precision when grouping domains into the first three levels. Therefore we expect with reducing the number of domains labels, an improvement on precision.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 DLSI-UA-LS-SPA-NOSU
</SectionTitle>
      <Paragraph position="0"> We used a combined system for Spanish Lexical Sample task, using the SM method for disambiguating nouns and the ME method for disambiguating verbs and adjectives. We obtained around 62% precision and a 62% recall.</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.5 DLSI-UA-LS-SPA-PATTERN
</SectionTitle>
      <Paragraph position="0"> Our goal when participating in this task was to demonstrate that the applying of syntactic patterns to WSD maintains high levels of precision.</Paragraph>
      <Paragraph position="1"> In this task we used also a combined system for Spanish Lexical Sample task, using Pattern-Nica method for disambiguating nouns and ME method for disambiguating verbs and adjectives. We obtained around 84% precision and a 47% recall.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML