<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0813"> <Title>The Basque Country University system: English and Basque tasks</Title> SENSEVAL-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain, July 2004. Association for Computational Linguistics. <Section position="3" start_page="0" end_page="3" type="metho"> <SectionTitle> 2 Learning Algorithms </SectionTitle> <Paragraph position="0"> The algorithms presented in this section rely on features extracted from the context of the target word to make their decisions.</Paragraph> <Paragraph position="1"> The Decision List (DL) algorithm is described in (Yarowsky, 1995b). In this algorithm the sense with the highest-weighted feature is selected, as shown below. We can avoid undetermined values by discarding features that have a 0 probability in the divisor. More sophisticated smoothing techniques have also been tried (cf. Section 4).</Paragraph> <Paragraph position="3"> arg max_k log( Pr(s_k | f_i) / sum_{j != k} Pr(s_j | f_i) )</Paragraph> <Paragraph position="5"> The Naive Bayes (NB) algorithm is based on the conditional probability of each sense given the features in the context. It also requires smoothing.</Paragraph> <Paragraph position="6"> arg max_k Pr(s_k) prod_i Pr(f_i | s_k)</Paragraph> <Paragraph position="8"> For the Vector Space Model (V) algorithm, we represent each occurrence context as a vector, where each feature takes the value 1 or 0 to indicate its presence or absence. For each sense in the training data, one centroid vector is obtained. These centroids are compared with the vectors that represent the test examples by means of the cosine similarity function, and the sense of the closest centroid is assigned to the test example. No smoothing is required to apply this algorithm, but it is possible to use smoothed values.</Paragraph> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Support Vector Machines </SectionTitle> <Paragraph position="0"> Regarding Support Vector Machines (SVM), we used SVM-Light (Joachims, 1999), a public distribution of SVM. Linear kernels were applied, and the soft margin (C) was estimated for each word (cf. Section 4).</Paragraph> </Section> <Section position="3" start_page="0" end_page="3" type="sub_section"> <SectionTitle> 3 Features. 3.1 Features for English </SectionTitle> <Paragraph position="0"> We relied on an extensive set of features of different types, obtained by means of different tools and resources. The features used can be grouped into four types: Local collocations: bigrams and trigrams formed with the words around the target. These features are built from lemmas, word-forms, or PoS tags. Other local features are those formed with the previous/posterior lemma/word-form in the context.</Paragraph> <Paragraph position="1"> Syntactic dependencies: syntactic dependencies were extracted using heuristic patterns and regular expressions defined over the PoS tags around the target. The following relations were used: object, subject, noun-modifier, preposition, and sibling.</Paragraph> <Paragraph position="2"> Bag-of-words features: we extract the lemmas of the content words in the whole context, and in a ±4-word window around the target. We also obtain salient bigrams in the context, with the methods and the software described in (Pedersen, 2001).</Paragraph> <Paragraph position="3"> Domain features: the WordNet Domains resource was used to identify the most relevant domains in the context. Following the relevance formula presented in (Magnini and Cavaglià, 2000), we defined 2 feature types: (1) the most relevant domain, and (2) a list of domains above a predefined threshold. 
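To make the vector space method of Section 2 concrete, here is a minimal sketch in Python. The binary feature dictionaries and function names are ours for illustration; this is not the system's actual implementation.

```python
from collections import Counter
import math

def centroid(vectors):
    """Average a list of binary feature dicts into one centroid vector."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return {f: c / n for f, c in total.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u.get(f, 0.0) * w for f, w in v.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def vsm_classify(train, test_features):
    """train maps each sense to a list of binary feature dicts (1 = present).
    The sense whose centroid is closest to the test vector is returned."""
    centroids = {s: centroid(vs) for s, vs in train.items()}
    return max(centroids, key=lambda s: cosine(centroids[s], test_features))
```

As the text notes, no smoothing is needed here: zero-valued features simply contribute nothing to the dot product.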
Other experiments using domains from SUMO, the EuroWordNet top-ontology, and WordNet's Semantic Fields were performed, but these features were discarded from the final set.</Paragraph> <Paragraph position="4"> The PoS tagging was performed with the fnTBL toolkit (Ngai and Florian, 2001); this software was kindly provided by David Yarowsky's group, from Johns Hopkins University. The software to obtain the relevant domains was kindly provided by Gerard Escudero's group, from Universitat Politecnica de Catalunya.</Paragraph> </Section> <Section position="4" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 3.2 Features for Basque </SectionTitle> <Paragraph position="0"> Basque is an agglutinative language, and syntactic information is conveyed by inflectional suffixes. Morphological analysis of the text is therefore a necessary preliminary step for selecting informative features. The data provided by the task organization includes information about the lemma, declension case, and PoS for the participating systems. Our group directly used the output of the parser (Aduriz et al., 2000), which includes some additional features: number, determiner mark, ambiguous analyses, and elliptic words. For a few examples, the morphological analysis was not available due to parsing errors.</Paragraph> <Paragraph position="1"> In Basque, the determiner, the number, and the declension case are appended to the last element of the phrase. When defining our feature set for Basque, we tried to introduce the same knowledge that is represented by the features that work well for English. We will describe our feature set with an example: for the phrase &quot;elizaren arduradunei&quot; (which means &quot;to the directors of the church&quot;) we get the following analysis from our analyzer: eliza |-ren |arduradun |-ei (church |of the |director |to the +pl.)</Paragraph> <Paragraph position="2"> The word order is the inverse of the English one. 
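The mapping from an analysis like the one above to features can be sketched as follows. This is an illustrative reconstruction under our own assumptions: the analyzer output is modeled as (lemma, suffix) pairs, and the feature names are hypothetical, not the system's.

```python
def basque_features(analysis, i):
    """analysis: list of (lemma, suffix) pairs from a morphological analyzer;
    i: index of the target word. Returns simple string-valued features."""
    lemma, suffix = analysis[i]
    feats = ["lemma=" + lemma]
    if suffix:
        feats.append("case=" + suffix)          # declension case suffix
    if i > 0:                                   # previous lemma, if any
        feats.append("prev_lemma=" + analysis[i - 1][0])
    if i + 1 != len(analysis):                  # following lemma, if any
        feats.append("next_lemma=" + analysis[i + 1][0])
    return feats
```

For the example phrase, the pair list would be [("eliza", "-ren"), ("arduradun", "-ei")], with eliza as the target.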
We extract the following information for each word of the example above. We will assume that eliza (church) is the target word. Words and lemmas are shown in lowercase and the other information in uppercase. As local features we defined different types of unigrams, bigrams, trigrams, and a window of ±4 words. The unigrams were constructed by combining word forms, lemmas, case, number, and determiner mark; we defined four types of unigrams. As for English, we defined bigrams based on word forms, lemmas, and parts-of-speech. But in order to simulate the bigrams and trigrams used for English, we defined different kinds of features. For word forms, we distinguished two cases: using the text string (Big wf0), or using the tags from the analysis (Big wf1). The word form bigrams for the example are shown below.</Paragraph> <Paragraph position="3"> In the case of the feature type &quot;Big wf1&quot;, the information is split into three features. Trigrams are built similarly, by combining the information from three consecutive words. We also used as local features all the content words in a window of ±4 words around the target. Finally, as global features we took all the content lemmas appearing in the context, which consists of the target sentence plus the two preceding and the two following sentences.</Paragraph> <Paragraph position="4"> One case that is difficult to model in Basque is ellipsis. For example, the word &quot;elizakoa&quot; means &quot;the one from the church&quot;. We were able to extract this information from our analyzer, and we represented it in the features using a special symbol in place of the elliptic word.</Paragraph> </Section> </Section> <Section position="4" start_page="3" end_page="3" type="metho"> <SectionTitle> 4 Experiments on training data </SectionTitle> <Paragraph position="0"> The algorithms that we applied were first tested on the Senseval-2 lexical sample task for English. The best versions were then evaluated by 10-fold cross-validation on the Senseval-3 data, both for Basque and English. 
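The 10-fold cross-validation setup used for evaluation and tuning can be sketched generically as follows. This is not the authors' code: `train_and_score` is a hypothetical stand-in for training any of the systems on nine folds and returning its recall on the held-out fold.

```python
def kfold_indices(n, k=10):
    """Deal the indices 0..n-1 round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds

def tune(examples, candidates, train_and_score, k=10):
    """Return the candidate parameter value with the best mean held-out score."""
    folds = kfold_indices(len(examples), k)
    best, best_score = None, -1.0
    for value in candidates:
        scores = []
        for fold in folds:
            held_out = [examples[i] for i in fold]
            in_fold = set(fold)
            train = [e for i, e in enumerate(examples) if i not in in_fold]
            scores.append(train_and_score(train, held_out, value))
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best, best_score = value, mean
    return best
```

The same loop serves both purposes mentioned in the text: scoring a fixed system, or sweeping a parameter such as the SVM soft margin.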
We also used the training data in cross-validation to tune the parameters, such as the smoothed frequencies or the soft margin for SVM. In this section we first describe the parameters of each method (including the smoothing procedure), and then the cross-validation results on the Senseval-3 training data.</Paragraph> <Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.1 Methods and Parameters </SectionTitle> <Paragraph position="0"> DL: On Senseval-2 data, we observed that DL significantly improved its performance with a smoothing technique based on (Yarowsky, 1995a). For our implementation, the smoothed probabilities were obtained by grouping the observations by raw frequencies and feature types. As this method seems sensitive to the feature types and the amount of examples, we tested three DL versions.</Paragraph> </Section> </Section> <Section position="5" start_page="3" end_page="3" type="metho"> <SectionTitle> 3 DL versions </SectionTitle> <Paragraph position="0"> The three versions are: DL smooth (using smoothed probabilities), DL fixed (replacing 0 counts with 0.1), and DL discard (discarding features appearing with only one sense).</Paragraph> <Paragraph position="1"> NB: We applied a simple smoothing method presented in (Ng, 1997), where zero counts are replaced by the probability of the given sense divided by the number of examples.</Paragraph> <Paragraph position="2"> V: The same smoothing method used for NB was applied for the vectors. For Basque, two versions were tested: as the Basque parser can return ambiguous analyses, partial weights are assigned to the features in the context, and we can choose to use these partial weights (p) or assign the full weight to all features (f).</Paragraph> <Paragraph position="3"> SVM: No smoothing was applied. 
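The NB smoothing just described, where a zero count is replaced by the probability of the sense divided by the number of examples, can be sketched as follows. This is a rough illustration under our own assumptions about the data layout, not the submitted implementation.

```python
import math

def nb_train(examples):
    """examples: list of (feature_list, sense) pairs."""
    sense_count, feat_count = {}, {}
    for feats, sense in examples:
        sense_count[sense] = sense_count.get(sense, 0) + 1
        for f in feats:
            feat_count[(sense, f)] = feat_count.get((sense, f), 0) + 1
    return sense_count, feat_count, len(examples)

def nb_classify(model, feats):
    sense_count, feat_count, n = model
    def log_score(sense):
        prior = sense_count[sense] / n
        score = math.log(prior)
        for f in feats:
            c = feat_count.get((sense, f), 0)
            # zero count: fall back to Pr(sense) / number of examples
            p = c / sense_count[sense] if c else prior / n
            score += math.log(p)
        return score
    return max(sense_count, key=log_score)
```

Log probabilities are summed instead of multiplying raw probabilities, which avoids underflow but selects the same sense.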
We estimated the soft margin for each word using a greedy process in cross-validation on the training data.</Paragraph> <Paragraph position="4"> Combination: Single voting was used, where each system voted for its best-ranked sense and the most voted sense was chosen.</Paragraph> <Paragraph position="5"> More sophisticated schemes, like ranked voting, were tried on Senseval-2 data, but the results did not improve. We tested combinations of the 4 algorithms, leaving one out, and the two best. The best results were obtained combining SVM, vector, and DL smooth. (Table caption: results by cross-validation, best recall in bold. Only vector(f) was used for combination.)</Paragraph> <Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.2 Results on English Training Data </SectionTitle> <Paragraph position="0"> The results using cross-validation on the Senseval-3 data are shown in Table 1 for single systems, and in Table 2 for combined methods.</Paragraph> <Paragraph position="1"> All the algorithms have full coverage (for English and Basque); therefore the recall and the precision are the same. The most frequent sense (MFS) baseline is also provided, and it is easily beaten by all the algorithms.</Paragraph> <Paragraph position="2"> We note that these figures are consistent with the performance we observed on the Senseval-2 data, where the vector method is the best performing single system and the best combination is SVM-vector-DL smooth. There is a small gain when combining 3 systems, which we expected would be higher. We submitted the best single system and the best combination for this task.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.3 Results on Basque Training Data </SectionTitle> <Paragraph position="0"> The performance on the Senseval-3 Basque training data is given in Table 1 for single systems, and in Table 2 for combined methods. 
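The single-voting combination scheme described in Section 4.1 amounts to a majority vote over each system's best-ranked sense; a minimal sketch:

```python
from collections import Counter

def single_vote(predictions):
    """predictions: one best-ranked sense per system, in a fixed system order.
    Returns the most voted sense; Counter.most_common breaks ties by first
    insertion order, so on a tie the sense seen first in the list wins."""
    return Counter(predictions).most_common(1)[0][0]
```

A ranked-voting variant would instead give each system's lower-ranked senses partial weight; as the text notes, that did not improve results.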
In this case, the vector method and DL smooth obtain lower performance relative to the other methods. This may be due to the type of features used, which have not been tested as extensively as those for English; in fact, some features may contribute mostly noise.</Paragraph> <Paragraph position="1"> Also, the domain tag of the examples, which could provide useful information, was not used.</Paragraph> <Paragraph position="2"> There is no improvement when combining different systems, and the result of the combination of 4 systems is unusually high compared to the English experiments. We also submitted two systems for this task: the best single method in cross-validation (SVM), and the best 3-method combination (SVM-vector-NB).</Paragraph> </Section> </Section> </Paper>