<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0602">
  <Title>Unsupervised Learning of Morphology Using a Novel Directed Search Algorithm: Taking the First Step</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4 Experiment and Evaluation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Experiment
</SectionTitle>
      <Paragraph position="0"> We tested three unsupervised morphology learning systems on word lists of various sizes from English and Polish corpora. For English we used set A of the Hansard corpus, which is a parallel English and French corpus of proceedings of the Canadian Parliament. We were unable to find a standard corpus for Polish and developed one from online sources.</Paragraph>
      <Paragraph position="1"> The sources for the Polish corpus were older texts, and thus our results correspond to a slightly antiquated form of the language. We compared our directed search system, which consists of the probability model described in Section 2 and the directed search described in Section 3, with Goldsmith's MDL algorithm, otherwise known as Linguistica (see footnote 1), and with our previous system (2001), henceforth referred to as the Hill Climbing Search system. The results were then evaluated by measuring the accuracy of the stem relations identified. We extracted input lexicons from each corpus, excluding words containing non-alphabetic characters.</Paragraph>
      <Paragraph position="2"> The 100 most common words in each corpus were also excluded, since these words tend to be function words and are not very informative for morphology.</Paragraph>
      <Paragraph position="3"> Including the 100 most common words does not significantly alter the results presented. The systems were run on the 500, 1,000, 2,000, 4,000, and 8,000 most common remaining words. The experiments in English were also conducted on the 16,000 most common words from the Hansard corpus.</Paragraph>
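The lexicon extraction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_lexicon` and its parameters are hypothetical, tokens are assumed to be already case-normalized, and the 100 most common words are assumed to be ranked after the non-alphabetic filter (the text does not state the order).

```python
from collections import Counter

def build_lexicon(tokens, size, skip_top=100):
    """Return the `size` most common alphabetic words, skipping the `skip_top` most common."""
    # Assumption: drop any token containing a non-alphabetic character first.
    counts = Counter(t for t in tokens if t.isalpha())
    # Rank the remaining words from most to least frequent.
    ranked = [word for word, _ in counts.most_common()]
    # Skip the most common words (mostly function words), then keep `size` words.
    return ranked[skip_top:skip_top + size]

# Toy usage with a tiny token stream and skip_top=2 instead of 100.
tokens = ["the"] * 5 + ["a"] * 4 + ["cat"] * 3 + ["dog"] * 2 + ["x1"] * 10 + ["bird"]
lexicon = build_lexicon(tokens, size=2, skip_top=2)
```

Calling the function with `size` set to 500, 1000, 2000, 4000, and 8000 would give nested lexicons like those used in the experiments.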
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Evaluation Metrics
</SectionTitle>
      <Paragraph position="0"> Ideally, we would like to be able to specify the correct morphological break for each of the words in the input; however, morphology is laced with ambiguity, and we believe this to be an inappropriate method for this task. Footnote 1: A demo version available on the web, http://humanities.uchicago.edu/faculty/goldsmith/, was used for these experiments. Word-list corpus mode and the method A suffix detection were used. All other parameters were left at their default values.</Paragraph>
      <Paragraph position="1"> For example, it is unclear where the break in the word &quot;location&quot; should be placed. It seems that the stem &quot;locate&quot; is combined with the suffix &quot;tion&quot;, but in terms of simple concatenation it is unclear whether the break should be placed before or after the &quot;t&quot;. When &quot;locate&quot; is combined with the suffix &quot;s&quot;, simple concatenation seems to work fine, though the resulting stem differs from the one found for &quot;location&quot;, and the suffix &quot;es&quot; could also be argued for. One solution is to develop an evaluation technique that incorporates adjustment or spelling change rules, such as the one that deletes the &quot;e&quot; in &quot;locate&quot; when combining with &quot;tion&quot;.</Paragraph>
      <Paragraph position="2"> None of the systems being evaluated attempts to learn adjustment rules, and thus it would be difficult to analyze them using such a measure. To address this problem we have developed a new measure of performance, which does not specify the exact morphological split of a word. We measure the accuracy of the predicted stems by examining whether pairs of words that are morphologically related, in the sense of having the same immediate stem, are predicted as having the same stem. The actual break point for the stems is not evaluated, only whether the words are predicted as having the same stem. We are working on a similar measure for suffix identification, which measures whether pairs that have the same suffix are found as having the same suffix, regardless of the actual form of the suffix predicted. Two words are related if they share the same immediate stem. For example, the words &quot;building&quot;, &quot;build&quot;, and &quot;builds&quot; are related since they all have &quot;build&quot; as a stem, just as &quot;building&quot; and &quot;buildings&quot; are related since they both have &quot;building&quot; as a stem. The two words &quot;buildings&quot; and &quot;build&quot; are not directly related, since the former has &quot;building&quot; as a stem, while &quot;build&quot; is its own stem. Irregular forms of words are also considered to be related, even though such relations would be very difficult to detect with a simple concatenation model.</Paragraph>
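The notion of relatedness above can be illustrated with a short sketch. It assumes, as one reading of the "build" example, that every word is mapped to a single immediate stem and that a word which serves as a stem also belongs to the group it heads. The function name `stem_relations` and the dictionary layout are hypothetical.

```python
from itertools import combinations

def stem_relations(immediate_stem):
    """Return the set of related word pairs given a word -> immediate stem mapping."""
    groups = {}
    for word, stem in immediate_stem.items():
        groups.setdefault(stem, set()).add(word)
        # Assumption: a stem that is itself a word in the lexicon joins its own group.
        if stem in immediate_stem:
            groups[stem].add(stem)
    pairs = set()
    for members in groups.values():
        # Store each unordered pair once, as an alphabetically sorted tuple.
        for first, second in combinations(sorted(members), 2):
            pairs.add((first, second))
    return pairs

gold = stem_relations({
    "build": "build",
    "builds": "build",
    "building": "build",
    "buildings": "building",
})
```

Under these assumptions `gold` holds four pairs: the three pairs among "build", "builds", and "building", plus the pair ("building", "buildings"). The pair ("build", "buildings") is absent, matching the example in the text.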
      <Paragraph position="3"> We say that a morphological analyzer predicts two words as being related if it attributes the same stem to both words, regardless of what that stem actually is. If an analyzer made a mistake and said both &quot;build&quot; and &quot;building&quot; had the stem &quot;bu&quot;, we would still give it credit for finding that the two are related, though this analysis would be penalized by the suffix identification measure. The stem relation precision measures how many of the relations predicted by the system were correct, while the recall measures how many of the relations present in the data were found. Stem relation fscore is an unbiased combination of precision and recall that favors equal scores.</Paragraph>
      <Paragraph position="4"> The correct number of stem relations for each lexicon size in English and Polish is shown in Table 1.</Paragraph>
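A minimal sketch of the stem relation scores follows. The formula for fscore is not given in this section, so the code assumes the standard balanced F-measure (the harmonic mean of precision and recall), which fits the description of an unbiased combination that favors equal scores. The function name and the representation of relations as sets of sorted word pairs are hypothetical.

```python
def stem_relation_scores(predicted, gold):
    """Return (precision, recall, fscore) for two sets of related word pairs."""
    correct = len(predicted.intersection(gold))
    # Precision: share of predicted relations that are correct.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall: share of true relations that were predicted.
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    # Assumption: fscore is the harmonic mean of precision and recall.
    fscore = 2 * precision * recall / total if total else 0.0
    return precision, recall, fscore

gold = {
    ("build", "building"),
    ("build", "builds"),
    ("building", "builds"),
    ("building", "buildings"),
}
# Pairs an analyzer might predict by assigning the same stem to both words.
predicted = {("build", "building"), ("build", "builds"), ("bull", "bully")}
precision, recall, fscore = stem_relation_scores(predicted, gold)
```

In this toy case two of the three predicted pairs are correct and two of the four true pairs are found. A system that predicts no relations at all receives zero for all three scores by the convention chosen here.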
      <Paragraph position="5"> Because Polish has a richer morphology than English, the number of relations in Polish is significantly higher than the number of relations in English at every lexicon size.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Results
</SectionTitle>
      <Paragraph position="0"> The results from the experiments are shown in Figures 1-3. All graphs use a log scale for the corpus size. Due to software difficulties we were unable to get Linguistica to run on 500, 1,000, and 2,000 words in English. The software ran without difficulties on the larger English datasets and on the Polish data.</Paragraph>
      <Paragraph position="1"> Figure 1 shows the number of different suffixes predicted by each of the algorithms in both English and Polish. The Hill Climbing Search system found a very small number of suffixes in the English data and was unable to find any suffixes, other than the empty suffix, in the Polish data. Our directed search algorithm found a relatively constant number of suffixes across lexicon sizes, and Linguistica found an increasingly large number of suffixes, predicting over 700 different suffixes in the 16,000 word English lexicon. Figure 2 shows the performance of the algorithms on the English input lexicon using the stem relation metric. Figure 3 shows the performance of the algorithms on the Polish input lexicon. The Hill Climbing Search system was unable to learn any morphology on the Polish data sets, and thus has zero precision and recall. The Directed Search maintains a very high precision across lexicon sizes in both languages, whereas the precision of Linguistica decreases considerably at larger lexicon sizes. However, Linguistica shows increasing recall as the lexicon size increases, while our Directed Search shows decreasing recall, though the recall of Linguistica in Polish is consistently lower than that of the Directed Search system. The fscores for the Directed Search and Linguistica in English are very close, and the Directed Search appears to clearly outperform Linguistica in Polish.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Suffixes Stems
</SectionTitle>
      <Paragraph position="0"> Table (Suffixes, Stems): paradigms found by our directed search algorithm when run on 8,000 words of Polish. The first paradigm shown is for the single adjective stem meaning &quot;strange&quot;, with numerous inflections for gender, number and case, as well as one derivational suffix, &quot;-ie&quot;, which changes it into an adverb, &quot;strangely&quot;. The second paradigm is for the nouns &quot;cloud&quot; and &quot;ax&quot;, with various case inflections, and the third paradigm contains the verbs &quot;talk&quot;, &quot;return&quot;, and &quot;sell&quot;. All suffixes in the third paradigm are inflectional, indicating tense and agreement.</Paragraph>
      <Paragraph position="1"> As an additional note, Linguistica was dramatically faster than either our Directed Search or the Hill Climbing Search system. Both of our systems are development-oriented software and are not as optimized for efficient runtime as Linguistica appears to be.</Paragraph>
      <Paragraph position="2"> Of the three systems, the Hill Climbing Search system has the poorest performance. Its poor performance in Polish is due to a quirk in its search algorithm, which prevents it from hypothesizing stems that are not themselves words. This is not a bug in the software, but a property of the algorithm used. In English this is not a significant difficulty, as most stems are also words, but this is almost never the case in Polish, where almost all stems require some suffix.</Paragraph>
      <Paragraph position="3"> The differences between the performance of Linguistica and our Directed Search system can most easily be seen in the number of suffixes predicted by each algorithm. The number of suffixes predicted by Linguistica grows linearly with the number of words, which in general causes the algorithm to achieve much higher recall at the expense of precision. The Directed Search algorithm maintains a fairly constant number of suffixes, causing it to generally have higher precision at the expense of recall. This is consistent with our goal of creating a conservative system for morphological analysis, where the number of false positives is minimized.</Paragraph>
      <Paragraph position="4"> Most of Linguistica's errors in English resulted from the algorithm mistaking word compounding, such as &quot;breakwater&quot;, for suffixation, namely treating &quot;water&quot; as a productive suffix. While we do think that the word compounding detected by Linguistica is useful, such compounding of words is not generally considered suffixation, and thus should be penalized.</Paragraph>
      <Paragraph position="5"> The Polish language presents special difficulties for both Linguistica and our Directed Search system, due to the highly complex nature of its morphology. There are far fewer spelling change rules and a much higher frequency of suffixes in Polish than in English. In addition, phonology plays a much stronger role in Polish morphology, causing alterations in stems that are difficult to detect using a concatenative framework.</Paragraph>
    </Section>
  </Section>
</Paper>