<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-3002">
  <Title>Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering</Title>
  <Section position="8" start_page="9" end_page="11" type="evalu">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="9" end_page="10" type="sub_section">
      <SectionTitle>
4.1 Corpora
</SectionTitle>
      <Paragraph position="0"> For this study, we chose three corpora: the BNC for English, a corpus of German, and 3 million sentences from a Finnish web corpus (from the same source). Table 1 summarizes some of their characteristics.</Paragraph>
      <Paragraph position="1"> [Table 1: language, number of sentences, number of tokens, and reference tagger per corpus.] Since high coverage is already reached with few words in English, a strategy that assigns only the most frequent words to sensible clusters will take us very far here. In the Finnish case, we can expect a high OOV rate, hampering the performance of strategies that cannot cope well with low-frequency or unseen words. (The English tags are the semi-automatic tags provided by the BNC. Thanks go to www.connexor.com for an academic license; these tags do not include punctuation marks, which are treated separately.)</Paragraph>
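The coverage argument above can be made concrete with a short sketch; the function name and toy corpus are ours, purely for illustration:

```python
from collections import Counter

def coverage(tokens, top_n):
    """Fraction of corpus tokens covered by the top_n most frequent word types."""
    counts = Counter(tokens)
    covered = sum(n for _, n in counts.most_common(top_n))
    return covered / len(tokens)

# Frequent function words give English-like text high coverage with few
# types; rich Finnish morphology spreads tokens over many rare types,
# so the same top_n covers less and the OOV rate stays high.
english_like = "the cat saw the dog and the dog saw the cat".split()
print(round(coverage(english_like, 3), 2))
```

With a fixed lexicon of the 10,000 most frequent words, as used below, this quantity directly determines the OOV rate on a new text.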
    </Section>
    <Section position="2" start_page="10" end_page="10" type="sub_section">
      <SectionTitle>
4.2 Baselines
</SectionTitle>
      <Paragraph position="0"> To put our results in perspective, we computed the following baselines on the same 1,000 randomly chosen sentences that were used for evaluation.</Paragraph>
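The baselines themselves are elided here, but one natural reference point is the perplexity of the gold tag distribution itself, i.e. the score of a degenerate tagger that lumps every word into a single cluster. A minimal sketch under that assumption (function name and toy tags are ours):

```python
import math
from collections import Counter

def tag_perplexity(tags):
    """Perplexity 2**H(tag) of the gold tag distribution: the score a
    tagger putting every word into one single cluster would receive."""
    counts = Counter(tags)
    n = len(tags)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2 ** entropy

# Uniform tags are the worst case: perplexity equals the tagset size.
print(tag_perplexity(["N", "V", "D", "P"]))  # prints 4.0
```

Any clustering that carries tag information must score below this one-cluster ceiling to be useful.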
    </Section>
    <Section position="3" start_page="10" end_page="11" type="sub_section">
      <SectionTitle>
4.3 Results
</SectionTitle>
      <Paragraph position="0"> We measured the quality of the resulting taggers for combinations of several substeps, denoted O, OM, OMA, T, TM and TMA below. Figure 2 illustrates the influence of the similarity threshold s for O, OM and OMA for German; the other languages showed similar results. Varying s influences coverage on the 10,000 target words. When clustering very few words, tagging performance on these words reaches a PP as low as 1.25, but the high OOV rate impairs the total performance. Clustering too many words deteriorates the results: most words end up in one big partition. In the medium ranges, higher coverage and lower known-word PP compensate each other; optimal total PPs were observed at target coverages of 4,000 to 8,000 words. Adding ambiguous words worsens the performance on lexicon words, yet improves overall performance, especially for high thresholds.</Paragraph>
      <Paragraph position="1"> For all further experiments we fixed the threshold in a way that partitioning 1 consisted of 5,000 words, so only half of the top 10,000 words are considered unambiguous. At this value, we found the best performance averaged over all corpora.</Paragraph>
      <Paragraph position="2"> [Figure 2: Influence of threshold s on tagger performance: cluster-conditional tag perplexity PP as a function of target word coverage.]</Paragraph>
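The cluster-conditional tag perplexity PP used throughout can be read as 2 to the power of the conditional entropy H(tag | cluster) of the gold tags given the induced cluster labels; a self-contained sketch under that standard reading (names are ours):

```python
import math
from collections import Counter

def cluster_conditional_pp(pairs):
    """PP = 2**H(tag | cluster) over (cluster_id, gold_tag) pairs,
    one pair per evaluated token. PP is 1.0 exactly when every
    cluster maps to a single gold tag."""
    pairs = list(pairs)
    joint = Counter(pairs)                      # (cluster, tag) counts
    per_cluster = Counter(c for c, _ in pairs)  # cluster marginals
    n = len(pairs)
    h = -sum((nct / n) * math.log2(nct / per_cluster[c])
             for (c, _), nct in joint.items())
    return 2 ** h

# A pure clustering scores the ideal PP of 1.0 ...
print(cluster_conditional_pp([(1, "N"), (1, "N"), (2, "V")]))  # 1.0
# ... while a cluster mixing two tags evenly scores 2.0.
print(cluster_conditional_pp([(1, "N"), (1, "V")]))  # 2.0
```

Lower is better, and splitting one tag across several clusters is not penalized, which is why lowering the number of tags (as with strategy T below) matters separately from PP.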
      <Paragraph position="3">  oov% is the fraction of non-lexicon words.</Paragraph>
      <Paragraph position="4"> Overall results are presented in table 3. The combined strategy TMA reaches the lowest PP for all languages. The morphology extension (M) always improves the OOV scores. Adding ambiguous words (A) hurts the lexicon performance but greatly reduces the OOV rate, which in turn leads to better overall performance. Combining both partitionings (T) does not always decrease the total PP by much, but lowers the number of tags significantly. The Finnish figures are generally worse than those for the other languages, in line with its higher baselines.</Paragraph>
      <Paragraph position="5"> The high OOV perplexities for English in experiments TM and TMA can be explained as follows: the smaller the OOV rate gets, the more likely it is that the corresponding words were also OOV for the gold-standard tagger. A remedy would be to evaluate on hand-tagged data.</Paragraph>
      <Paragraph position="6"> Differences between languages are most obvious when comparing OMA and TM: whereas for English it pays off much more to add ambiguous words than to merge the two partitionings, it is the other way around in the German and Finnish experiments.</Paragraph>
      <Paragraph position="7"> To wrap up: all steps undertaken improve the performance, yet the strength of their influence varies. As a flavour of our system's output, consider the example in table 4, tagged by our English TMA model: as in the introductory example, &quot;saw&quot; is disambiguated correctly.</Paragraph>
      <Paragraph position="9">
word    cluster ID   cluster members (size)
saw     2            past tense verbs (3818)
the     73           a, an, the (3)
man     1            nouns (17418)
with    13           prepositions (143)
a       73           a, an, the (3)
saw     1            nouns (17418)
.       116          . ! ? (3)

We compare our results to Freitag (2004), as most other works use evaluation techniques that only indirectly measure what we try to optimize here. Unfortunately, Freitag does not provide a total PP score for his 200 tags, and he experiments with a hand-tagged, clean English corpus to which we did not have access (the Penn Treebank). Freitag reports a PP for known words of 1.57 for the top 5,000 words (91% corpus coverage; baseline 1 at 23.6), and a PP for unknown words, without a morphological extension, of 4.8. Using morphological features, the unknown-word PP is lowered to 4.0. When augmenting the lexicon with low-frequency words via their distributional characteristics, a PP as low as 2.9 is obtained for the remaining 9% of tokens. His methodology, however, does not allow for class ambiguity in the lexicon, and the low number of OOV words is handled by a Hidden Markov Model.</Paragraph>
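The known-word versus OOV split reported above (and in the comparison to Freitag) amounts to evaluating the same cluster-conditional perplexity separately on lexicon and non-lexicon tokens. A sketch with illustrative names, not the paper's actual evaluation code:

```python
import math
from collections import Counter

def pp(pairs):
    """2**H(tag | cluster) over (cluster_id, gold_tag) pairs."""
    pairs = list(pairs)
    joint = Counter(pairs)
    per_cluster = Counter(c for c, _ in pairs)
    n = len(pairs)
    h = -sum((k / n) * math.log2(k / per_cluster[c])
             for (c, _), k in joint.items())
    return 2 ** h

def split_scores(tagged, lexicon):
    """tagged: (word, cluster, gold_tag) triples. Returns known-word PP,
    OOV PP and the OOV rate, mirroring the per-column scores of table 3."""
    known = [(c, t) for w, c, t in tagged if w in lexicon]
    oov = [(c, t) for w, c, t in tagged if w not in lexicon]
    return pp(known), pp(oov), len(oov) / len(tagged)

# Toy run: three lexicon words tagged consistently, one OOV word.
toy = [("the", 73, "DET"), ("saw", 2, "VBD"),
       ("man", 1, "NN"), ("zyx", 99, "NN")]
print(split_scores(toy, {"the", "saw", "man"}))  # (1.0, 1.0, 0.25)
```

Shrinking the lexicon improves the known-word PP but inflates the OOV rate, which is exactly the trade-off visible in Figure 2.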
    </Section>
  </Section>
</Paper>