<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2100">
  <Title>Learning and Natural Language Processing: A</Title>
  <Section position="8" start_page="783" end_page="784" type="evalu">
    <SectionTitle>
6 Results and Discussion
</SectionTitle>
    <Paragraph position="0"> The average accuracy of the learning based (LB) tagger after 4-fold cross validation is 93.45%. To 10Most of the unknown words are proper nouns, which cannot be stored in the lexicon extensively. So, it also helps in named-entity detection.</Paragraph>
    <Paragraph position="1">  the best of our knowledge no comparable results have been reported so far for Hindi.</Paragraph>
    <Paragraph position="2"> From Table 1, we can see that the disambiguation module brings up the accuracy of simple lexicon lookup based approach by around 25% (LLBD). The overall average accuracy is also brought up by around 20% by augmenting the morphology-driven (MD) tagger by a disambiguation module; hence justifying our belief that a disambiguation module over a morphology driven approach yields better results.</Paragraph>
    <Paragraph position="3"> One interesting observation is the performance of the tagger on individual POS categories. Figure 3 shows clearly that the per POS accuracies of the LB tagger highly exceeds those of the MD and BL tagger for most categories. This means that the disambiguation module correctly disambiguates and correctly identi es the unknown words too. The accuracy on unknown words, as earlier shown in Figure 2, is very high at 92.08%. The percentage of unknown words in the test corpora is 0.013. It seems independent of the size of training corpus because the corpora is unbalanced having most of the unknowns as proper nouns. The rules are formed on this bias, and hence the application of these rules assigns PPN tag to an unknown which is mostly the case.</Paragraph>
    <Paragraph position="4"> From Figure 3 again we see that the accuracy on some categories remains very low even after disambiguation. This calls for some detailed failure analysis. By looking at the categories having low accuracy, such as pronoun, intensi er, demonstratives and verb copula, we nd that all of them are highly ambiguous and, almost invariably, very rare in the corpus. Also, most of them are hard to disambiguate without any semantic information.</Paragraph>
  </Section>
class="xml-element"></Paper>