File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/05/w05-0701_concl.xml

Size: 1,980 bytes

Last Modified: 2025-10-06 13:54:56

<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0701">
  <Title>part-of-speech tagging of Arabic</Title>
  <Section position="8" start_page="7" end_page="7" type="concl">
    <SectionTitle>
7 Conclusions
</SectionTitle>
    <Paragraph position="0"> We investigated the application of memory-based learning (k-nearest neighbor classification) to morphological analysis and PoS tagging of unvoweled written Arabic, using the ATB1 corpus as training and testing material. The morphological analyzer attains an F-score of 0.32 on unknown words when predicting all aspects of the analysis, including vocalization (a partly unpredictable task, especially when no context is available). The PoS tagger attains an accuracy of about 74% on unknown words and 92% on all words (known and unknown). Combining the two, by selecting from the set of generated analyses the subset whose PoS matches the tag predicted by the tagger, yielded a recall of 0.90 for the contextually appropriate analysis on test words, but a low precision of 0.64, largely caused by overgeneration of invalid analyses. We make two final remarks. First, memory-based morphological analysis of Arabic words appears feasible, but its main limitation is its inherent inability to identify the appropriate stem of unknown words from the ambiguous root-form input; our current method simply overgenerates vocalizations, keeping recall high at the cost of low precision. Second, memory-based PoS tagging of written Arabic text also appears feasible; the observed performance is roughly comparable to that observed for other languages. The PoS tagging task as we define it is deliberately separated from the problem of vocalization, which is in effect the problem of stem identification. We therefore consider the automatic identification of stems, as a component of full morpho-syntactic analysis of written Arabic, an important issue for future research.</Paragraph>
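The combination step summarized in the conclusions (keeping only the generated analyses whose PoS agrees with the tagger's prediction) and the precision/recall trade-off it produces can be sketched as follows. This is a minimal Python sketch, not the authors' implementation: the Analysis record, the function names select_by_pos, precision_recall and f_score, and the micro-averaged scoring (precision over all retained analyses, recall over words) are assumptions made for illustration, and the romanized toy forms are invented rather than taken from ATB1.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    # Hypothetical representation of one analysis: a vocalized form plus its PoS tag.
    form: str
    pos: str


def select_by_pos(analyses: list[Analysis], predicted_pos: str) -> list[Analysis]:
    """Keep only the generated analyses whose PoS matches the tagger's prediction."""
    return [a for a in analyses if a.pos == predicted_pos]


def precision_recall(selected: list[list[Analysis]], gold: list[Analysis]) -> tuple[float, float]:
    """Micro-averaged scores over test words (an assumed scoring scheme).

    Precision: fraction of all retained analyses that are the gold analysis.
    Recall: fraction of words whose gold analysis survived the filter.
    """
    retained = sum(len(s) for s in selected)
    hits = sum(1 for s, g in zip(selected, gold) if g in s)
    precision = hits / retained if retained else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data (made up for illustration): per word, the generated analyses,
    # the tagger's PoS prediction, and the gold (contextually correct) analysis.
    words = [
        ([Analysis("kataba", "VERB"), Analysis("kutiba", "VERB"), Analysis("kutub", "NOUN")],
         "VERB", Analysis("kataba", "VERB")),
        ([Analysis("qalam", "NOUN"), Analysis("qallama", "VERB")],
         "NOUN", Analysis("qalam", "NOUN")),
        ([Analysis("ilm", "NOUN"), Analysis("alima", "VERB")],
         "NOUN", Analysis("alima", "VERB")),  # tagger error: the gold analysis is filtered out
    ]
    selected = [select_by_pos(cands, tag) for cands, tag, _ in words]
    gold = [g for _, _, g in words]
    p, r = precision_recall(selected, gold)
    print(f"P={p:.2f} R={r:.2f} F={f_score(p, r):.2f}")  # P=0.50 R=0.67 F=0.57
```

Keeping every analysis with the matching tag preserves the gold analysis whenever the tagger is right, which keeps recall high; every extra vocalization that shares the predicted tag counts against precision, mirroring the overgeneration effect described above.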
  </Section>
</Paper>