File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/04/n04-4038_concl.xml

Size: 1,335 bytes

Last Modified: 2025-10-06 13:54:05

<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4038">
  <Title>Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusions &amp; Future Directions
</SectionTitle>
    <Paragraph position="0"> We have presented a machine-learning approach that uses SVMs to annotate Arabic text automatically at three levels: tokenization at the morphological level, POS tagging at the lexical level, and base phrase (BP) chunking at the syntactic level. The technique is language independent and highly accurate. It achieves an F$_{\beta=1}$ score of 99.12 on tokenization, 95.49% accuracy on POS tagging, and an F$_{\beta=1}$ score of 92.08 on BP chunking. To the best of our knowledge, these are the first results reported for these tasks in Arabic natural language processing.</Paragraph>
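The tokenization and chunking results are reported as F$_{\beta=1}$ scores. This is the harmonic mean of precision and recall. The Python sketch below shows one common way to compute such a score for chunking. It assumes CoNLL-style exact matching, where a predicted chunk counts as correct only if its boundaries and its label all agree with a gold chunk. The `(start, end, label)` span format, the function names, and the example labels are illustrative assumptions. They are not the authors' evaluation code.

```python
from typing import Iterable, Set, Tuple

# Hypothetical chunk representation: (start, end, label), end exclusive.
Chunk = Tuple[int, int, str]


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta=1 weights them equally."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0  # no correct chunks at all
    return (1 + beta ** 2) * precision * recall / denom


def chunk_scores(gold: Iterable[Chunk], predicted: Iterable[Chunk]) -> Tuple[float, float, float]:
    """Exact-match evaluation (an assumption): a predicted chunk is correct
    only if its start, end and label all match some gold chunk."""
    gold_set: Set[Chunk] = set(gold)
    pred_set: Set[Chunk] = set(predicted)
    correct = len(gold_set.intersection(pred_set))
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall, f_beta(precision, recall)
```

Consider gold chunks (0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP") and predictions (0, 2, "NP"), (2, 3, "VP"), (3, 4, "NP"), (4, 5, "NP"). Two predictions match exactly, so precision is 2/4 and recall is 2/3. The resulting F$_{\beta=1}$ is 4/7, or about 0.571. The incorrectly split final NP is therefore penalized in both precision and recall.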
    <Paragraph position="1"> We are currently working to improve the performance of the systems by using additional features, a wider context, and more data created semi-automatically from a large unannotated Arabic corpus. In addition, we are extending the approach to semantic chunking by hand-labeling part of the Arabic TreeBank with arguments or semantic roles for training.</Paragraph>
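One planned improvement is a wider context. In window-based SVM taggers, the context of a position is usually the set of features drawn from its neighboring tokens, so widening it means enlarging that window. The Python sketch below illustrates the idea under stated assumptions. It extracts only token-identity features from a symmetric window. The names `window_features` and `PAD` and the window sizes are hypothetical and are not the authors' feature set.

```python
from typing import Dict, List

PAD = "__PAD__"  # hypothetical boundary symbol for positions outside the sentence


def window_features(tokens: List[str], i: int, size: int = 2) -> Dict[str, str]:
    """Return the tokens in a symmetric window of +/- size around position i.

    Padding the sentence on both sides means every offset maps to a valid
    index, so no explicit bounds check is needed.
    """
    padded = [PAD] * size + tokens + [PAD] * size
    center = i + size  # index of tokens[i] inside the padded list
    # One feature per offset, keyed by its relative position, e.g. "w[-1]".
    return {f"w[{off}]": padded[center + off] for off in range(-size, size + 1)}
```

Raising `size` from 2 to 3 grows each position's feature set from five tokens to seven. This gives the classifier more evidence, but it also increases sparsity. In practice, these dictionaries would typically be encoded, for example one-hot, before they are passed to an SVM.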
  </Section>
</Paper>