File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/06/p06-1084_concl.xml

Size: 1,840 bytes

Last Modified: 2025-10-06 13:55:19

<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1084">
  <Title>Sydney, July 2006. c(c)2006 Association for Computational Linguistics An Unsupervised Morpheme-Based HMM for Hebrew Morphological Disambiguation</Title>
  <Section position="7" start_page="670" end_page="671" type="concl">
    <SectionTitle>
5 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> In this work, we have introduced a new text encoding method that captures rules of word formation in a language with affixational morphology such as Hebrew. This text encoding method allows us to  learn in parallel segmentation and tagging rules in an unsupervised manner, despite the high ambiguity level of the morphological data (average number of 2.4 analyses per word). Reported results on a large scale corpus (6M words) with fully unsupervised learning are 92.32% for PoS tagging and 88.5% for full morphological disambiguation.</Paragraph>
    <Paragraph position="1"> In this work, we used the backoff smoothing method, suggested by Thede and Harper (1999), with an extension of additive smoothing (Chen, 1996, 2.2.1) for the lexical probabilities (B and B2 matrices). To complete this study, we are currently investigating several smoothing techniques (Chen, 1996), in order to check whether the morpheme model is critical for the data sparseness problem, or whether it can be handled with smoothing over a word model.</Paragraph>
    <Paragraph position="2"> We are currently investigating two major methods to improve our results: first, we have started gathering a larger corpus of manually tagged text and plan to perform semi-supervised learning on a corpus of 100K manually tagged words. Second, we plan to improve the unknown word model, such as integrating it with named entity recognition system (Ben-Mordechai, 2005).</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML