<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0504">
  <Title>An HMM Approach to Vowel Restoration in Arabic and Hebrew</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 A Statistical Approach
</SectionTitle>
    <Paragraph position="0"> Identifying contextual relationships is crucial for resolving lexical ambiguities in both Hebrew and Arabic, and native speakers rely on context routinely. Hidden Markov Models have traditionally been used to capture contextual dependencies between words (Charniak 1995). We demonstrate the utility of Hidden Markov Models for the restoration of vowels in Hebrew and Arabic. As we show, our model is simple to implement. It consists of hidden states that correspond to diacritized words from the training corpus, and each hidden state has a single emission leading to an undiacritized (non-voweled) word observation.</Paragraph>
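    <Paragraph> As an illustration only, the model described above can be sketched in a few lines of Python. The sketch rests on assumptions that are not taken from the paper: transitions are bigram counts with add-one smoothing, Latin vowels stand in for diacritics, and the names strip_vowels, train, log_transition, restore and START are hypothetical. Because each hidden state emits exactly one bare form, the emission probability is 1 and only the transition scores enter the Viterbi search.</Paragraph>

```python
import math
from collections import Counter, defaultdict

START = "__START__"  # hypothetical sentence-boundary state


def strip_vowels(word):
    """Toy stand-in for removing diacritics: drop Latin vowels."""
    return "".join(ch for ch in word if ch not in "aeiou")


def train(sentences):
    """Collect bigram counts over voweled words and the states seen for each bare form."""
    bigrams = defaultdict(Counter)
    candidates = defaultdict(set)
    for sentence in sentences:
        prev = START
        for word in sentence:
            bigrams[prev][word] += 1
            candidates[strip_vowels(word)].add(word)
            prev = word
    return bigrams, candidates


def log_transition(bigrams, vocab_size, prev, word):
    """Add-one smoothed log P(word | prev); the smoothing scheme is an assumption."""
    row = bigrams.get(prev, Counter())
    return math.log((row[word] + 1) / (sum(row.values()) + vocab_size))


def restore(observed, bigrams, candidates):
    """Viterbi search for the most likely voweled sequence given bare words."""
    if not observed:
        return []
    vocab_size = sum(len(states) for states in candidates.values())
    # best maps each current state to (log score, best path ending in that state)
    best = {START: (0.0, [])}
    for bare in observed:
        # A bare form never seen in training is passed through unchanged.
        states = sorted(candidates.get(bare, {bare}))
        step = {}
        for state in states:
            score, path = max(
                (prev_score + log_transition(bigrams, vocab_size, prev, state), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            step[state] = (score, path + [state])
        best = step
    return max(best.values())[1]


# Toy usage: "man" and "mean" share the bare form "mn"; context picks between them.
bigrams, candidates = train([
    ["big", "man", "ran"],
    ["very", "mean", "dog"],
    ["big", "man", "sat"],
])
print(restore(["bg", "mn", "rn"], bigrams, candidates))   # ['big', 'man', 'ran']
print(restore(["vry", "mn", "dg"], bigrams, candidates))  # ['very', 'mean', 'dog']
```

    <Paragraph> For real Hebrew or Arabic text, the toy strip_vowels function would be replaced by one that removes the actual diacritic marks; the rest of the sketch would stay the same.</Paragraph>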
    <Paragraph position="1"> Our model does not require any handcrafted linguistic knowledge and is robust in the sense that it generalizes well to other languages. The rest of this paper is organized as follows: in Section 3, we describe the corpora we used in our experiments. Sections 4 and 5 describe the models we designed as well as our experimental setup for evaluating them.</Paragraph>
    <Paragraph position="2"> Section 6 describes related work in morphological analysis and vowel restoration in Hebrew and in Arabic. Finally, Section 7 discusses future work. (Footnote 1: In the literature on Hebrew morphological analysis, this is often referred to as a pointed word.)</Paragraph>
  </Section>
</Paper>