<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-2012">
  <Title>Arabic Morphology Generation Using a Concatenative Strategy</Title>
  <Section position="2" start_page="0" end_page="86" type="metho">
    <SectionTitle>
CVCVC (C=consonant, V=vowel).
</SectionTitle>
    <Paragraph position="0"> There are 15 triliteral patterns, of which at least 9 are in common use, and 4 much rarer quadriliteral patterns. All these patterns undergo some stem changes with respect to voweling in the 2 tenses (perfect and imperfect), the 2 voices (active and passive), and the 5 moods (indicative, subjunctive, jussive, imperative and energetic). 1 The stem used in the conjugation of the verb may differ depending on the person, number, gender, tense, mood, and the presence of certain root consonants. Stem changes combine with suffixes in the perfect indicative (e.g., katab-naa 'we wrote', kutib-a 'it was written') and the imperative (e.g. uktub-uu 'write', plural), and with both prefixes and suffixes for the imperfect tense in the indicative, subjunctive, and jussive moods (e.g. ya-ktub-na 'they write, feminine plural') and in the energetic mood (e.g. ya-ktub-unna or ya-ktub-un 'he certainly writes'). There are a total of 13 person-number-gender combinations. Distinct prefixes are used in the active and passive voices in the imperfect, although in most cases this results in a change in the written form only if diacritic marks are used. 2</Paragraph>
    <Paragraph position="1"> 1 The jussive is used in specific constructions, for example, negation in the past with the negative particle lam (e.g., lam aktub 'I didn't write'). The energetic expresses corroboration of an action taking place. The indicative is common to both perfect and imperfect tenses, but the subjunctive and the jussive are restricted to the imperfect tense. The imperative has a special form, and the energetic can be derived from either the imperfect or the imperative. 2 Diacritic marks are used in Arabic language textbooks and occasionally in regular texts to resolve ambiguous words (e.g. to mark a passive verb use).</Paragraph>
    <Paragraph position="2"> Most previous computational treatments of Arabic morphology are based on linguistic models that describe Arabic in a non-concatenative way and focus primarily on analysis. Beesley (1991) describes a system that analyzes Arabic words based on Koskenniemi's (1983) two-level morphology. In Beesley (1996) the system is reworked into a finite-state lexical transducer to perform analysis and generation. In two-level systems, the lexical level includes short vowels that are typically not realized on the surface level. Kiraz (1994) presents an analysis of Arabic morphology based on the CV-, moraic-, and affixational models. He introduces a multi-tape two-level model and a formalism where three tapes are used for the lexical level (root, pattern, and vocalization) and one tape for the surface level. In this paper, we propose a computational approach that applies a concatenative treatment to Arabic morphology generation by separating the issue of infixation from other inflectional variations. We are developing an Arabic morphological generator using MORPHE (Leavitt, 1994), a tool for modeling morphology based on discrimination trees and regular expressions. MORPHE is part of a suite of tools developed at the Language Technologies Institute, Carnegie Mellon University, for knowledge-based machine translation. Large systems for MT from English to Spanish, French, German, Portuguese and a prototype for Italian have already been developed. Within this framework, we are exploring English to Arabic translation and Arabic generation for pedagogical purposes. We generate Arabic words including short vowels and diacritic marks, since they are pedagogically useful and can always be stripped before display.</Paragraph>
    <Paragraph position="3"> Our approach seeks to reduce the number of rules for generating morphological variants of Arabic verbs by breaking the problem into two parts. We observe that, with the exception of a few verb types, there is very little interaction between stem changes and the processes of prefixation and suffixation. It is therefore possible to decouple, in large part, the problem of stem changes from that of prefixes and suffixes. The gain is a significant reduction in the number of transformational rules, by as much as a factor of three for certain verb classes. This improves the space efficiency of the system and its maintainability by reducing duplication of rules, and simplifies the rules by isolating different types of changes.</Paragraph>
    <Paragraph position="4"> To illustrate our approach, we focus on a particular type of verbs, termed hollow verbs, and show how we integrate their treatment with that of more regular verbs. We also discuss how the approach can be extended to other classes of verbs and other parts of speech.</Paragraph>
  </Section>
  <Section position="3" start_page="86" end_page="86" type="metho">
    <SectionTitle>
1 Arabic Verbal Morphology
</SectionTitle>
    <Paragraph position="0"> Verb roots in Arabic can be classified as shown in Figure 1. 3 A primary distinction is made between weak and strong verbs. Weak verbs have a weak consonant ('w' or 'y') as one or more of their radicals; strong verbs do not have any weak radicals.</Paragraph>
    <Paragraph position="1"> Strong verbs undergo systematic changes in stem voweling from the perfect to the imperfect. The first radical vowel disappears in the imperfect. Verbs whose middle radical vowel in the perfect is 'a' can change it to 'a' (e.g., qaTa'a 'he cut' -&gt; yaqTa'u 'he cuts'), 4 'i' (e.g., Daraba 'he hit' -&gt; yaDribu 'he hits'), or 'u' (e.g., kataba 'he wrote' -&gt; yaktubu 'he writes') in the imperfect. Verbs whose middle radical vowel in the perfect is 'i' can only change it to 'a' (e.g., shariba 'he drank' -&gt; yashrabu 'he drinks') or 'i' (e.g., Hasiba 'he supposed' -&gt; yaHsibu 'he supposes'). Verbs with middle radical vowel 'u' in the perfect do not change it in the imperfect (e.g., Hasuna 'he was beautiful' -&gt; yaHsunu 'he is beautiful'). For strong verbs, neither perfect nor imperfect stems change with person, gender, or number.</Paragraph>
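    <Paragraph> The vowel alternations described above can be sketched in Python. This is only an illustration under stated assumptions: the function name imperfect_stem and the table ALLOWED are hypothetical, the base form is assumed to be a transliterated CVCVC string, and the imperfect vowel is assumed to be supplied from the lexicon, since the text notes that it is not always predictable from the perfect vowel.</Paragraph>

```python
import re

# Imperfect vowels permitted for each perfect middle-radical vowel,
# as listed in the paragraph above.
ALLOWED = {"a": {"a", "i", "u"}, "i": {"a", "i"}, "u": {"u"}}

# A radical is one or more non-vowel characters, so a digraph such as
# 'sh' in the transcription counts as a single consonant.
CVCVC = re.compile(r"^([^aiu]+)([aiu])([^aiu]+)([aiu])([^aiu]+)$")


def imperfect_stem(perfect_base: str, imperfect_vowel: str) -> str:
    """Return the CCVC imperfect stem for a strong CVCVC perfect base."""
    match = CVCVC.match(perfect_base)
    if match is None:
        raise ValueError(f"not a CVCVC base: {perfect_base!r}")
    c1, _v1, c2, v2, c3 = match.groups()
    if imperfect_vowel not in ALLOWED[v2]:
        raise ValueError(f"perfect {v2!r} cannot become {imperfect_vowel!r}")
    # The first radical vowel is dropped; the middle one is replaced.
    return c1 + c2 + imperfect_vowel + c3


# Third person masculine singular imperfect: prefix 'ya', suffix 'u'.
print("ya" + imperfect_stem("katab", "u") + "u")
```

    <Paragraph> Treating a radical as a run of non-vowel characters is a design choice of this sketch that lets digraphs such as 'sh' in shariba pass through unchanged; a real lexicon would more likely store radicals explicitly.</Paragraph>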
    <Paragraph position="2"> Hollow verbs are those with a weak middle radical. In both perfect and imperfect tenses, the underlying stem is realized by two characteristic allomorphs, one short and one long, whose use depends on the person, number and gender.</Paragraph>
  </Section>
  <Section position="4" start_page="86" end_page="86" type="metho">
    <SectionTitle>
3 Grammars of Arabic are not uniform in their
</SectionTitle>
    <Paragraph position="0"> classification of "hamzated" verbs, verbs containing the glottal stop as one of the radicals (e.g. [sa?al] 'to ask'). Wright (1968) includes them as weak verbs, but Cowan (1964) does not. Hamzated verbs change the written 'seat' of the hamza from 'alif' to 'waaw' or 'yaa?', depending on the phonetic context.</Paragraph>
  </Section>
  <Section position="5" start_page="86" end_page="88" type="metho">
    <SectionTitle>
4 In the Arabic transcription capital letters indicate
</SectionTitle>
    <Paragraph position="0"> emphatic consonants; 'H' is the voiceless pharyngeal fricative; the apostrophe (') is the voiced pharyngeal fricative; '?' is the glottal stop 'hamza'.</Paragraph>
    <Paragraph position="2"> Hollow verbs fall into four classes:
1. Verbs of the pattern CawaC or CawuC (e.g. [Tawul] 'to be long'), where the middle radical is 'w'. Their characteristic is a long 'uu' between the first and last radical in the imperfect. E.g., from the underlying root [zawar]: zaara 'he visited' and yazuuru 'he visits'. Stem allomorphs: Perfect: -zur- and -zaar-; Imperfect: -zur- and -zuur-.
2. Verbs of the pattern CawiC, where the middle radical is 'w'. Their characteristic is a long 'aa' between the first and last radical in the imperfect. E.g., from the underlying root [nawim]: naama 'he slept' and yanaamu 'he sleeps'. Stem allomorphs: Perfect: -nim- and -naam-; Imperfect: -nam- and -naam-.
3. Verbs of the pattern CayaC, where the middle radical is 'y'. Their characteristic is a long 'ii' between the first and last radical in the imperfect. E.g., from the underlying root [baya']: baa'a 'he sold' and yabii'u 'he sells'. Stem allomorphs: Perfect: -bi'- and -baa'-; Imperfect: -bi'- and -bii'-.
4. Verbs of the pattern CayiC, where the middle radical is 'y'. E.g., from the underlying root [hayib]: haaba 'he feared' and yahaabu 'he fears'. Stem allomorphs: Perfect: -hib- and -haab-; Imperfect: -hab- and -haab-.
In the relevant literature (e.g., Beesley, 1998; Kiraz, 1994), verbs belonging to the above classes are all assumed to have the pattern CVCVC. The pattern does not show the verb conjugation class and makes it difficult to predict the type of stem allomorph to use. To avoid these problems, we keep information on the middle radical and vowel in the base form of the verb. In generation, classes 2 and 4 of the verb can be handled as one because they have the same perfect and imperfect stems. 5
5 The only exception is the passive participle. Verbs of classes 1 and 2 behave the same (e.g. Class 1: [zawar] -&gt; mazuwr 'visited'; Class 2: [nawil] -&gt; manuwl 'obtained'), as do verbs of classes 3 and 4 (e.g. Class 3: [baya'] -&gt; mabii' 'sold'; Class 4: [hayib] -&gt; mahiib 'feared').</Paragraph>
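    <Paragraph> The four hollow verb classes can be summarized as a lookup from the middle radical and its following vowel to the four stem vowels. The Python sketch below is illustrative only: the names HOLLOW_VOWELS and hollow_stems are hypothetical, and the base form is assumed to be a transliterated string that keeps the middle radical and vowel, as the text proposes.</Paragraph>

```python
import re

# (perfect short, perfect long, imperfect short, imperfect long) vowels,
# keyed by the middle radical and the vowel that follows it.
HOLLOW_VOWELS = {
    ("w", "a"): ("u", "aa", "u", "uu"),  # class 1, CawaC
    ("w", "u"): ("u", "aa", "u", "uu"),  # class 1, CawuC
    ("w", "i"): ("i", "aa", "a", "aa"),  # class 2, CawiC
    ("y", "a"): ("i", "aa", "i", "ii"),  # class 3, CayaC
    ("y", "i"): ("i", "aa", "a", "aa"),  # class 4, CayiC (same as class 2)
}

# First radical, 'a', weak middle radical, vowel, last radical.
HOLLOW = re.compile(r"^(.+?)a([wy])([aiu])(.+)$")


def hollow_stems(base: str) -> dict:
    """Return the (short, long) stem allomorphs for each tense."""
    match = HOLLOW.match(base)
    if match is None:
        raise ValueError(f"not a hollow base form: {base!r}")
    c1, weak, vowel, c3 = match.groups()
    if (weak, vowel) not in HOLLOW_VOWELS:
        raise ValueError(f"no hollow class for {base!r}")
    p_short, p_long, i_short, i_long = HOLLOW_VOWELS[(weak, vowel)]
    return {
        "perfect": (c1 + p_short + c3, c1 + p_long + c3),
        "imperfect": (c1 + i_short + c3, c1 + i_long + c3),
    }


print(hollow_stems("zawar"))
```

    <Paragraph> Because classes 2 and 4 map to the same tuple, the table makes visible why they can be merged in generation.</Paragraph>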
    <Paragraph position="3">  We describe our approach to modeling strong and hollow verbs below, following a description of the implementation framework.</Paragraph>
  </Section>
  <Section position="6" start_page="88" end_page="88" type="metho">
    <SectionTitle>
2 The MORPHE System
</SectionTitle>
    <Paragraph position="0"> MORPHE (Leavitt, 1994) is a tool that compiles morphological transformation rules into either a word parsing program or a word generation program. 6 In this paper we will focus on the use of MORPHE in generation.</Paragraph>
    <Paragraph position="1"> Input and Output. MORPHE's output is simply a string. Input is a feature structure (FS) which describes the item that MORPHE must transform. An FS is implemented as a recursive Lisp list. Each element of the FS is a feature-value pair (FVP), where the value can be atomic or complex. A complex value is itself an FS. For example, the FS for generating the Arabic zurtu 'I visited' would be:</Paragraph>
    <Paragraph position="3"> The choice of feature names and values, other than ROOT, which identifies the lexical item to be transformed, is entirely up to the user. The FVPs in a FS come from one of two sources.</Paragraph>
    <Paragraph position="4"> Static features, such as CAT (part of speech) and ROOT, come from the syntactic lexicon, which, in addition to the base form of words, can contain morphological and syntactic features. Dynamic features, such as TENSE and NUMBER, are set by MORPHE's caller.</Paragraph>
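    <Paragraph> As a rough illustration of a feature structure built from static and dynamic features, the following Python sketch models an FS as a list of feature-value pairs and prints it in Lisp-like list notation. It is not the feature structure from the original figure: the feature names SUBJ and PERSON, the nesting, and all values are assumptions made for this sketch; only ROOT, CAT, TENSE and NUMBER are named in the text.</Paragraph>

```python
def to_lisp(fs) -> str:
    """Render a list of (feature, value) pairs as a Lisp-style list."""
    parts = []
    for feature, value in fs:
        # A complex value is itself a feature structure.
        rendered = to_lisp(value) if isinstance(value, list) else str(value)
        parts.append(f"({feature} {rendered})")
    return "(" + " ".join(parts) + ")"


# Static features would come from the syntactic lexicon.
static = [("ROOT", "zawar"), ("CAT", "V")]
# Dynamic features would be set by the caller (names here are hypothetical).
dynamic = [("TENSE", "PERF"), ("SUBJ", [("PERSON", 1), ("NUMBER", "SG")])]

fs = static + dynamic
print(to_lisp(fs))
```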
    <Paragraph position="5"> The Morphological Form Hierarchy.</Paragraph>
    <Paragraph position="6"> MORPHE is based on the notion of a morphological form hierarchy (MFH) or tree.</Paragraph>
    <Paragraph position="7"> Each internal node of the tree specifies a piece of the FS that is common to that entire subtree. The root of the tree is a special node that simply binds all subtrees together. The leaf nodes of the tree correspond to distinct morphological forms in the language. Each node in the tree below the root is built by specifying the parent of the node and the conjunction or disjunction of FVPs that define the node. Portions of the Arabic MFH are shown in Figures 2-4.</Paragraph>
    <Paragraph position="8"> Transformational Rules. A rule attached to each leaf node of the MFH effects the desired morphological transformations for that node.</Paragraph>
    <Paragraph position="9"> A rule consists of one or more mutually exclusive clauses. The 'if' part of a clause is a regular expression pattern, which is matched against the value of the feature ROOT (a string). The 'then' part includes one or more operators, applied in the given order. Operators include addition, deletion, and replacement of prefixes, infixes, and suffixes. The output of the transformation is the transformed ROOT string.</Paragraph>
    <Paragraph position="10"> An example of a rule attached to a node in the MFH is given in Section 3.1 below.</Paragraph>
    <Paragraph position="11"> Process Logic. In generation, the MFH acts as a discrimination network. The specified FS is matched against the features defining each subtree until a leaf is reached. At that point, MORPHE first checks in the irregular forms lexicon for an entry indexed by the name of the leaf node (i.e., the MF) and the value of the ROOT feature in the FS. If an irregular form is not found, the transformation rule attached to the leaf node is tried. If no rule is found or none of the clauses of the applicable rule match, MORPHE returns the value of ROOT unchanged.</Paragraph>
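    <Paragraph> The process logic above can be sketched as a small discrimination network in Python. This is a simplified stand-in, not MORPHE itself: the Node class, the node names, and the dummy irregular entry are hypothetical, nodes are defined by a conjunction of FVPs only (disjunction is omitted), and a rule is reduced to a list of regular-expression clauses with a single replacement each.</Paragraph>

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    features: dict                                   # FVPs shared by the subtree
    children: list = field(default_factory=list)
    clauses: list = field(default_factory=list)      # (regex, replacement) on leaves


def find_leaf(node, fs):
    """Descend through the first child whose FVPs all match the FS."""
    if not node.children:
        return node
    for child in node.children:
        if all(fs.get(k) == v for k, v in child.features.items()):
            return find_leaf(child, fs)
    return None


def generate(root, irregular, fs):
    leaf = find_leaf(root, fs)
    if leaf is None:
        return fs["ROOT"]
    # 1. Irregular forms are indexed by leaf name and ROOT value.
    form = irregular.get((leaf.name, fs["ROOT"]))
    if form is not None:
        return form
    # 2. Otherwise the first matching clause of the leaf's rule applies.
    for pattern, replacement in leaf.clauses:
        if re.search(pattern, fs["ROOT"]):
            return re.sub(pattern, replacement, fs["ROOT"], count=1)
    # 3. With no applicable clause, ROOT is returned unchanged.
    return fs["ROOT"]


leaf = Node("perf-short", {"TENSE": "PERF"}, clauses=[(r"aw[au]", "u")])
mfh = Node("root", {}, children=[Node("verb", {"CAT": "V"}, children=[leaf])])

print(generate(mfh, {}, {"CAT": "V", "TENSE": "PERF", "ROOT": "zawar"}))
```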
  </Section>
  <Section position="7" start_page="88" end_page="91" type="metho">
    <SectionTitle>
3 Handling Arabic Verbal Morphology in MORPHE
</SectionTitle>
    <Paragraph position="0"> Figure 2 sketches the basic MFH and the division of the verb subtree into stem changes and prefix/suffix additions. 7 The inflected verb is generated in two steps. MORPHE is first called with the feature CHG set to STEM. The required stem is returned and temporarily substituted for the value of the ROOT feature. 6 MORPHE is written in Common Lisp and the compiled MFH and transformation rules are themselves a set of Common Lisp functions. 7 The use of two parts of the same tree for the two problems is a constraint of MORPHE's The second call to MORPHE, with feature CHG set to PSFIX, adds the necessary prefix and/or suffix and returns the fully inflected verb. to traverse the discrimination tree. The feature PAT is used in conjunction with the ROOT feature to select the appropriate affixes. Knowing the underlying root and its voweling is crucial for the determination of hollow verb stems, as described in Section 1. Knowing the pattern is also important in cases where it is unclear. For example, verbs of pattern CtVCVC insert a 't' after the first radical (e.g. ntaqal 'to move, change location', intransitive). With some consonants as first radicals, in order to facilitate pronunciation, the 't' undergoes a process of assimilation whose effects differ depending on the preceding consonant. For example, the pattern CtVCVC verb from zaHam 'to shove' is zdaHam 'to teem' instead of *ztaHam. It is also difficult to determine from just the string ntaqal whether this is pattern nCVCVC of the verb *taqal (if it existed) or pattern CtVCVC of naqal ('to transport, move', transitive).</Paragraph>
    <Section position="1" start_page="89" end_page="90" type="sub_section">
      <SectionTitle>
3.1 Handling Strong and Hollow Verb Morphology in MORPHE
</SectionTitle>
      <Paragraph position="0"> As a demonstration of our approach, we discuss the case of hollow verbs, whose characteristics were described in Section 1. Figure 3 shows the MFH for strong and hollow verbs of pattern CVCVC in the perfect tense, active voice. We use the feature vow to carry information about the voweling of the verb in the imperfect (discussed below) and overload it to distinguish hollow and other kinds of verbs. (Figure 3: Strong and Hollow Verbs of Pattern CVCVC.) In the perfect active voice, regular strong verbs do not undergo any stem changes, but doubled radical verbs do. Rules effecting these changes are attached to the node labeled with the FVP (vow (*or* a i u)). 8 The hollow verbs, on the other hand, use a long stem with a middle 'alif' (e.g. [daam] 'to last') for third person singular and dual (masculine and feminine) and for third person plural masculine. The remaining person-number-gender combinations take a short stem whose voweling depends on the underlying root of the verb, as specified earlier. Transformation rules attached to the leaf nodes perform the conversion of the ROOT feature value to the short and long stem.</Paragraph>
      <Paragraph position="1"> Inside the stem change rules, the four different classes of hollow verbs are treated as three separate conditions (classes 2 and 4 can be merged, as described in Section 1) by matching on the middle radical and the adjacent vowels and replacing them with the appropriate vowel.</Paragraph>
      <Paragraph position="2"> 8 Changes in hamzated verbs are due to interactions with specific suffixes and are best dealt with in the prefixation and suffixation subtree.</Paragraph>
      <Paragraph position="3">  An example of such a rule, which changes the perfect stem to a short one for persons 1 and 2 both singular and plural, follows.</Paragraph>
      <Paragraph position="5"> The syntax %{var} is used to indicate variables with a given set of values. Enclosing a string in parentheses associates it with a numbered register, so the replace infix (ri) operator can access it for substitution.</Paragraph>
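      <Paragraph> The rule itself is written in MORPHE's own notation; as an approximation only, its effect can be imitated with Python regular expressions, where capture groups play the role of numbered registers. The clause list and the function name short_perfect_stem are hypothetical, and the three clauses correspond to the three conditions mentioned above (classes 2 and 4 merged).</Paragraph>

```python
import re

# Each clause pairs an 'if' pattern with a 'then' template. The
# parenthesised middle group stands in for the register rewritten by a
# replace-infix operation; \1 and \3 keep the outer radicals.
SHORT_PERFECT_CLAUSES = [
    (re.compile(r"^(.+?)(aw[au])(.+)$"), r"\1u\3"),  # class 1
    (re.compile(r"^(.+?)(a[wy]i)(.+)$"), r"\1i\3"),  # classes 2 and 4
    (re.compile(r"^(.+?)(aya)(.+)$"), r"\1i\3"),     # class 3
]


def short_perfect_stem(root: str) -> str:
    """Apply the first matching clause; return ROOT unchanged otherwise."""
    for pattern, template in SHORT_PERFECT_CLAUSES:
        match = pattern.match(root)
        if match:
            return match.expand(template)
    return root


for base in ("zawar", "nawim", "baya'", "hayib"):
    print(base, short_perfect_stem(base))
```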
      <Paragraph position="6"> Figure 4 shows the imperfect subtree for strong and hollow verbs. Strong verbs are treated efficiently by three rules branching on the middle radical vowel, given as the value of vow. The consonant-vowel pattern of the computed stem is shown (e.g. for kataba 'he wrote', the imperfect stem would be -ktub- in the pattern CCuC). As described in Section 1, the possible vowel in the imperfect is restricted but not always determined by the perfect vowel and so must be stored in the syntactic lexicon. 9 Separating stem changes from the addition of prefixes and suffixes significantly reduces the number of transformation rules that must be written by eliminating much repetition of prefix and suffix addition for different stem changes. For strong verbs of pattern CVCVC, there is at least a three-fold reduction in the number of rules for active voice (recall the different kinds of vowel changes for these verbs from perfect to imperfect described in Section 1). Other patterns and the passive of pattern CVCVC verbs show less variation in stem voweling but more variation in prefix and suffix voweling. Since some of the patterns share the same prefix and suffix voweling, once the stem has been determined, the prefixation and suffixation rules can be shared by pattern groups.</Paragraph>
      <Paragraph position="7"> The hollow verb subtree is not as small for the imperfect as it is for the perfect, since the stem depends not only on the mood but also on the person, gender, and number. It is still advantageous to decouple stem changes from prefixation and suffixation. Suffixes differ in the indicative and subjunctive moods; if the two types of changes were merged, the stem transformations would have to be repeated in each of the two moods and for each person-number-gender combination. The same observation applies to stem changes in the passive voice as well. Significant replication of transformational rules that include stem changes makes the system bigger and harder to maintain in case of changes, particularly because each transformational rule needs to take into consideration the four different classes of hollow verbs.</Paragraph>
    </Section>
    <Section position="2" start_page="90" end_page="91" type="sub_section">
      <SectionTitle>
3.2 An Example of Generation
</SectionTitle>
      <Paragraph position="0"> Consider again the example verb form zurtu 'I visited' and the feature structure (FS) given in Section 2. During generation, the feature-value pair (CHG STEM) is added to the FS before the first call to MORPHE. Traversing the MFH shown in Figure 2, MORPHE finds the rule v-stem-fl-act-perf-12 given in Section 3.1 above. The first clause fires, replacing the 'awa' with 'u' and MORPHE returns the stem -zur-. This stem is substituted as the value of the ROOT feature in the FS and the feature-value pair (CHG STEM) is changed to (CHG PSFIX) before the second call to MORPHE. This time MORPHE traverses a different subtree and reaches the rule:</Paragraph>
      <Paragraph position="2"> This rule currently simply appends "otu" to the string, and MORPHE returns the string "zurotu", where the 'o' denotes the diacritic "sukuun", or absence of a vowel. This is the desired form for zurtu 'I visited'.</Paragraph>
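      <Paragraph> The two-call procedure of this example can be traced with a toy Python stand-in. The function morphe below is not the real system: it is a hypothetical stub that covers only the first person singular perfect of a class 1 hollow verb, and strip_sukuun is an assumed display helper that relies on 'o' being used only for the sukuun in this transcription.</Paragraph>

```python
import re


def morphe(fs: dict) -> str:
    """Toy stand-in for one MORPHE call, dispatching on the CHG feature."""
    root = fs["ROOT"]
    if fs["CHG"] == "STEM":
        # Stem subtree: shorten the hollow stem ('awa' or 'awu' becomes 'u').
        return re.sub(r"aw[au]", "u", root, count=1)
    if fs["CHG"] == "PSFIX":
        # Affix subtree: append the suffix, with 'o' marking the sukuun.
        return root + "otu"
    return root


def generate_verb(fs: dict) -> str:
    # First call: compute the stem and substitute it for ROOT.
    stem = morphe({**fs, "CHG": "STEM"})
    # Second call: add prefixes and suffixes to the computed stem.
    return morphe({**fs, "ROOT": stem, "CHG": "PSFIX"})


def strip_sukuun(form: str) -> str:
    """Drop the sukuun marker for display."""
    return form.replace("o", "")


form = generate_verb({"ROOT": "zawar", "CAT": "V"})
print(form, strip_sukuun(form))
```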
      <Paragraph position="3"> 9 In the presence of certain second and third radicals, the middle radical vowel is more precisely determined. This information can be incorporated into the syntactic lexicon as it is being built.</Paragraph>
    </Section>
  </Section>
</Paper>