<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1018">
  <Title>Consonant Spreading in Arabic Stems</Title>
  <Section position="3" start_page="117" end_page="119" type="metho">
    <SectionTitle>
2 Arabic Consonant Gemination and Spreading
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="117" end_page="118" type="sub_section">
      <SectionTitle>
2.1 Gemination in Forms II and V
</SectionTitle>
      <Paragraph position="0"> Some verb and noun stems exhibit a double realization (a copying) of an underlying radical, resulting in gemination (footnote 3) or spreading at the surface level. Gemination is best known from the verb stems called Forms II and V in the European tradition, where the middle radical is doubled. Kay's (1987) pattern notation places a G symbol before the C slot that needs to be doubled (footnote 4).
        Root:    k t b     d r s
        Pattern: CaGCaC    CaGCaC
        Stem:    kattab    darras
      In the same spirit, but with a different mechanism, our Form II and Form V patterns contain an X symbol that appears after the consonant slot to be copied.
      Footnote 3: Gemination in Arabic words can alternatively be analyzed as consonant lengthening, as in Harris (1941) and as implied by Holes (1995). This solution is very attractive if the goal is to generate fully-voweled orthographical surface strings of Arabic, but for the phonological examples in this paper we adopt the gemination representation used by phonologists like McCarthy (1981).
      Footnote 4: Kay's stem-building mechanism, using a multi-tape transducer implemented in Prolog, sees G on the pattern tape and writes a copy of the middle radical on the stem tape without consuming it. The following C then does the same but consumes the radical symbol in the usual way. Kay's analysis in fact abstracts out the vocaliza-</Paragraph>
      <Paragraph position="1">
        Root:    k t b     d r s
        Pattern: CaCXaC    CaCXaC
        Stem:    katXab    darXas
      As in all cases, the stem is formed by straightforward intersection, resulting in abstract stems like darXas. The X symbol is subsequently realized via finite-state variation rules as a copy of the preceding consonant in a phonological grammar (/darras/) or, in an orthographical system such as ours, as an optionally written shadda diacritic. Finite-state rules to effect such limited local copying are trivially written (footnote 5).</Paragraph>
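      <Paragraph> The slot filling and local copying described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the paper computes stems by intersecting regular languages, whereas the sketch fills each C slot with the next radical, and the function names fill_pattern and copy_preceding are hypothetical. It also assumes one character per consonant symbol and that an X never begins a stem.

```python
def fill_pattern(root, pattern):
    """Fill each C slot of the pattern with the next radical of the root.

    Simplifying assumption: plain slot filling stands in for the
    finite-state intersection used in the paper.
    """
    stem = []
    next_radical = 0
    for slot in pattern:
        if slot == "C":
            stem.append(root[next_radical])
            next_radical += 1
        else:
            stem.append(slot)  # vowels and the copy trigger X pass through
    return "".join(stem)


def copy_preceding(stem):
    """Phonological reading: realize each X as a copy of the symbol before it."""
    out = []
    for sym in stem:
        out.append(out[-1] if sym == "X" else sym)
    return "".join(out)


for root in ("ktb", "drs"):
    abstract = fill_pattern(root, "CaCXaC")
    print(root, abstract, copy_preceding(abstract))
```

      The two steps are kept separate on purpose: the abstract stem (katXab, darXas) is what the lexicon produces, and the choice between a consonant copy and a shadda is left to a later realization step.</Paragraph>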
    </Section>
    <Section position="2" start_page="118" end_page="119" type="sub_section">
      <SectionTitle>
2.2 Gemination/Spreading in Form IX
</SectionTitle>
      <Paragraph position="0"> Spreading, which appears to involve consonant copying over intervening phonemes, is not so different from gemination; indeed, it is common in &quot;spreading&quot; verb stems for the spreading to alternate productively with gemination.</Paragraph>
      <Paragraph position="1"> The best known example of Arabic consonant spreading is the verbal stem known as Form IX (the same behavior is also seen in Form XI, Form XIV, Form QIV and in several noun forms). A typical example is the root dhm, which in Form IX has the meaning &quot;become black&quot;.</Paragraph>
      <Paragraph position="2"> (Footnote 4, continued) -tion, placing it on a separate transducer tape, but this difference is not important here. For extensions of this multi-tape approach see Kiraz (1994; 1996). The current approach differs from the multi-tape approaches in formalizing roots, patterns and vocalizations as regular languages and in computing (&quot;linearizing&quot;) the stems at compile time via intersection of these regular languages (Beesley, 1998a; Beesley, 1998b).</Paragraph>
      <Paragraph position="3"> Footnote 5: See, for example, the rules of Antworth (1990) for handling the limited reduplication seen in Tagalog.</Paragraph>
      <Paragraph position="5"> Spreading is not terribly common in Modern Standard Arabic, but it occurs in enough verb and noun forms to deserve, in our opinion, full treatment. In our lexicon of about 4930 roots, 20 have Form IX possibilities (see Figure 2).</Paragraph>
      <Paragraph position="6"> Most of them (but not all) share the general meaning of being or becoming a certain color.</Paragraph>
      <Paragraph position="7"> McCarthy (1981) and others (Kay, 1987; Kiraz, 1994; Bird and Blackburn, 1991) postulate an underlying Form IX stem for dhm that looks like dhamam, with a spreading of the final m radical; other writers like Beeston (1968) list the stem as dhamm, with a geminated or lengthened final radical. In fact, both forms do occur in full surface words as shown in Figure 3, and the difference is productively and straightforwardly phonological. For perfect endings like +a ('he') and +at ('she'), the final consonant is geminated (or &quot;lengthened&quot;, depending on your formal point of view). If, however, the suffix begins with a consonant, as in +tu ('I') or +ta ('you, masc. sg.'), then the separated or true spreading occurs.</Paragraph>
      <Paragraph position="8"> From a phonological view, and reflecting the notation of Beeston, it is tempting to formalize the underlying Form IX perfect active pattern as CCaCX so that it intersects with root dhm to form dhamX. When followed by a suffix beginning with a vowel such as +a or +at, phonologically oriented variation rules would realize the X as a copy of the preceding consonant (/dhamm/). Arabic abhors consonant clusters, and it resorts to various &quot;cluster busting&quot; techniques to eliminate them. The final phonological realization would include an epenthetic /?i/ on the front, to break up the dh cluster, and would treat the copied m as the onset of a syllable that includes the suffix: /?idham-ma/. When followed by a suffix beginning with a consonant, as in dhamX+tu, the three-consonant cluster would need to be broken up by another epenthetic vowel as in /?id-ha-mam-tu/. However, for reasons to become clearer below when we look at biliteral roots, we defined an underlying Form IX perfect active pattern CCaCaX leading to abstract stems like dhamaX.</Paragraph>
    </Section>
    <Section position="3" start_page="119" end_page="119" type="sub_section">
      <SectionTitle>
2.3 Other Cases of Final Radical Gemination/Spreading
</SectionTitle>
      <Paragraph position="0"> Other verb forms where the final radical is copied include the rare Forms XI and XIV. Root lhj intersects with the Form XI perfect active pattern CCaaCaX to form the abstract stem lhaajaX (&quot;curdle&quot;/&quot;coagulate&quot;), leading to surface forms like /?il-haaj-ja/ and /?il-haa-jaj-tu/ that vary exactly as in Form IX. The same holds for root shb, which takes both Form IX (s.habaX) and Form XI (shaabaX), both meaning &quot;become reddish&quot;. In our lexicon, one root q% takes Form XIV, with patterns like the perfect active CCanCaX and imperfect active CCanCiX (&quot;be pigeon-breasted&quot;). Other similar Form XIV examples probably exist but are not reflected in the current dictionary.</Paragraph>
      <Paragraph position="1"> Aside from the verbal nouns and participles of Forms IX, XI and XIV, other noun-like patterns also involve the spreading of the final radical. These include CiCCiiX and CaCaaCiiX, taken by roots nhr, mean-</Paragraph>
      <Paragraph position="3"> t.xr, both meaning &quot;cloud&quot;. When an X appears after a long vowel as in t.uxruuX, it is always realized as a full copy of the previous consonant as in /tuxruur/, no matter what follows.</Paragraph>
    </Section>
    <Section position="4" start_page="119" end_page="119" type="sub_section">
      <SectionTitle>
2.4 Middle Radical Gemination/Spreading
</SectionTitle>
      <Paragraph position="0"> Just as Forms II and V involve gemination of the middle radical, other forms including Form XII involve the separated spreading of the middle radical. A preceding diphthong, like a preceding long vowel, causes X to be realized as a full copy of the preceding consonant, as shown in the following examples.</Paragraph>
      <Paragraph position="1">  A few other patterns show the same behavior. While not especially common, there are more roots that take middle-radical-spreading noun patterns than take the better-known Form IX verb patterns.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="119" end_page="119" type="metho">
    <SectionTitle>
3 Biliteral Roots
</SectionTitle>
    <Paragraph position="0"> As pointed out in McCarthy (1981, p. 3967), the gemination vs. spreading behavior of Form IX stems is closely paralleled by Form I stems involving traditionally analyzed &quot;biliteral&quot; or &quot;geminating&quot; roots such as tm (also characterized as tmm) and sm (possibly smm) and many others of the same ilk. As shown in Figure 4, these roots show Form I gemination with suffixes beginning with a vowel vs. full spreading when the suffix begins with a consonant. However Form IX is handled, these parallels strongly suggest that the exact same underlying forms and variation rules should also handle Form I of biliteral roots.</Paragraph>
    <Paragraph position="1"> However, the Form I perfect active pattern, in the current notation, is simply CaCaC (or, idiosyncratically for some roots, CaCuC or CaCiC). As shown in Figure 5, there is no evidence, for normal triliteral roots like ktb, that any kind of copying is specified by the Form I pattern itself.</Paragraph>
    <Paragraph position="2"> Keeping CaCaC as the Form I perfect active pattern, the behavior of biliteral roots falls out effortlessly if they are formalized not as sm and tm, nor as smm and tmm, but as smX and tmX, with the copying-trigger X as the third radical of the root itself. Such roots intersect in the normal way with triliteral patterns as in Figure 6, and they are mapped to appropriate surface strings using the same rules that realize Form IX stems.</Paragraph>
  </Section>
  <Section position="5" start_page="119" end_page="121" type="metho">
    <SectionTitle>
4 Rules
</SectionTitle>
    <Paragraph position="0"> The TWOLC rule (Karttunen and Beesley, 1992) given here maps an X, coming either from roots like tmX or from patterns like Form IX CCaCaX, into a copy of the previous consonant.</Paragraph>
    <Paragraph position="1"> In the rule, Cons is a grammar-level variable ranging freely over consonants, LongVowel is a grammar-level variable ranging freely over long vowels and diphthongs, and C is an indexed local variable ranging over the enumerated set of consonants.</Paragraph>
    <Paragraph position="3"> where C in (b t 0 j h x d 6 r z sf d;6 xfqk imnhwy);
    The rule, which in fact compiles into 27 rules, one for each enumerated consonant, realizes underlying X as surface C if and only if one of the following cases applies (footnote 6):
    * First Context: X is preceded by a surface C and one or more non-consonants, and is followed by a suffix beginning with a consonant. This context matches lexical dhamaX+tu, realizing X as m, but not dhamaX+a.</Paragraph>
    <Paragraph position="4"> * Second Context: X is preceded by a surface C and a long vowel or diphthong, no matter what follows. This maps lexical dabaaXiir to dabaabiir.</Paragraph>
    <Paragraph position="5"> * Third Context: X is preceded by a surface C, another X and any symbol, no matter what follows. This matches the second X in samXaX+tu and samXaX+a to produce samXam+tu and samXam+a respectively.
    In the current system, where the goal is to recognize and generate orthographical words of Modern Standard Arabic, as represented in ISO 8859-6, Unicode or an equivalent encoding, the default or &quot;elsewhere&quot; case is for X to be realized optionally as a shadda diacritic.</Paragraph>
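    <Paragraph> The three contexts and the elsewhere case can be approximated procedurally as follows. This is a rough Python sketch, not the TWOLC rule itself, and it rests on several assumptions of ours: every consonant is a single letter, a long vowel is a doubled vowel letter, diphthongs are left out, '+' marks the suffix boundary, and the character '~' is a hypothetical stand-in for the optional shadda. The names is_consonant, copy_target and realize are illustrative only. The input makaaXiiX is the stem that plain slot filling of root mkX into pattern CaCaaXiiC would give. Other rules of the full system, such as epenthesis, are not modeled.

```python
VOWELS = set("aiu")
SHADDA = "~"  # hypothetical stand-in for the optional shadda diacritic


def is_consonant(sym):
    """True for a consonant letter (not a vowel, X, '+' or '~')."""
    return sym.isalpha() and sym not in VOWELS and sym != "X"


def copy_target(lexical, surface, i):
    """Return the consonant copied by the X at lexical[i], or None."""
    # First context: surface C, one or more vowels, X, then '+' and a consonant.
    j = len(surface) - 1
    seen_vowel = False
    while j != -1 and surface[j] in VOWELS:
        seen_vowel = True
        j -= 1
    following = lexical[i + 1:i + 3]
    if (seen_vowel and j != -1 and is_consonant(surface[j])
            and len(following) == 2 and following[0] == "+"
            and is_consonant(following[1])):
        return surface[j]
    window = surface[-3:]
    if len(window) == 3 and is_consonant(window[0]):
        # Second context: surface C, long vowel (doubled vowel letter), X.
        if window[1] in VOWELS and window[1] == window[2]:
            return window[0]
        # Third context: surface C, another lexical X, any symbol, X.
        if lexical[i - 2] == "X":
            return window[0]
    return None


def realize(lexical):
    """Map each lexical symbol to exactly one surface symbol, left to right."""
    surface = []
    for i, sym in enumerate(lexical):
        if sym == "X":
            # Elsewhere case: fall back to the shadda stand-in.
            surface.append(copy_target(lexical, surface, i) or SHADDA)
        else:
            surface.append(sym)
    return "".join(surface)


for form in ("dhamaX+tu", "dhamaX+a", "dabaaXiir", "samXaX+tu", "makaaXiiX"):
    print(form, realize(form))
```

    Checking the surface side for the preceding consonant, rather than the lexical side, is what lets the second X of makaaXiiX copy the k produced by the first X.</Paragraph>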
  </Section>
  <Section position="6" start_page="121" end_page="121" type="metho">
    <SectionTitle>
5 Multiple Copies of Radicals
</SectionTitle>
    <Paragraph position="0"> When a biliteral root like smX intersects with the Form II pattern CaCXaC, the abstract result is the stem samXaX. The radical m gets geminated (or lengthened) once and spread once to form surface phonological strings like /sammama/ and /sammamtu/. And if both roots and patterns can contain X, then the possibility exists that a copying root could combine with a copying pattern, requiring a full double spreading of a radical in the surface string. This in fact happens in a single example (in the present lexicon) with the root mkX, which combines legally with the noun pattern CaCaaXiiC as in Figure 7. In the surface string makaakiik (&quot;shuttles&quot;), the middle radical k is spread twice. The variation rules handle this and the smX examples without difficulty.
    Footnote 6: The full rule contains several other contexts and fine distinctions that do not bear on the data presented here. For example, the w in the set C of consonants must be distinguished from the w-like offglide of diphthongs.</Paragraph>
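    <Paragraph> To make the two sources of X concrete, the following Python sketch records, for each X in an abstract stem, whether it came from the root or from the pattern. As before, slot filling is our simplified stand-in for the intersection of regular languages used in the paper, and the names interdigitate_with_sources and x_sources are hypothetical.

```python
def interdigitate_with_sources(root, pattern):
    """Fill C slots with radicals, tagging each symbol with its origin."""
    radicals = list(root)
    stem = []
    for slot in pattern:
        if slot == "C":
            stem.append((radicals.pop(0), "root"))
        else:
            stem.append((slot, "pattern"))
    return stem


def x_sources(root, pattern):
    """List the origin of every copy trigger X, in stem order."""
    tagged = interdigitate_with_sources(root, pattern)
    return [source for symbol, source in tagged if symbol == "X"]


for root, pattern in (("ktb", "CaCXaC"), ("smX", "CaCaC"),
                      ("smX", "CaCXaC"), ("mkX", "CaCaaXiiC")):
    stem = "".join(sym for sym, _ in interdigitate_with_sources(root, pattern))
    print(root, pattern, stem, x_sources(root, pattern))
```

    Stems with two triggers, such as samXaX and makaaXiiX, get one X from the pattern and one from the root, which is the situation the section describes.</Paragraph>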
  </Section>
</Paper>