Finite State Automata and Arabic Writing 
Michel Fanton 
CERTAL-INALCO 1 
73 rue Broca 
F75013 Paris France 
email : certal2@ext.jussieu.fr 
Abstract 
Arabic writing has specific features which imply a computational overhead for any arabicized software. Finite state automata are well known to give efficient solutions to translation problems that can be formalized as regular languages. These automata are all the easier to build when their alphabet has been reduced through careful linguistic analysis. This reduction makes it possible to write an automaton directly, without going through the intermediate stage of contextual rules, which would have to be translated into an automaton for the sake of efficiency. This paper presents two Moore automata: the first one, taken as an example, gives a solution to the choice of the right shape for a letter to be printed or displayed (usually known as contextual analysis); the second one studies the more complex problem of determining the right carrying letter for hamza. Every arabicized software has to face these questions, and finite state automata are certainly a good answer to them.
INTRODUCTION 
Arabic writing has specific features which imply a computational overhead for any arabicized software. The first one, well known for many years now, is the fact that Arabic printing tries to imitate handwriting. Because of this, consonants and long vowels can have four or only two shapes, depending on their ability to be bound to the following letter and on where they appear in the word.
These shapes can be very different, for example letter ه 2 (h):

isolated   final   medial   initial
ه          ـه      ـهـ      هـ

or present only small variations, for example letter س (s):

isolated   final   medial   initial
س          ـس      ـسـ      سـ

Letters which cannot be bound to the next one have only two shapes, for example letters د (d) and و (w and ū):

isolated   final   isolated   final
د          ـد      و          ـو

1 CERTAL: Centre d'Études et de Recherche en Traitement Automatique des Langues; INALCO: Institut National des Langues et Civilisations Orientales.
2 The Arabic parts of this paper have been typeset using Klaus Lagally's ArabTeX.

During the seventies and the beginning of the eighties, heated controversies took place among the Arabs concerned with these questions, linguists and computer scientists alike. Finally, in 1983, the ASMO (Arab Society for Normalization, which unfortunately no longer exists), influenced by Pr. Lakhdar-Ghazal from IERA (Rabat, Morocco), chose to give a unique code to all the shapes of one particular letter. This is certainly a good choice from a linguistic point of view, but even so, compromises had to be made to take into account writing habits that conflicted with it. The letter hamza is the most noticeable example of such a compromise, for reasons we shall explain later.
1 CONTEXTUAL ANALYSIS
Whatever the choice made for coding, from a typesetting or a computational point of view there must be different codes for the different shapes of a letter. So every arabicized software has to use two coding systems: the reduced code we have just introduced, and the extended code in which the different shapes have different codes. Until UNICODE, no standard existed for the second one. So every arabicized software has to solve the problem of choosing the right shape for every printed or displayed letter.
1.1 Rules for letter shape determination
This determination, frequently known as contextual analysis, can be summarized in the following set of informal rules:
1. At the beginning of a word: 
• If the letter is a binding letter it takes 
the INITIAL shape. 
• If it is a non binding one it takes the 
ISOLATED shape. 
2. In the middle of a word (there is at least 
one letter following the current one): 
(a) If the letter is a binding letter then 
• If it follows a binding letter it takes 
the MEDIAL shape. 
• If it follows a non binding letter it 
takes the INITIAL shape. 
(b) If the letter is a non binding letter 
• If it follows a binding letter it takes 
the FINAL shape. 
• If it follows a non binding letter it 
takes the ISOLATED shape. 
3. At the end of a word (for both types of 
letters) 
• If it follows a binding letter it takes the 
FINAL shape. 
• If it follows a non binding letter it 
takes the ISOLATED shape. 
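The informal rules above depend on two facts only: whether the previous letter is bound to the current one, and whether the current letter is bound to the next. A minimal Prolog sketch under that reading follows; the predicate names (shapes/2, joined_before/2, joined_after/3, shape/3) are illustrative and not taken from the paper, and a word is given as a list of letter classes b (binding) and n (non binding) without its boundaries.

```prolog
% shapes(+Classes, -Shapes): one shape per letter class of the word.
shapes(Classes, Shapes) :- shapes_(Classes, start, Shapes).

% The second argument records what precedes the current letter:
% start (word beginning), b or n.
shapes_([], _, []).
shapes_([C|Cs], Prev, [S|Ss]) :-
    joined_before(Prev, JB),
    joined_after(C, Cs, JA),
    shape(JB, JA, S),
    shapes_(Cs, C, Ss).

% Only a binding letter can be bound to the letter that follows it.
joined_before(b, yes).
joined_before(n, no).
joined_before(start, no).

% A letter is bound to the next one if it is binding and a letter follows.
joined_after(b, [_|_], yes).
joined_after(b, [], no).
joined_after(n, _, no).

% shape(BoundToPrevious, BoundToNext, Shape)
shape(no,  no,  isolated).
shape(no,  yes, initial).
shape(yes, yes, medial).
shape(yes, no,  final).
```

For instance, the query shapes([b,b,n], S) gives S = [initial, medial, final]. This direct reading is deterministic because it looks one letter ahead, which a left-to-right automaton cannot do.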
1.2 Moore and Mealy automata 
Moore automata are state-assigned output machines: the output function assigns an output symbol to each state. They differ from Mealy automata, which are transition-assigned finite state machines where output symbols are associated with the transitions between states. Mealy automata are sometimes called finite transducers. The two machine types have been shown to produce the same input-output mappings 3.
Mealy automata are certainly a better choice when bidirectional applications are considered. As the question here is to identify successions of symbols of a certain type, we found it clearer to use a Moore automaton.
3 See (Aho and Ullman, 1972) and (Hopcroft and Ullman, 1979) for a full account of these matters.
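The difference between the two machine types can be illustrated on a toy task, which is an assumption made for this sketch and does not come from the paper: tag each input symbol as new or rep depending on whether it repeats the previous symbol. All predicate names below are hypothetical.

```prolog
% Moore version: the output is a function of the state reached,
% so each state has to carry the output symbol with it.
moore_tr(start,   X, s(X, new)).
moore_tr(s(X, _), X, s(X, rep)).
moore_tr(s(X, _), Y, s(Y, new)) :- X \== Y.
moore_out(s(_, Out), Out).

moore_run(_, [], []).
moore_run(S0, [X|Xs], [Y|Ys]) :-
    moore_tr(S0, X, S1),
    moore_out(S1, Y),
    moore_run(S1, Xs, Ys).

% Mealy version: the output is attached to the transition,
% so a state only has to remember the previous symbol.
mealy_tr(start, X, X, new).
mealy_tr(P, P, P, rep) :- P \== start.
mealy_tr(P, X, X, new) :- P \== start, P \== X.

mealy_run(_, [], []).
mealy_run(S0, [X|Xs], [Y|Ys]) :-
    mealy_tr(S0, X, S1, Y),
    mealy_run(S1, Xs, Ys).
```

Both moore_run(start, [a,a,b], Ys) and mealy_run(start, [a,a,b], Ys) give Ys = [new, rep, new]: the mapping is the same, but the Moore machine needs more states.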
1.3 A Moore automaton for contextual analysis
1.3.1 Source language of the automaton 
It follows from the determination rules that we need to know which particular letter we are dealing with only at the output stage. All we have to know before that is whether it is a binding or a non binding letter 4. The alphabet of the automaton should be

$$A = \{\#\} \cup L$$

where $L$ is the set of Arabic letters present in the reduced code and $\#$ the word boundary. This alphabet will then be partitioned into three sets:

$$A \rightarrow A' = \{\{\#\}, N, B\}$$

$N$ being the set of non binding letters and $B$ the set of binding letters. If we denote by $n$ and $b$ respectively an arbitrary element of each of these sets, the source language of the automaton can be reduced to:

$$A_1 = \{\#, n, b\}$$

$$L_1 = \{\#\,(n \lor b)^{*}\,\#\}$$

where $\lor$ denotes disjunction and $*$ is the Kleene star.
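The reduction of the alphabet can be sketched as a simple classification step. The sketch below assumes that letters are given as transliterated names (hypothetical atoms such as kaf or alif) and that the non binding set N contains the six letters alif, dāl, dhāl, rā', zāy and wāw; like the example in the text, it ignores hamza, the lām-alif ligature and non-Arabic symbols.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% Set N: letters that cannot be bound to the following one.
non_binding(alif). non_binding(dal). non_binding(dhal).
non_binding(ra).   non_binding(zay). non_binding(waw).

% class(+Letter, -Class): any letter name not listed in N is treated
% as a member of B (an oversimplification, see the lead-in).
class(Letter, n) :- non_binding(Letter), !.
class(_, b).

% reduce(+Word, -Input): maps a list of letter names to an element
% of the reduced source language, word boundaries included.
reduce(Word, Input) :-
    maplist(class, Word, Classes),
    append(['#'|Classes], ['#'], Input).
```

For example, reduce([kaf, ta, alif, ba], I) gives I = ['#', b, b, n, b, '#'].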
1.3.2 Grammar and automaton for L1 
Language L1 can be generated by the simple grammar:

$$M \rightarrow \#\, A\, \#$$

$$A \rightarrow (n \mid b) \;\mid\; A\,(n \mid b)$$

or by the equally simple automaton:

initial states = {1}
final states = {5}
transitions (0 stands for no transition or no output):

state   b   n   #   output
1       0   0   2   0
2       3   4   0   #
3       3   4   5   b
4       3   4   5   n
5       0   0   0   #

4 As this question has only been taken as an example, the alphabet has been oversimplified. A full working automaton should cope, as far as Arabic is concerned, with two additional problems: hamza on the line, to which no preceding letter can be bound, and the lām-alif ligature. It should also give a proper treatment of non-Arabic letters and symbols. But this would not affect the method described here.
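As a minimal sketch, the five-state automaton for L1 can be used as a recognizer. The predicate names tr1/3, run1/3 and accepts/1 are illustrative; entries equal to 0 in the transition table are simply left out of the fact base.

```prolog
% Transitions of the automaton for L1: tr1(State, Symbol, NextState).
tr1(1, '#', 2).
tr1(2, b, 3).  tr1(2, n, 4).
tr1(3, b, 3).  tr1(3, n, 4).  tr1(3, '#', 5).
tr1(4, b, 3).  tr1(4, n, 4).  tr1(4, '#', 5).

% accepts(+Input): Input leads from the initial state 1 to the final state 5.
accepts(Input) :- run1(1, Input, 5).

% run1(+State, +Symbols, -LastState): follows the transitions symbol by symbol.
run1(S, [], S).
run1(S0, [X|Xs], S) :-
    tr1(S0, X, S1),
    run1(S1, Xs, S).
```

With these facts, accepts(['#', n, b, '#']) succeeds, while accepts(['#', '#']) and accepts([n, b]) fail.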
1.3.3 Target language of the automaton 
The alphabet for the target language L2, given what has been said before and using the same method of partitioning and then reducing the alphabet, could at first sight be:

$$A_2 = \{\#, I, i, m, f\}$$

where $I$ denotes a letter in isolated shape, and $i$, $m$ and $f$ stand for initial, medial and final shape. But letters from $N$ have only two shapes, final and isolated. Moreover, isolated and final shapes of letters from $B$ can only appear at the end of a word, which is not the case for the corresponding shapes of letters from $N$. So the following modified version of $A_2$ will be preferred:

$$A_2 = \{\#, I_n, I_b, i, m, f_n, f_b\}$$

where $I_n$ stands for the isolated shape of a letter from $N$, and so on. With these symbols the target language L2 can be described by the regular expression:

$$L_2 = \{\#\,(I_n^{*}\, i\, m^{*} f_n\, I_n^{*})^{*}\,(I_b \lor f_b \lor \varepsilon)\,\#\}$$

where $\varepsilon$ denotes as usual the empty string.
1.3.4 Translation automaton 
The translation of a sequence of L1 into a legal sequence of L2 can be carried out by the following automaton:

initial states = {1}
final states = {8}
transitions:

state   n   b       #   output
1       2   {3,7}   1   #
2       2   {3,7}   8   In
3       6   {4,5}   0   i
4       6   {4,5}   0   m
5       0   0       8   fb
6       2   {3,7}   8   fn
7       0   0       8   Ib
8       0   0       0   #

This automaton is clearly nondeterministic. This is due to the fact that a letter from B can appear in final or isolated shape when situated at the end of a word, and in initial or medial shape when another letter follows it. Because of this nondeterministic feature, every transition should appear as a set. When this set is a singleton, its only state has been written without braces for easier reading.
It can easily be augmented to take account of occasional short vowels or shadda 5 (ـّ) that could occur: the transitions to add would force the automaton to loop on the same state, whichever it is, since vowels or shadda can only appear after a consonant and do not influence its shape.
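The augmentation mentioned above can be sketched as follows. The symbol v is a hypothetical stand-in for any short vowel or shadda; it is copied to the output while the automaton stays in its current state. The transition and output facts repeat the table of this section, and the first state tried for a nondeterministic transition is an arbitrary choice.

```prolog
% tr(State, Symbol, NextState): nondeterministic transitions.
tr(1,'#',1). tr(1,n,2). tr(1,b,3). tr(1,b,7).
tr(2,'#',8). tr(2,n,2). tr(2,b,3). tr(2,b,7).
tr(3,n,6).   tr(3,b,4). tr(3,b,5).
tr(4,n,6).   tr(4,b,4). tr(4,b,5).
tr(5,'#',8).
tr(6,'#',8). tr(6,n,2). tr(6,b,3). tr(6,b,7).
tr(7,'#',8).

% output(State, Symbol): Moore output attached to each state.
output(1,'#'). output(2,'In'). output(3,i).    output(4,m).
output(5,fb).  output(6,fn).   output(7,'Ib'). output(8,'#').

% path(+From, ?To, +Input, -Output)
path(S, S, [], []).
path(S1, S2, [v|Xs], [v|Ys]) :-          % loop on the same state for v
    path(S1, S2, Xs, Ys).
path(S1, S2, [X|Xs], [Y|Ys]) :-          % ordinary letter or boundary
    X \== v,
    tr(S1, X, S),
    output(S, Y),
    path(S, S2, Xs, Ys).

% forme(+Input, -Output): from initial state 1 to final state 8.
forme(Input, Output) :- path(1, 8, Input, Output).
```

For example, forme(['#', b, v, b, v, n, '#'], O) gives O = ['#', i, v, m, v, fn, '#'].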
1.3.5 PROLOG test program 
This program is a straightforward translation of the grammar and automaton described above. The predicate test makes it possible to limit the generation of inputs to a given length. In the results we chose to limit the length of the input to 6, word boundaries included.
%
% generation of elements of L1
%
m --> ['#'], a, ['#'].
a --> ([n];[b]).
a --> a, ([n];[b]).
%
% translation automaton
%
initial_state(1).
final_state(8).
% tr(State, Symbol, NextState)
tr(1,'#',1).
tr(1,n,2).
tr(1,b,3).
tr(1,b,7).
tr(2,'#',8).
tr(2,n,2).
tr(2,b,3).
tr(2,b,7).
tr(3,n,6).
tr(3,b,5).
tr(3,b,4).
tr(4,b,5).
tr(4,b,4).
tr(4,n,6).
tr(5,'#',8).
tr(6,'#',8).
tr(6,n,2).
tr(6,b,3).
tr(6,b,7).
tr(7,'#',8).
% output(State, Symbol)
output(1,'#').
output(2,'In').
output(3,i).
output(4,m).
output(5,fb).
output(6,fn).
output(7,'Ib').
output(8,'#').
% forme(+Input, -Output): Output is a translation of Input into L2
forme(Input,Output):-
    initial_state(Is),
    path(Is,Fs,Input,Output),
    final_state(Fs).
path(S,S,[],[]).
path(S1,S2,[X|Xs],[Y|Ys]):-
    tr(S1,X,S),
    output(S,Y),
    path(S,S2,Xs,Ys).
% test(+L): prints every input of length at most L with its translation
test(L):-
    m(M,[]),
    length(M,L1),
    ((L1 > L,!,nl,fail);true),
    forme(M,F),
    nl,write(M),tab(1),write(F),fail.
test(_).
5 Sign denoting a doubled letter.
1.3.6 Program results 
input             output
[#,n,#]           [#,In,#]
[#,b,#]           [#,Ib,#]
[#,n,n,#]         [#,In,In,#]
[#,n,b,#]         [#,In,Ib,#]
[#,b,n,#]         [#,i,fn,#]
[#,b,b,#]         [#,i,fb,#]
[#,n,n,n,#]       [#,In,In,In,#]
[#,n,n,b,#]       [#,In,In,Ib,#]
[#,n,b,n,#]       [#,In,i,fn,#]
[#,n,b,b,#]       [#,In,i,fb,#]
[#,b,n,n,#]       [#,i,fn,In,#]
[#,b,n,b,#]       [#,i,fn,Ib,#]
[#,b,b,n,#]       [#,i,m,fn,#]
[#,b,b,b,#]       [#,i,m,fb,#]
[#,n,n,n,n,#]     [#,In,In,In,In,#]
[#,n,n,n,b,#]     [#,In,In,In,Ib,#]
[#,n,n,b,n,#]     [#,In,In,i,fn,#]
[#,n,n,b,b,#]     [#,In,In,i,fb,#]
[#,n,b,n,n,#]     [#,In,i,fn,In,#]
[#,n,b,n,b,#]     [#,In,i,fn,Ib,#]
[#,n,b,b,n,#]     [#,In,i,m,fn,#]
[#,n,b,b,b,#]     [#,In,i,m,fb,#]
[#,b,n,n,n,#]     [#,i,fn,In,In,#]
[#,b,n,n,b,#]     [#,i,fn,In,Ib,#]
[#,b,n,b,n,#]     [#,i,fn,i,fn,#]
[#,b,n,b,b,#]     [#,i,fn,i,fb,#]
[#,b,b,n,n,#]     [#,i,m,fn,In,#]
[#,b,b,n,b,#]     [#,i,m,fn,Ib,#]
[#,b,b,b,n,#]     [#,i,m,m,fn,#]
[#,b,b,b,b,#]     [#,i,m,m,fb,#]
2 WRITING OF LETTER HAMZA 
The hamza can be written in five different manners (أ, إ, ؤ, ئ, ء) depending mainly upon:
• its position within the word
• the preceding and the following vowel
As the choice made for coding was to adhere to a linguistic point of view, there should have been only one code for all these shapes and carrying consonants. But, as has just been said, to determine the correct writing of hamza one has to know the surrounding vowels, and it is common knowledge that Arabs do not usually write short vowels. With these essential data missing, no algorithm can fulfil this task for common usages such as displaying a text on a screen. Thus the ASMO decided to have distinct codes for the different carriers of hamza, but of course not for their different shapes, which can be determined as seen before.
So why is this question of any interest? If we consider NLP applications for Arabic, it could be worth considering this problem at the generation stage. For instance, many vowel alternations occur in the conjugation of verbs, and when a hamza is present in the verb root, the hamza writing will vary accordingly.
For example the verb قرأ qara'a - he (has) read - changes to يقرؤون yaqra'ūna - they read (present) - and to قرئ quri'a - it (has) been read. And at the generation stage vowels are known, even if we decide not to write them.
The only alternative would be to put all the forms in a dictionary. At CERTAL, our philosophy is to use all possible means to reduce the size of dictionaries. Hence this question appeared to us worth studying.
2.1 Rules of hamza writing 
1. When a hamza is at the beginning of a word it is written
• over an alif (أ) if the next vowel is an "a" (ـَ) as in أقرأ 'aqra'u - I read (present) - or a "u" (ـُ) as in أكتب 'uktub - write! -
• under an alif (إ) if the next vowel is an "i" (ـِ) as in إعلام 'iʿlām - information
2. When a hamza is within a word (i.e. preceded and followed by some consonant) it is written
• over an alif (أ) when
- preceded by a sukūn (ـْ) and followed by an "a" as in يسأل yas'alu - he asks -
- preceded by an "a" and followed by a sukūn as in يأكل ya'kulu - he eats -
- preceded by an "a" and followed by an "a" as in سأل sa'ala - he (has) asked -
• over a waw (ؤ) when
- preceded by a sukūn and followed by a "u" as in يبؤس yab'usu - he is strong, brave -
- preceded by an "a" and followed by a "u" or a "ū" as in يؤوب ya'ūbu - to return or to suffer -
- preceded by a "u" and followed by a sukūn as in يؤثر yu'thiru - he prefers -
- preceded by a "u" and followed by an "a" as in يؤثّر yu'aththiru - he influences -
- preceded by a "u" and followed by a "u" or a "ū" as in بؤوس bu'ūsun - distresses -
- preceded by a "ū" and followed by a "u"
• over a ya (ئ) when
- preceded by an "i", whatever the following vowel, as in بئر bi'run - well - and بئار bi'ārun, plural of the same word
- followed by an "i", whatever the preceding vowel, as in قائد qā'idun - leader, director, commandant, ...
• without any carrying letter when
- preceded by an "ā" and followed by an "a" as in بداءة badā'atun - beginning -
- preceded by a "ū" and followed by an "a" as in يسوءان yasū'āni - they (both) become bad -
3. When a hamza is at the end of a word it is written
• without any carrier when
- the preceding vowel is a sukūn 6 as in جزء juz'un - a part -
- the preceding vowel is an "ā" as in أجزاء 'ajzā'un, plural of the same word
- the preceding vowel is a "ū" as in يسوء yasū'u - it becomes bad -
- the preceding vowel is an "ī" as in يجيء yajī'u - he arrives -
• over alif when the preceding vowel is an "a" and the following is one of "a", "an", "u", "un" as in مبتدأ mubtada'un, المبتدأ al-mubtada'u, مبتدأً mubtada'an, المبتدأ al-mubtada'a, different forms of the word meaning (grammatical) subject
• under alif when the preceding vowel is an "a" and the following is "i" or "in" as in مبتدإ mubtada'in, المبتدإ al-mubtada'i, indirect case of the same word
• over waw when the preceding vowel is "u" as in جرؤ jaru'a - he (has) risked - and يجرؤ yajru'u - he risks -
• over ya when the preceding vowel is "i" as in khati'un, al-khati'a, al-khati'i - wrong -
6 There are some exceptions when the preceding consonant is "y" as in شيئا shay'an, undetermined direct case - a thing -
A full account of the rules governing hamza writing has just been given. Usual presentations of hamza writing add to these rules the rules of madda (آ) writing. Madda is a contraction used for a hamza followed by an ā, or for a hamza followed by another hamza itself bearing a sukūn. This happens in some derivations or conjugations, thus we consider it as pertaining to the whole set of transformations which occur in those cases.
آكل ākulu ← 'a'kulu - I eat -
آخذ ākhaḏa ← 'aakhaḏa - he blamed -
Besides, except for elementary schools and Koranic recitation, nobody cares about final short vowels. So, if the last vowel is not long, it is treated as if it were a sukūn, i.e. no vowel. This is always true of modern Arabic, and it reduces the number of rules involved at the end of a word.
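Rule 2 of section 2.1 (hamza within a word) can be sketched as an ordered case analysis. This is only a reading of the listed cases, not the automaton of the paper: the predicate name carrier/3 and the atoms aa, uu and ii for long vowels are assumptions of this sketch, the result names (hoa for hamza over alif, how over waw, hoy over ya, hwc without carrier) are chosen for illustration, and treating a preceding "ī" like a preceding "i" is likewise an assumption. Combinations that the rules do not list make the predicate fail.

```prolog
:- use_module(library(lists)).

% carrier(+Before, +After, -Carrier): Before and After are the vowel marks
% surrounding a hamza inside a word (su stands for sukun).
carrier(_, i, hoy) :- !.                               % followed by "i"
carrier(B, _, hoy) :- memberchk(B, [i, ii]), !.        % preceded by "i"
carrier(u, _, how) :- !.                               % preceded by "u"
carrier(_, A, how) :- memberchk(A, [u, uu]), !.        % followed by "u" or long "u"
carrier(B, a, hwc) :- memberchk(B, [aa, uu]), !.       % long vowel, then "a"
carrier(B, A, hoa) :- memberchk(B-A, [su-a, a-su, a-a]).
```

For instance, carrier(su, a, C) gives C = hoa (as in yas'alu), carrier(u, su, C) gives C = how (as in yu'thiru), and carrier(aa, a, C) gives C = hwc (as in badā'atun).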
2.2 A Moore automaton for hamza writing
With the aforementioned restrictions these rules can also be implemented as a Moore automaton.
2.2.1 Source language of the automaton
It follows from the determination rules that we have to know
• whether the consonant to be processed is a hamza (whatever its carrier has to be) or not,
• whether a vowel is present before or after the hamza,
• and if so, what the surrounding vowels are (short or long).
Again the presence of a shadda is not relevant and can be treated as mentioned for the contextual analysis. Using the same method as before, the alphabet for the source language L3 can be:

$$A_3 = \{\#, l, hz, su, a, u, i, \bar{a}, \bar{u}, \bar{\imath}\}$$

where hz is a hamza with any carrier, l any consonant other than hamza, and su stands for sukūn. The only other constraints for this language are:
1. a sukūn can
• neither follow the first consonant
• nor follow a consonant already preceded by a sukūn
2. a hamza cannot follow another hamza 7
The regular expression corresponding to L3 would be too complicated to be really enlightening, so we shall go directly to the definition of a generating automaton for this language.
initial states = {1}
final states = {21}
transitions
Because of the narrowness of the columns in this style, the transition tables have been divided into two parts. The last column of the second table gives the output corresponding to every state.

state   hz       l        a   u   i
1       0        0        0   0   0
2       3        4        0   0   0
3       0        0        5   6   7
4       0        0        8   9   10
5       0        {17,4}   0   0   0
6       0        {17,4}   0   0   0
7       0        {17,4}   0   0   0
8       {18,3}   {17,4}   0   0   0
9       {18,3}   {17,4}   0   0   0
10      {18,3}   {17,4}   0   0   0
11      0        4        0   0   0
12      0        4        0   0   0
13      0        4        0   0   0
14      3        4        0   0   0
15      3        4        0   0   0
16      3        4        0   0   0
17      0        0        0   0   0
18      0        0        0   0   0
19      3        4        0   0   0
20      0        4        0   0   0
21      0        0        0   0   0

state   ā    ū    ī    su   #    output
1       0    0    0    0    2    0
2       0    0    0    0    2    #
3       0    0    0    0    2    hz
4       0    0    0    0    2    l
5       11   0    0    0    0    a
6       0    12   0    0    0    u
7       0    0    13   0    0    i
8       14   0    0    0    0    a
9       0    15   0    0    0    u
10      0    0    16   0    0    i
11      0    0    0    0    21   ā
12      0    0    0    0    21   ū
13      0    0    0    0    21   ī
14      0    0    0    0    21   ā
15      0    0    0    0    21   ū
16      0    0    0    0    21   ī
17      0    0    0    19   0    l
18      0    0    0    20   0    hz
19      0    0    0    0    21   su
20      0    0    0    0    21   su
21      0    0    0    0    0    #

7 This is true since we are at the writing stage, not at the derivation or inflection stage.
2.2.2 Target language of the automaton 
The only differences with the source language lie in the distinct carriers for the letter hamza:

$$A_4 = \{\#, l, hwc, hoa, hua, how, hoy, su, a, u, i, \bar{a}, \bar{u}, \bar{\imath}\}$$

where hwc stands for hamza without a carrier, hoa for hamza on alif, hua for hamza under alif, how for hamza on waw and hoy for hamza on ya.
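At the display stage, the five hamza symbols of the target alphabet have to be mapped to character codes. The sketch below uses Unicode code points, which is an assumption of this example (the paper itself refers to the ASMO reduced code); carrier_code/2 and carrier_char/2 are hypothetical names.

```prolog
% carrier_code(Symbol, CodePoint): Unicode code point of each hamza form.
carrier_code(hwc, 0x0621).   % hamza without a carrier
carrier_code(hoa, 0x0623).   % hamza on alif
carrier_code(how, 0x0624).   % hamza on waw
carrier_code(hua, 0x0625).   % hamza under alif
carrier_code(hoy, 0x0626).   % hamza on ya

% carrier_char(+Symbol, -Char): the corresponding one-character atom.
carrier_char(Symbol, Char) :-
    carrier_code(Symbol, Code),
    char_code(Char, Code).
```

The positional shape of the resulting letter (isolated, initial, medial or final) is still to be chosen by contextual analysis, as described in section 1.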
2.2.3 Translation automaton 
initial states = {1}
final states = {36}
transitions

state   l    hz        su   a    u    i
1       …    …         …    …    …    …
2       0    0         0    4    5    6
3       0    0         0    7    8    9
4       …    …         …    …    …    …
5       …    …         …    …    …    …
6       16   {14,34}   0    0    0    0
7       16   0         0    0    0    0
8       0    0         0    0    0    0
9       0    0         0    0    0    0
10      0    0         21   22   26   30
11      16   {19,31}   0    0    0    0
12      0    0         24   25   26   30
13      16   {20,31}   0    0    0    0
14      0    0         27   28   29   30
15      16   {21,31}   0    0    0    0
16      0    0         17   4    5    6
17      2    {18,31}   0    0    0    0
18      0    0         0    22   26   30
19      0    0         0    23   26   30
20      0    0         0    28   29   30
21      2    0         0    0    0    0
22      16   0         0    0    0    0
23      16   0         0    0    0    0
24      16   0         0    0    0    0
25      16   0         0    0    0    0
26      16   0         0    0    0    0
27      16   0         0    0    0    0
28      16   0         0    0    0    0
29      16   0         0    0    0    0
30      16   0         0    0    0    0
31      0    0         35   0    0    0
32      0    0         35   0    0    0
33      0    0         35   0    0    0
34      0    0         35   0    0    0
35      0    0         0    0    0    0
36      0    0         0    0    0    0

state   ā    ū    ī    #    output
1       0    0    0    1    #
2       0    0    0    0    l
3       0    0    0    0    0
4       11   0    0    0    a
5       0    13   0    0    u
6       0    0    15   0    i
7       0    0    0    0    [hoa, a]
8       0    13   0    0    [hoa, u]
9       0    0    15   0    [hua, i]
10      0    0    0    0    0
11      0    0    0    36   ā
12      0    0    0    0    0
13      0    0    0    36   ū
14      0    0    0    0    0
15      0    0    0    36   ī
16      0    0    0    0    l
17      0    0    0    0    su
18      0    0    0    0    0
19      0    0    0    0    0
20      0    0    0    0    0
21      0    0    0    0    [hoa, su]
22      0    0    0    0    [hoa, a]
23      0    0    0    0    [hwc, a]
24      0    0    0    0    [how, su]
25      0    0    0    0    [how, a]
26      0    0    0    0    [how, u]
27      0    0    0    0    [hoy, su]
28      0    0    0    0    [hoy, a]
29      0    0    0    0    [hoy, u]
30      0    0    0    0    [hoy, i]
31      0    0    0    0    hwc
32      0    0    0    0    hoa
33      0    0    0    0    how
34      0    0    0    0    hoy
35      0    0    0    36   su
36      0    0    0    0    #
2.2.4 Test program results 
A PROLOG program similar to the one used for 
contextual analysis gives the following results: 
input 
\[#,hz,a,l,a,fi,l,a,hz,su,#\] 
\[#,hz,a,l,a,&l,u,hz,su,#\] 
\[#,hz,a,l,a,&l,i,hz,su,#\] 
\[#,hz,a,l,a,hz,a,l,a,~,#\] 
\[#,hz,a,l,a,hz,u,l,a,~,#\] 
\[#,hz,a,l,a,hz,i,l,a,& #\] 
\[#,hz,a,l,u,hz,a,l,a,&#\] 
\[#,hz,a,l,u,hz,u,l,a,fi,#\] 
\[#,hz,a,l,u,hz,i,l,a,~,#\] 
\[#,hz,a,l,i,hz,a,l,a,fi,#\] 
\[#,hz,a,l,i,hz,u,l,a,& #\] 
\[#,hz,a,l,i,hz,i,l,a,~,#\] 
\[#,hz,a,l,i,hz,i,l,a,fi,#\] 
\[#,hz,u,l,a,~,l,a,hz,su,#\] 
\[#,hz,i,l,a,fi,l,a,hz,su,#\] 
\[#,l,a,l,a,hz,a,l,a,~,#\] 
\[#,l,a,l,a,hz,u,l,a,fi,#\] 
\[#,l,a,l,a,hz,i,l,a,fi,#\] 
\[#,l,a,l,u,hz,a,l,a,fi,#\] 
\[#,l,a,l,u,hz,u,l,a,~,#\] 
\[#,l,a,l,u,hz,i,l,a,&#\] 
\[#,l,a,l,i,hz,a,l,a,fi,#\] 
\[#,l,a,l,i,hz,u,l,a,&#\] 
\[#,l,a,l,i,hz,i,l,a,&#\] 
\[#,l,a,fi,hz,a,l,a,hz,su,#\] 
output 
\[#,hoa,a,l,a,fi,l,a,hoa,su,#\] 
\[#,hoa,a,l,a,fi,l,u,how,su,#\] 
\[#,hoa,a,l,a,&l,i,hoy, su,#\] 
\[#,hoa,a,l,a,hoa,a,l,a,fi,#\] 
\[#,hoa,a,l,a,how,u,l,a,~,#\] 
\[#,hoa,a,l,a,hoy, i,l,a,fi,#\] 
\[#,hoa,a,l,u,how,a,l,a,&#\] 
\[#,hoa,a,l,u,how,u,l,a,~,#\] 
\[#,hoa,a,l,u,hoy,i,l,a,&#\] 
\[#,hoa,a,l,i,hoy, a,l,a,fi,#\] 
\[#,hoa,a,l,i,hoy, u,l,a,~,#\] 
\[#,hoa,a,l,i,hoy, i,l,a,fi,#\] 
\[#,hoa,a,l,i,hoy,i,l,a,~,#\] 
\[#,hoa,u,t,a,fi,l,a,hoa,su,#\] 
\[#,hua,i,l,a,fi,l,a,hoa,su,#\] 
\[#,l,a,l,a,hoa,a,l,a,~,#\] 
\[#,l,a,l,a,how,u,l,a,~,#\] 
\[#,l,a,l,a,hoy, i,l,a,&#\] 
\[#,l,a,l,u,how,a,l,a,~,#\] 
\[#,l,a,l,u,how,u,l,a,fi,#\] 
\[#,l,a,l,u,hoy, i,l,a,~,#\] 
\[#,l,a,l,i,hoy, a,l,a,fi,#\] 
\[#,l,a,l,i,hoy, u,l,a,&#\] 
\[#,l,a,l,i,hoy, i,l,a,~. #\] 
\[#,l,a,Lhss,a,l,a,hoa,su,#\] 
input 
\[#,l,a,Lhz,u,l,a,hz,su,#\] 
\[#,l,a,g,hz,i,l,a,hz,su,#\] 
\[#,l,u,l,u,fi,hz,a,l,su,#\] 
\[#,l,u,l,u,fi,hz,u,l,su,#\] 
\[#,l,u,l,u,fi,hz,i,l,su,#\] 
\[#,l,u,l,i,i,hz,a,l,su,#\] 
\[#,l,u,l,i,Lhz,u,l,su,#\] 
\[#,l,u,l,i,Lhz,i,l,su,#\] 
\[#,l,u,l,su,hz,a,l,su,#\] 
\[#,l,u,l,su,hz,i,l,su,#\] 
\[#,l,u,l,su,hz,u,l,su,#\] 
\[#,l,u,l,a,hz,su,l,a,~,#\] 
\[#,l,u,l,u,hz,su,l,a,fi,#\] 
\[#,l,u,l,i,hz,su,l,a,& #\] 
output 
\[#,l,a,~,how,u,l,a,hoa,su,#\] 
\[#,l,a,~,hoy, i,l,a,hoa,su,#\] 
\[#,l,u,l,u,fi,hss,a,l,su,#\] 
\[#,l,u,l,u,fi,how,u,l,su,#\] 
\[#,l,u,l,u,fi,hoy, i,l,su,#\] 
\[#,l,u,l,i,i,hoy,a,l,su,#\] 
\[#,l,u,l,i,i, hoy, u,l,su,#\] 
\[#,l,u,l,i,Lhoy, i,l,su,#\] 
\[#,l,u,l,su,hoa,a,l,su,#\] 
\[#,l,u,l,su,hoy,i,l,su,#\] 
\[#,l,u,l,su,how,u,l,su,#\] 
\[#,l,u,l,a,hoa,su,l,a,~,#\] 
\[#,l,u,l,u,how,su,l,a,&#\] 
\[#,l,u,l,i,hoy, su,l,a,& #\] 
CONCLUSION 
In conclusion, we hope to have shown that, through a careful choice of a formal language, linguistic rules can be specified as tractable automata.

References 
A. V. Aho and J. D. Ullman. 1972. The Theory of Parsing, Translation and Compiling, volume 1: Parsing. Prentice-Hall.
Arab League, Arab Organization for Standardization and Metrology (ASMO). 1982. Data processing 7 bit Coded Arabic Character Set for Information Interchange.
Arab School on Science and Technology, 1st Fall Session, Rabat, Morocco. 1983. Applied Arabic Linguistics and Signal & Information Processing, P.O. Box 7028, Damascus, Syria.
Arab School of Science and Technology, 7th Summer Session, Zabadani Valley, Syria. 1985. Informatics and Applied Arabic Linguistics, P.O. Box 7028, Damascus, Syria.
R. Blachère and M. Gaudefroy-Demombynes. 1952. Grammaire de l'arabe classique. G.P. Maisonneuve & Larose, 3rd edition.
1985. Computer Processing of the Arabic Language. April 14-16, 1985, Kuwait.
M. Fanton. 1997. L'écriture arabe : du manuscrit à l'ordinateur. La Tribune Internationale des Langues Vivantes, (21), May.
J. E. Hopcroft and J. D. Ullman. 1979. Introduction to Automata Theory, Languages and Computation. Addison-Wesley.
K. Lagally. 1992. ArabTeX, a system for typesetting Arabic: user manual version 3.00. Technical Report 1993/11, Universität Stuttgart, Fakultät Informatik, Breitwiesenstraße 20-22, 70565 Stuttgart, Germany. Electronic document supplied with the software.
A. Lakhdar Ghazal. 1983. L'alphabet arabe et les machines. In Applied Arabic Linguistics and Signal & Information Processing (Arab School on Science and Technology, 1983), pages 233-258.
W. Wright. 1859. A Grammar of the Arabic Language. Cambridge University Press, 3rd edition.
