ON THE GENERATIVE POWER OF TWO.LEVEL MORPHOLOGICAL RULES 
Graeme Ritchie 
Department of Artificial Intelligence 
University of Edinburgh 
80 South Bridge 
Edinburgh EH1 1HN 
Scotland 
ABSTRACT 
Koskenniemi's model of two-level morphology 
has been very influential in recent years, but 
definitions of the formalism have generally been 
phrased in terms of a compilation (sometimes 
left unspecified) into a form of finite-state trans- 
ducers, or else have consisted of an informal 
outline of the intended interpretation of the 
rule-formalism itself. Analyses of the properties 
of the formalism have generally focussed on the 
transducer mechanism. It is, however, possible 
to give a fully formal definition of the original 
rule notation directly, in a way which reflects 
Koskenniemi's original informal characterisation 
and which does not depend directly on the 
notion of a transducer (although it must retain 
the essential nature of parts of the notation as 
being regular expressions). This re-formulation 
allows a proof that the ability of this formalism 
to charaeterise mappings between strings is 
more limited than that of arbitrary transducers. 
BACKGROUND 
Koskenniemi(1983a,b;1984) proposed a 
rule-system for describing morphological regu- 
larities in a language, depending centrally on the 
idea of matching two sequences of symbols - a 
lexical string (made up of the lexical forms of 
morphemes) and a surface string (the sequence 
of characters in the normal, inflected, form of 
the word). In principle, symbols could be 
orthographic or phonological; here we shall fol- 
low common practice within two-level morphol- 
ogy, and assume orthographic forms are being 
analysed. Koskenniemi(1983a) originally 
described the rules in two alternative forms - 
high-level rules and finite-state transducers, and 
conjectured that an automatic compilation pro- 
cedure could be devised to transform the more 
readable, high-level form into the more directly 
implementable transducer form. His implemen- 
tation was an interpreter for the transducers, 
which were directly written by the linguist as 
rules in their own right. The various linguistic 
analyses presented in Dalrymple et al.(1983) 
also follow this approach, expressing rules as 
transition tables for transducers. Kosken- 
niemi(1985) refined the notation and sketched a 
compilation method, and Ritchie, Black et 
a1.(1987), Karttunen et al.(1987) describe com- 
pilation techniques for two variants of the nota- 
tion. 
The aim of this paper is to give an alter- 
native statement of the meaning of the original 
high-level rule notation, without recourse to 
compilation into finite-state transducers. The 
benefits of this are twofold: 
(i) alternative implementation techniques can 
be considered or discussed with reference 
to a standard interpretation which is not 
tied to an existing approach to implemen- 
tation; 
(ii) the formal properties of the actual rule- 
formalism can be assessed, rather than the 
formal properties of another formalism 
(transducers) which might in principle be 
more powerful. 
In particular, it is possible to show that the 
two-level morphological mechanism is more 
limited than the transducer model in its ability 
to define relationships between strings. 
THE FORMALISM 
TWO-LEVEL NOTATION 
The original notation proposed in 
Koskenniemi(1983a) included some rather com- 
plex notational conventions which have not sur- 
vived into later versions. The formalisation 
given here will deal only with the core ideas, as 
embodied in Koskenniemi(1985) (and other 
implementations such as Karttunen et al.(1987), 
-51 - 
Ritchie, Black, et a1.(1987)). By way of illus- 
tration, here is a two-level morphological rule 
taken from Ritchie, Pulman et al. (1987): 
e:0 <=> =:C2 < +:0 V:= > 
or < C:C V:V> <+:0 e:e> 
or {g:g c:c} <+:0 {e:e i:i} > 
or 1:0 +:0 
or c:c <+:0 a:0 t:t> 
Rules are phrased in terms of symbol-pairs 
(written with an infix colon), where the first in 
the pair is a lexical symbol and the second is a 
surface symbol. In the above example, the pair 
of symbols on the left (lexical "e" and surface 
null) are allowed to occur only in the contexts 
listed on the right of the rule, where .... indi- 
cates the position of the pair "e:0". Each context 
has a left part and a right part, each of these 
being essentially a regular expression over 
symbol-pairs, where angle brackets indicate 
sequences of pairs and braces indicate alterna- 
tives (disjunction). Certain versions of the nota- 
tion may also allow the "Kleene star" symbol 
"*" to indicate zero or more repetitions, and the 
insertion of optional elements. In this example, 
"C", "V", "C2", and "=" represent subsets of the 
relevant symbol alphabets and "+" is an abstract 
symbol occurring in certain lexical forms. 
The formalism here will not include sym- 
bolic mnemonics for sets of symbols, nor vari- 
ables ranging over sets of symbols. The seman- 
tics of both these notations (which are com- 
monly used in two-level morphology) can be 
stated in terms of equivalent sets of rules 
without such abbreviatory conventions, so all 
that is required is a definition of the interpreta- 
tion of rules containing only actual character 
symbols, together with the various devices for 
indicating disjunction, repetition, etc. (Most of 
the latter could also be ignored here by a simi- 
lar assumption, but the presentation is perhaps 
easier to follow if the resemblance to the actual 
notation is retained). 
One of the more peripheral aspects of 
two-level morphology is the role of the rules in 
segmenting surface input strings into lexical 
forms (i.e. the interface between a rule inter- 
preter and a lexicon of morphemes). It is only 
there that the special null symbol "0" takes on 
special significance (see later section). Hence 
most of the definitions, and the subsequent dis- 
eussion of generative power, are concerned with 
sequences of symbol-pairs, which is equivalent 
to considering only pairs of strings of equal 
length. 
BASIC DEFINITIONS 
Given any two finite symbolic alphabets, 
A and A', a symbol-pair from A and A' is a 
pair <a, a'> where a ~ A and a' ~ A'. Such 
symbol-pairs will normally be written as "a:a'". 
A symbol-pair sequence from A and A' is sim- 
ply a sequence (possibly empty) of symbol-pairs 
from A and A', and a symbol-pair language 
over A and A' is a set of symbol-pair sequences 
(i.e. a subset of (A x A')*). 
Given two alphabets A and A', and a 
symbol-pair sequence S from A and A', a 
sequence <P1,..P~> of symbol-pair sequences 
from A and A' is said to be a partition of S iff 
S = P1P2....Pn (i.e. the concatenation 
of the P3 
CONTEXTS AND '.RULES 
Given two symbol sets A and A', a 
context-expression from A and A' is a regular 
expression over A x A'. That is, a context- 
expression characterises a regular set of 
sequences of symbol-pairs. For example, the 
expression 
b:b v (a:a b:b)* 
characterises the set 
{e, b:b, a:a b:b, a:a b:b a:a b:b ..... } 
where e denotes the empty sequence. 
Given two alphabets A and A', a two- 
level morphological rule over A and A' consists 
of a pair <P, C> where P is a symbol-pair from 
A and A', and C is a non-empty set of pairs 
<LC,RC> where LC and RC are context- 
expressions from A and A'. The reason for 
including a set of pairs of contexts, is that we 
must cater, in the g~meral case, for there being a 
disjunction of pairs of contexts (as in the illus- 
trative example above, where the disjuncts are 
separated by "or"). fin the case where the set is a 
singleton, this reduces to the simple (non- 
disjunctive) case. 
A context-expression ce is said to match 
at the right-end a symbol-pair sequence S iff 
there is a partition ",~v 1, P2> of S such that Pz is 
an element of the set characterised by ce. 
A context-expression ce is said to match 
at the left-end a symbol-pair sequence S iff 
there is a partition <P1, P2 > of S such that P1 is 
- 52 - 
an element of the set characterised by ce. 
In a two-level morphological grammar, 
there are generally three sorts of rule, although 
one of them can be re-expressed as a combina- 
tion of rules of the two more basic sorts. The 
first basic form of rule is the context restriction 
rule written with the operator "=>" separating 
the symbol-pair from the specification of the 
contexts. For example, 
l:i => b:b e:e 
would mean "if there is a lexical 1 paired with a 
lexical i, then there must be a lexical and sur- 
face b on its left, and a lexical e and surface e 
on its right". 
On the other hand, a surface coercion 
rule, written using the operator "<--" indicates 
that wherever the contexts (i.e. the right side of 
the rule) occur, and the lexical symbol is as 
given in the pair on the left side of the rule, 
then the surface symbol must be as given on the 
left side of the rule. For example: 
15 <= b:b e:e 
would mean that "whenever there is a lexical b 
and surface b on the left, a lexical e and surface 
e on the right, and a lexical 1, then the surface 
symbol must be i". 
The third type of rule, illustrated earlier, 
uses the "<=>" operator, and is defined to be 
equivalent to a pair of rules, one of each of the 
two basic types, but with the same content. 
Hence, no formal definition will be given of the 
third type of rule, on the grounds that a gram- 
mar written using the "<=>" operator is merely 
an abbreviation for a larger set of rules of the 
two basic types. We will first define the form 
of restriction imposed by rules normally written 
with the "=>" operator ("context restriction" 
rules). 
A set R of two-level morphological rules 
contextually allows a symbol-pair sequence S 
iff, for every partition <P1, a:a', P2> of S, either 
there is no rule of the form <a:a', C> in R, or 
there is at least one rule <a:a', C> in R such 
that C contains a context pair <LC, RC> such 
that LC matches P1 at the right end and RC 
matches P2 at the left end. 
The definition corresponding to a "surface 
coercion" rule (operator "<=") is as follows. 
A two-level morphological rule R = <<a,a'>, 
C> coercively allows a symbol-pair sequence S 
iff for every possible partition <Px, b:b', P2> of 
S and every element <LC, RC> in C such that 
LC matches P1 at the right end, and RC 
matches/'2 at the left end, if b = a, then b' = a'. 
An alternative but equivalent variation on 
the last definition would be that a two-level 
morphological rule R = <<a,a'>, C> coercively 
disallows a symbol-pair sequence S iff there is a 
possible partition <P1, b:b', P2 > of S and an ele- 
ment <LC, RC> in C such that LC matches P1 
at the right end, RC matches P2 at the left end, 
b = aand b' # a'. 
TWO-LEVEL GRAMMARS 
Given two alphabets A and A', a two- 
level morphological grammar based on A and 
A' consists of a pair <CR, SC> where CR and 
SC are finite sets of two-level morphological 
rules over A and A'. The two sets of rules are 
the context restriction and surface coercion rules 
respectively. 
One minor detail which must now be con- 
sidered is the question of feasible pairs. When 
set-mnemonics and variables are used within 
rules, these are deemed to cover not all possible 
symbol-pairs, but only those which are "feasi- 
ble". Even when not using these abbreviatory 
devices, it is necessary to have some notion of 
feasible symbol-pair, since such pairs are 
allowed to occur freely even if licensed by no 
rule (providing no rule forbids them). Usually, 
pairs of the form x:x (where x is in the intersec- 
tion of the two alphabets) are taken as feasible, 
but any pairs which appear in a rule are also 
deemed feasible. If we assume that the notion 
of a symbol-pair occurring in a regular expres- 
sion is clear enough, occurrence within a rule 
set is straightforward-- a symbol-pair a:a' is 
said to occur in a rule <b:b', C> iff either a:a' = 
b:b' or for at least one element <LC, RC> of C, 
a:a' occurs in at least one of LC and RC. 
Given a two-level morphological grammar G = 
<CR, SC>, the set of feasible pairs in G is the 
set of symbol-pairs 
{a:a' I 
a:a' occurs in some element of CR u SC} 
(In an implemented system, the user may be 
allowed to declare certain pairs as feasible, but 
at this level of abstraction we do not need to 
include this in our definition of a two-level mor- 
phological grammar, since such an effect could 
be represented by including rather vacuous 
context-restriction rules of the form 
- 53 - 
< a:b, { <o,o> } > ). 
Given a two-level morphological grammar 
G = <CR, SC>, a symbol-pair sequence S is 
generated by G iff all the following hold: 
(i) all the symbol-pairs in S are feasible pairs 
in G; 
(ii) each rule in SC coercively allows S; 
(iii) the set CR of rules contextually allows S. 
Notice that the two classes of rules are 
treated slightly differently - surface coercion 
rules are conjoined, forming a set of constraints 
all of which must be met, and context restriction 
rules are disjoined, giving a set of possible 
licensing contexts. If no rules apply to a particu- 
lar symbol-pair, it is acceptable if and only if it 
is feasible. 
THE LEXICON 
The mechanisms described so far have 
provided a way of relating one sequence of 
symbols to another sequence (of the same 
length). There has been little or no asymmetry 
between the roles played by the two sequences, 
and no explicit indication of how these rules 
might achieve the practical task of segmenting a 
word into a set of lexical forms which appear in 
a given dictionary. 1 The first convention that is 
needed is quite simple - the string of lexical 
symbols is regarded as being supplied by any 
valid concatenation of lexical forms. That is, the 
set of lexical entries implicitly defines an 
infinite set of strings of indefinite length, formed 
by any concatenation of lexical forms. It is in 
the course of integrating the string-matching 
with the segmentation that the special null sym- 
bol will be needed, so we must first define the 
notion of two strings being the same after the 
removal of nulls. 
Suppose we have some symbolic alphabet A. 
We define the function "delete" from A x A* to 
A* as follows, where, e denotes the empty 
string: 
delete(a, ~) = e 
delete(a, aS) ffi delete(a, S) 
delete(a, bS) = b delete(a,S) for any b ~ a. 
x The fonnal argtnnents concerning generative 
power concern only the mechanisms presented so far, 
so readers uninterested in the interface to the lexicon 
may skip this section. 
The other minor formal definition we 
need is to allow us to move from equal-length 
sequences of symbol-pairs to pairs of equal- 
length symbol-sequences in the obvious way. 
Suppose $1 and $2 are two sequences of sym- 
bols, of equal length, with Sl = al...an and $2 
= bl...bn. Then the symbol-pair sequence asso- 
ciated with $1 and $2 is the sequence 
al :bl....an:b,, 
We can then define a two-level morphological 
grammar as licensing a pair of strings of equal 
length, iff their associated symbol-pair sequence 
is generated by the grammar. 
A lexical segmentation system consists of 
a tuple (AL, AS, 0, L, (3) where AL is a finite 
set (the lexical alphabet ), AS is a finite set 
(the surface alphabet ), 0 is a symbol which is 
not an element of AL u AS, L is a set (the set 
of lexical forms ) of non-null elements of AL*, 
and G is a two-level morphographemic grammar 
based on AL u {0} and AS u {0}. 
Given a lexical segmentation system 
(AL, AS, 0, L, G), a siring S e AS* can be 
segmented as <ll,...l~> where li ~ L for all i, if 
there are strings S 1 ~ AL*, $2 ~ AS* such that 
the following all hold: 
delete(0, $1) = lll2....In 
delete(0, $2) = S 
G licenses <$1, $2> 
Notice that there is no distinguished symbol 
indicating a morpheme boundary or word boun- 
dary. Although the writer of the two-level rules 
will probably find it useful to insert certain spe- 
cial symbols (e.g. the "+" used in the example 
above), these have no special significance, and 
rules must be written to define how they relate 
to other symbols. The boundaries between mor- 
phemes are implicit in the successful match 
between the surface form (via the two-level 
rules) and the concatenated sequence of lexical 
forms. 
CROSS-LINKED LEXICONS 
In Koskenniemi(1983a) (and in the papers in 
Dalrymple et al.(1983)) the interface to the lexi- 
con is slightly more complicated, since the 
representation of morphotactic information is 
built into the interface, in the following way. A 
lexical entry (for a single morpheme) contains 
one or more con~!nuation classes which indicate 
what categories of morpheme might follow it 
within a valid word; for example, a noun stem 
- 54 - 
is marked as allowing a noun suffix as a possi- 
ble continuation. The morphemes are not held in 
a single, uniform dictionary, but in a set of sub- 
lexicons, where each lexicon corresponds to 
some single morphotactic class. Hence, when 
the lookup process has found a particular mor- 
pheme (say, a noun stem) by matching entries 
in the noun-stem sublexicon, the indication that 
noun-suffix is a possible continuation will cause 
the lookup process to continue scanning in the 
noun suffix sublexicon as it matches the input 
word from left to fight. This can be rephrased 
in a more declarative way by stating that an 
input string S corresponds to a sequence of lexi- 
cal forms wt,...w,, if S matches wtw2 ...w, (the 
concatenation of the forms) according to the 
morphographemic rules, and for each i between 
1 and n-l, wi+t is in a continuation class of wl. 
A lexical segmentation system would then have 
to include a function which mapped each lexical 
entry to its set of continuation classes. Hence 
the definitions given above would have to be 
altered to the following. 
A lexical segmentation system consists of 
a tuple (AL, AS, 0, {Lt,.. L,}, f, G) where AL is 
a finite set (the lexical alphabet ), AS is a finite 
set (the surface alphabet ), 0 is a symbol which 
is not an element of AL u AS, {Li} is a finite 
set of finite sets of non-null elements of AL* 
(the sublexicons), f is a function which associ- 
ates with each pair <w, j> (where w ~ L/) a 
subset of {Lt,...Ln} (the continuation class map- 
ping) and G is a two-level morphographemic 
grammar based on At. u {0} and AS u {0}. 
Given a lexical segmentation system 
(AL, AS, 0, L, f, G), a string S in AS* can be 
segmented as <It,J,,> where lj in Lso ) for each 
j, ff there are strings St in AL*, $2 in AS* such 
that the following all hold: 
delete(0, $1) -- ltl2....ln 
delete(0, $2) = S 
G licenses <St, $2> 
Ls~t ) ~ f(l/, gO)) for each j from 1 to n-1 
The advantage of introducing cross-linked lexi- 
cons is that some form of morphotactic informa- 
tion can be inserted directly into the lexicon, 
and the processing of this information incor- 
porated into the scanning of the surface string 
very easily. One theoretical disadvantage is that 
it imposes a finite-state structure on the morpho- 
tactics, which may well be undesirable. If 
cross-linked sublexicons are not used, some 
further descriptive device is needed to express 
morphotactic information in a usable form, but 
this could be completely separate from the two- 
level morphology system (cf. Ritchie, Pulman et 
al.(1987)). 
LANGUAGES GENERATED 
With the above definitions, it is now pos- 
sible to ask what sorts of symbol-pair languages 
can be characterised using a two-level morpho- 
logical grammar. Here we shall ignore the issue 
of the interface to the lexicon, and simply con- 
sider the capacity of two-level morphological 
grammars to characterise sets of sequences of 
symbol-pairs. 
Lemma 1 • Let R be a set of two-level morpho- 
logical rules. Let EI and E2 be symbol-pair 
sequences such that R contextually allows El, 
and R contextually allows E2. Then R contextu- 
ally allows the concatenation ERE2. 
Proof : If there is no symbol-pair a:a' in ErE 2 
such that there is some rule <<a:a'>, C> in R, 
then R contextually allows ErE 2 for trivial rea- 
sons. Let a:a' be a symbol-pair occurring in 
ErE2 such that there is at least one rule 
<<a:a',C> in R. Let <Pt, a:a',P2> be a partition 
of ERE2. It follows from the definitions of a 
partition and concatenation that either Pt is a 
proper initial subsequence of E1 and/>2 = $2E2 
for some sequence $2 (i.e. this occurrence of 
a:a' is in El), or Pt = EtS1 for some sequence 
S t and P2 is proper final subsequence of E 2 (i.e. 
this occurrence of a:a' is in E2). That is, either 
<Pt, a:a', $2> is a partition of El, or <St, a:a', 
P2> is a partition of E 2. Assume the former is 
true (a symmetrical argument can be followed 
for the latter). Since R contextually allows E 1, 
for the partition <Pt, a:a', $2> of El there is at 
least one rule C in R which contains at least one 
context-pair <LC, RC> such that LC matches Pt 
at the right end and RC matches $2 at the left 
end. If RC matches $2 at the left end, then RC 
will also match $2E2 = P2 at the left end. Hence, 
for the partition <P1, a:a', P2> of ErE 2 there is 
at least one rule C in R which contains at least 
one context-pair <LC, RC> such that LC 
matches Pt at the fight end and RC matches P2 
at the left end. A similar argument can be 
given for the occurrence of a:a' being in E 2. 
Since this will be true for any such a:a' in EiE2, 
R contextually allows ErE 2. 
Lemma 2 : Let R = <a:a', C> be a two-level 
morphological rule. Let El, E2, E3 be symbol- 
~.-F - 55 - 
pair sequences such that E1E2E3 is coercively 
allowed by R. Then E2 is coercively allowed by 
R. 
Proof : If E 2 were not coercively allowed by R, 
it would mean that there is a partition <$1, a:b, 
$2> of E2 such that for some <LC, RC> in C, 
LC matches S~ at the right end, RC matches $2 
at the left end, and b ~ a'. If this were the 
case, there would be a corresponding partition 
<E~S1, a:b, $2E3> of E~E2E 3, with LC matching 
E1S1 at the right end, and RC matching $2E3 at 
the left end. This would (by definition) mean 
that R does not coercively allow E~E2E 3, which 
is not the case by hypothesis. 
Corollary : Let C be a set of two-level morpho- 
logical rules, all of which coercively allow a 
symbol-pair sequence E. Then all of the rules in 
C coercively allow any subsequence of E. 
Lemma 3 : Let G be a two-level morphological 
grammar <CR, SC>, and let L(G) be the set of 
symbol-pair sequences generated by G. Sup- 
pose that there are sequences E 1, E2, E B, E 4 such 
that E2 ~ L(G), E3 ~ L(G), and E1E2E3E4 
L(G). Then E2E3 ~ L(G). 
Proof: (i) Since E1E2E3E4 ~ L(G), all the 
symbol-pairs in it are feasible with respect to G, 
hence all the symbol-pairs in E2E3 are feasible. 
(ii) Since E2 and E3 are in L(G), it follows that 
CR contextually allows E2 and E3 (by 
definition). By Lemma 1 above, this means that 
CR contextually allows E2E 3. 
(iii) Since E1E2E3E4 ~ L(G), it follows (by 
definition) that all of the rules in SC coercively 
allow EIE2E3E4. Hence, by the corollary to 
Lemma 2 above, all of the rules in SC coer- 
cively allow E2E3. 
This establishes the three defining conditions 
for E2E3 ~ L(G). 
REGULAR RELATIONS 
As mentioned in the introduction, two- 
level grammars have historically been written in 
two different ways-- as rules as defined here, 
and as sets of finite-state transducers. In the 
latter case, each transducer deals with some 
linguistic phenomenon, and a sequence of 
symbol-pairs is generated by the grammar if 
every transducer in the grammar accepts it. 
That is, the symbol-pair sequence must be in the 
intersection of the languages accepted by the 
transducers (viewed as acceptors); in procedural 
terms, this is often referred to as "having the 
transducers executed in parallel". Hence, when 
working with the transducer formalism the 
linguist has to devise independent transducers 
whose intersection is the required language. 
Kaplan(1988) discussed the notion of a 
regular relation, which is, roughly speaking, a 
symbol-pair language which can be character- 
ised by a regular expression of symbol-pairs. 
Not surprisingly, a set of symbol-pair sequences 
is regular if and only if it can be accepted by a 
finite-state transducer in the obvious way. 
Kaplan has developed an algebraic way of 
manipulating regular expressions over symbol- 
pairs together with ordinary regular expressions 
over symbols, and one of his results is that the 
intersection of several regular relations is also a 
regular relation. It follows that the symbol-pair 
languages accepted by the two-level transducer 
model are exactly the regular relations. 
Kaplan also formalises the re-expression 
of two-level morphological rules as transducers 
(i.e. the compilation mentioned in the introduc- 
tion above) by constructing regular relations 
equivalent to languages generated by individual 
two-level morphological rules. This re- 
expression is one-way - from a two-level mor- 
phological rule an equivalent regular relation 
can be formed. 
All this suggests that the "parallel trans- 
ducer" model is at least as powerful as the strict 
two-level grammar model defined earlier. The 
obvious question is whether there is a difference 
in power; in fact, there is: 
Theorem: There are regular relations (i.e. 
symbol-pair languages characterised by regular 
expressions of symbol-pairs) which cannot be 
generated by any two-level morphological gram- 
mar. 
Proof." This follows directly from Lemma 3 
above. Any language L generated by a two- 
level morphological grammar must have the 
property that if E2, E3, and E1E2E3E4 are in L, 
then E2E3 is in L. There are regular relations 
which do not have this property, such as the 
language b:b v (a:a b:b)* mentioned earlier 
(which contains b:b and a'a b:b but not b:b a:a 
b:b, even though that sequence is a subsequence 
of other elements of the language). 
There is another, rather trivial, difference 
between the power of two-level morphological 
rules and regular relations. According to the 
definitions given here, the empty sequence of 
symbol-pairs is in every language generated by 
a two-level morphological grammar, since it 
- 56 - 
conforms to the definition regardless of the con- 
tent of the rules. The definitions could be 
altered to exclude the empty sequence from 
every language, but it is hard to see how the 
rule mechanism could be used to allow the 
empty sequence in some languages but not oth- 
ors. 
CONCLUSIONS 
We have presented an alternative formal 
statement of the meaning of Koskenniemi's 
notation for two-level morphological rules. This 
definition appears to be wholly faithful to the 
original informal explanations of the intent of 
two-level morphological rules, but is indepen- 
dent of the expression of the rules as transduc- 
ers. The generative power of two-level morpho- 
logical grammars, viewed as ways of charac- 
terising sets of sequences of symbol-pairs, is 
less than that of arbitrary transducers, despite 
the fact that the transducer formulation is some- 
times discussed as if it were the essential 
definition of the two-level model. 
It now remains to determine further pro- 
perties of the set of two-level generatable 
languages. Barton et al.(1987) have shown that 
the recognition problem for two-level transduc- 
ers (including cross-linked lexicons) is NP- 
complete, and a very similar demonstration can 
be constructed for the two-level model defined 
here. Closure properties (or lack thereof) of the 
two-level generatable languages are yet to be 
proven. 
ACKNOWLEDGEMENTS 
This work was supported by SERC/Alvey 
grant GR/D/83507 (IKBS 096). I would like to 
thank Alan Black for useful discussions of this 
material. 
REFERENCES 
Dalrymple, Mary; Doron, Edit; Goggin, 
John; Goodman, Beverley; and McCarthy, John 
(eds) 1983 Texas Linguistic Forum 22, Depart- 
ment of Linguistics, University of Texas at Aus- 
tin, Austin, Texas. 
Barton, G. Edward; Berwick, Robert C.; 
and Ristad, Eric Sven 1987 Computational 
Complexity and Natural Language. MIT Press, 
Cambridge, Mass. 
Kaplan, Ronald M. 1988 Talk on finite- 
state transducers given at the Alvey Workshop 
on Parsing and Pattern Recognition, Oxford, 
April 1988. 
Karttunen, Lauri; Koskenniemi, Kimmo; 
and Kaplan, Ronald M. 1987 A Compiler for 
Two-level Phonological Rules. Unpublished 
manuscript. 
Koskenniemi, Kimmo 1983a Two-level 
Morphology: a general computational model for 
word-form recognition and production. Publica- 
tion No.ll, University of Helsinki, Finland. 
Koskenniemi, Kimmo 1983b Two-level 
model for morphological analysis. Pp. 683-685 
in Proceedings of the Eighth International Joint 
Conference on Artificial Intelligence, Karlsruhe. 
Koskenniemi, Kimmo 1984 A General 
Computational Model for Word-Form Recogni- 
tion and Production. Pp. 178-181 in Proceed- 
ings of COLING-84 (10th International Confer- 
ence on Computational Linguistics/22nd Annual 
Meeting of the ACL), Stanford, CA. 
Koskenniemi, Kimmo 1985 Compilation 
of Automata from Morphological Two-Level 
Rules. Pp. 143-149 in Papers from the Fifth 
Scandinavian Conference of Computational 
Linguistics , Publication No.15, University of 
Helsinki, Finland. 
Reape, Mike; and Thompson, Henry 1988 
Parallel Intersection and Serial Composition of 
Finite State Transducers. Pp.535-539 in 
Proceedings of COLING-88 (12th International 
Conference on Computational Linguistics). 
Bonn. 
Ritchie, Graeme D.; Black, Alan W.; Pal- 
man, Stephen G.; and Russell Graham J. 1987 
The Edinburgh/Cambridge Morphological Ana- 
lyser and Dictionary System: System Descrip- 
tion. Version 3.0. Software Paper 11, Depart- 
ment of Artificial Intelligence, University of 
Edinburgh. 
Ritchie, Graeme D.; Pulman, Stephen G.; 
Black, Alan W.; and Russell, Graham J. 1987 
A Computational Framework for Lexical 
Description. Computational Linguistics 13, (3- 
4):290-307. 
- 57 - 
