COMPUTATIONAL COMPLEXITY IN TWO-LEVEL 
MORPHOLOGY 
G. Edward Barton, Jr. 
M.I.T. Artificial Intelligence Laboratory 
545 Technology Square 
Cambridge, MA 02139 
ABSTRACT 
Morphological analysis must take into account the 
spelling-change processes of a language as well as its possi- 
ble configurations of stems, affixes, and inflectional mark- 
ings. The computational difficulty of the task can be clari- 
fied by investigating specific models of morphological pro- 
cessing. The use of finite-state machinery in the "two- 
level" model by Kimmo Koskenniemi gives it the appear- 
ance of computational efficiency, but closer examination 
shows the model does not guarantee efficient processing. 
Reductions of the satisfiability problem show that finding 
the proper lexical/surface correspondence in a two-level 
generation or recognition problem can be computationally 
difficult. The difficulty increases if unrestricted deletions 
(null characters) are allowed. 
INTRODUCTION 
The "dictionary lookup" stage in a natural-language 
system can involve much more than simple retrieval. In- 
flectional endings, prefixes, suffixes, spelling-change pro- 
cesses, reduplication, non-concatenative morphology, and 
clitics may cause familiar words to show up in heavily dis- 
guised form, requiring substantial morphological analysis. 
Superficially, it seems that word recognition might poten- 
tially be complicated and difficult. 
This paper examines the question more formally by in- 
vestigating the computational characteristics of the "two- 
level" model of morphological processes. Given the kinds 
of constraints that can be encoded in two-level systems, 
how difficult could it be to translate between lexical and 
surface forms? Although the use of finite-state machin- 
ery in the two-level model gives it the appearance of com- 
putational efficiency, the model itself does not guarantee 
efficient processing. Taking the Kimmo system (Kart- 
tunen, 1983) for concreteness, it will be shown that the 
general problem of mapping between \]exical and surface 
forms in two-level systems is computationally difficult in 
the worst case; extensive backtracking is possible. If null 
characters are excluded, the generation and recognition 
problems are NP-complete in the worst case. If null charac- 
ters are completely unrestricted, the problems is PSPACE- 
complete, thus probably even harder. The fundamental 
difficulty of the problems does not seem to be a precompi- 
lation effect. 
In addition to knowing the stems, affixes, and co- 
occurrence restrictions of a language, a successful morpho- 
logical analyzer must take into account the spelling-change 
processes that often accompany affixation. In English, 
the program must expect love+ing to appear as loving, 
fly+s as flies, lie+ing as lying, and big+er as bigger. 
Its knowledge must be sufficiently sophisticated to distin- 
guish such surface forms as hopped and hoped. Cross- 
linguistically, spelllng-change processes may span either a 
limited or a more extended range of characters, and the 
material that triggers a change may occur either before or 
after the character that is affected. (Reduplication, a com- 
plex copying process that may also be found, will not be 
considered here.) 
The Kimmo system described by Karttunen (1983} is 
attractive for putting morphological knowledge to use in 
processing. Kimmo is an implementation of the "two-level" 
model of morphology that Kimmo Koskenniemi proposed 
and developed in his Ph.D. thesis. I A system of lexicons in 
the dictionary component regulates the sequence of roots 
and affixes at the lexical level, while several finite-state 
transducers in the automaton component -- ~ 20 transduc- 
ers for Finnish, for instance -- mediate the correspondence 
between lexical and surface forms. Null characters allow 
the automata to handle insertion and deletion processes. 
The overall system can be used either for generation or for 
recognition. 
The finite-state transducers of the automaton compo- 
nent serve to implement spelling changes, which may be 
triggered by either left or right context and which may 
ignore irrelevant intervening characters. As an example, 
the following automaton describes a simplified "Y-change" 
process that changes y to i before suffix es: 
IUniversity of Helsinki, Finland, circa Fall 1983. 
53 
"Y-Change" 5 5 
y y * s = (lexicalcharacters) 
i y = s = (surface characters) 
state 1: 2 4 1 1 1 (normal state) 
state 2. 0 0 3 0 0 (require *s) 
state 3. 0 0 0 1 0 (require s) 
state 4: 2 4 8 1 1 (forbid+s) 
state S: 2 4 1 0 1 (forbids) 
The details of this notation will not be explained here; 
basic familiarity with the Kimmo system is assumed. For 
further introduction, see Barton (1985), Karttunen (1983), 
and references cited therein. 
THE SEEDS 
OF COMPLEXITY 
At first glance, the finite-state machines of the two- 
level model appear to promise unfailing computational ef- 
ficiency. Both recognition and generation are built on the 
simple process of stepping the machines through the input. 
Lexical lookup is also fast, interleaved character by charac- 
ter with the quick left-to-right steps of the automata. The 
fundamental efficiency of finite-state machines promises to 
make the speed of Kimmo processing largely independent 
of the nature of the constraints that the automata encode: 
The most important technical feature of Kosken- 
niemi's and our implementation of the Two-level 
model is that morphological rules are represented 
in the processor as automata, more specifically, as 
finite state transducers .... One important conse- 
quence of compiling \[the grammar rules into au- 
tomata\] is that the complexity of the linguistic de- 
scription of a language has no significant effect on 
the speed at which the forms of that language can 
be recognized or generated. This is due to the fact 
that finite state machines are very fast to operate 
because of their simplicity .... Although Finnish, 
for example, is morphologically a much more com- 
plicated language than English, there is no differ- 
ence of the same magnitude in the processing times 
for the two languages .... \[This fact\] has some psy- 
cholinguistie interest because of the common sense 
observation that we talk about "simple" and "com- 
plex" languages but not about "fast" and "slow" 
ones. (Karttunen, 1983:166f) 
For this kind of interest in the model to be sustained, it 
must be the model itself that wipes out processing diffi- 
culty, rather than some accidental property of the encoded 
morphological constraints. 
Examined in detail, the runtime complexity of Kimmo 
processing can be traced to three main sources. The rec- 
ognizer and generator must both run the finite-state ma- 
chines of the automaton component; in addition, the recog- 
nizer must descend the letter trees that make up a lexicon. 
The recognizer must also decide which suffix lexicon to ex- 
plore at the end of an entry. Finally, both the recognizer 
and the generator must discover the correct lexical-surface 
correspondence. 
All these aspects of runtime processing are apparent 
in traces of implemented Kimmo recognition, for instance 
when the recognizer analyzes the English surface form 
spiel (in 61 steps) according to Karttunen and Witten- 
burg's (1983) analysis (Figure 1). The stepping of trans- 
ducers and letter-trees is ubiquitous. The search for the 
lexical-surface correspondence is also clearly displayed; for 
example, before backtracking to discover the correct lexi- 
cal entry spiel, the recognizer considers the lexical string 
spy+ with y surfacing as i and + as e. Finally, after finding 
the putative root spy the recognizer must decide whether 
to search the lexicon I that contains the zero verbal ending 
of the present indicative, the lexicon AG storing the agen- 
tive suffix *er, or one of several other lexicons inhabited 
by inflectional endings such as +ed. 
The finite-state framework makes it easy to step the 
automata; the letter-trees are likewise computationally 
well-behaved. It is more troublesome to navigate through 
the lexicons of the dictionary component, and the cur- 
rent implementation spends considerable time wandering 
about. However, changing the implementation of the dic- 
tionary component can sharply reduce this source of com- 
plexity; a merged dictionary with bit-vectors reduces the 
number of choices among alternative lexicons by allowing 
several to be searched at once (Barton, 1985). 
More ominous with respect to worst-case behavior is 
the backtracking that results from local ambiguity in the 
construction of the lexical-surface correspondence. Even 
if only one possibility is globally compatible with the con- 
straints imposed by the lexicon and the automata, there 
may not be enough evidence at every point in processing 
to choose the correct lexical-surface pair. Search behavior 
results. 
In English examples, misguided search subtrees are 
necessarily shallow because the relevant spelling-change 
processes are local in character. Since long-distance har- 
mony processes are also possible, there can potentially be 
a long interval before the acceptability of a lexical-surfaee 
pair is ultimately determined. For instance, when vowel 
alternations within a verb stem are conditioned by the oc- 
currence of particular tense suffixes, the recognizer must 
sometimes see the end of the word before making final de- 
cisions about the stem. 
54 
Recognizing surface form "spiel". 
1 s 1.4.1.2.1.1 
2 sp 1.1.1.2.1.1 
3 spy 1.3.4.3.1.1 
4 "spy" ends, new lelXlCOn N 
5 "0" ends. new lexicon C1 
6 spy XXX extra input 
7 (5) spy+ 1.5.16.4.1.1 
8 spy+ XXX 
9 (5) spy + 1.8.1.4.1.1 
10 spy+ XXX 
11 (4) "spy" ends, new lextcon 1 
12 spy XXX extra tnput 
13 (4) "spy" ends, new lexicon P3 
14 spy+ 1.6.1.4.1.1 
15 spy+ XXX 
16 (14) spy+ 1,8.18.4.1.1 
17 spy+ XXX 
18 (4) "spy" ends, new lextcon PS 
19 spy+ 1.6.1.4.1.1 
20 spy+e 1.1.1.1.4.1 
Zl spy+e XXX 
22 (20) spy÷e 1.1.4.1.3.1 
23 spy+e XXX 
24 (19) spy+ 1.8.16.4.1.1 
25 spy+e XXX Epenthesls 
26 (4) "spy" ends, new lexicon PP 
27 spy+ 1.6.1.4.1.1 
28 spy+e 1.1.1.1.4.1 
zg spy+e XXX 
30 (28) spy+e 1.1.4.1.3.1 
31 spy+e XXX 
32 (27) spy+ 1.8.18.4.1.1 
33 spy+e XXX Epenthests 
34 (4) "spy" ends. new lexicon PR 
35 spy+ 1.6.1.4,1,1 
36 spy+ XXX 
37 (38) spy+ 1.8.16.4.1.1 
38 spy+ XXX 
39 (4) "spy" ends. new lextcon AG 
40 spy+ 1.6.1.4.1. I 
41 spy+e 1.1.1.1.4.1 
42 spy+e XXX 
43 (41) spy+e 1.1.4,1.3,1 
44 spy+e XXX 
45 (40) spy+ 1.8.16.4.1.1 
46 spy+e XXX Epenthests 
47 (4) "spy" ends. new lextcon AB 
48 spy+ 1,8.1.4.1.1 
49 spy+ XXX 
50 (48) Spy+ 1,5.18.4.1.1 
51 spy÷ XXX 
52 (3) spt 1.1.4.1.2.8 
53 spte 1.1.16.1.6.1 54 spte XXX 
58 (53) sple 1.1.16.1.5.6 
56 spiel 1.1.16.2, I. I 
57 "spiel" ends. new lextcon N 
58 "0" ends. new lexicon Cl 
59 "spiel" *** result 
60 (58) spiel+ 1.1.18.1.1.1 
61 spiel+ XXX 
"--+--'+---+ILL+LLL+III+ 
-~-+xxx+ l 
---+XXX+ 
LLL+\]H+ 
I LLL÷---+XXX÷ 
-~-+XXX+ 
LLL+---+-*-+XXX+ _l_+xxx÷ 
-o-+AAA+ 
LLL+---+---+XXX+ 
!:i:: ,x,. 
LLL+---+XXX+ 
-!-+XXX+ 
LLL+---+---÷XXX+ 
I ---÷XlX+ 
---+---+XXX+ 
I ---+---+LLL+LLL+**-÷ 
I ---+XXX+ 
Key to tree nodes: 
--- normal treversal 
LLL new lexicon 
AAA blocking by automata 
XXX no lexlcal-surface pairs 
compatible with surface 
char and dictionary 
III blocking by leftover input 
*'* analysis found 
(("spiel" (N SG))) 
Figure \]: These traces show the steps that the KIMMOrecognizer for English goes through while 
analyzing the surface form spiel. Each llne of the table oil the left shows the le\]dcal string and 
automaton states at the end of a step. If some autoz,mton blocked, the automaton states axe replaced 
by ~, XXI entry. An XXX entry with no autonmto,, n:une indicates that the \]exical string could not 
bc extended becau,~e the surface c\],aracter .'tnd h,xical letter tree together ruh'd out ,-dl feasible p,'drs. 
After xn XXX or *** entry, the recognizer backtracks and picks up from a previous choice point. 
indicated by the paxenthesized step l*lU,zl)er before the lexical .~tring. The tree Ol, the right depicts 
the search graphically, reading from left to right and top t. \])ottoln with vertir;d b;trs linking the 
choices at each choice point The flhntres were generated witl, a \](IM M() hnplen*entation written i, an 
;I.llgll*t,llter| version of MACI,ISI'I,t,sed initiMly on Kltrttllnel*',,¢ (1983:182ff) ;dgorithni description; the 
diction;n'y m.l antomaton contpouents for E,glish were taken front 1.;artt,ne, and Wittenlmrg (1983) 
with minor ('llikllgCS. This iJz*ple*l*Vl*tatio*) se;u'¢h(.s del.th-tlr,~t a,s Kmttu,en's does, but explores the 
Mternatives at a giwm depth in a different order from Karttttnen's. 
55 
• I 
Ignoring the problem of choosing among alternative 
lexicons, it is easy to see that the use of finite-state ma- 
chinery helps control only one of the two remaining sources 
of complexity. Stepping the automata should be fast, but 
the finite-state framework does not guarantee speed in the 
task of guessing the correct lexical-surface correspondence. 
The search required to find the correspondence may pre- 
dominate. In fact, the Kimmo recognition and generation 
problems bear an uncomfortable resemblance to problems 
in the computational class NP. Informally, problems in NP 
have solutions that may be hard to guess but are easy to 
verify -- just the situation that might hold in the discov- 
ery of a Kimmo lexical-surface correspondence, since the 
automata can verify an acceptable correspondence quickly 
but may need search to discover one. 
THE COMPLEXITY 
OF 
TWO-LEVEL MORPHOLOGY 
The Kimmo algorithms contain the seeds of complex- 
ity, for local evidence does not always show how to con- 
struct a lexical-surface correspondence that will satisfy 
the constraints expressed in a set of two-level automata. 
These seeds can be exploited in mathematical reductions 
to show that two-level automata can describe computa- 
tionally difficult problems in a very natural way. It fol- 
lows that the finite-state two-level framework itself cannot 
guarantee computational efficiency. If the words of natural 
languages are easy to analyze, the efficiency of processing 
must result from some additional property that natural 
languages have, beyond those that are captured in the two- 
level model. Otherwise, computationally difficult problems 
might turn up in the two-level automata for some natural 
language, just as they do in the artificially constructed lan- 
guages here. In fact, the reductions are abstractly modeled 
on the Kimmo treatment of harmony processes and other 
long-distance dependencies in natural languages. 
The reductions use the computationally difficult 
Boolean satisfiability problems SAT and 3SAT, which in- 
volve deciding whether a CNF formula has a satisfying 
truth-assignment. It is easy to encode an arbitrary SAT 
problem as a Kimmo generation problem, hence the gen- 
eral problem of mapping from lexical to surface forms in 
Kimmo systems is NP-complete. 2 Given a CNF formula ~, 
first construct a string o by notational translation: use a 
minus sign for negation, a comma for conjunction, and no 
explicit operator for disjunction. Then the o corresponding 
to the formula (~ v y)&(~ v z)&(x v y v z) is -xy.-yz .xyz. 
2Membership in NP is also required for this conclusion. A later 
section ("The Effect of Nulls ~) shows membership in NP by sketching 
how a nondeterministic machine could quickly solve Kimmo generation 
and recognition problems. 
The notation is unambiguous without parentheses because 
is required to be in CNF. Second, construct a Kimmo 
automaton component A in three parts. (A varies from 
formula to formula only when the formulas involve differ- 
ent sets of variables.) The alphabet specification should 
list the variables in a together with the special characters 
T, F, minus sign, and comma; the equals sign should be 
declared as the Kimmo wildcard character, as usual. The 
consistency automata, one for each variable in a, should 
be constructed on the following model: 
"x-consistency" 3 3 
x x = (lezical characters) 
T F = (surface characters} 
1: 2 3 1 (x undecided} 
2: 2 0 2 (x true} 
3: 0 3 3 (xfalsc} 
The consistency automaton for variable x constrains the 
mapping from variables in the lexical string to truth-values 
in the surface string, ensuring that whatever value is as- 
signed to x in one occurrence must be assigned to x in 
every occurrence. Finally, use the following satisfaction 
automaton, which does not vary from formula to formula: 
"satisfaction" 3 4 
= = , (lexical characters} 
T F , (surface characters} 
1. 2 1 3 0 (no true seen in this group) 
2: 2 2 2 1 (true seen in this group} 
3. 1 2 0 0 (-F counts as true) 
The satisfaction automaton determines whether the truth- 
values assigned to the variables cause the formula to come 
out true. Since the formula is in CNF, the requirement is 
that the groups between commas must all contain at least 
one true value. 
The net result of the constraints imposed by the consis- 
tency and satisfaction automata is that some surface string 
can be generated from a just in case the original formula 
has a satisfying truth-assignment. Furthermore, A and o 
can be constructed in time polynomial in the length of ~; 
thus SAT is polynomial-time reduced to the Kimmo gener- 
ation problem, and the general case of Kimmo generation 
is at least as hard as SAT. Incidentally, note that it is local 
rather than global ambiguity that causes trouble; the gen- 
erator system in the reduction can go through quite a bit of 
search even when there is just one final answer. Figure 2 
traces the operation of the Kimmo generation algorithm 
on a (uniquely) satisfiable formula. 
Like the generator, the Kimmo recognizer can also be 
used to solve computationally difficult problems. One easy 
reduction treats 3SAT rather than SAT, uses negated al- 
phabet symbols instead of a negation sign, and replaces 
the satisfaction automaton with constraints from the dic- 
tionary component; see Barton (1985) for details. 
56 
Generating from lexical form "-xy. -yz. -y-z,xyz" 
1 1,1.1,3 38 + 
2 
3 
4 
5 
6 
7 + 
8 g 
10 
ll 
12 + 
13 
14 
15 + 
16 
17 
18 + 
lg 
20 + 
21 
22 + 
23 
24 (8) 25 
26 27 
28 + 
29 30 
31 + 32 
33 34 + 
35 36 
+ 37 
-F 
-FF 
-FF, -FF, 
- 
-FF, -T 
-FF, -F 
-FF, -FF 
-FF, -FF. 
-FF, -FF, 
-FF, -FF, 
-FF, -FF, 
-FF -FF, 
-FF -FF, 
-FF -FF, 
-FF -FF, 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
-FF 
3,1,1,2 3g 
3.3,1,2 40 (3) 
3,3,1.1 41 
3,3,1,3 42 
XXX y-con. 43 3,3,1,2 44 + 
3,3,3,2 45 
3,3.3,1 46 
- 3,3,3,3 47 (45) -T XXX y-con. 48 
-F 3,3,3,2 49 
-F- 3,3,3,2 50 -F-T 
XXX z-con. 51 + -F-F 3,3,3,2 52 
-F-F, 3,3.3,1 53 -FF,-F-F,T XXX x-con. 54 + 
-FF,-F-F,F 3,3,3,1 55 
-FF,-F-F,FT XXX y-con. 56 (2) 
-FF, -F-F,FF 3,3,3,1 57 -FF,-F-F,FFT XXX z-con. 58 
-FF,-F-F,FFF 3,3,3,1 5g (57) 
-FF,-F-F,FFF XXX satis, nf. 60 
-FT 3,3,2,2 61 -FT, 3,3,2,1 62 
-FT,- 3,3,2,3 63 + -FT,-T XXX y-con. 64 
-FT,-F 3,3,2,2 65 
-FT,-F- 3,3,2,2 66 (64) -FT,-F-F XXX z-con. 67 
-FT,-F-T 3,3,2,2 68 
-FT,-F-T. 3,3,2,1 6g -FT,-F-T,T XXX x-con. 70 + 
-FT,-F-T,F 3,3,2,1 71 -FT,-F-T,FT XXX y-con. 72 
-FT,-F-T,FF 3,3,2,1 73 + -FT,-F-T,FFF XXX z-con. 74 
-FF,-FT,-F-T,FFT 3,3,2,2 "-FF,-FT,-F-T,FFT" *** 
result 
-FT 
-FT, 
-FT, - 
-FT, -F 
-FT, -T 
-FT -TF 
-FT -TF, 
-FT -TT 
-FT -TT, 
-FT -TT, - 
-FT -TT, -F 
-FT -TT, -T 
-FT -TT,-T- 
-FT -TT,-T-F 
-FT -TT,-T-T 
-FT -TT,-T-T, 
-T 
-TF 
-TF, 
-TT 
-TT 
-TT - 
-TT -F 
-TT -T 
-TT -TF 
-TT -TF, 
-TT -TT 
-TT -TT. 
-TT -TT, - 
-TT -TT, -F 
-TT -TT, -T 
-TT -TT,-T- 
-TT -TT,-T-F 
-TT -TT.-T-T 
-TT -TT,-T-T, 
3,2,1,2 3,2,1,1 
3,2,1,3 XXX y-con. 
3.2,1,1 3,2,3,1 
XXX satis. 
3,2,2,2 3,2,2,1 
3,2,2,3 
XXX y-con. 3,2,2,1 
3,2,2.3 
XXX z-con. 
3,2,2,1 
XXX saris. 
2,1,1,1 
2,3,1,1 
XXX saris. 
2,2,1,2 2,2,1,1 
2,2,1,3 
XXX y-con. 2,2,1,1 
2,2,3,1 XXX saris. 
2,2,2,2 2,2,2,1 
2,2,2,3 
XXX y-con. 2,2,2,1 
2,2.2,3 XXXz-eon. 
2,2,2,1 
XXX satis. 
("-FF,-FT,-F-T, FFT" ) 
Figure 2: The generator system for deciding the satisfiability of Boolean formulas in x, y, 
and z goes through these steps when applied to the encoded version of the (satisfiable) formula 
(5 V y)&(~ V z)&(~ V ~)&(z V y V z). Though only one truth-assignment will satisfy the formula, 
it takes quite a bit of backtracking to find it. The notation used here for describing generator actions is 
similar to that used to describe recognizer actions in Figure ??, but a surface rather than a lexical string 
is the goal. A *-entry in the backtracking column indicates backtracking from an immediate failure in the 
preceding step, which does not require the full backtracking mechanism to be invoked. 
THE EFFECT 
OF PRECOMPILATION 
Since the above reductions require both the lan- 
guage description and the input string to vary with the 
SAT/3SAT problem to be solved, there arises the question 
of whether some computationally intensive form of pre- 
compilation could blunt the force of the reduction, paying 
a large compilation cost once and allowing Kimmo run- 
time for a fixed grammar to be uniformly fast thereafter. 
This section considers four aspects of the precompilation 
question. 
First, the external description of a Kimmo automator 
or lexicon is not the same as the form used at runtime. In- 
stead, the external descriptions are converted to internal 
forms: RMACHINE and GMACHINE forms for automata, 
letter trees for lexicons (Gajek et al., 1983). Hence the 
complexity implied by the reduction might actually apply 
to the construction of these internal forms; the complexity 
of the generation problem (for instance) might be concen- 
trated in the construction of the "feasible-pair list" and 
the GMACHINE. This possibility can be disposed of by 
reformulating the reduction so that the formal problems 
and the construction specify machines in terms of their in- 
ternal forms rather than their external descriptions. The 
GMACHINEs for the class of machines created in the con- 
struction have a regular structure, and it is easy to build 
them directly instead of building descriptions in external" 
format. As traces of recognizer operation suggest, it is 
runtime processing that makes translated SAT problems 
difficult for a Kimmo system to solve. 
Second, there is another kind of preprocessing that 
might be expected to help. It is possible to compile a 
set of Kimmo automata into a single large automaton (a 
BIGMACHINE) that will run faster than the original set. 
The system will usually run faster with one large automa- 
ton than with several small ones, since it has only one 
machine to step and the speed of stepping a machine is 
largely independent of its size. Since it can take exponen- 
tial time to build the BIGMACHINE for a translated SAT 
problem, the reduction formally allows the possibility that 
BIGMACHINE precompilation could make runtime pro- 
57 
cessing uniformly efficient. However, an expensive BIG- 
MACH\]NE precompilation step does not help runtime pro- 
cessing enough to change the fundamental complexity of 
the algorithms. Recall that the main ingredients of Kimmo 
runtime complexity are the mechanical operation of the 
automata, the difficulty of finding the right lexical-surface 
correspondence, and the necessity of choosing among alter- 
native lexicons. BIGMACHINE precompilation will speed 
up the mechanical operation of the automata, but it will 
not help in the difficult task of deciding which lexical- 
surface pair will be globally acceptable. Precompilation 
oils the machinery, but accomplishes no radical changes. 
Third, BIGMACHINE precompilation also sheds light 
on another precompilation question. Though B\]GMA- 
CHINE precompilation involves exponential blowup in the 
worst case (for example, with the SAT automata), in prac- 
tice the size of the BIGMACHINE varies -- thus naturally 
raising the question of what distinguishes the "explosive" 
sets of automata from those with more civilized behav- 
ior. It is sometimes suggested that the degree of inter- 
action among constraints determines the amount of BIG- 
MACHINE blowup. Since the computational difficulty of 
SAT problems results in large measure from their "global" 
character, the size of the BIGMACHINE for the SAT sys- 
tem comes as no surprise under the interaction theory. 
However, a slight change in the SAT automata demon- 
strates that BIGMACHINE size is not a good measure 
of interaction among constraints. Eliminate the satisfac- 
tion automaton from the generator system, leaving only 
the consistency automata for the variables. Then the sys- 
tem will not search for a satisfying truth-assignment, but 
merely for one that is internally consistent. This change 
entirely eliminates interactions among the automata; yet 
the BIGMACHINE must still be exponentially larger than 
the collection of individual automata, for its states must 
distinguish all the possible truth-assignments to the vari- 
ables in order to enforce consistency. In fact, the lack of 
interactions can actually increase the size of the BIGMA- 
CHINE, since interactions constrain the set of reachable 
state-combinations. 
Finally, it is worth considering whether the nondeter- 
minism involved in constructing the lexical-surface cor- 
respondence can be removed by standard determiniza- 
tion techniques. Every nondeterministic finite-state ma- 
chine has a deterministic counterpart that is equivalent in 
the weak sense that it accepts the same language; aren't 
Kimmo automata just ordinary finite-state machines op- 
erating over an alphabet that consists of pairs of ordinary 
characters? Ignoring subtleties associated with null char- 
acters, Kimmo automata can indeed be viewed in this way 
when they are used to verify or reject hypothesized pairs of 
lexical and surface strings. However, in this use they do not 
need determinizing, for each cell of an automaton descrip- 
tion already lists just one state. In the cases of primary 
interest -- generation and recognition -- the machines are 
used as genuine transducers rather than acceptors. 
The determinizing algorithms that apply to finite-state 
acceptors will not work on transducers, and in fact many 
finite-state transducers are not determinizable at all. Upon 
seeing the first occurrence of a variable in a SAT problem, 
a deterministic transducer cannot know in general whether 
to output T or F. It also cannot wait and output a truth- 
value later, since the variable might occur an unbounded 
number of times before there was sufficient evidence to 
assign the truth-value. A finite-state transducer would not 
be able in general to remember how many outputs had 
been deferred. 
THE EFFECT OF NULLS 
Since Kimmo systems can encode NP-complete prob- 
lems, the general Kimmo generation and recognition prob- 
lems are at least as hard as the difficult problems in NP. 
But could they be even harder? The answer depends on 
whether null characters are allowed. If nulls are completely 
forbidden, the problems are in NP, hence (given the pre- 
vious result) NP-complete. If nulls are completely unre- 
stricted, the problems are PSPACE-complete, thus prob- 
ably even harder than. the problems in NP. However, the 
full power of unrestricted null characters is not needed for 
linguistically relevant processing. 
If null characters are disallowed, the generation prob- 
lem for Kimmo systems can be solved quickly on a nonde- 
terministic machine. Given a set of automata and a lex- 
ical string, the basic nondeterminism of the machine can 
be used to guess the lexical-surface correspondence, which 
the automata can then quickly verify. Since nulls are not 
permitted, the size of the guess cannot get out of hand; 
the lexical and surface strings will have the same length. 
The recognition problem can be solved in the same way 
except that the machine must also guess a path through 
the dictionary. 
If null characters are completely unrestricted, the 
above argument fails; the lexical and surface strings may 
differ so radically in length that the lexical-surface cor- 
respondence cannot be proposed or verified in time poly- 
nomial in input length. The problem becomes PSPACE- 
complete -- as hard as checking for a forced win from 
certain N x N Go configurations, for instance, and prob- 
ably even harder than NP-complete problems (cf. Garey 
and Johnson, 1979:171ff). The proof involves showing that 
Kimmo systems with unrestricted nulls can easily be in- 
duced to work out, in the space between two input char- 
acters, a solution to the difficult Finite State Automata 
Intersection problem. 
58 
The PSPACE-completeness reduction shows that if 
two-level morphology is formally characterized in a way 
that leaves null characters completely unrestricted, it can 
be very hard for the recognizer to reconstruct the superfi- 
cially null characters that may lexically intervene between 
two surface characters. However, unrestricted nulls surely 
are not needed for linguistically relevant Kimmo systems. 
Processing complexity can be reduced by any restriction 
that prevents the number of nulls between surface charac- 
ters from getting too large. As a crude approximation to 
a reasonable constraint, the PSPACE-completeness reduc- 
tion could be ruled out by forbidding entire lexicon entries 
from being deleted on the surface. A suitable restriction 
would make the general Kimmo recognition problems only 
NP-complete. 
Both of the reductions remind us that problems involv- 
ing finite-state machines can be hard. Determining mem- 
bership in a finite-state language may be easy, but using 
finite-state machines for different tasks such as parsing or 
transduction can lead to problems that are computation- 
ally more difficult. 
REFERENCES 
Barton, E. (1985). "The Computational Complexity of 
Two-Level Morphology," A.I. Memo No. 856, M.I.T. 
Artificial Intelligence Laboratory, Cambridge, Mass. 
Gajek, O., H. Beck, D. Elder, and G. Whittemore (1983). 
"LISP Implementation \[of the KIMMO system\]," Texas 
Linguistic Forum 22:187-202. 
Garey, M., and D. Johnson (1979). Computers and In- 
tractability. San Francisco: W. H. Freeman and Co. 
Karttunen, L. (1983). "KIMMO: A Two-Level Morpho- 
• logical Analyzer," Texas Linguistic Forum 22:165-186. 
Karttunen, L., and K. Wittenburg (1983). "A Two-Level 
Morphological Analysis of English," Texas Linguistic 
Forum 22:217-228. 
ACKNOWLEDGEMENTS 
This report describes research done at the Artificial 
Intelligence Laboratory of the Massachusetts Institute of 
Technology. Support for the Laboratory's artificial intel- 
ligence research has been provided in part by the Ad- 
vanced Research Projects Agency of the Department of 
Defense under Office of Naval Research contract N00014- 
80-C-0505. A version of this paper was presented to 
the Workshop on Finite-State Morphology, Center for the 
Study of Language and Information, Stanford University, 
July 29-30, 1985; the author is grateful to Lauri Kart- 
tunen for making that presentation possible. This research 
has benefited from guidance and commentary from Bob 
Berwick, and Bonnie Dorr and Eric Grimson have also 
helped improve the paper. 
59 
