A Freely Available Wide Coverage Morphological Analyzer for English* 
Daniel Karp†, Yves Schabes, Martin Zaidel, and Dania Egedi
Department of Computer and Information Science 
University of Pennsylvania 
Philadelphia PA 19104-6389 USA 
dkarp/schabes/zaidel/egedi@cis.upenn.edu
Abstract 
This paper presents a morphological lexicon for English
that handles more than 317000 inflected forms derived
from over 90000 stems. The lexicon is available in two
formats. The first can be used by an implementation of
a two-level processor for morphological analysis
(Karttunen and Wittenburg, 1983; Antworth, 1990). The
second, derived from the first for efficiency reasons,
consists of a disk-based database using a UNIX
hash table facility (Seltzer and Yigit, 1991). We also
built an X Window tool to facilitate the maintenance
and browsing of the lexicon. The package is ready to
be integrated into a natural language application such
as a parser through hooks written in Lisp and C.
To our knowledge, this package is the only available
free English morphological analyzer with very wide
coverage.
1 Introduction
Morphological analysis has experienced great success
since the introduction of two-level morphology
(Koskenniemi, 1983; Karttunen, 1983). Two-level
morphology and its implementation are now well understood
both linguistically and computationally (Karttunen,
1983; Karttunen and Wittenburg, 1983; Koskenniemi,
1985; Barton et al., 1987; Koskenniemi and
Church, 1988). This computational model has proved
to be well suited for many languages. Although there
are some proprietary wide coverage morphological
analyzers for English, to our knowledge those that are
freely available provide only very small coverage.
Working from the 1979 edition of the Collins Dictionary
of the English Language available through
ACL-DCI (Liberman, 1989), we constructed lexicons
for PC-KIMMO (Antworth, 1990), a public domain
implementation of a two-level processor. Using the
morphological rules for English inflections provided
by Karttunen and Wittenburg (1983) and our lexicons,
PC-KIMMO outputs all possible analyses of each input
word, giving its root form and its inflectional
attributes. To improve performance, we used PC-KIMMO
as a generator on our lexicons to build a disk-based
hashed database with a UNIX database facility
(Seltzer and Yigit, 1991). Both formats, PC-KIMMO
and database, are now available for distribution. We
also provide an X Window tool for the database to
facilitate maintenance and access. Each format contains
the morphological information for over 317000 English
words. The morphological database for English runs
under UNIX; PC-KIMMO runs under UNIX and on a
PC.
This package can be easily embedded into a natural
language parser; hooks for accessing the morphological
database from a parser are provided for both Lucid
Common Lisp and C. This morphological database is
currently being used in a graphical workbench (XTAG)
for the development of tree-adjoining grammars and
their parsers (Paroubek et al., 1992).
*This work was partially supported by DARPA Grant N0014-90-J-1863,
ARO Grant DAAL03-89-C-0031, and NSF Grant
IRI90-16592. We thank Aravind Joshi for his support for this
work. We also thank Evan Antworth, Mark Foster, Lauri Karttunen,
Mark Liberman, and Annie Zaenen for their help and suggestions.
†Daniel Karp was visiting from Stanford University.
2 Lexicons for PC-KIMMO 
We used the set of morphological rules for English
described by Karttunen and Wittenburg (1983). The
rules handle the following phenomena (among others1):
epenthesis, y to i correspondences, s-deletion, elision, i
to y correspondences, gemination, and hyphenation. In
addition to the set of rules, PC-KIMMO requires lexicons.
We derived PC-KIMMO-style lexicons from the
1979 edition of the Collins Dictionary of the English
Language. The 90000-odd roots2 in the lexicon yield
over 317000 inflected forms.
The lexicons use the following parts of speech: verb
(V), pronoun (Pron), preposition (Prep), noun (N),
determiner (D), conjunction (Conj), adverb (Adv), and
adjective (A). Figure 1 shows the distribution of these
parts of speech in the two formats: the first column is
the distribution of the root forms in the PC-KIMMO
lexicon files, and the second column is the distribution
of the inflected forms derived from the lexicons
and stored in the database. For each word, the lexicon
lists its lexical form, a continuation class, and a parse.
The continuation class specifies which inflections the
lexical form can undergo. At most, a noun root engenders
four inflections (singular, plural, singular genitive,
plural genitive); an adjective root, three (base,
comparative, superlative); and a verb root, five (infinitive,
third-person singular present, simple past, past participle,
progressive). The exact number generated by any
given root depends on its continuation class.
1 We refer the reader to Karttunen and Wittenburg (1983) or
Antworth (1990) for more details on the morphological rules.
2 Proper nouns were not included in the tables.
Actes de COLING-92, Nantes, 23-28 août 1992 / Proc. of COLING-92, Nantes, Aug. 23-28, 1992
Part of speech   # Root Forms   # Inflected Forms
Pronoun                    92                  93
Preposition               148                 150
Determiner                100                 100
Conjunction                64                  64
Adverb                   6992                7176
Noun                    50370              199303
Adjective               20550               65146
Verb                    11880               45445
TOTAL                   90196              317477
Figure 1: Size of the PC-KIMMO Lexicons.
2.1 Adjectives 
The continuation classes for adjectives specify whether the
word can undergo the rules of comparative and superlative.
For example, the lexicon entry for the adjective
'funky' is:
funky A_Root2 "A(funky)"
The entry consists of a word funky, followed by the
continuation class A_Root2, and a parse "A(funky)".
The continuation class specifies that the word can undergo
the normal rules of comparative and superlative,
and the parse states that the word is an adjective with
root 'funky'. The following is a sample run of
PC-KIMMO's recognizer:
recognizer>>funky 
funky A(funky) 
recognizer>>funkier 
funky+er A(funky) COMP 
recognizer>>funkiest 
funky+est A(funky) SUPER 
The output line contains the root form and any affixes,
separated by '+'s. Thus, a '+' in the output indicates
that a morphological rule was used; its absence means
no rule was used, and the parse was returned as found
in the lexicon. PC-KIMMO will automatically add attributes
such as COMP and SUPER to the parse, depending
on the morphological rule matched by the surface
form. But for irregularly inflected forms, special
continuation classes indicate that the complete parse (viz.,
part of speech, root, and attributes) should be taken
'as is' from the lexicon entry. For example:
better A_Root1 "A(good) COMP"
best A_Root1 "A(good) SUPER"
good A_Root1 "A(good)"
The class A_Root1 tells PC-KIMMO not to apply
the morphological rules to 'better', 'best', and 'good'.
Thus, 'gooder' is not recognized as 'good+er'.
recognizer>>best
best N(best) SG
best A(good) SUPER
best Adv(best)
recognizer>>good 
good N(good) SG 
good A(good) 
recognizer>>better 
better N(better) SG 
better A(good) COMP 
better V(better) INF 
better Adv(better) 
recognizer>>gooder 
*** NONE *** 
recognizer>>goodest 
*** NONE *** 
The attributes (such as COMP) can later be translated
into feature structures with the help of templates as in
PATR (Shieber, 1986). The list of attributes is found
in Appendix A.
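As an illustration of this translation step, the following is a minimal Python sketch, not part of the distributed package, that splits one recognizer output line into its lexical form, part of speech, root, and attributes, and then expands the attributes through a small template table. The feature names in TEMPLATES and the function name parse_analysis are assumptions made for the example; the actual PATR templates are not given in this paper.

```python
import re

# Hypothetical templates: each attribute expands to a small feature bundle.
# The feature names (degree, number, case) are illustrative assumptions.
TEMPLATES = {
    "COMP": {"degree": "comparative"},
    "SUPER": {"degree": "superlative"},
    "SG": {"number": "singular"},
    "PL": {"number": "plural"},
    "GEN": {"case": "genitive"},
}

# One analysis line: lexical form, then POS(root), then optional attributes.
ANALYSIS = re.compile(r"^(\S+)\s+(\w+)\(([^)]*)\)\s*(.*)$")


def parse_analysis(line):
    """Turn a line such as 'funky+er A(funky) COMP' into a feature structure."""
    match = ANALYSIS.match(line.strip())
    if match is None:
        raise ValueError(f"not an analysis line: {line!r}")
    lexical, pos, root, attrs = match.groups()
    features = {"cat": pos, "root": root}
    for attr in attrs.split():
        # Attributes without a template are kept as boolean flags.
        features.update(TEMPLATES.get(attr, {attr: True}))
    return {
        "lexical": lexical,
        "rule_applied": "+" in lexical,  # a '+' signals a morphological rule
        "features": features,
    }


if __name__ == "__main__":
    print(parse_analysis("funky+er A(funky) COMP"))
    print(parse_analysis("best A(good) SUPER"))
```

Keeping the templates in a table separate from the parser means new attributes from Appendix A can be supported without touching the parsing code.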
2.2 Nouns 
Inflections of nouns, such as the formation of plural and
genitive, are handled by morphological rules (unless the
formation is idiosyncratic). In the lexicon for nouns,
the continuation class N_Root1 indicates that the formation
of the genitive applies regularly and that no other
inflection applies. The continuation class N_Root2 indicates
that the formation of the plural and of the genitive
apply regularly.
mice N_Root1 "N(mouse) PL"
mouse N_Root1 "N(mouse) SG"
ambassador N_Root2 "N(ambassador)"
Thus, the above lexicon entries are recognized as below:
recognizer>>mice
mice N(mouse) PL
recognizer>>mouse
mouse N(mouse) SG
mouse V(mouse) INF
recognizer>>mouses
mouse+s V(mouse) 3SG PRES
recognizer>>mice's
mice+'s N(mouse) PL GEN
recognizer>>mouses'
*** NONE ***
recognizer>>mouse's
mouse+'s N(mouse) SG GEN
recognizer>>ambassadors
ambassador+s N(ambassador) PL
recognizer>>ambassador's
ambassador+'s N(ambassador) SG GEN
recognizer>>ambassadors'
ambassador+s+'s N(ambassador) PL GEN
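The effect of the two noun continuation classes can be summarized with a short Python sketch. It only enumerates lexical forms and their parses; turning a lexical form such as ambassador+s+'s into the surface form ambassadors' is the job of the two-level rules and is not modeled here. The function name noun_analyses is an assumption made for the example.

```python
def noun_analyses(word, cont_class, parse):
    """List (lexical form, parse) pairs licensed by a noun continuation class."""
    if cont_class == "N_Root1":
        # Number is already fixed in the lexicon parse (e.g. "N(mouse) PL");
        # only the genitive is formed by rule.
        return [
            (word, parse),
            (word + "+'s", parse + " GEN"),
        ]
    if cont_class == "N_Root2":
        # Plural and genitive are both formed by rule.
        return [
            (word, parse + " SG"),
            (word + "+s", parse + " PL"),
            (word + "+'s", parse + " SG GEN"),
            (word + "+s+'s", parse + " PL GEN"),
        ]
    raise ValueError(f"unknown noun continuation class: {cont_class}")


if __name__ == "__main__":
    for lexical, parse in noun_analyses("mice", "N_Root1", "N(mouse) PL"):
        print(lexical, parse)
    for lexical, parse in noun_analyses("ambassador", "N_Root2", "N(ambassador)"):
        print(lexical, parse)
```

Because 'mouse' is listed under N_Root1, no noun analysis mouse+s is licensed, which is why 'mouses' is recognized only as a verb form in the run above.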
2.3 Verbs 
Given the infinitive form of a verb, the formation of
the third person singular (+s), its past tense (+ed), its
past participle (+ed), and its progressive form (+ing) is
handled by morphological rules unless lexical idiosyncrasies
apply. In order to encode all possible idiosyncrasies
over the three verb endings, eight continuation
classes are defined (see Figure 2). Each continuation
class specifies the inflectional rules which can apply to
the given lexical item.
Continuation class   Applicable rules
V_Root1              none
V_Root2              +ed
V_Root3              +s
V_Root4              +s, +ed
V_Root5              +ing
V_Root6              +ing, +ed
V_Root7              +ing, +s
V_Root8              +ing, +s, +ed
Figure 2: Continuation classes for verbs
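Read off Figure 2, the class number minus one behaves like a three-bit mask over the endings (+ing, +s, +ed). This is our observation about the numbering, not something the lexicon format requires. A minimal Python sketch of that reading, with hypothetical function names:

```python
# Bit values inferred from Figure 2: class number - 1 is a 3-bit mask.
ENDINGS = (("+ing", 4), ("+s", 2), ("+ed", 1))
PREFIX = "V_Root"


def verb_endings(cont_class):
    """Return the endings that a verb continuation class allows."""
    if not cont_class.startswith(PREFIX):
        raise ValueError(f"not a verb continuation class: {cont_class}")
    number = int(cont_class[len(PREFIX):])
    if not 1 <= number <= 8:
        raise ValueError(f"class number out of range: {number}")
    mask = number - 1
    return [ending for ending, bit in ENDINGS if mask & bit]


def verb_class(endings):
    """Return the continuation class for a set of regular endings."""
    mask = sum(bit for ending, bit in ENDINGS if ending in endings)
    return f"{PREFIX}{mask + 1}"


if __name__ == "__main__":
    for n in range(1, 9):
        name = f"{PREFIX}{n}"
        print(name, ", ".join(verb_endings(name)) or "none")
```

For instance, a verb whose past forms are irregular but whose +s and +ing forms are regular receives the mask 4 + 2 = 6, that is, V_Root7.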
Examples of lexical entries for verbs follow: 
admire V_Root8 "V(admire)"
dyeing V_Root1 "V(dye) PROG"
dye V_Root4 "V(dye)"
zigzagging V_Root1 "V(zigzag) PROG"
zigzagged V_Root1 "V(zigzag) PAST WK"
zigzagged V_Root1 "V(zigzag) PPART WK"
zigzag V_Root3 "V(zigzag)"
tangoes V_Root1 "V(tango) 3SG PRES"
tango V_Root6 "V(tango)"
taught V_Root1 "V(teach) PAST STR"
taught V_Root1 "V(teach) PPART STR"
teach V_Root7 "V(teach)"
Examples of runs follow: 
recognizer>>admires
admire+s V(admire) 3SG PRES
recognizer>>admired
admire+ed V(admire) PAST WK
admire+ed V(admire) PPART WK
recognizer>>admiring
admire+ing V(admire) PROG
recognizer>>admire
admire V(admire) INF
recognizer>>dyed
dye+ed V(dye) PAST WK
dye+ed V(dye) PPART WK
recognizer>>dyes
dye+s N(dye) PL
dye+s V(dye) 3SG PRES
recognizer>>teaches
teach+s V(teach) 3SG PRES
recognizer>>teached
*** NONE ***
recognizer>>taught
taught V(teach) PAST STR
taught V(teach) PPART STR
recognizer>>tangoed
tango+ed V(tango) PAST WK
tango+ed V(tango) PPART WK
recognizer>>tangoing
tango+ing V(tango) PROG
recognizer>>tangoes
tangoes V(tango) 3SG PRES
The attributes WK (for "weak") and STR (for
"strong") mark whether the verb forms its past tense
regularly or irregularly, respectively. The distinction
enables unambiguous reference to homographs, that is,
words spelled identically but with different semantic and
syntactic properties. For example, the verb 'lie' with the
meaning 'to make an untrue statement' and the verb
'lie' with the meaning 'to be prostrate' have different
syntactic and morphological behavior: the first one is
regular, while the second one is irregular:
He has lain on the floor.
He has lied about everything.
Usually, it suffices to index the syntactic properties of
each verb by its root form alone. However, homographs
require additional information. In English, the attributes
WK and STR are sufficient to distinguish homographs
with different morphological behavior.
recognizer>>lied 
lied N(lied) SG 
lie+ed V(lie) PAST WK 
lie+ed V(lie) PPART WK 
recognizer>>lain
lain V(lie) PPART STR 
recognizer>>lay 
lay V(lay) INF 
lay V(lie) PAST STR 
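One way a client such as a parser might use this distinction is to index verb properties by the pair (root, WK or STR) rather than by root alone. The sketch below is illustrative only: the table VERB_PROPERTIES, its glosses, and the function verb_senses are assumptions made for the example, not part of the distributed package.

```python
import re

# Hypothetical property table indexed by (root, inflection type).
VERB_PROPERTIES = {
    ("lie", "WK"): "to make an untrue statement",
    ("lie", "STR"): "to be prostrate",
}

# A verb parse as returned by the recognizer, e.g. "V(lie) PAST WK".
PARSE = re.compile(r"V\((\w+)\)\s*(.*)")


def verb_senses(parse):
    """Return the property entries compatible with one verb parse."""
    match = PARSE.fullmatch(parse.strip())
    if match is None:
        raise ValueError(f"not a verb parse: {parse!r}")
    root = match.group(1)
    attrs = match.group(2).split()
    # Forms without WK/STR (such as INF) remain compatible with both readings.
    marks = [a for a in attrs if a in ("WK", "STR")] or ["WK", "STR"]
    return [VERB_PROPERTIES[(root, m)] for m in marks if (root, m) in VERB_PROPERTIES]


if __name__ == "__main__":
    print(verb_senses("V(lie) PAST WK"))
    print(verb_senses("V(lie) PPART STR"))
    print(verb_senses("V(lie) INF"))
```

Only the past and past participle forms carry WK or STR, so the infinitive stays ambiguous between the two homographs in this sketch.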
2.4 Other Parts of Speech 
Pronouns, prepositions, determiners, conjunctions, and
adverbs are given continuation classes that inhibit the
application of morphological rules. All of the morphological
information is stored in the parse in the lexicon
entry:
herself Pron "Pron(herself) REFL FEM 3SG"
it Pron "Pron(it) NEUT 3SG NOMACC"
behind Prep "Prep(behind)"
coolly Adv "Adv(coolly)"
PC-KIMMO recognizes them as follows: 
recognizer>>herself
herself Pron(herself) REFL FEM 3SG
recognizer>>it
it N(it) SG
it Pron(it) NEUT 3SG NOMACC
recognizer>>behind
behind N(behind) SG
behind Adv(behind)
behind Prep(behind)
recognizer>>coolly
coolly Adv(coolly)
3 Lexicons as a Database 
PC-KIMMO builds in memory a data structure from
the complete lexicon. Consequently, our large lexicons
occupy more than 19 Mbytes of process memory. Further,
the large size of the structure implies long search
times as PC-KIMMO swaps pages in and out.
Thus, to solve both the time and space problems
simultaneously, we compiled all inflectional forms into
a disk-based database using a UNIX hash table facility
(Seltzer and Yigit, 1991).
To compile the database, we used PC-KIMMO as
a generator, inputting each root form and all the endings
that it could take, as indicated by the continuation
class. The resulting inflected form became the key, and
the associated morphological information was then inserted
into the database.
For example, the PC-KIMMO lexicon file contains
the entry:
saw N_Root2 "N(saw)"
The class N_Root2 indicates that the noun 'saw' forms
its plural, singular genitive, and plural genitive regularly.
Thus, we send to the generator three lexical
forms, one for each inflection with its suffixes, extracting
three inflected surface forms:
Lexical   saw+s   saw+'s   saw+s+'s
Surface   saws    saw's    saws'
The root form of a noun is identical with the singular
inflection, so we have a total of four inflected
forms. Since we know which suffix we added to the
root, we also know the attributes for that inflection.
The inflected form becomes the key, while the part of
speech, root, and attributes are stored as the content
in the database. Hence, the lexicon entry for the noun
'saw' produces four key-content pairs in the database:
(saw, saw N SG), (saws, saw N PL), (saw's, saw
N SG GEN), (saws', saw N PL GEN).
Likewise, the verb lexicon contains the entries:
saw V_Root8 "V(saw)"
saw V_Root1 "V(see) PAST STR"
The continuation class V_Root8 indicates four inflections
besides the infinitive: third-person singular (+s),
past (+ed), weak past participle (+ed), and present
participle (+ing). Hence, the generator produces:
Lexical   saw+s   saw+ed   saw+ing
Surface   saws    sawed    sawing
The class V_Root1 allows no inflections, but
builds the inflection-feature pair directly: (saw, see
V PAST STR).
Hence, morphological analysis is reduced to sending
the surface forms to the database as keys and retrieving
the returned strings. Figure 3 lists the database
keys and content strings produced by the three lexicon
lines given above. Note that distinct entries are separated
by '#'. Since multiple lexical forms can map
to the same surface form, the actual number of keys (ca.
292000) is less than the number of lexical forms (ca.
317000). Also, with the database residing on the
disk, access times average 6 to 10 milliseconds, which
greatly improves upon PC-KIMMO.
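The compilation and lookup steps can be sketched in a few lines of Python for the three 'saw' entries. Two simplifications are assumed and should not be read as the actual implementation: an in-memory dictionary stands in for the disk-based hash table, and the function surface is a naive stand-in for the PC-KIMMO generator that is correct only where no spelling rule fires (as with 'saw'). All names are hypothetical.

```python
from collections import defaultdict

# Suffixes and attributes contributed by each continuation class (subset only).
CLASSES = {
    "N_Root2": [("", "SG"), ("+s", "PL"), ("+'s", "SG GEN"), ("+s+'s", "PL GEN")],
    "V_Root8": [("", "INF"), ("+s", "3SG PRES"), ("+ed", "PAST WK"),
                ("+ed", "PPART WK"), ("+ing", "PROG")],
    "V_Root1": [("", "")],  # parse is taken 'as is' from the lexicon
}


def surface(lexical):
    """Naive stand-in for the generator: valid only when no spelling rule applies."""
    return lexical.replace("+s+'s", "s'").replace("+", "")


def compile_database(entries):
    """Map each surface form to its '#'-separated content string."""
    table = defaultdict(list)
    for word, cont_class, pos, root, attrs in entries:
        for suffix, extra in CLASSES[cont_class]:
            content = " ".join(part for part in (root, pos, attrs, extra) if part)
            table[surface(word + suffix)].append(content)
    return {key: "#".join(contents) for key, contents in table.items()}


def analyze(database, word):
    """Morphological analysis as a single key lookup."""
    value = database.get(word)
    return value.split("#") if value else []


LEXICON = [
    ("saw", "N_Root2", "N", "saw", ""),
    ("saw", "V_Root8", "V", "saw", ""),
    ("saw", "V_Root1", "V", "see", "PAST STR"),
]

if __name__ == "__main__":
    db = compile_database(LEXICON)
    for key in db:
        print(key, "->", db[key])
    print(analyze(db, "saws"))
```

The three lexicon lines yield six keys, fewer than the ten lexical forms generated, which mirrors the gap between the number of keys and the number of lexical forms reported above.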
3.1 Implementation Considerations 
The large number of keys implies a very large disk
file. To reduce the size of the file, we take advantage
of the morphological similarity in English between an
inflected form and its lexical root form. Indeed, the
root is often contained intact within the inflected form.
Key      Contents
saw      saw N SG#saw V INF#see V PAST STR
saws     saw N PL#saw V 3SG PRES
saw's    saw N SG GEN
sawing   saw V PROG
sawed    saw V PAST WK#saw V PPART WK
saws'    saw N PL GEN
Figure 3: Database pairs
Hence, instead of storing the root, we store the number
of shared characters along with any differing characters,
and reassemble the root from the inflected form
on each database query. Further, despite the large set
of attributes, relatively few combinations (ca. 80) are
meaningful, and can be encoded in a single byte. Since
a large proportion of roots are wholly contained within
the surface form, and since 92% of the keys have one
lexical entry, the average content string is only three
bytes long. Consequently, the total disk file is under
9 Mbytes. We anticipate further compaction in the near
future.
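The root compression described above can be illustrated with a short Python sketch. The record layout used here (one byte for the shared-character count, one byte for the attribute combination, then the differing characters) is a hypothetical layout chosen for the example; the paper does not specify the actual byte format, and ATTR_COMBINATIONS lists only a handful of the roughly 80 combinations.

```python
# Hypothetical subset of the ca. 80 meaningful attribute combinations.
ATTR_COMBINATIONS = ["N SG", "N PL", "V PROG", "V PAST STR", "A COMP"]


def encode_root(surface, root):
    """Return (number of leading characters shared with the key, remaining root)."""
    shared = 0
    for a, b in zip(surface, root):
        if a != b:
            break
        shared += 1
    return shared, root[shared:]


def decode_root(surface, shared, rest):
    """Reassemble the root from the inflected form at query time."""
    return surface[:shared] + rest


def pack(surface, root, attrs):
    """Pack one entry: shared count, attribute code, differing characters."""
    shared, rest = encode_root(surface, root)
    code = ATTR_COMBINATIONS.index(attrs)
    return bytes([shared, code]) + rest.encode("ascii")


def unpack(surface, record):
    """Inverse of pack, given the key that was looked up."""
    shared, code = record[0], record[1]
    root = decode_root(surface, shared, record[2:].decode("ascii"))
    return root, ATTR_COMBINATIONS[code]


if __name__ == "__main__":
    for key, root, attrs in [("sawing", "saw", "V PROG"),
                             ("saw", "see", "V PAST STR"),
                             ("funkier", "funky", "A COMP")]:
        record = pack(key, root, attrs)
        print(key, list(record), unpack(key, record))
```

When the root is wholly contained in the key, as with 'sawing', the record carries no differing characters at all, which is what keeps the average content string so short.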
3.2 Accompanying Utilities 
Besides the PC-KIMMO lexicons, we currently maintain
the database file and an ASCII-character "flat"
version for on-line database browsing. One program
converts the lexicons into the database format, while
others dump the database into the flat file or reconstruct
the database from the flat file. We have also
built an X Window tool to perform maintenance on
the database file (see Figure 4). This tool automatically
maintains the consistency between the flat file
and the database file. We have built hooks in C and
Lisp (Lucid 4.0) to access either the database or
PC-KIMMO from within a running process.
Figure 4: Morphological Database X Window Tool
4 Obtaining the Analyzer 
The PC-KIMMO lexicons, the database files, the Lisp
and C access functions, programs for converting between
formats, and the X Window maintenance tool are
available without charge for research purposes. Please
send e-mail to zaidel@cis.upenn.edu or write to either
Yves Schabes, Martin Zaidel, or Dania Egedi.
5 Conclusion 
We have presented freely available morphological ta- 
bles and a morphological analyzer to handle English 
inflections. The tables handle approximately 317000
inflected forms corresponding to 90000 stems.
These tables can be used by an implementation of a 
two-level processor for morphological analysis such as 
PC-KIMMO. 
However, these large tables degrade the performance 
of PC-KIMMO's current implementation, requiring 
about 18 Mbytes of RAM while slowing the access time. 
To overcome these shortcomings, we created a morphological
analyzer consisting of a disk-based database
using a UNIX hash table facility. With this database,
all of the data resides on disk and access times average
6 to 10 milliseconds. We also provide an X Window
tool to facilitate maintenance of and access to
the database.
The package is ready to be integrated into an appli- 
cation such as a parser. Hooks written in Lisp and C 
for accessing these tables are provided. 
To our knowledge, this package is the only available 
free English morphological analyzer with very wide cov- 
erage. 
A List of Attributes 
1SG 1st person singular 
2SG 2nd person singular 
3SG 3rd person singular 
1PL 1st person plural 
2PL 2nd person plural 
3PL 3rd person plural
2ND 2nd person 
3RD 3rd person 
SG singular 
PL plural 
PROG progressive 
PAST past tense 
PPART past participle 
INF infinitive or present (not 3rd person) 
PRES present 
STR strongly inflected verb 
WK weakly inflected verb 
GEN genitive (+ 's) 
NOM nominative case 
ACC accusative case 
NOMACC nominative or accusative case 
NEG negation 
PASSIVE passive form (for "born") 
to contracted form verb + to 
COMP comparative 
SUPER superlative 
MASC masculine 
FEM feminine 
NEUT neuter 
WH wh-word 
REFL reflexive 
REF1SG 1st person singular referent 
REF2ND 2nd person referent 
REF2SG 2nd person singular referent 
REF2PL 2nd person plural referent 
REF3SG 3rd person singular referent 
REF3PL 3rd person plural referent 
REFMASC masculine referent 
REFFEM feminine referent 
A Morphological Analyzer for English
Summary of the paper
A Freely Available Wide Coverage Morphological Analyzer for English
Daniel Karp, Yves Schabes, Martin Zaidel, and Dania Egedi.
We present a morphological analyzer for English. The
morphological tables include more than 317000 inflected
forms derived from 90000 roots.
The tables were built with the help of electronic
dictionaries (in particular the "Collins Dictionary of the
English Language, 1979 edition") distributed by ACL-DCI
(Liberman, 1989).
The tables are available in two formats. The first
format can be used with a two-level morphological
analyzer such as PC-KIMMO (Antworth, 1990). In the
second format, all inflected forms have been inserted
into a disk-based database with the help of a UNIX
utility (Seltzer and Yigit, 1991). An X Window tool for
accessing and modifying this database is also available.
The analyzer can be used by another program such as
a parser. The tables can be accessed from Lisp and C.
Tables for PC-KIMMO
We used the morphological rules for English described
by Karttunen and Wittenburg (1983). With these rules
and the dictionaries, we created lexicons that can be
used by PC-KIMMO (an implementation of a two-level
morphological analyzer (Antworth, 1990)). Figure 1 gives
the number of roots and the number of inflected forms
that can be recognized.
Category             Roots    Inflected forms
Pronoun (Pron)          92                 93
Preposition (Prep)     148                150
Determiner (D)         100                100
Conjunction (Conj)      64                 64
Adverb (Adv)          6992               7176
Noun (N)             50370             199303
Adjective (A)        20550              65146
Verb (V)             11880              45445
TOTAL                90196             317477
Figure 1: Number of Roots and Inflected Forms.
Database
PC-KIMMO loads the entire lexicon into memory as a
data structure that factors out the common prefixes of
words. With our lexicons loaded, PC-KIMMO occupies
about 19 megabytes. This memory footprint is too large
and, moreover, the access time is not satisfactory.
We therefore compiled all inflected forms into a
disk-based database with the help of a UNIX utility
(Seltzer and Yigit, 1991). This utility makes it possible
to dispense with PC-KIMMO while reducing both the
memory footprint (200 kilobytes) and the access time
(between 6 and 10 milliseconds).
These tables are maintained both as a database and
as text. Programs are provided to convert the tables
from one form to the other. We have also written an
X Window tool (Figure 2) for accessing and modifying
this database.
Figure 2: Tool for the Morphological Database
Distribution
We distribute these tables and the utilities free of
charge under a non-commercialization agreement. Please
contact zaidel@cis.upenn.edu by electronic mail or write
to one of the following: Yves Schabes, Martin Zaidel,
or Dania Egedi.

Bibliography 
Evan L. Antworth. 1990. PC-KIMMO: a two-level
processor for morphological analysis. Summer Institute
of Linguistics.
G. Edward Barton, Robert C. Berwick, and Eric Sven 
Ristad. 1987. Computational Complexity and Natu- 
ral Language. MIT Press. 
Lauri Karttunen and Kent Wittenburg. 1983. A two-level
morphological analysis of English. Texas Linguistic
Forum, 22:217-228.
Lauri Karttunen. 1983. KIMMO: A two-level morpho- 
logical analyzer. Texas Linguistic Forum, 22:165- 
186. 
Kimmo Koskenniemi. 1983. Two-level morphology: a
general computational model for word-form recognition
and production. Technical report, University of
Helsinki, Helsinki, Finland.
Kimmo Koskenniemi. 1985. An application of the two-level
model to Finnish. In Fred Karlsson, editor,
Computational Morphosyntax: Report on Research
1981-1984. University of Helsinki.
Kimmo Koskenniemi and Kenneth W. Church. 1988.
Complexity, two-level morphology and Finnish. In
Proceedings of the 12th International Conference on
Computational Linguistics (COLING'88).
Mark Liberman. 1989. Text on tap: the ACL data
collection initiative. In Proceedings of DARPA Workshop
on Speech and Natural Language Processing,
pages 173-188. Morgan Kaufmann.
Patrick Paroubek, Yves Schabes, and Aravind K. Joshi. 
1992. XTAG - a graphical workbench for developing 
tree-adjoining grammars. In Third Conference on 
Applied Natural Language Processing, Trento, Italy. 
Margot Seltzer and Ozan Yigit. Winter 1991. A new 
hashing package for UNIX. In USENIX. 
Stuart M. Shieber. 1986. An Introduction to
Unification-Based Approaches to Grammar. Center for
the Study of Language and Information, Stanford,
CA.
