Lexicon and grammar in probabilistic tagging 
of written English. 
Andrew David Beale
Unit for Computer Research on the English Language
University of Lancaster, Bailrigg,
Lancaster
England LA1 4YT
mb0250~..az.~c~vaxl 
Abstract 
The paper describes the development of software for
automatic grammatical analysis of unrestricted, unedited
English text at the Unit for Computer Research on the English
Language (UCREL) at the University of Lancaster. The work
is currently funded by IBM and carried out in collaboration
with colleagues at IBM UK (Winchester) and IBM Yorktown
Heights. The paper will focus on the lexicon component of the
word tagging system, the UCREL grammar, the databanks of
parsed sentences, and the tools that have been written to
support development of these components. This work has
applications to speech technology, spelling correction, and
other areas of natural language processing. Currently, our goal
is to provide a language model using transition statistics to
disambiguate alternative parses for a speech recognition device.
1. Text Corpora 
Historically, the use of text corpora to provide empirical
data for testing grammatical theories has been regarded as
important to varying degrees by philologists and linguists of
differing persuasions. The use of corpus citations in grammars
and dictionaries pre-dates electronic data processing (Brown,
1984: 34). While most of the generative grammarians of the
60s and 70s ignored corpus data, the increased power of the
new technology nevertheless points the way to new
applications of computerized text corpora in dictionary making,
style checking and speech recognition. Computer corpora
present the computational linguist with the diversity and
complexity of real language, which is more challenging for
testing language models than intuitively derived examples.
Ultimately grammars must be judged by their ability to
contend with the real facts of language and not just basic
constructs extrapolated by grammarians.
2. Word Tagging 
The system devised for automatic word tagging or part of
speech selection for processing running English text, known as
the Constituent-Likelihood Automatic Word-tagging System
(CLAWS) (Garside et al., 1987), serves as the basis for the
current work. The word tagging system is an automated
component of the probabilistic parsing system we are currently
working on. In word tagging, each of the running words in the
corpus text to be processed is associated with a pre-terminal
symbol, denoting word class. In essence, the CLAWS suite can
be conceptually divided into two phases: tag assignment and
tag selection.
constable       NNS1 NNS1: NP1:
constant        JJ NN1
constituent     NN1
constitutional  JJ NN1@
construction    NN1
consultant      NN1
consummate      JJ VV0
contact         NN1 VV0
contained       VVD VVN JJ@
containing      VVG NN1%
contemporary    JJ NN1@
content         NN1 JJ VV0@
contessa        NNS1 NNS1:
contest         NN1 VV0@
contestant      NN1
continue        VV0
continued       VVD VVN JB@
contraband      NN1 JJ
contract        NN1 VV0@
contradictory   JJ
contrary        JJ NN1
contrast        NN1 VV0@
Figure 1: Section of the CLAWS Lexicon
JB = attributive adjective; JJ = general adjective; NN1 =
singular common noun; NNS1 = noun of style or title; NP1 =
singular proper noun; VV0 = base form of lexical verb; VVD
= past tense of lexical verb; VVG = -ing form of lexical verb;
VVN = past participle of lexical verb; %, @ = probability
markers; : = word initial capital marker.
Tag assignment involves, for each input running word or
punctuation mark, lexicon look-up, which provides one or
more potential word tags for each input word or punctuation
mark. The lexicon is a list of about 8,000 records containing
fields for
(1) the word form
(2) the set of one or more candidate tags denoting the word's
word class(es) with probability markers attached
indicating three different levels of probability.
Words not in the CLAWS lexicon are assigned potential
tags either by suffixlist look-up, which attempts to match end
characters of the input word with a suffix in the suffixlist or,
if the input word does not have a word-ending to match one of
these entries, default tags are assigned. The procedures ensure
that rare words and neologisms not in the lexicon are
given an analysis.
de      NN1
ade     NN1 VV0 NP1:
made    JJ
ede     VV0 NP1:
ide     NN1 VV0
side    NN1
wide    JJ
oxide   NN1
ode     NN1 VV0
ude     VV0
rude    NN1
ee      NN1
free    JJ
fe      NN1 NP1:
ge      NN1 VV0 NP1:
dge     NN1 VV0
ridge   NN1 NP1:
Figure 2: Section of the Suffixlist 
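
The tag assignment phase described above can be sketched as follows. This is a minimal illustration, not the CLAWS implementation: the entries are copied from Figures 1 and 2 with the probability and capital markers omitted, and both the longest-match-first strategy and the contents of DEFAULT_TAGS are assumptions, since the paper does not specify them.

```python
# Toy excerpts of the lexicon (Figure 1) and suffixlist (Figure 2).
LEXICON = {
    "contact": ["NN1", "VV0"],
    "contained": ["VVD", "VVN", "JJ"],
    "contrary": ["JJ", "NN1"],
}
SUFFIXLIST = {
    "de": ["NN1"],
    "ide": ["NN1", "VV0"],
    "oxide": ["NN1"],
    "wide": ["JJ"],
    "ee": ["NN1"],
    "free": ["JJ"],
}
# Hypothetical default tags; the paper does not list the real ones.
DEFAULT_TAGS = ["NN1", "VV0", "JJ"]


def assign_tags(word):
    """Return the candidate tags for one running word."""
    form = word.lower()
    if form in LEXICON:
        return LEXICON[form]
    # Try the longest word ending first, then progressively shorter ones.
    for start in range(len(form)):
        suffix = form[start:]
        if suffix in SUFFIXLIST:
            return SUFFIXLIST[suffix]
    return DEFAULT_TAGS


for word in ["contact", "dioxide", "nationwide", "blog"]:
    print(word, assign_tags(word))
```

Under these assumptions "dioxide" falls through to the "oxide" entry rather than the shorter "ide" or "de" entries, and a word with no matching ending receives the default tags.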
Tag selection disambiguates the alternative tags that are
assigned to some of the running words. Disambiguation is
achieved by invoking one-step probabilities of tag pair
likelihoods extracted from a previously tagged training corpus
and upgrading or downgrading likelihoods according to the
probability markers against word tags in the lexicon or
suffixlist. In the majority of cases, this first order Markov
model is sufficient to correctly select the most likely sequence
of tags associated with the input running text. (Over 90 per
cent of running words are correctly disambiguated in this way.)
Exceptions are dealt with by invoking a look-up procedure that
searches through a limited list of groups of two or more
words, or by automatically adjusting the probabilities of
sequences of three tags in cases where the intermediate tag is
misleading.
The current version of the CLAWS system requires no
pre-editing and attributes the correct word tag to over 96 per cent
of the input running words, leaving 3 to 4 per cent to be
corrected by human post-editors.
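
The first order Markov selection step can be illustrated with a small Viterbi search. The sketch below is not the CLAWS code: the numeric weights attached to the @ and % markers, the sentence-start symbol "^", the floor value for unseen tag pairs and every probability in TRANS are invented for illustration.

```python
import math

# Assumed weights for the three probability levels (unmarked, @, %).
MARKER_WEIGHT = {"": 1.0, "@": 0.1, "%": 0.01}


def lexical_weights(entry):
    """Turn a lexicon entry such as 'NN1 VV0@' into {tag: weight}."""
    weights = {}
    for item in entry.split():
        marker = item[-1] if item[-1] in MARKER_WEIGHT else ""
        weights[item.rstrip("@%")] = MARKER_WEIGHT[marker]
    return weights


def viterbi(candidates, trans, floor=1e-6):
    """Pick the most likely tag sequence under a first order model.

    candidates: one {tag: weight} dict per running word.
    trans: {(previous_tag, tag): probability}; unseen pairs get `floor`.
    """
    best = {"^": (0.0, [])}  # tag -> (log score, best path ending in tag)
    for options in candidates:
        new_best = {}
        for tag, weight in options.items():
            scored = [
                (score + math.log(trans.get((prev, tag), floor))
                 + math.log(weight), path)
                for prev, (score, path) in best.items()
            ]
            score, path = max(scored, key=lambda item: item[0])
            new_best[tag] = (score, path + [tag])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical transition probabilities.
TRANS = {
    ("^", "AT1"): 0.2, ("^", "VM"): 0.05,
    ("AT1", "NN1"): 0.6, ("AT1", "VV0"): 0.001,
    ("VM", "VV0"): 0.7, ("VM", "NN1"): 0.001,
}
contest = lexical_weights("NN1 VV0@")  # entry for 'contest' in Figure 1
print(viterbi([{"AT1": 1.0}, contest], TRANS))  # 'a contest'
print(viterbi([{"VM": 1.0}, contest], TRANS))   # 'can contest'
```

With these invented figures the noun reading is chosen after an article, while after a modal the transition probability outweighs the downgraded @ marker and the verb reading is chosen.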
3. Error Analysis 
Error analysis of CLAWS output has resulted, and
continues to result, in diverse improvements to the system,
from the simple adjustment of probability weightings against
tags in the lexicon to the inclusion of additional procedures,
for instance to deal with the distinction between proper names
and common nouns.
Parts of the system can also be used to develop new parts,
to extend existing parts, or to interface with other systems. For
instance, in order to produce a lexicon sufficiently large and
detailed enough for parsing, we needed to extend the original
list of about 8,000 entries to over 20,000 (the new CLAWS
lexicon contains about 26,500 entries). In order to do this, a
list of 15,000 words not already in the CLAWS lexicon was
tagged using the CLAWS tag assignment program. (Since they
were not already in the lexicon, the candidate tags for each
new entry were assigned by suffixlist lookup or default tag
assignment.) The new list was then post-edited by interactive
screen editing and merged with the old lexicon.
Another example of 'self improvement' is in the production
of a better set of one-step transition probabilities. The first
CLAWS system used a matrix of tag transition probabilities
derived from the tagged Brown corpus (Francis and Kučera,
1982). Some cells of this matrix were inaccurate because of
incompatibility of the Brown tagset and the CLAWS tagset. To
remedy this, a new matrix was created by a statistics-gathering
program that processed the post-edited version of a corpus of
one million words tagged by the original CLAWS suite of
programs.
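
A statistics-gathering pass of this kind can be sketched as a relative-frequency count over adjacent tag pairs. The sketch assumes the word_TAG format shown in Figure 5 and a hypothetical sentence-start symbol "^"; the original program's input format and any smoothing it applied are not described in the paper.

```python
from collections import Counter


def transition_probabilities(tagged_sentences, start="^"):
    """Estimate P(tag | previous tag) from post-edited tagged text."""
    pair_counts = Counter()
    prev_counts = Counter()
    for sentence in tagged_sentences:
        prev = start
        for token in sentence.split():
            tag = token.rpartition("_")[2]  # text after the last underscore
            pair_counts[(prev, tag)] += 1
            prev_counts[prev] += 1
            prev = tag
    return {pair: n / prev_counts[pair[0]] for pair, n in pair_counts.items()}


corpus = [
    "the_AT fuel_NN1",
    "the_AT reasons_NN2 for_IF the_AT fuel_NN1",
]
matrix = transition_probabilities(corpus)
print(matrix[("AT", "NN1")])  # two of the three AT tokens precede NN1
```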
4. Subcategorization 
Apart from extending the vocabulary coverage of the
CLAWS lexicon, we are also subcategorizing words belonging
to the major word classes in order to reduce the
over-generation of alternative parses of sentences of greater than
trivial length. The task of subcategorization involves:
(1) a linguist's specification of a schema or typology of
lexical subcategories based on distributional and
functional criteria
(2) a lexicographer's judgement in assigning one or more of
the subcategory codes in the linguist's schema to the
major lexical word forms (verbs, nouns, adjectives).
The amount of detail demarcated by the subcategorization
typology is dependent, in part, on the practical requirements of
the system. Existing subcategorization systems, such as the one
provided in the Longman Dictionary of Contemporary English
(1978) or Sager's (1981) subcategories, need to be taken into
account. But these are assessed critically rather than adopted
wholesale (see for instance Akkerman et al., 1985 and
Boguraev et al., 1987, for a discussion of the strengths and
weaknesses of the LDOCE grammar codes).
[1] intransitive verb : ache, age, allow, care, conflict, escape,
occur, reply, snow, stay, sun-bathe, swoon, talk, vanish.
[2] transitive verb : abandon, abhor, allow, build, complete,
contain, demand, exchange, get, give, house, keep, mail,
master, oppose, pardon, spend, sumSe~e~, warn.
[3] copular verb : appear, become, feel, ~, grow, remain,
seem.
[4] prepositional verb : abstain, aim, ask, belong, cater,
consist, prey, pry, search, vote.
[5] phrasal verb : blow, build, cry, dress, ease, farm, fill,
hand, jazz, look, open, pop, share, work.
[6] verb followed by that-clause : accept, believe, demand,
doubt, feel, guess, know, ~, reckon, mqu~, think.
[7] verb followed by to-infinitive : ask, come, dare, demand,
fail, hope, intend, need, prefer, propose, refuse, seem, try,
wish.
[8] verb followed by -ing construction : abhor, begin,
continue, deny, dislike, enjoy, keep, recall, remember, risk,
suggest.
[9] ambitransitive verb : accept, answer, close, compile, cook,
develop, feed, fly, move, obey, prm~, quit, sing, stop, teach,
try.
[A] verb habitually followed by an adverbial : appear, come,
go, keep, lie, live, move, put, sit, stand, swim, veer.
[W] verb followed by a wh-clause : ask, choose, doubt,
imagine, know, matter, mind, wonder.
Figure 3: The initial schema of eleven verb subcategories 
We began subcategorization of the CLAWS lexicon by
word-tagging the 3,000 most frequent words in the Brown
corpus (Kučera and Francis, 1967). An initial system of eleven
verb subcategories was proposed, and judgements about which
subcategory(ies) each verb belonged to were empirically tested
by looking up entries in the microfiche concordance of the
tagged Lancaster/Oslo-Bergen corpus (Hofland and Johansson,
1982; Johansson et al., 1986), which shows every occurrence of
a tagged word in the corpus together with its context.
About 2,500 verbs have been coded in this way, and we are
now working on a more detailed system of about 80 different
verb subcategories using the Lexicon Development
Environment of Boguraev et al. (1987).
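
Because a verb may belong to several subcategories at once, the coded lexicon is naturally viewed as a mapping from each verb to a set of codes. The sketch below inverts a small excerpt of Figure 3 into such a mapping; the dictionary layout and the function names are illustrative assumptions, not the format of the CLAWS lexicon.

```python
# Excerpt of Figure 3: subcategory code -> some of the verbs listed there.
FIGURE_3_EXCERPT = {
    "1": ["ache", "vanish"],
    "2": ["abhor", "demand", "keep"],
    "3": ["become", "seem"],
    "6": ["believe", "demand"],
    "7": ["demand", "seem"],
    "8": ["abhor", "keep"],
    "A": ["keep", "put"],
}


def build_verb_index(schema):
    """Invert code -> verbs into verb -> set of subcategory codes."""
    index = {}
    for code, verbs in schema.items():
        for verb in verbs:
            index.setdefault(verb, set()).add(code)
    return index


def licenses(index, verb, code):
    """True if the verb has been coded with the given subcategory."""
    return code in index.get(verb, set())


VERBS = build_verb_index(FIGURE_3_EXCERPT)
print(sorted(VERBS["demand"]))     # codes assigned to 'demand'
print(licenses(VERBS, "vanish", "2"))  # is 'vanish' coded as transitive?
```

A parser could consult such an index to discard expansions that the verb's codes do not license, which is the reduction in over-generation that subcategorization is intended to achieve.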
5. Constituent Analysis 
The task of implementing a probabilistic parsing algorithm
to provide a disambiguated constituent analysis of unrestricted
English is more demanding than implementing the word
tagging suite, not least because, in order to operate in a
manner similar to the word-tagging model, the system requires
(1) specification of an appropriate grammar of rules and
symbols and
(2) the construction of a sufficiently large databank of parsed
sentences conforming to the (optimal) grammar specified
in (1) to provide statistics of the relative likelihoods of
constituent tag transitions for constituent tag
disambiguation.
In order to meet these prior requirements, researchers have
been employed on a full-time basis to assemble a corpus of
parsed sentences.
6. Grammar Development and Parsed 
Subcorpora 
The databank of approximately 45,000 words of manually
parsed sentences of the Lancaster/Oslo-Bergen corpus
(Sampson, 1987: 83ff) was processed to show the distinct
types of production rules and their frequency of occurrence in
the grammar associated with the Sampson treebank.
... of the UCREL probabilistic system (Garside and Leech, 1987:
66ff) and suggestions from other researchers prompting new
rules resulted in a new context-free grammar of about 6,000
productions creating more steeply nested structures than those
of the Sampson grammar. (It was anticipated that steeper
nesting would reduce the size of the treebank required to
obtain adequate frequency statistics.) The new grammar is
defined descriptively in a Parser's Manual (Leech, 1987) and
formalized as a set of context-free phrase-structure productions.
Development of the grammar then proceeded in tandem
with the construction of a second databank of parsed sentences,
fitting, as closely as possible, the rules expressed by the
grammar. The new databank comprises extracts from
newspaper reports dating from 1979-80 in the Associated Press
(AP) corpus. Any difficulties the grammarians had in parsing
were resolved, where appropriate, by amending or adding rules
to the grammar. This methodology resulted in the grammar
being modified and extended to nearly 10,000 context-free 
productions by December 1987. 
V' -> V
      Od (I) (V)
      Oh (I) (Vn)
      Ob (I) {(Vg)/(Vn)}
Figure 4: Fragment of the Grammar from the Parser's Manual
Ob = operator consisting of, or ending with, a form of be; Od
= operator consisting of, or ending with, a form of do; Oh =
operator consisting of, or ending with, a form of the verb
have; V = main verb with complementation; V' = predicate;
Vg = an -ing verb phrase; Vn = a past participle phrase; () =
optional constituent; {/} = alternative constituents.
7. Constructing the Parsed Databank
For convenience of screen editing and computer processing,
the constituent structures are represented in a linear form, as
strings of grammatical words with labelled bracketing. The
grammarians are given print-outs of post-edited output from
the CLAWS suite. They then construct a constituent analysis
for each sentence on the print-out, either in detail or in outline,
according to the rules described in the Parser's Manual, and
key in their structures using an input program that checks for
well-formedness. The well-formedness constraints imposed by
the program are:
(1) that labels are legal non-terminal symbols
(2) that labelled brackets balance
(3) that the productions obtained by the constituent analysis are
contained in the existing grammar.
One sentence is presented at a time. Any errors found by
the program are reported back to the screen, once the
grammarian has sent what s/he considers to be the completed
parse. Sentences which are not well formed can be re-edited or
abandoned. A validity marker is appended to the reference for
each sentence indicating whether the sentence has been
abandoned with errors contained in it.
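
The three well-formedness constraints can be sketched as a single pass over the labelled bracketing with a stack. This is an illustrative reconstruction only: the tokenising regular expression, the representation of the grammar as a set of (mother, daughters) pairs and the error messages are assumptions, and the toy grammar covers just one noun phrase taken from Figure 6.

```python
import re

# Opening bracket "[Label", closing bracket "Label]", or a word_TAG token.
TOKEN = re.compile(
    r"\[(?P<open>[A-Za-z'&+]*)"
    r"|(?P<close>[A-Za-z'&+]*)\]"
    r"|(?P<word>[^\s\[\]]+_[^\s\[\]]+)"
)


def check_parse(text, grammar, nonterminals):
    """Return a list of error messages; an empty list means well formed."""
    errors = []
    stack = []  # entries are [label, list of daughter symbols]
    for m in TOKEN.finditer(text):
        if m.group("open") is not None:
            label = m.group("open")
            if label not in nonterminals:          # constraint (1)
                errors.append(f"illegal label: {label}")
            stack.append([label, []])
        elif m.group("close") is not None:
            label = m.group("close")
            if not stack or stack[-1][0] != label:  # constraint (2)
                errors.append(f"unbalanced bracket: {label}]")
                continue
            _, daughters = stack.pop()
            if (label, tuple(daughters)) not in grammar:  # constraint (3)
                errors.append(
                    f"unknown production: {label} -> {' '.join(daughters)}")
            if stack:
                stack[-1][1].append(label)
        elif stack:
            # A word contributes its tag as a daughter of the open bracket.
            stack[-1][1].append(m.group("word").rpartition("_")[2])
    errors.extend(f"unclosed bracket: [{label}" for label, _ in stack)
    return errors


GRAMMAR = {("Da", ("AT",)), ("N", ("NN1",)), ("N'", ("Da", "N"))}
LABELS = {"Da", "N", "N'"}
print(check_parse("[N' [Da the_AT Da] [N fuel_NN1 N]N']", GRAMMAR, LABELS))
print(check_parse("[N' [Da the_AT Da] [N fuel_NN1 N']", GRAMMAR, LABELS))
```

The first call reports no errors; the second reports the mismatched closing bracket and the brackets left open, which is the kind of feedback the input program returns to the screen.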
^ Shortages_NN2 of_IO gasoline_NN1 and_CC
rapidly_RR rising_VVG prices_NN2 for_IF
the_AT fuel_NN1 are_VBR given_VVN as_II
the_AT reasons_NN2 for_IF a_AT1 6.7_MC
percent_NNU reduction_NN1 in_II traffic_NN1
deaths_NN2 on_II New_NP1 York_NP1 state_NN1
's_$ roads_NNL2 last_MD year_NNT1 ._.
Figure 5: A word-tagged sentence from the AP corpus
AT = article; AT1 = singular article; CC = coordinating
conjunction; IF = for as preposition; II = preposition; IO = of
as preposition; MC = cardinal number; MD = ordinal number;
NN2 = plural common noun; NNL2 = plural locative noun;
NNT1 = temporal noun; NNU = unit of measurement; RR =
general adverb; VBR = are; $ = germanic genitive marker.
8. Assessing the Parsed Databank and the 
Grammar 
We have written ancillary programs to help in the
development of the grammar and to check the validity of the
parses in the databank. One program searches through the
parsed databank for every occurrence of a constituent matching
a specified constituent tag. Output is a list of all occurrences of
the specified constituent together with frequencies. This facility
allows selective searching through the databank, which is a
tool for revising parts of the grammar.
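
A search facility of this kind can be sketched as follows. The function below collects every constituent carrying a given label and counts the daughter sequences found under it; it assumes that the input parses are already well formed, and the regular expression and output format are illustrative rather than those of the UCREL program.

```python
import re
from collections import Counter

# Opening bracket "[Label", closing bracket "Label]", or a word_TAG token.
TOKEN = re.compile(
    r"\[(?P<open>[A-Za-z'&+]*)"
    r"|(?P<close>[A-Za-z'&+]*)\]"
    r"|(?P<word>[^\s\[\]]+_[^\s\[\]]+)"
)


def find_constituents(databank, target):
    """Count the daughter sequences of every constituent labelled target."""
    found = Counter()
    for parse in databank:
        stack = []  # entries are (label, list of daughter symbols)
        for m in TOKEN.finditer(parse):
            if m.group("open") is not None:
                stack.append((m.group("open"), []))
            elif m.group("close") is not None:
                label, daughters = stack.pop()
                if label == target:
                    found[" ".join(daughters)] += 1
                if stack:
                    stack[-1][1].append(label)
            elif stack:
                stack[-1][1].append(m.group("word").rpartition("_")[2])
    return found


# Fragments in the notation of Figure 6.
DATABANK = [
    "[N' [Da the_AT Da] [N fuel_NN1 N]N']",
    "[N' [Da the_AT Da] [N reasons_NN2 N]N']",
    "[P for_IF [N' [Da the_AT Da] [N fuel_NN1 N]N']P]",
]
for expansion, count in find_constituents(DATABANK, "N").most_common():
    print(count, "N ->", expansion)
```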
9. Skeleton Parsing 
We are aiming to produce a million word corpus of parsed
sentences by December 1988 so that we can implement a
variant of the CYK algorithm (Hopcroft and Ullman, 1979:
140) to obtain a set of parses for each sentence. Viterbi
labelling (Bahl et al., 1983; Forney, 1973) could be used to
select the most probable parse from the output parse set. But
problems associated with assembling a fully parsed databank are:
(1) ~ of production rules
(2) ~ the parsed databank to an evolving grammar.
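
The combination of CYK parsing with selection of the most probable parse can be illustrated on a toy grammar. The sketch below is not the variant planned at UCREL: it folds the selection into the chart by keeping only the best analysis per label and span, it requires rules in binary form, the label Prep is hypothetical, and all rule probabilities are invented.

```python
def cyk_best_parse(tags, lexical, binary, root="Sd"):
    """Return (probability, tree) of the best parse of a tag sequence.

    lexical: {tag: [(label, prob), ...]}
    binary:  {(left_label, right_label): [(mother_label, prob), ...]}
    """
    n = len(tags)
    # chart[i][j] maps a label to (probability, tree) for tags[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tag in enumerate(tags):
        for label, p in lexical.get(tag, []):
            chart[i][i + 1][label] = (p, (label, tag))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left, (pl, tl) in chart[i][k].items():
                    for right, (pr, tr) in chart[k][j].items():
                        for label, p in binary.get((left, right), []):
                            prob = p * pl * pr
                            # Keep only the most probable analysis.
                            if prob > chart[i][j].get(label, (0.0, None))[0]:
                                chart[i][j][label] = (prob, (label, tl, tr))
    return chart[0][n].get(root)


# Invented probabilities; a prepositional phrase may attach to V' or N'.
LEXICAL = {
    "PPHS2": [("N'", 0.4)], "NN1": [("N'", 0.4)],
    "VV0": [("V", 1.0)], "IW": [("Prep", 1.0)],
}
BINARY = {
    ("N'", "V'"): [("Sd", 1.0)],
    ("V", "N'"): [("V'", 0.6)],
    ("V'", "P"): [("V'", 0.4)],
    ("N'", "P"): [("N'", 0.2)],
    ("Prep", "N'"): [("P", 1.0)],
}
result = cyk_best_parse(["PPHS2", "VV0", "NN1", "IW", "NN1"], LEXICAL, BINARY)
print(result)
```

With these figures the analysis attaching the prepositional phrase to the predicate outscores the one attaching it to the noun phrase, so only the former survives in the top cell of the chart.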
In order to circumvent these problems, a strategy of
skeleton parsing has been introduced. In skeleton parsing,
grammarians create minimal labelled bracketing by inserting
only those labelled brackets that are uncontroversial and, in
some cases, by inserting brackets with no labels. The grammar
validation routine is de-coupled from the input program so
changes to the grammar can be made without disrupting the
input parsing. The strategy also prevents extensive
retrospective editing whenever the grammar is modified.
Grammar development and parsed databank construction are
not entirely independent, however. A subset (10 per cent) of the
skeleton parses is compared with the current
grammar, while another subset (1 per cent) is checked by
the grammarians.
Skeleton parsing will give us a partially parsed databank
which should limit the alternative parses compatible with the
final grammar. We can either assume each parse is equally
likely and use the frequency weighted productions generated
by the partially parsed databank to upgrade or downgrade
alternative parses or we can use a 'restrained' outside/inside
algorithm (Baker, 1979) to find the optimal parse.
A010 1 v 
[S' [Sd[N' [N'& [N Shortages_NN2 [Po of_IO [N' [N gasoline_NN1 N]N']Po]N]
N'&] and_CC [N'+[Jm rapidly_RR rising_VVG Jm] [N prices_NN2 [P for_IF
[N' [Da the_AT Da] [N fuel_NN1 N]N']P]N]N'+]N'] [V' [Ob are_VBR Ob] [Vn
given_VVN [P as_II [N' [Da the_AT Da] [N reasons_NN2 N]N']P] [P for_IF
[N' [D a_AT1 [M 6.7_MC M]D] [N percent_NNU reduction_NN1 [P in_II [N' [N
traffic_NN1 deaths_NN2 [P on_II [N' [D[G[N New_NP1 York_NP1 state_NN1
N] 's_$ G]D] [N roads_NNL2 N] [Q[Nr' [D[M last_MD M]D] year_NNT1 Nr']Q]
N']P]N]N']P]N]N']P]Vn]V']Sd] ._. S']
Figure 6: A Fully Parsed Version of the Sentence in Figure 5.
D = general determinative element; Da = determinative element containing an article as
the last or only word; G = genitive construction; Jm = adjective phrase; M = numeral
phrase; N = nominal; N' = noun phrase; N'& = first conjunct of co-ordinated noun
phrase; N'+ = non-initial conjunct following a conjunction; Nr' = temporal noun phrase; P
= prepositional phrase; Po = prepositional phrase; Q = qualifier; S' = sentence; Sd =
declarative sentence.
A062 96 v
"_" [S Now_RT ,_, "_" [Si[N he_PPHS1 N] [V said_VVD V]Si] ,_, "_" [S&
[N we_PPIS2 N] [V are_VBR negotiating_VVG [P under_II [N duress_NN1 N]
P]V]S&] ,_, and_CC [S+[N they_PPHS2 N] [V can_VM play_VV0 [P with_IW
[N us_PPIO2 N]P] [P like_ICS [N a_AT1 cat_NN1 [P with_IW [N a_AT1
mouse_NN1 N]P]N]P]V]S+]S] ._. "_"
Figure 7: A Skeleton Parsed Sentence.
word tags: ICS = preposition-conjunction; IW = with, without as prepositions;
PPHS1 = he, she; PPHS2 = they; PPIO2 = us; PPIS2 = we; RT = nominal adverb of
time; VM = modal auxiliary verb; hypertags: S = included sentence; S& = first
coordinated main clause; S+ = non-initial coordinated main clause following a
conjunction; Si = interpolated or appended sentence.
10. Featurisation
The development of the CLAWS tagset and UCREL
grammar owes much to the work of Quirk et al. (1985) while
the tags themselves have evolved from the Brown tagset
(Francis and Kučera, 1982). However, the rules and symbols
chosen have been translated into a notation compatible with
other theories of grammar. For instance, tags from the
extended version of the CLAWS lexicon have been translated
into a formalism compatible with the Winchester parser
(Sharman, 1988). A program has also been written to map all
of the ten thousand productions of the current UCREL
grammar into the notation used by the Grammar Development
Environment (GDE) (Briscoe et al., 1987; Grover et al., 1988;
Carroll et al., 1988). This is a preliminary step in the task of
recasting the grammar into a feature-based unification
formalism which will allow us to radically reduce the size of
the rule set while preventing the grammar from overgenerating.
V 1
[ VV0* ]      50   85
[ VV0* N' ]   800  86
[ VV0* J ]    80   87
[ VV0* P ]    400  88
[ VV0* R ]    80   89
[ VV0* Fn ]   100  90
Figure 8: A Fragment of the UCREL grammar
PSRULE V85 : V1 --> V.
PSRULE V86 : V1 --> V NP.
PSRULE V87 : V1 --> V AP.
PSRULE V88 : V1 --> V PP.
PSRULE V89 : V1 --> V ADVP.
PSRULE V90 : V1 --> V V2[FIN].
Figure 9: Translation of the Rules in Figure 8
into GDE Representation
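
The mapping from the UCREL rule format to GDE notation can be sketched for the fragment shown in Figures 8 and 9. The symbol correspondences in SYMBOL_MAP are inferred here by aligning the two figures, the arrow is written as "-->", and the middle number on each UCREL line (whose meaning the figure does not explain) is ignored; the actual translation program is not described in the paper.

```python
# Correspondences inferred from aligning Figure 8 with Figure 9.
SYMBOL_MAP = {
    "VV0*": "V", "N'": "NP", "J": "AP",
    "P": "PP", "R": "ADVP", "Fn": "V2[FIN]",
}


def to_gde(mother, line):
    """Translate one line such as '[ VV0* N' ] 800 86' into a PSRULE."""
    body, _, numbers = line.partition("]")
    daughters = body.strip("[ ").split()
    rule_number = numbers.split()[-1]  # the last column names the rule
    rhs = " ".join(SYMBOL_MAP[d] for d in daughters)
    return f"PSRULE {mother[0]}{rule_number} : {mother} --> {rhs}."


FIGURE_8 = [
    "[ VV0* ] 50 85",
    "[ VV0* N' ] 800 86",
    "[ VV0* J ] 80 87",
    "[ VV0* P ] 400 88",
    "[ VV0* R ] 80 89",
    "[ VV0* Fn ] 100 90",
]
for rule in FIGURE_8:
    print(to_gde("V1", rule))
```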
11. Summary
In summary, we have a word tagging system requiring
minimal post-editing, an accumulating corpus of parsed
sentences and a context-free grammar of about ten thousand
productions which is currently being recast into a
unification formalism. Additionally, we have programs for
extracting statistical and collocational data from both word
tagged and parsed text corpora.
12. Acknowledgements 
The author is a member of a group of researchers working
at the Unit for Computer Research on the English Language at
Lancaster University. The present members of UCREL are
Geoffrey Leech, Roger Garside (UCREL directors), Andrew
Beale, Louise Denmark, Steve Elliott, Jean Forrest, Fanny
Leech and Lita Taylor. The work is currently funded by IBM
UK (research grant: 8231053) and carried out in collaboration
with Claire Grover, Richard Sharman, Peter Alderson, Ezra
Black and Frederick Jelinek of IBM.
13. References 
Erik Akkerman, Pieter Masereeuw and Willem Meijs (1985).
'Designing a Computerized Lexicon for Linguistic Purposes',
ASCOT Report No. 1, CIP-Gegevens Koninklijke Bibliotheek,
Den Haag, Netherlands.
Lalit R. Bahl, Frederick Jelinek and Robert L. Mercer (1983).
'A Maximum Likelihood Approach to Continuous Speech
Recognition', IEEE Transactions on Pattern Analysis and
Machine Intelligence, Vol. PAMI-5, No. 2, March 1983.
J. K. Baker (1979). 'Trainable Grammars for Speech
Recognition', Proceedings of the Spring Conference of the
Acoustical Society of America.
Bran Boguraev, Ted Briscoe, John Carroll, David Carter and
Claire Grover (1987). 'The Derivation of a Grammatically
Indexed Lexicon from the Longman Dictionary of
Contemporary English', Proceedings of ACL-87, Stanford,
California.
Ted Briscoe, Claire Grover, Bran Boguraev, John Carroll
(1987). 'A Formalism and Environment for the Development
of a Large Grammar of English', Proceedings of IJCAI, Milan.
Keith Brown (1984). Linguistics Today, Fontana, U.K.
John Carroll, Bran Boguraev, Claire Grover, Ted Briscoe
(1988). 'The Grammar Development Environment User
Manual', Cambridge Computer Laboratory Technical Report
127, Cambridge, England.
Roger Garside, Geoffrey Leech and Geoffrey Sampson (1987).
The Computational Analysis of English: A Corpus-Based
Approach, Longman, London and New York.
Claire Grover, Ted Briscoe, John Carroll, Bran Boguraev
(1988). 'The Alvey Natural Language Tools Project Grammar:
A Wide-Coverage Computational Grammar of English',
Lancaster Papers in Linguistics 47, Department of Linguistics,
University of Lancaster, March 1988.
G. Forney, Jr. (1973). 'The Viterbi Algorithm', Proc. IEEE,
Vol. 61, March 1973, pp. 268-278.
W. Nelson Francis and Henry Kučera (1982). Frequency
Analysis of English Usage: Lexicon and Grammar, Houghton
Mifflin, Boston.
Knut Hofland and Stig Johansson (1982). Word Frequencies in
British and American English, Norwegian Computing Centre
for the Humanities, Bergen; Longman, London.
John E. Hopcroft and Jeffrey D. Ullman (1979). Introduction
to Automata Theory, Languages, and Computation, Addison-
Wesley, Reading, Mass.
Stig Johansson, Eric Atwell, Roger Garside and Geoffrey
Leech (1986). 'The Tagged LOB Corpus Users' Manual',
Norwegian Computing Centre for the Humanities, Bergen.
Henry Kučera and W. Nelson Francis (1967). Computational
Analysis of Present-day American English, Brown University
Press, Providence, Rhode Island.
Geoffrey Leech (1987). 'Parsers' Manual', Department of
Linguistics, University of Lancaster.
Longman Dictionary of Contemporary English (1978), second
edition (1987), Longman Group Limited, Harlow and London.
Randolph Quirk, Sidney Greenbaum, Geoffrey Leech and Jan
Svartvik (1985). A Comprehensive Grammar of the English
Language, Longman Inc., New York.
Naomi Sager (1981). Natural Language Information
Processing, Addison-Wesley, Reading, Mass.
Geoffrey Sampson (1987). 'The grammatical database and
parsing scheme' in Garside, Leech and Sampson, pp 82-96.
Richard A. Sharman (1988). 'The Winchester Unification
Parsing System', IBM UKSC Report 999, April 1988.
