GramCheck: A Grammar and Style Checker 
Flora Ramlrez Bustamante 
Escuela Politdcnica SuI)erior 
Universidad Carh)s lll de Madrid 
C/ l\]utaI'qU(~ i\[5 
28011 Legands, Madrid, SPAIN 
flora@ia, uc3m. es 
Fernando S&nchez Le6n 
Laboratorio de I,ingiifstica InformStica 
Facultad de Filosoffa y Letras 
Universidad Autdnoma de Madrid 
28049 Madrid, SPAIN 
fernando@maria, lllf. uam. es 
Abstract 
This paper presents a gratmnar and 
style checker demonstrator for Span- 
ish and Greek native writers devel- 
oped within the project GramCheck. 
Besides a brief grmnmar error typol- 
ogy for Spanish, a linguistically moti- 
vated approach to detection and diag- 
nosis is presented, based on the gen- 
eralized use of PROLOG extensions 
to highly typed unification-based gram- 
mars. The demonstrator, currently in- 
cluding flfll coverage for agreement er- 
rors and certain head-argmnent relation 
issues, also provides correction by means 
of an analysis-transfer-synthesis cycle. 
Finally, fltture extensions to the (:urrent 
system are discussed. 
1 Introduction 
Grammar checking stelnmed as a logical appli- 
cation from forlner attelni)ts to natural language. 
ulMerstanding by comtmters. Many of the NLU 
systems developed in the 70's indu(le(l a kind of 
error recovery Inechanisln ranging flom the treat- 
ment only of spelling e.rrors, PARRY (1)arkin - 
son c't al., 1977), to tile inclusion also of incom- 
plete int)ut containing some kind of ellipsis, LAD- 
DEll,/LIFEll (Hendrix et al., 1977). 
The interest in the 80's begun to turn consid- 
ering grammar checking as an enterprise of its 
own right (Carbonell & Hayes, 1983), (Ilayes & 
Mouradian, 1981), (Heidorn et al., 1982), (.lensen 
at al., 1983), though many of the approaches 
were still in I;t1(: NLU tradition ((\]harniak, 198a), 
(Granger, 1983), (Kwasny & Sondheimer, 1981), 
(Weischedel & Black, \]980), (Weisehedel & Sond- 
heimer, 1983). A 1985 Ovum report on nal;ll- 
ral language applications (.lohnson, 1985) already 
identifies grammar and style checking as one of 
the seven major apt)lications of NLP. Currently, 
every project in grammar checking has as its goal 
the creation of a writing aid rather than a robust 
man-machine interface (Adriaens, 1994), (llolioli 
ctal., 1992), (Vosse, 1992). 
Current systems dealillg with grammatical de- 
viance have be(m inainly involve(t in the integi~> 
don of special techniques to detect and correct, 
when possible, these, deviances. In some case.s, 
these have be.en incorporated to traditional pars- 
ing techniques, as it is the case with feature re- 
laxation in the context of unification-based for- 
malisms (Bolioli et al., 1992), or the addition of a 
set of catching error rules si)ecially handling the 
deviant constructions (Thurlnair, 1990). In other 
eases, the relaxation component has heen included 
as a new add-in feature to the parsing algoril,hm, 
as in the IBM's PLNLI' aI)proach (Heidorn et al., 
1982), or in the work developed for tim Tra.nsla- 
tor's Workbench t)roject using the METAL MT- 
system (TWB, 1992). 
Besides, an increasing concern in current 
projects is that of linguistic relevance of the anal- 
ysis t)erformed by the grammar correction system. 
In this sense, the adequate integration of error 
detection and correction techniques within main- 
stream grammm" formalisms has l)een addressed 
by a nunl|)er of these projects (\[Iolioli eta/., 1992), 
(Vosse, 1992), ((\]enthia.l ctal., t992), (O(~uthial et 
al., 1994). 
l~bllowing this concern, this paper presents re- 
suits fl'om the project GramCheck (A Gram- 
mar and Style Checker, MLAP93-11), flmded by 
the CEC. GramCheck has developed a grammar 
checker demonstrator for Spanish and Greek na- 
tive writers using ALEP (ET6/1, 1991), (Simp- 
kins, 1994) as the NLP development platform, a 
client-server architeeUlre as implenmnted in the 
X Windows system, Motif as the 'look ~md fe, el' 
interface and Xminfo as the kllowh!dge t)ase, stor- 
age format. Generalized use of extensions to the 
highly typed and unifi(:ation based formalism im- 
i)Iemented in ALEP has been 1)erformed. These 
extensions (called Constraint Solvers, CSs) are 
nothing but pieces of PR()I,OG code l)erforlning 
different l)oolean and relational operations over 
feature wdues. Besides, GramCheck has used 
ongoing results Dora LS-GRAM (LRE61029), a 
project alining at the implementation of middle 
175 
coverage ALEP grammars for a number of Euro- 
pean languages. 
The demonstrator checks whether a document 
contains grammar errors or style weaknesses and, 
if found any, users are provided with messages, 
suggestions and, for grammar errors only, auto- 
inatic correction(s). 
2 Brief grammar error typology 
for Spanish 
The linguistic statements made by developers of 
current grammar checkers based on NLP ted> 
niques are often contradictory regarding the types 
of errors that grammar checkers must correct au- 
tomatically. (Veronis, 1988) claims that native 
writers are unlikely to produce errors involving 
morphological features, while (Vosse, 1992) ac- 
ce.t)ts such morpho-syntactic errors, in spite of tile 
fact that an examination of texts by the author 
revealed that their appearance in native writer's 
texts is not frequent. Both authors agree in char- 
acterizing morpho-syntactic errors as a sainple of 
lack of competence. 
On the other hand, an examination of real texts 
produced by Spanish writers revealed that they do 
produce morpho-syntactic errors I . Spanish is an 
inflectiolml language, which increases the possi- 
bilities of such exrors. Nevertheless, other errors 
related to structural configuration of the language 
ark: produced as well. 
Errors found fall into one of the following sub- 
types, assuming that featurization is the technique 
used in t)arsing sentences: 
1. Mislnatching of features that do It(it af- 
fect representational issues (intra- or inter- 
syntactic agreement on gender, number, per- 
son and cask' for categories showing this 
phenonmnon). These mismatchings produce 
rto~?,- St'l"ttct~K/'ttl e7"l'O~'S. 
2. Mismatching of fl:atures which describe cer- 
tain representational properties for cate- 
gories, as wrong head-argument relations, 
word order and substitution of certain cat- 
egories. These mismatchings produce struc- 
tural errors. 
Table 1 shows rite percentages of diflhrent types 
of errors tbund in the corpus. Punctuation errors 
must be considered as structural violations, while 
for style weaknesses, it depends on its subtype. 
Errors at the lexical level are diflh:ult to classit~y 
and itios(; of them must regaMed as spelling rather 
than gralnlnar errors. The nulnl)er of erroIs iden- 
tiffed in this corpus is 543. These statisti(:s couhl 
1The corims used contains nearly 7(I,000 words in- 
cluding text fragments from literature, newspal)(:rs , 
technical and administradw: documentation. It has 
been provided to a large extent by GramCheck pilot 
user, ANAYA, S.A. 
be regarded as a representative average of the fi'e- 
quency of errors/mistakes oc(:urring in Spanish 
texts. 
Table \]: Statistics of errors 
\[~ Type of error Percent ~tge 
Non-struct:urM violations as described above 18,5 
Structural violations as described above 9,7 
PttllCt ua.Lioi1 Ollligsion ,'{2.2 
Addition 4.8 
Errors at the At the character lev(l 6.3 
lexic~d level a Stress 8.0 
Style Structural 3.5 
weaknesses Lexical 12,0 
Others 5.0 
"Errors at the h'xical level includ(' tyl)ing errors, 
word segmentation (ai no vs sirl.o), and cognitive errors 
(onccavo (pmtitivc) vs nndd.cirno (ordinal). 
A presupposition adopted in the project led to 
the idea that violations at the featm'e level can be 
capl:ured by means of the relaxation of the possi- 
bly violated features while violations at (;he level of 
configuration may not be relaxed withou(; raising 
unpredictable parsing results, thus being candi- 
dates for the implementation of explicit rules en- 
coding such incorrect structures. 
Under this view, a comprehensive gralnlnar 
checker must make use of both strategies, called in 
the literature feature or constraint relaxation 
and error anticipation, respectively. 
However, given the relevance of features ill the 
encoding of linguistic information in TFSs, SOIIlO 
strllcttlral errors (;ail t)e re.analyzed as agrcelilent 
errors in a wide sense (as feature nfisma.tching 
violal;ions rather than structural ones). This al- 
lows the implementatioll of a uniforn) apt)roach to 
grammar ('orrcction, thus a.voitting explicit rules 
for ill-formed illpUt. This paI)er describes such 
ilnpleinentatioI~ for both Ilon-sl;rnctural and strltt:- 
tura.I violations. 
3 Error detection, diagnosis and 
correction techniques 
The overall strate.gy for detection, diagnosis and 
(:orre(;tion of gramnmr a,n(1 style errors wil;hin 
GramCheck relies on three axes: 
• For detection, a combined fcal, ure rela:,:atiorl, 
and error anticipation apI)roach is adopt(xl. 
In order to iml)le.mcnt the former, extensiv(~ 
use of external CSs is performe(1 in the anal-- 
ysis grammar, whereas for the latter, exl)licit 
rules, adequate.ly detined either in the core 
gralnlilar or in satellite, subgranlnlars~ are 
iml)lenmnte(l 2 . 
2Gram(Hwck checks texts belonging I,o the s(;an- 
dard language and ix) the ~(hninisl;rativ(: subl~mguagc. 
The analysis moduh! has 1)een (:on(:eived ¢~s COml)osed 
1)y a (:()re grmmna.r lind (;we sate\]lit(: sut)grmnmars 
for overlappiltg (:as(:s tha.t are mutuMly exchlsive. 
Thus, (:he acl;ivaLiolt of one subgrammar implies i;he 
176 
• Diagnosis is performed 1)y t;he CSs themselv('.s 
wit;ix the aid of a hcwristic I;cch, niqu(', for those 
errors wher(*' tests should 1/(*' l)erformed Oil 
so.veral olo, lll(~Ill;S mid a, lmttm'n-'rc.lal, cd t, cch- 
niquc which 1)rovides a mr;ram to extx;nd fea- 
t;urc vahlcs wil;h a gra(lal;iotl of (',OII'(R:I, ;tilt| 
l)osit)h ', })tit iltt:orre(:l; inforltutl;ion. The tyl> 
ical case for l,ho, former is }/,gl'tR!IIl(*'llI;, l,hus 
for signs inw)lving Lhis Lyl)e o\[ information, 
both an initial h(;uristic vahm is assigned 
and aritlmle, tica\] ot)(;rati(ms re'c, perform(~d on 
(in)cqualiLy l;(;sl,s. As for the \]&l;l,cr, head- 
&lgtllllt;tll; r(;lations wh(;r(; l)()llll(\[ prt*'l)OSitions 
&l:e involved are tre~tx*'d this way. li'or MI 
g;r~muna,r err()rs t}l(we is IlO notion o\[' weak vs. 
strong diagnosis, b('.ing all (:onsidere(1 ,%rong 
(;rrors ne(;(ling ;I, utonl~Ll;i(; (;orr(~(;l,iOl~. 
• (\]orre(:l;ion is p(!rfornw.d by m(m.ns of I,I'(R~ 
Lrans(hlc.tion of lJnguisti(: Sl,rlt('tltr(;s (L~s) 
(',ontaining errors l;() ;t qa.ngua.g(:' (ac, tually 
a 'langua.ge use') (letine(1 as corvecl, &m',,ish.. 
These synthesized sl;rucl;urc.s are (tist)la.ye(l to 
the user. The ovcrM1 (It.sign is then simila,r to 
a transfer-based MT sysl;elll, where (;It(*. llSll&l 
cycle is a.nalysis-tra,nsfer-synthesis, being t,he 
ill&ill dilti~.rences the addition ()f the ab()ve- 
lllOlltiOIl(~,(1 ~l':~/,llllll;l,l" (:(/rr(~('ti(/n (levi(;e,~ a.n(t 
the fa(:t iJm.t n(/l; a.ll, but only in(:()rr(~(:t sen- 
teI)Ces, will l)e l)ush('xl through th(! c()ml)h%(! 
(:y(:h!. 
CSs alh)w the relaxation of certain f('~a.tm'es in 
the gramumr rules whos(~ unifi(:ati(m will 1)c (h> 
(:idea upon, in ;~ n(m-l;rivial way, within tlm,~(~ CSs. 
Thus, l'ltlcs (1() nol; \])(;rforln f(',;tLUl(; wduc (;bet:king, 
so CSs play a. (:ru(:ia\] roh~ 1)erfornling (;xttmtled 
wnia\])le uifiti(:~t,ion a.n(l t;aking approl)ria.Lt~ (teci.- 
sions. I)(~l)(m(ting on l;tm err(n' Lyt)e, (~Ss (:arvy out 
(lifferl',nt oi)erations ()n t'eaturt*'s, st:t)rt;s mM lisLs. 
Th(!se Ol)(:rations (:on(:(*'rn t)asi(:a,lly die (tet(*'(:l;ion 
and Llm (*'vahlal;ion o\[ the error, providing a (tiag- 
nosti(: on t, he error and (;or\]'(;(:t vahte(s) \[or fo,~/,1;lll'(~S 
involved. The list ~, of (\]SS \[~lV()lll'S ;/ ()llO Sl;(; I) (li- 
agnt/sis l)ro(:(xhH'e w\]mrc (\[(x:i,'d(/lm are t)nly l,a.k('.l~ 
()tl(:(~ a\]l l;h(*' r(*'h'.v;mt ilfl't)rnl~t;it)n is g;~l,h(;r(M. 
3.1 A h(mristi(" tcchniqu(; Lo diagnose 
II()II- S t;r 11Ct iii'~-i\[ (~.r l'Oi'S 
CSs are used in GrmnChe('k roughly in Lira way 
\[)r('.sent(*'d ill (Croud~, \] 994) and (l{uessinl¢, 1994), 
(l(;v(*'l()l)(',l's of CSs fl)r AI~EI'. N(*'v(n'ttml(;ss, while 
t}lese r('~t)(/rl;s t\[cs(:ril)('~ (:onsl;r~Jnl; solving as ml ('x~ 
|;ciisioIl 1;o l;\]lc expl'('.ssive i)()W(',r ()t: El'~l,lllllHt, l' l'llles~ 
the novell;y of t;})e al)t)roa(;h 1)r(~s(mt;(;(l her(~ is |;h(~ 
use, of CSs to allow fe~l;ur(', r(J;/xaLiOll in rules mM 
t)ool(;ml, rela.tioua.l and a.ril;hm(fl;i(:al ()l)(*'rati()ns ()n 
17(JCV;HI|; \['(~,;/.\[;1 ll(}s. 
dea(tival,i0n of l;he other, a,,d in both cases l,hey are 
a,(t(\[c.(l |;o I;he (:o1.'(~ gl:;Hlllllar d(q)endin Z ()I the l;ype of 
(sub)langlta,gc s(Jected hy the user. 
A{~I'C(~III(}II(; (Wl'Ol's t)os(~ ~/, t)ro})l(~Ili for a {~l~ittl- 
m~r (:he(:ker whi(:h i)a,rses ll~l;llr&\] \]mlguage, l)e- 
(',allS(} Lh(', (\[(fl;(,,(;tioll ()f th(; (Wl'or a,li(l L}I(~ (liagnosi,q 
1)roce(lur(~ have t() })e 1)(M'orme(l aui;onm,t;i(m,lly. In 
inflectional languages, like Sl)anish, this issue is 
essential given tha£ in certain (;otti,(~xl2-; iL is nol, 
t)ossi/)le to give a, single (:orr(',cl,ioll wht!n lmrforn> 
ing a.na.ly,qis only aL s(!nten(:( ~, level (i.e. without 
a.napht)ric, r(!la.d()ns). For these ('.ases, l;h(! sys\[,(!lll 
should 1)e provided with a hemistics for the c.()rrt!c.- 
don in order to detect and diagnose 1,t1(.' 1)la(:t,.(s) 
Ii,*',., 
to take a (ledsion about the unit(s) to 1)e (:of 
rt'x:te(l, l/or (;r~mK\]h(~(:l% dfis htmrisdt:s relies on a 
t);u'aaneLriz;~Li(m of two assumptions: 
1. Lhc (:OllSLiLitellt which holds the fea.tm(~ vahms 
l,hal; ill a, given (~r\]or siLuadon control I;he resL 
of Llw. lea.Lure vahl(!S in the other (:()nstil;u(mi,s, 
2. tim (wa\]uaLion of Lhe llllllJ)(!l' O{ c.onsLitu(!ntN 
which share and d() not ,share dm same values. 
Our diagnosis 1)rocedure assmnes dial t,h(' g(m- 
der and munber thatures in tim head of a l)luas(~ 
coIfl,rol t;\]msc ill Lhe (teptmdeafl; constiLu(!nl;(s), a,1- 
though, as it will 1)e l)rovcd laLer, this is not: net'- 
essarily l.ru(;. Ill order Lo (Io tiffs dia.gnosis in(we- 
dure, the CS will COltI;l'~lSl; thoso ~k~3q.\[;lll'(!s ;lll({ \[(~lV(! 
SOltte ('.lll(~,q o\[ l;his evalu;~don in phrasa.l proj(x'Lions 
in order f'()t I,h(;se to t)e availa.ble for furl;her op- 
(!la.l;it)\[ts should l\]ley were tlect*'ss;H'y. 'l'\]les(? (;ht(m 
are sha.p(~(\[ as scores in the a,t)proa.ch a(h)I)t;(~(l tt)l' 
~l, gt'(K':lItClll; (~,l'l'()l'S, D31(1, ill LIIis seltse, ollr heurisLi('.s 
is clos('xl to 1;he inel;ri(: Ol)erations 1)('af()rm(!(1 1)y 
ol;h(*'r/ffm)nnar checkers tmsed on Nil,l ) t:edmiques 
(Veronis, 1988), (Bolioli ct al., \]992), (Vosse, 
\]!).()2), ((~('.nl;hia\] (;I, al., t994). 
Tim core of dfis }wm'isl;i(:s is that deptmding 
(m a. set of linguistic l)rincilfles lmsed on lo.xi(:o 
morl)hoh)gi(:a\] prol)ert;ies , l;ll(; va,hms for gender 
a,ml mmll)er in ce, rl;a, iu h*'xit:al units will 1)e l)rO 
mol;t!d over Lhe, wflues in otlmr units, thus, ~msign- 
ing thent a. higher score. 
Ther(~ are s(w('aa,l conditions which have to I)e 
taken int;() a,(:(x)uHt; in or(h~r t;() t)(M't)rln lJle (Iiaw 
lit)sis 1)I'()(;(',(\[111'(*'. \]!'()F iltS(,;l\[l('(',, lit)liltS with iuht',r- 
(!hi g(md('.r should c.()nl;rol l,he g(utdel' of the l(mL 
of I;h(~ eh',ut(mts in a giveu NIL llowever, if Lh(! 
noun (loes n()l; ha,v(~ inherent, gellder i/,'s a, noun 
thaL shows sex infle(:l,ion then l;hc gcn(h!r va.ht(', 
should t)(; (;(mtrolled t)y l;hos(; ('\](unenLs l;h~tl,, sh;i.r- 
ing Lhe mint(*' wflue, art; majority, lh,.n(',(', a st> 
(tu(;nc(; like el_rims(: casa_f(;n~ (t;he house) must bc 
corret:i;(M inl;o hz_fen~ ca.sa_t'enl t)e(:;ms(~ dfis n(mn 
has inher(~nt feminine g(*'n(ler in Sl)anish. ()n i,}m 
ot;h(*'r ha.rid, an NP like ln_fem chic-o_~l)as(: yflw.p 
a. \['era (lit;. 'the boy t)(!a,ulJlul') shou\](l 1)e (:orr(!clx!(/ 
a.s lo, li'.\[n (:h, ic-.,_ii.n ,q'n(qr-a. l'eln ('tim girl I)(~aut,i- 
ful'), thus (:ha.n~;ing the gen(l(~r value of t,h(, h(m(l 
n(mn ill tim (lirt~(:titm mlgg(mi;(xl \])y the ()Llmr (t(> 
p(',n(hmt (J(ml(;nt,'< This nma,us i;h~t alth()ut,~h dm 
sysl;en) (:()uhl l;a.k(; Lh('. gmMer v~flue ot! the h('.ml a~ 
1'7 7 
the value which commands the whole phrase, the 
munber of elements that share the same feature 
values, if in contrast to those of the head and if 
the head takes its agreement properties from mor- 
phology --ie. are susceptible of keystroke errors- 
, can influence the final decision. Finally, for cases 
where equal scores are obtained, as it happens 
with a non-inherent masculine noun and a fen> 
inine determiner, both possible corrections should 
be pertbrmed, since there is not enough informa- 
tion so as to decide the correct value (unless this 
can be obtained from other agreeing elements in 
the sentence --for instance an attribute to this NP). 
Basically, the final operation to be performed 
with the scores is to determine that the higher the 
score of an element the severer its substitution. 
Thus, scores are clues for the correction of those 
elements having the lowest scores. 
The initialization steps in order to perform the 
heuristic technique are related to the assignment 
of values and scores to lexical projections depend- 
ing on its inherentness. The values for gender 
and number of the head of the projection serve 
as a parameter for the computation of values and 
scores for the possible modifier which could ap- 
pear closed to it. Note that; agreement in Span- 
ish is based on a binary value system. Thus, the 
computation of values for the modifier of a given 
head simply relies on the instantiation of opposite 
values to those of the head. In the case of under- 
specification of the head for gender, for instance, 
the presupposition is that this value is the same 
as the one of the modifier, if this is not under- 
specified. Otherwise, both elements remain un- 
derspecified. Besides, the weight given to control- 
ling elements (50) ensures that there is no way for 
modifiers to overpass this score. Note as well that 
the weight given to inherentless values, as number 
(10), ensures that there are no promoted elements 
in this calculation. The following schematic CS 
illustrates the assignment of scores: 
and(=(Score'number'head,10), 
and( 
or (and (: (lnherm~t~ess'head,yes), = (Score'gender'head,50)), 
~u~d(= (lnherel~tness'head,no), (Score'gender'head, 10))), 
=(Score'ntmlher'mod,0))) 
The following steps to be performed by CSs are 
related to the addition of all those scores associ- 
ated to a given value in the successive rules build- 
ing the nominal prbjection and the percolation of 
morphosyntaetic features: 
or( 
and(or(:((lender'head'mother,Oendcr'mod), 
--(Gendcr'mod,mase'fem)), 
and(num'add(MG\]gN 'SCOI,tF,'H I,\]A D, 
MGEN'SCORE'MOI), 
M(~ EN'SCORE' MOTI IP, I{), 
and(= (Gender'mod'head,(lender'mod'mother), 
hum'add (It (I EN'SCO l/,\]'Y 11EAD, 
HGEN'SOOII,E'MOD, 
IlGEN'SCORE'MOTI\[EII~)))), 
alld(=(Gender'nmd,Gender'mod'head), 
and (nu re'add (H GEN'SCO I{E'ftEA I), 
MCIEN'SCORE'MOD, 
HGEN'SOORE'MOTIIER), 
and(=(Gender'mod'head,Gendcr'mod'mothcr), 
num'add(MGEN'SCORE'HEAD, 
HGEN'SCOR, I~'MOD, 
MGEN'SCOI~E'MOTHER))))) 
The final evahlation performed by CSs is 
done when categories showing agreement over- 
pass their maximal projection, only if no other 
inter-syntagmatic agreement must be taken into 
account (as it is the case with subject-attribute 
agreement, for instance). Postponing in this 
way the final ewfluation ensures that the CS will 
take into account all the previous parameters to 
give an appropriate diagnosis about the complete 
XP containing the agreement violation. This 
evaluation is based on the comparison of scores 
by means of the 'greater than' predicate in or- 
der to determine (a) the correct wflue for the 
feature(s) checked corresponding to the highest 
score(s) (Right_Gender, Right_Number in the ex- 
ample below), to be used by the transfer module, 
and (b) the error diagnosis (gender, number and 
gender_number below), to be used by the error 
handling module that will display appropriate er- 
ror information to the user: 
178 
and( 
or( 
or (and (mun'gt (I 1 (I EN'S CO 1/.1,'" NO l \] N, M ('I,;NN (\]() It.I,;' N O U N), 
=:((-lender' N ram, ltlght' (hulder)), 
and (nuln' gl,(M C I'\]N'H(X) 11 E" N O U N,I 1 (I |,\]N'F,C\]O|I\]",' N O I l N), 
~ ((lender' Mod,night.' (lender))), 
: : (lI(ll'~N',q( \]O11.1<" N O 1 IN, M (I t(N'S (IOII.F,' N O U N)), 
or( 
or(and(hum'st (I\[ N UM'SCOItI'Y NOUN,MNUM'SCOI{E'NOUN), 
= (Number" Noull,lt.ight'N umlmr)), 
and(nmn'gt(MNUM'SC()ltl,YNOU N,IINU M'SCOI( t,Y NOUN), 
::(Number'Mod,ll.ight'Nundmr))), 
, (IINUM'HCOII.I'Y NOUN,MNUM'SCOIi.t'\]'NOU N))), 
if a}l e}elnents agree, scores for one of the argu- 
ments will a}ways be 0, whi}e if this argum(',nt has 
a value different than 0, this information is con- 
sidered as an evidence that an error has ()c(;urre(t, 
the subsequent (:omparison det(~rmining the value 
for {;tit winning score: 
or( 
and(--(MNUM'NCOII.E'MOTII ER,0), 
or(=(M£1\],;N "SCO I{F'M OTI{ 1,;1~.,0), 
and(nmn'gt (M(I ISN'SCORE" M O'\['I I I'31t.,0), 
= ( F.II.T Y P E ,gender) ) ) ), 
trod (mlm'gg(MN U M'SCOI{E'MOTI I Ell.,0), 
(>r (and (= (MCI EN'S(IO R.F' M O'I'1 \[ t'~I1,0), 
:= (EIgI'YP E,mnnber )), 
and (nunl'gt (M (' I,',N'SCOII.F" M O'1'11 F, I1,0), :-:( l,:l{' rvP ~,',g,,,m,,.' ,~,,~,,t,(,,.) ) ) ) ) 
3.2 A pattern-related technique to 
perfi)rm structural (}ITOr 
detection/diagnosis 
Tm'ning back to the. general (letinitions on error 
types given at; the beginning of this doculnent, 
st, ru(:tm'al violations can lie seen ~s special (:ases 
of feature mismatching t)roduced by addition, sub- 
stitution and omission of elements whi(:h result in 
a wrong dependency re}alien: 
Wrong head-argmnent relations 
(it Substitution of a 1)nund preposition by another 
one (PP ~-> PP) 
Los alumnos rclacionan la tareo, \[*a/con\] .su 
conoci'mie'nto. 
(ii) Omission of a bound preposition resu}ting in 
a change of the sub(:at,egorized arguInent (Pl) ~+ Ni,/s) 
Se acord6 \[*/dc/ que tenla una reunidn pot la 
manana. 
(iii) Addition of a p,'eposition resulting in a (:}tang(; 
of the subcategorized argmnent (NP/S ,-} l)P) 
Las emprcsa.s dcma'ndan \[*dc\] 'm~!todo,s. 
In the IIPSG-likc grammar used, bound prel)O- 
sitions at(, considered NI's attached t() the subcat 
list (ie. the subcategorization \]i:ature) of a t)re(l- 
ieative unit. These NPs have the feature pform 
instantiated to the value of the preposition, if atty. 
If the argmnent does not }lave a bound i)reposi- 
lion, the vahle for pform is none. Thus, the ap- 
proach adopted within GramCheck is that these 
err()r cases have a (;orrect rei)reselltation of the de- 
t)enden('y structure where the only offending in- 
fl)rmation is stored as a thature in the governed 
e.lement. 
Tit(', linguistic principle behind the pattern- 
related t(;chni(tue is based on the fact that na- 
tive writers substitute a l)reposition by another 
one when certain a,qsodations between 1)atterns, 
showing either the same }exi(:o-semantic and/or 
syntactic protmrtics , are performed. Thus, this 
kind of error is not. so a(:cidental as it could lm 
imagined. 
t, br instance, Spanish speakers/writers usually 
associate the argument structure of the comtmr- 
at(re adje(:tiv(~ it@riot (lower), which sul)catcgo- 
rizes the l)rei)osition a (to), with the Spanish (:om- 
parative syntactic pattern (inches ... que., less ... 
than) whose second term is introduced by the con- 
jun(:tion q*u', producing phrases such as *i'~@rior 
q'ae instead of i'nferior a. With the verb relacionar' 
(to relate), something similar occurs: t;his verb 
sulmatcgorizes for t,he preposition con; however, 
due. to the fact; that there exists tilt: prel)ositional 
multi-word units ~:n rclo, cidn a an(l c'n r('.lac.idn 
con, st)eakers tend to think that the same 1)ret)osi- 
I;ional alternation can be 1)crfornm(t with tin,. verl) 
(*rclacionar a vs. r'clacionwr con). 
Following this idea, configurational ruh:s are re- 
garded, R)r grammar dmcking, as desc:riptions of 
l)atl;erns~ each of them having associated a wrong 
pattern linked to the correct pattern. Both pat;-- 
terns are in a complementary distribution. This 
way, structural errors can be foreseen and con- 
trolled, and the systeln is provi(led with a mech- 
anism which establishes the way rule constraints 
lnust }to re}axed. 
To cope with this error, a CS operating on lists 
(:he(ks whether the prel)osition in 1;tlo (:onstitu(mt 
attached to the predicative sign belongs to the 
head of the list or to the tail. If the preposition 
is member of the tail, the salne actions showll fO\]' 
agreement errors are performed instandation of 
the (:orrect value and determination of the error 
type. 
4 Error coverage 
The current version of the GrmnCheck demonstra- 
tor is able to deal with the following types of er- 
I'OFH: 
• lntra- and inter-syntagmatie agreement er- 
rors (gender att(l/or number in act, lye with 
both predicative and (:opu}ative verbs and 
passive sentences). 
• Direct obje.cts: omission of tit(: preposition a 
with an animate entity and addition of such 
a preposition with a non-animate entity. 
• Addition, omission and sul)stitution of a 
bomM prepositi(m covering what is (:a}}ed 
179 
deqne£smo the addition of a false bound 
preposition de with clausal arguments and 
quegsmo the omission of the bound prepo- 
sition de with clausal arguments. 
• Errors Oil portmanteau words (use. de el, a el 
instead of del, a O. 
Regarding style issues, three different types of 
weaknesses are detected: structural weaknesses, 
lexical weaknesses and abusive use of passive, 
gerunds and rammer adverbs. While structural 
weaknesses are detected in tim phrase structure 
rules using CSs (noun + "a" + infinitive), by 
means of an error anticipation strategy, lexical 
weaknesses arc detected at the lexical level, with 
no st)octal mechanisms other than simple CSs. 
Lexieal errors currently detected are related with 
the use of Latin words which it is better to avoid, 
foreign words with Spanish deriwttion, cognitive 
erl'ors, foreign words for which a Spanish word is 
recommended and verbosity. 
5 Further developments 
i{,esults obtained with the cmTent demonst;rator 
are very promising. The performance of the sys- 
tem using CSs is similm' to that shown widlout 
them, hence its us(; in conjunction with the detec- 
tion techniques proposed, rather than a burden, 
may be seen as a means to add robustness to NLP 
systems. In fact, CSs may provide more natural 
solutions to grammar implemental;ion issues, like 
PP-attachmellt control. 
Several directions for further developments have 
a\]ready been defined. These include the integra- 
tion of these grammar checking techniques into 
the final release of the LS-GRAM Spanish gram- 
mar, which will have a more realistic coverage ill 
terms both of linguistic i)henomena and lexicon. 
Besides, on this new version of the grammar, hy- 
brid teehniques will be used, taking advantage of 
the preproco.ssing facilities included in ALEP. In 
particular, while for errors like those presented in 
this paper the approach adopted is linguistically 
motivated, for certain imnctuation errors (or sim- 
ply ill order to reduce lexi(:al arab*girl(y) other 
relatively simple iHeailS C~Lli be defined that ill- 
chide (:ertain extended pattern ma.tching on reg- 
ular expressions or the passing of linguist*(: in- 
formation gathered in a t)reproeessing phase to 
the unifh:ation-bas('.d parser. It; is also foreseen to 
inchlde a treatlnent for own*tire spelling errors, 
usually not dealt with by conventional st)elling 
checkers. 
References 
Adriaens G. :1994. The LRE SECC Project: Simpli- 
fied English Grammm: and Style Correct;ion ill an 
MT lhalnework, ill Pr'ocee.ding.s of Li'n.gv.istic Engi- 
nee'rill, 9 Convenl, ion, Paris. 
Bolioli A., 1, Din*, G. Mahmti. \]992. JDII: Pars- 
ing Italian with a Robust Constraint Grammar, 
ill Proce(:dings of tile iSth International Col@fence 
on Computational Linguistics (UOLING- 921:1003- 
1007. 
Carbonell J. and P. Hayes. 1983. l{ecow~ry strategies 
tbr parsing cxtr~gramlna.tical language, in A~ncr- 
*can Journal of Computational .Linguistics, 9(3~ 
4):123-146. 
Charniak E. 11983. A parser with soincthing flom ev- 
eryone, in M. King led.), Par'sing Nat'wral Language., 
Academic Press, London. 
Crouch R. 1994. A protot?lpc ALEP Conatr'aint Solver, 
SRI Into.rnational, Cambridge Research Centre. 
Puhnan, S. G. led.). 1991. EUROTRA ET6/I: l~ulc 
leer'realism and Virtual Machine Des*an Stu@, SRI 
Internationa.1, Cambridge Computer Sckmce lie- 
search Centre, @Commission of European COIIHllll- 
nities. 
lien(hiM D., J. @Ollr|;in t:/, at.. 1992. From Detcc- 
tion/Corre, ction to Coinpul;er Aided Writiltg, m 
Proceed'ing.s of I, hc lgth \[nter'natiorml CorffcreT,:e 
on Comtmtafional Linguist, ic:s (COL1NG-92):10\] 3- 
1018. 
Gendli~d D., ,I. @our(in, J. Mdnd!zo. 1994. Towards a 
more user-friendly correction, in Proceedings of the 
16th International Co~@'.r'cnce on Cornpv, tatiorzal 
Lingv, istic,s (COLING- 9/t) , Kyoto, Jat)a.n. 
Granger R. 1983. The NOMASI) system: ext)(;(:(;al;ion- 
based detection and correction of errors during un- 
derstanding of syntacl;icMly and smnant;ica.lly ii1- 
tbrmed text, Amc'rican Journal of Cornpntationo, l 
Lingv, istic,s, 9(3-4):188q96. 
Ha,yes P. and G. Momadiml. 1981. Flexilfle Pa.rs- 
ing, km, er'icwn .\]ournag of Cornp'utatior~,ag Linlq~t~s- 
tics, 7(4):232-244. 
Heidorn G. E., K..lensen, L. A. Miller, R. 3. Byrd 
and M. S. Chodorow. 1982. Developing a Natural 
Language Interface to Complex Data, ACM 'J}a'~zs. 
on Database; S~\]s., 3(21:105-147. 
Ilendrix G. G, E. 1). Sacerdoti, D. SagMowicz and ,\]. 
Slocum. 1977. Developing a Natural Language ln- 
l;crfaco to Comph;x l)al;~, ACM 7}'ans. on I)atabasc 
Sy.s., 9(2):\]05-147. 
,lensen K., G. E. Itcidorn, L. A. Miller, Y. Rabin. 1983. 
Parse fil;l;ing and prose tixiiJg: getting a hold on 
ill-tormedness, A'm, ericwn, .lo'u,r'nal of Comp'ld, atiorml 
Linguistics, 9(3-4):147-160. 
,\]ohllsOli 'F. :1985. Natu'ra{ LarLg'a, agc Compv, l, ing: The 
Commercial Applicatior~,s, ()wun, London. 
Kwasny S. and N. Sondheimm. \[98\]. Relaxation ted> 
niques for parsing gl'amm~t;ically ill-formed input in 
natural language understanding sysl;enls, Am,r'ican 
,lourv, al of Uom, pv, tational Linqv, istics, 7(2):99~ 108. 
Parkinson R. C, K. M. Colby and W. S. Freight. \] !t77. 
@onvers~tl;ional Langll&ge Comprehension Using \]\[n- 
tegrat('.d l~attern-Matching and Parsing ,ilI ATtij\[- 
cial Intelligence, 9:1 t 1- \] 34. 
11,uessink 1i. 1994. Manual to tll.c ll,(;l~, c:rtensio~,,s jbr" 
ALEI', STT/Oq'S, preliminary version. 
180 
Simpkins N. K. \]!)94. Art, Open Arehitcct'ur'e .for l,a.,,- 
9uage Engineering: The. Advanec.d Langu~ge En- 
gincer'in9 Platform (ALEP). Proceedings of lJ~\]2, 
Paris. 
Ttmrmair G. 1990. Parsing tLr granunar ~md style 
checking, in Proceedings of th, e l~th Ir~te'rna- 
tional Confercrl, cc o'n Computational Linguistics 
(COLING- 90):365-370. 
~lh'~mslator's Workbenelt. 1!)92. Final I{el)ort of the F,s- 
prit Project 23\]5. 
Veronis ,l. 1.988. Morphosynt~Letic correction in natural 
languages interStees, in lbocecdings of the 13th in- 
ternational Cor@rcnce on Computational Linguis- 
tics (COLING-88):708-713. 
Vosse, T. 1.992. Detecting and Correcting Morpho- 
syntactic Errors in Real Texts, Proceedi'ng.s of &c 
3rd Conference on Applied Natural l, an9uage Pro- 
ccssin 9 (14 CL- 92): \] 1 l-118. 
Weischedcl I1.. and .l. Black. 1980. Responding inl;clli- 
gently to unparscable inputs, in AmeT"ican ,Iournal 
of Uomputational Lin9'a,i,stics, 6(2):97-1119. 
W(,is(:hedel it. and N. Sondh('.im('.r. 1983. M('.taruh~s as 
a basis for i)roeessing ill-\[ormed input, it, Amc.'rican 
,lournal of Computational Ling'aisties, 9(3-4):16\]- 
177. 
Wcischedel R. and 1,. A. l{.~mtshaw. 1987. l{.etlections 
()It Knowledg(! ne('dc(\[ to 1)rot;ess ill-formed l~tlt- 
guage, in S. Nirenlmrg (ed.) Machine. Trart,slation. 
Theoretieal and Mctholodogical Issue,s, Caml)ridg(~, 
(\]aml)ridgo. University Press, pl ). 155-:{37. 
181 
