Towards Robust PATR 
Shona Douglas* and Robert Dale t 
Centre for Cognitive Science 
University of Edinburgh 
Edinburgh EH8 9LW, Scotland 
Abstract 
We report on the initial stages of development of a 
robust parsing system, to be used as part of The 
Editor's Assistant, a program that detects and cor- 
rects textual errors and infelicities in the area of 
syntax and style. Our mechanism extends the stan- 
dard PATR-n formalism by indexing the constraints 
on rules and abstracting away control of the appli- 
cation of these constraints. This allows independent 
specification of grouping and ordering of the con- 
straints, which can improve the efficiency of process- 
ing, and in conjunction with information specifying 
whether constraints are necessary or optional, allows 
detection of syntactic errors. 
Introduction 
The Editor's Assistant \[Dale 1989, 1990\] is a rule- 
based system which assists a copy editor in massag- 
ing a text to conform to a house style. The central 
idea is that publishers' style rules can be maintained 
as rules in a knowledge base, and a special inference 
engine that encodes strategies for examining text can 
be used to apply these rules. The program then op- 
erates by interactively detecting and, where possible, 
offering corrections for those aspects of a text which 
do not conform to the rules in the knowledge base. 
The expert-system-like architecture makes it easy to 
modify the system% behaviour by adding new rules 
or switching rule bases for specific purposes. 
Our previous work in this area has been oriented to- 
wards the checking of low-level details in text: for 
example, the format and punctuation of dates, num- 
bers and numerical values; the punctuation and use 
of abbreviations; and the typefaces and abbrevia- 
tions to be used for words from foreign languages. 
In this paper, we describe some recent work we have 
carried out in extending this mechanism to deal with 
syntactic errors; this has led us to a general mecha- 
nism for robust parsing which is applicable outside 
the context of our own work. 
° E-mail addeas is S. Douslaa@ed. ac. uk. 
tAIso of the Department of Artificial Intelligence 
at the University of Edinburgh; e-mail address is 
R,Dale©ed. ac.uk. 
Syntactic Errors 
Categories of Errors 
Ultimately, the aim of The EdilorJs Assistant is to 
deal with real language~unrestricted natural lan- 
guage text in all its richness, with all its idio~yn- 
cracies. The system is therefore an experiment in 
what we call intelligent text processing: an in- 
tersection of techniques from natural language pro- 
cossing and from more mundane text processing ap- 
plications, with the intelligence being derived from 
the addition of language sensitivity to the basic text 
processing mechanisms. 
Many of the corrections made routinely in the course 
of human proofreading require subleties of semantic 
and pragmatic expertise that are simply beyond cur- 
rent resources to emulate. However, examination of 
common syntactic errors and infelicities, both as de- 
scribed in the literature (see, for example, \[Miller 
1986\]) and es appearing in data we have analysed, 
has led us to distinguish a number of tractable er- 
ror types, and we have based the development of our 
system on the various requirements imposed by these 
classes. The error types are defined very much with 
processing requirements in mind; orthogonal cate- 
gorisations are of course possible. We give summary 
descriptions of these cla.~s here; examples are pro- 
vided in Figure 1. 
Constraint Violation Errors: 
These involve what, in most contemporary syn- 
tactic theories, are best viewed as the violation of 
constraints on feature values. All errors in agree- 
ment fall into this category. 
Lexlcal Confusion: These involve the confusion of 
one lexical item with another. We specifically in- 
clude in this category cases where a word contain- 
ing an apostrophe is confused with a similar word 
that does not, or vice versa. 
Syntactic Awkwardness: We include here cases 
where the problem is either stylistic or likely to 
cause processing problems for the reader. Note 
that these 'errors' are not syntactically incorrect, 
but are constructions which, if overused, may re- 
sult in poor writing, and as such are often in- 
cluded in style-checker 'hit-lists'; thus, we would 
include multiple embedding constructions, poten- 
ACT, S DE COLING-92, NANTES, 23-28 AOIJT 1992 4 6 8 PROC. OF COLING-92, NANTES, AUO. 23-28, 1992 
Constraint Violation Errors: 
(l) Subject-verb number disagreement: 
a. *John and Mary runs. 
b. *The dogs runs. 
(2) Premodifier-noun number disagreement: 
a. *This dogs runs. 
b. *All the dog run. 
(3) Subject-complement number disagreement: 
a. *There is live dogs here. 
h. *There are a dog. 
(4) Wrong pronoun case.: 
a. *llc and me ran to the dog. 
b, *This stays between you and I, 
(5) Wrong indelblitc article: 
a. *A apple and an rotten old pear. 
b. A NeXT workstation and *a NEC laptop. 1 
Le.xical Confosion: 
(6) Confusion of its and it's: 
a. *Its late. 
b. *Tim dog ate it's bone. 
(7) Confusion of that,, their, and they're: 
a. *Their is a dog here. 
b, *They're is a dog here. 
e. *Ttmrc dog was cohi. 
d. *They're dog was cold. 
e. *There }lere now. 
f. *Ttmir here now. 
(8) Confusion of p~ivc's and plural s: 
a. *The dog's are cold. 
b. *3'lie boy ate the dogs biscuit. 
Syntactic Awkwardness: 
(9) ~\[bo many prepositional phraees: 
a. Tile boy gave the dog in the window at the end 
with tile red collar with the address on the back 
of it a biscuit. 
(10) i)a~'~ive constructions: 
a. The boy wa.s seen by the dog. 
Missing or extra elements: 
(11) Unpaired delimiters: 
a. *The dog, wllich was in tile garden was quiet. 
(12) Missing delimiters: 
a. *The dog, I think was in the garden. 
b. *In the garden dog,s arc a menace. 
(13) Missing list ~parators: 
a. *There were two dog~ three cats and a canary. 
(ld) Double syntactic function: 
a. *it s~,cms to be is a dog. 
b. *l think know Fve been there before. 
Figure 1 : Example's of Syntactic Errors 
:tally anlbiguous syntactic structures, and garden 
path sentence~s in this category. These problems 
are detectable by simple counting or recognition 
of syntactic forms. 
Missing or Extra Elements: 
These arc c~mes where elements (either words or 
ptmctuation symbols) arc omitted or mistakelfly 
included in a text. An interesting sub-category 
here, which is surprisingly frequent, is the pres~ 
ence of two constituents which serve the same or 
a similar purpose; by analogy with double-word 
errors (where a word appears twice it, succession 
when only one occurrence was intended), we refer 
to these as cas~ of doul)le syntactic function. 
The crrors dealt with in this paper all fall into the 
first class, i.e, thorn that can be seen as breaking 
constraint.s on feature values. At the end of the paper 
we slake sonte observations on how the mechanism 
can bc. extended lx) the other classes. 
:Previous Work 
Of course, there exists a signiiicant body of work 
dealing with (xmlputational approaches to syntactic 
errors like those just discu~.scd. Broadly, work deal- 
ing with ungrammatical input falls into two cate- 
goric: approaches where the principal objective is 
to determine what meaning the speaker intended, 
and approach~ where the principal objective is to 
oonstruct ml appropriate correction. The first kind 
of approach is most appropriate in the development 
of natural language inLcrfaces, where syntactic dys- 
flueneies can often be ignored if tile ~mer~s intentions 
can be determined by means of other evidence, tIow- 
ever, these, approach~ (in the simplest cases, based 
on detecting content words) arc inappropriate where 
the sysLma must al.,~o propose a correction for the 
hypothcsised error. 
Of the different tedmiqucs that have been propo.~md 
under the second category, the most useful is that 
usually referred to as relaxation. Tiffs is a rather 
elegant method for extending a grammar's coverage 
to include ill-formed input, while retaining a princi- 
pled connection between thc constructions accepted 
by the more restrictive grammar and those accepted 
by the extended one. If a grammar exprt.~ses in- 
formaLion ill terms of constraints or conditions on 
features, a slightly leas re,~trietivc grammar can be 
constructed by relaxing some subset of these con- 
strainLs. Work commonly referred to in this corn 
text inelud~ Kwasny and Sondheimer \[1981\] and 
Weischedel and Black \[1980\], but very many systems 
u.~e .some kind of relaxation process, whether of syn- 
tactic or semantic constraints. The most well known 
is lnM'~q work on the Epistle and Critique systems 
\[tleidorn ct el. 1982; Jensen et at. 1983; Richardson 
and Braden-tIarder 1988\]. 
lln British English, NEC is spelled out, rather than 
being pronolmeed like the word neei4 thus, the correct 
form here is an NEC, 
Ae*rl~s DE COIANG-92, NAtCn.:S, 23-28 AOI\]T 1992 4 6 9 PROC. O1: COLING-92, N^I~rgs, AUG. 23-28, 1992 
Epistle parses text in a left-to-right, bottom-up fash- 
ion, using grammar rules written using an augmented 
phrase structure grammar (APSG). In APSG, each 
grammar rule looks like a conventional context-free 
phrase structure rule, but may have arbitrary tests 
and actions specified on both sides of the rule. So, 
for example, we might have a rule like the follow- 
ing: 
(15) NO VP (NUMB.AGREE.NUMB(NP)) -~ 
VP(SUBJECT = NP) 
This rule states that a noun phrase followed by a verb 
phrase together form a VP, 2 provided the number of 
the N\[ and the original VP agree. The resulting VP 
structure then has the original NP as the value of its 
SUBJECT attribute. 
Using rules like the~e, the system attempts to parse a 
sentence as if it were completely grammatical. Then, 
if no parse is found, the system relaxes some condi- 
tions on the rules and tries again; if a parse is now 
obtained, the system can hypothesise the nature of 
the problem on the basis of the particular condition 
that was relaxed. Thus, if the above rule was used 
in analysing the sentence Either of the models are 
acceptable, no parse would be obtained, since the 
number of the NP Either of the models is singular 
whereas the number of the VP are acceptable is plu- 
ral. However, if the number agreement constraint 
is relaxed, a parse will be obtained; the system can 
then suggest that the source of the ungran\]matical- 
ity is the lack of number agreement between subject 
and verb. 
One thing that must be borne in mind when con- 
sidering the merits and demerits of relaxation meth- 
ods is that they depend crucially on how much of 
the particular grammar2s information is expressed 
as constraints on feature values. Where the basic 
form of a grammar is, say, complex phrase struc- 
ture rules, the use of features may be confined to 
checking of number and person agreement. If, on 
the other hand, more of the informative content of 
the grammar is represented as constraints, as in re- 
cently popular unification-based grammars \[Sheibcr 
1986\], relaxation can be used to transform grammars 
to less closely related ones. 
In the remainder of this paper, we show how a 
unification-based formalism, PATR-II, may be ex- 
tended by a declarative specification of relaxations 
so that it can be used flexibly for detecting syntactic 
errors. Under one view, what we are doing here is 
rationally reconstructing the Epistle system within a 
unification-based framework. A useful consequence 
of this exercise is that the adoption of a declarative 
approach to the specification of relaxations makes it 
much easier to explore different processing regimes 
for handling syntactic errors. 
~This second, higher-level VP plays the role of what we 
would normally think of as an S node. 
Y,O Xl X2 
(XO cat) = VP 
(Xl cat) = NP 
(X2 cat) = VP 
(X0subject) = Xl 
(Xl hum) = (X2 hum) 
Figure 2: PATti version of the Epistle rule 
Making PATR Robust 
The Basic Mechanism 
In this section, we describe an experimental system, 
written in Prolog, that is designed to support the 
mechanisms necessary to apply PATR-type rules to 
solve constraints selectively. The major components 
of the system are (a) the parsing mechanism; (b) the 
underlying P^TR system; and (c) the rule application 
mechanism that mediates between these two. 
The parser encodes the chosen strategy for applying 
particular grammar rules in a particular order. At 
this stage, the parser is not a crucial component of 
the system; all we require is that it apply rules in 
a bottom-up fashion. Accordingly, we use a simple 
shift-reduce mechanism. The parser will be the focus 
for many of the proposed extensions discussed later; 
in particular, we are in the process of implementing 
a chart-based mechanism to allow handling of errors 
resulting from missing or extra elements. 
The basic PATR system provides a unification based 
mechanism for solving sets of constraints on feature 
structures. A PATR rule corresponding to the gram- 
mar rule discussed in the context of Epistle above is 
shown in Figure 2. 
It is fairly obvious that, given some mechanism that 
allows us to remove the final constraint in this rule, 
we can emulate the behaviour of the Epistle system. 
In our model, the rule application mechanism pro- 
vides the interface between the parsing mechanism, 
which accesses the lexicon and decides the order in 
which to try rules, and the PATR. system. To see how 
this works, we will consider a slightly morn complex 
rule, shown in Figure 3; the use of the numbers on 
the constraints will be explained below. 
Given this rule, a constituent of category NP will be 
found given two lexical items which axe respectively 
a determiner and a noun, provided all the constraints 
numbered 1 through 6 are found to hold. Note the 
constraint numbered 4: we suppose that the features 
addressed by (X1 agr precedes) and (X2 agr begins) 
may have the values vowel and consonant. This al- 
lows us to specify the appropriate restrictions on the 
use of the two forms a and an. 3 
aOf course, the imp\]ication here that a is used be- 
fore words beginning with a vowel and an is used before 
words beginning with a consonant is an oversimplification. 
There are aiso, of course, other means by which this con- 
ACIT.S DE COL1NG-92, NANTES, 23-28 AOIYI' 1992 4 7 0 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
X0 Xl X2 
1 (X0 cat) 
2 (Xl cat) 
3 (X2 cat) 
4 (Xl agr precedes) 
5 (Xl agr num) 
6 (X0 agr num) 
:~ NP 
-- Det 
-- N 
(X2 agr begins) 
-- (X2 agr hum) 
= (X2 agr hum) 
Figure 3: Simple NP tale in the PATR formalism 
Relaxing Constraints 
Given the rule in Figure 3, and a standard parsing 
mechanism, there will be no problem in parsing cor- 
rect NPS like these dogs. tlowever, consider our target 
errors in (16a-e): 
(16) a. *this dogs 
b. *an dog 
c. *an dogs 
Exmnple (16a) exhibits premodifier noun number 
disgrecment; (16b) exhibits use of the wrong indefi- 
nite article; and (16c) cxmtains both of these errors. 
If the parser is to make any sense of thcse strings, we 
must introduce a more elaborate control structure. 
Premodifier-noun number agreement is enforced by 
constraint 5; constraint 4 enforces the use of the 
proper indefinite article. We need to be able to relax 
constraint 5 to parse (16a), and to relax constraint 4 
to parse (16t)); to parse (16c), we want to relax both 
(xmstralnts 5 arid 4 at once. 
"\[b deal with this, we make use of the notion of 
a relaxation level Instead of applying all con 
strafers associated with a rule, we specify for evcry 
rule, at any given relaxation level, those constraints 
that are necessary and those that are optional. 
At relaxation level 0, which is equivalent to thc bo- 
haviour of the standard PATR system, all constraints 
are deemed nece.~ary. At relaxation level 1, how- 
ever, constraints 4 and 5 are optional. Optional con- 
straints, if violated, need not result in a failed parse, 
but do correspond to particular errors. 
The algorithm in Figure 4 applies all constraints ap- 
propriately, given a specification as just dcscribed. 
Here, N is the set of nccessary constraints and O 
is the set of optional constraints, both for a given 
relaxation level L; R is the set of constraints which 
have to be relaxed in order for the rule to he used. 
R will always be a subset of O, of course; we r(.~ 
turn the actual vahm of 1~ as a result of parsing 
with the rule. The outer conditional ensures that 
all the necessary constraints are satisfied. The inner 
conditional takes appropriate action for each relax- 
able constraint whether or not it is satisfied: if the 
straint could be d~eckcd; however, we include it here as 
a constraint on the application of the rule for expository 
purp~c~. 
When applying rule r at relaxation level L: 
N ~- necessary constraints on r at L 
O *-- optional constraints on r at L 
;~ ~ {} 
if all n (: N can be solved 
then incorporate any instantiations required 
for eadl oi r50 do 
if o, can be solved 
then incorporate instantiations 
else R ¢-- O ~ O o~ 
endif 
next 
else return failure 
endif 
return C 
Figure 4: The relaxation algorithm, version 1 
l~.claxation level 0: 
necessary constraints = {1,2,3,4,5,6} 
optional constraints = { } 
Relaxation level t: 
nec~sary coustraints -- {1,2,3,6} 
optional constraints = {5,4} 
Figure 5: The relaxation specification for the NP rule, 
version l: optional constraints 
Relaxation level 1: 
necessary constraints: {1,2,3} 
relaxation packages: 
(a) {5, 6}: Premodifier-noun munber disagreement (b) {4}: 
~/~error 
Figure 6: The relaxation specification for the NP rule, 
version 2: grouped constraints 
constraint is satisfied, it has exactly thc same effect 
as a necc~ary constraint; if not, the constraint is 
recorded as having been relaxed. 
Once paining is complete, the information in R can 
then be used to generate an appropriate error mes- 
sage. 
The operation of this algorithm is supported by ex- 
plicitly indexing each constraint within a tale, as 
in Figure 3, and absl.racting out the specification of 
whieh vonstraint.s may be relaxed at a given relay 
ation lewfl. The constraint application specification 
for the NP rule is given ill Figure 5. 
Grouping Constraints 
This is not the whole story, however. Consider the 
NP this dogs, which would be correctly parsed at re- 
Acids DE COLING-92, NAN'Ir~.s, 23-28 AO~" 1992 4 7 1 PROC. OF COLlNG-92, N^tCrEs, AUG. 23-28. 1992 
laxation level 1 as exhibiting premodifier-noun num- 
ber disagreement under the system described so far. 
The instantiation of X0 resulting from this rule ap- 
plication would be as follows: 
°° \]\] x0: I:;:: 0,u 
Note in particular that (X0 agr num) has the value 
plu. This results from the solution of constraint 6, 
which is one of the necessary constraints at relax- 
ation level 1 as specified in Figure 5. This 'feature 
transport' constraint propagates the number of the 
tread noun to tile superordinate noun phrase. It is 
not appropriate to perform such a propagation under 
the current, cirolmstances, however, because once a 
case of prcmodifier-noun number disagreement has 
been identified, we cannot tell whether it is the num- 
ber of the noun or the number of the determiner that 
is in error. One might argue that one of the two 
is more likely than the other, but such a heuristic 
belongs in the mechanism that offers replacements 
rather than in the relaxation mechanism itself. If 
the number of the noun is always propagated to the 
noun phrase, spurious error reports may emerge in 
subsequent parsing: for example, in the text Th/s 
doys runs, a subject-verb number disagreement will 
be flagged in addition to the premodifier-noun num- 
ber disagreement error. This will be at best mislead- 
ing. 
We would like to be able to express the intuition 
that it is not really meaningful to apply constraint 5 
if constraint 5 has failed; these constraints should be 
grouped together, to be applied together or not at 
all. So we introduce an addition to the specification 
for relaxation level 1, shown in Figure 6. 
We refer to a group of constraints to be relaxed to- 
gether or not at all, plus the error message that cor- 
responds to the failure of the group of constraints, as 
a relaxation package. The algorithm of Figure 4 
has been adapted to apply such relaxation packages, 
resulting in the algorithm in Figure 7. Here, R is 
the set of relaxation packages required in order to 
complete the parse. 
Note that if all the constraints in a relaxation pack- 
age can be applied successfully, they have exactly the 
same effect as necessary ones, in terms of contribut- 
ing to the building of structure. Thus, if the number 
agremnent condition constraint 5 is satisfied, as in 
the case of the text an dogs, then the associated fea- 
ture percolation constraint, 6, will add the feature 
(agr nmn) to XO, with value (X2 agr hum). 
Ordering Constraints 
In the previous section, we altered the mechanism 
to allow for the fact that it is not meaningful to 
apply some coustraints if others have failed; in the 
worst case, this avoided confusing error diagnoses. 
Even if no such confusion would result, however, con- 
When applying rule r at relaxation level L: 
N *- necessary constraints on r at L 
O 4- relaxation packages for r at L 
R~- {} 
if all n • N can be solved 
then incorporate any instantiations 
for each relaxation package P~ E O do 
if all constraints c4 • Pi can be solved 
then incorporate any instantiations 
else R *-- R + P~: 
endif 
next 
else return failure 
endif 
return R 
Figure 7: The relaxation algorithm, version 2 
siderable efficiency gains can be made by ordering 
constraints in such a way as to minimise unneces- 
sary structure building. A similar point is made by 
Uszkoriet \[1991\], who talks of the need for a flex* 
ible control strategy for efficient unification based 
parsers, to ensure that the conditions that are most 
likely to fall are tried first. 
Ideally, the ordering of constraints would be derived 
automatically from other information; but it is un- 
clear how this would be done. Currently, we make 
use of one central ordering principle: 
(18) Category constraints on RnS items come first. 
In the bottom-up parsing system we use, all RrlS 
items will be instantiated with feature structures cor- 
responding to lexical entries, or to syntactic cate- 
gories built up by rule from lexical entries; it is a 
discipline on our lexicon and our structure build- 
ing rules that all such feature structures will have 
a cat feature. This means that a query about the cat 
value will involve no structure building. However, 
if, before checking the category, we were to enquire 
about the (agr num) feature, we might involve our- 
selves in some unnecessary structure building, be- 
cause if applied to a feature structure that does not 
have an (agr num) feature, what was thought of as a 
conditional constraint will in fact result in structure 
building. For example, the constraint in (19) ap- 
plied against the structure in (20) will result in the 
structure shown in (21); this is clearly not desirable. 
(19) (Xl agr num)= plu 
conjunction\] (20) X1 = \[l':xt:: and J 
cat: conjunction\] lex: and / 
AcrEs DE COLING-92, NA~fI~S, 23-28 AOt~r 1992 4 7 2 PROC, OF COLING-92, NANrES, AUG. 23-28, 1992 
Relaxation level 0: 
necessary c~mstraints: {2,3,5,4,1,6} 
relaxation packages: {} 
Relaxation level 1 : 
necessary constraints: {2,3,1 
relaxation packages: 
(a) {5, 6}: Premodifier-noun number disagreement (b) {4}: a/a,, 
error 
Figure 8: The relaxation specification for the Nr rule, 
version 3: constraint ordering 
These considerations give ri~ to the ordering of con- 
straints given in Figure 8; we assume that when the 
algorithm in Figure 7 tests whether all members of 
a constraint set can be solved, the constraints are 
solved in the order given in the specification, and the 
test halts as soon o.s any member of the constraint 
set cannot be solve& 
Discussion 
Wc have argued that combining the relaxation tech- 
nique for syntactic error correction with a grammar 
(such a.s is found in recent unification formalisms) 
that expresses most of its information in the form of 
constraints provides a good starting point for a llexi- 
ble mechanism for detecting and correcting syntactic 
errors. Our work in this area so far raises a number 
of interesting (lUt.~tions which need to be pursued 
lurther. 
Dependencies betwcen Constraints: As 
we have seen, the ordering of constraints in the 
relaxation specifications is very important. How- 
ever, the particular role a specific constraint per- 
h)rms will of course depend on the particular pars- 
ing strategy being used. Ideally, we would like to 
generate the ordering information antx)matically, 
although it is not entirely clear how this might 
be (lone. One source of some ordering constraints 
might come from using typed feature structures 
in tile lexicon, so that the rule application mech- 
anism can deterniine abead of time what the pri 
mary source of information is. Another approach 
might lie to require the grammar writer to spec- 
ify the c, onstraints on rules as belonging to specific 
categories, and then to allow the rule application 
mechanisni to impose a predefined ordering be- 
tween categories; in particular, the most trouble- 
~)mc constraints are those which transport feature 
values around a structure, since ttmy may trans- 
port the wrong values, ms we saw in the example 
discussed earlier. 
Generation of Replacement Text: A topic we 
have not addressed in the pr~ent paper is the gen- 
eration of corrections for hypothesiscd errors. The 
result of parsing using relaxatkm provides sufli- 
cleat information to generate such replacements, 
but once again we need to maintain infurmatiou 
about the dependencies between elements of a 
structure so that, when a new structure is created, 
any conflicts that ari~ can be re=solved: for exam- 
ple, if generating a correction involves changing 
tile num feature of a noun from plural to singw 
lar, we need to encode the information that the 
lex feature is dependent upon the hum feature and 
some specification of the root form, so that the re- 
tfiaeement mechanism knows which features take 
priority and which may be overridden. 
Deciding between Error lt:ypotheses: When a 
constraint unifying two incompatible values vl mid 
v~ has to be relaxed, then in tile absence of further 
infi)rmation there are two equally likely error hy- 
pothers: one, that vl is the correct value and t~a 
is wrong, mid the other that v~ is correct and Vl 
is wrong. However, there are two typ~ of situa- 
tion in which further information available dm5ng 
parsing may ratable one hypothesis to be preferred. 
The first is where the absolute likelihood of one 
error seems greater thau that of the other, l, br ex- 
ample, in the case of the noun phro.¢¢c these dog it 
might prove to be much more likely for a writer to 
mistakenly omit the single letter s than to choose 
the wrong determiner, which involvee a change of 
two letters there may be quantifiable difference 
between the assumptious behind the two hypothe- 
ses. The second is where a number of possible er- 
rors are linked, for example if the whole sentence 
w~m "llJese dog are fie~vze, llere, two possible errors 
involving different rules are interdependent, mid 
once again it is possible to argue that one error 
hypothesis requir~ a quantiliably dilferent set of 
~umptions; here, both these and ant would have 
to be wrong if dog were to be a.~sumed COrrect,. 
"lb a certain extent, it may be possible to rely on 
unilication to deal with these confliet.~. The relax- 
ation package dealing with the noun phr~ num- 
ber disagreement might 'hold its fire'-not signal 
an error immediately- -leaving the number feature 
of the noun phrase uninstantiatcd. Then there will 
be no clash with the number of the verb phrmse, 
which will be propagated down to the noun phrase. 
It may be pix~sible to hook this value up to the sub 
sequent t)rocc~ing el the error suggestion from the 
nolin phrase rule. 
Alternatively, the idea that there are a number of 
a~sumptious behind a given error hypothc'sis could 
be formalised, perhaps by 1L~ing an A'rMS \[de Kleer 
1986a, 1986b\] to keep track of inconsistencies. Ily- 
potimses could be weighted both by their absolute 
likelihood and the contextual evidence (i.e., the 
number mid weight of related errors eonsis£ent mid 
inco~mistent with the hypotheses). 
Much depends on where during the parsing pro~ 
eess errors arise and are notified, and so detailed 
consideration of this issue h~.u been deferred until 
our eht~rt parser extension to this system has been 
explored. 
Acrl!s DE COLING-92. NANTES. 23-28 hO(rr 1992 4 7 3 P~oc. o~ COLING-92. NArcri!s. Aeo. 23 k28. 1992 
Levels of Relaxation: The examples we have pro- 
vided have only explicitly mentioned one level 
of relaxation, One can imagine situations where 
other, further levels of relaxation are available. 
In particular, note that, since categorial informa- 
tion can be specified by means of constraints, we 
can also consider handling instances of words mis- 
spelled as words of other syntactic categories by 
means of the same mechanism; relaxing category 
feature constraints might be an appropriate can- 
didate for a further level of relaxation. There is 
of course the question of how one decides what re- 
laxations should be available at what levels; deter- 
mining this requires more detailed statistical anal- 
ysis of the frequencies of different kinds of errors. 
It is also likely to bc required that individual error 
rules, spread across a number of grammar rules, be 
capable of being treated as a unit, that is, switched 
on or off together, orthogonal to the idea of relax- 
ation levels. 
Different Kinds of Relaxation: In the forego- 
ing, we a~qumed that relaxing a constraint sim- 
ply meant removing it. There are other notions of 
constraint relaxation that could be used, of course; 
for example, if a constraint assigns a value to some 
feature, we could relax this constraint by assigning 
a less specific value to that feature. There may be 
other cases where we would want to generalise the 
notion of relaxation to include the possibility that 
a constraint could be replaced by a quite different 
constraint. 
Conclusions and Future Work 
We have described a simple extension to the PATR- 
n fornndism which allows us to provide declarative 
specifications of possible relaxations on rules. This 
provides a good starting point for a flexible mech- 
anism for detecting and c~rrecting syntactic errors. 
One rea.~on for this is that relaxation provides a pre- 
cise and systematic way of specifying the relation- 
ship t)etweeu errorflfl and 'CorrecU forms, making it 
easier to generate suggestions for corrections. A sec- 
ond reason is that the very uniform representation 
of linguistic information will allow flexible strategies 
for relaxation to be applied; this is particularly im- 
portant when dealing with text that may contain 
unpredictable errors. 
As we have shown, the mechanism described here can 
be applied straightforwarly to Constraint Violation 
Errors as described at the beginning of the paper. 
At the moment wc have a rather ad hoe mechanism 
that deals with cases of Lexical Confusion by pro- 
viding alternative lexical entries in the case of parse 
failure, but this needs to be integrated better with 
the relaxation mechanism. Cases of Stylistic Awk- 
wardness simply require the addition of a critic that 
walks over the structures produced by the parser. 
The mQor focus of our current work is the replace- 
ment of the shift-reduce parser by a chart parser, to 
enable us to handle cases of Missing or Extra Ele- 
ments. 
Acknowledgements 
This work was carried out as part of lED Project 
1679, The EditorJ8 Assistan~ Douglas is supported 
by SERC grant GRF 35654. Much of our thinking on 
this topic was inspired by conversations with Pablo 
Romero-Mares, who constructed an early version of 
the parser as an MSe project. 
AcrE.s DE COLING-92, NANTES, 23-28 AOUT 1992 4 7 4 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 

References 

tt Dale \[1989\] Computer-based Editorial Aids. Pages 
12-20 in Recent Developments and Applications of 
Natural Language Understanding, edited by Jeremy 
Peckham. Kogan Page, London. 

It Dale \[1990\] A Rule-based approach to Computer- 
Assisted Copy Editing. Computer Assisted Language 
Learning, 2, 59-67. 

G E Heidorn, K Jensen, L A Miller, R J Byrd, and 
M S Chodorow \[1982\]\] The Epistle text-critiquing 
system. IBM Systems Journal, 21, 305-326. 

J de Kleer {1986a\] An Assumption-based Truth 
Maintenance System. Artificial Intelligence, 28, 
127-162. 

J de Kleer \[1986b\] Extending the TMS. Artificial 
Intelligence, 28, 163-196. 

S C Kwasny and N K Sondheimer \[1981\] Relaxation 
Theories for Parsing Ill-Formed Input. American 
Journal of Computational Linguistics, 7, 99-108. 

K Jensen, G E Heidorn, L A Miller, and Y Ravin 
\[1983\] Parse fitting and prose fixing: getting a hold 
on ill-formedness. American Journal of Computa- 
tional Linguistics, 9, 147-160. 

L A Miller \[1986\] Computers for Composition: A 
Stage Model Approach to Helping. Visible Lan- 
guage, XX(2), 188 218. 

S D Richardson and L C Braden-Harder \[1988\] The 
ILxperience of Developing a Large-Scale Natural Lan- 
guage Text Processing System: CRITIQUE, In Pro- 
ceedings of the 2nd Applied Natural Lan ouage proH 
cessing Conference, pp195-202. 

S M Shiebcr I1986\] An Introduction to Unification- 
based Approaches to Grammar. The University of 
Chicago Press, Chicago, Illinois. 

H Uszkoreit \[1990\] Strategies for Adding Control In- 
formation to Declarative Grammars. In Proceedings 
of the 29th Annual Meeting of the Association for 
Computational Linguistics, pp237-245, 

R M Weischedcl and J E Black \[1980\] Responding In- 
telligently to Unparsable Inputs. American Journal 
of Computational Linguistics, 6, 87-109. 
