A Linguistic Theory of Robustness * 
Sebastia.n Goeser 
lnstitut fClr maschinelle Spra.chverarbeitung 
mns~;_~rus.unl-s tut tga.r t .dbp.de 
IBM Deutschlalad GmbH- GADL 
Hans-Klemm-Str. 45 D-7030 
"t - r~ " GSR at, SD\[ ' \,"i 'i 1 
1 Introduction 
Syntactical robustness is a desired design pro- 
perry of natural language parsers. Within the 
past decade, several developmental robustness 
approaches have been %rwarded: Syntax-free se- 
mantic passing \[1\] co,~lstraint relaxation after 
parse failure in a pattern matching \[2\] or ATN 
framework \[:3,4\], parse tree fittiug \[5\] and several 
non-formalized case frame approaches (e.g. the 
parser series in \[6,7\]). Three approaches \[5,8,9\] 
account for special defectivities by extending 
grarnmatical coverage. This paper refo,:mulates 
the so-called weakness approach, first, published 
in \[I0\], which extends robustness to declarative 
parsing formalisms. 
There are serious shortcomings in robustness re- 
search, emerging fl'om the common view of ro- 
bustness as a parsing and not as a representa- 
tion problem. Typically, two distinct representa- 
tion levels for grammatical and non-grammatical 
language are assumed. The former is given by 
the basic fi'amework, the latter by relaxed pat- 
tern slots \[2\] or ATN arc re.sis \[3\], by "non- 
grammatical" meta-rules \[4\], by some construc- 
tion specific strategies \[6,7\] or by the schema me- 
chanism \[11\]. \Virile formalism syntax is somet;i- 
rues specified (e.g. \[4,10\]), ~ semantics of robust 
grmmnar formalisms, being m'.cessa.ry to define 
these two representation levels, has not been gi- 
ven vet. Without a well-defined formalism se- 
mantics, it is impossible to predict the behaviour 
of a (robust) grammar fragment when applied 
to non-grammatical language. Therefore, no ro- 
bustness methodology has been available until 
llOW, 
2 The WACSG approach 
\VACSG (Weak ACSG) is an experimental for- 
realism for defining robust grammars, ACSG 
(Annotated Constituent Structure Grammar) is 
a class of two-level grammar formalisms such as 
°The work reported has been supported by an LGF 
grant from the Land Baden-VC/h'ttemberg. For valuable 
comments on art earlier draft of this paper Imn indebted 
to Christian Rohrer and Tobias Goeser, 
M,'G \[11\], DCG \[12\] a.nd PAT\[L-II \[13\]. Nevert- 
heless, WACSG weakness concepts may also Iw. 
iml)hmlent.ed in monostratal formalisms as ~.g. 
tIPSG \[\]4\]. WAC.S<~ is dedicated to synlacti(;al 
robustness, and not. to lnorphosyntacbic (spel- 
ling correction), semau{k: or l,ragmat, ic robust- 
hess. '\]'his does not preclude scmaut.ics and/or 
pragmatics f'rom resolving robustness conflic s. 
For a \,VA().S(.~-grammar f!'agme,/t to be robust, 
its formalism's weakness is necessary l,nt, not 
sm'Iicient and its adequacy w.r.t, defecti.v¢~ !an- 
guage is necessary but not sufficient. Robustness 
theory is to show that defective lang~ ~)'? is cx- 
actl.q the language described by "weal¢" des< rip- 
lion methods. Any less metaphorical constrtlc- 
tion of the notion of weakness needs a conside- rable formal apparatus. 
3 The WACSG Formalism 
A WACSG grammar rule is a context Dee pro- 
duction annotated with an attribute-value- (av- 
) formula. The following two subsections deal 
with weakness relations for context free gram- 
mars and av-languages. Section 3.3, then, sp,~ci - 
ties the \\"ACSG formalism semanlics. 
3.1 Partial String Languages 
Below (1), three part, lid st, ring languages of a context- 
free grammar G =< Cal, I.e,~:, Pr,,q's(~ > are 
defined, where Cat and Lcx are sets of nonter- 
rninal and terminal symbols, respectively, Pr a 
set. of productions and >;set a set. of start sym- 
bols. Now let I'w a set of substrings of w and 
PP~ a set of power-substrings of w with any 
'w ~ E PP,~ resulting fl'om deletion of a.rbitrary 
subst,a'ings in w. If \[w\] > 0, then t~0 and P\]~) 
must not contain e. Z~ and ZZw are parti- 
tion flmetions in Pw and I)t~ respe(:tiwqy. More 
simply, ,~'\]'2T((/) equa.ls L(G) +. SUB((J) allows 
an undefined leftside and/or rightside snbstring 
and PAI~(G) even undefined infix substrings for 
every element from L(G). 
(1.) 
156 1 
SET(G) = e e e n 
L(G) n } 
PAR(G) = e I e n L(G) } 
Partial string languages have appealing formal 
properties: ¢(G) for ¢ E {SET, SUB, PAR} 
is context-free, contains ciff L(G) contains c 
and there is an order L(G) C SET(a) c 
SUB(G) c_ PAR(G). Nesting partial string lan- 
guages introduces a set ~(G) of languages such 
as e.g. SET(SUB(G)),SET(PAR(G)). We have I¢¢(G)l 
= 1, i.e. a lie languages with maximal 
ope.rator ¢ are weakly equivalent, though not. 
pairwise strongly equivalent. 
A recurs±re partial string grarnmar (RPSG) is 
obtained by indexing rights±de (nonterminal) 
symbols of a cfg G with indices SET, SUB, or 
PAR. The formalism- semantics for an RP~G is given by a derivation relation (cf.\[15\]) for non- 
indexed and SET-indexed nodes of a tree graph 
and by a generation function gen as displayed in 
2 for any other nodes. Let Q(G) the set of deri- 
vations for a given G, w E Q(G) a derivation 
and tw its tree graph. Let lw be a label function 
with l~o(O) 6 Ssetind a (possibly indexed) start 
symbol ~ The languages L(G) (derived language) 
and RPSL(G) (generated language) are defined 
in .3. L(G) and RPSL(G) are context-free and 
we have L(G) C RPSL(G), L(G) usually being 
much smaller than RPSL(G). 
(2) 
ge.n~ : t x (Catind U Lex) + --* {0, 1} 
('3) 
Let G be a RPSG. 
• nTSS(G) = c Lex+ \] 3w e 
: 1} 
3.2 Attribute-Value Languages 
The av-language c9 is a first order predicate lo- 
gic including l-dry function symbols and two 2- 
dry predicates "~" and "E"for equality and set 
membership, respectively. Soundness and com- 
pleteness of 0 without E have been proven in 
\[16\]. The predicate "E" introduces well-founded, 
distributive, recurs±re sets of attribute-value- 
structures, and is discussed in \[17\] . We as- 
sume the existence of a reduction algorithm 
RNF with RNF(A) E O, iff is A satisfiable 
and RNF(A) = 2_ otherwise (for any formula 
AE!O) 2 
1By notational convention, it is Catind C_ Cat x 
{SET, SUB,PAR} and by definition of RPSG, it. is 
Ssetind ~ ~a-tind, 
2RNF(A) is in disjunctive normal form, such that 
DNF(RNF(A)) : RNF(A) 
Robustness in the area of av-languages is the 
ability to cope with inconsistent (i.e. overspeei- 
fled) formulae. Two different methods for main- 
taining consistency will be considered, namely 
set weakening and default formulae. 
3.2.1 Set weakening 
In robustness theory, the purpose of av-sets 
is to weaken the flmction condition on dr- structures. Set weakening may be used e.g. tbr 
the transition from an inconsistent formula A = x(syn)(case) ~ nora A x(syn)(case)~ akk to 
a consistent (therefore non-equivalent) formula 
A x(syn)(case)=xl ~ nom Axa~ akk As1 E 
x(syn)(case) A x2 C x(syn)(case). This U'ansi- 
tion preserves case information, but not incon- 
sistency for the denotatmn J\[x~ . In general, set 
weakening is defined as follows: 
(4) 
Let A E cO a fonmfla in disjunctive nor-- 
mal form and t a non-constant tenn. Let. L ¢ A 
{Ai,j.k It occurs k-times in a literalAi.j} a set 
of indices. For any r E L~t , zr is a variable 
not occuring in A. The set weakening of A for a 
term t is 
For any A 6 cO and non-constant term t it has 
been shown (see \[17\]) that, if RNF(A) = A # ±, 
then also DNF(At) = RNF(A ~) # 1. There- 
fore, if A is satisfiable, then A t is also satisfia- 
ble. Since satisfiability of A does not follow from 
satisfiability of A ~ (see above), A t is weaker-of 
equivalent to A. However, the theoretically mo- 
tivated Aqnotation has not been integrated into 
WACSG formalism, since set weakening can be 
achieved by using the predicate "6". 
3.2.2 Dethults 
r£he classical subsurnption ~_ gives a partial or- 
dering within tile set of av-models. There are, ho- 
wever, no inconsistent models. Therefore, a par- 
tiality notion with inconsistency must be based 
upon descriptions i.e. av-formulae. The relation 
3-partial _C 0 ~ is a subsumption-isomorphisrn 
into a (canonical) subset of 0. The relation 0- 
partial defined below is still weaker in allowing 
inconsistency of one formula B and can be shown 
to be a superset of 3-partial, i.e. 3-partial C_ 0- 
partial. 
Let I 6 0 aconjunctionofliterals, and A,B 6 
CO . Then A 0-partial B iff: 
1. RNF(A A I) ¢ RNF(A) 
2. RNF(A A I) = RNF(B), if RNF(B) 7! 2 
DNF(A A I) = DN F(B) otherwise 
2 157 
a. leNF(A) ¢ 
Tile formula I C c) may be restricted to be a 
conjunction of default literals, whose predicate 
is marked with a. subscript a. This gives a de- 
fault relation, which is a subset of a superset 
of subsumption between formulae. A relation of 
default-satisfiability "l=a" may be based upon 
this default relation. It is easy to demonstrate 
that a default-relation like this has some desired disambiguation properties: a disjunctive formula 
A = A1 VA2 is reduced to RN.F(A1 A I) by con- 
joining it with a default formula I G 0 such 
that RNF(A2 A I) = I. 
4 WACSG formalism se- 
mantics 
For any WACSG-Grammar G, a domain D(G) 
and its subset SDDE(G) of strictly derivable 
domain elements is defined as follows. Any do- 
main element not in SDDE(G) bears weak- 
hess relations to a derivation co C Q(G) , 
where w(0)(1) G Ssctind .Anyformulaw(i)(2) 
(0 <~ i <_ leo) may be inconsistent. Now, a gram- 
mar G is called weak iff D(C;) -SDDE(G) ¢ O. 
(s) 
Let; C be a WACSG grammar, G 1~ the cf base of 
G and ~'~ the cf part of a derivation w E f~(G) 
• Let M be the set of av-mode!s. 
• D(G) = {< u,,ell >E Lea: + x M \[ 3a0 
9-Cc) 
1. u, C: IePSL~(G a:) 
e. M 
• $DDE((;) = {< w,M >~ Lex + x M \[ 
3wEg.(G) 
1. ¢,, = 
l)e\['ault-formulae and set. membershi I) formulae 
cannot be simulated by anything else in WACSG 
formalism. For every WACSG grammar G, ho- 
wever, there is an equivalent WACSG grammar 
G' without any partial string indices within it. 
This grammar G' shows an extreme complexity 
already for a few indices in G. This fact chal- 
lenges the view (see e.g. \[8\]) that robustness can 
be achieved by coverage extension of any non- 
weakeable ACSG. 
5 A WACSG-treatment of 
restarts 
in t;his section, the WACSG formalism is applied 
to restarts, a class of spoken language construe- 
tions, which is often referred to in robustness li- 
terature \[2,3,4\]. A grammatical explanation, ho- 
wever, is still lacking. The German restart data 
in 7 are given with transliteration and segmen- 
tation. Constructions in 7,8 are ungrammatical, 
but not inacceptable. 
(7) die \[ Umschaltung 
the \[ switching 
A Einstellung\] des Fonts 
A adjustment \]of the font 
/3' 7 
(8) Peter \[ versuchte dann A konnte\] kommen 
Peter \[ tried then ,4 could \] come 
From the viewpoint of robustness theory, a rest- 
art < a/3 A S~7, M > 6 D(G) should not be 
in SDDE(G) exactly if it is defect, where G is 
a realistic WACSG fragment of the language in 
question. Roughly, restarts are a kind of phra- 
sal coordination not allowing for deletion phe- 
nomena such as ellipsis, gapping or left deletion. 
Additionally, the ~-substring (i) does not contri- 
bute to (extensional) meaning a of the construc- 
tion and, (ii), may show recursive defectivenes- 
ses such as contanfination and constituent break 
(examples 9,10). 
(9) dab er \[dieses Meinung ~4 dieser Meinung\] ist 
that he \[ this-neuter opinion-fern A this-fern 
opinion-fern \] has 
(10) Peter ist \[ ins in das A dann Vater gewesen\] 
Peter is \[in-the in the A then father been \] 
5.1 NP-restarts 
The following WACSG rules 11-14 deal with 
openly coordinated NP restarts and are easily 
generalized to prepositional, adverbial or ad- 
jectival phrase restarts. Under the coordina- 
tion hypothesis, a parallelism between defective 
and non-defective restarts is assumed. Right- 
recursive coordination of defective and nondefec- tive conjuncts is unrestricted. In 11, equations si- 
mulating semantic and syntactic projections (see 
\[18\]) "control up" the syntactic but not, the se- 
mantic description of afl conjunct in a restart 
construction. 
In rules 13,14, partial string indices ,sef3r~ und 
PAH allow a defect conjunct to cover a prefix 
substring (if no phonological restart marker of 
category AC is present) or every substring (if 
there is a restart marker). 
aHowever, it does contribute t,o meaning in an inten- 
sional sense: ~3-substrings are not, absurd. 
158 3 
i~,ulc 1.1 applies set weakening to the syntactic 
av structures of both conjuncts, resulting in a 
well-l:nown coordina.tion treatment \[19\]. Default 
eqtmtions provide disaml)iguation to syntactic 
features \[~x:(syn)(case)~ and ~x:(syn)(gender)\]\] 
, since defectiviV may render the first conjunct 
ambiguous 4. Furthermore, rule 15 shows default 
weakening of the syntactic description of NP's. 
(11) NP --~ NPC NP 
Xl(syn) E x(syn) A 
x1(syn)(gender) ~d mas A 
Xl(syn)(case) ~a non: A 
~2(syn) ¢ a:(syn) A 
\[ ,<(:~yn)(koord)(sy4(defec) ~ + A 
• ~(sem) ~ x(st,n) 
v .~;:(~vu)(koord)(s~'n)(~te:~e) .~ - A 
:,:.2(se,,,) ~ .(s:,\]:)(:~rg8)\] 
(12) NPC --~ NP CO 
:~:(sy~:) ~ ,,;(sy,:) A 
.; ~ 4sy\],)(koord) "~ 
(13) NPC'su~s~ ~ Det 
;t'\] ~,-~, :r A 
x(syn)(koord)(syn)(defec) = ÷ 
Within a simulated projection theory, con- 
trolling down a verbal argument into 
a vcomp-embedded element of a.n av- 
set requires a complex regular term 
x( syn )+\[ (vcomp)(syn) +\]*, which is ex- 
pensive to compute. Therefore, rules 16,18 
introduce an additional term x(kosem), 
such that \[\[;c(kosem)\]\] is the semantic struc- 
ture of a set Ix\] of openly coordina- 
ted av.-structures. By default satisfiability 
of x(sem) ~d x(kosem), \[\[x(sem)~ equals 
~\[x(kosem)~ except if \[\[x~ is the av set of a 
non-restart coordination. 
Since defectivity, e.g. a constituent, break. 
may render incomplete the fl verbal phrase 
incomplete, rule 17 provides semantic de- 
fault values for every possible semantic ar- gument. 
l)istribut.ed av formulae may be necessary 
for one conjunct but inconsistent with (the 
description of) the other. This situation 
ma.y arise due to contamination of the 
first: (fl-) conjunct. Independently it can be 
shown that contaminations almost exclusi- vely affect syntactic (as opposed to seman- 
tic) t~atures. Now, if the conditions cohe- 
rence and completeness (see \[11\]) are deti- 
ned on semantic structure, syntactic cohe- 
rence can be inforced by lexicalized formu- 
lae as shown in 19 that depend on a syntac- 
tic defectivity feature ~x(syn)(defec)\] 
(14) A' IOCpAI.i --~, I)et ACe 
X 1 ~ ;l: A 
:~? 2 ~ x (syn) (koord) 
(15) NP---,Det N 
a:l(syn) ~4 m(syn) A 
2,: 2 ..~, A' 
Example C1 (al)pendJx) shows a complex NP- 
coordination of defective and no\]>defe(-live con-- juncl;s. The 
coi@Inct NP des 1)clef shows a con- 
t.a\]ninated case fea~i.ir{::, sillce des has genii.ire 
and Peter has \]~ominative., accusative or dative 
morphological case markil~g. Neverthdess, re- 
,,,ark that lO.2.1(syn)(casc)~l is disambiguat{,.d 
i.o nominative in the av-slructure in C1. 
5.2 VP-restarts 
Although VP-restarts follow the same lines as 
NP-restarts, open coordination of detective con- 
juncts imposes additional problems 5. 
'll"or any av-term t, It\] is the denotation of t (in |he 
modcq in question). 
"'A coordination construct.ion is calfed open ifl" there is 
a constituent whose av structure is distributed over the 
syntactic av-set assigned {.o this construction. 
(1{5) Vt' --~ VP1 \:P 
;~.1 C :r A x., 6 
x A 
\[ x:(syn)(koord)(syn)(defec) ,,~ + A 
..,(sere). ~ .~.(kos~,.) 
v .l(svn~(koord~(~v\],~(dere~ .~ - A .(kosem)" 
"'~- "" "" "~*:(s'vn)(koora)($en,)A 
:c 2 (se m) ~ x ( kosem)(argS)\] 
(17) VPl PAJ~ ---+ V AC 
xl(syn) ~ :,~(sy~ A 
x:(syn)(defec) ... + A 
x(sem)(y)(pre(l) ~,d unfflled A 
xi(sem.)(tense) --~e. pres A 
x2 ~ x(syn)(koord) ~; 
(18) VP---V AC 
;L' 1 ,-~. :c A 
x' 2 ~-,xA 
x(sen:) ~d x(kosem) 
(19) V --+ gef~llt 
X i ~ x A 
m(syn)(defec) ~ + 
,,~'ts;.n)(obi) ~ - A 
4s~'n)(vcomp) ~ - A 
;,: (sy,,)(a.c old,p) ~ -\] 
4 159 
The example C2 (appendix) involves a distribu- 
ted av-structure, whose description is inconsi- 
stent with respect to syntactic case subcatego- 
rization of fl's finite verb gefiillt. 
6 Conclusion 
The reformulation of robustness theory as a 
theory of weak grammars (and, consequently, of 
robust parsing as parsing of weak grammars) has 
enabled both the specification of working par- 
sets \[17\] and a substantial explanation of non- 
gl'ammatical langua;ge. Further study has to be 
done. Cross-linguistlc resear& on defective con- 
st.ructions (e,g. non-grammatical ellipses) and a 
default logic ma~oching methodological standards 
of A1 theory remain important desiderata,. Our 
prediction {hat there is no strong theory of de- 
fectiveness, however, invites for falsification. 
Appendix 
(Cl) 
dieser also des A des Peter m~d die Maria 
this therefore the-gen A the-gen Peter and t h<- 
Mary 
-\7 pf) 
N PC!-P7\ 17.o 1 ?C-Po ,~ 
D~t A(" .NPC~o o t ;\."t~.>.~ 
die~ ~ also der A des Peterunddi~Eria 
160 5 
syn 
SO 11-! 
I case liO1Tl f\]d gender ma~s 
0.1 koord syn efec 
seIn 1 
\[gender rnas \] 
gender fem 
und'(arg4,arg5) 
\[1\] arg4 pred peter' \] 
arg5 \[ pred maria' \] 
(c'2) 
den Peter \[geffillt A interessiert die Schule sehr\] 
the-akk Peter \[likes (with no akk argument) et 
is-interested-in t.he-akk school very \] 
I 
I 
S-SET0 
/~ ,#No v xe xP 
de g lit lntm smerth 
0.2.1 
0.2.2 
i i 
I 
syn 
kosem 
syn 
sc1TI 
kosem 
de fee 
koord 
obj 
subj 
obj2 
\[3!lefec 
obj2 
obj 
subj 
+ 
\[syn\[defec+\]\] 
0.1.1 \[ spec def . 
SelTl class nllrnan I t 
sern \[ pred unfilled \] 
syn ... 
sere \[ pred unfilled \] 
\[ pred interessieren'(arg3,arg2) \[3\] arg3 
\[a\] 
0.1.1 spec def \[ pred \]~etm" \] 
sem \[1\] class human 
syn \[ gender fern 
sem \[2\]\[ pred sclmle' \] \] 
6 161 

Bibliography 

DeJong, G.: An Overview of the FB.UMP System, in: 
Lehnert,W., l:l.ingle, M. (eds.): Strategies for Natural 
Language Processing, Lawrence Earlbaum Ass., 1982 

Ilayes, P.J. and Mouradian, G.: Flexible Parsing, in: 
Proc. ACL !8, Phih~delptfia 1980 

I4wasny, S.C. and and Sondheimer, N.K.: lt.elaxation 
"l~echniques for Parsing Grarnmatically lll-l'K)rmed In- 
put, in: AJCL 7,2, 1981 

XVeischedel, t~..M, and Sondheimer, N.IC: Metarules as a 
Basis for Processing lll-Porlned Input, in: AJCL 9, 3-4, 
10;7~3 

Jensea, I,: , lleidorn, G.E., Miller, L.A., Ravin, Y: 
Parse Fitting m~d Prose Fixing: Getting a tloht on I\[l- 
Formedness, in: AJCL 9, 3-4, 1983 

Carbonell, J.G. and Hayes, P.J.: Dynamic Strategy Se- 
lection in Flexible Parsing, in: Proc. ACL 19, Stanford 
lq81 

Carboneil, J.G. and !layes, P.J.: Coping with Extra- 
grammaticality, in: Proc. COLING-10, Stanford 1984 

Linebarger, M.C, Dahl, D.A., Ilirshman, L., Passoneau, 
1-ca.: Sentence Fragments' I:tegalm" Structures, Proc. 
ACL 25, 1987 

Ennrkanian, L, and Bouchard, L.tI.: Knowledge Interac- 
tion in a robust and efticient morpho-syntactic altalyzer 
for French, in: Proc. COLINO-12, Budapest 1988 

I(udo, I., Koshino, It., Moonkyung> C., Morimoto, T.: 
Schema \]Method: A Frmnework for Correcting Gram- 
tactically Ill-Fbrmed Input, Proc COLING-12, Buda- 
pest 1988 

Kaplan, R.M. and Bresnan,J.: Lexical-Ftmctional 
Grammar: A Formal System for Grammatical Repre- 
sentation, in: The Mental Representation of Gramma- 
r.teal Relations, Bresnan, J. (ed.), MIT Press, 1982 

Pereira, F.C.N. and Warren, D.H.D.: Definite Clause 
Grammars for Language Analysis, in: AI 13, 1980 

Shieber, S.M.: q'he Design of a Computer Language for 
Linguistic Information, in: Proc. COLING-10, Star> 
ford 1984 

Pollard,C., Sag, I.: Head Driven Phrase Structure 
Grammar, CSLI Lecture Notes, 1987 

Wedekind, J. A Concept of Derivation for LVG, in: 
Proe. COLING-11, Bonn 1986 

Johnson,M. E.: Attribute-Value-Logic and the q?heory 
of Grammar, Doctoral Diss., Stanford 1987 

Goeser, S.: Gine lingnistisdae Theorie der Robustheii. 
auf der Grundlage des Schwachheitsbegriffs, Doctoral 
Diss., Stuttgart 1990 

Halvorsen, P.K.> Kaplan, tL: Projections and Semantic 
Descriptions in LFG, Proc. of the Int. Conf. on IPifch 
Generation Computer Systems, Tokio 1988 

Kaplan, R.M. and Maxwell,J.-I'.:CoordinationinLt:(i, 
Proc COLING-12, I~ud~lpest 1988 
