Flexible Parsing 
Phil Hayes and George Mouradian 
Computer Science Department, Carnegie-Mellon University 
Pittsburgh, PA 15213, USA
Abstract
When people use natural language in natural settings, they often use it ungrammatically, missing out or repeating words, breaking off and restarting, speaking in fragments, etc. Their human listeners are usually able to cope with these deviations with little difficulty. If a computer system wishes to accept natural language input from its users on a routine basis, it must display a similar indifference. In this paper, we outline a set of parsing flexibilities that such a system should provide. We go on to describe FlexP, a bottom-up pattern-matching parser that we have designed and implemented to provide these flexibilities for restricted natural language input to a limited-domain computer system.
1. The Importance of Flexible Parsing 
When people use natural language in natural conversation, they often do not respect grammatical niceties. Instead of speaking sequences of grammatically well-formed and complete sentences, people often miss out or repeat words or phrases, break off what they are saying and rephrase or replace it, speak in fragments, or use otherwise incorrect grammar. The following example conversation involves a number of these grammatical deviations:
A: I want, can you send a memo a message to to Smith
B: Is that John or John Smith or Jim Smith
A: Jim 
Instead of being unable or refusing to parse such ungrammaticality, human listeners are generally unperturbed by it. Neither participant in the above example, for instance, would have any difficulty in following the conversation.
If computers are ever to converse naturally with humans, they must be able to parse their input as flexibly and robustly as humans do. While considerable advances have been made in recent years in applied natural language processing, few of the systems that have been constructed have paid sufficient attention to the kinds of deviation that will inevitably occur in their input if they are used in a natural environment. In many cases, if the user's input does not conform to the system's grammar, an indication of incomprehension followed by a request to rephrase may be the best he can expect. We believe that such inflexibility in parsing severely limits the practicality of natural language computer interfaces, and is a major reason why natural language has yet to find wide acceptance in such applications as database retrieval or interactive command languages.
In this paper, we report on a flexible parser, called FlexP, suitable for use with a restricted natural language interface to a limited-domain computer system. We describe first the kinds of grammatical deviations we are trying to deal with, then the basic design decisions for FlexP with justification for them based on the kinds of problem to be solved, and finally more details of our parsing system with worked examples of its operation. These examples, and most of the others in the paper, represent natural language input to an electronic mail system that we and others [1] are constructing as part of our research on user interfaces. This system employs FlexP to parse its input.
2. Types of Grammatical Deviation 
There are a number of distinct types of grammatical deviation, and not all types are found in all types of communication situation. In this section, we first define the restricted type of communication situation that we will be concerned with, that of a limited-domain computer system and its user communicating via a keyboard and display screen. We then present a taxonomy of grammatical deviations common in this context, and by implication a set of parsing flexibilities needed to deal with them.
2.1. Communication with a Limited-Domain System
In the remainder of this paper, we will focus on a restricted type of communication situation, that between a limited-domain system and its user, and on the parsing flexibilities needed by such a system to cope with the user's inevitable grammatical deviations. Examples of the type of system we have in mind are database retrieval systems, electronic mail systems, medical diagnosis systems, or any systems operating in a domain so restricted that they can completely understand any relevant input a user might provide. In short, exactly the kind of system that is normally used for work in applied natural language processing. There are several points to be made.
First, although such systems can be expected to parse and understand anything relevant to their domain, their users cannot be expected to confine themselves to relevant input. As Bobrow et al. [2] note, users often explain their underlying motivations or otherwise justify their requests in terms quite irrelevant to the domain of the system. The result is that such systems cannot expect to parse all their input, even with the use of flexible parsing techniques.
Secondly, a flexible parser is just part of the conversational component of such a system, and cannot solve all parsing problems by itself. For example, if a parser can extract two coherent fragments from an otherwise incomprehensible input, the decisions about what the system should do next must be made by another component of the system. A decision on whether to jump to a conclusion about what the user intended, to present him with a set of alternative interpretations, or to profess total confusion, can only be made with information about the history of the conversation, beliefs about the user's goals, and measures of plausibility for any given action by the user. See [7] for more discussion of this broader view of graceful interaction in man-machine communication. Suffice it to say that we assume a flexible parser is just one component of a larger system, and that any incomprehensions or ambiguities that it finds are passed on to another component of the system with access to higher-level information, putting it in a better position to decide what to do next.
Finally, we assume that, as usual for such systems, input is typed, rather than spoken as is normal in human conversations. This simplifies low-level processing tremendously because keystrokes, unlike speech waveforms, are unambiguous. On the other hand, problems like misspelling arise, and a flexible parser cannot assume that segmentation into words by spaces and carriage returns will always be correct. However, such input is still one side of a conversation, rather than a polished text in the manner of most written material. As such, it is likely to contain many of the same types of errors normally found in spoken conversations.
2.2. Misspelling 
Misspelling is perhaps the most common form of grammatical deviation in written language. Accordingly, it is the form of ungrammaticality that has been dealt with the most by language processing systems. PARRY [11], LIFER [8], and numerous other systems have tried to correct misspelt input from their users.
This research was sponsored by the Air Force Office of Scientific Research under ...
An ability to correct spelling implies the existence of a dictionary of correctly spelled words. An input word not found in the dictionary is assumed to be misspelt and is compared against each of the dictionary words. If a dictionary word comes close enough to the input word according to some criteria of lexical matching, it is used in place of the input word.
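A minimal sketch of this dictionary comparison, assuming Levenshtein edit distance as the "criteria of lexical matching" and a hypothetical closeness threshold `max_dist`; the text does not say which measure any particular system uses.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: each insertion, deletion or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or keep) ca
        prev = cur
    return prev[-1]


def correct(word: str, dictionary: set, max_dist: int = 2) -> list:
    """Return the closest dictionary words, or [] if none is within max_dist.

    A word found in the dictionary is assumed correct and returned as is."""
    if word in dictionary:
        return [word]
    scored = [(edit_distance(word, entry), entry) for entry in dictionary]
    best = min((d for d, _ in scored), default=max_dist + 1)
    if best > max_dist:
        return []
    # Several entries may tie; all are returned so a later stage can choose.
    return sorted(entry for d, entry in scored if d == best)


vocabulary = {"separate", "desperate", "message", "messages", "send"}
print(correct("seperate", vocabulary))  # ['separate']
print(correct("mesage", vocabulary))    # ['message']
```

Returning every tied candidate, rather than picking one, leaves room for the contextual filtering discussed next.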
Spelling correction may be attempted in or out of context. For instance, there is only one reasonable correction for "relavent" or for "seperate", but for an input like "till" some kind of context is typically necessary, as in "I'll see you till April" or "he was shot with the stolen till." In effect, context can be used to reduce the size of the dictionary to be searched for correct words. This both makes the search more efficient and reduces the possibility of multiple matches of the input against the dictionary. The LIFER [8] system uses the strong constraints typically provided by its semantic grammar in this way to reduce the range of possibilities for spelling correction.
A particularly troublesome kind of spelling error results in a valid word different from the one intended, as in "show me on of the messages". Clearly, such an error can only be corrected through comparison against a contextually determined vocabulary.
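The sketch below illustrates correction against a contextually restricted vocabulary. The table `EXPECTED_AFTER` is a hypothetical stand-in for the predictions a semantic grammar would make, and Python's standard `difflib.get_close_matches` is used as the similarity measure purely for convenience. Out of context, "on" is a valid word and is accepted; after "show me" it is corrected to "one".

```python
import difflib

# Hypothetical contextual predictions: the words a semantic grammar could
# accept after a given two-word prefix. Invented for illustration only.
EXPECTED_AFTER = {
    ("show", "me"): ["one", "all", "some", "the"],
}
FULL_VOCABULARY = ["show", "me", "on", "one", "of", "all", "some", "the",
                   "messages"]


def correct_in_context(prefix: list, word: str) -> str:
    """Correct `word` against the contextually expected vocabulary, if any.

    Without a prediction, fall back to the full vocabulary, where a real
    word such as "on" is simply accepted as correct."""
    candidates = EXPECTED_AFTER.get(tuple(prefix[-2:]), FULL_VOCABULARY)
    if word in candidates:
        return word
    close = difflib.get_close_matches(word, candidates, n=1, cutoff=0.6)
    return close[0] if close else word  # leave uncorrectable words alone


print(correct_in_context(["show", "me"], "on"))  # one
print(correct_in_context([], "on"))              # on
```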
2.3. Novel Words 
Even accomplished users of a language will sometimes encounter words they do not know. Such situations are a test of their language learning skills. If one didn't know the word "fawn", one could at least decide it was a colour from "a fawn coloured sweater". If one just knew the word as referring to a young deer, one might conclude that it was being used to mean the colour of a young deer. In general, beyond making direct inferences about the role of unknown words from their immediate context, vocabulary learning can require arbitrary amounts of real-world knowledge and inference, and this is certainly beyond the capabilities of present day artificial intelligence techniques (though see Carbonell [4] for work in this direction).
There is, however, a very common special subclass of novel words that is well within the capabilities of present day systems: unknown proper names. Given an appropriate context, either sentential or discourse, it is relatively straightforward to parse unknown words into the names of people, places, etc. Thus in "send copies to Moledeski Chiselov" it is reasonable to conclude from the local context that "Moledeski" is a first name, "Chiselov" is a surname, and together they identify a person (the intended recipient of the copies). Strategies like this were used in the POLITICS [5], FRUMP [6], and PARRY [11] systems.
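A minimal sketch of recognizing an unknown proper name from its local context. It assumes a hypothetical pattern "send copies to <Person>" and a hypothetical rule that a Person is one or two unknown words (an optional first name followed by a surname); real systems would use richer sentential and discourse context.

```python
from typing import Optional

# Hypothetical known vocabulary of a mail-system grammar.
VOCABULARY = {"send", "copies", "to", "all", "managers", "a", "message"}


def recognize_recipient(tokens: list) -> Optional[dict]:
    """Parse 'send copies to <Person>', treating unknown words as a name.

    Assumes, for illustration, that a Person is one or two unknown words:
    an optional first name followed by a surname."""
    if [t.lower() for t in tokens[:3]] != ["send", "copies", "to"]:
        return None
    name = tokens[3:]
    if not 1 <= len(name) <= 2:
        return None
    if any(t.lower() in VOCABULARY for t in name):
        return None  # known words: not an unknown proper name
    person = {"type": "Person", "surname": name[-1]}
    if len(name) == 2:
        person["first_name"] = name[0]
    return person


print(recognize_recipient("send copies to Moledeski Chiselov".split()))
```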
Since novel words are by definition not in the known vocabulary, how can a parsing system distinguish them from misspellings? In most cases, the novel words will not be close enough to known words to allow successful correction, as in the above example, but this is not always true; an unknown first name of "Al" could easily be corrected to "all". Conversely, it is not safe to assume that unknown words in contexts which allow proper names are really proper names, as in: "send copies to al managers". In this example, "al" probably should be corrected to "all". In order to resolve such cases, it may be necessary to check against a list of referents for proper names, if this is known, or otherwise to consider such factors as whether the initial letters of the words are capitalized.
As far as we know, no systems yet constructed have integrated their handling of misspelt words and unknown proper names to the degree outlined above. However, the COOP [9] system allows systematic access to a database containing proper names without the need for inclusion of the words in the system's parsing vocabulary.
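The factors listed above can be combined into a simple decision procedure. This is a hypothetical sketch: the order of the checks is an assumption, `known_names` stands for the list of referents mentioned in the text, and `difflib.get_close_matches` stands in for whatever spelling matcher is used.

```python
import difflib

VOCABULARY = ["send", "copies", "to", "all", "managers"]  # hypothetical


def classify_unknown(word, known_names=()):
    """Decide whether an unknown word is a proper name or a misspelling.

    Heuristic order (an assumption, following the factors named in the text):
    1. a word on the list of known referents is a name;
    2. a capitalized word is taken to be a name;
    3. otherwise, a close dictionary match is taken as a correction;
    4. anything else is treated as a new name."""
    if word in known_names:
        return ("name", word)
    if word[:1].isupper():
        return ("name", word)
    close = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.6)
    if close:
        return ("correction", close[0])
    return ("name", word)


print(classify_unknown("al"))   # ('correction', 'all')
print(classify_unknown("Al"))   # ('name', 'Al')
```

Capitalization is an unreliable signal in casually typed input, which is why the referent list is consulted first when one is available.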
2.4. Erroneous segmenting markers 
Written text is segmented into words by spaces and new lines, and into higher level units by commas, periods and other punctuation marks. Both classes, especially the second, may be omitted or inserted speciously. Spoken language is also segmented, but by the quite different markers of stress, intonation, and noise words and phrases; we will not consider those further here.
Incorrect segmentation at the lexical level results in two or more words being run together, as in "runtogether", or a single word being split up into two or more segments, as in "tog ether" or (inconveniently) "to get her", or combinations of these effects as in "runto geth er". In all cases, it seems natural to deal with such errors by extending the spelling correction mechanism to be able to recognize target words as initial segments of unknown words, and vice versa. As far as we know, no current systems deal with incorrect segmentation into words.
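A simplified sketch of segmentation repair using exact dictionary lookup only; the text proposes folding this into fuzzy spelling correction, which this sketch does not attempt, so mixed cases like "runto geth er" are beyond it. The dictionary is hypothetical. Joining is allowed only when some piece is unknown, so the "inconvenient" "to get her" is left alone.

```python
DICTIONARY = {"run", "together", "to", "get", "her", "show", "messages"}


def repair_segmentation(tokens, dictionary=DICTIONARY, max_join=3):
    """Join split words and split run-together words using exact lookup."""
    out, i = [], 0
    while i < len(tokens):
        # 1. Join up to max_join adjacent tokens into a dictionary word, but
        #    only if some piece is itself unknown; "to get her" is left alone.
        for n in range(max_join, 1, -1):
            piece = tokens[i:i + n]
            if (len(piece) == n and "".join(piece) in dictionary
                    and any(p not in dictionary for p in piece)):
                out.append("".join(piece))
                i += n
                break
        else:
            tok = tokens[i]
            # 2. Split an unknown token into two dictionary words.
            splits = [(tok[:k], tok[k:]) for k in range(1, len(tok))
                      if tok[:k] in dictionary and tok[k:] in dictionary]
            if tok not in dictionary and splits:
                out.extend(splits[0])
            else:
                out.append(tok)
            i += 1
    return out


print(repair_segmentation(["runtogether"]))     # ['run', 'together']
print(repair_segmentation(["tog", "ether"]))    # ['together']
```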
The other type of segmenting error, incorrect punctuation, has a much broader impact on parsing methodology. Current parsers typically work one sentence at a time, and assume that each sentence is terminated by an explicit end of sentence marker. A flexible parser must be able to deal with the potential absence of such a marker, and recognize the sentence boundary regardless. It should also be able to make use of such punctuation if it is used correctly, and to ignore it if it is used incorrectly. Instead of punctuation, many interactive systems use carriage-return to indicate sentence termination. Missing sentence terminators in this case correspond to two sentences on one line, or to the typing of a sentence without the terminating return, while specious terminators correspond to typing a sentence on more than one line.
2.5. Broken-Off and Restarted Utterances
In spoken language, it is very common to break off and restart all or part of an utterance:
I want to -- Could you tell me the name?
Was the man -- er -- the official here yesterday?
Usually, such restarts are signalled in some way, by "um" or "er", or more explicitly by "let's back up" or some similar phrase.
In written language, such restarts do not normally occur because they are erased by the writer before the reader sees them. Interactive computer systems typically provide facilities for their users to delete the last character, word, or current line as though it had never been typed, for the very purpose of allowing such restarts. Given these signals, the restarts are easy to detect and interpret. However, sometimes users fail to make use of these signals. Sometimes, for instance, input not containing a carriage-return can be spread over several lines by intermixing of input and output. A flexible parser should be able to make sense out of "obvious" restarts that are not signalled, as in:
delete the show me all the messages from Smith
2.6. Fragmentary and Otherwise Elliptical Input 
Naturally occurring language often involves utterances that are not complete sentences. Often the appropriateness of such fragmentary utterances depends on conversational or physical context, as in:
A: Do you mean Jim Smith or Fred Smith? 
B: Jim 
A: Send a message to Smith 
B: OK 
A: with copies to Jones 
A flexible parser must be able to parse such fragments given the 
appropriate context. 
There is a question here of what such fragments should be parsed into. Parsing systems which have dealt with the problem have typically assumed that such inputs are ellipses of complete sentences, and that their parsing involves finding that complete sentence, and parsing it. Thus the sentence corresponding to "Jim" in the example above would be "I mean Jim". Essentially this view has been taken by the LIFER [8] and GUS [2] systems. An alternative view is that such fragments are not ellipses of more complete sentences, but are themselves complete utterances given the context in which they occur, and should be parsed as such. We have taken this view in our approach to flexible parsing, as we will explain more fully below. Carbonell (personal communication) suggests a third view appropriate for some fragments: that of an extended case frame. In the second example above, for instance, A's "with copies to Jones" forms a natural part of the case frame established by "Send a message to Smith". Yet another approach to fragment parsing is taken in the PLANES system [12], which always parses in terms of major fragments rather than complete utterances. This technique relies on there being only one way to combine the fragments thus obtained, which may be a reasonable assumption for many limited domain systems.
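To make the extended case frame view concrete, here is a hypothetical sketch (not FlexP's method, and not a description of Carbonell's proposal beyond what the text states): the frame built for "Send a message to Smith" is kept, and a later fragment is accepted if it fills a still-empty case that the frame's action allows. The `CASES` table and the one-pattern fragment recognizer are invented for illustration.

```python
# Hypothetical case frames: the cases each action can take.
CASES = {"send": {"object", "recipient", "copies_to"}}


def parse_fragment(tokens):
    """Tiny recognizer for one fragment type: 'with copies to <name>'."""
    if [t.lower() for t in tokens[:3]] == ["with", "copies", "to"] and len(tokens) == 4:
        return {"copies_to": tokens[3]}
    return None


def attach_fragment(frame, tokens):
    """Fill an empty case of the previous utterance's frame with a fragment.

    Returns the extended frame, or None if the fragment does not fit."""
    part = parse_fragment(tokens)
    if part is None or not set(part) <= CASES.get(frame["action"], set()):
        return None
    if any(case in frame for case in part):
        return None  # that case is already filled
    return {**frame, **part}


previous = {"action": "send", "object": "message", "recipient": "Smith"}
print(attach_fragment(previous, "with copies to Jones".split()))
```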
Ellipses can also occur without regard to context. A type that interactive systems are particularly likely to face is crypticness, in which articles and other non-essential words are omitted, as in "show messages after June 17" instead of the more complete "show me all messages dated after June 17". Again, there is a question of whether to consider the cryptic input complete, which would mean modifying the system's grammar, or whether to consider it elliptical, and complete it by using flexible techniques to parse it against the complete version as it exists in the standard grammar.
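The second option, parsing cryptic input against the complete version, can be sketched by marking each element of the complete pattern as essential or not and letting non-essential elements be skipped. The pattern and its essential/non-essential marking are hypothetical, and the greedy matcher is a simplification.

```python
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}

# Hypothetical "complete version" from the standard grammar, as
# (element, essential) pairs; <month> and <day> are word classes.
COMPLETE = [("show", True), ("me", False), ("all", False), ("messages", True),
            ("dated", False), ("after", True), ("<month>", True), ("<day>", True)]


def fits(element, token):
    if element == "<month>":
        return token.lower() in MONTHS
    if element == "<day>":
        return token.isdigit()
    return token.lower() == element


def match_cryptic(tokens, pattern=COMPLETE):
    """Match tokens against the complete pattern, letting non-essential
    elements be omitted. Returns the omitted elements, or None on failure.

    Greedy left-to-right matching: adequate for this pattern, but not a
    general solution when a skipped word could also match later."""
    omitted, i = [], 0
    for element, essential in pattern:
        if i < len(tokens) and fits(element, tokens[i]):
            i += 1
        elif essential:
            return None
        else:
            omitted.append(element)
    return omitted if i == len(tokens) else None


print(match_cryptic("show messages after June 17".split()))  # ['me', 'all', 'dated']
```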
Other common forms of ellipses are associated with conjunction, as in:
John got up and [John] brushed his teeth.
Mary saw Bill and Bill [saw] Mary.
Fred recognized [the building] and [Fred] walked towards the building.
Since conjunctions can support such a wide range of ellipsis, it is generally impractical to recognize such utterances by appropriate grammar extensions. Efforts to deal with conjunction have therefore depended on general mechanisms which supplement the basic parsing strategy, as in the LUNAR system [15], or which modify the grammar temporarily, as in the work of Kwasny and Sondheimer [10]. We have not attempted to deal with this type of ellipsis in our parsing system, and will not discuss further the type of flexibility it requires.
2.7. Interjected Phrases, Omission, and Substitution
Sometimes people interject noise or other qualifying phrases into what is otherwise a normal grammatical flow, as in:
I want the message dated I think June 17
Such interjections can be inserted at almost any point in an utterance, and so must be dealt with as they arise by flexible techniques.
It is relatively straightforward for a system of limited comprehension to screen out and ignore standard noise phrases such as "I think" or "as far as I can tell". More troublesome are interjections that cannot be recognized by the system, as might for instance be the case in
Display [just to refresh my memory] the message dated June 17.
I want to see the message [as I forgot what it said] dated June 17.
where the unrecognized interjections are bracketed. A flexible parser should be able to ignore such interjections. There is always the chance that the unrecognized part was an important part of what the user was trying to say, but clearly, the problems that arise from this cannot be handled by a parser.
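Screening out standard noise phrases, the easy case above, can be sketched with a table lookup. The phrase table is hypothetical, and this handles only recognized noise phrases; unrecognized interjections need the parse suspension discussed in Section 3.

```python
# Hypothetical table of standard noise phrases, as lower-case word tuples.
NOISE_PHRASES = [("as", "far", "as", "i", "can", "tell"), ("i", "think")]


def strip_noise(tokens):
    """Remove known noise phrases wherever they occur, longest first."""
    lowered = [t.lower() for t in tokens]
    phrases = sorted(NOISE_PHRASES, key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for phrase in phrases:
            if tuple(lowered[i:i + len(phrase)]) == phrase:
                i += len(phrase)  # skip the whole phrase
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


print(" ".join(strip_noise("I want the message dated I think June 17".split())))
```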
Omissions of words (or phrases) from the input are closely related to cryptic input as discussed above, and one way of dealing with cryptic input is to treat it as a set of omissions. However, in cryptic input only inessential information is missed out, while it is conceivable that one could also omit essential information, as in:
Display the message June 17
Here it is unclear whether the speaker means a message dated on June 17 or before June 17 or after June 17 (we assume that the system addressed can display things immediately, or not at all). If an omission can be narrowed down in this way, the parser should be able to generate all the alternatives for contextual resolution of the ambiguity (or for the basis of a question to the user). If the omission can be narrowed down to one alternative, then the input was merely cryptic.
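A hypothetical sketch of generating all alternatives for an omitted essential element. The pattern "display the message [RELATION] MONTH DAY" and the relation list are invented for illustration. A single reading means the input was merely cryptic; several readings are handed to a higher-level component for resolution.

```python
RELATIONS = ["dated", "before", "after"]  # hypothetical date relations


def interpret(tokens):
    """Readings of 'display the message [RELATION] MONTH DAY'.

    If the relation is omitted, every alternative is generated so that a
    higher-level component can resolve the ambiguity or ask the user."""
    if [t.lower() for t in tokens[:3]] != ["display", "the", "message"]:
        return []
    rest = tokens[3:]
    if len(rest) == 3 and rest[0].lower() in RELATIONS:
        return [(rest[0].lower(), " ".join(rest[1:]))]
    if len(rest) == 2:
        return [(relation, " ".join(rest)) for relation in RELATIONS]
    return []


print(interpret("Display the message June 17".split()))
```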
Besides omitting words and phrases, people sometimes substitute incorrect or unintended ones. Often such substitutions are spelling errors and should be caught by the spelling correction mechanism, but sometimes they are inadvertent substitutions or uses of equivalent vocabulary not known to the system. This type of substitution is just like an omission except that there is an unrecognized word or phrase in the place where the omitted input should have been. For instance, in "the message over June 17", "over" takes the place of "dated" or "sent after" or whatever else is appropriate at that point. If the substitution is of vocabulary which is appropriate but unknown to the system, parsing of substituted words can provide the basis of vocabulary extension.
2.8. Agreement Failure 
It is not uncommon for people to fail to make the appropriate agreement between the various parts of a noun or verb phrase, as in:
I wants to send a messages to Jim Smith.
The appropriate action is to ignore the lack of agreement, and Weischedel and Black [13] describe a method for relaxing the predicates in an ATN which typically check for such agreements. However, it is generally not possible to conclude locally which value of the marker (number or person) for which the clash occurs is actually intended. We considered examples in which the disagreement involves more than inflections (as in "the message over June 17") in the section on substitutions.
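A minimal sketch of treating agreement as a soft constraint, in the spirit of predicate relaxation but not a reproduction of Weischedel and Black's ATN method. Plural detection by a final "s" is a deliberate simplification, and the determiner set is hypothetical. On a clash the parse succeeds but the number is left unresolved, since the intended value cannot be decided locally.

```python
SINGULAR_DETERMINERS = {"a", "an", "one"}


def noun_phrase(det, noun):
    """Parse DET NOUN, treating number agreement as a soft constraint.

    Plural detection by a final "s" is a deliberate simplification. On a
    clash the parse still succeeds, but the number is left unresolved,
    since the intended value cannot be decided locally."""
    plural = noun.endswith("s")
    head = noun[:-1] if plural else noun
    clash = det.lower() in SINGULAR_DETERMINERS and plural
    if clash:
        number = {"singular", "plural"}
    else:
        number = {"plural"} if plural else {"singular"}
    return {"head": head, "number": number, "agreement_clash": clash}


print(noun_phrase("a", "messages"))
```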
2.9. Idioms 
Idioms are phrases whose interpretation is not what would be obtained by parsing and interpreting them constructively in the normal way. They may also not adhere to the standard syntactic rules. Idioms must thus be parsed as a whole in a pattern matching kind of mode. Parsers based purely on pattern matching, like that of PARRY [11], thus are able to parse idioms naturally, while others must either add a preprocessing phase of pattern matching, as in the LUNAR system [15], or mix specific patterns in with more general rules, as in the work of Kwasny and Sondheimer [10]. Semantic grammars [3, 8] provide a relatively natural way of mixing idiomatic and more general patterns.
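A toy illustration of matching idioms as whole patterns before any compositional parsing, roughly the "preprocessing phase" option above. The idiom table and its meanings are hypothetical examples for a mail-system interface.

```python
# Hypothetical idiom table: word sequences with a fixed, non-compositional meaning.
IDIOMS = {
    ("never", "mind"): {"type": "CancelRequest"},
    ("let's", "back", "up"): {"type": "RestartSignal"},
}


def find_idiom(tokens):
    """Match idioms as whole patterns before any compositional parsing."""
    words = tuple(t.lower() for t in tokens)
    for pattern, meaning in IDIOMS.items():
        n = len(pattern)
        for i in range(len(words) - n + 1):
            if words[i:i + n] == pattern:
                return {"meaning": meaning, "span": (i, i + n)}
    return None  # fall through to the ordinary (compositional) rules


print(find_idiom("oh never mind".split()))
```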
2.10. User Supplied Changes
In normal human conversation, once something is said, it is said and cannot be changed, except indirectly by more words which refer back to the original ones. In interactively typed input, there is always the possibility that a user may notice an error he has made and go back and correct it himself, without waiting for the system to pursue its own, possibly slow and ineffective, methods of correction. With appropriate editing facilities, the user may do this without erasing intervening words, and, if the system is processing his input on a word by word basis, may thus alter a word that the system has already processed. A flexible parser must be able to take advantage of such user provided corrections to unknown words, and to prefer them over its own corrections. It must also be prepared to change its parse if the user changes a valid word to another different but equally valid word.
3. An Approach to Flexible Parsing
Most current parsing systems are unable to cope with most of the kinds of grammatical deviation outlined above. This is because typical parsing systems attempt to apply their grammar to their input in a rigid way, and since deviant input, by definition, does not conform to the grammar, they are unable to produce any kind of parse for it at all. Attempts to parse more flexibly have typically involved parsing strategies to be used after a top-down parse using an ATN [14] or similar transition net has failed. Such efforts include the ellipsis and paraphrase mechanisms of LIFER [8], the predicate relaxation techniques of Weischedel and Black [13], and several of the devices for extending ATN's proposed by Kwasny and Sondheimer [10].
We have constructed a parser, FlexP, which can apply its grammar to its input flexibly, and thus deal with the grammatical deviations discussed in the previous section. We should emphasize, however, that FlexP is designed to be used in the interface to a restricted-domain system. As such, it is intended to work from a domain-specific semantic grammar, rather than one suitable for broader classes of input. FlexP thus does not embody a solution for flexible parsing of natural language in general. In describing FlexP, we will note those of its techniques that seem unlikely to scale up to use with more complex grammars with wider coverage.
We have adopted in FlexP an approach to flexible parsing based not on ATN's, but closer to the pattern-matching parser of the PARRY system [11], possibly the most robust parser yet constructed. Our approach is based on several design decisions:
• bottom-up rather than top-down parsing: this aids in the parsing of fragmentary utterances, and in the recognition of interjections and restarts.
• pattern matching: this is essential for idioms, and also aids in the detection of omissions and substitutions in non-idiomatic phrases.
• parse suspension and continuation: the ability to suspend a parse and later resume its processing is important for interjections, restarts, and non-explicit terminations.
In the remainder of this section we examine and justify these design decisions in more detail.
3.1. Bottom-Up Parsing 
Our choice of a bottom-up strategy is based on our need to recognize isolated sentence fragments. If an utterance which would normally be considered only a fragment of a complete sentence is to be recognized top-down, there are two approaches to take. First, the grammar can be altered so that the fragment is recognized as a complete utterance in its own right. This is undesirable because it can cause enormous expansion of the grammar, and because it becomes difficult to decide whether a fragment appears in isolation or as part of a larger utterance, especially if the possibility of missing end of sentence markers also exists. The second option is for the parser to infer from the conversational context what grammatical sub-category (or sequence of sub-categories) the fragment might fit into, and then to do a top-down parse from that sub-category. This essentially is the tactic used in the GUS [2] and LIFER [8] systems. This strategy is clearly better than the first one, but has two problems: first, that of predicting all possible sub-categories which might come next, and secondly, that of inefficiency if a large number are predicted. Kwasny and Sondheimer [10] use a combination of the two strategies by temporarily modifying an ATN grammar to accept fragment categories as complete utterances at the points where they are contextually predicted.
Bottom-up parsing avoids the problem of predicting what sub-categories may occur. If a fragment filling a given sub-category does occur, it is parsed as such whatever the context. However, if a given input can be parsed as more than one sub-category, the bottom-up approach would have to produce them all, even if only one would be predicted top-down. In a system of limited comprehension, fragmentary recognition is sometimes necessary because not all of an input can be recognized, rather than because of intentional ellipsis. Here, it is probably impossible to make predictions, and bottom-up parsing is the only method that is likely to work. As described below, bottom-up strategies, coupled with suspended parses, are also helpful in recognizing interjections and restarts.
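A small bottom-up recognizer illustrates the point: given an isolated fragment, it reports every category that covers the whole input, with no top-down prediction. The rules and name list are hypothetical, and this memoized recursive chart is a generic technique, not FlexP's algorithm (which is described in Section 4).

```python
from functools import lru_cache

NAMES = {"jim", "smith", "jones"}  # hypothetical known proper names

# Hypothetical semantic-grammar rules (category, pattern). Lower-case
# elements are literal words, capitalized ones are categories, and
# "<name>" is any known name. Unary category-to-category rules are
# avoided, since this simple recursion would not terminate on them.
RULES = [
    ("Person", ("<name>",)),
    ("Person", ("<name>", "<name>")),
    ("Recipient", ("to", "Person")),
    ("CopyRequest", ("with", "copies", "to", "Person")),
    ("SendCommand", ("send", "a", "message", "Recipient")),
]


def categories(tokens):
    """Every category that spans the whole input, found bottom-up,
    with no prediction of which category ought to occur."""
    words = tuple(t.lower() for t in tokens)

    @lru_cache(maxsize=None)
    def span(i, j):
        return frozenset(cat for cat, pat in RULES if match(pat, i, j))

    @lru_cache(maxsize=None)
    def match(pattern, i, j):
        if not pattern:
            return i == j
        head, rest = pattern[0], pattern[1:]
        if head == "<name>":
            return i < j and words[i] in NAMES and match(rest, i + 1, j)
        if head[0].isupper():  # category: try each possible extent
            return any(head in span(i, k) and match(rest, k, j)
                       for k in range(i + 1, j + 1))
        return i < j and words[i] == head and match(rest, i + 1, j)

    return sorted(span(0, len(words)))


print(categories("to Jones".split()))                     # ['Recipient']
print(categories("send a message to Jim Smith".split()))  # ['SendCommand']
```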
3.2. Pattern Matching
We have chosen to use a grammar of linear patterns rather than a transition network because pattern-matching meshes well with bottom-up parsing, because it facilitates recognition of utterances with omissions and substitutions, and because it is necessary anyway for the recognition of idiomatic phrases.
The grammar of the parser is a set of rewrite or production rules whose left hand side is a linear pattern of constituents (lexical or higher level) and whose right hand side defines a result constituent. Elements of the pattern may be labelled optional or allow for repeated matches. We make the assumption, certainly true for the grammar we are presently working with, that the grammar will be semantic rather than syntactic, with patterns corresponding to idiomatic phrases or to object and event descriptions meaningful in some limited domain, rather than to general syntactic structures.
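One possible representation of such rules, as a hypothetical sketch: each pattern element carries `optional` and `repeat` flags, and a recursive matcher honours them. Elements here are literal words only, whereas FlexP patterns may also contain higher-level constituents; the example rule is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    word: str
    optional: bool = False
    repeat: bool = False   # may match one or more times


@dataclass(frozen=True)
class Rule:
    pattern: tuple   # of Element (left hand side)
    result: str      # category of the result constituent (right hand side)


def matches(pattern, tokens):
    """True if the whole token list matches the linear pattern."""
    if not pattern:
        return not tokens
    el, rest = pattern[0], pattern[1:]
    if el.optional and matches(rest, tokens):        # skip this element
        return True
    if not tokens or tokens[0] != el.word:
        return False
    if el.repeat and matches(pattern, tokens[1:]):   # match it again
        return True
    return matches(rest, tokens[1:])


# A hypothetical rule: "a (very)* urgent message" -> UrgentMessageDescription
URGENT = Rule((Element("a"), Element("very", optional=True, repeat=True),
               Element("urgent"), Element("message")),
              "UrgentMessageDescription")
print(matches(URGENT.pattern, "a very very urgent message".split()))  # True
```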
Linear patterns fit well with bottom-up parsing because they can be indexed by any of their components, and because, once indexed, it is straightforward to confirm whether a pattern matches input already processed in a way consistent with the way the pattern was indexed.
Patterns help with the detection of omissions and substitutions because in either case the relevant pattern can still be indexed by the remaining elements that appear correctly in the input, and thus the pattern as a whole can be recognized even if some of its elements are missing or incorrect. In the case of substitutions, such a technique can actually help focus the spelling correction, proper name recognition, or vocabulary learning techniques, whichever is appropriate, by isolating the substituted input and the pattern constituent which it should have matched. In effect, this allows the normally bottom-up parsing strategy to go top-down to resolve such substitutions.
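The sketch below shows both ideas on the earlier example "the message over June 17": patterns are indexed by every literal word they contain, and a candidate pattern is accepted with at most one substituted element, which is reported together with the constituent it should have matched. Pattern names and word classes are hypothetical; omissions (length changes) are not handled.

```python
from collections import defaultdict

MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}

# Hypothetical patterns; "<month>", "<day>" and "<name>" are word classes.
PATTERNS = {
    "DatedMessage": ("the", "message", "dated", "<month>", "<day>"),
    "SenderMessage": ("the", "message", "from", "<name>"),
}


def fits(element, token):
    if element == "<month>":
        return token in MONTHS
    if element == "<day>":
        return token.isdigit()
    if element == "<name>":
        return token.isalpha()
    return token == element


def build_index(patterns):
    """Index each pattern under every literal word it contains."""
    index = defaultdict(set)
    for name, pattern in patterns.items():
        for element in pattern:
            if not element.startswith("<"):
                index[element].add(name)
    return index


def match_with_substitutions(tokens, patterns, index, max_subs=1):
    """Find patterns indexed by the input that match with at most
    max_subs substituted elements, reporting (position, input, expected).
    Omissions (length changes) are not handled in this sketch."""
    tokens = [t.lower() for t in tokens]
    candidates = set().union(*(index.get(t, set()) for t in tokens))
    results = []
    for name in sorted(candidates):
        pattern = patterns[name]
        if len(pattern) != len(tokens):
            continue
        subs = [(i, tok, el) for i, (tok, el) in enumerate(zip(tokens, pattern))
                if not fits(el, tok)]
        if len(subs) <= max_subs:
            results.append((name, subs))
    return results


INDEX = build_index(PATTERNS)
print(match_with_substitutions("the message over June 17".split(), PATTERNS, INDEX))
# [('DatedMessage', [(2, 'over', 'dated')])]
```

The reported triple isolates "over" and the expected "dated", which is exactly the information a spelling corrector or vocabulary learner would need.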
In normal left to right processing, it is not necessary to activate all the patterns indexed by every new word as it is considered. If a new word is accounted for by a pattern that has already been partially matched by previous input, it is likely that no other patterns need to be indexed and matched for that input. This heuristic allows FlexP's parsing algorithm to limit the number of patterns it tries to match. We should emphasize, however, that it is a heuristic, and while it has caused us no trouble with the limited-domain grammar we have been using, it is unclear how well it would transfer to a more complex grammar. FlexP's algorithm does, however, carry along multiple partial parses in other ambiguous cases, removing the need for any backtracking.
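A hypothetical sketch of this heuristic: partial parses are (pattern name, next position) pairs, and new patterns are activated for a word only if no existing partial parse accounts for it. All extended parses are kept, carrying ambiguity forward instead of backtracking. For simplicity, new patterns are activated only at their first element, which is narrower than indexing by any component.

```python
# Hypothetical patterns as tuples of literal words.
PATTERNS = {"ShowNew": ("show", "new", "messages"), "NewFolder": ("new", "folder")}


def step(partials, word, patterns=PATTERNS):
    """Advance partial parses (pattern name, next position) by one word.

    If the word extends any existing partial parse, no new patterns are
    activated for it: the heuristic described above. All extended parses
    are kept, so ambiguity is carried forward rather than backtracked.
    For simplicity, new patterns are only activated at their first element."""
    extended = [(name, pos + 1) for name, pos in partials
                if pos < len(patterns[name]) and patterns[name][pos] == word]
    if extended:
        return extended
    return [(name, 1) for name, pattern in patterns.items() if pattern[0] == word]


state = step([], "show")      # [('ShowNew', 1)]
state = step(state, "new")    # [('ShowNew', 2)]; NewFolder is not activated
print(state)
```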
3.3. Parse Suspension and Continuation
FlexP employs the technique of suspending a parse with the possibility of later continuation to help with the recognition of interjections, restarts, and implicit terminations. The parsing algorithm works left to right in a breadth-first manner. It maintains a set of partial parses, each of which accounts for the input already processed but not yet accounted for by a completed parse. The parser attempts to incorporate each new input into each of the partial parses. If this is successful, the partial parses are extended and may increase or decrease in number. If no partial parse can be extended, the entire set is saved as a suspended parse.
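A minimal sketch of a breadth-first parser with suspension, assuming hypothetical linear word patterns. When no partial parse can be extended, the whole set is pushed onto a stack of suspended parses; at a later mismatch, the most recently suspended set is tried first. The example shows an unrecognized-style interjection ("i forgot") being parsed while "display the message" is suspended and then continued.

```python
class SuspendingParser:
    """Breadth-first parser over linear word patterns with parse suspension.

    A hypothetical simplification of the scheme described above: each
    partial parse is (pattern name, next position)."""

    def __init__(self, patterns):
        self.patterns = patterns
        self.active = []      # current set of partial parses
        self.suspended = []   # stack of saved partial-parse sets
        self.completed = []

    def _extend(self, parses, word):
        return [(n, p + 1) for n, p in parses if self.patterns[n][p] == word]

    def feed(self, word):
        extended = self._extend(self.active, word)
        if not extended:
            # Try to continue a suspended parse, most recent first.
            # (Any unfinished active set is dropped in this sketch.)
            for k in range(len(self.suspended) - 1, -1, -1):
                extended = self._extend(self.suspended[k], word)
                if extended:
                    del self.suspended[k]
                    break
        if not extended:
            if self.active:
                self.suspended.append(self.active)  # save the whole set
            extended = [(n, 1) for n, pat in self.patterns.items() if pat[0] == word]
        self.active = []
        for n, p in extended:
            if p == len(self.patterns[n]):
                self.completed.append(n)
            else:
                self.active.append((n, p))


parser = SuspendingParser({"Display": ("display", "the", "message"),
                           "Aside": ("i", "forgot")})
for w in "display the i forgot message".split():
    parser.feed(w)
print(parser.completed)   # ['Aside', 'Display']
```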
There are several possible explanations for input mismatch, i.e. the failure of the next input to extend a parse.
• The input could be an implicit termination, i.e. the start of a new top-level utterance, and the previous utterance should be assumed complete.
• The input could be a restart, in which case the active parse should be abandoned and a new parse started from that point.
• The input could be the start of an interjection, in which case the active parse should be temporarily suspended, and a new parse started for the interjection.
It is not possible, in general, to distinguish between these cases at the
time the mismatch occurs. If the active parse is not at a possible
termination point, then input mismatch cannot indicate implicit
termination, but may indicate either restart or interjection. It is necessary
to suspend the active parse and wait to see if it is continued at the next
input mismatch. On the other hand, if the active parse is at a possible
termination point, input mismatch does not rule out interjection or even
restart. In this situation, our algorithm tentatively assumes that there has
been an implicit termination, but suspends the active parse anyway for
subsequent potential continuation.
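The decision taken at a mismatch can be summarized as a small function. This is an illustrative Python sketch, not FlexP's code. ActiveParse, its at_termination_point flag, Reading and handle_mismatch are hypothetical names, and detecting whether a parse is at a possible termination point is assumed to happen elsewhere. The point the sketch captures is that the active parse is suspended in both cases. Only the tentative assumption of implicit termination depends on whether the parse is at a termination point.

```python
from enum import Enum
from typing import List, NamedTuple, Tuple


class Reading(Enum):
    TENTATIVE_TERMINATION = "implicit termination assumed; parse still suspended"
    UNDECIDED = "restart or interjection; wait for later input"


class ActiveParse(NamedTuple):
    """Hypothetical summary of the active parse at the moment of a mismatch."""
    words: Tuple[str, ...]
    at_termination_point: bool  # could the utterance legitimately end here?


def handle_mismatch(active: ActiveParse, suspended: List[ActiveParse]) -> Reading:
    """Decide how to treat input that fails to extend `active`."""
    # Restart, interjection and implicit termination cannot be told apart yet,
    # so the active parse is suspended in every case for possible continuation.
    suspended.append(active)
    if active.at_termination_point:
        # Termination is only assumed tentatively: interjection and restart
        # remain possible, which is why the parse was still suspended above.
        return Reading.TENTATIVE_TERMINATION
    # A parse that cannot end here rules out implicit termination.
    return Reading.UNDECIDED
```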
Note also that the possibility of implicit termination provides justification
for the strategy of interpreting each input immediately it is received. If the
input signals an implicit termination, then the user may well expect the
system to respond immediately to the input thus terminated.
4. Details of FlexP 
This section describes how FlexP achieves the flexibilities discussed
earlier. The implementation described is being used as the parser for an
intelligent interface to a multi-media message system [1]. The intelligence
in this interface is concentrated in a User Agent which mediates between
the user and the underlying tool system. The Agent ensures that the
interaction goes smoothly by, among other things, checking that the user
specifies the operations he wants performed and their parameters
correctly and unambiguously, conducting a dialogue with the user if
problems arise. The role of FlexP as the Agent's parser is to transform the
user's input into the internal representations employed by the Agent.
Usually this input is a request for action by the tool or a description of
objects known to the tool. Our examples are drawn from that context.
4.1. Preliminary Example
Suppose the user types
display new messages 
Interpretation begins as soon as any input is available. The first word is
used as an index into the store of rewrite rules. Each rule gives a pattern
and a structure to be produced when the pattern is matched. The
components of the structure are built from the structures or words which
match the elements of the pattern. The word "display" indexes the rule:
(pattern: (Display MessageDescription)
 result: [StructureType: OperationRequest
          Operation: Display
          Message: (Filler MessageDescription)])
Using this rule the parser constructs the partial parse tree
(Display MessageDescription)
    |
    |
 display
We call the partially-instantiated pattern which labels the upper node a
hypothesis. It represents a possible interpretation for a segment of input.
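The indexing of rewrite rules and the construction of a hypothesis might be sketched as below. The representation is assumed for illustration only. Rule, Hypothesis, RULES and hypotheses_for are hypothetical, as is the "Filler:" marker used in result templates. They are loosely modelled on the rule for "display" shown above. A hypothesis records which pattern elements have been matched and can build the rule's result structure from those matches.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

FILLER = "Filler:"  # hypothetical marker: slot is filled from the named pattern element


@dataclass
class Rule:
    pattern: List[str]
    result: Dict[str, str]


@dataclass
class Hypothesis:
    """A partially instantiated pattern labelling a node of the partial parse tree."""
    rule: Rule
    fillers: Dict[int, object] = field(default_factory=dict)  # element index -> match

    def next_unmatched(self) -> Optional[str]:
        for i, element in enumerate(self.rule.pattern):
            if i not in self.fillers:
                return element
        return None

    def build_result(self) -> Dict[str, object]:
        """Instantiate the result template from the items matched so far."""
        out: Dict[str, object] = {}
        for key, value in self.rule.result.items():
            if value.startswith(FILLER):
                index = self.rule.pattern.index(value[len(FILLER):])
                out[key] = self.fillers.get(index)
            else:
                out[key] = value
        return out


# Rule store indexed by the word that can begin a match (illustrative content).
RULES: Dict[str, List[Rule]] = {
    "display": [
        Rule(
            pattern=["Display", "MessageDescription"],
            result={
                "StructureType": "OperationRequest",
                "Operation": "Display",
                "Message": FILLER + "MessageDescription",
            },
        )
    ],
}


def hypotheses_for(word: str) -> List[Hypothesis]:
    """Look the word up in the rule store; each indexed rule yields a hypothesis."""
    return [Hypothesis(rule, {0: word}) for rule in RULES.get(word.lower(), [])]
```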
The next word "new" does not directly match the hypothesis, but since
"new" is a MsgAdj (an adjective which can modify a description of a
message), it indexes the rule:
(pattern: (?Det *MsgAdj MsgHead *MsgCase)
 result: [StructureType: MessageDescription
          Components: ...])
Here, "?" means optional, and "*" means repeatable. For the sake of
clarity, we have omitted other prefixes which distinguish between terminal
and non-terminal pattern elements. The result of this rule fits the current
hypothesis, so it extends the parse as follows:
(Display MessageDescription)
    |            |
    |            |
    |   (?Det *MsgAdj MsgHead *MsgCase)
    |            |
    |            |
 display        new
The hypothesis is not yet fully confirmed even though all the elements are
matched. Its second element matches another lower level hypothesis
which is only incompletely matched. This lower pattern becomes the
current hypothesis because it predicts what should come next in the input
stream.
The third input matches the category MsgHead (head noun of a
message description) and so fits the current hypothesis. This match fills
the last non-optional slot in that pattern. By doing so it makes the current
hypothesis and its parent pattern potentially complete. When the parser
finds a potentially complete phrase whose result is of interest to the Agent
(and the parent phrase in this example is in that category), the result is
constructed and sent. However, since the parser has not seen a
termination signal, this parse is kept active. The input so far may be
only a prefix of some longer utterance such as "display new messages
about ADA". In this case "about ADA" would be recognized as a match
for MsgCase (a prepositional phrase that can be part of a message
description), the parse would be extended, and a revision of the previous
structure sent to the Agent.
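Potential completeness and the early sending of results can be sketched as follows. In this hypothetical Python illustration, PhraseHypothesis records the items matched by each pattern element. A pattern counts as potentially complete once every element without a "?" or "*" prefix has a match. The send callback stands in for delivering a structure to the Agent. We assume that a "*" element may occur zero times, which is consistent with MsgHead being the last non-optional slot of the MessageDescription pattern.

```python
from typing import Callable, Dict, List


def name_of(element: str) -> str:
    return element.lstrip("?*")


def is_required(element: str) -> bool:
    # "?" marks an optional element and "*" a repeatable one; we assume here that
    # a repeatable element may also occur zero times, as *MsgCase does above.
    return not element.startswith(("?", "*"))


class PhraseHypothesis:
    def __init__(self, pattern: List[str], send: Callable[[Dict[str, List[str]]], None]):
        self.pattern = pattern
        self.matches: Dict[str, List[str]] = {name_of(e): [] for e in pattern}
        self.send = send  # stands in for delivering a structure to the Agent

    def potentially_complete(self) -> bool:
        return all(self.matches[name_of(e)] for e in self.pattern if is_required(e))

    def add(self, element: str, item: str) -> None:
        self.matches[element].append(item)
        if self.potentially_complete():
            # No termination signal has been seen, so the result is sent while the
            # hypothesis stays active; a later extension sends a revised result.
            self.send({k: list(v) for k, v in self.matches.items() if v})
```

Feeding "new" and then "messages" sends a first result. Adding "about ADA" as a MsgCase then sends a revised one.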
4.2. Unrecognized Words 
When an input word cannot be found in the dictionary, spelling
correction is attempted in a background process which runs at lower
priority than the parser. The input word and a list of possibilities derived
from the current hypothesis are passed as arguments. For example:
display the new messaegs 
produces the partial parse
(Display MessageDescription)
    |            |
    |            |
    |   (?Det *MsgAdj MsgHead *MsgCase)
    |     |      |
    |     |      |
 display the    new
The lower pattern is the current hypothesis and has two elements eligible
to match the next input. Another MsgAdj could be matched. A match for
MsgHead would also fit. Both elements have associated lists of keywords
known to occur in phrases which match them. The one for MsgHead
includes the word "messages", and the spelling corrector passes this
back to the parser as the most likely interpretation.
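A spelling correction step restricted to the keywords of the currently eligible elements might look like the sketch below. The keyword lists and correct_spelling are hypothetical. Python's standard difflib.get_close_matches is swapped in as the similarity measure purely for illustration; the paper does not say which measure FlexP uses.

```python
import difflib
from typing import Dict, List, Tuple

# Hypothetical keyword lists associated with pattern elements.
KEYWORDS: Dict[str, List[str]] = {
    "MsgAdj": ["new", "old", "unread", "recent"],
    "MsgHead": ["messages", "mail", "memos"],
}


def correct_spelling(word: str, eligible: List[str], n: int = 3) -> List[Tuple[str, str]]:
    """Rank keywords of the currently eligible elements by similarity to `word`.

    Returns (keyword, element) pairs, most similar first. difflib's ratio is a
    stand-in similarity measure, not the one FlexP used.
    """
    candidates = {kw: elem for elem in eligible for kw in KEYWORDS.get(elem, [])}
    best = difflib.get_close_matches(word, list(candidates), n=n, cutoff=0.6)
    return [(kw, candidates[kw]) for kw in best]
```

With the keywords assumed here, "messaegs" is corrected to "messages", a MsgHead keyword. A word such as "stuff" yields no candidate above the similarity cutoff.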
In some cases the spelling corrector produces several likely
alternatives. The parser handles such ambiguous words using the same
mechanisms which accommodate phrases with ambiguous interpretations.
That is, alternative interpretations are carried along until there is enough
input to discriminate those which are plausible from those which are not.
The details are given in the next section.
The user may also correct the input text himself. These changes are
handled in much the same way as those proposed by the spelling
corrector. Of course, these user-supplied changes are given priority, and
parses built using the former version must be modified or discarded.
Spelling correction is run as a separate, lower priority process because
a reasonable parse may be produced even without a proper interpretation
for the unknown word. Since spelling correction can involve rather
time-consuming searches, this work is best done when the parser has no
better alternatives to explore.
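The priority relationship between parsing and spelling correction can be illustrated with a single-threaded queue. This sketch rests on several assumptions and does not describe FlexP's actual process structure. The Scheduler class and the PARSER and SPELLING priority levels are hypothetical. Tasks also run to completion without preemption, whereas a genuinely separate background process could be interrupted when new parser work arrives.

```python
import heapq
import itertools
from typing import Callable, List, Tuple

PARSER, SPELLING = 0, 1  # assumed priority levels: lower number runs first


class Scheduler:
    """Toy cooperative scheduler: spelling work runs only when no parser work waits."""

    def __init__(self) -> None:
        self._queue: List[Tuple[int, int, Callable[[], None]]] = []
        self._order = itertools.count()  # FIFO tie-breaker within one priority level

    def submit(self, priority: int, task: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (priority, next(self._order), task))

    def run(self) -> None:
        while self._queue:
            _, _, task = heapq.heappop(self._queue)
            task()  # a task may submit further work, which is ordered the same way
```

Even if the correction of "messaegs" is submitted first, both pending parser steps run before it.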
4.3. Ambiguous Input 
In the first example there was only one hypothesis about the structure
of the input. More generally, there may be several hypotheses which
provide competing interpretations about what has already been seen and
what will appear next. Until these partial parses are found to be
inconsistent with the actual input, they are carried along as part of the
active parse. Therefore the active parse is a set of partial parse trees, each
with a top-level hypothesis about the overall structure of the input so far
and a current hypothesis concerning the next input. The actual
implementation allows sharing of common structure among competing
hypotheses and so is more efficient than this description suggests.
The input
were there any messages on ...
could be completed by giving a date ("...on Tuesday") or a topic ("...on
ADA"). Consequently, the sub-phrase "any messages on" results in two
partial parses:
(?Det *MsgAdj MsgHead *MsgCase)
  |             |        |
  |             |        |
 any        messages (On Date)
                         |
                         |
                         on
and
(?Det *MsgAdj MsgHead *MsgCase)
  |             |        |
  |             |        |
 any        messages (On Topic)
                         |
                         |
                         on
If the next input were "Tuesday" it would be consistent with the first parse,
but not the second. Since one of the alternatives does account for the
input, those that do not may be discarded. On the other hand, if all the
partial parses fail to match the input, other action is taken. We consider
such situations in the section on suspended parses.
As a general strategy, we carry several possible interpretations only as
long as there is no clear best alternative. In particular, no flexible parsing
techniques are used to support parses for which there are plausible
alternatives under normal parsing. This heuristic helps achieve the
efficiency required for real-time response, but could conceivably fail to
find appropriate parses. We have not encountered such circumstances
with the small domain-specific semantic grammar we have been using.
4.4. Flexible Matching
The only flexibility described so far is that allowed by the optional
elements of patterns. If omissions can be anticipated, allowances may be
built into the grammar. In this section we show how other omissions may
be handled and other flexibilities achieved by allowing additional freedom
in the way an item is allowed to match a pattern. There are two ways in
which the matching criteria may be relaxed, namely
• relax consistency constraints, e.g. number agreement
• allow out of order matches
Consistency constraints are predicates which are attached to rules.
They assert relationships which must hold among the items which fill the
pattern. These constraints allow context-sensitive constructions in the
grammar. Such predicates are commonly used for similar purposes by
ATN parsers [14], and the flexibility achieved by relaxing these constraints
has been explored before [13]. The technique fits smoothly into FlexP but
has not actually been needed or used in our current application.
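FlexP has not needed constraint relaxation, but the idea is easy to picture. In this hypothetical sketch, a Constraint wraps a predicate over the items filling a pattern, and number_agreement is an example predicate. The check function either enforces the constraints or, when relax=True, records violations without rejecting the match. A later stage could then still prefer matches that satisfy every constraint.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Item = Dict[str, str]  # e.g. {"word": "messages", "number": "plural"}


@dataclass
class Constraint:
    name: str
    holds: Callable[[List[Item]], bool]


def number_agreement(items: List[Item]) -> bool:
    """Illustrative predicate: all items carrying a number feature must agree."""
    return len({it["number"] for it in items if "number" in it}) <= 1


def check(items: List[Item], constraints: List[Constraint],
          relax: bool = False) -> Tuple[bool, List[str]]:
    """Return (accepted, names of violated constraints).

    With relax=True a violation is recorded but no longer blocks the match.
    """
    violated = [c.name for c in constraints if not c.holds(items)]
    return (relax or not violated), violated
```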
On the other hand, out of order matching is essential for the parser's
approach to errors of omission, transposition, and substitution. Even
when strictly interpreted, several elements of a pattern may be eligible to
match the next input item. For example, in the pattern for a
MessageDescription
(?Det *MsgAdj MsgHead *MsgCase)
each of the first three elements is initially eligible but the last is not. On the
other hand, once MsgHead has been matched only the last element is
eligible under the strict interpretation of the pattern.
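The strict notion of eligibility can be stated compactly. The function below is an illustrative sketch with assumed names (eligible, is_required). It takes the pattern and the index of the most recently matched element, and returns the elements that may match the next input. It again treats a "*" element as allowed to occur zero times.

```python
from typing import List, Optional


def is_required(element: str) -> bool:
    # "?" = optional, "*" = repeatable (assumed here to allow zero occurrences).
    return not element.startswith(("?", "*"))


def eligible(pattern: List[str], last: Optional[int]) -> List[str]:
    """Elements that may match the next input under the strict interpretation.

    `last` is the index of the most recently matched element, or None if
    nothing has been matched yet.
    """
    result: List[str] = []
    start = 0
    if last is not None:
        if pattern[last].startswith("*"):
            result.append(pattern[last])  # a repeatable element may match again
        start = last + 1
    for element in pattern[start:]:
        result.append(element)
        if is_required(element):
            break  # nothing to the right of an unmatched required element is eligible
    return result
```

For the MessageDescription pattern, the function yields the first three elements initially. After an adjective it yields *MsgAdj and MsgHead, and after MsgHead it yields only *MsgCase.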
Consider the input
display new about ADA
The first two words parse normally to produce
(Display MessageDescription)
    |            |
    |            |
    |   (?Det *MsgAdj MsgHead *MsgCase)
    |            |
    |            |
 display        new
The next word does not fit that hypothesis. The two eligible elements
predict either another message adjective or a MsgHead. The word
"about" does not match either of these, nor can the parser construct any
path to them using intermediate hypotheses. Since there are no other
partial parses available to account for this input, and since normal
matching fails, flexible matching is tried.
First, previously skipped elements are compared to the input. In this
example, the element ?Det is considered but does not match. Next,
elements to the right of the eligible elements are considered. Thus
MsgCase is considered even though the non-optional element MsgHead
has not been matched. This succeeds and allows the partial parse to be
extended to
(Display MessageDescription)
    |            |
    |            |
    |   (?Det *MsgAdj MsgHead *MsgCase)
    |            |                |
    |            |          (About Topic)
    |            |                |
 display        new             about
which correctly predicts the final input item. 
Unrecognizable substitutions are also handled by this mechanism. In
the phrase
display the new stuff about ADA
the word "stuff" is not found in the dictionary, so spelling correction is
tried but does not produce any plausible alternatives. While spelling
correction is underway, the remaining input can be parsed by simply
omitting "stuff" and using the flexible matching procedure.
Transpositions are handled through one application of flexible matching if
the first element of the transposed pair is optional, two applications if not.
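The flexible matching order described in this section can be sketched for a single pattern. The order is: eligible elements first, then previously skipped elements, then elements to the right of the eligible ones. FlexMatcher and its flexible_uses counter are hypothetical. Inputs are given directly as category names. The toy also ignores the surrounding check that no other partial parse accounts for the input. An unrecognized word such as "stuff" would simply not be fed to the matcher.

```python
from typing import Dict, List, Optional


def _required(element: str) -> bool:
    return not element.startswith(("?", "*"))


def _name(element: str) -> str:
    return element.lstrip("?*")


class FlexMatcher:
    """Toy matcher for one pattern; inputs are given as category names."""

    def __init__(self, pattern: List[str]) -> None:
        self.pattern = pattern
        self.filled: Dict[int, List[str]] = {}
        self.last: Optional[int] = None  # index of the rightmost element matched
        self.flexible_uses = 0

    def _strict(self) -> List[int]:
        indices: List[int] = []
        start = 0
        if self.last is not None:
            if self.pattern[self.last].startswith("*"):
                indices.append(self.last)
            start = self.last + 1
        for i in range(start, len(self.pattern)):
            indices.append(i)
            if _required(self.pattern[i]):
                break
        return indices

    def feed(self, category: str) -> bool:
        last = -1 if self.last is None else self.last
        strict = self._strict()
        skipped = [i for i in range(last) if i not in self.filled]
        right = list(range(max(strict, default=last) + 1, len(self.pattern)))
        # Normal matching first; then previously skipped elements; then elements
        # to the right of the eligible ones, even past an unmatched required one.
        for group, flexible in ((strict, False), (skipped, True), (right, True)):
            for i in group:
                if _name(self.pattern[i]) == category:
                    self.filled.setdefault(i, []).append(category)
                    self.last = max(i, last)  # filling a skipped element does not move back
                    self.flexible_uses += int(flexible)
                    return True
        return False
```

The toy produces the following counts:
• "display new about ADA" needs one flexible match, because MsgCase is matched past the unmatched MsgHead.
• The transposed "messages new" needs one, because MsgAdj is optional.
• A transposed pair of two required elements needs two.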
4.5. Suspended Parses
Interjections are more common in spoken than in written language but
do occur in typed input sometimes. To deal with such input, our design
allows for blocked parses to be suspended rather than merely discarded.
Users, especially novices, may embellish their input with words and
phrases that do not provide essential information and cannot be
specifically anticipated. Consider two examples:
display please messages dated June 17
display for me messages dated June 17
In the first case, the interjected word "please" could be recognized as a
common noise phrase which means nothing to the Agent except possibly
to suggest that the user is a novice. The second example is more difficult.
Both words of the interjected phrase can appear in a number of legitimate
and meaningful constructions: they cannot be ignored so easily.
For the latter example, parse suspension works as follows. After the
first word, the active parse contains a single partial parse:
(Display MessageDescription)
    |
    |
 display
The next word does not fit this hypothesis, so it is suspended. In its place, 
a new active parse is constructed. It contains several partial parses 
including 
(For Person)   and   (For TimeInterval)
  |                    |
  |                    |
 for                  for
The next word confirms the first of these, but the fourth word
"messages" does not. When the parser finds that it cannot extend the
active parse, it considers the suspended parse. Since "messages" fits,
the active and suspended parses are exchanged and the remainder of the
input is processed normally, so that the parser recognizes "display
messages dated June 17" as if it had never contained "for me".
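The exchange of active and suspended parses in this example can be traced with a toy parser. Everything here is hypothetical: the category patterns, the lexicon (which treats "June 17" as a single token), the NOISE set used for words like "please", and the SuspendingParser class. A partial parse is reduced to a pattern name and the number of elements matched so far.

```python
from typing import Dict, List, Tuple

# Hypothetical toy patterns over categories (not FlexP's grammar).
PATTERNS: Dict[str, List[str]] = {
    "DisplayRequest": ["Display", "MsgHead", "Dated", "Date"],
    "ForPerson": ["For", "Person"],
    "ForTimeInterval": ["For", "TimeInterval"],
}
LEXICON = {"display": "Display", "for": "For", "me": "Person",
           "messages": "MsgHead", "dated": "Dated", "June 17": "Date"}
NOISE = {"please"}  # common noise words, simply ignored

Partial = Tuple[str, int]  # (pattern name, number of elements matched so far)


def advance(parses: List[Partial], category: str) -> List[Partial]:
    return [(name, n + 1) for name, n in parses
            if n < len(PATTERNS[name]) and PATTERNS[name][n] == category]


def start(category: str) -> List[Partial]:
    return [(name, 1) for name, pat in PATTERNS.items() if pat[0] == category]


class SuspendingParser:
    def __init__(self) -> None:
        self.active: List[Partial] = []
        self.suspended: List[List[Partial]] = []

    def feed(self, word: str) -> None:
        if word in NOISE:
            return
        category = LEXICON[word]
        extended = advance(self.active, category)
        if extended:
            self.active = extended
            return
        if self.suspended:
            resumed = advance(self.suspended[-1], category)
            if resumed:
                # The suspended parse continues: exchange it with the active one.
                self.suspended[-1], self.active = self.active, resumed
                return
        if self.active:
            self.suspended.append(self.active)  # blocked: suspend, don't discard
        self.active = start(category)
```

After "display for", the active parse holds the two For readings and the display request is suspended. On "messages" the two are exchanged, and the request is then completed.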
5. Conclusion 
When people use language naturally, they make mistakes and employ
economies of expression that often result in language which is
ungrammatical by strict standards. In particular, such grammatical
deviations will inevitably occur in the input of a computer system which
allows its user to employ natural language. Such a computer system must,
therefore, be prepared to parse its input flexibly, if it is to avoid frustration for
its user.
In this paper, we have attempted to outline the main kinds of flexibility a
natural language parser intended for natural use should provide. We also
described a bottom-up pattern-matching parser, FlexP, which exhibits
these flexibilities, and which is suitable for restricted natural language
input to a limited-domain system.
References 
1. Ball, J. E. and Hayes, P. J. Representation of Task-Independent
Knowledge in a Gracefully Interacting User Interface. Tech. Rept.,
Carnegie-Mellon University Computer Science Department, 1980.
2. Bobrow, D. G., Kaplan, R. M., Kay, M., Norman, D. A., Thompson, H.,
and Winograd, T. "GUS: a Frame-Driven Dialogue System." Artificial
Intelligence 8 (1977), 155-173.
3. Burton, R. R. Semantic Grammar: An Engineering Technique for
Constructing Natural Language Understanding Systems. BBN Report
3453, Bolt, Beranek, and Newman, Inc., December, 1976.
4. Carbonell, J. G. Towards a Self-Extending Parser. Proc. of 17th
Annual Meeting of the Assoc. for Comput. Ling., La Jolla, Ca.,
August, 1979, pp. 3-7.
5. Carbonell, J. G. Subjective Understanding: Computer Models of
Belief Systems. Ph.D. Th., Yale University, 1979.
6. DeJong, G. Skimming Stories in Real-Time. Ph.D. Th., Computer
Science Dept., Yale University, 1979.
7. Hayes, P. J., and Reddy, R. Graceful Interaction in Man-Machine
Communication. Proc. Sixth Int. Jt. Conf. on Artificial Intelligence, Tokyo,
1979, pp. 372-374.
8. Hendrix, G. G. Human Engineering for Applied Natural Language
Processing. Proc. Fifth Int. Jt. Conf. on Artificial Intelligence, MIT, 1977,
pp. 183-191.
9. Kaplan, S. J. Cooperative Responses from a Portable Natural
Language Data Base Query System. Ph.D. Th., Dept. of Computer and
Information Science, University of Pennsylvania, Philadelphia, 1979.
10. Kwasny, S. C. and Sondheimer, N. K. Ungrammaticality and
Extra-Grammaticality in Natural Language Understanding Systems. Proc.
of 17th Annual Meeting of the Assoc. for Comput. Ling., La Jolla, Ca.,
August, 1979, pp. 19-23.
11. Parkison, R. C., Colby, K. M., and Faught, W. S. "Conversational
Language Comprehension Using Integrated Pattern-Matching and
Parsing." Artificial Intelligence 9 (1977), 111-134.
12. Waltz, D. L. "An English Language Question Answering System for
a Large Relational Data Base." Comm. ACM 21, 7 (1978), 526-539.
13. Weischedel, R. M. and Black, J. Responding to Potentially
Unparseable Sentences. Tech. Rept. 79/3, Dept. of Computer and
Information Sciences, University of Delaware, 1979.
14. Woods, W. A. "Transition Network Grammars for Natural Language
Analysis." Comm. ACM 13, 10 (October 1970), 591-606.
15. Woods, W. A., Kaplan, R. M., and Nash-Webber, B. The Lunar
Sciences Natural Language Information System: Final Report. Tech. Rept. 2378, Bolt,
Beranek, and Newman, Inc., 1972.

