Entity-Oriented Parsing
Philip J. Hayes
Computer Science Department, Carnegie-Mellon University
Pittsburgh, PA 15213, USA
Abstract
An entity-oriented approach to restricted-domain parsing is proposed. In this approach, the definitions of the structure and surface representation of domain entities are grouped together. Like semantic grammar, this allows easy exploitation of limited domain semantics. In addition, it facilitates fragmentary recognition and the use of multiple parsing strategies, and so is particularly useful for robust recognition of extragrammatical input. Several advantages from the point of view of language definition are also noted. Representative samples from an entity-oriented language definition are presented, along with a control structure for an entity-oriented parser, some parsing strategies that use the control structure, and worked examples of parses. A parser incorporating the control structure and the parsing strategies is currently under implementation.
1. Introduction 
The task of typical natural language interface systems is much simpler than the general problem of natural language understanding. The simplifications arise because:
1. the systems operate within a highly restricted domain of discourse, so that a precise set of object types can be established, and many of the ambiguities that come up in more general natural language processing can be ignored or constrained away;
2. even within the restricted domain of discourse, a natural language interface system only needs to recognize a limited subset of all the things that could be said -- the subset that its back-end can respond to.
The most commonly used technique to exploit these limited domain constraints is semantic grammar [1, 2, 9], in which semantically defined categories (such as <ship> or <ship-attribute>) are used in a grammar (usually ATN-based) in place of syntactic categories (such as <noun> or <adjective>). While semantic grammar has been very successful in exploiting limited domain constraints to reduce ambiguities and eliminate spurious parses of grammatical input, it still suffers from the fragility in the face of extragrammatical input characteristic of parsing based on transition nets [4]. Also, the task of restricted-domain language definition is typically difficult in interfaces based on semantic grammar, in part because the grammar definition formalism is not well integrated with the method of defining the objects and actions of the domain of discourse (though see [6]).
1This research was sponsored by the Air Force Office of Scientific Research under Contract AFOSR-82-0219.
This paper proposes an alternative approach to restricted-domain language recognition called entity-oriented parsing. Entity-oriented parsing uses the same notion of semantically-defined categories as semantic grammar, but does not embed these categories in a grammatical structure designed for syntactic recognition. Instead, a scheme more reminiscent of conceptual or case-frame parsers [3, 10, 11] is employed. An entity-oriented parser operates from a collection of definitions of the various entities (objects, events, commands, states, etc.) that a particular interface system needs to recognize. These definitions contain information about the internal structure of the entities, about the way the entities will be manifested in the natural language input, and about the correspondence between the internal structure and surface representation. This arrangement provides a good framework for exploiting the simplifications possible in restricted-domain natural language recognition because:
1. the entities form a natural set of types through which to constrain the recognition semantically. The types also form a natural basis for the structural definitions of entities.
2. the set of things that the back-end can respond to corresponds to a subset of the domain entities (remember that entities can be events or commands as well as objects). So the goal of an entity-oriented system will normally be to recognize one of a "top-level" class of entities. This is analogous to the set of basic message patterns that the machine translation system of Wilks [11] aimed to recognize in any input.
In addition to providing a good general basis for restricted-domain natural language recognition, we claim that the entity-oriented approach also facilitates robustness in the face of extragrammatical input and ease of language definition for restricted-domain languages. Entity-oriented parsing has the potential to provide better parsing robustness than more traditional semantic grammar techniques for two major reasons:
• The individual definition of all domain entities facilitates their independent recognition. Assuming there is appropriate indexing of entities through lexical atoms that might appear in a surface description of them, this recognition can be done bottom-up, thus making possible recognition of elliptical, fragmentary, or partially incomprehensible input. The same definitions can also be used in a more efficient top-down fashion when the input conforms to the system's expectations.
• Recent work [5, 8] has suggested the usefulness of multiple construction-specific recognition strategies for restricted-domain parsing, particularly for dealing with extragrammatical input. The individual entity definitions form an ideal framework around which to organize the multiple strategies. In particular, each definition can specify which strategies are applicable to recognizing it. Of course, this only provides a framework for robust recognition; the robustness achieved still depends on the quality of the actual recognition strategies used.
The advantages of entity-oriented parsing for language definition include:
• All information relating to an entity is grouped in one place, so that a language definer will be able to see more clearly whether a definition is complete and what would be the consequences of any addition or change to the definition.
• Since surface (syntactic) and structural information about an entity is grouped together, the surface information can refer to the structure in a clear and coherent way. In particular, this allows hierarchical surface information to use the natural hierarchy defined by the structural information, leading to greater consistency of coverage in the surface language.
• Since entity definitions are independent, the information necessary to drive recognition by the multiple construction-specific strategies mentioned above can be represented directly in the form most useful to each strategy, thus removing the need for any kind of "grammar compilation" step and allowing more rapid grammar development.
In the remainder of the paper, we make these arguments more concrete by looking at some fragments of an entity-oriented language definition, by outlining the control structure of a robust restricted-domain parser driven by such definitions, and by tracing through some worked examples of the parser in operation. These examples also describe some specific parsing strategies that exploit the control structure. A parser incorporating the control structure and the parsing strategies is currently under implementation. Its design embodies our experience with a pilot entity-oriented parser that has already been implemented, but is not described here.
2. Example Entity Definitions
This section presents some example entity and language definitions suitable for use in entity-oriented parsing. The examples are drawn from the domain of an interface to a database of college courses. Here is the (partial) definition of a course.
[EntityName: CollegeCourse
 Type: Structured
 Components: (
  [ComponentName: CourseNumber
   Type: Integer
   GreaterThan: 99
   LessThan: 1000
  ]
  [ComponentName: CourseDepartment
   Type: CollegeDepartment
  ]
  [ComponentName: CourseClass
   Type: CollegeClass
  ]
  [ComponentName: CourseInstructor
   Type: CollegeProfessor
  ]
 )
 SurfaceRepresentation:
  [SyntaxType: NounPhrase
   Head: (course | seminar |
          $CourseDepartment $CourseNumber | ...)
   AdjectivalComponents: (CourseDepartment ...)
   Adjectives: (
    [AdjectivalPhrase: (new | most recent)
     Component: CourseSemester
     Value: CurrentSemester
    ]
    ...
   )
   PostNominalCases: (
    [Preposition: (?intended for | directed to | ...)
     Component: CourseClass
    ]
    [Preposition: (?taught by | ...)
     Component: CourseInstructor
    ]
   )
  ]
]
For reasons of space, we cannot explain all the details of this language. In essence, a course is defined as a structured object with components: number, department, instructor, etc. (square brackets denote attribute/value lists, and round brackets ordinary lists). This definition is kept separate from the surface representation of a course, which is defined to be a noun phrase with adjectives, postnominal cases, etc. At a more detailed level, note that the special-purpose way of specifying a course by its department juxtaposed with its number (e.g. Computer Science 101) is handled by an alternate pattern for the head of the noun phrase (dollar signs refer back to the components). This allows the user to say (redundantly) phrases like "CS 101 taught by Smith". Note also that the way the department of a course can appear in the surface representation of a course is specified in terms of the CourseDepartment component (and hence in terms of its type, CollegeDepartment) rather than directly as an explicit surface representation. This ensures consistency throughout the language in what will be recognized as a description of a department. Coupled with the ability to use general syntactic descriptors (like NounPhrase in the description of a SurfaceRepresentation), this can prevent the kind of patchy coverage prevalent with standard semantic grammar language definitions.
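To make the shape of such definitions concrete, the CollegeCourse listing above could be held by a parser as a nested data structure. The following Python sketch is our own illustrative encoding, not the paper's implemented formalism: square-bracketed attribute/value lists become dicts and round-bracketed lists become Python lists; the names are taken from the listing.

```python
# Illustrative sketch (an assumption for exposition): the CollegeCourse
# definition rendered as nested Python dicts and lists.
COLLEGE_COURSE = {
    "EntityName": "CollegeCourse",
    "Type": "Structured",
    "Components": [
        {"ComponentName": "CourseNumber", "Type": "Integer",
         "GreaterThan": 99, "LessThan": 1000},
        {"ComponentName": "CourseDepartment", "Type": "CollegeDepartment"},
        {"ComponentName": "CourseClass", "Type": "CollegeClass"},
        {"ComponentName": "CourseInstructor", "Type": "CollegeProfessor"},
    ],
    "SurfaceRepresentation": {
        "SyntaxType": "NounPhrase",
        "Head": ["course", "seminar", "$CourseDepartment $CourseNumber"],
        "AdjectivalComponents": ["CourseDepartment"],
        "PostNominalCases": [
            {"Preposition": ["intended for", "directed to"],
             "Component": "CourseClass"},
            {"Preposition": ["taught by"],
             "Component": "CourseInstructor"},
        ],
    },
}

def component_type(entity, name):
    """Look up the type of a named component, e.g. so that a surface
    case can refer back to a structural component by name."""
    for c in entity["Components"]:
        if c["ComponentName"] == name:
            return c["Type"]
    return None
```

Because the surface cases mention components only by name, a lookup like `component_type(COLLEGE_COURSE, "CourseDepartment")` is all that is needed to tie a postnominal case back to the structural definition and its type.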
Subsidiary objects like CollegeDepartment are defined in similar 
fashion. 
[EntityName: CollegeDepartment
 Type: Enumeration
 EnumeratedValues: (
  ComputerScienceDepartment
  MathematicsDepartment
  HistoryDepartment
  ...
 )
 SurfaceRepresentation:
  [SyntaxType: PatternSet
   Patterns: (
    [Pattern: (CS | Computer Science | Comp Sci | ...)
     Value: ComputerScienceDepartment
    ]
    ...
   )
  ]
]
CollegeCourse will also be involved in higher-level entities of our restricted domain, such as a command to the database system to enrol a student in a course.
[EntityName: EnrolCommand
 Type: Structured
 Components: (
  [ComponentName: Enrollee
   Type: CollegeStudent
  ]
  [ComponentName: EnrolIn
   Type: CollegeCourse
  ]
 )
 SurfaceRepresentation:
  [SyntaxType: ImperativeCaseFrame
   Head: (enrol | register | include | ...)
   DirectObject: ($Enrollee)
   Cases: (
    [Preposition: (in | into | ...)
     Component: EnrolIn
    ]
   )
  ]
]
These examples also show how all information about an entity, concerning both fundamental structure and surface representation, is grouped together and integrated. This supports the claim that entity-oriented language definition makes it easier to determine whether a language definition is complete.
3. Control Structure for a Robust Entity-Oriented Parser
The potential advantages of an entity-oriented approach from the point of view of robustness in the face of ungrammatical input were outlined in the introduction. To exploit this potential while maintaining efficiency in parsing grammatical input, special attention must be paid to the control structure of the parser used. Desirable characteristics for the control structure of any parser capable of handling ungrammatical as well as grammatical input include:
• the control structure allows grammatical input to be parsed straightforwardly without considering any of the possible grammatical deviations that could occur;
• the control structure enables progressively higher degrees of grammatical deviation to be considered when the input does not satisfy grammatical expectations;
• the control structure allows simpler deviations to be considered before more complex deviations.
The first two points are self-evident, but the third may require some explanation. The problem it addresses arises particularly when there are several alternative parses under consideration. In such cases, it is important to prevent the parser from considering drastic deviations in one branch of the parse before considering simple ones in the other. For instance, the parser should not start hypothesizing missing words in one branch when a simple spelling correction in another branch would allow the parse to go through.
We have designed a parser control structure for use in entity-oriented parsing which has all of the characteristics listed above. This control structure operates through an agenda mechanism. Each item on the agenda represents a different continuation of the parse, i.e. a partial parse plus a specification of what to do next to continue that partial parse. With each continuation is associated an integer flexibility level that represents the degree of grammatical deviation implied by the continuation. That is, the flexibility level represents the degree of grammatical deviation in the input if the continuation were to produce a complete parse without finding any more deviation. Continuations with a lower flexibility level are run before continuations with a higher flexibility level.
Once a complete parse has been obtained, continuations with a
flexibility level higher than that of the continuation which resulted 
in the parse are abandoned. This means that the agenda 
mechanism never activates any continuations with a flexibility 
level higher than the level representing the lowest level of 
grammatical deviation necessary to account for the input. Thus 
effort is not wasted exploring more exotic grammatical deviations 
when the input can be accounted for by simpler ones. This shows 
that the parser has the first two of the characteristics listed above. 
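The agenda discipline just described can be sketched in a few lines. The following Python fragment is our own reconstruction for exposition, not the parser's actual (Lisp) implementation: continuations are run in order of flexibility level, and once a complete parse is found, continuations at strictly higher levels are abandoned.

```python
import heapq

class Agenda:
    """Sketch of the agenda mechanism: continuations run in order of
    flexibility level; after a complete parse, strictly more deviant
    continuations are abandoned (equal-level alternatives still run)."""
    def __init__(self):
        self._heap = []
        self._seq = 0                 # tie-breaker: preserve insertion order
        self.best_parse_level = None  # level of the first complete parse

    def add(self, level, continuation):
        # continuation: a zero-argument callable returning a parse or None
        heapq.heappush(self._heap, (level, self._seq, continuation))
        self._seq += 1

    def run(self):
        result = None
        while self._heap:
            level, _, cont = heapq.heappop(self._heap)
            if self.best_parse_level is not None and level > self.best_parse_level:
                break                 # more deviant than necessary: abandon
            parse = cont()
            if parse is not None and result is None:
                result = parse
                self.best_parse_level = level
        return result
```

Under this discipline, a grammatical parse found at flexibility level 0 prevents continuations that hypothesize deviations (level greater than 0) from ever being activated.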
In addition to taking care of alternatives at different flexibility 
levels, this control structure also handles the more usual kind of 
alternatives faced by parsers -- those representing alternative 
parses due to local ambiguity in the input. Whenever such an 
ambiguity arises, the control structure duplicates the relevant 
continuation as many times as there are ambiguous alternatives, 
giving each of the duplicated continuations the same flexibility 
level. From there on, the same agenda mechanism used for the 
various flexibility levels will keep each of the ambiguous 
alternatives separate and ensure that all are investigated (as long 
as their flexibility level is not too high). Integrating the treatment of 
the normal kind of ambiguities with the treatment of alternative 
ways of handling grammatical deviations ensures that the level of 
grammatical deviation under consideration can be kept the same 
in locally ambiguous branches of a parse. This fulfills the third
characteristic listed above. 
Flexibility levels are additive, i.e. if some grammatical deviation has already been found in the input, then finding a new one will raise the flexibility level of the continuation concerned to the sum of the flexibility levels involved. This ensures a relatively high flexibility level, and thus a relatively low likelihood of activation, for continuations in which combinations of deviations are being postulated to account for the input.
Since space is limited, we cannot go into the implementation of 
this control structure. However, it is possible to give a brief 
description of the control structure primitives used in 
programming the parser. Recall first that the kind of entity- 
oriented parser we have been discussing consists of a collection 
of recognition strategies. The more specific strategies exploit the 
idiosyncratic features of the entities/construction types they are 
specific to, while the more general strategies apply to wider 
classes of entities and depend on more universal characteristics. In either case, the strategies are pieces of (Lisp) program rather
than more abstract rules or networks. Integration of such 
strategies with the general scheme of flexibility levels described 
above is made straightforward through a special split function 
which the control structure supports as a primitive. This split 
function allows the programmer of a strategy to specify one or 
more alternative continuations from any point in the strategy and 
to associate a different flexibility increment with each of them. 
The implementation of this statement takes care of restarting each 
of the alternative continuations at the appropriate time and with 
the appropriate local context. 
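In Python terms, the split primitive might be sketched as follows. This is an assumed interface for exposition only; the actual primitive is implemented over Lisp continuations, and the class and method names here are our own.

```python
import heapq
import itertools

class SplitQueue:
    """Sketch of the split primitive: a strategy posts one or more
    alternative continuations, each with its own flexibility increment
    added to the strategy's current level, so that nested splits
    accumulate their increments additively."""
    def __init__(self):
        self._heap = []
        self._count = itertools.count()   # tie-breaker for equal levels

    def split(self, current_level, *alternatives):
        # alternatives: (flexibility increment, zero-argument callable) pairs
        for increment, thunk in alternatives:
            heapq.heappush(self._heap,
                           (current_level + increment, next(self._count), thunk))

    def run_next(self):
        # run the least deviant pending alternative
        level, _, thunk = heapq.heappop(self._heap)
        return level, thunk()
```

For example, a lexical-lookup strategy at level 0 might split into an exact-match branch with increment 0 and a spelling-correction branch with a small positive increment; the exact-match branch is always tried first.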
Some examples should make this account of the control 
structure much clearer. The examples will also present some 
specific parsing strategies and show how they use the split 
function described above. These strategies are designed to effect 
robust recognition of extragrammatical input and efficient 
recognition of grammatical input by exploiting entity-oriented 
language definitions like those in the previous section. 
4. Example Parses 
Let us examine first how a simple database command like:
Enrol Susan Smith in CS 101
might be parsed with the control structure and language definitions presented in the two previous sections. We start off with the top-level parsing strategy, RecognizeAnyEntity. This strategy first tries to identify a top-level domain entity (in this case a database command) that might account for the entire input. It does this in a bottom-up manner by indexing from words in the input to those entities that they could appear in. In this case, the best indexer is the first word, 'enrol', which indexes EnrolCommand. In general, however, the best indexer need not be the first word of the input and we need to consider all words, thus raising the potential of indexing more than one entity. In our example, we would also index CollegeStudent, CollegeCourse, and CollegeDepartment. However, these are not top-level domain entities and are subsumed by EnrolCommand, and so can be ignored in favour of it.
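The bottom-up indexing step can be sketched as an inverted index from lexical atoms to the entities whose surface representations mention them. In the sketch below, the atom table is a small hypothetical fragment assumed for illustration; a real index would be derived from the entity definitions themselves.

```python
from collections import defaultdict

# Hypothetical fragment of an atom table derived from the surface
# representations of the entity definitions (assumed for illustration).
SURFACE_ATOMS = {
    "EnrolCommand": ["enrol", "register", "include", "in", "into"],
    "CollegeCourse": ["course", "seminar"],
    "CollegeDepartment": ["cs", "computer", "science"],
}

def build_index(atom_table):
    """Invert the table: lexical atom -> set of entities it indexes."""
    index = defaultdict(set)
    for entity, atoms in atom_table.items():
        for atom in atoms:
            index[atom].add(entity)
    return index

def index_entities(words, index):
    """Return every entity indexed by some input word."""
    hits = set()
    for w in words:
        hits |= index.get(w.lower(), set())
    return hits
```

Running `index_entities("Enrol Susan Smith in CS 101".split(), build_index(SURFACE_ATOMS))` yields both EnrolCommand (via 'enrol' and 'in') and CollegeDepartment (via 'CS'); the subsumption check then discards the latter in favour of the top-level command.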
Once EnrolCommand has been identified as an entity that might account for the input, RecognizeAnyEntity initiates an attempt to recognize it. Since EnrolCommand is listed as an imperative case frame, this task is handled by the ImperativeCaseFrame recognizer strategy. In contrast to the bottom-up approach of RecognizeAnyEntity, this strategy tackles its more specific task in a top-down manner using the case frame recognition algorithm developed for the CASPAR parser [8]. In particular, the strategy will match the case frame header and the preposition 'in', and initiate recognitions of fillers of its direct object case and its case marked by 'in'. These subgoals are to recognize a CollegeStudent to fill the Enrollee case on the input segment "Susan Smith" and a CollegeCourse to fill the EnrolIn case on the segment "CS 101". Both of these recognitions will be successful, hence causing the ImperativeCaseFrame recognizer to succeed and hence the entire recognition. The resulting parse would be:
[InstanceOf: EnrolCommand
 Enrollee: [InstanceOf: CollegeStudent
            FirstNames: (Susan)
            Surname: Smith
           ]
 EnrolIn: [InstanceOf: CollegeCourse
           CourseDepartment: ComputerScienceDepartment
           CourseNumber: 101
          ]
]
Note how this parse result is expressed in terms of the underlying 
structural representation used in the entity definitions without the 
need for a separate semantic interpretation step. 
The last example was completely grammatical and so did not require any flexibility. After an initial bottom-up step to find a dominant entity, that entity was recognized in a highly efficient top-down manner. For an example involving input that is ungrammatical (as far as the parser is concerned), consider:
Place Susan Smith in computer science for freshmen
There are two problems here: we assume that the user intended 'place' as a synonym for 'enrol', but that it happens not to be in the system's vocabulary; the user has also shortened the grammatically acceptable phrase, 'the computer science course for freshmen', to an equivalent phrase not covered by the surface representation for CollegeCourse as defined earlier. Since 'place' is not a synonym for 'enrol' in the language as presently defined, the RecognizeAnyEntity strategy cannot index EnrolCommand from it and hence cannot (as it did in the previous example) initiate a top-down recognition of the entire input.
To deal with such eventualities, RecognizeAnyEntity executes a 
split statement specifying two continuations immediately after it 
has found all the entities indexed by the input. The first 
continuation has a zero flexibility level increment. It looks at the 
indexed entities to see if one subsumes all the others. If it finds 
one, it attempts a top-down recognition as described in the 
previous example. If it cannot find one, or if it does and the top- 
down recognition fails, then the continuation itself fails. The 
second continuation has a positive flexibility increment and 
follows a more robust bottom-up approach described below. This 
second continuation was established in the previous example too, 
but was never activated since a complete parse was found at the 
zero flexibility level. So we did not mention it. In the present 
example, the first continuation fails since there is no subsuming 
entity, and so the second continuation gets a chance to run. 
Instead of insisting on identifying a single top-level entity, this
second continuation attempts to recognize all of the entities that 
are indexed in the hope of later being able to piece together the 
various fragmentary recognitions that result. The entities directly 
indexed are CollegeStudent by "Susan" and "Smith", 2 
CollegeDepartment by "computer" and "science", and 
CollegeClass by "freshmen". So a top-down attempt is made to 
recognize each of these entities. We can assume these goals are 
fulfilled by simple top-down strategies, appropriate to the 
SurfaceRepresentation of the corresponding entities, and 
operating with no flexibility level increment. 
Having recognized the low-level fragments, the second 
continuation of RecognizeAnyEntity now attempts to unify them 
into larger fragments, with the ultimate goal of unifying them into a 
description of a single entity that spans the whole input. To do 
this, it takes adjacent fragments pairwise and looks for entities of 
which they are both components, and then tries to recognize the 
subsuming entity in the spanning segment. The two pairs here are 
CollegeStudent and CollegeDepartment (subsumed by 
CollegeStudent) and CollegeDepartment and CollegeClass 
(subsumed by CollegeCourse). 
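The pairwise step just described might be sketched as follows. The component table below is a hypothetical fragment assumed for illustration, and the sketch covers only the case where both fragments are components of a third entity; the case where one fragment directly subsumes the other (as with CollegeStudent and CollegeDepartment) would be handled separately.

```python
# Hypothetical component table (assumed for illustration):
# entity -> types of its structural components.
COMPONENT_TYPES = {
    "EnrolCommand": ["CollegeStudent", "CollegeCourse"],
    "CollegeCourse": ["CollegeDepartment", "CollegeClass",
                      "CollegeProfessor", "Integer"],
    "CollegeStudent": ["CollegeDepartment"],   # e.g. a student's major
}

def subsuming_entities(frag_a, frag_b, table=COMPONENT_TYPES):
    """Entities of which both fragment types are components."""
    return [e for e, comps in table.items()
            if frag_a in comps and frag_b in comps]

def unify_adjacent(fragments, table=COMPONENT_TYPES):
    """Pair adjacent fragments with their candidate subsuming entities."""
    return [(a, b, subsuming_entities(a, b, table))
            for a, b in zip(fragments, fragments[1:])]
```

For the fragment sequence CollegeStudent, CollegeDepartment, CollegeClass, the second adjacent pair yields CollegeCourse as its only candidate subsumer, which is then attempted top-down over the spanning input segment.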
To investigate the second of these pairings, RecognizeAnyEntity would try to recognize a CollegeCourse in the spanning segment 'computer science for freshmen' using an elevated level of flexibility. This goal would be handled, just like all recognitions of CollegeCourse, by the NominalCaseFrame recognizer. With no flexibility increment, this strategy fails because the head noun is missing. However, with another flexibility increment, the recognition can go through with the CollegeDepartment being treated as an adjective and the CollegeClass being treated as a postnominal case -- it has the right case marker, "for", and the adjective and post-nominal are in the right order. This successful fragment unification leaves two fragments to unify -- the old CollegeStudent and the newly derived CollegeCourse.
There are several ways of unifying a CollegeStudent and a CollegeCourse -- either could subsume the other, or they could form the parameters to one of three database modification commands: EnrolCommand, WithdrawCommand, and TransferCommand (with the obvious interpretations). Since the commands are higher-level entities than CollegeStudent and CollegeCourse, they would be preferred as top-level fragment unifiers. We can also rule out TransferCommand in favour of the first two because it requires two courses and we only have one. In addition, a recognition of EnrolCommand would succeed at a lower flexibility increment than WithdrawCommand,3 since the preposition 'in' that marks the CollegeCourse in the input is the correct marker of the EnrolIn case of EnrolCommand, but is not the appropriate marker for WithdrawFrom, the course-containing case of WithdrawCommand. Thus a fragment unification based on EnrolCommand would be preferred. Also, the alternate path of fragment amalgamation -- combining CollegeStudent and CollegeDepartment into CollegeStudent and then combining CollegeStudent and CollegeCourse -- that we left pending above cannot lead to a complete instantiation of a top-level database command. So RecognizeAnyEntity will be in a position to assume that the user really intended the EnrolCommand.
Since this recognition involved several significant assumptions, we would need to use focused interaction techniques [7] to present the interpretation to the user for approval before acting on it. Note that if the user does approve it, it should be possible (with further approval) to add 'place' to the vocabulary as a synonym for 'enrol', since 'place' was an unrecognized word in the surface position where 'enrol' should have been.
For a final example, let us examine an extragrammatical input that involves continuations at several different flexibility levels:
Transfer Smith from Compter Science 101 Economics 203
The problems here are that 'Computer' has been misspelt and the preposition 'to' is missing from before 'Economics'. The example is similar to the first one in that RecognizeAnyEntity is able to identify a top-level entity to be recognized top-down, in this case, TransferCommand. Like EnrolCommand, TransferCommand is an imperative case frame, and so the task of recognizing it is handled by the ImperativeCaseFrame strategy. This strategy can find the preposition 'from', and so can initiate the appropriate recognitions for fillers of the OutOfCourse and Student cases. The recognition for the student case succeeds without trouble, but the recognition for the OutOfCourse case requires a spelling correction.
2We assume we have a complete listing of students and so can index from their names.
Whenever a top-down parsing strategy fails to verify that an 
input word is in a specific lexical class, there is the possibility that 
the word that failed is a misspelling of a word that would have 
succeeded. In such cases, the lexical lookup mechanism 
executes a split statement. 4 A zero increment branch fails 
immediately, but a second branch with a small positive increment 
tries spelling correction against the words in the predicted lexical 
class. If the correction fails, this second branch fails, but if the 
correction succeeds, the branch succeeds also. In our example, 
the continuation involving the second branch of the lexical lookup 
is highest on the agenda after the primary branch has failed. In 
particular, it is higher than the second branch of 
RecognizeAnyEntity described in the previous example, since the 
flexibility level increment for spelling correction is small. This 
means that the lexical lookup is continued with a spelling 
correction, thus resolving the problem. Note also that since the 
spelling correction is only attempted within the context of 
recognizing a CollegeCourse -- the filler of OutOfCourse -- the 
target words are limited to course names. This means spelling 
correction is much more accurate and efficient than if correction 
were attempted against the whole dictionary. 
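Spelling correction against only the predicted lexical class can be sketched with a standard edit distance. The sketch below is our own illustration; the candidate list and the distance threshold are assumptions, not details given in the paper.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def correct_in_class(word, lexical_class, max_dist=2):
    """Return the closest word in the predicted lexical class, or None.
    Because only the handful of words legal at this point are tried,
    correction is far more accurate than against the whole dictionary."""
    best = min(lexical_class,
               key=lambda w: edit_distance(word.lower(), w.lower()))
    return best if edit_distance(word.lower(), best.lower()) <= max_dist else None
```

Against the hypothetical candidate list ["Computer", "Economics", "History"], the misspelling 'Compter' corrects to 'Computer' at distance 1, while an unrelated word exceeds the threshold and the branch fails.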
After the OutOfCourse and Student cases have been 
successfully filled, the ImperativeCaseFrame strategy can do no 
more without a flexibility level increment. But it has not filled all 
the required cases of TransferCommand, and it has not used up 
all the input it was given, so it splits and fails at the zero-level 
flexibility increment. However, in a continuation with a positive 
flexibility level increment, it is able to attempt recognition of cases 
without their marking prepositions. Assuming the sum of this increment and the spelling correction increment is still less than the increment associated with the second branch of RecognizeAnyEntity, this continuation would be the next one run.
In this continuation, the ImperativeCaseFrameRecognizer 
attempts to match unparsed segments of the input against unfilled 
cases. There is only one of each, and the resulting attempt to 
recognize 'Economics 203' as the filler of IntoCourse succeeds 
straightforwardly. Now all required cases are filled and all input is 
accounted for, so the ImperativeCaseFrame strategy and hence 
the whole parse succeeds with the correct result. 
For the example just presented, obtaining the ideal behaviour 
depends on careful choice of the flexibility level increments. 
There is a danger here that the performance of the parser as a 
whole will be dependent on iterative tuning of these increments, 
and may become unstable with even small changes in the 
increments. It is too early yet to say how easy it will be to manage 
this problem, but we plan to pay close attention to it as the parser 
comes into operation.
3This relatively fine distinction between EnrolCommand and WithdrawCommand, based on the appropriateness of the preposition 'in', is problematical in that it assumes that the flexibility level would be incremented in very fine-grained steps. If that was impractical, the final outcome of the parse would be ambiguous between an EnrolCommand and a WithdrawCommand, and the user would have to be asked to make the discrimination.
4If this causes too many splits, an alternative is only to do the split when the 
input word in question is not in the system's lexicon at all. 
5. Conclusion
Entity-oriented parsing has several advantages as a basis for language recognition in restricted-domain natural language interfaces. Like techniques based on semantic grammar, it exploits limited domain semantics through a set of domain-specific entity types. However, because of its suitability for fragmentary recognition and its ability to accommodate multiple construction-specific parsing strategies, it has the potential for greater robustness in the face of extragrammatical input than the usual semantic grammar techniques. In this way, it more closely resembles conceptual or case-frame parsing techniques. Moreover, entity-oriented parsing offers advantages in language definition because of the integration of structural and surface representation information and the ability to represent surface information in the form most convenient to drive construction-specific recognition strategies directly.
A pilot implementation of an entity-oriented parser has been completed and provides preliminary support for our claims. However, a more rigorous test of the entity-oriented approach must wait for the more complete implementation currently being undertaken. The agenda-style control structure we plan to use in this implementation is described above, along with some parsing strategies it will employ and some worked examples of the strategies and control structure in action.
Acknowledgements
The ideas in this paper benefited considerably from discussions with other members of the Multipar group at Carnegie-Mellon Computer Science Department, particularly Jaime Carbonell, Jill Fain, and Steve Minton. Steve Minton was a co-designer of the control structure presented above, and also found an efficient way to implement the split function described in connection with that control structure.
References 
1. Brown, J. S. and Burton, R. R. Multiple Representations of Knowledge for Tutorial Reasoning. In Representation and Understanding, Bobrow, D. G. and Collins, A., Ed., Academic Press, New York, 1975, pp. 311-349.
2. Burton, R. R. Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems. BBN Report 3453, Bolt, Beranek, and Newman, Inc., Cambridge, Mass., December, 1976.
3. Carbonell, J. G., Boggs, W. M., Mauldin, M. L., and Anick, P. G. The XCALIBUR Project: A Natural Language Interface to Expert Systems. Proc. Eighth Int. Jt. Conf. on Artificial Intelligence, Karlsruhe, August, 1983.
4. Carbonell, J. G. and Hayes, P. J. "Recovery Strategies for Parsing Extragrammatical Language." Computational Linguistics 10 (1984).
5. Carbonell, J. G. and Hayes, P. J. Robust Parsing Using Multiple Construction-Specific Strategies. In Natural Language Parsing Systems, L. Bolc, Ed., Springer-Verlag, 1984.
6. Grosz, B. J. TEAM: A Transportable Natural Language Interface System. Proc. Conf. on Applied Natural Language Processing, Santa Monica, February, 1983.
7. Hayes, P. J. A Construction Specific Approach to Focused Interaction in Flexible Parsing. Proc. of 19th Annual Meeting of the Assoc. for Comput. Ling., Stanford University, June, 1981, pp. 149-152.
8. Hayes, P. J. and Carbonell, J. G. Multi-Strategy Parsing and its Role in Robust Man-Machine Communication. Carnegie-Mellon University Computer Science Department, May, 1981.
9. Hendrix, G. G. Human Engineering for Applied Natural Language Processing. Proc. Fifth Int. Jt. Conf. on Artificial Intelligence, 1977, pp. 183-191.
10. Riesbeck, C. K. and Schank, R. C. Comprehension by Computer: Expectation-Based Analysis of Sentences in Context. Tech. Rept. 78, Computer Science Dept., Yale University, 1976.
11. Wilks, Y. A. Preference Semantics. In Formal Semantics of Natural Language, Keenan, E. L., Ed., Cambridge University Press, 1975.
