Parsing
W. A. Martin
Laboratory for Computer Science
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139
Looking at the Proceedings of last year's Annual Meeting, one sees that the session most closely paralleling this one was entitled Language Structure and Parsing. In a very nice presentation, Martin Kay was able to unite the papers of that session under a single theme. As he stated it,
"There has been a shift of emphasis away from highly structured systems of complex rules as the principal repository of information about the syntax of a language towards a view in which the responsibility is distributed among the lexicon, semantic parts of the linguistic description, and a cognitive or strategic component. Concomitantly, interest has shifted from algorithms for syntactic analysis and generation, in which the central structure and the exact sequence of events are paramount, to systems in which a heavier burden is carried by the data structure and in which the order of events is a matter of strategy."
This year, the papers of the session represent a greater diversity of research directions. The paper by Hayes, and the paper by Wilensky and Arens, are both examples of what Kay had in mind, but the paper by Church, with regard to the question of algorithms, is quite the opposite. He holds that once the full range of constraints describing people's processing behavior has been captured, the best parsing strategies will be rather straightforward, and easily explained as algorithms.
Perhaps the seven papers in this year's session can best be introduced by briefly citing some of the achievements and problems reported in the works they reference.
In the late 1960's Woods [Woods70] capped an effort by several people to develop ATN parsing. This well known technique applies a straightforward top-down, left-to-right, depth-first parsing algorithm to a syntactic grammar. Especially in the compiled form produced by Burton [Burton76a], the parser was able to produce the first parse in good time, but without semantic constraints, numerous syntactic analyses could be and sometimes were found, especially in sentences with conjunctions. A strength of the system was the ATN grammar, which can be described as a set of context-free production rules whose right-hand sides are finite state machines and whose transition arcs have been augmented with functions able to read and set registers, and also able to block a transition on their arc. Many people have found this a convenient formalism in which to develop grammars of English.
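The arc-plus-registers idea can be sketched in a few lines of modern code (a toy illustration, not Woods' implementation; the lexicon, state names, and register names are all invented):

```python
# A minimal sketch of the ATN idea described above: each state has ordered
# arcs; an arc carries a test that can block the transition, and an action
# that can set registers.

def cat_arc(category, to_state, action):
    """Arc that consumes one word of the given lexical category."""
    def try_arc(word, registers):
        if word in LEXICON.get(category, ()):   # test may block the transition
            action(registers, word)             # augmentation: set registers
            return to_state
        return None
    return try_arc

LEXICON = {
    "DET":  {"the", "a"},
    "NOUN": {"horse", "barn", "girl", "story"},
}

# A toy NP network: NP -> DET NOUN
NP_NETWORK = {
    "NP/start": [cat_arc("DET",  "NP/det",  lambda r, w: r.update(det=w))],
    "NP/det":   [cat_arc("NOUN", "NP/done", lambda r, w: r.update(head=w))],
}

def parse_np(words):
    """Depth-first, left-to-right traversal of the toy NP network."""
    state, registers = "NP/start", {}
    for word in words:
        for arc in NP_NETWORK[state]:           # arcs tried in order
            next_state = arc(word, registers)
            if next_state:
                state = next_state
                break
        else:
            return None                          # no arc applies: fail
    return registers if state == "NP/done" else None

print(parse_np(["the", "horse"]))   # {'det': 'the', 'head': 'horse'}
```

The ordered trying of arcs within each state is what the arc-ordering discussion below turns on.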
The Woods ATN parser was a great success and attempts were made to exploit it (a) as a model of human processing and (b) as a tool for writing grammars. At the same time it was recognized to have limitations. It wasn't tolerant of errors, and it couldn't handle unknown words or constructions (there were many syntactic constructions which it didn't know). In addition, the question answering system fed by the parser had a weak notion of word and phrase semantics and it was not always able to handle quantifiers properly. It is not clear these components could have supported a stronger interaction with syntactic parsing, had Woods chosen to attempt it.
On the success side, Kaplan [Kaplan72] was inspired to claim that the ATN parser provided a good model for some aspects of human processing. Some aspects which might be modeled are:
    Linguistic Phenomenon          ATN Computational Mechanism
    Preferred Readings of          Ordered Trying of
      Ambiguous Sentences            Alternative Arcs
    Garden Path Sentences          Back-tracking
    Perceived Complexity           Hold List Costing;
      Differences                    Counting Total Transitions
    Center Embedding Bounds        None
In one study, most people got the a) reading of 1). One can try to explain this

1) They told the girl that Bill liked the story.
1a) They told the girl [that [Bill liked the story]S ].
1b) They told [the girl that Bill liked]NP the story.
by ordering the arcs leaving the state where the head noun of an NP has been accepted: a pop arc (terminating the NP) is tried before an arc accepting a modifying relative clause. However, Rich [Rich75] points out that this arc ordering solution would seem to have difficulties with 2). This sentence is often not perceived

2) They told the girl that Bill liked that he would be at the football game.

as requiring backup, yet if the arcs are ordered as for 1), it does require backup. There is no doubt that whatever is going on, the awareness of backup in 3) is so much stronger than in 2) that it seems like a different phenomenon. To resolve this,

3) The horse raced past the barn fell.

one could claim that perceived backup is some function of the length of the actual backup, or maybe of the degree of commitment to the original path (although it isn't clear what this would mean in ATN terms).
In this session, Ferrari and Stock will turn the arc ordering game around and describe, for actual texts, the probability that a given arc is the correct exit arc from a node, given the arc by which the parser arrived at the node. It will be interesting to look at their distributions. In the speech project at IBM Watson Laboratories [Baker75] it was discovered some time ago that, for a given text, the syntactic class of a word could be predicted correctly over 90% of the time given only the syntactic class of the preceding word. Interestingly, the correctness of predictions fell off less than 10% when only the current word was used. One wonders if this same level of skewness holds across texts, or (what we will hear) for the continuation of phrases. These results should be helpful in discussing the whole issue of arc ordering.
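The Baker-style statistic is easy to state computationally (a toy sketch; the tagged corpus here is invented for illustration, and real estimates would of course come from a large text):

```python
# Predict a word's syntactic class from the class of the preceding word
# alone, using class-bigram counts from a (tiny, invented) tagged corpus.
from collections import Counter, defaultdict

tagged_text = [  # (word, syntactic class) pairs
    ("the", "DET"), ("horse", "NOUN"), ("raced", "VERB"),
    ("past", "PREP"), ("the", "DET"), ("barn", "NOUN"),
]

# Count class bigrams: how often class c2 follows class c1.
follows = defaultdict(Counter)
for (_, c1), (_, c2) in zip(tagged_text, tagged_text[1:]):
    follows[c1][c2] += 1

def predict_class(previous_class):
    """Most likely class given only the class of the preceding word."""
    counts = follows[previous_class]
    return counts.most_common(1)[0][0] if counts else None

print(predict_class("DET"))   # NOUN
```

With real text, scoring such predictions against the annotated classes gives exactly the over-90% figure quoted above.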
Implicit in any arc ordering strategy is the assumption that not all parses of a sentence will be found. Following the "best" path, the parser will stop when it gets an acceptable analysis. Arc ordering helps find that "best" path. Marcus [Marcus78] agreed with the idea of following only a best path, but he claimed that the reason there is no perceived backup in 2) is that the human parser is able to look ahead a few constituents instead of just one state and one constituent in making a transition. He claims this makes a more accurate model of human garden path behavior, but it doesn't address the issue of unlimited stack depth. Here, Church will describe a parser similar in design to Marcus', except that it conserves memory. This allows Church to address psychological facts not addressed by either Marcus or the ATN models. Church claims that exploiting stack size constraints will increase the chances of building a good best path parser.
Besides psychological modeling, there is also an interest in using the ATN formalism for writing and teaching grammars. Paramount here is explanation, both of the grammar and its application to a particular sentence. The paper by Kehler and Woods reports on this. Weischedel picks a particular problem, responding to an input which the ATN can't handle. He associates a list of diagnostic conditions and actions with each state. When no parse is found, the parser finds the last state on the path which progressed the furthest through the input string and executes its diagnostic conditions and actions. When a parser uses only syntactic constraints, one expects it to find a lot of parses. Usually the number of parses grows more than linearly with sentence length. Thus, for a fairly complete grammar and moderate to long sentences, one would expect that the case of no parses (handled by Weischedel) would be rare in comparison with the other two cases (not handled) where the set of parses doesn't include the correct one, or where the grammar has been mistakenly written to allow undesired parses. Success of the above efforts to follow only the best path would clearly be relevant here. No doubt Weischedel's procedure can help find a lot of bugs if the test examples are chosen with a little care. But there is still interesting work to be done on grammar and parser explanation, and Weischedel is one of those who intends to explore it.
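The diagnostic scheme just described can be sketched roughly as follows (a sketch only; the states, conditions, and messages are invented, not Weischedel's actual ones):

```python
# Each state carries (condition, action) pairs; when no parse is found, the
# state on the path that got furthest through the input runs its diagnostics.

def diagnose(failed_paths, diagnostics, words):
    """failed_paths: (state, input position reached) for each abandoned path."""
    state, position = max(failed_paths, key=lambda p: p[1])
    for condition, action in diagnostics.get(state, []):
        if condition(words, position):
            return action(words, position)
    return "no diagnosis"

diagnostics = {
    "S/verb": [
        (lambda w, i: i >= len(w),
         lambda w, i: "input ended where a verb was expected"),
        (lambda w, i: True,
         lambda w, i: f"unexpected word {w[i]!r} where a verb was expected"),
    ],
}

words = ["the", "horse", "barn"]
failed = [("NP/start", 1), ("S/verb", 2)]
print(diagnose(failed, diagnostics, words))
# unexpected word 'barn' where a verb was expected
```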
The remaining three papers stem from three separate traditions which reject the strict syntactic ATN formalism, each for its own reasons. They are:

i) Semantic Grammars -- the Davidson and Kaplan paper
ii) Semantic Structure Driven Parsing -- the Wilensky and Arens paper
iii) Multiple Knowledge Source Parsing -- the Hayes paper

Each of these systems claims some advantage over the more widely known and accepted ATN.
The semantic grammar parser can be viewed as a variation of the ATN which attempts to cope with the ATN's lack of semantics. Kaplan's work builds on work started by Burton [Burton76b] and picked up by Hendrix et al. [Hendrix78]. The semantic grammar parser uses semantic instead of syntactic arc categories. This collapses syntax and semantics into a single structure. When an ATN parsing strategy is used the result is actually less flexible than a syntactic ATN, but it is faster because syntactic possibilities are eliminated by the semantics of the domain. The strategy is justified in terms of the performance of actual running systems. Kaplan also calls on a speed criterion in suggesting that when an unknown word is encountered the system assume all possibilities which will let parsing proceed. Then if more than one possibility leads to a successful parse, the system should attempt to resolve the word further by file search or user query.
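Kaplan's unknown-word tactic can be sketched as follows (a toy illustration; the lexicon, categories, and parser stub are invented):

```python
# Give an unknown word every category that lets the parse proceed, and
# resolve it further only if more than one assumption survives.

LEXICON = {"ships": {"NOUN", "VERB"}, "the": {"DET"}, "old": {"ADJ", "NOUN"}}
ALL_CATEGORIES = {"DET", "ADJ", "NOUN", "VERB"}

def categories(word):
    """Known words keep their lexicon entry; unknown words get every category."""
    return LEXICON.get(word, ALL_CATEGORIES)

def surviving_categories(word, parses_with):
    """Categories under which the parse succeeds; parses_with(word, category)
    stands in for running the real parser under that assumption. If more than
    one survives, the system would go on to file search or a user query."""
    return {c for c in categories(word) if parses_with(word, c)}

# Toy stand-in parser: only NOUN and VERB readings yield a full parse here.
ok = lambda w, c: c in {"NOUN", "VERB"}
print(sorted(surviving_categories("frobs", ok)))   # ['NOUN', 'VERB']
```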
As Kaplan points out, this trick is not limited to semantic grammars, but only to systems having enough constraints. It would be interesting to know how well it would work for systems using Osherson's [Osherson78] predicability criterion, instead of truth, for their semantics. Osherson distinguishes between "green idea", which he says is silly, and "married bachelor", which he says is just false. He notes that "idea is not green" is no better, but "bachelor is not married" is fine. Predicability is a looser constraint than Kaplan uses, and if it would still be enough to limit database search this would be interesting, because predicability is easier to implement across a broad domain.
Wilensky is a former student of Schank's and thus comes from a tradition which emphasizes semantics over syntax. He is right in emphasizing the importance of phrase semantics. The grammarians Quirk and Greenbaum [Quirk73] point out the syntactic and semantic importance of verb phrases over verbs. In linguistics, Bresnan [Bresnan80] is developing a theory of lexical phrases which
accounts, by lexical relations between constituents of a phrase, for many of the phenomena explained by the old transformational grammar. For example, given

4) There were reported to have been lions sighted.

a typical ATN parser would attempt by register manipulations to make "lions" the subject. Using a phrase approach, "there be lions sighted" can be taken as meaning "exist lions sighted," where "lions" is an object and "sighted" an object complement. "There" is related to the "be" in "been" by a series of relationships between the arguments of semantic structures. Wilensky appears to have suppressed syntax into his semantic component, and so it will be interesting to see how he handles the traditional syntactic phenomena of 4), like passive and verb forms.
Finally, the paper by Hayes shows the influence of the speech recognition projects where bad input gave the Woods ATN great difficulty. Text input is much better than speech input. However, examination of actual input [Malhotra75] does show sentences like:

5) What would have profits have been?

Fortunately, these cases are rare. Much more likely is ellipsis and the omission of syntax when the semantics are clear. For example, the missing commas in

6) Give ratios of manufacturing costs to sales for plants 1 2 3 and 4 for 72 and 73.

Examples like these show that errors and omissions are not random phenomena and that there can be something to the study of errors and how to deal with them.
In summary, it can be seen that while much progress has been made in constructing usable parsers, the basic issues, such as the division of syntax, semantics, and pragmatics, both in representation and in order of processing, are still up for grabs. The problem has plenty of structure, so there is good fun to be had.
References 
[Baker75] Baker, J. K. "Stochastic Modeling for Automatic Speech Understanding," in Speech Recognition: Invited Papers of the IEEE Symposium, Reddy, D. R. (Ed.), 1975.

[Bresnan80] Bresnan, Joan. "Polyadicity: Part I of a Theory of Lexical Rules and Representations," MIT Department of Linguistics (January 1980).

[Burton76a] Burton, Richard R. and Woods, William A. "A Compiling System for Augmented Transition Networks," COLING 76.

[Burton76b] Burton, Richard R. "Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems," BBN Report 3453, Bolt, Beranek, and Newman, Boston, Ma. (December 1976).

[Hendrix78] Hendrix, Gary G., Sacerdoti, E. D., Sagalowicz, D., and Slocum, J. "Developing a Natural Language Interface to Complex Data," ACM Trans. on Database Systems, vol. 3, no. 2 (June 1978), pp. 105-147.

[Kaplan72] Kaplan, Ronald M. "Augmented Transition Networks as Psychological Models of Sentence Comprehension," Artificial Intelligence, 3 (October 1972), pp. 77-100.

[Malhotra75] Malhotra, Ashok. "Design Criteria for a Knowledge-Based English Language System for Management: An Experimental Analysis," MIT/LCS/TR-146, MIT Laboratory for Computer Science, Cambridge, Ma. (February 1975).

[Marcus78] Marcus, Mitchell. "A Theory of Syntactic Recognition for Natural Languages," Ph.D. thesis, MIT Dept. of Electrical Engineering and Computer Science, Cambridge, Ma. (to be published by MIT Press).

[Osherson78] Osherson, Daniel N. "Three Conditions on Conceptual Naturalness," Cognition, 6 (1978), pp. 263-289.

[Quirk73] Quirk, R. and Greenbaum, S. A Concise Grammar of Contemporary English, Harcourt Brace Jovanovich, New York (1973).

[Rich75] Rich, Charles. "On the Psychological Reality of Augmented Transition Network Models of Sentence Comprehension," unpublished paper, MIT Artificial Intelligence Laboratory, Cambridge, Ma. (July 1975).

[Woods70] Woods, William A. "Transition Network Grammars for Natural Language Analysis," CACM 13, 10 (October 1970), pp. 591-602.

