A MULTILEVEL APPROACH TO HANDLE 
NON-STANDARD INPUT 
Manfred Gehrke 
Project "Prozedurale Dialogmodelle" * 
Department of Linguistics and Literature 
University of Bielefeld 
P.O.Box 8640, D-4800 Bielefeld 1
"da kommen sie doch ungefaehr
ganz bestimmt hin."
from one of our dialogues 
ABSTRACT 
In the project "Procedural Dialogue
Models", carried out at the University of
Bielefeld, we have developed an
incremental multilevel parsing formalism
to reconstruct task-oriented dialogues. A
major difficulty we have had to overcome
is that the dialogues are real ones with
numerous ungrammatical utterances. The
approach we have devised to cope with this
problem is reported here.
I THE INCREMENTAL, MULTILEVEL PARSING 
FORMALISM 
In recent NLU systems great importance
is attached to processing non-standard
input.1) The present paper reports on the
experience we have gained in the project
"Procedural Dialogue Models" in
reconstructing task-oriented dialogues
that were uttered in rather colloquial
German.2) To this end we have developed an
incremental multilevel parsing formalism
(Christaller/Metzing 82, Gehrke 82,
Gehrke 83), based on an extension of the
concept of cascaded ATNs (Woods 80). This
formalism (see fig. A) organizes the
interaction of several independent
processing components, in our case five.
The processing components need not be
ATNs; it is up to the user of the
formalism to choose the tool that suits
her/him best for the specific task.
* The project is funded by the Deutsche 
Forschungsgemeinschaft. 
1) See e.g. session VIII in ACL 82,
Carbonell 83, Kwasny 80,
Sondheimer/Weischedel 80; for handling of
ellipsis see Weischedel/Sondheimer 82,
Wahlster et al. 83.
2) The dialogues that we are working with 
were recorded in the City of Frankfurt/ 
Main (Klein 79). 
The first level, an ATN, is responsible
for the syntactic analysis. Its main
purpose is to detect phrases as well as
wh- and imperative structures and to
determine the syntactic status a phrase
may have in the utterance. On this level
the analysis of an utterance can reach a
permissible final state even if no
complete sentence structure has been
derived. The decision whether such a state
is permissible or not is made on the
pragmatic level.
The semantic interpretation is carried
out by a case-oriented production rule
system. In keeping with the incremental
manner of processing there are two
definitions of case slots:
1. a general one for a tentative
categorization of phrases before the main
verb is detected, and
2. a specific one, connected with the
respective verb frame.
This double definition of case slots en- 
ables the parsing formalism to make a 
minimal interpretation of parts of the 
utterance in the case of a missing verb 
and thus gives suggestions for filling 
this gap. 
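
The two kinds of case-slot definitions can be pictured with a small sketch. Everything in it is an assumption made for illustration: the marker table, the frame given for "gehen" and the function names are hypothetical and are not taken from the production rule system of the project.

```python
# Illustrative sketch only: tables and names are assumptions,
# not the rules of the actual SEMANTIC-component.

# 1. General definition: a tentative slot guessed from a case marker
#    (here only the preposition) before any main verb is known.
GENERAL_SLOTS = {
    "zu": "GOAL", "zum": "GOAL", "zur": "GOAL", "bis": "GOAL",
    "durch": "PATH",
    "von": "SOURCE",
}

# 2. Specific definition: slots attached to the frame of a verb.
#    Which slots are obligatory for "gehen" is assumed here.
VERB_FRAMES = {
    "gehen": {"obligatory": {"AGENT"},
              "optional": {"SOURCE", "PATH", "GOAL"}},
}


def tentative_slot(phrase):
    """Guess a case slot from the first word of a phrase, or return None."""
    words = phrase.lower().split()
    return GENERAL_SLOTS.get(words[0]) if words else None


def fits_frame(slot, verb):
    """Check a tentative slot against the frame of a detected verb."""
    frame = VERB_FRAMES[verb]
    return slot in frame["obligatory"] or slot in frame["optional"]
```

With these tables, `tentative_slot("zum Kaufhof")` yields a GOAL candidate even though no verb has been seen, and `fits_frame` can confirm that candidate later against a verb frame.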
The QUESTION-ANSWER-INTERACTION-component
is an ATN. It has to categorize an
utterance as a question, as a part of an
answer, or as one of the
communication-maintaining categories such
as assurance, confirmation etc. This
component is also responsible for
recognizing a dialogue within a dialogue,
e.g. when some clarification of the
enclosing dialogue takes place.
Finally the TASK-COMMUNICATION-component
is itself a two-level cascade. One stage,
the TASK-INTERACTION-component, provides
the formalism with a dialogue scheme that
presumably is applicable to most types of
information-giving dialogues. The other
stage, the TASK-SPECIFICATION-component,
is responsible for the task-specific
categorization, in this case direction
giving with categories such as route
description or place description. We
divided this component into two stages,
both realized as ATNs,
1. in order to have a greater
modularization between different
components (processing other types of
task-oriented dialogues may only require
changing the TASK-SPECIFICATION-component
on the pragmatic level), and
2. because each level contributes one
category to the utterance or a part of
it, which avoids double categorizations
at one level.

[Fig. A: Architecture of the Formalism. The
utterance enters a cascade consisting of the
SYNTACTIC-, SEMANTIC-,
QUESTION-ANSWER-INTERACTION-,
TASK-INTERACTION- and
TASK-SPECIFICATION-components, which are
connected to the addresser's KS, the
addressee's KS and the common KS. Arrow
types in the legend: transmit, read, resume
(transfer of control); write, get (transfer
of data into/out of KSs).]

The pragmatic components are supported
by knowledge sources (KS) that hold, for
each participant, her/his knowledge of the
world, of the partner and of the
task-dependent course of the dialogue. The
processing components exchange their
results via a common KS (a kind of
blackboard). Only control information is
transmitted by the cascade. The parsing
formalism is written in MacLISP and in
FLAVORS (di Primio/Christaller 83), an
object-oriented language embedded in
MacLISP.
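
The division of labour between the common KS and the cascade can be sketched as follows. This is a strongly simplified illustration, not the MacLISP/FLAVORS implementation: the class `CommonKS`, the two toy components and the signal name "transmit" are assumptions, and the incremental resumption of stages is left out.

```python
# Illustrative sketch only: component behaviour and all names are assumptions.

class CommonKS:
    """A minimal blackboard: components write results and get those of others."""

    def __init__(self):
        self._entries = {}

    def write(self, key, value):
        self._entries[key] = value

    def get(self, key, default=None):
        return self._entries.get(key, default)


def syntactic(ks):
    # Writes a (toy) phrase analysis to the blackboard;
    # only a control signal is returned to the cascade.
    words = ks.get("utterance").split()
    ks.write("phrases", [" ".join(words)])
    return "transmit"


def semantic(ks):
    # Gets the phrases from the blackboard instead of receiving them as data.
    phrases = ks.get("phrases", [])
    ks.write("slots", [(phrase, "UNKNOWN") for phrase in phrases])
    return "transmit"


def run_cascade(components, ks):
    """Pass control down the cascade until a stage does not transmit."""
    trace = []
    for component in components:
        signal = component(ks)
        trace.append((component.__name__, signal))
        if signal != "transmit":
            break
    return trace
```

The point of the design is that `run_cascade` never sees analysis results: it only records which stage handed control on, while all data live in the shared knowledge source.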
II THE DIALOGUE CORPUS
The dialogues that we are dealing with
are real task-oriented dialogues. The
majority of utterances in these dialogues
contain non-standard constructions or are
in some sense incomplete. There are
dialect words, word duplications,
self-corrections and interjections. On the
other hand they do not contain complicated
sentence structures such as subordinate
clauses, complex noun phrases, etc. The
translation of one of our dialogues (see
fig. B) may give an impression of these
non-standard features.
An extreme approach to the problem of
non-standard utterances would be, in our
case, to take the dialogues in the corpus,
just as they are, as the standard. But
this would only be an ad hoc solution,
lacking generality. Thus we burden the
pragmatic components with the decision
whether an utterance is acceptable or not.
X: Could you please tell me, how I can come to the old opera? to
Y: What?
X: the old opera
Y: to the old opera; straight ahead, yes. Come on, I show
X: yes, yes (10 sec. pause)
Y: it to you. ahead to the Kaufhof. To the
X: yes
Y: right there is the Kaufhof, isn't it? and there you stay on the
X: yes, the eh
Y: right side, straight on through the Fressgass' it is new
X: eh mhm
Y: it's just in a new shape, the Fressgass', yes then you will
X: thank you
Y: reach directly the opera square, that is the opera ruin.
X: very much.
Y:
Fig. B: a sample translation

III HANDLING OF NON-STANDARDS ON THE
WORD LEVEL
Dialect words are handled as words of
the standard language, i.e. they occur in
the lexicon. Duplication of words is
recognized during the read process, where
the current word is compared with its
predecessor. If they are identical and
belong to only one syntactic category,
then the next word is processed directly.
Otherwise a flag is set, stating that
there is possibly a duplication of words
to analyse. Such words are analysed as
usual, but the syntactic category of the
preceding word must not be used. This
condition may cause a new problem, namely
when a participial construction occurs
within a noun phrase, e.g. "die die
Strasse ueberquerende Frau" ("the woman
crossing the street"). Comparable to this
problem are constructions in English that
begin with "that that ...". Luckily such
constructions do not occur in our corpus,
but this problem has to be kept in mind.
If the analysis runs into an error, then
the status quo ante is reestablished and
the current word is discarded as a
duplication.
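
The duplication check described above can be sketched roughly as follows. The sketch rests on assumptions: the toy lexicon, the `analyse` callback standing in for the syntactic ATN, and the reading that the flag is set when a repeated word is ambiguous between several categories. It also inherits the limitation mentioned above for repeated words of the same category.

```python
# Illustrative sketch only: the lexicon and the analyse callback are assumptions.

LEXICON = {
    "ja": {"PARTICLE"},
    "die": {"DET", "RELPRON"},   # ambiguous word
    "oper": {"NOUN"},
}


def read_words(words, analyse):
    """Read words and drop duplications.

    analyse(history, word, allowed) is a hypothetical hook into the
    syntactic analysis; it returns the chosen category or None on error.
    """
    history = []  # (word, category) pairs accepted so far
    for word in words:
        allowed = set(LEXICON[word])
        duplicate = bool(history) and history[-1][0] == word
        if duplicate and len(allowed) == 1:
            continue  # unambiguous repetition: process the next word directly
        if duplicate:
            # Possible duplication: the category of the preceding
            # word must not be used again.
            allowed.discard(history[-1][1])
        snapshot = list(history)  # status quo ante
        category = analyse(history, word, allowed)
        if category is None:
            history = snapshot  # reestablish the earlier state
            if duplicate:
                continue  # discard the current word as a duplication
            raise ValueError(f"cannot analyse {word!r}")
        history.append((word, category))
    return history


def first_category(history, word, allowed):
    """Toy analysis: accept the alphabetically first allowed category."""
    return min(allowed) if allowed else None
```

Calling `read_words(["ja", "ja", "die", "oper"], first_category)` skips the second "ja" without any analysis, because "ja" has a single category in the toy lexicon.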
Cases of self-correction on the word
level, when a word is replaced by another
word of the same syntactic category or by
the same word with an altered inflection,
are recognized during the read process as
well. They can be treated in a similar
way, with the difference that the
preceding word is discarded and the
differing features of the current word are
taken. But no rule is without exceptions.
The rare case of two successive nouns,
e.g. in proper names (names of streets or
buildings), is captured in the lexicon,
while groups of prepositions or adverbs
are permissible.
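
A minimal sketch of this treatment of word-level self-corrections is given below. The category labels, the feature dictionaries and the set `REPEATABLE` are assumptions for illustration; proper names consisting of two nouns are assumed to arrive as a single lexicon entry and therefore as one token.

```python
# Illustrative sketch only: categories and feature names are assumptions.

# Categories whose members may legitimately follow one another.
REPEATABLE = {"PREP", "ADV"}


def merge_self_corrections(tagged):
    """Resolve self-corrections in a list of (word, category, features).

    A word directly followed by a word of the same category is treated
    as replaced by its successor, except for repeatable categories.
    """
    result = []
    for item in tagged:
        if result:
            previous_category = result[-1][1]
            if previous_category == item[1] and item[1] not in REPEATABLE:
                result.pop()  # discard the preceding word
        result.append(item)   # keep the current word with its features
    return result
```

For a hypothetical input such as "dem den Kaufhof", tagged as two determiners followed by a noun, only the second determiner and its case feature survive, while a group like "vor bis zum" is left untouched.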
IV HANDLING OF INCOMPLETE UTTERANCES 
To handle utterances that are in some
sense incomplete we have the great
advantage that they have been uttered in a
specific context. A linguistic analysis of
the dialogues shows furthermore that some
types of answers, especially route
descriptions and partial goal
determinations, tend to be elliptical. In
the cases mentioned the degree of ellipsis
ranges from omitting the optional SOURCE
case slot, through omitting the AGENT case
slot, up to uttering only a GOAL case
slot.
Due to the incremental manner of
parsing, the SEMANTIC-component is
triggered as soon as a partial analysis of
an utterance is obtained. There a phrase
is tentatively categorized, depending on
case markers (ending, preposition);
auxiliary verbs mark tense or mood, etc.
Some deictic adverbs such as "hier"
("here") could act as a SOURCE case slot
for MOVE-verbs. Categorized phrases are
sent to the
QUESTION-ANSWER-INTERACTION-component.
When the end of an utterance is
recognized (sentence markers; colons can
act as end markers too), the
SEMANTIC-component tests for completeness.
If a main verb and/or an obligatory case
slot is missing, a procedure is triggered
to fill this gap. This inference procedure
first inspects the current states of the
pragmatic components to gather information
as to which categories they expect next
and whether the partial analysis fits the
requirements of the respective category.
This information is then used by various
inference rules to determine the missing
verb or case slot.
Let us consider some examples:
1. "vor bis zum Kaufhof." ("ahead to the
Kaufhof")
Expectations of the pragmatic components:
QUESTION-ANSWER-INTERACTION-comp.: answer
TASK-INTERACTION-comp.: an act of
information-giving
TASK-SPECIFICATION-comp.: route
description, place description, partial
goal determination, goal declaration
SEMANTIC-comp.: "zum Kaufhof" is
categorized as a GOAL case slot.
The categories goal declaration and
place description can be discarded,
because their requirements are not met.
Since an explicit goal (building, street
connection etc.) is uttered, the
requirements of partial goal determination
are fulfilled first. This category
requires a verb of the field MOVE, e.g.
"gehen" ("to go"). The GOAL case slot
matches one of the requirements of the
verb, but an AGENT is still missing. Since
the utterance is part of a dialogue and is
directed from the person who is asked to
give a direction to the person who asked
for it, a reference to the latter person,
"sie" ("you"), is taken as AGENT.
2. "gradaus durch die Fressgass'"
("straight on through the Fressgass'")
The expectations of the pragmatic
components are the same as above. "durch
die Fressgass'" is categorized as a PATH
case slot. In this case a route
description is tested first, and again a
MOVE-verb is taken as a candidate for the
verb. The PATH case slot matches its
requirements, and the adverb "gradaus" is
a possible description of the manner of
moving. The AGENT case slot is found as
above.
3. Finally, a rather amusing example. One
of our dialogues starts with the following
sequence:
X: to the old opera?
Y: Yes?
Here Y must have recognized, presumably
by eye contact, that X wants to get
into contact with him. X's answer,
itself a question, is quite impolite
but understandable. Syntactically this
utterance is an elliptical question
(voice rising when uttered), and on the
semantic stage it can be categorized as
a GOAL case slot, depending on "zur"
and the fact that the NP refers to a
building. Since it occurs at the beginning
of a task-oriented dialogue with no
task fixed so far, it is categorized
as a goal declaration. A complete
version of this utterance may be
"How can I get to the old opera?"
Another possible interpretation may be
that X only wants to have her/his
assumption confirmed that she/he is on
the right way to her/his goal. In this
case a correct answer would have been
simply "yes". But which interpretation
holds true cannot be decided with
the available information.
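
The reasoning in the first two examples above can be sketched as follows. The rule table is an assumption distilled from those two examples only; the names `CATEGORY_RULES`, `DEFAULT_VERB` and `fill_gaps` are hypothetical and do not describe the inference rules of the project.

```python
# Illustrative sketch only: the rule table is an assumption based on
# examples 1 and 2, not the rule base of the project.

# Task-specific categories in the order in which they are tested, each with
# the case slot it requires and the verb field it suggests.
CATEGORY_RULES = [
    ("partial goal determination", "GOAL", "MOVE"),
    ("route description", "PATH", "MOVE"),
]

# One candidate verb per verb field.
DEFAULT_VERB = {"MOVE": "gehen"}


def fill_gaps(slots, expected_categories):
    """Return (category, completed slots) for an utterance without a verb."""
    for category, required_slot, verb_field in CATEGORY_RULES:
        if category in expected_categories and required_slot in slots:
            completed = dict(slots)
            completed["VERB"] = DEFAULT_VERB[verb_field]
            # The direction giver addresses the person who asked.
            completed.setdefault("AGENT", "sie")
            return category, completed
    return None, dict(slots)
```

Given the expectations listed in example 1, `fill_gaps({"GOAL": "zum Kaufhof"}, expected)` selects partial goal determination and supplies "gehen" and "sie"; an utterance without a matching slot is returned unchanged with no category.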
V CONCLUSION
It has been shown how some types of
ill-formed input are handled, especially
with the help of semantic constraints and
pragmatic considerations. At present, our
work in this field focuses on handling
self-corrections above the word level, an
instance of which can be found in line 5
of the sample translation.
Acknowledgements
I would like to thank D. Metzing, T.
Christaller and B. Terwey, without whose
cooperation this work would not have been
possible.
References 
ACL 82
Proc. of 20th Annual Meeting of the
Association for Computational
Linguistics, Toronto, 1982
Carbonell, J.G.
"The EXCALIBUR project: A natural
language interface to expert systems", in:
Proc. 8th IJCAI Karlsruhe 1983, Los
Altos, Ca., 1983
Christaller, T., Metzing, D.
"Parsing Interaction: a multilevel parser
formalism based on cascaded ATNs", in:
Sparck Jones, K., Wilks, Y. (eds.),
Automatic Natural Language Parsing,
Chichester, 1983
Gehrke, M.
"Rekonstruktion aufgabenorientierter
Dialoge mit einem mehrstufigen
Parsing-Algorithmus auf der Grundlage
kaskadierter ATNs", in: W. Wahlster (ed.),
Proc. of 6th German Workshop on AI,
Berlin-Heidelberg-New York, 1982
Gehrke, M.
"Syntax, Semantics and Pragmatics in
Concert: an incremental, multilevel
approach in reconstructing task-oriented
dialogues", in: Proc. 8th IJCAI Karlsruhe
1983, Los Altos, Ca., 1983
Klein, W.
"Wegauskuenfte", Zeitschrift fuer
Linguistik und Literaturwissenschaft, 9:
9-57, (1979)
Kwasny, S.C.
Treatment of ungrammatical and
extragrammatical phenomena in natural
language understanding systems, Indiana
University, 1980
di Primio, F., Christaller, T.
A poor man's flavor system, ISSCO,
Geneva, 1983
Sondheimer, N.K., Weischedel, R.M.
"A rule based Approach to Ill-formed
Input", in: Proc. of COLING 80, Tokyo,
1980
Wahlster, W., Marburger, H., Jameson, A.,
Busemann, S.
"Over-Answering Yes-No Questions:
Extended Responses in a NL Interface to a
Vision System", in: Proc. 8th IJCAI
Karlsruhe 1983, Los Altos, Ca., 1983
Weischedel, R.M., Sondheimer, N.K.
"An Improved Heuristic for Ellipsis
Processing", ACL 82, 85-88
Woods, W.A.
"Cascaded ATN Grammars", Journal of ACL,
6: 1 (1980), 1-13
