CASE ROLE FILLING AS A SIDE EFFECT OF VISUAL SEARCH 
Heinz Marburger 
Research Unit for Information Science 
and Artificial Intelligence 
University of Hamburg 
Mittelweg 179 
D-2000 Hamburg 13, F.R. Germany 
Wolfgang Wahlster 
FB 10 - Angewandte Mathematik
und Informatik
University of Saarbrücken
Im Stadtwald
D-6600 Saarbrücken 11, F.R. Germany
ABSTRACT 
This paper addresses the problem of generating
communicatively adequate extended responses in the
absence of specific knowledge concerning the
intentions of the questioner. We formulate and
justify a heuristic for the selection of optional 
deep case slots not contained in the question as 
candidates for the additional information con- 
tained in an extended response. It is shown that, 
in a visually present domain of discourse, case 
role filling for the construction of an extended 
response can be regarded as a side effect of the 
visual search necessary to answer a question con- 
taining a locomotion verb. The paper describes the 
various representation constructions used in the 
German language dialog system HAM-ANS for dealing 
with the semantics of locomotion verbs and illus- 
trates their use in generating extended responses. 
In particular, we outline the structure of the 
geometrical scene description, the representation 
of events in a logic-oriented semantic representa- 
tion language, the case-frame lexicon and the 
representation of the referential semantics based 
on the Flavor system. The emphasis is on a 
detailed presentation of the application of 
object-oriented programming methods for coping 
with the semantics of locomotion verbs. The
process of generating an extended response is
illustrated by an extensively annotated trace.
1. INTRODUCTION 
Frequently a questioner expects more than a 
direct, literal response although he must assume 
that the answerer is not informed about what par- 
ticular information he is seeking. The questioner 
imputes to a cooperative dialogue partner the com- 
municative competence to reply to a simple yes-no 
question like (1) with an extended response (cf.
[12], [11]) like (1a) rather than with a simple
Yes.
(1) Are you going to travel this summer?
(1a) Yes, to Sicily.
In the absence of special information about the
previous course of the dialog or the intentions of
the questioner (the unmarked case) an answer like
(1a) seems more appropriate than (1b) or (1c).
(1b) Yes, with an old school friend.
(1c) Yes, by plane.
Of course, there are numerous dialog situations in
which (1b) or (1c) could be generated as a
communicatively adequate response on the basis of a
particular partner model. But it still must be asked
why in dialogs of the type 'information supply'
the unmarked response takes the form (1a) and not
(1b) or (1c).
In this paper we will present the results of a
computational study of this problem for the domain
'locomotion verbs' in dialogs based on a visually
present world of discourse. This question is
particularly important for the construction of
cooperative dialog systems, since, in many
applications, no explicit knowledge about the dialog
goals of the questioner is available at the
outset. If a system is nevertheless expected to
'over-answer', i.e. to volunteer information that
has not specifically been requested, it must
command a set of heuristic criteria for selecting the
additional information that is to be verbalized
[11].
Research on HAM-ANS is currently being supported
by the German Ministry of Research and Technology
(BMFT) under contract 081T15038.
It is noteworthy that the three additional points 
of information in (1a), (1b), (1c) correspond to
filled deep case slots of the verb used in the 
question (GOAL, CO-AGENT and INSTRUMENT, respec- 
tively). This suggests that the unfilled optional 
case slots in the question are candidates for 
additional information. For a question like (2), 
in which all the deep case slots of 'break' are 
filled, only a direct response like (2a) is to be 
expected as a positive answer, while in (3), where 
only the obligatory deep case slots are filled, an 
extended response like (3a) can be expected. 
(2) Did you break the window with your slingshot 
yesterday? 
(2a) Yes. 
(3) Did you break the window? 
(3a) Yes, with my slingshot. 
Since not every optional deep case of a given verb 
unspecified in the question is suitable for an 
unmarked extended response (e.g. (1a)-(1c)), we may
define the problem more precisely by asking which 
of the deep case slots unspecified in the question 
are to be chosen as the unmarked values. 
For our domain of investigation 'locomotion verbs'
let us consider questions (4) and (5), which refer 
to a visually present world of discourse. In each 
case perceptual processes are assumed as a prere- 
quisite for the answer. 
(4) Which vehicle stopped?
(4a) The bus, on Hartungstreet.
(4b) The bus, because the driver stepped on the
brake.
(5) Did the bus turn off?
(5a) Yes, from Hartungstreet onto Schlueterstreet.
(5b) Yes, together with the taxi cab.
The instantiation of the locative slot in answer
(4a) and the source and goal slots in (5a) is
predictable, in contrast to the causative slot in
(4b) and the co-agent slot in (5b). As examples
(4) and (5) demonstrate, the same optional deep
case slot is not always selected as the unmarked
option. The choice is dependent upon the verb
contained in the question. Moreover, (5a) shows that
combinations of deep cases are possible in
unmarked extended responses.
In the area under investigation here, the
following heuristic can be employed to determine the
selection of the deep case slots for an unmarked
extended response: Select the deep case slots
which contain the concepts necessary for the
perceptual verification of the motion described by
the verb.
[Figure with the labels: FAMILIAR WITH SCENE BUT
CANNOT SEE IT; PDP-10; NL DIALOG SYSTEM HAM-ANS;
IMAGE SEQUENCE ANALYSIS SYSTEM NAOS/MORIO;
STREET INTERSECTION]
Fig. 1: Situational context of the dialog
In order to verify a stop-event it is necessary to
determine the end point of the motion (cf. (4a))
but not the cause (cf. (4b)). For a turn-off event
a change of direction between source and goal must
be established (cf. (5a)). It is not essential to
determine whether other objects make this change
of direction at the same time (cf. (5b)).
Hence case role filling for the construction of an
extended response can be regarded as a side effect
of the visual search necessary to answer the
question.
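The heuristic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table VERIFICATION_SLOTS, the function unmarked_extension and the slot names are hypothetical and are not taken from HAM-ANS; the two table entries merely restate examples (4) and (5).

```python
# Hypothetical table: the deep case slots that must be determined in order
# to verify each motion verb perceptually (restating examples (4) and (5)).
VERIFICATION_SLOTS = {
    "stop": ["locative"],
    "turn_off": ["source", "goal"],
}


def unmarked_extension(verb, slots_in_question, verified_fillers):
    """Return the slots to volunteer in an unmarked extended response.

    slots_in_question: names of the slots already filled by the question.
    verified_fillers:  slot name -> value found while verifying the event.
    """
    extension = {}
    for slot in VERIFICATION_SLOTS.get(verb, []):
        # Volunteer only what verification needed and the user did not give.
        if slot not in slots_in_question and slot in verified_fillers:
            extension[slot] = verified_fillers[slot]
    return extension


found = {"agent": "bus", "locative": "Hartungstreet", "cause": "braking"}
print(unmarked_extension("stop", {"agent"}, found))
```

For question (4) the sketch selects the locative but not the cause, which corresponds to answer (4a) rather than (4b).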
This also appears plausible when seen in the light 
of the beliefs that the questioner imputes to the 
answerer. The questioner believes that the 
answerer will fill in the case slots necessary for
answering the question and that it is therefore 
unnecessary to explicitly mention these in the 
question. Additionally the questioner believes 
that the answerer believes that the questioner 
expects an extended reply and for this reason did
not explicitly request the additional information. 
A cooperative dialog system fulfills this user 
expectation by applying the heuristic formulated 
above. 
A prerequisite for the application of this
heuristic is that the system have knowledge about which
deep case slots are relevant for the verification
of a movement. This prerequisite is not met by
most natural language (NL) systems since they
simply represent events in the domain of discourse in
fully instantiated form using case frames, e.g. as
part of a semantic net or frame hierarchy. In
contrast, the German language dialog system HAM-ANS
(Hamburg application-oriented natural language
system) [6], which we have developed, can apply
this heuristic because in addition to the case
frame of each verb the system includes a
representation of the referential semantics of
predications associated with that verb which makes it
possible to evaluate the visual input data for the
movement in question.
The goal of this article is to elucidate the representation constructions for case frames and 
referential semantics of verbs of motion used in 
HAM-ANS and to illustrate their use in generating 
unmarked extended responses. 
2. A SHORT OVERVIEW OF HAM-ANS
HAM-ANS is a large German natural language dialog 
system of both considerable depth and breadth 
which presently provides access to three different 
application classes, namely an expert system 
(hotel reservation situation), a database system 
(fishery data) and a scene analysis system 
(traffic scene). 
The communicative situations the system handles 
are characterized as follows: 
In the hotel reservation situation the system 
takes the role of a hotel manager, who tries to 
persuade the user to book a room. The caller is 
assumed to have the overall goal of determining 
whether the room offered meets his requirements. 
The system must attempt to recognize the user's
specific desires concerning the room as they are 
revealed - usually indirectly - in his utterances 
and to make use of the various devices available 
in natural language that permit the room in ques- 
tion to be presented in a particularly favorable 
light (e.g. the generation of tendentious descrip- 
tions using hedged relative adjectives). 
In the database application [8] HAM-ANS provides
marine scientists with NL access to a fishery
database, which contains data collected during
international expeditions in the South Atlantic
and the Indian Ocean. The complex structured data
sets, which consist of cruise data concerning the
vessels' voyages, krill biology data, relevant
net hauls and oceanographic data, are stored in a
relational database (76 relations, 11 MByte) with
a PASCAL/R database management system.
In the traffic domain we presuppose the following
situational context: The system has observed a
street intersection and supplies on the phone the
user, who is familiar with the scene but cannot
see it from his remote location, with information
about the traffic at that intersection (cf. fig.
1). A geometrical description of the observed
scene is provided by the image sequence analysis
system NAOS/MORIO [9]. In this application our
research efforts are concerned with the handling
of time (tense and temporal adverbs) and the
semantics of locomotion verbs (cf. fig. 2).
? How many trucks have you seen?
TWO.
? Which trucks have gone by?
A LIGHT-COLORED ONE ON HARTUNGSTREET.
? Did a yellow car and a green station wagon
turn off first?
NO. A YELLOW CAR TURNED OFF HARTUNGSTREET
ONTO SCHLUETERSTREET, BUT THE GREEN STATION
WAGON DID NOT TURN OFF FIRST.
? Were a young man and an old woman walking
down Schlueterstreet during this time?
YES. AN OLD WOMAN AND A YOUNG MAN.
? Did the green station wagon stop on
Schlueterstreet?
YES.
? How many station wagons altogether have
stopped?
ONE OF THE THREE ON SCHLUETERSTREET.
? What did I just ask you?
YOU ASKED: HOW MANY STATION WAGONS STOPPED.
? Did a station wagon drive off?
YES, ONE ON SCHLUETERSTREET.
? By the way, the light-colored truck: Did
it drive down Schlueterstreet too?
YES.
? How many cars are there in all?
SIXTEEN.
? Is one of these cars blue?
YES. TWO OF THEM ARE KIND OF BLUE.
? Did a blue car drive down both streets?
YES, ONE BLUE CAR.
? So long!
Fig. 2: A transcript of an actual dialog 
(translated from German) 
3. REPRESENTATION FORMALISMS FOR THE SEMANTICS OF 
LOCOMOTION VERBS 
3.1. THE GEOMETRICAL SCENE DESCRIPTION
A basic requirement for answering questions about 
movements that have occurred in real sequences of 
scenes is an adequate representation of these 
sequences. Not only the shape, the centers of 
gravity, color, etc. of objects must be
represented, but also the trajectories of moving
objects.
This geometrical scene description consists of a
combination of automatically generated outputs of
the scene analysis processes (insofar as this is
presently possible) and a number of manual
augmentations.
The length in time of the scene under
consideration is ca. 14 sec., which corresponds to ca. 360
single TV images. From these 360 images 72
snapshots are coded in a relational formalism,
denoting which objects were observed, the shape of
these objects, their current center of gravity and
some other properties (e.g. color). The
representation of the first snapshot contains information
about all objects that are visible at that time. 
For the successive snapshots only changes with 
respect to the predecessors are recorded, i.e. 
objects and their descriptions are only entered if 
they have changed location or appeared in the 
scene. A trajectory of an object is determined by 
its different centers of gravity relative to an 
underlying coordinate system. In contrast to the 
real TV image sequence this representation is only 
2 dimensional and thus provides a bird's-eye view 
of the scene. 
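The change-only coding of snapshots can be illustrated with a small Python sketch. The data layout (a list of dictionaries keyed by object name), the object names and the coordinates are assumptions made for this illustration; the relational formalism actually used is not reproduced here, and objects leaving the scene are not modelled.

```python
def trajectory(snapshots, obj):
    """Return (snapshot index, center of gravity) for each recorded entry.

    snapshots[0] lists every visible object; later snapshots list only the
    objects that changed location or newly appeared.
    """
    points = []
    for index, snapshot in enumerate(snapshots):
        if obj in snapshot:
            points.append((index, snapshot[obj]["center"]))
    return points


def position_at(snapshots, obj, index):
    """Return the last recorded center of obj at or before snapshot index."""
    center = None
    for snapshot in snapshots[: index + 1]:
        if obj in snapshot:
            center = snapshot[obj]["center"]
    return center


# Hypothetical scene: CAR1 moves, TRUCK1 stands still.
snapshots = [
    {"CAR1": {"center": (10, 50), "color": "yellow"},
     "TRUCK1": {"center": (80, 20), "color": "light"}},
    {"CAR1": {"center": (14, 50), "color": "yellow"}},  # only CAR1 moved
    {},                                                 # nothing changed
    {"CAR1": {"center": (18, 52), "color": "yellow"}},
]

print(trajectory(snapshots, "CAR1"))
print(position_at(snapshots, "TRUCK1", 3))
```

An object that never reappears after the first snapshot, like TRUCK1 here, has a one-point trajectory, which is one simple way to detect that it has not moved.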
3.2. THE REPRESENTATION LANGUAGES SURF AND DEEP 
The logic-oriented semantic representation 
languages SURF and DEEP are the central represen- 
tation formalisms used in HAM-ANS. These languages 
are designed to be declarative and easily extend- 
able. SURF is the target language of the analysis 
components and source language for the generation 
components and thus as close as possible to NL 
utterances, whereas DEEP is better suited for the 
evaluation of utterances on the basis of the 
system's domain-specific knowledge sources. 
Originally SURF and DEEP were designed to 
represent term and predicate structures which 
serve as a representation formalism for state 
descriptions occurring typically in the hotel 
reservation situation. For an adequate representa- 
tion of the semantics of questions containing 
verbs, the definition of SURF and DEEP was aug- 
mented by meta-predicates for marking deep cases, 
tense and voice adapted from Fillmore's deep case 
theory [3]. Since events can be existentially
quantified as in (6) or explicitly quantified as
in (7)
(6) Did John fly to Hamburg?
(7) Did John fly to Hamburg three times last week? 
SURF and DEEP provide a means of representing 
quantification of events. A special quantifier 
E-ACT denotes an existential quantification of 
events. Other quantifiers like those in (7) are 
currently not available but can easily be 
included. Examples of SURF and DEEP expressions 
are shown in the annotated example (cf. fig. 8). 
In this paper only some of the features of SURF
and DEEP are discussed; see [6] for a more
detailed description.
3.3. THE CASE-FRAME LEXICON 
The case frames for verbs used in the system are 
stored in the case-frame lexicon [5]. Each entry
in the word lexicon for a verb contains a pointer 
to its applicable case frame which describes the 
semantics of that verb in terms of case relations. 
A case frame is represented as a combination of 
deep case descriptions specifying for each deep 
case its name, a marker, whether the deep case is 
obligatory (O) or optional (F), and the semantic
restrictions which are required from a syntactic
substructure to fill the deep case (cf. fig. 3).
This pointer technique permits the use of a 
specific case frame for several verbs during the 
analysis phase without predetermining a single 
process for these verbs during the evaluation of 
whole utterances. For verbs with different 
referential semantics, e.g. 'to accelerate' and 
'to stop', a single case frame, namely that
specifying an obligatory AGENT of type 'vehicle' and an
optional LOCATIVE of type 'thoroughfare', is
applied during the analysis phase. 
Case frames are formulated in SURF so that the 
checking of the semantic restrictions can be 
accomplished by the inference rules usually 
applied during the evaluation of a complete
utterance. The selectional restriction that, e.g., the
NP 'a car' describe an object of the class of
vehicles, and therefore be a possible candidate to
fill the agent role of the verb 'to stop', can be
verified because of the transitivity of the
superset relation in the conceptual semantic net.
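A selectional restriction check of this kind can be sketched as a walk along superset links. The concept names and the single-parent table SUPERSET are assumptions for illustration; the conceptual semantic net of HAM-ANS and its inference rules are richer than this.

```python
# Hypothetical fragment of a conceptual net: concept -> direct superset.
SUPERSET = {
    "CAR": "VEHICLE",
    "TRUCK": "VEHICLE",
    "VEHICLE": "OBJECT",
    "STREET": "THOROUGHFARE",
    "THOROUGHFARE": "OBJECT",
}


def isa(concept, target):
    """True if target is reachable from concept via superset links."""
    seen = set()
    while concept is not None and concept not in seen:
        if concept == target:
            return True
        seen.add(concept)  # guards against accidental cycles in the table
        concept = SUPERSET.get(concept)
    return False


# 'a car' may fill the agent role of 'to stop' (restriction: ISA x VEHICLE).
print(isa("CAR", "VEHICLE"))
print(isa("STREET", "VEHICLE"))
```

Transitivity comes from following more than one link, so that CAR also counts as an OBJECT without this being stated directly.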
In the case-frame lexicon the case frames are not
recorded in the form shown in fig. 3, but rather
are represented as constructor calls for building
a case frame according to the actual syntax
definition of SURF. This guarantees that all possible
modifications of SURF are immediately present in
the case frames.
(rl-s: agent:
         (d-l: role-marker: O
               restrictions:
                 (lambda: x1 (af-a: ISA x1 VEHICLE)))
       objective:
       source:
       locative:
         (d-l: role-marker: F
               restrictions:
                 (lambda: x1 (af-a: ISA x1 THOROUGHFARE)))
       goal:
       time:
       path:
       instrument:)
Fig. 3: Case frames for verbs of type 'to stop'
3.4. OBJECT-ORIENTED REPRESENTATION OF MOTION
CONCEPTS 
In object-oriented programming languages program- 
ming is more or less the activity of creating a 
world of entities called objects and of specifying 
a set of generic operations that can be performed 
on them. Objects can communicate with each other
by sending and receiving messages. Essentially,
running a program means that an object sends a
message to an object (possibly to itself) which in
turn sends a message etc., until the required task 
is fulfilled. An important benefit of the object- 
oriented style is that it lends itself to a par- 
ticularly simple and lucid kind of modularity. 
3.4.1. THE FLAVOR SYSTEM 
The Flavor system [2] [13] is an implementation of
the language features that support object-oriented
programming. Two kinds of objects exist in a
Flavor system, namely one called flavor and the other
instance of a flavor. A flavor represents a
generic object and an instance an individual
realization of a generic object. It is possible to send
messages to both kinds of objects. Flavors are
organized in a directed graph called the flavor
graph. There is one designated flavor, the
vanilla flavor, which corresponds to the thing
frame in FRL [10]. Since the inheritance of
information for each flavor is provided by the flavor
graph, it is necessary to specify for each newly
defined flavor its location in the graph by naming
its direct predecessors (its superflavors). The
information contained in a flavor is a combination
of all the information inherited from its
superflavors and the added information given by its own
definition. The added information can also
override, augment or modify the inherited information.
This is one dimension of the information contained
in a flavor: owned or inherited. Another is the
declarative/procedural distinction. The
declarative knowledge of a flavor is stored in variables
of different kinds whereas procedural knowledge is
encoded in so-called methods.
One kind of variable - the instance variable - is
used to give instances of the same generic object
their individual information. The other kind - the
class variable - is owned by a flavor, can be
'bequeathed' to other flavors, and accessed by any
object in the flavor system. However, a flavor is
only allowed to change a value of a class
variable if it owns this variable.
Methods are function definitions that implement 
the operations defined for each flavor. The combi- 
nation of methods from different flavors is called 
mixing flavors. 
In comparison with FRL the Flavor system has
mainly three distinguishing features:
- The 'A kind of' slot in FRL serves both for
  establishing an inheritance hierarchy and for
  connecting instances to superclasses, i.e. no
  clear distinction is made between generic
  frames and instances. In the Flavor system, on
  the other hand, the flavor graph is built by
  specifying the superflavors for each flavor,
  whereas instances are created by the
  make-instance method.
- Because the distinction between generic
  frames and instances is not made in FRL there
  is also no distinction between instance
  variables and class variables. In the Flavor
  system the semantics of variables is more
  clearly defined in that instance variables
  can only be modified in instances and class
  variables can only be modified in flavors.
- Frames in FRL are passive data structures,
  whereas flavors can be (re-)activated,
  created and modified; they are autonomous;
  they are declarative and procedural at the
  same time and hence are entities which are
  better suited as formalisms for
  representing common knowledge (cf. [2]).
Although the flavor system is a tool for the 
development of large software systems and not a 
knowledge representation language, it includes the 
basic concepts for the rapid design of specific 
knowledge representation formalisms. In contrast 
to a full-fledged knowledge representation 
language this approach requires some additional 
programming in the beginning, but it avoids any 
permanent overhead for features which are
superfluous for the task at hand.
3.4.2. THE MOTION CONCEPT HIERARCHY 
The Flavor system is used in HAM-ANS for
representing a specialization hierarchy of motion
concepts (cf. fig. 4). The root flavor of this
hierarchy is the motion concept MOVE. Descendants
in the tree, e.g. GO_BY and TURN, inherit the
declarative and procedural information contained
in their parents. Instance variables comprise
information about the deep cases associated with
the motion concept as well as information needed
and extracted by methods. The methods are
responsible for checking the referential semantics of
the motion concepts. Instances of a flavor denote
specific events in the domain of discourse that
could be verified by the application of the
methods.
[Figure: flavor graph with the nodes VANILLA,
MOVE, TURN, STOP, DRIVE-OFF, TIME and SPACE;
edge types: SUBFLAVOR, INSTANCE OF]
Fig. 4: The motion concept hierarchy
[Figure: the question 'HAS A YELLOW CAR TURNED
OFF?' is passed to HAM-ANS. FLAVOR: TURN;
SUPERFLAVORS: GO_BY; VARS: AGENT, SOURCE, ...;
METHODS: ONLY_AGENT_SLOT_FILLED, FIND_A_SOURCE,
CHECK_DIRECTION_CHANGE, FIND_A_GOAL_NEQ_SOURCE.
Applied to the snapshot sequence, the methods
yield the instance TURN120 with AGENT: CAR120,
SOURCE: HARTUNGSTREET, DIRECTION_CHANGE?:,
GOAL: BIBERSTREET, and the answer 'YES, ONE
YELLOW FROM HARTUNGSTREET ONTO BIBERSTREET']
Fig. 5: Case slot filling as side effect of visual search
The methods of the additionally defined flavors
TIME and SPACE are responsible for temporal and
spatial computations. Instances of these flavors
determine the temporal and spatial description of
the actual scene: the length of the scene in time,
the number of snapshots, the spatial extent, etc.
The task of checking the truth value of the
proposition in a user's question is accomplished
through message passing. These messages include:
creating instances of motion concepts, e.g.
TURN120, instantiating deep case slots specified
in the question, and activating appropriate
methods.
Let's now consider the example given in fig. 5 in
more detail. Since only the AGENT was specified in
the question, the selected method is
ONLY_AGENT_SLOT_FILLED. After determining an
interval of consideration this method calls further
methods, namely FIND_A_SOURCE,
DIRECTION_CHANGE and FIND_A_GOAL_NEQ_SOURCE.
DIRECTION_CHANGE is a special method of the flavor
TURN. The first and last methods are inherited
(cf. fig. 5) from flavor GO_BY because they are
also needed in that flavor for answering questions
like: 'Has the yellow car driven from Biberstreet
to Hartungstreet?'.
FIND_A_SOURCE identifies the first entry of the
agent's trajectory in the interval of
consideration and checks which of the objects of the static
background these coordinates belong to. For this
test only those static objects are selected that
satisfy the selectional restrictions for the
source slot specified in the case-frame lexicon.
If the test succeeds for an object, the name of
this object is stored in the source slot.
DIRECTION_CHANGE now follows the agent's
trajectory looking for a significant change of
direction. If this test is also positive,
FIND_A_GOAL_NEQ_SOURCE is tried. This method
searches for a point on the trajectory which is
not inside the object identified in the source
slot. If there is such a point, the same
selectional check as for the source slot is executed
for the possible goal object. The successful
application of these methods yields a fully
instantiated flavor instance, e.g. TURN120 (cf.
fig. 7).
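The interplay of inherited and specialized methods can be sketched with Python classes standing in for flavors. Everything concrete here is an assumption made for illustration: the rectangular street regions, the 45 degree threshold for a 'significant' change of direction, the comparison of only the first and last heading, and the omission of the interval of consideration. The sketch assumes a non-empty trajectory.

```python
import math

# Hypothetical static background: name -> (concept, (xmin, ymin, xmax, ymax)).
STATIC_OBJECTS = {
    "HARTUNGSTREET": ("THOROUGHFARE", (0, 40, 100, 60)),
    "BIBERSTREET": ("THOROUGHFARE", (100, 0, 120, 100)),
}


def thoroughfare_at(point):
    """Return the name of a thoroughfare containing point, or None."""
    x, y = point
    for name, (concept, (x0, y0, x1, y1)) in STATIC_OBJECTS.items():
        if concept == "THOROUGHFARE" and x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None


class GoBy:
    """Motion concept whose methods fill SOURCE and GOAL from a trajectory."""

    def __init__(self, agent, trajectory):
        self.agent = agent
        self.trajectory = trajectory  # list of (x, y) centers of gravity
        self.source = None
        self.goal = None

    def find_a_source(self):
        self.source = thoroughfare_at(self.trajectory[0])
        return self.source is not None

    def find_a_goal_neq_source(self):
        for point in self.trajectory:
            name = thoroughfare_at(point)
            if name is not None and name != self.source:
                self.goal = name
                return True
        return False


class Turn(GoBy):
    """Specialization of GoBy that also requires a change of direction."""

    MIN_ANGLE = math.radians(45)  # assumed threshold for 'significant'

    def __init__(self, agent, trajectory):
        super().__init__(agent, trajectory)
        self.direction_change = None

    def check_direction_change(self):
        steps = zip(self.trajectory, self.trajectory[1:])
        headings = [math.atan2(y1 - y0, x1 - x0)
                    for (x0, y0), (x1, y1) in steps]
        if len(headings) < 2:
            self.direction_change = False
        else:
            delta = math.remainder(headings[-1] - headings[0], 2 * math.pi)
            self.direction_change = abs(delta) > self.MIN_ANGLE
        return self.direction_change

    def only_agent_slot_filled(self):
        # Each test runs only if the previous one succeeded; slots filled on
        # the way stay available for the extended response.
        return (self.find_a_source()
                and self.check_direction_change()
                and self.find_a_goal_neq_source())


turn = Turn("CAR120", [(10, 50), (50, 50), (90, 50),
                       (110, 55), (110, 80), (110, 95)])
print(turn.only_agent_slot_filled(), turn.source, turn.goal)
```

The point of the sketch is that verifying the turn leaves source and goal filled in the instance, so no separate search is needed to build the extended response.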
4. AN EXAMPLE OF THE PROCESSING OF AN UTTERANCE 
The processing of a user's utterance may be
illustrated by an example taken from the dialog in fig.
2.
USER: Which trucks have gone by?
HAM-ANS: A LIGHT-COLORED ONE ON HARTUNGSTREET.
[Figure: three flavor descriptions.
MOVE (TYPE FLAVOR): INSTANCE-VARIABLES AGENT,
SOURCE, GOAL, ..., EXACT_SOURCE, EXACT_GOAL, ...,
INTERVAL_OF_CONSIDERATION, CURRENT_TIME; METHODS
AGENT_MOVED?, FIND_MOVEMENT,
FIND_LOCATION_OF_AGENT, FIND_A_SOURCE,
FIND_A_GOAL, FIND_A_GOAL_NEQ_SOURCE.
GO-BY (TYPE FLAVOR): inherited and additional
INSTANCE-VARIABLES; inherited METHODS; additional
METHODS ONLY_AGENT_SLOT_FILLED,
AGENT_AND_SOURCE_SPECIFIED,
AGENT_AND_GOAL_SPECIFIED,
AGENT_AND_LOCATIVE_SPECIFIED,
AGENT_SOURCE_GOAL_SPECIFIED.
TURN (TYPE FLAVOR): inherited INSTANCE VARIABLES;
redefined METHODS ONLY_AGENT_SLOT_FILLED;
additional METHODS]
Fig. 6: Instance variables and methods in the
motion concept hierarchy
The following discussion of some of the processing
phases can best be understood if continual
reference is made to fig. 8, which shows a traced
version of the example.
The processing of a user's NL input starts with a 
rather elaborate lexical and morphological 
analysis - a process which on the one hand reduces 
single words to their canonical forms with their 
morphological and syntactic features (e.g. gender,
person, number) and on the other hand recognizes 
syntagmatic groups of words and discontinuous verb 
constituents, transforming them according to predefined rules. 
The generated structure - the preterminal string
(not shown in fig. 8) - forms the input to the
parser. The syntactic analysis consists of two
different strategies, both of which use the same
ATN-definitions of syntactic categories, e.g. for
noun phrases and prepositional phrases. One of
these strategies - always applied for sentences
with copula verbs - uses a surface grammar to cope
with word order variations. The other is a
case-driven analysis strategy which is used for
sentences containing verbs with an associated case
frame.
NAME: TURN120   TYPE: INSTANCE   INSTANCE_OF: TURN
INSTANCE VARIABLES
NAME                       VALUE          FILLED BY METHOD
AGENT                      CAR120         MAKE_INSTANCE
CURRENT_TIME               TSD 128
CURRENT_SPACE              SSD 128
INTERVAL_OF_CONSIDERATION  ( 21 . 5~ )    DETERMINE_INTERVAL_OF_CONSIDERATION
SOURCE                     BIBERSTREET    FIND_A_SOURCE
EXACT_SOURCE               (50 . 70)      FIND_A_SOURCE
DIRECTION_CHANGE?          T              CHECK_DIRECTION_CHANGE
GOAL                       HARTUNGSTREET  FIND_A_GOAL_NEQ_SOURCE
EXACT_GOAL                 (300 . 100)    FIND_A_GOAL_NEQ_SOURCE
Fig. 7: An instance of TURN
Since in the example the verb 'to go by' has a 
case frame the second strategy is applied. After 
an access to the case-frame lexicon the case frame 
is constructed. This case frame is used to guide 
the parsing in the following manner: The algorithm
first attempts to recognize those syntactic con- 
stituents that are possible candidates for a deep case marked obligatory, and then to recognize 
those constituents that are possible candidates 
for optional deep cases. When the input is com- 
pletely consumed and all obligatory deep cases are 
filled the process ends. 
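The control structure of this case-driven strategy can be sketched as follows. The function fill_case_frame, the representation of constituents as (marker, concept) pairs and the toy test fits are hypothetical; in particular, the real syntactic and semantic checks are far more detailed than the stand-in used here.

```python
def fill_case_frame(case_frame, constituents, fits):
    """Assign constituents to deep cases, obligatory ('O') cases first.

    case_frame:   slot name -> 'O' (obligatory) or 'F' (optional)
    constituents: parsed constituents in input order
    fits:         fits(slot, constituent) -> bool
    Returns the filled slots, or None if an obligatory case stays empty
    or part of the input is left over.
    """
    remaining = list(constituents)
    filled = {}
    for marker in ("O", "F"):
        for slot, slot_marker in case_frame.items():
            if slot_marker != marker:
                continue
            for constituent in remaining:
                if fits(slot, constituent):
                    filled[slot] = constituent
                    remaining.remove(constituent)
                    break
    obligatory_filled = all(slot in filled
                            for slot, m in case_frame.items() if m == "O")
    return filled if obligatory_filled and not remaining else None


def fits(slot, constituent):
    """Toy stand-in for the combined syntactic and semantic check."""
    marker, concept = constituent
    if slot == "agent":
        return marker == "nominative" and concept in {"CAR", "TRUCK", "BUS"}
    if slot == "locative":
        return marker == "on" and concept == "STREET"
    return False


GO_BY_FRAME = {"agent": "O", "source": "F", "locative": "F", "goal": "F"}
print(fill_case_frame(GO_BY_FRAME, [("nominative", "TRUCK")], fits))
```

For 'Which trucks have gone by?' only the agent is found, so the optional slots stay empty and are left for the evaluation phase to fill.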
The test for determining if a syntactic constituent
is a possible candidate to fill a specific 
deep case is divided into a syntactic and a seman- 
tic check. The syntactic check requires, e.g., 
that in order to fill the agent role a constituent must contain the attribute 'nominative' (sentence 
in active voice) and that its number must 
correspond to that of the verb. The semantic check 
requires that the noun of the constituent fulfill 
the semantic restrictions specified for the 
specific deep case. This is accomplished through 
the building of a SURF expression for the consti- 
tuent, the transformation of this expression into 
a DEEP expression, and the evaluation of the DEEP 
expression on the basis of the conceptual net. 
In our example only the agent case is marked as 
obligatory and the noun phrase 'which trucks' ful- 
fills both the syntactic and semantic requirements 
to fill this slot. Since no other syntactic con- 
stituents are encountered, the complete SURF 
representation is constructed. 
The structure is normalized into a DEEP structure.
One of the main tasks of this process is the
determination of the scope of quantifiers. The
algorithm used for this purpose is modelled after
the one described by Hendrix [4]; it takes into
account the relative strength of natural language
quantifiers (e.g. 'a', 'both') and question
operators (e.g. 'which', 'how many'). The strength is
determined by a numeric value, which in some cases
is modified by the degree of generality of the
noun. E.g. the existential quantifier 'a' is
weaker than the more specific quantifier 'both'.
193 
? Which trucks hive gone by# 
It Syntactic analysis 
;; Call frame 
Irl-i: lgent: 
(d-l: rOll-litter: O 
rlltrictionl 
(isabel: II lit-is ISA II VEHICLE))) 
objective: 
source: 
(e-it role+marker: F 
restrictions: 
Ilelbdl: I| lit-It ISA el THOROUGHFARE))) 
looetivl: 
(d-l: role-narke~: F 
rlltriotiunl: 
Llimbde: 11 liE-e: ISA It THOROUGHFARE))) 
goiI: 
Id-L: roll-marker: F 
restrictions: 
Ileabds: It lit-is ISA =| THOROUGHFAEE\])) 
time: 
pith: 
inltruleut:) 
;: AGENT plrlld 
llllhdl: IS 
Lit-is AGENT 
19 
It-s: \[q+v: HUICU) Ilelbdl: x$ (it-at XSA x0 TRUCEI)))| 
;; SURF representation of input sentence 
Ill+d: EVENT 
It-s: (g'qt: E-ACT) (llibdl: ItO leE-is ACT xl0 GOBYIll 
Ld-l: rOll+hit: 
(ri-e: agent: 
Llanbdl: IS 
lit-t: AGENT 
=9 
(t'J: (qm+: HHICUI Ilenbda: aS let-x: ISA sO TRUCE))))) 
objective: 
eource: 
locltive: 
goal: 
tile: 
pith: 
inltruaent:) 
mud: 
Id-a: tense: 
t;albdl: It1 lit-e: TENSE II1PERF)) 
voice: 
(lanbdl: It2 Let-e: VOICE 112 ACTIVE)))|I 
** iormelinnt*on: Trenltorlin S into DEEP representation 
:: 9EEP structure 
If-d: It-q: (for: (B-V: NRIEH) elg) lit-R: ISA xt4 TRUCE)) 
It-d: (i-q: (for: (q-qt: G-ACTI 113) let-e: ACT ItS GO BY)) 
If-e: role-lilt: 
(rl-d: agent: 
lit-a: AGENT a13 el4) 
objective 
source: 
locative: 
poll: 
tiM: 
path: 
ialtrunent:) 
nod: 
It-s: tense: Let-s: TENSE at\] PERF) voice: Let-s: VOICE !13 ACTIVE))I 
)) 
Ii gvllualion 
:; Evaluation of i formula uith the quantifier 
(q-w MGICH) 
;; Evlluatio. oti toraull vith the quantifier 
(q-qt R-ACT) 
;; Object TfllICKI his not loved during the entire scene 
;: Evaluation of a formula with the qu|ntititr 
Iq-qt: R-ACT) 
:; Tilting nf • partially inltantietld till frame 
If-e: Poll-Jigs: 
Irl-d" agent 
\[at-a: AGENT GG_BY TRUCKi| 
objective' 
source: 
locative: 
goal: 
time: 
path: 
instrument ) 
Iod: 
It'l: tense (If'e: TENSE GO BY PEBF) voice: elf-is VOICE GO_BY ACTIVE)J) 
;; Interval of consideration determined from tense land adverb) 
(1 641 
:; Thi object becomes visible betleln till points SG lad GS 
;; The interval et consideration lOdified in icourdlnol vith object till il: 
IGG 64) 
;; Change determined betroth till points SG and 57 
3; Completed ceil frill 
If-IS rOll-lilt: 
l+l-d: Iglflt: 
Lit-IS AGENT GO_BY tngcxi) 
objeetivet 
SOurce: 
locative: 
(If-iS LOCATZVE G0_BY nON DAOTONGGTGEET) 
goll: 
tint: 
path: 
instrument:) 
nod: 
If-it annie: Let-s: TENSE GO_BY PERFI voice: (if-is VOICE GO BY ACTIVE))) 
:; +Veritication of event win polsibil 
;; Olsult Ot the Evaluation 
If-d: It-q: Ifor: (q-s: ITRUCNS)) el4) T) 
)f-d: It-q: (for: Lq-qt: E-ACT) zt+l tit-as ACT xlS GO BY)) 
It-Is roll-list: 
lrl-d: agent: 
(it-l: AGENT IT3 ZI4I 
objective: 
source: 
locative: 
lit=a: LOCATIVE 113 *ON HbRTUNGSTREET) 
goal: 
time: 
path: 
instruments) 
lode 
if-Is tenll: (it-is TERSE It3 PERF) voice: (at+at VOICE It3 ACTIVE})))) 
** Inverse normalization: Transforming into SURF representation
;; SURF representation of answer
(af-d: EVENT
(t-s: (q-qt: E-ACT) (lambda: x13 (af-a: ACT x13 GO_BY)))
(d-a: role-list:
(rl-d: agent:
(lambda: x13 (af-a: AGENT x13 (t-s: (q-s: (TRUCK2)) T)))
objective:
source:
locative:
(lambda: x13 (af-a: LOCATIVE x13 *ON HARTUNGSTREET))
goal:
time:
path:
instrument:)
mod:
(d-a: tense:
(lambda: x13 (af-a: TENSE x13 PERF))
voice:
(lambda: x13 (af-a: VOICE x13 ACTIVE)))))
** Ellipsis generation
;; Elliptified SURF representation of answer
(rl-d: agent:
(lambda: x0 (af-a: AGENT x0 (t-s: (q-s: (TRUCK2)) T)))
objective:
source:
locative:
(lambda: x0 (af-a: LOCATIVE x0 *ON HARTUNGSTREET))
goal:
time:
path:
instrument:)
** Verbalization
;; NP-generation for TRUCK2
;; The generated NP for TRUCK2 is:
(t-q: (for: (q-qt: A) x15) (f-o: AND (af-a: ISA x15
TRUCK) (af-a: REF x15 LIGHT-COLORED)))
;; Verbalized structure of answer
(SENTENCE (AGENT (NP (NP (N: SG) A LIGHT-COLORED (ELLIPSIS TRUCK))))
(LOCATIVE (PP *ON (NP (N: SG) HARTUNGSTREET))))
** Surface transformations
A LIGHT-COLORED ONE ON HARTUNGSTREET
Fig. 8: Annotated example interaction
Since, in the example discussed, the question 
operator 'which' is stronger than the existential 
quantifier for verbs 'E-ACT', the structure is 
rearranged. 
The task of evaluating a DEEP formula is governed
by a generate-and-test strategy. Generate and test
procedures can be viewed as being activated by
pattern-directed invocation. They differ in that
the generate procedures assign internal object
identifiers to variables in DEEP formulas, while
the test procedures yield two values: the first is
either a fully instantiated formula equivalent to
the input formula or a modified formula, and the
second indicates the truth value of the input
formula in the range [0,1]. In the interpretation
phase these two processes interact in such a way
that a test attempt activates generate procedures,
which in turn call test procedures, and so on.
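The interplay of generate and test procedures can be sketched roughly as
follows. This is only an illustration in Python, not the HAM-ANS
implementation: the names OBJECTS, generate, evaluate and went_by are
hypothetical, and the scene knowledge is reduced to a hard-coded table.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical referential knowledge: type name -> internal object identifiers.
OBJECTS: Dict[str, List[str]] = {"TRUCK": ["TRUCK1", "TRUCK2"]}


def generate(type_name: str) -> List[str]:
    """Generate step: propose object identifiers for an unbound variable."""
    return list(OBJECTS.get(type_name, []))


def evaluate(type_name: str, body: Callable[[str], float]) -> Tuple[List[str], float]:
    """Test step for a formula whose variable ranges over type_name.

    The test attempt calls generate() and then tests the body once per
    candidate. It returns two values: the instantiations for which the
    body holds, and a truth value in the range [0, 1].
    """
    verified: List[str] = []
    best_truth = 0.0
    for obj in generate(type_name):
        truth = body(obj)  # nested test on the rest of the formula
        if truth > 0.0:
            verified.append(obj)
        best_truth = max(best_truth, truth)
    return verified, best_truth


def went_by(obj: str) -> float:
    """Assumed stand-in for the event test; only TRUCK2 moved in this toy scene."""
    return 1.0 if obj == "TRUCK2" else 0.0


verified, truth = evaluate("TRUCK", went_by)
```

Here the outer test triggers generation of the candidate set, and each
candidate is passed on to the nested test, mirroring the alternation
described above.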
A closer look at our example shows that after the
first test attempt has discovered a structure
containing a variable (in this case the term
representing the noun phrase 'which trucks'), a
package of generate procedures is activated to
produce the set of object identifiers denoting the
referential set of objects that are trucks, here
TRUCK1 and TRUCK2. The rest of the formula is
then recursively sent to a test process with the
variable 'x14' replaced, one after the other, by
the elements of the reference set for trucks.
The next formula to be tested requires the
generation of a set of instances of the type GO_BY.
Since events are not represented in fully
instantiated form but rather must be extracted
from the geometrical scene description, a special
set of procedures, the methods specified in the
verb flavor hierarchy, is activated (see section
3.4.2 for how this process functions).
A verification of a GO_BY event is possible only
for TRUCK2. The additional information extracted
during the process of visual search, the specific
location of the event, is recorded in the locative
slot.
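How a case slot can be filled as a by-product of the search may be
illustrated with a small Python sketch. It is a simplification under
stated assumptions, not the Flavor-based methods of HAM-ANS: the SCENE
table, its time points and positions, the PARKING_AREA label and the
function verify_go_by are all invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[int, float, str]  # (time point, position, street name)

# Hypothetical geometrical scene description with made-up data.
SCENE: Dict[str, List[Sample]] = {
    "TRUCK1": [(t, 10.0, "PARKING_AREA") for t in range(1, 65)],   # never moves
    "TRUCK2": [(t, float(t), "HARTUNGSTREET") for t in range(50, 65)],
}


def verify_go_by(agent: str, interval: Tuple[int, int]) -> Tuple[bool, Dict[str, Optional[object]]]:
    """Search the scene for a change of position of agent within interval.

    While scanning the trajectory, the place where the change is found is
    written into the locative slot of the case frame as a side effect.
    """
    start, end = interval
    frame: Dict[str, Optional[object]] = {"agent": agent, "locative": None}
    samples = [s for s in SCENE.get(agent, []) if start <= s[0] <= end]
    for (_, pos_a, _), (_, pos_b, street) in zip(samples, samples[1:]):
        if pos_a != pos_b:  # change detected between two adjacent time points
            frame["locative"] = ("ON", street)
            return True, frame
    return False, frame


ok1, frame1 = verify_go_by("TRUCK1", (1, 64))
ok2, frame2 = verify_go_by("TRUCK2", (1, 64))
```

The locative filler costs nothing extra here: the street is already at
hand at the moment the change of position is detected.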
During the formation of the result of the
evaluation, the system, guided by general heuristics,
decides whether the additional detail would make
the answer too complex [11]. In this case the
complexity is acceptable and the location is
mentioned in the answer.
The word 'which' is defined as a quantifier that
causes a description of a set of objects to be
returned (instead of a truth value). Thus the set
of reference objects for which the proposition in
question could be verified, i.e. TRUCK2, is
substituted for the noun phrase 'which trucks'.
The resulting DEEP expression is transformed by
the inverse normalization process into a SURF
expression. In order to verbalize extended
responses in a manner as informative and concise
as possible, the ellipsis generation process
elides those parts of the semantic representation
of complete responses that are identical to the
stored representation of the question [7].
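The elision step can be pictured as a slot-by-slot comparison of the
answer with the stored question. The following Python sketch assumes a
flat dictionary encoding of the case frame, which is a simplification of
the SURF structures shown in Fig. 8; the function name elide and the slot
encoding are hypothetical.

```python
from typing import Dict, Optional

Frame = Dict[str, Optional[object]]


def elide(question: Frame, answer: Frame) -> Frame:
    """Keep only those filled answer slots that differ from the question."""
    return {
        slot: filler
        for slot, filler in answer.items()
        if filler is not None and question.get(slot) != filler
    }


# Assumed encoding: the questioned agent and the unmentioned locative are empty.
question: Frame = {
    "verb": "GO_BY", "agent": None, "locative": None,
    "tense": "PERF", "voice": "ACTIVE",
}
answer: Frame = {
    "verb": "GO_BY", "agent": "TRUCK2", "locative": ("ON", "HARTUNGSTREET"),
    "tense": "PERF", "voice": "ACTIVE",
}

elliptical = elide(question, answer)
```

Only the agent and the locative survive, which corresponds to the two
filled roles in the elliptified representation of the trace.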
The verbalization component produces a string of
canonical words and their grammatical features
using translation rules attached to the various
categories of SURF expressions. A special
subcomponent provides for the generation of noun
phrases as descriptions of domain individuals, in
our example TRUCK2. In this case the NP-generator
decides not to generate a definite description,
since neither the system nor the user has already
referred to TRUCK2 in the previous dialog and the
existence of TRUCK2 as a moving object is not
implied by the existential assumptions supplied by
the a priori user model (cf. [7]). Instead, the
indefinite NP 'a light-colored truck' is
generated, using the property 'light-colored' as an
initial characterization.
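The choice between a definite and an indefinite description can be
sketched as a simple rule over the dialog memory and the user model. This
is a hedged Python illustration of the two conditions named above; the
function describe and its parameters are hypothetical, and the actual
NP-generator of HAM-ANS is certainly more elaborate.

```python
from typing import Set


def describe(obj: str, noun: str, prop: str,
             mentioned: Set[str], assumed_by_user: Set[str]) -> str:
    """Return a definite or an indefinite noun phrase for obj.

    A definite description is used only if the object was already
    referred to in the dialog or its existence is implied by the
    (assumed) a priori user model; otherwise an indefinite NP with an
    initial characterizing property is produced.
    """
    if obj in mentioned or obj in assumed_by_user:
        return f"the {noun}"
    return f"a {prop} {noun}"


first_mention = describe("TRUCK2", "truck", "light-colored", set(), set())
later_mention = describe("TRUCK2", "truck", "light-colored", {"TRUCK2"}, set())
```

In the example interaction neither condition holds for TRUCK2, so the
indefinite branch is taken.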
Finally the 'surface transformation' component [1]
pronominalizes the noun 'truck', establishes the
standard word order of the utterance, and produces
the correctly inflected forms of the canonical words.
5. CONCLUSION
We have attempted to show that case role filling 
for the construction of an unmarked extended 
response can be regarded as a side effect of the 
visual search necessary to answer questions refer- 
ring to a visually present domain of discourse. A 
new method for the representation of the referen- 
tial semantics associated with locomotion verbs 
has been presented in the framework of object- 
oriented programming based on the Flavor system.
The approach presented has been useful in extend- 
ing the communicative capabilities of the dialog 
system HAM-ANS as an interface to a vision system.
REFERENCES 
[1] BUSEMANN, S.: Problems involving the automatic
    generation of utterances in German. Memo ANS-8,
    Research Unit for Information Science and AI,
    Univ. of Hamburg, April 1982.
[2] DI PRIMIO, F., CHRISTALLER, T.: A poor man's
    flavor system. Working paper No. 47, ISSCO,
    Univ. de Geneva, 1983.
[3] FILLMORE, C. J.: The case for case. In: Bach,
    E., Harms, R. T. (eds.): Universals in
    linguistic theory. Holt, Rinehart & Winston,
    1968, pp. 1-88.
[4] HENDRIX, G. G.: Semantic aspects of translation.
    In: Walker, D. E. (ed.): Understanding spoken
    language. New York, North-Holland, 1978,
    pp. 193-228.
[5] HOEPPNER, W.: ATN-Steuerung durch Kasusrahmen.
    In: Wahlster, W. (ed.): GWAI-82. Proc. 6th German
    Workshop on AI. Berlin: Springer, 1982,
    pp. 215-226.
[6] HOEPPNER, W., CHRISTALLER, TH., MARBURGER, H.,
    MORIK, K., NEBEL, B., O'LEARY, M., WAHLSTER, W.:
    Beyond domain independence: Experience with the
    development of a German language access system
    to highly diverse background systems. In: Proc.
    8th IJCAI, Karlsruhe 1983, pp. 588-594.
[7] JAMESON, A., WAHLSTER, W.: User modelling in
    anaphora generation: Ellipsis and definite
    description. In: Proc. ECAI-82, Orsay 1982,
    pp. 222-227.
[8] MARBURGER, H., NEBEL, B.: Natuerlichsprachlicher
    Datenbankzugang mit HAM-ANS: Syntaktische
    Korrespondenz, natuerlichsprachliche
    Quantifizierung und semantisches Modell des
    Diskursbereichs. In: Kupka, I. (ed.): GI-13.
    Jahrestagung. (To appear)
[9] NEUMANN, B.: Towards natural language
    description of real-world image sequences.
    In: Nehmer, J. (ed.): GI-12. Jahrestagung.
    Berlin: Springer, 1982, pp. 349-358.
[10] ROBERTS, R. B., GOLDSTEIN, I. P.: The FRL
    manual. AI Memo 409, AI Lab., MIT, Cambridge,
    1977.
[11] WAHLSTER, W., MARBURGER, H., JAMESON, A.,
    BUSEMANN, S.: Over-answering yes-no questions:
    Extended responses in a NL interface to a
    vision system. In: Proc. 8th IJCAI, Karlsruhe
    1983, pp. 643-646.
[12] WEBBER, B., JOSHI, A., MAYS, E., MCKEOWN, K.:
    Extended natural language database interaction.
    In: Int. J. Computers and Mathematics,
    Spring 1983.
[13] WEINREB, D., MOON, D.: Lisp Machine Manual
    (4th ed.). MIT, 1981.
