COLING 82, J. Horecký (ed.)
North-Holland Publishing Company
© Academia, 1982
COGNITIVE MODELS FOR COMPUTER VISION 
G.Adorni A.Boccalatte M.Di Manzo 
INSTITUTE of ELECTROTECHNICS 
UNIVERSITY of GENOA 
GENOA, ITALY 
This paper focuses on the relations between language and vision. Its
goal is to discuss how linguistic information about objects, shapes,
positions and spatial relations with other objects can be integrated
into a cognitive model tailored to spatial inferencing operations.
INTRODUCTION 
A common approach to the problem of scene interpretation is to
generate hypotheses about the position and size of objects and to use
these expectations to guide the search for picture areas which exhibit
the expected features [4,8,15]. But where do these expectations come
from? If a robot operates in a known environment, expectations can be
self-generated on the basis of built-in knowledge and previously
experienced situations. Another very common source of information is
some kind of external input, often based on natural language
communication. A piece of conversation such as "look for the pencil",
"where?", "on the table" conveys a lot of information about the
presence of a reference object (table) and the characteristics of a
surface (top of table) which must be located in order to restrict the
search for the target object (pencil). To take advantage of these
linguistic information sources we must be able to extract from a
qualitative expression like "on the table" all those quantitative
constraints which are relevant from a geometric modelling point of
view [2]. These problems could seem much more related to the
generation of visual analog representations than to the understanding
of a scene; but what exactly does it mean to "understand" a scene?
When we analyze a scene, we use a lot of non-geometric knowledge; we
are not surprised to find smoked cigarettes in an ashtray, and a
glance is enough to classify them, but we could have some trouble
recognizing that it contains a school of goldfish, and this surely not
only because of geometric constraints! Therefore the processing of
visual knowledge must be based on cognitive models that are able to
handle different kinds and sources of information, and in this sense
we feel that there is no clear-cut distinction between scene analysis
and scene generation [14,16].
In the following we deal mainly with the representation of objects
and the formalization of spatial relationships, trying to point out
how linguistic information can be related to visual information.
* Work supported by Italian National Research Council under grant 80.01142.07 
OBJECT DESCRIPTION AND SPACE MODELLING 
The knowledge of the structure of an object is often intimately
related to our capability of understanding the meaning of a spatial
relationship; for instance, the meaning of the sentence "the cat is
under the car" is clear, even if it may depend on the state of the
car, moving or parked; on the contrary, the sentence "the cat is under
the wall" is not clear, unless the wall has collapsed or has a very
particular shape. Every object modelling technique must deal at least
with the following issues [6,7]:
1. Objects must be described at several levels of detail. To
   understand the sentence "put the chair near the table" a rough
   definition of chair and table dimensions can be sufficient, while
   to build a model of "a man sitting on a chair" a more sophisticated
   knowledge about the structure of a chair and a man is required.
2. The articulation of movable object parts must be properly
   described. The sentences "open the door" and "open the drawer" have
   different geometric meanings because the movements of doors and
   drawers usually obey different rules.
3. Characteristic features of objects must be pointed out. Often these
   features are free surfaces, such as the top of a table, in
   canonical positions. The recognition of a feature allows the
   generation of hypotheses about the presence of an object.
4. Typical relations between objects must be described. When we look
   for a pencil we do not start by analyzing a wall or a window, but
   we look first for a table or some other piece of furniture in which
   or on which it is reasonable to find a pencil.
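Issue 4 can be illustrated with a small lookup of typical locations
that orders the search for a target object. This is only a minimal
sketch and not the representation used by the authors: the table
`TYPICAL_LOCATIONS`, its entries and the function `search_order` are
hypothetical names introduced for illustration.

```python
from typing import Dict, List

# Hypothetical table: objects in or on which a target is typically found.
TYPICAL_LOCATIONS: Dict[str, List[str]] = {
    "pencil": ["table", "drawer"],
}


def search_order(target: str, visible: List[str]) -> List[str]:
    """Put typical locations of `target` first, then every other visible object."""
    typical = TYPICAL_LOCATIONS.get(target, [])
    preferred = [obj for obj in typical if obj in visible]
    others = [obj for obj in visible if obj not in typical]
    return preferred + others


# The table is examined before the wall and the window.
print(search_order("pencil", ["wall", "window", "table"]))
```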
Our conceptual definition language allows the definition of lines,
surfaces and solid objects. Solid objects are described by means of
GENERALIZED CONES [9,10], at several levels of detail. Cones can be
interconnected by means of fixed or movable points, with arbitrary
constraints on rotations and shifting. Specific jointing elements are
defined to properly describe the surface of an articulated object; so
we can correctly answer the question "is the fly on the snake?"
independently of how the snake is actually coiled. More details can be
found in [1].
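One possible way to hold such a description in memory is sketched
below. It is an illustration under strong assumptions, not the
authors' definition language: the class names `Cone`, `Joint` and
`ObjectModel`, the circular cross-section and the numeric values are
all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Cone:
    """One generalized cone, simplified here to a length and a circular radius."""
    name: str
    length: float
    radius: float
    detail_level: int = 0  # 0 is the roughest description


@dataclass
class Joint:
    """Connection between two cones; a movable joint carries rotation limits."""
    parent: str
    child: str
    movable: bool = False
    rotation_limits_deg: Tuple[float, float] = (0.0, 0.0)


@dataclass
class ObjectModel:
    name: str
    cones: List[Cone] = field(default_factory=list)
    joints: List[Joint] = field(default_factory=list)

    def at_level(self, level: int) -> List[Cone]:
        """Return the cones that describe the object at one level of detail."""
        return [c for c in self.cones if c.detail_level == level]


chair = ObjectModel(
    "chair",
    cones=[Cone("bounding", 0.9, 0.3, 0),
           Cone("seat", 0.05, 0.25, 1),
           Cone("back", 0.45, 0.2, 1)],
    joints=[Joint("seat", "back")],  # fixed connection
)
door = ObjectModel(
    "door",
    cones=[Cone("frame", 2.0, 0.5, 0), Cone("leaf", 2.0, 0.45, 1)],
    joints=[Joint("frame", "leaf", movable=True, rotation_limits_deg=(0.0, 90.0))],
)
```

The rough level is enough for "put the chair near the table", while
the finer level and the joints are needed for "a man sitting on a
chair" or "open the door".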
From a computational point of view, the use of a system of coordinate
axes represents a very natural way to describe the position of an
object. If we are able to transform linguistic relations into
quantitative geometrical ones, the well-known methodologies of
analytical geometry can be used as a simple, general-purpose set of
inferencing rules. Hence the goal of describing objects and spatial
relations by means of a simple, non-redundant n-tuple of coordinate
axes is very appealing. Unfortunately it seems quite far from the
psychology of language [2].
Therefore we associate a redundant FRAME OF REFERENCE (FOR) with every
object, consisting of:
- an axis, Z, having the direction of the "major" axis of the cone.
  Two points are specified on it, Zmin and Zmax, corresponding to the
  extremities of this major axis;
- a point O, on the Z axis, which is the origin of the frame;
- an axis, X, orthogonal to Z, that specifies a further privileged
  direction of the object; this axis is definable only for some
  objects (e.g. a man) in which a front and a back can be
  distinguished. Objects for which the X axis is definable are called
  CLASS 1 objects; those for which the X axis is not definable (e.g. a
  pole) are called CLASS 2 objects;
- an axis, Y, orthogonal to X and Z. The Y axis is obviously not
  definable for class 2 objects;
- a radial coordinate ~ whose origin is at O;
- the coordinates ~ and ~ specified on the X-Y plane;
- a curvilinear coordinate t originating at point O.
The use of cones simplifies the FOR; it allows a homogeneous
representation of an object's shape and of its spatial relations with
the external world; it proves particularly useful in situations like
"the ball is inside the box".
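The axis part of the FOR and the class 1 / class 2 distinction can be
sketched as a small data structure. This is a sketch under
assumptions: the name `FrameOfReference`, the field names and the
right-handed choice Y = Z x X are not taken from the paper, and the
radial and curvilinear coordinates are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec = Tuple[float, float, float]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


@dataclass
class FrameOfReference:
    origin: Vec                    # point O on the Z axis
    z_axis: Vec                    # direction of the major axis of the cone
    z_min: float                   # extremities of the major axis
    z_max: float
    x_axis: Optional[Vec] = None   # front direction; None for class 2 objects

    def __post_init__(self) -> None:
        if self.x_axis is not None and abs(dot(self.x_axis, self.z_axis)) > 1e-9:
            raise ValueError("X must be orthogonal to Z")

    @property
    def object_class(self) -> int:
        """1 if a front and a back can be distinguished, else 2."""
        return 1 if self.x_axis is not None else 2

    @property
    def y_axis(self) -> Optional[Vec]:
        """Y is defined only for class 1 objects (assumed right-handed)."""
        if self.x_axis is None:
            return None
        return cross(self.z_axis, self.x_axis)


man = FrameOfReference((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.0, 1.8,
                       x_axis=(1.0, 0.0, 0.0))
pole = FrameOfReference((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.0, 3.0)
```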
SPATIAL RELATIONS BETWEEN OBJECTS 
Let us now analyze some spatial relations between objects, in order to
discuss how they can be translated into geometrical primitives.
Spatial relations involving the Z axis generally use a "major" axis
perpendicular to the earth's surface; this is the only absolute
reference used in language, perhaps because the concept of "high" and
"low" is directly related to the line of action of the force of
gravity. Therefore the sentence "the object A is above the object B"
can be conceptualized as:
// ∃ P-point ∈ CONE(A), Q-point ∈ CONE(B) : X(P)=X(Q), Y(P)=Y(Q),
Z(P) ≥ Z(Q) | FOR does not require further specification //
Note that we can state conditions only for pairs of points whose
horizontal projections are the same. In fact, even though the "pure"
meaning of "above" is much more constraining [4,13], this relationship
is used in a number of "impure" meanings, in which we cannot say that
the horizontal projection of A is included in the horizontal
projection of B (Fig. 1a), or that Z(P) ≥ Z(Q) for any pair of points
P ∈ CONE(A) and Q ∈ CONE(B) (Fig. 1b).
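The existential form of this definition can be checked directly on
sampled points. The sketch below assumes that each cone is given as a
finite list of (x, y, z) points in a common frame and that the
inequality reads Z(P) ≥ Z(Q); the function name `is_above` and the
tolerance are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def is_above(cone_a: Sequence[Point], cone_b: Sequence[Point],
             tol: float = 1e-6) -> bool:
    """True if some P in A and Q in B share a horizontal projection and Z(P) >= Z(Q)."""
    for px, py, pz in cone_a:
        for qx, qy, qz in cone_b:
            same_projection = abs(px - qx) <= tol and abs(py - qy) <= tol
            if same_projection and pz >= qz:
                return True
    return False


lamp = [(0.0, 0.0, 2.0)]
table = [(0.0, 0.0, 0.8), (1.0, 0.0, 0.8)]
shifted_lamp = [(5.0, 5.0, 2.0)]
```

Because only one pair of points is required, the test also accepts the
"impure" uses in which A merely overlaps B in projection.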
The preposition "on" is often synonymous with "above", but in some
cases it can mean "below", as in "on the ceiling", or involve
horizontal relations, as in "the lamp is on the wall". Usually "A on
B" requires B to support A against the action of gravity, by means of
some kind of physical contact. Hence, the conceptualization of "a man
on a chair" is the same as "a man above a chair", plus an assertion
about physical contact and supporting action:
// ∃ P-point ∈ CONE(MAN), Q-point ∈ CONE(CHAIR) : X(P)=X(Q),
Y(P)=Y(Q), Z(P) ≥ Z(Q) | CONE(CHAIR) applies a force to the
CONE(MAN) | FOR does not require further specification //
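A hedged sketch of the "above plus contact plus support" reading
follows. It covers only the vertical sense of "on"; the support
assertion is not geometric, so it is passed in as a flag. The name
`is_on`, the flag `b_supports_a` and the contact tolerance are
assumptions made for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def is_on(cone_a: Sequence[Point], cone_b: Sequence[Point],
          b_supports_a: bool, tol: float = 1e-3) -> bool:
    """'A on B' in the 'above' sense: shared projection, contact, and support."""
    in_contact = any(
        abs(px - qx) <= tol and abs(py - qy) <= tol and 0.0 <= pz - qz <= tol
        for px, py, pz in cone_a
        for qx, qy, qz in cone_b
    )
    return in_contact and b_supports_a


chair_seat = [(0.0, 0.0, 0.45)]
sitting_man = [(0.0, 0.0, 0.45), (0.0, 0.0, 1.2)]
hovering_man = [(0.0, 0.0, 0.6), (0.0, 0.0, 1.35)]
```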
Horizontal relations are much more ambiguous. Sometimes the FOR is
explicitly stated, as in "looking at the church, the post office is on
your right"; otherwise a default assumption is to use the FOR
associated with the speaker or the listener.
If we consider the sentence "the object A is behind the object B", two
interpretations are possible:
a) FOR is the n-tuple associated with the object B;
b) FOR is external to both objects A and B.
Case a can be assumed only if B is a class 1 object; case b is always
assumed when B is a class 2 object, but it is not unusual even when B
is a class 1 object. In case a the previous sentence is conceptualized
as follows:
// ∃ P-point ∈ CONE(A), Q-point ∈ CONE(B) : Y(P)=Y(Q), X(P) ≤ X(Q) |
FOR associated with CONE(B) (i.e. FOR ⊂ CONE(B)) //
This definition and the next one allow us to handle situations such as
those shown in Fig. 2a-b; the situation of Fig. 2c does not represent
a proper use of "behind"; if such a preposition is used, more
inferencing capabilities are needed. In case b the previous sentence
means that B is (partially) hiding A from an observer, who can be
assumed to be one of the actors in the story; hence the conceptual
representation is:
// ∃ P-point ∈ CONE(A), Q-point ∈ CONE(B) : Y(P)=Y(Q), X(P) > X(Q) |
X(P),X(Q)•~,Y(P),Y(Q)~@ | FOR ⊃ K-point ∉ (CONE(A) or CONE(B)) //
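The two readings of "behind" can give different answers for the same
scene, as the following sketch shows. It works in the horizontal plane
only and rests on assumptions: in case a the X axis of B points to its
front and the comparison is taken as X(P) ≤ X(Q); in case b the X axis
is the viewing direction of an observer K. All function names are
hypothetical.

```python
import math
from typing import Sequence, Tuple

Point2 = Tuple[float, float]


def to_frame(p: Point2, origin: Point2, x_dir: Point2) -> Point2:
    """Coordinates of p in a frame with the given origin and X direction."""
    dx, dy = p[0] - origin[0], p[1] - origin[1]
    norm = math.hypot(x_dir[0], x_dir[1])
    ux, uy = x_dir[0] / norm, x_dir[1] / norm
    return (dx * ux + dy * uy, -dx * uy + dy * ux)


def behind_object_centred(a: Sequence[Point2], b: Sequence[Point2],
                          b_origin: Point2, b_front: Point2,
                          tol: float = 1e-6) -> bool:
    """Case a: FOR attached to B; some P, Q with Y(P)=Y(Q) and X(P) <= X(Q)."""
    for p in a:
        xp, yp = to_frame(p, b_origin, b_front)
        for q in b:
            xq, yq = to_frame(q, b_origin, b_front)
            if abs(yp - yq) <= tol and xp <= xq:
                return True
    return False


def behind_observer_centred(a: Sequence[Point2], b: Sequence[Point2],
                            observer: Point2, view_dir: Point2,
                            tol: float = 1e-6) -> bool:
    """Case b: FOR at an external observer K; X(P) > X(Q) along the line of sight."""
    for p in a:
        xp, yp = to_frame(p, observer, view_dir)
        for q in b:
            xq, yq = to_frame(q, observer, view_dir)
            if abs(yp - yq) <= tol and xp > xq:
                return True
    return False


car = [(0.0, 0.0)]        # class 1 object facing +X
cat_in_front = [(2.0, 0.0)]
cat_at_back = [(-2.0, 0.0)]
```

With an observer standing at (-5, 0) and looking along +X, the cat at
(2, 0) is hidden by the car and so is "behind" it in case b, although
it is in front of the car in case a.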
[FIG. 1 (a, b) and FIG. 2 (a, b): diagrams with axes X and Y]
Let us now consider relations such as "on the edge of", "on the
surface of", "in the middle of" and so on. For every point P on the
surface of the cone which describes the object A, it is possible to
find the corresponding cross-section, which is characterized by a
value Z of the coordinate along the cone axis. The boundary of this
section is described by a radial coordinate ~((~,~.). Therefore the
sentence "the pen is in the middle of the table" can be conceptualized
as follows, assuming as reference the cross-section of the table cone
which corresponds to the table top:
// ∃ P-point ∈ CONE(PEN), Q-point ∈ CONE(TABLE) : &(~,Z)m~,
Z(P) = Zmax(Q) | CONE(TABLE) applies a force to the CONE(PEN) |
FOR ⊂ CONE(TABLE) //
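One way to turn "in the middle of" into a numeric test is sketched
below. It is an interpretation, not the paper's radial condition: it
assumes that the boundary of the table-top cross-section is available
as a function of an angle and of Z, and that "middle" means a radial
distance below some fraction of that boundary. The names
`in_the_middle`, `boundary` and `fraction` are hypothetical.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float, float]


def in_the_middle(p: Point, boundary: Callable[[float, float], float],
                  z_max: float, fraction: float = 0.25,
                  tol: float = 1e-6) -> bool:
    """True if p lies at height z_max and close to the cone axis.

    `boundary(theta, z)` returns the radius of the cross-section outline
    in direction theta at height z, in the frame of the supporting cone.
    """
    x, y, z = p
    if abs(z - z_max) > tol:
        return False               # not resting on the reference section
    rho = math.hypot(x, y)         # radial distance from the Z axis
    theta = math.atan2(y, x)
    return rho <= fraction * boundary(theta, z)


def round_table(theta: float, z: float) -> float:
    """Circular table top of radius 0.6 (illustrative value)."""
    return 0.6


TABLE_TOP = 0.75
```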
Let us conclude by looking at sentences such as "the house is before
the bridge", "two miles after the lights" and so on. In these cases
spatial relations are referred to a path, usually not straight. This
type of relation can be conceptualized using a curvilinear coordinate
t associated with a trajectory s originating in the center of the FOR.
If the analytical description of such a trajectory is unknown, the
robot will be able to make inferences only about the relative
positions of objects along the path; so, for instance, from the
sentence "the house is two miles after the bridge along the road to
Florence" it is possible to deduce that a man walking towards Florence
will meet first the bridge and then the house, after a time that can
be estimated. If more information is available (e.g. the path is a
road and the map of the town is known), the position relative to other
FORs can be evaluated from the actual value of t, in order to infer
that "two miles after the bridge" means exactly "on the right of the
station". The formal description of "the object A is after the object
B" is:
// ∃ P-point ∈ CONE(A), Q-point ∈ CONE(B) : P ∈ s-trajectory,
Q ∈ s-trajectory, t(P) ≥ t(Q) | s-trajectory starts from CONE(B) //
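When the trajectory is known, the curvilinear coordinate can be
computed and compared. The sketch below assumes that the path is given
as a 2-D polyline, that t is the arc length of the closest point on
the path, and that "after" is a strict comparison; the function names
are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point2 = Tuple[float, float]


def curvilinear_coordinate(path: Sequence[Point2], p: Point2) -> float:
    """Arc length t, measured from the start of `path`, of the path point closest to p."""
    best_t, best_dist = 0.0, math.inf
    travelled = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        sx, sy = x1 - x0, y1 - y0
        seg_len = math.hypot(sx, sy)
        if seg_len == 0.0:
            continue
        # Projection parameter of p on the segment, clamped to its ends.
        u = ((p[0] - x0) * sx + (p[1] - y0) * sy) / (seg_len * seg_len)
        u = max(0.0, min(1.0, u))
        dist = math.hypot(p[0] - (x0 + u * sx), p[1] - (y0 + u * sy))
        if dist < best_dist:
            best_dist, best_t = dist, travelled + u * seg_len
        travelled += seg_len
    return best_t


def is_after(path: Sequence[Point2], a: Point2, b: Point2) -> bool:
    """'A is after B' along the path: t(A) > t(B)."""
    return curvilinear_coordinate(path, a) > curvilinear_coordinate(path, b)


road = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]   # a road with one bend
bridge = (1.0, 0.2)
house = (4.1, 2.0)
```

Only the ordering of t values is needed to conclude that a traveller
meets the bridge first and the house later; the difference of the two
values gives the distance along the road.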
Finally, we should discuss how to quantify all the inequalities which
result from the previously analyzed conceptualizations. Such a
quantification can be considered as a special case of spatial
inference, which unfortunately we cannot introduce here because of
lack of space. An attempt to classify inferences can be found in [1].
CONCLUSIONS 
The problem of robotic vision has only been sketched in this paper.
Even if more detailed analyses of some particular objects can be found
in the literature [3,7,10,13], vision is still a substantially open
problem. A number of basic questions, for example the representation
of objects with variable shapes, the use of knowledge about the
expected goals of an actor to infer its future movements, and the
proper linking of cognition with image-processing procedures, are
still waiting for a suitable answer. However, these topics are
receiving more and more attention, both because of the impact that an
advanced, integrated vision-manipulation system could have on the
applications of robotics, and because artificial intelligence people
are aware that there is a large number of linguistic problems that
cannot be solved if this perception capability is not achieved.
The work described in this paper is part of a large project, whose
goal is the development of a cognitive background, based on conceptual
dependency and related concepts [11,12], for an integrated
vision-manipulation system.
[FIG. 2 - c: diagram of object A with axes Z and Y]

REFERENCES 

[1] Adorni, G., Boccalatte, A., Di Manzo, M., Object representation
    and spatial knowledge: an insight into the problem of men-robot
    communication, 7th Conf. Canadian Man-Computer Communication
    Society, Waterloo, Canada (June 1981).

[2] Adorni, G., Di Manzo, M., Some considerations about a conceptual
    model oriented to the representation of spatial relationship (in
    Italian), National Research Council ITD-O45, Genoa, Italy (March
    1980).

[3] Agin, G.J., Vision systems for inspection and for manipulation
    control, J. Automatic Control Conf., S. Francisco, CA, USA (June
    1977).

[4] Hanson, A., Riseman, E. (eds.), Computer vision systems (Academic
    Press, New York, 1978).

[5] Boggess, L.C., Computational interpretation of English spatial
    preposition, Tech-Rep. T-75, Coordinated Science Laboratory, Univ.
    of Illinois (February 1979).

[6] Kuipers, B.J., Modelling spatial knowledge, 5th Int. J. Conf. on
    A.I., Cambridge, MA, USA (August 1977).

[7] Lehnert, W.G., Representing physical objects in memory, Res-Rep.
    131, Dept. of Comp. Sc., Yale Univ. (May 1978).

[8] Mackworth, A.K., Havens, W.S., Structuring domain knowledge for
    visual perception, 7th Int. J. Conf. on A.I., Vancouver, Canada
    (August 1981).

[9] Marr, D., Nishihara, H.K., Representation and recognition of the
    spatial organization of 3-D shapes, Proc. Royal Soc. Lond. B
    (1978).

[10] Nevatia, R., Computer analysis of scene of 3-D curved objects
     (Birkhauser Verlag, Basel, 1976).

[11] Schank, R.C. (ed.), Conceptual information processing
     (North-Holland, Amsterdam, 1975).

[12] Schank, R.C., Abelson, R.P., Scripts Plans Goals and
     Understanding (Lawrence Erlbaum, Hillsdale, 1977).

[13] Waltz, D.L., Relating images concepts and words, NSF Workshop on
     the representation of 3-D objects, Univ. of Pennsylvania,
     Philadelphia (1979).

[14] Waltz, D.L., Boggess, L.C., Visual analog representation for
     natural language understanding, 6th Int. J. Conf. on A.I., Tokyo,
     Japan (August 1979).

[15] Weymouth, T.E., Experiments in knowledge-driven interpretation of
     natural scene, 7th Int. J. Conf. on A.I., Vancouver, Canada
     (August 1981).

[16] Winston, P.H. (ed.), The psychology of computer vision (McGraw
     Hill, New York, 1975).
