From route descriptions to sketches: 
a model for a text-to-image translator 
Lidia Fraczak 
LIMSI-CNRS, bât. 508, BP 133
91403 Orsay cedex, France 
fraczak@limsi.fr 
Abstract 
This paper deals with the automatic trans- 
lation of route descriptions into graphic 
sketches. We discuss some general prob- 
lems implied by such inter-mode transcrip- 
tion. We propose a model for an automatic 
text-to-image translator with a two-stage 
intermediate representation in which the 
linguistic representation of a route descrip- 
tion precedes the creation of its conceptual 
representation. 
1 Introduction 
Computer text-image transcription has lately be- 
come a subject of interest, prompting research on 
relations between these two modes of representa- 
tion and on possibilities of transition from one to 
the other. Different types of text and of images 
have been considered, for example: narrative text 
and motion pictures (Kahn, 1979; Abraham and Desclés,
1992), spatial descriptions and 3-dimensional
sketches (Yamada et al., 1992; Arnold and Lebrun,
1992), 2-dimensional spatial scenes and linguistic descriptions
(André et al., 1987), 2-dimensional image
sequences and linguistic reports (André et al., 1988).
Linguistic and pictorial modes may be considered 
as complementary since they are capable of convey- 
ing different kinds of content (Arnold, 1990). This 
complementarity of expression is explored in order to 
be used in multi-modal systems for human-computer 
interaction such as computer-assisted architectural
design (Arnold and Lebrun, 1992). Such systems
should not only use different modes to ensure
better communication, but should also be able to 
pass from one to the other. Given the differences 
in capacities of these two means of expression, one 
may expect some problems in trying to encode into 
a picture the information contained in a linguistic 
description. 
The present research is concerned with route 
descriptions (RDs) and their translation into 2- 
dimensional graphic sketches. We deal with a type 
of discourse whose informational content may seem 
quite easy to represent in a graphic mode. In every- 
day communication situations, verbal RDs are often 
accompanied by sketches, thus participating in a 2- 
mode representation. A sketch can also function as 
a route representation by itself. 
We will first outline some problems that may ap- 
pear while translating descriptions into graphics. 
Then we will describe our general model for an auto- 
matic translator and some aspects of the underlying 
knowledge representation. 
2 Some translation problems 
Our first approach to translating RDs into graphic
maps consisted in manually transcribing linguistic
descriptions into sketches. In doing so, we encountered
several problems, some of which we will try to
illustrate through the following example, taken from
the French corpus of Gryl (1992).
Example 2.1 A la sortie des tourniquets du RER
tu prends sur ta gauche. Il y a une magnifique descente
à prendre. Puis tu tournes à droite, tu tombes
sur une série de panneaux d'informations. Tu continues
tout droit en longeant les terrains de tennis
et tu tombes sur le bâtiment A. 1
In the description above we can observe some
ambiguities, or incompleteness of information, which 
may be a problem for a graphic depiction. The 
most striking case is the information about the ten- 
nis courts: we do not know on which side of the path, 
right or left, they are located. 
1 At the turnstiles of the RER station you turn left. 
There is a steep (a magnificent) downgrade to take. 
Then you turn right, you come across a series of sign 
posts. You continue straight on, passing alongside the 
tennis courts, and you come to building A. 
There is also another kind of ambiguity due to 
the fact that in an RD the whole path does not
have to be "linguistically covered". Consider the 
fragment about turning to the left ("tu prends sur 
ta gauche") and the downgrade ("descente"). It 
is difficult to judge whether the downgrade is lo- 
cated right after the turn, or "a little further". The 
same question holds for the right turn ("puis tu 
tournes à droite") and the sign posts ("panneaux
d'informations"): should the posts be represented 
as immediately following the turning point (as ex- 
pressed in the text) or should there be a path be- 
tween them? This kind of ambiguity is not really 
perceived unless we want to derive a graphic repre- 
sentation of the route. The information is complete 
enough for a real-life situation of finding one's way.
Another kind of problem concerns the "magnifique 
descente". It would not be easy to represent a slope 
in a simple sketch and, even less so, its characteristic 
of being steep, which the French word "magnifique" 
suggests in this context. This time the incompleteness of
information occurs on the graphic side,
since not all properties of the described element can
be expressed in this mode.
Such transcription constraints, once defined and 
analyzed, should be taken into account in order to 
obtain a "faithful" graphic representation. It seems 
that, in some cases, verbal-side incompleteness prob- 
lems might be solved thanks to some relevant linguis- 
tic markers, as well as to the knowledge included 
in the conceptual model of the route. We are thinking
here in particular of the question whether there is
a significant stretch of path between two elements
of the environment (landmarks), or between a turn and a
landmark, that are mentioned in the text immediately one after
the other. Concerning the ambiguity related to the
location of landmarks, one can either choose an ar- 
bitrary value or try to find a way of preserving the 
ambiguity in the graphic mode itself. 
We have mentioned here only some of the prob- 
lems concerning the translation of RDs into graphic 
sketches. We have not considered those parts of 
linguistic description contents which are not repre- 
sentable by images, such as comments or evaluations 
(e.g. "you can't miss it"; "it's very simple"). 
3 Steps of the translation process 
Translating linguistic utterances into a pictorial code 
cannot be done without an intermediate representa- 
tion, that is, a conceptual structure that bridges the 
gap between these two expression modes (Arnold, 
1990). Abraham and Desclés (1992) talk about the
necessity of creating a common semantics for the two 
modes. 
In our case, the purpose of the intermediate repre- 
sentation is to extract from the linguistic description 
the information concerning the route with the aim of 
representing it in the form of a sketch. However, instead
of trying to create a single "super-structure",
we envisage a dual representation, with the linguistic 
and the conceptual levels. The core of the process of 
translating RDs into graphic maps will thus consist 
in the transition from the linguistic representation 
to the conceptual one. 
For the sake of the linguistic representation, we 
thought it necessary to carry out an analysis of real 
examples and elaborate a linguistic model of this 
particular type of discourse. We have worked on a 
corpus of 60 route descriptions in French. The anal- 
ysis has been performed at two levels: the global 
level and the local level. Global analysis consisted 
in dividing descriptions into global units, defined 
as sequences and connections, and in categorizing 
these units on a functional and thematic basis. We 
have thus specified several categories of route description
sequences, the main ones being action prescriptions
(e.g. "tu continues tout droit") and landmark
indications (e.g. "tu tombes sur le bâtiment
A."). 2 The inter-sequence connections (e.g. "puis", 
"quand", "ou": "then", "when", "or"), which mark 
the relationships between sequences or groups of se- 
quences, have been categorized according to their 
functions (e.g. succession, anchorage, alternative). 
Local analysis consisted in the determination of se- 
mantic sub-units of descriptions and in the definition 
of the content of different sequences with respect to 
these sub-units. The latter make it possible, during the
processing of an RD, to extract and represent information
concerning actions and landmarks, and their
attributes. Thus, one of the objectives of local anal- 
ysis has been to determine which types of verbs in 
the RD express travel actions and which ones serve 
to introduce landmarks. The sub-units have been 
further analyzed and divided into types (e.g. differ- 
ent types of actions). 
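
As an illustration only, the linguistic level described above might be encoded along the following lines. This is a minimal sketch and not the representation actually used in our system: the class names, the two verb lexicons and the field names are hypothetical, and the pairing of each connective with a function simply assumes that the examples cited above are listed in parallel order.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SequenceCategory(Enum):
    ACTION_PRESCRIPTION = "action prescription"
    LANDMARK_INDICATION = "landmark indication"


class ConnectionFunction(Enum):
    SUCCESSION = "succession"
    ANCHORAGE = "anchorage"
    ALTERNATIVE = "alternative"


# Assumed pairing of the connectives and functions cited in the text.
CONNECTIONS = {
    "puis": ConnectionFunction.SUCCESSION,
    "quand": ConnectionFunction.ANCHORAGE,
    "ou": ConnectionFunction.ALTERNATIVE,
}

# Toy verb lexicons (hypothetical): verbs expressing travel actions
# versus verbs introducing landmarks.
ACTION_VERBS = {"continuer", "tourner", "prendre", "aller"}
LANDMARK_VERBS = {"tomber sur", "il y a"}


@dataclass
class Sequence:
    text: str
    verb: str                         # lemma, assumed to come from an upstream analysis
    connection: Optional[str] = None  # connective linking this sequence to the previous one

    def category(self) -> Optional[SequenceCategory]:
        """Categorize the sequence by the type of its verb."""
        if self.verb in LANDMARK_VERBS:
            return SequenceCategory.LANDMARK_INDICATION
        if self.verb in ACTION_VERBS:
            return SequenceCategory.ACTION_PRESCRIPTION
        return None

    def connection_function(self) -> Optional[ConnectionFunction]:
        """Return the function of the connective, if one is present and known."""
        if self.connection is None:
            return None
        return CONNECTIONS.get(self.connection)
```

With these toy lexicons, "tu continues tout droit" (verb "continuer") would be categorized as an action prescription and "tu tombes sur le bâtiment A" (verb "tomber sur") as a landmark indication.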
For the purpose of the conceptual representation 
of RDs, we need a prototypical model of their refer- 
ent which is the route. We have decomposed it into 
a path and landmarks. A path is made up of trans- 
fers and relays. Relays are abstract points initiating 
transfers and may be "covered" by a turn. Land- 
marks can be either associated with relays or with 
transfers. More formally, a route is structured into 
a list of segments, each segment consisting of a re- 
lay and of a transfer. Landmarks are represented as 
possible attributes (among others) of these two elements.
2 Cf. Example 2.1
Having such a prototype for routes, with all
elements defined in terms of attribute-value pairs, 
it is relatively easy to reconstruct the route described
by the linguistic input: the reconstruction
consists in recognizing the relevant elements and in 
assigning values to their attributes. Using the route 
model, some elements missing in the text can be 
inferred. For example, since every route segment 
contains one relay (which may be a turn) and one 
transfer, the information concerning the fragment of 
the route expressed by: "tournez à gauche et puis
à droite" ("turn to the left and then to the right"),
must be completed by adding a transfer between the 
two turns. 
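
The route prototype and the inference of missing elements can be sketched as follows. The code is a simplified illustration under stated assumptions, not our implementation: the names `Relay`, `Transfer`, `Segment` and `build_route` are hypothetical, attributes are reduced to a turn direction and a list of landmark labels, and we assume in addition that a transfer mentioned without a preceding relay receives an abstract relay and that a final relay is completed by an inferred transfer.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Relay:
    turn: Optional[str] = None   # e.g. "left" or "right"; None if no turn covers the relay
    landmarks: List[str] = field(default_factory=list)


@dataclass
class Transfer:
    landmarks: List[str] = field(default_factory=list)
    inferred: bool = False       # True when the transfer is not mentioned in the text


@dataclass
class Segment:
    relay: Relay
    transfer: Transfer


def build_route(elements: List[Union[Relay, Transfer]]) -> List[Segment]:
    """Group relays and transfers, given in textual order, into segments,
    adding the elements that the route model requires but the text omits."""
    segments: List[Segment] = []
    pending: Optional[Relay] = None
    for element in elements:
        if isinstance(element, Relay):
            if pending is not None:
                # Two relays in a row: infer the transfer between them.
                segments.append(Segment(pending, Transfer(inferred=True)))
            pending = element
        else:
            # Assumption: a transfer without an explicit relay gets an abstract one.
            relay = pending if pending is not None else Relay()
            segments.append(Segment(relay, element))
            pending = None
    if pending is not None:
        # Assumption: a final relay is completed by an inferred transfer.
        segments.append(Segment(pending, Transfer(inferred=True)))
    return segments
```

For the fragment "tournez à gauche et puis à droite", calling `build_route([Relay(turn="left"), Relay(turn="right")])` yields two segments, the first of which carries a transfer marked as inferred.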
Apart from models for linguistic and conceptual 
representations, the rules of transition have to be 
defined. For this purpose, it is necessary to establish 
relationships between different linguistic and con- 
ceptual entities. For example, the action of the type 
"progression" (e.g. "continuer", "aller") corresponds 
to a transfer and the actions of the type "change of 
direction" (e.g. "tourner") or "taking a way" (e.g. 
"prendre la rue") to a relay (which will coincide with 
a turn or with the beginning of a way-landmark, e.g. 
a street, respectively). 
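
The correspondences just listed can be written down as a small lookup table. The following sketch only restates the examples given above; the identifiers (`ActionType`, `RouteEntity`, `VERB_ACTION_TYPES`, `TRANSITION_RULES`, `to_conceptual`) are hypothetical, and a real set of transition rules would have to cover many more verbs and take context into account.

```python
from enum import Enum
from typing import Optional, Tuple


class ActionType(Enum):
    PROGRESSION = "progression"
    CHANGE_OF_DIRECTION = "change of direction"
    TAKING_A_WAY = "taking a way"


class RouteEntity(Enum):
    TRANSFER = "transfer"
    RELAY = "relay"


# Linguistic side: verbs (or verb phrases) and their action type,
# limited to the examples cited in the text.
VERB_ACTION_TYPES = {
    "continuer": ActionType.PROGRESSION,
    "aller": ActionType.PROGRESSION,
    "tourner": ActionType.CHANGE_OF_DIRECTION,
    "prendre la rue": ActionType.TAKING_A_WAY,
}

# Transition rules: action type -> (conceptual entity, what the relay coincides with).
TRANSITION_RULES = {
    ActionType.PROGRESSION: (RouteEntity.TRANSFER, None),
    ActionType.CHANGE_OF_DIRECTION: (RouteEntity.RELAY, "turn"),
    ActionType.TAKING_A_WAY: (RouteEntity.RELAY, "beginning of a way-landmark"),
}


def to_conceptual(verb: str) -> Optional[Tuple[RouteEntity, Optional[str]]]:
    """Map a verb to its conceptual entity, or None if the verb is not covered."""
    action = VERB_ACTION_TYPES.get(verb)
    if action is None:
        return None
    return TRANSITION_RULES[action]
```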
Another aspect of modeling consists in specifying 
graphic objects corresponding to the entities in the 
route model. For the time being, we have decided to make do
with simple symbolic elements, without a fine distinction
between landmarks. The graphic symbols
have been created on the basis of the information
accessible from the context rather than that contained
in the "names" of landmarks. The latter
are included in sketches in the form of verbal labels.
Once the whole route has been reconstructed at 
the conceptual level, we start to generate the corresponding
graphic map, like the one below.
[Figure: graphic sketch of the route, with the verbal labels "bâtiment A", "panneaux d'informations", "descente" and "tourniquets du RER"]
4 Conclusion 
Computer translation of route descriptions into 
sketches raises some interesting issues. Firstly, one 
has to investigate the relationships between the lin- 
guistic and the graphic modes, the constraints and 
possibilities which appear while generating images 
from linguistic descriptions. 
Secondly, a thorough linguistic analysis of route 
descriptions is necessary. We have used a discourse-based
approach and analyzed "local" linguistic elements
by filtering them through the discourse structure,
described at the "global" level. Our goal is
to build a linguistic model for the text type "route 
description". 
Another interesting problem is the form and the 
derivation of the conceptual representation of the de- 
scribed route. We believe that it cannot be directly 
obtained from the linguistic material itself. During 
the understanding process, the linguistic meaning 
has to be represented before the conceptual repre- 
sentation can be created. That is why we need a 
two-stage internal representation, based on specific 
linguistic and conceptual models. 

References 
M. Abraham and J-P. Desclés. 1992. Interaction between
lexicon and image: Linguistic specifications of
animation. In Proc. of COLING-92, pages 1043-1047,
Nantes.
E. André, G. Bosch, G. Herzog, and T. Rist. 1987. Coping
with the intrinsic and the deictic uses of spatial
prepositions. In K. Jorrand and L. Sgurev, editors, 
Artificial Intelligence II: Methodology, Systems, Appli- 
cations, pages 375-382. North-Holland, Amsterdam. 
E. André, G. Herzog, and T. Rist. 1988. On the simultaneous
interpretation of real world image sequences
and their natural language description: The system
SOCCER. In Proc. of the 8th ECAI, pages 449-454,
Munich.
M. Arnold and C. Lebrun. 1992. Utilisation d'une
langue pour la création de scènes architecturales en
image de synthèse. Expérience et réflexions. Intellectica,
3(15):151-186.
M. Arnold. 1990. Transcription automatique verbal-image
et vice versa. Contribution à une revue de la
question. In Proc. of EuropIA-90, pages 30-37, Paris.
A. Gryl. 1992. Opérations cognitives mises en oeuvre
dans la description d'itinéraires. Mémoire de DEA,
Université Paris 11, France.
K.M. Kahn. 1979. Creation of computer animation from 
story descriptions. A.I. Technical report 540, M.I.T. 
Artificial Intelligence Laboratory, Cambridge, MA. 
A. Yamada, T. Yamamoto, H. Ikeda, T. Nishida, and 
S. Doshita. 1992. Reconstructing spatial image from 
natural language texts. In Proc. of COLING-92, pages
1279-1283, Nantes. 
