STS: An Experimental Sentence Translation System 
Eric Wehrli* 
University of Geneva 
1211 Geneva 4 
wehrli@uni2a.unige.ch 
Abstract 
STS is a small experimental sentence translation 
system developed to demonstrate the efficiency 
of our lexicalist model of translation. Based on 
a GB-inspired parser, lexical transfer and lexi- 
cal projection, STS provides real-time accurate 
English translations for a small but non-trivial 
subset of French sentences. 
1 Introduction 
As a preliminary step towards the develop- 
ment of an on-line interactive system, we de- 
velop the STS system to establish whether our 
lexicalist translation model, using a GB-.inspired 
parser, lexical transfer and lexical projection, 
provides the kind of efficiency required for on- 
line translation 2. In its present implementa- 
tion, STS translates sentences from French to 
English. It can handle a limited, although not 
trivial subset of clausal structures, using a lexi- 
cal database of more than 5,000 entries (includ- 
ing compounds and idiomatic expressions). 
Accurate natural language translation requires a 
wide range of cognitive abilities, including gram- 
matical knowledge of the languages involved but 
also some amount of extralinguistic knowledge 
and common sense reasoning. Lack of weU- 
defined theoretical models of knowledge and 
conmlon sense reasoning makes fully automatic 
high quality translation unlikely in the near fu- 
ture. 
In the meantime, what appears to be an in- 
creasingly appealing alternative is on-line inter- 
active machine translation, i.e. systems which 
can consult the user when they are unable to 
solve a problem 1. However, in order to be a 
viable alternative to other machine or machine- 
aided translation models, and in addition to the 
usual requirements of reasonable quality and low 
cost, an on-line interactive system must also sat- 
isfy the requirements of real-time systems and in 
particular be fast enough not to use the user's 
patience. 
*Part of the work described in this paper has been 
supported by a grant from the Swiss national science 
foundation (grant no 11-25362.88). 
'For a discussion of on-line interactive translation 
see Kay (1982), Tomita (1986), Johnson and Whitelock 
(1987), among others. 
2 Architecture of the system 
The basic architecture of the STS system is the 
familiar transfer model with its three main com- 
ponents: analysis, transfer and generation. To 
see how these components interact, consider the 
following (drastically oversimplified) sketch of 
the translation process: an input sentence in 
the source language (SL) is first mapped onto 
some formal representation for this language (S- 
structure). This is done by the parser, on the 
basis of lexical information and detailed knowl- 
edge about the grammar of SL. The transfer 
component maps then the S-structure :returned 
by the parser onto an appropriate S-structure in 
the target language. The transfer is done con- 
stituent by constituent, in a top-down fashion, 
starting with the top S (or S) constituent. For 
each constituent, the lexical head is first con- 
sidered: Its lexeme is associated with a set of 
possible translations, ie. one or more lexemes 
in the target language. Once the relevant lex- 
eme has been selected an appropriate structure 
is projected on the basis of its lexical properties 
2For a description of the parser used in the STS 
project, see Wehrli (1988). 
1 76 
and of the general rules and principles of target 
language grammar. Notice that the projection is 
done solely on the basis of information internal 
to the target language. 
In other words, the interface structure is min- 
imal, as it should be, and is almost entirely a 
matter of lexical :napping. Both the analysis 
and generation modules are completely indepen- 
dent of each other. This again, is a desirable 
feature, in the sense that, for instance, the same 
parser can be used no matter what the target 
language :night be. In fact, given that the S- 
structures in this system are solely justified in 
terms of the gra~unar, the parser (and genera- 
tor) do not have to be application dependent. 
2.1 Lexical database 
The lexical database is the central piece of the 
STS system. It contains crucial information 
used by the three active components of the sys- 
tem. This information is distributed in two 
monolingual lexicons (SL lexicon and TL lexi- 
con) along with one bilingual lexicon. We shall 
consider them in turn: 
2.1.1 Monolingual lexicons 
We assume a static - or relational - concep- 
tion of morphology, along the lines of Jackendoff 
(1975), Wehrli (1985). According to this view, 
morphological relations between two or more 
lexical entries are expressed by a complex net- 
work of relations. 
A monolingual lexicon distinguishes three ba- 
sic entities: lexeme, word and idiom. A lexeme is 
an abstract lexical unit, which can be compared, 
roughly speaking, to a standard dictionary en- 
try. It stands for a whole class of morphological 
variants. By contrast, a word corresponds to a 
particular morphological instantiation of a lex- 
eme. In other words, we make a clear distinction 
among :features which may vary with inflexion 
and those which are invariant. To give an exam- 
ple, am, are, were, being, be, etc. are words, :nor- 
phological variants of the lexeme "be". The lex- 
emes are associated all the features which are in- 
dependent of the morphological realization, such 
as semantic features, subcategorization features, 
and the like. Features which depend on inflex- 
ional markers - e.g. tense, number, person, etc. 
- are naturaly attached to the words. In addi- 
tion to words and lexemes, a monolingual lex- 
icon also contains a list of idioms, ie. pharses 
which have a fixed, non-compositional meaning, 
such as to kick the bucket or to be caught red- 
handed. 
The notion of lexeme turns out to be one of 
great significance: Not only does it make possi- 
ble to factor out basic syntactic and/or senmn- 
tic properties shared by morphologically related 
words. At the same time, it also provides the 
abstract lexical level which is relevant for lexi- 
cal transfer. 
2.1.2 Bilingual dictionary 
The bilingual dictionary specifies the set of pos- 
sible relations between lexemes of the source lan- 
guage and lexemes of the target language. Each 
entry in this dictionary specifies one SL lexeme 
and one TL lexeme. In case one particular SL 
lexeme has more than one corresponding TL lex- 
eme (e.g. aimer -7 to like, to love, etc.), the 
bilingual dictionary contains as many entries as 
there are correspondences. The bilingual dictio- 
nary contains other kind of information as well. 
For instance, in the case of argument-taking el- 
ements, such as verbs or predicative adjectives, 
an entry of the bilingual dictionary must also 
specify how the arguments of the SL predicate 
match the arguments of the TL predicate. 
3 The transfer component 
The role of the transfer component is to :nap 
SL S-structures onto TL S-structures. In STS, 
this mapping is done indirectly, by means of two 
mechanisms: lexical transfer and lexical projec- 
tion. 
Transfer applies to the syntactic structures re- 
turned by the parser. In a top-down fashion, 
starting with the main S-structure, the transfer 
procedure considers the lexical head of a phrase, 
look it up in the bilingual dictionary and se- 
lects the most appropriate TL lexeme, based on 
contextual information, features in the bilingual 
dictionary. Once a lexeme has been selected, a 
process of lexical projection creates a TL syntac- 
tic structure on the basis of the lexical properties 
of the TL lexeme, and of the general syntactic 
77 
properties of TL. In the next step, the trans- 
fer procedure considers the complements of the 
head, using the same strategy. In addition to 
lexicaJ\[ transfer and lexical projection, the trans- 
fer procedure is guided by the argument map- 
ping information found in the bilingual dictio- 
nary~ as mentioned above. 
In the STS system, discrepancies between SL 
and TL, such as differences in word order or ar- 
gument matching, can be handled quite natu- 
rally without the need of complex and ad hoc 
structural transfer rules. To illustrate this point, 
the fact that a sentence containing a modal 
verb must be assigned a bi-sentential structure 
in French, but not in English follows from the 
lexical properties of modal verbs in French and 
in English, i.e. French modals are main verbs 
selecting an infinitival sentential complement, 
while English modals are not marked as main 
verbs. Within a linguistic theory which assumes 
that phrase structures are projected from their 
lexical elements, the structural differences be- 
tween French and English sentences follows from 
the lexical differences between, say, pouvoir and 
car~. 
(2)a. l'homme dont vous semblez avoir oublid 
le nom ne pourra-t-il pas vous fournir 
les renseignements dont vous avez be- 
soin? 
b. won't the man whose name you seem to 
have forgotten be able to provide you 
with the information that you need? 
Such examples show that the STS system 
can succesfully handle structures of a non-trivial 
level of complexity. The second example, in par- 
ticular, shows the ability of this system to han- 
dle problems such as difference in word-order, 
argument matching and idiomatic expressions. 
In addition, this model proved extremely effi- 
cient - total time for parsing and translation of 
the above sentences averages 150 ms/word 4 - 
which is a crucial prerequisite for on-line inter- 
action. 
4 Some examples 
Two examples of translations produced by the 
STS :system are given below. In the first ex- 
ample, a is the input sentence, b the structure 
returned by the parser, and c the sentence re- 
turned by the translator s . The structure has 
been omitted in the second example. 
(1)a. 
b. 
le livre que vous regardez semble ~tre 
facile ~ fire. 
\[ s \[ NP \[ I~plelivre\]j \[ ~ \[ c0MP \[ ~P 
quelj\] \[ s \[" NP eJk \[" VP \[" CLI v°usk\] 
regaxdez \[ NP eJj\]JJJi \[ vP semble \[ 
\[ s \[ NP e\]i \[ uP 6tre \[ AP \[" NP eJl 
facile \[ ~h \[ s \[ sPe\]\[ vplire \[ t~r 
eSl\]\]\]\]\]\]\]\]\] 
c. the book at which you are looking 
seems to be easy to read. 
3The structures returned by the parser correspond to 
slightly simplified S-structure representations of a GB- 
granmlar. The indexes in (lb) express A'-binding or con- 
trol relations with empty categories (e.g. \[ NP el). 
4CPU time on a DEC Vaxstation II computer. 

References 

Jackendoff, R. (1975). "Morphological and se- 
mantic regularities in the lexicon," Lan- 
guage 51:639-671. 

Johnson, R. and P. Whitelock (1987). "Machine 
Translation as Expert Task," in S. Niren- 
burg (ed.) Machine Translation, Cam- 
bridge University Press. 

Kay, M. (1982). "Machine Translation," Amer- 
ican Journal of Computational Linguistics 
8:74-78. 

Tomita, M. (1986). Efficient Parsing for Natural 
Language, Kluwer Academic Publishers. 

Wehrli, E. (1985). "Design and Implementation 
of a Lexical Data Base," Proceedings of the 
2nd European ACL Conference, pp. 146-153. 

Wehrli, E., 1988. "Parsing with a GB gram- 
mar," in U. Reyle and C. Rohrer (eds.), 
Natural Language Parsing and Linguistic 
Theories, Reidel, 1988. 
