IMPLICITNESS AS A GUIDING PRINCIPLE
IN MACHINE TRANSLATION
Klaus SCHUBERT
BSO/Research, Postbus 8348, NL-3503 RH Utrecht, The Netherlands
schubert@dill .uucp 
Multilingual extensibility requires an MT system to have a
language-independent pivot. It is argued that an ideal, purely
semantic pivot is impossible. A translation method is described in
which semantic relations are kept implicit in syntax, while the
semantic traits and distinctions are implicit in the words of a
full-fledged language itself as pivot.
1. Multilingual extensibility
There is an external factor with very substantial consequences
for the internal design of machine translation systems:
extensibility. When a machine translation system has to allow for
adding arbitrary source and target languages without each time
adapting the already existing parts of the system, the need
arises for a carefully defined interface structure to which
modules for additional languages may be linked. The design
that best meets these requirements is the pivot or interlingual
approach, since in such a system there is only a single
interface which gives access to all the languages already included
in the system.
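
The extensibility argument can be made concrete with a small count. The sketch below is an illustration only: it assumes that one module is needed per translation direction, and the function names are hypothetical, not taken from any system described here.

```python
def direct_modules(n: int) -> int:
    """Modules needed if every ordered pair of n languages is linked directly."""
    return n * (n - 1)


def pivot_modules(n: int) -> int:
    """Modules needed if each of n languages is linked only to the pivot:
    one module into the pivot and one out of it."""
    return 2 * n


def cost_of_adding_language(n: int) -> tuple[int, int]:
    """New modules required when one language joins a system of n languages,
    returned as (direct design, pivot design)."""
    return (direct_modules(n + 1) - direct_modules(n),
            pivot_modules(n + 1) - pivot_modules(n))


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, direct_modules(n), pivot_modules(n), cost_of_adding_language(n))
```

Under this assumption the cost of adding a language grows with the size of a directly linked system, whereas in a pivot design it stays at two modules and leaves the existing parts untouched.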
In models of this type the only link between a source and a
target language is the intermediate representation. It has a
double function:
1. The intermediate representation should render the
full content of the text being translated, with all its
details and nuances.
2. The intermediate representation should contain the
results of the grammatical analysis carried out on the
source text, where these characteristics are
translation-relevant.
It is desirable that the intermediate representation express both
the content and the grammatical characteristics of the text
unambiguously, and since it is the interface to arbitrary
languages, it should express them in a language-independent
way.
2. Language-independent semantics?
To render both the content and the functional features of a
text is usually taken to mean spelling them out in an
appropriate way. The intermediate representation provides a formalism
for this purpose. Spelling out means making explicit. My
main concern here is investigating to what extent the required
explicitness can be achieved in a language-independent
representation. Are there language-independently valid
categories and values for the characteristics of words and
word groups needed in an intermediate representation? (When
speaking of grammatical analysis, I take grammar to denote
the study of the entire internal system of language, so that
both syntax and semantics on all levels between morpheme
and text are subfields of grammar. Pragmatics, by contrast,
describes the influence of extralinguistic factors on language
and is not part of grammar; cf. Schubert 1987b: 14f.)
The form of the linguistic sign is language-specific, whereas
its content is normally thought to be language-independent.
The content side of the linguistic sign is therefore often
assumed to be a good tertium comparationis for translation
grammar. In other words, the transfer step from a syntactic
form in the source language to a corresponding form in the
target language is performed on the basis of the common
meaning the two forms are supposed to have.
As a consequence, an intermediate representation is usually
devised as a structure in which this common meaning is made
explicit. The intermediate representation is seen as a semantic
equivalent of the source text. For obtaining such a structure, a
syntactic analysis of the source text is by no means
superfluous. An intermediate representation consists, like any system,
of elements and their relations. In a semantic system
elements and relations are semantic. But in order to detect the
elements and their relations in a given text, a syntactic
analysis is needed. ("Syntax-free semantic parsers" apply
syntactic knowledge tacitly, and as a rule they work especially
well for languages where the sequential order of "purely
semantic" elements carries syntactic information.)
There are two major clusters of reasons why an ideal semantic
intermediate representation of the language-independent kind
sketched above is impossible, however desirable it may be in
theory.
First of all, there are no language-independent semantic
elements. Whatever symbols are chosen -- words, morphemes,
numbers, letter codes ... -- they are always inherently
language-bound. The elements of an artificial symbol system
are either directly taken from an existing language, or have an
explicit or implicit definition in a reference language. It is
impossible to make a truly language-independent system of
symbols, if it is to possess the full expressiveness of a human
language (cf. Schubert 1986). Symbols cannot be given a
meaning independently of a reference language; their meaning
can only become autonomous by being used in a language
community during a long period. This is why a planned
language like Esperanto could not rank as a full-fledged
human language from the very day the first textbook was
published but had to develop slowly from an artificial, reference
language-dependent symbol system into an autonomous
language by being used in a community (cf. Schubert forthc.).
Perhaps this is an unusual argument in a computational
context, where people are used to defining symbol systems which
they call "languages". It should be borne in mind, however,
that such defined symbol systems are subsets of an existing
human language (or of several). Machine translation, by
contrast, is concerned with translating texts between human
languages, which from a semantic point of view -- even if the
language may be simplified or the text pre-edited -- are
inherently more complicated than artificial symbol systems.
Not only are defined semantic units in such systems reference
language-dependent, but the road to the basic semantic units
needed is via semantic decomposition -- with all its well-known
problems. Scholars have for centuries been trying to
find universally valid semantic atoms (or primitives), but none 
of the many systems suggested has met with acknowledge- 
ment or proved applicable on any wider scale. Individual 
languages cut up and label reality in different ways; no under- 
lying "smallest semantic units" have been found as yet and 
possibly they will never be found. In my opinion the conclu- 
sion is that meaning is not portioned, so that no smallest 
portions can be found. 
Semantic atoms would be needed for totally spelling out the 
content of a text in a language-independent way, that is, in 
such a way that it would be suited for translation into any ar- 
bitrary target language. In many machine translation systems, 
ambitions are not that high. Most often, intermediate represen- 
tations use words or other language-bound symbols, decorated 
with semantic features which are held to be cross- 
linguistically valid. Yet, what is true for semantic atoms ap- 
plies to semantic features as well, albeit in a less obvious 
way: They contain portions of meaning which do not function 
in all languages in the same way. That semantic atoms and 
features are not as cross-linguistic as they seem to be, is also 
suggested by the experience that they are very hard to define 
and delimit in a way that fulfils exactly the required function, 
or denotes precisely the intended distinction for a large 
number of languages simultaneously. It is because of this that 
intermediate representations often have to be adapted, attuned 
or even redesigned when a new source or target language is 
added to the system. Such representations fail to provide for 
multilingual extensibility. 
3. Case frames 
The second cluster of reasons for the impossibility of an ideal, 
purely semantic, intermediate representation concerns seman- 
tic relations. One of the best-known approaches to making 
semantic relations explicit is Fillmore's case grammar
(1968). Deep cases are often believed to be cross-linguistically 
valid. Although there are many substantial difficulties in del- 
imiting and labelling deep cases (cf. Fillmore 1987), many 
machine translation systems perform transfer with case 
frames. This works quite well to a certain degree, but slowly 
the insight is gaining ground that deep cases nevertheless are 
language-specific. If case frames really were an autonomous 
tertium comparationis, translating on the basis of case frames 
would mean just filling in target language forms in a 
language-independent case frame obtained from the source 
language analysis. But in reality case frame-based translation 
often entails a transfer from a source language-specific case 
frame to a target language one. Evidence for this need comes 
first from general linguistics (e.g. Pleines 1978: 372; Engel 
1980: 11), but recently turns up in computational linguistics as
well (Tsujii 1986: 656; cf. Schubert 1987a). This is in concord
with Harold Somers' (1987: viii) observation about the
popularity of case grammar, already declining in theoretical 
linguistics, but still in vogue in computational applications. 
Returning to the argument about a purely semantic system, it 
can be concluded that neither the elements nor the relations, 
which together should constitute the theoretically desirable 
language-independent intermediate representation, actually ex- 
ist. This insight, among others, is the origin of the idea of im- 
plicitness in machine translation. 
4. Implicitness 
Since there are no cross-linguistically valid semantic relations, 
and since case frames are therefore language-specific, the
transfer step actually lacks a language-independent intermediate
stage. This means that, where semantic relations are concerned,
there is no true pivot. There are only source structures
and target structures with a transfer step somewhere between 
them. Given the notorious difficulties of defining deep cases, 
the question arises whether it is really necessary for machine 
translation to make semantic relations explicit. As they are 
language-specific anyway, it is much easier to perform 
transfer at another level, which is language-specific as well, 
but about which there is much more certainty: syntax. If 
transfer is carried out at the syntactic level, semantic deep
cases can remain implicit. 
Before describing this in somewhat more detail, a few words 
about the semantic elements. If there are no language- 
independent semantic relations, looking for language- 
independent semantic elements does not seem worthwhile ei- 
ther. Yet, the above discussion of the function of an inter- 
mediate representation entails another unexpected implication: 
Since an intermediate representation is the only link between 
source and target languages, it must be as expressive as any 
of them. If high-quality machine translation is the goal, this 
condition is inevitable, since the intermediate representation 
has to render and to convey the full and unsimplified content 
of the text, to make further translation possible. It must be 
feasible to translate into such an intermediate representation 
from all other languages. Interestingly enough, this translata- 
bility criterion is the property by which human language is 
distinguished from artificial symbol systems by one of the 
classics of linguistics, Louis Hjelmslev (1963: 101). Accord- 
ing to him, a human language (his term is dagligsprog) is a 
language into which all other communication systems (human 
languages and artificial symbol systems) can be translated. As 
a consequence of Hjelmslev's theory, an intermediate 
representation with the expressiveness indispensable for mul- 
tilingual high-quality machine translation should indeed be it- 
self a human language. 
Now the elements and relations in the semantic system of the 
intermediate representation can be considered together. The 
discussion so far has yielded two results: There are no 
language-independent semantic elements and there are no 
cross-linguistically valid semantic relations. Moreover, the re- 
quired expressiveness entails the consequence that the inter- 
mediate representation should be a full-fledged language. 
If the pivot of a machine translation system is a language 
(rather than an artificial symbol system), this removes the 
problems of spelling out semantic elements and relations.
Semantics can then be kept implicit, that is, it can be expressed
in the intermediate language by purely linguistic means, in
the way illustrated below. 
If the intermediate language is a full language, the syntactic 
side of the translation process comes down to performing two 
direct translations: first from a source language into the inter- 
mediate language, and then from the intermediate into a target 
language. Moreover, if one opts for a human intermediate 
language, this brings about a substantial change in the design 
of a pivot-based multilingual machine translation system.
Artificial intermediate representations are designed to achieve
multilingual extensibility at the level of transfer. The
conditions that provide for extensibility are thus directly intertwined
with the mechanisms that translate from one particular 
language into another. But when the intermediate representa- 
tion is a language, multilingual extensibility shifts to another 
level: it is now catered for by the combination of language 
pair modules in which the intermediate language is always 
one of the two counterparts. This considerably facilitates the 
design, since multilingual extensibility with all its needs of
cross-linguistically valid grammatical elements and relations 
no longer interferes with the translation steps proper. For this 
type of direct translation within a language pair, a translation 
method that performs the syntactic transfer on the basis of 
syntactic functions is both suitable and sufficient. 
A possible implementation of this idea is found in the meta- 
taxis translation method (Schubert 1987b: 222ff.). It works on 
the basis of language-specific syntactic functions and
contrastive transformation rules that cater for the transfer step.
Metataxis rules can be seen as contrastive lexical redundancy
rules over a bilingual dictionary. Technically speaking, they
are tree transduction rules which presuppose the dictionary to
consist of tree-structured entries. Metataxis is contrastive
dependency syntax for translation. Of course it is not the only
possible way of performing the syntactic part of a machine
translation procedure. A dependency-based approach, however,
is especially well suited for a multilingual system, since
dependency syntax takes syntactic functions as its primary
units, using syntactic form as a secondary means. This is an
essential enhancement, since syntactic functions -- i.e.
dependency relations such as subject, object etc. -- are
translation-relevant, whereas syntactic form characteristics --
such as a word's position vis-à-vis other words, its endings for
case, number, person, tense, mood, aspect etc. -- are needed for
monolingual analysis and synthesis steps in an overall
translation process, but are not themselves directly
translation-relevant.
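
To illustrate what a tree transduction rule over syntactic functions might look like, here is a minimal sketch. It is not the metataxis formalism itself: the classes `Node` and `Rule`, the function `transduce`, the function labels and the English-to-German example are all assumptions made for illustration, and the sketch yields a single translation where metataxis would supply every syntactically possible one.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A dependency tree node: a word plus dependents keyed by syntactic function."""
    word: str
    deps: dict[str, "Node"] = field(default_factory=dict)


@dataclass
class Rule:
    """A toy contrastive rule attached to one bilingual dictionary entry."""
    target_word: str
    # source function label -> target function label (unlisted labels are kept)
    function_map: dict[str, str] = field(default_factory=dict)


def transduce(node: Node, rules: dict[str, Rule]) -> Node:
    """Rewrite a source dependency tree into a target dependency tree."""
    rule = rules.get(node.word)
    if rule is None:
        # Toy fallback: no dictionary entry, so copy the word and its labels.
        rule = Rule(node.word)
    new_deps = {
        rule.function_map.get(func, func): transduce(dep, rules)
        for func, dep in node.deps.items()
    }
    return Node(rule.target_word, new_deps)


# Hypothetical entries: English "like" vs. German "gefallen" swap their functions.
RULES = {
    "like": Rule("gefallen", {"subject": "dative_object", "object": "subject"}),
    "I": Rule("ich"),
    "book": Rule("Buch"),
}

source = Node("like", {"subject": Node("I"), "object": Node("book")})
target = transduce(source, RULES)
```

The rule touches only words and function labels. Word order and inflection (here the dative form of "ich") are left to monolingual synthesis, in line with the division between syntactic function and syntactic form drawn above.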
As for the semantic side of the translation process, an
intermediate representation tempts its designers to make explicit
all the semantic distinctions needed for specific source and
target languages, which ultimately leads astray if multilingual
extensibility is aimed at. This is the danger of an "exploding"
pivot. If the pivot is a language, the degree of semantic detail
it provides can be taken as a natural limitation to this
explosive tendency: An implementation is possible in which the
entire semantic processing needed for a machine translation
procedure is carried out with linguistic means in the intermediate
language only. This means that whatever semantic elements or
relations are used, they are always expressed by means of
words and morphemes from the intermediate language. No
semantic features, no selection rules and no meta-linguistic
labels or tags are used. This is in good agreement with the
metataxis approach to the syntactic side of the process:
Metataxis provides all syntactically possible translations of a
source sentence (clause, paragraph ...) and the semantic
processing performs a choice among these alternatives. (It normally
needs a substantial pragmatic augmentation with knowledge of the
world etc.; cf. Papegaaij/Schubert forthc.: chapter 3.5.) This
semantic process can be carried out entirely in the
intermediate language and is thus suitable for metataxis alternative
translations generated from whatever source language.
The second half of the translation, from the intermediate into
a target language, could in theory work in the same way, but
this would presuppose semantic processing in all the different
target languages. The requirement of extensibility is much
better met, if all the semantic processing for the second half
as well is carried out by means of the intermediate language.
This is indeed possible. The semantic-pragmatic processing in
the second half is -- to put it in plain words -- concerned with
fitting in the alternative translations offered in the bilingual
dictionary (intermediate language --> target language) into the
context of the sentence and the entire text. What is needed for
assessing the probability of different contexts is information
about the typical contexts of the words in question: word
expert knowledge. It is possible to describe the typical contexts
of target language words by means of words and phrases in
the intermediate language. Thus all semantic-pragmatic
comparisons and probability computations are carried out
exclusively in the intermediate language, and as a consequence
only a single semantic system is needed for translating
between arbitrary languages: a system in the intermediate
language. If this central system is built up within the
limitations of the intermediate language without reference to any
peculiarities of particular source and target languages, the
requirement of complete extensibility is fulfilled.
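
The idea of choosing among target-language alternatives by comparing contexts described in the intermediate language can be sketched as follows. Everything here is an assumption for illustration: English stands in for the intermediate language, German for the target language, the context sets are invented, and a crude overlap ratio replaces whatever probability computation an actual system would use.

```python
def choose_translation(context: set[str],
                       typical_contexts: dict[str, set[str]]) -> str:
    """Pick the target-language word whose typical contexts, described in
    intermediate-language words, overlap most with the actual context."""

    def score(target_word: str) -> float:
        typical = typical_contexts[target_word]
        if not typical:
            return 0.0
        # Share of the word's typical context words found in the actual context.
        return len(typical & context) / len(typical)

    # max() keeps the first candidate on ties, i.e. dictionary order decides.
    return max(typical_contexts, key=score)


# Hypothetical dictionary entry for intermediate-language "bank":
# each German alternative is described by intermediate-language context words.
BANK_ALTERNATIVES = {
    "Bank": {"money", "account", "loan"},
    "Ufer": {"river", "water", "shore"},
}

# Intermediate-language words surrounding "bank" in the text being translated.
context = {"river", "water", "walk"}
chosen = choose_translation(context, BANK_ALTERNATIVES)
```

Both the actual context and the typical contexts are stated in intermediate-language words, so the comparison never needs a semantic system for the target language. Adding a target language then only means adding dictionary entries of this kind.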
5. Conclusion
An intermediate language for high-quality machine translation
needs to be a full-fledged human language, due to the inherent
lack of expressiveness that is an inevitable characteristic of
artificial symbol systems. I argue that one can make a virtue of
this necessity: A human language as intermediate representa- 
tion allows for rendering the full content of the text without 
making semantic elements and relations more explicit than 
what is expressed by appropriately interrelated words of the 
intermediate language. 
Of course the question arises whether, in that case, any
arbitrary language would be suited for this function. It should be
pointed out, however, that the full range of trade-offs related 
to the choice of an intermediate language cannot be dealt with 
in this three-page contribution. My ideas about implicitness 
are closely related to one of at least three fundamental criteria 
for an intermediate language: expressiveness. The other two 
are regularity and semantic autonomy. Only when all cri- 
teria are considered together, can a choice be made. 

References 

Engel, Ulrich (1980): Fügungspotenz und Sprachvergleich. Vom
Nutzen eines semantisch erweiterten Valenzbegriffs für die
kontrastive Linguistik. In: Wirkendes Wort 30, pp. 1-22

Fillmore, Charles J. (1968): The case for case. In: Universals in
linguistic theory. E. Bach / R. T. Harms (eds.). New York:
Holt, Rinehart & Winston, pp. 1-88

Fillmore, Charles J. (1987): A private history of the concept
"frame". In: Concepts of case. René Dirven / Günter Radden
(eds.). Tübingen: Narr, pp. 28-36

Hjelmslev, Louis (1963): Sproget. København: Berlingske forlag
[2nd ed.]

Papegaaij, B. C. / Klaus Schubert (forthc.): Text coherence in
translation. Dordrecht / Providence: Foris

Pleines, Jochen (1978): Ist der Universalitätsanspruch der
Kasusgrammatik berechtigt? In: Valence, semantic case, and
grammatical relations. Werner Abraham (ed.). Amsterdam:
Benjamins, pp. 335-376

Schubert, Klaus (1986): Linguistic and extra-linguistic knowledge.
In: Computers and Translation 1, pp. 125-152

Schubert, Klaus (1987a): Wann bedeuten zwei Wörter dasselbe?
Über Tiefenkasus als Tertium comparationis. In: Linguistik in
Deutschland. Werner Abraham / Ritva Århammar (eds.).
Tübingen: Niemeyer, pp. 109-117

Schubert, Klaus (1987b): Metataxis. Contrastive dependency syntax
for machine translation. Dordrecht / Providence: Foris

Schubert, Klaus (forthc.): Ausdruckskraft und Regelmäßigkeit.
In: Language Problems and Language Planning 12 [1988]

Somers, H. L. (1987): Valency and case in computational
linguistics. Edinburgh: Edinburgh University Press

Tsujii, Jun-ichi (1986): Future directions of machine translation.
In: 11th International Conference on Computational
Linguistics, Proceedings of Coling '86. Bonn: Institut für
angewandte Kommunikations- und Sprachforschung, pp. 655-668
