An English to Turkish Machine Translation System 
Using Structural Mapping 
Cigdem Keyder Turhan 
Dept. of Computer Engineering 
Middle East Technical University 
Ankara 06531, Turkey 
cigdem@ceng, metu. edu. tr 
Abstract 
This paper describes the design and imple- 
mentation of an English-Turkish machine 
translation (MT) system developed as a 
part of the TU-Language project supported 
by a NATO Science for Stability Project 
grant. The system uses a structural trans- 
fer approach in translating the domain of 
IBM computer manuals. The general de- 
sign of the translation system and a de- 
tailed description of the transfer compo- 
nent is presented in this paper. 
1 Introduction 
The TU-Language project sponsored by the NATO 
Science for Stability Programme was started in 1994 
to establish computational foundations for the natu- 
ral language processing research on the Turkish lan- 
guage with the collaboration of the Computer Engi- 
neering Department of Middle East Technical Uni- 
versity, the Computer Science Department of Bilkent 
University and ttalici Computing, Inc. The project 
attempts to perform extensive research on Turkish 
which will eventually lead to the development of 
an English to Turkish machine translation system, 
Turkish language tutorial system, a Turkish dictio- 
nary and other software tools to be used in further 
research. 
In this paper, some issues in translating from En- 
glish to Turkish languages, the translation domain, 
the outline of the machine translation system un- 
der development, and a detailed description of the 
transfer component will be presented. 
2 Turkish Language 
Morphology and syntax of Turkish are very differ- 
ent from English, therefore, the formalism used to 
represent English texts has to be altered signifi- 
cantly for Turkish text representation. The Turk- 
ish language is characterized as a head final lan- 
guage where the modifier/specifier always precedes 
the modified/specified. This characteristic also af- 
fects the word order of the sentences which can be 
described as SOV where the verb is positioned at the 
end. 
Also, when compared to other languages, Turkish 
relies more on overt case markings which mark the 
role of the argument in a sentence. The case mark- 
ings enables Turkish to have a relatively free word- 
order property where every variation in the word 
order in a sentence results in a different meaning. 
In the MT system being developed, these and 
other different characteristics of the Turkish lan- 
guage are handled in the transfer and generation 
components. 
3 Translation Domain 
As more and more computer companies enter the 
Turkish market, a growing demand for English 
to Turkish translation of computer manuals has 
emerged. Other machine translation systems have 
also chosen the domain of computer manuals for 
their translation systems because of the relatively 
unambiguous and narrow sublanguage used (Tsut- 
sumi, 1986). Also, in his research, Nasukawa (Na- 
sukawa, 1993) concluded that the statistical anal- 
ysis of the text in IBM computer manuals showed 
that 92.6 percent of the words in a computer man- 
ual are used in the same word sense which would 
significantly reduce the problem of lexical ambiguity 
resolution. Another advantage is that the material 
in a computer manual is observed to be written as 
clearly as possible in a relatively narrow area which 
will hopefully ease the difficult job of understanding 
and representing the input sentence. 
As a result of these observations, the TU- 
Language project team has chosen the IBM com- 
puter manuals as their translation domain.. 
4 Machine Translation System 
The English to Turkish MT system under develop- 
ment uses a structural transfer approach which has 
the following components. First, the English sen- 
tence retrieved from the IBM manual is analyzed 
320 
by the CLE parser (Alshawi and Moore. 1992) to 
generate an intermediate representat.ion. This rep- 
resentation is mapped onto a recursively embedded 
case frame which is then input to the transfer mod- 
ule. The transfer module maps the input case fi'ame 
into the target case frame which is then filtered to 
be transformed into the required input format of 
the target language generator. Lastly, the gener- 
ator maps the Turkish case frame into the Turkish 
sentence which is then post-edited by a human trans- 
lator to get. an intelligible and accurate translation. 
4.1 Analysis Phase 
For analyzing the English input, the (:;ore Language 
Engine developed by the SRI Cambridge Computer 
Science Research (:entre was used (Alshawi and 
Moore: 1992). The CLE system has been trained 
to meet the lexical, syntactic and semantic demands 
of the IBM corpus. In CLE, explicit intermediate 
levels of linguistic representation are used in the dif- 
ferent phases of the analysis. Following the syntac- 
tic and semantic analysis/synthesis which uses the 
unification-based approach, the quasi logical form 
(QLF) is developed. QLF can be described as a con- 
textually seT~silive logical form. Since the CLE sys- 
t.em produces various parses for an input sentence, 
the best parse is filtered by the system which con- 
veys the intended meaning of the sentence. Then the 
chosen representation is mapped into a case frame. 
4.2 Transfer Phase 
Experience with previous systems using the interlin- 
gua technique showed the significant complexity of 
extracting and representing deep meaning of a natu- 
ral language text (Goodman and Nirenburg, 1991). 
Another major difficulty encountered with this ap- 
proach is that the language specific attributes nee- 
essary to define the translation equivalents in the 
lexical and structural levels are neutralized in the 
interlingual representation thereby complicating the 
task of generation considerably. 
A similar problem occurred with systems using the 
transfer approach with deep semantic analysis such 
as the EUROTRA project (Johnson et al., 1985). 
Such systems were observed to be difficult to develop 
a.nd maintain. To avoid these problems, the MT 
systems developed recently generally chose to use 
the straightforward transfer approach which relies 
on various types of lexical, syntactic information and 
a limited use of semantic analysis (Tsntsumi, 1986). 
The system being developed as a part of the TU- 
Language project also chose the structural l.ransfer 
approach with a minimal amount of semantic anal- 
ysis. T'he transfer phase of our MT systeln per- 
forms structural transfer between the respective case 
frames of the analysed English sentence and target- 
ted Turkish output. In a top-down manner, the 
transfer lnodule tra.nsfbrms the English case frame 
or adds new infbrmatioll to the 'turkish case frame 
in order to generate the equivalent Turkish noun 
phrase, clause or sentence with the aid of a trans- 
fer dictionary, and the transfer rules. 
The English and Turkish case frames for 
clauses/sentences are generally similar to each other 
with differences seen in the sentence's mood and the 
verb's aspect and modality. Some information not 
extracted in the analysis phase such as the sentence 
form, clause type. role, etc. have to be determined 
in the beginning of the transfer phase and added to 
the Turkish case frame. An example sentence and 
parts of the corresponding English and Turkish case 
frames can be seen below: 
(1) John must show the program works. 
John goster+tns program+gen calis+t.ns 
'John progralnin calistigini gostermeli'. 
English Case Frame: 
-mood declarative 
voice active 
root verb 
nlod 
suhj 
args 
obj 
show\] mustJ 
110111 
-mood 
voice 
verb 
a,rgs 
Turkish Case Frame: 
"sform finite 
ctype pred 
mood necess 
voice active 
root. 
verb \[tells 
subj 
a.rgs 
obj 
goster\] 
pres j 
\[John\] 
"role 
sforn\] 
ctype 
nlood 
voice 
w?rb 
a.rgs 
John\] 
declarative 
active 
root work\] 
tense pres j 
\[subj \[nom p o r m\]\] 
fact 
part 
pred 
decl 
active 
root ca\]is\] 
tens pres / 
asp aor _1 
The case frames representing tile noun phrases of 
the English and Turkish sentences vary from each 
321 
other in a number of ways because the generator 
requires additional information to form an equiv- 
alent Turkish representation. For example, in the 
sentences below, 
(2) That man writes programs. 
O adam yaz+pres program 
'O adam program yazar.' 
(3) Programs were written for the project. 
Program+pl yaz+pass+pst icin proje 
'Proje icin progralnlar yazildi.' 
Even though the word program is used in the plu- 
ral form in both of the English sentences, the trans- 
fer module needs to determine the specificity of the 
noun phrase in question and send it to the generator 
which will accordingly output either the singular or 
plural form of the noun. 
Some of the complex transfer issues presented by 
Lindop and Tsujii (Lindop and Tsujii, 1991) also 
arise in our machine translation system. These is- 
sues are handled with special transfer rules and 
transfer lexicon entries. In the beginning of the 
transfer phase, the exception rules are tested and 
eventually a checklist containing the problematic 
components of the input is generated. Some ex- 
amples of these components are verbs which change 
meaning when used with different attributes, pas- 
sive, existential or conditional sentences, relative 
clauses, idiomatic use of prepositional phrases, etc. 
As the transfer process continues, the checklist is 
referenced in order to block the default translation 
and handle the exceptions. The rest of the mapping 
proceeds in a straightforward fashion until all of the 
information in the source case frame is mapped onto 
the target case frame. 
Some of the complex transfer issues handled in the 
transfer phase will be presented in this section.First, 
a significant amount of head-switching is performed 
to resolve the lexical and structural differences in 
the English and Turkish languages. In the example 
below, 
(4) attempted execution 
tesebbus calistirma 
'calistirma tesebbusu : execution attempt' 
execution is the head noun of the English phrase 
whereas tesebbus (attempt) becomes the head noun 
in the target phrase. 
Another problem encountered in the transfer mod- 
ule is complex lexical transfer with category changes 
such as the example given below: 
(5) John gave a weak cough. 
John oksur+pst hafifce 
'John hafifce oksurdu.:John coughed weakly.' 
The adjective weak has to be mapped onto an ad- 
verb hafifce and the verb give's default translation 
into the verb ver has to be blocked when it is used 
with the dependent noun cough. Consequently, the 
fitting target verb is found to be oksurmek. 
Also, dependent on the verb, an object of an 
English sentence may be mapped to different case 
markings in Turkish. 
(6) I hit the man. 
vur+pst+pers adam+dat 
'Adama vurdum' 
(7) I shot the man. 
vur+pst+pers adanl+acc 
'Adami vurdum' 
As seen above,the object of the sentence *be man. is 
mapped either to a.n accusative marked object adami 
or a dative marked indirect object adama in the tar- 
get sentence. 
There are also some complex structural changes 
encountered during transfer. An English clause 
might be mapped into a Turkish gerund: 
(8) While he was working 
+ken calis+tns 
'Calisirken' 
Another example of a structural transforlnation 
encountered can be seen in active/passive forms of 
sentences. In the English passive tbrm, the surface 
subject can correspond to both the direct object or 
the indirect object of the active form. Yet in Turk- 
ish, the surface subject of a passive sentence can only 
be the direct object of the active form. The differ- 
ence between the two sentences is distinguished by 
the order of the phrases in the target sentence as 
seen in the example below: 
(9) This program was given to the user. 
Bu program ver+pas+pst kullanici+dat 
'Bu program kullaniciya verildi.' 
(10) 
The user was given the program. 
Kullanici+dat ver+pas+pst program 
'Kullaniciya program verildi' 
In both of the Turkish translations, the sur- 
face subject is program whereas the surface subject 
changes in the English inputs. 
The order of the words in the output sentences are 
determined by the topic and focus features of the tar- 
get case frame which are mapped during the transfer 
phase. In the first sentence, the topic is found to be 
program, and the focus is kullanici, whereas in the 
second sentence the topic and the focus are kullanici 
and program., respectively. 
The transfer module also attacks problelns related 
to sentential transformation such as the ones re- 
quired in the example below: 
(11) 
There are programs in the disk. 
var program+pl disk+loc 
'Diskte progralnlar var' 
322 
Parts of the case frames for the sentences above 
are as follows: 
English Case Frame: 
-mood decl 
voice active \[ 
verb root 
tens 
args 
adjs 
Turkish 
-sform 
ctype 
nlood 
voice 
verb 
subj 
arg2 
place 
pres 
nolll entity\]\] 
det there J / 
\[nora program\]j 
\[nom disk\]\] 
args 
Case Frame: 
finite 
exist. 
decl 
active 
root 
tens 
asp 
subj 
to-be 
pres 
aor  l ro r ,q 
\[disk\]\] adjs place 
Other problems encountered in the transfer phase 
are the lexical gaps, idiomatic uses of phrases, and 
lexical disambiguation by syntactic or semantic con- 
tent. 
With all the complex transfer issues resolved in 
the transfer phase, the corresponding Turkish case 
frame is generated which is then translated froln its 
Prolog notation into the Lisp notation required by 
the generation module. 
4.3 Generation Module 
The generation component of the system is based on 
the GenKit environment developed at the Carnegie 
Mellon University - Center for Machine Transla- 
tion which provides facilities for a unification-based 
generation grammar environment (Hakkani et al., 
1996). 
As input, the generator receives a recursively em- 
bedded target case frame representation where all 
the lexica.1 choices have been made, and produces 
the Turkish sentence conveying the same meaning. 
Since Turkish has complex agglutinative word 
forms, a separate morphological generator handles 
the proper morpheme selection, vowel harmony, etc. 
to produce the surface form of the generated words. 
The Turkish sentence output by the generator is 
post-edited by a. human translator to ensure accu- 
racy and intelligibility of the target sentence. 
5 Conclusion 
In this paper, an English to Turkish MT system us- 
ing the structural transfer approach with a limited 
amount of semantic analysis has been described. 
The structural transfer method which uses the re- 
cursively embedded case frames as intermediate rep- 
resentation proved to be very suitable in the applica- 
tion of English to Turkish machine translation. The 
greatest difficulty encountered with this approach is 
handling the complex transfer issues that arise due 
to the differences between the two languages. 
Hopefully, the introduction of the English to Turk- 
ish MT software into the Turkish market will meet 
the growing demands for accurate, fast and high- 
quality translations in the field of computer manu- 
als. Depending on the success of the system, the 
lexicons and the transfer module might be modified 
to tackle other translation domains in the future. 
6 Acknowledgements 
Helpful comments of Asst.Prof Cem Bozsahin and 
Assoc.Prof.Mehmet R.Tolun are gratefully acknowl- 
edged. This work has been supported by the NATO 
Science for Stability Project TU-LANGUAGE. 

References 

Hiyan Alshawi and Robert C. Moore. 1992. In- 
troduction to the CLE. In Hiyan Alshawi, editor, 
The Core Language Engine. The MIT Press, Cam- 
bridge, Massachusetts. 

Kenneth Goodman and Sergei Nirenburg ed. 1991. 
The KBMT Project: A Case Study in Knowledge- 
Based Machine Translation. Morgan Kaufmann, 
San Ma.teo, C, alifornia. 

Dilek Z. Hakkani, Kemal Oflazer and Ilyas Cicekli. 
1996. Tactical Generation in a Free Constituent 
Order Language. In Proceedings of 8th International Workshop on Natural Language Generation, Sussex, UK, June. 

Rod Johnson, Maghi King and Lois Tombe. 1985. 
EUROTRA: A Multilingual System Under Development. In Computational Linguistics, 11:155-169. 

Jeremy Lindop and Jun-ichi Tsujii. 1991. Com- 
plex Transfer in MT: A Survey of Examples. 
CCL/UMIST Report No:91/5. 

Tetsuya Nasukawa. 1993. Discourse Constraint in 
C, omputer Manuals. In Proceedings of the TMI 
1993, pages 183-193. 

Taijiro Tsutsumi. 1986. A Prototype' English- 
Japanese Machine Translation System for Trans- 
lating IBM C, olnputer Manuals. In Proceedings of 
Coling 1986, pages 646-648. 
