AN ENGLISH-TO-KOI~EAN MACHINE TI~ANSLATOR, : 
• ~' .:~ MA~ ES/EK 
Key-Star Choi, Seungmi Lee, \]tiongun Kim, Deok--bong Kim, 
Chcoljung Kweon, and Gilchang Kim 
(;enter' fin' Artificial Intelligence Research 
Computer Science Department 
Korea Advanced \[nstit'td'c of Science and Technology 
7h~:jon, 805-70I, Korea 
{ kschoi, le{:sm, hgkim, dbkim, cjkwn, gckim} {a.csking.kai,st.ac.kr 
Abstract 
This note introduces at, English-to-Korean Machiue 
'lYanslation System MA!t'ES/EK, which ha.s been de- 
veloped as a research prototype and is still under up- 
grading in KAIST (Korea Advanced Institute of Sci- 
ence and Technology). MATES/EK is a transfl:r.-ba.~ed 
system and it has several subsystems tt, at can be used 
to support other MT-developments. They are gram-. 
mar developing environment systems, dictionary de- 
veloping tools, a set of angmented context free gritm 
Inars for English syntactic analysis, and so on. 
1. Introduction 
An \],;nglish-to-Korean mactdne translation sys- 
tem MATES/EK has been developed through 
a co-research done by KAIST and SF, RI (Sys- 
tmns Engineering l{.esearch Institute) from 1988 
to 1992, and is still under ewflution in KA1ST. It 
has several tools snpporting system dew~.lopment, 
such as the grammar writing language and its de 
veloping environment, a set of augmented context 
free grammar for English syntactic analysis, ;rod 
dictionary editor. This system has been devel- 
oped for UNIX workstation. 
MATES/EK was originally developed using 
Common Lisp in order to test the possibility 
of English-to Korean machine translation, and 
then it has been totally reconstructed using 
C language. Its main target dmmfin is elec- 
tric/electronic papers and so the dictionary ~tnd 
the grammars are specitically adjusted to the do- 
*This research is partly supported by Center for ArtiIi- 
cial Intelligence Research (CAIR) (1992). 
main and one of sample sentences is II"li;F, com- 
puter m;tgazine September 1991 issue to test and 
ewduate the system. 
2. Overview of The System 
MNI'ES/EK is a typical transfi'x-based system, 
which does English sentence analysis, tnmsforms 
the result (parse tree) into an intermediate repre-. 
sentation, and then transRn'ms it into ~L Korean 
syntactic structure to construct ~t Korean sen- 
tmlce. Figure 1 de.picts the ow'.rall configuration 
of MNI'ES/EK, which has R)llowing features: 
* Morphological Analysis Using l'¢-gr;ml : We 
resolve tile category ambiguities by combin- 
ing the N-gram and the rules. (Kim, 1992) 
. Augmented Context Free (',ranlmars for Ell- 
glish Syntactic An~dysis : We dew'.loped a set 
of augmented context free gr~mmu~r rules \['or 
general English synt~mtic analysis a, td the. ~m- 
alyzer is implemented using '/bnfita~ LI{. pars- 
ing algorithm (Tomita, 1987). 
, I,exical Semantic Structure (LSS) to repre-- 
sent the intermediate representation : The 
result of the syntactic structure is trans.- 
tbrmed into an intermediate representatkm 
LSS, which is a dependency structure that 
is relatively independent to specific lan- 
guages. In I, SS, the constituents in a sen- 
tence are combined only in head-dependent 
relation based on the lexieal categories, and 
there art.' no order relatkm between the con- 
stituents. Hence LSS is desirable for trans- 
129 
ell- formed~ English \ 
Sentence\ 
Semanticll / <~ISyntac ic IAnalysis~ 1 Trams ferI~\]Generat \[or 
\[Grammar ~I Grammar ~LGrammar -- 
V / / S~nLenoe / 
L Grammar Writing Language Environment / 
...... :<-" ........ ' ...... H ........... 
English En $lisl. E-Lo-K Korean , Korean 
Syntax Semant l( Trano fer Syntactlc Mot r Generator Generator 
<!English orph 
nalyzer Analyzer Analyzel 
\[--"i ~- - ~/___.~ll" -___ ,. 
ISyntactic ( Transfer 
~-~:i:~"w ~ \]Grammar "~:ai~9".%.':~:~:~:i~:~ .'~" 
Figure 1: The System Configuration of MATES/EK 
lation between English and Korean, two lan- 
guages with fairly different syntactic struc- 
tures (Kweon, et M., 1990, Kweon, 1992). 
• Grammar Writing Language and Its Envi- 
ronment : MATES/EK runs a series of tree 
transformations on LSS structures from the 
English syntactic structure, in order to get a 
structure specific to Korean syntactic struc- 
ture. To do this, a grammar writing language 
and the supporting system were developed for 
the tree transformations (Kweon, 1992). 
The whole tree transformations are done in 
a single grammar processing system in which 
a grammar writing language is defined and 
a set of tools, such as the parser, the in- 
terpreter and some debugging facilities for 
the language, are supported. In the gram- 
mar writing language, a rule describes a tree 
transformation by specifying the p~ttern of 
an input tree, test conditions, the transforma- 
tion operations, and the resultant tree struc- 
tures. Figure 2 is an example of a tree trans- 
formation rule written in grammar writing 
language and two trees before and after its 
apphcation. 
MATES/EK consists of a set of dictionaries, a 
set of grammar rules, and the processing modules. 
Translation is done through a series of processes; 
English morphological analysis, English syntac- 
tic analysis, English semantic analysis, English- 
Korean lexical transfer, English-to-Korean struc- 
tural transformation, Korean syntactic structure 
generation, and Korean morphologicM genera- 
tion. Brief introductions to each processing fol- 
lows. 
3. English Analysis 
3.1. Morphological Analysis 
It incorporates the method of categorial ambigu- 
ity resolution using N-gram with rule combina- 
tions, as well as the basic English word identifica- 
tion, such as word separation, processing of Mtixes 
and recognition of idiomatic phrases (Kim, et M., 
1092). 
3.2. English Syntactic Analysis 
It uses the gener~flized Tomita Lit parsing algo- 
rithm on augmented context free grammar. Tile 
grammar is inductively constructed from 3,000 
e~trefully selected sentences that include various 
linguistic phenomena of English. Those sentences 
are mainly selected from the IEEE computer mag- 
azine September 1991. Other sources of the 
test sentences are HP issue test sentences, ~nd 
I,ongman English Grammar Textbook. The con- 
structed grammar for syntax analysis consists of 
about 500 rules. 
As described above, LSg(the Lexical Semantic 
Structure) is description for the intermediate rep- 
resentation. The result of syntactic analysis is 
transformed into an LSS which is relatively more 
specific to English, and then is transformed into 
130 
A-samplc-trans tl)ml-!xde { 
iil(id12 : 2: 
(A! (1~! .-tmpl) (C! --trap2)) 
with { 
feature conditions 
, I _var ._lre¢ 1), I';; 
<~\'\\ actions StlC~l as 
_acti()n { 
/ 
<2;'> 0!Z, > , _ _ - feature operations A ¢\[1 ..... 
tml)l tmp2a' a h (I) (ll Impl) |;; (C Imp2)) 
..::~!:~:i:~:!::,-..,.. 
< ,,) 
/'~ ///*'\]\'\~\\ 
c ,i i; i)C b ) 
 ll/\]h ::::::::::::::::::::::::::::::: ========================== l' \]~. 
before after 
Figure 2: An example of grammar writing rule and the tree transformation - in the rule "(A! (B! 
tmpl) (C! tmp2))" describes that 'A' as a parent node of a pattern, '(B! trap)' and '(C! trap2)' as 
the first mtd second child, and each child may have zero or more children. '.lThe action part describes 
the necessary transformation operation. 
an LSS speeitic to Korean. 
3.3. English Semantic Analysis 
We developed more titan 300 tree transforming 
rules that are written in grammar writing law 
guage. These grammar rules lead the English 
syntactic structure into a dependency structure of 
English. This dependency structure is relatively 
similar to meaning structure but it is still specitic 
to English, so we need more tree transformations 
to get a structure for Korean language. 
head in an \];',uglish dependency structure is a \]:;n- 
glish verb word, the hexd in corresponding Korean 
dependency structure is Korean verb or a,djec- 
live word, those two words are often not mapped 
directly, l"igure 3 is an example of transforma- 
tion from an English syntactic structure into its 
English specilic dependency structures LSS for a 
sentence "Pipelining increases \])erformance by ex- 
I)loiting instruction-level parallelism." 
5. Korean Generation 
4. English to Korean Transfer 
In this step the system looks up the English- 
Korean bilingual dictionary. We manage the anal- 
ysis dictionary separately from the transfer dictio- 
nary so that we may use the san, e analysis dictio- 
nary to the other language pair such as Fmglish 
to Japanese with the other transfer dictionary. 
There are more than 300 lexicat specitic selection 
rules developed to make the lexieal selection bet- 
ter. 
4.1. English-Korean Structural 'lYans- 
formation 
Using another tree transformation grammar, the 
English specific dependency structure is trans- 
formed into a Korean language specific depen- 
dency structure after looking up the bilingual dic- 
tionary. The dependency structures are repre- 
sented as head and dependents. Although the 
5.1. Korean Syntactic Generation 
In this step the system transforms further the re- 
sultant structure into a list of Korean morphemes. 
Since the dei)endency structure sl)eeilies no word 
order, we have to lind the word order of a sen 
tence and necessary postpositions by help of rules 
and lexical information (Jang, eta\]., 1991). Note 
that Korean has, like Japane.se, several alternative 
postl)ositions lbr conveying the same meaning. 
5.2 Korean Morphological Generation 
After the whoh: tree transformation, tile resultant 
structure is a list of pairs of a morpheme and its 
category. The morphologicM generator is an au- 
tomaton that does the synthesis and separatim, of 
morphemes according to the context and Korean 
morpheme combination rule. Since the Korean 
language has some complex morphological struc- 
ture, the synthesis is a very complex process. 
131 
increases performance by / ~,~ 
exploiting parallelism 
s t!!!Je(ft obji~\[i fief 
- )% pipelinhlg perfomumce 
lb st jfct ot,jeqt 
missing parMlclism 
Figure 3: An example of English syntactic structure and the corresponding English dependency struc- 
ture which is described in LSS, where; PILED (PREDicate) the head of a sentence, normMly verbs or 
adjectives are selected, COMN (COMplement Noun) a node that leads a noun phrase, PREA (PREdi- 
cate Adjective) corresponds to a verb or an adjective in an adjective phrase, PREB (Pll, Edicate adverB) 
corresponds to a verb or an adjective in an adverb phrase. 
6. Problems for Evolution of the 
System 
Since after the first completion of this project, we 
have been trying to find and solve the problems of 
this system. Following list is a brief list of those 
problems, and they are not seem to be easy to 
solve in the near future. 
Robust processing for ill-formed sentences : 
Current MATFS/EK assumes that the input 
sentence be a well formed English sentence. 
After practical test, we found the robustness 
for ill-formed sentences is highly required, be- 
cause the papers fi'oln the IEEE computer 
magazine contains the non-sentential, non- 
text text such as braces, numeric expressions, 
formulas and so on. 
Processing of non-continuous idiomatic ex- 
pressions : In the dictionary entry specifica- 
tion we have a simple rule to represent the 
non-continuous idiomatic expressions, but it 
is not easy to detect those expressions from a 
sentence and represent the found expression 
in the internal structure for processing. 
Selecting correct word correspondency be- 
tween several alternatives : MATES/EK uses 
the semantic marker and a scoring of frequen- 
cies to select the word correspondeney. The 
system still lacks a strong strategy for the 
word selection. 
Processing Korean Sentence Style : Ko- 
reau language has various styles of sen- 
tences(difference between normal ones from 
the honorific or polite expressions), which are 
quite difficult to catch from the English sen- 
tences. 
Too many ambiguities in English syntactic 
anMysis : Currently MATES/EK uses a set of 
ad hoc heuristics and lexieal semantic mark- 
ers coded in the dictionary in order to solve 
the ambiguity resolution, such as the PP at- 
tachment. This problcm is related to the 
problem of selecting the right postposition of 
Korean. 
7. Test and Evaluation 
EvMuation of an MT systeln enmrges as a criti- 
c'M issue these days, but we have not yet found a 
strong and objective way of evaluation. After the 
first completion of the project we tried though, to 
make an evaluation of the system. 
In order to inake the evaluation as objective as 
possible we prepared three factors. First, the ref- 
erees of the evaluation should be those who are 
not the developers of the system, and they should 
take a training to make objective decisions. We 
selected randomly five master degree students as 
the referees. Second, the referees are given a deci- 
sion criteria of four levels: best, good, poor, and 
732 
fail. A sentence is 'best' translated if the resultant 
Korean sentence is very natural and requires no 
additional postediting. A sentence is 'good' trans- 
lated if the result sentence is structurally correct 
but it has some minor lexicM selection errors. A 
sentence is translated 'poor' if there is structural 
error as well as lexical errors. By 'fail', we mean 
when the system produces very ill-formed transla- 
tion or fails to prodnce any result at all. We took 
the first three levels to be 'success,' because even 
a sentence is translated in 'poor' degree, it is still 
understandable. (Even if a translation is scored 
to be 'fail', it could sometimes be understand- 
able from the view point of 'mechanical transla- 
tion.') Third, the test sentences should be those 
sentences which were never used during the de- 
velopment time. 
This system was tested on 1,708 sentences, 
whose length were less than 26 words selected 
from 2500 sentences in the IEEE comtmter mag- 
azine September 1991 issue. It showed about 95 
percent of success rate for sentences shorter than 
15 words, about 90 percent for 18 words, 80 per- 
cent for 21 words, and 75 percent for 26 words. 
This is a quite encouraging result since the IEEE 
computer magazine contains wide range of texts 
of wrions styles. 
8. Conclusion and Further Study 
Development of MATES/EK gaw; a strong mo- 
tivation of attacking practically important prob- 
lems, such as dictionary management, scaling up 
the grammar rules to the real texts, controlling 
the consistency of a large system. 
The system MATES/EK is still under grow- 
ing, trying to overcmne the problems listed above, 
scaling up the dictionaries and the grammar rules, 
and doing downsizing to the PC environment. 
l~eferences 
\[1\] Choi, K.S., (1988). Developing Linguistic 
Model and Skeleton System for Machine Trans- 
lation. KAIST TR, M20160. 
\[2\] Choi, K.S., (1989). Research on English-to- 
Korean Transfer and Development of Machine 
Dictionary. KAIST TR M03330. 
\[3\] Jang, M.G., et al., (1991). Korean Generation 
in MATES/EK. Proceedings of Natural Lan- 
guage Processing Pacific Rim Symposium (NL- 
PRS '9I), Singapore. 
\[4\] Kim, D.B., Chang, D.S., and Choi, K.S., 
(1992). English Morphological Analyzer in 
English-to-Korean Machine 'lYanslation Sys- 
tem. PI~ICAI'92, Scoul. 
\[5\] Kweon, C.J., Choi, K.S., and Kim, G.C., 
(1990). Grammar Writing Language (GWL) 
in MATES-El(. Proceedings of PRICAI 1990, 
Nagoya, Japan, November 14th 1990. 
\[6\] Kweon, C.J., (1992). Grammar Writing Lan- 
guage : CANNA-2. KAL~T TR, M~0071. 
\[7\] Tomita, M., (1987). An efficient augmented- 
context free parsing Mgorithm. Computational 
Lingui,stic,s. 13 (I-2), 1-6 198Z 
133 
