NARA: A Two-way Simultaneous Interl3retation System 
between Korean and Jal~anese -A methodological study- 
\[lee Sung Chung and Tosiyasu L. Kunii 
Department of" Information Science 
Faculty of Science, University of Tokyo 
7-3-I Ilongo, Bunkyo-ku Tokyo, 113 Japan 
Abstr_~ 
This paper presents a new computing model for con- 
structing a two-way simultaneous interpretation system 
between Korean and Japanese. We also propose several 
methodological approaches to the construction of a 
two-way simultaneous interpretation system, and real- 
ize the two-way interpreting process as a model unify- 
ing both linguistic competence and linguistic perfor- 
mance. The model is verified theoretically and 
through actual applications. 
I. Introduction 
Our goal is to develop a two-way simultaneous in- 
terpretation system between Korean and Japanese. In 
order to achieve this goal, we have designed a specif- 
ic computing model, which is a computer program based 
on the a\]gorithll that formalizes the mechanism of 
two-way simultaneous interpretation and the correspon- 
dence of the two languages. Our computational ap- 
proach consists of two parts° First, build an explicit 
computational model, then sh(m the practical applica- 
biIity and theoretical validity of the model° The 
most significant advantage of using a formal descrip-. 
tion to represent our" system is in that the descrip- 
tive contents of the representative algoritb~ do not 
depend upon the conventJ.orml approaches to machine 
translation° ~.Jc have also implemented a prototypint~ 
system .\[~AR~.., a two-way simultaneous interpretation 
system between Korean and Japanese. In this paper, we 
outline the features of the system without \[4oin\[~ into 
the details. 
2. Methodology 
the adjusted grammar with the interpretation mechan- 
ism: a representation, an a\]goritb~l, and a complexity 
metric. We take the following items as the subjects of 
c;lethodological study. 
(I) The theory of grammar 
We require an adjusted grammar to be suitable for 
description of the two languages as input and output. 
It is intuitiw}ly clear that the more cc~,municatable 
the adjusted gralr~imr is, expressed by a powerful for- 
real system, the more efficiently is the grammar inter- 
prcted. We adopt general ized phrase structure 
grarl,~lar (GPSG) framework\[\]{\]. 
(2) The notion of direct realization of interpretation 
Because we need to connect competence and performance 
as directly as possible, one of the goals of our study 
is to identify rules of the grammar with the manipula- 
tive unit of interpretation in a one-to-one fashion° 
Thus we carefully distinguish between the grammar and 
the rules of interpretation. For this, we adopt the 
following notions as the methodological principles of 
our system: 
I) Equivalence of grammar\[5\]~ 2) gralfmlar cover' and 
grarmlar modif:ieation\[6\]5 3) type transparency\[2\], and 
4) an invariant of forma\] languages\[4\]. 
(3) The notion of complexity measure 
fhe direct association between unit interpretation 
time cost and the complexity of a sequential operation 
during interpretation can be measured. 
Our approach is intuitively motivated by Cha~mky's 
hypothesis\[l\]: hc~logeneous communication by the same 
linguistic performance is possible among those who 
have the same linguistic competence. We take a perfor- 
mance theory to be the study of real time processing 
of languages. The performance theory cannot be 
developed without a c~apetence theory. This hy- 
pothesis suggests that a key point of contact between 
the theory of grab, mr and the interpretation control 
is the natural link between the theory of knowledge 
representation and the theory of knowledge proeessing. 
That is, for two classes of languages to be interpret- 
able by humaln being, there exists an interpreting pro- 
cedure. Consequently, if we can show that there is an 
adjusted grab,far for the two languages plus an ade- 
quate interpreting procedure to predetermine the 
mechanism of' our two-way simultaneous interpretation, 
then we have some support for our methodology. In 
order to guarantee two-way simultaneous interpreta- 
tion, there are several subareas to be inquired. The 
first is the type of representation constructed during 
the interpretation. The second is the method of util- 
izing the representation during the interpretation. 
The third is the measure of c~nputational complexity 
during the interpretation. These three components of a 
complete c(~aputational model are necessary for linking 
3. Linguistic Data Structure and Computing Model 
in order to investigate the correspondence between 
the two languages, we partition a grammar into in- 
dependent components: segmented words, the word order, 
morphology, syntax, and semantics. The partition of a 
grammar constitutes an important step of modular 
decomposition into the interpretation subsystems. 
3.1 Interpretation strategy of segmented word com- 
ponent 
3.1.1 Data structure 
In comparison with other symbol system, every hmlan 
language has a remarkable characteristics; namely, the 
structure of segmented words. The utterance as a seg- 
mented word conveys a message regarding some matter, 
and communicates the information concerning the 
matter. A se\[~nented word is a word or an ordered pair 
of words. Using some criteria: positional transforma- 
tion, substitution and insertion, we can specify a 
segmented word of Korean or Japanese. 
3.1.2 Word order in a segmented word of Korean or 
Japanese 
325 
Between Korean and Japanese, some common properties 
are observed, such as an agglutinative language struc- 
ture and the identical word order(SOV). In addition, 
we sight three corresponding word order properties of 
segmented words between the two languages: 
For some (kl, k2) e:Sk and (jl, j2) eSj, where Sk and 
Sj are a set of Korean segmented wordsj a set of 
Japanese segmented words, respectively, and I a binary 
relation (interpretation) : 
\[Property I\] reflexivity 
(kl,k2) <I> (jl,j2). 
e.g. 0~ W <I> 4-~I q~l (our nation) 
\[Property 2\] synlnetry 
(jl,j2) <I> (k2,kl). 
e.g.~55 --/~<I> ~,}.~,,! ~1 (one more time) 
\[Property 3\] transitivity 
(jl,no,j2) <I> (kl,k2). 
e.g. Bako X <I> %t-t'~ ~ (a Japanese) 
~mong above properties, Property 3 depends upon Korean 
pragmatic information. 
3.1.3 Computing model 
The production form of a se\[{nented word of Korean 
or Japanese can be described in the rule forms in a 
regular granmar, and it is right linear. Since a 
language L generated by some right linear grammar G is 
regular, there exists a finite automaton which accepts 
L. If L is a context-free language and s is a substi- 
tution map such that for every a e V(a fixed vocabu- 
lary), s(a) is a context-free language, then s(L) is a 
context-free language. A type of substitution map 
that is of special intcrest is a hc~mmorphiaa. If' L 
is a regular language and h is a hom~rJorphia~l, then 
the range of tile inverse homomorphism ff~(L) is also 
regular language. And, for two given regular grammars 
G and G', if L(G) : L(G'), there is a sequence 
equivalence. Two sequences generate the same word ord- 
er in the increasing length order. 
3.2 Interpretation strategy of Norphological component 
3.2.1. Data structure 
The study of the structure of words occupies an im- 
portant place within linguistics, sandwiched between 
phonology and syntax. Horphemes may also be parti- 
tioned into lexical and grammatical classes. Lexical 
morphemes are generally free, while many of the gram- 
matical morphemes are bound. 
3.2.2 C~nputing model 
In a given Korean-Japanese (or Japanese-Korean) dic- 
tionary, let Dk be the set of morphemes of Korean, and 
Dj be the set of morphemes of Japanese. A mapping I 
between the sets is defined as follows. 
I(Dk) = Dj 
implying that the image of Dk is D j; taking the in- 
verse mapping, 
I(Dj) : Dk. 
By generalizing the relation and the mapping between 
the two sets, we may consider the set of Korean words 
to be a domain, and the set of Japanese words a range. 
~ssuming the same cardinality for both, Dk and Dj may 
be partitioned as shown below. Here we suppose 
{I<I, k2,..kn}eDk, {jl, j2,..jm}eDj: 
(I) one-to-one (ki,ji) e DkxDj. 
(2) one-to-many (ki. lJn.Ji2,...Ji,(il\]) e DkX2"i 
(3) many-to-nlany (Ikihki2,-..ki,(i)l,lJil,Ji2.'"Ji,,,(i)}) e 21)kx2 l)j 
where, A xB is the Cartesian product of the two sets A 
and B~ and 2 A is the a power set of a set A. 
Obviously, one-to-one correspondence is isomorphic. 
Naturally, our attention will be focused on the one- 
to-many and many-to-many relations. Interpretation of 
these relations depends on various factors: allomorph, 
synonym and homonym. Thus, as for the interpretation 
which is dependent on synonymy or polysemy, we charac-- 
terize the interpretation by specifying the canonical 
form, or the semantic feature instantiation, respec- 
tively. 
3.3 Syntax level interpretation strategy 
We examine the syntactic structure of the two 
languages. Frcn~ the correspondence in a segTaented word 
and word order, it is seen intuitively that they are 
strongly equivalent. And there is a sufficient 
linguistic evidence for it based on the study of ex- 
perimental comparative linguistics\[2\]. ~ phrase struc- 
ture preserves each lexical semantic feature of a con- 
stituent structure, and a parse tree describes the 
construction of syntactic representation of a sen- 
tence. Horeover~ a partial tree in the whole parse 
tree plays a role of adjusting semantic and syntactic 
interpretation. Let us compare the examples of two 
parse tree constructions(Fig I): 
VP VP ./ \, / \ 
NP VP <I> PP VP 
/\ /\ / \ / k 
VP I': V AUX VP P V AUX 
I I l I I I I { 
Fig 1: Syntactic trees of "(I) thought (somebody) 
went (to somewhere) ' 
It is obvious that parse trees coincide with each oth- 
er in one-to-one fashion, but syntactic categories do 
not. This implies that two given languages, Korean 
and Japanese, do not generate the same set of senten- 
tial forms. Furthermore, there is no algorit~n for 
deciding whether or not two given context-free gram- 
mars generate the same sentential forms. This is the 
reason why we adopt the covering grar~ar technique to 
parse the source language for interpretation. 
3.4 Semantics, pra~aatics and ambiguity 
Semantics and pragmatics also play an important 
role in generating the well-formed target language. In 
the interpretation between Korean and Japanese, there 
exist several kinds of inherently ambiguous sentences 
which are generated only by the ambiguous gralrmars of 
326 
both languages. (see 5°Fragments of interpretation) 
4. K-J Gr~nmar 
We design the K-J (or J-K) grammar which elgninates 
syntactical and semantical ambiguity of' both languages 
for interpretation° This gra~m~mr corresponds to the 
ccxnmunicative c~npetence for the interpretation 
between Korean and Japanese. The K-J (J-K) grammar is 
motivated by grammar modification and the coverinl\] 
grammar. 
ALGORITHM" irregularity categories removal or adjust- 
ment and semantic features insertion. 
Input: a 5-tuple phrase structure grammar G : 
(N,Tk,Tj, P,S). 
Output: an equivalent 5-tuple phrase structure gram- 
mar G' : (N',Tk'\[semj\],Tj',P',S'). 
Method: entpirical and heuristic method° 
llere N and I~' are nontermina\].s, Tk, Tj, Tk' ancl T o ' 
are terminals, sem~ is semantic features, P and P' are 
production rules, ~nd S and S' are the start symbols° 
The J-K granmmr is designed analogously. In the 
framc~vorl,: of the generalized phrase structure grammar, 
the semantic features are accepted by a special phrase 
structure rule, that is a linking rule, which causes 
the relevant information about the phrase to be passed 
down the tree as a feature on the syntactic nodes. 
Therefore, interpretation procedure is constructed by 
a succinct algorithn founded on the K-J(J-K) grammar. 
5. Fragments of Irlterpretation 
In this section, we exhibit the frap~nents of our 
intcrpretatJon system: how phrase structure rules and 
semantic £eatures interact in the interpretation pro- 
cedure aceordJ.ru;; to the K-.J(J-K) grai~lt/iai". 
5.1 \[Iomonymous construction 
There are some kinds of construction types provided 
by syntax relations of each constituent. Among them, 
modificatiorl is a construction type related to Head 
and Attributes. Coordination imp\].ies that more than 
two subconstituents have syntactical coordination re-. 
lation. Let us consider the following Japanese utter- 
anoes : 
I) t~2~.I-~,~ \[T\] ~>~<~o (modification 
"(Someone) goes to school, and eats bread. 
2) ~>99:< \[-c\] g*~-~9<o (coordination 
"(Someone) eats \[)read and goes to school.' 
The two utterances imply the semantic notions of 
modification and coordination, respectively, but have 
the same conjunction morpheme \[tel. Semantically, 
they are represented in Korean by the outcome of in- 
terpretation as follows: 
I) ~d;,~°ll d*l ~'~~z '~#4=*:I. (modification) 
2) ,,~.~-,-'\].,L q;,'-ol\] ",\]'-l. (coordination) 
All such morpheme ambiguities induce not only lexieal 
semantic ambiguity but sentential ambiguity, in order 
to interpret such ambiguous utterances, we c~nploy se- 
mantic feature specification as the discipline of the 
semantic conjunction schemata. The foilowin{~ rules 
account immediately for the sentences in the example. 
Here we use the GPSG notations: 
(I) modification schema 
S-> \[l\[sem~0 , Conj *l \], lt\[\]semc~ 1\] 
where~Y~\[ (0,1), (O,O) \] 
(2) eoordinat:i.on schema 
S -> \[i\[semrm , Conj .,'- \], II\[sem~l \] 
where~,(:{ (I,0), (1,1) \] 
5.2 Missing construction 
Korean and Japanese allow one of" the constituents 
of a sentence not to be explicitly stated when it is 
understandable fr~ll the context. In the GPSG frame- 
work, this kind of difference can be expressed by a 
FOOT feature S\[,ASH\[3\]. The SLAS\[I feature indicates 
that somethinl\] is lilJ.ssJ.n 6 in the structure dominated 
by the category specified, in this subsection, we ex- 
hibit a semantically ~m\]biguous utterance across a 
h(~nonytilous construction and a missing construction° 
Consider the fol\].owing Korean utterance. This utter- 
anee also has inherent syntactical and semantic ambi- 
l~ui ty. 
1 ) *l-R-°llq -',4°I 'ffd-<: 11,II°1 -~.:~. 
This utterance has two distinct syntactic trees: 
( 1 ) S/PP j i / \ 
/ \ 
PP\[de\] 
'\] ~J ~b lJ, 6 #,/PP VP 
/ " .... _<F. _'>~. 
S/PP Conj\[ P- \] ~fJ~ ~4~o 
PP/PP VP 
"Frcxn Seoul came a report that there was a fire 
Seoui ) ' 
(2) SIPP. 
/- 
\]S- " S/PP 
/ 
S Con j\[ ~ \] PP/PP VP -11"- i jC"E  
"(From Seoul) came a report that there was a fire 
Seoul ' 
(in 
in 
In the above example, h~nonymous construction does not 
arise in Japanese, but missing construction remains. 
We ~nploy a parse tree (2) for semantic adjustment, 
and fill the gap of local environment with syntacti- 
cally and semantically agreeable vocabulary; then such 
utterance of Korean and Japanese is interpretable 
without ambiguity° Consequently, the utterance of 
Korean I) is interpreted as follows° 
\[\[seoul- cj~ \[kazi-ga okitato\] \[se99~\-~a_/~ \[renraku ga 
kita\]\]\]. 
327 
S / 
S S 
S Conj \[ ~ \] PP VP 
"From Seoul came a report that there was a fire in 
Seoul ' 
6. K-J(J-k) system 
In order to define a two-way interpretation system 
more formally, we formulate the internal interfaee(K-J 
system) for the interpretation. This interface 
corresponds to the transducer of interpretation. We 
can define the K-J(J-K) syster,; as a 3-tuple grammar 
G:(wj,k(or j),wk ), wherewk and w i are Korean words and 
Japanese words, respectively, and k(j) : Wi-~Wk ( Wk--~Wj ) 
is a homomorphism. The K-J(J-K) system G defines the 
following sequence preserving the word order: 
w~-k(wD, w~w~=k(wDk(w~),. ..... 
It also defines the language 
L(Gk) = {ki(wi)li>O}. 
As mentioned above, the K-J(J-K) systel;L constitutes a 
simple device for interpretation. A language defined 
by the K-J(J-K) systom corresponds to the target 
language. Inversely, the mapping j of w~ into w i is such 
that the inverse homc{i~orphi~i 
j(wO = {wilk(wi) =w~} , j :k 
exists. Thus, we define the two-way simultaneous in- 
terpretation system ~ by: 
j(Lk) :k ~(Lk) = {wilk(wj) eLk}. 
We can define our system ~ using the extended no- 
tion; the inverse homo~\]orphism can be replaced by the 
direct operation of a finite substitution. Consider 
a gra\[~ar(e.g. Korean) GK" = (Nk, Tk, Pk, Sk) and let j 
be a finite substitution, defined on the vocabulary 
(Nk u Tk)*, such that j(w) is a finite(possibly empty) 
set of word for each word A. We denote 
j(Nk) = \[~j, j(Tk) = Tj, PjDj(Pk), Sjnj(Sk). Then, 
the gray, nat (e. g. Japanese) 
Gj = (Nj, Tj, Pj, Sj) 
is an interpretation of Gk. If I(Gk), I(Cj) are the 
sets of all interpretation of Gk and G j, respectively, 
then I(G#') = I(Gj), and I is an invariant for Gk and 
Gj. 
7. Complexity of System NARA 
The complexity of the algorithm is usually measured 
by the growth rate of its time and space requirements, 
as a function of the size of its input (or the length 
of input string) to which the algorithm is applied. We 
adopt a finite state transducer as a computing model 
which governs the fundamental interpretation control. 
Since we do not count the time it takes to read the 
input, finite state languages have zero complexity. If 
reading the input is counted, then finite languages 
have time complexity of exactly A (the length of input 
string). Such languages are interpretable in exactly 
time it, and then called real-time languages. The in- 
terpretation which is accompanied by co-occurrence 
dependency cannot be done in general without relying 
on arbitrary look-ahead or rescanning of the output. 
However, the nature of on line interpretation is un- 
changeable. Consequently, our system \]~R_& is inter- 
preted in real-time. 
8. Concluding Remarks 
Our approach for constructing this system has both 
logical view and experimental view; the former is 
given by mathematical formalization, the latter by the 
correspondence of two languages. In the view of compu- 
tational linguistics, we separated the mechanism of 
our two-way simultaneous interpretation system into 
the levels of abstract theory, algorit~ii, and imple- 
mentation to carve out the results at each level ira 
more independent f'ashion. In order to do so, we speci- 
fied four important levels of description; the lowest 
level is morphology, the second level is se~lented 
word, the third level is syntax and semantics, and the 
top level controls the computing model of each level. 
Hence, we could determine the range of correspondence 
between internal representations of both grammars, and 
the basic architecture of the machinery actually in- 
stantiates the algorithn. Consequently, our model pro- 
duces the extra power by the proposed theory with mul- 
tiple levels of representation and systematic mapping 
between the corresponding levels of two languages, be- 
cause interpretation efficiency requires both func- 
tional and mathematical discussions. Nevertheless, the 
complete pragmatic interpretation still remains quite 
obscure. Finally, we confront the proble~,~ whether it 
is possible to construct a two-way simuitaneous in- 
terpretation system between other two different 
language systems such as Japanese and English. We 
presuppose that the key point of problem-solving is in 
the study of universality and individuality between 
two given languages. 
Acknowled ~F~ients 
We are deeply grateful to Prof. If. YAI~\[ADA for his en- 
couragement. We would like to thank Dr. A. ADACHI and 
Dr. K. HASHIDA, for many stimulating discussions and 
for detailed commentsp and to I lr. Y. SHIRAI and Hr. I. 
FUJISIIIRO for suggestions to if~iprove the paper. 

References 

N. CHOHSKY, Aspects of the Theory of Syntax, 
I I.I.T. Press, Reading, 1963. 

H. S. CI\]UNG, Current Korean: Elementary Sentence 
Patterns and Structures, Komasholin, Reading, 1982(in 
Japanese). 

GAZDAR, KLEIN, PULLUM arld SAG, Generalized Phrase 
Structure Grammar, Blacl~Jell, Reading, 1985. 

H. HORZ, Eine l'leue Invariante f(Jr Kontext-freie 
Sprachen, Theoretical Computer Science 11, 1980. 

H. R. LE\]~IS, C. H. PAPADIHITRIOU, ELEMEf~TS OF THE 
THEORY OF COHPUTATION, Prentice-Hall, Inc. Reading, 
1981. 

A. NIHOLT, Context-Free Grill,mr: Cover, Normal 
Forms and Parsing, Springer, Reading, 1980. 

A. SALOMAA, Jewels of Formal Language Theory, Com- 
puter Science Press, Reading, 1981. 
