A MACHINE TRANSLATION SYSTEM 
FROM JAPANESE INTO ENGLISH BASED ON CONCEPTUAL STRUCTURE 
Hiroshi Uchida and Kenji Sugiyama 
FUJITSU LABORATORIES LTD. 
1015, Kamikodanaka, Nakahara-ku 
Kawasaki 211, JAPAN 
s um___~m a r___!y 
In this paper a language transla- 
tion system based on conceptual struc- 
ture is described. The conceptual 
structure is extended from case grammar 
from practical viewpoints. The concep- 
tual structure is composed of concepts 
and relations among them; in our system, 
a given Japanese text is transformed 
into conceptual structure, and then an 
English text is generated from it. 
In this paper, the needs and bene- 
fits in introducing conceptual structure 
as intermediate representation are dis- 
cussed, and then the construct of con- 
ceptual Structure, and in what way our 
system utilizes it in a translation pro- 
cess are described. 
I. Introduction 
It is believed that, in the course 
of development of present information 
society, the amount of documents, such 
as technical writings, correspondences, 
to be exchanged in every international 
community has become huge. This great 
amount of documents have to be translat- 
ed since no unique universal language is 
available. But obviously, there is a 
definite limitation on translation speed 
by hand. This situation urges on us the 
importance of development of a machine 
translation system. 
Around 1960 when computers were 
becoming widely used, various experimen- 
tal studies were done on machine trans- 
lation. However, most of them brought 
about almost no commercial products, and 
shortly after that, even such kind of 
researchs seem to have disappeared. 
Among a few developed systems of that 
period, only the Russian-English 
translation systems, MARK2 used by the 
U.S. Air Force and another used by the 
Atomic Energy Commission at Aok ridge 
which was designed at Georgetown Univer- 
sity, were widely known. The revised 
version of the latter system named 
SYSTRAN, has been on the market, and is 
currently being used by the EC and other 
few organizations. At that time, a 
'word-for-word' translation was shiefly 
considered. Such systems are classified 
in first generation translation sys- 
tems 2 
After first generation systems, 
second generation translation systems 
which rely on intermediate language 
model instead of a 'word-for-word' 
translation, are now under development. 
The features in the approaches of this 
new generation systems are: 
(I) it performs a translation between 
intermediate languages constructed 
over source language (SL) and target 
language (TL) respectively (called 
transfer approach ), as shown in 
fig. I, 
(2) it encourages to separate linguis- 
tic data from programs 2. 
In the transfer approach, the 
intermediate languages are strongly 
required to retain the characteristics 
(including syntactical charcterlistics) 
of the original ones. This approach 
seems to be effective for translation 
among languages of the same linguistic 
family (in the sence that there are 
similallities in syntax and meaning of a 
text in 
Source 
Language 
(SL) 
analysis 
intermediate representation I 
I transfer 
I text in 
Target 
Language 
(TL) 
_ I synthesis 
intermediate 1 
epresentation~ 
t 
Fig.l Translation Model of 
Transfer Approach. 
--455-- 
word) such as English, German, and 
French, because translation of words and 
some transformation on syntactic struc- 
tures are merely needed. However, it is 
not seemed to be quite effective when 
performing translation among non-related 
languages, for example, Japanese and 
English, because of the need for large 
structural transformation. Among exam- 
ples of this approach are TAUM of 
University of Montreal and GETA of 
Grenoble University~ 
In our translation system from 
Japanese into English, conceptual struc- 
ture is introduced and utilized in 
translation process. 
In this paper, we discuss why con- 
ceptural structure is needed, what beni- 
fits are obtained from our approach, 
what conceptual structure is, and in 
what way our system performs. 
2. Need for Conceptual Structure 
Let us think of the process we take 
for natural language translation. Do 
people really construct intermediate 
representation for both SL and TL, and 
then carry out translation between them? 
Surely they don't. Instead, when 
translating, they first understand the 
meaning of a text, and then perform its 
translation. Furthermore, in addition 
to explicit meaning in the text, they 
usually comprehend implicit meaning 
behind word order (i.e., syntactic in- 
formation), and a choice of words by the 
writer. In other words, to understand a 
text is to extract concepts represented 
by words or phrases and their mutual 
relaions from a text. 
Therefore, from this obsevation, we 
can conclude correct translation cannot 
bypass intermediate semantic representa- 
tion of sentences (we call conceptual 
structure ) in the process of transla- 
tion. This conceptual structure is con- 
structed upon concepts in SL, but these 
concepts are considered as universaly 
general to the extent that they can be 
translated into any languages. That is 
because it is considered peoples will 
only create concepts which can be under- 
stood by every people since they share 
the same space and physical laws in 
their life on the globe. 
In a case where a concept in one 
language does not correspond to one in 
other language, it is supposedly possi- 
ble to express such a concept with other 
concepts in another language. In this 
sence, we can assume every concept has 
universality so that it is always possi- 
ble to find a word or a phrase 
representing a given concept. 
As illustrated in fig. 2, in a 
translation process from language A into 
B, conceptual structure is constructed 
from concepts in A, and subsequently is 
represented with language B. There also 
exists reverse process, namely from B 
into A. Some concepts of a language may 
not fit in any concept of the other one. 
In this case, it is translated by 
paraphrasing with available concepts. 
text in language A 
I 1 
und mding 
of meat Lng 
text in language B 
translation 
into language A 
translation 
into language B 
H~ 
conceptual structure based on I 
concepts in language A I 
(intersection) 
conceptual structure based on 
concepts in language B 
understanding 
of meaning 
Fig.2 A Model of Human Translation Procedure. 
--456-- 
Apparently correct translation can- 
not be expected only with information 
of 'surface structure'. Thus concept~aI 
structure plays an important role for 
translation, but incorporating it into a 
machine translation system will bring 
out difficulties, such as how to extract 
meanings out of sentences, and how to 
represent meanings in conceptural struc- 
ture. Nevertheless, we do consider our 
approach is better than transfer ap- 
proach, because the latter might involve 
much more complex treatment, as dis- 
cussed later. From the above arguments 
and the fact that our approach is closer 
to the process we human beings use, we 
believe ours is more practical and 
promising than transfer approach. 
3. Usefulness of Conceptual Structure 
In what follows, advantages in 
adopting conceptual structure as an 
intermediate language for a machine 
translation system are described. 
3.1 Separation of Syntax and Semantics 
In our approach, first, concepts 
and relations among them are extracted 
to construct conceptual structure; 
next, then it is re-expressed in the 
target language. In this scheme, syn- 
tactical information of the source 
language does not affect on the second 
step, and conversely, syntactical regu- 
lations of target language do not affect 
the first step. 
However, in transfer approach it 
tries to convert intermediate structure 
of the source language to another inter- 
mediate structure of the target 
language, and the translation process 
cannot be essentially freed from the 
characteristics of both languages; in 
other words, syntactical and semantical 
matters have to be attacked at one time, 
and this increased complexity seems to 
make itself inferior to our approach. 
(However, when SL and TL are in the same 
linguistic family, transfer approach 
might be more suitable.) 
3.2 Availability of Discourse 
Information 
Discourse information is an inevit- 
able thing in order to comprehend sen- 
tences of a natural language, so unless 
using it~ correct translation cannot be 
expected. In our approach, the meaning 
of a sentence is represented by concep- 
tural network (which will be defined in 
section 4), and discourse information 
will be also composed of these networks. 
This scheme, that is, discourse 
information and sentence meanings are 
expressed by the same construct, brings 
us a great convenience to make use of 
discourse information in process of 
translation, as well as to accumulate 
that of sentences transacted so far. 
3.3 Advantage in Translation among Many 
~anguages 
There are many languages being used 
in the world, and even if we only count 
the languages of importance, the number 
of them is still large. Our translation 
system aims at translation between 
Japanese and English language, specifi- 
cally from Japanese into English, but in 
the ordinary course of events, it will 
be applied to other languages in future. 
When doing translation among many 
languages the work needed will be beyond 
our power if transfer approach is 
chosen, becouse it requires to supply 
different programs and dictionaries for 
transfer portion of every pair of inter- 
mediate languages. 
On the other hand, our approach has 
only single conceptual structure, so it 
only requires to add analysis and syn- 
thesis procedures for one distinct 
language (although in our approach, con- 
cepts which constitutes conceptural 
structure might be defined somewhat dif- 
ferently, we believe most of concepts 
are common). This is one of the advan- 
tages over transfer approach in transla- 
tion among many languages. 
4. Conceptual Structure 
The meaning of a sentence is ex- 
pressed by concepts represented by words 
or phrases, and relations among them. 
The concept is what is recognized by us 
abstracting general factors in events 
and objects (abstraction), but excluding 
pecurialities to each of them (subtrac- 
tion). 
In our model, a node represents a 
concept, and an arc represents a rela- 
tion beween concepts. This constitutes 
a network representing conceptual struc- 
ture, and we also call such a network 
--457-- 
conceptual structure. This conceptual 
structure is based on the case grammar 
by Fillmore I, but is extended for prac- 
tical use. 
Roughly speaking, concepts in our 
model is fourfold: 
(I) to represent an object 
(corresponding to a class of nourish, 
(2) to represent motion (verbs~, 
(3) to represent the nature or state 
of an object (adjeetives), 
(4) to represent the nature or state 
of motion (adverbs). 
There are relations between con- 
cepts of an 8rbitarily chosen pair of 
the above classes. For instance, there 
is a relation between noun and noun 
class to express "possesion" or "place", 
and a relation between noun and verb 
class to express "actor", "place", or 
"purpose." Some concepts and relations 
are shown in table I. The symbols in 
this table are called semantic symbols. 
class example 
verb give, walk, 
adjective beautiful, red, 
noun I, he, book, 
Tanaka, Johnson, 
adverb slowly, always, 
(a) Concepts (node). 
Table 1 Concepts and Relations 
class arc name explanation 
shows tense modifier of 
concepts 
relation 
between 
action concept 
and others 
past, present, 
future 
temporary, 
may, must 
actor 
causer 
object 
property 
to(direction) 
from(direction) 
at(time) 
after(time) 
before(time) 
means 
possesive 
cause 
reason 
modifier 
shows aspect or 
modal 
who does 
who causes an action 
the object which is affected 
by an action 
property or state of an 
action or an object 
direction of action 
the time when an action 
Occurs 
method by which a result is 
obtained 
possesive relation 
cause of an action 
reason of an action 
simple modification 
(b) Relation between Concepts (arc). 
Table i. Concepts and Relations between Concepts. 
--458-- 
4.1 Level of Concept in Conceptual 
Structure 
The meaning of a sentence is ex- 
pressed by conceptual structure, and 
deciding in what level such a concept 
should be expressed is an important 
matter. That is, the possibility to 
construct a translation system depends 
on the level of concepts. For example, 
"~I~~,~ a ~ %~ < ~<. " 
can be expressed with several levels of 
concepts: 
i) let "~ (kare:he)", "{~(sensei: 
teacher)", and " ~'9 ~ ~ % ~ < ~ < 
(iukoto wo yoku KIKu: De loyal to, 
obey)" be concepts, 
2) separate "~,9 a ~ % ~ < ~< " into 
"~9(say)", "a ~(thing)", "~ <(fre- 
quently)", and "~<(listen and 
obey)". 
3) further, separate a verb " " 
into primitive elements as Schank has 
done 3 as illustrated in fig. 3, where 
" " is defined to satisfy one's 
mind by directing ears toward him. 
In the third method, however, 
although the model is cleared up because 
of limited primitives, it is not 
guaranteed that any meaning could be 
expressed with them. It seems that we 
could not even know how to choose primi- 
tives to express a wide class of mean- 
ings. Furthermore, difficulties in sen- 
tence synthesis seem to be a barrier for 
practical use. In addition, when people 
extract conceptrual structure out of a 
text in process of translation, they do 
not seem to separate each concept into 
elements. From these observations, the 
above third level of concepts has been 
rejected for our model. 
On the other hand, the oposite 
direction in terms of level, that is, to 
introduce compound concepts (usually to 
represent more complicated concepts) 
into conceptual structure will make con- 
text available from sentences be hidden 
behind them. Nevertheless, compound 
concepts are allowed in our system be- 
cause availability of arbitrary level of 
concepts enables the system to handle 
idiomatic expressions and other compound 
expressions in s straighforward way --- 
without transforming them into compli- 
cated conceptual network (this advantage 
is also recognized in transfer ap- 
proach). 
As an example, a concept network 
for 
~a~, " 
is depicted in fig. 4. What this figure 
explains are: 
There is "~7(show)" as in a class of 
verb whose tense is present, and the 
place where it occurs is explained by 
"~(table)"; the object of '~#show)" 
is a concept of "~\[~(specification)", 
and it is connected to "LSI" by a 
'theme' relation; "LSI" is an object 
of verbial concept'i~(use)" , and has 
'aspect' relation of continuation; 
"gx 9/,(system)" has 'modifying' 
relation of " ~:(this)" 
4.2 Causative Sentence and Voice 
Causative sentence is typically 
recognized by an existence of 'causer' 
relation. 
A sentence is classified into pas- 
sive or active voice. Sould voice 
concept be also incorporated in a con- 
cept network like tense, aspect, and 
modal? We consider that writer's choice 
of voice is not necessarily dominant in 
conveying meaning; passive voice is 
often chosen when an actor is of no 
importance or unnecessary like in "The 
(he)~ATTEND ~object(ears)~direction ~ (teacher) 
~ reason MENTAL. STATE (satisfied) r 
(teacher) ~ J 
~-~MENTAL. STATE (normal) 
Fig.3 Conceptual Representation with Primitive Concepts. 
--459-- 
rocket was launched." At present, so we 
think the difference of voice is not 
necessarily explicit in conceptual 
structure; when generating an English 
sentence, passive voice is used when 
actor is omitted in a concept network. 
5. Translation Procedure 
Our translation procedure is illus- 
trated in fig. 5, and is described in 
the following. First, the system inputs 
a Japanese sentence and separates it 
into 'bunsetsu's, then analyzes rela- 
tions among them to obtain which 'bun- 
setsu' modifies which 'bunsets' This 
information represents the syntax struc- 
ture of the sentence, and is output in a 
form of 'bunsetsu'-table. Based upon 
this table, concept structure is con- 
structed. Notice, in this structure, 
syntactical information or words pecu- 
liar to the source language is not con- 
tained. Next, English phrase for each 
semantic symbol (attached to a node\] is 
obtained by consulting a dictionary 
data. In this process, many candidates 
of English phrase may be found, but the 
most suitable one is chosen. Further, 
important grammatical imformation, such 
as subject, object, or compliment is set 
to each arc. Finally, these English 
phrases are synthesized into an English 
sentece applying English grammar and 
modification of words if necessary. 
5.1 Analysis of Japanese Language 
In Japanese written language, each 
word in a sentence is not separated by a 
space like in English; a sentence is 
usually a succession of words (see I of 
pre sent 
continuous ~.. 
I 
present 
;e~ position (in~ .... modifier ~ 
..... system I this J 
robject 
theme 
'specification 
Tobject position (in) show, | _,J' (table)J 
Fig. 4 Conceptual Structure. 
Japanese text 
Analysis of 1 Japanese sentence 
I 'bunsetsu'-table 
I"construction of 
conceptual structure i 
English text 
i synthesis of "i 
L English .... sentence 
English ohrase structure 
~ generation of 
English phrase structure 
conceptual structure ,~ 
Fig.5 Translation Process. 
--460-- 
fig.6). For recognition of each word 
(see 2 of fig.6), we use adjunctive con- 
dition of words; in a Japanese sentence, 
"~=~%S~y~(watashi ga kate ni hon 
wo ageta: I gave him a book)", " 9~ " can 
follow "watashi", and "9~" can be fol- 
lowed by "~". but succession of "~" 
and "~" is not allowed. This adjunc- 
tive relation provides us with very 
powerful word separation method. (How- 
ever, since there are many homonyms in 
Japanese, 100 per cent of correct 
separation is theoretically impossible. 
But neary 100 per cent correct separa- 
tion is being obtained. This matter is 
not discussed in this paper in more 
detail.) 
'Bunsetsu's are thus recognized as 
in 3 of fig.6, each of which is composed 
of 'jiritsu-go' and 'juzoku-go'. Then 
,kakariuke'-condition is used to analyze 
the 'kakariuke' between 'bunsetsu's. As 
in 4 of fig.6, 'bunsetsu' "~," does not 
modify "~ ~=" nor "~ %" but "~y ~ " 
This is because ,kakariuke'-condition 
contains a rule that "~," only modifies 
a predicate "~ly~ " but not others. 
This 'kakariuke'-condition depends on 
syntactic features of 'junsetsu' 
In order to identify 'kakariuke' 
relations more minute information is 
needed. For example, in fig. 7, 
" ~ ~<~ (kawa no kaban: a bag of 
leather)", semantic information should 
be used to know auxiliary word "~ " 
after "~" specifies the kind of materi" 
al of "~,~Z (bag)" 
I__J \[ 
Fig.6 An Example of Japanese Analysis. 
Fig.7 An Example of 'Kakariuke' relation. 
5.2 Construction of Conceptual Network 
From the 'bunsetsu'-table obtained 
in the previous step, a condeprual net- 
work is constructed with an aid of se- 
mantic symbol table which supplies sym- 
bols for Japanese words, phrase, or 
'kakariuke'-relations. 
In a conceptual network, a node 
represents a concept corresponding to 
verb, noun, adjective, or adverb of 
Japanese, and an arc represents func- 
tional meaning, such as an auxiliary 
word " (about)" 
5.3 Generation of English Phrase 
Structure 
To generate English phrase struc- 
ture (i.e., conceptual structure with 
syntax roles and English phrase attached 
to nodes and arcs), there is data for 
each semantic symbol, such as Englsh 
phrase (possibly a word) and its syntac- 
tic type (noun, adjective, verb, and so 
on). Also, informatin of sentence 
structure which a phrase takes is 
provided. That structural information 
decides the kind of syntax role, such as 
subject, object, compliment, to be put 
on an arc. In this phrase structure, 
verb, adjective, or noun is put on a 
node, and conjunction, preposition, or 
relations (such as "which", "where") is 
put on an arc. 
5.4 Synthesis of English Sentence 
In accordance to syntactical infor- 
mation given in English phrase struc- 
ture, English sentence is generated from 
English phrases put on arcs and nodes. 
In this process, verb, adjective, ad- 
verb, and noun are modified to fit in 
with a sentence to generate; for exam- 
pie, verb "see" is modified to "saw" if 
tense is specified so. 
6. Conclusion 
An experiment of our system on 10 
pages text from a computer system manual 
(approximately 230 sentences included) 
is currently under way. The results so 
far is farely good and we would like to 
comment on this after the data is col- 
lected. 
One of the possible extension of 
our system is an automated abstraction 
system, that is, to generate an abstract 
on a given text. To do that, we need to 
--461-- 
distinguish the equality of concepts of 
different levels (discussed in section 
4) for handling context among sentences. 
For example, in "There came a girl who 
was attractive." and "...that honey...", 
an attractive girl and the honey have to 
be identified in order to clearify logi- 
cal relatinship. The conceptual struc- 
ture thus obtained resembles paragraph 
structure proposed by Schank 4 . This 
would be a first step towards an au- 
tomated machine abstraction of writings. 
Acknowledgment 
We would like to thank Masato Kobe, 
and Tatsuy~ Hayashi who gave helpful 
suggestions through many discussions, 
and also would like to thank Sanya 
Uehara for his assitance in preparing 
this paper. 

References 

\[!\] Fillmore,C.J.:"The case for case" in 
Bach,E., Harms,R.(eds.): "Universals 
in Linguistic Theory",Holt, Rinehart 
and Winston, New York, 1968. 

\[2\] Hutchins,W.J.:"Progress in Documen- 
tation Machine Translation and Ma- 
chine-aided translatlon , Journal 
of Documentation, vol.34, No2, June 
1978. 

\[3\] Schank,R.C.:"Conceptual Tnformation 
Processing",North-Holland Publishing 
Company-Amsterdam,1975. 

\[4\] Schank,R.C.:"The Structure of Epi- 
sodes in Memory" in Bobrow,D.G. and 
Collins,A (eds.):"Representation and 
Understanding",Academic Press, 1975. 
