DEALING WITH INCOMPLETENESS OF LINGUISTIC KNOWLEDGE 
IN LANGUAGE TRANSLATION 
TRANSFER AND GENERATION STAGE OF MU MACHINE TRANSLATION PROJECT 
Makoto Nagao, Toyoaki Nishida and Jun-ichi Tsujii 
Department of Electrical Engineering 
Kyoto University 
Sakyo-ku, Kyoto 606, JAPAN 
1. INTRODUCTION
Linguistic knowledge usable for machine translation is always
imperfect, and we cannot escape the uncertainty of the knowledge
we have. At the transfer stage in particular, the selection of
target language expressions is rather subjective and optional.
The linguistic contents of a machine translation system therefore
always fluctuate and progress only gradually, and the system
should be designed to allow such constant change and improvement.
This paper explains the details of the transfer and generation
stages of the Japanese-to-English system of the machine
translation project by the Japanese Government, with emphasis on
the ideas for dealing with the incompleteness of linguistic
knowledge for machine translation.
2. DESIGN STRATEGIES 
2.1 Annotated Dependency Structure 
The intermediate representation we adopted as the result of
analysis in our machine translation system is the annotated
dependency structure. Each node has an arbitrary number of
features, as shown in Fig. 1. This makes it possible to access
the constituents by more than one linguistic cue. The
representation is therefore powerful and flexible enough for
sophisticated grammatical and semantic checking, especially when
the completeness of semantic analysis is not assured and
trial-and-error improvements are required at the transfer and
generation stages.
2.2 Multiple Layer Grammar
We have three conceptual levels for grammar 
rules. 
lowest level: default grammar which guarantees the 
output of the translation process. The quality 
of the translation is not assured. Rules of 
this level apply to those inputs for which no 
higher layer grammar rules are applicable. 
kernel level: main grammar which chooses and gener- 
ates target language structure according to 
semantic relations among constituents which are 
determined in the analysis stage. 
topmost level: heuristic grammar which attempts to 
get elegant translation for the input. Each 
rule bears heuristic nature in the sense that it 
is word specific and it is applicable only to 
some restricted classes of inputs. 
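The fall-through between the three levels can be sketched as follows. This is only an illustrative sketch in Python, not GRADE: the rule functions, the dictionary-shaped node, and the feature names `lex`, `object` and `deep_case` are assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

Rule = Callable[[dict], Optional[str]]

def heuristic_rule(node: dict) -> Optional[str]:
    # Topmost level: word specific, fires only for a restricted class of inputs.
    if node.get("lex") == "give" and node.get("object") == "effect":
        return "affect"
    return None

def kernel_rule(node: dict) -> Optional[str]:
    # Kernel level: relies on a semantic relation determined by analysis.
    if node.get("deep_case") is not None:
        return f"{node['lex']}[{node['deep_case']}]"
    return None

def default_rule(node: dict) -> Optional[str]:
    # Lowest level: always produces some output; quality is not assured.
    return node.get("lex", "?")

# Layers are tried from the topmost level down to the default level.
LAYERS: List[Tuple[str, Rule]] = [
    ("heuristic", heuristic_rule),
    ("kernel", kernel_rule),
    ("default", default_rule),
]

def translate(node: dict) -> Tuple[str, str]:
    """Return (layer name, output) from the first layer that applies."""
    for name, rule in LAYERS:
        result = rule(node)
        if result is not None:
            return name, result
    raise AssertionError("the default layer must always produce output")
```

Because the default rule never returns nothing, every input receives some translation, which is the guarantee the lowest level is meant to give.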
2.3 Multiple Relation Structure 
In principle, we use a deep case dependency structure as the
semantic representation. Theoretically we can assign a unique
case dependency structure to each input sentence. In practice,
however, the analysis phase may fail or may assign a wrong
structure. We therefore use as an intermediate representation a
structure which makes it possible to annotate multiple
possibilities as well as multiple levels of representation. An
example is shown in Fig. 2. The properties at a node are
represented as a vector, so that this complex dependency
structure is flexible in the sense that different interpretation
rules can be applied to the structure.
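A node that keeps several candidate values for one feature might look like the sketch below. The `Node` class and its method names are hypothetical; only the feature name J-DEEP-CASE and the "agent OR possess" alternative come from Fig. 2.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List

@dataclass
class Node:
    lex: str
    # Each feature maps to the set of values that are still possible.
    features: Dict[str, FrozenSet[str]] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    def may_be(self, feature: str, value: str) -> bool:
        """True if `value` is one of the remaining candidates."""
        return value in self.features.get(feature, frozenset())

    def resolve(self, feature: str, value: str) -> None:
        """Commit to one candidate, e.g. when a later rule disambiguates."""
        if not self.may_be(feature, value):
            raise ValueError(f"{value!r} is not a candidate for {feature}")
        self.features[feature] = frozenset({value})

# "his work": the dependent "he" is either the agent or the possessor of "work".
he = Node("he", {"J-DEEP-CASE": frozenset({"agent", "possess"})})
work = Node("work", children=[he])
```

A transfer rule can test `he.may_be("J-DEEP-CASE", "agent")` without forcing a decision, and a later rule can call `resolve` once more evidence is available.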
2.4 Lexicon Driven Feature 
Besides the transfer and generation rules 
which involve semantic checking functions, the 
grammar allows the reference to a lexical item in 
the dictionary. A lexical item contains its spe- 
cial grammatical usages and idiomatic expressions. 
During the transfer and generation stages, these
rules are activated with the highest priority.
This feature makes the system very flexible for 
dealing with exceptional cases. The improvement of 
translation quality can be achieved progressively 
by adding linguistic information and word usages in 
the dictionary entries. 
2.5 Format-Oriented Description of Dictionary 
Entries 
The quality of a machine translation system 
heavily depends on the quality of the dictionary. 
In order to build a machine translation dictionary, 
we collaborate with expert translators. We develop- 
ed a format-oriented language to allow computer- 
naive human translators to encode their expertise 
without any conscious effort on programming. 
Although the format-oriented language we developed 
lacks full expressive power for highly sophisticat- 
ed linguistic phenomena, it can cover most of the 
common lexical information translators may want to 
describe. The formatted description is automati- 
cally converted into statements in GRADE, a pro- 
gramming language developed by the Mu-Project. We 
prepared a manual according to which a man can fill 
in the dictionary format with linguistic data of 
items. The manual guarantees a certain level of 
quality of the dictionary, which is important when 
many people have to work in parallel. 
[Figure 1: annotated dependency structure for the sentence glossed
as "Due to the advance of electronic instrumentation, automated
ship increases in number." Recoverable node features:

  Verb node:  J-CAT = Verb, J-LEX = (increase), J-DEEP-CASE = MAIN,
              J-GAP = '(SOUrce GOAl)',
              J-SENTENCE-CONNECTOR = DECLARATIVE,
              J-SENTENCE-RELATION = NIL, J-SENTENCE-END = NIL,
              J-DEEP-TENSE = PRESENT, J-DEEP-ASPECT = BeyondTime,
              J-DEEP-MODE = NIL, J-VERB-ASPECT = TRANSITIVE,
              J-VERB-INT = NO, J-VERB-PAT, J-VERB-SD, J-NEG = NIL
  Noun node:  J-CAT = Noun, J-LEX = (advance), J-DEEP-CASE = CAUse
  Noun node:  J-CAT = Noun, J-LEX = (electronic instrumentation),
              J-DEEP-CASE = SUBject
  Noun node:  J-CAT = Noun, J-LEX = (automated ship),
              J-DEEP-CASE = SUBject, J-N = CommonNoun,
              J-NUMBER = NIL, ...
  Dummy node: J-CAT = Noun, J-LEX = NIL, J-DEEP-CASE = SOUrce
  Dummy node: J-CAT = Noun, J-LEX = NIL, J-DEEP-CASE = GOAl]
Fig. 1. Representation of analysis result by features.
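[Figure 2: for the phrase "his work", the node "work" governs the
node "he". The two alternative structures, in which "he" is the
agent or the possessor of "work", are merged into one structure
whose dependent node carries J-LEX = he and
J-DEEP-CASE = agent OR possess.]
Fig. 2. An example of complex dependency structure.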
3. ORGANIZATION OF GRAMMAR RULES FOR TRANSFER 
AND GENERATION STAGES 
3.1 Heuristic Rule First 
Grammar rules are organized along the principle that "if a better
rule exists, the system uses it; otherwise the system attempts to
use a standard rule; if that fails, the system uses a default
rule." The grammar therefore involves a number of stages at which
heuristic rules are applied. Fig. 3 shows the processing flow for
the transfer and generation stages.
Heuristic rules are word specific, and GRADE makes it possible to
define word specific rules. Such rules can be invoked in many
ways. For example, we can associate a word selection rule for an
ordinary verb with the dictionary entry for a noun, as shown in
Fig. 4.
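The idea of attaching a verb selection rule to a noun entry can be illustrated with a small Python sketch. The dictionary layout, the key `verb_transfer`, and the use of English glosses in place of the Japanese lexical items are all assumptions for illustration; the actual rules are written in GRADE.

```python
# Hypothetical dictionary: the entry for the noun glossed "effect" carries a
# word selection rule for the verb that governs it.
DICTIONARY = {
    "effect": {"verb_transfer": {"give": "affect"}},
}

# Hypothetical default transfer of verbs when no noun entry intervenes.
DEFAULT_VERB = {"give": "give"}

def transfer_verb(verb: str, object_noun: str):
    """Return (English verb, whether the object noun was absorbed)."""
    entry = DICTIONARY.get(object_noun, {})
    specific = entry.get("verb_transfer", {}).get(verb)
    if specific is not None:
        # The noun's own entry decided the verb: "give effect" -> "affect".
        return specific, True
    return DEFAULT_VERB.get(verb, verb), False
```

With this arrangement `transfer_verb("give", "effect")` consults the noun entry first, while any other object falls back to the ordinary verb transfer.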
[Figure 3: flow diagram with the following elements: internal
representation for Japanese; pre-transfer loop; TRANSFER;
post-transfer loop; internal representation for English;
structure transformation; phrase structure tree; MORPHOLOGICAL
SYNTHESIS.]
Fig. 3. Processing flow for the transfer and generation stages.
[Figure 4(a): Activating a lexical rule for a noun (effect) from
a governing verb (give). The verb node with J-LEX = (give) is
transferred to a verb node with J-LEX = affect. The noun node,
with J-CAT = Noun, J-LEX = (effect) and J-DEEP-CASE = OBJect,
carries the features J-N-V-PROG and J-N-KOUSETSU, which name the
subgrammars (V-TRANSFER and KOUSETSU-TRANSFER) to be invoked for
this noun.]
[Figure 4(b): Form-oriented description of a transfer rule for
the noun (effect).]
3.2 Pre-transfer Rules 
Some heuristic rules are activated just after the standard
analysis of a Japanese sentence is finished, to obtain a more
neutral (or target language oriented) analyzed structure. We
call this invocation the pre-transfer loop. Semantic and
pragmatic interpretation is done in the pre-transfer loop. The
more heuristic rules are applied in this loop, the better the
result will be. Figs. 5 and 6 show some examples.
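The rewriting illustrated in Fig. 5 can be sketched as a pattern match on a small tree. The dictionary-based tree, the key names and the single-entry pattern table are assumptions for the example; English glosses stand in for the Japanese lexical items.

```python
# Hypothetical pattern table: (verb of the modifying clause, its subject)
# -> adjective that replaces the whole clause.
PATTERNS = {("do not have", "sense"): "meaningless"}

def pre_transfer(np: dict) -> dict:
    """Replace a modifying clause by an adjective when a pattern matches."""
    clause = np.get("rel_clause")
    if clause is not None:
        adjective = PATTERNS.get((clause["verb"], clause["subject"]))
        if adjective is not None:
            return {"head": np["head"], "modifier": adjective}
    # No heuristic applies: the analyzed structure is passed on unchanged.
    return np

phrase = {
    "head": "expression",
    "rel_clause": {"verb": "do not have", "subject": "sense"},
}
```

Here `pre_transfer(phrase)` yields the structure for "meaningless expression", and any phrase without a matching clause is returned as it was.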
3.3 Word Selection in Target Language by Using Semantic Markers
Word selection in the target language is a big problem in machine
translation. A word in the source language has a variety of
possible translations. The main principles adopted in our system
are:
(1) Area restriction by using field codes, such as electrical
    engineering, nuclear science, medicine, and so on.
(2) The semantic code attached to a word in the analysis phase
    is used for the selection of a proper target language word
    or phrase.
(3) The sentential structure in the vicinity of a word to be
    translated is sometimes effective for determining a proper
    word or phrase in the target language.
Table 1 shows part of the verb transfer dictionary. The English
verb is selected by the semantic categories of the nouns related
to the verb. The number i attached to verbs, as in form-1 and
produce-2, indicates the i-th usage of the verb. When the
semantic information of nouns is not available, the default
column of the table is applied to produce a default translation.
Fig. 4. Lexicon-oriented invocation of grammar rules.
[Figure 5: the structure in which the noun (expression) is
modified by a clause with the verb (do not have) and the subject
noun (sense) is rewritten into the noun (expression) modified by
the adjective (meaningless): "expression which does not have
sense" -> "meaningless expression".]
Fig. 5. An example of a heuristic rule used in the
pre-transfer loop.
[Figure 6: (1) "integral equation" modified by a clause with
"have" and "logarithmic characteristics" is rewritten to
"integral equation" with "logarithmic characteristics";
(2) a structure with "give", "effect" and "conductivity" is
rewritten so that "conductivity" is attached to "effect"
(REC: recipient); (3) adjectives meaning "many" and "few" are
related to a verb "be, exist, ..." (to be determined at the
transfer step); (4) an expression meaning "tend to" is rewritten
with "there exist" and "tendency".]
In most cases, we can use a fixed format for describing a
translation rule for lexical items. We developed a number of
dictionary formats specially designed for ease of dictionary
input by computer-naive expert translators.
The expressive power of the format-oriented description is,
however, insufficient for a number of common verbs such as
"する" (make, do, perform, ...) and "なる" (become, consist of,
provide, ...). In such cases, we can encode transfer rules
directly in GRADE. An example is shown in Fig. 7. The varieties
of usage are listed with their corresponding English sentential
structures and semantic conditions.
3.4 Post-Transfer Rules 
The transfer stage bridges the gap between Japanese and English
expressions. Many odd structures still remain after this stage,
and we have to adjust the English internal representation
further into a more natural one. We call this part the
post-transfer loop. An example is given in Fig. 8, where a
Japanese factitive verb is first transferred to English "make",
and then a structural change is made to eliminate it and obtain
a more direct expression.
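A post-transfer rule of this kind can be sketched as a rewrite on a tuple-shaped tree. The tuple encoding and the table `TRANSITIVE_OF` are assumptions for the example; in the system, whether a transitive counterpart C' exists is found by consulting the lexical item for C.

```python
# Hypothetical lexical information: intransitive verb C -> transitive verb C'.
TRANSITIVE_OF = {"rotate": "rotate"}

def post_transfer(tree: tuple) -> tuple:
    """Rewrite ("make", A, B, C) to (C', A, B) when C has a transitive use."""
    if len(tree) == 4 and tree[0] == "make":
        _, a, b, c = tree
        c_transitive = TRANSITIVE_OF.get(c)
        if c_transitive is not None:
            return (c_transitive, a, b)
    # No transitive counterpart known: keep the "make" construction.
    return tree
```

So "A make B rotate" becomes "A rotate B", while a verb that is missing from the table keeps the more literal "make" structure.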
4. GENERATION PROCESS 
4.1 Translation of Japanese Postpositions
Postpositions in Japanese generally express the case slots of
verbs. A postposition, however, has several different usages,
and determining the English preposition for each postposition is
quite difficult. The choice also depends on the verb which
governs the noun phrase having that postposition.
Table 2 illustrates part of a default table for determining deep
and surface case labels when no higher level rule applies.
Tables of this sort are defined for all case combinations. In
this way, we ensure that at least one translation is assigned to
every input. A particular usage of a preposition for a
particular English verb is written in the lexical entry of the
verb.
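The interplay between the default table and verb-specific entries can be sketched as a two-step lookup. The default values below follow Table 2 for the postposition "ni"; the three-letter keys, the function name, and the verb entry for "arrive" are illustrative assumptions, not entries from the actual dictionary.

```python
# Defaults for the postposition "ni", keyed by deep case (values from Table 2).
DEFAULT_NI = {
    "REC": "to",    # recipient
    "BEN": "for",   # beneficiary
    "ORI": "from",  # origin
    "PAR": "with",  # participant
    "TIM": "in",    # time
    "ROL": "as",    # role
    "GOA": "to",    # goal
}

# Hypothetical verb entries that override the default preposition.
VERB_PREPOSITIONS = {"arrive": {"GOA": "at"}}

def preposition(verb: str, deep_case: str) -> str:
    """Prefer the verb's own lexical entry, then fall back to the default."""
    return VERB_PREPOSITIONS.get(verb, {}).get(deep_case, DEFAULT_NI[deep_case])
```

Since every deep case has a default, a preposition is always produced even for verbs with no lexical information of their own.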
4.2 Determination of Global Sentential 
Structures in Target Language 
Fig. 6. Examples of pre-transfer rules. 
non-living substance form-1 structure
social phenomena take place
action,deed,movement occur-1 reaction
form X(obj)
X take place
X occur
standard,property
state,condition arise-1 X arise
relation produce-2 produce X
form-1 non-living substance structure X form Y
phenomena,action cause-1 X cause Y
produce-2
improve-1
X produce Y property
measure increase-2
raise-1 X raise Y
Semantic marker for X/Y
X improve Y
X increase Y
Table 1. Word selection in target language by using semantic markers.
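Selection of an English verb from the semantic markers of a related noun can be sketched as an ordered rule list with a default. The pairings below are only an illustrative reading of a few rows of Table 1, and the choice of "produce" as the default is an assumption; usage numbers such as occur-1 are omitted.

```python
# Each rule: (semantic markers that trigger it, English verb). Illustrative only.
RULES = [
    ({"social phenomena"}, "take place"),
    ({"action", "deed", "movement"}, "occur"),
    ({"state", "condition"}, "arise"),
]

# Assumed default, used when no semantic information is available.
DEFAULT_VERB = "produce"

def select_verb(noun_markers: set) -> str:
    """Pick the first verb whose trigger markers overlap the noun's markers."""
    for markers, verb in RULES:
        if markers & noun_markers:
            return verb
    return DEFAULT_VERB
```

A noun with no semantic markers, or with markers that match no rule, still receives the default translation, in line with the fault-tolerant design described above.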
[Figure 7: transfer rules for the verb NARU.
(1) NARU [J-VP=V1], with A (SUB) and B (COM):
      -> consist of, with A (OBJ) and B (COM)
(2) NARU [J-VP=V2], with A (SUB) and B (GOAL):
      [B. J-SEM=CE] (CE: means, equipment)
          -> provide, with A (AGT) and B (OBJ)
      [B. J-SEM=MU] (MU: unit)
          -> reach, with A (OBJ) and B (STO)
      [B. J-CAT=ADJ, J-LEX = (easy) or (difficult)]
          -> become, with A (OBJ) and B (GOAL)
      [B. J-SEM=IT, IC] (IT: theory, method; IC: conceptual object)
          -> turn, with A (OBJ) and B (GOAL)
      [B: complement marker]
          -> get, with (OBJ) and (GOAL)
      [default]
          -> become, with A (OBJ) and B (GOAL)
(3) dictionary rules:
      help become   -> give
      double become -> double
      cause become  -> A causes B]
Fig. 7. An example of dictionary transfer rules
of popular verbs.
Global sentential structures of Japanese and English are quite
different, and correspondingly the internal structure of a
Japanese sentence is not the same as that of an English one. The
fundamental difference between the Japanese internal
representation and the English one is absorbed at the (pre-,
post-) transfer stages. At the stage of English generation,
however, some structural transformations are still required in
such cases as (a) embedded sentential structures and (b) complex
sentential structures.
We classified embedded sentential structures into four kinds.
(1) A case slot of the embedded sentence is vacant, and the noun
    modified by the embedded sentence fills the slot.
(2) The form "N1 ... V ... N2", which corresponds to
    "(N2 ... N1 ... V) N2". In this case the noun N1 must have
    a semantic property such as part, attribute, or action.
(3), (4) The third and fourth classes are embedded expressions
    particular to Japanese, which have connecting expressions
    meaning "in the case of", "in the way that", "in that", and
    so on.
An example of the structural transformation is shown in Fig. 9.
The relative clause "why ..." is generated after the structural
transformation.
Connection of two sentences in compound and complex sentences is
done according to Table 3. An example is given in Fig. 10.
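The two-step treatment of a sentential connective (Japanese connective to deep case at transfer, deep case to an English construction at generation) can be sketched for the PURPOSE case of Fig. 10. The rule that an infinitive is used when the subordinate clause has no subject of its own is an assumption drawn from the two panels of that figure; the function and table names are hypothetical.

```python
# Transfer step: Japanese connective -> deep case (both map to PURPOSE in Fig. 10).
CONNECTIVE_DEEP_CASE = {"TAMENI": "PURPOSE", "YOUNI": "PURPOSE"}

def generate_purpose(main_clause: str, verb: str, subject: str = None) -> str:
    """Generation step for PURPOSE (SO-THAT-MAY)."""
    if subject is None:
        # No separate subject: infinitive, as in panel (a) of Fig. 10.
        return f"{main_clause} in order to {verb}"
    # Separate subject X: subordinate clause with MAY, as in panel (b).
    return f"{main_clause} so that {subject} may {verb}"

def connect(connective: str, main_clause: str, verb: str, subject: str = None) -> str:
    deep_case = CONNECTIVE_DEEP_CASE[connective]
    if deep_case == "PURPOSE":
        return generate_purpose(main_clause, verb, subject)
    raise NotImplementedError(deep_case)
```

Keeping the deep case as an intermediate label means several Japanese connectives can share one generation rule.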
4.3 The Process of Sentence Generation in English 
After the transfer from the Japanese deep dependency structure
to the English one, the structure is converted to a phrase
structure tree with all the surface words attached to the tree.
The processes explained in 4.1 and 4.2 are involved at this
generation stage. The conversion is performed top-down from the
root node of the dependency tree to the leaves. Therefore, when
a governing verb demands a noun phrase expression or a
to-infinitive expression of its dependent phrase, the structural
change of the phrase must be performed. Noun to
verb transformation, and noun to adjective 
[Figure 8: (transfer) a Japanese factitive structure with the
dependents A, B (SUB) and C is transferred to "make" with A, B
and C, where C is an intransitive verb (consultation to lexical
item C); (post-transfer) the structure is rewritten to C' with A
and B, where C' is the transitive verb derived from C.
Example: A make B rotate -> A rotate B.]
Fig. 8. An example of post-transfer rule application.
J-SURFACE-CASE  J-DEEP-CASE   E-DEEP-CASE        Default Preposition
に (ni)         RECipient     REC, BENeficiary   to (REC -> to, BEN -> for)
                ORIgin        ORI                from
                PARticipant   PAR                with
                TIMe          Time-AT            in
                ROLe          ROL                as
                GOAl          GOA                to
Table 2. Default rule for assigning a case label of English to a
Japanese postposition "に" (ni).
JAPANESE SENTENTIAL CONNECTIVE (in table order):
  RENYO, (-SHI)TE, RENYO, (-SHI)TE, -TAME, -NODE, -KARA, -TO,
  -TOKI, -TE, -TAME, -NONI, -YOU, -YOU, -KOTONAKU, -NAGARA, -BA
DEEP-CASE (in table order):
  TOOL, TOOL, CAUSE, TIME, PURPOSE, (ditto), MANNER, ACCOMPANY,
  CIRCUMSTANCE
ENGLISH SENTENTIAL CONNECTIVE (in table order):
  BY -ING .., BY -ING .., BECAUSE .., (ditto), (ditto), WHEN ..,
  SO-THAT-MAY, to, AS-IF, WITHOUT -ING .., WHILE -ING .., WHEN ..,
  ...
Table 3. Correspondence of sentential connectives.
[Figure 9: the Japanese phrase glossed word by word as
"he school resign reason" (N1 N2 V N3).
ANALYSIS: reason (N3) is modified by the clause headed by
resign (V), whose dependents are he (N1) and school (N2).
TRANSFER: N3 is related to the clause by the case CAUSE.
GENERATION: NP -> N3 RELCL, where RELCL consists of the relative
adverb "why" and S.]
Fig. 9. Structural transformation of an embedded
sentence of type 3.
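[Figure 10: (a) ANALYSIS: V1 is connected to V2 by TAMENI
(PURPOSE); TRANSFER: the connective becomes SO-THAT-MAY
(PURPOSE); GENERATION: S -> V1 INF, with INF -> TO V2
(IN-ORDER-TO).
(b) ANALYSIS: V1 is connected to V2, which has the dependent X,
by YOUNI (PURPOSE); TRANSFER: the connective becomes SO-THAT-MAY
(PURPOSE); GENERATION: S -> V1 SUB, with SUB -> CONJ S and the
embedded S containing X, AUX (MAY) and the verb.]
Fig. 10. Structural transformation of
an embedded sentence.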
transformation are often required due to the differ- 
ence of expressions in Japanese and English. This 
process goes down from the root node to all the 
leaf nodes. 
After this process of phrase structure generation, some
sentential transformations are performed, as follows.
( i ) When an agent is absent, passive transforma- 
tion is applied. 
( ii ) When the agent and object are both missing, 
the predicative verb is nominalized and 
placed as the subject, and such verb phrases 
as "is made", and "is performed" are supple- 
mented. 
(iii) When a subject phrase is a big tree, the 
anticipatory subject "it" is introduced. 
( iv ) Pronominalization of the same subject nouns 
is done in compound and complex sentences. 
( v ) Duplication of a head noun in the conjunctive
noun phrase is eliminated, for example, "uniform
component and non-uniform component" ->
"uniform and non-uniform components".
(vi) Others. 
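Transformation (v) can be illustrated on flat strings. This is a deliberately naive sketch: it assumes the head noun is the last word of each noun phrase and forms the plural by appending "s", neither of which reflects how the phrase structure tree is actually processed.

```python
def merge_conjoined(np1: str, np2: str) -> str:
    """Conjoin two noun phrases, sharing the head noun when it is repeated."""
    *modifiers1, head1 = np1.split()
    *modifiers2, head2 = np2.split()
    if head1 == head2 and modifiers1 and modifiers2:
        # Same head on both sides: keep one copy and pluralize it naively.
        left = " ".join(modifiers1)
        right = " ".join(modifiers2)
        return f"{left} and {right} {head1}s"
    # Different heads: plain coordination, nothing to eliminate.
    return f"{np1} and {np2}"
```

For the example in (v), `merge_conjoined("uniform component", "non-uniform component")` gives "uniform and non-uniform components".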
Another big structural transformation required comes from the
essential difference between a DO-language (English) and a
BE-language (Japanese). In English, case slots such as tool,
cause/reason, and some others very often come to the subject
position, while in Japanese such expressions are never used. A
transformation of this kind is incorporated in the generation
grammar, as shown in Fig. 11, and produces more English-like
expressions. This stylistic transformation part is still very
primitive. We have to accumulate much more linguistic knowledge
and lexical data to obtain more satisfactory English
expressions.
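[Figure 11: the Japanese sentence glossed word by word as
"earthquake building collapse". The structure "collapse" with
the dependents "building" and "earthquake" (= "The buildings
collapsed due to the earthquake.") is transformed into "destroy"
with the dependents "earthquake" [CPO: causal potency] and
"building" (= "The earthquake destroyed the buildings.").]
Fig. 11. An example of structural transformation
in the generation phase.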
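The promotion of a cause to subject position in Fig. 11 can be sketched as follows. The dictionary-shaped clause, the key names, and the table `CAUSATIVE_VERB` are assumptions for the example; only the collapse/destroy pair and the CPO (causal potency) marker come from the figure.

```python
# Intransitive verb -> transitive verb that takes the cause as its subject.
CAUSATIVE_VERB = {"collapse": "destroy"}

def promote_cause(clause: dict) -> dict:
    """Move a cause with causal potency (CPO) into the subject position."""
    cause = clause.get("cause")
    verb = CAUSATIVE_VERB.get(clause["verb"])
    if cause is not None and cause.get("cpo") and verb is not None:
        return {"verb": verb, "subject": cause["noun"], "object": clause["subject"]}
    # Otherwise keep the BE-language style structure as it is.
    return clause

sentence = {
    "verb": "collapse",
    "subject": "building",
    "cause": {"noun": "earthquake", "cpo": True},
}
```

Applied to `sentence`, the rule yields the structure for "The earthquake destroyed the buildings"; a cause without the CPO marker leaves "The buildings collapsed due to ..." untouched.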
5. SUMMARY 
This paper described a number of strategies we employed in the
transfer and generation stages of our Mu system to make the
system both powerful and fault-tolerant. As mentioned above,
the system has advantages such as the flexibility of the
generation process and the use of rich lexical information. The
system is being developed in collaboration with a number of
computer scientists from the computer industry and expert
translators. Some translation results are attached at the end
of the paper and show the present level of the translation
system. Progressive improvement is expected in the next two
years.
ACKNOWLEDGEMENTS 
We thank the members of the Mu-Project, especially
Mr. S. Takai (JCS), Mr. Y. Fukumochi (Sharp Co.),
Mr. T. Ishioka (JCS), Miss M. Kume (JCS), Mr. H. Sakamoto
(Oki Co.), Mr. A. Kosaka (NEC Co.), Mr. H. Adachi (Toshiba Co.),
Miss A. Okumura (Intergroup), and Miss A. Okuda (Intergroup),
who contributed greatly to the implementation of the system.
REFERENCES 
[1] M. Nagao: Machine Translation Project of the Japanese
    Government, a paper presented at the workshop between
    EUROTRA and Japanese machine translation experts, held in
    Brussels on November 24-25, 1983.
[2] J. Nakamura, et al.: Grammar Writing System (GRADE) of
    Mu-Machine Translation Project and its Characteristics,
    Proc. of COLING 84, 1984.
[3] J. Tsujii, et al.: Analysis Grammar of Japanese in the
    Mu-Project -- A Procedural Approach to Analysis Grammar --,
    ibid.
[4] Y. Sakamoto, et al.: Lexicon Features for Japanese Syntactic
    Analysis in Mu-Project-JE, ibid.
[5] J. Tsujii: The Transfer Phase in an English-Japanese
    Translation System, Proc. of COLING 82, 1982.
Sample outputs as of April 1984 are attached on
the next page.
