A Prototype English-Japanese Machine Translation System 
for Translating IBM Computer Manuals 
Taijiro Tsutsumi
Natural Language Processing 
Science Institute, IBM Japan, Ltd. 
5-19, Sanban-cho, Chiyoda-ku
Tokyo 102, Japan 
ABSTRACT 
This paper describes a prototype English-Japanese 
machine translation (MT) system developed at the Sci- 
ence Institute of IBM Japan, Ltd. This MT system cur- 
rently aims at the translation of IBM computer manuals. 
It is based on a transfer approach in which the transfer 
phase is divided into two sub-phases: English transfor-
mation and English-Japanese conversion. An outline of
the system and a detailed description of the 
English-Japanese transfer method are presented. 
I. Introduction 
The Science Institute of IBM Japan, Ltd. has been 
involved in English-Japanese machine translation for
four years (1). We have developed a prototype capable
of translating IBM computer manuals into Japanese. 
This system is based on a transfer approach in which the 
transfer process consists of English transformation and 
English-Japanese conversion. This MT system aims at 1)
high-quality translation; 2) an easily maintained 
transfer component; and 3) a smaller English-Japanese 
terminology dictionary. The transformation rules and 
the conversion rules are presently being constructed
through tests using the IBM manual "VM/SP General 
Information" (60P). 
We are focusing on translation of IBM computer manu-
als for three reasons: 1) high-quality translation is
expected in a limited area; 2) English IBM manuals are 
presumably written as clearly as possible according to 
an IBM internal standard; 3) we already had a practical 
English-Japanese terminology dictionary for human
translators. 
Most MT systems developed in Europe and the U.S. 
deal with language pairs in the Indo-European language 
group (2). In the case of English-Japanese translation, 
since both languages are categorized in different lan- 
guage groups, a more powerful linguistic mechanism must 
be implemented. For instance, word order and sentence 
style are different and moreover an English word some- 
times corresponds to more than one Japanese equivalent. 
To overcome these difficulties, an English-Japanese or 
Japanese-English MT system might be based on a transfer
or interlingua approach with a wide range of
tree-transducing capabilities and a semantic processing
mechanism. 
2. Overview of the System
Fig. 1 shows the overall translation process. First of
all, an English sentence is syntactically analysed in 
the English analysis phase. The output of this analy- 
sis is one or more English parse trees. Second, in the 
English-Japanese transfer phase, an English parse tree,
or an English intermediate representation, is trans- 
ferred to a corresponding Japanese tree, or a Japanese 
intermediate representation. During this transfer, an 
English parse tree is at first transformed by the 
transformation component to an English tree in 
Japanese-like style, and this result is converted to a 
Japanese tree by the English-Japanese conversion compo- 
nent. 

          An English Sentence
                  |
          [ English Analysis ]
                  |
  English Intermediate Representation
       <English Analysis Tree(s)>
                  |
   [ English-Japanese Transfer:
       English Transformation
       English-Japanese Conversion ]
                  |
  Japanese Intermediate Representation
          <A Japanese tree>
                  |
         [ Japanese Generation ]
                  |
         Japanese Sentence(s)

Fig. 1. Overall translation process

Finally, in the Japanese generation phase, one or more 
Japanese sentences are produced by operations such as 
generating Japanese auxiliary verbs, determining 
Japanese case particles, and rearranging word order. 
At present, the components shown in Fig. 1 are all 
implemented in LISP. 
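
The flow of Fig. 1 can be sketched as a chain of phases. The following is only an illustration in Python, not the LISP implementation described here; the function name `translate`, the `Tree` alias, and the idea of passing each phase in as a function are assumptions made for the sketch.

```python
from typing import Callable, List

Tree = dict  # hypothetical stand-in for an intermediate representation


def translate(sentence: str,
              analyze: Callable[[str], List[Tree]],
              transform: Callable[[Tree], Tree],
              convert: Callable[[Tree], List[Tree]],
              generate: Callable[[Tree], str]) -> List[str]:
    """Chain the phases of Fig. 1; each phase is supplied by the caller."""
    outputs: List[str] = []
    for english_tree in analyze(sentence):             # English analysis
        japanese_like = transform(english_tree)        # English transformation
        for japanese_tree in convert(japanese_like):   # E-J conversion
            outputs.append(generate(japanese_tree))    # Japanese generation
    return outputs
```

Here `convert` returns a list so that a rejected tree can be modelled as an empty result; that choice is part of the sketch, not a statement about the actual system.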
3. English Analysis 
For analysing English, we are making use of the English 
parser, the English analysis grammar, and the English 
analysis dictionary developed by G. Heidorn et al. at
the IBM T.J. Watson Research Center (3). The English
analysis is based on an augmented phrase structure 
grammar and is syntactically performed in a bottom-up 
and parallel manner. This English analysis aims at 
area-independent, high-performance and fail-soft anal- 
ysis. The area-independent feature means portability of 
this analysis component to other application areas. The 
fail-soft feature is important for a practical MT sys- 
tem which should provide some Japanese segments for a 
human translator even if the parser fails to analyze 
the input sentence as a complete sentence. 
As the syntactic analysis of English sometimes 
produces more than one parse tree, the English parser 
computes metric values which indicate plausibility of 
the parse trees based on the characteristics of the 
modifications between phrases (4). When more than one 
parse tree is obtained by analysis, semantically incor- 
rect parse trees are discarded during the 
English-Japanese transfer. If more than one Japanese
tree remains after the transfer, the metric values 
copied from these English parse trees to corresponding 
Japanese trees are used to rank these Japanese trees in 
terms of plausibility. The Japanese tree which has the 
least value, namely the most plausible one, is chosen
by the MT system. 
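
The final selection step can be sketched as follows. This is a minimal Python illustration that assumes each surviving Japanese tree simply carries the metric value copied from its English parse tree; the class `JapaneseTree` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JapaneseTree:
    label: str     # hypothetical identifier for the tree
    metric: float  # metric value copied from the source English parse tree


def choose_most_plausible(trees: List[JapaneseTree]) -> Optional[JapaneseTree]:
    """Return the tree with the least metric value, or None if none survived."""
    if not trees:
        return None
    return min(trees, key=lambda tree: tree.metric)
```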
4. English-Japanese Transfer
Generally, the transfer process of a transfer approach
including semantic processing tends to become compli-
cated and therefore difficult to maintain. However, a
transfer approach seems to be the most straightforward
way of implementing human translators' knowledge, which
includes various types of linguistic information such 
as specific words, syntactic structures, and semantic 
information. 
There are many expressions peculiar to English, such as
'it-that', 'too-to', and 'there-be'. Their styles are
very different from Japanese ones and have no simple
corresponding expressions in Japanese. The English-Japanese
transfer component of our system is divided into two
separate components: an English transformation compo-
nent and an English-Japanese conversion component. We
call our approach a two-pass transfer method. By using
English transformation rules, the English transforma-
tion component rewrites an English parse tree and
produces a new style of English tree which is close to
Japanese syntax. This can then easily be converted to a
corresponding Japanese tree. When we expect different
English expressions to be translated to the same
Japanese expression, we only have to write English
transformation rules instead of the E-J transfer rules of
a conventional transfer approach. Moreover, when we want
the MT system to change a Japanese expression, we are
required only to modify some E-J conversion rules
instead of modifying a larger number of related E-J
transfer rules. Consequently, the two-pass transfer
method provides modularity and maintainability
of the transfer component.
4.1 English Transformation
English transformation is performed by using English
transformation rules and a transformation dictionary.
The transformation sometimes requires a derivative form 
of an English word, such as a verbal form of a noun and 
an adverbial form of an adjective. The transformation 
dictionary contains this sort of derivational data.
The transformation rules are categorized into groups 
according to syntactic categories of nodes of parse 
trees. Each group is also classified into several 
sub-groups. For example, the rule group for a sentence 
consists of 22 sub-groups, such as an inversion-rule
sub-group, an insertion-rule sub-group, and an 
ellipsis-rule sub-group. The following are examples of 
applications of the rules to sentences. 
It is required that you specify the assignment. 
-> That you specify the assignment is required. 
There are several records in the file. 
-> Several records exist in the file. 
System operation is so impaired that the IPL 
procedure has to be repeated. 
-> Because system operation is very impaired, 
the IPL procedure has to be repeated. 
The routine has a relatively low usage rate. 
-> Usage rate of the routine is relatively low.
The following are examples of applications of the rules 
to noun phrases. 
execution of the program 
-> executing the program 
a disk available with ... 
-> a disk which is available with ... 
The transformation is performed in a top-down manner
along an English parse tree. At each node of a tree, a
corresponding rule group is retrieved according to the
syntactic type of the node and this rule group is
applied to the sub-tree only once. In this application
of the rule group, each sub-group is sequentially
applied to the sub-tree only once. If a matching pat-
tern of a transformation rule matches the sub-tree and
a target pattern produces a new tree, the rest of the
rules in the sub-group are no longer used and process-
ing of the next sub-group begins. We have designed the
rule groups and their sub-groups to avoid backtracking
and repetitive application of the same rule.
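
The control regime described above can be sketched in a few lines. This is an illustrative Python sketch, not the system's LISP code. It assumes a rule is a function that returns a rewritten sub-tree or `None` when its pattern does not match. The `Node` class, the `there_be` rule, and the category labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    category: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None


Rule = Callable[[Node], Optional[Node]]   # new sub-tree, or None if no match
RuleGroups = Dict[str, List[List[Rule]]]  # category -> sub-groups -> rules


def transform(node: Node, groups: RuleGroups) -> Node:
    """Top-down pass: the node's rule group is retrieved and applied once."""
    for sub_group in groups.get(node.category, []):
        for rule in sub_group:
            result = rule(node)
            if result is not None:
                node = result
                break  # the rest of this sub-group is skipped
    children = [transform(child, groups) for child in node.children]
    return Node(node.category, children, node.word)


def leaves(node: Node) -> List[Optional[str]]:
    """Collect the words at the leaves, left to right."""
    if not node.children:
        return [node.word]
    return [word for child in node.children for word in leaves(child)]


def there_be(node: Node) -> Optional[Node]:
    """Hypothetical rule: 'There are X ...' -> 'X exist ...'."""
    words = [child.word for child in node.children]
    if len(node.children) >= 3 and words[:2] == ["there", "are"]:
        subject, rest = node.children[2], node.children[3:]
        return Node("S", [subject, Node("V", word="exist")] + rest)
    return None
```

Because each sub-group is visited once and stops at its first successful rule, no backtracking or repeated application arises in this sketch.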
4.2 English-Japanese Conversion
A transformed English tree is converted to a corre-
sponding Japanese tree by using conversion rules and a
conversion dictionary. The functions of this process
are 1) determining appropriate Japanese syntax, equiv-
alents, and additional linguistic data such as tense,
aspect, modality, and voice; and 2) disambiguating mod-
ifications of English phrases.
4.2.1. Semantic Markers
One of the basic approaches to semantic processing in
MT is to use semantic markers of nouns. We have defined
24 semantic markers specific to computer manuals, which
will be effective in translating IBM computer manuals.
Table 1 lists all of the semantic markers and their
meanings.

Marker  Meaning             Marker  Meaning
LC      Logical Container   WK      Work/Action
LE      Logical Entry       PS      Predicate
LP      Logical Path        AP      Attribute of PS
DM      Document            SL      Supply
ST      State               PT      Part
TH      Technique/Theory    DT      Term of documents
FA      Feature/Ability     ML      Material
IF      Information         TM      Time
AT      Attribute           PL      Place
VA      Value of AT         PN      Person's Name
HM      Human               PO      Point
UD      Unit/Device         OG      Organization

Table 1. Semantic markers
Nouns in computer manuals have one or more semantic 
markers. For example, "file" has "LC" and "LE", "pro-
gram" has "LE", and "operator" has "LE" and "HM". This
set of markers is so simple that maintenance is easy.
4.2.2. E-J Conversion Dictionary
In the English-Japanese conversion dictionary, condi- 
tions for conversion are described by a combination of 
English syntax, semantic markers and sometimes specific 
Japanese words. The conversion dictionary is divided 
into sub-dictionaries, such as a verb-dictionary, a
noun-dictionary, and a prepositional-dictionary. Fig.
2 shows an example of an entry of the verb-dictionary in
the case of "provide".
( "provide"
  ((SB (S ((LE UD) Y1 "ga")))
   (DO (S ((FA AT) Y1 "wo")))
   (P "sonae" PY1 (V SHIMO1 NIL JYOOTAI TRANS)))
  ((SB (S ((DM ~ UD) Y1 "ga")))
   (DO (S ((HM) Y1 "ni")))
   ("with" (S ((IF FA AT) Y1 "wo")))
   (P "teikyo" PY1 (V SAHEN NIL KEIZOKU TRANS))) )
Fig. 2. An example of an English-Japanese
conversion dictionary entry

one type of ... -> one-type-of ...
The upper half of the description in Fig. 2 specifies 
that if the subject of a sentence has semantic marker 
"LE" or "UD" and the first object has marker "FA" or 
"AT", then choose the Japanese case particle "ga" for 
the first Japanese noun phrase, the Japanese case par-
ticle "wo" for the second one, and the Japanese verb
"sonae" as the proper equivalent for the English verb
"provide". "Y1" and "PY1" in Fig. 2 specify types of
corresponding Japanese sub-trees to be generated. The 
lower half of the description gives a similar rule to 
the previous one except for an additional condition on 
a prepositional phrase. This part specifies that if 
the conditions are met, then use Japanese case parti-
cles "ga", "ni", and "wo" in this order and select
"teikyo" as the appropriate Japanese verb. 
The verb-dictionary is used to convert an English
surface case structure directly into a Japanese one,
depending on the semantic markers. This conversion
should be more efficient than an approach in which deep
cases are introduced to pursue similar semantic proc-
essing. This conversion determines an appropriate
Japanese verb, Japanese case particles, and Japanese 
syntax of a simple sentence at the same time. In some 
cases, an appropriate Japanese equivalent for an Eng- 
lish noun phrase is successfully selected based on 
these conditions when the English noun phrase has more 
than one Japanese equivalent. Moreover, application of 
these entries also means a semantic check of the input 
from the computer area's point of view. Consequently, 
if there is no entry applicable to the input simple sen- 
tence, it is deemed inappropriate for computer manuals 
and is rejected by the system. This contributes to 
disambiguation of English analysis trees. 
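
The way the "provide" entry of Fig. 2 selects a verb and case particles, and rejects input when no frame applies, can be sketched as below. This is an illustrative Python sketch under several assumptions: the marker assignments for "function", "manual", and "information" are invented for the example, the object marker of the second frame is read as "HM", the unreadable marker in the second frame's subject is omitted, and the names `NOUN_MARKERS`, `PROVIDE_FRAMES`, and `convert_clause` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Semantic markers per noun; "program" and "operator" follow the text,
# the other three assignments are assumptions for illustration only.
NOUN_MARKERS: Dict[str, Set[str]] = {
    "program": {"LE"},
    "operator": {"LE", "HM"},
    "function": {"FA"},
    "manual": {"DM"},
    "information": {"IF"},
}

# Each frame: (slot name, accepted markers, Japanese case particle) plus a verb.
PROVIDE_FRAMES = [
    {"slots": [("SB", {"LE", "UD"}, "ga"), ("DO", {"FA", "AT"}, "wo")],
     "verb": "sonae"},
    {"slots": [("SB", {"DM", "UD"}, "ga"), ("DO", {"HM"}, "ni"),
               ("with", {"IF", "FA", "AT"}, "wo")],
     "verb": "teikyo"},
]

Result = Tuple[List[Tuple[str, str]], str]


def convert_clause(frames: list, fillers: Dict[str, str]) -> Optional[Result]:
    """Return (noun, particle) pairs and the verb of the first matching frame.

    `fillers` maps a slot name to the English head noun filling it.
    None means no frame applies, i.e. the reading is rejected.
    """
    for frame in frames:
        slots = frame["slots"]
        if {name for name, _, _ in slots} != set(fillers):
            continue  # the clause has different slots than this frame
        if all(NOUN_MARKERS.get(fillers[name], set()) & markers
               for name, markers, _ in slots):
            pairs = [(fillers[name], particle) for name, _, particle in slots]
            return pairs, frame["verb"]
    return None
```

For example, `convert_clause(PROVIDE_FRAMES, {"SB": "program", "DO": "function"})` selects the first frame, while a clause whose object carries none of the accepted markers yields `None` and would be discarded.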
Additional linguistic data of an English simple sen- 
tence concerning tense, aspect, modality, and voice, 
are also converted to corresponding data of a Japanese 
tree by using a contrast conversion table and the con- 
version dictionary. For example, voice and aspect of 
an English sentence are changed in a Japanese sentence 
according to the characteristic of the verb. 
4.2.3. E-J Translation of Simple Noun Phrases 
One of the issues in MT is how to create and maintain a 
large terminology dictionary. Generally, a technical 
document includes a number of technical noun groups. 
We call a noun phrase which basically has no post modi- 
fier a simple noun phrase (SNP), such as "a procedure 
library", "system-to-operator communication", "IBM 
supplied licensed and nonlicensed programs" and "page 
34". 
Our MT system includes a component for translating
SNPs. Even if the terminology dictionary does not
have an entry for the whole phrase, a long SNP composed of
many words is successfully translated by appropriately 
assembling the dictionary data of all elements of the 
SNP. This is mainly due to the similarity of syntax of 
SNPs between English and Japanese. 
The functions of the SNP translation component are 
to choose appropriate Japanese equivalents for various 
parts of speech (e.g., noun, adjective, adverb); to
insert "no" between noun phrases; to reorder Japanese 
equivalents; to process conjunctions within a simple 
noun phrase; and to handle hyphenated words. These are 
achieved by using a special dictionary for translating 
SNPs and co-occurrence frequency data of words or 
semantic markers in IBM computer manuals. 
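
Assembling an SNP translation from its elements might look like the sketch below. It is a deliberately small Python illustration with many assumptions: the element dictionary, the romanized equivalents, and the `COMPOUNDS` set (a crude stand-in for the co-occurrence data) are all hypothetical, and only article dropping and "no" insertion are modelled.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical element dictionary: English word -> (part of speech, equivalent).
ELEMENTS: Dict[str, Tuple[str, str]] = {
    "system": ("noun", "shisutemu"),
    "file": ("noun", "fairu"),
    "program": ("noun", "puroguramu"),
    "library": ("noun", "raiburari"),
}

# Hypothetical stand-in for co-occurrence data: noun pairs kept as compounds.
COMPOUNDS: Set[Tuple[str, str]] = {("program", "library")}

ARTICLES = {"a", "an", "the"}


def translate_snp(words: List[str]) -> str:
    """Translate word by word, inserting "no" between non-compound nouns."""
    output: List[str] = []
    previous: Optional[str] = None
    for word in words:
        if word in ARTICLES:
            continue  # articles have no Japanese equivalent here
        part_of_speech, equivalent = ELEMENTS[word]
        if (previous is not None
                and part_of_speech == "noun"
                and ELEMENTS[previous][0] == "noun"
                and (previous, word) not in COMPOUNDS):
            output.append("no")
        output.append(equivalent)
        previous = word
    return " ".join(output)
```

Words missing from `ELEMENTS` raise a `KeyError` in this sketch; reordering, conjunctions, and hyphenated words are left out.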
4.3 E-J Conversion Process 
The English-Japanese conversion component subsequently 
converts a transformed English tree to a Japanese tree 
in a bottom-up and parallel manner along the English 
tree. 
First of all, the English-Japanese conversion dic- 
tionary is searched for all English words which are 
terminal symbols of the English parse tree. This is 
part of English-Japanese conversion of the lowest level
sub-trees of the English tree. An upper level English 
sub-tree is converted to a corresponding Japanese 
sub-tree by using the English-Japanese conversion rules 
and by using the English-Japanese conversion results of 
the current level English sub-trees. The category of 
the top node of the upper sub-tree determines which set 
of English-Japanese conversion rules is to be applied.
During the conversion of sub-trees, semantic processing 
is performed according to the data in the 
English-Japanese conversion dictionary as mentioned 
earlier. 
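
The bottom-up order of conversion can be sketched as a post-order traversal. This Python illustration makes several assumptions: rules are indexed by the category of the top node and receive the already converted sub-trees, the lexicon and the verb-final "VP" rule are hypothetical, and the parallel handling of alternative candidates is not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    category: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None


# A rule receives the English node and its converted children.
ConversionRule = Callable[[Node, List[Node]], Node]


def convert(node: Node,
            lexicon: Dict[str, str],
            rules: Dict[str, ConversionRule]) -> Node:
    """Convert the lowest-level sub-trees first, then each upper level."""
    if not node.children:
        # Terminal symbol: look the English word up in the dictionary.
        return Node(node.category, word=lexicon[node.word])
    converted = [convert(child, lexicon, rules) for child in node.children]
    # The category of the top node selects the conversion rule.
    return rules[node.category](node, converted)


# Hypothetical data: a tiny lexicon and a rule that places the verb last.
LEXICON = {"read": "yomu", "file": "fairu"}
RULES: Dict[str, ConversionRule] = {
    "VP": lambda node, kids: Node("VP", list(reversed(kids))),
}
```

Given `Node("VP", [Node("V", word="read"), Node("NP", word="file")])`, the leaves are converted first and the "VP" rule then reorders them so that the verb comes last.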
5. Japanese Generation 
The Japanese generation component produces one or more 
Japanese sentences from a Japanese tree which conveys 
Japanese syntax, Japanese equivalents, and other infor- 
mation. 
The functions of this component are to generate 
Japanese auxiliary verbs; to determine appropriate 
Japanese equivalents of adverbs, negation, determiners 
and conjunctions including subordinate conjunctions; to 
position Japanese adverbial phrases in a Japanese sen- 
tence; to modify Japanese case particles; to reorder 
Japanese noun phrases; to insert punctuation; and to
erase a duplicate Japanese subject. Japanese auxiliary 
verbs are generated based on Japanese verb information, 
such as the original form of the verb, the conjugation 
type of the verb, tense, aspect, voice, and modality.
6. Conclusion 
We have described a prototype English-Japanese machine
translation system based on a two-pass transfer 
approach. Introduction of separate English transforma- 
tion in the E-J transfer makes the transfer component 
easy to maintain. 
We have proposed a set of semantic markers specific 
to computer manuals and the English-Japanese conversion
dictionary so as to perform high-quality translation.
The mechanism of selecting appropriate Japanese equiv- 
alents and syntax is simple and effective. We will con- 
tinue to enhance our MT system to translate many kinds 
of IBM computer manuals into high-quality Japanese. 
References 

1. Tsutsumi, T. "On the Machine Translation from Eng-
lish to Japanese" in Tokyo Scientific Center Report
N:G318-1571 (1982).

2. Slocum, J. "A Survey of Machine Translation: its
History, Current Status, and Future Prospects" in
AJCL 11-1 (1985).

3. Heidorn, G.E., K. Jensen, L.A. Miller, R.J. Byrd,
and M.S. Chodorow. "The EPISTLE Text-Critiquing
System" in IBM Sys. J. 21.3 (1982), 305-326.

4. Heidorn, G.E. "Experience with an Easily Computed
Metric for Ranking Alternative Parses" in Proc.
20th Annual Meeting of the ACL. Toronto, Canada
(1982), 82-84.
