When Mariko talks to Siegfried
Experiences from a Japanese/German
Machine Translation Project
Dietmar Rösner
Projekt SEMSYN, Institut für Informatik
Universität Stuttgart, Herdweg 51
D-7000 Stuttgart 1
West Germany
Abstract 
In this paper we will report on our experiences from a 2 1/2 
year project that designed and implemented a prototypical 
Japanese to German translation system for titles of Japanese 
papers. 
Background 
An American study published in Nature, 308 (1984), evaluated
ca. 9000 Japanese scientific papers. 75 percent of them are
published exclusively in Japanese, and only a fifth of
Japanese papers are currently evaluated by Western
refereeing and information services. The main conclusion of
the study was that the general opinion that everything
important from Japan is published in English is not true, at
least for the applied sciences. This finding, together with
the Japanese success in many fields of modern technology,
has created a wider interest in access to Japanese material
and in help with overcoming the language barrier.
    Die Informationstechnologie und ihr Einfluß auf die Ausbildung in den USA.
    Die Graphgrammatik als Generierungs-Werkzeug beim Verstehen von Bildern.
    Ein Terminal mit hochwertigen Graphik-Funktionen, das mit einem mehrfachen
    Prozessor realisiert wird.
    Faktoren zur Beeinflussung von Wartungen und Verbesserungen von Programmen.
    Die Struktur des Dialogs zwischen System-Ingenieur und Software-Ingenieur.
    Die Entwicklung von Werkzeugen zur Einschätzung der Verarbeitungsleistung -
    Der Standpunkt des Managers.
    Ein Entwurf, der auf eine Sprache zur Spezifikation von Computerhardware
    abgestimmt wird. Die Fallstudie bei der Simulation von Mikro-Prozessoren von
    Bit-Slice-Typ auf der Ebene der Registerübertragung.
    **MORE**
Ill.: From Japanese to German via ATLAS/II and SEMSYN
1. SEMSYN - a Japanese/German translation system 
The project SEMSYN-83 - SEMSYN is an acronym for SEMantic
SYNthesis - has produced a system for the generation
of German from semantic representations. The combination
of this generator with the ATLAS/II system of the Japanese
cooperation partner FUJITSU may be seen as the first
Japanese to German translation system.
    Die Beeinflussung der Zuverlässigkeit und der Qualität von Software mit einem
    System zur Unterstützung der Entwicklung mit einem Computer.
The analysis of the Japanese input - currently mostly titles
of scientific papers from the field of information technology
- and its transformation into the semantic representation is
the task of ATLAS/II. SEMSYN's part is to produce a correct
and understandable German text for these semantic
representations.
2. The overall design of the SEMSYN-System
SEMSYN's generation from FUJITSU's nets to German surface
structures is done in three main steps.
The first step is to transform the semantic net delivered by
FUJITSU into an expression of our own frame representation
language - the so-called IKBS-descriptions. IKBS stands for
Instantiated Knowledge Base Schemata. This transformation
not only leads to a more structured representation, it also
helps to keep the generation module somewhat independent
of the special form of the FUJITSU interface.
    '((DEVELOP --INST-> COMPUTER)
      (SUPPORT --OBJ-> DEVELOP)
      (SUPPORT --INST-> SYSTEM)
      (GIVE --INST-> SYSTEM)
      (QUALITY --POSSESSOR-> SOFTWARE)
      (RELIABILITY --ENUM-> QUALITY)
      (GIVE --GOAL-> RELIABILITY)
      (GIVE --OBJ-> AFFECT)
      (*NIL --ST-> AFFECT))
Ill.: SEMSYN's interface with ATLAS/II (TIT-01)
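The net-to-frame step can be pictured with a small Python sketch. It is not SEMSYN's actual transformation: it only unfolds the arcs of the example net into nested (name, slots) pairs, starting from a root node chosen by hand. The names NET, index_arcs and to_frame are illustrative assumptions, the (*NIL --ST-> AFFECT) arc is left out, and the mapping of ENUM arcs to ENUMERATION frames or of shared nodes to THE ... FROM ... descriptions is not attempted.

```python
from collections import defaultdict

# Arcs of the example net as (source, relation, target) triples.
# The (*NIL --ST-> AFFECT) arc is deliberately omitted in this sketch.
NET = [
    ("DEVELOP", "INST", "COMPUTER"),
    ("SUPPORT", "OBJ", "DEVELOP"),
    ("SUPPORT", "INST", "SYSTEM"),
    ("GIVE", "INST", "SYSTEM"),
    ("QUALITY", "POSSESSOR", "SOFTWARE"),
    ("RELIABILITY", "ENUM", "QUALITY"),
    ("GIVE", "GOAL", "RELIABILITY"),
    ("GIVE", "OBJ", "AFFECT"),
]


def index_arcs(net):
    """Group the outgoing arcs of the net by their source node."""
    out = defaultdict(list)
    for source, relation, target in net:
        out[source].append((relation, target))
    return out


def to_frame(node, arcs, seen=frozenset()):
    """Unfold the net below `node` into a nested (name, slots) pair.

    Assumption: a node has at most one outgoing arc per relation;
    a second arc with the same relation would overwrite the first.
    """
    if node in seen:  # guard against cycles in the net
        return (node, {})
    seen = seen | {node}
    slots = {rel: to_frame(tgt, arcs, seen) for rel, tgt in arcs.get(node, [])}
    return (node, slots)


if __name__ == "__main__":
    print(to_frame("GIVE", index_arcs(NET)))
```

Keeping the result in a representation of its own, rather than working on the arcs directly, is what makes the later steps independent of the arc format.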
The second - and probably most important - step is to
decide in which way the content of the semantic representation
should be uttered as German text. The output of this
step is a functional description of the intended utterance in
grammatical terms (IRS = Instantiated Realization Schema).
The IRS description completely determines the German output.
Its terminal elements are root forms of German words
and their syntactic features.
(:NG (:HEAD "Beeinflussung")
     (:FEATURES (:DET DEF) (:NUM SG))
     (:POSSATTR
      (:NG (:HEAD (:NG-CONJUNCT
                   (:NGS (:NG (:HEAD "Zuverlässigkeit")
                              (:FEATURES (:NUM SG) (:DET DEF)))
                         (:NG (:HEAD "Qualität")
                              (:FEATURES (:NUM SG) (:DET DEF))))
                   (:CONJ "und")))
           (:FEATURES (:DET DEF) (:NUM PL))
           (:POSSATTR
            (:PG (:PREP "von")
                 (:POBJ (:NG (:HEAD "Software")
                             (:FEATURES (:NUM SG) (:DET ZERO))))))))
     (:QUALIFIERS
      (:PG (:PREP "mit")
           (:POBJ
            (:NG (:HEAD "System")
                 (:FEATURES (:DET INDEF) (:CAS DAT))
                 (:POSSATTR
                  (:PG (:PREP "zu")
                       (:POBJ
                        (:NG (:HEAD "Unterstützung")
                             (:FEATURES (:NUM SG) (:DET DEF))
                             (:POSSATTR
                              (:NG (:HEAD "Entwicklung")
                                   (:FEATURES (:DET DEF) (:NUM SG))
                                   (:QUALIFIERS
                                    (:PG (:PREP "mit")
                                         (:POBJ (:NG (:HEAD "Computer")
                                                     (:FEATURES (:DET INDEF)
                                                                (:CAS DAT)))))))))))))))))
Ill.: IRS-Description for TIT-01
The third step - the generator front end SUTRA-S - takes
the IRS description and produces a corresponding syntactically
and morphologically correct German surface structure
(Emele & Momma, 1985). SUTRA-S is an extended
reimplementation of the program SUTRA that was
developed by Busemann in the HAM-ANS project
(Busemann, 1982).
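To make the role of such a front end concrete, here is a toy linearizer in Python. It is not SUTRA-S and makes strong simplifying assumptions: the IRS is encoded as nested dictionaries (an encoding invented for this sketch), the gender table GENDER and the case table PREP_CASE are hand-written for the example words only, all nouns are singular, and noun endings are never inflected. It only shows how determiner and case features in an IRS-like structure can drive article selection.

```python
# Gender of the example nouns (hand-written for this sketch).
GENDER = {"Beeinflussung": "f", "Zuverlässigkeit": "f", "Qualität": "f",
          "Software": "f", "System": "n", "Unterstützung": "f",
          "Entwicklung": "f", "Computer": "m"}

# Singular articles by determiner type, case and gender.
ARTICLES = {
    "DEF": {"NOM": {"m": "der", "f": "die", "n": "das"},
            "GEN": {"m": "des", "f": "der", "n": "des"},
            "DAT": {"m": "dem", "f": "der", "n": "dem"}},
    "INDEF": {"NOM": {"m": "ein", "f": "eine", "n": "ein"},
              "GEN": {"m": "eines", "f": "einer", "n": "eines"},
              "DAT": {"m": "einem", "f": "einer", "n": "einem"}},
}
PREP_CASE = {"mit": "DAT", "von": "DAT", "zu": "DAT"}  # case governed by each preposition
CONTRACTIONS = {("zu", "der"): "zur", ("zu", "dem"): "zum", ("von", "dem"): "vom"}


def realize_ng(ng, case="NOM"):
    """Linearize a noun group; `case` is inherited unless the group sets its own."""
    case = ng.get("cas", case)
    if "conjuncts" in ng:  # coordinated noun groups share the case
        words = f" {ng['conj']} ".join(realize_ng(c, case) for c in ng["conjuncts"])
    else:
        det = ng.get("det", "ZERO")
        article = "" if det == "ZERO" else ARTICLES[det][case][GENDER[ng["head"]]]
        words = f"{article} {ng['head']}".strip()
    parts = [words]
    attr = ng.get("possattr")
    if attr:  # prepositional group, or noun group realized in the genitive
        parts.append(realize_pg(attr) if "prep" in attr else realize_ng(attr, "GEN"))
    parts.extend(realize_pg(pg) for pg in ng.get("qualifiers", []))
    return " ".join(parts)


def realize_pg(pg):
    """Linearize a prepositional group, fusing preposition and article if possible."""
    words = realize_ng(pg["pobj"], PREP_CASE[pg["prep"]]).split(" ")
    fused = CONTRACTIONS.get((pg["prep"], words[0]))
    return " ".join([fused] + words[1:]) if fused else " ".join([pg["prep"]] + words)


def realize_title(irs):
    text = realize_ng(irs)
    return text[0].upper() + text[1:]


TIT_01 = {
    "head": "Beeinflussung", "det": "DEF",
    "possattr": {
        "conj": "und",
        "conjuncts": [{"head": "Zuverlässigkeit", "det": "DEF"},
                      {"head": "Qualität", "det": "DEF"}],
        "possattr": {"prep": "von", "pobj": {"head": "Software", "det": "ZERO"}},
    },
    "qualifiers": [{"prep": "mit", "pobj": {
        "head": "System", "det": "INDEF", "cas": "DAT",
        "possattr": {"prep": "zu", "pobj": {
            "head": "Unterstützung", "det": "DEF",
            "possattr": {"head": "Entwicklung", "det": "DEF",
                         "qualifiers": [{"prep": "mit", "pobj": {
                             "head": "Computer", "det": "INDEF", "cas": "DAT"}}]}}}}}],
}

if __name__ == "__main__":
    print(realize_title(TIT_01))
```

A real front end additionally has to inflect nouns, adjectives and verbs and to handle plural and accusative forms, none of which this sketch covers.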
3. Generation from frame descriptions 
3.1 The frame description language
The formal definition of SEMSYN's frame representation is as 
follows: 
<IKBS-DESCR>    ::= (A <FRAME-NAME>)
                  | (A <FRAME-NAME>
                       WITH <SLOTS&FILLERS>)
                  | (THE <SLOT-NAME> FROM <IKBS-DESCR>)
<SLOTS&FILLERS> ::= ((<SLOT-NAME> = <IKBS-DESCR>)
                     ... )
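The three forms of a frame description map naturally onto a small recursive data type. The following Python sketch is one possible encoding, not the project's Lisp implementation; the class names Frame and SlotOf and the printer show are assumptions made for illustration. Slot fillers are kept as lists because the example descriptions in this paper give some slots more than one filler.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Frame:
    """(A <FRAME-NAME>) or (A <FRAME-NAME> WITH <SLOTS&FILLERS>)."""
    name: str
    slots: dict = field(default_factory=dict)  # slot name -> list of descriptions


@dataclass
class SlotOf:
    """(THE <SLOT-NAME> FROM <IKBS-DESCR>)."""
    slot: str
    source: "Descr"


Descr = Union[Frame, SlotOf]


def show(d):
    """Print a description in the paper's parenthesized notation."""
    if isinstance(d, SlotOf):
        return f"(THE :{d.slot} FROM {show(d.source)})"
    article = "AN" if d.name[0] in "AEIOU" else "A"
    if not d.slots:
        return f"({article} {d.name})"
    slots = " ".join(f"(:{s} = {' '.join(show(f) for f in fillers)})"
                     for s, fillers in d.slots.items())
    return f"({article} {d.name} WITH {slots})"


develop = Frame("DEVELOP", {"INSTRUMENT": [Frame("COMPUTER")]})
support = Frame("SUPPORT", {"OBJECT": [develop], "INSTRUMENT": [Frame("SYSTEM")]})
descr = SlotOf("INSTRUMENT", support)

if __name__ == "__main__":
    print(show(descr))
```

The SlotOf form is what lets a description refer to a role filler ("the instrument of this support action") instead of to the action itself.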
Conceptually we distinguish the following three main classes 
of frames: 
1. Case schemata for verb concepts or actions (among these 
are all those frames that have case roles as slots). 
2. Concept schemata for noun concepts or "picture 
producers". 
3. Relation schemata - ENUMERATION, PURPOSE-, SCOPE-Relation
etc.
(THE :OBJECT
     FROM (A GIVE
             WITH (:GOAL =
                   (AN ENUMERATION WITH
                       (:ARGS = (A RELIABILITY) (A QUALITY))
                       (:POSSESSOR = (A SOFTWARE))))
                  (:INSTRUMENT =
                   (THE :INSTRUMENT
                        FROM
                        (A SUPPORT
                           WITH
                           (:OBJECT = (A DEVELOP
                                         WITH
                                         (:INSTRUMENT = (A COMPUTER))))
                           (:INSTRUMENT = (A SYSTEM)))))
                  (:OBJECT = (AN AFFECT))))
Ill.: Frame-Description for TIT-01
Within this scope the repertoire of the semantic
representation includes:
- "classical" case roles à la Fillmore (agent,
object, method, instrument, source, goal ...)
- roles for the further specification of actions
(manner, place, time ...)
- roles for the further specification of concepts
(name, concern, specialize ...)
- ways to quantify and attribute concepts
- modality (e.g. not, possible ...)
- conjunctive and disjunctive ENUMERATION.
3.2 Knowledge bases during generation 
SEMSYN's main generation phase may be viewed as communication
between two knowledge bases: general
knowledge about the principal possibilities for realizing the
semantic structures - the so-called realization schemata
- and specific knowledge, mainly about the diverse possibilities
for lexicalization of semantic symbols. The latter is stored
within the semantic to German dictionary SLEX (Rösner,
1986).
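The lexical side of this division of labour can be illustrated with a dictionary lookup in Python. The entry layout below (one lexicalization per syntactic category plus optional exception features) is an assumption made for this sketch; the paper does not describe the internal format of SLEX. The German words are taken from the examples in this paper, except the infinitives, which are added here.

```python
# Hypothetical SLEX-like entries: semantic symbol -> lexicalizations by category.
SLEX = {
    "SUPPORT": {"noun": "Unterstützung", "verb": "unterstützen"},
    "DEVELOP": {"noun": "Entwicklung", "verb": "entwickeln"},
    "SOFTWARE": {"noun": "Software", "features": {"det": "ZERO"}},
    "COMPUTER": {"noun": "Computer"},
}


def lexicalize(symbol, category="noun"):
    """Return (German root form, exception features) for a semantic symbol."""
    entry = SLEX.get(symbol)
    if entry is None or category not in entry:
        raise LookupError(f"no {category} lexicalization for {symbol}")
    return entry[category], dict(entry.get("features", {}))


if __name__ == "__main__":
    print(lexicalize("SUPPORT"))
    print(lexicalize("SUPPORT", "verb"))
    print(lexicalize("SOFTWARE"))
```

The realization schemata decide which category is needed (noun for a noun group, verb for a clause); the dictionary only answers how a given symbol is worded in that category.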
3.3 Object-oriented implementation 
The general knowledge about possible realizations has been
implemented using the FLAVOR system of the LISP machine.
The classes of the frame representation correspond to flavor
classes. Realization schemata and the knowledge about the
realization of roles are defined as flavor methods. This
object-oriented architecture has proven to be very flexible. It
supported experimenting with the system and its step-by-step
improvement.
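The same architecture can be mimicked with ordinary Python classes. This is only an analogy to the flavor-based implementation: the class names, the method naming scheme realize_<role>, the dictionary encoding of the IRS and the small NOUNS table are all assumptions of this sketch.

```python
# Hand-written noun lexicalizations for the example symbols.
NOUNS = {"SUPPORT": "Unterstützung", "DEVELOP": "Entwicklung",
         "SYSTEM": "System", "COMPUTER": "Computer"}


class Schema:
    """Common base: a semantic symbol plus its role fillers."""

    def __init__(self, symbol, **roles):
        self.symbol = symbol
        self.roles = roles

    def realize(self):
        raise NotImplementedError


class ConceptSchema(Schema):
    """Noun concept; realized as a plain noun group."""

    def realize(self):
        return {"type": "NG", "head": NOUNS[self.symbol]}


class CaseSchema(Schema):
    """Verb concept; here always nominalized, as in the title default."""

    def realize(self):
        irs = {"type": "NG", "head": NOUNS[self.symbol]}
        for role, filler in self.roles.items():
            # One method per role, looked up by name.
            getattr(self, f"realize_{role}")(irs, filler)
        return irs

    def realize_object(self, irs, filler):
        irs["possattr"] = filler.realize()

    def realize_instrument(self, irs, filler):
        irs.setdefault("qualifiers", []).append(
            {"prep": "mit", "pobj": filler.realize()})


if __name__ == "__main__":
    develop = CaseSchema("DEVELOP", instrument=ConceptSchema("COMPUTER"))
    print(CaseSchema("SUPPORT", object=develop).realize())
```

Because each role has its own method, a new role or an alternative realization can be added or overridden in a subclass without touching the rest of the generator, which is the kind of flexibility the text refers to.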
3.4 Realization schemata 
Frame descriptions as used in SEMSYN are recursive structures
and so is - in general - the control structure in
SEMSYN's generation. In other words: the same decisions
have to be made again on each level of embedding. In embedded
frames, of course, some decisions are already restricted by
the context.
What will be the syntactic form of the text generated for
such a frame? At least for case schemata the first choice
is between the realization types :CLAUSE
and :NG (noun group). For semantic structures from titles our
default was to generate a noun group (a toplevel case
schema was lexicalised as a noun). Only in a few cases did we have
titles that had to be generated as questions like "What is a
model of ...?".
Once the general syntactic form has been decided upon, there
are more choices: a clause, for example, could be realized as
an active or a passive clause. Within a noun group an
attribute could be realized as a relative clause or in the form of
a prepositional group.
:NG as Title-Default:
    Die Beeinflussung der Zuverlässigkeit und der Qualität von Software mit einem
    System zur Unterstützung der Entwicklung mit einem Computer.
:NG with *PREFER-RELATIVE-CLAUSES*:
    Die Beeinflussung der Zuverlässigkeit und der Qualität von Software mit
    einem System, mit dem die Entwicklung mit einem Computer unterstützt wird.
:CLAUSE in passive voice:
    Die Zuverlässigkeit und die Qualität von Software wird mit einem System zur
    Unterstützung der Entwicklung mit einem Computer beeinflußt.
:CLAUSE with anonymous Agent:
    Man beeinflußt die Zuverlässigkeit und die Qualität von Software mit einem
    System zur Unterstützung der Entwicklung mit einem Computer.
Ill.: Different Realisations for TIT-01
These decisions are made with respect to several factors.
One is the type of the roles that are actually filled. If a case schema
for example has an :OBJECT, but no :AGENT, we prefer the
passive construction in a clause realization. Stylistic
preferences are another factor. In the
above case a preference could be to avoid the passive, so we
would take the realization schema "ACTIVE with an anonymous
agent" using "man".
In titles these preferences come from global switches. In 
real text they could come from the context. 
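The decisions described in this section can be summarized as a small decision procedure. The Python sketch below is an interpretation of the prose, not the original flavor methods: the Preferences fields stand in for the global switches, the returned labels are invented for illustration, and the fallback for a case schema with neither :AGENT nor :OBJECT is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Preferences:
    """Stand-ins for the global switches (names are assumptions)."""
    title_mode: bool = True                # titles default to noun groups
    avoid_passive: bool = False            # stylistic preference
    prefer_relative_clauses: bool = False  # attribute style inside noun groups


def choose_clause_voice(roles, prefs):
    """Pick the voice of a clause from the filled roles and the preferences."""
    if "AGENT" in roles:
        return "ACTIVE"
    if "OBJECT" in roles and not prefs.avoid_passive:
        return "PASSIVE"
    # Active clause with the anonymous agent "man"; also used as fallback here.
    return "ACTIVE-ANONYMOUS"


def choose_realization(roles, prefs, is_question=False):
    """Return (realization type, sub-choice) for a case schema."""
    if prefs.title_mode and not is_question:
        style = ("RELATIVE-CLAUSE" if prefs.prefer_relative_clauses
                 else "PREPOSITIONAL-GROUP")
        return ("NG", style)
    return ("CLAUSE", choose_clause_voice(roles, prefs))


if __name__ == "__main__":
    roles = {"OBJECT", "INSTRUMENT", "GOAL"}
    print(choose_realization(roles, Preferences()))
    print(choose_realization(roles, Preferences(title_mode=False)))
    print(choose_realization(roles, Preferences(title_mode=False, avoid_passive=True)))
```

For running text the preferences object would be derived from the context instead of being set globally.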
3.5 Role realizations 
For frames without roles - the so-called terminal structures
- the realization is more or less the lexicalisation of the
semantic symbol. After this, control and the
IRS structure produced are passed back to the surrounding
frame or the toplevel.
If there are roles, there is some more work to be done.
Some fillers of roles are realized as distinct structures of
their own (mostly noun groups). They could also be uttered on
their own.
Other roles only lead to changes in the IRS structure of their
frame:
- decisions about syntactic features: fillers of a :NUMBER role
may e.g. lead to the pluralization of the noun group of the
modified frame.
- creation of noun compounds as head of the current nominal
group: the filler of a :NAME role may become a prefix ("das
SEMSYN-Projekt"). This holds as well for the terminal filler of
a :SPECIALIZE role (variant: realization as an adjective). A
negative :MODALITY could - in a noun group realization
- lead to the prefix "Nicht-".
For those frames that have roles with realizations of their
own, this procedure is repeated recursively for the frame
descriptions of the fillers of those slots.
For realized role fillers it has to be decided how their IRS
structure is to be integrated into the overall structure (mostly
as a prepositional group) and which additional syntactic features
can be inferred.
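The roles that merely modify the IRS structure of their frame can be sketched as a single function in Python. The encoding is an assumption: the IRS fragment is a dictionary with a "head" entry, the :NUMBER filler is taken to be an integer, and a negative :MODALITY is written as the string "NOT". The adjective variant for :SPECIALIZE is not covered.

```python
def apply_role(irs, role, filler):
    """Return a copy of the IRS fragment `irs` changed by one role filler."""
    out = dict(irs)
    if role == "NUMBER":
        # Assumed encoding: an integer count; more than one means plural.
        out["num"] = "PL" if filler > 1 else "SG"
    elif role in ("NAME", "SPECIALIZE"):
        # The filler becomes a prefix of a noun compound.
        out["head"] = f"{filler}-{irs['head']}"
    elif role == "MODALITY":
        if filler == "NOT":
            out["head"] = f"Nicht-{irs['head']}"
    else:
        raise ValueError(f"role {role} needs a realization of its own")
    return out


if __name__ == "__main__":
    print(apply_role({"head": "Projekt"}, "NAME", "SEMSYN"))
    print(apply_role({"head": "Transformation"}, "NAME", "Fourier"))
    print(apply_role({"head": "Verfahren"}, "NUMBER", 3))
```

Roles such as :OBJECT or :INSTRUMENT are rejected here on purpose, since their fillers are realized as structures of their own and integrated afterwards.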
4. Inferring of missing information 
SEMSYN's generation module starts from a semantic representation
that was designed to be language independent. For the
primitives used - especially for the semantic relations
expressed by the arcs in the semantic net - this may be true.
On the other hand, the data delivered to us by FUJITSU are
not really universal representations. The fact that the semantic
nets are derived from Japanese is recognizable if one
looks at the information that is not explicitly represented.
In Japanese the number or definiteness of nouns and the tense of verbs
are normally not expressed - correspondingly our data do not
have semantic correlates for these features (except in the
rare case when they have been expressed in the Japanese
original). The Japanese reader infers the missing information
from the context. In titles there is no such context available.
For correct and acceptable German, on the other hand, we
need determiners and our nouns need a number. Therefore
we had to develop heuristics to reconstruct this information.
Some examples of such heuristics: 
- a nominalized case frame has to be realized with the definite
article in the singular ("Die Generierung natürlicher Sprache").
- the :OBJECT role of a nominalized case frame should be
realized as indefinite and plural ("Die Generierung von Titeln"),
except where SLEX contains exception information ("Die
Wartung von Software").
- concepts that have a :NAME role will be realized as definite
and singular ("Die Fourier-Transformation").
If no heuristic is applicable and no SLEX information is
found, we use the title defaults 'indefinite' and 'singular' ("Ein
Verfahren").
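These heuristics can be written down as an ordered rule list. The Python sketch below is an interpretation: the order in which the rules are tried, the keyword flags describing the context, and the SLEX_EXCEPTIONS table are assumptions, since the paper does not say how conflicts between heuristics are resolved.

```python
# Hypothetical exception information of the kind SLEX might hold.
SLEX_EXCEPTIONS = {"SOFTWARE": {"det": "ZERO", "num": "SG"}}


def infer_features(symbol, *, nominalized_case_frame=False,
                   object_of_nominalization=False, has_name=False):
    """Reconstruct determiner and number for a noun group in a title."""
    if nominalized_case_frame:
        # "Die Generierung natürlicher Sprache"
        return {"det": "DEF", "num": "SG"}
    if object_of_nominalization:
        # "Die Generierung von Titeln", unless SLEX says otherwise
        # ("Die Wartung von Software").
        return dict(SLEX_EXCEPTIONS.get(symbol, {"det": "INDEF", "num": "PL"}))
    if has_name:
        # "Die Fourier-Transformation"
        return {"det": "DEF", "num": "SG"}
    if symbol in SLEX_EXCEPTIONS:
        return dict(SLEX_EXCEPTIONS[symbol])
    # Title default: "Ein Verfahren"
    return {"det": "INDEF", "num": "SG"}


if __name__ == "__main__":
    print(infer_features("GENERATE", nominalized_case_frame=True))
    print(infer_features("TITLE", object_of_nominalization=True))
    print(infer_features("SOFTWARE", object_of_nominalization=True))
    print(infer_features("METHOD"))
```

In a full text generator the same function would first consult the discourse context and fall back to such rules only when the context gives no answer.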
5. Concluding remarks 
Our current concern is to broaden the applicability of
SEMSYN's generator for German: on the one hand we are
experimenting with the generation of full texts (e.g.
newspaper stories), on the other hand we are extending the
repertoire of feasible semantic structures that may serve as
input for the generator.
Acknowledgement: 
SEMSYN-83 was funded by the West German Ministry
for Research and Technology (BMFT) from July 1983 until
February 1986. Special thanks to all the colleagues who
collaborated - for shorter or longer periods - within this project:
Kenji Hanakata, Joachim Laubsch, Arek Lesniewski and Shoichi
Yokoyama.
References 

Busemann, S. (1982) "Probleme der automatischen
Generierung deutscher Sprache", HAM-ANS Memo 8,
Universität Hamburg

Emele, M. & S. Momma (1985) "SUTRA-S - Erweiterungen
eines Generator-Front-Ends fuer das SEMSYN-Projekt",
Studienarbeit, Inst. f. Informatik, Univ. Stuttgart

Laubsch, J., D. Rösner, K. Hanakata & A. Lesniewski (1984)
"Language Generation from Conceptual Structure: Synthesis
of German in a Japanese/German MT Project", in: COLING-84,
Proceedings, Stanford

Rösner, D. (1986) "SEMSYN - Wissensquellen und Strategien
bei der Generierung von Deutsch aus einer semantischen
Repräsentation", in: Batori & Weber (Eds.) "Neue Ansätze in
Maschineller Sprachübersetzung: Wissensrepräsentation und
Textbezug", Niemeyer Verlag, Tübingen

Uchida, H. & K. Sugiyama (1980) "A machine translation system
from Japanese into English based on conceptual structure",
in: COLING-80, Proceedings, Tokyo, pp. 455-462
