Organizing linguistic knowledge 
for multilingual generation* 
Martin Emele, Ulrich Held, Stefan Momma, Rdmi Zajac 
Project Polygloss 
University of Stuttgart 
IMS-CL/IfI-AIS, KeplerstraBe 17, 
D 7000 Stuttgart 1, Federal Republic of Germany 
emaih p01ygloss@informatik.uni-stuttgart.dbp.de 
Abstract 
We propose an architecture for the organisation of 
linguistic knowledge which allows to (1) separately 
formulate generalizations for different types of lin- 
guistic information, and (2) state interrelations be- 
tween partial information belonging to different lev- 
els of description. We use typed feature structures 
for encoding linguistic knowledge. We show the ap- 
plication of this representational device for the archi- 
tecture of linguistic knowledge sources for nmltilin- 
gum generation. As an example, we describe the use 
of interacting collocational and syntactic constraints 
in the generation of French and German sentences. 
1 Introduction 
1.1 The Problem 
The choice of target language realizations in ma- 
chine translation or in multilingual generation is con- 
ditioned by constraints involving different levels of 
linguistic description for the individual languages. 
From a descriptive point of view, it is desirable to 
be able to keep these different levels separate - con- 
ceptually ~ well ~s in the actual implementation of 
knowledge sources and representations of linguistic 
objects; such levels may include, for example, a de- 
scription of morphological properties, of constituent 
structure, of predicate-argument structures or func- 
tional structures, as well as a description of textual 
and pragmatic properties. 
In generation from semantic representations, con- 
straints on the choice of linguistic realizations de- 
pend on properties of the basic elements of lexical 
and syntactic variants with respect to all these lev- 
els; such constraints usually interact in various ways. 
Knowledge sources which provide the informa- 
tion necessary for modelling such phenomena should 
therefore allow for modularization as well as for 
*Reseoxch reported in this paper is partly supported by 
the Gelanan Ministry of Research and Technology (BMFT, 
Bundesminister ffir Forschung und Technologic), under grant 
No. 08 B3116 3. 
the declarative formulation of dependencies between 
information which belongs to different descriptive 
levels. 
1.2 Current Approaches 
In research on MT and NL generation, different ap- 
proaches to both problems, modularization and in- 
teraction between the modules, have been proposed. 
Although most of these approaches allow for a de- 
scription of linguistic phenomena at each individ- 
ual level, it is hard for them to explicitly express 
interactions between levels without using direction- 
ality. Usually, adjacent levels are connected by ex- 
plicit mappings. Conditions acting on nonadjacent 
levels often cannot be expressed directly: thus, in- 
formation has to be carried explicitly across levels 
where it would normally not be stated. Moreover, as 
the input structure is transformed stepwise, the set 
of mappings has to be ordered carefully. Other gen- 
eration researchers, like \[DANLOS 1987\]:96-99 and 
\[NIRENBUtlG 1989\]:242f have a similar view of the 
architecture of the linguistic description; most of 
their solutions to the above problem are based on a 
heuristic ordering of the mapping and on some loss 
of strictness in the separation of levels. 
In order to alleviate such problems, the use of 
"codescriptions" h~u been proposed 1 which allows 
for statements about the coexistence of partial de- 
scriptions belonging to separate descriptive levels. 
This device makes all relevant information avait~ 
able in one place thus allowing for constraints from 
different levels to be considered at the same time. 
Although the use of codescriptions perfectly sup- 
ports the formulation of interactions between differ- 
ent types of partial information, the current propos- 
1See. e.g. \[FENSTAD ET At,. 1987\] who annotate context- 
free rules of Lexical Functional Grammar with ftmc- 
tional as well as semantic descriptiotm, wlfich allows for 
the simultaneous construction of f-structures and situation 
schemata; an application to trmisfer has been proposed by 
\[I(APLAN ET AL. 1989\], where f-structures for different lan- 
guages are simtdtm~eously built up. 
102 1 
als do not seem to pay enough attention to the sep- 
aration <)f different types of information. In many 
cases, one of the types of information is assigned 
a predominant role (usually this is c-structure); all 
other information depends on it, i.e. cannot be ex- 
pressed without making reference to the dominating 
information. 
In this paper we propose an architecture of knowl- 
edge sources for multilingual genera.tion. It is based 
on linguistic descriptions in the format of typed fea- 
ture terms following ideas of \[Ai'T-KAcI 1986\]. This 
allows, much like object-oriented systems, for a mod- 
ularization of knowledge; we can state relations cap- 
turing interdependencies between elements of differ- 
ent des('riptive levels without losing the possibility to 
independently formulate generalizations fbr classes 
of linguistic Objects. 
2 Architecture 
2.1 The representational device: typed fea- 
ture terras 
The objects used to represent partial linguistic infor- 
mation are typed feature terms: i.e. feature terms 
where each node in the directed graph usually repre-- 
smiting an ordinary feature term can be associated 
with a type symbol. For type'symbois, the linguist 
supplies a. feature type definition which can be a fea- 
ture term, a feature term with conditions (used to 
express additional constraints), or a conjmlction or 
disjunction of feature terms. 
The system we have implemented compiles a set of 
feature type definitions, a feature type system, into a 
hierarchy of feature terms which is derived from the 
hierarchy of type symbols (iniplicitly defined by the 
set of feature type definitions) and the usual sub- 
sumption relation on ordinary feature terms. In the 
general case, the hierarchy of feature terms is a mul- 
tiple inheritance hierarchy. 
(riven an arbitrary linguistic object represented 
by a feature term which is only partially specified, 
the linguist is interested in obtaining the most pre- 
cise description of this object according to a gram- 
mar (specified as a feature type system). Given such 
a term, the interpreter computes the set of most 
specific feature terms which are deriw~d from it by 
applying feature type definitions: each member of 
the solution set is subsumed by (i.e. is more specific 
than) the initial term. For tiffs derivation, the inter- 
prefer of' the system uses only two basic operations ~, 
typed unification of feature terms, and rewriting 
ba~sed on unifying substitutions of feature terms. 
7A liiore detailed description of these operations can be 
found in \[F\]MELb~/ZAJAC 1990^\] 
"OBJECT' 
"PROPOSITION" 
'EVENT" "STATE" 
'I-ARG-EVENT" "2-ARG-EVgNT" "3-A RG--EV ENT" /\ 
"F, NCOUNTER" "DELAY" 
'~NTITY' 
"PROJECT' "TEAM" 'PROBLEM' 
Figure 1: A part of the conceptual hierarchy 
2.2 Linguistic Objects 
Descriptions of linguistic objects may contain prag- 
matic, semantic, and syntactic information. We of 
ganize tile linguistic information as a hierarchy of 
classes of linguistic objects, where we assume that 
for each of the levels, it is possible to define classes 
of basic objects, classes of structures these objects 
can be parts of, and interrelations between objects 
of different levels and between structures of diflbrent 
levels 3. 
For example, basic objects of a conceptual de- 
scription are concepts, those of a lexical description 
are lezical units (words as well as multi word lex- 
emes). Semantic structures may, for example, define 
the temporal structuring of states of affairs; syntac= 
tic structures define well-formed phrase structures 
and functional structures. Relations between con- 
cepts and lexemes define possible ways of lexicalizing 
conceptual information in a given language. Rela- 
tions between semantic and syntactic structures de- 
scribe possible syntactic realizations of semantic de: 
scriptions. 
For the remainder of this paper, we isolate the 
part of the architecture concerned with the relations 
between conceptual and syntactic descriptions, ab- 
stracting away from other types of information. 
3 Relating conceptual and syntactic 
knowledge 
3.1 A conceptual hierarchy 
Figure 3.1 shows a simplified version of the upper 
part of a hierarchy of concepts. It distinguishes be- 
tween entities and propositions (events, states etc.), 
the latter being subclassified according to their num- 
ber of arguments. The lower parts of the hierarchy 
(not shown in the figure), may contain domain spe- 
3Our semantic description is largely based on work in Dis- 
course \]{eprcsentatlon Theory \[KAMP/I\[EYbE 1990\]; the syn- 
tactic representation follows the lines of Lexical I"unctlonal 
(J r ltin I110it 
2 103 
cific concept classifications. As to the level of de- 
tail down to which the conceptual description pro- 
ceeds, it seems useful to stop decomposing concepts 
at a level where none of the languages to be treated 
has more specialized lexemes available. This allows 
to treat cases where languages differ with respect 
to the specialization level down to which they have 
lexemes. French and Italian, for example, do not 
have lexemes for the concept of "transport-in-the-ai(', 
which makes it necessary to use the lexeme denot- 
ing a generic *transport*-action and to separately 
realize (in a prepositional phrase) the instrument: 
transporter des fleurs en avion de Nice a Berlin; 
trasportare fiori in aereo, da Nizza a Berlino. En- 
glish and German, on the other hand, not only have 
lexemes for the generic concept, but also for the spe- 
cialized one, namely lo fly slh. and etw. ftiegen, as in 
fly flowers from Nice to Berlin; Blumen yon Nizza 
nach Berlin fliegen. We therefore introduce a con- 
cept *fly*, defined as a subclass of *transport* with 
*airplane* as an instrument. 
3.2 A lexical hierarchy 
In analogy to the semantic classification, we intro- 
duce a hierarchy of syntactic objects classified ac- 
cording to their syntagmatic properties; this allows 
for immediate access to information relevant for the 
realization of each lexeme. 
We use a hierarchy of subcategorization types 
where "monosemous readings" of lexical units are 
classified. As an example, we discuss some charac- 
teristics of the organization of the French verbal 
subcategorization hierarchy. Although basically us- 
ing fnnctional structures of LFG as syntactic rep- 
resentations, we do not only use LFG's grammati- 
cal function labels as a basic vocabulary 4. We also 
specify, among others, the phrasal category of the 
complement (NP, AP, finite or infinite clause, etc.), 
pronominalization possibilities (e.g. le, la, lee vs. 
en, y vs. lui, lent, etc.), and the presence/absence 
of prepositions. This decomposes grammatical ftmc- 
tions according to the distinctions relevant for 
their definition. Similar procedures have been pro- 
posed within LFG's lexical mapping theory, e.g. 
by \[\]3RESNAN/KANEKVA 1989\]. Our representation 
would also allow, without major changes, the con- 
struction of syntactic representations in the format 
of a different unification-based grammatical theory, 
if this theory makes use of the same types of elemen- 
tary distinctions as the classification used here. 
This double classification which uses categorial as 
4E.g., grammatical functions may be assigned to com- 
plements with different internal structure (e.g. NPs or sub- 
clauses, described as (OBJ)ects). 
well as pronominalization information, allows for the 
independent formulation of generalizations coded in 
the definition of the respective subcategorization 
classes; for each pronominalization type, including 
predicative and oblique complements, a range of pos- 
sible syntactic realizations is described. For each 
"monosemous reading" of a verbal lexeme we can 
thus describe in detail a set of synonymous syntac- 
tic construction variants. 
The lexical entry for enthousiasmer specifes, for ex- 
ample, that both its complements may only be real- 
ized as NPs: 
(1) enthousiasmer = 
rPRED: "enthousiasmer" 1 
2-f-verb /1ST-COMPL: subj A nominal-phrase I . 
L2ND-eOMPL: obj A nominal-phrase.J 
enthousiasmer inherits from the class 2-f-verb of 
verbs taking two complements and selects as a first 
complement a nominal phrase of type subj, and as 
a second complement an NP of type obj. subj repre- 
sents the whole range of possible structures which 
can appear as first complements; obj also defines a 
set of realization variants with the same pronomi- 
nalization behaviour, namely an NP, affirmative or 
interrogative clause or infinitivals with or without a 
or de as a preposition. 
(2) 2-f-verb = 
"IsT-COMPL: subj J)  obj v 
iobj v 1 verb-class ~ND-COMPL: 
~en-obj V y-obj V . " 
l, predi V obl 
(3) subj = nominative A 
({ nominal-phrase V affirmative V infinite}). 
3.3 Relating conceptual and syntactic 
knowledge 
3.3.1 Types of relations 
From what we said about semantic decomposition 
above, it follows that, for each concept, we assume 
there exists at least one lexeme in one of the lan- 
guages; usually there will be more than one. This 
is captured by a relation p on pairs of partial se- 
mantic and partial syntactic structures. The relation 
is specified as a feature term with top level labels 
SEM and SYN. The fact that many of the interac- 
tions between semantic and syntactic de,~criptions 
can be expressed for whole classes of structures is 
reflected by a modular and hierarchical specifica- 
tion of this relation. For example, there is a relation 
p-entity~NP which connects objects of type *entity* 
with noun phrases, or the relations p-proposition ~NP 
and p-proposition,-.VP which relate *proposition* type 
104 3 
objects with NPs (whose head is a nominalization) 
or VPs (e.g. infinitival complements, that-clauses, 
etc.), re.,~pectively. 
Since verbs are classified according to their num- 
ber of complements, it is possible to define relations 
between predicates and lexical classes, e.g. for predi- 
cates with two arguments and their two-place lexical 
counterparts (p-2arg). 
On the lexical level, relations between concepts 
and lexemes allow for the specification of lexical ma- 
terial (single lexemes, as well as e.g. support verb 
constructions etc.) available for a given concept; the 
lexical types represent a range of construction vari~ 
ants which can be formed with the lemma. So, for 
example p~enthuse relates the concept *enthuse* with 
the verbal lemlna enthousiasmer discussed in (l). The 
concept *enthuse* is defined as taking a *proposition* 
as its first ~trgnment. The lemnm enthousiasrner, how- 
ever, selects only nominal-phrase type first comple- 
ments. Consequently, subject clauses or subject in- 
finitives are ruled out for this verb, and among the 
realization possibilities for *proposition*s, only nomi- 
nalizations can be used. 
3.3.2 Using the relations in generation 
The use of the architecture of knowledge sources for 
different descriptive levels is best shown in an ap- 
plication where the interaction of constraints from 
different levels has to be treated: in the following 
example, collocational and syntactic (subcategoriza- 
tion) constraints interact. 
For the realization of a simplified conceptual struc- 
ture like (4), different English, French and German 
collocations such as (5)--(11) can be used; 
(4) *ENCOUNTER* \[ARG2: *PROBLEM*J 
such as: 
(5) .t;,': the team encounlers a problem; 
(6) E: the leaT~ comes across a problem; 
(7) F: le troupe rencontre uu probldme; 
(8) F: le troupe bule eonlre un probl~me; 
(9) F: le troupe se heurle hun probl~me; 
(10) G: das Team st&l auf ein Problem; 
(1.1) G: das Team lrifft auf ein Problem. 
The verbs used in (5) to (11) are collocationally 
p~:eferred. 
The following statements relate the concept 
*encounter* with French and German lexemes, im- 
plicitly keeping track of the colloeational restrictions 
by listing only the possible collocates of problem and 
Problem, respectively: 
(12) p-f-ll :: 
SYN: {rencontre, V buter V se_heurter}J" 
(13) p~tg-ll := 
SYN : stossen.zuf 
A simplified subcategorization description for the 
French and German verbs shows that they fall 
roughly into two classes, transitive and intransitive'~: 
French renconlrer (7) is described as transitive, tak- 
ing a subject and an object and being passivizable. 
French buter contre (8) and se heu,'ter a (9), a.~ well 
as German s toflen auf or lreffen auf (lO), (11), are 
described as taking a subject and a prepositional ob- 
ject, and disallowing passivization: 
(16) rencontrer = 
{S-O-V V s-by-v} \[PRED: "rencontrer"\] . 
(17) buter = 
\[PR.ED : "buter" \] 
s-p-v teoBJ: \[PcAsE:"co,,t~e"\] " 
(18) se_heurter ~- 
\[PRED:"se_heurter" \] 
s-p-v LPOBJ: \[PEASE: "a"\]J " 
(19) stossen~uf = 
\[PRED : "stossen" \] 
s-P-VkPOBj: \[PCASg:"auff\] " 
The problem is that collocationally correct realiza- 
tions of (4) in German are only possible with verbs 
which do not have passive forms, like sloflen auf or 
Ire\]fen auf (lO), (11). 
If we want to generate French and German fi'om a 
predicate-argument structure like (19), we still want 
to use a general relation describing the relation be- 
tween a I{ESTR. of an *entity* and i~s syntactic realiza- 
tion variants, namely a relative clause or an embed- 
ded participle. This relation carries the constraint, 
however, that the verb which the participle is to be 
derived from be transitive. Taking the French col- 
locations discussed in (7)-(9), the following realiza- 
tions for (19) exist: 
® participle: 
(20) un probl~me rencontr( a relard( un pro- 
jet; 
5We simplify considerably and abstract away, for the pur- 
pose of this paper, from the question of language-specitlc ele- 
ments in the definition of syntax:tic classes like "transitive" or 
"intransitive" and take passivization possibility as a distinc- 
tive criterion in all languages. 
4 I05 
(19) *DELAY* "ARG 1 : ITI*PROBLEM* \[RESTR: 
ARG2: *PROJECT* 
*ENCOUNTER* ARG2: I 
(20) 
SEM: *DELAY* 
SYN: S-O-V 
LARG2: *PROJECT* 
"PRED: "verzoegem" 
"PRED: to" 
SUBJ: N REL-ADJ: S-P-V / TOPIC: \[~\] RELPRO: PCASE: 'auf'j 
/suBJ:  PRED: 
LPOBJ: \[r_ej\] P R 0 
OnJ: N \[PRED: "Ptojekt"\] 
• passive relative clause: 
(21) un probldme qui a dld rencontrd a re- 
tardd un projet; 
• active relative clause with impersonal subject 
(22) un probl~me qu'on a rencontrd a retardg 
un projet; 
(23) un probld-me conlre lequel on a butd a 
retardd un projet; 
(24) un probl~me auquel on s'est heurtd a re- 
tardd un projet 
Since in German there is no collocation with the 
same syntactic structure as rencontrer un probldme, 
i.e. wlfere the collocate is a transitive verb, only 
an active relative clause with an impersonal subject 
(man) is possible for (19): 
(25) Das Problem, auf das man slbflt, 
verz@ert das Projekt 
4 Conclusion 
In this paper we proposed an architecture for the or- 
ganization of linguistic knowledge allowing for both 
(1) the separate formulation of generalizations for 
different types of linguistic information, and (2) 
the use of relations to state correspondences be- 
tween partial information pertaining to different lev- 
els of linguistic description. Typed feature terms are 
used for encoding linguistic knowledge; the TFS sys- 
tem incorporates a multiple inheritance mechanism, 
which allows to minimize redundancy within "spe- 
cialized" knowledge sources, like e.g. the hierarchy 
of subcategorization types. On the other hand, no 
interfacing problems between different levels of lin- 
guistic description arise, due to the use of one and 
the same data structure for representing all these lev- 
els for a given linguistic object. In addition, instead 
of explicitly controlling complex interactions, the 
relational approach allows to constrain realization 
choices in generation as a result of the simultaneous 
application of distributed linguistic constraints. 
The TFS system has been implemented in 
Common-LISP by Martin Emele and Rdmi Z~jac 
\[EMELE/ZAJAC 1989A\], on Symbolics, TI Explorer 
and VAX. Sample grammars have been documented 
in \[EMELE 1988\] and \[ZAJAC 1989B\]. The specifi- 
cation of the knowledge sources is currently under 
way. 
5 Acknowledgements 
Our proposal has benefited from discussions with 
Stanley Starosta, Sergei Nirenburg and our col- 
leagues at the IMS, whom we would like to thank 
for comments on previous versions of this paper. 

References 

\[A'#r-KAcI 1986\] Hassan Ai't-Kaci: "An Algebraic Se- 
mantics Approach to the Effective Resolution of 
Type Equations." in: Theoretical Computer Sci- 
ence, Vol. 45, p. 293-351 

\[B~.u ERLI?, 1988\] 
RMner B uerle: Ereignisse und Reprffsentationen, 
(Stuttgart: IBM Germany), 1988, \[LILOG-Report, 
43\], 

\[BATEMAN ET AL. 1989\] John Bateman, Bob Kasper, 
Johanna Moore, Richard Whitney: "The Penman 
Upper Model - 1988. Penman Development Note", 
ms. (Los Angeles: ISI), May 1989 

\[BrtESNAN/KANI~:aVA 1989\] Joan Bresnan , Jonni M. 
Kanerva: "Locative Inversion in ChicheWa: A Case 
S~udy of Factorization in Grammar" in: Linguistic 
Inquiry, Vol. 20/1, p. 1-50, 1989 

\[DANLOS 1987\] Laurence Danlos: The Linguistic Basis 
of Text Generation. Cambridge University Press. 

\[EMELE 1988\] Martin C. Emele: "A Typed Feature 
Structure Unification-based Approach to Genera- 
tion" in: Proceedings of the WGNLC of the IECE 
1988, (Japan: Oiso University) 1989 

\[EMELI,\]/ZAJAC 1989A\] Martin Emele, R4tni Zajac: 
"RETIF: A Rewriting System tbr Typed Feature 
Structures", (Kyoto) 1989, \[ATR Technical Report 
TR-I-0071\] 

\[EMELE/ZAJAC 1989B\] Martin Emele, Rdmi Zajac: 
"Multiple Inheritance in RETIF", (Kyoto) 1989, 
\[ATR Technical Report TR-I-0114\] 

\[EMELE/ZAJAC 1990A\] Martin Emele, R4mi Zajac: Se- 
mantics for Feature Type Systems. Internal p~per. 
IMS. University of Stuttgart. 

\[EMELE/ZAJAC 1990B\] Martin Emele, R4mi Zajac: 
"Typed Unification Grammars.", in: Proceedings of 
COLING-90, 1990, this volume 

\[FENSThD E'r AL. 1987\] Jens Erik Fenstad, Per-Kristian 
Halvorsen, Tore Langholm, Johan van Benthem: 
Situation, Language and Logic., (Dordrecht: Reidel) 
1987 

\[ItAINOI:tSEN/K APLAN 1988\] 
Per-Kristian Halvorsen, Ronald Kaplan: "Projec- 
tions and Semantic Description." ,in: Proceedings of 
the .International Conference on Fifth Generation 
Computer Systems, (Tokyo) 1988 

\[KAMP/REYLE 1990\] tIans Kamp, Uwe Reyle: From 
Discourse to Logic, (Dordrecht: Reidel): to appear 

\[KAPLAN ET AL. 1989\] Ronald M. t(aplan, Klaus Net- 
ter, J/irgen Wedekind, Annie Zaenen: "Translation 
by Structural Correspondences" in: Proceedings of 
the 4th Conference of the A CL, European Chapter, 
Manchester, 10.12 April 1989, 1989 

\[KAY 1984\] Martin Kay: "Functional Unification Gram- 
mar: a formalism for machine translation". Proceed- 
ings of Coling-84. Stanford. 

\[NIRENBUI~G 1989\] Sergei Nirenburg: "KBMT-89. 
Project Report.", ms. (Pittsburgh, Pa: Center for 
Machine Translation, Carnegie Mellon University) 
1989 

\[PoLLARD/SAG 1987\] Carl J. Pollard, Ivan A. Sag: 
Information-Based Syntax and Semantics, Vol.1, 
Fundamentals, (Chicago: University Press) 1987 
\[CSLI Lecture Notes, No. 13\] 

\[ZAIZNEN 1988\] A. Zaenen: Lexical Information in LFG 
-- At* Overview, ms. (Palo Alto: Xerox PARC) 1988 

\[ZAJAC :1989A\] R4mi Zajac: "A Relational Approach to 
Translation", (Stuttgart: IMS), internal paper, sub- 
mitted to Third International Conference on Theo- 
retical and Methodological Issues of Machine Trans- 
lation of Languages (Austin, 1990) 

\[ZAJAC 1989B\] R4mi Zajac: "A Transfer Mode\] Using 
a Typed Feature Structure Rewriting System with 
Inheritance.", in: Proceedings of the ~27th Annual 
Meeting of the A CL-89 (Vancouver, Canada) 1989 
