ATOMIZATION IN GRAMMAR SHARING
Megumi Kameyama
Microelectronics and Computer Technology Corporation (MCC)
3500 West Balcones Center Drive, Austin, Texas 78759
megumi@mcc.com
ABSTRACT
We describe a prototype SHARED GRAMMAR for the
syntax of simple nominal expressions in Arabic, English,
French, German, and Japanese implemented at MCC. In
this grammar, a complex inheritance lattice of shared
grammatical templates provides parts that each language
can put together to form language-specific grammatical
templates. We conclude that grammar sharing is not only
possible but also desirable. It forces us to reveal
cross-linguistically invariant grammatical primitives that may
otherwise remain conflated with other primitives if we deal
only with a single language or language type. We call this
the process of GRAMMATICAL ATOMIZATION. The specific
implementation reported here uses categorial unification
grammar. The topics include the mono-level nominal
category N, the functional distinction between
ARGUMENT and NON-ARGUMENT of nominals,
grammatical agreement, and word order types.
Is grammar sharing possible? 
The multilingual project of MCC attempts to build a
grammatical system hierarchically shared by multiple
languages (Slocum & Justus 1985). The system as
proposed should have an advantage over a system with
separate grammars for different languages: It should reduce
the size of a multilingual rule base, and facilitate the
addition of new languages. Before presenting evidence for
such advantages, however, there is the basic question to be
answered: Is grammar sharing at all possible? Although it
is well known that languages possess similarities based on
genetic, typological, or areal grounds, the question remains
whether and how these similarities translate into
computational techniques.
In this paper, we will describe a prototype shared
grammar for simple nominal expressions in Arabic,
English, French, German, and Japanese.1 We conclude that
grammar sharing is not only possible but also desirable. It
forces us to reveal cross-linguistically invariant
grammatical primitives that may otherwise remain
conflated with other primitives if we deal only with a single
language or language type. We call this the process of
GRAMMATICAL ATOMIZATION2 forced by grammar sharing.
Each language or language type is then characterized by
particular combinations of such primitives, often providing
new insights with which to account for certain linguistic
problems. Before we go into more detail, the following is
our view of what general components and mechanisms
constitute a shared grammatical system.
1 Preliminary investigations have also been made on
Spanish, Russian, and Chinese.
2 The verb atomize means "to separate or be separated
into free atoms" (The Collins English Dictionary, 2nd
edition, 1986).
Basic mechanisms in a shared grammar: The
process of building a shared grammar, in our view, requires
(i) linguistic description of a set of languages in a common
theoretical framework, (ii) a mechanism for EXTRACTING a
common grammatical assertion from two or more
assertions, and (iii) a mechanism for MERGING grammatical
assertions. The linguistic description should define certain
string-combination operations (defined on string TYPES)
associated with information structures. Then what we do is
identify sharable packages of common string-types and
information structures among independently motivated
language-specific grammatical assertions. These
packages are then put into the shared part of the grammar,
and the remaining language-specifics are potential sources
for more sharing. This extraction is essential in what we
call ATOMIZATION, which is basically "breaking up of
grammatical assertions into smaller independent parts" (i.e.
decomposition). If we assume that all grammatical
assertions are expressed in terms of FEATURE STRUCTURES
(Shieber 1986), the atomization process would be defined
around the notion of GENERALIZATION (i.e. reverse of
UNIFICATION) as follows:
Basic atomization: Given two feature
structures, Xa for category X in language A and
Xb for category X in language B, the shared
structure Xs for category X is the
GENERALIZATION of Xa and Xb (i.e., the most
specific feature structure in common with both
Xa and Xb). Xs is separated out of either Xa or
Xb, and placed into the shared space.
Consequently, a partial ordering is established
wherein Xs subsumes Xa and Xb, respectively.
There is an underlying assumption that two language-
specific definitions of a common grammatical category
share something in common no matter how small it is. This
means that the linguistic descriptive basis is questionable if
the content of Xs above is null. Conversely, if closely
common information structures appear under language-
specific definitions of distinct grammatical categories, we
may suspect a basis for a new common grammatical
category.
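The generalization step can be illustrated with a small Python sketch. It is not the MCC implementation: feature structures are modeled as nested dictionaries, reentrancy is ignored, an empty dictionary stands for an unspecified value, and the two sample definitions `en_n` and `ja_n` are hypothetical.

```python
def generalize(xa, xb):
    """Return the most specific structure that both xa and xb contain."""
    if isinstance(xa, dict) and isinstance(xb, dict):
        shared = {}
        for key in sorted(xa.keys() & xb.keys()):
            common = generalize(xa[key], xb[key])
            if common != {}:  # features left unspecified are dropped
                shared[key] = common
        return shared
    # Atomic values survive only when they are identical.
    return xa if xa == xb else {}


def subsumes(general, specific):
    """True if every piece of information in `general` is also in `specific`."""
    if isinstance(general, dict):
        return isinstance(specific, dict) and all(
            key in specific and subsumes(value, specific[key])
            for key, value in general.items()
        )
    return general == specific


# Hypothetical language-specific definitions of category N.
en_n = {"result": {"cat": "N", "agr": {"nbr": "sg"}}, "arguments": "#"}
ja_n = {"result": {"cat": "N"}, "arguments": "#"}

shared_n = generalize(en_n, ja_n)  # the part placed in the shared space
```

In this toy setting an empty result from `generalize` corresponds to the "null" case discussed above, where the descriptive basis would be called into question.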
Once the shared and language-specific parts are
separated out, a mechanism for merging them is necessary
for successfully incorporating the shared assertion into the
language-specific assertion. UNIFICATION by INHERITANCE
is such a merging mechanism that we employ in our system
(see below). The shared space is a complex inheritance
lattice that provides various predefined grammatical
assertions that can be freely merged to create language-
specific ones.
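A minimal sketch of merging by unification over an inheritance lattice follows, again with nested dictionaries standing in for feature structures. The contents of SG-NO-ARGUMENTS and SG-ARG follow the paper's figures; the content of SG-N, the `LATTICE` table, and the helper names `unify` and `expand` are assumptions made for illustration only.

```python
class UnificationFailure(Exception):
    """Raised when two feature structures carry conflicting values."""


def unify(x, y):
    """Merge two feature structures; {} is an unspecified value."""
    if x == {}:
        return y
    if y == {}:
        return x
    if isinstance(x, dict) and isinstance(y, dict):
        merged = dict(x)
        for key, value in y.items():
            merged[key] = unify(merged[key], value) if key in merged else value
        return merged
    if x == y:
        return x
    raise UnificationFailure(f"{x!r} clashes with {y!r}")


# name -> (parent templates, structure defined for the template itself)
LATTICE = {
    "SG-NO-ARGUMENTS": ([], {"arguments": "#"}),
    "SG-ARG": ([], {"result": {"type": "arg"}}),
    "SG-N": ([], {"result": {"cat": "N"}}),  # content assumed for illustration
    "JA-N": (["SG-N", "SG-NO-ARGUMENTS"], {}),
}


def expand(name):
    """Unify a template's own structure with everything it inherits."""
    parents, structure = LATTICE[name]
    for parent in parents:
        structure = unify(structure, expand(parent))
    return structure
```

Calling `expand("JA-N")` yields a saturated N structure, while unifying SG-ARG with a structure typed `non-arg` raises `UnificationFailure`.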
Figure 1. A simplified shared lattice (lexical leaves include neko, cats, cat, Katzen, Katze, which, welcher, quel)
Shared inheritance lattice: Let us now take a look at
a grossly simplified shared inheritance lattice that results
from the process described above. See Figure 1. There is a
universal notion N(ominal) in all five languages under
consideration. This common notion is part of the N
definition of each language by inheritance. There are some
nominals that are 'complete' in the sense that they can be
used as subjects or objects (e.g. I saw cats/the cat.). Some
others are 'incomplete' in that they cannot be used as such
(e.g. *I saw cat.). General notions Complete and
Incomplete are thereby defined for characterizing relevant
nominal classes of each language (see the discussion on
ARG vs. NON-ARG below). Since Determiners in
English, German, and French make such incomplete
nominals complete, the Determiner definition inherits (i.e.
includes) the definition of Complete. Lexical items in these
languages are defined by multiply inheriting relevant
assertions.
In what follows, we will first describe the specific
linguistic and computational approaches that we employed
to build our first shared grammar. We will then discuss the
grammatical primitives for characterizing general
nominals, adnominal modifiers, agreement, and word order
types, illustrating solutions to specific cross-linguistic
problems. We will end with prospects for further work.
Framework 
Grammatical framework: We use a categorial
unification grammar (CUG) (Wittenburg 1986a; Karttunen
1986; Uszkoreit 1986b). The one described here is a non-
directional categorial system (e.g. Montague 1974;
Schmerling 1983; van Benthem 1986:Ch.7) with a non-
directed functional application rule as the only reduction
rule (i.e., a functor X|Y may combine with adjacent Y in
either direction to build X). Non-directionality allows for
desired flexibility in the shared part of the grammar. A
separate component constrains the linear order of elements
in each language (see Aristar 1988 for motivation).
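The non-directed application rule can be sketched in a few lines of Python. This is an illustration only: categories are plain strings or a hypothetical `Functor` pair, and no feature structures or word order constraints are involved.

```python
from typing import NamedTuple, Union


class Functor(NamedTuple):
    """A functor X|Y: combines with an adjacent Y to build X."""
    result: "Category"
    argument: "Category"


Category = Union[str, Functor]


def apply_nondirected(left, right):
    """Reduce two adjacent categories, trying the functor on either side.

    Returns the resulting category, or None if no reduction applies.
    """
    if isinstance(left, Functor) and left.argument == right:
        return left.result
    if isinstance(right, Functor) and right.argument == left:
        return right.result
    return None


N_MOD = Functor("N", "N")        # N|N
ADPOSITION = Functor("PP", "N")  # PP|N
```

Because the rule itself ignores direction, `apply_nondirected(N_MOD, "N")` and `apply_nondirected("N", N_MOD)` both build N; ruling out one of the two orders is left to the separate ordering component.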
Unification and template inheritance: CUG's lexical
orientation and unification are employed. In the LEXICON of
each language, lexical items are defined to be the
unification of language-specific GRAMMATICAL TEMPLATES
(Shieber 1984, 1986; Flickinger et al. 1985; Pollard & Sag
1987). These language-specific templates, prefixed with
AR(abic), EN(glish), FR(ench), GE(rman), and JA(panese),
are feature structures composed by multiple inheritance
from shared grammatical templates prefixed with SG (for
"Shared Grammar"). SG-templates are themselves
composed by multiple inheritance in a complex
INHERITANCE LATTICE, whose bottom end feeds into
language-specific templates. The CUG parser (MCC's
Astra, Wittenburg 1986b) applies reduction rules to the
feature structures of words in the input string.3 Arabic and
Japanese strings are currently represented in Roman letters
(augmented for Arabic) with spaces between 'words'.4
3 The parser is linked to an independently developed
morphology analyzer (Slocum 1988). This enables each
word to undergo a morphological analysis including a
dictionary look-up of the root morpheme, and to output a
list (or alternative lists) of grammatical template names
that, when their contents are unified, produce a single
feature structure (or more than one if the word is
ambiguous) for that particular token word.
4 If we were to process Japanese texts directly, the system
would have to perform morphological and syntactic
analyses simultaneously since there are no explicit word
boundaries. (This is one of the strong motivations for our
recent movement toward building a new CUG-based
morphology system.)
Present linguistic coverage 
Simple nominals: The present linguistic coverage is
the syntax of SIMPLE NOMINALS: nouns and nominal
expressions with lexical or phrasal modifiers such as
attributive adjectives (e.g. long), demonstratives (e.g. this),
articles (e.g. the), quantifiers (e.g. all), numerals (e.g.
three), genitives (e.g. of the Sun), and pp-modifiers (e.g. in
the ocean). Complex nominals including conjunctions,
derived nominals, gerunds, nominal compounds, and
relative clause modification have not been handled yet.
Data analysis: We first analyzed a data chart of simple
nominals in each language. The chart focused on the
syntactic well-formedness of nominal expressions, in
particular, the order and dispensability of elements when
the nominal expression acts as an argument (e.g. subject,
object) to a verb or an adposition (i.e. preposition or
postposition).
Shared templates overview 
By design, the SG-LATTICE captures shared grammatical
features in the given set of languages, whether they are due
to universal, typological, genetic, or areal bases. As our
research proceeded, we observed an atomization process 
whereby more and more grammatical properties were 
distinguished. This was because certain grammatical 
characterizations that seemed most natural for some 
language(s) were only partially relevant to others, which 
forced us to break them down into smaller parts so that 
other languages can use only the relevant parts. 
Modules in the SG-lattice: As the shared templates
underwent atomization, we created sublattices
corresponding to independent grammatical modules so that
a grammar writer can make a language-specific
combination of shared templates by consciously selecting
one or more from each group. The existing subgroups are:
(i) categorial grammar categories (the theory-dependent 
aspect of the shared grammar), (ii) common syntactic 
categories (theory-independent linguistic notions), (iii) 
grammatical agreement (to handle grammatical agreement 
within nominals), (iv) reference types (semantic features of 
the nominals, e.g. definite, indefinite, specific), (v)
determiner types (to handle co-occurrence and order
restrictions among determiners), and (vi) attributive
modifier types (to handle order restrictions among 
attributive modifiers). We will focus on (i)-(iii) in this 
paper. 
Kinds of SG-templates: SG-templates as they exist
fall under the following types. The most general distinction
can be made between ATOMIC and COMPOSITE templates.
Atomic templates inherit from no other template. They
result from the atomization process, and are primitive parts
that a grammar writer can put together to create more
complex templates. A composite template inherits from at
least one other, to which a partial structure defined for
itself may be added. We may also distinguish between
UTILITY and SUBSTANTIVE templates. Utility templates
contribute integral parts of categorial grammar categories
such as how many arguments they need to combine with:
none for a BASIC CATEGORY, and one or more for a
FUNCTOR CATEGORY. Substantive templates supply
grammatical categories and features expressed in terms of
various linguistic notions. Specific examples are discussed
below.
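The two distinctions can be pictured with a small Python sketch. The `Template` class, the `atoms` helper, and the particular inheritance links in `LATTICE` are hypothetical; only the template names and the convention that "%" marks utility templates and "$" marks substantive ones are taken from the paper's figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    parents: tuple = ()

    @property
    def is_atomic(self):
        # Atomic templates inherit from no other template.
        return not self.parents

    @property
    def kind(self):
        if self.name.startswith("%"):
            return "utility"
        if self.name.startswith("$"):
            return "substantive"
        return "language-specific"


def atoms(name, lattice):
    """Collect the atomic templates that a template is built from."""
    template = lattice[name]
    if template.is_atomic:
        return {name}
    found = set()
    for parent in template.parents:
        found |= atoms(parent, lattice)
    return found


# Illustrative fragment; the links are assumed, not copied from the system.
LATTICE = {
    t.name: t
    for t in [
        Template("%SG-NO-ARGUMENTS"),
        Template("$SG-LEX"),
        Template("%SG-WORD-FEATS-ARE-TOP-FEATS"),
        Template("$SG-N", ("$SG-LEX", "%SG-WORD-FEATS-ARE-TOP-FEATS")),
        Template("JA-N", ("$SG-N", "%SG-NO-ARGUMENTS")),
    ]
}
```

Under these assumptions `atoms("JA-N", LATTICE)` lists the primitive parts from which the Japanese nominal template is assembled.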
Highlights of shared grammatical atoms 
The basic graph structure 
Each word must be associated with a complete CUG
feature structure. The current implementation uses a
matrix notation for ACYCLIC DIRECTED GRAPH. See Figure
2:
[result: [cat: [ ]          <- the syntactic type of α
          index: [ ]        <- relative linear position of α
          agr: [ ]          <- grammatical agreement features of α (optional)
          feats: [ ]        <- pragmatic agreement features of α
          type: [ ]         <- the functional type of α (see below)
          elements: [ ]     <- elements within α
          order: [ ]]       <- order of elements (see below)
 arguments: [ ]]            <- arguments sought (see below)
Figure 2. The notation for a word whose resulting structure is α
A category is either SATURATED (looking for no
argument) or UNSATURATED (needing to combine with one
or more arguments). It is saturated when the value of
ARGUMENTS is 'closed' with symbol #. An unsaturated
category may seek one or more arguments, each of which
is either unspecified ([ ]) or typed (e.g. [cat: N]). Overall
saturation is sought in parsing. The parser assigns index
numbers to words in the input string from left to right, and
coindexes corresponding substructures under ELEMENTS.
The ELEMENTS component currently has A for the word
for which this structure is defined, B for the first argument,
and C for the second argument. These labels simply flag
PATHS for accessing particular elements. There can be any
number of order-relevant labels corresponding to an
element. These labels, with coindices with respective
elements, are in the ORDER component, which is subject
to the Word Order Constraint (discussed later). TYPE is
the slot for assigning the pseudo-functional category ARG 
or NON-ARG that we found significant in the present 
cross-linguistic treatment of nominals (see below). 
AGR(eement) and FEATS subgraphs contain grammatical 
and pragmatic agreement features, respectively (discussed later). 
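As a rough illustration of saturation, the sketch below counts the arguments a category still seeks. It assumes that ARGUMENTS is a first/rest list closed by "#", as the templates in the figures suggest; the dictionary encoding and the two sample entries are hypothetical.

```python
CLOSED = "#"


def is_saturated(category):
    """A category is saturated when ARGUMENTS is closed with '#'."""
    return category.get("arguments") == CLOSED


def arguments_sought(category):
    """Count the arguments listed before the list is closed or runs out."""
    count = 0
    args = category.get("arguments", {})
    while isinstance(args, dict) and "first" in args:
        count += 1
        args = args.get("rest", {})
    return count


# Hypothetical entries: a saturated noun and a one-argument modifier (N|N).
noun = {"result": {"cat": "N"}, "arguments": CLOSED}
adjective = {
    "result": {"cat": "N"},
    "arguments": {"first": {"result": {"cat": "N"}}, "rest": CLOSED},
}
```

Here `noun` is a basic category and `adjective` a functor category seeking one typed argument.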
atomic templates
%SG-NO-ARGUMENTS: [arguments: #]                  <- saturates the category
$SG-LEX: [result: [elements: [a: [lex: [ ]]]]]    <- has a slot for the word form
%SG-WORD-FEATS-ARE-TOP-FEATS:                     <- passes the word's own features to the top
    [result: [feats: <1>
              elements: [a: [feats: 1[ ]]]]]
inheritance of composite templates
[inheritance diagram: %SG-WORD-FEATS-ARE-TOP-FEATS and $SG-LEX above JA-N, EN-N, FR-N, GE-N, AR-N]
Figure 3. General N
A few more remarks about the notation follow. A
value can be either atomic (e.g. N), a disjunction of atomic
values enclosed in curly brackets (e.g. {N P}), or a
complex feature structure. It can also be unspecified ([ ]).
The identity of two or more values is forced by reentrant
structures indicated by coindexing (e.g. 1[ ] and <1>).
Such coreferring value slots automatically point to a single
data structure entered through any one of the slots.
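Reentrancy can be mimicked in Python by letting two paths refer to one mutable object. The sketch below builds the structure of %SG-WORD-FEATS-ARE-TOP-FEATS this way; the dictionary encoding and the function name are assumptions made for illustration.

```python
def word_feats_are_top_feats():
    """Build %SG-WORD-FEATS-ARE-TOP-FEATS with one shared value slot."""
    shared = {}  # the slot written 1[ ] in one place and <1> in the other
    return {"result": {"feats": shared, "elements": {"a": {"feats": shared}}}}


graph = word_feats_are_top_feats()
# Entering information through the word's own slot ...
graph["result"]["elements"]["a"]["feats"]["nbr"] = "pl"
# ... makes it visible at the top level, because both paths reach one object.
top_feats = graph["result"]["feats"]
```

After the assignment, `top_feats` holds the number feature even though it was never written through the top-level path.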
Universal mono-level category N 
Category N: We posit the universal category N for
nominals. Nominals here are those that realize ARGUMENTS
such as subjects and objects. Nominals are more
commonly labeled NP, a phrase typically built around N or
CN (common noun), as in phrase structure NP->DET N as
well as in the categorial grammar characterization of DET
as a functor NP|CN (i.e. combines with CN and builds NP)
(e.g. Ades & Steedman 1982; Wittenburg 1986a). This
BI-LEVEL view of nominals is motivated by facts in western
European languages. In English, for instance, while cat or
white cat cannot fill a subject position, a cat and this cat
can. In contrast, while he can be a subject, it cannot be
modified as *this he or *strange he. This motivates the
following category assignments with a constraint that only
NPs can be arguments: cat is CN, he is NP, a and this are
NP/CN, and white and strange are CN/CN. This, however,
requires that plurals and mass nouns be CN and NP at the
same time since cats, gold, white cats, white gold, these
cats, and this gold can all be arguments. The count/mass
distinction is also often blurred since a singular count noun
like cat may be used as a mass noun referring to the meat
of the cat, and a mass noun like gold may be used as a
singular count noun referring to a UNIT of gold or a KIND of
gold (see e.g. Bach 1986). The boundary between NP and
CN is at best FUZZY.
When we turn to other languages, the basis for the
bi-level view vanishes. In Japanese, for instance, neko 'cat'
can be an argument on its own, and pronoun kare 'he' can
be modified as in ano kare 'that he' and okasina kare
'strange he'. In short, there is no basic syntactic difference
among count nouns, pronouns, and mass nouns (and no
singular/plural distinction on a 'count' noun). All of them
behave like plural and mass nouns in English. This
supports a mono-level view of nominals, which we intend
to capture with category N. Figure 3 shows the SG-
templates relevant to the most general characterization of N
in each language. SG-templates in the following
illustrations are marked as follows: atomic templates SG-x
(boldface), utility templates %SG-x, and substantive
templates $SG-x.
At the most general level, the basic nominals in
German (GE-N) and Arabic (AR-N) must be unsaturated
because genitive-inflected Ns may take arguments. The
basic nominals in Japanese (JA-N), English (EN-N), and
French (FR-N), on the other hand, are basic categories that
are saturated.5 In addition, all but JA-N inherit relevant
AGR(eement) templates (see below). Crucially, note that
what looks like a reasonable characterization of N in each
language actually consists of a particular selection from the
common set of primitives.
ARGUMENT and NON-ARGUMENT: We posit a
pseudo-functional level of description in terms of
ARG(ument) and NON-ARG for category N instead of the
category-level distinction between NP and CN. ARG may
function as an argument alone, and NON-ARG cannot.
5 Note that the English possessive marker 's is not treated as
an inflection here.
NON-ARG becomes ARG only by being combined with a
certain modifier or by undergoing a semantic change (e.g.
massifying). In this view, the ARG/NON-ARG distinction
is grounded on a complex interaction of morphology,
semantics, and syntax.
In English and German, singular count nouns (e.g. tree,
Baum) are NON-ARG while plurals, mass (singular)
nouns, proper names, and pronouns are ARG. The NON-
ARG nouns become 'complete' ARG nominals either by
being modified with determiners or by changing into mass
nouns (typically changing an object reference into a
property/substance reference, e.g., I used apple in my
pie.).6 In French, all forms of common nouns (i.e. singular,
plural, and mass) are NON-ARG, in need of determiners to
become ARG (e.g. J'ai vu *arbres/des arbres 'I saw trees';
*Amour/L'amour est délicat 'Love is delicate').
In Japanese, there are few NON-ARG nouns (e.g., kata
'person' (HONORIFIC)), which can become ARG with
any modifier such as a relative clause or an adjective (e.g.
ziyuuna kata 'free person (HON.)').7 In Arabic, the
morphological distinction of nouns between ANNEXED vs.
UNANNEXED corresponds to NON-ARG and ARG statuses,
respectively.8 For instance, the unannexed form qiTTa:ni
CAT-DUAL.NOM-UNANNEX 'two cats' may occur as subject
alone whereas the annexed form qiTTa: CAT-DUAL.NOM
cannot. The latter must be modified with a noun-based
modifier such as a genitive phrase, and this modifier must
be unannexed (e.g. with rajulin MAN-GEN.UNANNEX: qiTTa:
rajulin 'man's two cats'). These facts in Japanese and
Arabic show that the proposed functional distinction for
nominals is motivated independently from the syntactic
role of determiners since neither language has modifiers of
category DET that we find in English, French, and German
(more discussed later).
We realize that the ARG/NON-ARG distinction itself
is not a final solution until fine-grained syntactic-semantic
interdependence is fleshed out. For now, we simply posit
pseudo-functional types ARG and NON-ARG, which are
either changed or passed up within the nominal structure:9
$SG-ARG: [result: [type: arg]]
$SG-NON-ARG: [result: [type: non-arg]]
Category N|N: Adnominal modifiers (N-MODs) are
now universally N|N (i.e. a functor that combines with N
and builds N). This includes both determiners and
attributive modifiers. Figure 4 shows the SG-templates for
the basic N-MOD. Different kinds of N-MOD must then
distinguish whether it takes one or two arguments and
whether the resulting nominal with modification is ARG or
NON-ARG. Each distinction is briefly illustrated below.
Two kinds of genitive: Genitive N-MOD functors
may take different numbers of arguments cross-
linguistically. An inflected genitive nominal (e.g. GE:
Marias, AR: rajulin 'man's') takes one, while a genitive
adposition (e.g. EN: of) takes two. The former is captured
with SG-INFLECTIONAL-GENITIVE-CASE-MOD, and
the latter, with SG-PARTICLE-GENITIVE-CASE-MOD.
See Figure 5.
Non-universal determiner category: In the present
approach, DET(erminer) is a modifier type (including
articles, demonstratives, quantifiers, numerals, and
possessives) such that at least one of its members is needed
for making an ARG nominal out of a NON-ARG. The fact
that a nominal with a determiner is always ARG translates
into SG-DET inheriting from SG-ARG among others.
DET is present in English, German, and French, but not in
Japanese or Arabic (or Russian or Chinese).
Demonstratives, quantifiers, numerals, and possessives in
the latter languages do not share the syntactic function of
DET. We suspect that the presence of DET is an areal
property of western European languages.
The sublattice in Figure 6 highlights two aspects of
DET. One is the difference between DET and ADJ(ective)
in English, German, and French with respect to the ARG
status of the resulting nominal. DET always builds ARG,
cancelling whatever the type of the incoming nominal,
whereas ADJ passes the type of the incoming nominal to
the top. The other is the place of demonstratives in relation
to DET. Every language has demonstratives encoding two
or three degrees of speaker proximity (e.g. JAPANESE:
kono (close to the speaker), sono (close to the addressee),
6 In implementation, this latter process may be triggered
by a unary rule COUNT->MASS.
7 They are assigned a NON-ARG category MN (for
'modified noun') separate from the ARG category N. Any
modifier changes it into ARG.
8 ANNEXED here means 'needing to be annexed to a noun-
based modifier', and UNANNEXED means 'completed'.
They are also called NONNUNATED and NUNATED forms,
respectively, in Semitic linguistics (Aristar, personal
communication).
9 An intriguing direction is shown in Krifka's (1987)
categorial grammar treatment. He assigns the singular
count noun in English (i.e. our NON-ARG) an unsaturated
nominal category looking for its numerical value both in
syntax and semantics. The significance of determiners is
here as suppliers of numerical values. How this approach
can be extended to cover the NON-ARG nominals in
Arabic and Japanese (which are not in need of numerical
values per se) remains to be seen. Although it makes sense
to see NON-ARG as a functor looking for more semantic
determination, implementing it would require a reduction
rule for TWO FUNCTORS LOOKING FOR EACH OTHER. The
current system would cause an infinite regression with such
a rule.
atomic templates
%SG-HEAD-FEATS-ARE-TOP-FEATS:    <- passes the features of the second element to the top
    [result: [feats: <1>
              elements: [b: [feats: 1[ ]]]]]
%SG-FIRST-ARGUMENT:              <- slot for the first argument
    [result: [elements: [b: <1>]]
     arguments: [first: [result: 1[ ]]]]
%SG-GET-ORDER:                   <- passes the ORDER content of the first argument to the top
    [result: [order: [[<1>]]]
     arguments: [first: [result: [order: 1[ ]]]]]
$SG-MOD:                         <- for a category-constant functor MOD (see below)
    [result: [cat: 4[ ]
              elements: [a: [index: <1>]
                         b: <3>]
              order: [[mod: 1[ ]] [head: 2[ ]]]]
     arguments: [first: [result: 3[cat: <4>
                                   index: <2>]]]]
inheritance of composite templates
$SG-N (above), %SG-HEAD-FEATS-ARE-TOP-FEATS, %SG-FIRST-ARGUMENT, %SG-GET-ORDER, $SG-MOD
    -> $SG-N-MOD                 <- for the general adnominal modifier
Figure 4. General N-MOD
atomic templates
%SG-ARGUMENTS-REST-SATURATED: [arguments: [rest: #]]
%SG-ONLY-TWO-ARGUMENTS:
    [arguments: [rest: [first: [arguments: #]    <- saturates the second argument
                        rest: #]]]               <- no more than two arguments sought
$SG-GENITIVE:                                    <- assigns the genitive case feature
    [result: [elements: [a: [feats: [case: genitive]]]]]
inheritance of composite templates
$SG-N-MOD (above)
$SG-CASE-MOD:                                    <- for the general case-mod
    [result: [elements: [a: [cat: {P N}          <- P or N
                             feats: [mod-type: case-mod]]]]]
$SG-INFLECTIONAL-CASE-MOD (chooses category N), $SG-GENITIVE, $SG-PARTICLE-CASE-MOD (chooses category P)
$SG-INFLECTIONAL-GENITIVE-CASE-MOD, $SG-PARTICLE-GENITIVE-CASE-MOD
GE-N (above)
GE: Marias   AR: rajulin 'man's'   EN: of   JA: no
Figure 5. Genitive Case MOD
and ano (away from either)), but they belong to the class of
determiners only if the language has DET.
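The contrast between DET, which always builds ARG, and ADJ, which passes the head's type to the top, can be sketched as follows. The toy English lexicon and the function name are hypothetical, and the sketch ignores word order and agreement.

```python
# word -> (kind, inherent type for nouns)
LEXICON = {
    "cat": ("noun", "non-arg"),   # singular count noun
    "cats": ("noun", "arg"),      # plural
    "gold": ("noun", "arg"),      # mass noun
    "a": ("det", None),
    "this": ("det", None),
    "white": ("adj", None),
}


def nominal_type(words):
    """Compute ARG / NON-ARG for a head noun preceded by modifiers."""
    *modifiers, head = words
    kind, current = LEXICON[head]
    if kind != "noun":
        raise ValueError(f"{head!r} is not a noun")
    for word in reversed(modifiers):  # innermost modifier first
        kind, _ = LEXICON[word]
        if kind == "det":
            current = "arg"           # DET cancels the incoming type
        elif kind != "adj":           # ADJ leaves the type unchanged
            raise ValueError(f"{word!r} is not a modifier")
    return current
```

With this lexicon, `nominal_type(["white", "cat"])` stays NON-ARG while `nominal_type(["this", "white", "cat"])` becomes ARG, mirroring the English facts discussed earlier.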
Grammatical agreement (AGR) 
Two kinds of features are distinguished: linguistic
features relevant to GRAMMATICAL AGREEMENT (e.g. French
grammatical gender: une/*un table 'a table' f.), and referent
features relevant to PRAGMATIC AGREEMENT (e.g. using she
to refer to a female person; using appropriate numeral
classifiers for counting objects in Japanese). The former is
under attribute AGR, and the latter is under FEATS. The
N-internal grammatical agreement (AGR) requires that
certain features of the HEAD nominal must agree with
those of MOD. For instance, English has number
agreement (e.g. this book, *those book, *this books).
Among the five languages under consideration, all but
Japanese have AGR.
Although there is cross-linguistic variation in AGR
features, it is not random (Moravcsik 1978). Table 1 sums
up the N-internal AGR features in the four languages. All
AGR features go under attribute AGR so that its presence
simply corresponds to the presence of grammatical
agreement in a language. EN-N, for instance, inherits the
shared template for number agreement, and FR-N
those for number and gender agreements. See below:
$SG-NBR-AGR:
    [result: [agr: [nbr: <1>]
              elements: [a: [feats: [nbr: 1[ ]]]]]]
$SG-GDR-AGR:
    [result: [agr: [gdr: <1>]
              elements: [a: [feats: [gdr: 1[ ]]]]]]
Separating AGR and FEATS enables us to create SG-
templates that impose the most general agreement
constraint regardless of the precise content of agreement
features. Three agreement templates produce the combined
effect of the N-internal agreement constraint: SG-AGR, SG-
AGR-ARGUMENTS, and the composite of the two, SG-
AGR-WITH-ARGUMENTS. See Figure 7.
The reentrancies impose the strict identity of AGR
features: (i) $SG-AGR: between the topmost structure
and the element that the graph is defined for, (ii)
$SG-AGR-ARGUMENTS: between the topmost
structure and the first argument, and (iii) $SG-AGR-
WITH-ARGUMENTS: among all the three. (i) goes into
ALL NOMINALS, passing the nominal's AGR features to the
top level. This is because the AGR features must always be
available at the top level of a nominal so that they can be
used when the nominal is further modified. (ii) goes into
ADNOMINAL MODIFIERS, passing the head nominal's
AGR features to the top level. (iii) goes into ONLY THOSE
ADNOMINAL MODIFIERS SUBJECT TO THE AGR CONSTRAINT,
for instance, demonstratives (e.g. these) but not attributive
adjectives (e.g. small) in English, and both demonstratives
and adjectives in French (see this difference in the above
inheritance).
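The effect of the agreement constraint can be approximated with a short sketch. It replaces reentrancy by an explicit compatibility check, and the English entries and function names are hypothetical; a modifier that is not subject to the constraint simply carries no AGR requirement of its own.

```python
def agree(head_agr, mod_agr):
    """Return the combined AGR features, or None if any feature clashes."""
    combined = dict(head_agr)
    for feature, value in mod_agr.items():
        if combined.setdefault(feature, value) != value:
            return None
    return combined


# Hypothetical English entries; "small" carries no AGR requirement.
AGR = {
    "book": {"nbr": "sg"},
    "books": {"nbr": "pl"},
    "this": {"nbr": "sg"},
    "those": {"nbr": "pl"},
    "small": {},
}


def well_formed(modifier, head):
    """True if the modifier's AGR features are compatible with the head's."""
    return agree(AGR[head], AGR[modifier]) is not None
```

Under these entries this book is accepted, *those book and *this books are rejected, and small combines with either number.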
This is an example where a better language-specific
treatment is obtained from the grammar-sharing
perspective. If only English is handled, one may simply
force the identity of NBR features amidst all kinds of other
features, but in the light of cross-linguistic variation and
invariants, it lends itself naturally to separating out two
kinds of features that correspond to different semantic
interpretation processes.
Category constancy and word order 
typology 
In connecting word order typology and categorial
grammars, we have benefited from work of Greenberg
(1966), Lehmann (1973), Vennemann (1974, 1976, 1981),
Keenan (1979), Flynn (1982), and Hawkins (1984).
Among these, we have a first-cut implementation of
Vennemann's (1981) and Flynn's (1982) view that the
functor types based on CATEGORY CONSTANCY have a
significant relation to the default word order of a language.
A functor is CATEGORY-CONSTANT if it builds the same
category as its argument(s). It is CATEGORY-NON-CONSTANT
if it builds a different category from its argument(s). These
notions are also called ENDOTYPIC and EXOTYPIC,
respectively, by Bar-Hillel (1953), and are crucially used in
Flynn's high-level word order conventions. The
definitions of the notions MOD (modifier), HEAD (head),
FN (function), and ARG (argument) follow:
• MOD is a category-constant functor (X|X) that
combines with HEAD (X). (see above for SG-
MOD)
• FN is a category-non-constant functor (Y|X)
that combines with ARG (X).
category-          category-
constant           non-constant
    X                  Y
   / \                / \
 X|X   X            Y|X   X
  |    |             |    |
 MOD  HEAD           FN   ARG
e.g.
 N|N   N            PP|N  N
 adj   noun         prep  noun
 red   roof         for   Max
There is cross-linguistic evidence that MOD-HEAD
and FN-ARG orders tend to go in opposite directions. This
amounts to two basic word order types in languages:
ORDER TYPE 1: ARG < FN
              MOD < HEAD
ORDER TYPE 2: FN < ARG
              HEAD < MOD
(where < reads as 'precedes')
The N-level default word order in a language is determined
as follows: Every language has ADPOSITIONS (prepositions
and postpositions), universally a category-non-constant
functor PP|N. A postpositional language (i.e. a language
that uses only or predominantly postpositions) then belongs
to TYPE 1 (ARG < FN), and a prepositional language
belongs to TYPE 2 (FN < ARG). In the present case, EN,
GE, FR, and AR are prepositional while JA is
postpositional.
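The mapping from adposition type to default orders can be written down directly. The sketch below is only a restatement of the two order types in Python; the function names and the dictionary layout are assumptions.

```python
def is_category_constant(result, argument):
    """A functor X|X is category-constant (MOD); Y|X is not (FN)."""
    return result == argument


def default_orders(adposition):
    """Map a language's adposition type to its default orders."""
    if adposition == "postposition":  # TYPE 1
        return {"type": 1, "fn_arg": ("ARG", "FN"), "mod_head": ("MOD", "HEAD")}
    if adposition == "preposition":   # TYPE 2
        return {"type": 2, "fn_arg": ("FN", "ARG"), "mod_head": ("HEAD", "MOD")}
    raise ValueError(f"unknown adposition type: {adposition!r}")


# Adposition types of the five languages, as stated in the text.
ADPOSITIONS = {
    "EN": "preposition",
    "GE": "preposition",
    "FR": "preposition",
    "AR": "preposition",
    "JA": "postposition",
}
```

For example, `default_orders(ADPOSITIONS["JA"])` gives MOD before HEAD, and the adposition itself, PP|N, is not category-constant.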
The default MOD order is most faithfully observed in 
inheritance of composite templates
$SG-ARG (see above), %SG-ARGUMENTS-REST-SATURATED (see above), $SG-N-MOD (see above)
$SG-DET
{various templates for constraining the cooccurrence and order inside DET}
$SG-DEM(onstrative)    $SG-ATTRIBUTIVE-ADJECTIVE
$SG-HEAD-TYPE-IS-TOP-TYPE:
    [result: [type: <1>
              elements: [b: [type: 1[ ]]]]]
EN-ATTRIB-ADJ   GE-ATTRIB-ADJ   FR-ATTRIB-ADJ   AR-ATTRIB-ADJ   JA-ATTRIB-ADJ
big             gross           grand           kabiyr          ookii
Figure 6. DEM and ATTRIB-ADJ in relation to DET
          NUMBER:     GENDER:   CASE:             DEFINITE:   ANNEXED:
ARABIC:   SG DU PL    M F       NOM ACC GEN       + -         + -
GERMAN:   SG PL       M F N     NOM ACC GEN DAT
FRENCH:   SG PL       M F
ENGLISH:  SG PL
Table 1. N-internal Agreement Features
atomic templates
$SG-AGR:
    [result: [agr: <1>
              elements: [a: [agr: 1[ ]]]]]
$SG-AGR-ARGUMENTS:
    [result: [agr: <1>]
     arguments: [first: [result: [agr: 1[ ]]]]]
inheritance of composite templates
[inheritance diagram: $SG-GDR-AGR (above) and the agreement templates feeding language-specific N and N-MOD templates; lexical leaves include inu, dogs, chiens, these, small, ces, petits]
Figure 7. AGREEMENT
Arabic (HEAD < MOD) and Japanese (MOD < HEAD), 
with few exceptions. The three European languages, 
however, observe the default order only with 'heavier' (i.e.
phrasal or clausal) modifiers, namely, genitives,
PP-modifiers, and relative clauses. Lexical modifiers,
including numerals, demonstratives, and adjectives (more
or less), go in the opposite ordering. The exceptionally
ordered MODs of the five languages revealed an
implicational chain among modifiers: Numerals <
Demonstratives < Adjectives < Genitives <
Relative clauses. Exceptional order was found with those 
MODs starting from the left end of this hierarchy: JA:
marked use of Numerals, AR: unmarked use of Numerals
and Demonstratives, FR: Numerals, Demonstratives, and
certain uses of Adjectives, EN & GE: Numerals,
Demonstratives, and Adjectives. The generalization is that
a non-default order for a modifier type x implies the
non-default order for other types located to the LEFT of x in the
given chain. What we found supports the general
implicational hierarchy that Hawkins (1984) found in his
cross-linguistic study. We can still maintain, therefore, that
there is such a thing as the default order, with a
qualification that it may be overridden by non-random
subclasses. In our current implementation, we simply
assign another category MOD2 to those 'exceptional'
modifiers in order to free them from the general order
constraint on MOD, which we hope to improve in the
future. 10
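The implicational generalization can be checked mechanically. The following Python sketch is illustrative only: the names are hypothetical, the chain and the per-language sets are copied from the paragraph above, and the marked/unmarked distinction is deliberately ignored (any reported exceptional use counts as non-default).

```python
# Chain of modifier types, left to right, as given in the text.
CHAIN = ["Numerals", "Demonstratives", "Adjectives",
         "Genitives", "Relative clauses"]

# Modifier types reported with non-default order, per language.
# Assumption: marked and unmarked uses are not distinguished here.
EXCEPTIONAL = {
    "JA": {"Numerals"},
    "AR": {"Numerals", "Demonstratives"},
    "FR": {"Numerals", "Demonstratives", "Adjectives"},
    "EN": {"Numerals", "Demonstratives", "Adjectives"},
    "GE": {"Numerals", "Demonstratives", "Adjectives"},
}


def respects_chain(exceptional, chain=CHAIN):
    """True if non-default order for a type x implies non-default
    order for every type to the left of x in the chain."""
    seen_default = False
    for modifier in chain:
        if modifier not in exceptional:
            seen_default = True
        elif seen_default:
            # A type with non-default order sits to the right of a
            # type with default order: the implication is violated.
            return False
    return True


for lang, types in EXCEPTIONAL.items():
    print(lang, respects_chain(types))
```

A hypothetical language with non-default order for Adjectives only would fail the check, which is exactly the pattern the hierarchy rules out.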
Potential problems and solutions 
There are two potential problems in an effort to
develop a shared grammar as described here. One is the
need for serious cooperation among the developers. A
small change in shared templates can always affect
language-specific templates that someone else is working
on. The other problem is the sheer complexity of the
inheritance lattice. Both problems can be most effectively
reduced by a sophisticated editing tool.
Conclusions and future prospects 
We have shown a specific implementation of grammar
sharing using graph unification by inheritance. Although
the case discussed covers only simple nominals in five
languages, we believe that the fundamental process that we
call GRAMMATICAL ATOMIZATION will remain crucial in
developing a shared grammar of any structural complexity
and linguistic coverage. The specific merits of this process
are that (a) it tends to prevent the grammar writer from
implementing treatments that work only for a language or a
language type, and that (b) it provides insights as to how
certain conflated properties in a language actually consist
of smaller independent ones. In the end, when a prototype
shared grammar attains a reasonable scale, we hope to
verify the prediction that it will facilitate adding coverage
for new languages.
The purpose of this work at MCC was to demonstrate
the feasibility of a shared syntactic rule base for dissimilar
languages. We only assumed that languages are used to
convey information contents that can be represented in a
common knowledge base. As the next step, therefore, we
have chosen to connect syntax with 'deeper' levels of
information processing (i.e. semantics, discourse, and
knowledge base) rather than continuing to increase the
syntactic coverage alone. Our current effort is on
developing a blackboard-like system for controlling various
knowledge sources (i.e. morphology, syntax, semantics,
discourse, and a commonsense knowledge base (MCC's
CYC, Lenat and Feigenbaum 1987)). In the future, we
hope to see a shared grammar integrated in a full-blown
interface tool for man-machine communication.
Acknowledgments 
This shared grammar work is a collaborative effort of a
team at MCC. I am especially indebted to my fellow
linguists, Anthony Aristar and Carol Justus, for their
insights into multilingual facts and numerous discussions.
I would also like to thank Rich Cohen, Martha Morgan,
Elaine Rich, Jonathan Slocum, Krystyna Wachowicz, and
Kent Wittenburg for valuable comments and discussions at
various phases of the work. Thanks also go to Al Mendall
and Michael O'Leary for implementing the interface tool,
and to anonymous ACL reviewers for helpful comments. I
am responsible, however, for this particular exposition of
the work and remaining shortcomings.
10 We envision using a data structure of type inheritance
lattice defined for each language to express word order
constraints in order to handle non-default ordering. The
basic idea is that an order constraint stated on a descendant
(e.g. DEM < head) overrides that stated on its ancestor
(e.g. head < MOD). This differs from GPSG's LP rules
(Gazdar & Pullum 1981; Gazdar et al. 1985; Uszkoreit
1986a) in that the order constraints apply to items located
anywhere in the derivational tree structure, not limited to
sister constituents, and the pieces of an item can be
scattered in the tree. It is in spirit similar to LFG's
functional precedence constraints (Kaplan 1988;
Kameyama forthcoming).
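The override idea in this note can be illustrated with a minimal Python sketch. It is not the envisioned data structure itself: the lattice is simplified to a single-parent hierarchy, the category REL and all names are hypothetical, and a constraint is reduced to a flag saying whether the item precedes the head.

```python
# Hypothetical fragment of a per-language type hierarchy
# (single inheritance only, as a simplification of a lattice).
PARENT = {"MOD": None, "DEM": "MOD", "REL": "MOD"}

# Order constraints stated directly on a type.
# True: the item precedes the head; False: the head precedes the item.
PRECEDES_HEAD = {
    "MOD": False,  # head < MOD (default)
    "DEM": True,   # DEM < head (overrides the constraint on MOD)
}


def precedes_head(category):
    """Look up the nearest constraint, walking from the type itself
    up through its ancestors, so a descendant's constraint wins."""
    node = category
    while node is not None:
        if node in PRECEDES_HEAD:
            return PRECEDES_HEAD[node]
        node = PARENT[node]
    raise ValueError("no order constraint found for " + category)


print(precedes_head("DEM"))  # constraint stated on DEM itself
print(precedes_head("REL"))  # constraint inherited from MOD
```

Under this scheme the exceptional modifiers would not need a separate category such as MOD2; they would remain subtypes of MOD and state only the constraint that differs.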
References 
Ades, Anthony and Mark Steedman. 1982. On the order of
words. Linguistics and Philosophy, 4, 517-558.
Aristar, Anthony. 1988. Word-order constraints in a
multilingual categorial grammar. To appear in the
Proceedings for the 12th International Conference on
Computational Linguistics, Budapest.
Bach, Emmon. 1986. The algebra of events. Linguistics
and Philosophy, 9, 5-16.
Bar-Hillel, Y. 1953. A quasi-arithmetical notation for
syntactic description. Language, 29(1), 47-58.
van Benthem, Johan. 1986. Categorial grammar. Essays in
Logical Semantics (Chapter 7). Dordrecht: Reidel,
123-150.
Flickinger, Daniel, Carl Pollard, and Thomas Wasow.
1985. Structure-sharing in lexical representation.
The Proceedings for the 24th Annual Meeting of the
Association for Computational Linguistics.
Flynn, Michael. 1982. A categorial theory of structure
building. In G. Gazdar, G. Pullum, and E. Klein
(eds), Order, Concord, and Constituency. Dordrecht:
Foris.
Gazdar, Gerald and Geoffrey K. Pullum. 1981.
Subcategorization, constituent order, and the notion
'head'. In Moortgat, M., H. v.d. Hulst, and
T. Hoekstra (eds), The Scope of Lexical Rules.
Dordrecht, Holland: Foris, 107-123.
Gazdar, Gerald; Ewan Klein; Geoffrey K. Pullum; and Ivan A. Sag.
1985. Generalized Phrase Structure Grammar.
Oxford, England: Blackwell Publishing and
Cambridge, Mass.: Harvard University Press.
Greenberg, Joseph. 1966. Some universals of grammar 
with particular reference to the order of meaningful 
elements. In J. Greenberg (ed.), Universals of 
Language (2nd edition). Cambridge, Mass.: The MIT 
Press, 73-113. 
Hawkins, John. 1984. Modifier-head or function-argument
relations in phrase structure? The evidence of some
word order universals. Lingua, 63, 107-138.
Kameyama, Megumi. forthcoming. Functional precedence
conditions on overt and zero pronominals.
Manuscript.
Kaplan, Ronald M. 1988. Three seductions of
computational psycholinguistics. In Whitelock,
Peter, Harold Somers, Paul Bennett, Rod Johnson,
and Mary McGee Wood (eds), Linguistic Theory and
Computer Applications. Academic Press.
Karttunen, Lauri. 1986. Radical lexicalism. Paper
presented at the Workshop on Alternative
Conceptions of Phrase Structure at the Summer
Linguistic Institute, New York. [To appear in
Kroch, Anthony et al. (eds), Alternative Conceptions
of Phrase Structure.]
Keenan, Edward. 1979. On surface form and logical form.
Studies in the Linguistic Sciences (special issue),
8(2).
Krifka, Manfred. 1987. Nominal reference and temporal
constitution: towards a semantics of quantity. In
J. Groenendijk, M. Stokhof, and F. Veltman (eds),
Proceedings of the Sixth Amsterdam Colloquium,
University of Amsterdam, Institute for Language,
Logic, and Information, 153-173.
Lehmann, Winfred P. 1973. A structural principle of
language and its implications. Language, 49, 47-66.
Lenat, Douglas B. and Edward A. Feigenbaum. 1987. On
the thresholds of knowledge. Paper presented at the
Workshop on Foundations of AI, MIT, June. Also in
the Proceedings for the International Joint
Conference on Artificial Intelligence, Milan.
Montague, Richard. 1974. The proper treatment of
quantification in English. In Rich Thomason (ed.),
Formal Philosophy: Selected Papers of Richard
Montague. New Haven: Yale, 247-279.
Moravcsik, Edith. 1978. Agreement. In J. H. Greenberg et
al. (eds), Universals of Human Language, Vol. 3.
Stanford: Stanford University Press.
Pollard, Carl and Ivan Sag. 1987. Head-driven Phrase
Structure Grammar. The course material for the
Linguistic Institute at Stanford University.
Schmerling, Susan. 1983. Two theories of syntactic
categories. Linguistics and Philosophy, 6, 393-421.
Shieber, Stuart. 1984. The design of a computer language
for linguistic information. The Proceedings for the
10th International Conference on Computational
Linguistics, 362-366.
Shieber, Stuart. 1986. An Introduction to Unification-based
Approaches to Grammar. CSLI Lecture Notes 4.
Stanford: CSLI. (available from the University of
Chicago Press)
Slocum, Jonathan. 1988. Morphological processing in the
Nabu system. In the Proceedings for the 2nd
Conference on Applied Natural Language
Processing. ACL.
Slocum, Jonathan and Carol Justus. 1985. Transportability to other
languages: the natural language processing project in
the AI program at MCC. ACM Transactions on
Office Information Systems, 3(2), 204-230.
Uszkoreit, Hans. 1986a. Constraints on order. Stanford, CA:
CSLI Report No. CSLI-86-46.
Uszkoreit, Hans. 1986b. Categorial unification grammars. The
Proceedings for the 11th International Conference on
Computational Linguistics, 187-194.
Vennemann, Theo. 1974. Topics, subjects and word order:
From SXV to SVX via TVX. In J. M. Anderson and
C. Jones (eds), Historical Linguistics, I. Amsterdam:
North-Holland, 339-376.
Vennemann, Theo. 1976. Categorial grammar and the order of
meaningful elements. In A. Juilland (ed.), Linguistic
studies offered to Joseph Greenberg on the occasion
of his sixtieth birthday. California: Saratoga,
615-634.
Vennemann, Theo. 1981. Typology, universals and change of language.
Paper presented at the International Conference on
Historical Syntax, Poznan.
Vennemann, Theo and Ray Harlow. 1977. Categorial grammar and
consistent basic VX serialization. Theoretical
Linguistics, 4(3), 227-254.
Wittenburg, Kent. 1986a. Natural language processing with
combinatory categorial grammar in a
graph-unification-based formalism. Doctoral Dissertation,
University of Texas at Austin.
Wittenburg, Kent. 1986b. A parser for portable NL interfaces using
graph-unification-based grammars. The Proceedings
for the 5th National Conference on Artificial
Intelligence, 1053-1058.
