Unification Categorial Grammar: 
A Concise, Extendable Gragamar for Natural Language Processing 
Jonathan CALI)F.R, Ewan KI,E1N and +Henk ZEEVAT 
I/niversity of Edinburgh 
Centre tot Cognitive Science 
2 Buccleuch Place 
Edinbm'gh 
EIt8 9LW 
\]Institut fib' Maschinelle Sprachverarbeitung 
Universit,qt Stuttgm't 
Keplerstrasse 17 
D 7000 Stuttgart 
Abst_~'act ~hrlication Categorial Gtmmnar (t/:G) combines the 
syntactic iu.,;ights of Categoria\[ Grammar with the semantic 
insights of I)i;course Representation Theory. The addition of 
uni/ication to these two frameworks allows a simple account of 
intexaction be,.wcen different linguistic levels within a constrain- 
ing, monostraml theory. The resulting, computationaUy efticient, 
system provides an explicit formal framework for linguistic 
description, widfin which large fi'agments of grammms for French 
aud English have ah'eady been developed. We present the for- 
mai basis of UCG, with i.dependent definitions of well-. 
toxmedness fol syntactic and semantic dimensions. We will also 
focus on the concept of #nodifier within the theory. 
L lntroduetl, on 
Uniiieation C~tcgo~q_al Grammm' (UCG) comlfines the syntactic 
insights of C~tegori~ll Grammar with the semantic insights of 
Discourse Rep,:escntafion Theory (DRT, Kamp 1981). The addi- 
tion of unilicalion (Shiebcr et al. 1983) to these two flameworks 
allows a simple account of intcractiou between different linguistic 
levels. The xesulting, computationally efficient, system provkles 
an explicit tbmml fi'amework fbr linguistic description, within 
which large grammar ti'agnrents lbr French (Baschung et al. 
1987) and F.nglish (Calder, Moens and Jmevat 1986) have already 
been developed. This paper will describe the design of the UCG 
formalism, illustrated by exarnples of grammatical categories and 
rules, l 
UCG embodies seve.d recent trends in linguistics. First, being a 
categorial grar,amar, it is strongly lexicalist, In other words rela- 
tively little Jr, formation is contained in grammar rules. Most 
information odginates in the lexicon. Second, it is strictly 
dechtrative. Unitication is the only operation allowed over gram- 
matical objects;. Third, there is a very close relationship between 
the syntax and semantics of linguistic expressions. 
UCG lies with:ht the family of grammars described by Uszkoreit 
1986 and Kmttunen 1986. tJCG also has close affinities to the 
IIead-Driven Phrase Structure Grammar (I-IPSG) proposed by 
Pollard 1985. The main theoretical difference is that in IIPSG 
welMi~rmedrmss is characterized algorithmically, rather than 
declm'atively as in OC(/. For this reason, we t,ave adopted 
l'ollard's tenn;nology and refer to linguistic expressions as signs. 
A sign retm>;ents a complex of phonological, syntactic and 
semantic tale.nation, each of these linguistic levels having its 
own definitions of well--formedness, 
In U(.'G, we employ three primitive categories: nomts (tmun), sen- 
tences (sent) md noun phrases (ltp) These primitive categories 
i 'IIm work described here is SUl)portexl by the EEC ESPRIT projec 
P393 ACORD: the Cons|ruction and Interrogation of Knowledge Bases asia 
Natural L~mguag(~ Text and Graphics. 
admit further specification by features, so that we can distinguish 
tinite and non-finite sentences, nominative and accusative NPs, 
and so on. Categories are now defined as follows: 
(1) a. 
Any primitive category (together with a syntactic feature 
specification) is a category. 
b. 
If A is a category, and B is a sign, then A/B is a category. 
In a Category of the trmr A/B, we call B the active part of the 
category, and also of the sign as a whole in which A/B occurs as 
category. It will be observed that this definition is just tire 
categorial analog of Pollard's (1985) proposal for subcategoriza- 
tion, according to which phrasal heads ate spccificd for a list of 
signs corresponding to their complements. Likewise, (1) is 
closely related to the standard definition for the set of categories 
of a categorial grammar, 
Within the grammar, we allow not just constant sylnbols like 
sent and rip, but also variables, at each level of representatiou. 
V~wiables allow us to capture tire notion of inconlplete infomaa- 
tion, and a sign which contains variables can be further specitied 
by unilication. The form of unification that we rely on is first- 
order term unification, provided as a basic operation in program- 
ruing languages such as PROLOG. 
This, in essence, is the structure of UCG. We will complicate the 
picture by distinguishing two rules of fnnctional application, and 
by giving more content to the notions of semantics and features. 
2. The Mechanisms of UCt; 
2.1. Structuring Signs 
UCG signs have three basic components, corresponding to their" 
phonology, category and semantics. We will write the most 
unspecified sign as follows: 
(2) w: 
c: 
s 
by which we intend a sign with phonology W, categow C and 
semantics S. (1) and (2) give well-formedness conditions on pos- 
sible instantiations for a sign's category. For the present papm', 
we will assume that a sign's phonology may be simply its 
orthography in tire case of proper names, otherwise a sign's pho- 
nology may be composite, consistiug of variables and ortho- 
graphic constants sep~ated by +. The + operator is understood 
as denoting concatenation. 2 
2 This operation might apt)ear to take us beyond tile bounds of first- 
o~ler unification. However in the cases we will deal with, there are 
equivalent signs which express concatenation by means of PROI.OG difference 
lists. 
83 
2.2. Indexed Language 
The semantic representation language that we use to encode the 
semantics of a sign is called InL (for Indexed Language), and is 
derived from Discourse Representation Theory (cf. Kamp 1981), 
supplemented with a Davidsonian treatment of verb semantics 
(cf. Davidson 1967). The main similarity with the Discourse 
Representation languages lies in tile algebraic structure of InL. 
There are only two connectives for building complex formulas; 
an implication that at the same time introduces universal 
quantification, and a conjunction. 
The language InL differs in one important respect from the DRT 
formalism, and thus earns its name; every formula introduces a 
designated variable called its index. This does not mean that 
(sub)formulas may not introduce other variables, only that the 
index has a special status. The postulation of indices is crucial 
for the treatment of modifying expressions, but it is indepen- 
dently plausible on other grounds. Every sign has an associated 
ontological type, represented by the sort of its index. Subsump- 
tion relations hold between certain sorts; for instance, a index of 
sort singular will unify with an index of sort object to yield an 
index of sort singular. For notational purposes, we use lower 
case alphabetics to represent sorted variables in InL formulas. 
Upper case alphabetics are variables over formttlas. The index of 
an expression appears within square brackets in prenex position. 
(3) gives example translations of some expxessions. 
(3) \[Index\] Formula Expression Sort 
a. \[el WALK(e, x) walk event 
b. Ix\] STUDENT(x) student singular object 
c. \[x\] \[PARK(y),\[x\] \[IN(x,y),MAN(x)\[I 
man in a park singular object 
d. \[m\] BUTlER(m) butter mass object 
e. \[s\] STAY(s, x) stay state 
3. UCG Binary Rules 
We may write UCG rules as simple relations between signs. We 
require two rules, our analogs of forwards and backwards appli- 
cation (FA and BA respectively). Here we follow the PROLOG 
convention that variables srart with an upper case alphabetic. 
(4) (FA) Phonology:Category/Active:Semantics Active 
Phonology:Category: Semantics 
(BA) Active Phonology:Category/Active:Semantics 
$ 
Phonology:Category: Semantics 
TheSe rules state that in the case of function application, the 
resulting category is simply that of the functor with its active 
sign removed; the semantics and phonology of the result are 
those of the functor, thus effecting a very strict kind of Head 
Feature Convention. Note in particular that we view the phono- 
logical, syntactic and semantic functor as always coinciding. 
This has important consequences for the way we treat quantified 
NP's, as we discuss in the next section. Any of the features of 
the resulting sign may have been further instantiated in the pro- 
cess of unification. 
Importantly, (4) states that a functor may place restrictions on 
any of the dimensions of its argument. Likewise it will 
determine the role that the information expressed by its argument 
plays in the resulting expression. A UCG sign thus represents a 
complex of constraints along several dimensions. 
4. 1JCG Signs 
We now give some example UCG signs. 
4.1. Nouns and Adjectives 
(5) student: 
noun: 
\[x\]STUDENT(x) 
(6) cheerful+W: 
noun/(W:noun: Ix\]P): 
\[x\] \[CHEERFUL(x), \[ix\]P\] 
The reader is invited to work out for herself how the signs (5) 
and (6) will combine using the rule of forwards application. 
4.2. Determiners 
Following Montague 1973, we treat quantified NPs as type-raised 
terms. We can however take advantage of the polymorphic 
nature of UCG categories and have a single representation for NPs 
regardless of their syntactic context. In our analysis, the deter- 
miner introduces the type raising. This is the sign that 
corresponds to a. 
(7) W 
(C/(W:C/a+Wl: np\[nom or obj\]:b):\[a\]S)/(Wl:noun:\[b\]R): 
\[a\]\[\[blR, S\] 
More verbosely, this says that a combines first with a noun 
which has phonology W1 and semantic index b. The semantics 
that results from such a combination is a conjunction, the first 
conjunct of which is the semantics of the noun. The second con- 
junct is the semantics of the resulting NP's argument. As the NP 
is type-raised, it has a category of the schematic form: 
(8) C/(C/np) 
That is, a type-raised NP will take as its argument some consti- 
tuent which was itself to combine with a (non-type-raised) NP. 
When fleshed out with values for the other components of a sign, 
it will have the form as shown in (9). Note in particular that, as 
it is the verb that determines linear order, the phonology of the 
resulting expression depends on that of the argument to the NP. 
(9) shows the result of combining (7) and (5) via forward appli- 
cation. 
(9) W 
C/(W:C/a+student: rip\[nora or obj\]:b):\[a\]S: 
\[al\[\[blSTUDENT(b), Sl 
The sign corresponding to every (10) is very similar to that for a, 
the major difference being that every introduces DRT implication, 
notated here with ~. 
(10) W 
(C/(W:C/every+Wl: np\[nom or obj\]:b): \[a\]S)/ON l:noun:\[blR): 
\[sl\[\[b\]R ~ \[a\]S\] 
4.3. Verbs 
The following is the sign for walks: 
(11) W+walks: 
sent\[fin\]/(W:np\[noml:x): 
\[e\]WALK(e, x) 
This will combine with file sign (9) a student to yield (12) of 
which the semantics may be read as: "There is a walking event, 
of which b is the agent, and b is a student". 
(12) a+student+walks: 
sent\[fin\]: 
\[e\]\[\[b\]STUDENT(b), WALKS(e, b)\] 
5. Modifiers in UCG 
We have already seen one category of modifiers, namely adjec- 
tives, in section 4.1. We are able to make more general state- 
ments about modifiers; they all contain instances of the category 
in (13): 
(13) C/(W:C:S) 
84 
Appropriate msldctions on C wilt allow us to describe, for 
instance, the class of VP modifiers such as adverbials and auxi- 
liaries (Cf. Bouma 1988). The close relationship between syntax 
and semantic,,; allow us to give concise formulations of the dis- 
tinctions between intersective, vague and intensional modifiers 
(Kamp 1975). In the first two cases, the semantics of the 
modified expression is conjoined with that of the modifying 
expression. In the vague case, we have to relativize the meaning 
of the modifying expression to that of the modified. In the third 
case, the semantics of the modified expression must be contained 
within the scope of an intensional predicate. The following 
examples illustrate the three cases. 
(14) square+W: 
noun/(W:noun:\[a\]A): 
\[allSQUARE(a), AI 
large+W: 
noun/(W:noun: \[a\]A): 
\[alILARGE(A, a), AI 
fake+W: 
nmm/(W:noun:\[a\]A): 
\[alFAKE(A) 
6. Further ltevelopment of the tlleory 
We have not attempted to give a fully representative list of UCG 
signs here. Elsewhere (Calder, Morns and Zeewtt 1986, Zeevat, 
Klein and Calder 1987), substantial analyses of subcategorization, 
prepositional and adverbial modification, negation, relative 
clauses and sentential .connectives have been developed. We 
have also extended the theory to encompass non-canonical word 
order, using ;t mechanism similm" to the GPSG SLASH (Gazdar 
et al. 1985), and to handle quantificational constraints on ana- 
phora following the analysis of Johnson and Klein 1986. We 
have an eftie:ient implementation of the system which represents 
signs as PROIOG terms, using a different treatment of phonologi- 
cal informalion. The use of templates (Shieber et al. 1983) 
allows us to capture generaliztions about clases of lexical items. 
The compilalion of UCG structures into PROLOG temas is pet- 
formed by a general processor driven by the definitions of well- 
formedness of the dimensions of a sign, allowing compile-time 
typeocbecking of grammars (Calder, Moens and Zeevat 1986). 
The system u:;es a tabular shift-reduce parser. 
7. Further 6evelopments of UCG 
The system described above is deficient in some respects. For 
example, requiring coincidence between the phonological, syntac- 
tic and semantic funetors may be too strict. The problem of 
quantifier scoping is a case in point. Zeevat 1987 suggests relax- 
ing this requirement to allow linear ordering to become the dom- 
inant factor in determining semantic functor-argument relations. 
It seems likely that such a step will also be necessary for certain 
phonological phenomena. Current work is investigating the util- 
ity of associative and commutative unification in this respect. 
in extending UCG to allow u'eammnt of unbounded dependency 
constructions and partially fi'ee word order, heavy use is made of 
uttary rules (Wittenburg 1986). Current work aims to recast the 
notion of unary rule within the framework of paramodular 
unification (Siekmann 1984). 
8. Conclusion 
An attractive feature of UCG is the manner in which different lev- 
els of representation - semantic, syntactic and phonological - are 
built up simultaneously, by the mfiform device of unification. 
There axe, of course, different organizing principles at the 
different levels; conjunction and implication exist at the seman- 
tic level, but not at the syntactic or phonological. Nevertheless, 
the composit:,onal construction of all three levels takes place in 
the same manner, namely by the accretion of constraints on pos- 
sible representations. Although we have said nothing substantive 
about phonology here, it seems plausible, in the light of 
Bach and Wheeler 1981 and Wheeler 1981, that the methodologi- 
cal principles of compositiouality, monotonicity and locality can 
also lead to illuminating analyses in the domain of sound stnm- 
ture. 
UCG is distinctive in the particular theory of semantic representa- 
tion which it espouses. Two iilcidental features of InL may 
obscure its relation to DRT. The first is very minor: our formu- 
las are linear, rather than consisting of "box-ese". The second 
difference is that we appear to make no distinction between the 
set of conditions in it DRS, and the set of discourse markers. In 
fact, this is not the case. A simple recursive definition (similar to 
that for "free wtriable" in predicate logic) suffices to construct 
the cunmlative set of discourse markers associated with a com- 
plex condition from indices within a formula. This definition 
also allows us to capture the constraints on anaphora proposed by 
DRT. These departures from the standard DRT formalism do not 
adversely affect the insights of Kamp's theory, but do offer a 
substantial advantage in allowing a rule-by-rule construction of 
the representations, something which has evaded most other ana- 
lyses in the literature. 
UCG syntax is heavily polymorphic in the sense that the category 
identity of a fim~tion application typically depends on the make- 
up of the argmnent. Thus, the result of applying a type-raised 
NP to a transitive verb phrase is an intransitive verb phrase, 
while exactly the same functor applied to an intransitive verb 
phrase will yield a sentence. Analogously, a prepositional 
moditier applied to a scntence will yield a sentence, while exactly 
the same functor applied to a noun will yield a noun. This 
approach allows us to dramatically simplify the set of categories 
employed by the grammar, while also retaining the fundamental 
insight of standard categorial grammar, namely that expressions 
combine as fnnctor and argument. Such a mode of combination 
treats head-complement relations and hcad-moditier relations as 
special cases, and provides an elegant typology of categories that 
can only be awkwardly mimicked in X-bar syntax. 
Finally, we note one important innovation. Standard ctttegorial 
grammar postulates a functor-argument pair in semantic rcpresen- 
ration which parallels the syntactic constituents; typically, 
lambda-abstraction is required to construct the appropriate flmctor 
expressions in semantics. By contrast, the introduction of signs 
to the right of the categorial slash means that we subsume 
semantic combinatioi* within a generalized fnnctional application, 
and the necessity ~{) consmtcting specialized flmctors in the 
semantics simply disaopears. 

References 

Bach, E. and Wheeler, D. (1981) Monmgue Phonology: A First 
Approximation. In Chao, W. and Wheeler, D. (eds.) University of 
Massachusetts Occasional Papers in Linguistics, Volume 7, pp27-45. 
Distributexl by Gradtmte Linguistics Student Association, University 
of Massachusetts. 

Baschung, K., Bes, G. G., Corluy, A. and Guillotin, T. (1987) Anxiliaries and 
Clitics in French UCG (;rammar. In Proceedings of the Third 
Conference of the European Chapter of the Association for 
Computational Linguistics, 1987. 

Bouma, G. (1988) Modiliers and specifiers in categorial unification grammar. 
Linguistics, 26, 21-46. 

Calder, J., Morns, M. and Zeevat, H. (1986) A UCG Interpreter. ESPRIT 
PROJECT 393 ACORD; Deliverable I'2.6. 

Davidson, D. (1967) Tile logical form of action sentences, in Rescher, N. 
(ed.) Tile Logic of Decision and Action. Pittsburgh: University of 
Pittsburgh Press. 

Gazdar, G., Klein, E., Pullum, G. and Sag, I. (1985) Generalized Phrase 
Structure Grammar. London: Basil Blackwell. 

Johnson, M. and Klein, E. (1986) Discom'se, anaphora and parsing. In 
Proceedings of the llth International Conference on Computational 
Linguistics and the 24th Annual Meeting of the Association for 
Computational Linguistics, Institut fuer Kommunikationsforschung 
und Phonetik, Bonn University, Bonn, August, 1986, pp669-675. 

Kamp, J. A. W. (1975) Two Theories about. Adjectives. In Keenan, E. L. 
(ed.) Formal Semantics of Natural Language: Papers from a 
colloquium sponsored by King's College Research Centre, 
Cambridge, pp123-155. Cambridge: Cambridge University Press. 

Kamp, H.(1981) A theory of truth and semantic representation. In 
Groenendijk, J. A. G., Janssen, T. M. V. and Stokhof, M. B. J. (eds.) 
Formal Methods in the Study of Language, Volume 136, pp277-322. 
Amsterdam: Mathematical Centre Tracts. 

Karttunen, L. (1986) Radical Lexicalism. Report No. CSLI-86-68, Center for 
the Study of Language and Information, December, 1986. Paper 
presented at the Conference on Alternative Conceptions of Phrase 
Structure, July 1986, New York. 

Montague, R. (1973) The protx~r treatment of quantification in ordinary 
English. In Hintikka, J., Moravcsik, J. M. E. and Suppes, P. (eds.) 
Approaches to Natural Language. Dordrecht: D. Reidel. Reprinted 
in R H Thomason (ed.) (1974), Formal Philosophy: Selected Papers 
of Richard Montague, pp247-270. Yale University Press: New 
Haven, Conn. 

Pollard, C. J. (1985) Lectures on HPSG. Unpublished lecture notes, CSLI, 
Stanford University. 

Shieber, S., Uszkoreit, H., Pereira, F. C. N., Robinson, J. J. and Tyson, 
M. (1983) The Formalism and Implementation of PATR-II. In 
Grosz, 13. and Stickel, M. E. (eds.) Research on Interactive 
Acquisition and Use of Knowledge, SRI International, Menlo Park, 
1983, pp39-79. 

Siekmann, J. H. (1984) Universal Unification. In Shostak, R. H. (ed.) 
Proceedings of the Seventh International Conference on Automated 
Deduction, Napa, California, May, 1984, ppl-42. Lecture Notes in 
Computer Science; Springer-Verlag. 

Uszkoreit, H. (1986) Categorial Unification Grammars. In Proceedings of the 
l l th International Conference on Computational Linguistics and the 
24th Annual Meeting of the Association for Computational 
Linguistics, Institut fuer Kommanikationsforschung und Phouetik, 
Bonn University, Bonn, 25-29 August, 1986, pp187-194. Also Center 
for the Study of Language and Information. 

Wheeler, D. (1981) Aspects of a Categorial Theory of Phonology. PhD 
Thesis, Linguistics, University of Massachusetts at Amherst. 
Distributed by Graduate Linguistics Student Association, University 
of Massachusetts. 

Wittenburg, K. W. (1986) Natural Language Parsing with Combinatory 
Categorial Grammar in a Graph-Unification-Based Formalism. PhD 
Thesis, Department of Linguistics, University of Texas. 

Zeevat, H., Klein, E. and Calder, J. (1987) An Introduction to Unification 
Categorial Grammar. In Haddock, N. J., Klein, E. and Morrill, G. 
(eds.) Edinburgh Working Papers in Cognitive Science, Volume 1: 
Categorial Grammar, Unification Grammar, and Parsing. 

Zeevat, H. (1987) Quantifier Scope in Monostratal Grammars. Unpublished 
ms. University of Stuttgart. 
