An Efficient Implementation of PATR 
for Categorial Unification Grammar 
Todd Yampol 
Stanford University 
Lauri Karttunen 
Xerox PARC and CSLI 
1 Introduction 
This paper describes C-PATR, a new C
implementation of the PATR-II formalism \[4, 5\]
for unification-based grammars. It includes
innovations that originate with a project for
developing an efficient translator from English
to first-order logic \[2\], most notably the
extension of the standard unification algorithm
to list values \[section 3\]. In addition to
the unifier and a chart parser tuned for
categorial grammars, C-PATR contains a set of
tools for grammar development. These tools
include facilities for hierarchical lexicon
design and interactive grammar debugging
\[section 4\].
2 Grammar Formalism
2.1 PATR-II as implemented
in C-PATR
PATR-II is a formalism for describing grammars
in terms of feature structures. C-PATR supports
two equivalent notational systems for
representing feature structures: path equations
and attribute-value matrices. Path equations
can be used to define a hierarchical system of
templates \[section 4\] that encode linguistic
generalizations. Internally, feature structures
are represented as directed graphs (DGs).
PATR-style feature structures are capable of
describing a wide variety of unification-based
grammars. The present version of C-PATR,
however, is designed to support only pure
categorial grammars. It does not support
explicit phrase structure rules, so C-PATR is
not an exhaustive implementation of PATR.
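To make the relation between path equations and directed graphs concrete, the following is a minimal sketch in Python rather than C. It is not C-PATR's actual code: the names Node, deref, follow, merge, and compile_equations are hypothetical, and the forwarding-pointer technique is one common way to let two paths share a value. A tuple on the right-hand side stands for a path, and a string stands for an atom.

```python
class Node:
    """One vertex of a directed graph (DG) encoding a feature structure."""
    def __init__(self, atom=None):
        self.arcs = {}        # attribute name -> Node
        self.atom = atom      # atomic value, or None for a complex/empty node
        self.forward = None   # set once this node has been merged into another

def deref(node):
    """Follow forwarding pointers to the node's current representative."""
    while node.forward is not None:
        node = node.forward
    return node

def follow(root, path):
    """Walk `path` from `root`, creating empty nodes as needed."""
    node = deref(root)
    for attr in path:
        if node.atom is not None:
            raise ValueError("cannot extend an atomic value")
        if attr not in node.arcs:
            node.arcs[attr] = Node()
        node = deref(node.arcs[attr])
    return node

def merge(a, b):
    """Destructively unify two nodes so that they become one."""
    a, b = deref(a), deref(b)
    if a is b:
        return
    if a.atom is not None and b.atom is not None and a.atom != b.atom:
        raise ValueError("conflicting atoms: %s vs %s" % (a.atom, b.atom))
    if (a.atom is not None and b.arcs) or (b.atom is not None and a.arcs):
        raise ValueError("atomic value clashes with complex value")
    b.forward = a
    if a.atom is None:
        a.atom = b.atom
    for attr, child in b.arcs.items():
        if attr in a.arcs:
            merge(a.arcs[attr], child)
        else:
            a.arcs[attr] = child

def compile_equations(equations):
    """Build a DG from (path, value) pairs; a tuple value is another path."""
    root = Node()
    for path, value in equations:
        target = follow(root, path)
        other = follow(root, value) if isinstance(value, tuple) else Node(value)
        merge(target, other)
    return root

# <argument> = NONE.  <semantics pred> = <lex>.  <lex> = yomu.
dg = compile_equations([
    (("argument",), "NONE"),
    (("semantics", "pred"), ("lex",)),
    (("lex",), "yomu"),
])
```

After compilation, the paths <semantics pred> and <lex> lead to the same node, so the atom assigned through one path is visible through the other.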
2.2 Categorial grammars as feature 
structures 
A categorial grammar represents syntactic 
relations in a completely lexical fashion, i.e. 
without explicit phrase structure rules. Lex- 
ical items belong to basic or functor cate- 
gories. A basic category is inert, in that it 
does not seek to combine with other cate-
gories. Functor categories perform the bulk
of the work by actively seeking to combine 
with other categories. A functor category 
specifies the category of its argument, a direc- 
tion in which to search for the argument, and 
the category of the result that is produced by 
applying the functor to its argument. With 
only this simple machinery, it is possible to 
describe a wide range of syntactic phenom- 
ena. 
In C-PATR, basic categories are those with 
NONE as the value of the argument at- 
tribute. (NONE is a regular atomic value 
that is given special status by the parser.) 
Functor categories must have values speci- 
fied for the argument, direction, and result 
attributes (see Figure 1). 
The parsing algorithm manages the forma- 
tion of constituents through the application 
of functors to their arguments \[see section 
3\]. The argument and result attributes can 
contain information other than simple cate- 
gory designations. For example, the sample 
grammar in the appendix uses these slots to 
place constraints on the argument, to pass in- 
formation from the argument to the functor, 
and to construct a semantic representation. 
Noun:
\[cat: N
 argument: NONE\]
V-intrans:
\[argument: \[cat: NP\]
 direction: left
 result: \[cat: S\]\]
Figure 1: Traditional categorial descriptions
of Noun (basic) and V-intrans (a functor)
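The distinction between basic and functor categories can be illustrated with a small Python sketch. Nested dictionaries stand in for feature structures here; this encoding and the helper names is_basic and is_functor are assumptions made for illustration and do not reflect C-PATR's internal graph representation.

```python
# The two descriptions of Figure 1, encoded as nested dicts (illustrative only).
NOUN = {"cat": "N", "argument": "NONE"}
V_INTRANS = {
    "argument": {"cat": "NP"},
    "direction": "left",
    "result": {"cat": "S"},
}

def is_basic(fs):
    """A basic category has the atom NONE as the value of `argument`."""
    return fs.get("argument") == "NONE"

def is_functor(fs):
    """A functor must specify argument, direction, and result."""
    return (not is_basic(fs)
            and all(key in fs for key in ("argument", "direction", "result")))
```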
3 Unification and Parsing 
Algorithms 
C-PATR offers two varieties of unification. A 
standard unification algorithm (adapted from 
D-PATR \[1\]) is used in creating the internal 
representation of a grammar, while a more 
complex algorithm featuring list unification 
\[see below\] is employed by the parser. The 
parser itself is a fairly standard active chart 
parser (also adapted from D-PATR). 
3.1 Optimizing parsing 
and unification 
Function application is the only compositional
technique used by C-PATR's parser. More powerful
techniques such as functional composition and
type-raising are not used. In parsing a
non-trivial sentence, hundreds of unifications
are attempted, so the data types and algorithms
that C-PATR employs during unification must be
optimized in order to achieve efficient parsing.
To allow quick comparisons while keeping symbol
names readily available, a symbol in C-PATR is
the location in memory of its print name. Print
names are maintained on a letter tree in which
each unique symbol name has exactly one entry,
so two symbols are equal exactly when their
addresses are equal.
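The idea of interning symbols on a letter tree can be sketched as follows. This is a Python analogue under stated assumptions, not the C implementation: object identity plays the role of a memory address, and the names Symbol, LetterTree, and intern are hypothetical.

```python
class Symbol:
    """A unique symbol; its identity stands in for a memory address."""
    __slots__ = ("name",)

    def __init__(self, name):
        self.name = name

class LetterTree:
    """A trie with one branch per letter and at most one Symbol per name."""
    def __init__(self):
        self.children = {}   # letter -> LetterTree
        self.symbol = None   # the Symbol whose name ends at this node, if any

    def intern(self, name):
        """Return the single Symbol for `name`, creating it on first use."""
        node = self
        for letter in name:
            child = node.children.get(letter)
            if child is None:
                child = node.children[letter] = LetterTree()
            node = child
        if node.symbol is None:
            node.symbol = Symbol(name)
        return node.symbol

table = LetterTree()
np_first = table.intern("NP")
np_again = table.intern("NP")
noun = table.intern("N")
```

Because every occurrence of a name yields the same object, testing whether two atomic values match is a single identity comparison (`np_first is np_again`) rather than a string comparison, and the print name remains available through the symbol itself.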
3.2 List unification 
Merging partial information by unification is 
not sufficient for the description of all the 
correspondences between syntactic and se- 
mantic representation. A case in point is 
the semantics of conjoined noun phrases \[2\]. 
An appropriate semantic representation for a 
sentence like b and c are small is a conjoined
formula, small(b) ∧ small(c). Such representations
cannot be derived by pure unification
because two instances of the logical predi- 
cate small with different arguments must be 
produced from a single instance of the word 
small. The same difficulty arises with re- 
ciprocal pronouns (each other) and numeral 
determiners. C-PATR solves this problem 
by extending unification to list values, with 
an effect that is similar to abstraction and 
lambda conversion in logic. For example, a 
conjoined noun phrase, such as b and c, may 
require that the verb phrase it combines with 
has a list-valued semantic representation. If 
the verb phrase, such as are small, is not of 
that type, the unifier simply coerces the ar- 
gument to a list value thereby producing two 
copies of its semantic translation. 
The algorithm for list unification is quite
straightforward. (1) Two lists can be unified
if they have the same number of elements and
each corresponding pair of elements is
unifiable. (2) Two lists of unequal length are
not unifiable. (3) To unify a list of length n
with a simple DG (a non-list), coerce the
non-list into a list by making n copies of it
and unifying each copy with the corresponding
element of the list. (4) If any single
sub-unification fails, the whole unification
fails. In our system, list values are
represented as feature structures using the
special attributes first and rest (analogous to
CAR and CDR in Lisp).
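The four rules above can be sketched in Python as follows. This is a simplified illustration rather than C-PATR's unifier: it works non-destructively on nested dictionaries, ignores shared (reentrant) substructure, and assumes that every list is terminated by an atom called END, a name chosen here because the text does not specify an end-of-list marker. The function names are likewise hypothetical.

```python
import copy

END = "END"  # hypothetical end-of-list atom; the paper does not name one

class UnificationFailure(Exception):
    pass

def make_list(items):
    """Encode a Python list with the attributes first and rest."""
    fs = END
    for item in reversed(items):
        fs = {"first": item, "rest": fs}
    return fs

def is_list(fs):
    return fs == END or (isinstance(fs, dict) and set(fs) == {"first", "rest"})

def unify(a, b):
    """Return the unification of a and b, or raise UnificationFailure."""
    if a == {}:                      # an empty structure unifies with anything
        return b
    if b == {}:
        return a
    if is_list(a) and is_list(b):    # rules (1) and (2)
        return unify_lists(a, b)
    if is_list(a):                   # rule (3)
        return coerce(a, b)
    if is_list(b):
        return coerce(b, a)
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, val in b.items():
            out[key] = unify(out[key], val) if key in out else val
        return out
    if a == b:                       # identical atoms
        return a
    raise UnificationFailure("%r does not unify with %r" % (a, b))

def unify_lists(a, b):
    if a == END or b == END:
        if a == b:
            return END
        raise UnificationFailure("lists of unequal length")
    return {"first": unify(a["first"], b["first"]),
            "rest": unify(a["rest"], b["rest"])}

def coerce(lst, non_list):
    """Unify each element of an END-terminated list with a copy of non_list."""
    if lst == END:
        return END
    return {"first": unify(lst["first"], copy.deepcopy(non_list)),
            "rest": coerce(lst["rest"], non_list)}

# "b and c" expects a two-element semantics; "are small" supplies one predicate.
np_expects = make_list([{"arg": "b"}, {"arg": "c"}])
vp_semantics = {"pred": "small"}
conjoined = unify(np_expects, vp_semantics)
```

In this example the single predicate small is copied once per list element, yielding one structure with arg b and one with arg c, which corresponds to the conjoined formula discussed above. Rule (4) falls out of the exception: any failing sub-unification aborts the whole call.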
3.3 Chart Parser 
C-PATR's chart parser is a simplified version
of the general chart parsing algorithm. In a
categorial grammar, all constituents are formed
from two pieces (a functor and an argument), so
the parser need only consider binary rules.
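Because every constituent is built from a functor and an adjacent argument, the combination step can be sketched compactly. The Python below is an illustration only. It fills a table bottom-up in the CKY style instead of using the active chart parser that C-PATR adapts from D-PATR, it uses a plain unifier over nested dictionaries without shared structure, and the lexicon entry for sleeps is a hypothetical example modelled on V-intrans in Figure 1.

```python
def unify(a, b):
    """Plain unification over nested dicts and atoms; None signals failure."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, val in b.items():
            if key in out:
                sub = unify(out[key], val)
                if sub is None:
                    return None
                out[key] = sub
            else:
                out[key] = val
        return out
    return a if a == b else None

def apply_functor(functor, arg, side):
    """Apply `functor` to `arg`, which lies on `side` ('left' or 'right') of it."""
    if functor.get("argument", "NONE") == "NONE":
        return None                       # basic categories are inert
    if functor.get("direction") != side or "result" not in functor:
        return None
    if unify(functor["argument"], arg) is None:
        return None
    return functor["result"]

def parse(words, lexicon):
    """Return every analysis spanning the whole input, using binary steps only."""
    n = len(words)
    chart = {(i, i + 1): [lexicon[w]] for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            cell = chart.setdefault((start, end), [])
            for mid in range(start + 1, end):
                for left in chart.get((start, mid), []):
                    for right in chart.get((mid, end), []):
                        for result in (apply_functor(left, right, "right"),
                                       apply_functor(right, left, "left")):
                            if result is not None:
                                cell.append(result)
    return chart.get((0, n), [])

# Hypothetical two-word lexicon in the style of Figure 1.
LEXICON = {
    "john": {"cat": "NP", "argument": "NONE"},
    "sleeps": {"argument": {"cat": "NP"}, "direction": "left",
               "result": {"cat": "S", "argument": "NONE"}},
}
sentence = parse(["john", "sleeps"], LEXICON)
```

Since the result here is returned unchanged, information cannot flow from the argument into the result as it does in the sample grammar; that requires the shared graph nodes of a real DG representation.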
The parser includes a subsumption filter 
\[1\]. Just before an edge is added to the 
chart, the filter checks if there are any iden- 
tical edges spanning the same nodes as the 
candidate edge. If there are any such edges, 
then the duplicate edge is not placed on the 
chart. Subsumption checking eliminates redundant
analyses and improves parsing efficiency for
grammars that have many different ways to
reach the same analysis. When a
more complete parsing record is desired, the 
subsumption filter can be toggled off. 
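A filter of this kind might look like the following Python sketch. It is an assumption-laden illustration: edges are stored as (start, end, feature structure) triples, feature structures are nested dictionaries without shared structure, and a candidate is rejected when an existing edge over the same span subsumes it. Identical edges are the special case described in the text; rejecting properly subsumed edges as well is a generalization assumed here, and the names Chart and subsumes are hypothetical.

```python
def subsumes(general, specific):
    """True if every piece of information in `general` is also in `specific`."""
    if isinstance(general, dict):
        return (isinstance(specific, dict)
                and all(key in specific and subsumes(val, specific[key])
                        for key, val in general.items()))
    return general == specific

class Chart:
    def __init__(self, use_filter=True):
        self.edges = []              # (start, end, feature structure)
        self.use_filter = use_filter

    def add(self, start, end, fs):
        """Add an edge unless the filter finds it redundant; report success."""
        if self.use_filter:
            for old_start, old_end, old_fs in self.edges:
                if (old_start, old_end) == (start, end) and subsumes(old_fs, fs):
                    return False
        self.edges.append((start, end, fs))
        return True

chart = Chart()
noun_phrase = {"cat": "NP", "argument": "NONE"}
first_added = chart.add(0, 2, noun_phrase)
duplicate_added = chart.add(0, 2, dict(noun_phrase))
```

Constructing the chart with `use_filter=False` corresponds to toggling the filter off to keep a complete parsing record.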
4 Special Features 
4.1 Hierarchical lexicon design 
C-PATR allows the user to specify a grammar in
terms of a hierarchical system of templates.
The grammar is divided into two parts, a set of
templates and a set of lexical entries. Each
template consists of a name (marked by an
@-sign) followed by a set of explicit path
equations and references to other templates
\[see Appendix A\]. The path equations are
compiled into directed graphs. When a template
is referred to within another template
definition, the latter inherits the path
equations of the former. The sample grammar
makes use of template inheritance in the
entries for @Vtrans, @Ga, and @O \[see
Appendix\]. A template can also be used in a
path equation (as in the sample grammar's
entries for @V\Vstem and @Particle) to define
a complex value.
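Both uses of templates, inheritance and complex values, amount to expanding a template into a flat set of path equations. The following Python sketch shows one way to do that. The data layout (a dictionary from template names to lists of items) and the function expand are assumptions for illustration, the templates are abridged from Appendix A, and no check for cyclic references is made.

```python
# Each item is either a template name (inheritance) or a (path, value) pair.
# A tuple value is a path; a string starting with "@" is a template used as
# a complex value; any other string is an atom.
TEMPLATES = {
    "@Basic": [(("argument",), "NONE")],
    "@Functor-left": [(("direction",), "left")],
    "@V": ["@Basic",
           (("cat",), "Vstem"),
           (("semantics", "pred"), ("lex",))],
    "@Vtrans": ["@V",
                (("syntax", "ga"), ("semantics", "agent")),
                (("syntax", "o"), ("semantics", "theme"))],
    "@V\\Vstem": ["@Functor-left",
                  (("cat",), "V\\Vstem"),
                  (("result",), "@Basic"),
                  (("result", "cat"), "V")],
}

def expand(name, templates, prefix=()):
    """Flatten a template into (path, value) equations rooted at `prefix`."""
    equations = []
    for item in templates[name]:
        if isinstance(item, str):                      # inherited template
            equations += expand(item, templates, prefix)
            continue
        path, value = item
        if isinstance(value, str) and value.startswith("@"):
            # Template as a value: its equations apply below this path.
            equations += expand(value, templates, prefix + path)
        else:
            if isinstance(value, tuple):               # path on the right side
                value = prefix + value
            equations.append((prefix + path, value))
    return equations

vtrans = expand("@Vtrans", TEMPLATES)
vstem_suffix = expand("@V\\Vstem", TEMPLATES)
```

Here @Vtrans picks up the equations of @V and, through it, of @Basic, while the equation `<result> = @Basic` in @V\Vstem contributes `<result argument> = NONE`.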
The format of the lexicon file is identical to
that of the template file, except that the
labels for lexical entries do not begin with
@-signs. While a number of path equations
usually constitute the body of a template, a
typical lexical entry contains few explicit
path equations. If a set of templates is well
constructed, the list of template names
mentioned in a lexical entry constitutes a
meaningful high-level description of the word
\[see Appendix B\]. Path equations mentioned in
a lexical entry should describe only the
idiosyncratic properties of the word. The form
of the entry is automatically assigned to the
attribute lex unless specified otherwise.
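The default for lex can be pictured as a final step applied to an entry's path equations. The Python sketch below assumes equations are (path, value) pairs; the function name with_default_lex is hypothetical, and the two entries are taken from Appendix B with their template references omitted.

```python
def with_default_lex(form, equations):
    """Add <lex> = form unless the entry already sets <lex> itself."""
    if any(path == ("lex",) for path, _ in equations):
        return list(equations)
    return list(equations) + [(("lex",), form)]

# "john" has no explicit <lex>; "yom" overrides it with the citation form yomu.
john = with_default_lex("john", [])
yom = with_default_lex("yom", [(("lex",), "yomu")])
```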
4.2 Interactive grammar debugging 
and lexicon compiling 
In designing a grammar, the user specifies 
templates or expanded lexical entries within 
a text file. C-PATR then compiles the text 
into an internal representation for the parser. 
This compilation task has been optimized to 
allow for reasonable interactive grammar de- 
velopment and debugging on small personal 
computers. On a Sun-4, a 100K source
grammar compiles into a 140K binary form in 
5 seconds. On a Mac-II, the same task takes 
30 seconds. To improve the grammar loading 
efficiency on the Macintosh, C-PATR pro- 
vides a facility for pre-compiling the gram- 
mar. The Mac resource file created by pre- 
compilation loads in less than 2 seconds. 
4.3 Services provided by C-PATR 
C-PATR is driven by single character com- 
mands. These are summarized in Figure 2: 
Type a sentence to parse or: 
n to see contents of edge number n 
b to run a batch test 
f to toggle subsumption filter 
l to view lexical entries for a word
m to view a micro-dump of chart 
n to load a new lexicon
o to specify an output file 
p to review phrase that was parsed 
q to quit 
r to toggle result print format
s to view a short dump of chart 
t to view logical translation(s) 
u to unify two arbitrary edges 
v to toggle variable style 
w to list words 
x to view extra long chart dump 
z to zap expanded lexicon to a file 
Figure 2: C-PATR command summary 
5 Conclusion 
C-PATR has advantages in size, speed, and
portability over its predecessors. By choosing
C as our implementation language, we gained in
all three areas. Earlier PATR implementations,
written in Lisp and Prolog, require the high
overhead of an interpreter. C-PATR's 135k of
source code compiles into a 58k stand-alone
application on the Mac, and an 82k stand-alone
application on the Sun-4. C-PATR is an order of
magnitude faster than D-PATR. C-PATR has been
compiled on the Macintosh and on various Unix
systems.
There are currently plans to enhance C-PATR's
existing syntactic component with a two-level
morphological analyzer \[3\]. The sample
grammar's treatment of yonda \[see Appendix\]
is an example of how one might make use of
morphologically analyzed forms.
C-PATR is available through the Center 
for the Study of Language and Information 
at Stanford. 
Acknowledgements 
Thanks to the Center for the Study of Language
and Information and the Symbolic
Systems Program for their generous support 
of this project. Also, thanks to Dorit Ben- 
shalom for offering many valuable sugges- 
tions that directly influenced the design of 
C-PATR. 
Appendix: Grammar for 
a fragment of Japanese 
created in C-PATR 
A Templates for Japanese 
@Basic
  <argument> = NONE.
@Functor-left
  <direction> = left.
@Functor-right
  <direction> = right.
@V
  @Basic
  <cat> = Vstem
  <semantics pred> = <lex>.
@Vtrans
  @V
  <syntax ga> = <semantics agent>
  <syntax o> = <semantics theme>.
@V\Vstem
  @Functor-left
  <cat> = V\Vstem
  <argument cat> = Vstem
  <result> = @Basic
  <result cat> = V
  <result morphology> = <morphology>
  <result syntax> = <argument syntax>
  <result semantics> = <argument semantics>.
@Past <morphology tense> = past.
@Informal <morphology level> = informal.
@Noun
  @Basic
  <cat> = N
  <semantics ind> = <lex>.
@Particle
  @Functor-left
  <cat> = Particle
  <argument cat> = N
  <result cat> = NP
  <result> = @Functor-right
  <result argument cat> = V
  <result result> = @Basic
  <result result cat> = <result argument cat>
  <result result semantics> = <result argument semantics>
  <result result morphology> = <result argument morphology>.
@Ga
  @Particle
  <result argument syntax ga> = <argument semantics ind>
  <result result syntax ga> = filled
  <result result syntax o> = <result argument syntax o>.
@O
  @Particle
  <result argument syntax o> = <argument semantics ind>
  <result result syntax ga> = <result argument syntax ga>
  <result result syntax o> = filled.
B Unexpanded lexical entries 
john @Noun.
hon @Noun.
ga @Ga.
o @O.
yom
  @Vtrans
  <lex> = yomu.
-ta
  @V\Vstem
  @Past
  @Informal.
C Sample expanded entry 
for the particle ga 
\[cat: Particle
 argument: \[cat: N
            semantics: \[ind: #1\]\]
 direction: left
 result: \[cat: NP
          argument: \[cat: V
                     morphology: #2
                     syntax: \[ga: #1
                              o: #3\]
                     semantics: #4\]
          direction: right
          result: \[cat: V
                   morphology: #2
                   syntax: \[ga: filled
                            o: #3\]
                   semantics: #4
                   argument: NONE\]\]
 lex: ga\]
D Sample C-PATR session
Welcome to C-PATR! 
lexicon type: 
1. templates (.tem file) 
2. expanded lexicon (.xlx file) 
-->1
What is the template file? coling.tem 
What is the lexicon file? coling.lex 
Loading attribute ranking ........... done 
- templates - 
#.Basic
#.Functor-left
#.Functor-right
#.V
#.Vtrans
#.V\Vstem
#.Past
#.Informal
#.Noun
#.Particle
#.Ga
#.O
- lexical items - 
john 
hon 
ga 
o
yom 
-ta 
>john ga hon o yom -ta
\[John read a book. Note that yonda has been
morphologically analyzed.\]
john ga hon o yom -ta 
number of parses: 1 
0.100 seconds 
11 edges, 31 dgs, 79 avs 
>m 
\[C-PATR command to list the span of each 
edge\] 
0. john 
1. ga 
2. john ga 
3. hon 
4. o
5. hon o
6. yom
7. -ta
8. yom -ta
9. hon o yom -ta 
10. john ga hon o yom -ta 
>10 
\[C-PATR command to display edge #10, 
which contains the parse\] 
content: 
\[cat:V
 morphology:\[level:informal
             tense:past\]
 syntax:\[ga:filled
         o:filled\]
 semantics:\[pred:yomu
            agent:john
            theme:hon\]
 argument:NONE\]
parse tree: 
V\[NP\[N<john>
     Particle<ga>\]
  V\[NP\[N<hon>
       Particle<o>\]
    V\[Vstem<yom>
      V\Vstem<-ta>\]\]\]
>q 
bye! 

Bibliography 

\[1\] Karttunen, Lauri, D-PATR, A development 
environment for unification-based gram- 
mars, Report No. CSLI-86-81, Center for 
the Study of Language and Information, 
Stanford, California, 1986. 

\[2\] Karttunen, Lauri, Translating from English
to Logic in Tarski's World, In the Proceedings
of ROCLING-II, September 22-24,
Sun Moon Lake, Taiwan, 1989, pp 43-72.

\[3\] Koskenniemi, Kimmo, Two-Level Morphol- 
ogy: A General Computational Model for 
Word-Form Recognition and Production, 
Publications No. 11, Department of Gen- 
eral Linguistics, University of Helsinki, 
Helsinki, Finland, 1983. 

\[4\] Shieber, Stuart, An Intro- 
duction to Unification-based Approaches to 
Grammar, CSLI Lecture Note Series, Vol- 
ume 4, Chicago University Press, Chicago, 
Illinois, 1986. 

\[5\] Shieber, Stuart, Parsing and Type Infer- 
ence for Natural and Computer Languages, 
Technical Note 460, Stanford Research In- 
ternational, Menlo Park, California, 1989. 
