Features and Values 
Lauri Karttunen 
University of Texas at Austin 
Artificial Intelligence Center 
SRI International 
and 
Center for the Study of Language and Information 
Stanford University 
Abstract 
The paper discusses the linguistic aspects of a new gen- 
eral purpose facility for computing with features. The pro- 
gram was developed in connection with the course I taught 
at the University of Texas in the fall of 1983. It is a general- 
ized and expanded version of a system that Stuart Shieber 
originally designed for the PATR-II project at SRI in the 
spring of 1983 with later modifications by Fernando Pereira 
and me. Like its predecessors, the new Texas version of the 
"DG (directed graph)" package is primarily intended for 
representing morphological and syntactic information but 
it may turn out to be very useful for semantic representa- 
tions too. 
1. Introduction 
Most schools of linguistics use some type of feature no- 
tation in their phonological, morphological, syntactic, and 
semantic descriptions. Although the objects that appear 
in rules and conditions may have atomic names, such as 
"k," "NP," "Subject," and the like, such high-level terms 
typically stand for collections of features. Features, in this 
sense of the word, are usually thought of as attribute-value 
pairs: [person: 1st], [number: sg], although singleton features 
are also admitted in some theories. The values of 
phonological and morphological features are traditionally 
atomic; e.g. 1st, 2nd, 3rd; they are often binary: +, -. 
Most current theories also allow features that have com- 
plex values. A complex value is a collection of features, for 
example: 
    [agreement: [person: 3rd
                 number: sg]]
Lexical Functional Grammar (LFG) [Kaplan and Bresnan, 83], 
Unification Grammar (UG) [Kay, 79], Generalized Phrase 
Structure Grammar (GPSG) [Gazdar and Pullum, 82], among 
others, use complex features. 
Another way to represent feature matrices is to think of 
them as directed graphs where values correspond to nodes 
and attributes to vectors: 
            o
            | agreement
            o
    number / \ person
         sg   3rd
In graphs of this sort, values are reached by traversing 
paths of attribute names. We use angle brackets to mark 
expressions that designate paths. With that convention, 
the above graph can also be represented as a set of equations: 
    <agreement number> = sg
    <agreement person> = 3rd
Such equations also provide a convenient way to ex- 
press conditions on features. This idea lies at the heart of 
UG, LFG, and the PATR-II grammar for English [Shieber 
et al., 83] constructed at SRI. For example, the equation 
<subject agreement> = <predicate agreement> 
states that subject and predicate have the same value for 
agreement. In graph terms, this corresponds to a lattice 
where two vectors point to the same node: 
                o
      subject  / \  predicate
              o   o
    agreement  \ /  agreement
                o
       number  / \  person
             sg   3rd
In a case like this, the values of the two paths have been 
"unified." To represent unification in terms of feature ma- 
trices we need to introduce some new convention to distin- 
guish between identity and mere likeness. Even that would 
not quite suffice because the graph formalism also allows 
unification of values that have not yet been assigned. 
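The difference between identity and mere likeness can be made concrete in a small sketch, assuming again that nodes are Python dictionaries (an encoding chosen only for illustration). Two paths are unified when they lead to one and the same object, not when they lead to equal copies.

```python
# One node reached by two paths: the values have been unified.
shared = {"number": "sg", "person": "3rd"}
unified = {"subject": {"agreement": shared},
           "predicate": {"agreement": shared}}

# Two separate nodes that merely look alike.
alike = {"subject": {"agreement": {"number": "sg", "person": "3rd"}},
         "predicate": {"agreement": {"number": "sg", "person": "3rd"}}}


def same_node(matrix):
    """True only if both agreement paths lead to the very same object."""
    return matrix["subject"]["agreement"] is matrix["predicate"]["agreement"]


# Information added through one path is visible through the other
# only when the node is shared.
unified["subject"]["agreement"]["gender"] = "fem"
alike["subject"]["agreement"]["gender"] = "fem"

print(same_node(unified), same_node(alike))
print(unified["predicate"]["agreement"].get("gender"))
print(alike["predicate"]["agreement"].get("gender"))
```

A shared node may also start out empty, which corresponds to unifying values that have not yet been assigned.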
A third way to view these structures is to think of 
them as partial functions that assign values to attributes 
[Sag et al., 84]. 
2. Unification and Generalization 
Several related grammar formalisms (UG, LFG, PATR-II, 
and GPSG) now exist that are based on a very similar 
conception of features and use unification as their basic op- 
eration. Because feature matrices (lattice nodes) are sets 
of attribute-value pairs, unification is closely related to the 
operation of forming a union of two sets. However, while 
the latter always yields something (at least the null set), 
unification is an operation that may fail or succeed. When 
it fails, no result is produced and the operands remain un- 
changed; when it succeeds, the operands are permanently 
altered in the process. They become the same object. This 
is an important characteristic. The result of unifying three 
or more graphs in pairs with one another does not depend 
on the order in which the operations are performed. They 
all become the same graph at the end. 
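A minimal sketch of the operation, together with the case analysis that follows, is given here. It makes one simplifying assumption that departs from the description above: it is non-destructive and returns a new structure instead of merging the operands into one object. Matrices are nested dictionaries; `unify` and `UnificationFailure` are hypothetical names, and the operands are small illustrative structures.

```python
class UnificationFailure(Exception):
    """Raised when two structures have incompatible values."""


def unify(a, b):
    """Return the unification of two feature structures.

    Atomic values are strings, complex values are dicts. Unlike the
    destructive operation described in the text, this sketch builds a
    new structure and leaves both operands untouched.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for attribute, value in b.items():
            if attribute in result:
                # Attribute appears in both: unify the two values.
                result[attribute] = unify(result[attribute], value)
            else:
                # Attribute appears only in b: copy it over.
                result[attribute] = value
        return result
    if a == b:
        return a
    raise UnificationFailure(f"{a!r} does not unify with {b!r}")


a = {"agreement": {"number": "pl"}}
b = {"agreement": {"person": "3rd"}, "case": "nominative"}
c = {"agreement": {"gender": "fem"}}

print(unify(a, b))
# The order of pairwise unification does not matter.
print(unify(unify(a, b), c) == unify(a, unify(b, c)))
```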
If graphs A and B contain the same attribute but have 
incompatible values for it, they cannot be unified. If A 
and B are compatible, then (Unify A B) contains every 
attribute that appears only in A or only in B with the 
value it has there. If some attribute appears both in A 
and B, then the value of that attribute in (Unify A B) is 
the unification of the two values. For example, 
    A = [agreement: [number: pl]]
    B = [agreement: [person: 3rd]
         case: nominative]
    (Unify A B) = [agreement: [number: pl
                               person: 3rd]
                   case: nominative]
Simple cases of grammatical concord, such as number, 
case and gender agreement between determiners and nouns 
in many languages, can be expressed straightforwardly by 
stating that the values of these features must unify. 
Another useful operation on feature matrices is 
generalization. It is closely related to set intersection. The 
generalization of two simple matrices A and B consists of 
the attribute-value pairs that A and B have in common. 
If the values themselves are complex, we take the 
generalization of those values. 
For example, 
    A = [agreement: [number: sg
                     person: 2nd]
         case: nominative]
    B = [agreement: [number: sg
                     person: 3rd
                     gender: masc]
         case: genitive]
    (Generalize A B) = [agreement: [number: sg]]
Generalization seems to be a very useful notion for 
expressing how number and gender agreement works in 
coordinate noun phrases. One curious fact about coordination 
is that conjunction of "I" with "you" or "he" in the subject 
position typically produces first person verb agreement. In 
sentences like "he and I agree" the verb has the same form 
as in "we agree." The morphological equivalence of "he 
and I," "you and I," and "we" is partially obscured in 
English but very clear in many other languages. The problem 
is discussed in Section 5 below. 
3. Limitations of Some Current Formalisms 
Most current grammar formalisms for features have 
certain built-in limitations. Three are relevant here: 
• no cyclic structures 
• no negation 
• no disjunction. 
The prohibition against cyclicity rules out structures 
that contain circular paths, as in the following example. 
        o
        | a
        o <---+
        | b   |
        o ----+
          c
Here the path <a b c> folds back onto itself, that is, 
<a> = <a b c>. It is not clear whether such descriptions 
should be ruled out on theoretical grounds. Whatever the 
case might be, current implementations of LFG, UG, or 
GPSG with which I am familiar do not support them. 
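A circular structure of this kind is easy to build when nodes are mutable objects, which is one way to see how such structures arise inadvertently. The sketch below assumes nodes are Python dictionaries; `follow` and `count_nodes` are hypothetical helpers. Any traversal has to remember the nodes it has visited, otherwise it does not terminate.

```python
# Build a structure in which <a> = <a b c>.
node_a = {}
node_b = {}
root = {"a": node_a}
node_a["b"] = node_b
node_b["c"] = node_a  # the arc that closes the cycle


def follow(node, path):
    """Traverse a path of attribute names and return the node reached."""
    for attribute in path:
        node = node[attribute]
    return node


def count_nodes(node, seen=None):
    """Count distinct complex nodes; `seen` guards against looping forever."""
    if seen is None:
        seen = set()
    if not isinstance(node, dict) or id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_nodes(child, seen) for child in node.values())


print(follow(root, ("a",)) is follow(root, ("a", "b", "c")))
print(count_nodes(root))
```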
The prohibition against negation makes it impossible 
to characterize a feature by saying that it does NOT have 
such and such a value. None of the above theories allows 
specifications such as the following. We use the symbol "-" 
to mean 'not.' 
    [case: -dat]
    [agreement: -[person: 3rd
                  number: sg]]
The first statement says that case is "not dative," the 
second says that the value of agreement is "anything but 
3rd person singular." 
Not allowing disjunctive specifications rules out ma- 
trices of the following sort. We indicate disjunction by 
enclosing the alternative values in {}. 
    [case: {nom acc}
     agreement: {[gender: fem
                  number: sg]
                 [number: pl]}]
The first line describes the value of case as being "ei- 
ther nominative or accusative." The value for agreement 
is given as "either feminine singular or plural." Among 
the theories mentioned above, only Kay's UG allows dis- 
junctive feature specifications in its formalism. (In LFG, 
disjunctions are allowed in control equations but not in the 
specification of values.) 
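For atomic values, negation and disjunction can both be pictured as sets of admissible atoms. The following sketch uses that encoding, which is a swapped-in simplification and not the mechanism of any of the formalisms mentioned: it assumes a finite, known domain of values (here a hypothetical four-case inventory), and unification becomes set intersection that fails on the empty set.

```python
# Assumed finite domain for the attribute "case" (an illustration only).
CASES = frozenset({"nom", "gen", "dat", "acc"})


def atom(value):
    """A fully specified atomic value."""
    return frozenset({value})


def either(*values):
    """A disjunctive value such as {nom acc}."""
    return frozenset(values)


def negate(value, domain=CASES):
    """A negative value such as -dat: anything in the domain but `value`."""
    return domain - atom(value)


def unify_atomic(x, y):
    """Intersect two sets of admissible atoms; None signals failure."""
    common = x & y
    return common if common else None


print(unify_atomic(negate("dat"), atom("acc")))
print(unify_atomic(negate("dat"), atom("dat")))
print(unify_atomic(either("nom", "acc"), negate("dat")))
```

This encoding does not extend directly to complex values, where the alternatives are themselves matrices.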
Of the three limitations, the first one may be theo- 
retically justified since it has not been shown that there 
are phenomena in natural languages that involve circular 
structures (cf. [Kaplan and Bresnan, 83], p. 281). PATR-II 
at SRI and its expanded version at the University of Texas 
allow such structures for practical reasons because they 
tend to arise, mostly inadvertently, in the course of gram- 
mar construction and testing. An implementation that 
does not handle unification correctly in such cases is too 
fragile to use. 
The other two restrictions are linguistically unmoti- 
vated. There are many cases, especially in morphology, 
in which the most natural feature specifications are nega- 
tive or disjunctive. In fact, the examples given above all 
represent such cases. 
The first example, [case: -dat], arises in the plural 
paradigm of words like "Kind" (child) in German. 
Such words have two forms in the plural: "Kinder" and 
"Kindern." The latter is used only in the plural dative, 
the former in the other three cases (nominative, genitive, 
accusative). If we accept the view that there should be just 
one rather than three entries for the plural suffix "-er", we 
have the choice between 
    -er = [number: pl
           case: {nom gen acc}]
    -er = [number: pl
           case: -dat]
The second alternative seems preferable given the fact 
that there is, in this particular declension, a clear two- 
way contrast. The marked dative is in opposition with an 
unmarked form representing all the other cases. 
The second example is from English. Although the features 
"number" and "person" are both clearly needed in 
English verb morphology, most verbs are very incompletely 
specified for them. In fact, the present tense paradigm of 
all regular verbs just has two forms of which one represents 
the 3rd person singular ("walks") and the other ("walk") 
is used for all other persons. Thus the most natural char- 
acterization for "walk" is that it is not 3rd person singu- 
lar. The alternative is to say, in effect, that "walk" in the 
present tense has five different interpretations. 
The system of articles in German provides many ex- 
amples that call for disjunctive feature specifications. The 
article "die," for example, is used in the nominative and 
accusative cases of singular feminine nouns and all plural 
nouns. The entry given above succinctly encodes exactly 
this fact. 
There are many cases where disjunctive specifications 
seem necessary for reasons other than just descriptive elegance. 
Agreement conditions on conjunctions, for example, 
typically fail to exclude pairs where differences in case 
and number are not overtly marked. For example, in German 
[Eisenberg, 73] noun phrases like: 
des Dozenten (gen sg) the docent's 
der Dozenten (gen pl) the docents'. 
can blend as in 
der Antrag des oder der Dozenten 
the petition of the docent or docents. 
This is not possible when the noun is overtly marked for 
number, as in the case of "des Professors" (gen sg) and 
"der Professoren" (gen pl): 
*der Antrag des oder der Professors 
*der Antrag des oder der Professoren 
the petition of the professor or professors 
In the light of such cases, it seems reasonable to as- 
sume that there is a single form, "Dozenten," which has 
a disjunctive feature specification, instead of postulating 
several fully specified, homonymous lexical entries. It is 
obvious that the grammaticality of the example crucially 
depends on the fact that "Dozenten" is not definitely sin- 
gular or definitely plural but can be either. 
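The argument can be checked mechanically with a small sketch. It assumes a flat encoding in which each attribute maps to a set of admissible atoms, and it lists only the genitive readings relevant to this example; the entries and the helper names `compatible` and `blend_ok` are hypothetical simplifications.

```python
def compatible(x, y):
    """True if every shared attribute has at least one value in common."""
    return all(x[attr] & y[attr] for attr in x.keys() & y.keys())


# Only the readings relevant to the example are listed (a simplification).
des = {"case": {"gen"}, "number": {"sg"}}
der = {"case": {"gen"}, "number": {"pl"}}
dozenten = {"case": {"gen"}, "number": {"sg", "pl"}}  # disjunctive number
professors = {"case": {"gen"}, "number": {"sg"}}
professoren = {"case": {"gen"}, "number": {"pl"}}


def blend_ok(noun):
    """'des oder der N' requires N to be compatible with both articles."""
    return compatible(des, noun) and compatible(der, noun)


print(blend_ok(dozenten), blend_ok(professors), blend_ok(professoren))
```

Only the noun with the disjunctive number specification passes, which mirrors the grammaticality judgments above.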
4. Unification with Disjunctive and 
Negative Feature Specifications 
I sketch here briefly how the basic unification proce- 
dure can be modified to admit negative and disjunctive 
values. These ideas have been implemented in the new 
Texas version of the PATR-II system for features. (I am 
much indebted to Fernando Pereira for his advice on this 
topic.) 
Negative values are created by the following operation. 
If A and B are distinct, i.e. contain a different value for 
some feature, then (Negate A B) does nothing to them. 
Otherwise both nodes acquire a "negative constraint." In 
effect, A is marked with -B and B with -A. These con- 
straints prevent the two nodes from ever becoming alike. 
When A is unified with C, unification succeeds only if the 
result is distinct from B. The result of (Unify A C) has to 
satisfy all the negative constraints of both A and C and it 
inherits all that could fail in some later unification. 
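One possible reading of this procedure is sketched below; it is an interpretation under stated assumptions, not the Texas implementation. A negative constraint is stored as the matrix that the node must never become alike; a result fails if it already contains that matrix, drops the constraint if it is distinct from it, and otherwise inherits it. The sketch is non-destructive, attaches the constraint to one side only, and encodes [case: -dat] as the constraint -[case: dat] on the enclosing matrix. All function names are hypothetical.

```python
def unify(a, b):
    """Non-destructive unification; returns None on failure."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for attribute, value in b.items():
            if attribute in result:
                merged = unify(result[attribute], value)
                if merged is None:
                    return None
                result[attribute] = merged
            else:
                result[attribute] = value
        return result
    return a if a == b else None


def distinct(a, b):
    """True if a and b contain a different value for some shared feature."""
    if isinstance(a, dict) and isinstance(b, dict):
        return any(distinct(a[k], b[k]) for k in a.keys() & b.keys())
    return a != b


def alike(a, n):
    """True if a already contains every attribute-value pair of n."""
    if isinstance(n, dict):
        return isinstance(a, dict) and all(
            k in a and alike(a[k], n[k]) for k in n)
    return a == n


def unify_constrained(a, a_negs, c, c_negs=()):
    """Return (result, pending negative constraints), or None on failure."""
    result = unify(a, c)
    if result is None:
        return None
    pending = []
    for negated in list(a_negs) + list(c_negs):
        if alike(result, negated):
            return None               # the result has become alike B
        if not distinct(result, negated):
            pending.append(negated)   # could still fail later: inherit it
    return result, pending


not_dative = [{"case": "dat"}]        # encodes [case: -dat]
print(unify_constrained({}, not_dative, {"case": "acc"}))
print(unify_constrained({}, not_dative, {"case": "dat"}))
print(unify_constrained({}, not_dative, {"number": "pl"}))
```

In the first call the constraint is dropped because it can no longer be violated; in the third it is carried along because a later unification could still add a dative case.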
Disjunction is more complicated. Suppose A, B and 
C are all simple atomic values. In this situation C unifies 
with {A B} just in case it is identical to one or the other 
of the disjuncts. The result is C. Now suppose that A, B, 
and C are all complex. Furthermore, let us suppose that A 
and B are distinct but C is compatible with both of them 
as in the following: 
    A = [gender: fem
         number: sg]
    B = [number: pl]
    C = [case: acc]
What should be the result of (Unify {A B} C)? Because 
A and B are incompatible, we cannot actually unify C with 
both of them. That operation would fail. Because there is 
no basis for choosing one, both alternatives have to be left 
open. Nevertheless, we need to take note of the fact that 
either A or B is to be unified with C. We can do this by 
making the result a complex disjunction. 
    C' = {(A C) (B C)}
The new value of C, C', is a disjunction of tuples which 
can be, but have not yet been unified. Thus (A C) and (B C) 
are sets that consist of compatible structures. Furthermore, 
at least one of the tuples in the complex disjunction 
must remain consistent regardless of what happens to A 
and B. After the first unification we can still unify A with 
any structure that it is compatible with, such as: 
    D = [case: nom]
If this happens, then the tuple (A C) is no longer con- 
sistent. A side effect of A becoming 
    A' = [gender: fem
          number: sg
          case: nom]
is that C' simultaneously reduces to {(B C)}. Since there 
is now only one viable alternative left, B and C can at this 
point be unified. The original result from (Unify {A B} 
C) now reduces to the same as (Unify B C). 
    C'' = {(B C)} = [number: pl
                     case: acc]
As the example shows, once C is unified with {A B}, A 
and B acquire a "positive constraint." All later unifications 
involving them must keep at least one of the two pairs (A 
C), (B C) unifiable. If at some later point one of the 
two tuples becomes inconsistent, the members of the sole 
remaining tuple finally can and should be unified. When 
that has happened, the positive constraint on A and B can 
also be discarded. A more elaborate example of this sort 
is given in the Appendix. 
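The bookkeeping described in this section can be sketched as follows, using the values of A, B, C, and D from the example. The sketch is non-destructive and keeps the nodes in a dictionary `env` keyed by name, so that a tuple such as (A C) is re-examined against the current values of its members; `env`, `unify_all`, and `viable` are hypothetical names, not those of the Texas implementation.

```python
def unify(a, b):
    """Non-destructive unification; returns None on failure."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for attribute, value in b.items():
            if attribute in result:
                merged = unify(result[attribute], value)
                if merged is None:
                    return None
                result[attribute] = merged
            else:
                result[attribute] = value
        return result
    return a if a == b else None


def unify_all(structures):
    """Unify a sequence of structures; None if any step fails."""
    result = {}
    for structure in structures:
        result = unify(result, structure)
        if result is None:
            return None
    return result


def viable(tuples, env):
    """Keep the tuples whose members can still be unified with one another."""
    return [t for t in tuples
            if unify_all([env[name] for name in t]) is not None]


env = {
    "A": {"gender": "fem", "number": "sg"},
    "B": {"number": "pl"},
    "C": {"case": "acc"},
}
c_prime = viable([("A", "C"), ("B", "C")], env)     # both tuples survive
env["A"] = unify(env["A"], {"case": "nom"})         # A is unified with D
c_double_prime = viable(c_prime, env)               # (A C) is now inconsistent
resolved = None
if len(c_double_prime) == 1:
    # Only one alternative is left, so its members can finally be unified.
    resolved = unify_all([env[name] for name in c_double_prime[0]])
print(c_prime, c_double_prime, resolved)
```

If `viable` ever returned an empty list, the unification that caused it would have to be rejected, since the positive constraint requires at least one tuple to remain consistent.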
Essentially the same procedure also works for more 
complicated cases. For example, unification of {A B} with 
{C D} yields {(A C) (A D) (B C) (B D)} assuming that 
the two values in each tuple are compatible. Any pairs that 
could not be unified are left out. The complex disjunction 
is added as a positive constraint to all of the values that 
appear in it. The result of unifying {(A C) (B C)} with 
{(D F) (E F)} is {(A C D F) (A C E F) (B C D F) (B C E F)}, 
again assuming that no alternative can initially be 
ruled out. 
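Combining two disjunctions amounts to taking the cross product of their tuples and discarding the combinations that cannot be unified. A minimal sketch, assuming the same name-keyed encoding as before (the helper `unify_disjunctions` and the sample values are hypothetical):

```python
from itertools import product


def unify(a, b):
    """Non-destructive unification; returns None on failure."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for attribute, value in b.items():
            if attribute in result:
                merged = unify(result[attribute], value)
                if merged is None:
                    return None
                result[attribute] = merged
            else:
                result[attribute] = value
        return result
    return a if a == b else None


def unify_all(structures):
    """Unify a sequence of structures; None if any step fails."""
    result = {}
    for structure in structures:
        result = unify(result, structure)
        if result is None:
            return None
    return result


def unify_disjunctions(left, right, env):
    """Pair every tuple on the left with every tuple on the right and
    keep the combinations whose members are mutually compatible."""
    combined = []
    for l, r in product(left, right):
        merged = l + tuple(name for name in r if name not in l)
        if unify_all([env[name] for name in merged]) is not None:
            combined.append(merged)
    return combined


env = {
    "A": {"number": "sg"},
    "B": {"number": "pl"},
    "C": {"case": "nom"},
    "D": {"number": "pl", "case": "acc"},  # incompatible with A
}
print(unify_disjunctions([("A",), ("B",)], [("C",), ("D",)], env))
```

With these sample values the pair (A D) is left out because A and D disagree on number.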
As for generalization, things are considerably simpler. 
The result of (Generalize A B) inherits both negative and 
positive constraints of A and B. This follows from the fact 
that the generalization of A and B is the maximal subgraph 
of A and B that will unify with either one of them. 
Consequently, it is subject to any constraint that affects A 
or B. This is analogous to the fact that, in set theory, 
    (A - C) ∩ (B - D) = (A ∩ B) - (C ∪ D)
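The set identity can be confirmed by brute force over a small universe. The sketch below checks every choice of A, B, C, and D among the subsets of a three-element set; it verifies the set-theoretic analogy only and says nothing about the graph operation itself.

```python
from itertools import combinations, product

UNIVERSE = (0, 1, 2)


def subsets(items):
    """All subsets of `items` as frozensets."""
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def identity_holds(a, b, c, d):
    """(A - C) ∩ (B - D) == (A ∩ B) - (C ∪ D)"""
    return (a - c) & (b - d) == (a & b) - (c | d)


ALL_OK = all(identity_holds(a, b, c, d)
             for a, b, c, d in product(subsets(UNIVERSE), repeat=4))
print(ALL_OK)
```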
In our current implementation, negative constraints 
are dropped as soon as they become redundant as far as 
unification is concerned. For example, when [case: acc] 
is unified with [case: -dat], the resulting matrix is 
simply [case: acc]. The negative constraint is eliminated 
since there is no possibility that it could ever be violated 
later. This may be a wrong policy. It has to be modified 
to make generalization work as proposed in Section 5 for 
structures with negative constraints. If generalization is 
defined as we have suggested above, negative constraints 
must always be kept because they never become redundant 
for generalization. 
When negative or positive constraints are involved, 
unification obviously takes more time. Nevertheless, the 
basic algorithm remains pretty much the same. Allowing 
for constraints does not significantly reduce the speed at 
which values that do not have any get unified in the Texas 
implementation. 
In the course of working on the project, I gained one 
insight that perhaps should have been obvious from the 
very beginning: the problems that arise in this connection 
are very similar to those that come up in logic programming. 
One can actually use the feature system for certain 
kinds of inferencing. For example, let Mary, Jane, and John 
have the following values: 
    Mary = [hair: blond]
    Jane = [hair: dark]
    John = [sister: {Jane Mary}]
If we now unify John with 
    [sister: [eyes: blue]],
both Jane and Mary get marked with the positive con- 
straint that at least one of them has blue eyes. Suppose 
that we now learn that Mary has green eyes. This imme- 
diately gives us more information about John and Jane as 
well. Now we know that Jane's eyes are blue and that she 
definitely is John's sister. The role of positive constraints 
is to keep track of partial information in such a way that 
no inconsistencies are allowed and proper updating is done 
when more things become known. 
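The inference can be replayed in a few lines. This is a sketch under the same non-destructive, name-keyed assumptions as the earlier ones; `people`, `john`, and `candidates` are hypothetical names, and the positive constraint is modelled simply by re-filtering the list of candidates whenever new information arrives.

```python
def unify(a, b):
    """Non-destructive unification; returns None on failure."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for attribute, value in b.items():
            if attribute in result:
                merged = unify(result[attribute], value)
                if merged is None:
                    return None
                result[attribute] = merged
            else:
                result[attribute] = value
        return result
    return a if a == b else None


people = {"Mary": {"hair": "blond"}, "Jane": {"hair": "dark"}}
john = {"sister": ["Jane", "Mary"]}  # disjunction: the sister is one of these


def candidates(names, requirement, people):
    """Names whose current description still unifies with the requirement."""
    return [name for name in names
            if unify(people[name], requirement) is not None]


requirement = {"eyes": "blue"}  # John is unified with [sister: [eyes: blue]]
before = candidates(john["sister"], requirement, people)

# We learn that Mary has green eyes.
people["Mary"] = unify(people["Mary"], {"eyes": "green"})
after = candidates(john["sister"], requirement, people)

if len(after) == 1:
    # Only one candidate is left, so the requirement can be unified with her.
    people[after[0]] = unify(people[after[0]], requirement)

print(before, after, people["Jane"])
```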
5. Future prospects: Agreement in 
Coordinate Structures 
One problem of long standing for which the present sys- 
tem may provide a simple solution is person agreement in 
coordinate noun phrases. The conjunction of a 1st person 
pronoun with either 2nd or 3rd person pronoun invariably 
yields 1st person agreement. "I and you" is equivalent to 
"we," as far as agreement is concerned. When a second 
person pronoun is conjoined with a third person NP, the 
resulting conjunction has the agreement properties of a 
second person pronoun. Schematically: 
    1st + 2nd = 1st
    1st + 3rd = 1st
    2nd + 3rd = 2nd.
Sag, Gazdar, Wasow, and Weisler [84] propose a solution 
which is based on the idea of deriving the person 
feature for a coordinate noun phrase by generalization (in- 
tersection) from the person features of its heads. It is ob- 
vious that the desired effect can be obtained in any feature 
system that uses the fewest features to mark 1st person, 
some additional feature for 2nd person, and yet another for 
3rd person. Because generalization of 1st and 2nd, for example, 
yields only the features that the two have in common, 
the one with fewest features wins. 
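Generalization in the sense of Section 2 can be sketched as a recursive intersection. The sketch assumes nested dictionaries and ignores constraints; `generalize` is a hypothetical name and the operands are illustrative. A complex value with nothing in common is kept as an empty matrix, which is a design choice of the sketch.

```python
def generalize(a, b):
    """Return the attribute-value pairs that a and b have in common.

    Complex values are generalized recursively; differing atomic values
    contribute nothing (signalled internally by None).
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = {}
        for attribute in a:
            if attribute in b:
                common = generalize(a[attribute], b[attribute])
                if common is not None:
                    result[attribute] = common
        return result
    return a if a == b else None


a = {"agreement": {"number": "sg", "person": "2nd"}, "case": "nominative"}
b = {"agreement": {"number": "sg", "person": "3rd", "gender": "masc"},
     "case": "genitive"}
print(generalize(a, b))
```

With feature sets ordered by inclusion, as in the proposal just described, the operand with the fewest features is returned unchanged.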
Any such solution can probably be implemented easily 
in the framework outlined above. However, this proposal 
has one very counterintuitive aspect: the markedness hierarchy 
is the reverse of what traditionally has been assumed. 
Designating something as 3rd person requires the greatest 
number of feature specifications. In the Sag et al. system, 
3rd person is the most highly marked member and 1st per- 
son the least marked member of the trio. Traditionally, 3rd 
person has been regarded as the unmarked case. 
In our system, there is a rather simple solution under 
which the value of the person feature in coordinate NPs is derived 
by generalization, just as Sag et al. propose, which 
nevertheless preserves the traditional view of markedness. 
The desired result can be obtained by using negative con- 
straints rather than additional features for establishing a 
markedness hierarchy. For example, the following feature 
specifications have the effect that we seek. 
    1st = [conversant: +
           speaker: +]
    2nd = [conversant: +
           speaker: -]
    3rd = [conversant: -
           speaker: -]
The corresponding negative constraints are: 
    1st = [-[conversant: -]
           -[speaker: -]]
    2nd = [-[conversant: -]]
    3rd = (no constraints)
Assuming that generalization with negative constraints 
works as indicated above, i.e. negative constraints are always 
inherited, it immediately follows that the generalization 
of 1st person with any other person is compatible with 
only 1st person and that 2nd person wins over 3rd when 
they are combined. The results are as follows. 
    1st + 2nd = [conversant: +
                 -[conversant: -]
                 -[speaker: -]]
    1st + 3rd = [-[conversant: -]
                 -[speaker: -]]
    2nd + 3rd = [speaker: -
                 -[conversant: -]]
Note that the proper part of 1st+2nd excludes 3rd person. 
It is compatible with both 1st and 2nd person but the 
negative constraint rules out the latter one. In the case 
of 1st+3rd, the negative constraint is compatible with 1st 
person but incompatible with 2nd and 3rd. In the last case, 
the specification [speaker: -] rules out 1st person and the 
negative constraint -[conversant: -] eliminates 3rd person. 
When negative constraints are counted in, 1st person 
is the most and 3rd person the least marked member of 
the three. In that respect, the proposed analysis is in line 
with traditional views on markedness. Another relevant 
observation is that the negative constraints on which the 
result crucially depends are themselves not too unnatural. 
In effect, they say of 1st person that it is "neither 2nd nor 
3rd" and that 2nd person is "not 3rd." 
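The analysis can be checked with a short sketch. It assumes the feature values given in this section and reads the negative constraints as -[conversant: -] and -[speaker: -] for 1st person and -[conversant: -] for 2nd person; the flat encoding and the names `PERSONS`, `generalize`, `admits`, and `agreement` are hypothetical.

```python
# Each person: (positive features, list of negated matrices).
PERSONS = {
    "1st": ({"conversant": "+", "speaker": "+"},
            [{"conversant": "-"}, {"speaker": "-"}]),
    "2nd": ({"conversant": "+", "speaker": "-"},
            [{"conversant": "-"}]),
    "3rd": ({"conversant": "-", "speaker": "-"}, []),
}


def generalize(x, y):
    """Common features, plus the negative constraints of both operands."""
    (fx, nx), (fy, ny) = x, y
    common = {k: v for k, v in fx.items() if fy.get(k) == v}
    negs = nx + [n for n in ny if n not in nx]
    return common, negs


def admits(generalization, person):
    """True if `person` fits the common part and violates no constraint."""
    common, negs = generalization
    features, _ = person
    fits = all(features.get(k) == v for k, v in common.items())
    violates = any(all(features.get(k) == v for k, v in n.items())
                   for n in negs)
    return fits and not violates


def agreement(p, q):
    """Persons compatible with the generalization of p and q."""
    g = generalize(PERSONS[p], PERSONS[q])
    return [name for name, person in PERSONS.items() if admits(g, person)]


for pair in (("1st", "2nd"), ("1st", "3rd"), ("2nd", "3rd")):
    print(pair, agreement(*pair))
```

Under these assumptions each generalization is compatible with exactly one person, the one that the schematic table requires.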
It will be interesting to see whether other cases of 
markedness can be analyzed in the same way. 
6. Acknowledgements 
I am indebted to Martin Kay for introducing me to uni- 
fication and to Fernando Pereira, Stuart Shieber, Remo 
Pareschi, and Annie Zaenen for many insightful sugges- 
tions on the project. 
References 
Eisenberg, Peter, 1973: "A Note on Identity of Constituents." 
Linguistic Inquiry 4:3, 417-20. 
Gazdar, Gerald and G. Pullum, 1982: "Generalized Phrase Structure 
Grammar: A Theoretical Synopsis." Indiana University 
Linguistics Club, Bloomington, Indiana. 
Kaplan, Ronald M. and Joan Bresnan, 1983: "Lexical-Functional 
Grammar: A Formal System for Grammatical 
Representation," Ch. 4 in J. Bresnan (ed.), The Mental 
Representation of Grammatical Relations, Cambridge, MIT 
Press. 
Kay, Martin, 1979: "Functional Grammar." Proceedings of the 
Fifth Annual Meeting of the Berkeley Linguistics Society, 
Berkeley Linguistics Society, Berkeley, California (February 
17-19, 1979), pp. 142-158. 
Pereira, Fernando and Stuart Shieber, 1984: "The Semantics of 
Grammar Formalisms Seen as Computer Languages." 
Proceedings of the Tenth International Conference on 
Computational Linguistics, Stanford University, Stanford, 
California (4-7 July, 1984). 
Sag, Ivan, Gerald Gazdar, Thomas Wasow, and Steven Weisler, 
1984: "Coordination and How to Distinguish Categories." 
CSLI Report No. 3. Center for the Study of Language and 
Information, Stanford, Ca. (March 1984). 
Shieber, S., H. Uszkoreit, F. Pereira, J. Robinson, and M. Tyson, 
1983: "The Formalism and Implementation of PATR-II," 
in B. Grosz and M. Stickel, Research on Interactive 
Acquisition and Use of Knowledge, SRI Final Report 1894, SRI 
International, Menlo Park, California (November 1983). 
A. Appendix: Some Examples of 
Unification 
(These examples were produced using the Texas version of 
the DG package.) 
    die = [infl: [case: {nom acc}
                  agr: {[gender: fem
                         number: sg]
                        [number: pl]}]]
    Kinder = [infl: [case: -dat
                     agr: [gender: neut
                           number: pl]]]
    die Kinder = [infl: [case: {nom acc}
                         agr: [gender: neut
                               number: pl]]]
    den = [infl: {[case: acc
                   agr: [gender: masc
                         number: sg]]
                  [case: dat
                   agr: [number: pl]]}]
    den Kinder = *FAILS*
    den Kindern = [infl: [case: dat
                          agr: [gender: neut
                                number: pl]]]
    I = [infl: [case: nom
                agr: [number: sg
                      person: 1st]]]
    he = [infl: [agr: [gender: masc
                       number: sg
                       person: 3rd]]]
    do = [infl: [tense: present
                 agr: -[number: sg
                        person: 3rd]]]
    I do = [infl: [tense: present
                   case: nom
                   agr: [number: sg
                         person: 1st]]]
    he do = *FAILS*
[The matrices for the remaining examples, (Unify x y) and 
(Unify (Unify x y) z), are illegible in the source.] 
