Constraints, Exceptions and Representations 
T. Mark Ellison 
Centre for Cognitive Science, University of Edinburgh 
2 Buccleuch Pl., Edinburgh EH8 9LW, U.K. 
marke@cogsci.ed.ac.uk
Abstract 
This paper shows that default-based phonologies have 
the potential to capture morphophonological generali- 
sations which cannot be captured by non-default theo- 
ries. In achieving this result, I offer a characterisation 
of Underspecification Theory and Optimality Theory 
in terms of their methods for ordering defaults. The 
result means that machine learning techniques for
building declarative analyses may not provide an adequate
basis for morphophonological analysis.
Introduction 
In other work, I have shown (Ellison 1992, forthcoming)
that interesting phonological constraints can be learned 
despite the presence of exceptions. Each of these
constraints imposes a limit on the set of possible words at
a common level of representation. In this paper, I
consider possible limits to the usefulness of these constraints
in representing morphemes and finding concise repre- 
sentations of lexical entries. 
In order to compare a strictly declarative formalism 
with other constraint formalisms, a common formal en- 
vironment must be established. Using model theory 
to establish the relationship between description and 
object, and then a modal formalism to define the struc- 
tures to which constraints apply, we can compare the 
different effects of strict constraints and defaults. In 
particular, a strict declarative approach can be com- 
pared with other constraint frameworks such as Un- 
derspecification Theory (UT) (Archangeli, 1984) and 
Optimality Theory (OT) (Prince & Smolensky, 1993).
This discussion is followed in the latter part of the
paper by consideration of the possibility of applying machine
learning to constraint systems that use defaults.
Morphophonology 
To structure the discussion, I offer four desiderata for
morphophonology. The first is that the morphophono- 
logy must allow concise lexical representations. Where 
information is predictable, it should not have to be spe- 
cified in the lexicon. This desideratum is not a matter 
of empirical accuracy, rather one of scientific aesthetics. 
For example, English contains no front rounded vowels, 
so a vowel which is marked as front in the lexicon need 
not be marked as unrounded. 
The second desideratum is that the morphophonology
should allow generalisations to be made over
phonologically conditioned allomorphs. For example, a
representation of the Turkish plural affixes -lar, -ler, that
uses the feature [±front] is superior to a segmental
representation because a single representation for the two
allomorphs can be achieved by not specifying the value 
for this feature in the representation of the morph. 
The third desideratum requires that the specific
allomorphs be recoverable from the generalisations. If -lar
and -ler are generalised in a single representation, such
as -lAr, then the morphophonology should make the
recovery of the allomorphs in the correct environments 
possible. 
The final desideratum is, like the first, a matter of 
scientific aesthetics: a priori abstractions should not be 
used in an analysis any more than is necessary. For 
example, the feature [±front] should not be used in the
analysis of a language unless it is motivated by structu- 
res in the language itself. This desideratum may con- 
flict with the first: a priori features may result in a more 
concise representation. 
These four desiderata provide a framework for evalua- 
ting the relative merits of monostratal systems of pho- 
nological constraints with other current theories such 
as Underspecification Theory and Optimality Theory. 
Model Theory and Modal Logic 
A fundamental distinction in any formal account is the 
distinction between description and object. Failure to 
make the distinction can lead, at best, to confusion,
and, at worst, to paradoxes, such as Russell's Paradox. 
Because this theory is talking about theories, it must 
make the distinction explicitly by formalising the relati- 
onship between description and object. This distinction 
is pursued below and developed into a formalism for
complex structures in the following section. 
Model theory 
In model theory, the meaning of a statement in a formal 
language is provided by means of an INTERPRETATION
FUNCTION which maps the statement onto the set of
objects for which the statement is true. If L is a
language and W is a set of objects, and P(W) is the set
of all subsets of W, then the interpretation function I
maps L onto P(W):
I : L → P(W).
As an example, suppose h is a horse, f is a ferret
and s is a large stone, and that these are the objects in
our world. We might define a language L0 containing
the terms big, animate, slow and human, and assign
these terms the interpretations in (1).
(1) Term T     Interpretation I0(T)
    big        {h, s}
    animate    {h, f}
    slow       {f, s}
    human      {}
This language can be expanded to include the logi- 
cal operations of conjunction, disjunction and negation. 
These are provided a semantics by combining the se- 
mantics of the terms they apply to. 
(2) Term       Interpretation
    l ∈ L0     I0(l)
    X ∧ Y      I(X) ∩ I(Y)
    X ∨ Y      I(X) ∪ I(Y)
    ¬X         W \ I(X)
With this interpretation function, we can determine 
that big ∧ animate ∧ slow is a CONTRADICTION having
a null interpretation in W, while big ∨ slow is a
TAUTOLOGY as I(big ∨ slow) is the same as I(big) ∪ I(slow)
which equals W.
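The interpretation function of (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the object names "horse", "ferret" and "stone", the tuple encoding of compound terms and the function name interpret are assumptions of the sketch, not part of the formalism.

```python
# The world W and the basic interpretations of table (1).
W = frozenset({"horse", "ferret", "stone"})
I0 = {
    "big": frozenset({"horse", "stone"}),
    "animate": frozenset({"horse", "ferret"}),
    "slow": frozenset({"ferret", "stone"}),
    "human": frozenset(),
}

def interpret(term):
    """Map a term onto the set of objects for which it is true, as in (2).

    A term is either a basic term (a string) or a tuple
    ("and", X, Y), ("or", X, Y) or ("not", X).
    """
    if isinstance(term, str):
        return I0[term]
    op, *args = term
    if op == "and":
        return interpret(args[0]) & interpret(args[1])
    if op == "or":
        return interpret(args[0]) | interpret(args[1])
    if op == "not":
        return W - interpret(args[0])
    raise ValueError(f"unknown operator: {op}")

contradiction = interpret(("and", "big", ("and", "animate", "slow")))
tautology = interpret(("or", "big", "slow"))
```

Under these assumptions, contradiction is the empty set and tautology is the whole of W.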
The term PREDICATE will be used to describe a sta- 
tement in a language which has a well-defined interpre- 
tation. 
Modal logics 
Model theory as defined in the previous section applies
only to domains with atomic, unstructured objects. More
complex structures can be captured by extending the theory
of models to refer to different worlds and the relati- 
onships between them. Such a complex of worlds and 
relations is called a MODAL logic. 
A modal theory consists of a universe U, which is a set of
worlds W_j, j ∈ ℐ, called TYPES, together with a set of
relations R_k, k ∈ ℛ, R_k : W_dom(k) → W_cod(k), from one
world to another. Types may contain other types, and
whenever a type is so contained, it defines a characteristic
relation which selects elements of that subtype from the
larger type. A language for this universe is more
complex as well, needing a function w : L → ℐ to indicate
the type W_w(l) in which any given expression l is to be
interpreted. A MODAL OPERATOR r_k is a special
symbol in the language which is interpreted as the relation
R_k.
Modal operators can combine with predicates to
construct new predicates. If φ is a predicate, r_k is a
modal operator and w(φ) = cod(k), then we can
define an interpretation, I(r_k φ) ⊆ W_dom(k), for r_k φ,
namely R_k⁻¹[I(φ)]. Furthermore, we define the type
of the expression to be the domain of the functor:
w(r_k φ) = dom(k). The interpretation of any
well-formed sentence in this language is a subset of the
corresponding world: I(φ) ⊆ W_w(φ).
From here on, we will assume that the R_k, k ∈ ℛ, are
functions, and call the corresponding operators of the 
language FUNCTORS. Functors simplify the interpreta- 
tion of predicates: inverses of functions preserve inters- 
ection, so functors distribute over conjunction as well 
as disjunction. 
A path equation defines a predicate which selects ent- 
ities that have the same result when passed through 
two different sequences of functions. Suppose that p 
and q are two sequences of functors with the same first 
domain and last codomain, and that the compositions
of the corresponding sequences of functions are P and
Q respectively. Then the interpretation of p = q is 
the set of entities x in the common domain such that 
P(x) = Q(x). 
Suppose the universe U consists of seven worlds, a, 
b, c, alphabet, nullstring, nonnullstring and string.
Some of these worlds are built from others: alphabet is
the disjoint union of a, b and c, while string is the
disjoint union of nullstring and nonnullstring. Linking
these types are the three functors shown in (3). 
(3) right : nonnullstring → string
    left  : nonnullstring → string
    head  : nonnullstring → alphabet
We subject these definitions to the path equation that 
right left x and left right x equal x for all non-null 
strings x. 
A predicate in the corresponding modal language, 
using only the characteristic predicates of the types 
and the functors, might be: head a, meaning the set
of non-null strings whose first letter is a; left head a ∧
right head c, to specify the context a__c; or head c ∧
right(head a ∧ right(head b ∧ right null)).
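The three example predicates can be tried out with a small Python sketch. The encoding is an assumption made for illustration: an entity of type string is modelled as a word together with a cursor, a cursor that has run off either end counts as the null string, and a functor applied to a predicate holds of x when x is non-null and the predicate holds of the image of x.

```python
from typing import Callable, Tuple

Entity = Tuple[str, int]          # (word, cursor); cursor off the word = null string
Pred = Callable[[Entity], bool]

def null(x: Entity) -> bool:
    """Characteristic predicate of the type nullstring."""
    word, i = x
    return not 0 <= i < len(word)

def head(letters: str) -> Pred:
    """head applied to a disjunction of letters: the current letter is one of them."""
    return lambda x: not null(x) and x[0][x[1]] in letters

def left(p: Pred) -> Pred:
    """left p holds of a non-null x whose left neighbour satisfies p."""
    return lambda x: not null(x) and p((x[0], x[1] - 1))

def right(p: Pred) -> Pred:
    """right p holds of a non-null x whose right neighbour satisfies p."""
    return lambda x: not null(x) and p((x[0], x[1] + 1))

def conj(*ps: Pred) -> Pred:
    """Conjunction of predicates."""
    return lambda x: all(p(x) for p in ps)

first_is_a = head("a")
context_a_c = conj(left(head("a")), right(head("c")))
exactly_cab = conj(head("c"), right(conj(head("a"), right(conj(head("b"), right(null))))))
```

For instance, exactly_cab(("cab", 0)) holds, while exactly_cab(("cabs", 0)) does not, because the position after b is not null.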
By the use of functors, we can move from one type 
to another, or from one item in a type to another item 
in the same type. Metaphorically, we will call the
types joined by functors LOCATIONS, particularly when
the type instances are only distinguished by functorial
relationships with other types. 
In a complex structure, like a string, the functors
provide a method for interrogating nearby parts of the
structure within a predicate applying at a given posi- 
tion. By an appropriate choice of types and functors, 
complex feature structures and/or non-linear represen- 
tations can be defined. For the sake of simplicity, the 
discussion in the remainder of this paper will be restric- 
ted to strings constructed using the types and functors 
defined above. 
Constraints in a modal theory 
In model-theoretic terms, a constraint is any well- 
formed expression in the language to which an inter- 
pretation is attached. Phonologists also use the term, 
usually intending universal application. It will be used 
here for a single predicate applying at a particular lo- 
cation in structure. 
As an example of a constraint, consider front vowel
harmony in Turkish¹. Informally, we can write this
constraint as if the last vowel was front, so is the current
one. In the format of a phonological rule, this might
be written as [+front] C* V → [+front], where C* stands
for zero or more consonants. F is used to represent the
disjunction of all of the front vowels.
(4) Left       = (left head C ∧ left Left) ∨
                 left head F
    Constraint = head F ∨ ¬Left
In (4) the left context is abstracted into a named pre- 
dicate called Left. This is because the left context 
iterates over consonants. This iteration appears in the 
definition of Left as the recursive call: if the imme- 
diate left segment is a consonant, move left and check 
again. Left succeeds immediately if the immediate left 
segment is a front vowel. 
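The recursion in (4) can be made concrete with a short Python sketch. Two assumptions are made here and are not part of (4) itself: every letter that is not one of the eight Turkish vowels is treated as a consonant, and the constraint is evaluated at vowel positions only. Read literally, (4) would also fail at a consonant that follows a front vowel, which is why the sketch restricts itself to vowels. The function names are illustrative.

```python
VOWELS = set("aeıioöuü")
FRONT = set("eiöü")            # F: the disjunction of the front vowels

def head_in(word, i, letters):
    """head applied at position i; false when the position is null."""
    return 0 <= i < len(word) and word[i] in letters

def is_consonant(word, i):
    """head C at position i (assumption: any non-vowel letter is a consonant)."""
    return 0 <= i < len(word) and word[i] not in VOWELS

def left_context(word, i):
    """Left of (4): (left head C and left Left) or left head F."""
    j = i - 1
    if head_in(word, j, FRONT):
        return True
    return is_consonant(word, j) and left_context(word, j)

def constraint(word, i):
    """Constraint of (4): head F or not Left."""
    return head_in(word, i, FRONT) or not left_context(word, i)

def exceptions(word):
    """Vowel positions at which the constraint fails."""
    return [i for i, ch in enumerate(word)
            if ch in VOWELS and not constraint(word, i)]
```

Under these assumptions exceptions("seviyorum") contains the single position of the back vowel o, and exceptions("severim") is empty.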
Note that the predicate defined here imposes no
restrictions at all on where it applies except that it be a
non-null string. On the other hand, it only applies at
the current location in structure. The relationship
between constraints and locations is the topic of the next
section; first in the discussion of features, and then in 
the prioritisation of default feature assignment. 
Features, Underspecification and
Defaults 
The question arises as to what basic predicates should
be used in defining the lexical specification of phono- 
logical items. Lexical specifications in phonology are 
traditionally built from binary features. While the
feature values usually correspond to a priori
predicates, there is no reason why a feature cannot be
defined for an arbitrary predicate φ, defining the feature
[+φ] everywhere that φ is true and [-φ] everywhere
that φ is false. This section discusses two
kinds of feature system: A PRIORI and
EXCEPTION-MARKING.
A priori features 
Traditionally, the choice of features is made a priori 
(an A Priori Feature System -- APFS). This does not 
mean that phonologists do not select their feature sets 
to suit their problems, rather that they do not approve 
of doing so. Instead, acoustic or articulatory grounds
are sought for a universal set of features which will serve
for all analyses.
¹ Turkish has eight vowels: a, e, i, ı (the back version of i), o
and its front correlate ö, and u and the corresponding front
vowel ü.
Furthermore, features in traditional systems are con- 
text free. The predicates defining the features do not 
make reference to neighbouring structures, such as the 
segment to the right or the left, in order to determine 
the feature polarity in a given position. Feature va- 
lues depend only on the segment at that position in the 
string. 
Continuing to draw our examples from Turkish
vowels, front can be thought of as the predicate head (e ∨
i ∨ ö ∨ ü). This predicate is context-free: there are no
uses of the functors left and right in the definition. 
We can define the feature values [+front] and [-front]
as holding at each non-null position in the string where 
front is true and false respectively. 
Exception-marking features 
A more adventurous feature system brings context to- 
gether with the local segmental value to define its fea- 
tures. The question arises as to which predicates from 
this wider range should be chosen. The principle of 
Epicurus (Asmis, 1984) suggests that no choice should 
be made until direct evidence is adduced. In this do- 
main the evidence comes in the form of a constraint on 
phonological structure. So, if it appears that φ is an
interesting constraint on phonological structure, then
[±φ] should be used as a feature. This choice is less ad
hoc than introducing new predicates a priori. 
As an example of this kind of feature assignment,
consider the constraint (4) applied to the word seviyorum
'I like (cts)', which has the structure shown in (5).
(5) [Figure: string structure of seviyorum; the segment positions form a chain that ends in null on each side.]
The features assigned by the constraint are shown in 
(6). For clarity, the segments and head functors are not 
shown. To make the figure clearer, the positive and negative
feature marks are shown as ticks and crosses respec- 
tively. 
(6) [Figure: the structure in (5), with a tick or cross at each position for the feature defined by constraint (4).]
In only one case does this feature assign a negative
value, i.e. there is only one exception to the constraint in
this word. This exception is the occurrence of the back 
vowel o after the front vowel i. 
The segments themselves provide non-arbitrary 
context-free predicates which can be used as features. 
For example, we could define a feature [±a] which is
true if and only if head a is true. 
This kind of feature system is called an
EXCEPTION-MARKING FEATURE SYSTEM (EMFS) because it is
exceptions to identified constraints which define all but
the most basic features. 
Underspecification 
In EMFSs the number of features is likely to be much
larger than in traditional systems. On the other hand,
each of the features corresponds to either a segment or
a phonological constraint, so the system
as a whole is ontologically simpler than an APFS.
Nevertheless, unless some method of compression is used,
EMFSs will demand verbose lexical forms. Two types
of compression are familiar to, though seldom
distinguished by, phonologists: redundancy and defaults². In
terms of model theory the distinction is clear. Redun- 
dancy rules have no effect on the interpretation function 
I, while defaults modify it. This section discusses un- 
derspecification that eliminates redundancy. The next 
section discusses defaults. 
A predicate φ is FULLY SPECIFIED FOR another
predicate ψ if either φ is more specific than ψ, that is,
I(φ) = I(φ) ∩ I(ψ), or φ contradicts ψ, I(φ) ∩ I(ψ) = ∅.
A FULLY SPECIFIED predicate is one which is fully
specified for all other predicates.
Intuitively, a fully specified predicate is one which is 
indivisible. There is no extra restriction which can be 
imposed which will make it more specific; it can only 
be contradicted. If φ is a fully specified predicate, then
there is no point in adding further information to it. 
If the interpretation function I is computable, then 
each feature value at each position in a fully-specified 
structure can be calculated. If the conjunction of the 
feature predicate with the structure has a null inter- 
pretation, then the feature is false, otherwise it is true. 
Consequently, so long as a predicate remains fully speci- 
fied, any feature specifications which are removed from 
it can be recovered. 
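Both notions can be stated directly over interpretations represented as sets. The following Python sketch is a minimal illustration, assuming a toy world that contains just the eight Turkish vowels; the function names are hypothetical.

```python
VOWELS = frozenset("aeıioöuü")
FRONT = frozenset("eiöü")      # interpretation of the feature predicate front

def fully_specified_for(phi, psi):
    """phi is more specific than psi (a subset of it) or contradicts it (disjoint)."""
    return phi <= psi or phi.isdisjoint(psi)

def fully_specified(phi, predicates):
    """phi is fully specified for every predicate in the collection."""
    return all(fully_specified_for(phi, psi) for psi in predicates)

def feature_value(structure, feature):
    """Recover a feature value from a fully specified structure: the value is
    false exactly when the conjunction has a null interpretation."""
    return not structure.isdisjoint(feature)

e_only = frozenset("e")        # a fully specified structure
a_or_e = frozenset("ae")       # not specified for front
```

Here fully_specified_for(e_only, FRONT) holds and feature_value(e_only, FRONT) recovers [+front], whereas a_or_e is not fully specified for FRONT, so no value can be read off it.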
In APFSs, the constraints associated with features 
will not be very interesting. When the features are con- 
textual constraints, however, regaining the full specifi- 
cation amounts to a process of phonological derivation 
albeit one of simultaneous application of constraints. 
Let us utilise the Turkish vowel set for another
example. Suppose each vowel is assigned a feature, and so
is the vowel harmony constraint, (4). For each vowel,
√ marks the presence of the vowel, × its absence. The
same symbols mean the satisfaction of a constraint or
its failure. Table (7) shows redundant feature
specifications with a box around them. The example word
is severim 'I like'. Features for the consonants are not
shown for the sake of brevity.
(7)
                 s    e    v    e    r    i    m
Constraint (4)   √    √    √    √    √    √    √
a                ×    ×    ×   [×]   ×   [×]   ×
e                ×    √    ×    √    ×    ×    ×
ı                ×    ×    ×   [×]   ×   [×]   ×
i                ×    ×    ×    ×    ×    √    ×
o                ×    ×    ×   [×]   ×   [×]   ×
ö                ×    ×    ×    ×    ×    ×    ×
u                ×    ×    ×   [×]   ×   [×]   ×
ü                ×    ×    ×    ×    ×    ×    ×
Redundant (boxed) specifications are shown in square brackets.
² Calder & Bird (1991) make this distinction using the
GPSG-like terms feature-cooccurrence restrictions (FCRs)
and feature-specification defaults (FSDs).
Note that this is not the only possible selection of redun- 
dant specifications. If the vowel feature specifications 
are regarded as primary and non-redundant, then the 
constraint feature values can all be regarded as redun- 
dant. 
At this point we can define the declarative phonolo- 
gical formalism we are evaluating. It is an EMFS with 
redundant features removed, called Exception Theory 
(ET). 
Defaults 
Identifying fully specified predicates allows us to com- 
press representations by removing predictable specifi- 
cations from predicates. This compression method can 
be enhanced by modifying the interpretation function
so that more predicates are fully specified. 
A DEFAULT is defined in terms of a special predicate 
which will not need to be specified in individual repre- 
sentations. A representation will be conjoined with the 
default predicate unless it is already fully specified for 
it. 
There may be a number of default predicates in a 
default system. For this reason the formal definition of 
the effect of defaults on the interpretation function has 
the recursive structure shown in (8): 
(8) I_αδ(φ) = I_α(φ)
                 if φ is fully specified for δ wrt I_α,
              or I_α(φ) ∩ I_α(δ) otherwise.
Each default predicate specifies its action at only one 
position in the structure. If the default is to apply at 
many positions in a structure, more default predicates 
must be added to cover each position in the structure. 
For example, take the default predicate δ to be the
feature [-front], equivalent to the predicate head (a ∨
ı ∨ o ∨ u). Let φ be the partial specification for kızları
'(her) girls' in which each vowel is underspecified for the
feature front. Then the interpretation I_[-front](φ) of
φ subject to the default δ applied at the location a
contains only the four forms kızları, kizları, kızlari
and kizlari. Forms such as kizleri are ruled out by
the default at the a position.
To make the same default restriction at the other 
vowels in the word, we would need other defaults
such as left left δ and right right δ.
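The effect of a single default can be sketched in Python with interpretations represented as sets of word forms. The sketch assumes a toy universe of forms kVzlVrV over the eight Turkish vowels, and it locates the a position by string index; both choices, like the name apply_default, are illustrative assumptions.

```python
from itertools import product

VOWELS = "aeıioöuü"
BACK = "aıou"                   # head (a ∨ ı ∨ o ∨ u), i.e. [-front]

def apply_default(phi, delta):
    """Conjoin phi with the default delta unless phi is already
    fully specified for delta (a subset of it, or disjoint from it)."""
    if phi <= delta or phi.isdisjoint(delta):
        return phi
    return phi & delta

# Toy universe: every form kVzlVrV.
UNIVERSE = {f"k{v1}zl{v2}r{v3}" for v1, v2, v3 in product(VOWELS, repeat=3)}

# phi: each vowel of the word is underspecified for front.
phi = {f"k{v1}zl{v2}r{v3}" for v1, v2, v3 in product("ıi", "ae", "ıi")}

# delta: [-front] at the middle vowel (string index 4).
delta = {w for w in UNIVERSE if w[4] in BACK}

result = apply_default(phi, delta)
```

With these definitions result holds exactly the four forms listed above; the first and last vowels remain undetermined because the default says nothing about their positions.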
28 
Default ordering 
Applying defaults is not necessarily commutative. One 
default may preclude the action of another. Consider
the case where two feature values [-front] and [+front]
are imposed as defaults to the completely unspecified
predicate true. Because true is not fully specified for
either [-front] or [+front], these defaults add
specifications to the predicate: I_[-front](true) is I([-front]) while
I_[+front](true) is I([+front]). But [-front] is fully
specified for [+front] (and vice-versa), so adding [+front] (or
[-front]) as a default will have no effect on the
interpretation. Thus the two orderings of the defaults produce
conflicting interpretations.
(9) I_[+front][-front](true)
      = I_[-front]([+front])
      = I([+front])
      ≠ I([-front])
      = I_[+front]([-front])
      = I_[-front][+front](true)
Since the two orderings produce different results, a de- 
cision about the ordering of defaults must be made. 
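The non-commutativity in (9) can be checked mechanically. The Python sketch below assumes a toy world of the eight Turkish vowels and reads a sequence of defaults as applying from first to last; the helper names are hypothetical.

```python
VOWELS = frozenset("aeıioöuü")          # interpretation of the predicate true
PLUS_FRONT = frozenset("eiöü")          # I([+front])
MINUS_FRONT = VOWELS - PLUS_FRONT       # I([-front])

def apply_default(phi, delta):
    """Conjoin phi with delta unless phi is already fully specified for it."""
    if phi <= delta or phi.isdisjoint(delta):
        return phi
    return phi & delta

def apply_defaults(phi, defaults):
    """Apply a sequence of defaults, the first one in the list first."""
    for delta in defaults:
        phi = apply_default(phi, delta)
    return phi

front_first = apply_defaults(VOWELS, [PLUS_FRONT, MINUS_FRONT])
back_first = apply_defaults(VOWELS, [MINUS_FRONT, PLUS_FRONT])
```

Whichever default applies first fixes the value, and the second then has no effect: front_first is I([+front]) and back_first is I([-front]).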
Default Ordering Schemes 
Ordering 
Defaults need to be ordered. There are a number of
ways that the ordering of groups of defaults can be spe- 
cified. Three of these are presented here. 
Ordering by feature 
One method for ordering defaults is to order the fea- 
tures they instantiate. We begin with an ordering on 
the features, so that, for example, feature [+F] has
higher priority than feature [+G], in symbols [+F] ≻ [+G].
This ordering on features can then be extended to an
ordering on defaults specified with those features. 
Suppose p and q are paths in string structure,
composed of sequences of left and right functors. Then
for any defaults filling in predicates δ = p[+F] and
ε = q[+G], δ is ordered before ε if and only if [+F]
has higher priority than [+G].
Suppose a language is analysed as imposing a
higher priority default that front vowels cannot occur
after round vowels. Assume that the defaults insert the
features [+front] and [+round] in all positions. Given a
form kVtV where V represents the completely
uninstantiated vowel, there are two different instantiations
depending on the ordering of the two features. If the
[+front] default applies first, then the resulting form will
be k[+front]t[+front, +round]. If, on the other hand, the
[+round] default applies first, the derived form will be
k[+front, +round]t[+round].
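The kVtV example can be worked through with a small Python sketch. Several things are assumptions of the sketch rather than claims of the text: each vowel is encoded as a pair of booleans (front, round), the higher priority restriction is applied before all other defaults, and defaults for one feature apply to the first vowel before the second. The names are illustrative.

```python
from itertools import product

FRONT, ROUND = 0, 1

# A form is a pair of vowels; each vowel is a pair (front, round) of booleans.
VOWEL_VALUES = list(product([True, False], repeat=2))
ALL_FORMS = set(product(VOWEL_VALUES, repeat=2))

def apply_default(phi, delta):
    """Conjoin phi with delta unless phi is already fully specified for it."""
    if phi <= delta or phi.isdisjoint(delta):
        return phi
    return phi & delta

def feature_default(position, feature):
    """The default inserting a positive feature value at one vowel position."""
    return {form for form in ALL_FORMS if form[position][feature]}

# Higher priority restriction: no front vowel after a round vowel.
NO_FRONT_AFTER_ROUND = {
    form for form in ALL_FORMS if not (form[0][ROUND] and form[1][FRONT])
}

def instantiate(feature_order):
    """Apply the restriction, then the feature defaults in the given order."""
    phi = apply_default(set(ALL_FORMS), NO_FRONT_AFTER_ROUND)
    for feature in feature_order:
        for position in (0, 1):
            phi = apply_default(phi, feature_default(position, feature))
    return phi

front_first = instantiate([FRONT, ROUND])
round_first = instantiate([ROUND, FRONT])
```

In this encoding front_first leaves a front unrounded first vowel followed by a front rounded second vowel, while round_first leaves a front rounded first vowel followed by a back rounded second vowel, so the two feature orderings yield different forms.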
Ordering by failure count 
Another approach orders defaults instantiating the 
same feature in different positions. The preferred de- 
fault minimises the number of contradictions to the de- 
fault feature value. 
Suppose the default feature value to be ordered is 
[+F]. The failure count default ordering mechanism
uses a default predicate for each possible number of
exceptions. The predicates, δ_i, are defined in (10).
(10) δ_i   = ∨_{j+k=i} (right δ^←_j ∧ δ^→_k)
     δ^←_0 = left (null ∨ δ^←_0 ∧ [+F])
     δ^←_i = left (null ∨ δ^←_i ∧ [+F] ∨ δ^←_{i-1} ∧ [-F])
     δ^→_0 = right (null ∨ δ^→_0 ∧ [+F])
     δ^→_i = right (null ∨ δ^→_i ∧ [+F] ∨ δ^→_{i-1} ∧ [-F])
If δ_i is compatible with a predicate φ, then there is a
fully-specified restriction on φ which has no more than
i occurrences of [-F]. The ordering on the defaults is
imposed by requiring that for any feature [+F], with
the corresponding predicates δ_i, δ_i has priority over δ_j
iff i < j.
Suppose we already have a number of higher prio- 
rity constraints on stress: that it can only be assigned 
once and in only one position within a syllable, and 
that consecutive syllables cannot be stressed.
Collapsing the representation of syllables into a single symbol
σ for convenience, table (11) gives the assignment of
stress to a number of partially specified
representations. The default feature is [+Stress], and this is applied
to minimise the number of failures.
(11) [Table: three partially specified stress patterns φ over syllables σ, each shown with its [±Stress] values before and after the defaults apply.]
Ordering by position 
Another possibility is to order defaults by how far away 
from the starting position they specify their features. 
There are two simple ways of relating distance to prio- 
rity: closer means higher priority, or further away me- 
ans higher priority. 
The formal definitions for this kind of default
ordering are straightforward. Suppose, once again, that [+F]
is the feature value to be filled in by the defaults. Now,
δ_i will denote the specification of a default value at a
distance of i functors to the left, or i to the right of the
starting position.
(12) δ_i       = right δ^←_i ∧ δ^→_i
     δ^←_0     = [+F] = δ^→_0
     δ^←_{i+1} = left δ^←_i ∨ null
     δ^→_{i+1} = right δ^→_i ∨ null
To prefer near defaults, prefer δ_i over δ_j when i < j.
For far defaults, do the reverse. 
Directional default preferences mimic the application
of phonological rules in a left-to-right or right-to-left
direction. Using this ordering, directional defaults can
restrict some structures which the counting defaults
cannot. Consider once again the stress assignments by
defaults in table (11). Instead of simply trying to
maximise the number of stresses, assume that the starting
position is the left end of the word, and that near
stresses are given priority. Under this system of defaults,
the first of the three underspecified representations is 
rendered more specific, while the other two make the 
same restriction. These results are shown in table (13). 
(13) [Table: the three partially specified stress patterns of (11), each shown with its [±Stress] values before and after the directional defaults apply.]
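The contrast between counting and directional defaults can be illustrated with a Python sketch. It is a simplification under stated assumptions: a word is a string over "+" (stressed) and "-" (unstressed) syllables, "?" marks an unspecified syllable, the only strict constraint modelled is that consecutive syllables cannot both be stressed, and the example inputs are invented rather than taken from tables (11) and (13).

```python
from itertools import product

def apply_default(phi, delta):
    """Conjoin phi with delta unless phi is already fully specified for it."""
    if phi <= delta or phi.isdisjoint(delta):
        return phi
    return phi & delta

def all_patterns(n):
    """Every stress pattern over n syllables."""
    return {"".join(p) for p in product("+-", repeat=n)}

def candidates(partial):
    """Patterns matching the partial specification with no adjacent stresses."""
    options = ["+-" if c == "?" else c for c in partial]
    forms = {"".join(p) for p in product(*options)}
    return {f for f in forms if "++" not in f}

def by_failure_count(partial):
    """Defaults 'at most i unstressed syllables', ordered by increasing i."""
    phi, n = candidates(partial), len(partial)
    for i in range(n + 1):
        delta_i = {f for f in all_patterns(n) if f.count("-") <= i}
        phi = apply_default(phi, delta_i)
    return phi

def by_position(partial):
    """Defaults '[+Stress] at syllable i', nearest to the left edge first."""
    phi, n = candidates(partial), len(partial)
    for i in range(n):
        delta_i = {f for f in all_patterns(n) if f[i] == "+"}
        phi = apply_default(phi, delta_i)
    return phi
```

For a fully unspecified word of four syllables, by_failure_count("????") leaves three patterns with two stresses each, whereas by_position("????") narrows the set to the single pattern "+-+-"; for five syllables both orderings agree on "+-+-+".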
Three Theories 
Underspecification Theory 
Within the framework given above, it is possible to 
define a form of Underspecification Theory. What is 
described here is not precisely the Underspecification 
Theory of Archangeli (1984), differing in that the struc- 
tures described are linear and segmental. This is, ho- 
wever, not a necessary limitation of the framework, and 
the definition of underspecification theory presented
here could be applied to autosegmental representations 
if suitable types and functors were defined for them. 
In UT, lexical specifications are made in terms of an 
a priori fixed set of features. For example, Archangeli & 
Pulleyblank (1989) use the four features [±high], [±low],
[±back] and [±ATR] to describe seven Yoruba vowels.
All lexical specifications of vowel quality are assumed to 
involve specifications for some subset of these features. 
In the lexical specifications, redundant information is 
left unmarked. The Yoruba vowel a does not need to 
be marked for any feature other than [+low], because
there is only one vowel which is [+low]. Consequently,
the feature values [+back], [-high] and [-ATR] are all
redundant. 
In UT, redundant features are filled by rule.
Special constraints, such as the Redundancy Rule
Ordering Constraint (Archangeli, 1984:85) ensure that
redundancy rules apply before the features they
instantiate are referred to. Furthermore, these constraints
apply as often as necessary (Archangeli & Pulleyblank, 
1989:209-210). This has the same effect as the auto- 
matic specification of redundant feature values in the 
current framework.
Only one type of feature value is ever lexically spe- 
cified in UT. Opposite feature values are filled in by 
default rules. This allows the feature specifications for 
some segments to be subspecifications of those for other 
segments.
Apart from the context-free features used in
lexical specifications, there are also context-sensitive
constraints which are regarded in UT as fully-fledged
phonological rules. For example, the Yoruba vowel
harmony rule can be summarised as a vowel on the left
of a [-ATR] vowel will also be [-ATR]. Regularity to
this constraint in one position may conflict with
regularity in another position. In UT, the defaults associated
with such constraints are ordered by position: Yoruba
vowel harmony applies right-to-left in the sense that
constraint applications further from the beginning of
the word have higher priority.
This directionality is not the only ordering of
defaults. As it happens, there are no [+high] vowels in
Yoruba which are also [-ATR]. Consequently, the
default rule marking vowels as [+high] can conflict with
the default that spreads [-ATR]. In the analysis of
Archangeli & Pulleyblank the [+high] default is ordered
first. All defaults constructed from the one feature have 
priority over all defaults built on the other. 
The general structure of UT, therefore, is to have an a 
priori limited set of features for lexical specification and 
a set of defaults for these features and for constraints. 
The defaults associated with each feature or constraint 
are ordered by position. 
Optimality Theory 
Optimality Theory (Prince & Smolensky, 1993) is ap- 
parently a very different theory, but, when classified in 
terms of its use of defaults, is actually quite similar. 
In contrast to UT, OT is deliberately vague about 
underlying representations. Instead of discussing the 
manipulation of representations directly, OT refers to 
their interpretations, terming them CANDIDATE SETS. 
Constraints in OT apply exactly like defaults. If they 
can be imposed without resulting in a contradiction 
(empty candidate set), then they are. Each constraint 
imposes a set of defaults, and these are primarily orde- 
red by an extrinsic ordering placed on the constraints. If 
any two defaults pertaining to two constraints conflict, 
the default of the higher order constraint is preferred. 
As with UT, there is the possibility that the
imposition of the same constraint at different locations
will conflict. Rather than ordering these defaults by po- 
sition, they are ordered by the number of exceptions to 
the constraint that they allow. If there is a candidate 
form with a certain number of exceptions, all candi- 
dates with more exceptions will be eliminated by the 
default. This ordering on defaults is the ORDERING BY 
FAILURE COUNT described earlier. 
Exception Theory 
In contrast to the other two, more standard, phonolo- 
gical theories, Exception Theory does not use defaults. 
In ET, each lexical form is fully specified, and any
feature in it may be removed so long as this property is
preserved. 
The set of features includes a feature for each
segment type, and a feature for each constraint. While
this results in a large set of features, underspecification 
of redundant features means that many feature specifi- 
cations may be eliminated. Nevertheless, there will be 
more feature specifications needed in ET than in, for 
example, UT, because of the absence of default values. 
On the other hand, because ET uses no defaults, 
there is no need for any form of constraint or rule or- 
dering. All features have an immediate interpretation 
through the interpretation function, and so a minimum 
of computation is needed to identify the denotation of 
a representation. 
Summary 
Table (14) summarises the attributes of the three
theories. UT and OT are primarily distinguished by the use
of different methods to order defaults built from con- 
straints. ET differs in that it does not use defaults at 
all. 
(14)
                     UT          OT          ET
A priori features    √           ×           ×
Defaults             √           √           ×
  By Feature         primary     primary     ×
  By Failure Count   ×           secondary   ×
  By Position        secondary   ×           ×
Discussion 
Early in this paper, four desiderata for morphophono- 
logical theories were introduced. This section considers 
whether using defaults is advantageous with respect to 
these desiderata. 
Conciseness 
The first desideratum sought concise lexical representa- 
tions for morphemes. Since default-based theories can 
also exploit underspecification of redundant feature va- 
lues, they are at least as concise as non-default theories. 
If there are ever contrastive feature specifications, then 
they are more concise, allowing one side of the contrast 
to be left as a default value to be instantiated.
Note that the concept of conciseness which is being 
used here is feature-counting, not an information-theoretic
measure. In a direct application of
information theory, contrasting a [+F] feature value with
whitespace carries as much information as contrasting it
with [-F]³.
Abstracting and recovering morphemes 
Defanlts also provide advantages in abstracting mor- 
pheme representations from which allomorphs can be 
aIt may be possible, nevertheless, to provide an infor- 
mation theoretic basis for the feature-counting notion by 
couching the feature specifications in a suitable descriptive 
language. 
recovered. As well as making representations more 
concise, using defaults allows more allomorphs to be 
brought together within a single phonological represen- 
tation. As there are no feature changing rules in tile 
framework, all feature values in the abstract represen- 
tation must survive to the surface in ca.oh allom,~rl~h. 
Conversely, the abstract representation can only con- 
tain feature specifications common to all of the allo- 
morphs. So the upper bound on feature specifications 
for the abstract morpheme is the is the intersection of 
the featural specifications for all of the allomorphs of 
the morpheme. 
As an example, consider four allomorphs of the Turkish
second person plural possessive suffix: -ınız, -iniz,
-unuz and -ünüz. If the vowels are specified with the
three features [±front], [±round] and [±high], then the
intersection of the specifications of the four allomorphs
is the sequence [+high]n[+high]z.
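
The intersection for the Turkish example can be computed mechanically. The following is a minimal sketch, not part of the formalism itself: the vowel feature values are the standard ones for Turkish high vowels and are assumed here, consonants are treated as a single segment-type feature, and the helper names `specify`, `intersect` and `render` are hypothetical.

```python
# Assumed feature values for the four Turkish high vowels.
VOWELS = {
    "\u0131": {"front": "-", "round": "-", "high": "+"},  # dotless i
    "i": {"front": "+", "round": "-", "high": "+"},
    "u": {"front": "-", "round": "+", "high": "+"},
    "\u00fc": {"front": "+", "round": "+", "high": "+"},  # u with diaeresis
}

# The four allomorphs, written with escapes to avoid encoding ambiguity.
ALLOMORPHS = ["\u0131n\u0131z", "iniz", "unuz", "\u00fcn\u00fcz"]


def specify(form):
    """Map each segment of a form to a set of (feature, value) pairs."""
    return [
        set(VOWELS[seg].items()) if seg in VOWELS else {("segment", seg)}
        for seg in form
    ]


def intersect(forms):
    """Position-wise intersection of the specifications of equal-length forms."""
    specs = [specify(form) for form in forms]
    return [set.intersection(*position) for position in zip(*specs)]


def render(spec):
    """Print consonants as themselves and vowel features in brackets."""
    out = []
    for position in spec:
        for feature, value in sorted(position):
            out.append(value if feature == "segment" else f"[{value}{feature}]")
    return "".join(out)


print(render(intersect(ALLOMORPHS)))  # [+high]n[+high]z
```

Only [+high] survives in the vowel positions, because the four allomorphs disagree on both [±front] and [±round].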
While it is always possible to form abstract represen- 
tations by intersecting feature values (the second de- 
sideratum), there is no guarantee that the allomorphs 
will be readily recoverable (third desideratum). If they 
are not recoverable, then there is no single featural ge- 
neralisation which captures the phonological structure 
of the morphemes. 
One important question is whether defaults allow
recoverable generalisations about a greater range of
morphemes than non-default representations. The answer
is yes. If the morphological alternation is one-dimensional,
however, there is no difference between having defaults
and not. Suppose δ is a default predicate, and, equally, an
exception feature. If all allomorphs are specified [+δ] then
the abstraction will share this feature, and so the default
does not need to apply. Similarly, if all allomorphs are
specified [-δ], so will the abstract forms be, and the default
cannot apply. If the allomorphs vary in their specification
for [±δ], then the abstraction will not include a specification
for this feature. Consequently, the default will specify [+δ]
when the correct value is [-δ], and so fail to produce the
correct result. In the non-default interpretation, the
representation is never fully specified.
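
The one-dimensional argument can be checked by enumerating the three cases. This is a minimal sketch under the assumption that the default value is [+δ] and that `None` stands for an unspecified feature; the function names are hypothetical.

```python
DEFAULT = "+"  # assumed default value for the feature delta


def abstract(values):
    """Intersect allomorph values: keep a value only if all allomorphs share it."""
    return values[0] if len(set(values)) == 1 else None  # None: unspecified


def recoverable(values, use_default):
    """True if the single abstract form yields the right value for every allomorph."""
    spec = abstract(values)
    if spec is None and use_default:
        spec = DEFAULT
    return spec is not None and all(spec == v for v in values)


for allomorphs in (["+", "+"], ["-", "-"], ["+", "-"]):
    print(allomorphs, recoverable(allomorphs, True), recoverable(allomorphs, False))
```

In each of the three cases the answer is the same with and without the default: uniform alternations are recoverable either way, and a varying alternation is recoverable in neither.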
On the other hand, if the morphological alternations
form a two-dimensional paradigm, then it is possible
that the paradigm might be decomposable into morphemes
only with the use of defaults. Suppose, once again, that δ
is a default predicate and exception feature. The default
feature value is [+δ]. Suppose further that there is a
paradigm with the feature specification for [±δ] shown in (15).
(15)
          [-δ]   [0δ]
   [-δ]   [-δ]   [-δ]
   [0δ]   [-δ]   [+δ]
The margins show the 'morphemes' extracted by intersecting
the feature values. The conjunction of the two [0δ]
specifications is not fully specified for δ, and so its
direct interpretation does not recover the corresponding
component of the paradigm. If, however, the default [+δ]
is applied, the full specification of the paradigm is
recovered.
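
The decomposition of paradigm (15) can be sketched in a few lines. The encoding is an assumption made for illustration: "0" stands for an unspecified value, the default is [+δ], and the helpers `intersect`, `conjoin` and `rebuild` are hypothetical names.

```python
DEFAULT = "+"  # assumed default value
UNSPEC = "0"   # stands for an unspecified feature

# Cells of paradigm (15), without the margins.
PARADIGM = [["-", "-"],
            ["-", "+"]]


def intersect(values):
    """A margin keeps a value only if every cell in its row or column shares it."""
    return values[0] if len(set(values)) == 1 else UNSPEC


def conjoin(a, b):
    """Combine a row margin and a column margin for one cell."""
    if a == UNSPEC:
        return b
    if b == UNSPEC or a == b:
        return a
    raise ValueError("conflicting specifications")


def rebuild(paradigm, use_default):
    """Extract the margins, recombine them, and optionally apply the default."""
    rows = [intersect(row) for row in paradigm]
    cols = [intersect(col) for col in zip(*paradigm)]
    cells = [[conjoin(r, c) for c in cols] for r in rows]
    if use_default:
        cells = [[DEFAULT if v == UNSPEC else v for v in row] for row in cells]
    return cells


print(rebuild(PARADIGM, use_default=False))  # [['-', '-'], ['-', '0']]
print(rebuild(PARADIGM, use_default=True))   # [['-', '-'], ['-', '+']]
```

Without the default, the bottom-right cell stays unspecified; with it, the rebuilt paradigm equals the original.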
So it is possible to have paradigms where the morphological
components cannot be assigned common phonological
representations without the use of defaults.⁴
A priori specifications 
The final desideratum is the avoidance of a priori in- 
formation in a model. UT makes use of an a priori set 
of features for lexical specification. As other generali- 
sations in the formalism are only visible insofar as they 
affect the values of these features, this limits the pos- 
sible constraints which can be identified. This is the 
reason why vowel harmonies such as that of Nez Perce 
are so problematic for phonologists⁵: the sets of vowels
used in the harmony do not have a neat definition in 
terms of traditional features. 
Greater claims about a priori features are made in 
OT. Prince & Smolensky (1993:3) state that "constraints
are essentially universal and of very general formulation
... interlinguistic differences arise from the permutation
of constraint-ranking". In other words, all of the predicates
which define features in OT are prior to the analysis
of an individual language. 
In ET, very little is assumed a priori. Any constraint 
which captures interesting phonological generalisations 
about the phonology defines a feature which can be used 
to specify structure. Because ET does not use defaults, 
it need not be concerned with ordering constraints, only 
with finding them. Consequently, interlinguistic diffe- 
rences can only result from distinct sets of constraints. 
Conclusion 
In this paper I have presented a rigorous framework for 
characterising theories that use defaults with phonolo- 
gical structure. The framework provides a straightfor- 
ward characterisation of Underspecification Theory and 
Optimality Theory in terms of the action of defaults. 
Using this framework, I have shown that non-default
theories cannot be sure of capturing all of the generalisations
which are available to default theories. For this
reason, the non-default constraints learnt by programs
such as those described by Ellison (1992, forthcoming),
are not as powerful for morphophonological analysis as 
default-based theories. Furthermore, defaults lead to 
more concise, and consequently preferable, lexical re- 
presentations. 
⁴ If general predicates are permitted for specifying morphemes,
rather than just featural specifications, the distinction
between default and non-default systems disappears.
If the entries in the paradigm are $\delta_{ij}$, define $\alpha_i$ to be
$\bigvee_j \delta_{ij}$ and $\beta_j$ to be $\bigwedge_i (\delta_{ij} \vee \neg\alpha_i)$.
Then, so long as the $\alpha_i$ are distinct (which will be the case
if the $\delta_{ij}$ are all distinct), the paradigm will be fully
recoverable without defaults.
⁵ Anderson & Durand (1988) discuss some of this
literature.
The question, therefore, is how to enhance the lear- 
ning algorithms to involve the use of defaults. The in- 
troduction of defaults means that constraints must be 
ordered; so learning must not only discover the right 
constraint, it must assign it a priority relative to other 
constraints. This makes the learning task considerably
more complicated. However difficult a solution for this
problem is to find, it will be necessary before
machine-generated analyses can be sure of competing
successfully with man-made analyses.
Acknowledgements 
This research was funded by the U.K. Science and En- 
gineering Research Council, under grant GR/G-22084 
Computational Phonology: A Constraint-Based Ap- 
proach. I am grateful to Richard Sproat and Michael 
Gasser for their comments on an earlier version of this 
paper. 

References 
Anderson, J. & Durand, J. (1988). Vowel harmony and 
non-specification in Nez Perce. In H. van der Hulst
& N. Smith (Eds.), Features, Segmental Structure and
Harmony Processes (Part II) (pp. 1-17). Foris.
Archangeli, D. (1984). Underspecification in Yawelmani
Phonology and Morphology. PhD thesis, Massachusetts 
Institute of Technology. 
Archangeli, D. & Pulleyblank, D. (1989). Yoruba vowel 
harmony. Linguistic Inquiry, 20, 173-217. 
Asmis, E. (1984). Epicurus' Scientific Method. Ithaca, NY:
Cornell University Press. 
Calder, J. & Bird, S. (1991). Defaults in underspecification 
phonology. In S. Bird (Ed.), Declarative Perspectives 
on Phonology (pp. 107-125). University of Edinburgh. 
Ellison, T. M. (1992). The Machine Learning of Phono- 
logical Structure. PhD thesis, University of Western 
Australia, Perth. 
Ellison, T. M. (1994). The iterative learning of phonolo- 
gical rules. Technical report (forthcoming), Cognitive 
Science, University of Edinburgh. 
Prince, A. S. & Smolensky, P. (1993). Optimality Theory: 
Constraint Interaction in Generative Grammar. Tech- 
nical Report 2, Center for Cognitive Science, Rutgers 
University. 
