Directional Constraint Evaluation in Optimality Theory* 
Jason Eisner 
Dei)art\]nellt of Computer Scien(:e / University of Rochester 
l~,ochester, NY 14607-0226 (U.S.A.) / jason@cs .rochester.edu 
Weighted finite-state constraints that can count 1111- 
boundedly many violations make Optimality Theory 
more powerful than tinite-state transduction (Frank 
and Satta, 1998). This result is empirically mM (:om- 
t)ul, a.tionally awkwm'd. We 1)rol)ose replacing l;h(:se 
mlbounded constrainl;s, ;Is well as non-filfite-state 
Gtmeralized Aligmnent constraillts, with a new class 
of finite-state dirc'ctional constraints. We give lin- 
guistic ai)plications , l'esult, s on generative, power, and 
algorithms to comt)ile grmmna.rs into tl.'allSdllc(Ws. 
1 Introduction 
()l)tinmlity Theory is a gl'aanlnitr framework 
thnt directly ext)resses constraints on 1)honolog- 
ical fbrms. I/,oughly, tim grmnlnm" t)l'et¢~rs Ji)rlns 
thnt violate ea(:h constraint as little as I)ossil)le. 
Most (:onstrainl;s used in t)l"a(:tic(~ des(:ril)e 
disfavored local eontigurntions in the l)honolog- 
ieM tbrm (Eisner, 1997; 0. It is thel"etbre l)ossi- 
l)le tbr a, given tbrm to ottimd a single constraint 
at st',vcral locations in the fbrm. (l or example, 
a constraint against syllable codas will l)e ()t'- 
MMed by every syllM)le, that has n (:oda.) 
When eomt)m'ing tbrms, then, how do we ~Lg- 
greg~te ~ tbrm's multil)le lo(:al offenses into a,n 
overall violation level? 
A (:onstraint (:ould answer this question ill nt 
least three, wi~ys, the, third being our proposM: 
• Unbounded evaluation (l'rim:e mM 
Smolensky, 1993). A tbrnl's viob~tion le, vel 
is given 1)y the munber of ott'enses. Forms 
with fewer ot\[balses m'e t)reti~rre(1. 
• Bounded evaluation (Frank and Satta, 
1998; Karttmmn~ 1998). A tbrm's viola- 
lion level is lnill(k, nunlber of offenses) fbr 
8Ollle \]i;. This is like mfl)ounded evalu~U;ion 
excet)t l;h~t the COllStrMnt does not; disl;in- 
guish among tbrms wil;h > ti: ofl'enses. 1 
• Directional evaluation. A tbrm's vio- 
lation level coxisiders the locntion of of- 
fimses, not their totM nunll)er. Under left- 
* l am grateful to the 3 mmnymous referees tbr tbx,xtback. 
1Nol;e that k = 1 gives "l)inary" constraints that can 
1)e des(:ril)ed siml)ly as s. Any/,:-tramMed con- 
straint ca.n easily be simulated by k binary constraints. 
to-right ewduation, the constraint pret~rs 
tbrms whose ofl~nses are as late as possible. 
To (:omp;~re two tbrms, it aligns tlmm (ac- 
cording to l;heir (:omnlon underlying repre- 
senl;ation), and scans theln in 1)arMlel from 
M'I; to right, stol)ping at the first loe~tion 
where one form hns ml offense and the ()tiler 
does not ("sudden death"); it; pret~rs the 
1;~tter. Right-toqeft evMmttion is similar. 
~2 of this paper gives linguistic and conqmta- 
tional motivation tbr the 1)rol)osal. §3 tbrmMizes 
the idea trod shows thai; composin 9 a transducer 
'with a directional (:on,straint yields a transducer. 
\[Phus direc, l;ional constraints, like bouuded ones, 
kee t) (YI' within the, (:lass of regular relntions. 
(\]iut we Mso show them to be more exl)ressive. ) 
2 Motivation 
2.1 Intuitions 
RecM1 thnt ()T's constraint rmlking mechanism 
is ml answer to tim question: Ilow (:;m a gl'gd311- 
mar ewthml;e ;t ti)rnl \])y aggreg;d;ing its viola- 
lions of severM COllst;ra,ints'? Above we asked 
the Salne question at a liner some: How cm~ a 
(:onsl, raint evaluate a torm by aggregating its o5 
timses ~tl; several loe~tions? Figure I illustrates 
I;h;~l; Ollr itllSWel: is jllsl; eonstrMnt rm~king l'e(hlx. 
l)ireetional ewdual;ion sl;riel;ly ranks the im- 
1)ortnnt:e of the locations within ~ tbrm, e.g., 
from left; to right. This exeml)lilies ()T's "do 
only when necessary" strategy: l;he constraint 
preti~rs to postpone oItimses until the, y t)ecome 
strictly necessary toward tile right of the tbrm, 
even at the, cost of having more of them. 
One might l;hink from Figure 1 that each di- 
rect|ohm constraint could be decomposed into 
severnl binary or other bounded constrMnts, 
yielding ;t grmnnmr using only bounded con- 
straints. However, no single such grammar is 
general enough to handle M1 inputs: the nun> 
her of constraints needed tbr the decomposition 
corresponds to the length (i.e., the number of 
locntions) of the mMerlying represelltation. 
257 
ban.to.di.bo 
ban.ton.di.bo 
ban.to.dim.boa 
ban.ton.dim.bon 
(a) II INOCODA 
.1 
g~ ** 
***\[ 
***I* 
• ,. ,,. .......... 
• ,, ~ ..... 
1~,5" * *1 
• * v * u * 
Figure 1: Directional evaluation as subconstraint ranking. All candidates lmve 4 syllables; we simplify here 
by regarding these as the locations. C1 is some high-rm~ked constraint that eliminates ban.to.di.bo; No CoDA 
is oflbnded by syllable codas. (a) %'aditional unbounded evaluation of NoCoDA. (b) Left-to-right evahlation 
of NOCODA, shown as if it were split into 4 constraints evaluating the syllables separately. (c) I/.ight-to-left. 
2.2 Iterative and floating phenomena 
The main empMcal motiw~tion fbr direction- 
ally evaluated constraints is the existence of "it- 
erative" t)henomena such as metrical footing. 
(Derivational theories described these with pro- 
cedures that scanned a fbrm fi'om one end to 
the other and modified it; see (Johnson, 1972).) 
For most other phenomena, directional con- 
straints are indistinguishable fl'om traditional 
unbounded constraints. Usually, the candidates 
with the tbwest ofiimses are still the ones that 
sm'vive. (Since their competitors offend at ex- 
actly the same locations, and more.) This is 
precisely 1)ecause most phonology is locah sat- 
is(ring a constraint at one location does not usu- 
ally block satisi)ing it at another. 
Distinguishing cases, like the artificial Fig. 
1 where the constraint can only trade offenses 
at one location tbr oflbnses 21; anothm .... arise 
only under special conditions involving non- 
local t)henomena. Just as directional evaluation 
would predict, such a tbrced tr~(teoff is always 
resolved (to our knowledge) by placing offenses 
as late, or as em'ly, as higher constraints allow: 
• Prosodic groupings tbrce each segment or 
syllable 1;o choose which constituent (if 
any) to associate with. So-called left-to- 
right directional syllabification (Mester and 
Padgett, 1994) will syllabit~ /CVCCCV/ 
greedily as CVC.CV.CV rather than 
CV.CVC.CV, postponing epenthetic ma- 
terial mltil as late as possible. Simi- 
larly, left-to-right binary tholing (Hayes, 
1995) prefers (aa)(~a)a over a(acr)(arr)or 
(cra)a(~ra), postponing mffboted syllables. 
• Floating lexical material must surface 
somewhere in the ibrm. Floating features 
(e.g., tone) tend to dock at the let*most 
or rightmost available site, postponing the 
apt)earance of these marked t\[~atures. In- 
fixed morphemes tend to be infixed as little 
as possible (McCarthy and Prince, 1995), 
postI)oning the appearance of an affix edge 
or other aflIix material within the stem. 2 
• Floating non-lexical material must also ap- 
pear somewhere. If a high-ranked con- 
straint, CULMINATIVITY~ requires that a 
primary stress mark appear on each word, 
then a directional constraint against pri- 
mary stress will not only prevent additional 
marks but also trash the single mark to the 
first or last available syllable the tradi- 
tional "End Rule" (Prince, 1983). 
• Ilarmony must decide how fin" to spread 
features, and OCP ~ccts such as Grass- 
man's Law must decide which copies of 
a feature to eliminate. Again, directional 
ewfluation seems to capture the facts. 
2.3 Why not Generalized Alignment? 
In OT, ibllowing a remark by Robert Kirch- 
her, it has been traditional to analyze such t)he- 
nolnena using highly non-local Generalized 
Alignment (GA) constraints (McCarthy and 
Prince, 1993). For example, left-to-right foot- 
ing is thvored by A LIGN-LEF'ro~(Foot, Stem), 
which requires every ibot to be left-aligned 
with a morphological stem. Not only does 
each misaligned tbot offend the constraint, but 
the seriousness of its offense is given by the 
2 "Available site" and "l)ossible '' amount of infixation 
are defined here by higher-ranked constraints. These 
might restrict the allowed tone-bearing units and the 
allowed CV shape aider infixation, lint do not flflly de- 
termine where the floating material will surface. 
A referee asks why codas do not also float (to postpone 
NOCODA offenses). Answer: Flotation requires unusual, 
non-local mechanisnls, gen or a constraint ntltst ellstlre 
that an mlchored tone sequence resembles the mlderlying 
floating tone sequence, which may be represented on an 
auxiliary intmt tape or (if bounded) as an inlmt pretix. 
But ordinary ihithflflness constraints check only whether 
mMerlying material surfaces locally; they would l)enalize 
coda flotation as a local deletion plus a local insertion. 
258 
mmfl)er ()f syllal)les t)y which i(; is misaligned. 
Tlmse mmfl)ers nre sumnmd over nil otlhnding 
tk'.el; to ol)l;~ili tim violation level. \]'i)r (:x~m> 
t)le, \[(r(cro)(o(y)c~(o-o)\],s,t,,,, has 1+3+6=10 vio- 
lations, a,nd \[crcmo(c.ro-)(o~cr)\] st,~m is (;q,,ally 1)ad 
at 4+6=10 violntions. Shifting thor l(;ftward or 
(;limin~fing th(;m r(;du(:cs t;he violation hwel. 
(Sl;(;mt)(,,rgcr, \]99(i) argued l;h;d; CA (:on- 
st;mints were l;()o pow(M'ul. (Ellison, 1995) 
s\]lowc(\] l;h~d; no singh; tinil;(>sl;nt(., ,ml)o,md('A 
constraint (:ould deline the smnc violation lev- 
els as a CA (:onsl;raint. (Eisner, 1997a) showed 
more strongly l;h;tl; sin(:(; GA (:nn t)e ma(te to 
(:(ml;(n: a floatil~g ton(', on ;t t)hras(',, :~ n() hierar- 
ch, y of tinil;(;-st;d;(; md)omMed (:onstrMnts could 
(wen (t(;tin(; the S&ilI(} optimal candidatc,s a.s a 
GA (:onstr;dnt. Thus GA (:retool; l)e simulated 
i,, (1994) ({}a. 
For this reason, as w(;ll as the awkwm'(hmss 
~md non-lo(:Mil;y of (:wdu;~ting CA otl'(ms(;s, w(; 
t)rol)ose t() r(q)bme (-IA with (tir(~(:tiotml c(m- 
straints. \])ir(,x:l;ionM (:onsl;ra.inl;s apt)ear l;() m()re 
dir(~(:tly (:;q)l;ur(: the ol)serv('xl t)h(mom(m:~. 
We (to nol;e th;d; m,)th(n', l;ri(:lder t)ossibility is 
to eliminalx; CA ill favor of ()l'(li~t;u'y '¢mbo'und,d 
(:onsl;rainl;s thai; are in(tifl'('a'(ml: to th(', h)(:ation 
of oflimses. (Ellison, \]99d) noted that GA con- 
straint;s that (wahmte(t the 1)ln(:(mmnt of only 
one (:l(;m(mt (e.g., I)rinmry stress) could 1)(: r(> 
1)l:med by simt)hn: N()\]NTEllVENING (;OliSi;\]';tinLs. 
(\]'\]isn(;r, ii 9971)) gives ;~ (IA-fr(',('. tr(~a.tm(',nt ()f th(', 
metri(:al sl;r(,,ss tyl)()l()gy of (llayes, 1995). 
2.4 Generative power 
li|; has r(~(:ently t)(xm 1)ropos(M th:g; tbr (:Oml)U- 
tal, ionM r(',a,sons, (Yl' should (;liminnt(; not only 
GA t)ut all unbomMed constraints (Frnnk ~tlltl 
Satt~, 1998; Karttunen, 1998). As with GA, 
we oflbr the less extreme npl)roach of ret)lacing 
them with dircctionnl (:onsl;rainl;s insi;(;n(1. 
l/,c(:all that a t)honological grammar, as usu- 
ally (:onc(',ive(t, is a (tescrit)tion of t)(',rmissil)l(', 
CUll,, SR) 1)Mrs. 'I It has long 1)e(m 1)eli(w(M 
t;h;h; naturally o(:curring t)honologicnl grammnrs 
are r(:.qular relations (Johnson, 1972; Kal)lm~ 
and Kay, 1.994). This 111(2~1.118 |;hat they can 
1)e implemented as finite-state transducers 
(FSTs) that accel)t exactly the granlmatical 
pairs. FSTs are immensely useflfl in l)ertbrm- 
3This is inde('xl too powerful: cent(Mng is ulmtt(~sl;cd. 
dUR = un(hMying rel)rcscntation , SIt = surta(:(; relm. 
ing mmly rt;h;vm~l; tasks r~q)idly: g(;ner~d;ion (Oil- 
raining all possible S\]{.s tSr a UI{.), COml)rehcn- 
sion (conversely), chm'act(Mzing I;\]m sol; of fin'ms 
on wlfi(:h two grmmnars (pertmI)S from diflk~rent 
descriI)tiv(', fi:am(,,works) would difl'er, etc. More- 
over, FS'i~s can 1)e al)t)lied in parallel to regular 
sets of tbrms. For example, one cml obtMn n 
w(fighted set of l)ossible Sl{.s (a l)hon(mm lat- 
l;i(:(~) ti'OlU & s\])(}(;(:\]l rtx:ognizer, pass it; through 
the invers(; l;rmlsdu(:(;r, int(,xsect the r(;sulting 
weight(~d set of U\]/.s with the lexicon, and I;hen 
(;xi;ra(:t th(; \])(;st surviving U1Rs. 
(Ellison, 199d; Eisner, 1997a) frmne ()T 
wit\]fin this tradition, t)y modeling Gen mM the 
(:onsl;ra.illtS :ts w(;ig\]lt(;d tinit(>sl;~t(; m~mhin(;s 
(see {}3.2). Bui; all;hough thos(; t)ai)ers showed 
hOW I;O gt}ll(}l';I,LO, I;ll(} S(}l; of SI{,s t()l; ;L single given 
UI{., t\]my did not (:Oml)ih~ tim OT gr;umnar into 
;ul FST, or ol)tnin the other l)(mcfits th('d'(~of. 
In fact, (Frank and Satta, 11!)98) showed 
Idmt such (:onq)il~ti;ion is iml)ossil)h', in tlm g(m- 
er;tl (:;is(', ()\[ un\])otln(t(}(t (:onsl;raints. 'l.i) see 
why, consi(hn: the grammar MAX, \])EI)~ ltAR- 
M()NY\[h(~ight,\] >> ll)l,',NT-l()\[h(;ight\]. This gra.m- 
mar insists on ll(:ight lmrmony ~m~()ng sm:li~(:(; 
vowels, but (lislik(;s (:h;mg(;s fl:om tim UR. The 
result is tlm ml~-g3;(}sl;e(t 1)h(momelion of "ma- 
jority nssimilntion" (l{akovi(:, 1999; Lomt)~r(ti, 
1999): a UR with more high vowels thml low 
will surt'a(:(; with all vowels high, ;uM vi(:(>v(n'sa. 
So ()T may comp~u'c 'unbo'u',,dcd count.s" in a. w~ty 
th;tt ;m \]i'S3' (:mmot a.n(l phonology (loes not. 
This suggests that ()T with unl)ounded (:on- 
stmints is i;oo l)owcrlul, lhmce (Fr;mk mM 
S~t;I;~L, 1998; Km:ttmmn, 1998) l)rOl)OS(: using 
only 1)(mnde(t (:onstrainl;s. q'hcy show this re- 
du(:es OT's l)owcr to finite-state transduction. 
The worry is l;h~t 1)oundcd constraints italy 
1101; b(', (;xI)rcssiv(; enough,. A 2-bounded version 
of NOCODA would not distinguish among the 
tinM thre(; tbrms in Figure \]: it is agnostic wh(m 
l;lm intmt tbr(:es multii)l(,, codas in all c;mdidat(~s. 
To t)e sure, ~/~:-l)oun(h~d approximation may 
work well fi)r large t~:. 5 llut its automaton (!}3.2) 
will tyl)ically h~we k times as m~my st;td;es ~s I;he 
mlb(mnded originM, since it mlrolls loops: the 
5Using the al)t)roximat(: grmnlnar tbr generation, an 
out,put is guarant(:(:d corrc(:t unless it achieves h vio- 
lations for some k-l)oundcd (:onstraint. One can t;hen 
raise k, recoml)ile tlw. grammar, and try again. But 1,: 
may grow (tuil;c large for long inputs like phonoh)gical 
1)hrases. 
259 
state must keep track of the offense count. In- 
tersecting many such large constraints cast pro- 
duct very large FSTs--while still failing to cap- 
tare simple generalizations, e.g., that all codas 
are dispreferred. 
In §3, we will show that directional con- 
straints are more powerful than bounded 
constraints, as they can express such 
generalizations--yet they keel) us within 
the world of regular relations and FSTs. 
2.5 Related Work 
Walther (1999), working with intersective con- 
straints, defines a similar notion of Bounded 
Local Optimization (BLO). Tronnner (1998; 
1999) applies a variant of Walther's idea to OT. 
The motivation in both eases is linguistic. 
We sketch how our idea differs via 3 examples: 
UR uuuuu uu uuu uuuuu 
candidate X vvvbb vv vbb vvvbb 
candidate Y vvbaa vvvvbaa vzbaa 
Consider *b, a left-to-right constraint that is of 
fended by each instance of b. On our proposal, 
candidate X wins in each column, because Y 
always offends ,b first, at position 3 in the UR. 
But under BLO, this offense is not fatal. Y 
can survive *b by inserting epenthetic material 
(colunm 2: Y wins by postt)oning b relative to 
tits SR), or by changing v to z (cohmm 3: Y ties 
X, since vv ¢ vz and BLO merely requires the 
cheapest choice 9iven the sur:face output so far). 
In the same way, NoCoDA under BLO would 
trigger many changes unrelated to codas. Our 
definition avoids these apparent inconveniences. 
Walther and Trommer do not consider the ex- 
pressive power of BLO (cf. ga.3) or whether 
grammars can be compiled into UR-to-SR FSTs 
(our main result; see discussion in §3.4). 
3 Formal Results 
3.1 Definition of OT 
An OT grammar is a pair (Gen, C) where 
• the candidate generator Gen is a relation 
that maps eaeh input to a nonempty set of 
candidate outputs; 
• the hierarchy C = (C1, C2,...) is a finite 
tnple of constraint functions that evaluate 
outputs. 
We write d(5) for the tuple (C~(5), C2(5),...). 
Given a UR, or, as input, the grammar adnfits 
as its SRs all the outtmts 5 such that C(5) is lex- 
ieographicalty minimal in {C(5) : 5 ~ Gen(~)}. 
The values taken by 6'/ are called its viola- 
tion levels. Conventionally these are natural 
mnnbers, trot any ordered set will do. 
Our directional constraints require the fol- 
lowing immvations. Each input a is a string as 
usual, but the outputs are not strings. Rather, 
each candidate 5 C Gen(cr) is a tuple of I~l + 1 
strings. We write 5 for the concatenation of 
these strings (the "real" SR). So 5 specifies an 
aliflnmcnt of 5 with a. The directional con- 
straint Ci maps the tuple 5 to a tuple of nat- 
ural numbers ("offense levels") also of length 
I~1 + \]. Its violation levels {6~(5) : 5 < Gen(~)} 
are compared lexicographically. 
3.2 Finite-state assumptions 
We now confine our attention to tinite-state OT 
grammars, following (Ellison, 1994; Tesar, 1995; 
Eisner, 1997a; Frank and Satta, 1998; Kart- 
tunen, 1998). Gen C_ E* x A* is a regular 
relation, ~ and may be implemented as an uu- 
weighted FST. Each constraint is implemented 7 
as a possibly nondeterministic, weighted finite- 
state automaton (WFSA) that accepts A* and 
whose ares are weighted with natural nulnbers. 
An FST, T, is a tinite-state automaton in 
which each arc is labeled with a string pair (t : 3'- 
Without loss of generality, we require \[eel < 1. 
This lets us define an aligned transduction 
that maps strings to tuples: If ~r = al...a~, 
we define T(a) as the set of (n + 1)-tui)les 
5 = (50, 31,... 5n) such that T has a path trails- 
ducing a : g along which 50"" 5i-1 is the con> 
plete output before ai is read fronl the input. 
We now describe how to evaluate C(d) where 
C is a WFSA. Consider the path in C that ac- 
cepts a. 8 In (nn)bounded evaluation, C(5) is 
the total weight of this path. In left>to-right 
evaluation, C(5) is the n + 1 tuI)le giving the re- 
spective total weights of the subpaths that con- 
sume a0,.., at. In right-to-left evaluation, C(5) 
is the reverse of the previous tuple. "~ 
GEllison required only that Gen(c,) be regular (Vcr). 
rSpace prevents giving the equivalent characteriza- 
tion as a locally weighted  (Walther, 1999). 
Slf there art multiple accepting paths (nondetermin- 
ism), take the one that gives the least vahm of C(5). 
9This is equivalent to CR(5.~t,...,~) where ~ de- 
notes reversal of the automaton or string as apt)ropriate. 
260 
3.3 Expressive power 
Thanks to Gen, finite-state ()T can l;rivially iln- 
1)lement any regular inl)ut-outtmt relation with 
no coustrmnts at all! And {i3.4 below shows that 
whether we allow directional or houri(led con- 
straints does not affect this generative power. 
But in another sense, directional constraints 
are strictly more expressive than bounded ones. 
If Gen is fixed, then any hierarchy of hemmed 
constrMnts can be simulated by some hierarchy 
of directionM constraints 1° -but not vice-versa. 
Indeed, we show even more strongly that di- 
rectional constraild;S cannot always be simu- 
lated even by mflmmMed constraints. 11 \])trine 
• b as in §2.5. This ranks the set (alb) '~ in lexico- 
graphic order, so it; makes 2u distinctions. Let 
Gen be the regular relation 
(a :(,,Ib:b)*(c:,((,,:-lb: b)* I (,:t,(-:(,F,: hi,,: t, lb:-)*) 
We claim that the grammar (Gen,*b) is not 
equivalent to (Gen, C1,..., C~) tTor any bounded 
or mfl)ounded constraints C~,... C.~. There is 
some k; such l,hat tbr all d 5 A", each Ci(5) < 
t~:'n. 12 So candidates 5 of length n have at most 
(h,n,) 's ditt'erenl; violation profiles (~(5). Choos, n 
such that 2 '~ > (k'n) ~. Then the set of 2 ~ strings 
(alb) n must contain two distinct strings, 6 = 
:,.,...:,:,, a' = > .-..v,,,, will, = 
Let i be nfinima,1 such that xi ~ ?,Ji, all(l with- 
oul; loss of generality ~ssume xi = o,, yi = b. Put 
O- = J;1 "''Zi--lC:Ci+l "'':g~t" Now 5, 51 C Gen(o) 
and 5 is lexicographicMly minimM in Gen(c,). 
So the granunar (Gen,*b) 1naps cr to 5 ouly, 
whe, reas (Gen, C) emmet distinguish between 5 
and (\[', st) it; maps cr to neither or both. 
3.4 Grammar compilation: OT ---- FST 
It is triviM to translate an arbitrary FST gram- 
mar into ()T: let Gen be the FST, aim C = (). 
The rest of this section shows, conversely~ how 
to compile a tinite-state OT grammar (Gen, C) 
into an FST, provided that the grammar uses 
only bounded and/or directional constraints. 
1°How? By using states to count, a bounded COil- 
straint's WIPSA can bc transtbrmed so that all the weight 
of each path falls on its final arc. This defines the same 
optimal candidates, even when interpreted directionally. 
,1Nor vice-versa, since only unlmunded constraints can 
implement non-regular relations (~2.4,{i3.4). 
12 Apply !i3.4.4 to elinfinate any e's froIn the constraint 
WDFAs (regarded as outlmtless transducers), then take 
1,: to exceed all arc weights in the result. 
3.4.1 The outer loop of compilation 
Let 5/~ = Gen. For i > 0, we will construct 
an FST 77,/ that iml)lements the i)artial grmn- 
mar (Gen, C1, C2,... Ci). We construct Ti from 
Ti_ 1 alld C i only: Ti('/;) colttaills the forlllS 
y E Ti_l(X) tot" whieh G(Y) is minimal. 
If C i is L;-txmnded, we use the construction of 
(Frank and Satta, 1998; Karttulmn, 1998). 
If Ci is a left-to-right constraint, we compose 
Ti-1 with the WFSA that l'epresents Ci, obtain- 
ing a weigh, ted finite-state transducer (WFST), 
Ti • This transducer may be regarded as assign- 
ing a Ci-violation level (an (1~1 + 1)-tuple) to 
each cr : (~ it accepts. We must now 1)rulm away 
the subol)timM candidates: using the DBP al- 
gorithm below, we construct a new unweighted 
FST 7) that transduces a : ~ ill" the weighted 9~ 
can transduce (r : 5 as cheaply as any a : 5 ~. 
If Ci is right-to-left;, we do just the same, ex- 
cept DBP is used to construct; T/t ti'om 7)\]". 
3.4.2 Directional Best Paths" The idea 
All that remains is to give the construction of 
Ti from 7~i, which we call Directional Best 
Paths (DBP). Recall standard bestq)aths or 
shortest-t)aths algorithms that pare a WFSA 
d(}wn to its 1)aths of minimmn total weight (Di- 
jkstra, 1959; Ellison, 1994). Our greedier ver- 
sion (toes llot SllUl along Imths trot always im- 
me(liately takes the lightest "availal)le" at('. 
Cru{:iMly, available ar{:s are define{t r(;lativc to 
the int)ut string, l)ecause we must retain one or 
more ot)timal output candidates for each inlmt. 
So availal}ility requires "lookahead": we must 
take a heavier are (b:z beh)w) just when the 
rest; of the intmt (e.g., abd) emmet otherwise be 
ac{:et}ted on any t)ath. ~ c:c _ 
,,:,, 2 2 
Ti(abd) = {a:,'c, vc} k~_~. c ~__~ 
i (ab ) = 
(a,c ,fl)bIev,att,s (e, a,, c)) su~mptimal ~) 
On this example, DBP would simply make state 
6 non-tinal (tbrcing abe to take the light are un- 
availal)le to abd), but often it; must add states! 
This relativization is what lets us compile a 
hierarchy of directionM constraints, once and tbr 
all, into an single Fsq_' that can find the optimal 
OUtl)ut for aTzy of the infinitely many t)ossible in- 
puts. We saw in §2.4 why this is so desirable. By 
261 
contrast, Ellison's (1994) best-paths construc- 
tion tbr unbounded constraints, and previously 
proposed constructions tbr directional-style con- 
straints (set §2.5) only find the optimal outt)ut 
for a single input, or at best a finite lexicon. 
a.4.a Dir. Best Paths: A special case 
§3.2 restricted our FSTs such that for every arc 
label o~ : 7, I ~t\] -< 1. In this section we construct 
^ 
~) ti'om Ti under the stronger assumption that % 
Ioe\[= 1, i.e., ~i is e-flee on the intmt side. 
If Q is the stateset of Ti, then let; the stateset 
of be S\]: c_ S c_ 0, q c- S- 
This has size IQ\[" 31QI-*. However, most of 
these states are typicMly unreachable from the 
start state. Lazy "on-the-fly" construction tech- 
niques (Mohri, 1997) can be used to avoid allo- 
cating states or arcs until they arc discovered 
during exploration from the start State. 
For a E E*,q G Q, define V(G,q) as the 
minimmn cost (~ \[al-tut)le of weights) of' any 
^ 
or-reading 1)ath from Ti's start state q0 to q. 
The start state ot'fl) is \[q0; 0; {q0}\]. The intent 
is that Ti have a path from its start state to 
\[q; R.; S\] that transduces cr:5 \]a itf 
• Ti has a q0 to q, o':a path of cost V(er, q); 
• = {q'c Q: < 
• s = {4 c O: <_ vO, q)}. 
So as Ti reads c,, it "Ibllows" Ti. cheapc.st cy- 
reading paths to q, while calculating R, to which 
yet cheaper (but l)erhaI)S dead-end) paths exist. 
Let \[q; R; S\] be a final state (in Ti) itf q is final 
and no q' E R is final (in 5~?i). So an accepting 
path in ~) survives into Ti ifl' there is no lower- 
cost accepting path in Ti for the same int)ut. 
The arcs fl'om \[q;R;S\] correspond to arcs 
from q. For each arc fl'om q to q' labeled a : -y 
and with weight W, add an unweighted a : 7 
arc from \[q;R; S\] to \[q'; R'; S'\], provided that 
the latter state exists (i.e., unless q' E R', indi- 
cating that there is a cheaper path to q'). Here 
R' is the set of states that art either reachable 
from R by a (single) a-reading arc, or reachable 
from S by an a-reading arc of weight < W. S t 
is the union of R' and all states reachat)le from 
S by an a-reading arc of weight W. 
3.4.4 Dir. Best Paths: The general case 
To apply the above construction, we nnlsl; firsl; 
transtbrm Ti so it is e-flee on the int)ut side. Of 
laa is a tuple of \]~r\]+l strings, but 50 = e by e-fl'eeuess. 
course int)ut c's are cruciM if Gen is to be allowed 
to insert unbounded alilOllllt8 of surface mate- 
rim (to be pruned back by the constraints). 14 
To eliminate e's while preserving these seman- 
tics, we are tbrced to introduce FST arc labels 
of the tbrm a : F where F is actually a regular 
set of strings, represented as an FSA or regu- 
lar expression. Following e-elimination, we can 
apply the construction of §3.4.3 to get Ti, and 
finMly convert Ti back to a normal transducer 
by expanding each a:F into a subgraph. 
When we elilninate an arc labeled c. : 7, we 
must Imsh 7 and the arcs weight back onto 
a previous non< arc (but no further; contrast 
(Mohri, 1997)). The resulting machine will ira- 
plement the same Migncd transduction as ~ 1)ut 
more transparently: in the notation of .~3.2, the 
arc reading ai will transduce it directly to 5i.1.5 
Concretely, suppose G~ can gel; from state q 
to q" via a t)ath of total weight W that 1)egins 
with a : 7~ on its first arc followed 1)y e : "T2~ 
e : 7a, ... on its remaining arcs. \¥e would like 
to substitute an arc from q to q" with label 
a : 7172Ta-.. and weight I/V. But there may 
be infinitely many such q q" t)~ths, of varying 
weight, so we actually write a : F, where \]? de- 
scribes .just those q-q" paths with minimmn W. 
The exact procedure is as follows. Let G be 
the possibly discommcted subgraph of 5hi \]brined 
by e-reading arcs. Run ml nil-pairs shortest- 
paths algorithm Is on G. This finds, for each 
state pair (qt, q,) connected by an c-readiug 
path, the subgral)h Gq,,q,, of G formed by the 
minimmn-weight e-reading t)aths froln q' to q", 
as well as the common weight Wq,,q,, of these 
paths. So tbr each arc in 2Pi from q to q', with 
weig~ht W and label a : 7, we now add an arc 
to Ti from q to q" with weight W + l/Vq, q,, and 
label a : 7Gq,,q,,(e). (G(e) denotes the regular 
 to which G transduces e.) Having done 
this, we can delete all e-reading arcs. 
The modified e-free ~) is equivalent to 
14As is conventional. Besides epenthetic material, Gen 
often introduces COlfiOUS prosodic structure. 
lSThat arc is labeled ai : P where & E F. But what is 
ao? A special symbol E E E that we introduce so that 
5o can be pushed back onto it: Before e-dimination, we 
modify Ti by giving it a new start state, commcted to 
the old start state with an arc E : e. After e-elimination, 
we apply DBP and replace E with e in the result Ti. 
lS(Cormen et al., 1990) cites several, including fast 
algorithms for when edge weights are small integers. 
262 
tlm origim~l (;xcet)l; for ('Jilnim~ting some 
of tim sul)ol)timal sul)l);tths. H('a'c is a 
gri~l)h ti'agment heft)r(; and after c-climim~tion: 
iI:iI ~(2~') 
a :b 
Note: Right-to-left; evaluation applies \])BP t;o 
~ so consistency with our t)revious definitions 
mGms it lllltSt; trash c.'s forward, not backward. 
4 Conclusions 
This t)atmr has 1)rol)oscd anc, w notion in OT: 
"dirccl;ional ev;fluation," where UlM(;rlying loca- 
tions ~Lr(', sl;rictly rmlk('xt t)y I, hcir ilnporl,mm(;. 
Tra(lil;ionld finitc-stnt(; OT constraints lu~vc 
enough power to compare arbitrarily \]figh 
(:(rants; Oen('.ralizcd Aligmncnt is even worse. 
Dirct,l;ion~fl constraints SC(}lll I;O cD~\])|;llr( ~, |;\]1(I pros 
of these consla'ainl;s: |;hey al)prot)riately mili- 
l;~tl;e a,g~finsl; cvc, ry instan(:(', of a disfavored con- 
figured;ion in a (:andid~l;c forln, no ln~d;l;cr how 
many, and they naturally (:at)t;ur(', il;erativc and 
(',(lgt;most ellbx:ts. Y(;t they do not hnve the ex- 
(:(',ss power: w(', ha,vc, showl~ l;ha, l; iH~ g~r}~liml{rr o\[' 
(tirc(:tional and/or 1)omMed (:onstrainl;s can 1)e 
COml)iled into n tinitc-sl;~m', t\]'~msduc(;r. '.l?h;~t is 
both empirically and ('Oml)ut~tion~dly dcsir;fl)h;. 
The most ohvious ful;m'e work comes t'rOlll lin- 
guistics. Can dir(;cl;ionnl constraints do MI l;h('. 
work of mfl)ounded and CA cons\]mints? Itow 
do they change the style, of mlalysis'? (E.g., di- 
rectional versions of markedness consl;rai\]lts pin 
down the locations of mm:kcd ol)jc,('ts, leaving 
lower-ranked constr~fints no s~G.) \]?in~flly, direc- 
tional (:onstrainl;s ca, n t)c, vm'iously formul~t('xt 
(is *CLUSTEll, offended at tim start; or (rod of 
each duster? or of its enclosing sy\]\]{fl)le?). St) 
what conventions or restrictions shoukt apply? 

References 

Eric Bakovid. 1999. Assimilation to the unmarked. 
Ms., Rutgers Optimality Archive ROA-340. 

T. H. Cormen, C. E. Lciserson, and 11. L. Rivest. 
1990. Introduction to Algorith, ms. MIT Press. 

Edsger W. l)ijkstra. 1959. A note on two problems 
in colmexion with grat)hs. Numcrisch, e Math.c- 
matik, 1:269 27\]. 

Jason Eisner. 1997a. Efficient generation in primitive Optimality Theory. In PRoceedings of the 35th, Annual ACL and 8th EACL, Madrid, July, 31.3320. 

Jason Eisner. 1997b. FooTlPOltM decomposed: Us- 
ing primitive constraints in QT. In Benjmnin Bru- 
ening, editor, Pro< of SCIL VIII, MIT Working 
Papers in Linguistics 31, Cambridge, MA. 

T. Mark Ellison. 1994. Phonological derivation in 
optimality theory. In Proc. of COLING. 

~\]'. Mark Ell\]son. 1995. ()T, tin\]\]e-state repres('ma- 
tions and l)rocedurality. In Proc. of fit,(: Col@r- 
e'nee on I, brmal (trammar, Barcelona. 

Robert Frank and Giorgio Satta. 1998. Optimality Theory and the generative complexity of constraint violability. Computational Linguistics, 24(2):307-315. 

Bruce ttayes. 1.995. Metrical Stress Theory: Princi- 
ples and Uasc Studies. U. of Chicago Press. 

C. Douglas Johnson. 1972. Formal Aspects of 
Phonological Description. Mouton. 

Ronald M. Kaplan and Martin Kay. 1994. Regular 
models of phonological rule systems. Computational Linguistics, 20(3):331-378. 

l,mn'i Karttunen. 1998. The prol)cr treatment of 
optilmdity in COmlmta.tiona\] lfllonology. In P?'oc. 
ofFSMNLP'98, 1 12, Bilkent I.-., Ankara, Turkey. 

lfinda Lombard\]. 1999. Positional faithflflness and 
voicing assimilation in ()ptimality Theory. Nat',,- 
ral La',,g'uage and Linguistic Theory, 17:267 302. 

.Iohn McCarthy and Alan Prince. 1993. Generalized 
aligmnent. In G(!(;rt Booij and ,Iaap van Math',, 
c'ditors, D:a'rbook of Morph.ology, 79 \] 53. Kluwer. 

John McCarthy and Alan Prince.. 1995. Ntithfltlncss 
and reduplicative identity. In Jill Beckman ct al., 
editor, Papers in Optimality ~JT~,c.ory, 259 384. U. 
of Massachusetts, Amherst: GLSA. 

Armin Mester mid ,laS,e \]~adgc'tt. \]!)94. \])irectional 
syllalfification in Genca'alized Aligmnent. Phonol- 
ogy ~t|; Sall|;a CrllZ 3, ()(:t;ot)o.r. 

Mehryar Mohri. 1997. Finite-state transducers in 
 and speech processing. Computational Linguistics 9-23(2). 

Ala.n Prince and l?mfl Smolensky. 1993. Optimal- 
it;.3, Theory: Consl;raint interaction in generative 
g;ralnnlar. \]k/Is., l{utgers U. and U. of Colorado. 

Alan Prince. 1983. Relating to the, grid. Linguistic 
Inq',.iry, 14:\].9 100. 

3. P. Stemberger. \]996. The. scope of the theory: 
Where does beyond lie? In L. MoNa\]r, K. Singer, 
L. M. Dobrin, and M. M. Auto\]n, eds. Papc~w firm 
the Parascssion on ~17~,cory and Data in Linguis- 
tics, CLS 23, \]39 -164. Chicago Linguistic Society. 

Bruce Tesar. 1995. Computational Optimality The- 
ory. Ph.D. thesis, U. of Colorado, Boulder. 

Jochen Trommer. 1998. Optimal morphology. In 
T. Mark Ellison, editor, Proc. of the 4th ACL 
SIGPHON Workshop, Quebec, July. 

Jochen Trommer. 1999. Mende ton(; lmtterns revis- 
ited: Tone mal)t)ing its local constraint evaluation. 
In Linguistics in Potsdam Working Papc~w. 

Markus Walthe.r. 1999. One-level prosodic morphol- 
ogy. Arl)eiten zur Linguistik 1. U. of Mart)urg. 
