An Algorithm for Functional Uncertainty 
Ronald M. KAPLAN and John T. MAXWELL III
Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, California 94304 USA
Abstract: The formal device of functional uncertainty has been
introduced into linguistic theory as a means of characterizing
long-distance dependencies alternative to conventional
phrase-structure based approaches. In this paper we briefly outline
the uncertainty concept, and then present an algorithm for
determining the satisfiability of acyclic grammatical descriptions
containing uncertainty expressions and for synthesizing the
grammatically relevant solutions to those descriptions.
1. Long-distance Dependencies and Functional Uncertainty
In most linguistic theories long-distance dependencies such as are
found in topicalization and relative clause constructions are
characterized in terms of categories and configurations of
phrase-structure nodes. Kaplan and Zaenen (in press) have compared
this kind of analysis with one based on the functional organization
of sentences, and suggest that the relevant generalizations are instead
best stated in functional or predicate-argument terms. They defined
and investigated a new formal device, called "functional uncertainty",
that permits a functional statement of constraints on unbounded
dependencies. In this paper, after reviewing their formal specification
of functional uncertainty, we present an algorithm for determining the
satisfiability of grammatical descriptions that incorporate uncertainty
specifications and for synthesizing the smallest solutions to such
descriptions.
/Kaplan and Zaenen (in press)/ started from an idea that
/Kaplan and Bresnan 1982/ briefly considered but quickly rejected on
mathematical and (/Kaplan and Zaenen/ suggest, mistaken) linguistic
grounds. They observed that each of the possible underlying positions
of an initial phrase could be specified in a simple equation locally
associated with that phrase. In the topicalized sentence Mary John
telephoned yesterday, the equation (in LFG notation) (↑ TOPIC) =
(↑ OBJ) specifies that Mary is to be interpreted as the object of the
predicate telephoned. In Mary John claimed that Bill telephoned
yesterday, the appropriate equation is (↑ TOPIC) = (↑ COMP OBJ),
indicating that Mary is still the object of telephoned, which because of
subsequent words in the string is itself the complement (indicated by
the function name COMP) of the top-level predicate claim. The sentence
can obviously be extended by introducing additional complement
predicates (Mary John claimed that Bill said that ... that Henry
telephoned yesterday), for each of which some equation of the general
form (↑ TOPIC) = (↑ COMP COMP ... OBJ) would be appropriate. The
problem, of course, is that this is an infinite family of equations, and
hence impossible to enumerate in a finite disjunction appearing on a
particular rule of grammar. For this technical reason, Kaplan and
Bresnan abandoned the possibility of specifying unbounded
uncertainty directly in functional terms.
Kaplan and Zaenen reconsidered the general strategy that
Kaplan and Bresnan began to explore. Instead of formulating
uncertainty by an explicit disjunctive enumeration, however, they
provided a formal specification, repeated here, that characterizes the
family of equations as a whole. A characterization of a family of
equations may be finitely represented in a grammar even though the
family itself has an infinite number of members. They developed this
notion from the elementary descriptive device in LFG, the
functional-application expression. This has the following
interpretation:
(1) (f s) = v holds if and only if f is an f-structure, s is a symbol,
    and the pair <s, v> ∈ f.
An f-structure is a hierarchical finite function from symbols to either
symbols, semantic forms, f-structures, or sets of f-structures, and a
parenthetic expression thus denotes the value that a function takes for
a particular symbol. This notation is straightforwardly extended to
allow for strings of symbols, as illustrated in expressions such as
(↑ COMP OBJ) above. If x = sy is a string composed of an initial symbol s
followed by a (possibly empty) suffix string y, then
(2) (f x) ≡ ((f s) y)
    (f ε) ≡ f, where ε is the empty string.
The crucial extension to handle unbounded uncertainty is to allow the
argument position in these expressions to denote a set of strings.
Suppose α is a (possibly infinite) set of symbol strings. Then Kaplan
and Zaenen say that
(3) (f α) = v holds if and only if ((f s) Suff(s, α)) = v for some symbol
    s, where Suff(s, α) is the set of suffix strings y such that sy ∈ α.
Thus, an equation with a string-set argument holds if it would hold for
a string in the set that results from a sequence of left-to-right symbol
choices. This kind of equation is trivially unsatisfiable if α denotes the
empty set. If α is a finite set, this formulation is equivalent to a finite
disjunction of equations over the strings in α. Passing from finite
disjunction to existential quantification enables us to capture the
intuition of unbounded uncertainty as an underspecification of exactly
which choice of strings in α will be compatible with the functional
information carried by the surrounding surface environment.
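The interpretation in (2) and the finite-set case of (3) can be
illustrated with a minimal Python sketch. It assumes, purely for
illustration, that f-structures are nested dictionaries and attribute
strings are tuples of symbols; the names apply_path, holds, and
UNDEFINED are hypothetical and do not come from any LFG implementation.

```python
UNDEFINED = object()  # hypothetical sentinel: the application has no value


def apply_path(f, path):
    """Evaluate (f x) for a tuple of symbols x, following identity (2)."""
    if not path:                      # (f e) = f for the empty string e
        return f
    s, rest = path[0], path[1:]
    if not isinstance(f, dict) or s not in f:
        return UNDEFINED
    return apply_path(f[s], rest)     # (f sy) = ((f s) y)


def holds(f, alpha, v):
    """(f alpha) = v for a FINITE set alpha: a disjunction over its strings."""
    return any(apply_path(f, x) is v for x in alpha)


# "Mary John claimed that Bill telephoned yesterday" (assumed toy structure)
mary = {"PRED": "Mary"}
f = {"PRED": "claim", "TOPIC": mary,
     "COMP": {"PRED": "telephone", "OBJ": mary}}
alpha = {("OBJ",), ("COMP", "OBJ"), ("COMP", "COMP", "OBJ")}
print(holds(f, alpha, f["TOPIC"]))
```

The sketch covers only finite string sets; the infinite case requires
the finite-state treatment developed in the following sections.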
Kaplan and Zaenen of course imposed the further requirement
that the membership of α be characterized in finite specifications.
Specifically, for linguistic, mathematical, and computational reasons
they required that α in fact be drawn from the class of regular
languages. The characterization of uncertainty in a particular
grammatical equation can then be stated as a regular expression over
the vocabulary of grammatical function names. The infinite
uncertainty for the topicalization example above, for example, can be
specified by the equation (↑ TOPIC) = (↑ COMP* OBJ), involving the
Kleene closure operator. A specification for a broader class of
topicalization sentences might be (↑ TOPIC) = (↑ COMP* GF), where GF
denotes the set of primitive grammatical functions {SUBJ, OBJ, OBJ2,
XCOMP, ...}. Various restrictions on the domain over which these
dependencies can operate--the equivalent of the so-called island
constraints--can be easily formulated by constraining the uncertainty
language in different ways. For example, the restriction for English
and Icelandic that adjunct clauses are islands (Kaplan & Zaenen, in
press) might be expressed with the equation (↑ TOPIC) =
(↑ (GF - ADJ)* GF). One noteworthy consequence of this functional
approach is that appropriate predicate-argument relations can be
defined without relying on empty nodes or traces in constituent
structure.
In the present paper we study the mathematical and 
computational properties of regular uncertainty. Specifically, we show
that two important problems are decidable and present algorithms for 
computing their solutions. In LFG the f-structures assigned to a string 
are characterized by a functional description ('f-description'), a Boolean 
combination of equalities and set-membership assertions that 
acceptable f-structures must satisfy. We show first that the 
verification problem is decidable for any functional description that 
contains regular uncertainties. We then prove that the satisfiability 
problem is decidable for a linguistically interesting subset of descriptions,
namely, those that characterize acyclic structures. 
2. Verification 
The verification problem is the problem of determining whether or not 
a given f-structure F satisfies a particular functional description for 
some assignment of elements of F to the variables in the description. 
This question is important in lexical-functional theory because the 
proper evaluation of LFG's constraint equations depends on it. It is
easy to show that the verification problem for an f-description
including an uncertainty such as (f α) = v is decidable if F is a noncyclic
f-structure. If F is noncyclic, it contains only a finite number of 
function-application sequences and thus only a finite number of 
strings that might satisfy the uncertainty equation. The membership 
problem for the regular sets is decidable and each of those strings can 
therefore be tested to see whether it belongs to the uncertainty
language, and if so, whether the uncertainty equation holds when the 
uncertainty is instantiated to that string. Alternatively, the set of 
application strings can be treated as a (finite) regular language that 
can be intersected with the uncertainty language to determine the set 
of strings (if any) for which the equation must be evaluated. 
This alternative approach easily generalizes to the more 
complex situation in which the given f-structure contains cycles of 
applications. A cyclic F contains at least one element g that satisfies 
an equation of the form (g y) = g for some string y. It thus involves an
infinite number of function-application sequences and hence an 
infinite number of strings any of which might satisfy an uncertainty. 
But a finite-state machine can be constructed that accepts exactly the 
strings of attributes in these application sequences, for example, by 
using the Kasper/Rounds automaton model for f-structures (Kasper 
and Rounds, 1986). These strings thus form a regular language whose 
intersection with the uncertainty language is a regular set I 
containing all the strings for which the equation must be evaluated. If 
I is empty, the uncertainty is unsatisfiable. Otherwise, the set may be 
infinite, but if F satisfies the uncertainty equation for any string at all,
we can show the equation will be satisfied when the uncertainty is 
instantiated to one of a finite number of short strings in I. Let n be the 
number of states in a minimum-state deterministic finite-state 
acceptor for I and suppose that the uncertainty equation holds for a
string w in I whose length |w| is greater than n. From the Pumping
Lemma for regular sets we know there are strings x, y, and z such that
w = xyz, |y| ≥ 1, and for all m ≥ 0 the string x y^m z is in I. But these
latter strings can be application-sequences in F only if y picks out a
cyclic path, so that ((f x) y) = (f x). Thus we have
    (f w) = v iff
    (f xyz) = v iff
    (((f x) y) z) = v iff
    ((f x) z) = v iff
    (f xz) = v
with xz shorter than w but still in I and hence in the uncertainty 
language α. If |xz| is greater than n, this argument can be reapplied to
find yet a shorter string that satisfies the uncertainty. Since w was a
finite string to begin with, this process will eventually terminate with
a satisfying string whose length is less than or equal to n. We can
therefore determine whether or not the uncertainty holds by
examining only a finite number of strings, namely, the strings in I
whose length is bounded by n. 
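The finite set of candidate strings can be enumerated directly. The
following Python sketch assumes a deterministic acceptor represented
as a triple (start, finals, delta), with delta a dictionary from
(state, symbol) to state; this representation and the name
short_strings are illustrative assumptions, and the example machine
accepts COMP* OBJ rather than an actual intersection language I.

```python
def short_strings(dfa, bound):
    """Return all accepted strings of length <= bound, shortest first."""
    start, finals, delta = dfa
    accepted = []
    frontier = [(start, ())]          # (state reached, string read so far)
    for _ in range(bound + 1):        # string lengths 0, 1, ..., bound
        next_frontier = []
        for q, w in frontier:
            if q in finals:
                accepted.append(w)
            for (p, s), r in delta.items():
                if p == q:
                    next_frontier.append((r, w + (s,)))
        frontier = next_frontier
    return accepted


# Two-state acceptor for COMP* OBJ (assumed example), so n = 2.
COMP_STAR_OBJ = (0, {1}, {(0, "COMP"): 0, (0, "OBJ"): 1})
print(short_strings(COMP_STAR_OBJ, bound=2))
```

Breadth-first enumeration of this kind can grow exponentially with the
bound, which is one motivation for the interleaved traversal described
next.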
This argument can be translated to an efficient, practical 
solution to the verification problem by interleaving the intersection 
and testing steps. We enumerate common paths from the start-state of
a minimum-state acceptor for α and from the f-structure denoted by f in
F. In this traversal we keep track of the pairs of states and subsidiary
f-structures we have encountered and avoid retraversing paths from a
state/f-structure pair we have already visited. We then test the
uncertainty condition against the f-structure values we reach along
with final states in the α acceptor.
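A minimal Python sketch of this interleaved traversal follows. It
assumes f-structures are nested dictionaries (possibly cyclic), the
acceptor is a triple (start, finals, delta) as a deterministic
machine, and the uncertainty condition is supplied as a predicate on
the value reached; all of these names and representations are
illustrative assumptions rather than the implementation described in
this paper.

```python
def verify(f, dfa, condition):
    """Does some string in the DFA's language lead from f to a value
    satisfying `condition`?  Terminates even when f contains cycles."""
    start, finals, delta = dfa
    visited = set()                   # (state, identity of f-structure) pairs
    stack = [(start, f)]
    while stack:
        q, g = stack.pop()
        key = (q, id(g))
        if key in visited:            # never retraverse a visited pair
            continue
        visited.add(key)
        if q in finals and condition(g):
            return True
        if isinstance(g, dict):
            for s, child in g.items():
                r = delta.get((q, s))
                if r is not None:     # follow only paths common to both
                    stack.append((r, child))
    return False


COMP_STAR_OBJ = (0, {1}, {(0, "COMP"): 0, (0, "OBJ"): 1})
mary = {"PRED": "Mary"}
f = {"TOPIC": mary, "COMP": {"COMP": {"OBJ": mary}}}
cyclic = {}
cyclic["COMP"] = cyclic               # (g COMP) = g, a cyclic structure
print(verify(f, COMP_STAR_OBJ, lambda g: g is mary))
print(verify(cyclic, COMP_STAR_OBJ, lambda g: True))
```

The visited set is what bounds the search: there are only finitely
many state/f-structure pairs, so the loop ends even though a cyclic
structure has infinitely many application sequences.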
3. Satisfiability 
It is more difficult to show that the satisfiability problem is decidable. 
Given a functional description, can it be determined that a structure 
satisfying all its conditions does in fact exist? For trivial descriptions 
consisting of a single uncertainty equation, the question is easy to 
answer. If the equation has an empty uncertainty language, 
containing no strings whatsoever, the description is unsatisfiable. 
Otherwise, it is satisfied by the f-structure that meets the 
requirements of any string freely chosen from the language, for
instance, one of the shortest ones. For example, the description
containing only (f TOPIC) = (f COMP* GF) is obviously satisfiable
because (f TOPIC) = (f SUBJ) clearly has a model. There is a large class of
nontrivial descriptions where the question is easy to answer for
essentially the same reason. If we know that the satisfiability of the
description is the same no matter which strings we choose from the
(nonempty) uncertainty languages, we can instantiate the
uncertainties with freely chosen strings and evaluate the resulting
description with any satisfiability procedure (for example, ordinary
attribute-value unification) that works on descriptions without
uncertainties. The important point is that for descriptions in this class
we only need to look at a single string from each uncertainty language,
not all the strings it contains, to determine the satisfiability of the
whole system. Particular models that satisfy the description will 
depend on the strings that instantiate the uncertainties, of course, but 
whether or not such models exist is independent of the strings we 
choose. 
Not all descriptions have this desirable free-choice 
characteristic. If the description includes a conjunction of an 
uncertainty equation with another equation that defines a property of 
the same variable, the description may be satisfiable for some
instantiations of the uncertainty but not for others. Suppose that the
equation (f TOPIC) = (f COMP* GF) is conjoined with the equations
(f COMP SUBJ NUM) = SG and (f TOPIC NUM) = PL. This description is
satisfiable on the string COMP COMP SUBJ but not on the shorter string
COMP SUBJ because of the SG/PL inconsistency that arises. More
generally, if two equations (f α) = v_α and (f β) = v_β are conjoined in a
description and there are strings in α that share a common prefix with
strings in β, then the description as a whole may be satisfiable for some
strings but not for others. The choice of x from α and xy from β, for
example, implies a further constraint on the values v_α and v_β: (f x) = v_α
and (f xy) = ((f x) y) = v_β can hold only if (v_α y) = v_β, and this may or may
not be consistent with other equations for v_α.
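The SG/PL example can be replayed with a small Python sketch. It is
not a general unifier: it assumes nested dictionaries for
f-structures, handles only this particular description, and processes
the instantiated uncertainty equation first. The helper names descend,
define, and consistent are hypothetical.

```python
def descend(f, path):
    """Follow (creating as needed) the attributes in path; the sketch
    assumes no atomic value is met along the way."""
    g = f
    for s in path:
        g = g.setdefault(s, {})
    return g


def define(f, path, atom):
    """Assert (f path) = atom; return False on a clash of atomic values."""
    g = descend(f, path[:-1])
    last = path[-1]
    if last in g and g[last] != atom:
        return False
    g[last] = atom
    return True


def consistent(instantiation):
    """Instantiate (f TOPIC) = (f COMP* GF) to one string, then add
    (f COMP SUBJ NUM) = SG and (f TOPIC NUM) = PL."""
    f, shared = {}, {}
    f["TOPIC"] = shared                               # both sides of the
    descend(f, instantiation[:-1])[instantiation[-1]] = shared  # equation
    return (define(f, ("COMP", "SUBJ", "NUM"), "SG")
            and define(f, ("TOPIC", "NUM"), "PL"))


print(consistent(("COMP", "SUBJ")))
print(consistent(("COMP", "COMP", "SUBJ")))
```

With COMP SUBJ the topic and the embedded subject are the same
structure, so NUM receives both SG and PL; with COMP COMP SUBJ the two
values land on different structures.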
We can formulate more precisely the conditions under which 
the uncertainties in a description may be freely instantiated without 
affecting satisfiability. For simplicity, in the analysis below we 
consider a particular string of one or more symbols in a non-uncertain 
application expression to be the trivial uncertainty language 
containing just that string. Also, although our satisfiability procedure
is actually implemented within the general framework of a directed
graph unification algorithm (the congruence closure method outlined
by /Kaplan and Bresnan 1982/), we present it here as a formula
rewriting system in the style of /Johnson 1987/. This enables us to
abstract away from specific details of data and control structure which 
are irrelevant to the general line of argument. We begin with a few 
definitions. We say that 
(4) A description is in canonical form if and only if
    (a) It is in disjunctive normal form,
    (b) Application expressions appear only as the left-sides of
        equations,
    (c) None of its uncertainty languages is the empty string ε,
        and
    (d) For any equation f = g between two distinct variables, one
        of the variables appears in no other conjoined equation.
There is a simple algorithm for converting any description to a 
logically equivalent canonical form. First, every statement containing 
an application expression (g β) not to the left of an equality is replaced
by the conjunction of an equation (g β) = h, for h a new variable, with
the statement formed by substituting h for (g β) in the original
statement. This step is iterated until no offending application
expressions remain. The equation (f α) = (g β), for example, is replaced
by the conjunction of equations (f α) = h ∧ (g β) = h, and the
membership statement (g β) ∈ f becomes h ∈ f ∧ (g β) = h. Next, every
equation of the form (f ε) = v is replaced by the equation f = v in
accordance with the identity (2) above. The description is then
transformed to disjunctive normal form. Finally, for every equation of
the form f = g between two distinct variables both of which appear in
other conjoined equations, all occurrences of g in those other equations
are replaced by f. Each of these transformations preserves logical
equivalence and the algorithm terminates after introducing only a 
finite number of new equations and variables and performing a finite 
number of substitutions. 
Now let Σ be the alphabet of attributes in a description and
define the set of first attributes in a language α as follows:
(5) First(α) ≡ {s in Σ | sz is in α for some string z in Σ*}
Then we say that
(6) (a) Two application expressions (f α) and (g β) are free if and
        only if
        (i) f and g are distinct, or (ii) First(α) ∩ First(β) = ∅ and ε is
        in neither α nor β.
    (b) Two equations are free if and only if their application
        expressions are pairwise free.
    (c) A functional description is free if and only if it is in
        canonical form and all its conjoined equations are pairwise
        free.
If all the attribute strings on the same variable in a canonical
description differ on their first element, there can be no shared
prefixes. The free descriptions are thus exactly those whose
satisfiability is not affected by different uncertainty instantiations. 
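Definitions (5) and (6a) can be checked mechanically when the
uncertainty languages are given as finite-state acceptors. The Python
sketch below assumes deterministic acceptors written as triples
(start, finals, delta) and variables written as strings; the function
names and this representation are illustrative assumptions.

```python
def can_reach_final(dfa, q):
    """True if some final state is reachable from state q."""
    _, finals, delta = dfa
    seen, stack = set(), [q]
    while stack:
        p = stack.pop()
        if p in finals:
            return True
        if p in seen:
            continue
        seen.add(p)
        stack.extend(r for (p2, _), r in delta.items() if p2 == p)
    return False


def first(dfa):
    """First(alpha): symbols s such that sz is accepted for some z."""
    start, _, delta = dfa
    return {s for (p, s), r in delta.items()
            if p == start and can_reach_final(dfa, r)}


def is_free(f, alpha, g, beta):
    """Definition (6a) for application expressions (f alpha), (g beta)."""
    if f != g:
        return True
    no_epsilon = alpha[0] not in alpha[1] and beta[0] not in beta[1]
    return no_epsilon and first(alpha).isdisjoint(first(beta))


def string_dfa(symbols):
    """Acceptor for the trivial uncertainty containing one string."""
    delta = {(i, s): i + 1 for i, s in enumerate(symbols)}
    return (0, {len(symbols)}, delta)


COMP_STAR_OBJ = (0, {1}, {(0, "COMP"): 0, (0, "OBJ"): 1})
print(first(COMP_STAR_OBJ))
print(is_free("f", COMP_STAR_OBJ, "f", string_dfa(("TOPIC",))))
print(is_free("f", COMP_STAR_OBJ, "f", string_dfa(("COMP", "SUBJ", "NUM"))))
```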
3.1 Removing interactions
We attack the satisfiability problem by providing a procedure for 
transforming a functional description D to a logically equivalent but
free description D' any of whose instantiations can be tested for
satisfiability by traditional algorithms. We show that this procedure
terminates for the descriptions that usually appear in linguistic
grammars, namely, the descriptions whose minimal models are all
acyclic. Although the procedure can detect that a description may
have a cyclic minimal model, we cannot yet show that the procedure 
will always terminate with a correct answer if a cyclic specification 
interacts with an infinite uncertainty language. 
The key ingredient of this procedure is a transformation that
converts a conjunction of two equations that are not free into an
equivalent finite disjunction of conjoined equations that are pairwise
free. Consider the conjoined equations (f α) = v_α and (f β) = v_β for some
value expressions v_α and v_β, where (f α) and (f β) are not free. Strings x
and y arbitrarily chosen from α and β, respectively, might be related in
any of three significant ways: Either (a) x is a prefix of y (y is xy' for
some string y'), (b) y is a prefix of x (x is yx'), or (c) x and y are identical
up to some point and then diverge (x is z s_x x' and y is z s_y y' with symbol
s_x distinct from s_y). Note that the possibility that x and y are identical
strings is covered by both (a) and (b) with either y' or x' being empty,
and that noninteracting strings fall into case (c) with z being empty.
In each of these cases there is a logically equivalent reformulation 
involving either distinct variables or strings that share no first 
symbols: 
(7) (a) x is a prefix of y:
        (f x) = v_α ∧ (f xy') = v_β iff
        (f x) = v_α ∧ ((f x) y') = v_β iff
        (f x) = v_α ∧ (v_α y') = v_β (by substituting v_α for (f x))
    (b) y is a prefix of x:
        (f yx') = v_α ∧ (f y) = v_β iff
        (v_β x') = v_α ∧ (f y) = v_β
    (c) x and y have a (possibly empty) common prefix and then
        diverge:
        (f z s_x x') = v_α ∧ (f z s_y y') = v_β iff
        (f z) = g ∧ (g s_x x') = v_α ∧ (g s_y y') = v_β
        for g a new variable and symbols s_x ≠ s_y
All ways in which the chosen strings can interact are covered by the 
disjunction of these reformulations. We observe that if these specific 
attribute strings are considered as trivial uncertainties and if v_α and v_β
are distinct from f, the resulting equations in each case are pairwise 
free. 
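The case analysis in (7) for two specific strings can be written out
as a short Python sketch. Equations are represented, by assumption, as
triples (variable, string, value) with strings as tuples of symbols;
the function name reformulate and the default fresh variable name "g"
are hypothetical.

```python
def reformulate(f, x, v_a, y, v_b, fresh="g"):
    """Rewrite (f x) = v_a and (f y) = v_b following the cases in (7).
    Returns the case label and the list of resulting equations."""
    if y[:len(x)] == x:               # (7a): x is a prefix of y
        return "a", [(f, x, v_a), (v_a, y[len(x):], v_b)]
    if x[:len(y)] == y:               # (7b): y is a prefix of x
        return "b", [(v_b, x[len(y):], v_a), (f, y, v_b)]
    k = 0                             # (7c): find the point of divergence;
    while x[k] == y[k]:               # safe, since neither is a prefix
        k += 1
    z = x[:k]
    return "c", [(f, z, fresh), (fresh, x[k:], v_a), (fresh, y[k:], v_b)]


print(reformulate("f", ("COMP", "SUBJ"), "va",
                  ("COMP", "SUBJ", "NUM"), "vb"))
print(reformulate("f", ("COMP", "OBJ"), "va", ("COMP", "SUBJ"), "vb"))
```

Identical strings are reported under case (a) here because that test
comes first; as noted above, they are covered equally by case (b).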
In this analysis we transfer the dependencies among chosen 
strings into different branches of a disjunction. Although we have 
reasoned so far only about specific strings, an analogous line of 
argument can be provided for families of strings in infinite uncertainty 
languages. The strings in these languages fall into a finite set of 
classes to which a similar case analysis applies. Let <Q_α, δ_α, q_α, F_α,
Σ> be the states, transition function, start state, final states, and
alphabet of a (perhaps nondeterministic) finite-state machine that
accepts α and let <Q_β, δ_β, q_β, F_β, Σ> be an acceptor for β. Let δ* be the
usual extension of δ to strings in Σ* and define
(8) Prefix(α, q) ≡ {x | q ∈ δ_α*(q_α, x)}
    (the prefixes of strings in α that lead to state q)
    Suffix(α, q) ≡ {x | δ_α*(q, x) ∩ F_α ≠ ∅}          if q ∈ Q_α
                 ≡ the union of Suffix(α, p) for p ∈ q   if q ⊆ Q_α
    (the suffixes of strings in α whose prefixes lead to states q)
and note that Prefix(α, q) and Suffix(α, q) are regular sets for all q in Q_α
(since finite-state acceptors for them can easily be constructed from the
acceptor for α). Further, every string in α belongs to the concatenation
of Prefix(α, q) and Suffix(α, q) for some state q in Q_α. The prefixes of all
strings in α thus belong to a finite number of languages Prefix(α, q),
and every prefix that is shared between a string in α and a string in β
also belongs to a finite number of classes formed by intersecting two
regular sets of this type. The common prefix languages fill the role of
the prefix strings in the three-way analysis above. All interactions of
the strings in α and β that lead through states q and r, respectively, are
covered by the following possibilities:
(9) (a) Strings from α are prefixes of strings from β:
        (f α ∩ Prefix(β, r)) = v_α ∧ (v_α Suffix(β, r)) = v_β
    (b) Strings from β are prefixes of strings from α:
        (f β ∩ Prefix(α, q)) = v_β ∧ (v_β Suffix(α, q)) = v_α
    (c) Strings have a common prefix and then diverge on some s_α
        and s_β in Σ:
        (f Prefix(α, q) ∩ Prefix(β, r)) = g_{q,r} ∧
        (g_{q,r} s_α Suffix(α, δ_α(q, s_α))) = v_α ∧
        (g_{q,r} s_β Suffix(β, δ_β(r, s_β))) = v_β
where the g_{q,r} in (9c) is a new variable and s_α ≠ s_β. Taking the
disjunction of these cases over the cross-product of states in Q_α and Q_β
and pairs of distinct symbols in Σ, we define the following operator:
(10) Free((f α) = v_α, (f β) = v_β) ≡
     the disjunction, over all q ∈ Q_α and r ∈ Q_β, of
         [(f α ∩ Prefix(β, r)) = v_α ∧ (v_α Suffix(β, r)) = v_β]        (a)
       ∨ [(f β ∩ Prefix(α, q)) = v_β ∧ (v_β Suffix(α, q)) = v_α]        (b)
       ∨ [(f Prefix(α, q) ∩ Prefix(β, r)) = g_{q,r} ∧                   (c)
          the disjunction, over all s_α, s_β ∈ Σ with s_α ≠ s_β, of
            [(g_{q,r} s_α Suffix(α, δ_α(q, s_α))) = v_α ∧
             (g_{q,r} s_β Suffix(β, δ_β(r, s_β))) = v_β]]
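The Prefix and Suffix languages of (8), on which the Free operator
depends, have very simple acceptors. The Python sketch below assumes a
deterministic acceptor written as a triple (start, finals, delta); the
function names are hypothetical, and the nondeterministic case (where
a state argument may be a set of states) is not handled.

```python
def accepts(dfa, w):
    """Run the acceptor on the tuple of symbols w."""
    q, finals, delta = dfa
    for s in w:
        q = delta.get((q, s))
        if q is None:                 # no transition: reject
            return False
    return q in finals


def prefix_acceptor(dfa, q):
    """Prefix(alpha, q): same machine, but q is the only final state."""
    start, _, delta = dfa
    return (start, {q}, delta)


def suffix_acceptor(dfa, q):
    """Suffix(alpha, q): same machine, but started in state q."""
    _, finals, delta = dfa
    return (q, finals, delta)


def splits_ok(dfa, w):
    """Check that every split of an accepted w falls in
    Prefix(alpha, q) . Suffix(alpha, q) for the state q reached."""
    start, _, delta = dfa
    q = start
    for k in range(len(w) + 1):
        if not (accepts(prefix_acceptor(dfa, q), w[:k])
                and accepts(suffix_acceptor(dfa, q), w[k:])):
            return False
        if k < len(w):
            q = delta[(q, w[k])]
    return True


COMP_STAR_OBJ = (0, {1}, {(0, "COMP"): 0, (0, "OBJ"): 1})
print(splits_ok(COMP_STAR_OBJ, ("COMP", "COMP", "OBJ")))
```

The intersections such as α ∩ Prefix(β, r) that appear in (10) would
additionally require a product construction, which is omitted here.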
This operator is the central component of our satisfiability 
procedure. It is easy to show that Free is truth-preserving in the sense 
that Free((f α) = v_α, (f β) = v_β) is logically equivalent to the conjunction
(f α) = v_α ∧ (f β) = v_β. Any strings x and y that satisfy the uncertainties
in the conjunction must fall into one of the cases in (7). If y = xy' applies
(case 7a), we have (f x) = v_α ∧ (v_α y') = v_β. But x leads to some state r_x in
Q_β and therefore belongs to Prefix(β, r_x) while y' belongs to Suffix(β, r_x).
Thus, x satisfies (f α ∩ Prefix(β, r_x)) = v_α and y' satisfies
(v_α Suffix(β, r_x)) = v_β, and (10a) is satisfied for one of the r_x disjunctions.
A symmetric argument goes through if case (7b) obtains.
Now suppose the strings diverge to s_x x' and s_y y' for distinct s_x
and s_y after a common prefix z (case 7c) and that z leads to q in Q_α and r
in Q_β. Then z belongs to Prefix(α, q) ∩ Prefix(β, r) and satisfies the
uncertainty (f Prefix(α, q) ∩ Prefix(β, r)) = g_{q,r}. Since x' belongs to
Suffix(α, δ_α(q, s_x)) and y' belongs to Suffix(β, δ_β(r, s_y)), the g_{q,r} equations
in the s_α, s_β disjunction also hold. Thus, if both original equations are
satisfied, one of the disjunctions in (10) will also be satisfied.
Conversely, if one of the disjunctions in (10) holds for some particular
strings, then we can find other strings that satisfy both original
equations. If (f α ∩ Prefix(β, r)) = v_α holds for some string x in α leading
to state r in β's acceptor and (v_α Suffix(β, r)) = v_β holds for some string y'
in Suffix(β, r), then (f α) = v_α holds because x is in α and (f β) = v_β holds
because ((f x) y') = v_β = (f xy') and xy' is in β. The arguments for the
other cases in (10) are similarly easy to construct. Thus, logical
equivalence is established by reasoning back and forth between strings
and languages and between strings and their prefixes and suffixes.
If the operands to Free are from a description in canonical 
form, then the canonical form of the result is a free description--all its 
conjoined equations are pairwise free. This is true whether or not the 
original equations were free, provided that the value expressions v_α
and v_β are distinct from f (if either value was f, the original equations
would have only cyclic models, a point we will return to below). In the
first two cases in (10), the resulting equations are free because they
have distinct variables (if neither v_α nor v_β is f). In the third case, the f
equation is free of the other two because g_{q,r} is a new variable, and the
two g_{q,r} equations are free because the first symbols of their
uncertainties are distinct. In sum, the Free operator transforms a
conjunction of two non-free equations into a logically equivalent
formula whose canonical form is free. 
The procedure for converting a description D to free form is 
now straightforward. The procedure has four simple steps: 
(11) (a) Place D in canonical form.
     (b) If all conjoined equations in D are pairwise free, stop. D is
         free.
     (c) Pick a conjunction C in D with a pair of non-free equations
         (f α) = v_α and (f β) = v_β, and replace C in D with the
         canonical form of its other equations conjoined with
         Free((f α) = v_α, (f β) = v_β).
     (d) Go to step (a).
3.2 Termination 
If D has only acyclic models, this procedure will terminate after a finite
number of iterations. We argue that there are a certain number of
ways in which the equations in each conjunction in D's canonical form
can interact. Initially, for a conjunction C of N equations, the maximal
number of non-free pairs is N(N-1)/2, on the worst-case assumption
that every equation may potentially interact with every other
equation. Suppose step (11c) is applied to two interacting equations in
C. The result will be a disjunction of conjunctions each of which
includes the remaining equations from C and new equations
introduced by one of the cases in (10). In cases (10a) and (10b) the
interaction is removed from the common variable of the two equations
(f) and transferred to a new variable (either v_α or v_β). In case (10c), the
interaction is actually removed from the system as a new variable is
introduced. Since new variables are introduced only when an
interaction is removed, the number of new variables is bounded. Thus
each interaction is processed only a bounded number of times before it
is either removed (10c) or transferred to a variable that it was
previously associated with (10a, b). However, it can only transfer to a
previous variable if the description has cyclic models. Suppose that f
is reached again through a series of (10a, b) steps. Then there is a
conjoined sequence of equations (f α) = v_α, (v_α α_1) = v_{α_1}, ...,
(v_{α_n} α_{n+1}) = f. But these can only be satisfied if there is some string x
in α α_1 ... α_{n+1} such that (f x) = f, and this holds only of cyclic models.
Since the number of variables introduced is bounded by the original 
number of possible interactions, all actual interactions in the system 
must eventually disappear either through the application of (10c) or by 
being transferred to a variable whose other equations it does not 
interact with. 
As we argued above, the satisfiability of a free description can 
be determined by arbitrarily instantiating the residual uncertainties 
to particular strings and then applying any traditional satisfiability 
algorithm to the result. Given the Free operator and the procedure in 
(11), the satisfiability of an arbitrary acyclic description is thus
decidable. 
The possibility of nontermination with cyclic descriptions may
or may not be a problem in linguistic practice. Although the formal
system makes it easy to write descriptions of this sort, very few
linguistic analyses have made use of them. The only example we are
aware of involves modification structures (such as relative clauses)
that both belong to the element they modify (the head) and also
contain that element internally as an attribute value. But our
procedure will in fact terminate in these sorts of cases. The difficulty
with cycles comes from their interaction with infinite uncertainties.
That is, the description may have cyclic models, but the cyclic
specifications will not always lead to repeating variable transfers and
nontermination. For example, if the cycle is required by an
uncertainty that interacts with no other infinite uncertainty, the
procedure will eventually terminate with a free description. This is
what happens in the modification case, because the cycle involves a
grammatical function (say RELCLAUSE or MOD) which belongs to no 
infinite uncertainty. 
For cycles that are not of this type, there is a straightforward
modification to the procedure in (11) that at least enables them to be
detected. We maintain with each uncertainty a record of all the
variables that it or any of its ancestors have been associated with, and
recognize a potentially nonterminating cycle when a transfer to a
variable already in the set is attempted. If we terminate the procedure
when this happens, assuming in effect that all subsequent disjunctions
are unsatisfiable, we cannot be sure that all possible solutions will be
accounted for and thus cannot guarantee the completeness of our
procedure in the cyclic case. We can refine this strategy by recording
and avoiding iteration over combinations of variables and uncertainty
languages. We thus safely explore more of the solution possibilities
but perhaps still not all of them. It is an open question whether or not
there is a satisfiability procedure different from the one we have
presented that terminates correctly in all cases. On the other hand, it
is also not clear that potential solutions that might be lost through
early termination are linguistically significant. Perhaps they should
be excluded by definition, much as /Kaplan and Bresnan 1982/
excluded c-structure derivations with nonbranching dominance chains
because of their linguistically uninteresting redundancies. 
4. The Smallest Models 
The satisfiability of a description in free form is independent of the 
choice of strings from its uncertainty languages, but of course different 
string choices result in different satisfying models for the description. 
An infinite number of strings can be chosen from even a very simple 
functional uncertainty such as (f COMP* SUBJ) = v, and thus there are
an infinite number of distinct possible models. This is reminiscent of
the infinite number of models for descriptions with no uncertainties at
all (just (f SUBJ) = v), but in this case the models are systematically
related in the natural subsumption ordering on the f-structure lattice.
There is one smallest structure; the others include the information it 
contains and thus satisfy the description. But they also include 
arbitrary amounts of additional information that the description does 
not call for. This is discussed by/Kaplan and Bresnan 1982/, where the 
subsumption-minimal structure is defined to be the grammatically 
relevant one. 
The models corresponding to the choice of different strings 
from an infinite uncertainty are also systematically related to each 
other but on a metric that is orthogonal to the subsumption ordering.
Again appealing to the Pumping Lemma for regular sets, strings that
are longer than the number of states in an uncertainty's minimal-state
finite-state acceptor include a substring that is accepted by some
repeating sequence of transitions. Replicating this substring
arbitrarily still yields a string in the uncertainty, so in a certain sense
these replications contribute no new grammatically interesting
information. Since all the information is essentially contained in the
shorter string that has no occurrence of this particular substring, we
define this to be the grammatically relevant representative for the
whole class. Thus a description with uncertainties has only a finite
number of linguistically significant models, those that result from the
finite disjunctions that are introduced in converting the description to
free form and from choosing among the finite number of short strings
in the residual uncertainties. 
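The selection of short representative strings can be sketched as
follows. This is only an illustrative reading, assuming the uncertainty
is given as a deterministic finite-state machine with a partial
transition function. The names short_strings and delta and the state
numbering are hypothetical and not part of our formal presentation. The
sketch enumerates those accepted strings whose accepting path never
revisits a state, that is, strings containing no substring accepted by
a repeating sequence of transitions.

```python
def short_strings(start, finals, delta):
    """Return accepted strings whose accepting path visits no state twice.

    delta maps (state, symbol) -> next state; a missing entry means failure.
    All names here are illustrative assumptions, not the paper's notation.
    """
    results = []

    def walk(state, path, seen):
        if state in finals:
            results.append(tuple(path))
        for (src, sym), dst in delta.items():
            # Skip transitions that would close a cycle (a pumpable substring).
            if src == state and dst not in seen:
                walk(dst, path + [sym], seen | {dst})

    walk(start, [], {start})
    return sorted(results)


# Hypothetical machine for the uncertainty COMP* (SUBJ | OBJ).
delta = {(0, "COMP"): 0, (0, "SUBJ"): 1, (0, "OBJ"): 1}
print(short_strings(0, {1}, delta))  # [('OBJ',), ('SUBJ',)]
```

Because a cycle-free path is shorter than the number of states, the
enumeration always terminates, which mirrors the finiteness claim above.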
5. Performance Considerations
We have outlined a general, abstract procedure for solving uncertainty
descriptions, making the smallest number of assumptions about the
details of its operation. The efficiency of any implementation will
depend in large measure on just how details of data structure and
explicit computational control are fixed.
There are a number of obvious optimizations that can be made.
First, although not required by the abstract procedure, performance
will clearly be better if deterministic, minimal-state finite-state
machines are used to represent the uncertainties. This reduces the
size of the state cross-products, which is the leading term in the
number of disjunctions that must be processed. Second, the cases in
the Free operator are not mutually distinct: if identical strings belong
to the two uncertainty languages, those would fall into both cases (a)
and (b) and hence be processed twice with exactly equivalent results.
The solution to this redundancy is to restrict one of the cases (say (a))
so that it only handles proper prefixes, consigning the identical strings
to the other case. Third, when pairs of symbols are enumerated in the
(c) case, there is obviously no point in even considering symbols that
are in the alphabet but are not First symbols of the suffix
uncertainties. This optimization is applied automatically if only the
transitions leaving the start states are enumerated and the
finite-state machines are represented with partial transition functions
pruned of transitions to failure states.
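The third optimization can be illustrated with a small sketch. It
assumes, hypothetically, that each machine is stored as a partial
transition table delta mapping (state, symbol) pairs to states, and
that the (c) case pairs distinct leading symbols of the two suffix
uncertainties. The function names are illustrative only.

```python
from itertools import product


def first_symbols(start, delta):
    """First set: symbols on transitions leaving `start` in a partial table."""
    return {sym for (src, sym) in delta if src == start}


def diverging_pairs(start_a, delta_a, start_b, delta_b):
    """Pairs of distinct leading symbols, one from each suffix uncertainty.

    Only transitions that actually leave the two start states are visited,
    so alphabet symbols outside the First sets are never considered.
    """
    firsts_a = first_symbols(start_a, delta_a)
    firsts_b = first_symbols(start_b, delta_b)
    return sorted((a, b) for a, b in product(firsts_a, firsts_b) if a != b)


# Hypothetical machines for COMP* SUBJ and (XCOMP | COMP) OBJ.
delta_a = {(0, "COMP"): 0, (0, "SUBJ"): 1}
delta_b = {(0, "XCOMP"): 1, (0, "COMP"): 1, (1, "OBJ"): 2}
print(diverging_pairs(0, delta_a, 0, delta_b))
# [('COMP', 'XCOMP'), ('SUBJ', 'COMP'), ('SUBJ', 'XCOMP')]
```

With a four-symbol alphabet there would be twelve ordered pairs of
distinct symbols; restricting attention to the First sets leaves three.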
Fourth, a derivative uncertainty produced by the Free operator
will sometimes be empty. Since equations with empty uncertainties
are unsatisfiable by definition, this case should be detected and that
disjunctive branch immediately discarded. Fifth, the same derivative
suffix and prefix languages of a particular state may appear in
pursuing different branches of the disjunction or processing different
combinations of equations. Some computational advantage may be
gained by saving the derivative finite-state machines in a cache
associated with the states they are based on. Finally, successive
iterations of the Free procedure may lead to transparent
inconsistencies (an assertion of equality between two distinct symbols,
or equating a symbol to a variable that is also used as a function). It is
important to detect these inconsistencies when they first appear and
again discard the corresponding disjunctive branch. In fact, if this is
done systematically, iterated application of the Free operator by itself
simulates the effect of traditional unification algorithms, with
variables corresponding to f-structures or nodes of a directed graph.
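The fourth and fifth optimizations might be combined along the
following lines. This is a minimal sketch under stated assumptions: the
machine is a partial transition table delta, the suffix language of a
state is empty exactly when no final state is reachable from it, and the
cache stores only the result of the emptiness test rather than whole
derivative machines. The name make_suffix_checker is hypothetical.

```python
from functools import lru_cache


def make_suffix_checker(finals, delta):
    """Build a cached test for whether a state's suffix language is empty."""
    edges = {}
    for (src, _sym), dst in delta.items():
        edges.setdefault(src, set()).add(dst)

    @lru_cache(maxsize=None)  # cache keyed by the state, as suggested above
    def suffix_is_empty(state):
        seen, stack = {state}, [state]
        while stack:
            q = stack.pop()
            if q in finals:
                return False  # some string leads from `state` to acceptance
            for nxt in edges.get(q, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return True  # no final state reachable: discard this branch

    return suffix_is_empty


# Hypothetical derivative machine in which state 2 is a dead end.
delta = {(0, "COMP"): 0, (0, "SUBJ"): 1, (0, "ADJ"): 2}
suffix_is_empty = make_suffix_checker(frozenset({1}), delta)
print(suffix_is_empty(0), suffix_is_empty(2))  # False True
```

A branch whose residual uncertainty starts in a state such as 2 can be
dropped without any further processing.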
There are also some less obvious but also quite important
performance considerations. What we have described is an equational
rewriting system that is quite different from the usual recursive
unification algorithm that operates on directed graph representations.
Directed graph data structures index the information in the equations
so that related structures are quickly accessible through the recursive
control structure. Since our procedure does not depend for its
correctness on the order in which interacting equations are chosen for
processing, it ought to be easy to embed Free as a simple extension of a
traditional algorithm. However, traditional unification algorithms do
not deal with disjunction gracefully. In particular, they typically do
not expect new disjunctive branches to arise during the course of a
recursive invocation; this would require inserting a fork in the
recursive control structure or saving a complete copy of the current
computational context for each new disjunction. We avoid this
awkwardness by postponing the processing of the functional
uncertainty until all simple unifications are complete. Before
performing a simple unification step, we remove from the data
structures all uncertainties that need to be resolved and store them
with a pointer to their containing structures on a queue or agenda of
pending unifications. Uncertainty processing can be resumed at a
later, more convenient time, after the simpler unifications have been
completed. (Indeed, if one of the simpler unifications fails, the
uncertainty may never be processed at all.) Waiting until simpler
unifications are done means that no computational state has to be
preserved; only data structures have to be copied to ensure the
independence of the various disjunctive paths.
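The agenda strategy can be illustrated with a deliberately simplified
sketch. The assumptions are loud ones: f-structures are flat Python
dictionaries, an uncertainty is stood in for by a finite tuple of
alternative attributes, and the names unify_attr and solve are
hypothetical. It is not our implementation, only a toy showing the
control regime: simple equations first, uncertainties afterwards, with
data structures copied per disjunctive branch.

```python
import copy
from collections import deque


def unify_attr(fs, attr, value):
    """Add attr=value to the flat f-structure fs; return False on a clash."""
    if fs.get(attr, value) != value:
        return False
    fs[attr] = value
    return True


def solve(equations):
    """Each equation is (attrs, value); more than one attr means uncertain."""
    fs, agenda = {}, deque()
    for attrs, value in equations:
        if len(attrs) > 1:
            agenda.append((attrs, value))  # postpone the uncertainty
        elif not unify_attr(fs, attrs[0], value):
            return []  # simple unification failed; agenda is never processed
    branches = [fs]
    while agenda:
        attrs, value = agenda.popleft()
        survivors = []
        for branch in branches:
            for attr in attrs:
                candidate = copy.deepcopy(branch)  # copy data, not control state
                if unify_attr(candidate, attr, value):
                    survivors.append(candidate)
        branches = survivors
    return branches


eqs = [(("SUBJ", "OBJ"), "mary"), (("SUBJ",), "john"), (("TENSE",), "past")]
print(solve(eqs))  # [{'SUBJ': 'john', 'TENSE': 'past', 'OBJ': 'mary'}]
```

Because SUBJ is already fixed by the time the uncertainty is taken off
the agenda, only the OBJ alternative survives.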
We also note that as long as the machinery for postponing
functional uncertainty for some amount of time is needed, it is often
advantageous to postpone it even longer than is absolutely necessary.
In particular, we found that if uncertainties are postponed until
predicates (semantic form values for PRED attributes) are assigned to
the f-structures they belong to, the number of cases that must be
explored is dramatically reduced. This is because of the coherence
condition that LFG imposes on f-structures with predicates: an
f-structure with a predicate can only contain those governable
functions that are explicitly mentioned by the predicate. Any other
governable functions are considered unacceptable. Thus, if we wait
until the predicate is identified, we need only consider the small
number of governable attributes that any particular predicate allows,
even though the initial attributes in an uncertainty may include the
entire set of governable functions (SUBJ, OBJ, and various kinds of
obliques and complements), and this may be quite large. The effect is
to make the processing of long-distance dependencies sensitive to the
subcategorization frame of the predicate; we have observed enormous
overall performance improvements from applying this delay strategy.
Note that in a left-to-right parsing model, the processing load therefore
increases in relative clauses just after the predicate is seen, and this
might have a variety of interesting psycholinguistic implications.
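The effect of the coherence delay on the set of initial attributes can
be sketched as a simple filter. The inventory GOVERNABLE and the
subcategorization frame used here are illustrative assumptions; the
actual sets depend on the grammar.

```python
# Assumed inventory of governable functions; the real set is grammar-specific.
GOVERNABLE = frozenset({"SUBJ", "OBJ", "OBJ2", "COMP", "XCOMP", "OBL"})


def coherent_first(first_set, subcat_frame):
    """Drop governable attributes the predicate does not subcategorize for.

    Non-governable attributes (here the hypothetical ADJ) are left alone,
    since the coherence condition does not restrict them.
    """
    return {a for a in first_set if a not in GOVERNABLE or a in subcat_frame}


first = {"SUBJ", "OBJ", "OBJ2", "COMP", "XCOMP", "OBL", "ADJ"}
# Hypothetical frame for a predicate taking a subject and a complement.
print(sorted(coherent_first(first, {"SUBJ", "COMP"})))  # ['ADJ', 'COMP', 'SUBJ']
```

Seven candidate initial attributes shrink to three once the predicate
is known, which is the reduction in explored cases described above.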
Finally, we observe that there is a specialization of the Free
operator that applies when an uncertainty interacts with several
non-uncertainty equations (equations whose attribute expressions
have singleton First sets). Instead of separating one interaction from
the uncertainty with each application of Free, the uncertainty is
divided in a single step into a minimum number of disjunctive
possibilities each of which interacts with just one of the other
equations. The disjunction contains one branch for each symbol in the
uncertainty's First set that is an initial attribute in one of the other
equations, plus a single branch for all of the residual initial symbols:
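(12) $(f\ \alpha) = v$ iff
$(f\ s_1\ \mathrm{Suffix}(\alpha, \delta(q_\alpha, s_1))) = v \;\lor\; \cdots \;\lor\; (f\ s_n\ \mathrm{Suffix}(\alpha, \delta(q_\alpha, s_n))) = v$
$\lor\; (f\ \alpha - \{s_1, \ldots, s_n\}\Sigma^*) = v$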
The statement of the generic Free algorithm (10) is simplified by
considering specific attributes as trivial regular languages, but this
suggests that complex finite-state machinery would be required to
process them. This alternative works in the opposite direction: it
reduces leading terms in an uncertainty to simple attributes before
pursuing their interactions, so that efficient attribute matching
routines of a normal unification procedure can be applied. This
alternative has a second computational advantage. The generic
algorithm unwinds the uncertainty one attribute at a time,
constructing a residual regular set at each step, which is then
processed against the other non-uncertain equations. The alternative
processes them all at once, avoiding the construction of these
intermediate residual languages. This is a very important
optimization, since we found it to be the most common case when we
embedded uncertainty resolution in our recursive unification
algorithm.
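The one-step division in (12) can be sketched as follows, assuming
again a partial transition table delta for the uncertainty and a set of
initial attributes taken from the other equations. The function name
and the data layout are hypothetical. The sketch tracks only leading
symbols and ignores the case in which the empty string belongs to the
uncertainty.

```python
def split_uncertainty(start, delta, other_attrs):
    """Divide an uncertainty as in (12).

    Returns (branches, residual): one (symbol, suffix_state) branch for
    each First symbol that is an initial attribute of another equation,
    and the list of remaining First symbols, which share one branch.
    """
    first = {sym: dst for (src, sym), dst in delta.items() if src == start}
    branches = [(sym, first[sym]) for sym in sorted(first) if sym in other_attrs]
    residual = sorted(sym for sym in first if sym not in other_attrs)
    return branches, residual


# Hypothetical machine for COMP* (SUBJ | OBJ); the other equations
# mention SUBJ and TENSE as initial attributes.
delta = {(0, "COMP"): 0, (0, "SUBJ"): 1, (0, "OBJ"): 1}
print(split_uncertainty(0, delta, {"SUBJ", "TENSE"}))
# ([('SUBJ', 1)], ['COMP', 'OBJ'])
```

Each branch pairs a plain attribute with the state whose suffix language
continues the uncertainty, so ordinary attribute matching can handle
the leading symbol directly.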
Uncertainty specifications are a compact way of expressing a
large number of disjunctive possibilities that are uncovered one by one
as our procedure operates. It might seem that this is an extremely
expensive descriptive device, one which should be avoided in favor of
apparently simpler mechanisms. But the disjunctions that emerge
from processing uncertainties are real: they represent independent
grammatical possibilities that would require additional computational
resources no matter how they were expressed. In theories in which
long-distance dependencies are based on empty phrase-structure nodes
and implemented, for example, by gap-threading machinery, ATN
HOLD lists, and the like, the exact location of these empty nodes is not
signaled by any information directly visible in the sentence. This
increases the number of phrase-structure rules that can be applied.
What we see as the computational cost of functional uncertainty shows 
up in these systems as additional resources needed for 
phrase-structure analysis and for functional evaluation of the larger 
number of trees that the phrase-structure component produces. 
Unlike phrasally-based specifications, functional uncertainties in LFG
are defined on the same level of representation as the
subcategorization restrictions that constrain how they can be resolved,
which our coherence-delay strategy easily takes advantage of. But the
fact remains that functional uncertainties do generate disjunctions,
and thus strongly highlight the already perceived need for efficient
disjunction-processing techniques if acceptable performance is to be
achieved with LFG and related grammatical formalisms. Recent
disjunction proposals by /Kasper 1987/ and /Eisele and Dörre 1988/ are
important steps in the development of the necessary computational 
technology. 
6. Conclusion 
The notion of regular functional uncertainty thus has very nice 
mathematical properties. Our state-decomposition algorithm provides 
a very attractive method for resolving functional uncertainties as 
other phrasal and functional constraints are computed during the 
parse of a sentence. This algorithm expands the uncertainties 
incrementally, introducing at each point only as much disjunction as is 
necessary to avoid interactions with other functional information that 
has already been taken into account. We have recently added this
algorithm and the functional uncertainty notation to our LFG 
Grammar Writer's Workbench, and we can now rigorously but easily 
test a wide range of linguistic hypotheses. We have also begun to 
investigate a number of other computational heuristics for the 
efficient, controlled expansion of uncertainty. 
Kaplan and Zaenen (in press) first proposed the idea of 
functional uncertainty as sketched in this paper to account for the 
properties of long-distance dependencies within the LFG framework.
In this framework, it has already shed new light on long-standing
problems like island constraints (see, e.g., /Saiki 1985/ for an 
application to Japanese). But the notion is potentially of much wider 
use: first, it can be adapted to other unification grammar formalisms 
to handle facts of a similar nature; and second, it can be used to handle 
phenomena that are traditionally not thought of as falling into the 
same class as long-distance dependencies but that nevertheless seem 
to involve nonlocal uncertainty. A discussion of its application in the 
LFG framework to infinitival complements can be found in /Johnson
1986/ for Dutch and /Netter 1986/ for German; /Karttunen (in press)/
discusses how similar extensions to Categorial Unification Grammar 
(CUG) can account in a simple way for related facts in Finnish that 
would otherwise require type-raising. Halvorsen has suggested that 
scope ambiguities in semantic structures might also be characterized 
by this device. 
Acknowledgements 
Our understanding of the linguistic applications of functional 
uncertainty developed over a long period of time in discussions with 
Joan Bresnan, Kris Halvorsen and Annie Zaenen. Discussions with 
Mark Johnson helped us in the early formulations of the satisfiability 
procedure, and Bill Rounds assisted us in understanding the 
difficulties of the cyclic case. We are grateful for the invaluable 
assistance these colleagues have provided. 

References 

Eisele, A. and Dörre, J. 1988. Unification of disjunctive feature
descriptions. Proceedings of the 26th Annual Meeting of the
Association for Computational Linguistics.

Johnson, M. 1986. An LFG description of the double infinitive 
construction in Dutch and German, CSLI report. 

Johnson, M. 1987. Attribute-value logic and the theory of grammar. 
Unpublished doctoral dissertation, Stanford University. 

Kaplan, R. M. and Bresnan, J. 1982. Lexical-functional grammar: A
formal system for grammatical representation. In J. Bresnan 
(ed.), The mental representation of grammatical relations. 
Cambridge: MIT Press. 

Kaplan, R. M. and Zaenen, A. In press. Long-distance dependencies,
constituent structure, and functional uncertainty. In M. Baltin
and A. Kroch (eds.), Alternative Conceptions of Phrase Structure.
Chicago: Chicago University Press. 

Karttunen, L. In press. Radical Lexicalism. In M. Baltin and A. Kroch
(eds.), Alternative Conceptions of Phrase Structure. Chicago: 
Chicago University Press. 

Kasper, R. 1987. A unification method for disjunctive feature
descriptions. Proceedings of the 25th Annual Meeting of the 
Association for Computational Linguistics. 

Kasper, R. and Rounds, W. 1986. A logical semantics for feature
structures. Proceedings of the 24th Annual Meeting of the 
Association for Computational Linguistics. 

Netter, K. 1986. Getting Things out of Order. COLING 11.

Saiki, M. 1985. On the coordination of gapped constituents in 
Japanese. CLS 21. 
