SHARED PREFERENCES 
James Barnett and Inderjeet Mani 
MCC 
3500 West Balcones Center Dr. 
Austin, TX 78759 
barnett@mcc.com 
mani@mcc.com 
Abstract
This paper attempts to develop a theory of 
heuristics or preferences that can be shared be- 
tween understanding and generation systems. We 
first develop a formal analysis of preferences and 
consider the relation between their uses in gener- 
ation and understanding. We then present a bi- 
directional algorithm for applying them and ex- 
amine typical heuristics for lexical choice, scope 
and anaphora in more detail.
1 Introduction 
Understanding and generation systems must both 
deal with ambiguity. In understanding, there are 
often a number of possible meanings for a string, 
while there are usually a number of different ways 
of expressing a given meaning in generation. To 
control the explosion of possibilities, researchers
have developed a variety of heuristics or prefer- 
ences - for example, a preference for low attach- 
ment of modifiers in understanding or for conci- 
sion in generation. This paper investigates the 
possibility of sharing such preferences between 
understanding and generation as part of a bidirec- 
tional NL system. In Section 2 we formalize the 
concept of a preference, and Section 3 presents 
an algorithm for applying such preferences uniformly
in understanding and generation. In Section 4
we consider specific heuristics for lexical
choice, scope, and anaphora. These heuristics 
have special properties that permit a more effi- 
cient implementation than the general algorithm 
from Section 3. Section 5 discusses some of the 
short-comings of the theory developed here and 
suggests directions for future research. 
2 Preferences in Understanding and Generation
Natural language understanding is a mapping 
from utterances to meanings, while generation 
goes in the opposite direction. Given a set String 
of input strings (of a given language) and a set Int 
of interpretations or meanings, we can represent 
understanding as a relation $U \subseteq String \times Int$,
and generation as $G \subseteq Int \times String$. U and G are
relations, rather than functions, since they allow
for ambiguity: multiple meanings for an utterance
and multiple ways of expressing a meaning.[1]
A minimal requirement for a reversible system is
that U and G be inverses of each other. For all
$s \in String$ and $i \in Int$:

$(s, i) \in U \leftrightarrow (i, s) \in G$    (1)
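Condition (1) can be made concrete by modelling U and G as finite sets of pairs. The following is only a minimal sketch in Python; the toy strings and interpretation labels are invented for illustration and do not come from any grammar discussed in this paper.

```python
# Toy model of understanding (U) and generation (G) as finite relations.
# The strings and interpretation labels below are invented for illustration.
U = {
    ("I saw her duck", "see(I, duck_of(her))"),
    ("I saw her duck", "see(I, ducking(her))"),
    ("I watched her crouch", "see(I, ducking(her))"),
}


def inverse(relation):
    """Swap the two components of every pair in a relation."""
    return {(b, a) for (a, b) in relation}


def is_reversible(u, g):
    """Condition (1): (s, i) is in U exactly when (i, s) is in G."""
    return inverse(u) == g


def interpretations(u, s):
    """All meanings that U assigns to string s."""
    return {i for (s2, i) in u if s2 == s}


def strings(g, i):
    """All strings that G offers for meaning i."""
    return {s for (i2, s) in g if i2 == i}


G = inverse(U)
```

Both directions are ambiguous in this toy example: the first string has two interpretations, and the second interpretation has two strings.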
Intuitively, preferences are ways of controlling 
the ambiguity of U and G by ranking some inter- 
pretations (for U) or strings (for G) more highly 
than others. Formally, then, we can view prefer- 
ences as total orders on the objects in question 
(we will capitalize the term when using it in this 
technical sense).[2] Thus, for any $s \in String$, an understanding
Preference $P_{int}$ will order the pairs
$\{(s, i) \mid (s, i) \in U\}$, while a generation Preference
$P_{str}$ will rank $\{(i, s) \mid (i, s) \in G\}$.[3] Thus we can
view the task of understanding as enumerating
the interpretations of a string in the order given
by $P_{int}$. Similarly, generation will produce strings
in the order given by $P_{str}$. Using $U_{P_{int}}$ and $G_{P_{str}}$
to denote the result of combining U and G with
these preferences, we have, for all $s \in String$ and
$i \in Int$:

$U_{P_{int}}(s) =_{def} \langle i_1, \ldots, i_n \rangle$    (2)
where $U(s) = \{i_1, \ldots, i_n\}$ and
$[j < k] \leftarrow [(s, i_j) <_{P_{int}} (s, i_k)]$

$G_{P_{str}}(i) =_{def} \langle s_1, \ldots, s_m \rangle$    (3)
where $G(i) = \{s_1, \ldots, s_m\}$ and
$[j < k] \leftarrow [(i, s_j) <_{P_{str}} (i, s_k)]$

[1] The definitions of U and G allow for strings with no
interpretations and meanings with no strings. Since any
meaning can presumably be expressed in any language,
we may want to further restrict G so that everything is
expressible: $\forall i \in Int\ [\exists s \in String\ [(i, s) \in G]]$.
[2] We use total orders rather than partial orders to avoid
having to deal with incommensurate structures. The requirement
of commensurability is not burdensome in practice,
even though many heuristics apparently don't apply
to certain structures. For example, a heuristic favoring low
attachment of post-modifiers doesn't clearly tell us how to
rank a sentence without post-modifiers, but we can insert
such sentences into a total order by observing that they
have all modifiers attached as low as possible.
Alternatively, we note that any Preference P
induces an equivalence relation $=_P$ which groups
together the objects that are equal under P.[4]
We can therefore view the task of Generation 
and Understanding as being the enumeration of 
P's equivalence classes in order, without worrying 
about order within classes (note that Formulae 2 
and 3 specify the order only of pairs where one 
member is less than the other under P.) 
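The view of a Preference as an ordered sequence of equivalence classes can be illustrated with a short sketch. It assumes, purely for illustration, that the Preference is given as a numeric rank function (lower is better); the function name ranked_classes and the concision example are hypothetical and not part of the theory above.

```python
from itertools import groupby


def ranked_classes(items, rank):
    """Return the equivalence classes of items, best (lowest rank) first.

    Two items fall into the same class exactly when rank() gives them
    the same value; order inside a class is left unspecified by the
    Preference (here it simply follows the input order).
    """
    ordered = sorted(items, key=rank)
    return [list(group) for _, group in groupby(ordered, key=rank)]


def word_count(s):
    """A toy 'concision' Preference: fewer words rank more highly."""
    return len(s.split())


classes = ranked_classes(["a b c", "d e", "f g", "h"], word_count)
```

Here the strings "d e" and "f g" are equal under the Preference and so share the second class.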
The question now arises of what the relation be- 
tween understanding Preferences and generation 
Preferences should be. Understanding heuristics 
are intended to find the meaning that the speaker 
is most likely to have intended for an utterance, 
and generation heuristics should select the string 
that is most likely to communicate a given mean- 
ing to the hearer. We would expect these Prefer- 
ences to be inverses of each other: if s is the best 
way to express meaning i, then i should be the 
most likely interpretation of s. If we don't accept 
this condition, we will generate sentences that 
we expect the listener to misinterpret. Therefore
we define class(Preference, pair) to be the
equivalence class that pair is assigned to under
Preference's ordering,[5] and link the first
(most highly ranked) classes under $P_{int}$ and $P_{str}$
as follows:

$class(P_{str}, (i, s)) = 0 \rightarrow class(P_{int}, (s, i)) = 0$    (4)

[3] Note that this definition allows Preferences to work
'across derivations.' For example, it allows $P_{int}$ to rank
pairs $(s, i)$, $(s', i')$ where $s \neq s'$. It permits a Preference to
say that i is a better interpretation for s than i' is for s'.
It is not clear if this sort of power is necessary, and the
algorithms below require only that Preferences be able to
rank different interpretations (strings) for a given string
(interpretation).
[4] Any order P on a set of objects D partitions D into a
set of equivalence classes by assigning each $x \in D$ to the
set $\{y \mid y \leq_P x \;\&\; x \leq_P y\}$.
[5] class(Preference, pair) is defined as the number of
classes containing items that rank more highly than pair
under Preference.
It is also reasonable to require that opposing
sets of preferences in understanding be reflected
in generation. If string $s_1$ has two interpretations
$i_1$ and $i_2$, with $i_1$ being preferred to $i_2$, and string
$s_2$ has the same two interpretations with the preferences
reversed, then $s_1$ should be a better way
of expressing $i_1$ than $i_2$, and vice versa for $s_2$:

$[(s_1, i_1) <_{P_{int}} (s_1, i_2) \;\&\; (s_2, i_2) <_{P_{int}} (s_2, i_1)]$
$\rightarrow [(i_1, s_1) <_{P_{str}} (i_1, s_2) \;\&\; (i_2, s_2) <_{P_{str}} (i_2, s_1)]$    (5)
Formula 4 provides a tight coupling of heuris- 
tics for understanding and generating the most 
preferred structures, but it doesn't provide any 
way to share Preferences for secondary readings. 
Formula 5 offers a way to share heuristics for sec- 
ondary interpretations, but it is quite weak and 
would be highly inefficient to use. To employ it 
during generation to choose between $s_1$ and $s_2$ as
ways of expressing $i_1$, we would have to run the
understanding system on both $s_1$ and $s_2$ to see if
we could find another interpretation $i_2$ that both
strings share but with opposite rankings relative
to $i_1$.
If we want to share Preferences for secondary 
readings, we will need to make stronger assump- 
tions. The question of ranking secondary in- 
terpretations brings us onto treacherous ground 
since most common heuristics (e.g., preferring 
low attachment) specify only the best reading 
and don't help choose between secondary and 
tertiary readings. Furthermore, native speakers 
don't seem to have clear intuitions about the rel- 
ative ranking of lesser readings. Finally, there is 
some question about why we should care about 
non-primary readings, since the best interpreta- 
tion or string is normally what we want. However, 
it is important to deal with secondary preferences, 
in part for systematic completeness, but mostly 
because secondary readings are vital in any attempt
to deal with figurative language - humor,
irony, and metaphor - which depends on the interplay
between primary and secondary readings.
To begin to develop a theory of secondary Preferences,
we will simply stipulate that the heuristics
in question are shared 'across the board' between
understanding and generation. The simplest
way to do this is to extend Formula 4 into a
biconditional, and require it to hold of all classes
(we will reconsider this stipulation in Section 5).
For all $s \in String$ and $i \in Int$, we have:

$class(P_{int}, (s, i)) = class(P_{str}, (i, s))$    (6)
Since Preferences now work in either direction,
we can simplify our notation and represent them
as total orderings of a set T of trees, where each
node of each tree is annotated with syntactic and
semantic information, and, for any $t \in T$, str(t)
returns the string in String that t dominates (i.e.,
spans), and sem(t) returns the interpretation in
Int for the root node of t. For a Preference P on
T and trees $t_1$, $t_2$, we stipulate:

$[t_1 <_P t_2 \;\&\; str(t_1) = str(t_2)] \rightarrow U_P(str(t_1)) = \langle \ldots sem(t_1) \ldots sem(t_2) \ldots \rangle$    (7)

$[t_1 <_P t_2 \;\&\; sem(t_1) = sem(t_2)] \rightarrow G_P(sem(t_1)) = \langle \ldots str(t_1) \ldots str(t_2) \ldots \rangle$    (8)
We close this Section by noting a property of
Preferences that will be important in Section 4:
an ordered list of Preferences can be combined
into a new Preference by using each item in the
list to refine the ordering specified by the previous
ones. That is, the second Preference orders
pairs that are equal under the first Preference,
and the third Preference applies to those that are
still equal under the second Preference, etc. If
$P_1 \ldots P_n$ are Preferences, we define a new Complex
Preference $P_{\langle 1, \ldots, n \rangle}$ as follows:

$t_1 <_{P_{\langle 1, \ldots, n \rangle}} t_2 \leftrightarrow \exists\, 1 \leq j \leq n\ [\,[t_1 <_{P_j} t_2] \;\&\; \neg \exists\, i < j\ [t_2 <_{P_i} t_1]\,]$    (9)
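Formula 9 can be read directly as a procedure. The sketch below assumes, for illustration only, that each Preference is supplied as a two-argument "less than" predicate; the names complex_less, by_length and by_alphabet are hypothetical.

```python
def complex_less(preferences, t1, t2):
    """Formula 9: t1 precedes t2 under the Complex Preference iff some
    P_j ranks t1 before t2 and no earlier P_i ranks t2 before t1."""
    for j, less_j in enumerate(preferences):
        earlier = preferences[:j]
        if less_j(t1, t2) and not any(less_i(t2, t1) for less_i in earlier):
            return True
    return False


# Two toy Preferences on strings (stand-ins for Preferences on trees).
def by_length(a, b):
    return len(a) < len(b)


def by_alphabet(a, b):
    return a < b


prefs = [by_length, by_alphabet]
```

With this list, by_alphabet only gets to decide between strings that by_length treats as equal, which is the refinement behaviour described above.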
3 An Algorithm for Sharing 
Preferences 
If we consider ways of sharing Preferences be- 
tween understanding and generation, the simplest 
one is to simply produce all possible interpreta- 
tions(strings), and then sort them using the Pref- 
erence. This is, of course, inefficient in cases 
where we are interested in only the more highly 
ranked possibilities. We can do better if we are
willing to make a few assumptions about the structure
of Preferences and the understanding and
generation routines. The crucial requirement on
Preferences is that they be 'upwardly monotonic'
in the following sense: if $t_1$ is preferred to $t_2$, then
it is also preferred to any tree containing $t_2$ as a
subtree. Using subtree($t_1$, $t_2$) to mean that $t_1$ is
a subtree of $t_2$, we stipulate

$[t_1 <_P t_2 \;\&\; subtree(t_2, t_3)] \rightarrow t_1 <_P t_3$    (10)

Without such a requirement, there is no way to
cut off unpromising paths, since we can't predict
the ranking of a complete structure from that of
its constituents.
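To see why monotonicity makes early cut-off possible, consider a numeric analogue of stipulation 10 in which a structure never costs less than any structure it was built from. The sketch below uses a priority queue rather than the level-by-level freezing scheme developed in this paper, so it is a stand-in technique, not the algorithm proposed here. All names (enumerate_in_order, WEIGHTS, SLOTS) and the toy data are assumptions made for illustration.

```python
import heapq


def enumerate_in_order(start, expand, is_complete, cost):
    """Yield (cost, structure) for complete structures, cheapest first.

    Assumes cost is monotonic: cost(child) >= cost(parent) for every
    child returned by expand(parent). Without that assumption a cheap
    complete structure could be hiding behind an expensive partial one.
    """
    counter = 0  # tie-breaker so the heap never compares structures
    heap = [(cost(start), counter, start)]
    while heap:
        c, _, item = heapq.heappop(heap)
        if is_complete(item):
            yield c, item
            continue
        for child in expand(item):
            counter += 1
            heapq.heappush(heap, (cost(child), counter, child))


# Hypothetical toy problem: pick one word per slot; the cost of a
# (partial) structure is the weight of the heaviest word used so far.
WEIGHTS = {"dog": 1, "hound": 2, "barks": 1, "bays": 3}
SLOTS = [["dog", "hound"], ["barks", "bays"]]


def expand(words):
    return [words + (w,) for w in SLOTS[len(words)]]


def is_complete(words):
    return len(words) == len(SLOTS)


def cost(words):
    return max((WEIGHTS[w] for w in words), default=0)


results = list(enumerate_in_order((), expand, is_complete, cost))
```

Because the generator is lazy, a caller that only wants the best structure can stop after the first result, and the more expensive partial structures are never expanded.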
Finally, we assume that both understanding
and generation are agenda-driven procedures that 
work by creating, combining, and elaborating 
trees.[6] Under these assumptions, the following
high-level algorithm can be wrapped around the 
underlying parsing and generation routines to 
cause the output to be enumerated in the order 
given by a Preference P. In the pseudo-code be- 
low, mode specifies the direction of processing and 
input is a string (if mode is understanding) or a 
semantic representation (if mode is generation). 
execute_item removes an item from the agenda 
and executes it, returning 0 or more new trees. 
generate_items takes a newly formed tree, a set of 
previously existing trees, and the mode, and adds 
a set of new actions to the agenda. (The un- 
derlying understanding or generation algorithm 
is hidden inside generate_items.) The variable ac- 
tive holds the set of trees that are currently be- 
ing used to generate new items, while frozen holds 
those that won't be used until later, complete_tree 
is a termination test that returns True if a tree is 
complete for the mode in question (i.e., if it has 
a full semantic interpretation for understanding, 
or dominates a complete string for generation). 
The global variable classes holds a list of equiva- 
lence classes used by equiv_class (defined below), 
while level holds the number of the equivalence 
class currently being enumerated. Thaw&restart
is called each time level is incremented to gener- 
ate new agenda items for trees that may belong 
to that class. 
[6] A wide variety of NLP algorithms can be implemented
in this manner, particularly such recent reversible generation
algorithms as [Shieber, van Noord, Moore, and
Pereira, 1989] and [Calder, Reape, and Zeevat, 1989].

ALGORITHM 1

classes := Nil; solutions := Nil;
new_trees := Nil; agenda := Nil;
level := 1;
frozen := initialize_agenda(input, mode);
{end of global declarations}
while frozen do
  begin
    solutions := get_complete_trees(frozen, level, mode);
    agenda := thaw&restart(frozen, level, agenda, mode);
    while agenda do
      begin
        new_trees := execute_item(agenda);
        while new_trees do
          begin
            new_tree := pop(new_trees);
            if equiv_class(P, new_tree) > level
              then push(new_tree, frozen);
            else if complete_tree(new_tree, mode)
              then push(new_tree, solutions);
            else generate_items(new_tree, active, agenda, mode);
          end;
      end; {agenda exhausted for this level}
    {solutions may need partitioning}
    while solutions do
      begin
        complete_tree := pop(solutions);
        if equiv_class(P, complete_tree) > level
          then push(complete_tree, frozen);
        else output(complete_tree, level);
      end;
    {increment level to output next class}
    level := level + 1;
  end;
The function equiv_class keeps track of the
equivalence classes induced by the Preferences.
Given an input tree, it returns the number of
the equivalence class that the tree belongs to.
Since it must construct the equivalence classes as
it goes along, it may return different values on
different calls with the same argument (for example,
it will always return 1 the first time it is
called, even though the tree in question may end
up in a lower class). However, successive calls
to equiv_class will always return a non-decreasing
series of values, so that a given tree is guaranteed
to be ranked no more highly than the value
returned (it is this property of equiv_class that
forces the extra pass over the completed trees in
the algorithm above: a tree that was assigned to
class n when it was added to solutions may have
been demoted to a lower class in the interim as
more trees were examined). Less_than and equal
take a Preference and a pair of trees and return
True if the first tree is less than (equal to) the
second under the Preference. New_class takes a
tree and creates a new class whose only member
is that tree, while insert adds a class to classes
in the indicated position (shifting other classes
down, if necessary), and select_member returns an
arbitrary member of a class.
function equiv_class(P: Preference, T: Tree)
begin
  class_num := 1;
  for class in classes do
    begin
      if less_than(P, T, select_member(class))
        then
          begin
            insert(new_class(T), classes, class_num);
            return(class_num);
          end;
      else if equal(P, T, select_member(class))
        then
          begin
            add_member(T, class);
            return(class_num);
          end;
      else class_num := class_num + 1;
    end;
  {T > all classes}
  insert(new_class(T), classes, class_num);
  return(class_num);
end {equiv_class}
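The behaviour of equiv_class, including the fact that repeated calls on the same tree return a non-decreasing series of values, can be checked with a small executable sketch. It is a loose transcription under stated assumptions: classes are Python lists held in an ordered list, the Preference is passed as two predicates, strings stand in for trees, and a guard (not in the pseudo-code) avoids adding the same member twice.

```python
def equiv_class(classes, less_than, equal, tree):
    """Place tree in the ordered list `classes` and return its
    1-based class number, creating a new class if necessary."""
    for index, members in enumerate(classes):
        representative = members[0]  # any member will do
        if less_than(tree, representative):
            classes.insert(index, [tree])  # shifts later classes down
            return index + 1
        if equal(tree, representative):
            if tree not in members:  # guard added for repeated calls
                members.append(tree)
            return index + 1
    classes.append([tree])  # tree is ranked below every existing class
    return len(classes)


# Toy Preference: shorter strings rank more highly.
def shorter(a, b):
    return len(a) < len(b)


def same_length(a, b):
    return len(a) == len(b)


classes = []
first = equiv_class(classes, shorter, same_length, "aaa")
second = equiv_class(classes, shorter, same_length, "a")
third = equiv_class(classes, shorter, same_length, "aaa")
fourth = equiv_class(classes, shorter, same_length, "bbb")
```

The tree "aaa" is reported in class 1 on the first call and in class 2 on the later call, once the better-ranked "a" has been seen.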
To see that the algorithm enumerates trees in
the order given by $<_P$, note that the first iteration
outputs trees which are minimal under $<_P$.
Now consider any tree $t_n$ which is output on a
subsequent iteration N. For all other $t_{n'}$ output on
that iteration, $t_n =_P t_{n'}$. Furthermore, $t_n$ contains
a subtree $t_{sub}$ which was frozen for all levels
up to N. Using T(J) to denote the set of trees
output on iteration J, we have:
$\forall\, 1 \leq I < N\ [\forall t_i \in T(I)\ [t_i <_P t_{sub}]]$,
whence, by stipulation 10, $t_i <_P t_n$. Thus $t_n$ is
greater than or equal to
all trees which were enumerated before it.

To calculate the time complexity of the algorithm, note
that it calls equiv_class once for each tree created
by the underlying understanding or generation algorithm
(and once for each complete interpretation).
Equiv_class, in turn, must potentially compare
its argument with each existing equivalence
class. Assuming that the comparison takes constant
time, the complexity of the algorithm depends
on the number k of equivalence classes $<_P$
induces: if the underlying algorithm is O(f(n)),
the overall complexity is $O(f(n) \times k)$. Depending
on the Preference, k could be a small constant,
or itself proportional to f(n), in which case the
complexity would be $O(f(n)^2)$.
4 Optimization of Preferences
As we make more restrictive assumptions about
Preferences, more efficient algorithms become
possible. Initially, we assumed only that Preferences
specified total orders on trees, i.e., that
they would take two trees as input and determine
if one was less than, greater than, or equal to
the other.[7] Given such an unrestricted view
of Preferences, we can do no better than producing
all interpretations (strings) and then sorting
them. This simple approach is fine if we
want all possibilities, especially if we assume
that there won't be a large number of them, so
that standard $n^2$ or $n \log n$ sorting algorithms
(see [Aho, Hopcroft, and Ullman, 1983]) won't be
much of an additional burden. However, this approach
is inefficient if we are interested in only
some of the possibilities. Adding the monotonicity
restriction 10 permits Algorithm 1, which is
more efficient in that it postpones the creation
of (successors of) lower ranked trees. However,
we are still operating with a very general view of
what Preferences are, and further improvements
are possible when we look at individual Preferences
in detail. In this section, we will consider
heuristics for lexical selection, scope, and anaphor
resolution. We do not make any claims for the
usefulness of these heuristics as such, but take
them as concrete examples that show the importance
of considering the computational properties
of Preferences.
Note that Algorithm 1 is stated in terms of a
single Preference. It is possible to combine multiple
Preferences into a single one using Formula 9,
and we are currently investigating other methods
of combination. Since the algorithms below are
highly specialized, they cannot be combined with
other Preferences using Formula 9. The ultimate
goal of this research, however, is to integrate such
specialized algorithms with a more sophisticated
version of Algorithm 1.

[7] We also assume that this test takes constant time.
4.1 Lexical Choice 
One simple preferencing scheme involves assign- 
ing integer weights to lexical items and syntactic 
rules. Items or rules with higher weights are less 
common and are considered only if lower ranked 
items fail. When combined with restriction 10, 
this weighting scheme yields a Preference <wt 
that ranks trees according to their lexical and rule 
weights. Using max_wt(T) to denote the most
heavily weighted lexical item or rule used in the
construction of T, we have:

$t_1 <_{wt} t_2 \leftrightarrow_{def} max\_wt(t_1) < max\_wt(t_2)$    (11)
The significant property here is that the equiva- 
lence classes under <wt can be computed without 
directly comparing trees. Given a lexical item 
with weight n, we know that any tree contain- 
ing it must be in class n or lower. Noting that 
our algorithm works by generate-and-test (trees 
are created and then ranked by equiv_class), we 
can achieve a modest improvement in efficiency 
by not creating trees with level n lexical items or 
rules until it is time to enumerate that equivalence 
class. We can implement this change for both 
generation and understanding by adding level as 
a parameter to both initialize_agenda and gener- 
ate_items, and changing the functions they call 
to consider only rules and lexical items at or be- 
low level. How much of an improvement this 
yields will depend on how many classes we want to 
enumerate and how many lexical items and rules 
there are below the last class enumerated. 
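The weighting scheme can be illustrated with a small sketch. It assumes, for illustration only, that a tree is a nested tuple whose first element is a rule name and that all weights live in one dictionary; the weights, the tree encoding, and the helper usable_at are hypothetical.

```python
# Hypothetical integer weights for rules and lexical items.
WEIGHTS = {
    "S": 1, "NP": 1, "VP": 1,
    "dog": 1, "barks": 1, "hound": 2, "bays": 3,
}


def max_wt(tree, weights):
    """Weight of the heaviest rule or lexical item used in a tree.

    A tree is either a word (a string) or a tuple of the form
    (rule_name, child, child, ...).
    """
    if isinstance(tree, str):
        return weights[tree]
    rule, *children = tree
    return max([weights[rule]] + [max_wt(child, weights) for child in children])


def usable_at(level, weights):
    """Rules and lexical items that may be used while enumerating the
    equivalence class numbered `level` (everything at or below it)."""
    return {name for name, weight in weights.items() if weight <= level}


plain = ("S", ("NP", "dog"), ("VP", "barks"))
rare = ("S", ("NP", "hound"), ("VP", "bays"))
```

The class of a tree is read off its heaviest ingredient, so no pairwise comparison of trees is needed, and restricting the lexicon with usable_at keeps higher-weighted items out of play until their level is reached.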
4.2 Scope 
Scope is another place where we can improve 
on the basic algorithm. We start by considering
scoping during Understanding. Given a sentence
s with operators (quantifiers) $o_1 \ldots o_n$, assigning
a scope amounts to determining a total
order on $o_1 \ldots o_n$. (Note that this ordering is not
a Preference. A Preference will be a total ordering
of trees, each of which contains such a scope
ordering, i.e., a scope Preference will be an
ordering of orderings of operators.) If a scope
Preference can do
no more than compare and rank pairs of scopings,
then the simple generate-and-test algorithm will 
require O(n!) steps to find the best scoping since 
it will potentially have to examine every possible 
ordering. However, the standard heuristics for assigning
scope (e.g., give "strong" quantifiers wide
scope, respect left-to-right order in the sentence)
can be used to directly assign the preferred ordering
of $o_1 \ldots o_n$. If we assume that secondary
readings are ranked by how closely they match
the preferred scoping, a Preference $<_{sc}$
can be defined. In the following, $(o_i, o_j) \in Sc(s)$
means that $o_i$ precedes $o_j$ in scoping Sc of sentence
s, and $Sc_{best}(s)$ is the preferred ordering of
the operators in s given by the heuristics:
$Sc_1(s) <_{sc} Sc_2(s) \leftrightarrow_{def} \forall o_i, o_j\ [\,[(o_i, o_j) \in Sc_{best}(s) \;\&\; (o_i, o_j) \in Sc_2(s)] \rightarrow (o_i, o_j) \in Sc_1(s)\,]$    (12)
Given such a Preference, we can generate the 
scopings of a sentence more efficiently by first pro- 
ducing the preferred reading (the first equivalence 
class), then all scopes that have one pair of oper- 
ators switched (the second class), then all those 
with two pairs out of order, etc. In the following
algorithm, ops is the set of operators in the sentence,
and sort is any sorting routine. Switched?
is a predicate returning True if its two arguments
have already been switched (i.e., if its first argument
was to the right of its second in $Sc_{best}(s)$), while
switch($o_1$, $o_2$, ord) is a function that returns a new
ordering which is the same as ord except that $o_2$
precedes $o_1$ in it.
{the best scoping}
root_set := sort(ops, Sc_best(s));
level := 1;
output(root_set, level);
new_set := Nil;
old_set := add_item(root_set, Nil);
{loop will execute n! - 1 times}
while old_set do
  begin
    {each pass produces the next equivalence class}
    level := level + 1;
    for ordering in old_set do
      begin
        for op in ordering do
          begin
            {consider adjacent pairs of operators}
            next := right_neighbor(op, ordering);
            {switch any pair that hasn't already been}
            if next and not(switched?(op, next))
              then do
                begin
                  new_scope := switch(op, next, ordering);
                  add_item(new_scope, new_set);
                  output(new_scope, level);
                end
          end
      end
    old_set := new_set;
    new_set := Nil;
  end
While Algorithm 1 would require O(n!)
steps to generate the first scoping, this algorithm
will output the best scoping in the $n^2$
or $n \log n$ steps that it takes to do the sort (cf.
[Aho, Hopcroft, and Ullman, 1983]), while each
additional scoping is produced in constant time.[9]
The algorithm is profligate in that it generates
all possible orderings of quantifiers, many of
which do not correspond to legal scopings (see
[Hobbs and Shieber, 1987]). It can be tightened
up by adding a legality test before a scope is output.
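The level-by-level enumeration of scopings can be sketched as a Python generator. This is an illustrative reading of the procedure, not a definitive implementation: it assumes the operators are distinct and hashable, implements the "already switched" test by comparing recorded positions in the preferred ordering, and adds a seen set (not in the pseudo-code) so that an ordering reachable by two different switch sequences is emitted only once.

```python
def scopings_by_level(best):
    """Yield (level, ordering) pairs, starting with the preferred
    ordering `best` at level 1. Each further level holds orderings
    obtained by switching one more adjacent pair of operators that is
    still in its preferred relative order."""
    position = {op: i for i, op in enumerate(best)}
    level = 1
    current = [tuple(best)]
    seen = {tuple(best)}
    while current:
        for ordering in current:
            yield level, ordering
        next_level = []
        for ordering in current:
            for k in range(len(ordering) - 1):
                left, right = ordering[k], ordering[k + 1]
                if position[left] < position[right]:  # not yet switched
                    swapped = ordering[:k] + (right, left) + ordering[k + 2:]
                    if swapped not in seen:
                        seen.add(swapped)
                        next_level.append(swapped)
        current = next_level
        level += 1


result = list(scopings_by_level(["a", "b", "c"]))
```

For three operators this produces one ordering at level 1, two at each of levels 2 and 3, and the fully reversed ordering at level 4. A caller interested only in the best few scopings can stop consuming the generator early.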
When we move from Understanding to Gener- 
ation, following Formula 6, we see that the task 
is to take an input semantics with scoping Sc 
and enumerate first all strings that have Sc as 
their best scoping, then all those with Sc as the 
second best scoping, etc. Equivalently, we enu- 
merate first strings whose scopings exactly match 
Sc, then those that match Sc except for one pair 
of operators, then those matching except for two 
pairs, etc. We can use Algorithm 1 to implement
this efficiently if we replace each of the two
conditional calls to equiv_class. Instead of first
computing the equivalence class and then testing
whether it is less than level, we call the following
function class_less_than:
{True iff candidate ranked at level or below}
{Target is the desired scoping}
function class_less_than(candidate, target, level)
begin
  switch_limit := level; {global variable}
  switches := 0; {global variable}
  return test_order(candidate, target, target);
end {class_less_than}

function test_order(cand, targ_rest, targ)
begin
  if null(cand)
    return True;
  else
    begin
      targ_tail := member(first(cand), targ_rest);
      if targ_tail
        return test_order(rest(cand), targ_tail, targ);
      else
        begin
          switches := switches + 1;
          if >(switches, switch_limit)
            return False;
          else
            if simple_test(rest(cand), targ_rest)
              return test_order(cand, targ, targ);
            else return False;
        end
    end
end {test_order}

function simple_test(cand_rest, targ_rest)
begin
  for cand in cand_rest do
    begin
      if not(member(cand, targ_rest))
        begin
          switches := switches + 1;
          if >(switches, switch_limit)
            return False;
        end
    end
  return True;
end {simple_test}

[9] switched? can be implemented in constant time if
we record the position of each operator in the original
scoping $Sc_{best}$. Then switched?($o_1$, $o_2$) returns True iff
position($o_2$) < position($o_1$).
To estimate the complexity of class_less_than,
note that if no switches are encountered,
test_order will make one pass through targ_rest (=
targ) in O(n) steps, where n is the length of targ.
Each switch encountered results in a call to simple_test,
O(n) steps, plus a call to test_order on the
full list targ for another O(n) steps. The overall
complexity is thus $O((j + 1) \times n)$, where level = j
is the number of switches permitted. Note that
class_less_than tests a candidate string's scoping
only against the target scope, without having to
inspect other possible strings or other possible
scopings for the string. We therefore do not need
to consider all strings that can have Sc as a scoping
in order to find the most highly ranked ones
that do. Furthermore, class_less_than will work
on partial constituents (it doesn't require that
cand have the same number of operators as targ),
so unpromising paths can be pruned early.
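The key property, that a candidate scoping is tested only against the target and can be rejected as soon as too many switches are found, can be shown with a simpler stand-in. The sketch below counts out-of-order pairs directly instead of walking the lists as test_order does, so its worst case is quadratic in the number of operators rather than $O((j + 1) \times n)$. The name within_switch_limit and the explicit max_switches parameter are assumptions made for illustration.

```python
def within_switch_limit(candidate, target, max_switches):
    """True iff at most max_switches pairs of operators in candidate
    appear in the opposite order from target.

    candidate may be partial: it only needs to contain operators that
    also occur in target, which is assumed to list each operator once.
    """
    position = {op: i for i, op in enumerate(target)}
    switches = 0
    for a in range(len(candidate)):
        for b in range(a + 1, len(candidate)):
            if position[candidate[a]] > position[candidate[b]]:
                switches += 1
                if switches > max_switches:
                    return False  # prune as early as possible
    return True


target = ["a", "b", "c"]
```

Because partial candidates are accepted, the same test can be applied to incomplete constituents during generation, and a constituent that already exceeds the limit never needs to be extended.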
4.3 Anaphora
Next we consider the problem of anaphoric ref- 
erence. From the standpoint of Understanding, 
resolving an anaphoric reference can be viewed 
as a matter of finding a Preference ordering of 
all the possible antecedents of the pronoun. Al- 
gorithm 1 would have to produce a separate in- 
terpretation for each object that had been men- 
tioned in the discourse and then rank them all. 
This would clearly be extremely inefficient in 
any discourse more than a couple of sentences 
long. Instead, we will take the anaphora resolution
algorithm from [Rich and Luperfoy, 1988],
[Luperfoy and Rich, 1991] and show how it can be
viewed as an implementation of a Complex Pref- 
erence, allowing for a more efficient implementa- 
tion. 
Under this algorithm, anaphora resolution is 
entrusted to Experts of three kinds: a Proposer 
finds likely candidate antecedents, Filters provide
a quick way of rejecting many candidates,
and Rankers perform more expensive tests to 
choose among the rest. Recency is a good ex- 
ample of a Proposer; antecedents are often found 
in the last couple of sentences, so we should start 
with the most recent sentences and work back. 
Gender is a typical Filter; given a use of "he", we 
can remove from consideration all non-male ob- 
jects that the Proposers have offered. Semantic 
plausibility or Syntactic parallelism are Rankers; 
they are more expensive than the Filters and 
assign a rational-valued score to each candidate 
rather than giving a yes/no answer. 
When we translate these experts into our 
framework, we see that Proposers are Prefer- 
ences that can efficiently generate their equiva- 
lence classes in rank order, rather than having to 
sort a pre-existing set of candidates. This is where 
our gain in efficiency will come: we can work back 
through the Proposer's candidates in order, confi- 
dent that any candidates we haven't seen must be 
ranked lower than those we have seen. Filters rep- 
resent a special class of Preference that partition 
candidates into only two classes: those that pass 
and those that are rejected. Furthermore, we are 
interested only in candidates that all Filters assign
to the first class. If we simply combine n Filters
into a Complex Preference using Formula 9, the
result is not a Filter since it partitions the input
into $2^n$ classes. We therefore define a new simple
Filter $F_{\langle 1, \ldots, n \rangle}$ that assigns its input to class
1 iff $F_1 \ldots F_n$ all do. Finally, Rankers are Preferences
of the kind we've been discussing so far.
When we observe that the effect of running a Proposer
and then removing all candidates that the
Filters reject is equivalent to first running the Filter
and then using the Proposer to refine its first
class,[10] we see that the algorithm above, when run
with Proposer Pr, Filters $F_1 \ldots F_n$ and Rankers
$R_1 \ldots R_j$, implements the Complex Preference
$P_{\langle F_{\langle 1, \ldots, n \rangle}, Pr, R_1 \ldots R_j \rangle}$, defined in accordance with
Formula 9. We thus have the following algorithm,
where next_class takes a Proposer and a pronoun
as input and returns its next equivalence class of
candidate antecedents for the pronoun.
class := 1; {global variable}
cands := next_class(Proposer, pronoun);
while cands do
  begin
    filtered_cands := cands;
    for cand in cands do
      begin
        for filter in Filters do
          begin
            if not(filter(cand))
              then remove(cand, filtered_cands);
          end
      end
    {filtered_cands now contains class n under}
    {P<F<1...n>, Pr>. Rankers R1 ... Rj}
    {may split it into several classes}
    refine&output(filtered_cands, Rankers);
    cands := next_class(Proposer, pronoun);
  end
function refine&output(cands, Rankers)
begin
  refined_order := sort(cands, Rankers);
  if rest(Rankers)
    then refine&output(refined_order, rest(Rankers));
  else
    begin
      loc_class := 1;
      for cand in refined_order do
        begin
          if >(equiv_class(first(Rankers), cand), loc_class)
            then
              begin
                loc_class := loc_class + 1;
                class := class + loc_class;
              end
          output(cand, class);
        end
    end
end {refine&output}
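The division of labour between Proposer, Filters and Rankers can be sketched compactly in Python. The sketch is illustrative only and makes several assumptions not found in the cited algorithm: the Proposer is modelled as an iterable of candidate lists (best class first), Filters are boolean predicates, Rankers return numeric scores with higher meaning better, and the discourse entities and scores are invented.

```python
from itertools import groupby


def resolve(proposer_classes, filters, rankers):
    """Yield classes of candidate antecedents in preference order.

    Candidates from an earlier Proposer class always outrank those
    from a later one; Rankers only refine the order inside a class.
    """
    for candidates in proposer_classes:
        survivors = [c for c in candidates if all(f(c) for f in filters)]

        def score(c):
            # Negate so that higher Ranker scores sort first.
            return tuple(-r(c) for r in rankers)

        survivors.sort(key=score)
        for _, group in groupby(survivors, key=score):
            yield list(group)


# Hypothetical discourse entities, grouped by recency (most recent first).
mary = {"name": "Mary", "gender": "f", "plausibility": 0.8}
tom = {"name": "Tom", "gender": "m", "plausibility": 0.4}
bill = {"name": "Bill", "gender": "m", "plausibility": 0.9}
john = {"name": "John", "gender": "m", "plausibility": 0.7}
recency_classes = [[mary, tom, bill], [john]]


def is_male(c):  # a gender Filter for the pronoun "he"
    return c["gender"] == "m"


def plausibility(c):  # a stand-in semantic plausibility Ranker
    return c["plausibility"]


ranking = [
    [c["name"] for c in cls]
    for cls in resolve(recency_classes, [is_male], [plausibility])
]
```

John scores higher than Tom on the Ranker but is still placed after him, because the Proposer's recency classes take precedence. Since resolve is a generator, older parts of the discourse are never examined if the caller stops after the first class.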
Moving to Generation, we use this Preference
to decide when to use a pronoun. Following Formula
6, we want to use a pronoun to refer to object
x at level n iff that pronoun would be interpreted
as referring to x in class n during Understanding.
First we need a test occurs?(Proposer,
x) that will return True iff Proposer will eventually
output x in some equivalence class. For
example, a Recency Proposer will never suggest a
candidate that hasn't occurred in the antecedent
discourse, so there is no point in considering a
pronoun to refer to such an object. Next, we
note that the candidates that the Proposer returns
are really pairs consisting of a pronoun and
an antecedent, and that Filters work by comparing
the features of the pronoun (gender, number,
etc.) with those of the antecedent. We can implement
Filters to work by unifying the (syntactic)
features of the pronoun with the (syntactic
and semantic) features of the antecedent, returning
either a more fully-specified set of features for
the pronoun, or $\bot$ if unification fails. We can now
take a syntactically underspecified pronoun and x
and use the Filter to choose the appropriate set of
features. We are now assured that the Proposer
will suggest x at some point, and that x will pass
all the filters.

[10] In both cases, the result is: $p_1 \cap f_1, \ldots, p_n \cap f_1$, where
$p_1 \ldots p_n$ are the equivalence classes induced by the Proposer,
and $f_1$ is the Filter's first equivalence class.
Having established that x is a reasonable candidate for
pronominal reference, we need to determine what class x will
be assigned to as an antecedent. Rankers such as Syntactic
Parallelism must look at the full syntactic structure (see
footnote 11), so we must generate complete sentences before
doing the final ranking. Given a sentence s containing pronoun
p with antecedent x, we can determine the equivalence class of
(p, x) by running the Proposer until (p, x) appears, then
running the Filters on all other candidates, passing all the
survivors and (p, x) to refine&output, and seeing what class
(p, x) is returned in. Alternatively, if we only want to check
whether (p, x) is in a certain class n or not, we can run the
resolution algorithm given above until n classes have been
enumerated, quitting if (p, x) is not in it. (See the next
section for a discussion of this algorithm's obvious
weaknesses.)
11 The definitions we've given so far do not specify how
Preferences should rank "unfinished" structures, i.e., those
that don't contain all the information the Preference
requires. One obvious solution is to assign incomplete
structures to the first equivalence class; as the structures
become complete, they can be moved down into lower classes if
necessary. Under such a strategy, Preferences such as
Syntactic Parallelism will return high scores on the
incomplete constituents, but these scores will be meaningless,
since many of the resulting complete structures will be placed
into lower classes.
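The cheaper membership check mentioned above, enumerating only the first n classes, might be sketched in Python as follows. The sketch assumes that resolution can be exposed as a lazy generator of equivalence classes, best first; in_class and resolution_classes are hypothetical names, and the candidate pairs are invented for the example.

```python
from itertools import islice


def in_class(pair, n, classes):
    """Check whether pair falls in exactly the n-th equivalence class.

    classes is any iterable that yields equivalence classes best first.
    At most n classes are ever requested from it.
    """
    for i, eq_class in enumerate(islice(classes, n), start=1):
        if pair in eq_class:
            return i == n
    return False  # n classes enumerated (or none left) without finding pair


def resolution_classes(log):
    """Toy stand-in for the resolution algorithm; log records the work done."""
    ranked = (
        [("he", "john")],
        [("he", "bill"), ("he", "fred")],
        [("he", "tom")],
    )
    for eq_class in ranked:
        log.append(eq_class)
        yield set(eq_class)
```

Because the generator is consumed lazily, classes beyond the n-th are never computed, which is the only saving this strategy offers over full resolution.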
5 Discussion 
Related Work: There is an enormous amount of work on
preferences for understanding; see, e.g., [Whittemore,
Ferrara, and Brunner, 1990], [Jensen and Binot, 1988], and
[Grosz, Appelt, Martin, and Pereira, 1987] for a few recent
examples. In work on generation, preferences (in the sense of
rankings of structures) are less clearly identifiable, since
such rankings tend to be contained implicitly in strategies
for the larger problem of deciding what to say (but see [Mann
and Moore, 1981] and [Reiter, 1990]). Algorithm 1 is similar
in spirit to the "all possibilities plus constraints" strategy
that is common in principle-based approaches (see [Epstein,
1988]), but it differs from them in that it imposes a
preference ordering on interpretations, rather than
restricting the set of legal interpretations to begin with.
Strzalkowski [Strzalkowski, 1990] contrasts two strategies for
reversibility: those with a single grammar and two
interpreters versus those with a single interpreter and two
grammars. Although the top-level algorithm presented here
works for both understanding and generation, the underlying
generation and understanding algorithms can belong to either
of Strzalkowski's categories. However, the more specific
algorithms discussed in Section 4 belong to the former
category. There is also a clear "directionality" in both the
scope and the anaphora Preferences; both are basically
understanding heuristics that have been reformulated to work
bi-directionally. For this reason, they are both considerably
weaker as generation heuristics. In particular, the anaphora
Preference is clearly insufficient as a method of choosing
when to use a pronoun. At best, it can serve to validate the
choices made by a more substantial planning component.
The Two Directions: In general, it is not clear what the
relation between understanding and generation heuristics
should be. Formulae 4 and 5 are reasonable requirements, but
they are too weak to provide the close linkage between
understanding and generation that we would like to have in a
bi-directional system. On the other hand, Formula 6 is
probably too strong, since it requires the equivalence classes
to be the same across the board. In particular, it entails the
converse of Formula 4, and this has counter-intuitive results.
For example, consider any highly convoluted, but grammatical,
sentence: it has a best interpretation, and by Formula 6 it is
therefore one of the best ways of expressing that meaning. But
if it is sufficiently opaque, it is not a good way of saying
anything. Similarly, a speaker may suddenly use a pronoun to
refer to an object in a distant part of the discourse. If the
anaphora Preference is sophisticated enough, it may resolve
the pronoun correctly, but we would not want the generation
system to conclude that it should use a pronoun in that
situation. One way to tackle this problem is to observe that
understanding systems tend to be too loose (they accept a lot
of things that you don't want to generate), while generation
systems are too strict (they cover only a subset of the
language). We can therefore view generation Preferences as
restrictions of understanding Preferences. On this view, one
may construct a generation Preference from one for
understanding by adding extra clauses, with the result that
its ordering is a refinement of that induced by the
understanding Preference.
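One way to picture this construction is the following Python sketch, in which a Preference is modelled as a sort key and the "extra clause" is a tie-breaking component appended to that key. The Candidate fields, the choice of concision as the extra clause, and all function names are assumptions made for illustration, not definitions taken from the formulae above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    ambiguity: int  # hypothetical understanding score, lower is better
    length: int     # hypothetical concision measure, lower is better


def understanding_key(c):
    return c.ambiguity


def generation_key(c):
    # The understanding Preference plus one extra clause (concision).
    return (understanding_key(c), c.length)


def classes(cands, key):
    """Group candidates into equivalence classes ordered by key."""
    groups = {}
    for c in cands:
        groups.setdefault(key(c), []).append(c)
    return [groups[k] for k in sorted(groups)]


def is_refinement(fine, coarse_key):
    """True if every fine class lies inside one coarse class, in order."""
    ranks = []
    for eq_class in fine:
        keys = {coarse_key(c) for c in eq_class}
        if len(keys) != 1:
            return False
        ranks.append(keys.pop())
    return ranks == sorted(ranks)


cands = [Candidate("A", 0, 5), Candidate("B", 0, 9), Candidate("C", 1, 4)]
gen_classes = classes(cands, generation_key)
```

Here gen_classes splits the understanding class containing A and B without reordering anything relative to C, whereas ranking by length alone would place C first and would not be a refinement.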
Internal Structure: Further research is neces- 
sary into the internal structure of Preferences. 
We chose a very general definition of Preferences 
to start with, and found that further restrictions 
allowed for improvements in efficiency. Prefer- 
ences that partition input into a fixed set of 
equivalence classes that can be determined in advance 
(e.g., the Preference for lexical choice discussed 
in Section 4) are particularly desirable 
since they allow structures to be categorized in 
isolation, without comparing them to other al- 
ternatives. Other Preferences, such as the scope 
heuristic, allow us to create the desired struc- 
tures directly, again without need for compari- 
son with other trees. On the other hand, the 
anaphora Preference is based on an algorithm 
that assigns rational-valued scores to candidate 
antecedents. Thus there can be arbitrarily many 
equivalence classes, and we can't determine which 
one a given candidate belongs to without look- 
ing at all higher-ranked candidates. This is not 
a problem during understanding, since the Pro- 
poser can provide those candidates efficiently, but 
the algorithm for generation is quite awkward, 
amounting to little more than "make a guess, then 
run understanding and see what happens." 
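The contrast between the two kinds of Preference can be made concrete with a small Python sketch. The fixed class table and the scores are invented for the example, and defining a candidate's class as one more than the number of distinct better scores is an assumption about how rational-valued scores induce equivalence classes.

```python
from fractions import Fraction

# Hypothetical fixed classes for a lexical-choice Preference.
LEXICAL_CLASSES = {"basic": 1, "technical": 2, "archaic": 3}


def fixed_class(register):
    """Classify a candidate in isolation: no other candidates needed."""
    return LEXICAL_CLASSES[register]


def score_class(cand, candidates, score):
    """Classify a scored candidate relative to all of its competitors.

    The class is 1 plus the number of distinct scores strictly better
    (higher) than the candidate's own, so it cannot be computed without
    looking at every higher-ranked candidate.
    """
    better = {score(c) for c in candidates if score(c) > score(cand)}
    return len(better) + 1


scores = {
    "john": Fraction(3, 4),
    "bill": Fraction(1, 2),
    "fred": Fraction(2, 3),
}
```

With only john and bill in play, bill falls in class 2; once fred is added, bill drops to class 3, while fixed_class returns the same answer whatever else is proposed.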
The focus of our future research will be a for- 
mal analysis of various Preferences to determine 
the characteristic properties of good understand- 
ing and generation heuristics and to investigate 
methods other than Formula 9 of combining mul- 
tiple Preferences. Given such an analysis, Algo- 
rithm 1 will be modified to handle multiple Pref- 
erences and to treat the different types of Pref- 
erences differently, thus reducing the need for 
the kind of heuristic-specific algorithms seen in 
Section 4. We also plan an implementation of 
these Preferences as part of the KBNL system 
[Barnett, Mani, Knight, and Rich, 1990]. 

References 
[Aho, Hopcroft, and Ullman, 1983] Alfred Aho, John Hopcroft,
and Jeffrey Ullman. Data Structures and Algorithms.
Addison-Wesley, 1983.
[Barnett, Mani, Knight, and Rich, 1990] Jim Barnett, Inderjeet
Mani, Kevin Knight, and Elaine Rich. Knowledge and natural
language processing. CACM, August 1990.
[Calder, Reape, and Zeevat, 1989] J. Calder, M. Reape, and H.
Zeevat. An algorithm for generation in unification categorial
grammar. In Proceedings of the 4th Conference of the European
Chapter of the ACL, 1989.
[Epstein, 1988] Samuel Epstein. Principle-based interpretation
of natural language quantifiers. In Proceedings of AAAI 88,
1988.
[Grosz, Appelt, Martin, and Pereira, 1987] Barbara Grosz,
Douglas Appelt, Paul Martin, and Fernando Pereira. Team: an
experiment in the design of portable natural language
interfaces. Artificial Intelligence, 1987.
[Hobbs and Shieber, 1987] Jerry Hobbs and Stuart Shieber. An
algorithm for generating quantifier scopings. Computational
Linguistics, 13(1-2):47-63, 1987.
[Jensen and Binot, 1988] Karen Jensen and Jean-Louis Binot.
Dictionary text entries as a source of knowledge for syntactic
and other disambiguations. In Second Conference on Applied
Natural Language Processing, Austin, Texas, 9-12 February,
1988.
[Luperfoy and Rich, 1991] Susan Luperfoy and Elaine Rich.
Anaphora resolution. Computational Linguistics, to appear.
[Mann and Moore, 1981] William Mann and James Moore. Computer
generation of multiparagraph English text. American Journal of
Computational Linguistics, 7(1):17-29, 1981.
[Reiter, 1990] Ehud Reiter. The computational complexity of
avoiding conversational implicatures. In Proceedings of the
ACL, Pittsburgh, 6-9 June, 1990.
[Rich and Luperfoy, 1988] Elaine Rich and Susan Luperfoy. An
architecture for anaphora resolution. In Second Conference on
Applied Natural Language Processing, Austin, Texas, 9-12
February, 1988.
[Shieber, van Noord, Moore and Pereira, 1989] S. Shieber, G.
van Noord, R. Moore, and F. Pereira. A semantic head-driven
generation algorithm for unification-based formalisms. In
Proceedings of the ACL, Vancouver, 26-29 June, 1989.
[Strzalkowski, 1990] Tomek Strzalkowski. Reversible logic
grammars for parsing and generation. Computational
Intelligence, 6(3), 1990.
[Whittemore, Ferrara, and Brunner, 1990] Greg Whittemore,
Kathleen Ferrara, and Hans Brunner. Post-modifier
prepositional phrase ambiguity in written interactive
dialogues. In Proceedings of the ACL, Pittsburgh, 6-9 June,
1990.
