COMBINATORIAL DISAMBIGUATION 
P. S. Newman 
IBM Los Angeles Scientific Center 
11601 Wilshire Boulevard 
Los Angeles, CA 90025-1738 
Abstract 
The disambiguation of sentences is a combinatorial 
problem. This paper describes a method for treating 
it as such, directly, by adapting standard combinatorial 
search optimizations. Traditional disambiguation heu- 
ristics are applied but, instead of being embedded in 
individual decision procedures for specific types of 
ambiguities, they contribute to numerical weights that 
are considered by a single global optimizer. The result 
is increased power and simpler code. The method is 
being implemented for a machine translation projecl, 
but could be adapted to any natural language system. 
1. Introduction 
The disambiguation of sentences is a combinatorial 
problem. Identification of one word sense interacts 
with the identification of other word senses, 
(I) lie addressed the chair 
and with constituent attachment, 
(2) He shot some bucks with a rifle 
Moreover, the attachment of one constituent interacts 
with the attachment of other constituents: 
(3) She put the vase on the table in the living room 
This paper describes a method of addressing the prob- 
lem directly, by adapting standard searctl optimization 
techniques, in the first section we describe the core of 
the method, which applies a version of best-.first search 
to a uniform representation of the set of possibilities. 
In the second section we relate the work to other 
approaches to preference-based disambiguation. "l"he 
final sections describe how the representation may he 
obtained from a lexicon. 
2. The Search Method 
In the machine translation project for which this tech- 
nique is being developed, disambiguation begins after 
a parser, specifically the PLNI.P English Grammar 
by Jensen (1986), has identified one or more parses 
based primarily on syntactic considerations (including 
subcategorization). Words are disambiguated up to 
part-of-speech, but word senses are not identified. In- 
dividual parses may indicate alternative attachments 
of many kinds of constituents including prepositional 
phrases and relative clauses. 
Beginning disambiguation only after a general parse 
has the advantage of making clear what all the possi- 
bilities are, thus allowing their investigation in an ef- 
ficient order. Perlbrming a full parse before 
disamblguatlon need not consume an inordinate 
amount of space or time; techniques such as those 
used by Jensen (default rightmost prepositional phrase 
attachment), and "l'omita (1985) (parse forests) ade- 
quately control resource requirements. 
The parser output is first transfi)rmed so that it is 
represented as a set of of semantic choice points. Each 
choice point generally represents a constituent, it con- 
rains a group of weighted semantic alternatives that 
represent the different ways in which word-senses of 
tile constituent head can he associated semantically 
with word-senses of a higher level head. This allows 
word-sense and attachment alternatives to be treated 
uniformly. 
Combinatorial disamhiguation then selects the consis- 
tent combination of alternatives, one from cat'h choice 
point, that yields tile highest total weight. "1,~ illustrate 
the method, we use tile exlension of the classic example 
mentioned above: 
(2) lie shot some bucks with a rifle 
A decomposition of tile sentence into choice points el, 
c2, and c3 is shown in Figure I. (The illustration 
243 
assumes that "shot" and "bucks" have two meanings 
each, and ignores determiners.) Choice point "cl" 
gives the alternative syntactic and semantic possibilities 
for the attachment of the constituent "he" to its only 
possible head "shot'. Alternative ell is that "he" is 
the subject of %hootl', with the semantic function 
"agent', and is given (for reasons which will be dis- 
cussed later) the weight "3". Alternative c12 is similar, 
but the meaning used for "shoot" is "shoot2". Similar 
alternatives are used for the attachment (c2) of the 
object "some bucks" to its only possible head, "shoot'. 
Alternative c23 represents the unlikely combinations. 
Choice point c3 reflects the different possible attach- 
ments of'with a rifle'; the highest v, eight (3) is given 
to its attachment as an instrumental modifier of 
"shootl'. The other possibilities range from barely 
plausible to implausible and are weighted accordingly. 
I laving obtained this repesentation (whose construction 
is described in later sections), the next step is to es- 
tablish the single-alternative choice points as given and 
to propagate any associated word-sense constraints to 
narrow or eliminate other alternatives. (l'his does not 
occur in our example.) 
Then combinations of the remaining choices are 
searched using a variation of the A* best-first search 
method. See Nilsson (1980) for a thorough description 
of A*. Briefly, A* views combinatorial search as a 
problem of finding an optimal path from the root to 
a leaf of a tree whose nodes are the weighted alterna- 
tives to be combined. At any point in the process, the 
next node n to be added is one for which the potential 
F(n) = G(n) + If(n) 
is maximal over all possible additions. G(n) represents 
the value of the path up to and including n. I I(n) 
represents an estimated upper bound for the value of 
the path below n (i.e., for the additional value which 
can be obtained) ~. When a complete path is found by 
this method, it must be optimal. The efficiency of the 
method, i.e., the speed of finding an optimal path, 
depends on the accuracy of tile estimates ll(n). 
To apply the method in our context, tile search tree 
is defined in terms of levels, with each level corre- 
sponding to a choice point. Choice points are assigned 
to levels so that those which would probably be re- 
sponsible for the greatest difference between estimated 
and actual l l(n) in an arhitrary assignment are exam- 
ined first. Looked at in another way, the assignment 
of choice points to levels is made so that those which 
will best differenliale among polential path scores are 
examined firsl. 
This is done by (partially) ordering the choice points 
in descending order of their difference potential De, 
the difference in weight between their highest weighted 
alternative and the next allernalive. If the highest 
weight is represented by two different alternatives, Dc 
= 0. Within this ordering the choice points are further 
ordered by weight differences between tile 2rid and 
3rd highest weighted alternatives, etc. For our example 
this results in choice point c3 ('with a rifle') being 
assigned to the highest level in the tree, followed by 
choice points c2 and then el. 
cl. He 
cll agt (he shootl) 3 
clZ agt the shootZ) 3 
eZ. some bucks 
cZl goal !buck1 shoot1) 3 
c22 goal (t~ck2 shoot2) 3 
c23 goal ! buck1 shoot2 ), ( buck2 shooil ) 
c3.~i*h a rifle 
c31 ins* (rifle shooil) 3 
c32 inst trifle shooi2) 2 
c33 ~og~ trifle (buckl,buck2)) 0 
c3~ accm trifle (shoo~l,shoo~2)) 0 
ti.e., fired-at ) 
ti.e., ~rasted ) 
li.e., male deer) 
ti.e., dollar) 
0 
( i .e., together-~ith) 
( i.e., ~ccompanied-by) 
Figure 1: Choice points and alternatives 
244 
We also associate with each level=choice point the 
value lie, which is the sum of the maximum weights 
that can be added by contributions below that choice 
point. This is needed by the algorithm to estimate 
potential path scores below a given node. For this 
example, the lie values are: 
HO: ~op=9 H3:rifle=6 
HZ: buck=3 HI: he=0 
Then the best-first search is carried out. At each point 
in the search, the next node to be added is that which 
(a) is consistent in word-sense selection with previous 
choices, and (b) has the highest potential. The potential 
is calculated as the accumulated weight down to (and 
including) that node plus tic for that level. A diagram 
of the procedure, as applied to the example, is shown 
in Figure 2. The first node to be added is "with rifle 
shootl', which has the highest potential. At that point, 
the highest weighted consistent alternative is c21, etc. 
While the set of choice points implies that there are 
(4 x 3 x 2) = 24 paths to be searched, only one is 
pursued to any distance. Thus while the approach 
takes a combinatorial view of the problem, it does so 
without loss of efficiency. 
When a full path is found, it is examined for semantic 
consistency (beyond word-sense consistency). The 
checks made include: (a) ensuring that the interpreta- 
tion includes all required semantic functions for a 
word-sense (specified in the lexicon), and (b) ensuring 
that non-repeatable functions (e.g., the goal of an ac- 
tion) are not duplicated. 
Even if the full path is found to be consistent, the 
search does not terminate immediately, but continues 
until it is certain that no other path can have aft equal 
score. This will be true whenever the maximum po- 
tential for open nodes is less than the score for an 
already-found path. A more precise description of the 
algorithm is given in the Appendix. 
When more than one path is found with the same 
high score, additional tests are applied. These tests 
include comparisons of surface proximity and, as this 
work is situated within a multi-target translation sys- 
tem, user queries in the source language, as outlined 
by Tomita (1985). 
An extended version of the method is used in com- 
paring alternate parses which differ in constltucnt com- 
position, and thus are more easily analyzed as different 
parse trees, each with its own set of choice points. An 
example is the classic: 
(4) Time flies like an arrow 
(where the main verb can he arty one of tile first three 
words). In such cases, one set of choice points is 
constructed per parse tree. In general, the search 
alternates among trees, with the next node to be added 
being that with tile greatest potential across trees. If 
such trees always had Ihe same numhe,- of choice 
points, this would he the only revision needcd, llow- 
ever, the number of choice poinls may differ, for one 
thing because the parser may have detected and con- 
densed non-compositional compounds (e.g., "all the 
same ~) in one parse but not in another. For this 
reason the algorithm changes to evaluate paths not by 
total scores, but by average scores (i.e., the total scores 
divided by the number of choice points in the particular 
parse). 
"kop 
"rifle" c31 
"buck" cZl cZZ ( inconsistent ) 
"he" cll 
Figure 2: Red.ced Tree Search 
! The basic A* algorithm is usually described as *expanding" (i.e., adding all immediate successors ol) the most promising node. The variant 
described here, which is more appropriate to our situation (and also mentioned by Nilsson), adds a single node at each step. 
245 
3. Related Work 
There seems to be little work which directly addresses 
the combinatorial problem. First, there is considerable 
work in preference-related disambiguation that as- 
sumes, at least for purposes of discussion, that indi- 
vidual disambiguation problems can be addressed in 
isolation. For example, treatments of prepositional 
phrase attachment by by Dahigren and McDowell 
(1986) and Wilks el. al. (1985) propose methods of 
finding the "best ~ resolution of a single attachment 
problem by finding the first preference which is satisfied 
in some recommended order ofrule application. Other 
types of ambiguity, and other instances of the same 
type, are assumed to have been resolved. This type 
of work contributes interesting insights, but cannot be 
used directly. 
One type of more realistic treatment, which might be 
called the deferred decision approach, is exemplified 
by ilirst (19831. When, in the course of a parse, an 
immediate decision about a word sense or attachment 
cannot be made, a set of alternative possibilities is 
developed. The possibility sets are gradually narrowed 
by propagating the implications of both decisions and 
narrowings of other possibility sets. 
This approach has a number of problems. First, it is 
susceptible to error in semantic "garden path" situa- 
lions, as early decisions may not be justifiable in the 
context. For example, in processing 
(5) He shot a hundred bucks with one rifle. 
a particular expert might decide on "dollars" for bucks, 
because of the modifier "hundred', before the prepo- 
sitional phrase is processed. Also, it is difficult to see 
how versions of this method could be extended to deal 
with comparing alternate parses where the alternatives 
are not just ones of attaching constituents, but of de- 
ciding what the constituents are in the first place. 
A full-scale deferred-decision approach also has the 
potential of significant design complexity (the ilirst 
version is explicitly limited), as each type of decision 
procedure (for words and for different kinds of attach- 
ments) must be able to understand and process the 
implications of the results of other kinds of decision 
procedures. 
Underlying these problems is the lack of quantification 
of alternatives, which allows for comparison of com- 
binations. 
There are, however, early and more recent approaches 
which do apply numeric weights to sentence analysis. 
Early approaches using weights applied them primarily 
to judge alternative parses. Syntactically-oriented ap- 
proaches in this vein attached weights to phrase struc- 
ture grammar rules (Robinson 1975, 1980) or ATN 
arcs (Bates 1976). Some approaches of this period 
focussing on semantic expectations were those of Wilks 
(19751 and Maxwell and Tuggle (1975), which em- 
ployed scores expressing the number of expected de- 
pendents present in an interpretation. An ambitious 
approach combining different kinds of weights and 
cumulative scores, described by Walker and Paxton 
et. al. (19771, included heuristics to select the most 
promising sublrees for earlier development, to avoid 
running out of space before a "best" tree can be found. 
llowever, except for tiffs type of provision, none of 
the early approaches using weights seem to address 
the combinatorial problem. ~ 
A contemporary approach for thorough syntactic and 
semantic disambigualion using weights is described by 
L. Schubert (1986). During a left-to-right parse, in- 
dividual attachments are weighted based on a list of 
considerations including preference, relative sentential 
position, frames/scripL~, and salience in the current 
context. The multiple evolving parse trees are rated 
by summing their contained weights, and the combi- 
natorial problem is controlled by retaining only the 
two highest scoring parses of any complete pltrases. 
This approach is interesting, although some details are 
vague 3. l\[owevcr, the post-parse application of A* 
described in this paper obtains the benefits of such a 
within-parse approach without its deficiences in that: 
(a) combinatorial computations of weighCs and word- 
sense consistencies are avoided except when warranted 
by total sentence informalion, and (b) there is no pos- 
sibility of early erroneous discarding of allerrmtives. 
Heidorn (1982) provides a good summary of early work in weight-based analysis, as well as a weight-oriented approach to attachment decisions 
based on syntactic considerations only. 
No examples are given, so it is unclear whether a parse for a phrase or part thereof represents only one interpretation, or all interpretations 
having the same structure, scored by the most likely interpretation. The former is obviously inadequate (c.g., for highly ambiguous subject NPs 
like *The stands'), while the latter seems to require either the calculation of all alternative cumulative scores, or recalculation of scores if an 
interpretation fails. 
246 
he bucks 
sub~ he shoo( obj buck shoot 
with a rifle 
with rifle shoot 
with rifle buck 
Figure 3:Syntactic Choice Poin~ 
One other parser-based work should be noted, that of 
Wittenburg (1986), as it is explicitly based on A*. The 
intent and content of the method is quite different from 
that described here. It is situated within a chart-parser 
for a categorial grammar, and the intent is to limit 
parsing expense by selecting that rule for next appli- 
cation which has the least likelihood of leading to an 
incomplete tree. While selectional preference is men- 
tioned in passing as a possible heuristic, the heuristics 
discussed in any depth are grammar oriented, and 
operate on the immediate facts of the situation, rather 
than on informed estimates of total parse scores. 
It should be also be mentioned that the representation 
of alternatives in schemes which combine syntactic 
and semantic disambiguation is rarely discussed, al- 
though maintaining a consistent representation of the 
relationships among word-sense and attachment alter- 
natives is fundamental to a systematic treatment of the 
problem. An exception is the discussion by K. 
Schubert 0986), who describes a representation for 
alternatives with some affinities to that described here 4. 
The information limitations of disambiguation during 
parsing are not found in spreading-activation ap- 
proaches, exemplified by Charniak 0986), Cottrell 
and Small 0984), and Waltz and Pollack (1985). 
These approaches are still in the experimental stage, 
and are primarily intended for parallel hardware, while 
the A* algorithm used in this paper is designed for 
conventional serial hardware. But, in a sense, these 
approaches reinforce the main point of this paper: 
they argue for a single global technique for optimized 
combinatorial disambiguation based on all available 
information. 
4. Preparing Semantic Choices 
llaving described how the choice points are used, we 
address their development. Two steps are involved: 
(1) the development of syntactic choice points, and (2) 
the development of semantic choice points. The first 
step transforms the parse-level syntactic functions into 
a form appropriate to the second step, which is the 
application of the lexicon to those fimctions to obtain 
the semantic alternatives. 
In our example, the first step is a simple one. Syntactic 
relationships among conslituents are transformed into 
syntactic relationships among head words, and the 
syntactic relationships arc refined, so that "ppmod" is 
replaced by the actual preposilions used. The result 
of this step is shown in Figure 3. The development of 
syntactic choice poinls for some other types of con- 
stituents is more complex. Before discussing these 
situations, we discuss slep 2: application of the lexicon 
to the syntactic choice points Io obtain Ihe semantic 
choice points, i.e., those shown in Figure I. 
4.1 The Lexicon 
The lexicon conlains entries for word stems (distin- 
guished by part-of-speech), linked to word-sense en- 
tries, which are lhe lowest level "conccptsL Concept 
entries are linked Io superordinale concept entries, 
forming a lattice. Concepl entries include a set of 
concept features (e.g. a list of snperordinate concepts), 
and one or more rules for each syntactic function 
associated with the concept. The more relevant parts 
of the lexicon entries fbr the concepts used in the 
ongoing example are shown in Figure 4. The "classes" 
are lisls of superordinale concepts. The synlactic func- 
tion rules have the form: 
synfun dependent head semfun weight 
Thus tile first rule under "shootl" indicates that word- 
senses falling into tile class "animate" are its preferred 
objects, with tile weight 3, and the combination is 
given tile semantic fimclion "goal'. The concept entry 
4 However, the weighting scheme is different, and rather interesting. The reference does not discuss the selection of a particular combination of 
alternatives in any detail, but it appears to be based on the presence in a combination of one (or more?) highly weighted alternative (or alternatives.'?). 
247 
shoo~l classes I humanac~, ~ransv) 
(ob~ec~ animate shoo{1 goal 3) 
(wi{h firearm shoo~1 ins{ 3) 
humanac~ classes (ac~...) 
(sub~ human humanac{ ag{ 3) 
twiih human hunmnact accm 31 
buck1 classes (animate...) 
but:k2 classes (money..) 
shoo~2 classes Ihumanac~, {ransv) 
l objec~ money shoo{2 goal 3) 
{ransv 
lobjec~ ~hing ~ransv goal O) 
act 
twi{h tool act ins~ 21 
rifle classes (firearm...) 
firearm classes I~ool...) 
Figure 4: Izxieon Entries 
"humanact" contains other rules applicable to shootl 
and other verb-senses taking human actors. 
Weights are actually assigned symbolically, to allow 
experimentation. Current settings are as follows: 
• Rules for idioms (e.g., kick the bucket), 4. 
• RUles for more general selectional preferences, 3. 
• Rules for acceptable but not preferred alternatives 
(e.g., locative prepositional phrases attached to ar- 
bitrary actions), 2. 
• Very general attachments (e.g., "nounmod noun l 
noun2), 0. These allow for uninterpreted metaphoric 
usage. 5 
One major objective in assigning weights to ensure 
that combinations of very high (idiom) weights together 
with very low weights do not ouLscore more balanced 
combinations. Thus, for: 
(6) He kicked the ugly bucket 
weights such as: 
subj he kickedl $ 
oh\] bucke{1 kickedl 
ad\]m ugly buc.ke~ 1 0 
subj he kicked2 3 
ob~ bue.ke{2 kicked2 3 
adam ugly bucke{2 2 
provide the necessary balance. (llere kickedl is the 
idiomatic interpretation, and bucketl is a word-sense 
of bucket used only in that interpretation.) 
By convention, rules for syntactic functions are as- 
signed, by class, to entries for specific kinds of concepts. 
Thus rules for verb-attached arguments or preposi- 
tional phrases are stored with verbs or verb classes. 
Adjective-noun rules are generally associated with ad- 
jectives, and noun-north rules with the right-hand 
nouns (rite next section discusses the treatment of 
noun-phrase choice points irt somewhat greater detail). 
Lexicon entries also cot|lain additional information. 
First, a list of required syntactic functions is generally 
associated with word-senses. Also, syntactic function 
rules may contain additional conditions limiting their 
applicability. For example, a combination "nounl IN 
noun2* might be limited to apply only if the second 
concept denotes an object larger than the first. 
4.2 Lexicon Application 
Given these lexicon entries, the set of semantic alter- 
natives corresponding to each syntactic alternative 
"synfun wordl word2" may be derived. The goal of 
the derivation process is to account for all possible 
combinations of word-senses for wordl and word2 
related by tile syntactic fimction "synfim". To do this, 
all concept entries containing potentially applicable 
rules are searched. For each satisfied rule fotmd, an 
alternative is created of the form: 
senran{ic-rela~ion sensepairs weigh{ 
where "sensepairs" is a list of pairs. F.ach pair is in 
the form ((di,dj,...) (hnl,hn,...)), where the di's are 
senses of the dependenl participant of the function, 
Obtaining the necessary lexicon information is of course a major problem. But there is significant contemporary work in the automatic or 
semi-automatic derivation of that information. For example, the aproached described by Jcnsen and Binot (19~,6) obtains attachment preference 
information by parsing dictionary definitions. 
248 
and tile hi's are senses of the headword. The semantic 
relationship is stated to apply to all combinations of 
word-senses in the cross-products of those lists. For 
the example sentence, this process would obtain es- 
sentially the alternatives shown in Figure 1, except 
that alternative c23 would first be expressed as: 
obj I t buekl ,l~ckZ ) ( shoo~l, shoot2 ) ) 0 
The last step in the process reduces this result. If 
some of the word-sense combinations are also found 
in an alternative of higher weight, the "dominated" 
combinations are deleted. And if all word sense com- 
binations are so dominated, the alternative is deleted. 
In this way alternative c23 is reduced to its final form. 
After the semantic choice point list is completed, the 
search algorithm is applied as described above. 
5. Preparing Syntactic Choices 
In the example above, the preparation of syntactic 
choice points from parser output was very simple. 
Assuming an input choice point for a constituent to 
be a list of (one or more) parser-provided alternative 
relationships with an immediately containing constitu- 
ent, the process consisted of obtaining the headwords 
of each constituent, and of substituting literal prepo- 
sitions for the general syntactic function "ppmod'. 
i iowever, in other cases this step is a more significant 
one. In the lexicon, selectional preferences are ex- 
pressed in terms of the syntactic functions of some 
basic constituent types. For example, verb preferences 
are expressed in terms of the syntactic functions of 
active-voice, declarative main clauses, with dependents 
in unmarked positions. Adjective preferences are ex- 
pressed in terms of classes of nouns occurring in the 
relationship "adjective noun'. But there are many 
other syntactic relationships whose dlsambiguation de- 
pends on this information. The major function of the 
syntactic choice identification step is to re-express, or 
"normalize" input syntacfic relationsips in terms of the 
relationships assumed by the lexicon. For example, 
passive constructions such as: 
(7) The bucks were shot with a rifle 
are normalized by replacing the choice "subj buck 
shoot" with "object buck shoot'. (A lexicon condition 
barring or lowering preference for the "gambling" in- 
terpretation in the passive voice is also needed here.) 
In ditransitive cases both indirect and direct object 
functions are used as allernatives. 
Thus the sequence of deriving semantic choice points 
consists of two siguilicant steps, which may be depicted 
in terms of results as shown in Figure 5. 
The transformation of input syntactic choice points to 
normalized synlactic choice poinls is governed by de- 
clarative specifications indicaling, for each syntactic 
function, how its choice points are to be transformed. 
The changes are specified as replacements for one or 
more positions of the choice triple. For example, some 
of the "subj" rules are: 
t subj 
( ~es~ tvoicepassive! 
sy~fun 'obj ) ) 
!t:es~ (not (voicef)assive)) 
synfun 'subjl ... 
slating that for tile input fimction "subj', if the specified 
test (voicepassive) succeeds, then "obj" is used for the 
synfun part of tile normalized choice. The additional 
rule is used to ensure that tile ~subject" fimclion is 
retained only for ttle aclive voice. 
Additional applications of these transformations in- 
clude those for modifiers of nomlnalized verbs, attrib- 
utive clauses, and relative clauses. 
Input Syn Chp~l Normalized Syn Chp~l 
................................... 
Choice Cll Choice Clll 
Seman~ ic Chp~l 
Choice C1111 
Choice Cl112 
Choice CllZ Choice C1121 
Choice Cl12Z 
Figure 5: Steps in Semantic Choice Point Derivation 
249 
Noun phrases whose heads are nominalized verbs are 
addressed by adding choice points corresponding to 
verb arguments. Thus for 
(8) 1"he bucks' shooting ..... 
the alternative "nounmod bucks shooting" is expanded 
to include-the alternatives "subj bucks shooting" and 
"obj bucks shooting'. Then, during lexical processing, 
rules for word-senses of the noun "shooting" having 
an associated verb are understood as expanded to 
include the expected verb arguments. 
Attributive clauses such as: 
(9) The bucks were handsome. 
are transformed to allow the application of adjective 
information, tlere'obj handsome were" is transformed 
to *adjmod handsome bucks'. 
For relative clauses, the core of the transformation 
expresses alternative attachments of the relative clause 
as alternative connections between the head of the 
relative clause, and the possible fillers of the gap po- 
sition. (Relative clauses with multiple gaps are gen- 
erally handled in separate parses.) Thus for: 
(10) The rifle above the mantle that the bucks were 
shot with... 
transformations produce the alternatives "with shoot 
rifle" and "with shoot mantle'. 
The handling of relative clauses, however, is more 
complicated than this, as it is desirable to also use 
information from the relative pronoun (if present) for 
the disambiguation. Two initial choice points are in- 
volved, one attaching the relative clause to its higher 
level head, and one attaching the gap to its head. The 
first is expanded to to obtain relationships *relp that 
rifle", "relp that mantle*, and the other to obtain the 
"with .... " relationships. And an additional consistency 
check is made during the tree search, beyond word- 
sense consistency, to keep the choices consistent. 
It should be noted that the transformation rules for 
syntactic choice points also include "fixup specifica- 
tions" (not shown above) indicating how result semantic 
functions and attachments are to be modified if the 
transformed alternatives are used in the final interpre- 
tation. For example, to "fixup" the results of trans- 
forming attributive clauses, the noun-modifier semantic 
role is replaced with one suitable to a direct role in 
the clause. 
6. Concluding Remarks 
This paper has summarized a three-step method for 
optimized combinatorial preference based 
disambiguadon: 
1. obtaining syntactic choice points, with alternatives 
stated as syntactic functions relating words. 
2. transformation into semantic choice points with 
alternatives stated as weighted semantic functions 
relating word-senses, via lexicon search. 
3. application of A* to search alternative combina- 
tions. 
This method, currently being implemented in the con- 
text of a multi-target machine translation system, is 
more powerful and systematic than approaches using 
isolated or interacting decision procedures, and thus 
is easier to modify and extend with new heuristics as 
desired. 
The method is applicable to any post-parse 
disambigualion situation, and can be taken as an ar- 
gument for that approach. To demonstrate this, we 
first note that aspects of the method are useful for 
within-parse disambigualion. In any realistic scheme, 
decisions must of\[en be deferred, making two aspects 
of the method relevant: (a) the unified way of repre- 
senting word sense and attachment alternatives and 
their interdependency, and (b) the explicit, additive 
weights. F.xplicit additive weights substitute for elab- 
orate, case-specific rules, and also make possible a 
systematic treatment of alternative parses which differ 
in more than word-senses and attachments. 
! lowever, using weighted attachments for within-parse 
disambiguation requires calculating the summed 
weights of, and examining the consistency of, all com- 
binations encountered whose elements cannot be dis- 
carded (assuming some good criteria for discarding 
can be found). Deferring disambiguation until after 
the parse allows for optimized searching of alternatives, 
as described above, to significantly limit tile number 
of combinations examined. 
Future work in this direction will include refining tile 
weighting criteria, extending the method It deal with 
anaphoric references (using consideralions developed 
by, for example, Jensen and Zadrozny (1987), and 
integrating a treatment of non-frozen metaphor. 
250 
7.Acknowledgements 
! thank John Sowa, Maria Fuenmayor, and Shelley 
Smith for their careful reviews and many helpful sug- 
gestions. Also, I thank Peter Woon for his patient 
managerial support of this project. 
9. Appendix: Search Algorithm 
Figure 6 describes Ihe step by step application of A* 
to searching semantic choices. 
251 
Assume an "open list" containing, for each node n with an unexamined 
child, the following information: 
1. The list of choices made on the path up to and including n. 
2. The set of constraints on word senses imposed by nodes on the path. 
3. The index of the highest weighted unexamined child 
(choice at next level) of n, wl~re choices within levels are sorted by 
descending weight. 
~. The potential Fc = An + Hc + Hc for that child, ~ere An is the 
accumulated weight on the path up to and including n, Nc is the 
Neight of the child, and Hc is the upper bound on the cumulative 
potential fop paths below the child. 
Then the following algorithm is used to search the tree. 
1. Put an entry for the dummy "top" node in the open list, 
with path=self, index of first child, and its potential Fc. 
2. Find the node n in the "open list" Nhich has the highest 
potential Fc for a child node. 
3. If a full path has already been found, and Fc is lower than the 
total ~leight for that path, the search is over. 
4. Otherwise check the consistency of the designated child in 
entry n. If it is consistent, add an open list entry for the child, 
including a ne~, more constrained consistency requirement. 
~;. Nhether or not the designated child is consistent, determine 
if there are any unexamined children of node n. If so, modify 
entry n accordingly. Otherwise remove entry n from the open list. 
6. If there is a new entry, and it represents a completed path, 
remove it from the open list and perform additional consistency 
checks. If the checks fail, ignore the new path. If they succeed, 
record the path and its score as a competing alternative. 
7. Return to Step Z. 
Figure 6: Search Algorithm 
252 

References 

1. Bates, Madeleine 1976. "Syntax in Automatic 
Speech Understanding', American Journal Computational Linguistics Microfiche 45. 

2. Charniak, Eugene 1986. "A Neat Theory of 
Marker Passing" Proc. AAAI-86 584-588 

3. Cottrell, Garrison W. and Steven L. Small 1984. 
"Viewing Parsing as Word Sense Discrimination: 
A Connectlonist Approach', in B.G. Bara and G. 
Guida (eds), Computation Models of Natural Language Processing, Elsevier Science Publishers B.V. 

4. Dahlgren, Kathleen and Joyce McDowell 1986. 
"Using Commonsense Knowledge to 
Disambiguate Prepositional Phrase Modifiers", 
Proc. AAAI-86, 589-593 

5. Heidorn, George 1982. "Experience with an Easily 
Computed Metric for Ranking Alternative Parses" 
Proc. 20th Annual Meeting of the ACL, June 1982 

6. Hirst, Graeme 1983. "Semantic Interpretation 
Against Ambiguity', Technical Report CS-83-25, 
Brown University, December 1983 

7. Jensen, Karen 1986. "Parsing Strategies in a 
Broad-coverage Grammar of English', IBM Re- 
search Report RC 12147, 1986 

8. Jensen, Karen and Jean-Louis Binot 1987. "A 
Semantic Expert Using an Online Standard Dic- 
tionary', Proc. HCAI-87, 709-714 

9. Jensen, Karen and WIodzimierz Zadrozny 1987. 
"The Semantics of Paragraphs', presented at Logic 
and Linguistics, Stanford, July 1987. 

10. Maxwell, B.D and F. D. Tuggle 1975. "Toward 
a Natural Language Question Answering Facility', 
American Journal Computational Linguistics, Microfiche 61. 

11. Nilsson, Nils J. 1980. Principles of Artificial In- 
telligence, Tioga Publishing Co. 

12. Robinson, Jane J. 1982. "DIAGRAM: A Gram- 
mar for Dialogues', Comm. ACM Vol 25 No 1, 
27-47 

13. Schubert, Klaus 1986. "Linguistic and Extra- 
Linguistic Knowledge" Computers and Translation 
Vol 1, No 3, July-September 1986 

14. Schubert, Lenhart K. 1986. "Are There Preference 
Trade-oirs in Attachment Decisions', Proc. 
AAAI-86, 601-605 

15. Tomita, Masaru 1984. "Disambiguating Grammatically Ambiguous Sentences by Asking", Proc. 
COLING 84 

16. Tomita, Masaru 1985. "An Emcient Context-Free 
Parsing Algorithm for Natural Languages', Proc. 
IJCAI-85,756-763 

17. Walker, Donald E. and William 11. Paxton with 
Gary G. Ilendrix, Ann E. Robinson, Jane J. Rob- 
inson, Jonathan Slocum 1977. "Procedures for 
Integrating Knowledge in a Speech Understanding 
System", SRI Technical Note 143. 

18. Waltz, David L. and J. B. Pollack 1985. "Mas- 
sively Parallel Parsing: A Strongly Interactive 
Model of Natural I.anguage Interpretation', Cog- 
nitive Science Vol 9,No I, January-March 1985, 
51-74 

19. Wilks, Yorick, Xiuming lluang, and Dan Fass 
1985. "Syntax, Preference and Right Attachment', 
Proc. IJCAI-85 779-794 

20. Wilks, Yorick 1975. "An Intelligent Analyzer and 
Understander of English', Comm. ACM, Vol 18 
No 5, 264-274 

21. Wittenburg, Kent 1986. "A Parser for Portable 
Ni. Interfaces Using Graph-Unification-Based 
Grammars", Proc. AAAI-86, 1053-1058 
