Probabilistic Unification-Based Integration of
Syntactic and Semantic Preferences for Nominal Compounds
Dekai Wu* 
Computer Science Division 
University of California at Berkeley 
Berkeley, CA 94720 U.S.A. 
dekai@ucbvax.berkeley.edu
Abstract 
In this paper, we describe a probabilistic framework for
unification-based grammars that facilitates integrating syntactic and
semantic constraints and preferences. We share many of the concerns
found in recent work on massively-parallel language interpretation
models, although the proposal reflects our belief in the value of a
higher-level account that is not stated in terms of distributed
computation. We also feel that inadequate learning theories severely
limit existing massively-parallel language interpretation models. A
learning theory is not only interesting in its own right, but must
underlie any quantitative account of language interpretation, because
the complexity of interaction between constraints and preferences
makes ad hoc trial-and-error strategies for picking numbers
infeasible, particularly for semantics in realistically-sized domains.
Introduction 
Massively-parallel models of language interpretation--including
marker-passing models and neural networks of both the connectionist
and PDP (parallel distributed processing) variety--have provoked some
fundamental questions about the limits of symbolic, logic- or
rule-based frameworks. Traditional frameworks have difficulty
integrating preferences in the presence of complex dependency
relationships. In analyzing ambiguous phrases, for example, semantic
information should sometimes override syntactic preferences, and vice
versa. Such interactions can take place at different levels within a
phrase's constituent structure, even for a single analysis.
Massively-parallel models excel at integrating different sources of
preferences in a natural, intuitive fashion; for example,
connectionist models simply translate dependency constraints into
excitatory or inhibitory links in relaxation networks (Waltz &
Pollack 1985). Furthermore, massively-parallel models have shown
remarkable ability to compute complex semantic preferences.
*Many thanks to Robert Wilensky and Charles Fillmore for helpful
discussions, and to Hans Karlgren and Nigel Ward for constructive
suggestions on drafts. This research was sponsored in part by the
Defense Advanced Research Projects Agency (DoD), monitored by the
Space and Naval Warfare Systems Command under N00039-88-C-0292, the
Office of Naval Research under contract N00014-89-J-3205, and the
Sloan Foundation under grant 86-10-3.
We argue that it is possible and desirable to give 
a more meaningful account of preference integration 
at a higher level, without resort to distributed algo- 
rithms. One could say that we are interested in char- 
acterizing the nature of the preferences, rather than 
how they might be efficiently computed. We do not 
claim that all properties of massively-parallel models 
can or should be described at this level. However, 
few language interpretation models take advantage 
of those properties that can only be characterized at 
the distributed level. 
We also propose a quantitative theory that as- 
signs an interpretation to the numbers used in our 
model. A quantitative theory explains the numbers' 
significance by defining the procedure by which the 
model--in principle, at least--can learn the num- 
bers. Much of the mystique of neural networks is due 
to their potential learning properties, but surpris- 
ingly, few PDP and no connectionist models of lan- 
guage interpretation that we know of specify quanti- 
tative theories, even though numbers must be used to 
run the models. Without a quantitative theoretical 
basis, it seems unlikely that the network structures 
will generalize much beyond the particular hand-coded examples, if
for no other reason than the immense room for variation in
constructing such networks.
Case Study: Nominal Compounds 
Nominal compounds exemplify the sort of phenomena modeled by
interacting preferences. Nouns themselves are often homonymous--is
dream state a sleep condition or California?--necessitating lexical
ambiguity resolution. Structural ambiguity resolution is required for
nested nominal compounds, which have more than one parse; consider
[baby pool] table versus baby [pool table]. Lexicalized nominal
compounds necessitate syntactic preferences, while semantic
preferences are needed to guide semantic composition tasks like frame
selection and case/role binding, as nominal compounds nearly always
have multiple possible meanings. Traditionally, linguists have only
classified nominal compounds according to broad criteria such as
part-whole or source-result relationships (Jespersen 1946; Quirk et
al. 1985); several large-scale studies have provided somewhat
finer-grained classifications on the order of a dozen classes (Lees
1963; Levi 1978; Warren 1978). However, the emphasis has been on
predicting the possible meanings of a compound, rather than predicting
its preferred meaning. An exception is Leonard's (1984) rule-based
model which, however, only produces fairly coarse interpretations with
medium (76%) accuracy.
PREFERRED PARSE:   [N N] N                       N [N N]
COMPETING LEXICALIZED COMPOUNDS
                   First compound more           Second compound more
                   lexicalized                   lexicalized
                   kiwi fruit juice              navel orange juice
                     kiwi fruit: LEXICALIZED       navel orange: LEXICALIZED
                     fruit juice: LEXICALIZED      orange juice: LEXICALIZED
                   New York state park           baby pool table
                     New York state: LEXICALIZED   baby pool: LEXICALIZED
                     state park: LEXICALIZED       pool table: LEXICALIZED
COMPETING LEXICALIZED AND IDENTIFICATIVE COMPOUNDS
                   afternoon rest area           gold watch chain
                     afternoon rest: IDENTIFICATIVE  gold watch: LEXICALIZED
                     rest area: LEXICALIZED          watch chain: IDENTIFICATIVE
Figure 1. Nominal compounds requiring integration of semantic preferences.
[Figure 2 diagrams not reproduced. Panel (a), Waltz & Pollack 1985 and
Bookman 1987: word senses linked to microfeatures, conceptually
equivalent to mapping an input word sense to its similarity to other
senses. Panel (b), Wermter 1989a and Wermter & Lehnert 1989: NOUN1 and
NOUN2 inputs, a hidden (middle) layer, and a GOODNESS output.]
Figure 2. PDP semantic similarity evaluators.
We distinguish three major classes of nominal com- 
pounds: lexicalized (conventional), such as clock ra- 
dio; identificative, such as clock gears; and creative, 
such as clock table. Both identificative and creative 
compounds are novel in Downing's (1977) sense; they 
differ in that an identificative compound serves to 
identify a known (but hitherto unnamed) seman- 
tic category, whereas to interpret a creative com- 
pound requires constructing a new semantic struc- 
ture. There is a bias to use the most specific pre- 
existing categories that match the compound be- 
ing analyzed, syntactic or semantic. Precedence is 
given to a conventional parse if one exists, then 
a parse with an identificative interpretation, and 
lastly a parse with a creative interpretation. How- 
ever, this "Maximal Conventionality Principle" can 
easily be overruled by global considerations aris- 
ing from the embedding phrase and context. Fig- 
ure 1 shows examples where two conventional com- 
pounds compete, and where global considerations 
cause an identificative compound to be preferred 
over a competing conventional compound. These 
cases require integration of quantitative syntactic 
and semantic preferences, since non-quantitative in- 
tegration schemes (e.g., Marcus 1980; Hirst 1987; 
Lytinen 1986) do not discriminate adequately be- 
tween the alternative analyses. 
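As a rough illustration of the default ordering described above, the following Python sketch ranks candidate parses by compound class alone. The `Parse` record, the class names, and the numeric ranks are hypothetical conveniences, not part of the model; the sketch deliberately ignores the global considerations that can overrule this ordering.

```python
from dataclasses import dataclass

# Default precedence under the Maximal Conventionality Principle:
# lexicalized (conventional) first, then identificative, then creative.
# The numeric ranks are illustrative assumptions, not values from the paper.
PRECEDENCE = {"lexicalized": 0, "identificative": 1, "creative": 2}


@dataclass(frozen=True)
class Parse:
    bracketing: str  # e.g. "afternoon [rest area]"
    kind: str        # one of the keys of PRECEDENCE


def default_preference(parses):
    """Order candidate parses by conventionality alone (stable sort)."""
    return sorted(parses, key=lambda p: PRECEDENCE[p.kind])


candidates = [
    Parse("[afternoon rest] area", "identificative"),
    Parse("afternoon [rest area]", "lexicalized"),
]
ranked = default_preference(candidates)
print(ranked[0].bracketing)
```

Taken alone, the principle ranks afternoon [rest area] first. Figure 1 lists this compound among the cases where the identificative reading should win instead, which is why a purely categorical ordering is not enough and quantitative integration is needed.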
What Do Massively-Parallel Models 
Really Say? 
One use of massive parallelism is to evaluate the 
similarity or compatibility between two concepts in 
order to generate semantic preferences. Similarity 
evaluators usually employ PDP networks where semantic concepts are
internally represented as distributed activation patterns over a set
of "microfeatures". Conceptually, the network in Figure 2a gives a
similarity metric between a given concept and every other concept,
computed as the weighted sum of shared microfeatures.¹ Likewise, the
hidden layer in Figure 2b computes the goodness of every possible
relation between the given pair of nouns. In non-massively-parallel
terms, what such nets do is capture statistical dependencies between
concepts, down to the granularity of the chosen "microfeatures". A
probabilistic feature-structure formalism employing the same
granularity of features should be able to capture the same
dependencies.
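The "weighted sum of shared microfeatures" can be sketched in a few lines of Python. This is a simplification under stated assumptions: concepts are treated as sets of binary microfeatures, whereas the PDP models use graded activation patterns, and the feature names and weights below are invented for illustration.

```python
def similarity(concept_a, concept_b, weights):
    """Weighted sum over the microfeatures active in both concepts."""
    shared = concept_a & concept_b
    return sum(weights[m] for m in shared)


# Hypothetical microfeature weights and concept encodings.
weights = {"animate": 0.5, "edible": 0.25, "liquid": 0.25, "round": 0.125}
orange = {"edible", "round"}
juice = {"edible", "liquid"}
watch = {"round"}

print(similarity(orange, juice, weights))
print(similarity(orange, watch, weights))
```

Nothing in this computation depends on distributed processing, which is the sense in which the same dependencies should be expressible over features of the same granularity in a probabilistic feature-structure formalism.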
Connectionist models are often used to integrate syntactic and
semantic preferences from different information sources (Cottrell
1984, 1985; Wermter 1989b; Wermter & Lehnert 1989). Nodes represent
hypotheses about word senses, parse structures, or role bindings;
links represent either supportive or inhibitory dependencies between
hypotheses. The links constrain the network so that activation
propagation causes the net to relax into a state where the hypotheses
are all consistent with one another. The most severe problem with
these models is the arbitrariness of the numbers used; Cottrell, for
example, admits "weight twiddling" and notes that lack of formal
analysis hampers determination of parameters. In other words, although
the networks settle into consistent states, there is no principle
determining the probability of each state.
¹Ignoring Bookman's persistent activation, which simulates
recency-based contextual priming.
McClelland & Kawamoto's (1986) PDP model 
learns how syntactic (word-order) cues affect semantic
frame/case selection, yielding more principled
preference integration. Like the PDP similarity eval- 
uators, however, the information encoded in the net- 
work and its weights is not easily comprehended. 
Previous non-massively-parallel proposals for 
quantitative preference integration have used non- 
probabilistic evidence combination schemes. Schu- 
bert (1986) suggests summing "potentials" up the 
phrase-structure trees; these potentials derive from 
salience, logical-form typicality, and semantic typ- 
icality conditions. McDonald's (1982) noun com- 
pound interpretation model also sums different 
sources of syntactic, semantic, and contextual evidence. Though
qualitatively appealing, additive calculi are liable to count the same
evidence more than once, and use arbitrary evidence weighting schemes,
making it impossible to construct a model that works for all cases.
Hobbs et al. (1988) propose a theorem-proving model that integrates
syntactic constraints with variable-cost abductive semantic and
pragmatic assumptions. The danger of these non-probabilistic
approaches, as with connectionist preference integrators, is that the
use of poorly defined "magic numbers" makes large-scale generalization
difficult.
A Probabilistic Unification-Based
Preference Formulation
We are primarily concerned here with the following problem: given a
nominal compound, determine the ranking of its possible
interpretations from most to least likely. The problem can be
formulated in terms of unification. Unification-based formalisms
provide an elegant means of describing the information structures used
to construct interpretations. Lexical and structural ambiguity
resolution, as well as semantic composition, are readily characterized
as choices between alternative sequences of unification operations.
A key feature of unification--especially important for preference
integration--is its neutrality with respect to control, i.e., there is
no inherent bias in the order of unifications, and thus, no bias as to
which choices take precedence over others. Although nominal compound
interpretation involves lexical and structural ambiguity resolution
and semantic composition, it is not a good idea to centralize control
around any single isolated task, because there is too much
interaction. For example, the frame selection problem affects lexical
ambiguity resolution (consider the special case where the frame
selected is that signified by the lexical item). Likewise, frame
selection and case/role binding are two aspects of the same semantic
composition problem, and structural ambiguity resolution depends
largely on preferences in semantic composition.
Thus we turn to unification for a clean formulation of the problem.
Three classes of feature-structures are used: syntactic, semantic, and
constructions. The construction is defined in Fillmore's (1988)
Construction Grammar as "a pairing of a syntactic pattern with a
meaning structure"; constructions are similar to signs in HPSG
(Pollard & Sag 1987) and pattern-concept pairs (Wilensky & Arens 1980;
Wilensky et al. 1988). Figure 3 shows a sample construction containing
both syntactic and semantic feature-structures.² Typed
feature-structures are
used: the value of the special feature TYPE is a 
type in a multiple-inheritance type hierarchy, and 
two TYPE values unify only if they are not disjoint. 
This allows (1) easy transformation from semantic 
feature-structures to more convenient frame-based 
semantic network representations, and (2) efficient 
encoding of partially redundant lexical/syntactic cat- 
egories using inheritance (see, for example, Pollard 
& Sag 1987; Jurafsky 1990). Our notation is cho- 
sen for generality; the exact encoding of signification 
relationships is inessential to our purpose here. 
TYPE:  NN-constr1
SYN:   [ TYPE: NN
         CONST1: 1
         CONST2: 2 ]
SEM:   1 [ TYPE: thing ]
FRAME: [ TYPE: composite-thing
         ROLE1: 3
         ROLE2: 4 ]
SUB1:  [ TYPE: N-constr
         SYN: 1
         SEM: 3 ]
SUB2:  [ TYPE: N-constr
         SYN: 2
         SEM: 4 ]
Figure 3. A nominal compound construction.
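The rule that two TYPE values unify only if they are not disjoint can be sketched as follows. This is a minimal Python illustration under several assumptions: the `PARENTS` hierarchy is invented, feature-structures are plain nested dictionaries without the shared (numbered) substructures of Figure 3, and a pair of types with more than one most-general common subtype is simply treated as a failure.

```python
# Hypothetical multiple-inheritance type hierarchy: type -> set of parents.
PARENTS = {
    "thing": set(),
    "event": {"thing"},
    "time": {"thing"},
    "rest": {"event"},
    "sleep": {"event"},
    "nap": {"rest", "sleep"},
    "afternoon": {"time"},
}


def ancestors(t):
    """All supertypes of t, including t itself."""
    seen, stack = set(), [t]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(PARENTS[cur])
    return seen


def descendants(t):
    """All subtypes of t, including t itself."""
    return {u for u in PARENTS if t in ancestors(u)}


def unify_types(a, b):
    """Most general common subtype of a and b, or None if they are disjoint."""
    common = descendants(a) & descendants(b)
    maximal = [t for t in common if not (ancestors(t) - {t}) & common]
    # Simplification: an ambiguous meet (several maximal types) also fails.
    return maximal[0] if len(maximal) == 1 else None


def unify(fs1, fs2):
    """Unify two nested-dict feature-structures; None signals failure."""
    result = dict(fs1)
    for feat, val2 in fs2.items():
        if feat not in result:
            result[feat] = val2
            continue
        val1 = result[feat]
        if isinstance(val1, dict) and isinstance(val2, dict):
            sub = unify(val1, val2)
        elif feat == "TYPE":
            sub = unify_types(val1, val2)
        else:
            sub = val1 if val1 == val2 else None
        if sub is None:
            return None
        result[feat] = sub
    return result


slot = {"TYPE": "thing", "FRAME": {"TYPE": "rest"}}
filler = {"FRAME": {"TYPE": "sleep", "TIME": {"TYPE": "afternoon"}}}
print(unify(slot, filler))
```

Because the function only merges information and never commits to an order of application, it reflects the control neutrality that makes unification attractive for preference integration.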
Given a nominal compound (of arbitrary length), an interpretation is
defined as an instantiated construction--including all the syntactic,
semantic, and sub-construction f-structures--such that the syntactic
structure parses the nominal compound, and the semantic structure is
consistent with all the (sub-)constructions. Figure 4 shows an
interpretation of afternoon rest. Given this framework, lexical
ambiguity resolution is the selection of a particular sub-construction
for a lexical item that matches more than one construction, structural
ambiguity resolution is the selection between alternative syntactic
f-structures, and semantic composition is the selection between
alternative semantic f-structures. In each case we must be able to
compare alternative interpretations and determine the best.
²Ordering constraints are omitted in this paper.
TYPE:  NN-constr1
SYN:   [ TYPE: NN
         CONST1: 1
         CONST2: 2 ]
SEM:   1
FRAME: [ TYPE: nap-frame
         TIME: 3
         STATE: 4 ]
SUB1:  [ SYN: 1 [ TYPE: "afternoon" ]
         SEM: 3 [ TYPE: ] ]
SUB2:  [ SYN: 2 [ TYPE: "rest" ]
         SEM: 4 [ TYPE: rest ] ]
[Graph representation not reproduced; legible node labels include
s3 = NN(i3), c7 = aft-constr(i7), and c9 = NN-constr1(i9), with edge
labels CONST1, CONST2, and STATE.]
Figure 4. Bracket and graph representations of an interpretation of
"afternoon rest".
Before discussing how to compare interpretations, let us briefly
consider the sort of information available. We extend the unification
paradigm with a function f that returns the relative frequency of any
category in the type hierarchy, normalized so that for any category
cat, f(cat) = P[cat(x)] where x is a random variable ranging over all
categories. For semantic categories, this provides the means of
encoding typicality information. For syntactic categories and
constructions, this provides a means of encoding information about
degrees of lexicalization. Since f is defined in terms of relative
frequency, there is a learning procedure for f: given a training set
of correct interpretations, we need only count the instances of each
category (and then normalize).
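The counting procedure for f can be sketched as below. The hierarchy and the training data are hypothetical, and the sketch makes one assumption the text does not spell out: an instance of a category also counts as an instance of every supertype, so that f never increases when moving down the hierarchy.

```python
from collections import Counter

# Hypothetical type hierarchy: category -> list of parent categories.
PARENTS = {
    "thing": [],
    "event": ["thing"],
    "time": ["thing"],
    "rest": ["event"],
    "nap": ["rest"],
    "afternoon": ["time"],
}


def ancestors(cat):
    """All supertypes of cat, including cat itself."""
    seen, stack = set(), [cat]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(PARENTS[cur])
    return seen


def learn_f(training_instances):
    """Relative frequency of each category over the training instances.

    Each training instance is given by its most specific category and is
    counted once for that category and once for each supertype (assumption).
    """
    if not training_instances:
        return {}
    counts = Counter()
    for cat in training_instances:
        for sup in ancestors(cat):
            counts[sup] += 1
    total = len(training_instances)
    return {cat: n / total for cat, n in counts.items()}


observed = ["nap", "rest", "rest", "afternoon"]
f = learn_f(observed)
print(f["rest"], f["nap"])
```

Categories that never occur in the training set are simply absent from the result, which corresponds to the zero-frequency situation that arises for novel categories.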
The probabilistic goodness metric for an interpretation is defined as
follows: the goodness of an interpretation is the probability of the
entire construction given the input words of the nominal compound,
e.g.,
$$P[+c_9 \mid +s_1, +s_2] = P[\text{NN-constr1}(i_9) \mid \text{"afternoon"}(i_1) \wedge \text{"rest"}(i_2)].$$
The metric is global, in that for any set of alternative
interpretations, the most likely interpretation is that with the
highest metric.
As a simplified example of computing the metric, suppose the feature
graph of Figure 4 constituted a complete dependency graph containing
all candidate hypotheses (actually an unrealistic assumption since
this would preclude any alternative interpretations). For each pair of
connected nodes, the conditional probability of the child, given the
ancestor, is given by the ratio of their relative frequencies (Figure
5a). The metric only requires computing the probability of $c_9$
(Figure 5b).³ Nodes are clustered into multi-valued compound variables
as necessary to eliminate loops, to ensure counting any piece of
evidence only once (Figure 5c).
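Two of the steps just described lend themselves to a short Python sketch: taking a conditional probability as a ratio of relative frequencies, and enumerating the joint states of a compound variable formed by clustering nodes. The frequency values are hypothetical, and the ratio is only meaningful under the assumption that every instance of the child category is also an instance of the ancestor.

```python
from itertools import product


def conditional(f, child, ancestor):
    """P[child | ancestor] as the ratio f(child) / f(ancestor).

    Assumes every instance of `child` is also an instance of `ancestor`.
    """
    return f[child] / f[ancestor]


def compound_states(names):
    """All joint truth assignments of the nodes clustered into one variable."""
    return [
        dict(zip(names, values))
        for values in product([True, False], repeat=len(names))
    ]


# Hypothetical relative frequencies.
f = {"event": 0.5, "rest": 0.25}
print(conditional(f, "rest", "event"))

# Clustering s3, c7 and c8 yields a compound variable with 2**3 states.
Z = compound_states(["s3", "c7", "c8"])
print(len(Z))
```

Treating the cluster as a single multi-valued variable is what removes the loops: each state of Z is considered exactly once, so no piece of evidence is counted twice.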
The conditional probability vectors $P[+c_9 \mid z_i]$ and
$P[z_i \mid +s_1, +s_2]$ are computed using the disjunctive interaction
model:⁴
$$\begin{aligned}
P[+c_9 \mid +s_3, +c_7, +c_8] &= 1 - P[\neg c_9 \mid +s_3] \cdot P[\neg c_9 \mid +c_7] \cdot P[\neg c_9 \mid +c_8] \\
P[+c_9 \mid +s_3, +c_7, \neg c_8] &= 1 - P[\neg c_9 \mid +s_3] \cdot P[\neg c_9 \mid +c_7] \\
P[+c_9 \mid +s_3, \neg c_7, +c_8] &= 1 - P[\neg c_9 \mid +s_3] \cdot P[\neg c_9 \mid +c_8] \\
P[+c_9 \mid +s_3, \neg c_7, \neg c_8] &= 1 - P[\neg c_9 \mid +s_3] \\
P[+c_9 \mid \neg s_3, +c_7, +c_8] &= 1 - P[\neg c_9 \mid +c_7] \cdot P[\neg c_9 \mid +c_8] \\
P[+c_9 \mid \neg s_3, +c_7, \neg c_8] &= 1 - P[\neg c_9 \mid +c_7] \\
P[+c_9 \mid \neg s_3, \neg c_7, +c_8] &= 1 - P[\neg c_9 \mid +c_8] \\
P[+c_9 \mid \neg s_3, \neg c_7, \neg c_8] &= 1 - 1
\end{aligned}$$
$$\begin{aligned}
P[+s_3, +c_7, +c_8 \mid +s_1, +s_2] &= P[+s_3 \mid +s_1, +s_2] \cdot P[+c_7 \mid +s_1, +s_2] \cdot P[+c_8 \mid +s_1, +s_2] \\
&= \{1 - P[\neg s_3 \mid +s_1] \cdot P[\neg s_3 \mid +s_2]\} \cdot \ldots \\
P[+s_3, +c_7, \neg c_8 \mid +s_1, +s_2] &= \ldots
\end{aligned}$$
Finally, we compute $P[+c_9 \mid +s_1, +s_2]$ by conditioning on the
compound variable Z and taking the weighted average of
$P[+c_9 \mid Z, +s_1, +s_2]$ over all states of Z:
$$\sum_i P[+c_9 \mid z_i, +s_1, +s_2] \, P[z_i \mid +s_1, +s_2] = \sum_i P[+c_9 \mid z_i] \, P[z_i \mid +s_1, +s_2].$$
Both syntactic and semantic preferences are taken into account. The
influence of semantic preferences is encoded in the conditional
probabilities $P[+c_9 \mid +c_7]$ and $P[+c_9 \mid +c_8]$.⁵ The loops
in the original dependency graph correspond to support for the
interpretation via both syntactic and semantic paths. A more complex
example demonstrating structural ambiguity resolution is shown in
Figure 6; here an afternoon rest schema produces a semantic preference
that overrides a syntactic preference arising from weak lexicalization
of the nominal compound rest area.⁶
[Figure 5 diagrams not reproduced. Panel (a) labels each edge with a
ratio of relative frequencies (e.g., f3/f1, f6/f5); panel (b) shows
the graph as needed for computing the probability of c9; panel (c)
clusters s3, c7, and c8 into a compound variable with states such as
z2 = (+s3, +c7, ¬c8), z3 = (+s3, ¬c7, +c8), z4 = (+s3, ¬c7, ¬c8), and
z8 = (¬s3, ¬c7, ¬c8).]
Figure 5. Computing the goodness metric for an "afternoon rest"
interpretation (see text).
[Figure 6 diagrams not reproduced. Panels (a) and (b) share the input
nodes s1 = "afternoon"(i1), s2 = "rest"(i2), and s3 = "area"(i3), each
with probability 1.00; the remaining node labels are not legible in
the source.]
Figure 6. Semantic overriding syntactic preference in "afternoon rest
area".
A major unsolved problem with this approach is specificity selection.
This is a well-known trade-off in classification models: the more
general the interpretation, the higher its probability is; whereas the
more specific the interpretation, the greater its utility and the more
informative it is. The probabilistic goodness metric does not help
when comparing two interpretations whose only difference is that one
is more general than the other.⁷ In our initial studies we attempted
to handle this trade-off using thresholded marker-passing techniques
(Wu 1987, 1989), but we are currently investigating a stronger utility
theory to complement the probabilistic model, incorporating both
explicit invariant biases and probabilistically learned utility
expectations. It is not yet clear whether we shall also need to
incorporate pragmatic utility expectations in the constructions.
³A natural language processing system needs to propagate probabilities
to the semantic hypotheses as well, in order to make use of the
interpreted information.
⁴Justification for the disjunctive interaction model is beyond our
scope here; it is a standard approximation used to complete the
probability model in cases where it is infeasible to gather or store
full conditional probability matrices for all input combinations (see
Pearl 1988). Heavily biased conditional probability matrices that
cannot be satisfactorily approximated by disjunctive interaction can
sometimes be handled by forming additional categories. The apparent
schema-organization of human memory may well arise for the same
reason.
⁵These conditional probabilities cannot be derived solely from
frequency counts since $c_9$ is an instance of a novel category--the
category of "afternoon rest" constructions denoting a nap--with zero
previous frequency. Instead, the conditional probabilities
$P[+c_9 \mid +c_7]$ and $P[+c_9 \mid +c_8]$ are a function of the
ancestral conditional probabilities $P[+s_3 \mid +s_1]$,
$P[+s_3 \mid +s_2]$, $P[+s_6 \mid +s_4]$, and $P[+s_6 \mid +s_5]$ plus
the disjunctive interaction assumption.
⁶Note that (a) and (b) are two partitions of the same dependency
graph.
⁷Norvig (1989) has also noted the competition between probability and
utility in the context of language interpretation.
For methodological reasons we have deliberately 
impoverished the statistical database, by depriving 
the model of all information except for category 
frequencies, relying upon disjunctive interaction to 
complete the probability model. This limitation on 
the complexity of statistical information is too re- 
strictive; disjunctive interaction cannot satisfactorily 
approximate cases where 
$$P[+c_3 \mid c_1, c_2] \gg 1 - P[\neg c_3 \mid c_1] \cdot P[\neg c_3 \mid c_2].$$
Such cases appear to arise often; for example, the 
presence of two nouns, rather than one, increases 
the probability of a compound by a much greater 
factor than modeled by disjunctive interaction. We 
intend to test variants of the model empirically on 
a corpus of nominal compounds, with randomly se- 
lected training sets; the restrictions on complexity of 
conditional probability information will be relaxed 
depending upon the resulting prediction accuracy. 
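To make the disjunctive interaction model and its limitation concrete, here is a minimal Python sketch. It assumes Python 3.8 or later (for `math.prod`); the function names and every probability value are hypothetical, chosen only to show the shape of the computation rather than any estimate from data.

```python
from itertools import product
from math import prod


def noisy_or(link_probs, active):
    """P[+effect | active causes] under disjunctive interaction.

    link_probs[c] is P[+effect | +c]; with no active cause the result is 0.
    """
    return 1.0 - prod(1.0 - link_probs[c] for c in active)


def goodness(link_probs, p_parent):
    """Weighted average of P[+effect | z] over all joint parent states z.

    p_parent[c] is P[+c | inputs]; the parents are treated as independent
    given the inputs, so P[z | inputs] is a product of per-parent terms.
    """
    names = list(p_parent)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        active = [n for n, v in zip(names, values) if v]
        p_z = prod(
            p_parent[n] if v else 1.0 - p_parent[n]
            for n, v in zip(names, values)
        )
        total += noisy_or(link_probs, active) * p_z
    return total


# Hypothetical numbers for the three parents of c9.
links = {"s3": 0.5, "c7": 0.5, "c8": 0.5}
parents = {"s3": 0.5, "c7": 0.5, "c8": 0.5}
print(goodness(links, parents))

# Two weak causes: disjunctive interaction can predict at most this much.
weak = {"c1": 0.1, "c2": 0.1}
print(noisy_or(weak, ["c1", "c2"]))
```

With two causes that each support the effect with probability 0.1, disjunctive interaction predicts a joint support of about 0.19. If the observed conditional probability for both causes together were much higher, as the text suggests happens when two nouns co-occur, no choice of the single-cause probabilities would recover it, and richer conditional probability information would be needed.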
Conclusion 
We have suggested extending unification-based formalisms to express
the sort of interacting preferences used in massively-parallel
language models, using probabilistic techniques. In this way,
quantitative claims that remain hidden in many massively-parallel
models can be made more explicit; moreover, the numbers and the
calculus are motivated by a reasonable assumption about language
learning. We hope to see increased use of probabilistic models rather
than arbitrary calculi in language research: Charniak & Goldman's
(1989) recent analysis of probabilities in semantic story structures
is a promising development in this direction. Stolcke (1989)
transformed a unification grammar into a connectionist framework
(albeit without preferences); we have taken the opposite tack. Many
linguists have acknowledged the need to extend their frameworks to
handle statistically-based syntactic and semantic judgements (e.g.,
Karlgren 1974; Ford et al. 1982, p. 745), but only in passing,
largely, we suspect, due to the unavailability of adequate
representational tools. Because our proposal makes direct use of
traditional unification-based structures, larger grammars should be
easy to construct and incorporate; because of the direct
correspondence to semantic net representations, complex semantic
models of the type found in AI work may be more readily exploited.

References 

Bookman, L. A. (1987). A microfeature based scheme 
for modelling semantics. In Proceedings of the Tenth 
International Joint Conference on Artificial Intelligence,
pp. 611-614.

Charniak, E. & R. Goldman (1989). A semantics 
for probabilistic quantifier-free first-order languages, 
with particular application to story understanding. 
In Proceedings of the Eleventh International Joint 
Conference on Artificial Intelligence, pp. 1074-1079. 

Cottrell, G. W. (1984). A model of lexical access of am- 
biguous words. In Proceedings of the Fourth Na- 
tional Conference on Artificial Intelligence, pp. 61-67. 

Cottrell, G. W. (1985). A connectionist approach to word 
sense disambiguation. Technical Report TR 154, 
Univ. of Rochester, Dept. of Comp. Sci., New York. 

Downing, P. (1977). On the creation and use of English 
compound nouns. Language, 53(4):810-842. 

Fillmore, C. J. (1988). On grammatical constructions.
Unpublished draft, University of California
at Berkeley. 

Ford, M., J. Bresnan, & R. M. Kaplan (1982). A 
competence-based theory of syntactic closure. In 
J. Bresnan, editor, The Mental Representation of 
Grammatical Relations, pp. 727-796. MIT Press, 
Cambridge, MA. 

Hirst, G. (1987). Semantic Interpretation and the Res- 
olution of Ambiguity. Cambridge University Press, 
Cambridge. 

Hobbs, J. R., M. Stickel, P. Martin, & D. Edwards (1988). 
Interpretation as abduction. In Proceedings of the 
26th Annual Conference of the Association for Computational
Linguistics, pp. 95-103, Buffalo, NY.

Jespersen, O. (1946). A Modern English Grammar on 
Historical Principles, volume 6. George Allen & Unwin,
London.

Jurafsky, D. S. (1990). Representing and integrating 
linguistic knowledge. In Proceedings of the Thir- 
teenth International Conference on Computational 
Linguistics, Helsinki. 

Karlgren, H. (1974). Categorial grammar calculus. Statistical
Methods In Linguistics, 1974:1-128.

Lees, R. B. (1963). The Grammar of English Nominalizations.
Mouton, The Hague.

Leonard, R. (1984). The Interpretation of English Noun 
Sequences on the Computer. North Holland, Ams- 
terdam. 

Levi, J. N. (1978). The Syntax and Semantics of Complex 
Nominals. Academic Press, New York. 

Lytinen, S. L. (1986). Dynamically combining syntax 
and semantics in natural language processing. In 
Proceedings of the Fifth National Conference on Ar- 
tificial Intelligence, pp. 574-578. 

Marcus, M. P. (1980). A Theory of Syntactic Recognition 
for Natural Language. MIT Press, Cambridge. 

McClelland, J. L. & A. H. Kawamoto (1986). Mecha- 
nisms of sentence processing: Assigning roles to con- 
stituents of sentences. In J. L. McClelland & D. E. 
Rumelhart, editors, Parallel Distributed Processing, 
volume 2, pp. 272-325. MIT Press, Cambridge, MA. 

McDonald, D. B. (1982). Understanding noun 
compounds. Technical Report CMU-CS-82-102, 
Carnegie-Mellon Univ., Dept. of Comp. Sci., Pitts- 
burgh, PA. 

Norvig, P. (1989). Non-disjunctive ambiguity. Unpub- 
lished draft, University of California at Berkeley. 

Pearl, J. (1988). Probabilistic Reasoning in Intelligent
Systems: Networks of Plausible Inference. Morgan 
Kaufmann, San Mateo, CA. 

Pollard, C. & I. A. Sag (1987). Information-Based Syntax 
and Semantics: Volume 1: Fundamentals. Center 
for the Study of Language and Information, Stan- 
ford, CA. 

Quirk, R., S. Greenbaum, G. Leech, & J. Svartvik (1985). 
A Comprehensive Grammar of the English Language.
Longman, New York.

Schubert, L. K. (1986). Are there preference trade-offs in 
attachment decisions? In Proceedings of the Fifth 
National Conference on Artificial Intelligence, pp. 
601-605. 

Stolcke, A. (1989). Processing unification-based gram- 
mars in a connectionist framework. In Program 
of the Eleventh Annual Conference of the Cognitive 
Science Society, pp. 908-915. 

Waltz, D. L. & J. B. Pollack (1985). Massively paral- 
lel parsing: A strongly interactive model of natural 
language interpretation. Cognitive Science, 9:51-74. 

Warren, B. (1978). Semantic Patterns of Noun-Noun 
Compounds. Acta Universitatis Gothoburgensis,
Gothenburg, Sweden. 

Wermter, S. (1989a). Integration of semantic and syntactic
constraints for structural noun phrase disambiguation.
In Proceedings of the Eleventh International
Joint Conference on Artificial Intelligence,
pp. 1486-1491. 

Wermter, S. (1989b). Learning semantic relationships in 
compound nouns with connectionist networks. In 
Program of the Eleventh Annual Conference of the 
Cognitive Science Society, pp. 964-971. 

Wermter, S. & W. G. Lehnert (1989). Noun phrase anal- 
ysis with connectionist networks. In N. Sharkey & 
R. Reilly, editors, Connectionist Approaches to Language
Processing. In press.

Wilensky, R. & Y. Arens (1980). Phran - a knowledge-based
approach to natural language analysis. Technical
Report UCB/ERL M80/34, University of California
at Berkeley, Electronics Research Laboratory,
Berkeley, CA.

Wilensky, R., D. Chin, M. Luria, J. Martin, J. Mayfield, 
& D. Wu (1988). The Berkeley UNIX Consultant 
project. Computational Linguistics, 14(4):35-84. 

Wu, D. (1987). Concretion inferences in natural language
understanding. In K. Morik, editor, Proceedings
of GWAI-87, 11th German Workshop on
Artificial Intelligence, pp. 74-83, Geseke. Springer-Verlag.
Informatik-Fachberichte 152.

Wu, D. (1989). A probabilistic approach to marker prop- 
agation. In Proceedings of the Eleventh International 
Joint Conference on Artificial Intelligence, pp. 574- 
580, Detroit, MI. Morgan Kaufmann. 
