Integrating Symbolic and Statistical Representations: 
The Lexicon Pragmatics Interface 
Ann Copestake 
Center for the Study of Language and Information, 
Stanford University, 
Ventura Hall, 
Stanford, CA 94305, 
USA 
aac~csl£, stanford, edu 
Alex Lascarides 
Centre for Cognitive Science and 
Human Communication Research Centre, 
University of Edinburgh, 
2, Buccleuch Place, 
Edinburgh, EH8 9LW, 
Scotland, UK 
alex@cogsci, ed. ac. uk 
Abstract 
We describe a formal framework for inter- 
pretation of words and compounds in a 
discourse context which integrates a sym- 
bolic lexicon/grammar, word-sense proba- 
bilities, and a pragmatic component. The 
approach is motivated by the need to han- 
dle productive word use. In this paper, 
we concentrate on compound nominals. 
We discuss the inadequacies of approaches 
which consider compound interpretation as 
either wholly lexico-grammatical or wholly 
pragmatic, and provide an alternative inte- 
grated account. 
1 Introduction 
VVhen words have multiple senses, these may have 
very different frequencies. For example, the first two 
senses of the noun diet given in WordNet are: 
O 1. (a prescribed selection of foods) 
=> fare - (the food and drink that are regularly 
consumed) 
2. => legislature, legislative assembly, general as- 
sembly, law-makers 
\]k|ost English speakers will share the intuition that 
the first sense is much more common than the sec- 
ond, and that this is (partly) a property of the word 
and not its denotation, since near-synonyms oc- 
cur with much greater frequency. Frequency differ- 
ences are also found between senses of derived forms 
(including morphological derivation, zero-derivation 
and compounding). For example, canoe is less fre- 
quent as a verb than as a noun. and the induced ac- 
tion use (e.g., they canoed the kids across the lake) is 
much less frequent than the intransitive form (with 
location PP) (they canoed across the lake). 1 A de- 
rived form may become established with one mean- 
ing, but this does not preclude other uses in suffi- 
ciently marked contexts (e.g., Bauer's (1983) exam- 
ple of garbage man with an interpretation analogous 
to snowman). 
Because of the difficulty of resolving lexical am- 
biguity, it is usual in NLP applications to exclude 
'rare' senses from the lexicon, and to explicitly list 
frequent forms, rather than to derive them. But this 
increases errors due to unexpected vocabulary, espe- 
cially for highly productive derivational processes. 
For this and other reasons it is preferable to as- 
sume some generative devices in the lexicon (Puste- 
jovsky, 1995). Briscoe and Copestake (1996) argue 
that a differential estimation of the productivity of 
derivation processes allows an approximation of the 
probabilities of previously unseen derived uses. If 
more probable senses are preferred by the system, 
the proliferation of senses that results from uncon- 
strained use of lexical rules or other generative de- 
vices is effectively controlled. An interacting issue is 
the granularity of meaning of derived forms. If the 
lexicon produces a small number of very underspeci- 
fled senses for a wordform, the ambiguity problem is 
apparently reduced, but pragmatics may have insuf- 
ficient information with which to resolve meanings, 
or may find impossible interpretations. 
We argue here that by utilising probabilities, a 
language-specific component can offer hints to a 
pragmatic module in order to prioritise and con- 
trol the application of real-world reasoning to disam- 
biguation. The objective is an architecture utilising 
a general-purpose lexicon with domain-dependent 
probabilities. The particular issues we consider here 
are the integration of the statistical and symbolic 
components, and the division of labour between se- 
1Here and below we base our frequency judgements 
on semi-automatic analysis of the written portion of the 
tagged British National Corpus (BNC). 
136 
Arzttermin *doctor appointment doctor's appointment 
Terminvorschlag * date proposal 
Terminvereinbarung * date agreement 
proposal for a date 
agreement on a date 
Januarh/ilfte 
Fr/ihlingsanfang 
* January half 
* spring beginning 
half of January 
beginning of spring 
Figure 1: Some German compounds with non-compound translations 
mantics and pragmatics in determining meaning. 
We concentrate on (right-headed) compound nouns, 
since these raise especially difficult problems for NLP 
system architecture (Sparck Jones, 1983). 
2 The grammar of compound nouns 
Within linguistics, attempts to classify nominal com- 
pounds using a small fixed set of meaning relations 
(e.g., Levi (1978)) are usually thought to have failed, 
because there appear to be exceptions to any clas- 
sification. Compounds are attested with meanings 
which can only be determined contextually. Down- 
ing (1977) discusses apple juice seat, uttered in a 
context in which it identifies a place-setting with a 
glass of apple juice. Even for compounds with es- 
tablished meanings, context can force an alternative 
interpretation (Bauer, 1983). 
These problems led to analyses in which the re- 
lationship between the parts of a compound is un- 
determined by the grammar, e.g., Dowty (1979), 
Bauer (1983). Schematically this is equivalent to the 
following rule, where R is undetermined (to simplify 
exposition, we ignore the quantifier for y): 
NO ---4 N1 N2 
(1))~x\[P(x) A Q(y) A R(x, y)\] )~y\[Q(y)\] )~x\[P(x)\] 
Similar approaches have been adopted in NLP with 
further processing using domain restrictions to re- 
solve the interpretation (e.g., Hobbs et al (1993)). 
However, this is also unsatisfactory, because (1) 
overgenerates and ignores systematic properties of 
various classes of compounds. Overgeneration is 
apparent when we consider translation of German 
compounds, since many do not correspond straight- 
forwardly to English compounds (e.g., Figure 1). 
Since these exceptions are English-specific they can- 
not be explained via pragmatics. Furthermore they 
are not simply due to lexical idiosyncrasies: for 
instance, Arzttermin/*doctor appointment is repre- 
sentative of many compounds with human-denoting 
first elements, which require a possessive in English. 
So we get blacksmith's hammer and not * blacksmith 
hammer to mean 'hammer of a type convention- 
ally associated with a blacksmith' (also driver's cab, 
widow's allowance etc). This is not the usual pos- 
sessive: compare (((his blacksmith)'s) hammer) with 
(his (blacksmith's hammer)). Adjective placement is 
also restricted: three English blacksmith's hammers/ 
*three blacksmith's English hammers. We treat these 
as a subtype of noun-noun compound with the pos- 
sessive analysed as a case marker. 
In another subcategory of compounds, the head 
provides the predicate (e.g., dog catcher, bottle 
crusher). Again, there are restrictions: it is not 
usually possible to form a compound with an agen- 
tire predicate taking an argument that normally re- 
quires a preposition (contrast water seeker with * wa- 
ter looker). Stress assignment also demonstrates in- 
adequacies in (1): compounds which have the in- 
terpretation 'Y made of X' (e.g., nylon rope, oak 
table) generally have main stress on the righthand 
noun, in contrast to most other compounds (Liber- 
man and Sproat, 1992). Stress sometimes disam- 
biguates meaning: e.g., with righthand stress cotton 
bag has the interpretation bag made of cotton while 
with leftmost stress an alternative reading, bag for 
cotton, is available. Furthermore, ordering of ele- 
ments is restricted: e.g., cotton garment bag/ *gar- 
ment cotton bag. 
The rule in (1) is therefore theoretically inade- 
quate, because it predicts that all noun-noun com- 
pounds are acceptable. Furthermore, it gives no hint 
of likely interpretations, leaving an immense burden 
to pragmatics. 
We therefore take a position which is intermediate 
between the two extremes outlined above. We as- 
sume that the grammar/lexicon delimits the range 
of compounds and indicates conventional interpre- 
tations, but that some compounds may only be re- 
solved by pragmatics and that non-conventional con- 
textual interpretations are always available. We de- 
fine a number of schemata which encode conven- 
tional meanings. These cover the majority of com- 
pounds, but for the remainder the interpretation is 
left unspecified, to be resolved by pragmatics. 
137 
general-nn \[ 
possessive 
/1\ \] made-of\] purpose-patient deverbal / 
I n°n-derived-pp I I deverbal-pp \] 
linen chest ice-cream container 
Figure 2: Fragment of hierarchy of noun-noun compound schemata. The boxed nodes indicate actual 
schemata: other nodes are included for convenience in expressing generalisations. 
general-nn NO -> N1 N2 
Ax\[P(x) A Q(y) A R(x, y)\] Ay\[Q(y)\] Ax\[P(x)\] 
R =/general-nn anything anything 
/stressed 
made-of R = made-of substance physobj 
/stressed 
purpose-patient R = TELIC(N2) anything artifact 
Figure 3: Details of some schemata for noun-noun compounds. / indicates that the value to its right is 
default information. 
Space limitations preclude detailed discussion but 
Figures 2 and 3 show a partial default inheri- 
tance hierarchy of schemata (cf., Jones (1995)). 2 
Multiple schemata may apply to a single com- 
pound: for example, cotton bag is an instantiation of 
the made-of schema, the non-derived-purpose- 
patient schema and also the general-nn schema. 
Each applicable schema corresponds to a different 
sense: so cotton bag is ambiguous rather than vague. 
The interpretation of the hierarchy is that the use 
of a more general schema implies that the meanings 
given by specific subschemata are excluded, and thus 
we have the following interpretations for cotton bag: 
1. Ax\[cotton(y) A bag(x) A made-of(y, x)\] 
2. Ax\[cotton(y) A bag(x) A TELIC(bag)(y,x)\] = 
Ax\[cotton(y) A bag(x) A contain(y, x)\] 
2We formalise this with typed default feature struc- 
tures (Lascarides et al, 1996). Schemata can be re- 
garded formally as lexical/grammar rules (lexical rules 
and grammar rules being very similar in our framework) 
but inefficiency due to multiple interpretations is avoided 
in the implementation by using a form of packing. 
3. Ax\[R(y, x) A -~(made-of(y, x) V contain(y, x) V ...)\] 
The predicate made-of is to be interpreted as ma- 
terial constituency (e.g. Link (1983)). We follow 
Johnston and Busa (1996) in using Pustejovsky's 
(1995) concept of telic role to encode the purpose 
of an artifact. These schemata give minimal indi- 
cations of compound semantics: it may be desirable 
to provide more information (Johnston et al, 1995), 
but we will not discuss that here. 
Established compounds may have idiosyncratic in- 
terpretations or inherit from one or more schemata 
(though compounds with multiple established senses 
due to ambiguity in the relationship between con- 
stituents rather than lexical ambiguity are fairly un- 
usual). But established compounds may also have 
unestablished interpretations, although, as discussed 
in §3, these will have minimal probabilities. In 
contrast, an unusual compound, such as apple-juice 
scat, may only be compatible with general-nn, and 
would be assigned the most underspecified interpre- 
tation. As we will see in §4, this means pragmatics 
138 
Unseen-prob-mass(cmp-form) = number-of-applicable-schemata(cmp-form) 
I ~eq( cmp-form ) + number-of-applicable-schemata( cmp-form ) 
Prod(csl) Estimated-freq(interpretationi with cmp-formj) = Unseen-prob-mass(cmp-formj) x ~ Prod(csl) ..... Prod(cs.,) 
(where csl ... cs, are the compound schemata needed to derive the n unattested entries for the form j) 
Figure 4: Probabilities for unseen compounds: adapted from Briscoe and Copestake (1996) 
must find a contextual interpretation. Thus, for any 
compound there may be some context in which it 
can be interpreted, but in the absence of a marked 
context, only compounds which instantiate one of 
the subschemata are acceptable. 
3 Encoding Lexical Preferences 
In order to help pragmatics select between the multi- 
pie possible interpretations, we utilise probabilities. 
For an established form, derived or not, these de- 
pend straightforwardly on the frequency of a par- 
ticular sense. For example, in the BNC, diet has 
probability of about 0.9 of occurring in the food 
sense and 0.005 in the legislature sense (the remain- 
der are metaphorical extensions, e.g.. diet of crime). 
Smoothing is necessary to avoid giving a non-zero 
probability for possible senses which are not found 
in a particular corpus. For derived forms, the ap- 
plicable lexical rules or schemata determine possi- 
ble senses (Briscoe and Copestake, 1996). Thus 
for known compounds, probabilities of established 
senses depend on corpus frequencies but a residual 
probability is distributed between unseen interpreta- 
tions licensed by schemata, to allow for novel uses. 
This distribution is weighted to allow for productiv- 
it3" differences between schemata. For unseen com- 
pounds, all probabilities depend on schema produc- 
tivity. Compound schemata range from the non- 
productive (e.g., the verb-noun pattern exemplified 
by pickpocket), to the almost fully productive (e.g.; 
made-of) with many schemata being intermediate 
(e.g., has-part: ~-door car is acceptable but the ap- 
parently similar *sunroof car is not). 
We use the following estimate for productivity 
(adapted from Briscoe and Copestake (1996)): 
M+I Prod(cmp-schema) - N 
(where N is the number of pairs of senses which 
match the schema input and M is the number 
of attested two-noun output forms -- we ignore 
compounds with more than two nouns for simplic- 
ity). Formulae for calculating the unseen probability 
mass and for allocating it differentially according to 
schema productivity are shown in Figure 4. Finer- 
grained, more accurate productivity estimates can 
be obtained by considering subsets of the possible 
inputs -- this allows for some real-world effects (e.g., 
the made-of schema is unlikely for liquid/physical- 
artifact compounds). 
Lexical probabilities should be combined to give 
an overall probability for a logical form (LF): see 
e.g., Resnik (1992). But we will ignore this here and 
assume pragmatics has to distinguish between alter- 
natives which differ only in the sense assigned to 
one compound. (2) shows possible interpretations 
for cotton bag with associated probabilities. LFS are 
encoded in DRT. The probabilities given here are 
based on productivity figures for fabric/container 
compounds in the BNC, using WordNet as a source of 
semantic categories. Pragmatics screens the LFS for 
acceptability. If a LF contains an underspecified ele- 
ment (e.g., arising from general-nn), this must be 
instantiated by pragmatics from the discourse con- 
text. 
(2) a. 
b. 
Mary put a skirt in a cotton bag 
e, x, y~ Z~ W, t, now 
mary(x), skirt(y), cotton(w), 
bag(z), put(e, x, y, z ) , 
hold(e, t ) , t -~ now, 
made-of(z, w) 
P = 0.84 
c. 
e, x, y, z, w, t, now 
mary(x), skirt(y), cotton(w), 
bag(z), put(e, x, y, z), 
hold(e, t ) , t -< now, 
contain(z, w) 
e, X; y~ Z, W~ t, now 
P = 0.14 
d. 
mary(x), skirt(y), cotton(w), 
bag(z), put(e, x, y, z), 
hold(e, t), t -< now, 
Rc(z,w),Rc =?, 
-~( made-of(z, w)V 
contain(z, w) V . . .) 
P = 0.02 
139 
4 SDRT and the Resolution of 
Underspecified Relations 
The frequency information discussed in §3 is insuf- 
ficient on its own for disambiguating compounds. 
Compounds like apple juice seat require marked con- 
texts to be interpretable. And some discourse con- 
texts favour interpretations associated with less fre- 
quent senses. In particular, if the context makes the 
usual meaning of a compound incoherent, then prag- 
matics should resolve the compound to a less fre- 
quent but conventionally licensed meaning, so long 
as this improves coherence. This underlies the dis- 
tinct interpretations of cotton bag in (3) vs. (4): 
(3) a. Mary sorted her clothes into various large 
bags. 
b. She put her skirt in the cotton bag. 
(4) a. Mary sorted her clothes into various bags 
made from plastic. 
b. She put her skirt into the cotton bag. 
If the bag in (4b) were interpreted as being made 
of cotton--in line with the (statistically) most fre- 
quent sense of the compound--then the discourse 
becomes incoherent because the definite descrip- 
tion cannot be accommodated into the discourse 
context. Instead, it must be interpreted as hav- 
ing the (less frequent) sense given by purpose- 
patient; this allows the definite description to 
be accommodated and the discourse is coherent. 
In this section, we'll give a brief overview of 
the theory of discourse and pragmatics that we'll 
use for modelling this interaction during disam- 
biguation between discourse information and lex- 
ical frequencies. We'll use Segmented Discourse 
Representation Theory (SDRT) (e.g., Asher (1993)) 
and the accompanying pragmatic component Dis- 
course in Commonsense Entaihnent (DICE) (Las- 
carides and Asher. 1993). This framework has 
already been successful in accounting for other 
phenomena on the interface between the lexicon 
and pragmatics, e.g.. Asher and Lascarides (1995). 
Lascarides and Copestake (1995). 
Lascarides, Copestake and Briscoe (1996). 
SDRT is an extension of DRT (Kamp and Reyle, 
1993). where discourse is represented as a recursive 
set of DRSS representing the clauses, linked together 
with rhetorical relations such as Elaboration and 
Contrast. cf. Hobbs (1985). Polanyi (1985). Build- 
ing an SDRS invoh'es computing a rhetorical relation 
between the representation of the current clause and 
the SDRS built so far. DICE specifies how various 
background knowledge resources interact to provide 
clues about which rhetorical relation holds. 
The rules in DICE include default conditions of the 
form P > Q, which means If P, then normally Q. For 
example, Elaboration states: if 2 is to be attached 
to a with a rhetorical relation, where a is part of the 
discourse structure r already (i.e., (r, a, 2) holds). 
and 3 is a subtype of a--which by Subtype means 
that o's event is a subtype of 8's, and the individ- 
ual filling some role Oi in 3 is a subtype of the one 
filling the same role in a--then normally, a and 2 
are attached together with Elaboration (Asher and 
Lascarides, 1995). The Coherence Constraint on 
Elaboration states that an elaborating event must 
be temporally included in the elaborated event. 
• Subtype : 
(8~(ea,~l) A 8z(e3, ~2) A 
e-condn3 Z_ e-condn~ A 7"2 E_ ~,1) 
Subtype(3. a) 
• Elaboration: 
((r, a, 2) A Subtype(3, a)) > Elaboration(o, ~) 
• Coherence Constraint on Elaboration: 
Elaboration(a, 3) --+ e3 C ea 
Subtype and Elaboration encapsulate clues about 
rhetorical structure given by knowledge of subtype 
relations among events and objects. Coherence 
Constraint on Elaboration constrains the se- 
mantic content of constituents connected by Elab- 
oration in coherent discourse. 
A distinctive feature of SDRT is that if the DICE ax- 
ioms yield a nonmonotonic conclusion that the dis- 
course relation is R, and information that's neces- 
sary for the coherence of R isn't already in the con- 
stituents connected with R (e.g., Elaboration(a, 8) is 
nonmonotonically inferred, but e3 C_ eo is not in a 
or in 3). then this content can be added to the con- 
stituents in a constrained manner through a process 
known as SDRS Update. Informally. Update( r, a. 3) 
is an SDRS, which includes (a) the discourse context 
r, plus (b) the new information '3. and (c) an attach- 
ment of S to a (which is part of r) with a rhetorical 
relation R that's computed via DICE, where (d) the 
content of v. a and 3 are modified so that the co- 
herence constraints on R are met. 3 Note that this 
is more complex than DRT:s notion of update. Up- 
date models how interpreters are allowed and ex- 
pected to fill in certain gaps in what the speaker 
says: in essence affecting semantic canter through 
context and pragmatics, lVe'll use this information 
3If R's coherence constraints can't be inferred, then 
the logic underlying DICE guarantees that R won't be 
nonmonotonically inferred. 
140 
flow between context and semantic content to rea- 
son about the semantic content of compounds in dis- 
course: simply put, we will ensure that words are as- 
signed the most freqent possible sense that produces 
a well defined SDRS Update function. 
An SDnS S is well-defined (written 4 S) if there 
are no conditions of the form x =? (i.e., there are 
no um'esoh'ed anaphoric elements), and every con- 
stituent is attached with a rhetorical relation. A 
discourse is incoherent if "~ 3, Update(T, a,/3) holds 
for every available attachment point a in ~-. That 
is. anaphora can't be resolved, or no rhetorical con- 
nection can be computed via DICE. 
For example, the representm ions of (Sa.b) (in sire- 
plified form) are respectively a and t3: 
(5) a. Mary put her clothes into various large 
bags. 
x. ~ ". Z, e,~. to. u 
o. mary(x), clothes(Y), bag(Z). 
put(eo,x,~'. Z). hold(e,,,ta), ta "< n 
b. She put her skirt into the bag made out of 
cotton. 
x.y.z,w, e3.t2.n.u.B 
mary(x), skirt(y)~ bag(z), cotton(w), 
3. made-of(z, w), u =?, B(u, z). B =?, 
put(e~,x,y,z), hold(e2,to), t~ -< n 
In words, the conditions in '3 require the object 
denoted by the definite description to be linked 
by some 'bridging' relation B (possibly identity, 
cf. van der Sandt (1992)) to an object v identi- 
fied in the discourse context (Asher and Lascarides. 
1996). In SDRT. the values of u and B are com- 
puted as a byproduct of SDRT'5 Update function (cf. 
Hobbs (1979)); one specifies v and B by inferring 
the relevant new semantic content arising from R~s 
coherence constraints, where R is the rhetorical rela- 
tion inferred via the DICE axioms. If one cannot re- 
soh'e the conditions u =? or B =? through SDnS up- 
da~e. then by the above definition of well-definedness 
on SDRSS the discourse is incoherent (and we have 
presupposition failure). 
The detailed analysis of (3) and (52) involve rea- 
soning about the values of v and B. But for rea- 
sons of space, we gloss over the details given in 
Asher and Lascarides (1996) for specifying u and B 
through the SDRT update procedure. However. the 
axiom Assume Coherence below is derivable from 
the axioms given there. First some notation: let 
3\[C\] mean that ~ contains condition C. a~d assume 
that 3\[C/C'\] stands for the SDRS which is the same 
as 3. save that the condition C in 3 is replaced by C'. 
Then in words, Assume Coherence stipulates that if 
the discourse can be coherent only if the anaphor u 
is resolved to x and B is resolved to the specific re- 
lation P, then one monotonically assumes that they 
are resoh,ed this way: 
• Assume Coherence: 
(J~ Update(z,a,B\[u -.,-7 B =?/u = x.B, = P\]) A 
(C' # (,~ = z ^ B = P) -~ 
$ Update( 7", a, ~\[u =?.B =?/C'\]))) -~ 
( Update(z, a, ~) 
Update( v, a, 3\[u =?,B =?/u = x,B = P\])) 
Intuitively, it should be clear that in (Sa.b) -, $ 
Update(a, a, 3) holds, unless the bag in (5b) is one 
of the bags mentioned in (5a)--i.e, u = Z and 
B = member-of For otherwise the events in (5) 
are too "disconnected" to support ant" rhetorical re- 
lation. On the other hand. assigning u and B these 
values allows us to use Subtype and Elaboration 
to infer Elaboration (because skirt is a kind of cloth- 
ing, and the bag in (Sb) is one of the bags in (5a)). 
So Assume Coherence, Subtype and Elaboration 
yield that (Sb) elaborates (Sa) and the bag in (5b) 
is one of the bags in (5a). 
Applying SDRT tO compounds encodes the ef- 
fects of pragmatics on the compounding relation. 
For example, to reflect the fact that compounds 
such as apple juice seat, which are compatible 
only with general-nn, are acceptable only when 
context resoh'es the compound relation, we as- 
sume that the DRS conditions produced by this 
schema are: Rc(y,x), Rc -.,-7 and -,(made-o/(y.x) V 
contain(y, x) V...). By the above definition of well- 
definedness on SDRSS, the compound is coherent only 
if we can resoh,e Rc to a particular relation via the 
SDRT Update function, which in turn is determined 
by DICE. Rules such as Assume Coherence serve to 
specify the necessary compound relation, so long as 
context provides enough information. 
5 Integrating Lexical Preferences 
and Pragmatics 
\Ve now extend SDRT and DICE to handle the prob- 
abilistic information given in §3. We want the prag- 
matic component to utilise this knowledge, while 
still maintaining sufficient flexibility that less fre- 
quent senses are favoured in certain discourse con- 
texts. 
Suppose that the new information to be in- 
tegrated with the discourse context is ambigu- 
ous between ~1 .... ,Bn. Then we assume that 
exactly one of Update(z.a,~,). \] < i <_ n. 
holds. We gloss this complex disjunctive formula as 
141 
/Vl<i<n(Update(T,a, j3i)). Let ~k ~- j3j mean that 
the probability of DRS f~k is greater than that of f~j. 
Then the rule schema below ensures that the most 
frequent possible sense that produces discourse co- 
herence is (monotonically) favoured: 
• Prefer Frequent Senses: 
( /Vl<i<n( Update(T, c~,/~i))A 
$ Update(T, oz,/~j) A 
(/~k ~" j3j --~ -~ $ Update(T,a,~k))) -+ 
Update(T, a,/~j) 
Prefer Frequent Senses is a declarative rule for 
disambiguating constituents in a discourse context. 
But from a procedural perspective it captures: try 
to attach the DRS based on the most probable senses 
first; if it works you're done; if not, try the next most 
probable sense, and so on. 
Let's examine the interpretation of compounds. 
Consider (3). Let's consider the representation ~' 
of (3b) with the highest probability: i.e., the one 
where cotton bag means bag made of cotton. Then 
similarly to (5), Assume Coherence, Subtype and 
Elaboration are used to infer that the cotton bag 
is one of the bags mentioned in (3a) and Elab- 
oration holds. Since this updated SDRS is well- 
defined, Prefer Frequent Senses ensures that it's 
true. And so cotton bag means bag made from cotton 
in this context. 
Contrast this with (4). Update( a, a, /~') is not 
well-defined because the cotton bag cannot be 
one of the bags in (4a). On the other hand, 
Update(a, (~, ~") is well-defined, where t3" is the DRS 
where cotton bag means bag containing cotton. This 
is because one can now assume this bag is one of 
the bags mentioned in (4a), and therefore Elabora- 
tion can be inferred as before. So Prefer Frequent 
Senses ensures that Update(a,a,~") holds but 
Update(a, o~, j3') does not. 
Prefer Frequent Senses is designed for reason- 
ing about word senses in general, and not just the 
semantic content of compounds: it predicts diet has 
its food sense in (6b) in isolation of the discourse 
context (assuming Update(O, 0, ~) = ~), but it has 
the law-maker sense in (6), because SDRT's coher- 
ence constraints on Contrast ((Asher, 1993))--which 
is the relation required for Update because of the cue 
word but--can't be met when diet means food. 
(6) a. In theory, there should be cooperation be- 
tween the different branches of government. 
b. But the president hates the diet. 
In general, pragmatic reasoning is computation- 
ally expensive, even in very restricted domains. But 
the account of disambiguation we've offered circum- 
scribes pragmatic reasoning as much as possible. 
All nonmonotonic reasoning remains packed into the 
definition of Update(T, a, f~), where one needs prag- 
matic reasoning anyway for inferring rhetorical re- 
lations. Prefer Frequent Senses is a monotonic 
rule, it doesn't increase the load on nonmonotonic 
reasoning, and it doesn't introduce extra pragmatic 
machinery peculiar to the task of disambiguating 
word senses. Indeed, this rule offers a way of check- 
ing whether fully specified relations between com- 
pounds are acceptable, rather than relying on (ex- 
pensive) pragmatics to compute them. 
We have mixed stochastic and symbolic reasoning. 
Hobbs et al (1993) also mix numbers and rules by 
means of weighted abduction. However, the theories 
differ in several important respects. First, our prag- 
matic component has no access to word forms and 
syntax (and so it's not language specific), whereas 
Hobbs et al's rules for pragmatic interpretation can 
access these knowledge sources. Second, our prob- 
abilities encode the frequency of word senses asso- 
ciated with word forms. In contrast, the weights 
that guide abduction correspond to a wider variety 
of information, and do not necessarily correspond to 
word sense/form frequencies. Indeed, it is unclear 
what meaning is conveyed by the weights, and con- 
sequently the means by which they can be computed 
are not well understood. 
6 Conclusion 
We have demonstrated that compound noun in- 
terpretation requires the integration of the lexi- 
con, probabilistic information and pragmatics. A 
similar case can be made for the interpretation 
of morphologically-derived forms and words in ex- 
tended usages. We believe that the proposed archi- 
tecture is theoretically well-motivated, but also prac- 
tical, since large-scale semi-automatic acquisition of 
the required frequencies from corpora is feasible, 
though admittedly time-consuming. However fur- 
ther work is required before we can demonstrate this, 
in particular to validate or revise the formulae in §3 
and to further develop the compound schemata. 
7 Acknowledgements 
The authors would like to thank Ted Briscoe and 
three anonymous reviewers for comments on previ- 
ous drafts. This material is in part based upon work 
supported by the National Science Foundation un- 
der grant number IRI-9612682 and ESRC (UK) grant 
number R000236052. 
142 

References 
Asher, N. (1993) Reference to Abstract Objects in 
Discourse, Kluwer Academic Publishers. 
Asher, N. and A. Lascarides (1995) 'Lexical Disam- 
biguation in a Discourse Context', Journal of Se- 
mantics, voi.12.1, 69-108. 
Asher, N. and A. Lascarides (1996) Bridging, Pro- 
ceedings of the International Workshop on Se- 
mantic Underspecification, Berlin, October 1996, 
available from the Max Plank Institute. 
Bauer, L. (1983) English word-formation, Cam- 
bridge University Press, Cambridge, England. 
Briscoe, E.J. and A. Copestake (1996) 'Controlling 
the application of lexical rules', Proceedings of the 
A CL SIGLEX Workshop on Breadth and Depth of 
Semantic Lexicons, Santa Cruz, CA. 
Downing, P. (1977) 'On the Creation and Use of 
English Compound Nouns', Language, vol.53(~), 
810-842. 
Dowty, D. (1979) Word meaning in Montague Gram- 
mar, Reidel, Dordrecht. 
Hobbs, J. (1979) 'Coherence and Coreference', Cog- 
nitive Science, vol.3, 67-90. 
Hobbs, J. (1985) On the Coherence and Structure of 
Discourse, Report No. CSLI-85-37, Center for the 
Study of Language and Information. 
Hobbs, J.R., M. Stickel, D. Appelt and P. Martin 
(1993) 'Interpretation as Abduction', Artificial In- 
telligence, vol. 63.1, 69-142. 
Johnston, M., B. Boguraev and J. Pustejovsky 
(1995) 'The acquisition and interpretation of com- 
plex nominals', Proceedings of the AAAI Spring 
Symposium on representation and acquisition of 
lexical knowledge, Stanford, CA. 
Johnston, M. and F. Busa (1996) 'Qualia struc- 
ture and the compositional interpretation of com- 
pounds', Proceedings of the ACL SIGLEX work- 
shop on breadth and depth of semantic lexicons, 
Santa Cruz, CA. 
Jones, B. (1995) 'Predicting nominal compounds', 
Proceedings o.f the 17th Annual conference of the 
Cognitive Science Society, Pittsburgh, PA. 
Kamp, H. and U. Reyle (1993) From Discourse to 
Logic: an introduction to modeltheoretic seman- 
tics, formal logic and Discourse Representation 
Theory, Kluwer Academic Publishers, Dordrecht, 
Germany. 
Lascarides, A. and N. Asher (1993) 'Temporal Inter- 
pretation, Discourse Relations and Commonsense 
Entailment', Linguistics and Philosophy, vol. 16.5, 
437-493. 
Lascarides, A., E.J. Briscoe, N. Asher and A. Copes- 
take (1996) 'Persistent and Order Independent 
Typed Default Unification', Linguistics and Phi- 
losophy, voi.19:1, 1-89. 
Lascarides, A. and A. Copestake (in press) 'Prag- 
matics and Word Meaning', Journal of Linguis- 
tics, 
Lascarides, A., A. Copestake and E. J. Briscoe 
(1996) 'Ambiguity and Coherence', Journal of Se- 
mantics, vol.13.1, 41-65. 
Levi, J. (1978) The syntax and semantics of complex 
nominals, Academic Press, New York. 
Liberman, M. and R. Sproat (1992) 'The stress and 
structure of modified noun phrases in English' in 
I.A. Sag and A. Szabolsci (eds.), Lexical matters, 
CSLI Publications, pp. 131-182. 
Link, G. (1983) 'The logical analysis of plurals 
and mass terms: a lattice-theoretical approach' 
in Bguerle, Schwarze and von Stechow (eds.), 
Meaning, use and interpretation of language, de 
Gruyter, Berlin, pp. 302-323. 
Polanyi, L. (1985) 'A Theory of Discourse Structure 
and Discourse Coherence', Proceedings of the Pa- 
pers from the General Session at the Twenty-First 
Regional Meeting of the Chicago Linguistics Soci- 
ety, Chicago, pp. 25-27. 
Pustejovsky, J. (1995) The Generative Lexicon, MIT 
Press, Cambridge, MA. 
Resnik, P. (1992) 'Probabilistic Lexicalised Tree Ad- 
joining Grammar', Proceedings of the Coling92, 
Nantes, France. 
van der Sandt, R. (1992) 'Presupposition Projection 
as Anaphora Resolution', Journal of Semantics, 
voi.19.4, 
Sparck Jones, K. (1983) 'So what about parsing com- 
pound nouns?' in K. Sparck Jones and Y. Wilks 
(eds.), Automatic natural language parsing, Ellis 
Horwood, Chichester, England, pp. 164-168. 
Webber, B. (1991) 'Structure and Ostension in the 
Interpretation of Discourse Deixis', Language and 
Cognitive Processes, vol. 6.2, 107-135. 
