Reading Distinction in MT* 
Plus ten Hacken 
Eurotra 
Onderzoeksinstituut voor TaM en Spraak 
University of Utrecht 
Trans 10, 3512 JK Utrecht 
email: tenhackenC~hutruu59.bitnet 
Abstract 
In any system for Natural Language Process- 
ing having a dictionary, the question arises as 
to whiclh entries are included in it. In this pa- 
per, I address the subquestion as to whether a 
lexical unit having two senses should be con- 
sidered ambiguous or vague with respect to 
them. The inadequacy of some common strate- 
gies to answer this question in Machine Trans- 
lation (MT) systems is shown. From a seman- 
tic conjecture, tests are developed that are ar- 
gued to give more consistent and theoretically 
well-founded results. 
1 Introduction 
In any system for Natural Language Processing 
having a dictionary, the question arises which 
entries are included in it. In this paper, I will 
assume the environment of a mnltilingual MT 
system based on a linguistic analysis and trans- 
fer architecture, from which I will derive some 
argumentation. 
The question which entries are included in 
the dictionary should be answered in two parts. 
First there is a mapping from graphic words to 
lexical units (lu's), then a mapping from lu's to 
readings, each of which is represented in an en- 
try. The former mapping represents a certain 
level of analysis of the graphic word. It ab- 
stracts away from inflection and spelling vari- 
ation, and, depending on the system's analysis 
component, may do so as well for productive 
*I would like to thank my colleagues at the university 
and in Eurotra, especially Louis des Tombe and Henk 
Verkuyl, for their helpful comments. 
derivation and compounding, and multi-word 
units. In this paper I will concentrate on the 
latter mapping, reading distinction, in a way 
that does not appeal to a particular choice on 
the relation between lu and graphic word. 
A consistent approach to reading distinction 
is necessary, because inconsistencies in read- 
ing distinction in an MT system will compli- 
ca.te transfer components between a pair of lan- 
guages, and jeopardize extensibility of the sys- 
tem. A correct solution will save time in devel- 
opment and improve performance. The central 
question in this area can be formulated as in 
(1). 
(1) Given an lu X and two of its senses ocl 
and 5'2, is X ambiguous or vague with 
respect to S1 and $2 ? 
In (1) a sense of an lu is the meaning the 1ll has 
in a certain set of contexts. If the lu is vague, 
both senses are covered by the same reading. 
If it is ambiguous, S1 and $2 are examples of 
different readings of the lu, each reading being 
represented in a single entry. 
2 Some common methods 
Since every reading distinction creates lexical 
ambiguity that has to be solved, it seems at- 
tractive to use the features expressing the rele- 
vant information as criterion for answering (1): 
An lu is ambiguous between $1 and $2 iff there 
is a feature describing the dilference. 
If we only take morphological and sylltactic 
features, many intuitively clear cases of ambi- 
guity (e.g. bank as financial institution vs. as 
162 
river verge) cannot be expressed. This will lead 
to problems in transfer or in generation. On 
the other hand, these features will cause un- 
wanted distinctions as well, e.g. Prench fonc- 
tionnair'e (civil servant) with gender masculine 
or feminine, and kneel with past tense l~neegd 
or knelt. It makes no sense to try to disam- 
biguate these. 
The use of semantic features to define aln- 
biguity should be rejected for similar reasons. 
First, we have to determine a fixed set of fea- 
tures a priori, since otherwise no answers to 
questions of reading distinction evolve. This 
imposes an artificial upper bound on reading 
distinction. Moreover, the availability of a cer- 
tain feature does not mean that it has to be 
assigned in all cases. We will certainly have 
a feature expressing the male/fema.le contra.st, 
but it is not desirable to create two readings 
of parent accordingly, leading to a translation 
of pa~r.nts into something meaning mother(s) 
a~zd/or Jhther~,:). 
Alternatively, we couhl argue thal, since 
translation is the goal, it shouh\] be the cri- 
terium \[br reading distinction as well, answer- 
ing (1): An lu having two senses is ambigu- 
ous iff there are diffe.rent translations for the 
two senses. Leaving aside the non.-trivial prob- 
lem of determining whether there are differ- 
ent translations, we have to admit that there 
are cases of exceptional distinctions in one lan- 
guage, e.g. fle'uve vs. r'ivi&c in French, mean- 
i ng river ending in a, sea or in another river re- 
spectively. "l"hese distinctions will influence all 
other dictionaries in the system, in the sense 
that e.g. English 7+iver and Dutch r'ivier be-. 
come anabiguous, and there are two transla- 
t;ion rules between them. If we restrict our 
attention to a limited grou I) of languages, e.g. 
the languages in the system, the system be- 
comes difficult to extend, siace adding a new 
language from outside this group will affect all 
existing dictionaries. Otherwise there is a con- 
ceptual problem, since it will ,rover be possible 
to decide that an lu is vague, unless we know 
al\] languages of the world. Instead, cases of 
exceptional distinctions, bilinqual ambiguities, 
are best ihandled in transfer between the two 
languages, because they really are translation 
pro blems.. 
Summarizing, taking the means (features) or 
the goal (translation) as a criterium for read-. 
ing distinction results in decisions that cause 
various practical problems and are intuitively 
incorrect. Furthermore these strategies detach 
the notion of reading from meaning, which is 
theoretically undesirable. 
Taking only intuitions as our guide will link 
reading to meaning, but if even trained intu- 
itions or lexicographers do not prevent incon- 
sistencies, as can be seen in many published 
dictionaries, there is not much hope of reach- 
ing consistency, unless we manage to find some 
support for the intuitions. 
3 A semantic method 
The tests I will propose here to decide on read- 
ing distinction are based on monolingual mean- 
ing, and yield a substantially greater degree of 
consistency than direct, ui~.aided intuitions. It 
is based on the following conjecture. 
(2) There is a set of processes P, that, given 
a single occurrence of an lu, can stretch 
the actual meaning of the 1,l in the con- 
text to the boundaries of the reading 
the lu has, but not beyond. 
In order to be able to check the results, 
we will first consider some tests where spe- 
cific processes in P are used, as a.pplied to 
some intuitiveh, clear cases of ambiguity (e.g. 
ba~dv" as financial institution vs. river verge) and 
vagueness (e.g. elephant as Indian elephant vs. 
African elephant). Then the scope of these 
tests will be expanded to other cases. 
A well-known test evolving from (2) is based 
on conjunction. Lakoff (1970) proposed a. test 
where anaphoric so in the second clause of a 
conjunction refers back to an antecedent con- 
taining the lu :for which the question of reading 
distinction arises, ms in (3). 
(3) a. John went to a bank this morning, 
and so did Mary. 
b. John saw an elephant, and so did 
Mary. 
The question to be asked in this cast is whether 
tile sentence is semantically normal when tile 
anaphor is interpreted in the other sense than 
its antecedent. Clearly, (3a) is strange in this 
163 
interpretation, whereas (3b) is normal, con- 
firming that bank but not elephant is ambigu- 
ous in the relevant way (cf. Cruse (1986) on the 
use of semantic normality judgements). Other 
anaphors, e.g. one, there can be used as well. 
The answers are more reliable in case of an an- 
tecedent contmning less lexical material out- 
side the lu in question. In (3a), the antecedent 
of so is go to a bank, and ambiguity might be 
claimed to arise from the verb. Using one in- 
stead of so takes away this possibility, which is 
especially relevant in less clear cases. 
Other processes use quantifiers. One, based 
on Wiggins (1971), uses universal quantifica- 
tion. It is exemplified in (4). 
(4) a. All banks in this town are safe. 
b. All elephants in th.is zoo are old. 
The question to be asked here is whether all X 
can be interpreted as all $1 or all ,%., or only 
as all $1 and all $2. Whereas (4a) can mean 
either that there is no danger of flooding or 
that bank-robbers are effectively discouraged, 
and it is odd when used to mean both, (4b) can 
only be used to predicate over both African and 
Indian elephants in the zoo that they are old. 
A variant using negation in the same way is 
discussed by Kempson & Cormack (1981). 
A slightly different test can be pertbrmed 
with a universal quantifier somewhat remote 
from the relevant lu, as in (5). 
(5) a. Every town has a bank. 
b. Every zoo ha~s an elephant. 
The question to be asked here is whether the 
X (bank/elephant) has to be interpreted in the 
same sense for every Y (town/zoo). In a similar 
way numerals can be used as in (6), and coor- 
dination as in (7), requiring the same question. 
(6) a. This town has two banks. 
b. This zoo has two elephants. 
(7) a. 
b. 
John and Mary went to a bank this 
morning. 
John and Mary saw an elephant this 
morning. 
Summarizing, there are three main classes 
of processes in P behaving as in (2). The first 
one refers to two elements from the extension 
of the lu, one of them by an anaphor, as in 
(3). The second one refers to the flfll extension 
of a reading at once, as in (4). The third one 
refers to a group of elements in the extension, 
exploiting distributivity, as in (5)-(7). Each 
class is associated with a difl'erent question the 
answer of which determines whether an anal- 
ysis as ambiguity or as vagueness is correct. 
There are various realizations of test sentences 
for each class, some of which are subject to in- 
dependently motivated constraints. In a nat- 
ural way an intuitively appealing definition of 
reading evolves as in (8). 
(8) A reading of an lu is a coherent group of 
senses, the boundaries of which ('aunot 
be crossed by a single occurrence of the 
lu without losing semantic normality. 
4 The tests in actual use 
In the previous section, semantic tests were 
shown to give correct answers in cases where 
we can check them. This proves that we should 
not immediately reject the tests. The reason 
we need them however, is that there are many 
cases wtmre unaided intuition is not sufficiently 
determinate, so that conflicts on the correct 
an,~tysls might arise. 
A well-known problem area is the al~alysis of 
privative oppositions, where one of the senses 
is more general and includes the other one. 
Both dog and lion have senses animal belonging 
to a particular species of mammals and male 
specimen of that species. According to Kemp- 
son (1980) they are both vague with respect 
to these senses, but Zwicky ~: Sadock (1975) 
claim that dog but not lion is ambiguous. Ap- 
plying various tests to them we get the follow- 
ing sentences. 
(9) a. John has a dog, and Mary ihas one 
too. 
b. The zoo has a lion, and the circus 
has one too. 
(lo) a. 
b. 
All dogs of this breed are short- 
sighted. 
All lions in this wild reserve have 
been killed by poachers. 
164 
(11) a. This family has two dogs. 
b. This zoo has two lions. 
The sentences (9) and (1.1) cannot lead to a. 
conclusion for independent reasons. Since an 
individual or a group in the more specific sense 
of the lu is also an individual or a group in 
the general sense, the general sense is always 
available to cover up the opposition. This is 
not the case when the flfll extensions are com- 
pared, however. Therefore from (10) we can in-- 
deed conclude that dog is ambiguous and lion 
is not. Both (10a) and (10b) have the gen- 
eral interpretation, but only (10a) also has the 
more specific one (cf .... , b,tt not the bitches 
vs. *..., but not the lionesses). 
Another problem that comes up is the con- 
struction of test sentences for other syntactic 
categories than nouns. Although the various 
processes are most easily demonstrated with 
nouns, nothing in the theory refers to nouns di- 
rectly. VP-anaphors, e.g. so, can also be used 
for w~rbs. 
(12) a. John has been running 'all day, and 
so has his washing machine. 
b. John has been running all day, and 
so has his dog. 
(13) John followed Mary, and Bill (lid so 
too. 
The sentences in (12) show the ambiguity of 
run between tile senses with a human and a 
machine subject, and the vagueness between 
senses with a biped and a quadruped subject. 
For transitive verbs, such as follow, having the 
sense understand and go after, the result of 
the test is more disputable, since (13) shows 
the ambiguity of follow Mary, and one could 
argue that it is due to ambiguity of Mary, e.g. 
between the senses thinking person and spa- 
tial object. Therefore, the use of a non-lexical 
anaphor, indicated by # in the examples, is to 
be preferred. 
(14) John followed Mary, and Bill # Kate. 
It is rather difficult to construct a sentence 
with a quantifier over the verb comparable to 
(4) for nouns. Rather, a sentence such as (15) 
below displays the same distributivity effect as 
(5), that can also be achieved by coordination 
as in (16). 
(15) All boys followed Mary. 
(16) John and Bill followed Mary. 
Tile test sentences for verbs can also be used 
for adjectives, if they are used predicatively. 
An example is (17), where black is shown to 
have different readings when used with a con- 
crete object and with humour. 
(17) Her dress is black, and so is her hu- 
n~lour. 
For gradable adjectives, a comparison is a ba- 
sis for constructing a test sentence. Although 
(17) can be used humoristica.lly, (118) below, il- 
lustrating the ambiguity of fair, ca:n hardly be 
interpreted. 
(18) Her hair is as fair as the salary she pa~ys 
her employees. 
In general it seems that for gradable adjectives 
comparison provokes stronger judgements than 
anaphoric reference by so. In some cases, how- 
ew,,r, one of the senses cannot be used predica- 
tively, and neither of the two processes can be 
used. An empty anaphor sometimes provides a 
solution, as in (19), where economic is shown 
to be ambiguous between the senses relati~g to 
the economy and not wasteful. 
(19) For Inany years, he produced economic 
theories ~nd # cars. 
In sorne languages, there is a lexical anaphor 
that requires an adjective as its antecedent, e.g. 
dito in Dutch, as illustrated in (20). 
(20) Bij hun gouden bruiloft kregen 
ze een dito horloge. 
(Litt.) 'At their golden wedding got 
they a # watch' 
Among the remaining problems is the com- 
parison of two senses with big syntactic differ- 
ences. All test sentences have to be syntacti- 
cally correct, and syntax does not allow e.g. co- 
ordination of a noun and a verb in correspond- 
ing positions, tn such cases, tile semantic part 
of testing the senses is never a.rrived at. 
165 
5 Conclusion 
In this paper, I developed tests to answer the 
question whether an lu with two senses is to 
be analyzed as ambiguous or vague with re- 
spect to them from the semantic conjecture 
(2). The tests allow for theoretically well- 
founded and consistent decisions in many cases. 
In MT, they determine a proper balance on 
the cline between what can easily be disam- 
biguated monolingually, and what is useful as 
a distinction in translation. As such they define 
the target for monolingual disambiguation, and 
the class of bilingual ambiguities, that should 
be treated in transfer. Since the MT environ- 
ment has only been used in the argumentation, 
not in the solution proposed, theoretical well- 
foundedness and consistency evolving from the 
tests presented here are equally valid in other 
environments where a monolingual dictionary 
is used. 

References 

(;ruse, D.A. (1986). Lexical Semantics, Cam- 
bridge University Press. 

Kempson, Ruth (1980). ~ Ambiguity and Word 
Meaning', in: Greenbaum, Sidney, Geoffrey 
Leech & Jan Svartvik, St'udics in English Lin- 
guistics, Longman, London / New York, p. 7-16. 

Kempson, Ruth & Annabel Cormack (1981). 
'Ambiguity and Quantification', Linguistics 
and Philosophy ~, p. 259-309. 

Lakoff, George (1970). 'A Note on Vagueness 
and Ambiguity', Linguistic lnquiry i, p. 357-359. 

Wiggins, David (1971). 'On sentence-sense, 
word-sense and difference of word-sense. To- 
wards a philosophical theory of dictionaries.' 
In: Steinberg, Danny ,~ Leon Jakobovits (ed.). 
Semantics, Cambridge University Press, p. 14-34. 

Zwicky, Arnold & Jerrold Sadock (1975). ~Am- 
biguity tests and how to fail them', in: Kim- 
ball, John (ed.). Syntax and semantics 1, Aca- 
demic Press, p. 1-36. 
