Role of Word Sense Disambiguation in Lexical Acquisition: 
Predicting Semantics from Syntactic Cues 
Bonnie J. Dorr and Doug Jones 
Department of Computer Science and 
Institute for Advanced Computer Studies 
University of Maryland 
A.V. Williams Building 
College Park, MD 20742 
{bonnie, jones}@umiacs.umd.edu
ABSTRACT 
This paper addresses the issue of word-sense ambiguity 
in extraction from machine-readable resources for the con- 
struction of large-scale knowledge sources. We describe two 
experiments: one which ignored word-sense distinctions, re- 
sulting in 6.3% accuracy for semantic classification of verbs 
based on (Levin, 1993); and one which exploited word-sense 
distinctions, resulting in 97.9% accuracy. These experiments 
were dual purpose: (1) to validate the central thesis of the 
work of (Levin, 1993), i.e., that verb semantics and syntactic 
behavior are predictably related; (2) to demonstrate that a 
15-fold improvement can be achieved in deriving semantic 
information from syntactic cues if we first divide the syntac- 
tic cues into distinct groupings that correlate with different 
word senses. Finally, we show that we can provide effective 
acquisition techniques for novel word senses using a combi- 
nation of online sources. 
1 Introduction 
This paper addresses the issue of word-sense ambigu- 
ity in extraction from machine-readable resources for 
the construction of large-scale knowledge sources. We 
describe two experiments: one which ignored word- 
sense distinctions, resulting in 6.3% accuracy for seman- 
tic classification of verbs based on (Levin, 1993); and 
one which exploited word-sense distinctions, resulting 
in 97.9% accuracy. These experiments were dual purpose:
(1) to validate the central thesis of the work of
(Levin, 1993), i.e., that verb semantics and syntactic be- 
havior are predictably related; (2) to demonstrate that 
a 15-fold improvement can be achieved in deriving se- 
mantic information from syntactic cues if we first divide 
the syntactic cues into distinct groupings that correlate 
with different word senses. Finally, we show that we 
can provide effective acquisition techniques for novel 
word senses using a combination of online sources, in 
particular, Longman's Dictionary of Contemporary En- 
glish (LDOCE) (Procter, 1978), Levin's verb classifica- 
tion scheme (Levin, 1993), and WordNet (Miller, 1985). 
We have used these techniques to build a database of 
10,000 English verb entries containing semantic infor- 
mation that we are currently porting into languages 
such as Arabic, Spanish, and Korean for multilingual 
NLP tasks such as foreign language tutoring and ma- 
chine translation. 
2 Automatic Lexical Acquisition for 
NLP Tasks 
As machine-readable resources (i.e., online dictionaries, 
thesauri, and other knowledge sources) become read- 
ily available to NLP researchers, automated acquisition 
has become increasingly attractive. Several researchers
have noted that the average time needed to
construct a lexical entry can be as much as 30 minutes
(see, e.g., (Neff and McCord, 1990; Copestake et al.,
1995; Walker and Amsler, 1986)). Given that we
are aiming for large-scale lexicons of 20-60,000 words, 
automation of the acquisition process has become a ne- 
cessity. 
Previous research in automatic acquisition focuses
primarily on the use of statistical techniques, such as 
bilingual alignment (Church and Hanks, 1990; Kla- 
vans and Tzoukermann, 1996; Wu and Xia, 1995), or 
extraction of syntactic constructions from online dictionaries
and corpora (Brent, 1993; Dorr, Garman,
and Weinberg, 1995). Others who have taken a more 
knowledge-based (interlingual) approach (Lonsdale, Mi- 
tamura, and Nyberg, 1996) do not provide a means 
for systematically deriving the relation between sur- 
face syntactic structures and their underlying semantic 
representations. Those who have taken more argument 
structures into account, e.g., (Copestake et al., 1995), 
do not take full advantage of the systematic relation be- 
tween syntax and semantics during lexical acquisition. 
We adopt the central thesis of Levin (1993), i.e., that 
the semantic class of a verb and its syntactic behav- 
ior are predictably related. We base our work on a 
correlation between semantic classes and patterns of 
grammar codes in the Longman's Dictionary of Con- 
temporary English (LDOCE) (Procter, 1978). While 
the LDOCE has been used previously in automatic extraction
tasks (Alshawi, 1989; Farwell, Guthrie, and
Wilks, 1993; Boguraev and Briscoe, 1989; Wilks et al.,
1989; Wilks et al., 1990), these tasks are primarily concerned
with the extraction of other types of information,
including syntactic phrase structure and broad argument
restrictions, or with the derivation of semantic
structures from definition analyses. The work of Sanfilippo
and Poznanski (1992) is more closely related to
our approach in that it attempts to recover a syntactic-semantic
relation from machine-readable dictionaries.
However, they claim that the semantic classification of
verbs based on standard machine-readable dictionaries
(e.g., the LDOCE) is "a hopeless pursuit [since] standard
dictionaries are simply not equipped to offer this
kind of information with consistency and exhaustiveness."
Others have also argued that the task of simplifying
lexical entries on the basis of broad semantic class
membership is complex and, perhaps, infeasible (see,
e.g., Boguraev and Briscoe (1989)). However, a number
of researchers (Fillmore, 1968; Grimshaw, 1990; Gruber,
1965; Guthrie et al., 1991; Hearst, 1991; Jackendoff,
1983; Jackendoff, 1990; Levin, 1993; Pinker, 1989;
Yarowsky, 1992) have demonstrated conclusively that
there is a clear relationship between syntactic context 
and word senses; it is our aim to exploit this relationship 
for the acquisition of semantic lexicons. 
3 Syntax-Semantics Relation: Verb
Classification Based on Syntactic 
Behavior 
The central thesis of (Levin, 1993) is that the seman- 
tics of a verb and its syntactic behavior are predictably 
related. As a demonstration that such predictable rela- 
tionships are not confined to an insignificant portion of 
the vocabulary, Levin surveys 4183 verbs, grouped into 
191 semantic classes in Part Two of her book. The syn- 
tactic behavior of these classes is illustrated with 1668 
example sentences, an average of 8 sentences per class.
Given the scope of Levin's work, it is not easy to
verify the central thesis. To this end, we created a
database of Levin's verb classes and example sentences
from each class, and wrote a parser to extract basic syntactic
patterns from the sentences.1 We then characterized
each semantic class by a set of syntactic patterns,
which we call a syntactic signature, and used the resulting
database as the basis of two experiments, both
designed to discover whether the syntactic signatures
tell us anything about the meaning of the verbs.2 The
first experiment, which we label Verb-Based, ignores 
word-sense distinctions by assigning one syntactic sig- 
nature to each verb, regardless of whether it occurred 
in multiple classes. The second experiment, which we 
label Class-Based, implicitly takes word-sense distinc- 
tions into account by considering each occurrence of a 
verb individually and assigning it a single syntactic sig- 
nature according to class membership. 
The remainder of this section describes the assignment
of signatures to semantic classes and the two experiments
for determining the relation of syntactic information
to semantic classes. We will see that our classification
technique shows a 15-fold improvement in the
experiment where we implicitly account for word-sense 
distinctions. 
1Both the database and the parser are encoded in Quin- 
tus Prolog. 
2The design of this experiment is inspired by the work 
of (Dubois and Saint-Dizier, 1995). In particular, we depart 
from the alternation-based data in (Levin, 1993), which is 
primarily binary in that sentences are presented in pairs 
which constitute an alternation. Following Saint-Dizier's 
work, we construct N-ary syntactic characterizations. The 
choice is of no empirical consequence, but it simplifies the
experiment by eliminating the problem of naming the syn- 
tactic patterns. 
Verbs: break, chip, crack, crash, crush, fracture, rip,
shatter, smash, snap, splinter, split, tear
Example Sentences:
Crystal vases break easily.
The hammer broke the window.
The window broke.
Tony broke her arm.
Tony broke his finger.
Tony broke the crystal vase.
Tony broke the cup against the wall.
Tony broke the glass to pieces.
Tony broke the piggy bank open.
Tony broke the window with a hammer.
Tony broke the window.
*Tony broke at the window.
*Tony broke herself on the arm.
*Tony broke himself.
*Tony broke the wall with the cup.
A break.
Derived Syntactic Signature:
1-[np,v] 1-[np,v,np] 1-[np,v,np,adjective]
1-[np,v,np,pp(against)] 1-[np,v,np,pp(to)]
1-[np,v,np,pp(with)] 1-[np,v,poss,np]
1-[np,v,adv(easily)] 1-[n]
0-[np,v,np,pp(with)] 0-[np,v,self]
0-[np,v,self,pp(on)] 0-[np,v,pp(at)]
Table 1: Syntactic Signature for Change of State break
subclass
3.1 Assignment of Signatures
For the first experiment below, we construct a verb-based
syntactic signature, while for the second experiment,
we construct a class-based signature.
The first step for constructing a signature is to
decide what syntactic information to extract for the
basic syntactic patterns that make up the signature.
It turns out that a very simple strategy works well,
namely, flat parses that contain lists of the major categories
in the sentence, the verb, and a handful of
other elements. The "parse", then, for the sentence
Tony broke the crystal vase is simply the syntactic
pattern [np,v,np]. For Tony broke the vase to
pieces we get [np,v,np,pp(to)]. Note that the pp
node is marked with its head preposition. Table 1 shows
an example class, the break subclass of the Change of 
State verbs (45.1), along with example sentences and 
the derived syntactic signature based on sentence pat- 
terns. Positive example sentences are denoted by the 
number 1 in the sentence patterns and negative example 
sentences are denoted by the number 0 (corresponding 
to sentences marked with a *). 
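As a rough illustration of how a signature can be assembled from flat parses, the sketch below hand-transcribes the patterns of Table 1 and collapses them into a set of labeled frames. The tuple encoding, the name `build_signature`, and the `("n",)` pattern for "A break." are assumptions made for this example; the original system was written in Quintus Prolog and is not reproduced here.

```python
# Flat parses for the example sentences of the break subclass, transcribed by
# hand from Table 1. Each entry is (polarity, pattern): polarity 1 marks a
# grammatical example, 0 a starred one. The ("n",) entry is an assumed reading.
BREAK_PARSES = [
    (1, ("np", "v", "adv(easily)")),        # Crystal vases break easily.
    (1, ("np", "v", "np")),                 # The hammer broke the window.
    (1, ("np", "v")),                       # The window broke.
    (1, ("np", "v", "poss", "np")),         # Tony broke her arm.
    (1, ("np", "v", "poss", "np")),         # Tony broke his finger.
    (1, ("np", "v", "np")),                 # Tony broke the crystal vase.
    (1, ("np", "v", "np", "pp(against)")),  # ... the cup against the wall.
    (1, ("np", "v", "np", "pp(to)")),       # ... the glass to pieces.
    (1, ("np", "v", "np", "adjective")),    # ... the piggy bank open.
    (1, ("np", "v", "np", "pp(with)")),     # ... the window with a hammer.
    (1, ("np", "v", "np")),                 # Tony broke the window.
    (0, ("np", "v", "pp(at)")),             # *Tony broke at the window.
    (0, ("np", "v", "self", "pp(on)")),     # *Tony broke herself on the arm.
    (0, ("np", "v", "self")),               # *Tony broke himself.
    (0, ("np", "v", "np", "pp(with)")),     # *Tony broke the wall with the cup.
    (1, ("n",)),                            # A break.
]


def build_signature(parses):
    """Collapse per-sentence parses into a set of distinct labeled patterns."""
    return frozenset(parses)


signature = build_signature(BREAK_PARSES)
print(len(BREAK_PARSES), "sentences ->", len(signature), "distinct frames")
```

Using a frozen set makes signatures hashable, so they can later be compared for equality or used as dictionary keys when grouping classes or verbs. Note that the same pattern can appear with both polarities, as `[np,v,np,pp(with)]` does here.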
3.2 Experiment 1: Verb-based Approach 
In the first experiment, we ignored word sense distinc- 
tions and considered each verb only once, regardless of 
whether it occurred in multiple classes. In fact, 46%
of the verbs appear more than once. In some cases, 
the verb appears to have a related sense even though it 
appears in different classes. For example, the verb roll 
appears in two subclasses of Manner of Motion Verbs 
that are distinguished on the basis of whether the gram- 
matical subject is animate or inanimate. In other cases, 
the verb may have (largely) unrelated senses. For example,
the verb move is both a Manner of Motion verb
and a verb of Psychological State.
To compose the syntactic signatures for each verb, 
we collect all of the syntactic patterns associated with 
every class a particular verb appears in, regardless of whether
the different classes are semantically related. A syntactic
signature for a verb, by definition, is the union of the 
frames extracted from every example sentence for each 
verb. The outline of the verb-based experiment is as 
follows: 
1. Automatically extract syntactic information from the 
example sentences. 
2. Group the verbs according to their syntactic signature. 
3. Determine where the two ways of grouping verbs over- 
lap: 
(a) the semantic classification given by Levin. 
(b) the syntactic classification based on the derived
syntactic signatures. 
To return to the Change of State verbs, we now con- 
sider the syntactic signature of the verb break, rather 
than the signature of the semantic class as a unit. The 
verb break belongs not only to the Change of State 
class, but also four other classes: 10.6 Cheat, 23.2 Split, 
40.8.3 Hurl, and 48.1.1 Appear. Each of these classes is 
characterized syntactically with a set of sentences. The 
union of the syntactic patterns corresponding to these 
sentences forms the syntactic signature for the verb. So 
although the signature for the Change of State class has 
13 frames, the verb break has 39 frames from the other 
classes it appears in. 
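The union step described above can be sketched as follows. The class identifiers, membership lists, and frames are placeholder toy data, not Levin's actual classes, and `verb_signatures` is a hypothetical helper name.

```python
from collections import defaultdict


def verb_signatures(class_members, class_signatures):
    """Give each verb the union of the signatures of every class it is in."""
    merged = defaultdict(set)
    for class_id, verbs in class_members.items():
        for verb in verbs:
            merged[verb] |= class_signatures[class_id]
    return {verb: frozenset(frames) for verb, frames in merged.items()}


# Toy data: class ids, members, and frames are placeholders for illustration.
CLASS_MEMBERS = {
    "class_a": {"break", "chip"},
    "class_b": {"break", "hurl"},
}
CLASS_SIGNATURES = {
    "class_a": {(1, ("np", "v")), (1, ("np", "v", "np"))},
    "class_b": {(1, ("np", "v", "np")), (1, ("np", "v", "np", "pp(at)"))},
}

SIGNATURES = verb_signatures(CLASS_MEMBERS, CLASS_SIGNATURES)
print(sorted(SIGNATURES["break"]))
```

In this toy setting the ambiguous verb ends up with more frames than either of its classes, which is the effect that blurs the verb-based signatures.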
Conceptually, it is helpful to consider the difference 
between the intension of a function versus its exten- 
sion. In this case, we are interested in the functions 
that group the verbs syntactically and semantically. In- 
tensionally speaking, the definition of the function that 
groups verbs semantically would have something to do 
with the actual meaning of the verbs.3 Likewise, the intension
of the function that groups verbs syntactically
would be defined in terms of something strictly syntactic,
such as subcategorization frames. But the intensions
of these functions are matters of significant theoretical
investigation, and although much has been accomplished
in this area, the question of mapping syntax
to semantics and vice versa is an open research topic. 
Therefore, we can turn to the extensions of the func- 
tions: the actual groupings of verbs, based on these two 
separate criteria. The semantic extensions are sets of 
verb tokens, and likewise, the syntactic extensions are 
sets of verb tokens. To the extent that these functions 
map between syntax and semantics intensionally, they 
will pick out the same verbs extensionally. 
So for the verb-based experiment, our technique for 
establishing the relatedness between the syntactic signatures
and the semantic classes is mediated by the verbs
themselves. We compare the two orthogonal groupings 
of the inventory of verbs: the semantic classes defined 
by Levin and the sets of verbs that correspond to each 
of the derived syntactic signatures. When these two 
groupings overlap, we have discovered a mapping from 
the syntax of the verbs to their semantics, via the verb 
tokens. More specifically, we define the overlap index 
3An example of the intensional characterization of the 
Levin classes are the definitions of Lexical Conceptual Struc- 
tures which correspond to each of Levin's semantic classes. 
See (Dorr and Voss, to appear). 
as the number of overlapping verbs divided by the av- 
erage of the number of verbs in the semantic class and 
the number of verbs in the syntactic signature. Thus an 
overlap index of 1.00 is a complete overlap and an over- 
lap of 0 is completely disjoint. In this experiment, the 
sets of verbs with a high overlap index are of interest. 
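The overlap index defined above translates directly into code. This is a minimal sketch assuming both groupings are represented as Python sets of verbs; the function name and the example sets are illustrative only.

```python
def overlap_index(semantic_class, signature_group):
    """Shared verbs divided by the average size of the two verb sets."""
    shared = len(semantic_class & signature_group)
    average_size = (len(semantic_class) + len(signature_group)) / 2
    # Two empty sets share nothing; treat that case as no overlap.
    return shared / average_size if average_size else 0.0


same = overlap_index({"break", "chip"}, {"break", "chip"})
disjoint = overlap_index({"break", "chip"}, {"roll", "move"})
partial = overlap_index({"break", "chip", "crack", "tear"}, {"crack", "tear"})
print(same, disjoint, round(partial, 2))
```

In the partial case two verbs are shared and the set sizes are 4 and 2, so the index is 2 / 3.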
When we parsed the 1668 example sentences in Part 
Two of Levin's book (including the negative examples), 
these sentences reduce to 282 unique patterns. The 191 
sets of sentences listed with each of the 191 semantic 
classes in turn reduces to 748 distinct syntactic signa- 
tures. Since there are far more syntactic signatures than 
the 191 semantic classes, it is clear that the mapping 
between signatures and semantic classes is not direct.
Only 12 mappings have complete overlaps. That means 
6.3% of the 191 semantic classes have a complete over- 
lap with a syntactic signature. 
The results of this experiment are shown in Table 2. 
Three values are shown for each of the six variations in 
the experiment: the mean overlap, the median overlap, 
and the percentage of perfect overlaps (overlaps of value 
1.00). In every case, the mean is higher than the
median. Put another way, there is always a cluster of
good overlaps, but the general tendency is to have fairly 
poor overlaps. 
The six variations of the experiment are as follows. 
The first distinction is whether or not to count the neg- 
ative evidence. We note that the use of negative exam- 
ples, i.e., plausible uses of the verb in contexts which 
are disallowed, was a key component of this experi- 
ment. There are 1082 positive examples and 586 nega- 
tive examples. Although this evidence is useful, it is not 
available in dictionaries, corpora, or other convenient 
resources that could be used to extend Levin's classi- 
fication. Thus, to extend our approach to novel word 
senses (i.e., words not occurring in Levin), we would 
not be able to use negative evidence. For this reason, 
we felt it necessary to determine the importance of nega- 
tive evidence for building uniquely identifying syntactic 
signatures. As one might expect, throwing out the neg- 
ative evidence degrades the usefulness of the signatures 
across the board. The results which had the negative 
evidence are shown in the left-hand column of numbers 
in Table 2, and the results which had only positive evi- 
dence are shown in the right-hand side. 
The second, three-way, distinction involves preposi- 
tions, and breaks the two previous distinctions involv- 
ing negative evidence into three sub-cases. Because we 
were interested in the role of prepositions in the sig- 
natures, we also ran the experiment with two different 
parse types: ones that ignored the actual prepositions 
in the pp's, and ones that ignored all information except 
for the values of the prepositions. Interestingly, we still 
got useful results with these impoverished parses, al- 
though fewer semantic classes had uniquely-identifying 
syntactic signatures under these conditions. These re- 
sults are shown in the three major rows of Table 2. 
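The three treatments of prepositions can be pictured as three views of the same flat parse. The sketch below is one possible reading, assuming patterns are tuples of category strings with PPs written as `pp(prep)`; the function names are hypothetical, and "only prepositions" is interpreted here as keeping just the preposition values in order.

```python
import re

# Matches a preposition-marked pp node such as "pp(with)".
PP_PATTERN = re.compile(r"pp\((\w+)\)")


def marked_prepositions(pattern):
    """Keep the parse as is, with each pp marked by its head preposition."""
    return tuple(pattern)


def ignored_prepositions(pattern):
    """Replace every pp(prep) node by a bare pp node."""
    return tuple("pp" if PP_PATTERN.fullmatch(cat) else cat for cat in pattern)


def only_prepositions(pattern):
    """Discard everything except the preposition values."""
    preps = []
    for cat in pattern:
        match = PP_PATTERN.fullmatch(cat)
        if match:
            preps.append(match.group(1))
    return tuple(preps)


parse = ("np", "v", "np", "pp(with)")
print(marked_prepositions(parse))
print(ignored_prepositions(parse))
print(only_prepositions(parse))
```

Both impoverished views map distinct marked patterns onto the same value, which is one way to see why fewer classes keep uniquely identifying signatures under those conditions.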
The best result, using both positive and negative evidence
to identify semantic classes, gives 6.3% of the
semantic classes having perfect overlaps with
syntactic signatures. See Table 2 for the full results.
3.3 Experiment 2: Class-based Approach
In this experiment, we attempt to discover whether each
class-based syntactic signature uniquely identifies a single
semantic class. By focusing on the classes, the verbs
are implicitly disambiguated: the word sense is by definition
the sense of the verb as a member of a given
class. To compare these signatures with the previous
verb-based signatures, it may be helpful to note that
a verb-based signature is the union of all of the class-based
signatures of the semantic classes that the verb
appears in.
Verb-based Experiment (No Disambiguation)
                                With Negative  No Negative
                      Overlap   Evidence       Evidence
Marked Prepositions   Median    0.10           0.09
                      Mean      0.17           0.17
                      Perfect   6.3%           5.2%
Ignored Prepositions  Median    0.10           0.09
                      Mean      0.17           0.16
                      Perfect   6.3%           4.2%
Only Prepositions     Median    0.10           0.09
                      Mean      0.16           0.15
                      Perfect   3.1%           3.1%
Table 2: Verb-Based Results
Class-based Experiment (Disambiguated Verbs)
                                With Negative  No Negative
                      Overlap   Evidence       Evidence
Marked Prepositions   Median    1.00           1.00
                      Mean      0.99           0.93
                      Perfect   97.9%          88.0%
Ignored Prepositions  Median    1.00           1.00
                      Mean      0.96           0.69
                      Perfect   87.4%          52.4%
Only Prepositions     Median    1.00           0.54
                      Mean      0.82           0.57
                      Perfect   66.5%          42.9%
Table 3: Class-Based Results
The outline for this class-based experiment is as follows:
1. Automatically extract syntactic information from the
example sentences to yield the syntactic signature for
the class.
2. Determine which semantic classes have uniquely-
identifying syntactic signatures.
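Step 2 of this outline amounts to counting how many classes share each signature. The sketch below assumes signatures are hashable (for example frozen sets of frames). The data is synthetic: it only mirrors the counts reported for this experiment (191 classes, of which four share two signatures pairwise) and says nothing about which of Levin's classes are involved.

```python
from collections import Counter


def uniquely_identified(class_signatures):
    """Return the classes whose signature is shared with no other class."""
    counts = Counter(class_signatures.values())
    return {cls for cls, sig in class_signatures.items() if counts[sig] == 1}


# Synthetic placeholder data: 187 classes with distinct signatures plus two
# pairs of classes that share a signature.
signatures = {f"class_{i}": frozenset({("frame", i)}) for i in range(187)}
signatures.update({
    "class_187": frozenset({("shared", "x")}),
    "class_188": frozenset({("shared", "x")}),
    "class_189": frozenset({("shared", "y")}),
    "class_190": frozenset({("shared", "y")}),
})

unique = uniquely_identified(signatures)
distinct_signatures = len(set(signatures.values()))
percent_unique = round(100 * len(unique) / len(signatures), 1)
print(distinct_signatures, len(unique), percent_unique)
```

With this arrangement there are 189 distinct signatures, 187 of which pick out exactly one class.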
If we use the class-based syntactic signatures containing
preposition-marked pp's and both positive and negative
evidence, the 1668 example sentences reduce to
282 syntactic patterns, just as before. But now there
are 189 class-based syntactic signatures, as compared
with 748 verb-based signatures from before. 187 of them
uniquely identify a semantic class, meaning that 97.9%
of the classes have uniquely identifying syntactic signatures.
Four of the semantic classes do not have enough
syntactic information to distinguish them uniquely.4
Although the effects of the various distinctions were
present in the verb-based experiment, these effects are
much clearer in the class-based experiments. The effects
of negative and positive evidence, as well as the three
ways of handling prepositions, show up much more clearly
here, as Table 4 shows.
In the class-based experiment, we counted the percentage
of semantic classes that had uniquely identifying
signatures. In the verb-based experiment, we
counted the number of perfect overlaps (i.e., index of
1.00) between the verbs as grouped in the semantic
classes and grouped by syntactic signature. The overall
results of the suite of experiments, illustrating the
role of disambiguation, negative evidence, and prepositions,
are shown in Table 4. There were three ways of
treating prepositions: (i) mark the pp with the preposition,
(ii) ignore the preposition, and (iii) keep only
the prepositions. For these different strategies, we see
the percentage of perfect overlaps, as well as both the
4Two of these classes correspond to one of the two non-unique
signatures, and two correspond to the other non-unique
signature.
median and mean overlap ratios for each experiment. 
These data show that the most important factor in the
experiments is word-sense disambiguation. 
                          With Negative  No Negative
                          Evidence       Evidence
No Disambiguation
  Marked Prepositions     6.3%           5.2%
  Ignored Prepositions    6.3%           4.2%
  Only Prepositions       3.1%           3.1%
With Disambiguation
  Marked Prepositions     97.9%          88.0%
  Ignored Prepositions    87.4%          52.4%
  Only Prepositions       66.5%          42.9%
Table 4: Overall Results
4 Semantic Classification of Novel 
Words 
As we saw above, word sense disambiguation is critical
to the success of any lexical acquisition algorithm. The
Levin-based verbs are already disambiguated by virtue
of their membership in different classes. The difficulty,
then, is to disambiguate and classify verbs that do not
occur in Levin. Our current direction is to make use
of the results of the first two experiments, i.e., the relation
between syntactic patterns and semantic classes,
but to use two additional techniques for disambiguation
and classification of non-Levin verbs: (1) extraction of
synonym sets provided in WordNet (Miller, 1985), an
online lexical database containing thesaurus-like relations
such as synonymy; and (2) selection of appropriate
synonyms based on correlations between syntactic
information in Longman's Dictionary of Contemporary
English (LDOCE) (Procter, 1978) and semantic classes
in Levin. The basic idea is to first determine the most
likely candidates for semantic classification of a verb by
examining the verb's synonym sets, many of which intersect
directly with the verbs classified by Levin. The
"closest" synonyms are then selected from these sets by
comparing the LDOCE grammar codes of the unknown
word with those associated with each synonym candidate.
The use of LDOCE as a syntactic filter on the
semantics derived from WordNet is the key to resolving
word-sense ambiguity during the acquisition process.
The full acquisition algorithm is as follows:
Given a verb, check Levin class.
1. If in Levin, classify directly.
2. If not in Levin, find synonym set from WordNet.
(a) If synonym in Levin, select the class that
has the closest match with canonical LDOCE
codes.
(b) If no synonyms in Levin or canonical LDOCE
codes are completely mismatched, hypothesize
new class.
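The control flow of this algorithm can be sketched as below. Everything here is illustrative: the lookup tables are plain dictionaries standing in for Levin's index, WordNet synonym sets, and LDOCE grammar codes; the verb and class names are placeholders; and "closest match" is assumed to mean the largest intersection between the verb's LDOCE codes and a class's canonical codes, which the text does not spell out.

```python
def classify_verb(verb, levin_classes, synonyms, ldoce_codes, canonical_codes):
    """Return (status, classes) following steps 1, 2(a), and 2(b)."""
    # Step 1: the verb is already in Levin, so classify directly.
    if verb in levin_classes:
        return "known", sorted(levin_classes[verb])

    # Step 2: look at the classes of the verb's synonyms.
    verb_codes = ldoce_codes.get(verb, set())
    best_class, best_score = None, 0
    for synonym in sorted(synonyms.get(verb, ())):
        for class_id in sorted(levin_classes.get(synonym, ())):
            score = len(verb_codes & canonical_codes.get(class_id, set()))
            if score > best_score:
                best_class, best_score = class_id, score

    # Step 2(b): no synonym in Levin, or no shared codes at all.
    if best_class is None:
        return "new", []
    # Step 2(a): the class whose canonical codes match best.
    return "matched", [best_class]


# Placeholder tables for illustration only.
LEVIN = {"known_verb": {"C1"}, "syn_a": {"C1"}, "syn_b": {"C2"}}
SYNONYMS = {"new_verb": {"syn_a", "syn_b"}, "odd_verb": {"syn_a"}}
LDOCE = {"new_verb": {"T1", "T3"}, "odd_verb": {"L9"}}
CANONICAL = {"C1": {"I"}, "C2": {"T1", "T3", "N"}}

for v in ("known_verb", "new_verb", "odd_verb", "lonely_verb"):
    print(v, classify_verb(v, LEVIN, SYNONYMS, LDOCE, CANONICAL))
```

Iterating over sorted synonyms and classes keeps the result deterministic when two classes tie; a real system would likely need a more principled tie-breaking rule.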
Note that this algorithm assumes that there is a
"canonical" set of LDOCE codes for each of Levin's
semantic classes. Table 5 describes the significance of
a subset of the syntactic codes in LDOCE. (The total
number of codes is 174.) We have developed a relation
between LDOCE codes and Levin classes, in much the
same way that we associated syntactic signatures with
the semantic classes in the earlier experiments. These
canonical codes are used for syntactic filtering (checking for
the closest match) in the classification algorithm.
As an example of how the word-sense disambiguation
and classification process works, consider the non-Levin
verb attempt. The LDOCE specification for this verb
is: T1 T3 T4 WV5 N. Using the synonymy feature
of WordNet, the algorithm automatically extracts the
candidate classes associated with the synonyms of this
word: (1) Class 29.6 "Masquerade Verbs" (act), (2)
Class 29.8 "Captain Verbs" (pioneer), (3) Class 31.1
"Amuse Verbs" (try), (4) Class 35.6 "Ferret Verbs" (seek),
and (5) Class 55.2 "Complete Verbs" (initiate).
The synonyms for each of these classes have the following
LDOCE encodings, respectively: (1) I I-FOR I-ON
I-UPON L1 L9 T1 N; (2) L9 T1 N; (3) I T1 T3 T4 WV4
N; (4) I I-AFTER I-FOR T1 T3; and (5) T1 T1-INTO
N. The largest intersection with the syntactic codes for
attempt occurs with the verb try (T1 T3 T4 N). However,
Levin's class 31.1 is not the correct class for attempt
since this sense of try has a "negative amuse"
meaning (e.g., John's behavior tried my patience). In
fact, the codes T1 T3 T4 are not part of the canonical
class-code mapping associated with class 31.1. Thus, attempt
falls under case 2(b) of the algorithm, and a new
class is hypothesized. This is a case where word-sense
disambiguation has allowed us to classify a new word
and to enhance Levin's verb classification by adding a
new class to the word try as well. In our experiments,
our algorithm found several additional non-Levin verbs
that fell into this newly hypothesized class, including
aspire, attempt, dare, decide, desire, elect, need, and
swear.
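The code comparison in this example is a plain set intersection. The sketch below copies the LDOCE codes from the worked example (the intransitive codes for seek are partly hard to read in the source and do not affect the result); the variable names are made up for illustration.

```python
# LDOCE codes for the unknown verb and for its synonym candidates, copied
# from the worked example for attempt.
ATTEMPT = {"T1", "T3", "T4", "WV5", "N"}
SYNONYM_CODES = {
    "act": {"I", "I-FOR", "I-ON", "I-UPON", "L1", "L9", "T1", "N"},
    "pioneer": {"L9", "T1", "N"},
    "try": {"I", "T1", "T3", "T4", "WV4", "N"},
    "seek": {"I", "I-AFTER", "I-FOR", "T1", "T3"},
    "initiate": {"T1", "T1-INTO", "N"},
}

# Codes each synonym shares with attempt, and the synonym sharing the most.
shared = {verb: ATTEMPT & codes for verb, codes in SYNONYM_CODES.items()}
best = max(shared, key=lambda verb: len(shared[verb]))

for verb, codes in shared.items():
    print(verb, sorted(codes))
print("largest intersection:", best)
```

The intersection only proposes try as the closest synonym; the subsequent check against the canonical codes of class 31.1 is what leads to a new class being hypothesized.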
We have automatically classified 10,000 "unknown"
verbs, i.e., those not occurring in the Levin classification,
using this technique. These verbs are taken from
English "glosses" (i.e., translations) provided in bilingual
dictionaries for Spanish and Arabic.5 As a preliminary
measure of success, we picked out 84 LDOCE
control vocabulary verbs (i.e., primitive words used for
defining dictionary entries) and hand-checked our results.
We found that 69 verbs were classified correctly,
5The Spanish-English dictionary was built at the University
of Maryland; the Arabic-English dictionary was produced
by Alpnet, a company in Utah that develops translation
aids. We are also in the process of developing bilingual
dictionaries for Korean and French, and we will be porting
our LCS acquisition technology to these languages in the
near future.
i.e., 82% accuracy. 
5 Summary 
We have conducted two experiments with the intent of
addressing the issue of word-sense ambiguity in extraction
from machine-readable resources for the construction
of large-scale knowledge sources. In the first experiment,
a verb that appeared in different classes collected
the syntactic information from each class it appeared
in. Therefore, the syntactic signature was composed
from all of the example sentences from every class the
verb appeared in. In some cases, the verb senses were semantically
unrelated and consequently the mapping from
syntax to semantics was muddied. The second experiment
attempted to determine a relationship between
a semantic class and the syntactic information associated
with each class. Not surprisingly, but not insignificantly,
this relationship was very clear, since this experiment
avoided the problem of word sense ambiguity.
These experiments served to validate Levin's claim that
verb semantics and syntactic behavior are predictably
related and also demonstrated that a significant component
of any lexical acquisition program is the ability
to perform word-sense disambiguation.
We have used the results of our first two experiments 
to help in constructing and augmenting online dictio- 
naries for novel verb senses. We have used the same 
syntactic signatures to categorize new verbs into Levin's
classes on the basis of WordNet and LDOCE. We are
currently porting these results to new languages using 
online bilingual lexicons. 
Acknowledgements 
The research reported herein was supported, in part, 
by Army Research Office contract DAAL03-91-C-0034
through Battelle Corporation, NSF NYI IRI-9357731,
Alfred P. Sloan Research Fellow Award BR3336, and a 
General Research Board Semester Award. 
References 
Alshawi, H. 1989. Analysing the Dictionary Definitions.
In B. Boguraev and T. Briscoe, editors, Computational
Lexicography for Natural Language Processing.
Longman, London, pages 153-169.
Boguraev, B. and T. Briscoe. 1989. Utilising the
LDOCE Grammar Codes. In B. Boguraev and T.
Briscoe, editors, Computational Lexicography for Natural
Language Processing. Longman, London, pages
85-116.
Brent, M. 1993. Unsupervised Learning of Lexical Syn- 
tax. Computational Linguistics, 19:243-262. 
Church, K. and P. Hanks. 1990. Word Association
Norms, Mutual Information and Lexicography. Computational
Linguistics, 16:22-29.
Copestake, A., T. Briscoe, P. Vossen, A. Ageno,
I. Castellon, F. Ribas, G. Rigau, H. Rodríguez, and
A. Samiotou. 1995. Acquisition of Lexical Translation
Relations from MRDS. Machine Translation, 9.
Dorr, B., J. Garman, and A. Weinberg. 1995. From
Syntactic Encodings to Thematic Roles: Building
Lexical Entries for Interlingual MT. Machine Translation,
9.
LDOCE Code  Arguments        Adjuncts   Example
I                                       Olivier is acting tonight
I-AFTER                      PP[after]  She sought after the truth
I-FOR                        PP[for]    They sought for the right one
I-ON                         PP[on]     He acted on our suggestion
I-UPON                                  The drug acted upon the pain
L1          NP                          He acts the experienced man
L9          ADV/PP                      The play acts well
T1          NP                          I pioneered the new land
T1-INTO     NP                          We initiated him into the group
T3                                      He tried to do it
T4                                      She tried eating the new food
WV4         -ing adjectival             I've had a trying day
WV5         -ed adjectival              He was convicted for attempted murder
Table 5: Sample Syntactic Codes used in LDOCE
Dorr, B. and C. Voss. to appear. A Multi-Level Approach
to Interlingual MT: Defining the Interface
between Representational Languages. International
Journal of Expert Systems.
Dubois, D. and P. Saint-Dizier. 1995. Construction et
représentation de classes sémantiques de verbes: une
coopération entre syntaxe et cognition. manuscript,
IRIT-CNRS, Toulouse, France.
Farwell, D., L. Guthrie, and Y. Wilks. 1993. Automatically
Creating Lexical Entries for ULTRA, a Multilingual
MT System. Machine Translation, 8(3).
Fillmore, C.J. 1968. The Case for Case. In E. Bach and
R.T. Harms, editors, Universals in Linguistic Theory.
Holt, Rinehart, and Winston, pages 1-88.
Grimshaw, J. 1990. Argument Structure. MIT Press,
Cambridge, MA.
Gruber, J.S. 1965. Studies in Lexical Relations. Ph.D.
thesis, MIT, Cambridge, MA.
Guthrie, J., L. Guthrie, Y. Wilks, and H. Aidinejad.
1991. Subject-Dependent Co-occurrence and Word
Sense Disambiguation. In Proceedings of the 29th Annual
Meeting of the Association for Computational
Linguistics, pages 146-152, University of California,
Berkeley, CA.
Hearst, M. 1991. Noun Homograph Disambiguation
Using Local Context in Large Text Corpora. In Using
Corpora, University of Waterloo, Waterloo, Ontario.
Jackendoff, R. 1983. Semantics and Cognition. MIT
Press, Cambridge, MA.
Jackendoff, R. 1990. Semantic Structures. MIT Press,
Cambridge, MA.
Klavans, J.L. and E. Tzoukermann. 1996. Dictionaries
and Corpora: Combining Corpus and Machine-readable
Dictionary Data for Building Bilingual Lexicons.
Machine Translation, 10.
Levin, B. 1993. English Verb Classes and Alternations:
A Preliminary Investigation. Chicago, IL.
Lonsdale, D., T. Mitamura, and E. Nyberg. 1996. Acquisition
of Large Lexicons for Practical Knowledge-Based
MT. Machine Translation, 9.
Miller, G. 1985. WORDNET: A Dictionary Browser.
In Proceedings of the First International Conference
on Information in Data, University of Waterloo Centre
for the New OED, Waterloo, Ontario.
Neff, M. and M. McCord. 1990. Acquiring Lexical
Data from Machine-Readable Dictionary Resources
for Machine Translation. In Third International Conference
on Theoretical and Methodological Issues in
Machine Translation of Natural Languages (TMI-90),
Austin, Texas.
Pinker, S. 1989. Learnability and Cognition: The Acquisition
of Argument Structure. MIT Press, Cambridge,
MA.
Procter, P. 1978. Longman Dictionary of Contemporary
English. Longman, London.
Sanfilippo, A. and V. Poznanski. 1992. The Acquisition
of Lexical Knowledge from Combined Machine-Readable
Dictionary Resources. In Proceedings of
the Applied Natural Language Processing Conference,
pages 80-87, Trento, Italy.
Walker, D. and R. Amsler. 1986. The Use of Machine-readable
Dictionaries in Sublanguage Analysis. In
R. Grishman and R. Kittredge, editors, Analyzing
Language in Restricted Domains. Lawrence Erlbaum
Associates, Hillsdale, New Jersey, pages 69-83.
Wilks, Y., D. Fass, C.M. Guo, J.E. McDonald, and
T. Plate. 1990. Providing Machine Tractable Dictionary
Tools. Machine Translation, 5(2):99-154.
Wilks, Y., D. Fass, C.M. Guo, J.E. McDonald, T. Plate,
and B.M. Slator. 1989. A Tractable Machine Dictionary
as a Resource for Computational Semantics. In
B. Boguraev and T. Briscoe, editors, Computational
Lexicography for Natural Language Processing. Longman,
London, pages 85-116.
Wu, D. and X. Xia. 1995. Large-Scale Automatic Extraction
of an English-Chinese Translation Lexicon.
Machine Translation, 9.
Yarowsky, D. 1992. Word-Sense Disambiguation: Using
Statistical Models of Roget's Categories Trained
on Large Corpora. In Proceedings of the Fourteenth
International Conference on Computational Linguistics,
pages 454-460, Nantes, France.
