Large-Scale Acquisition of LCS-Based Lexicons 
for Foreign Language Tutoring 
Bonnie J. Dorr 
Department of Computer Science 
University of Maryland 
College Park, MD 20742, USA 
bonnie@cs.umd.edu
Abstract 
We focus on the problem of building large
repositories of lexical conceptual structure
(LCS) representations for verbs in multiple
languages. One of the main results
of this work is the definition of a relation
between broad semantic classes and
LCS meaning components. Our acquisition
program--LEXICALL--takes, as input,
the result of previous work on verb
classification and thematic grid tagging,
and outputs LCS representations for different
languages. These representations have
been ported into English, Arabic and Spanish
lexicons, each containing approximately
9000 verbs. We are currently using these
lexicons in operational foreign language
tutoring and machine translation systems.
1 Introduction 
A wide range of new capabilities in NLP applications 
such as foreign language tutoring (FLT) has been 
made possible by recent advances in lexical semantics
(Carrier and Randall, 1993; Dowty, 1991; Fillmore,
1968; Foley and Van Valin, 1984; Grimshaw,
1990; Gruber, 1965; Hale and Keyser, 1993; Jackendoff,
1983; Jackendoff, 1990; Jackendoff, 1996;
Levin, 1993; Levin and Rappaport Hovav, To appear;
Pesetsky, 1982; Pinker, 1989). Many of these
researchers adopt the hypothesis that verbs can be 
grouped into broad classes, each of which corresponds
to some combination of basic meaning components.
This is the basic premise underlying our
approach to multilingual lexicon construction. In
particular, we have organized verbs into broad semantic
classes and subsequently designed a set of
lexical conceptual structures (LCS) for each class.
These representations have been ported into English, 
Arabic, and Spanish lexicons, each containing ap- 
proximately 9000 verbs. 
An example of a NLP application for which these 
lexicons are currently in use is an operational foreign 
language tutoring (FLT) system called Military Lan- 
guage Tutor (MILT). This system provides a wide 
range of lessons for use in language training. One 
of the tutoring lessons, the MicroWorld Lesson (see 
Figure 1) requires the capability of the language- 
learner to state domain-specific actions in a variety 
of different ways. For example, the language-learner 
might command the agent (pictured at the left in
the graphical interface) to take the following action:
Walk to the table and pick up the document. The
same action should be taken if the user says: Go to
the table and remove the document, Retrieve the document
from the table, etc. The LCS representation
provides the capability to execute various forms of
the same command without hardcoding them as part
of the graphical interface.
In another tutoring lesson, Question-Answering, 
the student is asked to answer questions about a 
foreign language text that they have read. Their 
answer is converted into an LCS which is matched 
against a prestored LCS corresponding to an answer 
typed in by a human instructor (henceforth, called 
the "author"). The prestored LCS is an idealized 
form of the answer to a question, which can take one 
of many forms. Suppose, for example, the question 
posed to the user is: Where did Jack put the book?
(or ¿Adónde puso Jack el libro? in Spanish). The
author's answer, e.g., Jack put the book in the trash,
has been stored as an LCS by the tutoring system.
If the student types Jack threw the book in the trash,
or Jack moved the book from the table into the trash,
the system is able to match against the prestored
LCS and determine that all three of these responses 
are semantically appropriate. 
We have developed an acquisition program-- 
LEXICALL--that allows us to construct LCS-based 
lexicons for the FLT system. This program is de- 
signed to be used for multiple languages, and also for 
other NLP applications (e.g., machine translation). 
One of the main results of this work is the definition 
of a relation between broad semantic classes (based 
on work by Levin (1993)) and LCS meaning com- 
ponents. We build on previous work, where verbs 
were classified automatically (Dorr and Jones, 1996;
Dorr, To appear) and tagged with thematic grid information
(Dorr, Garman, and Weinberg, 1995). We
use these pre-assigned classes and thematic grids as
input to LEXICALL. The output is a set of LCS's
corresponding to individual verb entries in our lexicon.

Figure 1: MicroWorld Lesson in MILT

Previous research in automatic acquisition focuses 
primarily on the use of statistical techniques, such as 
bilingual alignment (Church and Hanks, 1990; Kla- 
vans and Tzoukermann, 1995; Wu and Xia, 1995) 
or extraction of syntactic constructions from on- 
line dictionaries and corpora (Brent, 1993). Others 
have taken a more knowledge-based (interlingual) 
approach (Lonsdale, Mitamura, and Nyberg, 1995). 
Still others (Copestake et al., 1995) use English-based
grammatical codes for acquisition of lexical
representations.
Our approach differs from these in that it exploits 
certain linguistic constraints that govern the relation
between a word's surface behavior and its corresponding
semantic class. We demonstrate that--
by assigning an LCS representation to each semantic
class--we can produce verb entries on a broad scale;
these, in turn, are ported into multiple languages.
We first show how the LCS is used in a FLT system. 
We then present an overview of the LCS acquisition 
process. Finally, we describe how LEXICALL con- 
structs entries for specific lexical items. 
2 Application of the LCS 
Representation to FLT 
One of the types of knowledge that must be cap- 
tured in FLT is linguistic knowledge at the level 
of the lexicon, which covers a wide range of infor- 
mation types such as verbal subcategorization for 
events (e.g., that a transitive verb such as hit occurs 
with an object noun phrase), featural information 
(e.g., that the direct object of a verb such as frighten
is animate), thematic information (e.g., that Mary is
the agent in Mary hit the ball), and lexical-semantic
information (e.g., spatial verbs such as throw are
conceptually distinct from verbs of possession such
as give). By modularizing the lexicon, we treat each 
information type separately, thus allowing us to vary 
the degree of dependence on each level so that we 
can address the question of how much knowledge is 
necessary for the success of the particular NLP ap- 
plication. 
This section describes the use of the LCS repre- 
sentation in a question-answering component of the 
MILT system (Sams, 1993; Weinberg et al., 1995).
As described above, the LCS representation is used 
as the basis of matching routines for assessing stu- 
dents' answers to free response questions about a 
short foreign language passage. In order to inform
the student whether a question has been answered
correctly, the author of the lesson must provide the
desired response in advance. The system parses and
semantically analyzes the author's response into a
corresponding LCS representation which is then prestored
in a database of possible responses. Once the
question answering lesson is activated, each of the
student's responses is parsed and semantically analyzed
into an LCS representation which is checked for
a match against the corresponding prestored LCS
representation. The student is then informed as to
whether the question has been answered correctly
depending on how closely the student's response
LCS matches the author's prestored LCS.

Table 1: Correspondence Between NLP Output and Tutor Feedback
System Prompt: Where did Jack put the book?

Student Answer                    Prestored Answer                  Matcher Output      Feedback
Jack threw the book in the trash  Jack threw the book in the trash  exact match         "That's right"
Jack put the book in the trash    Jack threw the book in the trash  missing MANNER      "How?"
Jack threw the book in the trash  Jack put the book in the trash    extra MANNER        "You're assuming things"
Jack is friendly                  Jack put the book in the trash    mismatch primitive  "Please reread"
Jack threw the book               Jack put the book in the trash    missing argument    "Where?"

Consider what happens in a lesson if the author 
has specified that a correct answer to the question 
¿Adónde puso Jack el libro? in Spanish is Jack tiró
el libro a la basura ('Jack threw out the book into
the trash'). This answer is processed by the system 
to produce the following LCS: 
(1) [Event CAUSE
      ([Thing JACK],
       [Event GOLoc
         ([Thing BOOK],
          [Path TOLoc
            ([Position ATLoc
              ([Thing BOOK], [Thing TRASH])])])],
       [Manner THROWINGLY])]
The LCS is stored by the tutor and then later 
matched against the student's answer. If the student
types Jack movió el libro de la mesa a la basura
('Jack moved the book from the table to the trash'),
the system must determine if these two match. The 
student's sentence is processed and the following 
LCS structure is produced: 
(2) [Event CAUSE
      ([Thing JACK],
       [Event GOLoc
         ([Thing BOOK],
          [Path TOLoc ([Position ATLoc
            ([Thing BOOK], [Thing TRASH])])],
          [Path FROMLoc ([Position ATLoc
            ([Thing BOOK], [Thing TABLE])])])])]
The matcher compares these two, and produces the 
following output: 
Missing: MANNER THROWINGLY 
Extra: FROM LOC 
The system identifies the student's response as a 
match with the prestored answer, but it also recog- 
nizes that there is one piece of missing information 
and one piece of extra information. 
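The comparison of (1) and (2) can be illustrated with a minimal Python sketch. It is not the MILT matcher: the tuple encoding, the names node and compare, and the choice to report only the root of an unmatched subtree are assumptions made for illustration. The sketch also assumes that the two top-level primitives already agree.

```python
# Illustrative encoding (an assumption, not the MILT format):
# each LCS node is (type, head, [children]).
def node(kind, head, *children):
    return (kind, head, list(children))


def label(n):
    return (n[0], n[1])


def compare(prestored, student):
    """Return (missing, extra) labels relative to the prestored LCS.

    Only the root of an unmatched subtree is reported, so an extra
    FROM path is listed once rather than node by node.
    """
    missing, extra = [], []
    p_kids = {label(c): c for c in prestored[2]}
    s_kids = {label(c): c for c in student[2]}
    for key, child in p_kids.items():
        if key in s_kids:
            sub_missing, sub_extra = compare(child, s_kids[key])
            missing += sub_missing
            extra += sub_extra
        else:
            missing.append(key)
    extra += [key for key in s_kids if key not in p_kids]
    return missing, extra


def at_loc(thing, place):
    return node("Position", "ATLoc", node("Thing", thing), node("Thing", place))


# (1): the author's prestored answer
author = node("Event", "CAUSE",
              node("Thing", "JACK"),
              node("Event", "GOLoc",
                   node("Thing", "BOOK"),
                   node("Path", "TOLoc", at_loc("BOOK", "TRASH"))),
              node("Manner", "THROWINGLY"))

# (2): the student's answer
student = node("Event", "CAUSE",
               node("Thing", "JACK"),
               node("Event", "GOLoc",
                    node("Thing", "BOOK"),
                    node("Path", "TOLoc", at_loc("BOOK", "TRASH")),
                    node("Path", "FROMLoc", at_loc("BOOK", "TABLE"))))

missing, extra = compare(author, student)
print("Missing:", missing)
print("Extra:", extra)
```

Under these assumptions the sketch reports the THROWINGLY manner as missing and the FROMLoc path as extra, which mirrors the matcher output shown above.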
The "Missing" and "Extra" output is internal to 
the NLP component of the Tutor, i.e., this is not 
the final response displayed to the student. The system
must convert this information into meaningful
feedback so that the student knows how to repair 
the answer that was originally given. For example, 
the instructor can program the tutor to notify the 
student about the omitted information in the form 
of a 'How' question, or it can choose to ignore it. 
The extra information is generally ignored, although 
it is recorded in case the instructor decides to pro- 
gram the system to notify the student about this 
as well. The full range of feedback is not presented 
here. Some possibilities are summarized (in English) 
in Table 1 (adapted from Holland (1994)). Note
that the main advantage of using the LCS is that it
allows the author to type in an answer that is general 
enough to match any number of additional answers. 
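As a rough illustration of how matcher output might be turned into feedback, the following Python sketch encodes the rows of Table 1 as a lookup. The matcher outputs and feedback strings come from Table 1; the names FEEDBACK and feedback_for, and the ignored parameter standing in for the instructor's choice to suppress a message, are hypothetical.

```python
# Matcher outputs and feedback strings are taken from Table 1; the
# dictionary and function names are hypothetical.
FEEDBACK = {
    "exact match": "That's right",
    "missing MANNER": "How?",
    "extra MANNER": "You're assuming things",
    "mismatch primitive": "Please reread",
    "missing argument": "Where?",
}


def feedback_for(matcher_output, ignored=()):
    """Return the tutor's message, or None when the instructor has
    chosen to ignore this kind of matcher output."""
    if matcher_output in ignored:
        return None
    return FEEDBACK.get(matcher_output)


print(feedback_for("missing MANNER"))
print(feedback_for("extra MANNER", ignored={"extra MANNER"}))
```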
3 Overview of LCS Acquisition 
We use Levin's publicly available online index 
(Levin, 1993) as a starting point for building LCS- 
based verb entries. 1 While this index provides a 
unique and extensive catalog of verb classes, it does 
not define the underlying meaning components of 
each class. One of the main contributions of our 
work is that it provides a relation between Levin's 
classes and meaning components as defined in the 
LCS representation. 
Table 2 shows three broad semantic categories and 
example verbs along with their associated LCS representations.
We have hand-constructed a database
containing 191 LCS templates, i.e., one for each verb
class in (Levin, 1993). In addition, we have generated
LCS templates for 26 additional classes that are
not included in Levin's system. Several of these cor- 
respond to verbs that take sentential complements 
(e.g., coerce). 
1 We focus on building entries for verbs; however,
we have approximately 30,000 non-verb entries per
language.

Table 2: Sample Templates Stored in the LCS Database

Category   Verb     Class   Grid                   LCS
Location   suspend  9.2     _ag_th,loc()           [CAUSE (X, [BELoc (Y, [ATLoc (Y, Z)])], [BY <MANNER>])]
           touch    47.8    _th_loc                [BELoc (Y, [ATLoc (Y, Z)], [BY <MANNER>])]
Motion     abandon  51.2    _th,src                [GOLoc (Y, [<DIRECTION>Loc (Y, [ATLoc (Y, Z)])])]
           float    51.3.1  _th,src(),goal()       [GOLoc (Y, [BY <MANNER>])]
Placement  adorn    9.8     _ag_th,mod-poss(with)  [CAUSE (X, [GOIdent (Y, [TOWARDIdent (Y, [ATIdent (Y,
                                                     [<STATE>Ident ([<WITH>Poss (*HEAD*, Z)])])])])])]
           spill    9.5     _ag_th                 [CAUSE (X, [GOLoc (Y)], [BY <MANNER>])]

A full entry in the database includes a semantic
class number with a list of possible verbs, a thematic 
grid, and a LCS template: 
(3) Class 47.8: adjoin, intersect, meet, touch, ...
Thematic Grid: _th_loc
LCS Template:
(be loc (thing 2)
(at loc (thing 2) (thing 11))
(!!-ingly 26))
The semantic class label 47.8 above is taken from 
Levin's 1993 book (Verbs of Contiguous Location), 
i.e., the class to which the verb touch has been 
assigned. 2 A verb, together with its semantic class,
uniquely identifies the word sense, or LCS template,
to which the verb refers. The thematic grid
(_th_loc) indicates that the verb has two obligatory 
arguments, a theme and a location. 3 The !! in the 
LCS Template acts as a wildcard; it will be filled by 
a lexeme (i.e., a root form of the verb). The resulting 
form is called a constant, i.e., the idiosyncratic part 
of the meaning that distinguishes among members 
of a verb class (in the spirit of (Grimshaw, 1993; 
Levin and Rappaport Hovav, To appear; Pinker, 
1989; Talmy, 1985)). 4 
Three inputs are required for acquisition of verb 
entries: a semantic class, a thematic grid, and 
a lexeme, which we will henceforth abbreviate as 
"class/grid/lexeme." The output is a Lisp-like ex- 
pression corresponding to the LCS representation. 
An example of input/output for our acquisition pro- 
cedure is shown here: 
(4) Acquisition of LCS for: touch
Input: 47.8: _th_loc; "touch"
Output:
(be loc (* thing 2)
(at loc (thing 2) (* thing 11))
(touchingly 26))

2 Verbs not occurring in Levin's book are also assigned
to classes using techniques described in (Dorr and Jones,
1996; Dorr, To appear).
3 An underscore (_) designates an obligatory role and
a comma (,) designates an optional role.
4 The !! in the Lisp representation corresponds to the
angle-bracketed constants in Table 2, e.g., !!-ingly corresponds
to <MANNER>.

Language-specific annotations such as the *-marker
in the LCS Output are added to the templates by
processing the components of thematic grid specifi- 
cations, as we will see in more detail next. 
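The template lookup and wildcard substitution in (3) and (4) can be sketched in Python as follows. This is only an illustration under stated assumptions: the nested-list encoding, the TEMPLATES dictionary, and the functions attach_suffix, fill_constant and instantiate are hypothetical names, and the e-dropping rule is a crude stand-in for a real morphological process. The *-markers are not added here, since they depend on the thematic grid.

```python
# Hypothetical in-memory LCS database keyed by (class, grid); only the
# class 47.8 template from example (3) is included.
TEMPLATES = {
    ("47.8", "_th_loc"): ["be", "loc", ["thing", 2],
                          ["at", "loc", ["thing", 2], ["thing", 11]],
                          ["!!-ingly", 26]],
}


def attach_suffix(lexeme, suffix):
    """Crude stand-in for a morphological process: drop a final 'e'
    before a vowel-initial suffix (decorate + ed -> decorated)."""
    if lexeme.endswith("e") and suffix.startswith(tuple("aeiou")):
        lexeme = lexeme[:-1]
    return lexeme + suffix


def fill_constant(node, lexeme):
    """Return a copy of the template with every !! wildcard filled."""
    if isinstance(node, list):
        return [fill_constant(child, lexeme) for child in node]
    if isinstance(node, str) and node.startswith("!!-"):
        return attach_suffix(lexeme, node[3:])
    return node


def instantiate(verb_class, grid, lexeme):
    """Look up the template by class/grid and fill in the constant."""
    return fill_constant(TEMPLATES[(verb_class, grid)], lexeme)


print(instantiate("47.8", "_th_loc", "touch"))
```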
4 Language-Specific Annotations 
In our on-going example (4), the thematic grid
_th_loc indicates that the theme and the location
are both obligatory (in English) and should
be annotated as such in the instantiated LCS. This 
is achieved by inserting a *-marker appropriately. 
Consider the structural divergence between the fol- 
lowing English/Spanish equivalents: 
(5) Structural Divergence: 
E: John entered the house. 
S: John entró a la casa.
'John entered into the house.' 
The English sentence differs structurally from the 
Spanish in that the noun phrase the house corre- 
sponds to a prepositional phrase a la casa. This 
distinction is characterized by different positionings 
of the *-marker in the lexical entries produced by 
LEXICALL: 
(6) Lexical Entries: 
enter: (go loc (* thing 2) 
(toward loc (thing 2) 
(in loc (thing 2) (* thing 6))) 
(enteringly 26) ) 
entrar: (go loc (* thing 2) 
((* toward 5) loc (thing 2) 
(in loc (thing 2) (thing 6))) 
(enteringly 26) ) 
The lexicon entries for enter and entrar both mean 
"X (= Thing 2) goes into location Y (= Thing 6)." 
Variable positions (designated by numbers, such as 
2, 5 and 6) are used in place of the ultimate fillers 
such as john and house. The structural divergence
of (5) is accommodated as follows: the *-marked leaf
node, i.e., (thing 6) in the enter definition, is filled
directly, whereas the *-marked non-leaf node, i.e.,
((* toward 5) loc ...) in the entrar definition, is
filled in through unification at the internal toward
node.
5 Construction of Lexical Entries 
Consider the construction of a lexical entry for the
verb adorn. The LCS for this verb is in the class of
Fill Verbs (9.8): 5
(7) (cause (thing 1) 
(go ident (thing 2) 
(toward ident (thing 2) 
(at ident (thing 2) (!!-ed 9)))) 
(with poss (*head*) (thing 16))) 
This list structure recursively associates logi- 
cal heads with their arguments and modifiers. 
The logical head is represented as a primitive/field
combination, e.g., GOIdent is represented
as (go ident ...). The arguments
for CAUSE are (thing 1) and (go ident ...).
The substructure GO itself has two arguments
(thing 2) and (toward ident ...) and a modifier
(with poss ...). 6 The !!-ed constant refers
to a resulting state, e.g., adorned for the verb adorn.
The LCS produced by our program for this verb is:
(8) (cause (thing 1) 
(go ident (thing 2) 
(toward ident (thing 2) 
(at ident (thing 2) (adorned 9)))) 
(with poss (*head*) (thing 16))) 
The variables in the representation map between 
LCS positions and their corresponding thematic 
roles. In the LCS framework, thematic roles provide 
semantic information about properties of the argu- 
ment and modifier structures. In (7) and (8) above, 
the numbers 1, 2, 9, and 16 correspond to the roles 
agent (ag), theme (th), predicate (pred), and pos- 
sessional modifier (mod-poss), respectively. These 
numbers enter into the construction of LCS entries: 
they correspond to argument positions in the LCS 
template (extracted using the class/grid/lexeme 
specification). Information is filled into the LCS
template using these numbers, coupled with the the- 
matic grid tag for the particular word being defined. 
5 Some of the other 9.8 verbs are: anoint, bandage,
flood, frame, garland, stud, suffuse, surround, veil.
6 The *head* symbol--used for modifiers--is a placeholder
that points to the root (cause) of the overall lexical
entry.

5.1 Fundamentals
LEXICALL locates the appropriate template in the
LCS database using the class/grid pairing as an index,
and then determines the language-specific annotations
to instantiate for that template. The default
position of the *-marker is the left-most occurrence
of the LCS node corresponding to a particular
thematic role. However, if a preposition occurs
in the grid, the *-marker may be placed differently.
In such a case, a primitive representation
(e.g., (to loc (at loc))) is extracted from a set
of predefined mappings. If this representation corresponds
to a subcomponent of the LCS template,
the program recognizes this as a match against the
grid, and the *-marker is placed in the template at
the level where this match occurs (as in the entry
for entrar given in (6) above).
If a preposition occurs in the grid but there is no 
matching primitive representation, the preposition is 
considered to be a collocation, and it is placed in a
special slot--:collocations--which indicates that 
the LCS already covers the semantics of the verb 
and the preposition is an idiosyncratic variation (as 
in learn about, know of, etc.). 
If a preposition is required but it is not specified 
(i.e., empty parentheses ()), then the *-marker is positioned
at the level dominating the node that corresponds
to that role--which indicates that several
different prepositions might apply (as in put on, put 
under, put through, etc.). 
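The placement rules just described can be sketched in Python. This is a minimal illustration, not the LEXICALL implementation, and several points are assumptions: the nested-list encoding of the Lisp-like templates, the function names, the one-entry PREP_PRIMITIVES table, the reading of "the level dominating the node" as the grandparent of the role's leaf (the node above the generic [at] position), and the use of the role number minus one for a *-marked non-leaf node (read off the 5 and 15 that appear in the paper's examples). The role numbers themselves are those given in the text.

```python
import copy

# Code numbers for thematic roles as given in the text.
ROLE_CODES = {"ag": 1, "th": 2, "goal": 6, "pred": 9, "mod-poss": 16}

# Hypothetical excerpt of the preposition-to-primitive mappings.
PREP_PRIMITIVES = {"with": [["with", "poss"]]}


def is_leaf(node):
    """True for a variable or constant position such as ['thing', 6]."""
    return len(node) == 2 and isinstance(node[1], int)


def children(node):
    return [c for c in node[1:] if isinstance(c, list)]


def path_to_leaf(node, code, path=()):
    """Pre-order search for the left-most leaf numbered `code`."""
    path = path + (node,)
    if is_leaf(node):
        return path if node[1] == code else None
    for child in children(node):
        found = path_to_leaf(child, code, path)
        if found:
            return found
    return None


def matches(node, pattern):
    """Node has the pattern's primitive/field; sub-patterns match children."""
    if is_leaf(node) or node[:2] != pattern[:2]:
        return False
    return all(any(matches(c, sub) for c in children(node))
               for sub in pattern[2:])


def find_match(node, pattern):
    if matches(node, pattern):
        return node
    for child in children(node):
        hit = find_match(child, pattern)
        if hit is not None:
            return hit
    return None


def star_head(node, code):
    node[0] = ["*", node[0], code]      # (with poss ...) -> ((* with 15) poss ...)


def annotate(template, roles):
    """roles: (role, prep) pairs; prep is None (no preposition),
    "" (required but unspecified) or a specific preposition."""
    lcs, collocations = copy.deepcopy(template), []
    for role, prep in roles:
        code = ROLE_CODES[role]
        if prep is None:                # default: left-most matching leaf
            path_to_leaf(lcs, code)[-1].insert(0, "*")
        elif prep == "":                # node dominating the generic position
            star_head(path_to_leaf(lcs, code)[-3], code - 1)
        else:
            hit = None
            for pattern in PREP_PRIMITIVES.get(prep, []):
                hit = find_match(lcs, pattern)
                if hit is not None:
                    break
            if hit is None:
                collocations.append(prep)   # idiosyncratic preposition
            else:
                star_head(hit, code - 1)
    return lcs, collocations


decorate = ["cause", ["thing", 1],
            ["go", "ident", ["thing", 2],
             ["toward", "ident", ["thing", 2],
              ["at", "ident", ["thing", 2], ["decorated", 9]]]],
            ["with", "poss", ["*head*"], ["thing", 16]]]
print(annotate(decorate, [("ag", None), ("th", None), ("mod-poss", "with")]))
```

The usage line corresponds to the decorate entry with the preposition with; the sketch assumes each role's variable position is present in the template and does no error handling.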
5.2 Examples 
The input to LEXICALL is a class/grid/lexeme 
specification, where each piece of information is sep- 
arated by a hash sign (#): 
<class>#<grid>#<lexeme># 
<other semantic information> 
For example, the input specification for the verb re- 
plant (a word not classified by Levin) is: 
9.7#_ag_th,mod-poss(with)#replant# 
!!-ed = planted (manner = again) 
This input indicates that the class assigned to replant
is 9.7 (Levin's Spray/Load verbs) and its grid
has an obligatory agent (ag), theme (th), and an
optional possessional modifier with preposition with
(mod-poss(with)). The information following the
final # is optional; this information was previously
hand-added to the assigned thematic grids. In the
current example, the !!-ed designates the form of
the constant planted which, in this case, is a morphological
variant of the lexeme replant. 7 Also, the
manner again is specified as an additional semantic
component.

7 The constant takes one of several forms, including:
!!-ingly for a manner, !!-er for an instrument, and
!!-ed for resulting states. If this information has not
been hand-added to the class/grid/lexeme specification
(as is the case with most of the verbs), a default morphological
process produces the appropriate form from
the lexeme.

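To make the input format concrete, the following Python sketch splits a class/grid/lexeme specification into its fields and decodes the grid. The format and the meaning of the underscore, comma and parentheses are as described in the text; the regular expression, the function names parse_grid and parse_spec, and the returned dictionary layout are assumptions made for illustration.

```python
import re

# One grid entry: "_" (obligatory) or "," (optional), a role name, and
# optional parentheses holding a preposition (possibly empty).
ROLE = re.compile(r"([_,])([a-z-]+)(?:\(([^)]*)\))?")


def parse_grid(grid):
    """Return (role, obligatory, preposition) triples.

    preposition is None when no parentheses follow the role and ""
    when the parentheses are empty (required but unspecified).
    """
    return [(m.group(2), m.group(1) == "_", m.group(3))
            for m in ROLE.finditer(grid)]


def parse_spec(spec):
    """Split <class>#<grid>#<lexeme>#<other semantic information>."""
    fields = spec.split("#", 3) + [""]
    verb_class, grid, lexeme, extra = fields[:4]
    return {"class": verb_class, "roles": parse_grid(grid),
            "lexeme": lexeme, "extra": extra.strip()}


print(parse_spec("9.7#_ag_th,mod-poss(with)#replant#"
                 "!!-ed = planted (manner = again)"))
```

Keeping None and the empty string distinct preserves the difference between a role with no preposition and a role whose preposition is required but left unspecified.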
For presentational purposes, the remainder of this 
section uses English examples. However, as we saw 
in Section 4, the representations used here carry over 
to other languages as well. In fact, we have used
the same acquisition program, without modification, 
for building our Spanish and Arabic LCS-based lex- 
icons, each of size comparable to our English LCS- 
based lexicon. 
I. Thematic Roles without Prepositions 
(9) Example: The flower decorated the room. 
Input: 9.8#_mod-poss_th#decorate# 
Template: 
(be ident (thing 2) 
(at ident (thing 2) (!!-ed 9)) 
(with poss (*head*) (thing 16))) 
Two thematic roles, th and mod-poss, are specified 
for the above sense of the English verb decorate. The 
thematic code numbers--2 and 16, respectively--are 
*-marked and the constant decorated replaces the
wildcard: 
(10) Output: 
(be ident (* thing 2) 
(at ident (thing 2) (decorated 9)) 
(with poss (*head*) (* thing 16))) 
II. Thematic Roles with Unspecified Prepositions
(11) Example: We parked the car near the store. 
We parked the car in the garage. 
Input: 9.1#_ag_th_goal()#park#
Template:
(cause (thing 1)
(go loc (thing 2)
(toward loc (thing 2)
([at] loc (thing 2) (thing 6))))
(!!-ingly 26))
The input for this example indicates that the goal is
headed by an unspecified preposition. The thematic
roles ag, th, and goal() correspond to code numbers
1, 2, and 6, respectively. The variable positions
for ag and th are *-marked just as in the previous
case, whereas goal() requires a different treatment.
When a required preposition is left unspecified, the
*-marker is associated with an LCS node dominating
a generic [at] position:
(12) Output:
(cause (* thing 1)
(go loc (* thing 2)
((* toward 5) loc (thing 2)
([at] loc (thing 2) (thing 6))))
(parkingly 26))
III. Thematic Roles with Specified Prepositions
(13) Example: We decorated the room with flowers. 
Input: 9.8#_ag_th,mod-poss(with)#decorate#
Template: 
(cause (thing 1) 
(go ident (thing 2) 
(toward ident (thing 2) 
(at ident (thing 2) (!!-ed 9)))) 
(with poss (*head*) (thing 16))) 
Here, the mod-poss role requires the preposition 
with in the modifier position:
(14) Output: 
(cause (* thing 1) 
(go ident (* thing 2) 
(toward ident (thing 2) 
(at ident (thing 2) (decorated 9)))) 
((* with 15) poss (*head*) (thing 16))) 
In order to determine the position of the *-marker
for a thematic role with a required preposition,
LEXICALL consults a set of predefined mappings
between prepositions (or postpositions, in a language
like Korean) and their corresponding primitive
representations. 8 In the current case, the preposition
with is mapped to the following primitive representation:
(with poss). Since this matches a
sub-component of the LCS template, the program
recognizes this as a match against the grid, and the
*-marker is placed in the template at the level of
with.
6 Limitations and Conclusions 
We have described techniques for automatic con- 
struction of dictionaries for use in large-scale 
FLT. The dictionaries are based on a language- 
independent representation called lexical conceptual 
structure (LCS). Significant enhancements to LCS- 
based tutoring could be achieved by combining this 
representation with a mechanism for handling issues 
related to discourse and pragmatics. For example, 
although the LCS processor is capable of determining
that the phrase in the trash partially matches the
answer to Where did John put the book?, a prag- 
matic component would be required to determine 
that this answer is (perhaps) more appropriate than 
the full answer, He put the book in the trash. Repre- 
senting conversational context and dynamic context 
updating (Traum et al., 1996; Haller, 1996; DiEugenio
and Webber, 1996) would provide a framework
for this type of response "relaxation." Along
these same lines, a pragmatic component could provide
a mechanism for determining that certain fully
matched responses (e.g., John hurled the book into
the trash) are not as "realistic sounding" as partially
matched alternatives.

8 We have defined approximately 100 such mappings
per language. For example, the mapping produces
the following primitive representations for the English
word to: (to loc (at loc)), (to poss (at poss)),
(to temp (at temp)), (toward loc (at loc)),
(toward poss (at poss)). We have similar mappings
defined in Arabic and Spanish. For example, the following
primitive representations are produced for the Spanish
word a: (at loc), (to loc (at loc)), (to poss
(at poss)), (toward loc (at loc)).

Initially, LEXICALL was designed to support the 
development of LCS's for English only; however, the 
same techniques can be used for multilingual acquisition.
As the lexicon coverage for other languages expands,
it is expected that our acquisition techniques
will help further in the cross-linguistic investigation
of the relationship between Levin's verb classes and
the basic meaning components in the LCS representation.
In addition, it is expected that verbs in the
same Levin class may have finer distinctions than 
what we have specified in the current LCS templates. 
We view the importation of LCS's from the English
LCS database into Arabic and Spanish as
a first approximation to the development of complete
lexicons for these languages. The results have
been hand-checked by native speakers using the
class/grid/lexeme format (which is much easier to
check than the fully expanded LCS's). The lexical
verification process took the native speakers only
two weeks. We estimate that it would take at
least 6 months to build such a lexicon from scratch
(by human recall and data entry alone), and in such
a case, the potential for error would be at least twice
as high.
One important benefit of using the Levin classi- 
fication as the basis of our program is that, once 
the mapping between verb classes and LCS repre- 
sentations has been established, we can acquire the 
LCS representation for a new verb (i.e., one not in 
Levin) simply by associating it with one of the 191
classes. We see our approach as a first step toward 
compression of lexical entries in that it allows lex- 
icons to be stored in terms of the more condensed 
class/grid/lexeme specifications; these can be expanded
online, as needed, during sentence processing in the 
NLP application. 
We conclude that, while human intervention is 
necessary for the acquisition of class/grid information,
this intervention is virtually eliminated from
the LCS construction process because of our provision
of a mapping between semantic classes and
primitive meaning components.
Acknowledgements 
I would like to thank Jungshin Park and Mine Ulku
Sencan for their aid in the development of certain 
components of the LEXICALL program. In ad- 
dition, comments from five anonymous reviewers 
greatly enhanced the presentation of this work. The 
author has been supported, in part, by Army Re- 
search Office contract DAAL03-91-C-0034 through 
Battelle Corporation, NSF NYI IRI-9357731 and 
Logos Corporation, NSF CNRS INT-9314583, Ad- 
vanced Research Projects Agency and ONR contract 
N00014-92-J-1929, Alfred P. Sloan Research Fellow 
Award BR3336, Army Research Institute contract 
MDA-903-92-R-0035 and Microelectronics and De- 
sign, Inc., and the University of Maryland General 
Research Board. 

References 

Brent, Michael. 1993. Unsupervised Learning 
of Lexical Syntax. Computational Linguistics, 
19:243-262. 

Carrier, Jill and Janet H. Randall. 1993. Lexical 
mapping. In Eric Reuland and Werner Abraham, 
editors, Knowledge and Language II: Lexical and 
Conceptual Structure. Kluwer, Dordrecht, pages 
119-142. 

Church, Kenneth and P. Hanks. 1990. Word Association Norms, Mutual Information and Lexicography. Computational Linguistics, 16:22-29. 

Copestake, Ann, Ted Briscoe, P. Vossen, A. Ageno,
I. Castellon, F. Ribas, G. Rigau, H. Rodriguez,
and A. Samiotou. 1995. Acquisition of Lexical Translation Relations from MRDs. Machine
Translation, 9:183-219.

DiEugenio, Barbara and Bonnie Lynn Webber.
1996. Pragmatic Overloading in Natural Language
Instructions. International Journal of Expert
Systems, 9(1):53-84.

Dorr, Bonnie J. To appear. Large-Scale Dictio- 
nary Construction for Foreign Language Tutoring 
and Interlingual Machine Translation. Machine 
Translation, 12(1). 

Dorr, Bonnie J., Joseph Garman, and Amy Weinberg.
1995. From Syntactic Encodings to Thematic
Roles: Building Lexical Entries for Interlingual
MT. Machine Translation, 9:71-100.

Dorr, Bonnie J. and Douglas Jones. 1996. Role
of Word Sense Disambiguation in Lexical Acquisition: Predicting Semantics from Syntactic
Cues. In Proceedings of the International Conference on Computational Linguistics, pages 322-333, Copenhagen, Denmark.

Dowty, David. 1991. The Effects of Aspectual Class
on the Temporal Structure of Discourse: Semantics
or Pragmatics? Language, 67:547-619.

Fillmore, Charles. 1968. The Case for Case. In
E. Bach and R. Harms, editors, Universals in
Linguistic Theory. Holt, Rinehart, and Winston,
pages 1-88.

Foley, William A. and Robert D. Van Valin. 1984. 
Functional Syntax and Universal Grammar. Cam- 
bridge University Press, Cambridge. 

Grimshaw, Jane. 1990. Argument Structure. MIT
Press, Cambridge, MA.

Grimshaw, Jane. 1993. Semantic Structure
and Semantic Content in Lexical Representation.
Unpublished ms., Rutgers University, New
Brunswick, NJ.

Gruber, Jeffrey S. 1965. Studies in Lexical Relations.
Ph.D. thesis, MIT, Cambridge, MA.

Hale, Ken and Samuel J. Keyser. 1993. On Argument
Structure and Lexical Expression of Syntactic
Relations. In Ken Hale and Samuel J. Keyser,
editors, The View from Building 20: Essays in
Honor of Sylvain Bromberger. MIT Press, Cambridge,
MA.

Haller, Susan. 1996. Planning Text About Plans Interactively.
International Journal of Expert Systems,
9(1):85-112.

Holland, Melissa. 1994. Intelligent Tutors for Foreign
Languages: How Parsers and Lexical Semantics
can Help Learners and Assess Learning.
In Proceedings of the Educational Testing Service
Conference on Natural Language Processing Techniques
and Technology in Assessment and Education,
Princeton, NJ: ETS.

Jackendoff, Ray. 1983. Semantics and Cognition. 
MIT Press, Cambridge, MA. 

Jackendoff, Ray. 1990. Semantic Structures. MIT 
Press, Cambridge, MA. 

Jackendoff, Ray. 1996. The Proper Treatment of 
Measuring Out, Telicity, and Perhaps Even Quan- 
tification in English. Natural Language and Lin- 
guistic Theory, 14:305-354. 

Klavans, Judith L. and Evelynne Tzoukermann.
1995. Dictionaries and Corpora: Combining Corpus
and Machine-Readable Dictionary Data for
Building Bilingual Lexicons. Machine Translation,
10:185-218.

Levin, Beth. 1993. English Verb Classes and Alternations:
A Preliminary Investigation. Chicago,
IL.

Levin, Beth and Malka Rappaport Hovav. To appear.
Building Verb Meanings. In M. Butt and
W. Geuder, editors, The Projection of Arguments:
Lexical and Syntactic Constraints. CSLI.

Lonsdale, Deryle, Teruko Mitamura, and Eric Nyberg.
1995. Acquisition of Large Lexicons for
Practical Knowledge-Based MT. Machine Translation,
9:251-283.

Pesetsky, David. 1982. Paths and Categories. Ph.D. 
thesis, MIT, Cambridge, MA. 

Pinker, Steven. 1989. Learnability and Cognition:
The Acquisition of Argument Structure. MIT
Press, Cambridge, MA.

Sams, Michelle. 1993. An Intelligent Foreign Language
Tutor Incorporating Natural Language Processing.
In Proceedings of Conference on Intelligent
Computer-Aided Training and Virtual Environment
Technology, NASA: Houston, TX.

Talmy, Leonard. 1985. Lexicalization Patterns: Semantic
Structure in Lexical Forms. In T. Shopen,
editor, Language Typology and Syntactic Description
3: Grammatical Categories and the Lexicon.
University Press, Cambridge, England, pages 57-149.

Traum, David R., Lenhart K. Schu- 
bert, Nathaniel G. Martin, Chung Hee Hwang, Pe- 
ter Heeman, George Ferguson, James Allen, Mas- 
simo Poesio, and Marc Light. 1996. Knowledge 
Representation in the TRAINS-93 Conversation 
System. International Journal of Expert Systems, 
9(1):173-223. 

Weinberg, Amy, Joseph Garman, Jeffery Martin,
and Paola Merlo. 1995. Principle-Based Parser
for Foreign Language Training in German and
Arabic. In Melissa Holland, Jonathan Kaplan,
and Michelle Sams, editors, Intelligent Language
Tutors: Theory Shaping Technology. Lawrence
Erlbaum Associates, Hillsdale, NJ.

Wu, D. and X. Xia. 1995. Large-Scale Automatic 
Extraction of an English-Chinese Translation Lex- 
icon. Machine Translation, 9:285-313. 
