The BICORD System 
Combining Lexical Information from Bilingual Corpora 
and Machine Readable Dictionaries¹
Judith Klavans, IBM T.J. Watson Research, Yorktown Heights, N.Y. 10532
Evelyne Tzoukermann, A.T.&T. Bell Laboratories, Murray Hill, New Jersey 07974
ABSTRACT 
Our goal is to explore methods for combining structured but incomplete information from dictionaries with the unstructured but more complete information available in corpora for the creation of a bilingual lexical data base. This paper concentrates on the class of action verbs of movement, and builds on earlier work on lexical correspondences between languages specific to this verb class. The languages we explore here are English and French. We first examine the way prototypical verbs of movement are translated in the Collins-Robert (Collins 1978, henceforth CR) bilingual dictionary. We then analyze the behavior of some of these verbs in a large bilingual corpus. We take advantage of the results of linguistic research on verb types (e.g. Levin, to appear) coupled with data from machine readable dictionaries to motivate corpus-based text analysis for the purpose of establishing lexical correspondences with the full range of associated translations, and then attach frequencies to those translations.
1. Background. As NLP systems become more robust, large lexicons are required, providing a wide range of information including syntactic, semantic, pragmatic, morphological and phonological information. There are difficulties in constructing these large lexicons, first in their design, and then in providing them with the necessary and sufficient data. These problems have recently been the topic of intense research (Klavans 1988, Boguraev and Briscoe 1989, Boguraev et al. 1989, Zernik 1990). Moreover, an important sub-area of computational lexicon building that has barely been approached is that of bilingual lexicon construction (Calzolari and Picchi 1986, Rizk 1989).
2. Motion Verbs. In this paper, we report on data for movement verbs (or motion verbs). The class of English motion verbs and their translations into Romance languages has been widely discussed from various points of view, including theoretical, structural (Talmy 1985), and applied (Atkins et al. 1990, in preparation). English generally incorporates movement and cause or manner into a single lexical item, whereas languages like French do not. For example, in CR stroll is translated as 'se promener nonchalamment' or 'flâner', and stroll in/out etc. as 'entrer/sortir/s'éloigner sans se presser' or 'nonchalamment'. Notice that in French, the translation typically consists of a general motion verb ('entrer/sortir/aller/avancer') with an adverbial or prepositional modifier showing manner, e.g. 'nonchalamment' or 'sans se presser'. Similarly, English often incorporates causation into the movement verb: the English verb march, as in to march the troops, is translated in CR as 'faire marcher (au pas) les troupes'. These multi-word correspondences often cause problems in the lexical transfer component of machine translation systems.
3. Bilingual Corpus-based Analysis. In earlier work (Klavans and Tzoukermann 1989), we reported on a study of a selected subset of movement verbs in a bilingual corpus. The corpus consists of 85 million English and 95 million French words from the Canadian Parliamentary Proceedings (the Hansard corpus). Of this, 75 million French and 70 million English words are aligned by sentence (Brown et al. 1988). For example:
SENTENCE ~: 3S7748 
The ambassador's contribution was one small party at which a number of us ended up dancing on a table.
L'apport de l'ambassadeur s'est résumé à une petite fête où nous avons fini par danser sur une table.
Figure One: Sample Citation
Some representative verbs which have at least one movement sense were selected. We compared the extent of the information found in the bilingual corpus with the information found in the CR machine-readable dictionary (MRD). For verbs like commute, which do not have a straightforward translation, we found either (1) all the components of the verb concept, as in 'se rendre au travail quotidiennement'; (2) parts of the translation, as in 'faire le trajet'; or (3) a totally different verb from that given in the MRD, such as 'parcourir' or 'voyager'.
We observed not only that the MRD information was incomplete, but also that only a partial expression of the typical meaning of the verb was provided. In the past, since printed dictionaries have been subject to the constraints of time and space, they have not always been able to offer full information about entries. However, with electronic dictionaries and lexical data bases, this should no longer be a restriction. In fact, given more and richer information, we envision a move away from the flat hierarchical structure of dictionaries to a more network-like representation of lexical knowledge.

¹ This work was completed at IBM, T.J. Watson Research, although the second author is currently at A.T.&T. Bell Laboratories.
4. Related Research. Combining linguistic and statistical methods is becoming increasingly popular in computational linguistics, especially as more corpora become available.² Work in this vein ranges from the syntactic and semantic to the lexical. For example, Atkins 1987 demonstrates convincingly that with corpus data, the lexicographer can attack the difficult problem of word senses in a systematic way. Church and Hanks 1989 and Church et al. 1990 develop a battery of statistical methods to induce linguistic regularities. They identify cooccurrence relations by computing statistics (e.g. mutual information, t-score) over millions of words of text. Their approach is focused on monolingual rather than bilingual corpus analysis, and constitutes a significant contribution to lexical research. On a more syntactic note, Dagan and Itai 1990 use statistical methods over linguistically parsed text (Jensen 1986) to resolve anaphoric reference.
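The two association measures mentioned here can be illustrated with a minimal Python sketch. It assumes the usual relative-frequency estimates over N observed word pairs and the common approximation of the t-score as observed minus expected count, divided by the square root of the observed count; the function names and counts are hypothetical, and this is not Church and Hanks's own code.

```python
import math


def mutual_information(pair_count: int, x_count: int, y_count: int, total: int) -> float:
    """Pointwise mutual information in bits: log2(P(x, y) / (P(x) * P(y))).

    Probabilities are estimated as relative frequencies over `total`
    observations (an assumption of this sketch).
    """
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    return math.log2(p_xy / (p_x * p_y))


def t_score(pair_count: int, x_count: int, y_count: int, total: int) -> float:
    """Approximate t-score: (observed - expected) / sqrt(observed),
    where expected = f(x) * f(y) / N if x and y were independent."""
    expected = x_count * y_count / total
    return (pair_count - expected) / math.sqrt(pair_count)


# Hypothetical counts: x and y each occur 100 times in 10,000 pairs
# and occur together 16 times.
print(mutual_information(16, 100, 100, 10_000))  # about 4 bits
print(t_score(16, 100, 100, 10_000))             # 3.75
```

Mutual information rewards pairs that co-occur far more often than chance predicts, while the t-score also reflects how much evidence (how many observations) supports the association.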
In the arena of automatic bilingual lexicon construction, Catizone et al. 1989 take two corresponding texts (English and German) and develop algorithms to determine lexical alignments by using statistical methods over texts, combined with the optional support of an MRD. In contrast, Sadler 1989 proposes parsing aligned corpora into dependency trees, which form the structures upon which lexical correspondences are suggested to the user. The early stages of the construction of the Bilingual Knowledge Base (BKB) rely heavily on human input, but the process gradually becomes more automatic as data is collected. Using purely statistical techniques, Brown et al. 1988 make use of the Hansard bilingual corpus for the purpose of building a machine translation system. Such a system is a good example of using exclusively statistical, non-linguistic methods to induce translations.
5. The BICORD System - Bilingual Corpus-based Dictionary. Our approach involves a combination of standard linguistic methodology using MRDs, enhanced with some statistical techniques. Dictionaries are often discounted because they are built on the basis of introspective intuition rather than purely on objective observation of data. However, our underlying assumption is that the insights that a dictionary encodes and represents should not be disregarded (although there are some limitations resulting from its structural organisation). This is a controversial assumption. Even though, in the past, dictionaries have been built solely on the basis of intuition, current trends are to use corpus-driven criteria, as, for example, in the Collins COBUILD dictionary (1987). Without question this is a step in the right direction towards completeness and accuracy of coverage of the language as it actually occurs. However, the limitation of corpus analysis is that subtle linguistic intuitions about word behavior (such as "negative evidence") cannot be obtained from corpora; in other words, what is disallowed in the language may never be discovered. Thus we disagree with the claim of Garside, Leech, and Sampson 1987 that the survival of both descriptive and theoretical computational linguistics lies primarily in statistical analysis. We take the more moderate view that both approaches (linguistic and statistical) are essential if the language is to be characterized accurately and in its entirety.
We extracted occurrences of several movement verbs (called "probe" strings) from the English side of the Hansard corpus. The criteria used to ensure that each verb was a member of this semantic class are described in Atkins, Boguraev and Klavans 1990 (in preparation). The test set of verbs was drift, dance, commute, emigrate, immigrate, ascend, descend, circle, sail and glide. The probe string was used to search in CR, both for translations and collocations under the entry itself, and also for French headwords in the French side of the dictionary with the probe as a translation. The extracted corpus, consisting of the set of English citations containing the probe string (in any morphological shape) and the corresponding French sentences, is called a "probe corpus". A statistical tagger (Tzoukermann and Merialdo 1989) was used to assign a part of speech to the English side of the corpora. Translations and collocations were extracted automatically from the parsed version of CR (see Neff and Boguraev 1989) using LQL (Neff et al. 1988). For illustration, a partial entry for dance is:
+-hdw: dance
  +-superhom
    +-homograph
    | +-homnum: 2
    | +-pos: vt
    | +-translat
    |   +-argument: waltz etc
    |   +-word: danser
    | ...
    +-homograph
      +-homnum: 3
      +-pos: vi
      +-sense
        | ...
        +-collocat
        | +-srcnote: fig
        | +-source: to dance in/out etc
        | +-target: entrer/sortir etc joyeusement
        |
        +-collocat
        | +-source: to dance about
        | +-source: to dance up and down
        | +-target: gambader
        | +-target: sautiller
        |
        +-collocat
          +-source: the child danced away /or/ off
          +-target: l'enfant s'est éloigné en gambadant /or/ en sautillant
          ...
Figure Two: Partial MRD entry for dance

² For example, the ACL Data Collection Initiative (ACL/DCI) coordinated by Dr. Mark Liberman at A.T.&T. Bell Laboratories was established to make corpora of all shapes and sizes more widely available to the research community.
Also, the French words 'gambiller' and 'guincher' have dance as a translation. Probes had a maximum of 1146 citations, with a maximum of 25 senses and collocations in CR (a rough measure of polysemy).
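The probe-corpus extraction can be sketched in Python as follows. The record layout (`AlignedPair`), the naive English inflection rules, and the function names are assumptions for illustration; the paper does not describe the implementation at this level.

```python
import re
from dataclasses import dataclass


@dataclass
class AlignedPair:
    """One sentence-aligned English/French citation (hypothetical layout)."""
    sentence_id: int
    english: str
    french: str


def probe_forms(lemma: str) -> set:
    """Naive regular English inflections: base, -s, -ed, -ing.

    Assumption: regular verbs only; consonant doubling and -y
    spellings are not handled.
    """
    if lemma.endswith("e"):
        return {lemma, lemma + "s", lemma + "d", lemma[:-1] + "ing"}
    return {lemma, lemma + "s", lemma + "ed", lemma + "ing"}


def build_probe_corpus(pairs, lemma):
    """Keep every aligned pair whose English side contains a form of the probe.

    Nominal uses are kept at this stage; the tagger separates them later.
    """
    forms = sorted(probe_forms(lemma), key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, forms)) + r")\b",
                         re.IGNORECASE)
    return [pair for pair in pairs if pattern.search(pair.english)]
```

For dance, `probe_forms` yields the four forms dance, dances, danced and dancing; word boundaries in the pattern keep longer words such as dancers out of the probe corpus.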
The tagger used to preprocess the corpus was trained on 1 million words (about 42,000 sentences) tagged manually and provided by the tree bank of Lancaster University (Garside, Leech, and Sampson 1987). Our version has 81 tags, a subset of the tree bank tags. Of these tags, 52 are categorial (such as VV+I for the infinitival form of a non-auxiliary verb) and 29 are lexically bound, some of the latter being bound to a class of one (e.g. IO is for the preposition of), and some being bound to a small subclass of a category (such as PP*S for "personal pronoun subject"). Some tags (such as N+I, "singular noun") provide morphological as well as categorial information. The program, based on a trigram model, computes the probability of a word in relation to its tag and assigns the tag that corresponds to the highest likelihood. In its simplest form:

$$p(T \mid W) \propto p(W \mid T)\, p(T)$$

that is, the probability of a tag T given its word W is proportional to the product of the probability of observing the word given its tag and the probability of observing the tag (the normalizing term p(W) is the same for every candidate tag, so it does not change which tag is chosen). By random sampling, we determined the error rate for part-of-speech tagging to be about 3%.
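The decision rule in its simplest form can be sketched as follows. This illustrates only that simplest form, choosing each word's tag independently; the trigram model described above also conditions on the two preceding tags, which this sketch does not implement. The tag names and helper functions are hypothetical, and a helper for the random-sample error estimate is included as well.

```python
import random
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Count tags and (tag, word) pairs from hand-tagged sentences."""
    tag_counts = Counter()
    word_given_tag = defaultdict(Counter)  # tag -> word -> count
    for sentence in tagged_sentences:
        for word, tag in sentence:
            tag_counts[tag] += 1
            word_given_tag[tag][word.lower()] += 1
    return tag_counts, word_given_tag


def best_tag(word, tag_counts, word_given_tag):
    """argmax over T of p(W | T) * p(T); p(W) is constant and is dropped.

    No smoothing: an unseen word scores 0 under every tag. A real tagger
    needs smoothing and trigram context, which this sketch omits.
    """
    total = sum(tag_counts.values())

    def score(tag):
        p_tag = tag_counts[tag] / total
        p_word = word_given_tag[tag][word.lower()] / tag_counts[tag]
        return p_word * p_tag

    return max(tag_counts, key=score)


def sampled_error_rate(predicted, gold, sample_size, seed=0):
    """Estimate the tagging error rate from a random sample of token positions."""
    rng = random.Random(seed)
    positions = rng.sample(range(len(gold)), sample_size)
    errors = sum(predicted[i] != gold[i] for i in positions)
    return errors / sample_size
```

With a toy training set in which dance is tagged NOUN twice and VERB once, `best_tag` picks NOUN for dance, since both its emission probability and its tag prior are higher; the trigram context is what lets the real tagger override that prior in verbal contexts.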
In this way, verbal uses of the probe strings were separated from their nominal uses. This is the first step in disambiguation, enabling lexical correspondences to be established. To give an idea of size, there were 293 citations (about 12,000 words) with the string dance in its four morphological forms in English. The distribution by part of speech for these citations is:
Category   Citations   %
VERB       109         37
NOUN       174         59
ADJ        10          3
The distribution varies by probe; for example, of the 34024 citations for the string "move" (and its variants), 26218 usages were labelled as verbs (77%), 7412 as nouns (22%), and 394 (1%) as adjectives. Some illustrative fragments for dance are:
we are dancing upon eggshells...
PP~S VBRW W(;Iw Z~ Nw2 
the politician who liked to dance...
AT++ Nw1 P~l WPAST~ TOw WI++ 
...Russian people dancing rather than fighting.
Jw Nw1 W(;1w RWR IW WGIw 
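The rounded percentages above can be reproduced from the raw counts with a short sketch. The input is assumed to be one part-of-speech label per citation, using hypothetical labels (VERB, NOUN, ADJ) rather than the tagger's own tag set.

```python
from collections import Counter


def pos_distribution(tags):
    """Map each part-of-speech label to (count, percentage rounded to an integer)."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: (n, round(100 * n / total)) for tag, n in counts.most_common()}


# The counts reported in the text for "dance" and "move".
dance = pos_distribution(["VERB"] * 109 + ["NOUN"] * 174 + ["ADJ"] * 10)
move = pos_distribution(["VERB"] * 26218 + ["NOUN"] * 7412 + ["ADJ"] * 394)
print(dance)  # {'NOUN': (174, 59), 'VERB': (109, 37), 'ADJ': (10, 3)}
print(move)   # {'VERB': (26218, 77), 'NOUN': (7412, 22), 'ADJ': (394, 1)}
```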
Data from CR are used to drive our first pass at filtering out pre-linked pairs common to both data resources. Citations that have lexical correspondences already provided by the machine-readable dictionary are extracted from the probe corpus. For example, consider again the verb dance. The character strings in the translation and collocation fields are extracted from CR; these strings are filtered to remove function words and some common words (such as 'faire', to make or do), and morphological variants are generated. Some examples for dance are 'danser/dansa/dansera ..., gambader/gambadant ...'. Probe translations and collocations from CR are then ready to be used to automatically match strings in the French side of the corpus. Each correspondence that matches one of the MRD probes is removed from the probe corpus, stored, and counted, leaving a reduced probe corpus. For example, of the 109 citations of dance as a verb, 52 sentences matched the MRD correspondences, as in Figure One. An extended lexicon can then be built, using the structure already provided by CR, where the frequencies are computed over these matches. For example, an initial partial enhanced entry for dance is:
+-hdw: dance
  +-superhom
    | ...
    +-homograph
    | +-homnum: 1
    | +-pos: v
    | +-sense
    |   +-c_translat
    |     +-word: danser
    |     | +-inflect: inf
    |     | +-freq: 44%
    |     +-word: danser
    |     | +-inflect: past
    |     | +-freq: 17%
    |     +-word: danser
    |     | +-inflect: fut
    |     | +-freq: 5%
    |     ...
    | ...
    +-homograph
    | +-homnum: 2
    | +-pos: vt
    | +-sense
    |   +-d_translat
    |     +-argument: waltz etc
    |     +-word: danser
    +-homograph
      +-homnum: 3
      +-pos: vi
      +-sense
        +-d_translat
        | +-context: person
        | +-context: leaves in wind
        | +-context: boat on waves
        | +-context: eyes
        | +-word: danser
        +-d_collocat
        | +-srcnote: fig
        | +-source: to dance in/out etc
        | +-target: entrer/sortir etc joyeusement
        |
        +-d_collocat
        | +-source: to dance about
        | +-source: to dance up and down
        | +-target: gambader
        | | +-c_collocat
        | |   +-source: to dance around
        | |   +-inflect: present
        | |   +-freq: 2%
        | +-target: sautiller
        |   +-c_collocat
        |     +-source: to dance round
        |     +-inflect: past
        |     +-freq: 2%
        |
        +-d_collocat
          +-source: the child danced away /or/ off
          +-target: l'enfant s'est éloigné en gambadant /or/ en sautillant
Figure Three: Partial Enhanced Entry
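The MRD-driven filtering pass can be sketched in Python. The stop-word list, the tiny generator for regular -er verbs, and the rule of counting only the first match per sentence are assumptions made for illustration; the paper does not specify them, and real French morphology needs a proper inflection lexicon.

```python
import re
from collections import Counter

# Function words and common words dropped from CR target strings
# (an illustrative list; the text names only 'faire').
STOP_WORDS = {"le", "la", "les", "l", "de", "du", "des", "d", "un", "une",
              "à", "au", "en", "et", "etc", "se", "s", "faire", "or"}


def tokens(text):
    """Lowercased letter sequences; apostrophes, slashes and digits split tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def er_verb_forms(infinitive):
    """Hypothetical generator for a few forms of a regular -er verb.

    Deliberately crude: irregular verbs and spelling changes are ignored,
    and some forms collide with other words ('danse' is also a noun).
    """
    stem = infinitive[:-2]
    forms = {infinitive: "inf"}
    for suffix in ("é", "ée", "és", "ées", "a", "èrent"):
        forms[stem + suffix] = "past"
    for suffix in ("e", "es", "ent", "ons", "ez", "ant"):
        forms[stem + suffix] = "present"
    for suffix in ("erai", "eras", "era", "erons", "erez", "eront"):
        forms[stem + suffix] = "fut"
    return forms


def mrd_forms(mrd_targets):
    """Map each surface form to (lemma, inflection) for all CR target strings."""
    forms = {}
    for target in mrd_targets:
        for word in tokens(target):
            if word in STOP_WORDS:
                continue
            if word.endswith("er") and len(word) > 3:  # crude -er verb test
                for form, inflection in er_verb_forms(word).items():
                    forms.setdefault(form, (word, inflection))
            else:
                forms.setdefault(word, (word, "other"))
    return forms


def filter_probe_corpus(probe_corpus, mrd_targets):
    """Split (english, french) citations into stored MRD matches and a reduced corpus.

    Counts the (lemma, inflection) of the first MRD form found in each
    French sentence; counting only the first hit is an assumption.
    """
    forms = mrd_forms(mrd_targets)
    counts = Counter()
    matched, reduced = [], []
    for english, french in probe_corpus:
        hits = [forms[t] for t in tokens(french) if t in forms]
        if hits:
            counts[hits[0]] += 1
            matched.append((english, french))
        else:
            reduced.append((english, french))
    return counts, matched, reduced
```

Dividing each count by the number of verb citations in the probe corpus gives per-inflection shares of the kind shown in the freq nodes; which denominator the paper used is not stated, so that step is left to the reader.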
Notice that dictionary nodes are now identified with the prefix "d_", and corpus-motivated nodes with "c_". New information is placed at the relevant node: low in the tree if there is no ambiguity of attachment or scope, and higher in the tree if necessary, until evidence is found that permits the information to be moved down in the structure. For example, an additional node is added to the MRD structure to insert danser, since danser is a translation both in homograph 2 and in homograph 3. Since the transitivity of a verb cannot be determined automatically, there is no evidence to motivate placement, so the data is inserted high in the tree, at the homograph level. In contrast, 'gambader' and 'sautiller' are always intransitive (as determined by a look-up in CR), so they can be automatically placed under homograph 3. Notice also that corpus-derived information is placed under the relevant d_collocat for 'gambader' and 'sautiller', since these are cases where matches occurred on the target term but the source is different.
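The attachment rule can be expressed as a small decision function. This is a sketch under stated assumptions: it reduces "evidence for placement" to a match between the part of speech of each English homograph and the parts of speech recorded in CR for the French word. The function name and the look-up table are hypothetical.

```python
def place_corpus_translation(french_word, homograph_pos, french_pos):
    """Return the homograph number under which to attach a corpus-derived
    translation, or None to attach it high, at a new homograph-level node.

    homograph_pos: homograph number -> part of speech of that English homograph
    french_pos:    French word -> set of parts of speech recorded for it in CR
    """
    allowed = french_pos.get(french_word, set())
    candidates = sorted(h for h, pos in homograph_pos.items() if pos in allowed)
    if len(candidates) == 1:
        return candidates[0]  # unambiguous: attach low in the tree
    return None               # ambiguous or unknown: attach high


homograph_pos = {2: "vt", 3: "vi"}  # homographs of dance, as in Figure Two
# Hypothetical look-up results for the French side of the dictionary.
french_pos = {"danser": {"vt", "vi"}, "gambader": {"vi"}, "sautiller": {"vi"}}
print(place_corpus_translation("danser", homograph_pos, french_pos))    # None
print(place_corpus_translation("gambader", homograph_pos, french_pos))  # 3
```

Returning None rather than guessing mirrors the conservative strategy in the text: information stays high in the tree until later evidence justifies moving it down.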
The Hansard, being the Canadian Parliamentary proceedings, contains a number of juridical and parliamentary terms, usages, and structures, a typical feature of any sublanguage. However, the flexibility inherent in the BICORD system would allow a repetition of the same process over different sublanguages. As other texts are used, frequencies can be updated in two ways: by counting all frequencies into a general score, and also by keeping separate frequencies linked to the source text. This feature allows a representation of the lexical correspondences of general and specific texts in one data structure. It also permits comparison between sublanguages. The result would be a balanced lexicon built over a balanced variety of corpora, reflecting the actual uses of the words or phrases in context.
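The two-way frequency update can be sketched with one general counter plus one counter per source text. The class name, the correspondence key (an English/French word pair) and the second sublanguage label are hypothetical.

```python
from collections import Counter, defaultdict


class CorrespondenceCounts:
    """Frequencies kept both as one general score and per source text."""

    def __init__(self):
        self.general = Counter()
        self.by_source = defaultdict(Counter)

    def add(self, correspondence, source):
        """Record one observed correspondence in both tallies."""
        self.general[correspondence] += 1
        self.by_source[source][correspondence] += 1

    def relative(self, correspondence, source=None):
        """Relative frequency overall, or within one source text."""
        if source is None:
            counts = self.general
        else:
            counts = self.by_source.get(source, Counter())
        total = sum(counts.values())
        return counts[correspondence] / total if total else 0.0


counts = CorrespondenceCounts()
for _ in range(3):
    counts.add(("dance", "danser"), "hansard")
counts.add(("dance", "gambader"), "hansard")
counts.add(("dance", "danser"), "fiction")  # hypothetical second sublanguage
print(counts.relative(("dance", "danser")))             # 0.8
print(counts.relative(("dance", "danser"), "hansard"))  # 0.75
```

Keeping the per-source counters alongside the general one is what makes comparison between sublanguages a simple look-up.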
Further analysis of the remaining probe corpus is pursued by observing cooccurrences over both tags and lexical items. For example, with dance, looking at the immediate right context over tags reveals verb-prep patterns:
VERB    CATEGORY   %
dance   prep       77
dance   other      22
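The cooccurrence counts used in this analysis can be sketched as follows: the tag immediately to the right of each verbal use of the probe, and the words within a fixed window after it. The tag labels are hypothetical, and treating the window as right-only is an assumption; the paper does not say whether it was symmetric.

```python
from collections import Counter


def probe_verb_positions(sentence, probe_forms, verb_tags):
    """Indices of probe-string tokens that the tagger labelled as verbs."""
    return [i for i, (word, tag) in enumerate(sentence)
            if word.lower() in probe_forms and tag in verb_tags]


def right_context_tags(tagged_sentences, probe_forms, verb_tags):
    """Count the tag immediately to the right of each verbal probe occurrence."""
    counts = Counter()
    for sentence in tagged_sentences:
        for i in probe_verb_positions(sentence, probe_forms, verb_tags):
            if i + 1 < len(sentence):
                counts[sentence[i + 1][1]] += 1
    return counts


def window_collocates(tagged_sentences, probe_forms, verb_tags, window=5):
    """Count words occurring up to `window` tokens to the right of a verbal probe."""
    counts = Counter()
    for sentence in tagged_sentences:
        words = [word.lower() for word, _ in sentence]
        for i in probe_verb_positions(sentence, probe_forms, verb_tags):
            counts.update(words[i + 1:i + 1 + window])
    return counts
```

Restricting both counts to verbal occurrences reuses the tagger's output, so nominal uses such as "the dance ended" do not contribute right-context evidence.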
Moving from tag cooccurrences to lexical items, the majority of these cases are for the preposition to. Including cooccurrences over a larger window of five words reveals idioms like dance to ... tune, which is not found in CR, either under tune or under dance. These and other patterns can be discovered by statistical analysis over tags and lexical items in the reduced probe corpora. Therefore, a new set of collocations can be inserted in the lexicon; an entry for "dance" enhanced further is shown as follows:
+-hdw: dance
  |
  +-superhom
    | ...
    +-homograph
    | +-homnum: 1
    | +-pos: v
    | +-sense
    |   +-c_translat
    |     +-word: danser
    |     | +-inflect: inf
    |     | +-freq: 44%
    |     +-word: danser
    |     | +-inflect: past
    |     | +-freq: 17%
    |     +-word: danser
    |       +-inflect: fut
    |       +-freq: 5%
    +-homograph
    | +-homnum: 2
    | +-pos: vt
    | |
    | +-sense
    |   |
    |   +-d_translat
    |     +-argument: waltz etc
    |     +-word: danser
    +-homograph
      +-homnum: 3
      +-pos: vi
      +-sense
        |
        +-d_translat
        | +-context: person
        | +-context: leaves in wind
        | +-context: boat on waves
        | +-context: eyes
        | +-word: danser
        +-d_collocat
        | +-srcnote: fig
        | +-source: to dance in/out etc
        | +-target: entrer/sortir etc joyeusement
        |
        +-d_collocat
        | +-source: to dance about
        | +-source: to dance up and down
        | |
        | +-target: gambader
        | | +-c_collocat
        | |   +-source: to dance around
        | |   +-inflect: present
        | |   +-freq: 2%
        | |
        | +-target: sautiller
        |   +-c_collocat
        |     +-source: to dance round
        |     +-inflect: past
        |     +-freq: 2%
        |
        +-c_collocat
        | +-source: to dance to
        | +-argument: (the) tune
        | +-freq: 11%
        | +-target: se mettre au diapason
        | +-target: compléter le quatuor
        +-c_collocat
        | +-source: to dance around
        | +-freq: 8%
        | +-target: tourner autour du pot
        | +-target: aller et venir
        | ...
        |
        +-d_collocat
          +-source: the child danced away /or/ off
          +-target: l'enfant s'est éloigné en gambadant /or/ en sautillant
          ...
Figure Four: Fuller Enhanced Entry

It is not always the case that the remaining corpus data can be easily inserted in the lexicon, and in fact we encountered a few problems during this process. First, it is not straightforward to know with which field to associate the resulting correspondences. For example, for dance, does dance around go under a separate translation field, or is it related to the collocation field with dance about? Second, some new context fields should be added to the collocation nodes, but the criteria for selecting them automatically are not always evident. Further, there is the question of locating robust new data in the corpus and integrating it into the already existing structure.

6. Applications and Future Plans. A system such as BICORD can be used in two complementary ways: to enhance an MRD with statistical data and, conversely, to enhance a statistical system with data from an MRD. The first application can be viewed in the light of a lexicographer's workstation; it can also be viewed as a contribution to the choice of lexical item made by the component responsible for lexical transfer in a machine translation system. Translations and collocations in the original MRD are ordered by frequency, orderings which can easily be updated depending on the sub-language corpus. The enhanced MRD is more complete in containing correspondences not found in the original dictionary, and in suggesting new statistically significant translations. As for the second type of application, systems such as the one described in Brown et al. 1988, which use purely statistical approaches to infer translations from a bilingual corpus, can benefit directly from the information already given in the MRD. This information can be used to preset values in the computation of correspondences, rather than letting the system learn values already discovered.

Future work depends on testing these two applications, namely that MRD-based lexical transfer will proceed more accurately given statistical information, and that statistical implementations, given enhanced MRD data, will demonstrate improved performance in determining lexical correspondences.

Acknowledgements: We thank members of the Speech Recognition Group at IBM for cleaning and maintaining the Hansard corpus. In particular, we acknowledge help from Bernard Merialdo.

References

1. Atkins, B. T. (1987) "Semantic ID Tags: Corpus Evidence for Dictionary Senses", in Proceedings of the Third Annual Conference of the University of Waterloo Centre for the New Oxford English Dictionary: The Uses of Large Text Databases, Waterloo, Canada, pp. 17-36.

2. Atkins, B. T. S., B. Boguraev and J. L. Klavans (1990, in preparation) "From Machine-Readable Dictionaries to a Lexical Knowledge Base: a Discussion of Some Issues with Particular Reference to Verbs of Motion", in J. Pustejovsky (ed.), Semantics in the Lexicon, Kluwer, Dordrecht.

3. Boguraev, Bran and Ted Briscoe (1989) Computational Lexicography for Natural Language Processing, Longman: London.

4. Boguraev, Branimir, Byrd, Roy, Klavans, Judith, and Neff, Mary (1989, to appear) "From Structural Analysis of Lexical Resources to Semantics in a Lexical Knowledge Base", paper presented at IJCAI, to appear as a chapter in Lexical Acquisition: Using On-line Resources to Build a Lexicon, MIT Press, Uri Zernik, editor.

5. Brown, P., J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek, R. Mercer, and P. Roossin (1988) "A Statistical Approach to Language Translation", 4th Conference on Computational Linguistics (Coling), Budapest, Hungary.

6. Calzolari, N. and E. Picchi (1986) "A Project for a Bilingual Lexical Database System", Advances in Lexicology, Second Annual Conference of the UW Centre for the New Oxford English Dictionary, pp. 79-92.

7. Catizone, Robert, Graham Russell, and Susan Warwick (1989) "Deriving Translation Data from Bilingual Text", unpublished ms., ISSCO, Geneva, Switzerland.

8. Church, K. and P. Hanks (1989) "Word Association Norms, Mutual Information and Lexicography", Proceedings of the Association for Computational Linguistics, Vancouver, Canada.

9. Church, K., W. Gale, P. Hanks, D. Hindle (1990, to appear) "Parsing, Word Associations, and Typical Predicate-Argument Relations", in Zernik (ed.).

10. Collins Cobuild English Language Dictionary 
(1987), John Sinclair, ed. Collins Publishers: 
London. 

11. Collins (1978) Collins Robert French Dictionary: French-English, English-French. Collins Publishers: London.

12. Dagan, Ido and Alon Itai (1990) "Automatic Acquisition of Constraints for the Resolution of Anaphora Reference and Syntactic Ambiguities", unpublished ms., Computer Science Department, Technion, Haifa, Israel.

13. Garside, R., G. Leech, and G. Sampson, eds. (1987) Computational Analysis of English: a Corpus-based Approach, Longman: London and New York.

14. Jensen, Karen (1986) "PEG 1986: A Broad-coverage Computational Syntax of English", unpublished paper, IBM Research: Yorktown Heights, New York.

15. Klavans, J. L. (1988) "COMPLEX: A Computational Lexicon for Natural Language Systems", Proceedings of the 12th International Conference on Computational Linguistics, Budapest, Hungary.

16. Klavans, Judith and Evelyne Tzoukermann (1989) "Corpus-based Lexical Acquisition for Translation Systems", Proceedings of the Sixth Israeli Conference of Artificial Intelligence and Computer Vision, Tel Aviv, Israel.

17. Levin, Beth (to appear) "The Representation of Semantic Information in the Lexicon", in D. Walker, A. Zampolli, N. Calzolari, eds., Automating the Lexicon -- Research and Practice in a Multilingual Environment. Cambridge University Press: Cambridge, England.

18. Neff, M. S., R. J. Byrd, and O. A. Rizk (1988) "Creating and Querying Hierarchical Lexical Data Bases", Proceedings of the Second ACL Conference on Applied NLP, Austin, Texas, pp. 84-92.

19. Neff, M. and B. Boguraev (1989) "Dictionaries, Dictionary Grammars and Dictionary Entry Parsing", Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, Vancouver, British Columbia, pp. 91-101.

20. Rizk, O. (1989) "Sense Disambiguation of Word Translations in Bilingual Dictionaries: Trying to Solve the Mapping Problem Automatically", Master's Thesis, Courant Institute of Mathematical Sciences, New York University, N.Y.

21. Sadler, Victor (1989) "The Bilingual Knowledge Bank: A New Conceptual Basis for MT", unpublished paper, BSO/Research, Utrecht.

22. Talmy, Leonard (1985) "Lexicalization Patterns: Semantic Structure in Lexical Forms", in T. Shopen, ed., Language Typology and Syntactic Description: Grammatical Categories and the Lexicon. Cambridge University Press: Cambridge, England.

23. Tzoukermann, Evelyne and Bernard Merialdo (1989) "Some Statistical Approaches for Tagging Unrestricted Text", unpublished ms., IBM T. J. Watson Research Center, Yorktown Heights, New York 10532.

24. Zernik, Uri (1990, to appear) Lexical Acquisition: Using On-line Resources to Build a Lexicon, Lawrence Erlbaum Associates Incorporated: Hillsdale, New Jersey.
