On Some Aspects of Lexical Standardization 
Rémi Zajac
Computing Research Laboratory, New Mexico State University
zajac@crl.nmsu.edu
In developing and using many large multi-lingual, multi-purpose lexicons at CRL, we identified three distinct
problem areas: (1) an appropriate meta-language (formalism) for representing and processing lexical knowledge;
(2) a standard generic lexical framework defining a common lexical entry structure (names of features and types
of content); and (3) shared universal linguistic types. In this paper, we present the solutions developed at CRL
addressing dimensions 1 and 2, and we mention the on-going research addressing dimension 3.
1 Introduction 
We envisage the standardization of lexical resources as a three-dimensional process. In developing, processing and using
large multi-lingual and multi-purpose lexicons, a first set of difficulties lies in the lack of a standard format that is
flexible enough to cover many different languages and applications, but sufficiently rigid to enable the use of a single
lexical toolset shared across all these languages and applications. A standard formalism for encoding lexical knowledge
enables the construction of a generic lexical toolset. SGML has been used, for example, for printed dictionaries. For
computational dictionaries, feature structures are a good alternative (Véronis & Ide 92, Ide & Véronis 95). The second
set of problems is almost as acute as the first: it is very difficult to design a sound lexical architecture, list all the
features that must be present for a variety of NLP applications, predict the interaction between the various
sub-structures, and predict the needs of the various NLP tools that would be accessing the dictionary. A standard lexical
entry structure which defines the various features and provides guidelines to fill these features is a must for dictionary
builders. This level has been addressed, for example, in the Eagles program (Eagles 93), where it is sometimes mixed with
the third dimension. Finally, the problem of linguistic standards per se is addressed only partially by the definition of
guidelines and the use of a standard lexical entry structure. In a multilingual setting, it is probably possible to define
multilingual types, such as a standard list of parts of speech. However, this direction is still very much a research area
related to the quest for a universal grammar (see e.g. Cahill & Gazdar 95, 96). Current standardization efforts such as
Eagles define standards for content for particular languages only.
In Section 2, we present a generic lexical architecture that addresses point one: the generic structure of lexical entries
and dictionaries, the notions of lexical schema and meta-schema, and the generic lexical toolset. Section 3 presents the
standard structure of lexical entries that is used in structuring a number of computational dictionaries at CRL. The
standard structure is layered so that a particular dictionary can implement only a subset of the layers while still
implementing the standard. Furthermore, the structure is flexible enough that a given layer can be extended for a
particular language (by adding new elements through an inheritance mechanism), but it forbids the redefinition of the
lexical meta-structure. Section 4 mentions open problems and on-going research on the topic of a universal lexis and a
parameter-based approach to the acquisition of a lexical profile.
2 A Generic Lexical Architecture 
To support the development of large lexicons, we implemented a Lexical Knowledge Base (LKB) called Habanera
(Zajac 97). A Habanera LKB is composed of (1) several monolingual dictionaries, (2) translation relations linking
these monolingual dictionaries, and (3) a multilingual dictionary schema that defines a shared multilingual inheritance
hierarchy of lexical types for all monolingual dictionaries.
The system supports a variety of linguistic architectures. Since the design of a lexical architecture is a complex task,
flexibility in designing the structure of the LKB is an essential feature. This flexibility is provided by allowing for a
multi-layered LKB schema in which each layer provides additional constraints on the structure of a lexical entry. This
approach is congruent with the distinction made in (Eagles 93) between meta-schemata, schemata and instances. This
flexibility also means that the system is theory-neutral: one can use the LKB to store LFG, HPSG, or any kind of lexical
data.
Figure 1. Habanera Architecture (components labeled in the figure: LKB with Type Definitions, Schema and Data; Checker; Acquisition Tools; Browser; Compiler; HTML Templates)
These requirements motivated various initial choices for the design and the implementation of the system. We use the
Text Encoding Initiative (TEI) definition for printed dictionaries (Sperberg-McQueen & Burnard 94, Chap. 12) as a
source of inspiration for the definition of a standard dictionary entry structure (the definition of the 'meta-schema' in
Eagles terminology). However, lexical entries are encoded as Typed Feature Structures (TFS), which are our primary
descriptive device for encoding lexical data. Typed Feature Structures provide a declarative formalism with a
well-defined formal semantics (and associated operations: unification and subsumption), which we use instead of SGML to
encode lexical entries. A set of type definitions specifies what constitutes a valid lexical entry and plays a role similar
to a DTD in SGML. A type definition specifies the set of features and the restrictions on values for a type. Most of the
lexical tools are parametrized by the type definitions, which are part of the LKB schema.
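The two operations mentioned above, unification and subsumption, can be illustrated with a minimal sketch. It assumes untyped feature structures represented as nested Python dictionaries with string atoms; the function names and the sample structures are hypothetical and do not reproduce the CRL TFS engine.

```python
# Hypothetical sketch: untyped feature structures as nested dicts with string atoms.
# This illustrates the two operations only; it is not the CRL TFS engine.

class UnificationFailure(Exception):
    """Raised when two feature structures carry conflicting values."""


def unify(a, b):
    """Return the most general structure that contains the information of a and b."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            result[feature] = unify(a[feature], value) if feature in a else value
        return result
    if a == b:
        return a
    raise UnificationFailure(f"{a!r} conflicts with {b!r}")


def subsumes(general, specific):
    """Return True if all information in `general` is also present in `specific`."""
    if isinstance(general, dict):
        return isinstance(specific, dict) and all(
            f in specific and subsumes(v, specific[f]) for f, v in general.items()
        )
    return general == specific


noun = {"gram": {"pos": "N"}}
plural = {"gram": {"infl": {"number": "Plural"}}}
merged = unify(noun, plural)
print(merged)
print(subsumes(noun, merged))
```

Unifying two compatible structures accumulates their information, while conflicting atomic values make unification fail; a typed engine would additionally consult the type hierarchy at each node.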
Multilingual dictionaries are organized as a set of monolingual dictionaries plus translation relations between entries. In
the case of Knowledge-Based Machine Translation, relations are also defined between word senses and ontological
concepts. Dictionaries and lexical entries are stored in a commercial DBMS which allows concurrent access to a
dictionary, an important consideration when a dictionary is developed by a team of lexicographers. In the database, the
format of stored data is independent of the external representation formalism. All strings are encoded using Unicode
and we use UTF-8 for file exchange (import/export functions).
The system is designed to facilitate acquisition as well as exploitation of lexical resources. Acquisition tools are
implemented using HTML forms for the acquisition interfaces and additional integrated utilities for checking the
correctness of entries, for transcriptions, etc. These tools are parametrized by resources (e.g., HTML templates,
grammars for transcriptions) that are loaded at runtime. A dictionary can be accessed interactively through an HTML
browser (also parametrized by a set of HTML templates). Natural Language Processing tools such as parsers do not
access the database. Instead, a dictionary is compiled into a compact binary format that allows fast runtime access to
entries. The dictionary compiler can build several indexes to look up entries in the compiled dictionary. Runtime
indexes are compressed tries that provide random access to a compact binary dictionary file.
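As an illustration of a trie index over a compiled dictionary file, the following sketch maps entry keys to byte offsets. It is an uncompressed trie built from nested dictionaries, which is an assumption made for brevity; the class name, the offsets and the keys are hypothetical, and the actual Habanera index format is not described here.

```python
class TrieIndex:
    """Hypothetical, uncompressed trie mapping entry keys to file offsets."""

    def __init__(self):
        self.root = {}

    def add(self, key, offset):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node[""] = offset  # the empty string marks the end of a stored key

    def lookup(self, key):
        """Return the offset stored for key, or None if the key is absent."""
        node = self.root
        for ch in key:
            node = node.get(ch)
            if node is None:
                return None
        return node.get("")


index = TrieIndex()
index.add("arm", 0)
index.add("armament", 128)
print(index.lookup("armament"))
```

A runtime tool would seek to the returned offset in the binary file and decode the entry stored there; a compressed trie would merge single-child chains to save space.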
2.1 Dictionaries 
The linguist works with a source dictionary where each dictionary entry is structured as a set of sub-entries. An entry
can, for example, group together senses for the same lemma, different categories for the same form, different
lemmas in the same derivational family, etc. An entry has a unique key (a Unicode string) and a tree of sub-entries. At
each node of the tree, we attach a feature structure which encodes lexical information. The feature structure must follow
the type definitions specified in the dictionary schema. The tree of sub-entries defines an inheritance hierarchy.
Logically, only the leaves are actual entries: the compiler traverses the tree of sub-entries, computing inheritance, and
generates the compiled dictionary from the set of leaves.
Figure 2. A lexical entry as a tree of feature structures
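The traversal that pushes inherited information down to the leaves can be sketched as follows. The sketch assumes that a sub-entry's own features are overlaid on the inherited ones (a simplification; a TFS engine would rather unify them), and the names SubEntry, merge and leaves, as well as the sample entry, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SubEntry:
    """A node of the sub-entry tree carrying a feature structure (a nested dict)."""
    features: dict
    children: list = field(default_factory=list)


def merge(inherited, local):
    """Overlay local features on inherited ones, recursing into nested structures."""
    result = dict(inherited)
    for feature, value in local.items():
        if isinstance(value, dict) and isinstance(result.get(feature), dict):
            result[feature] = merge(result[feature], value)
        else:
            result[feature] = value
    return result


def leaves(node, inherited=None):
    """Yield the fully expanded feature structure of every leaf of the tree."""
    current = merge(inherited or {}, node.features)
    if not node.children:
        yield current
    else:
        for child in node.children:
            yield from leaves(child, current)


entry = SubEntry(
    {"key": "bank", "gram": {"pos": "N"}},
    [SubEntry({"def": "financial institution"}), SubEntry({"def": "river side"})],
)
expanded = list(leaves(entry))
print(expanded)
```

Each leaf ends up with the key and part of speech stated once at the root, which is the information a compiled dictionary needs to store explicitly.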
The dictionary schema contains various information useful for managing the dictionary: (1) The schema of entries,
specified using Typed Feature Structure definitions. (2) The schema of relations among entries, if any. A relation must
specialize the pre-defined Relation type; relations are used to describe synonymy, hyperonymy, etc. They are also
used to link several monolingual dictionaries to provide translations. (3) The set of macros, defining abbreviations for
complex feature structures. (4) The location of the key in the entry, which is used to build the primary dictionary index
(each entry has a unique key within a dictionary). (5) The language (as a 3-letter ISO code). (6) Additional indexes that
are maintained by the database engine for interactive look-up of entries. These indexes are specified as a set of paths in
an entry. (7) The name of the checker class and of the defaulter class.
We use (typed) feature structures to model entries and relations (Zajac 98, 92). Each type has a definition, similar to a
class definition in an object-oriented language: the definition of a type specifies the features allowed for that
type and the type of the value of each feature. Types are used to define the structure of entries, of relations
(links), and of lexical rules. Since types can be organized in an inheritance hierarchy, it is possible to define a common
framework for describing all dictionaries by defining a cross-language type hierarchy. This multilingual type hierarchy
specifies dictionary-dependent (that is, language-dependent) elements, such as the inventory of morphosyntactic
categories, by defining super-types that are common to two or more languages, thereby defining a multilingual
inheritance hierarchy of lexical types.
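The analogy between type definitions and class definitions can be made concrete with a small sketch of inherited feature declarations and a well-typedness check. The table layout, the Xref sub-type shown here and the helper names are assumptions made for illustration; only the dom and range features of Relation are taken from the text.

```python
# Hypothetical type table: type name -> (parent type, features declared by the type).
TYPES = {
    "Entry": (None, {"key": "String"}),
    "Relation": (None, {"dom": "Entry", "range": "Entry"}),
    "Xref": ("Relation", {"note": "String"}),
}


def features_of(type_name):
    """Collect the features of a type, including those inherited from its parents."""
    parent, own = TYPES[type_name]
    inherited = features_of(parent) if parent else {}
    return {**inherited, **own}


def well_typed(value, type_name):
    """Check that value uses only allowed features, each with a value of the right type."""
    if type_name == "String":
        return isinstance(value, str)
    if not isinstance(value, dict):
        return False
    allowed = features_of(type_name)
    return all(f in allowed and well_typed(v, allowed[f]) for f, v in value.items())


xref = {"dom": {"key": "arm"}, "range": {"key": "armament"}, "note": "Collective for arm"}
print(well_typed(xref, "Xref"))
print(well_typed(xref, "Relation"))
```

The same structure is accepted as an Xref but rejected as a plain Relation, because the note feature is only introduced by the sub-type.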
Only syntactically correct entries are stored in the database. However, some consistency checks escape both
the checking done by the parser and the type-checking mechanism provided by the Typed Feature Structure
engine. For example, all headwords must be written using the alphabet of the language, and other characters are not
allowed. This kind of check must be added specifically for each dictionary through the implementation of a checker
class that is used by the database before adding entries to a dictionary.
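A dictionary-specific checker of the kind described above might look like the following sketch for an English dictionary. The class name, the check method and the set of allowed characters are assumptions; the actual checker interface expected by the database is not specified in this paper.

```python
import string


class EnglishChecker:
    """Hypothetical checker: headwords may contain only ASCII letters, hyphen, apostrophe, space."""

    ALLOWED = set(string.ascii_letters) | {"-", "'", " "}

    def check(self, entry):
        """Return a list of error messages; an empty list means the entry is accepted."""
        key = entry.get("key", "")
        if not key:
            return ["missing key"]
        bad = sorted(set(key) - self.ALLOWED)
        return [f"illegal character {c!r} in headword {key!r}" for c in bad]


checker = EnglishChecker()
print(checker.check({"key": "arm"}))
print(checker.check({"key": "arm3"}))
```

Returning messages rather than raising an exception lets an acquisition tool report every problem in an entry at once.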
An optional defaulter can also be provided for a given dictionary: the defaulter analyzes a dictionary entry and applies
default rules to fill in missing information. For example, if a feature number with value Plural is filled for a noun, the
noun is an irregular plural; otherwise, it is a regular noun and the number feature is not further specified. Or, if the
dictionary specifies a gender only for feminine nouns, the defaulter might add a masculine gender when it is not
specified. Entries in the database might have such missing information. However, our Typed Feature Structure engine
does not provide defaults, and a runtime dictionary must explicitly include all the defaults: the defaulter is used by the
compiler to fill in default information and produce a compiled dictionary where all information is explicitly expanded.
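The gender example can be sketched as a small default rule. The sketch assumes that gender is stored next to pos under the path form morph lex, and that the values are called Masculine and Feminine; both are assumptions, as is the function name.

```python
import copy


def apply_gender_default(entry):
    """Return a copy of the entry in which nouns lacking a gender are marked Masculine."""
    result = copy.deepcopy(entry)
    lex = result.get("form", {}).get("morph", {}).get("lex")
    if lex is not None and lex.get("pos") == "N":
        lex.setdefault("gender", "Masculine")  # only fills the value when it is missing
    return result


unmarked = {"key": "livre", "form": {"morph": {"lex": {"pos": "N"}}}}
feminine = {"key": "table", "form": {"morph": {"lex": {"pos": "N", "gender": "Feminine"}}}}
print(apply_gender_default(unmarked))
print(apply_gender_default(feminine))
```

Working on a copy keeps the source entry in the database unchanged, so that only the compiled dictionary carries the expanded defaults.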
The compilation process is done as follows on each entry: (1) Apply dictionary-specific checks using the checker class
(if defined). (2) Apply the defaulter to augment the dictionary entry and resolve all the defaults. Note that the checker and
the defaulter work on the tree of sub-entries, not on individual feature structures. (3) Move all information down to the
leaves of the tree of sub-entries (compute inheritance). (4) Expand macro definitions. (5) Compile a feature structure for
each leaf of the sense tree. (6) Use type inference to infer the most specific type for each sub-feature structure within a
feature structure. (7) Type check the feature structures: in a feature structure, expand the types of all sub-feature
structures by unifying in the definition of the type.
Relationships between lexical entries are modeled using binary links (relations), used to describe synonymy relations,
derivation relations, translation relations (see Section 1.4), thesaurus relations, etc. Any relation defined in the
dictionary schema must inherit from the Relation type. Relations can be given an arbitrarily complex internal
structure and can bear information. A relation is formally defined as
    Relation = [dom   Entry,
                range Entry],
For example, in a relation that specifies a cross-reference defined freely by the lexicographer, the domain feature (dom)
points to the entry which is the source of the relation, and the target entry (range feature) is identified by providing
the key of that entry, as in
    #0=[key "arm",
        ...
        xref [dom #0,
              range [key "armament"],
              note "Collective for arm"]]
A dictionary browser could interpret these relations by generating hyperlinks between entries, for example. A dictionary
also contains rules which specify productive relations within an entry (see Section 1.3), among entries across multiple
dictionaries, or within a single dictionary (see Section 1.4). The type Relation is used in the definition of translation
relations, transfer rules and lexical rules: each of these is defined as a sub-type of Relation.
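How a browser could turn a cross-reference relation into a hyperlink is sketched below. The relation is represented as a nested dictionary mirroring the xref example; the URL scheme /entry/ and the function name are hypothetical and are not part of Habanera.

```python
from html import escape
from urllib.parse import quote


def xref_link(relation, base_url="/entry/"):
    """Render a cross-reference relation as an HTML anchor pointing at the target entry."""
    target = relation["range"]["key"]
    note = relation.get("note")
    title_attr = f' title="{escape(note)}"' if note else ""
    return f'<a href="{base_url}{quote(target)}"{title_attr}>{escape(target)}</a>'


relation = {
    "dom": {"key": "arm"},
    "range": {"key": "armament"},
    "note": "Collective for arm",
}
print(xref_link(relation))
```

The key of the target entry is percent-encoded for the URL and HTML-escaped for display, which matters once keys contain non-ASCII Unicode characters.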
2.2 Schema and meta-schema 
The Eagles guidelines on standardization of lexical resources (Eagles 93) introduce the distinction between (1) "The
meta-schema which defines general well-formedness conditions for the schema", (2) The schema, which "defines the logical
format of language-specific and level-wise linguistic descriptions", and (3) "Instances are the individual lexicons for
which there is a translation relation expressed between the individual format of the instance and the 'type' defined by
the schema".
Figure 3. A specific dictionary schema, e.g. a Persian-English dictionary, specializes the generic schema,
which is itself built on a hard-coded core lexical structure (labels in the figure: Generic Structure, Persian-English Schema)
In a Habanera lexical knowledge base, the only fixed structure is the tree of sub-entries; everything else is defined
via the dictionary schema. Using the Typed Feature Structure language developed at CRL, it is possible to define
dictionary schemata using several layers of abstraction, thereby introducing arbitrary intermediate layers between the
meta-schema and the schema proper. In this TFS language, sets of type definitions are grouped into modules and
sub-modules (a notion similar to the notion of package in programming languages such as Lisp or Java). The use of modules
makes it possible to structure a schema as a set of modules introducing additional structures and more specific
constraints on the format of an instance. In the next section, we will present the lexical structure which is used in CRL
dictionaries. The schemata of dictionaries are organized as follows. A generic module defines the generic structure of a
dictionary. Language-specific modules add to that specification language-dependent information (e.g. a specific inventory
of morphosyntactic features) that is grafted on the generic structure or that specializes the generic structure. The
generic structure has been inspired by the TEI definition and is presented in Section 3.
The set of type definitions specified in the dictionary schema is used by the type-checker, which checks that a dictionary
entry is well-typed, and by the compiler, which builds a compact binary representation of a dictionary entry as a feature
structure.
2.3 Tools 
The dictionary browser and editor are parametrized by a set of HTML templates which define the presentation format to
be used for displaying feature structures at each level of the tree of sub-entries. The mapping of the structure of an entry
to an HTML template relies on naming conventions based on the value of paths to name HTML elements in an HTML
document.
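One possible reading of this naming convention is sketched below: each placeholder element in a template carries an id equal to a feature path, and the browser fills it with the value found at that path. The dotted path syntax, the use of span elements and the function names are assumptions made for this sketch, not the actual Habanera conventions.

```python
import re
from html import escape


def get_path(fs, path):
    """Follow a dotted feature path in a nested dict; return None if it does not exist."""
    node = fs
    for feature in path.split("."):
        if not isinstance(node, dict) or feature not in node:
            return None
        node = node[feature]
    return node


def fill_template(template, fs):
    """Fill every empty <span id="PATH"></span> with the value found at PATH."""
    def replace(match):
        value = get_path(fs, match.group(1))
        text = "" if value is None else escape(str(value))
        return f'<span id="{match.group(1)}">{text}</span>'

    return re.sub(r'<span id="([\w.]+)"></span>', replace, template)


template = '<b><span id="form.orth.cit"></span></b> (<span id="gram.pos"></span>)'
entry = {"form": {"orth": {"cit": "arm"}}, "gram": {"pos": "N"}}
print(fill_template(template, entry))
```

Because the template only names paths, the same browser code can display any dictionary whose schema provides those paths.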
Figure 4. A Habanera Browser for a Persian-English dictionary
Since most Web browsers do not support input methods for languages other than English, input of character strings is
done using a transcription. A set of transcription tables can be defined by the user and selected in the browser when
inputting a character string, e.g. for headwords. However, Web browsers support the display of almost any major
language¹ and Unicode strings can be directly embedded in HTML documents. Habanera also provides import/export
functions. The format of a dictionary file uses a textual syntax for feature structures (the one used in the examples). The
dictionary file encoding is UTF-8.
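A user-defined transcription table can be applied with a greedy longest-match scan, as in the sketch below. The table shown (a few Latin-to-Cyrillic correspondences) and the function name are hypothetical; the format of Habanera transcription tables is not given in this paper.

```python
# Hypothetical transcription table: ASCII input sequences -> target-script characters.
TABLE = {"sh": "ш", "s": "с", "a": "а", "p": "п", "k": "к"}


def transcribe(text, table):
    """Convert text with a greedy longest-match lookup in the transcription table."""
    longest = max(map(len, table))
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            out.append(text[i])  # characters without a mapping pass through unchanged
            i += 1
    return "".join(out)


print(transcribe("shapka", TABLE))
```

Trying the longest sequence first ensures that a digraph such as "sh" is not split into the separate mappings for "s" and "h".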
3 Standardizing the Structure of Lexical Entries 
The dictionaries developed at CRL share the same generic structure. Each language-specific dictionary refines the
shared schema by adding language-specific information (e.g., a specific inventory of morphosyntactic features). The
data of a monolingual dictionary is a set of entries corresponding to word senses as described in (Meyer et al. 1990) and
(Onyshkevych and Nirenburg, 1994). We distinguish between computational features that are used by NLP components
such as parsers (form, gram, sem, synSem, trans, rel, lexRule, usg)² and other features that are used by
lexicographers: definition (def), example (eg), etymology (etym), cross-reference (xref) and note (note). The
features present for each sub-entry are
    EntryElements = [
        form    Form,
        gram    Grammar,
        sem     TMR,
        synSem  SynSemMap,
        trans   Translations,
        rel     LexicalRelations,
        lexRule LexicalRules,
        usg     Usage,
        def     String,
        eg      Example,
        etym    String,
        xref    Xref,
        note    String],
The computational features used by NLP components are the following:
1. form: information related to the orthographic form of the word and its morphology (includes morphological
features and morphological variants);
2. gram: information related to the syntactic behavior of the word (includes POS and subcategorization information);
3. trans: a cross-reference to one or more entries in a target dictionary;
4. sem: semantic mapping to a conceptual structure;
5. synSem: information on syntax-semantic linking;
6. rel: information on paradigmatic (synonyms, antonyms, ...) and syntagmatic (collocations, co-occurrences, ...)
relations;
7. lexRule: specification of productive lexical relations among entries within a dictionary (e.g., productive
morphological derivations);
8. usg: restrictions on the usage of some word (domain, geographical, temporal, ...).
¹ With the important exception of Arabic-based scripts.
² The names of most features are taken from the TEI specification.
In the remainder of this section, we present the structure of the form and gram features (see Zajac et al. 98 for a
description of other features).
3.1 Orthography and Morphology 
The form feature records information about the type of word: whether the word is a full word, an acronym, or an
abbreviation. These types are introduced since acronyms and abbreviations are typically processed differently from
ordinary words, for example during a tokenization phase (see e.g. Grefenstette 94), whereas words or compounds are
processed during or after morphological analysis: the dictionary compiler will produce different runtime dictionaries
that include different kinds of information as needed by the various components of the system.
The orthography feature records the citation form of the word as well as a list of variants. There could also be additional
information such as capitalization, hyphenation or syllabification (useful information for an English morphological
analyzer, for example).
The morphology feature records three different kinds of information: morphological information that is attached to the
word and stored in the lexicon (e.g., gender information), inflectional information that is typically computed by a
morphological analyzer (and passed to the syntactic analyzer), and derivational information that could be either
pre-computed in the lexicon or dynamically computed by a morphological analyzer. In our lexical model, we require that
each dictionary include as lexical morphological information the part-of-speech (using the pos feature) and an
indication of whether the word has a regular morphology or not (using the Boolean regular feature).
Irregular forms are listed in the dictionary if the value of the regular feature is False. This feature is provided to
handle simple cases where a given class of words has only one inflectional paradigm. English nouns, for example, can be
defined as having only one paradigm for the number inflection, where phonological variants are handled by the
morphological processor and anything that falls outside the domain of the morphological processor is treated as an
irregular form. Note that the dictionary schema must allow for the inclusion of all inflected forms for irregulars.
If the linguist has to define inflectional paradigms, as is the case in many languages, these paradigms must also be
specified in the dictionary schema and should allow for the specification of the various stems involved. For example, one
might consider that English verbs have two paradigms, one where all forms are derived from the citation form (want,
wants, wanted, wanted, wanting) modulo phonological changes, and one where some forms must be specified in the
lexicon (take, takes, took, taken, taking), plus a class of irregulars (be, is, was, been, being). English verbs
could therefore be classified as regular or irregular, and regular verbs fall into one of two paradigms. The reader will
have noticed that the morphological model used in the lexicon must be compatible with the model implemented by any
morphological processor using the dictionary. Our experience has shown that it is not always trivial to reconcile a
morphological analyzer developed independently from a dictionary with the dictionary.
The structure of the form feature must therefore include the following elements:
    [type Full | Abbreviation | Acronym,
     orth [cit String,                      // The citation form
           variants List],                  // Optionally, syllabification, capitalization, etc.
     morph [lex [pos POS,
                 regular Boolean],
            infl InflectionalFeatures,      // Always unspecified in the dictionary
            deriv DerivationalStructure]]
For example, the form structure of an English entry might look like
    #0=[key #k="bring",
        form orth exp #k,
        sense #1=[
            morph [
                lex [
                    pos eng Type MainVerb,
                    regular True,
                    paradigm 1,
                    simplePast "brought",
                    pastParticiple "brought"]]]]
where the features for inflectional and derivational information are left unspecified.
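The division of labor between lexicon and morphological processor for English verbs can be sketched as follows: forms listed in the lexicon (such as simplePast and pastParticiple) take precedence, and the remaining forms are derived from the citation form. The two spelling rules used here are a crude stand-in for the phonological changes handled by a real morphological processor, and all function names are hypothetical. Fully irregular verbs such as "be" are outside the scope of the sketch.

```python
def regular_past(cit):
    """Derive a past form from the citation form (crude rule: add -d after e, else -ed)."""
    return cit + ("d" if cit.endswith("e") else "ed")


def present_participle(cit):
    """Derive the -ing form, dropping a final silent e (take -> taking)."""
    if cit.endswith("e") and not cit.endswith("ee"):
        return cit[:-1] + "ing"
    return cit + "ing"


def verb_forms(cit, simple_past=None, past_participle=None):
    """Return the verb forms, preferring forms listed in the lexicon over derived ones."""
    derived = regular_past(cit)
    return {
        "base": cit,
        "thirdSingular": cit + "s",
        "simplePast": simple_past or derived,
        "pastParticiple": past_participle or derived,
        "presentParticiple": present_participle(cit),
    }


print(verb_forms("want"))
print(verb_forms("bring", simple_past="brought", past_participle="brought"))
```

An entry such as the one for "bring" above therefore only needs to list the two forms that cannot be derived, which is the information carried by the simplePast and pastParticiple features.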
3.2 Syntax 
The gram feature groups all information related to the syntactic behavior of the word. The grammar feature gram
contains as required features the part-of-speech information (feature pos) and the subcategorization frame (feature
frame). The frame feature encodes the subcategorization frame of the predicate expressed as a list of phrasal types.
The grammar feature may include additional features such as the subcategory, for example Mass/Countable for nouns
or Intransitive/Transitive for verbs, although this is typically better represented by defining the appropriate sub-types for
each part-of-speech. Additionally, an inflectional feature infl is also defined for use by syntactic processors: the value
of this feature is shared with morphology. During processing, a morphological analyzer will produce a set of inflectional
features and make them available to syntax through the feature path gram infl. Conversely, a syntactic generator will
produce a set of inflectional features for lexical heads and make them available to the morphological generator.
The Grammar feature (path gram in an entry) has type Gram. This type is defined as
    Gram = [pos   POS,
            frame List,
            infl  MorphInflection],
For example, the following (partial) entry specifies two subcategorization frames for the noun "announcement":
    [key "announcement",
     gram [pos N,
           sense gram frame <NpComp[head "that"]>,
           sense gram frame <NpObl[head "of"]>]]
4 Conclusion 
Standardizing lexicons represents an interesting intellectual and practical endeavor. Past experience at CRL in
developing, processing and using many large lexicons for several tasks, including machine-translation systems,
machine-aided translation tools, and information processing systems, shows that a first set of difficulties lies in the lack
of a standard format that is flexible enough to cover many different languages and applications, but sufficiently rigid to
enable the use of a single lexical toolset shared across all these languages and applications. These problems have been
addressed by developing a generic dictionary software architecture that is now used to manage several large dictionaries
designed for machine-aided translation as well as for machine translation.
The second set of problems is almost as acute as the first. It is very difficult to start designing a sound lexical
architecture from scratch, list all the features that must be present for a variety of NLP applications, predict the
interaction between the various sub-structures, and predict the needs of the various NLP tools that would be accessing
the dictionary. This has been done many times at CRL and this knowledge is in part incorporated in the generic standard
lexical structure briefly presented in Section 3. When developing a new dictionary, the linguist must use a pre-defined
dictionary entry structure and follow a set of guidelines for defining the language-specific features. This guarantees that
the dictionary can be developed and maintained using a standard dictionary management toolset, and that the
information contained in the dictionary can actually be used for a variety of NLP applications whose requirements are
not always obvious to a non-expert. The construction of such a standard lexical structure is still, however, an open task:
some areas are defined with more precision than others. We started from a fairly unconstrained structure, and
cross-language work brought out commonalities that have been progressively incorporated in the standard structure. Although
the standard structure presented in this paper has now been stable for over a year, further research and experimentation
could yield new constraints that could be incorporated in the architecture.
Finally, the problem of linguistic standards per se is addressed only partially by the definition of guidelines and the use
of a standard lexical entry structure. In particular, a standard entry structure imposes a specific organization of the
linguistic information encoded in an entry. It defines the kind of linguistic information to be encoded and how to
structure this information. In a multilingual setting, it is probably possible to define multilingual types, such as a
standard list of parts of speech. However, our experience with more than 6 different languages shows that trying to establish
a set of multilingual types is not worth the effort: the use of a standard lexical structure allows the linguist to narrow
down rapidly on the inventory of language-specific types, which can then be listed with relative ease. Standardization of
lexical content is still very much an open problem, and this research area is related to the quest for a universal grammar. In
the Boas project (Nirenburg & Raskin 98), the linguist defines language-specific properties using a knowledge
elicitation system that contains knowledge about the set of possible linguistic parameters and values. The linguist is
guided through a set of queries and answers, the result of which is a linguistic profile of a language. From this language
profile, the goal is to generate automatically the set of language-specific lexical properties that define the schema of a
dictionary.

References 

Lynn Cahill and Gerald Gazdar 1995. "Multilingual lexicons for related languages". In Proceedings of the 2nd DTI
Language Engineering Conference, pp. 169-176.

Lynn Cahill and Gerald Gazdar 1996. "Multilingual Lexicons for Related Lexicons". In Proceedings of AISB'96
Workshop on Multilinguality in the Lexicon, Brighton, UK, pp. 69-75.

Eagles 1993. "EAGLES Lexicon Architecture". EAGLES Document EAG-CLWG-LEXARCH/B.

Gregory Grefenstette 1994. "What is a Word, What is a Sentence? Problems of Tokenization". Rank Xerox
Research Center, Technical Report MLTT-004, April 1994.

Ide, N., Véronis, J. (1995). "Encoding dictionaries". In Ide, N., Véronis, J. (Eds.) (1995) The Text Encoding
Initiative: Background and Context. Kluwer Academic Publishers, Dordrecht, 167-179.

Sperberg-McQueen, C. M., Burnard, L. 1994. Guidelines for Electronic Text Encoding and Interchange, Text
Encoding Initiative, Chicago and Oxford, Chapter 12, "Print Dictionaries", 312-370.

Meyer, I., B. Onyshkevych and L. Carlson 1990. "Lexicographic principles and design for knowledge-based
machine translation". Technical report CMT-CMU-90-118, Carnegie Mellon University, August 13, 1990.

Sergei Nirenburg and Victor Raskin 1998. "Universal Grammar and Lexis for Quick Ramp-Up of MT Systems".
Proc. of the 17th International Conference on Computational Linguistics - COLING'98, 10-14 August 1998,
Montreal, Canada, pp. 975-979.

Onyshkevych, Boyan, and Sergei Nirenburg 1994. The lexicon in the scheme of KBMT things. Memoranda in
Computer and Cognitive Science, MCCS-94-277. Las Cruces, N.M.: New Mexico State University. Reprinted as
A lexicon for knowledge-based machine translation, in Dorr and Klavans 1995 (eds.), 5-57.

Jean Véronis and Nancy Ide 1992. "A feature-based model for lexical databases". Proc. of the 14th International
Conference on Computational Linguistics - COLING'92, August 23-28 1992, Nantes, France, pp. 588-594.

Rémi Zajac 1992. "Inheritance and Constraint-Based Grammar Formalisms". Computational Linguistics 18/2,
Special Issue on Inheritance, June 1992.

Rémi Zajac 1997. "Habanera, a Multipurpose Multilingual Lexical Knowledge Base". NLPRS Workshop on
Multilingual Information Processing, Natural Language Processing Pacific Rim Symposium 1997, 1-4 December
1997, Phuket, Thailand.

Rémi Zajac, Evelyne Viegas and Svetlana Sheremetyeva 1998. "The Generic Structure of a Lexical Knowledge
Base Entry". Ms., Computing Research Laboratory, New Mexico State University.
