An Overview of the EDR Electronic Dictionary and the Current Status of Its Utilization
Hideo Miyoshi, Kenji Sugiyama, Masahiro Kobayashi, and Takano Ogino 
Japan Electronic Dictionary Research Institute, LTD. (EDR) 
Daini-Abe Bldg., 78-1, Kanda-Sakumagashi, Chiyoda-ku, Tokyo 101, Japan 
{miyoshi, kenji, kobayasi, ogino}@edr.co.jp
Abstract 
In this paper we present the specification and structure of
the EDR Electronic Dictionary, which was developed in a
nine-year project. The first version of the EDR dictionary
(V1.0) and its revised version (V1.5) have already been
released and are now in use at many sites for both academic
and commercial purposes. We also describe the current status
of how the EDR dictionary is utilized. Finally, we outline
the new R&D project that EDR will launch in fiscal 1996.
1 Introduction 
The EDR Electronic Dictionary [1,2,3] is the result of a
nine-year project (from fiscal 1986 to fiscal 1994) funded by
the Japan Key Technology Center and eight computer
manufacturers*. The project aimed at establishing an
infrastructure for advanced processing of natural language by
computers and for knowledge information processing.
The features of the EDR Electronic Dictionary can be 
summarized as follows: 
(1) A large scale that covers all the vocabulary used in 
ordinary writing 
(2) Aimed at general purpose applications without bias 
towards a particular application system or algorithm 
(3) Provided with the knowledge base required for true 
semantic analysis 
(4) A high degree of objectivity based on large volumes 
of text 
(5) Fundamental content that is highly generalized 
across different languages and fields 
The EDR Electronic Dictionary, which is composed of eleven
sub-dictionaries, catalogues the lexical knowledge of Japanese
and English (the Word Dictionary, the Bilingual Dictionary,
and the Co-occurrence Dictionary), and has unified
thesaurus-like concept classifications (the Concept
Dictionary) with corpus databases (the EDR Corpus). The
Concept Classification Dictionary, a sub-dictionary of the
Concept Dictionary, describes the similarity relation among
concepts listed in the Word Dictionary. The EDR Corpus is the
source for the information described in each of the
sub-dictionaries. The basic approach taken during the
development of the dictionary was to avoid commitment to a
particular linguistic theory and to allow for adaptability to
various applications.
* Fujitsu, Ltd., NEC Corporation, Hitachi, Ltd., Sharp Corporation,
Toshiba Corporation, Oki Electric Industry Co., Ltd., Mitsubishi
Electric Corporation, and Matsushita Electric Industrial Co., Ltd.
The first version of the EDR dictionary (V1.0) and its
revised version (V1.5) have already been released and are now
in use at many sites for both academic and commercial
purposes. This paper outlines the specification of the EDR
Electronic Dictionary and describes the current status of
its utilization.
2 The Structure of the EDR Electronic 
Dictionary 
The EDR Electronic Dictionary is composed of five types of
dictionaries (Word, Bilingual, Concept, Co-occurrence, and
Technical Terminology), as well as the EDR Corpus.
EDR Electronic Dictionary
  Word Dictionary
    Japanese Word Dictionary
    English Word Dictionary
  Bilingual Dictionary
    Jpn.-Eng. Bilingual Dictionary
    Eng.-Jpn. Bilingual Dictionary
  Concept Dictionary
    Headconcept Dictionary
    Concept Classification Dictionary
    Concept Description Dictionary
  Co-occurrence Dictionary
    Japanese Co-occurrence Dictionary
    English Co-occurrence Dictionary
  Technical Terminology Dictionary
    Jpn. Technical Terminology Dictionary (Information Processing)
    Eng. Technical Terminology Dictionary (Information Processing)
    Others (Concept Classification, Bilingual Dictionary,
      Co-occurrence Data, etc.)
  EDR Corpus
    Japanese Corpus
    English Corpus
The Japanese Word Dictionary contains 250,000 words, 
and the English Word Dictionary contains 190,000 words. 
The Bilingual Dictionary lists the correspondences be- 
tween headwords in the different languages. The Japanese- 
English Bilingual Dictionary contains 230,000 words, and 
the English-Japanese Bilingual Dictionary contains 
190,000 words. 
The Concept Dictionary contains information on the 400,000
concepts listed in the Word Dictionary and is divided
according to information type into the Headconcept
Dictionary, the Concept Classification Dictionary, and the
Concept Description Dictionary. The Headconcept Dictionary
describes information on the concepts themselves. The Concept
Classification Dictionary describes the super-sub relations
among the 400,000 concepts. The Concept Description
Dictionary describes the semantic (binary) relations, such as
'agent,' 'implement,' and 'place,' between concepts that
co-occur in a sentence.
The Co-occurrence Dictionary describes collocational
information in the form of binary relations. The Japanese
Co-occurrence Dictionary contains 900,000 phrases, and the
English Co-occurrence Dictionary contains 460,000 phrases.
The Technical Terminology Dictionary covers the field of
information processing, and is split into four types of
dictionaries: Word, Bilingual, Concept (Classification), and
Co-occurrence.
The linguistic data which the EDR Corpus contains has been
obtained by collecting a large number of example sentences
and analyzing them on morphological, syntactic, and semantic
levels. The Japanese Corpus contains 220,000 sentences, and
the English Corpus contains 160,000 sentences.
3 Role of Each Dictionary 
This section describes the roles of the major subdictionaries
of the EDR Electronic Dictionary and shows some examples.
3.1 Word Dictionary 
The role of the Word Dictionary is to provide part of the
information on the morphological, syntactic, and semantic
levels that is required for natural language processing.
Morphological information relates to the headword (morpheme)
and to information on the connectivity of morphemes. This is
used in morphological analysis to find the morphemes, and
also in morphological generation to produce output sentences.
Information on the syntactic level includes parts of 
speech as well as surface case information and other 
grammatical attributes. This information is used in syn- 
tactic analysis and generation, and provides the basis for 
the formulation of parsing rules and production rules. 
Semantic information includes concept identifiers.
Headconcepts and concept explications are provided as
accompanying information. The concept identifier is a
numerical expression and the basic constituent of the Concept
Dictionary. The headconcept is a representative word that is
the most appropriate for expressing the concept identified by
the concept identifier. The concept explication is an
explanation written in natural language for the purpose of
assisting humans in differentiating one concept from another.
Every Word Dictionary record has a concept identifier to link
the Word Dictionary and the Concept Dictionary.
The following is an example of an English Word Dictionary
record:
Headword: dog
Connectivity: ELN1, ECN1
Part of Speech: EN1 (common noun)
Grammatical Attributes: ECN1; ENSG, ENC; ENNE
Concept ID: 3dbc67
Headconcept: dog
Concept Explication: an animal called dog
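As a rough illustration of how such a record might be held in memory, the following Python sketch models a few of the fields shown above and groups records by headword. The class name, the field names, and the second "dog" record (including its concept identifier and part-of-speech label) are assumptions made for this example; they do not reproduce the actual EDR file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordRecord:
    """A simplified, hypothetical view of one Word Dictionary record."""
    headword: str
    part_of_speech: str
    concept_id: str          # hexadecimal concept identifier
    headconcept: str
    concept_explication: str


def index_by_headword(records):
    """Group records by headword; a polysemous word yields several records."""
    index = {}
    for record in records:
        index.setdefault(record.headword, []).append(record)
    return index


records = [
    # Values taken from the example record in the text.
    WordRecord("dog", "EN1", "3dbc67", "dog", "an animal called dog"),
    # Hypothetical second sense, invented for illustration only.
    WordRecord("dog", "verb (hypothetical label)", "0a0001", "follow",
               "to follow someone closely"),
]

index = index_by_headword(records)
# Each record carries the concept identifier that links it to the
# Concept Dictionary.
concept_ids = [record.concept_id for record in index["dog"]]
print(concept_ids)
```

Keeping one record per (headword, concept) pair mirrors the statement that every Word Dictionary record has a concept identifier.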
3.2 Bilingual Dictionary 
The Bilingual Dictionary is designed to give appropriate
correspondence words for the headwords contained in the Word
Dictionary, for use in machine processing. The headword
information of the Bilingual Dictionary is a subset of that
in the Word Dictionary, that is, headword notations, parts of
speech, concept identifiers, headconcepts, and concept
explications. The concept identifiers and concept
explications are used to identify the meaning of polysemous
headwords. Some of the correspondence words include
additional information that describes the constraints under
which the correspondence words are used.
The following is an example of an English-Japanese Bilingual
Dictionary record:
Headword : dog 
Part of Speech: EN1 (common noun)
Concept_ID : 3dbc67 
Headconcept : dog 
Concept Explication: an animal called dog 
Correspondence Word: 
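The use of concept identifiers to pick out the right sense of a polysemous headword can be sketched as a lookup keyed by headword and concept identifier. Everything in this Python sketch is hypothetical: the table layout, the function name, the concept identifiers, and the sample entries for "bank" are assumptions chosen for illustration and are not taken from the EDR data.

```python
from typing import NamedTuple, Optional


class Correspondence(NamedTuple):
    """A correspondence word with an optional usage constraint."""
    word: str
    constraint: Optional[str] = None


# Hypothetical entries: the concept identifiers below are invented.
BILINGUAL = {
    ("bank", "aaa001"): [Correspondence("銀行")],
    ("bank", "aaa002"): [Correspondence("土手", constraint="of a river")],
}


def correspondences(headword, concept_id, table=BILINGUAL):
    """Return the correspondence words for one sense of a headword."""
    return table.get((headword, concept_id), [])


for entry in correspondences("bank", "aaa002"):
    print(entry.word, entry.constraint)
```

Keying on the pair rather than on the headword alone reflects the role the text gives to concept identifiers in separating the meanings of a polysemous headword.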
3.3 Concept Dictionary 
The role of the Concept Dictionary is to provide the data 
required for computer processing of the semantic con- 
tents or the concepts, expressed in natural language sen- 
tences, such as: 
(1) Generating appropriate semantic representations for 
sentences 
(2) Determining the similarity (equivalence) of seman- 
tic contents 
(3) Converting a semantic content into a similar 
(equivalent) content 
For this reason, the Concept Dictionary contains three 
types of subdictionaries: Headconcept Dictionary, Con- 
cept Classification Dictionary, and Concept Description 
Dictionary. In the Concept Dictionary, each concept is 
uniquely identified by a concept identifier which is a 
hexadecimal number. The Headconcept Dictionary contains the
concept identifier, the headconcept, and the concept
explication. The headconcept is a word whose meaning is close
to the meaning of the concept. The concept explication is an
explanation which expresses the meaning of the concept. The
Concept Classification Dictionary contains the set of pairs
of concepts that have a super-sub (is_a) relation. For
example, the super-concepts of 'school' are 'organization,'
'building,' and 'function.' The sub-concepts of 'school' are
'elementary school,' 'university,' and so forth. The Concept
Description Dictionary contains the set of pairs of concepts
that have certain semantic relations other than super-sub
relations. The following eight semantic relations are used:
object agent goal implement 
a-object place scene cause 
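The two kinds of concept pairs described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: concepts are represented by their headconcept strings rather than by real concept identifiers, the function names are hypothetical, and the sample description triple is illustrative only. The is_a pairs come from the 'school' example in the text.

```python
from collections import defaultdict

# (sub-concept, super-concept) pairs from the 'school' example.
IS_A = [
    ("school", "organization"),
    ("school", "building"),
    ("school", "function"),
    ("elementary school", "school"),
    ("university", "school"),
]

# The eight semantic relations listed in the text.
RELATIONS = {"object", "agent", "goal", "implement",
             "a-object", "place", "scene", "cause"}


def build_super_index(pairs):
    """Map each concept to the set of its direct super-concepts."""
    supers = defaultdict(set)
    for sub, sup in pairs:
        supers[sub].add(sup)
    return supers


def ancestors(concept, supers):
    """Collect all direct and indirect super-concepts of a concept."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for sup in supers.get(current, ()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def make_description(concept_a, relation, concept_b):
    """Build a Concept Description style triple, checking the relation."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown semantic relation: {relation}")
    return (concept_a, relation, concept_b)


supers = build_super_index(IS_A)
print(sorted(ancestors("university", supers)))
# ['building', 'function', 'organization', 'school']

# Illustrative triple only; not taken from the EDR data.
print(make_description("follow", "agent", "dog"))
```

Because a concept may have several super-concepts, the classification forms a graph rather than a tree, which is why the traversal tracks the concepts it has already visited.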
Figure 1. Relationships between sections of EDR Electronic Dictionary
(Diagram not reproduced. Recoverable labels: Concept Dictionary nodes <physical object>, <movement>, <spatial movement>, <person>, <follow>, and <dog>, the relation label "agent", and the English Word, Japanese Word, Bilingual, and Co-occurrence Dictionaries.)
Table 1: Number of User Sites of the EDR Electronic Dictionary
Category                                             No. of User Sites
University (national and public)                     48
University (private)                                 18
Government institution, private company, overseas    1, 3, 23 (column assignment not recoverable)
Total                                                93
3.4 Co-occurrence Dictionary 
The Co-occurrence Dictionary describes the types of word
combinations used to construct a sentence, that is,
collocational information. This type of information is used
to select the appropriate correspondence words in machine
translation.
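One way collocational information could guide the choice of a correspondence word is sketched below in Python. The sketch assumes that co-occurrence data can be reduced to counts over word pairs; the table contents, the counts, and the function name are hypothetical and serve only to illustrate the idea, not to describe how EDR users implement selection.

```python
# Hypothetical co-occurrence counts for (word, context word) pairs.
COOCCURRENCE = {
    ("strong", "tea"): 12,
    ("powerful", "engine"): 9,
    ("strong", "engine"): 2,
}


def select_correspondence(candidates, context_word, table=COOCCURRENCE):
    """Pick the candidate that co-occurs most often with the context word.

    Returns None when no candidate is attested with the context word.
    """
    scored = [(table.get((candidate, context_word), 0), candidate)
              for candidate in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score > 0 else None


print(select_correspondence(["powerful", "strong"], "tea"))
print(select_correspondence(["strong", "powerful"], "engine"))
```

Returning None for unattested combinations leaves the decision to other knowledge sources instead of forcing a guess.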
3.5 EDR Corpus 
The EDR Corpus is composed of the record number, sentence
information, constituent information, morphological
information, syntactic information, and semantic information.
The basic role of the EDR Corpus is first to identify the
constituents of sentences, and then to indicate, using a
large number of actual examples, how the constituents combine
to form the morphological, syntactic, and semantic structure
of the sentence. The data in the Concept Description
Dictionary and the Co-occurrence Dictionary is extracted from
the EDR Corpus.
These subdictionaries are not independent, but are
organically connected (Figure 1).
4 The Current Status of Utilization 
As mentioned in Section 1, we released the first CD-ROM
version of the EDR Electronic Dictionary (V1.0) in April 1995
after the nine-year R&D project. It is now being utilized at
many sites for both academic and commercial purposes (Table
1). In fiscal 1995, further refinements and improvements were
made, and the revised version (V1.5) has been available since
April 1996. One of the users, Fujitsu, released a commercial
product using the EDR Electronic Dictionary in 1995. The
product is called "Denjikai for Windows V2.0," which
retrieves word information from various dictionaries
including the EDR Electronic Dictionary.
5 Conclusion and Future Work 
A number of dictionaries are currently being developed under
the name of electronic dictionaries (machine-readable
dictionaries). These dictionaries consist of information from
published dictionaries that has been stored on a recording
medium and can then be referred to and used by mechanical
means. However, these electronic dictionaries are referred to
and used by people, unlike true electronic dictionaries
(machine-tractable dictionaries), which in the strict sense
are intended for use in machine processing. True electronic
dictionaries are not simply machine-readable editions of
dictionaries for use by people. They must include all the
information necessary for a computer to understand a natural
language. We think that the EDR Electronic Dictionary
satisfies those conditions and hope that it will be widely
used for various natural language processing applications.
Finally, we would like to make a short remark on the new
project which EDR will launch in fiscal 1996. The new project
will be funded by the Information Technology Promotion Agency
(IPA) of Japan and will be carried out in conjunction with
Tokyo Institute of Technology and Tokyo University. The
objective of the project will be the creation of software
that will allow the linguistic knowledge base to expand
automatically by feeding the output of analyzed text into the
knowledge base itself. We hope this will help refine and
extend the EDR Electronic Dictionary.
References 
[1] EDR, Proceedings of the International Workshop on
Electronic Dictionaries, EDR TR-031, 1991.
[2] EDR, EDR Electronic Dictionary Version 1 Technical
Guide, EDR TR2-003, 1995.
[3] EDR, Summary for the EDR Electronic Dictionary
Version 1 Technical Guide, EDR TR2-005, 1995.
