Acquiring an Ontology for a Fundamental Vocabulary
Francis Bond* and Eric Nichols** and Sanae Fujita* and Takaaki Tanaka*
* {fbond,fujita,takaaki}@cslab.kecl.ntt.co.jp ** eric-n@is.naist.jp
* NTT Communication Science Laboratories,
Nippon Telegraph and Telephone Corporation
** Nara Advanced Institute of Science and Technology
Abstract
In this paper we describe the extraction of
thesaurus information from parsed dictionary
definition sentences. The main data
for our experiments comes from Lexeed,
a Japanese semantic dictionary, and the
Hinoki treebank built on it. The dictio-
nary is parsed using a head-driven phrase
structure grammar of Japanese. Knowledge
is extracted from the semantic representa-
tion (Minimal Recursion Semantics). This
makes the extraction process language inde-
pendent.
1 Introduction
In this paper we describe a method of acquiring
a thesaurus and other useful information from
a machine-readable dictionary. The research is
part of a project to construct a fundamental
vocabulary knowledge-base of Japanese:
a resource that will include rich syntactic and
semantic descriptions of the core vocabulary of
Japanese. In this paper we describe the automatic
acquisition of a thesaurus from the dictionary
definition sentences. The basic method
has a long pedigree (Copestake, 1990; Tsurumaru
et al., 1991; Rigau et al., 1997). The
main difference from earlier work is that we use
a mono-stratal grammar (Head-Driven Phrase
Structure Grammar: Pollard and Sag (1994))
where the syntax and semantics are represented
in the same structure. Our extraction can thus
be done directly on the semantic output of the
parser.
In the first stage, we extract the thesaurus
backbone of our ontology, consisting mainly of
hypernym links, although other links are also
extracted (e.g., domain). We also link our
extracted thesaurus to an existing ontology of
Japanese: the Goi-Taikei ontology (Ikehara et
al., 1997). This allows us to use tools that exploit
the Goi-Taikei ontology, and also to extend
it and reveal gaps.
(Some of this research was done while the second
author was visiting the NTT Communication Science
Laboratories.)
The immediate application for our ontology
is in improving the performance of stochastic
models for parsing (see Bond et al. (2004) for
further discussion) and word sense disambigua-
tion. However, this paper discusses only the
construction of the ontology.
We are using the Lexeed semantic database
of Japanese (Kasahara et al. (2004), next section),
a machine readable dictionary consisting
of headwords and their definitions for the 28,000
most familiar open class words of Japanese,
with all the definitions using only those 28,000
words (and some function words). We are parsing
the definition sentences using an HPSG
Japanese grammar and parser and treebanking
the results into the Hinoki treebank (Bond et
al., 2004). We then train a statistical model
on the treebank and use it to parse the remaining
definition sentences, and extract an ontology
from them.
In the next phase, we will sense tag the definition
sentences and use this information and the
thesaurus to build a model that combines syntactic
and semantic information. We will also
produce a richer ontology, both by combining information
for word senses not only from their own
definition sentences but also from definition sentences
that use them (Dolan et al., 1993), and
by extracting selectional preferences. Once we
have done this for the core vocabulary, we will
look at ways of extending our lexicon and on-
tology to less familiar words.
In this paper we present the details of the
ontology extraction. In the following section we
give more information about Lexeed and the Hinoki
treebank. We then detail our method for extracting
knowledge from the parsed dictionary
definitions (§ 3). Finally, we discuss the results
and outline our future research (§ 4).
2 Resources
2.1 The Lexeed Semantic Database of
Japanese
The Lexeed Semantic Database of Japanese is
a machine readable dictionary that covers the
most common words in Japanese (Kasahara et
al., 2004). It is built based on a series of psy-
cholinguistic experiments where words from two
existing machine-readable dictionaries were pre-
sented to multiple subjects who ranked them on
a familiarity scale from one to seven, with seven
being the most familiar (Amano and Kondo,
1999). Lexeed consists of all open class words
with a familiarity greater than or equal to five.
The size, in words, senses and defining sentences,
is given in Table 1.
Table 1: The Size of Lexeed
Headwords           28,300
Senses              46,300
Defining Sentences  81,000
The definition sentences for these words
were rewritten by four different analysts to use
only the 28,000 familiar words, and the best
definition was chosen by a second set of analysts.
Not all words were used in definition sentences:
the defining vocabulary is 16,900 different words
(60% of all possible words were actually used in
the definition sentences). An example entry for
the word ドライバー doraibā "driver" is given in
Figure 1, with English glosses added. The underlined
material was not in Lexeed originally;
we extract it in this paper. doraibā "driver"
has a familiarity of 6.55, and three senses. The
first sense was originally defined as just the synonym
nejimawashi "screwdriver", which has a
familiarity below 5.0. This was rewritten to the
explanation: "A tool for inserting and removing
screws".
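The structure of an entry such as the one in Figure 1 can be sketched as a small data type. This is only an illustration: the class and field names (Entry, Sense, in_lexeed) are hypothetical and are not Lexeed's actual storage format; the values are the English glosses and figures quoted above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    definitions: List[str]           # English glosses of the definition sentences
    hypernym: Optional[str] = None   # extracted in this paper, not in the original Lexeed
    domain: Optional[str] = None     # extracted in this paper, not in the original Lexeed


@dataclass
class Entry:
    headword: str
    pos: str
    familiarity: float               # psycholinguistic familiarity on a 1-7 scale
    senses: List[Sense] = field(default_factory=list)


def in_lexeed(entry: Entry, threshold: float = 5.0) -> bool:
    """Lexeed keeps open class words whose familiarity is at least five."""
    return entry.familiarity >= threshold


# The entry for doraibā "driver", using the glosses given in the text.
driver = Entry(
    headword="doraibā",
    pos="noun",
    familiarity=6.55,
    senses=[
        Sense(["A tool for inserting and removing screws."], hypernym="tool"),
        Sense(["Someone who drives a car."], hypernym="person"),
        Sense(["In golf, a long-distance club.", "A number one wood."],
              hypernym="club", domain="golf"),
    ],
)
```

Keeping the extracted fields optional mirrors the fact that they are added by the extraction step rather than supplied by the dictionary.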
2.2 The Hinoki Treebank
In order to produce semantic representations
we are using an open source HPSG grammar
of Japanese: JACY (Siegel and Bender, 2002),
which we have extended to cover the dictionary
definition sentences (Bond et al., 2004).
We have treebanked 23,000 sentences using the
[incr tsdb()] profiling environment (Oepen and
Carroll, 2000) and used them to train a parse
ranking model for the PET parser (Callmeier,
2002) to selectively rank the parser output.
These tools, and the grammar, are available
from the Deep Linguistic Processing with HPSG
Initiative (DELPH-IN: http://www.delph-in.net/).
We use this parser to parse the defining sentences
into a full meaning representation using
minimal recursion semantics (MRS: Copestake
et al. (2001)).
3 Ontology Extraction
In this section we present our work on creating
an ontology. Past research on knowledge acquisition
from definition sentences in Japanese
has primarily dealt with the task of automatically
generating hierarchical structures. Tsurumaru
et al. (1991) developed a system for
automatic thesaurus construction based on information
derived from analysis of the terminal
clauses of definition sentences. It was successful
in classifying hyponym, meronym, and synonym
relationships between words. However,
it lacked any concrete evaluation of the accuracy
of the hierarchies created, and only linked
words, not senses. More recently, Tokunaga et
al. (2001) created a thesaurus from a machine-readable
dictionary and combined it with an existing
thesaurus (Ikehara et al., 1997).
For other languages, early work for En-
glish linked senses exploiting dictionary domain
codes and other heuristics (Copestake, 1990),
and more recent work links senses for Spanish
and French using more general WSD techniques
(Rigau et al., 1997). Our goal is similar. We
wish to link each word sense in the fundamen-
tal vocabulary into an ontology. The ontology is
primarily a hierarchy of hyponym (is-a) rela-
tions, but also contains several other relation-
ships, such as abbreviation, synonym and
domain.
Headword     ドライバー doraibā
POS          noun        Lexical-type  noun-lex
Familiarity  6.5 [1-7]
Sense 1
  Definition  S1   screw turn (screwdriver)
              S1'  A tool for inserting and removing screws.
  Hypernym    equipment "tool" (sense 1)
  Sem. Class  ⟨942:tool⟩ (⊂ 893:equipment)
Sense 2
  Definition  S1   Someone who drives a car
  Hypernym    人 hito "person" (sense 1)
  Sem. Class  ⟨292:driver⟩ (⊂ 4:person)
Sense 3
  Definition  S1   In golf, a long-distance club.
              S2   A number one wood.
  Hypernym    クラブ kurabu "club" (sense 2)
  Sem. Class  ⟨921:leisure equipment⟩ (⊂ 921)
  Domain      ゴルフ gorufu "golf" (sense 1)
Figure 1: Entry for the Word doraibā "driver" (with English glosses)

We extract the relations from the semantic
output of the parsed definition sentences. The
output is written in Minimal Recursion Semantics
(Copestake et al., 2001). Previous work has
successfully used regular expressions, both for
English (Barnbrook, 2002) and Japanese (Tsurumaru
et al., 1991; Tokunaga et al., 2001).
Regular expressions are extremely robust, and
relatively easy to construct. However, we use
a parser for four reasons. The first is that
it makes our knowledge acquisition more language
independent. If we have a parser that
can produce MRS, and a machine readable dictionary
for that language, the knowledge acquisition
system can easily be ported. The second
reason is that we can go on to use the
parser and acquisition system to acquire knowledge
from non-dictionary sources. Fujii and
Ishikawa (2004) have shown how it is possible
to identify definitions semi-automatically; however,
these sources are not as standard as dictionaries
and thus harder to parse using only
regular expressions. The third reason is that
we can more easily acquire knowledge beyond
simple hypernyms, for example, identifying synonyms
through common definition patterns as
proposed by Tsuchiya et al. (2001). The final
reason is that we are ultimately interested in
language understanding, and thus wish to develop
a parser. Any effort spent in building
and refining regular expressions is not reusable,
while creating and improving a grammar has
intrinsic value.
3.1 The Extraction Process
To extract hypernyms, we parse the first definition
sentence for each sense. The parser uses
the stochastic parse ranking model learned from
the Hinoki treebank, and returns the MRS of
the first ranked parse. Currently, just over 80%
of the sentences can be parsed.
An MRS consists of a bag of labeled elementary
predicates and their arguments, a list of
scoping constraints, and a pair of relations that
provide a hook into the representation: a label,
which must outscope all the handles, and
an index (Copestake et al., 2001). The MRSs
for the definition sentence for doraibā2 and its
English equivalent are given in Figure 2. The
hook’s label and index are shown first, followed
by the list of elementary predicates. The figure
omits some details (message type and scope
have been suppressed).
自動車を運転する人:
⟨h0, x1 {h0: prpstn_rel(h1)
         h1: hito(x1)
         h2: udef(x1, h1, h6)
         h3: jidosha(x2)
         h4: udef(x2, h3, h7)
         h5: unten(u1, x1, x2)}⟩
somebody who drives a car:
⟨h0, x1 {h0: prpstn_rel(h1)
         h1: person(x1)
         h2: some(x1, h1, h6)
         h3: car(x2)
         h4: indef(x2, h3, h7)
         h5: drive(u1, x1, x2)}⟩
Figure 2: Simplified MRS representations for doraibā2
In most cases, the first sentence of a dictionary
definition consists of a fragment headed
by the same part of speech as the headword.
Thus the noun driver is defined as a noun
phrase. The fragment consists of a genus
term (somebody) and differentia (who drives
a car).¹ The genus term is generally the most
semantically salient word in the definition sentence:
the word with the same index as the index
of the hook. For example, for sense 2 of the
word ドライバー doraibā, the hypernym is 人
hito "person" (Figure 2). Although the actual
hypernym is in very different positions in the
Japanese and English definition sentences, it is
the hook in both the semantic representations.
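The genus-term rule just described (take the predicate whose own index equals the index of the hook) can be sketched as follows. This is a minimal illustration rather than the system's implementation: the EP and MRS classes, the convention that a predicate's first argument is its own index, and the explicit list of predicates to skip are all assumptions made for the example. The predicate names and arguments are those of Figure 2.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class EP:
    """One elementary predicate: a label, a predicate name and its arguments."""
    label: str
    pred: str
    args: Tuple[str, ...]


@dataclass
class MRS:
    hook_label: str
    hook_index: str
    eps: List[EP]


# Assumption: quantifiers and the message relation are filtered out by name.
SKIPPED = {"udef", "some", "indef", "prpstn_rel"}


def genus_term(mrs: MRS) -> Optional[str]:
    """Return the predicate whose own index is the hook's index, if any."""
    for ep in mrs.eps:
        if ep.pred in SKIPPED:
            continue
        # Assumption: the first argument of a predicate is its own index.
        if ep.args and ep.args[0] == mrs.hook_index:
            return ep.pred
    return None


japanese = MRS("h0", "x1", [
    EP("h0", "prpstn_rel", ("h1",)),
    EP("h1", "hito", ("x1",)),
    EP("h2", "udef", ("x1", "h1", "h6")),
    EP("h3", "jidosha", ("x2",)),
    EP("h4", "udef", ("x2", "h3", "h7")),
    EP("h5", "unten", ("u1", "x1", "x2")),
])

english = MRS("h0", "x1", [
    EP("h0", "prpstn_rel", ("h1",)),
    EP("h1", "person", ("x1",)),
    EP("h2", "some", ("x1", "h1", "h6")),
    EP("h3", "car", ("x2",)),
    EP("h4", "indef", ("x2", "h3", "h7")),
    EP("h5", "drive", ("u1", "x1", "x2")),
])

print(genus_term(japanese), genus_term(english))
```

The same function handles both languages because it never looks at word order, only at the shared index, which is the point made in the paragraph above.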
For some definition sentences (around 20%),
further parsing of the semantic representation
is necessary. The most common case is where
the index is linked to a coordinate construction.
In that case, the coordinated elements have to
be extracted, and we build two relationships.
Other common cases are those where the relationship
between the headword and the genus
is given explicitly in the definition sentence: for
example in (1), where the relationship is given
as abbreviation. We initially process the relation,
略 ryaku "abbreviation", yielding the coordinate
structure. This in turn gives two words:
アルプス arupusu "alps" and 日本アルプス nihon
arupusu "Japanese Alps". Our system thus
produces two relations: abbreviation(ア, アルプス)
and abbreviation(ア, 日本アルプス). As
can be seen from this example, special cases can
embed each other, which makes the use of regular
expressions difficult.
(1) ア:  アルプス  、  または  日本アルプス   の   略
    a:   arupusu   ,   matawa  nihon-arupusu  no   ryaku
    a:   alps      ,   or      japan alps     adn  abbreviation
    a: an abbreviation for the Alps or the Japanese Alps
¹ Also known as superordinate and discriminator or
restriction.
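The way an explicit relation and a coordination embed each other in (1) can be illustrated with a small recursive walk. This is a sketch under stated assumptions: the nested-tuple encoding of the semantic structure, the operator names and the function extract are hypothetical, and the real system works on MRS rather than on tuples.

```python
from typing import List, Tuple, Union

# A node is either a word or a tuple (operator, child, child, ...).
Node = Union[str, tuple]
Relation = Tuple[str, str, str]   # (relation name, headword, related word)

# Assumption: nouns that name a relation, such as ryaku "abbreviation".
EXPLICIT_RELATIONS = {"abbreviation"}
COORDINATORS = {"or", "and"}


def extract(headword: str, node: Node, relation: str = "hypernym") -> List[Relation]:
    """Walk the structure, switching relation at explicit relation nouns
    and producing one relation per coordinated element."""
    if isinstance(node, str):
        return [(relation, headword, node)]
    op, *children = node
    if op in EXPLICIT_RELATIONS:
        return [r for child in children for r in extract(headword, child, op)]
    if op in COORDINATORS:
        return [r for child in children for r in extract(headword, child, relation)]
    raise ValueError(f"unknown operator: {op}")


# Example (1): "a: an abbreviation for the Alps or the Japanese Alps"
example = ("abbreviation", ("or", "arupusu", "nihon-arupusu"))
for rel in extract("a", example):
    print(rel)
```

Because each special case simply recurses into its children, the two cases can nest in either order, which is the situation that is awkward to cover with regular expressions.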
The extent to which non-hypernym relations
are included as text in the definition sentences,
as opposed to stored as separate fields, varies
from dictionary to dictionary. For knowledge
acquisition from open text, we cannot expect
any labeled features, so the ability to extract
information from plain text is important.
We also extract information not explicitly labeled,
such as the domain of the word, as in
Figure 3. Here the adpositional phrase representing
the domain has wide scope: in effect
the definition means "In golf, [a driver3 is] a club
for playing long strokes". The phrase that specifies
the domain should modify a non-expressed
predicate. To parse this, we added a construction
to the grammar that allows an NP fragment
heading an utterance to have an adpositional
modifier. We then extract these modifiers
and take the head of the noun phrase to be the
domain. Again, this is hard to do reliably with
regular expressions, as an initial NP followed by
で de could be a copula phrase, or a PP that attaches
anywhere within the definition; not all
such initial phrases restrict the domain. Most of
the domains extracted fall under a few superordinate
terms, mainly sport, games and religion.
Other, more general domains, are marked explicitly
in Lexeed as features. Japanese equivalents
of the following words have a sense marked
as being in the domain golf: approach, edge,
down, tee, driver, handicap, pin, long shot.
[Parse tree. Node labels: UTTERANCE, VP, V[COPULA], PP, NP.]
N     CASE-P  PUNCT  N              NMOD-P  N
golf  in      ,      long-distance  adn     club
Figure 3: Parse for Sense 3 of Driver

We summarize the links acquired in Table 2,
grouped by coarse part of speech. The first
three lines show hypernym relations: implicit
hypernyms (the default), explicitly indicated
hypernyms, and explicitly indicated hyponyms.
The second three lines show other relations:
abbreviations, names and domains. Implicit hypernyms
are by far the most common relations:
fewer than 10% of entries are marked with an
explicit relationship.
Relation Type  Noun  Verbal Noun  Verb  Other
Implicit 21,245 5,467 6,738 5,569
Hypernym 230 5 9
Hyponym 194 5 5
Abbreviation 423 35 76
Name 121 5
Domain 922 170 141
Table 2: Acquired Knowledge
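The claim that implicit hypernyms dominate can be checked directly from the counts in Table 2. The short calculation below only sums the figures printed in the table; which part-of-speech column each value belongs to does not affect the totals, so the rows are kept as plain lists.

```python
# Counts copied from Table 2, row by row (column assignment is irrelevant here).
implicit = [21245, 5467, 6738, 5569]
other = {
    "Hypernym": [230, 5, 9],
    "Hyponym": [194, 5, 5],
    "Abbreviation": [423, 35, 76],
    "Name": [121, 5],
    "Domain": [922, 170, 141],
}

implicit_total = sum(implicit)
other_total = sum(sum(counts) for counts in other.values())
all_links = implicit_total + other_total
share = other_total / all_links

print(f"{other_total} of {all_links} links are not implicit hypernyms ({share:.1%})")
```

Even when the domain links are counted together with the explicitly marked relations, the share stays well under the 10% stated in the text.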
3.2 Verification with Goi-Taikei
We verified our results by comparing the hypernym
links to the manually constructed Japanese
ontology Goi-Taikei. It is a hierarchy of 2,710
semantic classes, defined for over 264,312 nouns
(Ikehara et al., 1997). Because the semantic
classes are only defined for nouns (including verbal
nouns), we can only compare nouns. Senses
are linked to Goi-Taikei semantic classes by the
following heuristic: look up the semantic classes
C for both the headword (w_h) and the genus
term(s) (w_g). If at least one of the headword’s
semantic classes is subsumed by at least one of
the genus’ semantic classes, then we consider
their relationship confirmed (1).
$$\exists (c_h, c_g) : \{ c_h \subset c_g ;\; c_h \in C(w_h) ;\; c_g \in C(w_g) \} \qquad (1)$$
In the event of an explicit hyponym relationship
indicated between the headword and the
genus, the test is reversed: we look for an instance
of the genus’ class being subsumed by
the headword’s class ($c_g \subset c_h$). Our results
are summarized in Table 3. The total is 58.5%
(15,888 confirmed out of 27,146). Adding in the
named and abbreviation relations, the coverage
is 60.7%. This is comparable to the coverage
of Tokunaga et al. (2001), who get a coverage
of 61.4%, extracting relations using regular expressions
from a different dictionary.
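The confirmation heuristic, including the reversed test for explicit hyponyms, can be sketched over a toy class hierarchy. Several things here are assumptions made for illustration: the PARENT table holds only the two subsumption facts shown in Figure 1 and treats them as direct parent links, the function names are hypothetical, and subsumption is taken to be reflexive because Section 3.3 counts pairs in the same semantic class among the confirmed relations.

```python
from typing import Dict, Iterable, Optional, Set

# Toy fragment: child class -> parent class. Only the two facts from
# Figure 1 are encoded; the real Goi-Taikei hierarchy has 2,710 classes.
PARENT: Dict[str, Optional[str]] = {
    "942:tool": "893:equipment",
    "292:driver": "4:person",
    "893:equipment": None,
    "4:person": None,
}


def ancestors_or_self(cls: Optional[str]) -> Set[str]:
    """All classes that subsume cls, including cls itself (reflexive)."""
    found: Set[str] = set()
    while cls is not None:
        found.add(cls)
        cls = PARENT.get(cls)
    return found


def confirmed(head_classes: Iterable[str], genus_classes: Iterable[str],
              reverse: bool = False) -> bool:
    """True if some headword class is subsumed by some genus class.
    With reverse=True (explicit hyponym relations) the roles are swapped."""
    if reverse:
        head_classes, genus_classes = genus_classes, head_classes
    genus = set(genus_classes)
    return any(ancestors_or_self(c) & genus for c in head_classes)


# doraibā has two Goi-Taikei classes; its three genus terms are tested in turn.
driver_classes = {"942:tool", "292:driver"}
print(confirmed(driver_classes, {"893:equipment"}))          # sense 1
print(confirmed(driver_classes, {"4:person"}))               # sense 2
print(confirmed(driver_classes, {"921:leisure equipment"}))  # sense 3
```

The third call fails because neither headword class reaches the class of the genus term, so that link cannot be confirmed by this test even if it is correct.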
3.3 Extending the Goi-Taikei
In general we are extracting pairs with more
information than the Goi-Taikei hierarchy of
2,710 classes. For 45.4% of the confirmed relations
both the headword and its genus term
were in the same Goi-Taikei semantic class. In
particular, many classes contain a mixture of
class names and instance names: 豚肉 buta niku
"pork" and 肉 niku "meat" are in the same
class, as are ドラム doramu "drum" and 打楽器
dagakki "percussion instrument", which
we can now distinguish. This conflation has
caused problems in applications such as question
answering as well as in fundamental research
on linking syntax and semantics (Bond
and Vatikiotis-Bateson, 2002).
An example of a more detailed hierarchy deduced
from Lexeed is given in Figure 4. All of the
words come from the same Goi-Taikei semantic
class: ⟨842:condiment⟩, but are given more
structure by the thesaurus we have induced.
There are still some inconsistencies: ketchup is
directly under condiment, while tomato sauce
and tomato ketchup are under sauce. This reflects
the structure of the original machine readable
dictionary.
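How extracted hypernym pairs give internal structure to a single flat class can be shown with a few of the condiment links. The first three pairs below are the ones stated in the text; the link from sauce to condiment is an assumption added so that the fragment forms one tree, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_children(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Index (hyponym, hypernym) pairs by hypernym."""
    children: Dict[str, List[str]] = defaultdict(list)
    for hyponym, hypernym in pairs:
        children[hypernym].append(hyponym)
    return children


def render(root: str, children: Dict[str, List[str]], depth: int = 0) -> List[str]:
    """Return the subtree under root as indented lines."""
    lines = ["  " * depth + root]
    for child in sorted(children.get(root, [])):
        lines.extend(render(child, children, depth + 1))
    return lines


links = [
    ("ketchup", "condiment"),
    ("tomato sauce", "sauce"),
    ("tomato ketchup", "sauce"),
    ("sauce", "condiment"),   # assumption: not stated in the text
]

print("\n".join(render("condiment", build_children(links))))
```

The output makes the inconsistency noted above visible: ketchup and tomato ketchup end up at different depths because their definition sentences name different genus terms.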
4 Discussion and Further Work
From a language engineering point of view, we
found the ontology extraction an extremely use-
ful check on the output of the grammar/parser.
Treebanking tends to focus on the syntactic
structure, and it is all too easy to miss a mal-
formed semantic structure. Parsing the seman-
tic output revealed numerous oversights, espe-
cially in binding arguments in complex rules and
lexical entries.
It also reveals some gaps in the Goi-Taikei
coverage. For the word ドライバー doraibā
"driver" (shown in Figure 1), the first two hypernyms
are confirmed. However, ドライバー in
GT only has two semantic classes: ⟨942:tool⟩
and ⟨292:driver⟩. It does not have the semantic
class ⟨921:leisure equipment⟩. Therefore
we cannot confirm the third link, even though it
is correct, and the domain is correctly extracted.

[Tree diagram. Nodes, given by English gloss and sense number:
tomato ketchup 1, white sauce 1, meat sauce 1, sauce 2, tomato sauce 1,
ketchup 1, condiment 1, salt 1, curry powder 1, curry 1, spice 1, spice 1.]
Figure 4: Refinement of the class condiment.

Relation  Noun                    Verbal Noun
Implicit  56.66% (12,037/21,245)  64.55% (3,529/5,467)
Hypernym  58.26% (134/230)        0% (0/5)
Hyponym   94.32% (183/194)        100% (5/5)
Subtotal  57.01% (12,354/21,669)  64.52% (3,534/5,477)
Table 3: Links Confirmed by Comparison with Goi-Taikei
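The overall confirmation rate quoted in Section 3.2 can be re-derived from the counts in Table 3. The snippet below only adds up the (confirmed, total) pairs printed in the table; the dictionary layout is an arbitrary choice for the example.

```python
# (confirmed, total) pairs copied from Table 3.
table3 = {
    "Noun": {
        "Implicit": (12037, 21245),
        "Hypernym": (134, 230),
        "Hyponym": (183, 194),
    },
    "Verbal Noun": {
        "Implicit": (3529, 5467),
        "Hypernym": (0, 5),
        "Hyponym": (5, 5),
    },
}

confirmed_links = sum(c for rows in table3.values() for c, _ in rows.values())
total_links = sum(t for rows in table3.values() for _, t in rows.values())

print(f"{confirmed_links}/{total_links} = {confirmed_links / total_links:.1%}")

# Per-column subtotals, as in the last row of the table.
for column, rows in table3.items():
    c = sum(pair[0] for pair in rows.values())
    t = sum(pair[1] for pair in rows.values())
    print(f"{column}: {c}/{t} = {c / t:.2%}")
```

Recomputing the subtotals this way is also a quick consistency check on the individual rows of the table.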
Further Work
There are four main areas in which we wish to
extend this research: improving the grammar,
extending the extraction process itself, further
exploiting the extracted relations and creating
a thesaurus from an English dictionary.
As well as extending the coverage of the gram-
mar, we are investigating making the semantics
more tractable. In particular, we are investi-
gating the best way to represent the semantics
of explicit relations such as 一種 isshu "a kind
of".²
² These are often transparent nouns: those nouns
which are transparent with regard to collocational or
selection relations between their dependent and the
external context of the construction, or transparent to
number agreement (Fillmore et al., 2002).
We are extending the extraction process by
adding new explicit relations, such as 丁寧語
teineigo "polite form". For word senses such
as driver3, where there is no appropriate Goi-Taikei
class, we intend to estimate the semantic
class by using the definition sentence as a vector,
and looking for words with similar definitions
(Kasahara et al., 1997).
We are extending the extracted relations in
several ways. One way is to link the hypernyms
to the relevant word sense, not just the word. If
we know that クラブ kurabu "club" is a hypernym
of ⟨921:leisure equipment⟩, then it rules
out the card suit "clubs" and the "association
of people with similar interests" senses. Other
heuristics have been proposed by Rigau et al.
(1997). Another way is to use the thesaurus to
predict which words name explicit relationships
which need to be extracted separately (like abbreviation).
5 Conclusion
In this paper we described the extraction of thesaurus
information from parsed dictionary definition
sentences. The main data for our experiments
comes from Lexeed, a Japanese semantic
dictionary, and the Hinoki treebank built on
it. The dictionary is parsed using a head-driven
phrase structure grammar of Japanese. Knowledge
is extracted from the semantic representation.
Comparing our results with the Goi-Taikei
hierarchy, we could confirm 60.73% of the relations
extracted.
Acknowledgments
The authors would like to thank Colin Bannard,
the other members of the NTT Machine Translation
Research Group, NAIST Matsumoto
Laboratory, and researchers in the DELPH-IN
community, especially Timothy Baldwin, Dan
Flickinger, Stephan Oepen and Melanie Siegel.

References

Shigeaki Amano and Tadahisa Kondo. 1999.
Nihongo-no Goi-Tokusei (Lexical properties of
Japanese). Sanseido.

Geoff Barnbrook. 2002. Defining Language: A local
grammar of definition sentences. Studies in
Corpus Linguistics. John Benjamins.

Francis Bond and Caitlin Vatikiotis-Bateson. 2002.
Using an ontology to determine English count-
ability. In 19th International Conference on Com-
putational Linguistics: COLING-2002, volume 1,
pages 99-105, Taipei.

Francis Bond, Sanae Fujita, Chikara Hashimoto,
Kaname Kasahara, Shigeko Nariyama, Eric
Nichols, Akira Ohtani, Takaaki Tanaka, and
Shigeaki Amano. 2004. The Hinoki treebank:
A treebank for text understanding. In Proceed-
ings of the First International Joint Conference
on Natural Language Processing (IJCNLP-04).
Springer Verlag. (in press).

Ulrich Callmeier. 2002. Preprocessing and encod-
ing techniques in PET. In Stephan Oepen, Dan
Flickinger, Jun-ichi Tsujii, and Hans Uszkoreit,
editors, Collaborative Language Engineering,
chapter 6, pages 127-143. CSLI Publications,
Stanford.

Ann Copestake, Alex Lascarides, and Dan
Flickinger. 2001. An algebra for semantic
construction in constraint-based grammars. In
Proceedings of the 39th Annual Meeting of the
Association for Computational Linguistics (ACL
2001), Toulouse, France.

Ann Copestake. 1990. An approach to building the
hierarchical element of a lexical knowledge base
from a machine readable dictionary. In Proceed-
ings of the First International Workshop on In-
heritance in Natural Language Processing, pages
19{29, Tilburg. (ACQUILEX WP NO. 8.).

William Dolan, Lucy Vanderwende, and Stephen D.
Richardson. 1993. Automatically deriving structured
knowledge from on-line dictionaries. In Proceedings
of the Pacific Association for Computational
Linguistics, Vancouver.

Charles J. Fillmore, Collin F. Baker, and Hiroaki
Sato. 2002. Seeing arguments through transpar-
ent structures. In Proceedings of the Third Inter-
national Conference on Language Resources and
Evaluation (LREC-2002), pages 787-91, Las Palmas.

Atsushi Fujii and Tetsuya Ishikawa. 2004. Summa-
rizing encyclopedic term descriptions on the web.
In 20th International Conference on Computa-
tional Linguistics: COLING-2004, Geneva. (this
volume).

Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai,
Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura,
Yoshifumi Ooyama, and Yoshihiko Hayashi. 1997.
Goi-Taikei: A Japanese Lexicon. Iwanami
Shoten, Tokyo. 5 volumes/CDROM.

Kaname Kasahara, Kazumitsu Matsuzawa, and
Tsutomu Ishikawa. 1997. A method for judgment
of semantic similarity between daily-used words
by using machine readable dictionaries. Transactions
of IPSJ, 38(7):1272-1283. (in Japanese).

Kaname Kasahara, Hiroshi Sato, Francis Bond,
Takaaki Tanaka, Sanae Fujita, Tomoko Kanasugi,
and Shigeaki Amano. 2004. Construction of a
Japanese semantic lexicon: Lexeed. SIG NLC-
159, IPSJ, Tokyo. (in Japanese).

Stephan Oepen and John Carroll. 2000. Performance
profiling for grammar engineering. Natural
Language Engineering, 6(1):81-97.

Carl Pollard and Ivan A. Sag. 1994. Head-Driven
Phrase Structure Grammar. University of
Chicago Press, Chicago.

German Rigau, Jordi Atserias, and Eneko Agirre.
1997. Combining unsupervised lexical knowledge
methods for word sense disambiguation. In Pro-
ceedings of joint EACL/ACL 97, Madrid.

Melanie Siegel and Emily M. Bender. 2002. Efficient
deep processing of Japanese. In Proceedings of the
3rd Workshop on Asian Language Resources and
International Standardization at the 19th Interna-
tional Conference on Computational Linguistics,
Taipei.

Takenobu Tokunaga, Yasuhiro Syotu, Hozumi
Tanaka, and Kiyoaki Shirai. 2001. Integration
of heterogeneous language resources: A monolin-
gual dictionary and a thesaurus. In Proceedings of
the 6th Natural Language Processing Pacific Rim
Symposium, NLPRS2001, pages 135-142, Tokyo.

Masatoshi Tsuchiya, Sadao Kurohashi, and Satoshi
Sato. 2001. Discovery of definition patterns by
compressing dictionary sentences. In Proceedings
of the 6th Natural Language Processing Pacific
Rim Symposium, NLPRS2001, pages 411-418,
Tokyo.

Hiroaki Tsurumaru, Katsunori Takesita, Itami Kat-
suki, Toshihide Yanagawa, and Sho Yoshida.
1991. An approach to thesaurus construction
from Japanese language dictionary. In IPSJ SIG-
Notes Natural Language, volume 83-16, pages
121-128. (in Japanese).
