Extending the Coverage of a Valency Dictionary
Sanae Fujita and Francis Bond
{sanae, bond}@cslab.kecl.ntt.co.jp
2-4 Hikari-dai Seika-cho, Kyoto, Japan 619-0237
NTT Communication Science Laboratories,
Nippon Telegraph and Telephone Corporation
Abstract
Information on subcategorization and selec-
tional restrictions is very important for nat-
ural language processing in tasks such as
monolingual parsing, accurate rule-based ma-
chine translation and automatic summarization.
However, adding this detailed information to a
valency dictionary is both time consuming and
costly.
In this paper we present a method of assign-
ing valency information and selectional restric-
tions to entries in a bilingual dictionary, based
on information in an existing valency dictio-
nary. The method is based on two assump-
tions: words with similar meaning have simi-
lar subcategorization frames and selectional re-
strictions; and words with the same translations
have similar meanings. Based on these assump-
tions, new valency entries are constructed for
words in a plain bilingual dictionary, using en-
tries with similar source-language meaning and
the same target-language translations. We evaluate
the effects of various measures of similarity.
1 Introduction
One of the most severe problems facing machine
translation between Asian languages is the lack
of suitable language resources. Even when
word-lists or simple bilingual dictionaries exist,
it is rare for them to include detailed informa-
tion about the syntax and meaning of words.
In this paper we present a method of adding
new entries to a bilingual valency dictionary.
New entries are based on existing entries, so
have the same amount of detailed information.
The method bootstraps from an initial hand-built
lexicon, and allows new entries to be added
cheaply and effectively. Although we will use
Japanese and English as examples, the algo-
rithm is not tied to any particular language
pair or dictionary. The core idea is to add
new entries to the valency dictionary by using
Japanese-English pairs from a plain bilingual
dictionary (without detailed information about
valency or selectional restrictions), and build
new entries for them based on existing entries.
It is well known that detailed information
about verb valency (subcategorization) and se-
lectional restrictions is useful both for monolin-
gual parsing and selection of appropriate trans-
lations in machine translation. As well as being
useful for resolving parsing ambiguities, verb
valency information is particularly important
for complicated processing such as identification
and supplementation of zero pronouns. However,
this information is not encoded in normal
human-readable dictionaries, and is hard to en-
ter manually. Shirai (1999) estimates that at
least 27,000 valency entries are needed to cover
around 80% of Japanese verbs in a typical news-
paper, and we expect this to be true of any lan-
guage. Various methods of creating detailed en-
tries have been suggested, such as the extraction
of candidates from corpora (Manning, 1993; Ut-
suro et al., 1997; Kawahara and Kurohashi,
2001), and the automatic and semi-automatic
induction of semantic constraints (Akiba et al.,
2000). However, the automatic construction of
monolingual entries is still far from reaching
the quality of hand-constructed resources. Fur-
ther, large-scale bilingual resources are still rare
enough that it is much harder to automatically
build bilingual entries.
Our work differs from corpus-based work such
as Manning (1993) or Kawahara and Kurohashi
(2001) in that we are using existing lexical
resources rather than a corpus. Thus our method
will work for rare words, so long as we can find
them in a bilingual dictionary, and know the
English translation. It does not, however, learn
new frames from usage examples.
In order to demonstrate the utility of the
valency information, we give an example of a
sentence translated with the system default
information (basically a choice between transitive
and intransitive), and with the full valency
information. The verb is kamei-suru "order",
which takes a sentential complement. In (1)¹
the sentential complement is the clause marked
by the quotative to (shutsugeki shiro). The verb
valency entry is the same as that of joushin-suru
"report" [NP-ga Cl-to V], except with the clause
marked as to-infinitival.² The translation with
the valency information is far from perfect, but
it is comprehensible. Without the valency
information the translation is incomprehensible.
(1) koku-ou  wa   kerai        ni   shutsugeki shiro,  to    kamei-shita.
    king     top  subordinate  dat  sally forth        quot  ordered
    "The king ordered his follower to sally forth."
    with:    The king ordered a follower that sallied forth.
    without: * ordered to a follower that the king, sallied forth.
In general, translation tends to simplify text,
because the target language will not be able to
represent exactly the same shades of meaning
as the source text, so there is some semantic
loss. Therefore, in many cases, a single tar-
get language entry is the translation of many
similar source patterns. For example, there
are 23 Japanese predicates linked to the En-
glish entry report in the valency dictionary used
by the Japanese-to-English machine translation
system ALT-J/E (Ikehara et al., 1991).
2 Experimental Method
The approach is based on that of Fujita and
Bond (2002). For the explanation we assume
that the source language is Japanese, and the
target language is English, although nothing
depends on this.

¹ We use the following abbreviations: top: topic
postposition; acc: accusative postposition; dat: dative
postposition; quot: quotative postposition; NP: noun
phrase; Cl: clause; V: verb.
² The subordinate clause is incorrectly translated as a
that-clause. This is a bug in the English generation; the
Japanese parse and semantic structure are correct.
2.1 Method of Making New Patterns
Our method is based on two facts: (1) verbs
with similar meanings typically have similar va-
lency structures; (2) verbs with identical trans-
lations typically have similar meanings. We use
three resources: (1) a seed valency dictionary
(in this case the verbs from ALT-J/E’s valency
dictionary, ignoring all idiomatic and adjectival
entries; this gave 5,062 verbs and 11,214
valency patterns³); (2) a plain bilingual
dictionary which contains word pairs without
valency information (in our case a combination of
ALT-J/E’s Japanese-English word transfer
dictionary and EDICT (Breen, 1995)); and (3) a
source language corpus (mainly newspapers).
Our method creates valency patterns for
words in the bilingual dictionary whose English
translations can be found in the valency dictio-
nary. We cannot create patterns for words with
unknown translations. Each combination consists
of JU, an unknown word for which we have
no valency information, and E, its English
translation (or translations), which is linked to
one or more valency patterns JV in the valency
dictionary. Figure 1 shows the overall flow of
creating candidate patterns.
For each entry in the plain J-E dictionary
  - If no entries with the same Japanese (JU)
    exist in the valency dictionary
      - For each valency entry (JV) with the
        same English (E)
          - Create a candidate pattern consisting
            of JV replaced by JU
Figure 1: Creating Candidate Patterns
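The flow in Figure 1 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `create_candidates`, the tuple layouts, and the use of exact string equality on the English side are all hypothetical choices made here, not the authors' implementation, and the looser head-based matching described in the text is left out.

```python
from collections import defaultdict


def create_candidates(plain_dict, valency_dict):
    """Build candidate patterns following the flow of Figure 1.

    plain_dict:   list of (japanese, english) pairs (hypothetical layout)
    valency_dict: list of (japanese, english, pattern) triples (hypothetical layout)
    Returns a list of (ju, e, jv, pattern) tuples.
    """
    # Japanese words that already have valency information.
    known_japanese = {jv for jv, _, _ in valency_dict}

    # Index the valency entries by their English side.
    by_english = defaultdict(list)
    for jv, e, pattern in valency_dict:
        by_english[e].append((jv, pattern))

    candidates = []
    for ju, e in plain_dict:
        if ju in known_japanese:
            continue  # only unknown words (JU) get new patterns
        for jv, pattern in by_english.get(e, []):
            # The candidate reuses JV's pattern with JV replaced by JU.
            candidates.append((ju, e, jv, pattern))
    return candidates


valency = [("joushin-suru", "report", "[NP-ga Cl-to V]")]
plain = [("gushin-suru", "report"), ("joushin-suru", "report")]
print(create_candidates(plain, valency))
```

In this toy run only gushin-suru yields a candidate, because joushin-suru already has an entry in the seed dictionary.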
The definition of "similar meaning" used to
generate new patterns is that they have the
same English translation. We had to make
this quite loose: any entry with the same
English head counts as a match. Therefore give up
and give back are counted as the same entry.
This allows for minor inconsistencies in the
target language dictionaries. In particular the
valency dictionary is likely to include commonly
appearing adjuncts and complements that do not
normally appear in bilingual dictionaries. For
example, iku "go" is translated as to go in EDICT,
go in the ALT-J/E word transfer dictionary and
NP go from NP to NP in the ALT-J/E valency
dictionary (among other translations). To match
these entries it is necessary to have some
flexibility in the English matching.

³ We call an entry in the valency dictionary (consisting
of source and target language subcategorization
information and selectional restrictions on the source side)
a valency pattern.
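One way to picture this loose matching is a function that reduces each English translation to its head word before comparison. The sketch below is a guess at such a heuristic, not the matching actually used: the name `english_head` and the rule "skip the infinitive marker and argument placeholders, then take the first remaining token" are assumptions made for illustration.

```python
# Tokens that are assumed not to be the head: the infinitive marker
# and the argument placeholders used in valency patterns.
SKIP_TOKENS = {"to", "NP", "Cl"}


def english_head(translation):
    """Return the assumed head word of an English translation string."""
    for token in translation.split():
        if token not in SKIP_TOKENS:
            return token.lower()
    return None  # nothing but placeholders


for text in ["to go", "go", "NP go from NP to NP", "give up", "give back"]:
    print(text, "->", english_head(text))
```

Under this heuristic the three renderings of iku all reduce to go, and give up and give back both reduce to give, which is the behaviour the paragraph describes.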
In order to filter out bad candidates, we
compare the usage of JV with JU using examples
from a corpus. Two judgments are made for
each paraphrase pair: is the paraphrase
grammatical, and if it is grammatical, are the
meanings similar? This judgment can be made by
monolingual speakers of the source language.
It is done in both directions: first we find
example sentences using JU, replace JU with JV
and compare the paraphrased sentences; then
we find sentences for valency patterns using JV,
replace JV with JU and judge the similarity.
Figure 2 shows the comparison using paraphrases.
In the implementation, we added a pre-filter,
reject, for verb pairs that obviously differed
in meaning. This allowed the analysts to
immediately reject verb pairs (JU-JV) that were
obviously not similar, and sped things up
considerably. For example, JU ketchaku-suru
"settle on" has the translation E settle; both
JV1 ochitsuku "calm down" and JV2 teijuu-suru
"settle in" are candidates, but JV2 appears
because of the polysemy of E.⁴
The three grammaticality classes are:
grammatical, ungrammatical, and grammatical
in some context.⁵ Semantic similarity was
divided into the following classes:
  - same: JU odosu "threaten" and JV odosu
    "threaten" (different orthographic forms)
  - close: JU gushin-suru "report" and
    JV joushin-suru "report"
  - [JU] broader: JU tsukuri-dasu "create" and
    JV hatsumei-suru "invent"
  - [JU] narrower: JU saikon-suru "remarry" and
    JV kekkon-suru "marry"
  - different nuance: JU oushuu-suru
    "expropriate" and JV toriageru "confiscate"
    (JU is more formal than JV.)
  - different: JU tachi-mukau "confront" and
    JV hanron-suru "argue against" (their
    meanings overlap, so they are classified into
    other classes in some contexts.)

⁴ JV1 is equivalent to WordNet sense 8 "become quiet";
JV2 is sense 4 "take up residence and become
established" (Fellbaum, 1998).
⁵ The analysts also rejected 7.9% of the example
sentences as irrelevant. These were sentences where the verb
did not actually appear, but that had been selected due
to errors in the morphological analysis.

For each candidate pattern JU-E (from JV-E)
  - If JU is obviously different to JV
      reject
  - Extract 5 sentences using JU from the corpus
    For each sentence
      - Replace JU with JV
      - Classify the paraphrased sentence into
        3 grammaticality classes
        if the class is grammatical
          - Classify the semantic similarity
            into 6 classes
  - Extract 5 sentences using each pattern of
    JV from the corpus
      - Replace JV with JU
      - Test as above
Figure 2: Paraphrasing Check
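The paraphrasing check of Figure 2 can be outlined as below. This is a rough sketch under stated assumptions: the function name `paraphrase_check` and the three callbacks standing in for the human analysts are hypothetical, sentences are selected by plain substring search on surface forms, and the real procedure extracts sentences for each valency pattern of JV rather than for the verb as a whole.

```python
def paraphrase_check(ju, jv, corpus, obviously_different,
                     judge_grammar, judge_similarity, n=5):
    """Collect paraphrase judgments for one JU-JV pair in both directions.

    The three callbacks stand in for the analysts (hypothetical interface):
      obviously_different(ju, jv) -> bool   (the pre-filter)
      judge_grammar(paraphrase)   -> one of the 3 grammaticality classes
      judge_similarity(original, paraphrase) -> one of the 6 similarity classes
    Returns None if the pair is rejected by the pre-filter.
    """
    if obviously_different(ju, jv):
        return None

    results = []
    # First JU -> JV, then the reverse direction JV -> JU.
    for source, target in ((ju, jv), (jv, ju)):
        examples = [s for s in corpus if source in s][:n]
        for sentence in examples:
            paraphrase = sentence.replace(source, target)
            grammar = judge_grammar(paraphrase)
            similarity = None
            if grammar == "grammatical":
                # Similarity is only judged for grammatical paraphrases.
                similarity = judge_similarity(sentence, paraphrase)
            results.append((sentence, paraphrase, grammar, similarity))
    return results


corpus = ["bassoku o omoku-suru hitsuyou wa nai to gushin-shita."]
judgments = paraphrase_check(
    "gushin-shita", "joushin-shita", corpus,
    obviously_different=lambda a, b: False,
    judge_grammar=lambda p: "grammatical",
    judge_similarity=lambda s, p: "close",
)
print(judgments)
```

The demo passes inflected surface forms directly so that simple string replacement works; handling inflection properly would need morphological analysis, which is outside this sketch.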
Next, we give an example of the paraphrasing.
For the unknown Japanese word JU gushin-suru
"report" we look at the existing word JV
joushin-suru "report", which is in the valency
dictionary with the same English translation.
We extract 5 sentences from our corpus which
use JU, for example (2) (slightly simplified here),
and replace JU with JV (3).
(2) bassoku  o    omoku-suru  hitsuyou  wa   nai      to    gushin-shita.
    penalty  acc  increase    need      top  nothing  quot  reported
    "I reported that there is no need to make
    the penal regulations more severe."
(3) bassoku  o  omoku-suru  hitsuyou  wa  nai  to  joushin-shita.
The paraphrase (3) is grammatical and the
pair (2, 3) have close meanings. This is done
for all five sentences containing JU and then
done in reverse for all five sentences matching
the pattern for JV.
2.2 Experiment
To test the useful range of our algorithm, we
considered the coverage of ALT-J/E’s valency
dictionary on 9 years of Japanese newspaper
text (6 years of Mainichi and 3 years of Nikkei)
(see Table 1). The valency dictionary had
Japanese entries for 4,997 verb types (37.5%),
which covered most of the actual words (92.5%).
There were 8,304 verbs with no Japanese entry
in the valency dictionary. Of those, 4,129
(49.7%) verbs appear in ALT-J/E’s
Japanese-English transfer dictionary or EDICT and
have a pattern with the same translation in the
valency dictionary.⁶ Most of these, 3,753 (90.9%)
the paraphrases. We made candidate patterns
for these verbs.
Table 1: Cover Ratio for Japanese Newspapers (9 years)
In lexicon   No. of Types     (%)   No. of Tokens     (%)
Jp exists           4,997    37.5      24,656,590    92.5
En exists           4,129    31.0       1,355,552     5.1
No entry            4,175    32.4         645,158     2.4
Total              13,301   100.0      26,657,300   100.0
⁶ EDICT typically translates Japanese verbal nouns
as nouns, without giving a separate verb entry: e.g.,
kyōdō "cooperation". We used ALT-J/E’s English
morphological dictionary and the EDICT part-of-speech
codes to create 10,395 new verb entries such as
kyōdō-suru "cooperate".
For the 3,753 target verbs, we did the check
using the pre-filter and paraphrasing. The
original number of candidates was enormous:
108,733 pairs of JU and JV. Most of these
were removed in the pre-filtering stage, leaving
2,570 unknown verbs matching 6,888 verbs
in the valency dictionary (in fact, as the
pre-filter check does not need the valency patterns,
they can be made after this stage). When these
were expanded into patterns, they made a total
of 8,553 candidate patterns (3.3 patterns/verb)
whose semantic similarity was then checked
using paraphrases.
It took the lexicographer about 7 minutes per
verb to judge the fitness of the paraphrases; all
the rest of the construction was automatic. This
is a significant speed-up over the 30 minutes
normally taken by an expert lexicographer to
construct a valency entry.
3 Evaluation and Results
We evaluated the effect on translation quality
for each new pattern that had at least one
paraphrase that was grammatical. There were
6,893 new patterns, for 2,305 kinds of verbs (3.0
patterns/verb). For each verb (JU) we picked
two shortish sentences (on average 81.8
characters/sentence: 40 words) from a corpus of 11
years of newspaper text (4 years of Mainichi
and 5 years of Nikkei). This corpus had not
been used in the paraphrasing stage, i.e., all
the sentences were unseen. We tried to get
2 sentences for each verb, but could only find
one sentence for some verbs: this gave a total
of 4,367 test sentences.
Translations that were identical were marked
no change. Translations that changed were
evaluated by Japanese native speakers who
are fluent in English. The new translations
were placed into three categories: improved,
equivalent and degraded. All the judgments
were based on the change in translation quality,
not the absolute quality of the entire sentence.
We compared the translations with and without
the valency patterns. There were two set-ups.
In the first, we added one pattern at
a time to the valency dictionary, so we could
get a score for each pattern. Thus verbs with
more than one pattern would be tested multiple
times. In this case we tested 13,140 different
sentence/pattern combinations. In the second,
we added all the patterns together, and let
the system select the most appropriate pattern
using the valency information and selectional
restrictions. The results of the evaluation are
given in Table 2.
Table 2: Evaluation of New Valency Entries
               Each pattern        All patterns
Judgment         No.      %         No.      %
improved       4,536   34.5       1,636   37.5
no change      3,238   24.6       1,063   24.3
equivalent     3,465   26.4       1,115   25.5
degraded       1,901   14.5         552   12.6
Total         13,140  100.0       4,366  100.0
As can be seen in Table 2, improved is the
largest category, followed by equivalent and
no change; degraded is the smallest. There was
a clear improvement in the overall translation
quality.
In particular, the result using all patterns
(which is the way the dictionary would normally
be used) is better than using one pattern
at a time. There are two reasons: (1)
Many verbs have different alternations. When
we used all patterns, we covered more
alternations, therefore the system could select the
right entry based on its subcategorization. (2) The
entries also have different selectional restrictions
for different translations. When we used all
patterns, the system could select the best valency
entry based on its selectional restrictions. Even
without using the full paraphrase data, only
the pre-filter and the grammatical judgments,
37.5% of translations improved and only 12.6%
degraded, an overall improvement of 24.9%.
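The percentages for the all-patterns set-up can be recomputed from the counts in Table 2. The short Python check below is only an illustration; the variable names are made up here, and the net figure is taken as the difference of the rounded percentages, which is how the 24.9% in the text appears to be obtained.

```python
# Counts from the "All patterns" columns of Table 2.
all_patterns = {
    "improved": 1636,
    "no change": 1063,
    "equivalent": 1115,
    "degraded": 552,
}

total = sum(all_patterns.values())

# Share of each judgment, rounded to one decimal place as in the table.
shares = {label: round(100 * count / total, 1)
          for label, count in all_patterns.items()}

# Net improvement: improved minus degraded, in percentage points.
net_improvement = round(shares["improved"] - shares["degraded"], 1)

print(total, shares, net_improvement)
```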
Now we analyze the reasons for the improved
and degraded translations. The reasons for
improved: (1) The system was able to translate
previously unknown words. The translation
may not be the best but is better than
an unknown word. (2) A new pattern with a
better translation was selected. (3) The sentence
was translated using the correct
subcategorization, which allowed a zero pronoun to be
supplemented or some other improvement. The
reasons for degraded: (1) The detailed nuance
was lost. For example, nade-ageru
"brush up" became simply brush. (2) A new
pattern was selected whose translation was less
appropriate.
Further Refinements
We then examined the paraphrase data in an
attempt to improve the quality even further by
filtering out the bad entries. To examine this,
we defined scores for the evaluation categories:
improved is +1, no change and equivalent
are +0.5 and degraded is -1. Because we used
up to two evaluation sentences for each JU, the
evaluation score varies between -2 and 2.
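The scoring scheme is simple enough to state as code. The sketch below assumes the score for a verb is the sum of the category scores over its one or two evaluation sentences, which matches the stated range of -2 to 2; the names `CATEGORY_SCORES` and `evaluation_score` are illustrative only.

```python
# Category scores as defined in the text.
CATEGORY_SCORES = {
    "improved": 1.0,
    "no change": 0.5,
    "equivalent": 0.5,
    "degraded": -1.0,
}


def evaluation_score(judgments):
    """Sum the category scores over one or two evaluation sentences."""
    if not 1 <= len(judgments) <= 2:
        raise ValueError("expected one or two judgments per verb")
    return sum(CATEGORY_SCORES[j] for j in judgments)


print(evaluation_score(["improved", "improved"]))    # best case
print(evaluation_score(["improved", "equivalent"]))
print(evaluation_score(["degraded", "degraded"]))    # worst case
```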
We expected that restricting the patterns to
those with grammatical paraphrases and same
or close meaning would improve translation
quality. However, as can be seen in Table 3,
the distribution of improvements in translation
quality did not change significantly according to
the percentage of paraphrases that were either
same or close.
One reason for the lack of correlation of the
results is that change in translation quality of
an example sentence is a very blunt instrument
to evaluate the fitness of a lexical entry.
Particularly for complicated sentences, the parse may
improve locally, but the translation degrade
overall. In particular, a translation that was
incomprehensible could become comprehensible
but with the wrong meaning: the worst possible
result for a translation system. However, even
allowing for these imperfections, the result still
holds: a test for paraphrasability on a small set
of sentences is not a useful indicator of whether
two verbs have the same valency. One reason
for the lack of utility of the paraphrase tests
is that the example sentences were chosen ran-
domly: there is no guarantee that they show
either typical usage patterns or the full range of
use.
We were actually surprised by these results,
so we tried various other combinations of
grammaticality and semantic similarity, and found
the same lack of effect. We also tried a
monolingual similarity measure based on word-vector
distances taken from word definitions and
corpora (Kasahara et al., 1997). This measure
was also not effective. We then looked at our
initial grammaticality filter (at least one
paraphrase that was grammatical). Evaluating a
small sample of verbs that failed this test
(evaluated on 50 sentences), we found that only 16%
improved and 24% were degraded. Therefore
this test was useful. However, it only removed
265 verbs (less than 10%). If we had left them
in, the extrapolated result (testing each pattern
individually) is an improvement of 32% versus a
degradation of 16%, which should improve
further if all patterns are tested together.

Table 3: grammatical and either same or close
% same                    Change in Translation Quality
or close   -2,-1   %    -0.5   %       0   %  0.5, 1   %     1.5   %       2   %     All
- 10         248   8     364  11     148   4   1,160  36     740  22     571  17   3,231
- 20           0   0       0   0       0   0       1 100       0   0       0   0       1
- 30          32   6      58  10      23   4     161  30     142  26     117  21     533
- 40           6   6      12  12       4   4      40  40      16  16      21  21      99
- 50          22   9      31  12      13   5      71  28      56  22      57  22     250
- 60          10   5      25  13      17   9      81  43      25  13      30  15     188
- 70          14   6      27  11      12   5      77  33      52  22      52  22     234
- 80           3   3       9   9      12  12      35  36      21  21      17  17      97
- 90          20   9      24  10      12   5      71  32      57  25      41  18     225
- 100         91   5     149   8      90   4     682  37     437  23     376  20   1,825
Total        446   7     699  10     331   4   2,379  36   1,546  23   1,282  19   6,683
Next, we looked at the target language trans-
lation (used to judge that the meanings are sim-
ilar). We made the conditions used to match the
English stricter (e.g., head match plus n words
different for various n), and found no useful
difference.
Finally, we looked at the source of the English
used to find the candidate patterns. Translations
with a very low preference value in our
system dictionary (i.e., the 12th best or lower
translation for a word with 12 translations or
more) were significantly worse. However, there
were only 4 such verbs, so it is not a very useful
filter in practice. An interesting trend we did
find was that translations found in EDICT were
significantly better than those found in
ALT-J/E’s transfer dictionary (see Table 4).
The main reason that EDICT gave such good
results was that words with no entry in
ALT-J/E’s transfer dictionary could not be
translated at all by the default system: the
translation output included the Japanese verb as is.
Building new patterns from EDICT allowed the
system to find a translation in these cases.
4 Discussion and Future Work
Overall, our method of fully exploiting existing
resources was able to create many useful valency
entries, but caused some degradations where we
could not add entries for the most appropriate
translations. We were able to add entries for
2,305 different verbs to an initial lexicon with
5,062 verbs.
The results of our examination of various
filters on the constructed patterns were negative.
Contrary to the claims of Fujita and Bond
(2002), using paraphrasing as a filter did not
help to improve the quality. However, their
central claim, that words with similar meaning
have similar valency and that words with the
same translations often have similar meanings,
was validated.
From a practical point of view the results
are encouraging: we can build useful new
patterns with only a simple monolingual judgment
as pre-filter ("are these verbs similar in
meaning?"), and these patterns improve the quality of
translation. Even this crude filter produces
improvements in 32% of sentences versus
degradations in only 16%. Adding a check on
grammaticality of paraphrases gives an overall
improvement in translation quality for these verbs of
24.9% (37.5% improved, 12.6% degraded).
Further evaluation of the semantic similarity of the
paraphrases did not improve these results.
EDICT, which is built by the collaboration
of many volunteers, had a wider coverage than
the ALT-J/E system dictionary. This shows the
quality of such cooperatively built dictionaries.
New projects to build lexical resources for Asian
languages should take heed and (a) make the
resources freely available and (b) encourage
people to contribute to them. Projects already
embracing this approach include SAIKAM
(Thai-Japanese), PAPILLON (Japanese-French-...)
and LERIL (English-Hindi/Tamil/...).
Table 4: Effect of Translation Source
Translation               Change in Translation Quality
Source       -2,-1   %    -0.5   %       0   %  0.5, 1   %     1.5   %       2   %     All
ALT-J/E        312   9     481  13     228   6   1,284  36     793  22     498  13   3,596
ALT/EDICT       72   6     128  10      44   3     501  43     256  21     168  14   1,169
EDICT          110   4     194   7     103   3     894  33     681  25     739  27   2,721
Total (%)      494   7     803  10     375   5   2,679  36   1,730  23   1,405  18   7,486
We therefore propose a method of building
information-rich lexicons that proceeds as
follows: (1) build a seed lexicon by hand; (2)
extend it semi-automatically using a simple
pre-filter check; (3) release the resulting lexicon so
that people can use it and provide feedback to
remove bad entries. People will not use a lexi-
con if it is too bad, but an error rate of 12.6%
should be acceptable, especially as it can be eas-
ily decreased with feedback.
5 Conclusion
In this paper we present a method of assign-
ing valency information and selectional restric-
tions to entries in a bilingual dictionary. The
method exploits existing dictionaries and is
based on two basic assumptions: words with
similar meaning have similar subcategorization
frames and selectional restrictions; and words
with the same translations have similar mean-
ings.
A prototype system allowed new patterns to
be built at a cost of less than 7 minutes per
pattern. An evaluation of 6,893 new patterns
showed that adding them to a Japanese-to-
English machine translation system improved
the translation for 37.5% of sentences using
these verbs, and degraded it for 12.6%, a sub-
stantial improvement in quality.
Acknowledgments
The authors would like to thank the other members
of the NTT Machine Translation Research Group,
Satoshi Shirai and Timothy Baldwin. This research
was supported by the research collaboration between
the NTT Communication Science Labs and CSLI.

References
Yasuhiro Akiba, Hiromi Nakaiwa, Satoshi Shirai,
and Yoshifumi Ooyama. 2000. Interactive
generalization of a translation example using queries
based on a semantic hierarchy. In ICTAI-00,
pages 326-332.
Jim Breen. 1995. Building an electronic
Japanese-English dictionary. Japanese Studies
Association of Australia Conference
(http://www.csse.monash.edu.au/~jwb/jsaa_paper/hpaper.html).
Christine Fellbaum, editor. 1998. WordNet: An
Electronic Lexical Database. MIT Press.
Sanae Fujita and Francis Bond. 2002. A method of
adding new entries to a valency dictionary by
exploiting existing lexical resources. In Ninth
International Conference on Theoretical and
Methodological Issues in Machine Translation:
TMI-2002, pages 42-52, Keihanna, Japan.
Satoru Ikehara, Satoshi Shirai, Akio Yokoo, and
Hiromi Nakaiwa. 1991. Toward an MT system
without pre-editing - effects of new methods in
ALT-J/E -. In Third Machine Translation Summit:
MT Summit III, pages 101-106, Washington DC.
(http://xxx.lanl.gov/abs/cmp-lg/9510008).
Kaname Kasahara, Kazumitsu Matsuzawa, and
Tsutomu Ishikawa. 1997. A method for judgment
of semantic similarity between daily-used words
by using machine readable dictionaries.
Transactions of IPSJ, 38(7):1272-1283. (in Japanese).
Daisuke Kawahara and Sadao Kurohashi. 2001.
Japanese case frame construction by coupling the
verb and its closest case component. In
Proceedings of First International Conference on Human
Language Technology Research (HLT 2001), pages
204-210, San Diego.
Christopher D. Manning. 1993. Automatic
acquisition of a large subcategorization dictionary from
corpora. In 31st Annual Meeting of the
Association for Computational Linguistics: ACL-93,
pages 235-242.
Satoshi Shirai. 1999. Tanbun-no ketsugō patān-no
mōra-teki shūshū-ni mukete [towards a
comprehensive cover of sentence valency
patterns]. In NLP Symposium. (In Japanese:
www.kecl.ntt.co.jp/icl/mtg/members/shirai/nlpsym99.html).
Takehito Utsuro, Takashi Miyata, and Yuji
Matsumoto. 1997. Maximum entropy model
learning of subcategorization preference. In Proc. 5th
Workshop on Very Large Corpora, pages 246-260.
