Obtaining Japanese Lexical Units for Semantic Frames from Berkeley FrameNet Using a Bilingual Corpus
Toshiyuki Kanamaru
Kyoto University
Yoshida Nihonmatsu-cho, Sakyo-ku
Kyoto, 606-8501, Japan
kanamaru@hi.h.kyoto-u.ac.jp
Masaki Murata, Kow Kuroda, Hitoshi Isahara
National Institute of Information and
Communications Technology (NICT)
3-5 Hikaridai, Seikacho, Sorakugun
Kyoto, 619-0289, Japan
{murata,kuroda,isahara}@nict.go.jp
Abstract
An attempt was made to semi-automatically obtain “lexical units” (LUs) for Japanese from the English LUs defined in the semantic frame database provided by Berkeley FrameNet (BFN), using an English-Japanese bilingual corpus. This task was a prerequisite to building a complete database of semantic frames for Japanese. In the task, a Japanese word is first translated into an English word or phrase, E, where E is one of the lexical units that evoke a particular semantic frame, F, in the BFN database. When the other lexical units of F are translated back into Japanese, the translations define a candidate set of Japanese lexical units for F. The viability of the proposed method was tested on the Japanese verb (X-ga Y-wo) osou, a relatively polysemous word that roughly means “X attack(s) Y,” “X hit(s) Y,” or “X surprise(s) Y” in English. The results were compared with the semantic descriptions provided by IPAL and Nihongo Goi-Taikei (A Japanese Lexicon), two well-known language resources for Japanese, and with those provided by Frame Oriented Concept Analysis of Language (FOCAL). The comparison revealed that FOCAL, BFN, Goi Taikei, and IPAL provide descriptions of decreasing granularity, in that order: FOCAL’s descriptions are the finest-grained and IPAL’s the coarsest.
1 Introduction
Making use of deep semantics in information processing is one of the major problems confronting today’s NLP community. More and more NLP researchers are realizing that they need semantic/lexical resources that go beyond resources such as WordNet (Fellbaum, 1998), which specify only hierarchical semantic relationships. One of the crucial reasons for this is that raw linguistic data embody semantic associations that are difficult to capture in terms of such hierarchical relationships; one such association is the so-called “semantic field” effect, a class of associative relationships among words (or concepts). To deal with these issues, deeper semantics are needed, with descriptions that incorporate ontological inferences. Let us assume that X attacked Y is to be interpreted.¹ This is a complex situation. In interpreting The man attacked a bank, it may be necessary to specify (by inference) that the subject used a weapon (e.g., a gun) and that his purpose was to obtain money (illegally), whereas in interpreting The wolf attacked a flock of sheep, it may be necessary to specify that the subject never used a weapon and that its purpose was to eat one or two individual sheep (rather than the entire flock) after killing them. The relevant inferences are clearly situation-based, or “case-based” in the sense of Case-based Reasoning (Kolodner, 1993), and they are difficult to specify in terms of the lexical semantic descriptions available in resources such as WordNet (Fellbaum, 1998), which do not specify associative relationships among concepts, such as the relationship between a ROBBER (e.g., a man) and a WAREHOUSE OF VALUABLES (e.g., a bank, museum, or jewelry shop), or the one between a PREDATOR (e.g., a wolf) and its PREY (e.g., sheep, rabbits). Thus, the NLP community has a critical need for resources that encode this kind of information.
¹ One of the anonymous reviewers told us that it was unclear how ontological inferences of this sort are related to BFN’s frame definitions. The question boils down to a question of definition, i.e., what kind of information semantic frames need to encode. As we will see later, this is exactly the question addressed by FOCAL, which claims that BFN frames are too coarse-grained to serve as an effective knowledge base for ontological inferences.
Along with PropBank (Kingsbury and Palmer, 2002; Ellsworth et al., 2004), Berkeley FrameNet (BFN) (Baker et al., 1998) is an ongoing research project that is attempting to meet the demand for resources that encode deeper lexical semantics by providing a semantic frame lexicon (sometimes called “the FrameNet”) and a corpus annotated for semantic information encoded in terms of semantic frames.
Thus far, BFN has produced “a lexical database that currently contains more than 8,900 lexical units, more than 6,100 of which are fully annotated, in more than 625 semantic frames, exemplified in more than 135,000 annotated sentences” (cited from the FrameNet web page). Other ongoing projects, such as the German FrameNet or “SALSA” (Erk et al., 2003), the Spanish FrameNet (Subirats and Petruck, 2003), and the Japanese FrameNet (Ohara et al., 2003), are trying to build lexical resources that are compatible with BFN, but for Japanese at least, no data have been released in a usable form, except for a few annotation examples for verbs of motion.
In sum, no useful resource exists for frame-based description/analysis of Japanese. This is one of the reasons that we attempted the task described in this paper, along with our effort to assess the usefulness of the database provided by BFN.
The anonymous reviewers of our paper pointed out that there have been similar projects and methodologies that have tried to translate BFN into other languages automatically, such as BiFrameNet (Chen and Fung, 2004) and Romance FrameNet,² and that it would have been better to include a comparison with them.
BiFrameNet presented an automatic approach to constructing a bilingual semantic network using HowNet, a Chinese ontology. While it is an interesting approach, we have not compared their results with ours, mainly because they seem to have used different resources and had somewhat different goals, and also because of space considerations.
No papers related to the Romance FrameNet project have been released to date, let alone made available to us, so we have had to put a comparison with it on hold.³
² http://ic2.epfl.ch/~pallotta/rfn/
³ One of the anonymous reviewers criticized us for failing to mention the Romance FrameNet project in our paper; we consider this criticism unreasonable. The project was announced on the Corpora Mailing List on June 1, just one week before the submission deadline. This means that we had little chance to know about the project unless we were “insiders.”
2 Proposed Procedure
We used a bilingual corpus (Utiyama and Isahara, 2003) to examine which semantic frames of BFN contained LUs relevant to the Japanese verb osou. The Japanese FrameNet (JFN), by contrast, uses a monolingual corpus to construct its semantic frames. In such cases, construction can be inefficient, because all the semantic frames have to be built from scratch. Building frames from scratch also affects the reliability of the frames identified and described, because their descriptions risk being arbitrary. This risk of arbitrary description can be reduced by using a bilingual corpus, provided that the corpus is of high quality.
2.1 Identifying English equivalents of osou
We selected the Japanese-English alignments in the bilingual corpus whose Japanese text contained osou, i.e., the target verb, and obtained 135 alignments.
The bilingual corpus consists of two subcorpora. One subcorpus consists of one-to-one alignments; the other consists of one-to-many alignments, in which one Japanese sentence is aligned with several English sentences.
In the first case, it was straightforward to specify the English word or phrase that translated the target verb, osou. In the second case, it was not, so we singled out the English sentence that corresponded to the Japanese sentence containing osou. The English translations of osou were identified manually.
After this procedure, the following five verbs were identified as English translations of osou: assault, attack, hit, pound, and strike.⁴
2.2 Identifying relevant semantic frames
Based on these five verbs, we extracted semantic frames using FrameSQL (Sato, 2003). Semantic frames with LUs that included any of the five verbs were chosen from the BFN semantic frame database (referred to here simply as BFN).
⁴ There were a few other verbs and constructions that served as English translations of osou in the alignments, for example, besiege, engulf, feel pain, occur, hurt, kill, rob, shoot, stab, suffer, and wreak on. For simplicity, however, we filtered out these less frequent items (those with a frequency of less than 3).
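The translation-counting step of Section 2.1, together with the frequency cut-off of note 4, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors’ implementation: the alignment records are hypothetical stand-ins for the manually annotated corpus alignments, and only the threshold of 3 is taken from the text.

```python
from collections import Counter

# Hypothetical records: (ID of a Japanese sentence containing osou, English
# word or phrase manually identified as its translation). Not real corpus data.
ALIGNMENTS = [
    ("ja-001", "attack"), ("ja-002", "attack"), ("ja-003", "attack"),
    ("ja-004", "strike"), ("ja-005", "strike"), ("ja-006", "strike"),
    ("ja-007", "hit"), ("ja-008", "hit"), ("ja-009", "hit"),
    ("ja-010", "rob"), ("ja-011", "rob"),
    ("ja-012", "stab"),
]

MIN_FREQ = 3  # items with a frequency below 3 are filtered out (note 4)


def frequent_translations(pairs, min_freq=MIN_FREQ):
    """Return English translations seen at least `min_freq` times, sorted alphabetically."""
    counts = Counter(english for _japanese, english in pairs)
    return sorted(word for word, n in counts.items() if n >= min_freq)


print(frequent_translations(ALIGNMENTS))  # ['attack', 'hit', 'strike']
```

Lowering `min_freq` to 1 would also keep rob and stab, which corresponds to the unfiltered list of less frequent translations mentioned in note 4.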
Based on Frame Semantics (Fillmore, 1982), BFN posits that a semantic frame is an organization of “semantic roles,” which BFN calls “Frame Elements” (FEs). Usually, LUs are instantiations or lexical realizations of FEs. Thus, an LU in a frame F is a word or phrase that, according to the assumptions of Frame Semantics, “evokes” frame F. Figure 1, the definition of the 〈Attack〉 frame in the BFN database, illustrates the procedure: assault, attack, and strike are listed as LUs of the 〈Attack〉 frame.
After manually examining all the semantic frames thus obtained, we recognized the following five BFN frames as relevant to the various senses of the target word osou:
1. 〈Attack〉
2. 〈Cause harm〉
3. 〈Experience bodily harm〉
4. 〈Cause impact〉
5. 〈Impact〉
Semantic frames in the BFN database are meant to be related to one another. There are various kinds of relationships, some of which are encoded explicitly as “frame-to-frame relations” (such as the “is used by” relation) between two frames. Using this information, we obtained the following relationships among the five frames:
1. 〈Attack〉
2. 〈Cause harm〉, is used by: 〈Experience bodily harm〉
3. 〈Cause impact〉, uses: 〈Impact〉
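The frame-selection query of Section 2.2 was run with FrameSQL; the Python sketch below only mimics its logic over a small in-memory table and is not FrameSQL’s interface. The verbal LUs of 〈Attack〉 follow Figure 1 (with lay ((into)).v simplified to lay.v). The other LU sets, including the unrelated frame used as a negative case, are illustrative placeholders and not BFN’s actual listings. As in the text, the output is only a candidate list that would still be examined manually.

```python
# Verbal LUs of Attack follow Figure 1; all other sets are illustrative
# placeholders, NOT the LU lists actually published by BFN.
FRAME_LUS = {
    "Attack": {"ambush.v", "assail.v", "assault.v", "attack.v", "charge.v",
               "fall.v", "invade.v", "jump.v", "lay.v", "raid.v", "set.v",
               "storm.v", "strike.v"},
    "Cause_harm": {"hit.v", "pound.v", "strike.v"},  # placeholder
    "Cause_impact": {"hit.v", "strike.v"},           # placeholder
    "Commerce_buy": {"buy.v", "purchase.v"},         # placeholder, unrelated frame
}

# The five English translations of osou identified in Section 2.1.
TRANSLATIONS = ["assault", "attack", "hit", "pound", "strike"]


def frames_with_lus(verbs, frame_lus):
    """Return the frames that list at least one of `verbs` as a verbal LU."""
    targets = {verb + ".v" for verb in verbs}
    return sorted(name for name, lus in frame_lus.items() if lus & targets)


print(frames_with_lus(TRANSLATIONS, FRAME_LUS))
# ['Attack', 'Cause_harm', 'Cause_impact']
```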
2.3 Identifying relevant frame-evoking LUs in English
Each semantic frame has a number of FEs, each of which has lexical realizations, which are called LUs. In the work reported here, only verbal LUs were selected as relevant from the English LUs made available in the BFN database.⁵ Admittedly, there are a few nominal LUs in certain frames in BFN, but we ignored them because we found them to be less relevant to our specific task.
⁵ On this point, we recognize a certain discrepancy between theory and practice in the BFN framework. If an LU is, according to its definition, a lexical realization of a certain FE of a certain frame, more nominals should be identified and listed as LUs. For example, in Jack ordered a hamburger at McDonald’s, order.v evokes the 〈Selling〉 frame, while hamburger is a noun that evokes the 〈Cooking creation〉 frame. According to the definition of LU, this means that hamburger.n needs to be identified as an LU of the 〈Cooking creation〉 frame; more specifically, it is an LU that instantiates the 〈Food〉 FE of that frame. It is obvious that the QUALIA STRUCTURE (Pustejovsky, 1995) of hamburger.n contains information of this sort. We suspect that this aspect of “frame evocation by nominals” is not properly recognized and coded, and that BFN’s current practice of identifying mostly predicates as LUs is somewhat misleading, because it conceals the fact that there can be, and actually are, many kinds of frame-evoking effects. BFN has been concentrating on identifying LUs for “governors,” not LUs for the entire set of FEs, for whatever reason. In this respect, it is crucial to note that not all frame evokers are frame governors: hamburger.n clearly evokes the 〈Cooking creation〉 frame, but the noun does not govern that frame. Arguably, it is unreasonable and even gratuitous to posit a 〈Hamburger〉 frame just to make hamburger.n a governor.
After identifying all the relevant LUs for the three frames 〈Attack〉, 〈Cause harm〉, and 〈Cause impact〉, we obtained all the English verbs that translated the senses of the target word osou identified in terms of Frame Semantics.
For example, the relevant LUs for the 〈Attack〉 frame are the following verbs: ambush, assault, attack, charge, invade, jump, lay, set, storm, and strike.
As with the 〈Attack〉 frame, we extracted the relevant LUs for the 〈Cause harm〉 and 〈Cause impact〉 frames. We then manually merged the extracted LUs and obtained 93 verbal LUs relevant to the Japanese verb osou.
2.4 Obtaining LU candidates for Japanese FEs
Table 1: The 15 most frequently occurring nouns
Noun                                      Freq.
jiken (incident)                          39
boukou (criminal assault)                 32
josei (woman)                             28
taiho (arrest)                            23
hikoku (accused, defendant)               21
yougi (charge, suspicion)                 20
kougeki (attack)                          20
shounen (boy)                             14
tero (terrorism)                          14
shougai (injury)                          13
higai (damage, harm)                      12
kenkei (prefectural police department)    12
manshon (apartment)                       12
butai (military unit)                     10
fujo (girl and woman)                     10
Using the bilingual corpus again, we gathered the alignments whose English texts contained the English LUs specified as described above, obtaining 262 alignments. This procedure defined a set of Japanese sentences containing Japanese words or phrases that were natural translations of the LUs in BFN.
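The LU-merging step of Section 2.3 and the alignment retrieval of Section 2.4 can be sketched together in Python, as below. This is a hedged illustration rather than the authors’ procedure: the 〈Attack〉 verb list is the one given in Section 2.3, but the other two lists, the tiny lemma table, and the sentence pairs are all hypothetical. A real implementation would use a proper lemmatizer instead of the `LEMMAS` dictionary.

```python
import re

# Verbal LUs per frame. The Attack list is the one given in Section 2.3;
# the other two lists are hypothetical subsets for illustration only.
FRAME_VERBS = {
    "Attack": ["ambush", "assault", "attack", "charge", "invade", "jump",
               "lay", "set", "storm", "strike"],
    "Cause_harm": ["hit", "pound", "strike"],  # hypothetical subset
    "Cause_impact": ["hit", "strike"],         # hypothetical subset
}

# Tiny hypothetical lemma table mapping inflected forms to base forms.
LEMMAS = {"attacked": "attack", "attacks": "attack",
          "struck": "strike", "strikes": "strike", "hits": "hit"}


def merge_lus(frame_verbs):
    """Union of all frames' verbal LUs; shared verbs such as strike appear once."""
    return sorted({verb for verbs in frame_verbs.values() for verb in verbs})


def retrieve_alignments(alignments, lus):
    """Keep (Japanese, English) pairs whose English side contains one of the LUs."""
    wanted = set(lus)
    kept = []
    for japanese, english in alignments:
        tokens = re.findall(r"[a-z]+", english.lower())
        if wanted & {LEMMAS.get(token, token) for token in tokens}:
            kept.append((japanese, english))
    return kept


ALIGNMENTS = [  # hypothetical sentence pairs
    ("クマが旅人を襲った。", "A bear attacked the traveler."),
    ("台風が町を襲った。", "The typhoon struck the town."),
    ("彼は本を買った。", "He bought a book."),
]

lus = merge_lus(FRAME_VERBS)
print(len(lus))                                   # 12
print(len(retrieve_alignments(ALIGNMENTS, lus)))  # 2
```

Because this sketch matches surface tokens, a common verb such as set or lay would also retrieve many unrelated sentences.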
Attack
Definition:
An Assailant physically attacks a Victim (which is usually but not always sentient), causing or intending to cause the Victim
physical injury. The Weapon used by the Assailant may also be mentioned, in addition to the usual Place, Time, Purpose, and
Reason. Sometimes a location is used metonymically to stand for the Assailant or the Victim, and in such cases the Place FE
will be annotated on a second FE layer.
As soon as he stepped out of the bar he was SET upon by four men in ski-masks.
Is he INVADING Iraq just to cover other shortcomings?
Then Jon-O’s forces AMBUSHED them on the left flank from a line of low hills.
FEs:
Core:
Assailant [Asl] The person (or other self-directed entity) that is attempting physical harm to the Victim.
The mysterious fighter ATTACKED the guardsmen with a sabre.
Victim [Vic] This FE is the being or entity that is injured by the Assailant’s attack.
The mysterious fighter ATTACKED the guardsmen with a sabre.
Lexical Units
ambush.n, ambush.v, assail.v, assault.n, assault.v, attack.n, attack.v, charge.n, charge.v, fall.v, incursion.n, invade.v, invasion.n, jump.v, lay ((into)).v, offensive.n, onset.n, onslaught.n, raid.v, set.v, storm.v, strike.n, strike.v
Created by infinity on Fri Nov 22 14:05:22 PST 2002
Figure 1: BFN definition of 〈Attack〉 frame (partial)
It should be noted, however, that there is no established method of recognizing these units automatically; they are part of a text without being marked as such. To solve this problem, we hypothesized that their statistical properties in the texts could be used to pick them out; i.e., we assumed that these LUs were relatively specific to these types of texts and would appear at higher-than-usual frequencies in the collected texts.
Under this assumption, we collected high-frequency nouns using KH Coder.⁶ The results were sorted by part of speech, and the high-frequency nouns thus obtained are listed in Table 1.
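KH Coder performed this counting step in the work reported here; the Python sketch below is only a stand-in showing the kind of raw-frequency ranking behind Table 1. It assumes that the Japanese sentences have already been morphologically analyzed (for example by ChaSen) into (word, part-of-speech) pairs. The romanized tokens and the simplified tag "noun" are hypothetical.

```python
from collections import Counter

# Hypothetical pre-analyzed sentences: lists of (word, part-of-speech) pairs.
ANALYZED = [
    [("jiken", "noun"), ("de", "particle"), ("josei", "noun"), ("osou", "verb")],
    [("jiken", "noun"), ("no", "particle"), ("taiho", "noun")],
    [("jiken", "noun"), ("josei", "noun"), ("osou", "verb")],
]


def top_nouns(sentences, k=15):
    """Count noun tokens across all sentences and return the k most frequent."""
    counts = Counter(word for sentence in sentences
                     for word, pos in sentence if pos == "noun")
    return counts.most_common(k)


print(top_nouns(ANALYZED, k=2))  # [('jiken', 3), ('josei', 2)]
```

The default `k=15` mirrors the size of Table 1. The sketch ranks by raw counts only and does not compare against a reference corpus.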
This provided little information about the semantic classification of the nouns, because there was no indication of the LUs that they instantiated. Semantic groupings are latent in the data, however, so we were able to “cluster” the nouns based on certain generic properties to obtain an initial approximation of these groupings. We used a tool called msort (standing for “meaning sort”) (Murata et al., 2001) to establish generic, domain-independent semantic groupings.⁷,⁸
⁶ KH Coder is a free text analyzer that uses a combination of ChaSen (Matsumoto et al., 1999) and MySQL. It is freely available at http://khc.sourceforge.net/.
Nouns occurring more than three times were obtained and grouped as shown below:⁹
human: dansei (man), danshi (boy), josei (woman), fujo (woman), joshi (girl), danji (young boy), joji (young girl), youjo (infant girl), shounen (boy), . . .
organization: kokka (country), gaikoku (foreign country), kokusai (international), sekai (world), . . .
product: yakubutsu (drug), manshon (apartment), heya (room), keesu (case), naifu (knife), shoujuu (rifle), . . .
body part: itai (body), soshiki (organization)
plant: dansei (man), josei (woman), soshiki (tissue)
space: genba (field), chiiki (region), mokuteki (purpose), hokubu (northern area), shinai (city center)
amount: gruupu (group)
relation: jijou (circumstances), keesu (case), jitai (matter), jiken (incident), ryakushiki (informality), kankei (relationship), mokuteki (purpose), genkou (current), . . .
activity: jisatsu (suicide), satsugai (slaying), shougai (injury), juushou (serious injuries), ishiki (consciousness), utagai (doubt), yougi (suspicion), sousa (investigation), sousaku (search), shirabe (investigation), . . .
⁷ msort sorts a given set of nouns based on their encodings in the Japanese thesaurus Bunrui Goi-hyou (National Language Research Institute, 1964).
⁸ One of the anonymous reviewers commented critically on this “domain independence,” questioning the validity of the proposed method. This evaluation is clearly based on a misunderstanding: the semantic association, or conceptual dependence, between the 〈Assailant〉 and 〈Victim〉 FEs is already encoded when we collect only sentences whose main verbs are osou (in Japanese texts) or its translations (in English texts). What we have done with msort is to obtain subgroupings within a larger semantic grouping of “harm-causing” at a more generic level. Based on our coding experience, we are confident that the subclassification of a given semantic class is based on “semantic types” rather than semantic roles. To give proper subgroupings of the events to which the 〈Attack〉 frame is relevant, it is necessary to know whether an 〈Assailant〉 is a human ([+human, +animate, . . . ]) or an animal ([−human, +animate, . . . ]), and whether a 〈Victim〉 is a human ([+human, +animate, . . . ]) or an animal ([−human, +animate, . . . ]). If we insist that such subclassifications in terms of semantic types are irrelevant, messy details, we commit what we call “mere generalization for generalization’s sake,” failing to recognize what is really needed in NLP tasks.
⁹ Listings ending with “. . .” are partial.
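The msort grouping step can be pictured with the following Python sketch. Like msort, it sorts nouns by their thesaurus codes, so nouns with similar codes end up together. However, the codes and category labels below are invented for illustration; they are NOT real Bunrui Goi-hyou numbers, and the function is a hypothetical stand-in, not msort’s implementation. Nouns missing from the toy thesaurus are simply dropped.

```python
# Hypothetical thesaurus: noun -> (classification code, category label).
# The codes are invented for illustration, NOT real Bunrui Goi-hyou numbers.
THESAURUS = {
    "jiken": ("1.11", "relation"),
    "chiiki": ("1.17", "space"),
    "shinai": ("1.17", "space"),
    "josei": ("1.20", "human"),
    "shounen": ("1.20", "human"),
    "shougai": ("1.33", "activity"),
    "manshon": ("1.44", "product"),
    "naifu": ("1.45", "product"),
}


def meaning_sort(nouns, thesaurus=THESAURUS):
    """Sort nouns by thesaurus code, then group them by category label.

    Nouns that are not in the thesaurus are dropped.
    """
    known = sorted((n for n in nouns if n in thesaurus),
                   key=lambda n: (thesaurus[n][0], n))
    groups = {}
    for noun in known:
        groups.setdefault(thesaurus[noun][1], []).append(noun)
    return groups


NOUNS = ["naifu", "josei", "jiken", "chiiki", "shounen",
         "shinai", "manshon", "shougai", "kuma"]
GROUPS = meaning_sort(NOUNS)
print(GROUPS)
# {'relation': ['jiken'], 'space': ['chiiki', 'shinai'], 'human': ['josei', 'shounen'],
#  'activity': ['shougai'], 'product': ['manshon', 'naifu']}
```

As in the listing above, manshon (apartment) falls into “product” together with naifu (knife). Groupings at this generic level still need to be related to FEs.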
2.5 Identifying LUs for Japanese FEs
Based on the generic semantic groupings produced by msort, we intuitively classified the nouns into subclasses corresponding to the FEs of the BFN frames, as described below.
Recall that a semantic frame is a collection of semantic roles, or FEs. In the case of 〈Attack〉, the frame has two “core” FEs, i.e., 〈Assailant〉 and 〈Victim〉, and several “peripheral” or “noncore” FEs, such as 〈Place〉, 〈Time〉, and 〈Weapon〉. Thus, 〈Attack〉 denotes a situation in which an agent recognizable as an 〈Assailant〉 causes (or tries to cause) some 〈Harm〉 or 〈Injury〉 to a person or group of people recognizable as a 〈Victim〉 at some 〈Place〉 and 〈Time〉, sometimes using an item recognizable as a 〈Weapon〉.
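The view of a frame as a collection of core and noncore FEs, to be filled with Japanese nouns, can be captured in a small Python data structure like the one below. The class and its field names are hypothetical and do not reflect BFN’s data format. The FE inventory for 〈Attack〉 includes only the FEs named in the preceding paragraph.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A semantic frame viewed as a collection of FEs (semantic roles)."""
    name: str
    core_fes: tuple
    noncore_fes: tuple = ()
    # Japanese nouns assigned to each FE by the classification step.
    fillers: dict = field(default_factory=dict)

    def all_fes(self):
        return self.core_fes + self.noncore_fes

    def add_filler(self, fe, noun):
        """Record `noun` as a candidate filler of `fe`; reject FEs the frame lacks."""
        if fe not in self.all_fes():
            raise ValueError(f"{fe!r} is not an FE of {self.name}")
        self.fillers.setdefault(fe, []).append(noun)


attack = Frame("Attack", core_fes=("Assailant", "Victim"),
               noncore_fes=("Place", "Time", "Weapon"))
attack.add_filler("Assailant", "heishi")
attack.add_filler("Victim", "josei")
attack.add_filler("Weapon", "naifu")
print(attack.fillers)
# {'Assailant': ['heishi'], 'Victim': ['josei'], 'Weapon': ['naifu']}
```

Rejecting unknown FEs keeps each classification decision tied to the FE inventory of the frame in question.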
This means that all we need to do is classify the nouns in Table 1 into semantic classes such as 〈Assailant〉, 〈Victim〉, 〈Place〉, 〈Time〉, or 〈Weapon〉, with appropriate subclasses in which human assailants are distinguished from nonhuman assailants.¹⁰ The groupings provided by msort turned out to be useful for this purpose.¹¹
Using this procedure, the nouns obtained on a frequency basis for 〈Attack〉 were classified into the two core FEs as follows:
• 〈Assailant〉: dansei (man), goutou (burglary/burglar, robbery/robber), heishi (soldier), hikoku (accused person), butai (military unit), kyoudan (religious group)
• 〈Victim〉: danshi (boy), josei (woman), fujo (girl and woman), joshi (girl), danji (young boy), joji (young girl), youjo (infant girl), shounen (boy), shoujo (girl), aite (opponent), nihonjin (Japanese), . . .
¹⁰ It is important to note that the target data selection procedure of BFN is biased. For example, BFN puts aside a number of problematic cases, such as metaphorical expressions, and this is clearly reflected in the current frame definitions. We repeatedly noticed that metaphorically extended senses of a word were systematically dropped in the current release of BFN. For illustration, the sense of attack.n in heart attack is not described in BFN. Descriptive “gaps” of this sort are clearly undesirable; they give rise to specific kinds of mapping problems between the English LUs provided in BFN and Japanese LUs.
¹¹ We were sometimes unable to identify an FE for a noun class based solely on the output of msort. In these cases, we looked at the noun’s usage in the corpus to determine its FE.
2.6 Advantages of the proposed method
Using msort turned out to be more beneficial than anticipated when it came to selecting noncore FEs: to a certain extent, msort helped determine noncore FEs correctly. The 〈Attack〉 frame, for example, includes noncore FEs such as 〈Place〉, 〈Time〉, 〈Purpose〉, and 〈Reason〉 in addition to its core FEs, 〈Assailant〉 and 〈Victim〉. msort automatically groups naifu (knife), raifuru (rifle), and pisutoru (pistol) into the “product” category, which corresponds to the 〈Weapon〉 FE. Similarly, it automatically groups chiiki (region), hokubu (northern area), and shinai (city center) into the “location” category, which corresponds to 〈Place〉. Thus, part of the FE assignment task can be done automatically using msort.
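The partly automatic FE assignment described above can be sketched in Python as a default mapping from msort categories to FEs, combined with manual decisions that take precedence (cf. note 11). Only the two correspondences product → 〈Weapon〉 and location → 〈Place〉 come from the text. The function, the example groups, and the manual decisions are hypothetical.

```python
# Default FE per msort category, following the two correspondences reported
# above (product -> Weapon, location -> Place). Everything else needs a
# manual decision (cf. note 11).
CATEGORY_TO_FE = {"product": "Weapon", "location": "Place"}


def assign_fes(groups, manual=None):
    """Map msort groups {category: [nouns]} to FE candidates.

    Manual decisions {noun: FE} override the category default; nouns with
    neither a manual decision nor a default FE are returned as unresolved.
    """
    manual = manual or {}
    assigned, unresolved = {}, []
    for category, nouns in groups.items():
        for noun in nouns:
            fe = manual.get(noun, CATEGORY_TO_FE.get(category))
            if fe is None:
                unresolved.append(noun)
            else:
                assigned.setdefault(fe, []).append(noun)
    return assigned, unresolved


GROUPS = {  # hypothetical msort output
    "product": ["naifu", "raifuru", "pisutoru", "manshon"],
    "location": ["chiiki", "hokubu", "shinai"],
    "human": ["josei", "heishi"],
}
MANUAL = {"manshon": "Place", "josei": "Victim", "heishi": "Assailant"}  # hypothetical

print(assign_fes(GROUPS))
print(assign_fes(GROUPS, manual=MANUAL))
```

Without manual decisions, the default mapping would send manshon (apartment), which msort places in “product,” to 〈Weapon〉, and it would leave the human nouns unresolved. This is why the automatic part covers only some of the assignment task.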
The procedure also produced some interesting results. For example, the proposed method automatically identifies a set of lexical items (or lexical units) that clearly have a frame-evoking effect but are not properly identified as frame elements of a semantic frame in BFN, either as core FEs or as peripheral (= noncore) FEs. The semantic groupings thus identified automatically are enumerated below:
1. Nouns denoting an act(ion) of N (N suru (or sareru), “(make) do N”): ranbou (violence), boukou (criminal assault), bouryoku (violence), jikkou (execution), shuugeki (assault), kougeki (attack)
2. Nouns denoting a state of affairs N (V shita + N, “N that S V”): satsugai (slaying), shougai (injury), goutou (burglary/burglar, robbery/robber), satsujin (murder), sasshou (killing and wounding)
3. Results ((Y ni) V shite, N wo owaseta, “did V, and inflicted N on Y”): juushou (serious injuries)
4. Parts of compound words: kyoushuu (assault force) (a part of “assault force”)
5. LUs of crime-related frames resulting from 〈Attack〉: utagai (doubt), yougi (charge, suspicion), sousa (investigation), sousaku (search), shirabe (investigation), kentou (investigation), hanketsu (judgement), . . .
A second look at the lexical items in group 1 above confirmed that most of these words or phrases can be seen as LUs that realize, in Japanese, some of the FEs of BFN’s 〈Attack〉 frame.¹² Because these sets of lexical items were not classified automatically, we had to determine all of their classifications manually.
2.7 Overall results
When the procedure was applied to 〈Attack〉, 〈Cause harm〉, and 〈Cause impact〉, the following Japanese LUs were specified for their major FEs:
1. Core FEs of 〈Attack〉:
〈Assailant〉: dansei (man), goutou (burglary/burglar, robbery/robber), heishi (soldier), hikoku (accused person), . . .
〈Victim〉: danshi (boy), josei (woman), fujo (girls and women), joshi (girl), danji (young boy), . . .
2. Noncore FEs of 〈Attack〉:
〈Place〉: genba (field), chiiki (region), hokubu (northern part), shinai (city center)
〈Weapon〉: naifu (knife), shoujuu (rifle), tanjuu (pistol)
3. Core FEs of 〈Cause harm〉:
〈Body part〉: senaka (back)
4. Core FEs of 〈Cause impact〉:
〈Impactee〉: doru (dollar), shijou (market), ginkou (bank), shokoku (some countries)
〈Impactor〉: saigai (disaster), jishin (earthquake), fukyou (depression), dageki (damage)
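One hypothetical way to use the resulting Japanese LUs is as a lookup table from nouns to (frame, FE) pairs. The Python sketch below transcribes the lists above, including their partiality, and intersects the frames supported by the X-ga and Y-wo nouns of an osou sentence. The function names are illustrative assumptions, and this use of the lexicon is not part of the procedure reported here.

```python
# Japanese LU candidates per (frame, FE), transcribed from the lists above.
# Several of those lists are partial (". . ."), so these are too.
LEXICON = {
    ("Attack", "Assailant"): ["dansei", "goutou", "heishi", "hikoku"],
    ("Attack", "Victim"): ["danshi", "josei", "fujo", "joshi", "danji"],
    ("Attack", "Place"): ["genba", "chiiki", "hokubu", "shinai"],
    ("Attack", "Weapon"): ["naifu", "shoujuu", "tanjuu"],
    ("Cause_harm", "Body_part"): ["senaka"],
    ("Cause_impact", "Impactee"): ["doru", "shijou", "ginkou", "shokoku"],
    ("Cause_impact", "Impactor"): ["saigai", "jishin", "fukyou", "dageki"],
}


def build_reverse_index(lexicon):
    """Map each Japanese noun to the (frame, FE) pairs it was assigned to."""
    index = {}
    for (frame, fe), nouns in lexicon.items():
        for noun in nouns:
            index.setdefault(noun, []).append((frame, fe))
    return index


def frames_for(subject, obj, index):
    """Frames in which both the X-ga noun and the Y-wo noun have some FE entry.

    This deliberately crude filter ignores which FE each noun fills.
    """
    subject_frames = {frame for frame, _fe in index.get(subject, [])}
    object_frames = {frame for frame, _fe in index.get(obj, [])}
    return sorted(subject_frames & object_frames)


INDEX = build_reverse_index(LEXICON)
print(frames_for("dansei", "josei", INDEX))   # ['Attack']
print(frames_for("fukyou", "shijou", INDEX))  # ['Cause_impact']
```

With these partial lists, a pair such as jishin (earthquake) and josei (woman) yields an empty list. This reflects gaps in the small lexicon, not a claim about osou.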
3 Comparison with other resources
To evaluate our results, we compared them with other Japanese resources and methods of analysis: IPAL (IPA, 1987) and Nihongo Goi Taikei (a Japanese lexicon) (Ikehara et al., 1997; hereafter Goi Taikei), which are widely used lexical resources, and semantic frame analysis by FOCAL (Nakamoto et al., to appear; Kuroda et al., 2004), a recent framework being developed with the aim of providing BFN-style semantic annotation and analysis for Japanese independently of the Japanese FrameNet (Ohara et al., 2003).
3.1 Comparison with Goi Taikei descriptions
Goi Taikei contains detailed information on predicate-argument structure, classified according to usage. Its semantic description of osou is given in (1)–(3) below.
¹² For the rationale behind this argument, see note 5 above.
(1) 20 zokusei henka (property change) (motion)
    N1 ga N2 wo osou
    N1 strike N2
    N1: 1270 shimpai (concern), 1262 kanashimi (sorrow), 2056 sainan (disaster), 2359 kishou (atmospheric phenomena), 1000 tyuushou (abstract)
    N2: 2 gutai (object)
(2) 23 shintai dousa (physical motion) (motion)
    N1 ga N2 wo osou
    N1 attack N2
    N1: 3 shutai (subject), 535 doubutsu (animal), 2416 byouki (disease)
    N2: 2 gutai (object)
(3) 23 shintai dousa (physical motion), 31 kanjou dousa (affective motion) (motion)
    N1 ga N2 no fui wo osou
    N1 surprise N2
    N1: 4 hito (man), 1001 tyuushoubutsu (abstraction), 1235 koto (event)
    N2: 4 hito (man)
In Goi Taikei, the senses of osou are distinguished by the semantic properties of the nouns that fill the verb’s surface cases. When we compared the frames in BFN with the descriptions provided by Goi Taikei and examined how the BFN frames corresponded to the Goi Taikei definitions, we obtained the following relationships:
Table 2: BFN/Goi-Taikei correspondences
BFN frame       Goi Taikei sense
Attack          (2) 23 shintai dousa (physical motion)
Cause harm      (1) 20 zokusei henka (property change)
Cause impact    (1) 20 zokusei henka (property change)
First, we did not obtain the meaning “An unexpected event occurred,” corresponding to (3) in Goi Taikei. With this method, it was difficult to extract words describing a manner of action, such as fui wo (by surprise), and extracting only the nouns that co-occur with the verbs was also insufficient. As might be expected, there was a close relationship between (2) and the 〈Attack〉 frame. However, we were unable to find 〈Assailant〉s such as sickness among the BFN FEs. Finally, the 〈Cause impact〉 frame and (1) were very similar, except that the assailant in (1) includes feelings such as worry or sadness.
Overall, there was a good correlation between the semantic frames constructed from BFN and the descriptions in Goi Taikei. With this method, however, we had difficulty extracting semantic elements that do not appear on the surface, such as 〈manner of action〉.
3.2 Comparison with IPAL descriptions
We compared the frames we obtained with the definitions in the IPA Lexicon (IPAL) (IPA, 1987). Below is an excerpt from the IPAL description of osou:
• Caption: osou001001
Semantic definition: An undesirable thing unexpectedly occurs to someone.
Sentence valence pattern: N1-ga N2-wo
Noun phrase 1: bouto (rioter), goutou (burglary), kuma (bear), sentouki (fighter plane), boufuu (windstorm), jishin (earthquake), ekibyou (plague), keizai kiki (economic crisis)
Noun phrase 2: tabibito (traveler), fune (ship), ningen (human)/kokudo (national land), kuni (country), kouban (police box)
Example 1: Boufuu ga fune wo osotta. (A stormy wind struck a ship.)
• Caption: osou001002
Semantic definition: Undesirable feelings and physiological phenomena happen suddenly.
Sentence pattern: N1-ga N2-wo
Noun phrase 1: takamaru fuan (increased anxiety), shi no kyoufu (fear of death), iyana kimochi (unpleasant feelings)/hageshii hiroukan (acute tiredness), nemuke (drowsiness)
Noun phrase 2: kare (he)
Example 1: Nemuke ga totsuzen kare wo osotta. (Drowsiness fell upon him suddenly.)
Example 2: Kanojo ha fuan ni osowareta. (She suddenly became uneasy.)
The IPAL description of osou identifies two senses.¹³ We compared the BFN frames with the IPAL descriptions (in terms of predicate frames) and obtained the following correspondences:
Table 3: BFN/IPAL correspondences
BFN frame       IPAL sense
Attack          osou001001
Cause harm      osou001001
Cause impact    osou001001
All of the frames obtained from BFN seemed to correspond to the first sense in IPAL; e.g., there were no BFN frames in which the 〈Assailant〉 could be “sickness.” With the IPAL definitions, it was difficult to distinguish between The bear attacked the traveler and *An economic crisis attacked the traveler, the latter of which sounds unnatural and quite odd. With the BFN definitions, by contrast, we can make this distinction: the former can be classified as an expression of the 〈Attack〉 frame, whereas the latter cannot. The reason for this is probably that BFN frames successfully specify the semantic interdependence between the 〈Assailant〉 and 〈Victim〉 roles, whereas such interdependence is not encoded in the IPAL descriptions. We believe this is one of the strengths of frame-based semantic description.
¹³ IPAL uses the term “predicate frame” to characterize the semantic properties of a predicate. While the idea of predicate frames is somewhat related to semantic frames, predicate frames are not defined as semantic frames in the sense of Frame Semantics/BFN.
BFN definitions are not detailed enough, however. They run into problems when we try to account for the contrast between The shark attacked the swimmer and ?*The shark attacked the bank, for example. The latter sentence does not make sense unless it is reinterpreted in some way, whereas the former is straightforwardly interpreted against a predatory situation.
In interpreting the second sentence, there is a clear conflict, or “competition,” between two strong readings. One interpretation (reading 1) is against the situation of 〈Predation〉, where the shark is interpreted as a 〈Predator〉 and the bank as a 〈Prey〉. The other (reading 2) is against the situation of 〈Bank Robbery〉, where the shark is interpreted as a 〈Bank Robber〉 and the bank as a 〈Warehouse of Valuables〉 (or simply as a 〈Bank〉). If reading 2 wins out, an implicit “type coercion” (Pustejovsky, 1995) applies to the shark, so that its referent is switched to a human who acts as a 〈Robber〉 nicknamed “shark.” If reading 1 wins out, by contrast, another kind of implicit type coercion applies to the bank, so that its referent is switched to an animal (e.g., a fish, dolphin, or whale) that acts as a 〈Prey〉 and is called “the bank” for some unclear reason. The preference for reading 2 over reading 1 can be accounted for if we may assume that finding someone called “shark” is more likely than finding an animal called “bank.”
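The competition between the two readings can be made concrete with a toy Python model. Each reading states the semantic types it requires, and a reading that forces a noun into another type is penalized by an assumed likelihood of that coercion. All of the types, the one-level type hierarchy, and the probabilities below are invented for illustration. They are not taken from BFN or FOCAL; they only encode the assumption in the text that a person called “shark” is more likely than an animal called “bank.”

```python
# Semantic types required by each reading of "X attacked Y" (hypothetical).
READINGS = {
    "Predation (reading 1)": {"subject": "animal", "object": "animate"},
    "Bank robbery (reading 2)": {"subject": "human", "object": "institution"},
}

# Literal types of a few nouns and a one-level type hierarchy (hypothetical).
LITERAL_TYPE = {"shark": "animal", "bank": "institution", "swimmer": "human"}
SUBTYPES = {"animate": {"animal", "human"}}

# Invented likelihoods that a noun's referent is coerced to another type.
COERCION_PROB = {("shark", "human"): 0.1, ("bank", "animate"): 0.001}


def satisfies(noun_type, required):
    return noun_type == required or noun_type in SUBTYPES.get(required, set())


def reading_score(reading, subject, obj):
    """Product of the coercion likelihoods needed to fit both nouns into the reading."""
    score = 1.0
    for noun, slot in ((subject, "subject"), (obj, "object")):
        required = READINGS[reading][slot]
        if not satisfies(LITERAL_TYPE[noun], required):
            score *= COERCION_PROB.get((noun, required), 0.0)
    return score


def preferred_reading(subject, obj):
    return max(READINGS, key=lambda reading: reading_score(reading, subject, obj))


print(preferred_reading("shark", "swimmer"))  # Predation (reading 1)
print(preferred_reading("shark", "bank"))     # Bank robbery (reading 2)
```

The type requirements encoded in `READINGS` correspond to the kind of selectional information that, as argued next, BFN definitions do not yet specify.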
What this suggests is that the semantic information that would account for “selectional restrictions” of this sort is not (yet) specified in the BFN definitions. Therefore, the frames constructed from BFN do not cover all the senses of osou that IPAL distinguishes, but they capture some selectional aspects of osou’s lexical meaning at a finer grain than the IPAL description does. As we will see in the next section, this is one of the strong motivations behind FOCAL’s attempt to extend BFN.
3.3 Comparison with FOCAL descriptions
FOCAL is a theoretical framework for semantic
analysis and annotation. Its development has been
strongly influenced by BFN, but it also tries to
extend BFN’s scope of semantic analysis to the
next stage.
In the case of X-ga Y-wo osou, FOCAL recognizes 15 frames in total, listed in Table 4, and specifies their hierarchical organization.¹⁴
These frames are identified and classified based on the semantic co-variation between 〈Harm Cause(r)〉 X, a special case of 〈Cause(r)〉, and 〈Harm Experiencer〉 Y, a special case of 〈Experiencer〉. It is important to note that FOCAL puts more emphasis on specifying the semantic co-variation between X and Y in terms of semantic features, because such co-variations are crucial characteristics of a semantic frame. They are not captured in the Goi Taikei and IPAL descriptions, and they are not clearly encoded even in the BFN description.
In FOCAL, frames are defined as idealized models of situations such as Robbery and Predation, on the assumption that human understanding is situation-based. The descriptive task of FOCAL, then, is to recognize situations and give them adequately detailed descriptions. Let $R = \{r_1, \ldots, r_n\}$ be a set of situation-specific roles, which are called semantic roles in BFN. Semantic frames are useful only if they serve as specifications of the co-variations among the roles in $R$.
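For example, F06, a subclass of the 〈Attack〉 event class, is defined as follows:
Definition of F06:
$$
\begin{aligned}
\mathrm{Attack}(R) &= \mathrm{Attack}(\mathrm{Predator}(X),\ \mathrm{Prey}(Y)) \\
&= \mathrm{Hunt}(\mathrm{Hunter}(X),\ \mathrm{Target}(Y),\ \mathrm{Purpose}(Z)), \\
\text{where } Z &= \mathrm{Eat}(\mathrm{Eater}(X),\ \mathrm{Food}(Y),\ \mathrm{Purpose}(Z')), \\
\text{where } Z' &= \mathrm{Satisfy}(r_1(Z),\ \mathrm{Hunger}).
\end{aligned}
$$
There seems to be no English noun that names $r_1$.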
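To make the nested structure of this definition concrete, the following Python sketch encodes F06 as nested frame instances and collects, for each variable, every role it fills across the levels of the definition. The `Frame` class and the `bindings` helper are hypothetical illustrations, not part of FOCAL or BFN; positional arguments such as Hunger are given a `None` role, and the unnamed role is written `r1`. Running it shows the co-variation the definition encodes: X fills Predator, Hunter, and Eater, while Y fills Prey, Target, and Food.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Frame:
    """A frame instance: a name plus ordered (role, filler) arguments.

    A filler is a variable name such as "X" or a nested Frame; a role of
    None marks a positional argument such as Hunger (hypothetical encoding).
    """

    name: str
    args: list[tuple[Optional[str], Union[str, Frame]]]

    def show(self) -> str:
        """Render in the Name(Role(Filler), ...) notation used in the text."""
        parts = []
        for role, filler in self.args:
            inner = filler.show() if isinstance(filler, Frame) else filler
            parts.append(inner if role is None else f"{role}({inner})")
        return f"{self.name}({', '.join(parts)})"


def bindings(frame: Frame, out: Optional[dict[str, list[str]]] = None) -> dict[str, list[str]]:
    """Map each variable to every Frame.Role it fills, recursing into nested frames."""
    if out is None:
        out = {}
    for role, filler in frame.args:
        if isinstance(filler, Frame):
            bindings(filler, out)
        elif role is not None:  # skip positional constants such as Hunger
            out.setdefault(filler, []).append(f"{frame.name}.{role}")
    return out


# Z' = Satisfy(r1(Z), Hunger); "Z" refers back to the Eat event below.
z_prime = Frame("Satisfy", [("r1", "Z"), (None, "Hunger")])
# Z = Eat(Eater(X), Food(Y), Purpose(Z'))
z = Frame("Eat", [("Eater", "X"), ("Food", "Y"), ("Purpose", z_prime)])
# F06 = Attack(Predator(X), Prey(Y)) = Hunt(Hunter(X), Target(Y), Purpose(Z))
attack = Frame("Attack", [("Predator", "X"), ("Prey", "Y")])
hunt = Frame("Hunt", [("Hunter", "X"), ("Target", "Y"), ("Purpose", z)])

if __name__ == "__main__":
    print(attack.show())  # Attack(Predator(X), Prey(Y))
    print(hunt.show())
    roles = bindings(hunt, bindings(attack))
    print(roles["X"])  # ['Attack.Predator', 'Hunt.Hunter', 'Eat.Eater']
    print(roles["Y"])  # ['Attack.Prey', 'Hunt.Target', 'Eat.Food']
```

Representing role fillers as shared variables, rather than copying values into each frame, is what makes the co-variation among roles explicit and queryable.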
The 15 frames in Table 4 account for more or less all possible readings of X-ga Y-wo osou. The validity of this claim was confirmed through psychological experiments, as reported in Kuroda et al. (2004) and Nakamoto et al. (to appear). Whereas BFN identifies 3 frames relevant to the semantics of osou, FOCAL uses a total of 15 frames to determine the range of situations against which people understand sentences whose main verb is osou.
14 Space limitations prevented us from showing that the 15 frames thus recognized are nearly optimal for exhaustively specifying all the situations against which the senses of osou are determined. This was confirmed by multivariate analyses of psychological experiments (Nakamoto et al., to appear). We regret this, because the result would surely have answered a question raised by one of the anonymous reviewers.
Table 4: 15 FOCAL frames with groups G1–G5
Group  Frame  Situation
G1     F01    harm to Y caused by conflict between groups X and Y
G1     F02    harm to Y caused by X’s invasion
G1     F03    harm to Y caused by X’s robbery
G1     F04    harm to Y caused by X’s violence
G1     F05    harm to Y caused by X’s raping
G2     F06    harm to Y caused by X’s preying attack
G2     F07    harm to Y caused by X’s nonpreying attack (e.g., X’s defense)
G3     F08    harm to Y due to an unexpected accident X
G3     F09    harm to Y caused by a natural phenomenon X (on a smaller scale, e.g., gust)
G3     F10    harm to Y caused by a natural phenomenon X (on a larger scale, e.g., earthquake, flood)
G3     F11    harm to Y caused by a natural phenomenon X (on a larger scale, e.g., spread of an epidemic)
G4     F12    harm to Y caused by a social phenomenon X
G5     F13    harm to Y caused by a disease X (nontemporary, e.g., cancer)
G5     F14    harm to Y caused by a disease symptom X (temporary, e.g., heart attack)
G5     F15    harm to Y caused by a bad feeling X (temporary, e.g., drowsiness)
We compared the 3 BFN frames with the 15 FOCAL frames in Table 4 to assess how well they correspond to one another; the correspondences are summarized in Table 5.
Table 5: BFN/FOCAL correspondences
BFN frame      FOCAL group   FOCAL frames
Attack         Part of G1    F01–F05
Cause harm     [UNCLEAR]     [UNCLEAR]
Cause impact   [UNCLEAR]     [UNCLEAR]
[UNCLEAR]      G5            F13–F15
This comparison revealed several differences. First, FOCAL specifies the situations to which the 〈Attack〉 frame applies in much greater detail, although, like BFN, it bases its descriptions on semantic frames. This is mainly because FOCAL identifies frames in terms of conceivable differences in the “purposes,” or “intended effects,” of the 〈Harm Cause(r)〉,15 of which BFN’s 〈Assailant〉 is a special case. This suggests that BFN frames could be further elaborated by subclassifying 〈Assailant〉 in terms of its purpose.16
A converse point applies to the 〈Cause harm〉 and 〈Cause impact〉 frames. These BFN frames need to be generalized to include nonhuman, nonintentional agents, which the current BFN does not do. Better matches would be found if the 〈Cause harm〉 and 〈Cause impact〉 frames were further classified according to the properties of the 〈Harm causer〉 and 〈Impactor〉, just as in the 〈Attack〉 frame.
While FOCAL explicitly groups frames F01–F05 into G1 and combines G1 with another group, G2, to yield a more general semantic class {G1, G2}, it is not clear whether BFN captures this hybrid class, because the hierarchical relationships among BFN frames are not sufficiently specified.
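The kind of explicit hierarchy at issue here can be sketched as data: frames map to groups (Table 4), and groups map to higher classes such as {G1, G2}. In the Python sketch below, only the group memberships come from Table 4; the composite-class label `G1+G2` and the helper functions are hypothetical. With parent links spelled out, a question such as whether F03 (robbery) and F06 (preying attack) fall under a common class has a definite answer, which is the information the text notes is underspecified in BFN.

```python
from __future__ import annotations

# Table 4: FOCAL group -> frame IDs.
FOCAL_GROUPS: dict[str, tuple[str, ...]] = {
    "G1": ("F01", "F02", "F03", "F04", "F05"),  # conflict, invasion, robbery, violence, raping
    "G2": ("F06", "F07"),  # preying / nonpreying attack
    "G3": ("F08", "F09", "F10", "F11"),  # accident, natural phenomena
    "G4": ("F12",),  # social phenomenon
    "G5": ("F13", "F14", "F15"),  # disease, symptom, bad feeling
}

# Higher classes built from groups. Only {G1, G2} is mentioned in the
# text; "G1+G2" is a hypothetical label for it.
COMPOSITE_CLASSES: dict[str, tuple[str, ...]] = {"G1+G2": ("G1", "G2")}


def group_of(frame_id: str) -> str:
    """Return the group (G1-G5) that contains a frame."""
    for group, frames in FOCAL_GROUPS.items():
        if frame_id in frames:
            return group
    raise KeyError(frame_id)


def ancestors(frame_id: str) -> list[str]:
    """A frame's group, followed by every composite class containing that group."""
    group = group_of(frame_id)
    return [group] + [c for c, groups in COMPOSITE_CLASSES.items() if group in groups]


def common_classes(a: str, b: str) -> set[str]:
    """Classes (groups or composites) shared by two frames."""
    return set(ancestors(a)) & set(ancestors(b))


def members(cls: str) -> set[str]:
    """All frames under a group or a composite class."""
    groups = COMPOSITE_CLASSES.get(cls, (cls,))
    return {f for g in groups for f in FOCAL_GROUPS[g]}


if __name__ == "__main__":
    print(common_classes("F03", "F06"))  # {'G1+G2'}: robbery vs. preying attack
    print(common_classes("F03", "F09"))  # set(): no shared class
    print(len(members("G1+G2")))  # 7
```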
In fact, the comparison with FOCAL revealed that BFN does not classify the 〈Assailant〉 types in as much detail as FOCAL does. Under FOCAL’s assumptions, it is the 〈Purpose〉 of the 〈Assailant〉 (including the “null” value) that distinguishes otherwise similar situations. Identifying such subtle differences is exactly what humans are very good at and computers are not. Specifying information of this kind is a serious requirement in many NLP tasks.
To conclude, we found that the granularity of the semantic descriptions provided by BFN, IPAL, Goi Taikei, and FOCAL is ordered as follows: FOCAL > BFN ≈ Goi Taikei > IPAL. This suggests that, while BFN is clearly useful for a variety of purposes, its semantic descriptions are not detailed enough, particularly for dealing with the polysemy of relatively frequent words such as osou in Japanese or hit in English.
15 This is not the same as BFN’s 〈Harm causer〉 role, which is much more specific than 〈Harm Cause(r)〉 in FOCAL’s sense.
16 The question of “where to stop,” raised by one of the anonymous reviewers, would have been answered had we had enough space to show that these 15 frames/situations are nearly optimal for accounting for all the semantic classifications reflected in selectional restrictions, as explained in note 14. Clearly, we do not need to identify all semantically possible subclassifications; we need only identify psychologically real ones.
While our result is only suggestive at best, let us make a brief comment on some methodological aspects of the BFN framework.
Overall, BFN definitions of semantic frames are strongly oriented toward, or even “biased” toward, descriptions of activities intended and caused by human, volitional agents. In fact, BFN made a methodological decision not to include metaphorical and other “problematic” uses of words for ease of lexicon-building, thereby sacrificing descriptive range and, as far as we can see, producing biased data coverage. In the case of osou, for example, there were clearly many examples in which the harm is not caused by a human, i.e., cases described by the FOCAL frame clusters G2 (F06–F07), G3 (F08–F11), G4 (F12), and G5 (F13–F15). Therefore, as far as the viability of frame-based description of the situations that osou can express in Japanese is concerned, the current BFN database is only partially successful: it captures only the class of situations specified by G1.
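As a rough quantification of “only partially successful,” the sketch below counts how many of the 15 FOCAL frames in Table 4 fall in groups that BFN captures. Two assumptions should be kept in mind: it measures coverage by number of frames, not by the corpus frequency of each reading (which is not reported here), and it credits BFN with all of G1, arguably an upper bound, since Table 5 matches 〈Attack〉 to only part of G1.

```python
from __future__ import annotations

from fractions import Fraction

# Frames per FOCAL group, from Table 4.
GROUP_SIZES = {"G1": 5, "G2": 2, "G3": 4, "G4": 1, "G5": 3}


def frame_coverage(captured_groups: set[str]) -> Fraction:
    """Share of the 15 FOCAL frames that fall in the captured groups.

    Counts frames, not corpus occurrences of each reading.
    """
    total = sum(GROUP_SIZES.values())
    covered = sum(GROUP_SIZES[g] for g in captured_groups)
    return Fraction(covered, total)


if __name__ == "__main__":
    bfn = frame_coverage({"G1"})
    print(bfn, 1 - bfn)  # 1/3 2/3
```

Under these assumptions, at most one third of the FOCAL frames (5 of 15) are covered, and the remaining 10 frames all involve harm not caused by a human.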
4 Conclusion
We proposed a new translation-like method that uses BFN and an English-Japanese bilingual corpus to find Japanese LUs corresponding to the English LUs in BFN semantic frames. We evaluated the results by comparing them with other Japanese language resources and analyses: IPAL, Goi Taikei, and FOCAL. The comparison revealed that FOCAL, BFN, Goi Taikei, and IPAL provide finer-grained descriptions in this specific order.
Our method allowed us to easily find Japanese LUs that corresponded to LUs in BFN semantic frames, at the same level of granularity as BFN. Even though not all the relevant sentences were manually examined when the semantic frame was constructed, we were able to collect several members of FEs. Our method also automatically identified a set of lexical items that clearly had a frame-evoking effect but were not properly identified as Frame Elements of a semantic frame in BFN.
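The round-trip idea behind the method (a Japanese word is translated into an English word E that is an LU of frame F, and the other LUs of F are translated back into Japanese to form candidate Japanese LUs for F) can be sketched as follows. Everything in this sketch is hypothetical toy data: the LU sets are illustrative rather than copied from BFN, the romanized Japanese entries are invented examples, and the two dictionaries stand in for the translations we obtain from the aligned bilingual newspaper corpus.

```python
from __future__ import annotations

# Hypothetical toy data. The LU sets are illustrative, not copied from BFN;
# the translation tables stand in for corpus-derived translations.
FRAME_LUS: dict[str, set[str]] = {
    "Attack": {"attack", "assault", "raid"},
    "Cause_impact": {"hit", "strike"},
}
JA_TO_EN: dict[str, set[str]] = {"osou": {"attack", "hit"}}
EN_TO_JA: dict[str, set[str]] = {
    "assault": {"osou", "shuugeki-suru"},
    "raid": {"shuugeki-suru"},
    "strike": {"osou", "utsu"},
}


def candidate_lus(ja_word: str) -> dict[str, set[str]]:
    """Frame F -> candidate Japanese LUs for F.

    ja_word -> English E (an LU of F) -> other LUs of F -> back to Japanese.
    """
    candidates: dict[str, set[str]] = {}
    for e in JA_TO_EN.get(ja_word, set()):
        for frame, lus in FRAME_LUS.items():
            if e not in lus:
                continue  # E does not evoke this frame
            back = candidates.setdefault(frame, set())
            for other in lus - {e}:
                back |= EN_TO_JA.get(other, set())  # in-place union
    return candidates


if __name__ == "__main__":
    for frame, jas in sorted(candidate_lus("osou").items()):
        print(frame, sorted(jas))
```

With this toy data, osou yields the candidates osou and shuugeki-suru for Attack and osou and utsu for Cause_impact; because a polysemous word maps to several English LUs, it naturally receives candidate sets in several frames.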
Several problems remain to be addressed. Because the bilingual corpus we used was a newspaper corpus, the target semantic domains were limited, so we may have failed to identify certain semantic frames. We plan to conduct further experiments using more bilingual corpora with wider domain coverage.
In this work, only one target verb, osou, was used to compare the analyses by BFN and by FOCAL. Clearly, this is insufficient, and our result is only suggestive at best. To draw firmer conclusions, we will need to examine more target words and thereby make the comparison more reliable.
References
Collin F. Baker, Charles J. Fillmore, and John B.
Lowe. 1998. The Berkeley FrameNet project. In
Proceedings of the COLING-ACL ’98, Montreal, Canada.
Benfeng Chen and Pascale Fung. 2004. BiFrameNet:
Bilingual frame semantics resource construction by
cross-lingual induction. In Proceedings of the 20th
International Conference on Computational Lin-
guistics (COLING 2004).
Michael Ellsworth, Katrin Erk, Paul Kingsbury, and
Sebastian Padó. 2004. PropBank, SALSA, and
FrameNet: How design determines product. In
Proceedings of the LREC 2004 Workshop on Build-
ing Lexical Resources from Semantically Annotated
Corpora, Lisbon.
Katrin Erk, Andrea Kowalski, Sebastian Padó, and
Manfred Pinkal. 2003. Towards a resource for
lexical semantics: A large German corpus with ex-
tensive semantic annotation. In Proceedings of the
ACL-03.
Christiane Fellbaum, editor. 1998. WordNet: An Elec-
tronic Lexical Database. MIT Press.
Charles J. Fillmore. 1982. Frame semantics. In Lin-
guistic Society of Korea, editor, Linguistics in the
Morning Calm, pages 111–137, Seoul. Hanshin.
Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai,
Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura,
Yoshifumi Ooyama, and Yoshihiko Hayashi. 1997.
Goi-Taikei: A Japanese Lexicon. Iwanami Shoten,
Tokyo. (in Japanese, 5 volumes/CDROM).
IPA. 1987. IPA Lexicon of the Japanese Language for
Computers: Basic Verbs. Information-Technology
Promotion Agency. (in Japanese).
Paul Kingsbury and Martha Palmer. 2002. From Tree-
Bank to PropBank. In Proceedings of the 3rd In-
ternational Conference on Language Resources and
Evaluation (LREC-2002).
Janet L. Kolodner. 2004. Case-Based Reasoning.
Morgan Kaufmann.
Kow Kuroda, Keiko Nakamoto, Toshiyuki Kanamaru,
Masahiro Tatsuoka, and Hajime Nozawa. 2004.
A scope of concept analysis based on “seman-
tic frames”: Berkeley FrameNet and Beyond. In
Conference Handbook of the 5th Meeting of The
Japanese Cognitive Linguistics Association, pages
133–153. (in Japanese).
Yuji Matsumoto, Akira Kitauchi, Tatsuo Yamashita,
Yoshitaka Hirano, Hiroshi Matsuda, Kazuma
Takaoka, and Masayuki Asahara. 1999. Japanese
Morphological Analysis System ChaSen version
2.2.1. NAIST Technical Report NAIST-IS-TR. (in
Japanese).
Masaki Murata, Kyoko Kanzaki, Kiyotaka Uchimoto,
Qing Ma, and Hitoshi Isahara. 2001. Meaning sort
— three examples: dictionary construction, tagged
corpus construction, and information presentation
system —. In Alexander Gelbukh, editor, Compu-
tational Linguistics and Intelligent Text Processing,
Second International Conference, CICLing 2001,
Mexico City, February 2001 Proceedings, pages
305–318. Springer.
Keiko Nakamoto, Kow Kuroda, and Hajime Nozawa.
to appear. Defining the feature rating task as
a(nother) powerful method to explore sentence
meanings: With a special interest with how they are
mentally represented. In Japanese Journal of Cog-
nitive Psychology. (in Japanese).
National Language Research Institute. 1964. Bunrui
Goihyo (Word List by Semantic Principles). Syuei
Shuppan. (in Japanese).
Kyoko Hirose Ohara, Seiko Fujii, Hiroaki Saito, Shun
Ishizaki, Toshio Ohori, and Ryoko Suzuki. 2003.
The Japanese FrameNet project: A preliminary report. In Proceedings of Pacific Association for
Computational Linguistics, pages 249–254.
James Pustejovsky. 1995. The Generative Lexicon.
MIT Press.
Hiroaki Sato. 2003. FrameSQL: A software tool for
FrameNet. In ASIALEX ’03 Tokyo Proceedings,
pages 251–258. Asian Association of Lexicogra-
phy.
Carlos Subirats and Miriam R. L. Petruck. 2003. Sur-
prise: Spanish FrameNet. Presentation at Work-
shop on Frame Semantics, International Congress
of Linguists. July 29, 2003, Prague, Czech Repub-
lic.
Masao Utiyama and Hitoshi Isahara. 2003. Reliable
measures for aligning Japanese-English news articles and sentences. In Proceedings of the Annual
Meeting of the ACL (ACL-03), pages 72–79.
