BiFrameNet: Bilingual Frame Semantics Resource Construction by 
Cross-lingual Induction 
Pascale Fung and Benfeng Chen 
Human Language Technology Center,  
University of Science & Technology (HKUST), 
Clear Water Bay, Hong Kong 
{pascale,bfchen}@ee.ust.hk 
 
Abstract 
We present a novel automatic approach to 
constructing a bilingual semantic network—the 
BiFrameNet, to enhance statistical and 
transfer-based machine translation systems. 
BiFrameNet is a frame semantic representation, 
and contains semantic structure transfers 
between English and Chinese. The English
FrameNet and the Chinese HowNet, both built by
linguists, provide two different views of the
semantic distribution of the lexicon. We propose
to induce the mapping from English lexical
entries in FrameNet to Chinese word senses in
HowNet, furnishing a bilingual semantic lexicon
that simulates the “concept lexicon” supposedly
used by human translators, and that can thus
benefit machine translation systems. BiFrameNet
also contains bilingual example sentences that
have the same semantic roles. We automatically
induce Chinese example sentences and their
semantic roles, based on the semantic structure
alignment from the first stage of our work as well
as on shallow syntactic structure. In addition to its
utility for machine-aided and machine translation,
our work is also related to the spatial models
proposed by cognitive scientists in the framework
of artificial simulations of the translation process.
1. Introduction 
The merits of translation at the word level or 
the concept level have long been a cause for debate 
among linguists. Some linguists suggest that the two 
languages of a bilingual speaker share a common 
semantic system (Illes and Francis 1999; Ikeda 1998) 
and hence translation is carried out at the concept 
level.  
Meanwhile, statistical and transfer approaches to
machine translation have recently been converging
(Wu 2003). Statistical MT systems are based on a
stochastic mapping between lexical items, assuming
that the underlying semantic transfer is hidden.
Transfer systems use explicit lexical, syntactic and
semantic transfer rules. Cognitive scientists and
computational linguists alike have therefore been
interested in the study of semantic mapping between
languages (Ploux and Ji, 2003; Dorr et al., 2002;
Ngai et al., 2002; Boas, 2002; Palmer and Wu, 1995).
We propose to automatically construct a bilingual
lexical semantic network with word sense and
semantic role mapping between English and Chinese,
simulating the “concept lexicon” of a bilingual
person suggested by cognitive scientists.
 
 
  
Figure 1. BiFrameNet lexicon and example 
sentence induction 
 
The linguist-defined ontologies FrameNet (Baker et
al., 1998) and HowNet (Dong and Dong, 2000),
together with bilingual dictionaries, are the basis for
the induction of the mapping. We automatically
estimate the semantic transfer likelihoods between
English FrameNet lexical entries and Chinese word
senses in HowNet, and align those frames and
lexical pairs with high likelihood values. In addition,
we propose to automatically induce Chinese example
sentences that match the English annotated sentences
provided in FrameNet. The BiFrameNet thus
induced provides an additional resource for
machine-aided or machine translation systems. It can
also serve as a reference for comparison with
cognitive studies of the translation process.
Notation: $\mu$: lexical entry in FrameNet; $\nu$: concept in HowNet; $F$: FrameNet frame; $H$: HowNet category; $\Lambda$: links from FrameNet to HowNet; $L_F$: links of frame $F$; $T_\mu$: translations of $\mu$; $R$: a ranked list, where $R[k]$ is its $k$-th element; $V_F$: HowNet concepts possibly linked to $F$; $H_F$: HowNet categories related to frame $F$; $\delta(x)$: binary function that returns 1 if $x$ is true and 0 otherwise.

For each lexical entry $\mu$:
    $T_1$ = {translations of $\mu$ in HowNet}
    $T_2$ = {translations of $\mu$ in the dictionary}
    $T_\mu = T_1 \cup T_2$
    $L_\mu = \{(\mu, \nu) \mid \nu.W\_C \in T_\mu,\ \nu.G\_C = \mu.POS\}$
For each frame $F$:
    $V_F = \{\nu \mid (\mu, \nu) \in L_\mu,\ \mu \in F\}$
    For each HowNet category $H$:
        $f_H(H) = \sum_{\nu \in V_F} \delta(\nu \in H)$
    $R$ = ranked list of the categories $H$, sorted by $f_H(H)$
    $H_F = \{R[1], R[2], \ldots, R[N]\}$
    $H_F \leftarrow H_F \cup \{R[l] \mid \mathrm{Sim}(R[k], R[l]) > \mathit{threshold},\ k = 1, \ldots, N,\ l = N+1, \ldots\}$
    $L_F = \{(\mu, \nu) \in L_\mu \mid \mu \in F,\ \nu \in H \text{ for some } H \in H_F\}$
    $\Lambda \leftarrow \Lambda \cup L_F$
 Figure 2. BiFrameNet ontology induction  
 
Ploux and Ji (2003) proposed a spatial model for
matching semantic values between French and 
English. Palmer and Wu (1995) studied the mapping 
of change-of-state English verbs to Chinese. Dorr et 
al. (2002) described a technique for the construction 
of a Chinese-English verb lexicon based on HowNet 
and the English LCS Verb Database (LVD). They 
created links between HowNet concepts and LVD 
verb classes using both statistics and a manually 
constructed “seed mapping” of thematic classes 
between HowNet and LVD. Ngai et al. (2002) 
employed a word-vector based approach to create the 
alignment between WordNet and HowNet classes 
without any manual annotation. Boas (2002) outlined 
a number of issues surrounding the planning and 
design of GermanFrameNet (GFN), a bilingual 
FrameNet dictionary which, when complete, will 
have a corpus-based German lexicon following the 
FrameNet structure. 
This paper is organized as follows: Section 2
describes the algorithm for estimating transfer
relations between FrameNet and HowNet structures.
Section 3 presents our method for selecting
BiFrameNet example sentences for a particular
frame and automatically inducing semantic role
annotations. We conclude in Section 4, followed by
a discussion in Section 5.

2. Lexical semantic mapping in BiFrameNet
Dorr et al. (2002) use a manual seed mapping of
thematic classes between HowNet and LVD to
induce a bilingual verb lexicon. In this paper, we
propose a method for automatically mapping the
English FrameNet lexical entries to HowNet
concepts, resulting in the BiFrameNet ontology. We
also make use of two bilingual English-Chinese
lexicons for this induction. Throughout this section,
we use the FrameNet lexical entry “beat.v” in the
“cause_harm” frame to illustrate the main steps of
our algorithm.

2.1. FrameNet and HowNet
The Berkeley FrameNet database consists of
frame-semantic descriptions of more than 7000
English lexical items, together with example
sentences annotated with semantic roles (Baker et al.,
1998). There is currently no frame semantic
representation of Chinese. However, the Chinese
HowNet (Dong and Dong, 2000) represents a
hierarchical view of lexical semantics in Chinese.

FrameNet is a collection of lexical entries grouped
by frame semantics. Each lexical entry represents an
individual word sense and is associated with
semantic roles and some annotated sentences.
Lexical entries with the same semantic roles are
grouped into a “frame”, and the semantic roles are
called “frame elements”. For example:

Frame: Cause_harm
Frame Elements: agent, body_part, cause, event,
instrument, iterations, purpose, reason, result,
victim, ...
Lexical Entries:
bash.v, batter.v, bayonet.v, beat.v, belt.v,
bludgeon.v, boil.v, break.v, bruise.v, buffet.v,
burn.v, ...
Example annotated sentence of lexical entry
“beat.v”:
[agent I] lay down on him and beat [victim at him]
[means with my fists].
 
HowNet is a Chinese ontology with a graph 
structure of word senses called “concepts”, and each 
concept contains 7 fields including lexical entries in 
Chinese, English gloss, POS tags for the word in 
Chinese and English, and a definition of the concept 
including its category and semantic relations (Dong 
and Dong, 2000). For example, one translation for 
“beat.v” is 打: 
NO. = 17645
W_C = 打
G_C = V
E_C = ~架，~斗，~仗，~敌人，~死，~伤，~得好
W_E = attack
G_E = V
E_E =
DEF = fight|争斗

Whereas HowNet concepts correspond roughly to
FrameNet lexical entries, HowNet semantic relations
do not correspond directly to FrameNet semantic roles.

2.2. Initial mapping based on bilingual lexicon
(step 1)
We use the bilingual lexicons from HowNet and the
LDC dictionary to first create all possible mappings
between FrameNet lexical entries and HowNet
concepts whose part-of-speech (POS) tags are the
same. Here we assume that the syntactic classification
of the majority of FrameNet lexical entries (i.e.
verbs and adjectives) is semantically motivated and
mostly preserved across languages. For example,
“beat” can be translated into {搥, 败, 冲击, 出手, 难倒,
骗取, 赢, 战败, …} in HowNet and {打, 打败, 捣, 敲打,
赢, …} in the LDC English-Chinese dictionary. “beat.v”
is then linked to all HowNet concepts whose Chinese
word or phrase is one of these translations and whose
part of speech is verb (“v”).

2.3. Refined mapping based on semantic
contexts in both languages (step 2)
At this stage, each FrameNet lexical entry has links
to multiple HowNet concepts and categories. For
example, “beat.v” in the “cause_harm” frame is linked
to “打” in both the “beat” category and the “associate”
category (as in “打电话/make a phone call”). We need
to choose the correct HowNet concept (word sense).
Many word sense disambiguation algorithms use
contextual words in a sentence as disambiguating
features. In this work, we instead make use of
contextual lexical entries from the same semantic
frame. In this simplified example, the “cause_harm”
frame contains two lexical entries, “beat.v” and
“strike.v”. From the previous step, “beat.v” and
“strike.v” are each linked to a number of Chinese
candidates. “beat.v” is linked to “打”, which is a
member of two different HowNet categories, namely
“打|beat” and “交往|associate”. To disambiguate
between these two candidate categories, we make use
of the other lexical entries in “cause_harm”, in this
case “strike.v”, which is linked to “捶” in the
“打|beat” HowNet category. Now “打|beat” receives
two votes (from “打” and from “捶”), and
“交往|associate” only one (from “打”). We therefore
choose the HowNet category “打|beat” to be aligned to
the frame “cause_harm”, and eliminate the sense of
“打” in the “交往|associate” category. Consequently,
“beat.v” in “cause_harm” is linked to all HowNet
concepts that are verb translations of “beat” and that
also belong to the HowNet category “打|beat” (but not
“交往|associate”).

In our example, HowNet concepts under two HowNet
categories, “beat” and “damage”, are linked to the
“cause_harm” frame in FrameNet. In general, the
HowNet categories are ranked by the number of votes
they receive, and only the concepts in the top N
categories are considered correctly linked to the
lexical entries in the frame. We heuristically set N
to three in our algorithm.

2.4. Final mapping adjusted by taxonomy
distance (step 3)
Using frame context alone in the above step can
effectively prune incorrect links, but it also prunes
some correct links whose HowNet categories are not
among the top three. In this step, we aim to recover
such pruned links by finding other categories that
are highly similar to the chosen categories. We use
the category similarity score of Liu and Li (2002),
which is based on the HowNet taxonomy distance:
$$\mathrm{Sim}(\mathit{category}_1, \mathit{category}_2) = \frac{\alpha}{d + \alpha}$$

where d is the path length from category1 to
category2 in the taxonomy, and α is an adjusting
parameter that controls the curvature of the
similarity score. We set α = 1.6 in our work,
following the experimental results in Liu and Li
(2002). If the similarity between category p and one
of the top three categories is higher than a
threshold t, the category p
is also considered as a valid category for the frame. 
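The step-3 test can be sketched in Python as follows, assuming the HowNet taxonomy is available as a child-to-parent map. `category_similarity` implements α/(d + α), `path_length` counts the edges between two categories through their first common ancestor, and `expand_categories` adds any lower-ranked category whose similarity to one of the top-N categories exceeds the threshold t. All function names, the mini-taxonomy, t = 0.4 and top_n = 2 are hypothetical illustration choices; the paper keeps the top three categories, and the value of t is not given in the text.

```python
def category_similarity(d, alpha=1.6):
    """Similarity alpha / (d + alpha), where d is the taxonomy path length."""
    return alpha / (d + alpha)


def path_length(a, b, parent):
    """Number of edges between categories a and b in a tree given as child -> parent."""
    def ancestors(x):
        chain = [x]
        while x in parent:
            x = parent[x]
            chain.append(x)
        return chain

    depth_in_b = {node: k for k, node in enumerate(ancestors(b))}
    for k, node in enumerate(ancestors(a)):
        if node in depth_in_b:  # first common ancestor reached from a
            return k + depth_in_b[node]
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def expand_categories(ranked, top_n, parent, t, alpha=1.6):
    """Keep the top-N categories, plus lower-ranked ones similar enough to any of them."""
    top = list(ranked[:top_n])
    extra = [p for p in ranked[top_n:]
             if any(category_similarity(path_length(p, h, parent), alpha) > t
                    for h in top)]
    return top + extra


# Hypothetical mini-taxonomy (not HowNet's real structure).
parent = {"beat": "act", "damage": "act", "firing": "act", "act": "event",
          "weave": "make", "make": "event"}
print(expand_categories(["beat", "damage", "firing", "weave"], 2, parent, t=0.4))
# ['beat', 'damage', 'firing']
```

In this toy taxonomy, “firing” is two edges from “beat” (Sim = 1.6/3.6 ≈ 0.44) and is recovered, while “weave” is four edges away (Sim ≈ 0.29) and is not.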
 
 
 
In our example, some valid categories, such as
“firing|射击”, are not selected in the previous step
even though they are related to the “cause_harm”
frame. Based on the HowNet taxonomy, the similarity
score between “firing|射击” and “beat|打” is 1.0,
which is above the threshold. Hence, “firing|射击” is
also chosen as a valid category, and the concepts in
this category are linked to the “beat.v” lexical entry
in the “cause_harm” frame. However, using taxonomy
distance can also cause errors, such as 打 in the
“weave” category being aligned to “beat.v” in the
“cause_harm” frame. 
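The candidate linking of step 1 and the frame-context voting of step 2 can be sketched in Python as below. This is an illustrative sketch, not the authors' implementation: `Concept`, `candidate_links` and `frame_vote` are hypothetical names, the concept IDs and the extra “quantity” concept are invented for the toy, and top_n=1 is used only so that the two-category toy mirrors the “beat.v” example (the paper keeps the top N = 3 categories). Votes are counted over distinct linked concepts, in the spirit of f_H(H) in Figure 2.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    cid: int
    word: str      # W_C: Chinese word
    pos: str       # G_C: part of speech
    category: str  # HowNet category


def candidate_links(entries, hownet_tr, dict_tr, concepts):
    """Step 1: link each (lemma, pos) entry to same-POS concepts of its translations."""
    links = set()
    for lemma, pos in entries:
        T = hownet_tr.get(lemma, set()) | dict_tr.get(lemma, set())  # T = T1 ∪ T2
        links |= {(lemma, c) for c in concepts if c.word in T and c.pos == pos}
    return links


def frame_vote(links, top_n):
    """Step 2: every concept linked from the frame votes for its category; keep top N."""
    v_f = {concept for _, concept in links}            # V_F: distinct linked concepts
    votes = Counter(c.category for c in v_f)
    ranked = sorted(votes, key=lambda h: (-votes[h], h))  # ties broken alphabetically
    keep = set(ranked[:top_n])
    return ranked, {(lemma, c) for lemma, c in links if c.category in keep}


# Toy version of the worked example (hypothetical IDs and categories).
concepts = [Concept(1, "打", "V", "beat"), Concept(2, "打", "V", "associate"),
            Concept(3, "捶", "V", "beat"), Concept(4, "打", "N", "quantity")]
frame = [("beat", "V"), ("strike", "V")]  # simplified cause_harm frame
links = candidate_links(frame, {"beat": {"打"}, "strike": {"捶"}},
                        {"beat": {"打", "打败"}}, concepts)
ranked, kept = frame_vote(links, top_n=1)
print(ranked)                                        # ['beat', 'associate']
print(sorted((lemma, c.cid) for lemma, c in kept))   # [('beat', 1), ('strike', 3)]
```

The noun sense of 打 is never linked because its POS differs from “beat.v”, and the “associate” sense is pruned by the vote, as in the text.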
2.5. BiFrameNet lexicon evaluation   
We evaluate our work by comparing the results to
a manually constructed gold standard of transfer links
for some lexical entries in FrameNet, using precision
and recall as evaluation criteria. Manual evaluation
of all lexical entries is a slow process and is still
ongoing. To show a lower bound on system
performance, we chose as the test set the FrameNet
lexical entries with the highest number of transfer
links to HowNet concepts. Since each link is a word
sense, these lexical entries have the most ambiguous
translations. Since the number of lexical entries in a
FrameNet parent frame (i.e. the frame size) is an
important factor in the disambiguation step, we
analyze our results by distinguishing between “small
frames” (frames with fewer than 5 lexical entries) and
“large frames”. 24% of the frames are “small frames”.
Weighting the overall F-measures in Tables 2 and 3 by
these proportions gives an average F-measure of
0.649 × 0.24 + 0.874 × 0.76 = 0.82, i.e. 82%.
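As a quick arithmetic check of the reported figures: the F-measure is the harmonic mean of precision and recall, and the 82% average weights the small-frame and large-frame F-measures by their shares of frames. The snippet below is a sketch that uses only values quoted in Tables 2 and 3 and the 24% small-frame share.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# beat.v after step 3 (Table 2): P = 88.9%, R = 90.6%.
print(round(f_measure(88.9, 90.6), 1))  # 89.7

# 24% of frames are small (overall F = 0.649, Table 3); 76% are large (0.874, Table 2).
share_small = 0.24
avg_f = share_small * 0.649 + (1 - share_small) * 0.874
print(f"{avg_f:.2f}")  # 0.82
```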
 
Lexical entry | Parent frame | #Candidate HowNet links | #Lexical entries in parent frame
beat.v | cause_harm | 144 | 51
move.v | motion | 132 | 10
bright.a | light_emission | 126 | 44
hold.v | containing | 145 | 2
fall.v | motion_directional | 127 | 5
issue.v | emanating | 124 | 4
Table 1. Lexical entries test set
 
Lexical entry | Precision (%) step 3/step 1 | Recall (%) step 3/step 1 | F-measure (%) step 3/step 1
beat.v | 88.9/36.8 | 90.6/100 | 89.7/53.8
move.v | 100/49.2 | 72.3/100 | 83.9/66.0
bright.a | 79.1/54.0 | 100/100 | 88.3/70.1
Overall | 87.1/46.3 | 87.6/100 | 87.4/52.3
Table 2. Performance on large frames

Lexical entry | Precision (%) step 3/step 1 | Recall (%) step 3/step 1 | F-measure (%) step 3/step 1
hold.v | 22.4/7.6 | 100/100 | 36.7/14.1
fall.v | 87.0/49.2 | 81.1/100 | 83.9/66.0
issue.v | 31.1/12.3 | 100/100 | 47.5/20.3
Overall | 52.1/25.0 | 85.9/100 | 64.9/40.0
Table 3. Performance on small frames

Measure | Step 1 | Step 2 | Step 3
Precision | 36.81% | 95.24% | 88.89%
Recall | 100% | 75.47% | 90.56%
F-measure | 53.81% | 84.21% | 89.72%
Table 4. Average performance on “beat.v” at each step of the algorithm
 
Table 4 shows the system performance after each step
of aligning the FrameNet entry “beat.v” to HowNet
concepts; the final F-measure is 89.72%. Step 2 gives
the highest precision, and step 3 trades some of that
precision for a large gain in recall.
3. Cross-lingual induction of example 
annotated sentences in BiFrameNet 
In the second stage of our proposed work, we aim 
to automatically induce Chinese example sentences 
that are appropriate for each semantic frame. 
Together with English example sentences that 
already exist in the English FrameNet, they form 
part of the BiFrameNet, and serve to provide 
concrete examples of bilingual usage of semantic 
roles. They can be used either as a resource for 
machine-aided translation or as training data for
machine translation.  
 
The FrameNet example sentences are drawn from over
100 million words of samples of written and spoken
language from a wide range of sources, including
British and American English. All the example
sentences are chosen by linguists for their
representativeness of particular semantic roles,
grammatical functions and phrase types. The current
FrameNet contains on average 30 annotated example
sentences per predicate, which is still inadequate for
automatic semantic parsing systems (Fleischman et
al., 2003). Each FrameNet example sentence contains
a predicate. The semantic roles of the related frame
elements are manually labeled. The syntactic phrase
types (e.g. NP, PP) and their grammatical functions
(e.g. external argument, object argument) are also
labeled. An example annotated sentence containing
the predicate “beat.v” in the “cause_harm” frame
is shown below:
 
Example sentence type: trans-simple  
We are fighting a barbarian, and [agent: we] must 
[predicate: beat] [victim: him].  
 
In order to provide a representative set of Chinese
example sentences automatically for a particular
frame, our method must fulfill the following criteria:
1) It must find real sentences occurring naturally in
Chinese texts;
2) It should find sentences that cover as many
different usages and domains as possible;
3) It must find sentences that have the same
semantic roles as the English example sentences;
4) It should require no manual annotation of any
kind.

Notation: $\Omega_F$: English sentences for frame $F$; $\Psi_F$: Chinese sentences for frame $F$; $CA_F$: candidate Chinese sentences for frame $F$; $L_F$: links of frame $F$ (Figure 2); $DP$: dynamic programming alignment (Figure 5).

For each frame $F$:
    $CA_F = \{c \mid c \text{ contains } \nu,\ (\mu, \nu) \in L_F,\ \mu \in F\}$
    For each $e \in \Omega_F$:
        $\hat{c} = \arg\max_{c \in CA_F} DP(e, c)$
        $\Psi_F \leftarrow \Psi_F \cup \{\hat{c}\}$

Figure 3. BiFrameNet example sentence induction
 
There are at least three different (semi-)automatic 
approaches for mining Chinese example sentences: 
 
i) Translate all English example sentences into 
Chinese by automatic means, and annotate the 
semantic roles by word alignment; 
 
This approach is not appropriate because machine 
translation can be erroneous and this method does 
not satisfy criteria (1) and (2). 
 
ii) Construct an English semantic parser and a 
Chinese parser independently, and use them to 
annotate the sentences in a sentence aligned, 
parallel corpus; 
 
Apart from the high cost of building two semantic
parsers, which itself requires semantically annotated
Chinese data, it would be necessary to manually
create links between the two independently produced
annotations.
 
iii) Mine Chinese sentences from a monolingual 
corpus that are syntactically similar to the English 
example sentence, and induce semantic roles from 
the syntactic transfer function between English and 
Chinese. 
 
This is the approach we take. Inspired by previous 
work on syntax-driven semantic parsing (Gildea and 
Jurafsky, 2002; Fleischman et al., 2003), and 
syntax-based machine translation (Wu, 1997; 
Cucerzan and Yarowsky, 2002), we postulate that
syntactically similar sentences with the same 
predicate also share similar semantic roles. In this 
paper, we present our first experiments on inducing 
semantic roles based on shallow syntactic 
information. We mine Chinese example sentences 
from a naturally occurring monolingual corpus, and
rank them by their syntactic similarity to our English 
example sentences. A dynamic programming 
algorithm then annotates the aligned syntactic units 
with the same semantic roles. The example Chinese 
sentences are not translations of the English 
sentences. Therefore, the set of example sentences 
within a frame is enriched, providing better coverage 
for MT and CLIR systems. 
3.1. Induction from aligned predicate bilingual 
lexical pair 
Since frames are disjoint, we propose a method 
for finding example sentences one frame at a time.  
In this paper, we focus on finding Chinese example
sentences for the largest frame, “cause_harm”, and
the main semantic roles in this frame: “agent”,
“predicate” and “victim”¹.
 
For each English lexical entry and its target 
translation candidates in the BiFrameNet, we first 
extract sentences that contain the translation 
candidates from a large Chinese monolingual 
corpus. Figure 4 shows some initial Chinese 
example sentence candidates under “beat.v”. There 
are many sentences that do not have the 
“agent-predicate-victim” structure. Our next step is 
to find the Chinese sentences that have the
“agent”, “predicate” and “victim” semantic roles and
annotate them automatically.
 
南方军队还打死打伤数百名政府军官兵 (the southern
army killed and maimed hundreds of government soldiers)
土军在进攻中伤害了无辜的平民 (soldiers harmed
innocent civilians during the attack)
农民砍掉自留山上树木７０多棵 (farmers cut down
more than 70 trees)
便用针刺破葫芦 (use the needle to prick the gourd)
*媒体捅出一份调查报告 (the media exposed/produced
an investigation report)
*一些出版社采取“先斩后奏”的做法 (some
publishers adopt an “act first, report later” approach)
Figure 4. Some Chinese example sentences and
glosses
3.2. Inducing semantic roles from cross-lingual 
POS transfer  
Among all the Chinese sentences containing the 
target predicate words, we need to identify those that 
contain the same semantic roles as those of the 
English example sentences in FrameNet. Current
automatic semantic parsing algorithms (Gildea and
Jurafsky, 2002; Fleischman et al., 2003) are all based
on syntactic parse trees, reflecting a close coupling
of semantic and syntactic structures.
Without carrying out full syntactic parsing of the
Chinese sentences, we postulate that the semantic
roles of a sentence are generated by the underlying
shallow syntactic structure of the sentence, such as
its POS tag sequence. We therefore focus on finding
bilingual sentence pairs that are comparable in POS
structure, though not necessarily lexically
comparable. Note that this constitutes only a subset
of all possible Chinese example sentences for each
frame. Expanding this set remains an objective of
our future research.

¹ As an example, for “beat.v”, 73% of the English
example sentences have only these three semantic
roles; the remaining 27% also have other semantic
roles such as “tools”.
English POS | Chinese POS | σ(e, c)
PRP | N | 3.16e-2
NN | N | 4.0e-6
JJ | N | 1.74e-4
NNP | Nr | 4.257e-2
JJS | V | 2.15e-4
VB | V | 7.2e-5
VBG | Ad | 1.34e-3
VBG | m | 6.74e-3
Table 5. Example POS tag transfer  
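The text describes estimating σ(e, c) as a POS tag “confusion matrix” built from per-word tag priors on each side of a POS-tagged parallel corpus, accumulated over bilingual dictionary pairs. The sketch below is one way to realize that description under stated assumptions: each dictionary pair contributes the product of its English and Chinese tag priors, and the table is normalized to sum to one (the exact combination and normalization are not specified in the text). `tag_priors` and `pos_transfer` are hypothetical names, and the corpus and dictionary are tiny stand-ins for the HK News corpus and the bilingual dictionary.

```python
from collections import Counter, defaultdict


def tag_priors(tagged_tokens):
    """Estimate P(tag | word) from the (word, tag) pairs of one side of the corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word][tag] += 1
    return {w: {t: n / sum(c.values()) for t, n in c.items()}
            for w, c in counts.items()}


def pos_transfer(dictionary, en_tagged, zh_tagged):
    """Accumulate an (English tag, Chinese tag) table over dictionary word pairs.

    Each pair adds P(e_tag | e_word) * P(c_tag | c_word); the result is normalized
    to sum to 1 (an assumption made for illustration).
    """
    en_p, zh_p = tag_priors(en_tagged), tag_priors(zh_tagged)
    mass = Counter()
    for e_word, c_word in dictionary:
        if e_word not in en_p or c_word not in zh_p:
            continue  # pair not observed in the tagged corpus
        for e_tag, pe in en_p[e_word].items():
            for c_tag, pc in zh_p[c_word].items():
                mass[(e_tag, c_tag)] += pe * pc
    total = sum(mass.values())
    return {pair: m / total for pair, m in mass.items()} if total else {}


# Hypothetical toy data.
en = [("beat", "VB"), ("beat", "VB"), ("beat", "NN"), ("soldier", "NN")]
zh = [("打", "V"), ("士兵", "N")]
sigma = pos_transfer([("beat", "打"), ("soldier", "士兵")], en, zh)
print(sigma)
```

Here “beat” is tagged VB twice and NN once, so the pair (beat, 打) spreads its mass as 2/3 to (VB, V) and 1/3 to (NN, V), while (soldier, 士兵) adds 1 to (NN, N); after normalization these become 1/3, 1/6 and 1/2.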
Given an English example sentence, its semantic 
role sequence, and its POS tag sequence; and a set of 
Chinese sentences and their POS tag sequence, we 
use a dynamic programming method (Figure 5) to 
find the Chinese sentence whose POS sequence is 
most likely to be generated from the English POS 
sequence, and the alignment path. The Chinese word 
aligned to the English word will assume the latter’s 
semantic role.  
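The dynamic programming alignment can be sketched in Python following the initialization, recursion and backtracking steps of Figure 5. Assumptions not taken from the paper: σ is passed in as a function whose values are maximized (here log-probabilities from a hypothetical toy table), the gap score log(1e-3) and the floor log(1e-4) for unseen tag pairs are invented, and `None` stands for the empty word ε. `project_roles` copies each English word's role to its aligned Chinese word, and `select_best` performs the argmax over candidate sentences used in Figure 3. All function names are ours.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

EPS = None  # stands in for the empty word ε
Score = Callable[[Optional[str], Optional[str]], float]


def dp_align(e_tags: Sequence[str], c_tags: Sequence[str], sigma: Score):
    """Return (score normalized by path length, path of 0-based (i, j) pairs).

    In the path, i or j is None where a word is aligned to ε.
    """
    M, N = len(e_tags), len(c_tags)
    S = [[0.0] * (N + 1) for _ in range(M + 1)]
    T: List[List[Optional[Tuple[int, int]]]] = [[None] * (N + 1) for _ in range(M + 1)]
    # Initialization: S[0,0] = 0; the first row and column accumulate gap scores.
    for j in range(1, N + 1):
        S[0][j], T[0][j] = S[0][j - 1] + sigma(EPS, c_tags[j - 1]), (0, j - 1)
    for i in range(1, M + 1):
        S[i][0], T[i][0] = S[i - 1][0] + sigma(e_tags[i - 1], EPS), (i - 1, 0)
    # Recursion: English word to ε, match, or Chinese word to ε; keep the best.
    for i in range(1, M + 1):
        e = e_tags[i - 1]
        for j in range(1, N + 1):
            c = c_tags[j - 1]
            S[i][j], T[i][j] = max(
                (S[i - 1][j] + sigma(e, EPS), (i - 1, j)),
                (S[i - 1][j - 1] + sigma(e, c), (i - 1, j - 1)),
                (S[i][j - 1] + sigma(EPS, c), (i, j - 1)),
                key=lambda cand: cand[0],
            )
    # Backtracking from the terminal point (M, N).
    path, i, j = [], M, N
    while (i, j) != (0, 0):
        pi, pj = T[i][j]
        path.append((i - 1 if pi == i - 1 else None, j - 1 if pj == j - 1 else None))
        i, j = pi, pj
    path.reverse()
    return S[M][N] / max(len(path), 1), path


def project_roles(path, e_roles):
    """Each Chinese word aligned to an English word takes over that word's role."""
    return {j: e_roles[i] for i, j in path
            if i is not None and j is not None and e_roles[i] is not None}


def select_best(e_tags, candidates, sigma):
    """Index of the candidate Chinese POS sequence with the best normalized DP score."""
    scores = [dp_align(e_tags, c, sigma)[0] for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


TOY = {("PRP", "N"): 0.3, ("VB", "V"): 0.5}  # hypothetical transfer probabilities


def toy_sigma(e, c):
    if e is EPS or c is EPS:
        return math.log(1e-3)                   # assumed gap score
    return math.log(TOY.get((e, c), 1e-4))      # assumed floor for unseen pairs


# "[agent we] must [predicate beat] [victim him]" vs. 农民 砍掉 树木 tagged N V N
e_tags, e_roles = ["PRP", "MD", "VB", "PRP"], ["agent", None, "predicate", "victim"]
score, path = dp_align(e_tags, ["N", "V", "N"], toy_sigma)
print(path)                          # [(0, 0), (1, None), (2, 1), (3, 2)]
print(project_roles(path, e_roles))  # {0: 'agent', 1: 'predicate', 2: 'victim'}
print(select_best(e_tags, [["N", "V", "N"], ["V"]], toy_sigma))  # 0
```

In the toy run, the modal “must” (MD) aligns to ε, so the three Chinese words receive the agent, predicate and victim roles; the one-word candidate scores lower after length normalization and is not selected.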
 
[agent 南方军队] 还 [predicate 打死打伤] [victim 数百名政府军官兵]
[agent 土军] 在进攻中 [predicate 伤害] 了 [victim 无辜的平民]
[agent 农民] [predicate 砍掉] [victim 自留山上树木７０多棵]
便用 [agent 针] [predicate 刺破] [victim 葫芦]
Figure 6. Example Chinese annotated sentences
 
Initialization
$S[0,0] = 0;\quad S[0,j] = S[0,j-1] + \sigma(\varepsilon, c_j);\quad S[i,0] = S[i-1,0] + \sigma(e_i, \varepsilon)$

Recursion
$S[i,j] = \max\{\, S[i-1,j] + \sigma(e_i, \varepsilon),\; S[i-1,j-1] + \sigma(e_i, c_j),\; S[i,j-1] + \sigma(\varepsilon, c_j) \,\}$
$T[i,j] = \arg\max\{\, S[i-1,j] + \sigma(e_i, \varepsilon),\; S[i-1,j-1] + \sigma(e_i, c_j),\; S[i,j-1] + \sigma(\varepsilon, c_j) \,\}$

where $\sigma(e_i, c_j)$ is the transfer score between the English POS tag $e_i$ and the Chinese POS tag $c_j$; $\varepsilon$ is an empty word; $M$ and $N$ are the lengths of the English and Chinese POS sequences, respectively; $1 \le i \le M$, $1 \le j \le N$.

Termination
$S[M,N]$ and $T[M,N]$ are the final alignment score and the final point on the path.

Path Backtracking
Output the final English-Chinese POS alignment path by tracing back from the terminal point. Also output the final alignment score normalized by the path length.

Figure 5. Dynamic programming (DP) alignment

We train σ(e, c) in Figure 5 from a sentence-aligned,
POS-tagged parallel corpus (Hong Kong News) and a
bilingual dictionary. For each bilingual word pair in the
dictionary, we estimate the prior distribution of the
POS tags of the Chinese word from the Chinese side
of the parallel corpus, and that of the English word
from the English side. A V × W POS tag “confusion
matrix” is generated, where V is the vocabulary of
Chinese POS tags and W is the vocabulary of English
POS tags. Table 5 shows some example
English-Chinese POS mappings, and Figure 6 shows
some example annotated sentences in Chinese.

3.3. BiFrameNet example sentence evaluation
We estimate the syntactic POS transfer
probabilities from the HK News Corpus. We use two
state-of-the-art POS taggers: a maximum entropy
based English POS tagger (Ratnaparkhi, 1996) and
an HMM-based Chinese POS tagger². We perform
two sets of experiments: (1) for each English
example sentence in the “cause_harm” frame from
FrameNet, we extract a corresponding Chinese
sentence annotated with the same semantic roles; (2)
we rank all the Chinese sentences that have been
aligned to the English sentences by alignment score.
The highest-ranking Chinese sentences are used for
the BiFrameNet. Table 6 shows that the average
annotation accuracy of all top Chinese sentence
candidates for each English example sentence is
68%. Table 7 shows that the annotation accuracy of
the top 100 Chinese example sentences, sorted by DP
score, is 71.8%.

Semantic role | Accuracy
Predicate | 77.63%
Agent | 68.75%
Victim | 52.72%
(Overall) | 68%
Table 6. Annotation accuracy of the selected Chinese sentences

Semantic role | Accuracy
Predicate | 81.69%
Agent | 63.24%
Victim | 70.77%
(Overall) | 71.8%
Table 7. Annotation accuracy of the top 100 Chinese sentences with the highest DP alignment scores

² http://mtgroup.ict.ac.cn/~zhp/ICTCLAS/index.html
4. Conclusion  
We have presented a first quantitative and
automatic approach to constructing a bilingual
lexical semantic resource, the BiFrameNet.
BiFrameNet consists of mappings between
FrameNet semantic frames and HowNet concepts, as
well as English and Chinese example sentences for a
particular frame, with semantic roles annotated using
the English FrameNet labels. Evaluation results
show that we achieve a promising 82% average 
F-measure on lexical entry alignment, for the most 
ambiguous lexical entries; and a 68-72% accuracy in 
Chinese example sentence induction, for the largest 
frame. The initial results are available at 
http://www.cs.ust.hk/~hltc/BiFrameNet and will be 
updated as further improvements and evaluations are 
implemented.  
5. Discussion  
There are a number of possible directions for 
future work. One obvious extension is to use 
syntactic parse tree representations instead of POS 
sequences in example sentence alignment. Second, 
there are many other Chinese sentences that share 
the same semantic roles, but not the same POS 
sequences, which are not included. Using additional 
features to correctly identify these sentences and the 
constituent semantic roles is a topic of our ongoing 
research. Moreover, we note that Chinese is a highly 
idiomatic and metaphoric language. Compounded by 
the ambiguity of word boundaries, many predicate 
usages in Chinese are highly unexpected. It is worth 
considering using other Chinese linguistic resources 
to enhance the example sentence extraction and 
annotation. Finally, BiFrameNet needs to be further 
evaluated and manual post-processing is perhaps 
required.  
We expect the final complete BiFrameNet, in
addition to the various FrameNet and PropBank
resources being developed manually, to be a
valuable resource for statistical and interlingua
transfer-based MT systems, as well as for human
translators in a machine-aided translation scenario.
We are also motivated to investigate the relationship 
between our results and those of semantic mapping 
models proposed by cognitive scientists.  
6. Acknowledgement  
This work is partly supported by grants CERG# 
HKUST6206/03E and CERG#HKUST6213/02E of the 
Hong Kong Research Grants Council.  

References 

Collin F. Baker, Charles J. Fillmore and John B. Lowe.
1998. The Berkeley FrameNet project. In
Proceedings of COLING-ACL, Montreal, Canada.

Hans C. Boas. 2002. Bilingual FrameNet Dictionaries
for Machine Translation. In Proceedings of the Third
International Conference on Language Resources
and Evaluation, Las Palmas, Spain, Vol. IV:
1364-1371.

Silviu Cucerzan and David Yarowsky. 2002.
Bootstrapping a multilingual part-of-speech tagger in
one person-day. In Proceedings of the Sixth
Conference on Natural Language Learning (CoNLL),
Taipei, Taiwan.

Zhendong Dong and Qiang Dong. HowNet [online
2002]. Available at
http://www.keenage.com/zhiwang/e_zhiwang.html

Bonnie J. Dorr, Gina-Anne Levow and Dekang Lin.
2002. Construction of a Chinese-English Verb
Lexicon for Machine Translation. Machine
Translation, Special Issue on Embedded MT, 17(1-2).

Michael Fleischman, Namhee Kwon and Eduard Hovy.
2003. Maximum Entropy Models for FrameNet
Classification. In Proceedings of ACL 2003, Sapporo.

Daniel Gildea and Daniel Jurafsky. 2002. Automatic
Labeling of Semantic Roles. Computational
Linguistics, 28(3): 245-288.

Satoko Ikeda. 1998. Manual response set in a
stroop-like task involving categorization of English
and Japanese words indicates a common semantic
representation. Perceptual and Motor Skills, 87(2):
467-474.

Judy Illes and Wendy S. Francis. 1999. Convergent
cortical representation of semantic processing in
bilinguals. Brain and Language, 70(3): 347-363.

Qun Liu and Sujian Li. 2002. Word Similarity
Computing Based on How-net. Computational
Linguistics and Chinese Language Processing, 7(2):
59-76, August 2002.

Grace Ngai, Marine Carpuat and Pascale Fung. 2002.
Identifying Concepts Across Languages: A First
Step towards a Corpus-based Approach to
Automatic Ontology Alignment. In Proceedings of
COLING-02, Taipei, Taiwan.

Martha Palmer and Zhibiao Wu. 1995. Verb Semantics
for English-Chinese Translation. Machine
Translation, 10: 59-92.

Sabine Ploux and Hyungsuk Ji. 2003. A Model for
Matching Semantic Maps between Languages
(French/English, English/French). Computational
Linguistics, 29(2): 155-178.

Adwait Ratnaparkhi. 1996. A Maximum Entropy
Part-Of-Speech Tagger. In Proceedings of EMNLP,
May 17-18, 1996, University of Pennsylvania.

Dekai Wu. 1997. Stochastic inversion transduction
grammars and bilingual parsing of parallel corpora.
Computational Linguistics, 23(3): 377-404.

Dekai Wu. 2003. The HKUST leading question
translation system. MT Summit 2003, New Orleans,
September 2003.
