VERBAL CASE FRAME ACQUISITION FROM A BILINGUAL CORPUS:
GRADUAL KNOWLEDGE ACQUISITION
Hideki Tanaka
NHK Science and Technical Research Laboratories 
tanakah@strl.nhk.or.jp
Abstract 
This paper describes the acquisition of English surface case
frames from a corpus, based on a gradual knowledge acquisition
approach. To acquire precise knowledge and accumulate it
unambiguously, the process is divided into three steps, each
assigned to the most appropriate processor: either a human or a
computer. Human workers prepare the data, and a learning program
acquires and accumulates the knowledge. This method minimizes
the effect of inconsistent human judgement. The acquired case
frames basically duplicate human work, but they are more precise
and intelligible.
1 Gradual Knowledge Acquisition 
We have been developing an English-to-Japanese machine
translation (MT) system for news reports in English
(Aizawa T., 1990) (Tanaka H., 1991) and have so far studied
the translation selection problem for common English
verbs (Tanaka H., 1992). Recently, we examined the problem
of multiple translations for common English verbs
(Tanaka H., 1993). Our MT system uses surface verbal case
frames (hereafter simply case frames) to select a Japanese
translation for an English verb. The need to acquire and
accumulate case frames leads directly to three problems.
(1) How can we obtain case frames that are detailed and accurate
enough to translate highly polysemous verbs?
(2) How can we accumulate a large number of case frames
unambiguously?
(3) Manual case frame acquisition tends to yield inconsistent
results, because human judgements change over time. How
can we maintain consistency?
We need a clear methodology for acquiring sufficient case
frames and accumulating them in a way that is unambiguous
and consistent.
In this paper, we propose gradually building up a knowledge
base from a bilingual corpus to cope with these three
problems. The knowledge base is a collection of case
frames. Fig. 1 shows an overall view of our approach.
The process is divided into three steps, each assigned
to the most appropriate processor: a human or a computer.
With this method, detailed knowledge is obtained from
target-domain texts, the role of unstable human judgement
is limited, and case frames are accumulated unambiguously
by a learning algorithm.

Fig. 1: Case-Frame Tree Acquisition from a Bilingual Corpus
We begin by preparing a tagged bilingual corpus, seeking
detailed knowledge in target-domain texts. The corpus is
annotated with the syntactic information of the texts and
with their translations. These annotations are assigned
manually, because human translators can perform syntactic
tagging and translation far more consistently than they
can write case frames directly.
Next, the corpus is converted into an intermediate data
form called the primitive case-frame table (PCFT). Finally,
a statistical learning algorithm extracts the case frames
from the PCFT and accumulates them in a clear-cut
fashion.
While this approach lets us avoid writing case frames
directly through linguistic introspection, human activity
still plays an important role in designing and constructing
the corpus and in converting it into the PCFT (Fig. 1).
The case frames are represented in a discrimination tree,
which has several attractive features for word-sense
selection (Okumura M., 1990). In our view, the biggest
attraction of the learning algorithm is its intelligibility:
compared with neural-network algorithms, for example, it
produces highly intelligible results if the input is
appropriate.
Knowledge acquisition by machine learning from a corpus
has recently been attracting more attention than ever in
several natural language processing fields. Cardie (1992,
1993) applied this approach to predicting the antecedents of
relative pronouns and the attributes of unknown words.
Utsuro (1993) introduced a methodology for automatically
acquiring verbal case frames from bilingual corpora that
differs from ours.
2 Case Frames for Translation 
Our machine translation system uses case frames to
translate English verbs. Fig. 2 shows illustrative case
frames for the word take.
SN[man] take ON[boy]  → (select)
SN[I] take ON[him] PN[to] PNc[BUILD]  → (escort)
SN[HUMAN] take ON[CON] PN[to] PNc[BUILD]  → (bring)
Fig. 2: Example of Case Frames for take
We write case categories (here, SN (subject noun) and
PN (preposition)) and specify their restrictions. A
restriction can be a semantic category such as HUMAN or
the word form itself, such as boy. The most common English
verbs may have several hundred case frames each.
Translation selection is performed after the parser
produces a syntactic structure for the input sentence. The
system compares the syntactic structure with the case
frames and selects the translation from the best-matching
case frame. Context is not considered during translation
selection. Our new case frames are designed to follow the
same protocol.
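The selection step can be sketched as follows, using the case frames of Fig. 2 with English glosses in place of the Japanese translations. The paper does not define "best-matching," so the scoring rule here is an assumption: a frame qualifies only if every restricted case category is present and satisfies its restriction, and among qualifying frames the one with the most restrictions wins. The semantic lexicon `SEMANTIC_LEXICON` and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical semantic lexicon: word form -> semantic categories (illustrative only).
SEMANTIC_LEXICON: Dict[str, Set[str]] = {
    "man": {"HUMAN"}, "I": {"HUMAN"}, "you": {"HUMAN"}, "boy": {"HUMAN"},
    "him": {"HUMAN"}, "box": {"CON"}, "school": {"BUILD"}, "theater": {"BUILD"},
}

# A linear case frame: restrictions per case category, plus the translation gloss.
CaseFrame = Tuple[Dict[str, str], str]

FRAMES: List[CaseFrame] = [
    ({"SN": "man", "ON": "boy"}, "select"),
    ({"SN": "I", "ON": "him", "PN": "to", "PNc": "BUILD"}, "escort"),
    ({"SN": "HUMAN", "ON": "CON", "PN": "to", "PNc": "BUILD"}, "bring"),
]

def satisfies(word: str, restriction: str) -> bool:
    """A restriction is either the word form itself or a semantic category."""
    return word == restriction or restriction in SEMANTIC_LEXICON.get(word, set())

def select_translation(structure: Dict[str, str],
                       frames: List[CaseFrame]) -> Optional[str]:
    """Return the translation of the qualifying frame with the most restrictions."""
    best, best_score = None, -1
    for restrictions, translation in frames:
        qualifies = all(cat in structure and satisfies(structure[cat], r)
                        for cat, r in restrictions.items())
        if qualifies and len(restrictions) > best_score:  # first frame wins ties
            best, best_score = translation, len(restrictions)
    return best

print(select_translation({"SN": "you", "ON": "box", "PN": "to", "PNc": "school"}, FRAMES))
```

Here "you take box to school" fails the first two frames on their word-form restrictions and matches the third through the semantic categories HUMAN, CON, and BUILD.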
Three factors must be considered at this point.
(1) How many and what kinds of case categories should be
used?
(2) In which order should the system compare the syntactic
structure with the case categories in a case frame?
(3) What kind of restriction should we use?
In this paper, we deal mainly with the first two factors.
Our solution is to use a discrimination tree for the
case-frame representation and a statistical algorithm for
learning. The necessary case categories are selected and
stacked in a tree form, one by one, according to their
contribution to translation selection. We call the resulting
tree the case-frame tree. Fig. 3a is an example of a
case-frame tree for take.
Fig. 3a: Example of a Case-Frame Tree (root node ON; leaves bring, escort, and select)
ON[box]  → (bring)
ON[him]  → (select)
ON[him] PN[to]  → (escort)
Fig. 3b: Linear Case Frames for Fig. 3a
The comparison with the syntactic structure proceeds from
the root node to the leaf nodes of the case-frame tree, and
no backtracking is allowed, so the comparison is
deterministic. Reading the tree from the root to each leaf
expands it into a linear case frame, as shown in Fig. 3b.
This increases the intelligibility of the case-frame tree,
enabling a human lexicographer to evaluate it from a
linguistic viewpoint.
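A minimal sketch of the deterministic root-to-leaf comparison and of the expansion into linear case frames follows. It assumes a hypothetical nested-tuple encoding of the Fig. 3a tree as implied by Table 1 and Fig. 3b, in which "0" marks an absent case category. When it expands the select branch, this sketch writes the absent preposition explicitly as PN[0].

```python
from typing import Dict, List, Optional, Tuple, Union

# A node is either a leaf (a translation) or (case category, {value: child}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]

TAKE_TREE: Tree = ("ON", {
    "box": "bring",
    "him": ("PN", {"to": "escort", "0": "select"}),
})

def classify(tree: Tree, structure: Dict[str, str]) -> Optional[str]:
    """Walk from the root to a leaf; a missing branch fails instead of backtracking."""
    while not isinstance(tree, str):
        category, branches = tree
        value = structure.get(category, "0")  # absent category -> "0"
        if value not in branches:
            return None
        tree = branches[value]
    return tree

def expand(tree: Tree, path: Tuple[str, ...] = ()) -> List[Tuple[List[str], str]]:
    """Read every root-to-leaf path as a linear case frame."""
    if isinstance(tree, str):
        return [(list(path), tree)]
    category, branches = tree
    frames: List[Tuple[List[str], str]] = []
    for value, child in branches.items():
        frames.extend(expand(child, path + (f"{category}[{value}]",)))
    return frames

print(classify(TAKE_TREE, {"SN": "you", "ON": "him", "PN": "to", "PNc": "park"}))
for frame, translation in expand(TAKE_TREE):
    print(" ".join(frame), "->", translation)
```

The subject is never consulted. This reflects the structure of the tree rather than a special rule.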
3 Learning from the PCFT 
A case-frame tree can be regarded as a decision tree.
Decision-tree learning has a long research history, and many
algorithms have been developed. Among them, the ID3 family
of programs (Quinlan J., 1993) and its descendants meet the
requirements of the solution described in Sec. 2. We apply
the latest program, C4.5 (Quinlan J., 1993), to our problem.
This algorithm learns a decision tree from a table of
attribute values and classes. An example of such a table is
shown in Table 1.
Table 1: Example of a Primitive Case-Frame Table
SN   V     ON   PN   PNc      translation
I    take  him  to   theater  (escort)
you  take  him  to   school   (escort)
you  take  him  to   park     (escort)
you  take  box  to   theater  (bring)
you  take  box  to   park     (bring)
I    take  box  to   school   (bring)
I    take  him  0    0        (select)
you  take  him  0    0        (select)
The first row of the table lists the attributes, i.e., the
case categories. The values of the attributes are the
restrictions on the case categories; here, word forms are
used, and "0" marks a case category that is absent from the
sentence. Because the algorithm produces a case-frame tree
from this table, we call the table a "Primitive Case-Frame
Table (PCFT)."
C4.5 first puts all translations listed in the PCFT under a
root node. It then recursively selects one case category and
partitions the translations according to the word forms of
the selected category. To select the case category, it uses
a criterion based on the entropy reduction of the
translations gained by the partitioning; see (Quinlan J.,
1993) for details. In short, the algorithm places case
categories from the root node toward the leaf nodes
according to each category's ability to discriminate
between translations. The case-frame tree in Fig. 3a was
produced from Table 1. It has no node for the subject, which
simply means that subject information is redundant for
selecting the translation of take in Table 1.
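The selection criterion can be illustrated on Table 1. The sketch below is a simplified ID3-style learner that uses plain information gain. It is not C4.5, which uses the gain ratio and also prunes the tree. Ties are broken by the order of `ATTRIBUTES`, which is an assumption of this sketch, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple, Union

ATTRIBUTES = ["SN", "V", "ON", "PN", "PNc"]
Row = Tuple[Dict[str, str], str]
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]

# Table 1, with English glosses standing in for the Japanese translations.
TABLE_1: List[Row] = [
    (dict(zip(ATTRIBUTES, row[:5])), row[5])
    for row in [
        ("I", "take", "him", "to", "theater", "escort"),
        ("you", "take", "him", "to", "school", "escort"),
        ("you", "take", "him", "to", "park", "escort"),
        ("you", "take", "box", "to", "theater", "bring"),
        ("you", "take", "box", "to", "park", "bring"),
        ("I", "take", "box", "to", "school", "bring"),
        ("I", "take", "him", "0", "0", "select"),
        ("you", "take", "him", "0", "0", "select"),
    ]
]

def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of a list of translations."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(rows: List[Row], attr: str) -> float:
    """Entropy reduction of the translations gained by partitioning on attr."""
    parts: Dict[str, List[str]] = {}
    for values, label in rows:
        parts.setdefault(values[attr], []).append(label)
    remainder = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
    return entropy([label for _, label in rows]) - remainder

def majority(rows: List[Row]) -> str:
    return Counter(label for _, label in rows).most_common(1)[0][0]

def learn(rows: List[Row], attrs: List[str]) -> Tree:
    """Recursively place the case category with the largest gain at each node."""
    if len({label for _, label in rows}) == 1 or not attrs:
        return majority(rows)
    best = max(attrs, key=lambda a: gain(rows, a))  # earlier attribute wins ties
    if gain(rows, best) <= 0.0:
        return majority(rows)  # no category helps: stop with the majority translation
    branches: Dict[str, Tree] = {}
    for value in dict.fromkeys(v[best] for v, _ in rows):  # first-seen order
        subset = [(v, l) for v, l in rows if v[best] == value]
        branches[value] = learn(subset, [a for a in attrs if a != best])
    return (best, branches)

print(learn(TABLE_1, ATTRIBUTES))
```

On Table 1, ON becomes the root, with a gain of about 0.95 bits, compared with about 0.81 for PN and PNc and about 0.02 for SN. No SN node appears in the resulting tree.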
4 Data Preparation 
4.1 Construction of the Bilingual Corpus 
As mentioned in Sec. 1, the data for machine learning is
prepared in two steps: construction of a bilingual corpus
and its conversion into a PCFT. The factors we considered
and the steps we took to put together our corpus are
described below.
• Source
Since we could not find a readily available bilingual corpus
in the news domain, we decided to build one ourselves by
taking Associated Press (AP) wire service news text and
adding Japanese translations to it.
• Target 
We selected 15 verbs known to be problematic for machine
translation: come, get, give, go, make, take, run, call,
cut, fill, keep, look, put, stand, and turn.
Since case frames correspond to simple sentences, we did 
not deal with long sentences. The maximum sentence 
length was set at 15 words. 
• Quantity of Data 
To estimate the necessary amount of data, we investigated
the monthly frequency of each verb over six months. The
frequency showed a stable tendency over the measurement
period, which suggests that one month of data is a good
starting point. We decided to use two months, January 1990
and January 1991, for English sentence extraction.
• Construction
(1) Preparing the English text 
Sentences of up to 15 words that contain one or more of the
15 target verbs were automatically extracted from the
two-month AP source text.
(2) Identifying the range governed by the verb
The range that the target verb directly governs in the
English text was identified manually. The two lines starting
with ENG in Fig. 4 are examples.
(3) Constructing the English case data 
Predefined category labels were manually assigned to each
part of the ENG data, and the head word and functional words
in each category were identified. The lines starting with
CASE in Fig. 4 correspond to this data.
We had defined 34 category labels beforehand. Twelve of them
(sentence category labels) were assigned to verbs to
identify the category of the sentence from which the verb
was extracted. Example categories are V (declarative
sentence), PVQ (polar question), IMV (imperative sentence),
PASV (passive sentence), and IV (to-infinitive clause). The
other twenty-two (case category labels) identify the surface
cases or the syntactic categories of the other components in
the sentence. Examples are SN (subject noun clause), SIN
(subject to-infinitive clause), and PN (prepositional phrase
[modifying the target verb]).
(4) Constructing the Japanese data 
Japanese translations were assigned to each of the English
head words and functional words. When a word could not be
translated by reading the English sentence alone, the
translators were given its context. The two lines starting
with JAP in Fig. 4 show the translations.
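Step (1) is simple enough to sketch. The inflection table below is hypothetical and covers only three of the 15 verbs. Deciding whether a matched form is actually used as a verb (for example, run as a noun) is left to the manual step (2).

```python
import re
from typing import Dict, Iterable, List, Tuple

MAX_WORDS = 15  # maximum sentence length used for the corpus

# Hypothetical, partial inflection table: word form -> target verb.
TARGET_FORMS: Dict[str, str] = {
    "take": "take", "takes": "take", "took": "take", "taken": "take", "taking": "take",
    "come": "come", "comes": "come", "came": "come", "coming": "come",
    "run": "run", "runs": "run", "ran": "run", "running": "run",
}

def extract(sentences: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Return (sentence, target verbs found) for short sentences with a target verb."""
    kept = []
    for sentence in sentences:
        words = re.findall(r"[A-Za-z']+", sentence)  # "I'm" counts as one word
        verbs = sorted({TARGET_FORMS[w.lower()] for w in words if w.lower() in TARGET_FORMS})
        if verbs and len(words) <= MAX_WORDS:
            kept.append((sentence, verbs))
    return kept

print(extract(['"I take everybody seriously," Graf said.', "She gave a short speech."]))
```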
Constructing the complete corpus took about 12 man-months
of labor. Table 2 shows the corpus statistics for seven
verbs. Row (2) shows the percentage of sentences that
required context for translation; this figure indicates the
limitations of manual translation without context. Most of
these sentences contained pronouns such as it, and the
translators needed the context to identify the referents.
19 : "I just know I'm going to take those rubles and
build another restaurant," he said.
ENG : I'm going to take those rubles
CASE : SN<[I]> AX<[be going to]> V<[take]>
ON<those [ruble]>
JAP : [Japanese translations of SN and AX]
20 : "I take everybody seriously," Graf said.
ENG : I take everybody seriously
CASE : SN<[I]> V<[take]> ON<[everybody]>
DD<[seriously]>
JAP : [Japanese translations of SN and V]
<> category label, [] head word, {} functional word
Fig. 4: Part of a Tagged Bilingual Corpus
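The CASE notation of Fig. 4 is regular enough to be read mechanically. The following is a minimal sketch. It assumes that each label occurs at most once per line and that an element holds at most one head word in [] plus any number of functional words in {}. The function name `parse_case_line` and the PN example are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# LABEL<body>; labels start with a capital letter (e.g. SN, ON, PNc).
ELEMENT = re.compile(r"([A-Z][A-Za-z]*)<([^<>]*)>")
HEAD = re.compile(r"\[([^\]]*)\]")       # [head word]
FUNCTIONAL = re.compile(r"\{([^}]*)\}")  # {functional word}

def parse_case_line(line: str) -> Dict[str, Tuple[str, List[str]]]:
    """Map each category label on a CASE line to (head word, functional words)."""
    elements: Dict[str, Tuple[str, List[str]]] = {}
    for label, body in ELEMENT.findall(line):
        heads = HEAD.findall(body)
        functional = [w.strip() for w in FUNCTIONAL.findall(body)]
        elements[label] = (heads[0].strip() if heads else "", functional)
    return elements

print(parse_case_line("CASE : SN<[I]> V<[take]> ON<[everybody]> DD<[seriously]>"))
print(parse_case_line("PN<{to} the [school]>"))  # hypothetical PN element
```

Words that are neither marked as head nor functional (such as those in ON<those [ruble]>) are ignored, which matches the decision in Sec. 4.2 to use only head and functional words as values.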
Table 2: Corpus Statistics
        come   get  give    go  make   run  take
(1)      795   867   635  1204  1024   440  1062
(2)     3.4%  5.2%  4.1%  3.7%  6.6%  6.0%  4.0%
(3)      782   849   637   941  1020   303  1067
(1) Number of English sentences
(2) Percentage requiring context to translate
(3) Number of obtained quadruplets

4.2 Conversion into a PCFT
The bilingual corpus must be converted into a PCFT before a
case frame can be learned. This conversion step lets us
directly control the information used for learning. We
followed the principles below.
• Develop one case-frame tree from each sentence category
The aim was to observe how the sentence category affects
the shape of the case-frame trees.
• Use all case categories in the corpus as attributes
This was done to select effective case categories without
any bias.
• Use head words and functional words as values for case
categories
These words are the primary elements representing each
case category, so it is reasonable to use them as the values.
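The three principles, together with the frequency cut-off used in Sec. 5, can be sketched as a conversion from corpus records to one PCFT per sentence category. The record layout (sentence category, a mapping from case category to head or functional word, translation) and the name `build_pcfts` are assumptions. As in Table 1, "0" marks an absent case category.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (sentence category, {case category: value}, translation).
Record = Tuple[str, Dict[str, str], str]
Table = Tuple[List[str], List[List[str]]]

def build_pcfts(records: List[Record], min_count: int = 10) -> Dict[str, Table]:
    """Build one PCFT per sentence category.

    Every case category seen anywhere in the corpus becomes an attribute, and
    translations occurring fewer than min_count times in a table are dropped.
    """
    attributes = sorted({cat for _, elements, _ in records for cat in elements})
    by_category: Dict[str, List[Tuple[Dict[str, str], str]]] = defaultdict(list)
    for sentence_category, elements, translation in records:
        by_category[sentence_category].append((elements, translation))
    tables: Dict[str, Table] = {}
    for sentence_category, items in by_category.items():
        freq = Counter(translation for _, translation in items)
        rows = [
            [elements.get(cat, "0") for cat in attributes] + [translation]
            for elements, translation in items
            if freq[translation] >= min_count
        ]
        tables[sentence_category] = (attributes + ["translation"], rows)
    return tables

records: List[Record] = [
    ("V", {"SN": "I", "ON": "him", "PN": "to"}, "escort"),
    ("V", {"SN": "you", "ON": "him"}, "select"),
    ("IV", {"ON": "box"}, "bring"),
]
print(build_pcfts(records, min_count=1))
```

Keeping the attribute set corpus-wide, rather than per sentence category, follows the "no bias" principle. Columns that are all "0" in one table are harmless, because they yield no information gain.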
5 Case-Frame Tree Learning Experiments
Several learning experiments were conducted on the PCFTs
obtained from each sentence category of the target verbs.
Complete results from the experiments are not presented here
due to space limitations. Table 3 shows the statistical
results for seven verbs.
Table 3: Statistics of Case-Frame Trees
(from declarative sentences)
(1)     come   get  give    go  make   run  take
(2)      398   274   292   225   367    68   285
(3)       30    28    31    20    33    15    21
(4)       10     9     9     8     8     3    10
(5)        6     5     5     6     6     3     5
(6)    10.1%  5.5% 13.0% 10.2%  6.2%  0.0%  6.0%
(1) Verbs
(2) Number of training data
(3) Number of case categories appearing in the PCFT
(attribute size)
(4) Number of translations (class size)
(5) Number of case categories appearing in the case-frame tree
(6) Error rate when the tree was used to re-classify the
training data
We are now increasing the corpus for give, make, and 
take by 4,000 sets. 
Translations occurring fewer than ten times were not
included in the PCFT for this experiment. The overall error
rate in Table 3 was quite low. Part of the take tree is
shown in Fig. 5. The figures at the end of each line show
the result of re-classifying the training PCFT with the
learned tree: (number of data items that fell on this
leaf/number of errors, if any). As shown, the case-frame
tree is highly intelligible.
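The leaf annotations can be reproduced with a short sketch. Each training row is run down the tree, and each leaf accumulates (items, errors). The tree uses the same hypothetical nested-tuple encoding as Fig. 3a, the toy rows are invented for illustration, and counting rows that reach no leaf as errors is an assumption. For a tree learned from the same rows, that case should not arise.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple, Union

Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]
Row = Tuple[Dict[str, str], str]

def leaf_path(tree: Tree, values: Dict[str, str]) -> Optional[Tuple[Tuple[str, ...], str]]:
    """Follow the tree deterministically; return (path to leaf, translation) or None."""
    path: Tuple[str, ...] = ()
    while not isinstance(tree, str):
        category, branches = tree
        value = values.get(category, "0")
        if value not in branches:
            return None
        path += (f"{category}={value}",)
        tree = branches[value]
    return path, tree

def reclassify(tree: Tree, rows: List[Row]):
    """Per-leaf [items, errors] counts and the overall re-classification error rate."""
    stats: Dict[Tuple[str, ...], List[int]] = defaultdict(lambda: [0, 0])
    errors = 0
    for values, gold in rows:
        hit = leaf_path(tree, values)
        if hit is None:
            errors += 1  # assumption: rows reaching no leaf count as errors
            continue
        path, predicted = hit
        stats[path][0] += 1
        if predicted != gold:
            stats[path][1] += 1
            errors += 1
    return dict(stats), errors / len(rows)

def annotation(items: int, errs: int) -> str:
    """Format a leaf count in the style of Fig. 5, e.g. (33.0/1.0) or (12.0)."""
    return f"({items:.1f}/{errs:.1f})" if errs else f"({items:.1f})"

TREE: Tree = ("ON", {"him": ("PN", {"to": "escort", "0": "select"}), "box": "bring"})
ROWS: List[Row] = [({"ON": "him", "PN": "to"}, "escort")] * 3 + [
    ({"ON": "him", "PN": "to"}, "bring"),  # one deliberately mislabeled toy row
    ({"ON": "him"}, "select"),
    ({"ON": "box", "PN": "to"}, "bring"),
]
stats, rate = reclassify(TREE, ROWS)
for path, (items, errs) in stats.items():
    print(" & ".join(path), annotation(items, errs))
print(f"error rate {rate:.1%}")
```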
D<> = over: [Japanese] (12.0)
D<> = up: [Japanese] (3.0/1.0)
D<> = 0:
|   ON<> = 0: kakaru (need time) (5.0/1.0)   <- A
|   ON<> = action: [Japanese] (8.0)
|   ON<> = bronze: [Japanese] (9.0)
|   ON<> = hour: [Japanese] (11.0/3.0)
|   ON<> = measures: [Japanese] (10.0)
|   ON<> = part: sanka suru (join) (33.0/1.0)   <- B
|   ON<> = while: [Japanese] (6.0)
|   ON<> = place:
|   |   SN<> = Sergei Shupletsov: kakutoku suru (win) (1.0)   <- C
|   |   SN<> = attack: okonawareru (happen) (4.0)   <- C
|   ON<> = time:
|   |   AX<> = 0: [Japanese] (4.0/2.0)
|   |   AX<> = may: [Japanese] (1.0)
|   |   AX<> = could: [Japanese] (1.0)
Fig. 5: Part of Case-Frame Tree for take
• Similarity 
The number of case categories actually used in the
case-frame tree was drastically smaller than the number
appearing in the PCFT (row (3) vs. row (5) of Table 3). In
the case-frame tree for take, for example, the following case
categories were used: AX (adverb equivalents), D (adverbial
particles), ON (object noun clause), SIN (subject
to-infinitive clause), and SN (subject noun clause). The top
node, i.e., the most important node, was D, the adverbial
particle, which agrees with the description in an ordinary
dictionary. Most of these syntactic categories are commonly
used to describe verb patterns in ordinary dictionaries, and
the case-frame tree basically duplicates the verb patterns
found there.
• Precision 
In the line marked A in Fig. 5, the translation became
kakaru (need time) under the condition ON=0. Since take is
usually used as a transitive verb, the lack of an object
noun looks unnatural. However, this part of the tree
corresponds to time expressions such as "take long" and
"take awhile," which have no object nouns. This is
reasonable learning.
In the line marked B, the idiomatic expression "take part
in" was learned as "take part." The word in was judged to be
redundant and thus an ineffective element. Indeed, our
corpus contained an example without in that still had the
same translation, "sanka suru." This learning is more
precise than the description in an ordinary dictionary.
• Complementary learning 
The lines marked C in Fig. 5 show an example of what we
call complementary learning. Surprisingly, the case-frame
tree distinguished "kakutoku suru" (win) from "okonawareru"
(happen). The former was learned from "take third place";
the latter corresponds to the idiomatic expression "take
place." The way the algorithm learns here is unique. The key
to discrimination was found in SN, the subject noun, which
sounds reasonable: the distinction is made in terms of the
subject's nature, a person vs. an action noun. However, the
two senses could also be distinguished by the presence of a
modifier of place, since in the idiomatic sense no modifier
is allowed between take and place. Modifiers were not
included in our PCFT, so the system found complementary
knowledge to distinguish the translations. The same
phenomenon was found in many parts of the trees. The
learning algorithm does its best to sub-categorize the
translations within the given case categories. While this
can yield linguistically skewed case frames, they are still
effective, at least within the corpus.
• Differences among sentence categories 
The results from other sentence categories looked very
different. The trees for make and take obtained from the
PCFT for the to-infinitive clause contained only one case
category, ON (object noun clause). The case categories that
were effective in declarative sentences, such as the
adverbial particle, were not effective for this sentence
category. This strongly suggests that translations should be
selected using case frames specific to each sentence
category.
6 Conclusion 
We proposed the idea of gradual knowledge acquisition
from a bilingual corpus. The knowledge addressed in this
paper was surface verbal case frames for the Japanese
translation of English verbs. The process consists of three
steps: corpus construction, data conversion, and machine
learning.
The case-frame trees we obtained were highly
intelligible: they can be interpreted from a linguistic
viewpoint. They basically matched linguistic intuition, and
more precise knowledge was sometimes acquired. Tree analysis
showed that in some cases complementary learning occurred
even when the necessary knowledge was not available.
The trees successfully distinguished the translations of
the training data.
Our approach basically fulfills our primary goal: acquiring
detailed knowledge and accumulating it in a way that is
consistent and unambiguous.
There are several areas for future work. The work in this
paper used word forms as the restrictions for the case
categories, which resulted in case-frame trees with limited
translation power for open data. To increase the translation
power, we are generalizing the corpus using semantic codes
and plan to produce case-frame trees with them.
Acknowledgements 
I would like to thank Prof. Makoto Nagao of Kyoto
University and Prof. Hozumi Tanaka of the Tokyo Institute of
Technology for their valuable suggestions. I would also
like to thank my supervisors, Dr. Yuichi Ninomiya, Dr.
Teruaki Aizawa, and Dr. Terumasa Ehara, and my colleagues,
whose discussions helped clarify this work. The anonymous
reviewers made very constructive comments, which gave us
valuable pointers for our future work.
References 
Aizawa, T., Ehara, Uratani and Tanaka (1990). A Machine
Translation System for Foreign News in Satellite
Broadcasting. Proc. of Coling-90, Vol. 3, pp. 308-310.
Cardie, C. (1992). Learning to Disambiguate Relative
Pronouns. Proc. of AAAI-92, pp. 38-43.
Cardie, C. (1993). A Case-Based Approach to Knowledge
Acquisition for Domain-Specific Sentence Analysis.
Proc. of AAAI-93, pp. 798-803.
Okumura, M. and Tanaka (1990). Towards Incremental
Disambiguation with a Generalized Discrimination
Network. Proc. of AAAI-90, Vol. 2, pp. 990-995.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning.
Morgan Kaufmann.
Tanaka, H. (1991). The MT User Experience. Proc. of MT
Summit III, pp. 123-125.
Tanaka, H., Aizawa, Kim and Hatada (1992). A Method of
Translating English Delexical Structures into Japanese.
Proc. of Coling-92, Vol. 2, pp. 567-573.
Tanaka, H. and Ehara (1993). Automatic Verbal Case
Frame Acquisition from Bilingual Corpora (in Japanese).
Proc. 47th Annual Convention IPS Japan, Vol. 3, pp. 195-196.
Utsuro, T., Matsumoto and Nagao (1993). Verbal Case
Frame Acquisition from Bilingual Corpora. Proc. of
IJCAI-93, Vol. 2, pp. 1150-1156.
