STRUCTURAL MATCHING OF PARALLEL TEXTS 
Yuji Matsumoto
Graduate School of Information Science
Advanced Institute of Science and Technology, Nara
Takayama-cho, Ikoma-shi, Nara 630-01 Japan
matsu@is.aist-nara.ac.jp
Hiroyuki Ishimoto Takehito Utsuro 
Department of Electrical Engineering 
Kyoto University 
Sakyo-ku, Kyoto 606 Japan 
{ishimoto, utsuro} @pine.kuee.kyoto-u.ac.jp 
Abstract 
This paper describes a method for finding structural matching
between parallel sentences of two languages (such as Japanese
and English). Parallel sentences are analyzed based on unification
grammars, and structural matching is performed by making use of
a similarity measure of word pairs in the two languages. Syntactic
ambiguities are resolved simultaneously in the matching process.
The results serve as a useful source for extracting linguistic and
lexical knowledge.
INTRODUCTION 
Bilingual (or parallel) texts are useful resources for acquisition
of linguistic knowledge as well as for applications such as machine
translation. Intensive research has been done on aligning bilingual
texts at the sentence level using statistical techniques that measure
sentence lengths in words or in characters (Brown 91), (Gale 91a).
These works are quite successful in that far more than 90% of the
sentences in bilingual corpora are aligned correctly.
Although such parallel texts are shown to be useful in real
applications such as machine translation (Brown 90) and word sense
disambiguation (Dagan 91), structured bilingual sentences are
undoubtedly more informative and important for future natural
language research. Structured bilingual or multilingual corpora
serve as richer sources for extracting linguistic knowledge
(Kaji 92), (Klavans 90), (Sadler 91), (Utsuro 92).
Phrase level or word level alignment has also been done by several
researchers. The Textual Knowledge Bank Project (Sadler 91) is
building monolingual and multilingual text bases structured by
linking the elements with grammatical (dependency), referential,
and bilingual relations. (Kaji 92) reports a method to obtain phrase
level correspondence of parallel texts by coupling phrases of two
languages obtained in CKY parsing processes.
This paper presents another method to obtain 
structural matching of bilingual texts. Sentences in 
both languages are parsed to produce (disjunctive) 
feature structures, from which dependency struc- 
tures are extracted. Ambiguities are represented as 
disjunction. Then, the two structures are matched 
to establish a one-to-one correspondence between 
their substructures. The result of the match is ob- 
tained as a set of pairs of minimal corresponding 
substructures of the dependency structures. Exam- 
ples of the results are shown in Figures 1, 2 and 3. 
A dependency structure is represented as a tree, in which ambiguity
is specified by a disjunctive node (OR node). Circles in the figures
show substructures and bidirectional arrows show corresponding
substructures.
Our technique and the results are different from those of the other
methods mentioned above. (Kaji 92) identifies corresponding phrases
and aims at producing translation templates by abstracting those
corresponding phrases. In the Bilingual Knowledge Bank (Sadler 91),
the correspondence is shown by links between words in two sentences,
equating two whole subtrees headed by the words. We prefer the
minimal substructure correspondence and the relationship between
substructures. Such a minimal substructure stands for the minimal
meaningful component in the sentence, which we believe is very
useful for our target application of extracting lexical knowledge
from bilingual corpora.
SPECIFICATION OF 
STRUCTURAL MATCHING 
PROBLEM 
Although the structural matching method shown in this paper is
language independent, we deal with parallel texts of Japanese and
English. We assume that alignment at the sentence level has already
been done manually or by other methods such as those in (Brown 91),
(Gale 91a). Throughout this paper, we assume that the sentences to
be matched are simple sentences.^1
DEFINITIONS OF DATA STRUCTURES 
A pair of Japanese and English sentences is parsed independently
into (disjunctive) feature structures. For our present purpose, a
part of a feature structure is taken out as a dependency structure
consisting of the content words^2 that appear in the original
sentence. Ambiguity is represented by disjunctive feature structures
(Kasper 87). Since no relation other than modifier-modifyee
dependencies is considered here, path equivalence is not taken into
consideration. Both value disjunction and general disjunction are
allowed.
We are currently using LFG-like grammars for 
both Japanese and English, where the value of the 
'pred' label in an f-structure is the content word 
that is the head of the corresponding c-structure. 
We start with the definitions of simplified dis- 
junctive feature structures, and then disjunctive 
dependency structures, that are extracted from the 
disjunctive feature structures obtained by the pars- 
ing process. 
Definition 1 Simple feature structures (FS) ($L$ is the set of
feature labels, and $A$ is the set of atomic values) are defined
recursively:
    NIL
    $a$  where $a \in A$
    $l : \phi$  where $l \in L$, $\phi \in FS$
    $\phi \wedge \psi$  where $\phi, \psi \in FS$
    $\phi \vee \psi$  where $\phi, \psi \in FS$
^1 Matching of compound sentences is done by cutting them up into
simple sentence fragments.
^2 In the present system, nouns, pronouns, verbs, adjectives, and
adverbs are regarded as content words.
To define (Disjunctive) Dependency Structures as a special case of
an FS, we first require the following definitions.
Definition 2 Top label set of an FS $\phi$, written as $tl(\phi)$,
is defined:
1. If $\phi = l : \phi_1$, then $tl(\phi) = \{l\}$,
2. If $\phi = \phi_1 \wedge \phi_2$ or $\phi = \phi_1 \vee \phi_2$,
then $tl(\phi) = tl(\phi_1) \cup tl(\phi_2)$.
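As an illustration of Definitions 1 and 2, the following is a minimal Python sketch of simple feature structures and the top label set. The class names (`Nil`, `Atom`, `Feat`, `And`, `Or`), the function name `top_labels`, and the feature labels `subj`, `obj` and `mod` in the example are assumptions made for this sketch, not names taken from the system described here. Definition 2 says nothing about NIL or atomic values; the sketch assumes an empty top label set for them.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Nil:
    """The empty feature structure NIL."""


@dataclass(frozen=True)
class Atom:
    """An atomic value a in A."""
    value: str


@dataclass(frozen=True)
class Feat:
    """A labelled structure l : phi."""
    label: str
    body: "FS"


@dataclass(frozen=True)
class And:
    """A conjunction of two feature structures."""
    left: "FS"
    right: "FS"


@dataclass(frozen=True)
class Or:
    """A disjunction of two feature structures."""
    left: "FS"
    right: "FS"


FS = Union[Nil, Atom, Feat, And, Or]


def top_labels(fs: FS) -> FrozenSet[str]:
    """Top label set tl(phi) following Definition 2."""
    if isinstance(fs, Feat):
        return frozenset({fs.label})
    if isinstance(fs, (And, Or)):
        return top_labels(fs.left) | top_labels(fs.right)
    # NIL and atoms are not covered by Definition 2; the empty set is
    # an assumption of this sketch.
    return frozenset()


# A hypothetical analysis of "She has long hair".
example = And(
    Feat("pred", Atom("have")),
    And(
        Feat("subj", Feat("pred", Atom("she"))),
        Feat("obj", And(Feat("pred", Atom("hair")),
                        Feat("mod", Feat("pred", Atom("long"))))),
    ),
)
print(sorted(top_labels(example)))  # ['obj', 'pred', 'subj']
```

The labels nested under `obj` do not appear in the result, since rule 1 stops at the outermost label.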
Definition 3 A relation 'sibling' between feature labels in $\phi$
is defined:
1. If $\phi = l : \phi_1$, then $l$ and labels in $\phi_1$ are not
sibling, and the sibling relation holding in $\phi_1$ also holds in
$\phi$.
2. If $\phi = \phi_1 \wedge \phi_2$, then labels in $tl(\phi_1)$ and
labels in $tl(\phi_2)$ are sibling.
3. If $\phi = \phi_1 \vee \phi_2$, then labels in $\phi_1$ and labels
in $\phi_2$ are not sibling.
Note that the sibling relation is not an equivalence relation. We
refer to a set of feature labels in $\phi$ that are mutually sibling
as a sibling label set of $\phi$. Now, we are ready to define a
dependency structure (DS).
Definition 4 A dependency structure $\psi$ is an FS that satisfies
the following condition:
Condition: Every sibling label set of $\psi$ includes exactly one
'pred' label.
The idea behind these definitions is that the value of a 'pred'
label is a content word appearing in the original sentence, and that
a sibling label set defines the dependency relation between content
words. Among the labels in a sibling label set, the values of the
labels other than 'pred' are dependent on (i.e., modify) the value
of the 'pred' label. A DS can be drawn as a tree structure where the
nodes are either content words or disjunction operators and the
edges represent the dependency relation.
Definition 5 A substructure of an FS $\phi$ is defined ($sub(\phi)$
stands for the set of all substructures of $\phi$):
1. NIL and $\phi$ itself are substructures of $\phi$.
2. If $\phi = a$ ($a \in A$), then $a$ is a substructure of $\phi$.
3. If $\phi = l : \phi_1$, then $sub(\phi_1)$ are substructures of
$\phi$.
4. If $\phi = \phi_1 \wedge \phi_2$, then for any
$\psi_1 \in sub(\phi_1)$ and for any $\psi_2 \in sub(\phi_2)$,
$\psi_1 \wedge \psi_2$ is a substructure of $\phi$.
5. If $\phi = \phi_1 \vee \phi_2$, then for any
$\psi_1 \in sub(\phi_1)$ and for any $\psi_2 \in sub(\phi_2)$,
$\psi_1 \vee \psi_2$ is a substructure of $\phi$.
[Figure 1: Example of structural matching, No.1. English: "She has
long hair." Japanese (gloss): she-GEN hair-TOP long. Dictionary pairs
are shown for 'she', 'long' and 'hair'.]
[Figure 2: Example of structural matching, No.2. English: "This child
is starving for parental love." Japanese (gloss): this child-TOP
parent-GEN love-DAT be-starving. Dictionary pairs are shown for
'this', 'child' and 'love'.]
[Figure 3: Example of structural matching, No.3. English: "Japan
benefits from free trade." Japanese (gloss): Japan-TOP free-trade-GEN
benefit-ACC receive. Dictionary pairs are shown for 'japan',
'benefit' and 'trade'.]
The DS derived from an FS is the maximum substructure of the FS that
satisfies the condition in Definition 4. The DS is uniquely
determined from an FS.
Definition 6 A disjunction-free maximal substructure of an FS $\phi$
is called a complete FS of $\phi$.
An FS does not usually have a unique complete FS. This concept is
important since the selection of a complete FS corresponds to
ambiguity resolution. Naturally, a maximal disjunction-free
substructure of a DS $\psi$ is again a DS and is called a complete
DS of $\psi$.
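The correspondence between complete structures and ambiguity resolution can be sketched as follows. The tuple encoding of structures, the function name `complete_structures`, and the example sentence are assumptions of this sketch. It reads "disjunction-free maximal substructure" as "keep exactly one disjunct at every OR node and everything else", which is our interpretation of Definitions 5 and 6 rather than a statement of the implemented procedure.

```python
from typing import Iterator, Tuple

# Tuple encoding used in this sketch (an assumption, not the paper's):
#   ("atom", value)
#   ("feat", label, body)
#   ("and", left, right)
#   ("or", left, right)
FS = Tuple


def complete_structures(fs: FS) -> Iterator[FS]:
    """Yield every structure obtained by choosing one disjunct per OR node."""
    kind = fs[0]
    if kind == "atom":
        yield fs
    elif kind == "feat":
        for body in complete_structures(fs[2]):
            yield ("feat", fs[1], body)
    elif kind == "and":
        for left in complete_structures(fs[1]):
            for right in complete_structures(fs[2]):
                yield ("and", left, right)
    elif kind == "or":
        # Each disjunct is a separate reading of the sentence.
        yield from complete_structures(fs[1])
        yield from complete_structures(fs[2])
    else:
        raise ValueError(f"unknown node kind: {kind!r}")


def word(w: str) -> FS:
    return ("feat", "pred", ("atom", w))


# Hypothetical attachment ambiguity: "telescope" modifies either the
# verb "see" or the noun "man".
attach_to_verb = ("and", ("feat", "obj", word("man")),
                  ("feat", "mod", word("telescope")))
attach_to_noun = ("feat", "obj",
                  ("and", word("man"), ("feat", "mod", word("telescope"))))
ambiguous = ("and", word("see"), ("or", attach_to_verb, attach_to_noun))

readings = list(complete_structures(ambiguous))
print(len(readings))  # 2
```

The number of complete structures grows multiplicatively with the number of OR nodes, which is one reason the matching below is organised as a search rather than an enumeration.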
Definition 7 A semi-complete DS of a DS $\psi$ is a substructure of
a complete DS of $\psi$ that satisfies the condition in Definition 4.
Note that a substructure of a DS is not neces- 
sarily a DS. This is why the definition requires the 
condition in Definition 4. 
A complete DS $\psi$ can be decomposed into a set of non-overlapping
semi-complete DSs. Such a decomposition defines the units of
structural matching and plays the key role in our problem.
Definition 8 A set of semi-complete DSs of a DS $\psi$,
$D = \{\psi_1, \ldots, \psi_n\}$, is called a decomposition of
$\psi$ iff every $\psi_i$ in the set contains at least one occurrence
of the 'pred' feature label, and every content word at a 'pred'
feature label appearing in $\psi$ is contained in exactly one
$\psi_i$.
Definition 9 The reduced DS of a DS $\psi$ with respect to a
decomposition $D = \{\psi_1, \ldots, \psi_n\}$ is constructed as
follows:
1. $\psi_i$ is transformed to a DS, 'pred : $S_i$', where $S_i$ is
the set of all content words appearing in $\psi_i$. This DS is
referred to as $red(\psi_i)$.
2. If there is a direct dependency relation between two content
words $w_1$ and $w_2$ that are in $\psi_i$ and $\psi_j$
($i \neq j$), then the dependency relation is allotted between
$\psi_i$ and $\psi_j$.
Although this definition should be described pre- 
cisely, we leave it with this more intuitive descrip- 
tion. Examples of dependency structures and re- 
duced dependency structures are found in Figures 
1, 2 and 3, where the decompositions are indicated 
by circles. 
It is not difficult to show that the reduced DS 
satisfies the condition of Definition 4. 
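The construction of a reduced DS can be sketched in a few lines once a complete DS is viewed as a tree of content words. In the sketch below, the tree is given as a map from each word to its head, a decomposition is a list of word sets, and the result is the set of dependency edges between the parts. The function name `reduce_ds`, this encoding, and the example decomposition of "She has long hair" are assumptions for illustration; the sketch does not check that each part is connected in the tree.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Part = FrozenSet[str]


def reduce_ds(head_of: Dict[str, Optional[str]],
              decomposition: List[Set[str]]) -> Set[Tuple[Part, Part]]:
    """Return the (dependent part, head part) edges of the reduced DS."""
    part_of: Dict[str, Part] = {}
    for words in decomposition:
        part = frozenset(words)
        for w in part:
            if w in part_of:
                raise ValueError(f"{w!r} occurs in more than one part")
            part_of[w] = part
    if set(part_of) != set(head_of):
        raise ValueError("every content word must be in exactly one part")

    edges: Set[Tuple[Part, Part]] = set()
    for w, head in head_of.items():
        # A dependency that crosses a part boundary is lifted to the parts.
        if head is not None and part_of[w] != part_of[head]:
            edges.add((part_of[w], part_of[head]))
    return edges


# Hypothetical dependency tree for "She has long hair" (root: have).
head_of = {"have": None, "she": "have", "hair": "have", "long": "hair"}
parts = [{"have", "hair"}, {"she"}, {"long"}]
reduced = reduce_ds(head_of, parts)
for dependent, head in sorted(reduced, key=lambda e: sorted(e[0])):
    print(sorted(dependent), "->", sorted(head))
```

Dependencies inside a part, such as the one between 'hair' and 'have' here, disappear in the reduced DS, so each part behaves as a single node.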
STRUCTURAL MATCHING OF BILINGUAL
DEPENDENCY STRUCTURES
The structural matching problem of bilingual sentences is now
defined formally.
Parsing parallel English and Japanese sentences 
results in feature structures, from which depen- 
dency structures are derived by removing unrelated 
features. 
Assume that $\psi_E$ and $\psi_J$ are dependency structures of
English and Japanese sentences. The structural matching is to find
the most plausible one-to-one mapping between a decomposition of a
complete DS of $\psi_E$ and a decomposition of a complete DS of
$\psi_J$, provided that the reduced DS of $\psi_E$ and the reduced DS
of $\psi_J$ w.r.t. the decompositions are isomorphic over the
dependency relation. The isomorphism imposes a natural one-to-one
correspondence on the dependency relations between the reduced DSs.
Generally, the mapping need not always be one- 
to-one, i.e., all elements in a decomposition need 
not map into another decomposition. When the 
mapping is not one-to-one, we assume that dummy 
nodes are inserted in the dependency structures so 
that the mapping naturally extends to be one-to- 
one. 
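The isomorphism requirement can be made concrete with a small check. The sketch below takes the edges of two reduced DSs, each written as (dependent, head), and a candidate mapping between their nodes, and tests whether the mapping is one-to-one and carries one edge set exactly onto the other. The function name, the string node names, and the dependency analyses of the Figure 1 sentence pair are assumptions of this sketch; dummy nodes are not modelled, and edges are treated as directed.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]  # (dependent, head)


def is_isomorphic_under(edges_e: Iterable[Edge],
                        edges_j: Iterable[Edge],
                        m: Dict[Hashable, Hashable]) -> bool:
    """True if m is one-to-one and maps the edges of E exactly onto J."""
    if len(set(m.values())) != len(m):
        return False  # two nodes share an image, so m is not one-to-one
    try:
        mapped = {(m[d], m[h]) for d, h in edges_e}
    except KeyError:
        return False  # some node of E has no image under m
    return mapped == set(edges_j)


# Hypothetical reduced DSs for "She has long hair" and its Japanese
# counterpart (gloss: she-GEN hair-TOP long). Japanese nodes carry "_j".
fine_e = [("she", "have+hair"), ("long", "have+hair")]
fine_j = [("she_j", "hair_j"), ("hair_j", "long_j")]
fine_m = {"she": "she_j", "have+hair": "hair_j", "long": "long_j"}

coarse_e = [("she", "have+hair+long")]
coarse_j = [("she_j", "hair_j+long_j")]
coarse_m = {"she": "she_j", "have+hair+long": "hair_j+long_j"}

print(is_isomorphic_under(fine_e, fine_j, fine_m))      # False
print(is_isomorphic_under(coarse_e, coarse_j, coarse_m))  # True
```

Under these assumed analyses the finer decomposition fails because 'long' depends on 'hair' in English while the dependency runs the other way in Japanese; grouping the words into one part on each side restores the isomorphism.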
When the decompositions of parallel sentences have such an
isomorphic one-to-one mapping, we assume that there are systematic
methods to compute similarity between corresponding elements in the
decompositions and to compute similarity between the corresponding
dependency relations.^3 We write the function defining the former
similarity as f, and that of the latter as g. Then, f is a function
from pairs of semi-complete DSs derived from English and Japanese
parallel sentences into a real number, and g is a function from
pairs of feature label sets of English and Japanese into a real
number.
^3 In the case of similarity between dependency relations, the
original feature labels are taken into account.
Definition 10 Given dependency structures, $DS_1$ and $DS_2$, of two
languages, the structural matching problem is to find an isomorphic
one-to-one mapping $m$ between decompositions of $DS_1$ and $DS_2$
that maximizes the sum of the values of similarity functions, $f$
and $g$.
That is, the problem is to find the function $m$ that maximizes
$$\sum_{d} f(d, m(d)) + \sum_{l} g(l, m(l))$$
where $d$ varies over semi-complete DSs of $DS_1$ and $l$ varies over
feature labels in $DS_1$.
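For a fixed mapping, the objective of Definition 10 is a plain sum, as the following sketch shows. The function name `matching_score`, the dictionaries used to represent the mapping on parts and on labels, and all numeric values in the toy example are assumptions for illustration only.

```python
from typing import Callable, Dict, Hashable


def matching_score(m_parts: Dict[Hashable, Hashable],
                   m_labels: Dict[str, str],
                   f: Callable[[Hashable, Hashable], float],
                   g: Callable[[str, str], float]) -> float:
    """Sum of f over matched parts plus sum of g over matched labels."""
    part_score = sum(f(d, image) for d, image in m_parts.items())
    label_score = sum(g(l, image) for l, image in m_labels.items())
    return part_score + label_score


# Toy similarity values (hypothetical numbers).
toy_f_table = {
    ("she", "she_j"): 6.0,
    ("have+hair+long", "hair_j+long_j"): 10.0,
}


def toy_f(d: Hashable, e: Hashable) -> float:
    return toy_f_table.get((d, e), 0.0)


def toy_g(l: str, k: str) -> float:
    # Constant zero: label similarity is ignored in this toy setting.
    return 0.0


m_parts = {"she": "she_j", "have+hair+long": "hair_j+long_j"}
m_labels = {"subj": "gen"}
print(matching_score(m_parts, m_labels, toy_f, toy_g))  # 16.0
```

The difficulty of the problem lies in searching over mappings and decompositions, not in evaluating any single candidate.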
The similarity functions can be defined in various ways. We assume
some similarity measure between Japanese and English words. For
instance, we assume that the similarity function f satisfies the
following principles:
1. f is a simple function defined by the similarity measure between
content words of two languages.
2. Fine-grained decompositions get a larger similarity measure than
coarse-grained decompositions.
3. Dummy nodes should give some negative value to f.
The first principle is to reduce the complexity of the structural
matching algorithm. The second is to obtain detailed structural
matching between parallel sentences and to avoid trivial results,
e.g., matching the whole DSs with each other. The third is to avoid
the introduction of dummy nodes whenever possible.
The function g should be defined according to the language pair.
Although feature labels represent grammatical relations between
content words or phrases and may provide useful information for
measuring similarity, we do not use this information at our current
stage. The reason is that we found it difficult to have a clear view
of the relationship between feature labels of English and Japanese
and of the meaning of feature labels between semi-complete
dependency structures.
STRUCTURAL MATCHING 
ALGORITHM 
The structural matching of two dependency structures is a
combinatorially difficult problem. We apply the branch-and-bound
method to solve the problem.
The branch-and-bound algorithm is a top-down depth-first
backtracking algorithm for search problems. It looks for the answers
with the best score. In each new step, it estimates the maximum
value of the expected scores along the current path and compares it
with the currently known best score. The maximum expected score is
usually calculated by solving a simplified problem that is
guaranteed to give a value not less than the best score attainable
along the current path. If the maximum expectation is less than the
currently known best score, there is no chance of finding better
answers by pursuing the path. The algorithm then gives up the
current path and backtracks to try the remaining paths.
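The general scheme can be illustrated on a much simpler problem than tree matching. The sketch below applies depth-first branch-and-bound to a one-to-one assignment between the rows and columns of a square score matrix. The simplified problem used for the bound lets every remaining row take its best free column while ignoring the one-to-one constraint, so the estimate never falls below the truly attainable score. The function name and the choice of problem are assumptions of this sketch; it is not the matching algorithm of this paper.

```python
from typing import List, Set, Tuple


def branch_and_bound(score: List[List[float]]) -> Tuple[float, List[int]]:
    """Assign each row a distinct column so that the total score is maximal."""
    n = len(score)
    best_value = float("-inf")
    best_columns: List[int] = []

    def upper_bound(row: int, used: Set[int]) -> float:
        # Relaxation: remaining rows may share columns, so this is optimistic.
        return sum(
            max(score[r][c] for c in range(n) if c not in used)
            for r in range(row, n)
        )

    def search(row: int, used: Set[int], partial: float,
               chosen: List[int]) -> None:
        nonlocal best_value, best_columns
        if row == n:
            if partial > best_value:
                best_value, best_columns = partial, chosen[:]
            return
        if partial + upper_bound(row, used) < best_value:
            return  # the current path cannot beat the best known answer
        for col in range(n):
            if col in used:
                continue
            used.add(col)
            chosen.append(col)
            search(row + 1, used, partial + score[row][col], chosen)
            chosen.pop()      # backtrack
            used.remove(col)

    search(0, set(), 0.0, [])
    return best_value, best_columns


toy = [[6, 0, 0],
       [0, 0, 6],
       [0, 6, 0]]
print(branch_and_bound(toy))  # (18.0, [0, 2, 1])
```

In this toy run, once the assignment with score 18 is found, both remaining choices for the first row are cut off by the bound without being expanded.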
We regard a dependency structure as a tree structure that includes
disjunction (OR nodes), and call a content word a node and a
dependency relation an edge. Then a semi-complete dependency
structure corresponds to a connected subgraph in the tree.
The matching of two dependency trees starts from the top nodes and
the matching process goes along edges of the trees. During the
matching process, three types of nondeterminism arise:
1. Selection of top-most subgraphs in both of the trees (i.e.,
selection of a semi-complete DS)
2. Selection of edges in both of the trees to decide the
correspondence of dependency relations
3. Selection of one of the disjuncts at an 'OR' node
While the matching is done top-down, the exact score of the matched
subgraphs is calculated using the similarity function f.^4 When the
matching process proceeds to the selection of the second type, it
selects an edge in each of the dependency trees. The maximum
expected score of matching the subtrees under the selected edges is
calculated from the sets of content words in the subtrees. The
calculation method of the maximum expected score is defined in some
relation with the similarity function f.
Suppose h is the function that gives the maximum expected score of
two subgraphs. Also, suppose B and P are the currently known best
score and the total score of the already matched subgraphs,
respectively. If s and t are the subgraphs under the selected edges
and s' and t' are the whole remaining subgraphs, the matching under
s and t will be undertaken further only when the following
inequality holds:
$$P + h(s, t) + h(s', t') > B$$
Any selection of edges that does not satisfy this inequality cannot
provide better matching than the currently known best ones.
^4 We do not take into account the similarity measure between
dependency relations, as stated in the preceding section.
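The pruning test itself is a one-line comparison; what matters is that h never underestimates. The sketch below pairs the test with one possible optimistic estimate computed from the content-word sets alone: each word on one side is credited with its best non-negative similarity to any word on the other side. This particular h is an assumption of the sketch, chosen only because it cannot underestimate a score built from word-pair similarities; the estimate actually used is not spelled out in the text. The romanized word `boeki` is a hypothetical stand-in for a Japanese entry.

```python
from typing import Callable, Iterable

Sim = Callable[[str, str], float]


def optimistic_score(words_s: Iterable[str], words_t: Iterable[str],
                     sim: Sim) -> float:
    """An upper estimate of the score attainable between two word sets."""
    s, t = list(words_s), list(words_t)
    if not s or not t:
        return 0.0
    # Each word in s takes its best partner in t; negative values are
    # clamped to zero so the estimate stays optimistic.
    return float(sum(max(0.0, max(sim(a, b) for b in t)) for a in s))


def worth_expanding(p: float, b: float,
                    h_selected: float, h_remaining: float) -> bool:
    """The test P + h(s, t) + h(s', t') > B."""
    return p + h_selected + h_remaining > b


toy_pairs = {("trade", "boeki")}


def toy_sim(a: str, b: str) -> float:
    return 6.0 if (a, b) in toy_pairs else 0.0


h_st = optimistic_score(["free", "trade"], ["boeki"], toy_sim)
print(h_st)                                     # 6.0
print(worth_expanding(6.0, 14.0, h_st, 0.0))    # False
print(worth_expanding(6.0, 10.0, h_st, 0.0))    # True
```

A tighter estimate prunes more of the search, at the price of more computation per step.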
All three types of nondeterminism are treated simply as
nondeterminism in the algorithm.
The syntactic ambiguities in the dependency structures are resolved
spontaneously when the matching with the best score is obtained.
EXPERIMENTS 
We have tested the structural matching algorithm with 82 pairs of
sample sentences randomly selected from a Japanese-English
dictionary.
We used a machine readable Japanese-English dictionary (Shimizu 79)
and Roget's thesaurus (Roget 11) to measure the similarity of pairs
of content words, which is used to define the function f.
Similarity of word pairs 
Given a pair of Japanese and English sentences, we use two methods
to measure the similarity between Japanese and English content words
appearing in the sentences.
For each Japanese content word $w_J$ appearing in the Japanese
sentence, we can find a set of translatable English words from the
Japanese-English dictionary. When the Japanese word is a polysemous
word, we select an English word from each polysemous entry. Let
$C_{EJ}$ be the set of such translatable English words of $w_J$.
Suppose $C_E$ is the set of content words in the English sentence.
The translatable pairs of $w_J$, $Tp(w_J)$, are defined as follows:
$$Tp(w_J) = \{(w_J, w_E) \mid w_E \in C_E \cap C_{EJ}\}$$
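The set of translatable pairs is a set intersection, as the following sketch shows. The function name, the dictionary layout, and the romanized toy entries `kami` and `nagai` are hypothetical; they stand in for entries of a real Japanese-English dictionary.

```python
from typing import Dict, Set, Tuple


def translatable_pairs(w_j: str,
                       dictionary: Dict[str, Set[str]],
                       c_e: Set[str]) -> Set[Tuple[str, str]]:
    """Pairs (w_j, w_e) with w_e both in the sentence and in the dictionary."""
    c_ej = dictionary.get(w_j, set())   # translatable English words of w_j
    return {(w_j, w_e) for w_e in c_e & c_ej}


# Toy dictionary with hypothetical romanized entries.
toy_dictionary = {
    "kami": {"hair", "paper"},
    "nagai": {"long", "lengthy"},
}
english_content_words = {"she", "have", "long", "hair"}

print(translatable_pairs("kami", toy_dictionary, english_content_words))
print(translatable_pairs("nagai", toy_dictionary, english_content_words))
```

A Japanese word that is missing from the dictionary simply yields no translatable pair.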
We use Roget's thesaurus to measure similarity of other word pairs.
Roget's thesaurus is regarded as a tree structure where words are
allocated at the leaves of the tree. For each Japanese content word
$w_J$ appearing in the Japanese sentence, we can define the set of
translatable English words of $w_J$, $C_{EJ}$. From each English
word in the set, the minimum distance to each of the English content
words appearing in the English sentence is measured.^5 This minimum
distance defines the similarity between pairs of Japanese and
English words.
We decided to use this similarity only for esti- 
mating dissimilarity between Japanese and English 
word pairs. We set a predetermined threshold dis- 
tance. If the minimal distance exceeds the thresh- 
old, the exceeded distance is counted as the nega- 
tive similarity. 
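A minimal sketch of the distance computation follows, assuming that each word occupies a single leaf and that the thesaurus is given as a map from a word to its path from the root. The category names in the toy thesaurus are invented for illustration and are not Roget's actual categories; the function names and the treatment of words missing from the thesaurus (no penalty) are likewise assumptions.

```python
from typing import Dict, Iterable, Sequence


def tree_distance(path_a: Sequence[str], path_b: Sequence[str]) -> int:
    """Length of the shortest path between two leaves of a tree."""
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    # Climb from each leaf to the deepest shared ancestor.
    return (len(path_a) - common) + (len(path_b) - common)


def excess_distance(c_ej: Iterable[str], w_e: str,
                    leaf_path: Dict[str, Sequence[str]],
                    threshold: int) -> int:
    """Amount by which the minimum distance exceeds the threshold, or 0."""
    if w_e not in leaf_path:
        return 0
    distances = [tree_distance(leaf_path[e], leaf_path[w_e])
                 for e in c_ej if e in leaf_path]
    if not distances:
        return 0
    return max(0, min(distances) - threshold)


# Toy thesaurus: each path runs from the root down to the word itself.
toy_paths = {
    "hair": ("matter", "organic", "body", "hair"),
    "fur": ("matter", "organic", "body", "fur"),
    "paper": ("matter", "inorganic", "material", "paper"),
    "long": ("space", "dimension", "length", "long"),
}

print(tree_distance(toy_paths["hair"], toy_paths["fur"]))       # 2
print(excess_distance({"fur", "paper"}, "hair", toy_paths, 4))  # 0
print(excess_distance({"paper"}, "long", toy_paths, 4))         # 4
```

Only the excess over the threshold is reported, which matches the decision to use the thesaurus as evidence of dissimilarity rather than of similarity.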
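The similarity of two words $w_1$ and $w_2$ appearing in the given
pair of sentences, $sim((w_1, w_2))$, is defined as follows:
$$sim((w_1, w_2)) = \begin{cases}
6 & \text{if } (w_1, w_2) \in Tp(w_1) \text{ or } (w_2, w_1) \in Tp(w_2) \\
-k & \text{if } (w_1, w_2) \notin Tp(w_1) \text{ and } (w_2, w_1) \notin Tp(w_2) \text{ and the distance between } w_1 \text{ and } w_2 \text{ exceeds the threshold by } k \\
0 & \text{otherwise}
\end{cases}$$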
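The three cases translate directly into code. In the sketch below, `tp` is assumed to be the union of the translatable-pair sets of all Japanese words in the sentence, and `excess` is assumed to return the amount by which the thesaurus distance of a word pair exceeds the threshold (0 if it does not). Both are supplied by the caller; the names `make_sim`, `tp`, `excess`, and the toy values are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

Pair = Tuple[str, str]


def make_sim(tp: Set[Pair],
             excess: Callable[[str, str], int]) -> Callable[[str, str], int]:
    """Build the word-pair similarity from dictionary pairs and distances."""
    def sim(w1: str, w2: str) -> int:
        if (w1, w2) in tp or (w2, w1) in tp:
            return 6                      # translatable pair
        k = excess(w1, w2)
        return -k if k > 0 else 0         # penalise only distant pairs
    return sim


# Toy data with hypothetical romanized Japanese words.
toy_tp = {("kami", "hair"), ("nagai", "long")}
toy_excess_table: Dict[Pair, int] = {("kami", "long"): 4}


def toy_excess(w1: str, w2: str) -> int:
    return toy_excess_table.get((w1, w2), toy_excess_table.get((w2, w1), 0))


sim = make_sim(toy_tp, toy_excess)
print(sim("kami", "hair"))   # 6
print(sim("kami", "long"))   # -4
print(sim("nagai", "hair"))  # 0
```

Because the dictionary test is tried in both argument orders, the resulting function is symmetric whenever `excess` is.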
Similarity of semi-complete DSs 
The similarity between corresponding semi-complete DSs is defined
based on the similarity between the content words. Suppose that s
and t are semi-complete DSs to be matched, and that $V_s$ and $V_t$
are the sets of content words in s and t. Let $A$ be the smaller of
$V_s$ and $V_t$ and $B$ be the other ($|A| \le |B|$). For each
injection $p$ from $A$ into $B$, the set of word pairs $D$ derived
from $p$ can be defined as follows.
$$D = \{(a, p(a)) \mid a \in A\}$$
Now, we define the similarity function f over Japanese and English
semi-complete DSs to give the maximum value of the following
expression over all possible injections:
$$f(s, t) = \max_{p} \left( \sum_{(a, b) \in D} sim((a, b)) \right) \times 0.95^{|V_s| + |V_t| - 2}$$
The summation gives the maximum sum of the similarity of the content
words in s and t. The factor 0.95 is the penalty applied when
semi-complete DSs with more than one content word are used in the
matching.
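A brute-force sketch of f enumerates every injection from the smaller word set into the larger one. The exponent used for the penalty, the total number of content words minus two, is an assumption of this sketch, chosen so that matching two single-word structures carries no penalty. The word similarity is assumed to be symmetric, and the names and romanized toy words are hypothetical.

```python
from itertools import permutations
from typing import Callable, Sequence


def f(words_s: Sequence[str], words_t: Sequence[str],
      sim: Callable[[str, str], float], penalty: float = 0.95) -> float:
    """Best injection score, discounted for multi-word structures."""
    # a is the smaller word list, b the other one.
    a, b = sorted([list(words_s), list(words_t)], key=len)
    # permutations(b, len(a)) enumerates every injection from a into b.
    best = max(
        sum(sim(x, y) for x, y in zip(a, image))
        for image in permutations(b, len(a))
    )
    return best * penalty ** (len(a) + len(b) - 2)


toy_pairs = {frozenset(("hair", "kami")), frozenset(("long", "nagai"))}


def toy_sim(x: str, y: str) -> float:
    return 6.0 if frozenset((x, y)) in toy_pairs else 0.0


print(f(["hair"], ["kami"], toy_sim))  # 6.0
print(f(["have", "hair", "long"], ["kami", "nagai"], toy_sim))
```

Enumerating injections is exponential in the size of the smaller set, which is acceptable here only because semi-complete DSs contain few content words.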
Figures 1, 2 and 3 show the results of the structural matching
algorithm, in which the translatable pairs obtained from the
Japanese-English dictionary are shown by the equations.
^5 The distance between words is the length of the shortest path in
the thesaurus tree.
Table 1: Results of experiments
Parsing Japanese and English sentences
    Number of sentences        82
    Parse failure              23
    Parsable                   59
Correct parsability
    Correct parse              53    89.8% (53/59)
    Incorrect parse             6    10.2% (6/59)
The match with the best score includes
    Correct matching           47    89% (47/53)
    No correct matching         6    11% (6/53)
    Single correct matching    34    64% (34/53)
Results of the experiments 
We used 82 pairs of Japanese and English sentences appearing in a
Japanese-English dictionary. The results were checked and examined
in detail by hand. Some of the sentences are not parsable because of
the limited coverage of our current grammars. Although 59 of the
pairs are parsable, 6 of them do not include correct parse results.
The structural matching algorithm with the setting described above
is applied to the remaining 53 pairs. In 6 of them, the correct
matching is not included in the best rated answers. The other 47
pairs include the correct matching, of which 34 pairs result in the
correct matching uniquely. Table 1 summarizes the results.
EVALUATION AND DISCUSSION 
Although the number of sentences used in the experiments is small,
the result shows that about two thirds (34 of 53) of the correctly
parsed pairs give the unique matching, in which every syntactic
ambiguity is resolved.
The cases where no correct matching was obtained need to be
examined. Some sentences contain an idiomatic expression whose
syntactic structure is completely different from the sentence
structure of the other language. Such an expression can in no way be
matched correctly unless the whole structures are matched intact.
Other cases are caused by complex sentences that include an embedded
sentence. When the verbs at the roots of the dependency trees are
irrelevant to each other, extraordinary matchings are produced. We
do not intend to use our method to match complex or compound
sentences as a whole. We will rather use our method to find
structural matching between simple sentences or verb phrases of two
languages.
The matching problem of complex sentences is regarded as a different
problem, though a similar technique is usable. We think that the
scores of matched phrases will help to identify the corresponding
phrases when we match complex sentences.
Taking the sources of other errors into consider- 
ation, possible improvements are: 
1. Enhancement of English and Japanese gram- 
mars for wider coverage and lower error rate. 
2. Introduction of more precise similarity mea- 
surement of content words. 
3. Utilization of grammatical information: 
• Feature labels, for estimating matching 
plausibility of dependency relations 
• Part of speech, for measuring matching 
plausibility of content words 
• Other grammatical information: mood, 
voice, etc. 
The first two improvements are undoubtedly important. As for the
similarity measurement of content words, completely different
approaches such as statistical methods may be useful to get good
translatable pairs (Brown 90), (Gale 91).
Various grammatical information is kept in the feature descriptions
produced in the parsing process. However, we should be very prudent
in using it. Since English and Japanese are grammatically quite
different, some grammatical relations may not be preserved between
them. In Figure 3, solid arrows and circles show the correct
matching. While 'benefit' matches with the structure consisting of
the Japanese words glossed as 'benefit' and 'receive', their
dependent words, 'trade' and its Japanese counterpart, modify them
as a verb modifier and as a noun modifier respectively, so the
grammatical relations are quite different.
This example highlights another interesting point. Dotted arrows and
circles show another matching with the same highest score. In this
case, 'japan' is taken as a verb. This rather strange interpretation
insists that 'japan' matches with two Japanese words, one of which
is the word for 'Japan'. Since 'japan' as a verb has little semantic
relation with the Japanese word for 'Japan' as a country,
discrimination of part-of-speech seems to be useful. On the other
hand, the correspondence between 'benefit' and its Japanese
counterpart is found in their noun entry in the dictionary. Since
'benefit' is used as a verb in the sentence, taking part-of-speech
into consideration may also jeopardize the correct matching. The
fact that the verb and noun usages of 'benefit' bear a common
concept implies that more precise similarity measurement will solve
this particular problem. Since the two interpretations of the sample
English sentence are in different moods, imperative and declarative,
the mood of a sentence is also useful to remove irrelevant
interpretations.
CONCLUSIONS 
The structural matching problem of parallel texts has been formally
defined, and our current implementation and experiments have been
introduced. Although the research is at a preliminary stage and has
a very simple setting, the experiments have shown a number of
interesting results. The method is easily enhanced by improving the
grammars and by incorporating more accurate similarity measurement.
A number of other studies on building translation dictionaries and
on determining similarity relationships between words are useful for
improving our method.
To extract useful information from bilingual corpora, structural
matching is indispensable for language pairs like English and
Japanese that have quite different linguistic structures.
Incidentally, we have found that this dissimilarity plays an
important role in resolving syntactic ambiguities, since the sources
of ambiguities in English and Japanese sentences in many cases do
not coincide (Utsuro 92). We are currently working on extracting
verbal case frames of Japanese from the results of structural
matching of a Japanese-English corpus (Utsuro 93). The same
technique is naturally applicable to acquiring verbal case frames of
English as well. Another application we are envisaging is to extract
translation patterns from the results of structural matching.
We plan to work on the possible improvements discussed in the
preceding section, and will make large scale experiments using
translated newspaper articles, based on the phrase matching
strategy.
ACKNOWLEDGMENTS 
This work is partly supported by the Grants from Ministry of
Education, "Knowledge Science" (#03245103).

REFERENCES 
Brown, P.F., et al., A Statistical Approach to Machine Translation,
Computational Linguistics, Vol.16, No.2, pp.79-85, 1990.
Brown, P.F., Lai, J.C. and Mercer, R.L., Aligning Sentences in
Parallel Corpora, ACL-91, pp.169-176, 1991.
Dagan, I., Itai, A. and Schwall, U., Two Languages are More
Informative than One, ACL-91, pp.130-137, 1991.
Gale, W.A. and Church, K.W., A Program for Aligning Sentences in
Bilingual Corpora, ACL-91, pp.177-184, 1991a.
Gale, W.A. and Church, K.W., Identifying Word Correspondences in
Parallel Texts, '91 DARPA Speech and Natural Language Workshop,
pp.152-157, 1991.
Kaji, H., Kida, Y., and Morimoto, Y., Learning Translation Templates
from Bilingual Text, COLING-92, pp.672-678, 1992.
Kasper, R., A Unification Method for Disjunctive Feature
Descriptions, ACL-87, pp.235-242, 1987.
Klavans, J. and Tzoukermann, E., The BICORD System: Combining
Lexical Information from Bilingual Corpora and Machine Readable
Dictionaries, COLING-90, pp.174-179, 1990.
Miller, G.A., et al., Five Papers on WordNet, Cognitive Science
Laboratory, Princeton University, CSL Report 43, July 1990.
Roget, S.R., Roget's Thesaurus, Crowell Co., 1911.
Sadler, V., The Textual Knowledge Bank: Design, Construction,
Applications, Proc. International Workshop on Fundamental Research
for the Future Generation of Natural Language Processing (FGNLP),
pp.17-32, Kyoto, Japan, 1991.
Shimizu, M., et al. (ed.), Japanese-English Dictionary, Kodansha,
1979.
Utsuro, T., Matsumoto, Y., and Nagao, M., Lexical Knowledge
Acquisition from Bilingual Corpora, COLING-92, pp.581-587, 1992.
Utsuro, T., Matsumoto, Y., and Nagao, M., Verbal Case Frame
Acquisition from Bilingual Corpora, to appear in IJCAI-93, 1993.
