Acquisition of Phrase-level Bilingual Correspondence 
using Dependency Structure 
Kaoru Yamamoto and Yuji Matsumoto 
Graduate School of Information Science, 
Nara Institute of Science and Technology, 
8916-5, Takayama-cho, Ikoma-shi, Nara, Japan 
Abstract 
This paper describes a method to find phrase- 
level translation patterns from parallel corpora 
by applying dependency structure analysis. We 
use statistical dependency parsers to determine 
dependency relations between base phrases in a 
sentence. Our method is tested with a business expression corpus containing 10000 English-Japanese sentence pairs and achieves approximately 90% accuracy in extracting bilingual correspondences. The result shows that the use of dependency relations helps to acquire interesting translation patterns. 
1 Introduction 
Since the advent of statistical methods in Machine Translation, bilingual sentence alignment (Brown et al., 1991) and word alignment (Dagan et al., 1992) have been explored with numerous successes over the last decade. In contrast, fewer results are reported on phrase-level correspondence. As word sequences are not translated literally word for word, acquiring phrase-level correspondence remains an important problem to be explored. 
This paper proposes a method to extract phrase-level correspondence from sentence-aligned parallel corpora using statistically probable dependency relations, i.e. head-modifier relations in a sentence. 
The distinct characteristics of our approach are two-fold. First, our approach uses dependency relations rather than the alignment, cognate and/or position heuristics previously applied (Melamed, 1995). Our approach is based on the assumption that word ordering and positions may not necessarily coincide between the two languages, but that the dependency structure between words will be preserved. We believe that dependency relations offer richer linguistic clues (syntactic information) and are effective for language pairs with different word-ordering constraints. 
Secondly, statistical dependency parsers are used to obtain candidate patterns. Previous methods mostly use rule-based parsers for preprocessing (Matsumoto et al., 1993; Kitamura and Matsumoto, 1995). The progress in parsing technology is noteworthy, and in particular, various statistical dependency models have been proposed (Collins, 1997; Ratnaparkhi, 1997; Charniak, 2000). A statistical parser has advantages over its rule-based counterpart in that it achieves wider coverage, does not require maintaining consistency in rule writing, and is robust to domain changes. We conjecture that our approach improves coverage and robustness by the use of statistical dependency parsers. 
In this paper, we aim to investigate the efficacy of statistically probable dependency structure in finding phrase-level bilingual correspondence. Though our discussion will proceed for English-Japanese phrasal correspondence, the proposed approach is applicable to any pair of languages. 
This paper is organised as follows: In the 
next section, we present the overview of our ap- 
proach. In Sections 3 and 4, components are 
elaborated in detail. In Section 5, the experiment and results are given. In Section 6, we compare our approach with related work, and finally our 
findings are concluded in Section 7. 
2 Overview of Our Approach 
Our approach presupposes a sentence-aligned parallel corpus. The task is divided into two steps: a monolingual step in which candidate patterns are generated by use of dependency relations, and a bilingual step in which these candidate patterns from each language are paired 
    J sentences ==(aligned)== E sentences
         |                        |
  Candidate Generator     Candidate Generator
         |                        |
    J candidates             E candidates
          \______________________/
                     |
             Phrase Matching
                     |
          JE translation patterns

Figure 1: flow of our approach 
with their translations. Figure 1 shows the flow 
of our method. 
Our primary aim is to investigate the effec- 
tiveness of dependency structures in the mono- 
lingual candidate generation step. For this rea- 
son, the bilingual step borrows the weighted 
Dice coefficient and greedy determination from 
(Kitamura and Matsumoto, 1996). 
In the following sections, we explain each step 
in detail. 
3 Dependency-Preserving Candidate 
Patterns 
Dependency grammar and related paradigms (Hudson, 1984) focus on individual words and their relationships. In this framework, every phrase is regarded as consisting of a governor and dependants, where dependants may be optionally classified further. The syntactically dominating word is selected as the governor, with modifiers and complements acting as dependants. Dependency structures are suitably depicted as a directed acyclic graph (DAG), where arrows point from dependants to governors. 
We use a maximum likelihood model proposed in (Fujio and Matsumoto, 1998), where the dependency probability between segments is determined based on their co-occurrence and distance. The model has the constraints that (a) dependencies do not cross, and (b) each segment has at least one governor¹. Furthermore, the model has an option to allow multiple dependencies whose probabilities are above a certain confidence. This is useful for cases where phrasal dependencies cannot be determined correctly using only syntactic information. It has the effect of improving recall by sacrificing precision, and may yield more partially correct results useful for our candidate pattern generation. 

¹ except for the 'root' segment. For Japanese, the 'root' segment is the rightmost segment. For English, 
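A minimal sketch of this multiple-dependency option, assuming governor candidates for a segment come with maximum-likelihood probabilities (`select_governors` is a hypothetical helper, not the authors' code; the 0.5 cutoff follows the confidence criterion described below):

```python
def select_governors(scored, ratio=0.5):
    """Keep the statistically best governor for a segment, plus any
    lower-ranked candidate whose probability is at least `ratio` times
    that of the next-better candidate (the multiple-dependency option)."""
    ranked = sorted(scored, key=lambda g: g[1], reverse=True)
    kept = ranked[:1]
    # Walk down the ranking; stop at the first large probability drop.
    for (gov, p), (_, p_prev) in zip(ranked[1:], ranked):
        if p_prev > 0 and p / p_prev >= ratio:
            kept.append((gov, p))
        else:
            break
    return kept
```

The best-one model described below corresponds to taking only `kept[:1]`; the ambiguous model keeps the full list.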
We adopt the following units as segments. For English, (a) a preposition or conjunction is grouped into the succeeding baseNP², and (b) auxiliary verbs are grouped into the succeeding main verb. For Japanese, a segment is one (or a sequence of) content word(s) optionally followed by function words³. 
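As an illustration, English chunking of this kind can be sketched with regular expressions over Penn-Treebank-style POS tags; the tag patterns below are simplified stand-ins for the actual rules:

```python
import re

# Simplified stand-in rules: a preposition or conjunction is folded into
# the following baseNP, and auxiliaries into the following verb group.
RULES = [
    re.compile(r"((IN|CC) )?(DT )?(JJ )*(NN[SP]? )+"),  # e.g. [in the park]
    re.compile(r"(MD )*(VB[DGNPZ]? )+"),                # e.g. [will have]
    re.compile(r"PRP "),                                # e.g. [I]
]

def chunk(tagged):
    """Greedily segment (word, tag) pairs, taking the longest rule match."""
    segments, i = [], 0
    while i < len(tagged):
        tail = "".join(tag + " " for _, tag in tagged[i:])
        span = 1
        for rule in RULES:
            m = rule.match(tail)
            if m:
                span = max(span, len(m.group(0).split()))
        segments.append(" ".join(w for w, _ in tagged[i:i + span]))
        i += span
    return segments
```

Japanese segmentation into bunsetsu would follow the same scheme, with content-word/function-word tag patterns instead.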
Having chunked sentences into suitable segments, they are parsed to obtain dependency relations. We have set up the following three models: 
1. best-one model : uses only the most likely (statistically best) dependency relations. At most one dependency is allowed for each segment. 

2. ambiguous model : uses dependency relations above a certain confidence score, 0.5⁴. Multiple dependencies may be considered for each segment. 

3. adjacent model : uses only adjacency relations between segments. A segment is adjacent to the previous segment. 
In the ambiguous model, we expect that more likely dependency relations will appear more frequently in a large corpus, thereby increasing the correlation score. Hence, ambiguity at the parsing phase will hopefully be resolved in the following bilingual pairing phase. As for the adjacent model, only chunking and its adjacency are used. 
Finally, dependency relations between segments are used to generate candidate patterns. 
the segment that contains the main verb is regarded as the 'root' segment. 
² a baseNP or 'minimal' NP is a non-recursive NP, i.e. none of its child constituents are NPs. 
³ often referred to as a bunsetsu. 
⁴ statistically-not-the-best dependencies are also included if 

    prob((k+1)-th ranked dependency) / prob(k-th ranked dependency) >= 0.5    (1) 
[I] [saw] [a girl] [in the park] 
size 1) {I, saw, girl, park} 
size 2) {I_saw, girl_saw, in-park_saw} 
size 3) {I_girl_saw(T), I_in-park_saw(T)} 

Figure 2: best-one model 

[I] [saw] [a girl] [in the park] 
size 1) {I, saw, girl, park} 
size 2) {I_saw, girl_saw, in-park_saw, in-park_girl} 
size 3) {I_girl_saw(T), I_in-park_saw(T), in-park_girl_saw(L)} 

Figure 3: ambiguous model 
In this paper, the dependency size of a candidate pattern designates the number of segments connected through dependency relations. Figures 2, 3, and 4 illustrate examples of English candidate patterns of dependency sizes 1, 2 and 3 for the proposed dependency models. 

In a dependency-connected candidate pattern, the function words of the governor segment are dropped. This is to cope with data sparseness in generated candidate patterns. Moreover, two types of DAGs can be generated from patterns of size 3, and we use DAG-type tags ('L' and 'T') to distinguish them. We also note that candidate patterns do not necessarily follow the word ordering of the original sentences. 
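For concreteness, the size-1 to size-3 pattern sets shown in Figures 2-4 can be generated from a list of segment heads and a set of (dependant, governor) arcs roughly as follows (a sketch; function-word dropping is omitted):

```python
from itertools import combinations

def candidate_patterns(heads, arcs):
    """Enumerate dependency-connected patterns of sizes 1-3.
    `heads` are the content-word heads of the segments; `arcs` are
    (dependant, governor) pairs.  A size-3 pattern carries a DAG-type
    tag: 'T' when two dependants share a governor, 'L' for a chain."""
    size1 = set(heads)
    size2 = {f"{d}_{g}" for d, g in arcs}
    size3 = set()
    for (d1, g1), (d2, g2) in combinations(arcs, 2):
        if g1 == g2:          # T-type: d1 -> g, d2 -> g
            size3.add(f"{d1}_{d2}_{g1}(T)")
        elif g1 == d2:        # L-type chain: d1 -> d2 -> g2
            size3.add(f"{d1}_{d2}_{g2}(L)")
        elif g2 == d1:        # L-type chain: d2 -> d1 -> g1
            size3.add(f"{d2}_{d1}_{g1}(L)")
    return size1, size2, size3
```

With the adjacent-model arcs for "[I] [saw] [a girl] [in the park]" -- (saw, I), (girl, saw), (in-park, girl) -- this reproduces the pattern sets of Figure 4.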
The algorithm is as follows: 

Input: a corpus, the minimum occurrence threshold fmin, and the dependency size dw. 

For each sentence in the corpus, process the following: 

1. Part-of-Speech Tagging 
2. Chunking: Rules are written as regular expressions defined over POS/word sequences. 
3. Dependency Analysis 
4. Candidate Pattern Generation: Candidate patterns are generated and stored with their sentence ID. Dependency-connected patterns of size less than or equal to dw are extracted. 
[I] [saw] [a girl] [in the park] 
size 1) {I, saw, girl, park} 
size 2) {saw_I, girl_saw, in-park_girl} 
size 3) {girl_saw_I(L), in-park_girl_saw(L)} 

Figure 4: adjacent model 
Output: a hash-table that maps candidate patterns appearing at least the minimum occurrence fmin to the IDs of the sentences in which they are found in the corpus. 
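Sketched in code, with a hypothetical `analysed` input standing in for the tagging, chunking and parsing steps (only sizes 1-2 are generated here for brevity):

```python
from collections import defaultdict

def patterns_up_to(heads, arcs, d_w=3):
    """Dependency-connected candidate patterns (sizes 1-2 sketched)."""
    out = list(heads)
    if d_w >= 2:
        out += [f"{d}_{g}" for d, g in arcs]
    return out

def build_pattern_table(analysed, f_min=2):
    """Map each candidate pattern to the IDs of the sentences it occurs
    in, keeping only patterns seen at least f_min times."""
    table = defaultdict(set)
    for sid, (heads, arcs) in enumerate(analysed):
        for p in patterns_up_to(heads, arcs):
            table[p].add(sid)
    return {p: ids for p, ids in table.items() if len(ids) >= f_min}
```

One such table is built independently for each language; the two tables are the input to the bilingual pairing step of Section 4.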
4 Phrase-level Correspondence 
Acquisition 
Pairing of candidate patterns is a combinatorial problem, and we take the following tactics to reduce the search space. First, our algorithm works in a greedy manner. This means that a translation pair determined in an early stage of the algorithm will never be considered again. 
Secondly, a filtering process is incorporated. Figure 5 illustrates filtering for the sentence pair "I saw a girl in the park" and its Japanese translation. The set of candidate patterns derived from English is depicted on the left, while that from Japanese is on the right. Once the pair "I_girl_saw(T)" and its Japanese counterpart is determined as a translation pair, the algorithm assumes that the Japanese pattern will not be paired with any other candidate patterns related to "I_girl_saw(T)" (cancelled by diagonal lines in Figure 5) for this sentence pair. The operation effectively discards the found pairs and causes recalculation of correlation scores in the subsequent iterations. 
As mentioned in Section 2, our correlation score is calculated by the weighted Dice coefficient, defined as: 

    sim(pe, pj) = log2(fej) * (2 fej) / (fj + fe) 

where fj and fe are the numbers of occurrences in the Japanese and English corpora respectively, and fej is the number of co-occurrences. 
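A direct transcription of the score (note the log2 weight means a pair must co-occur at least twice to get a positive score):

```python
from math import log2

def sim(f_j, f_e, f_ej):
    """Weighted Dice coefficient: the plain Dice score 2*f_ej/(f_j + f_e),
    scaled by log2(f_ej) so that frequent co-occurrence outweighs
    rare-but-exclusive co-occurrence."""
    return log2(f_ej) * 2 * f_ej / (f_j + f_e)
```

A pattern pair with fj = fe = fej = 2 scores exactly 1.0000, the floor value that appears repeatedly in Table 6.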
The algorithm is as follows: 

Input: hash-tables of candidate patterns for each language, the initial frequency threshold fcurr, and the final frequency threshold fmin. 
Figure 5: Filtering with word correspondences 
Repeat the following until fcurr reaches fmin. 

1. For each pair of an English candidate pe and a Japanese candidate pj appearing at least fcurr times, identify the most likely correspondences according to the correlation scores. 

   • For an English pattern pe, obtain the correspondence candidate set PJ = { pj1, pj2, ..., pjn } such that sim(pe, pjk) > log2 fmin for all k. Similarly, obtain the correspondence candidate set PE for a Japanese pattern pj. 
   • Register (pe, pj) as a translation pair if pj = argmax_{pjk in PJ} sim(pe, pjk) and pe = argmax_{pek in PE} sim(pj, pek), i.e. the correlation score of (pe, pj) is the highest among PJ for pe and among PE for pj. 

2. Filter out the co-occurrence positions for pe, pj, and related candidate patterns. 

3. Lower the threshold fcurr if no more pairs are found with the current fcurr. 
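The mutual-best condition can be approximated with a single greedy pass over score-sorted pairs (a simplification: the actual algorithm recomputes scores after each filtering step and lowers fcurr gradually):

```python
from math import log2

def pair_patterns(scores, f_min=2):
    """Greedily register mutually-best pairs above the log2(f_min) floor.
    `scores` maps (pe, pj) -> correlation score.  Taking pairs in
    descending score order guarantees each registered pair is the best
    remaining candidate for both of its members."""
    floor = log2(f_min)
    pairs, used_e, used_j = [], set(), set()
    for (pe, pj), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if s > floor and pe not in used_e and pj not in used_j:
            pairs.append((pe, pj))
            used_e.add(pe)
            used_j.add(pj)
    return pairs
```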
5 Experiment and Result 
5.1 Experimental Setting 
We use a business expression corpus (Takubo and Hashimoto, 1995) containing 10000 sentence pairs which are pre-aligned. 

The NLP tools used are summarised in Table 1. 

Parameter settings are as follows: the dependency size dw is set to 3. Initially, fcurr and fmin are set to 100 and 2 respectively. As the algorithm proceeds, fcurr is adjusted to half of its previous value if it is greater than 10. Otherwise fcurr is 
preprocessing    tool         
POS (E)          ChaSen 2.0   96% precision 
POS (J)          ChaSen 2.0   97% precision 
chunking (E)     SNPlex 1.0   rule-based 
chunking (J)     Unit         rule-based 
dependency (E)   edep         trial system 
dependency (J)   jdep         85-87% precision 

Table 1: NLP tools 
decremented by 1. If the number of registered translation pairs is less than 10, then fcurr is lowered in the next iteration. All parameters are empirically chosen. 
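The threshold schedule can be written as a one-line helper. (One reading of the rule that is consistent with the thresholds appearing in Tables 2-4: halve while the halved value stays above 10, otherwise step down by 1.)

```python
def next_threshold(f_curr):
    """Threshold schedule for fcurr: halve while the result stays above
    10, then decrement by 1 down toward the final threshold fmin = 2."""
    half = f_curr // 2
    return half if half > 10 else f_curr - 1
```

Starting from 100 this yields 100, 50, 25, 12, 11, 10, 9, ..., 2; intermediate values such as 11 simply produce no table row when no new pairs are registered there.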
5.2 Result 
Our approach is evaluated by the metrics defined below: 

    precision = count(pt) / count(px) 

    coverage = sum_{pt} ( length(pt) * cofreq(pt) ) / sum_{pl} occur(pl) 

Precision measures the correctness of extracted translation pairs, while coverage measures the proportion of correct translation pairs in the parallel corpora. Let X be a pattern: count(X) gives the number of X returned, occur(X) gives the number of occurrences of X in each corpus, length(X) gives the dependency size of X, and cofreq(X) gives the number of co-occurrences in the parallel corpora. px denotes the extracted patterns, of which the correct ones are designated as pt; pl denotes the candidate patterns generated from each side of the parallel corpora. Coverage is calculated for English 
th       correct  extracted  c/e     precision 
25       6        6          100.00  100.00 
12       7        7          100.00  100.00 
10       6        7          85.71   95.00 
9        4        4          100.00  95.83 
8        13       13         100.00  97.29 
7        10       13         76.92   92.00 
6        19       20         95.00   92.85 
5        29       29         100.00  94.94 
4        67       72         93.05   94.15 
3        150      164        91.46   92.83 
2        414      461        89.80   91.08 
(*2      264      474        55.69   77.93) 
total    725      796        --      91.08 
(*total  989      1269       --      77.93) 

Table 2: Precision: best-one model 
th       correct  extracted  c/e     precision 
25       6        6          100.00  100.00 
12       7        7          100.00  100.00 
10       6        7          85.71   95.00 
9        4        4          100.00  95.83 
8        13       13         100.00  97.29 
7        11       13         84.61   94.00 
6        18       19         94.73   94.20 
5        29       29         100.00  95.91 
4        68       73         93.15   94.73 
3        118      126        93.65   94.27 
2        432      468        91.50   93.07 
(*2      256      759        33.72   63.51) 
total    712      765        --      93.07 
(*total  968      1524       --      63.51) 

Table 3: Precision: ambiguous model 
and Japanese separately, and then their mean is taken. 
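As a sanity check, precision is a straight ratio; the totals of Table 2, for instance, give the reported ~91% (the helper names below are ours, not the authors'):

```python
def precision(n_correct, n_extracted):
    """Percentage of extracted translation pairs that are correct."""
    return 100.0 * n_correct / n_extracted

def coverage(correct, length, cofreq, occur):
    """Correct pairs' weighted share of all candidate occurrences: the sum
    of dependency-size * co-frequency over correct patterns, divided by
    the total occurrences of all candidate patterns."""
    covered = sum(length[p] * cofreq[p] for p in correct)
    return 100.0 * covered / sum(occur.values())
```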
Precision for each model is summarised in Ta- 
bles 2, 3, and 4, while coverage is shown in Table 
5. To examine the characteristics of each model, 
we expand correspondence candidate sets PE 
and Pa so that patterns '5 with tile correlation 
score > log2 2 (> 1) are also considered. These 
are marked by asterisks "*" in Tables. 
Random samples of correct and near-correct translation pairs are shown in Tables 6 and 7 respectively. Extracted translation pairs are matched against the original corpora to restore their word ordering. This restoration is done manually this time, but can be automated with little modification to our algorithm. 
⁵ i.e. patterns where fej = fe = fj = fmin = 2 
th       correct  extracted  c/e     precision 
25       6        6          100.00  100.00 
12       7        7          100.00  100.00 
10       6        7          85.71   95.00 
9        4        4          100.00  95.83 
8        13       13         100.00  97.29 
7        10       13         76.92   92.00 
6        18       19         94.73   92.75 
5        29       29         100.00  94.89 
4        68       73         93.15   94.15 
3        114      126        93.65   92.59 
2        419      484        86.57   88.86 
(*2      280      496        56.45   76.27) 
total    694      781        --      88.86 
(*total  974      1277       --      76.27) 

Table 4: Precision: adjacent model 
model        English  Japanese  coverage 
best-one     18.16 %  18.43 %   18.29 % 
best-one*    19.12 %  19.59 %   19.13 % 
ambiguous    18.63 %  18.82 %   18.72 % 
ambiguous*   19.57 %  19.95 %   19.76 % 
adjacent     17.74 %  18.03 %   17.88 % 
adjacent*    18.69 %  19.20 %   18.94 % 

Table 5: Coverage 
5.3 Discussion 
As we see from Tal)le 2 and 3, the t)est one 
model adfieves 1)etter precision than the adja- 
cent model. Upon inspecting the results, nearly 
the same translation patterns are extracted for 
higher thresholds. This is because our depen- 
dency parsers use the distance feature in deter- 
mining dependency. Consequently, nearer seg- 
ments are likely to 1)e dependency-related. Ex- 
periment data shows that tile exact overlaps 
are found in 9348 out of 14705 (63.55%) candi- 
(late patterns for English and 6625 out of 11566 
(57.27%) for Japanese. 
However, the difference appears when the threshold reaches 3, and patterns such as "not+hesitate+to_contact", which are not found in the adjacent model, are extracted. Moreover, the best-one model is better in terms of coverage. These results support that dependency relations provide more useful clues than mere linear order. 
Comparing the best-one model with the ambiguous model, the ambiguous model achieves a higher precision except for *2. This indicates 
English                                   score 
thank+you                                 4.7037 
consultations+include                     2.3219 
apply+for_the_position                    2.2157 
thank+you+in_advance                      1.6000 
not+hesitate+to_contact                   1.6000 
be+enclosed+a_copy                        1.0566 
be_writing+to_let+know                    1.0566 
applications+include                      1.0000 
upcoming_board+of_directors'_meeting      1.0000 
will_have+to_cancel                       1.0000 
have+high_hope                            1.0000 
business+is_expanded                      1.0000 
we+have_learned+from_your_fax             1.0000 
leaving+in+about_ten_days                 1.0000 
get+you+in_close_business_relationship    1.0000 
we+are_inquiring+regarding                1.0000 
pay+special_attention                     1.0000 

Table 6: random samples of correct translation patterns in the best-one model. "+" indicates a segment separator and "_" indicates a morpheme separator. 
English 
(have_been_pleased)+to_serve+as_their_main_banker 
[be_held]+at_hotel_new_otani 
assets_position+(in_good_shape) 
(have_been_placed)+into_our_file 
(put)+one_month_limit 
[passed]+on_past_tuesday 

Table 7: random samples of near-correct translation patterns where the score is 1.000. Segments to be deleted to become correct patterns are enclosed in "( )". Segments to be added are enclosed in "[ ]". 
that the accuracy dependency parsers currently achieve is insufficient, and it is therefore better to expand the possibilities of candidate patterns by allowing redundant dependency relations. As dependency parsers improve, the best-one model should outperform the ambiguous model. However, as the result of *2 shows, candidates from redundant dependency relations are mostly extracted at low thresholds. The overall trend reveals that redundant relations act as noise at low thresholds, but help to scale up the correlation score at higher thresholds. 
As shown in Table 6, a domain-specific disambiguation sample ("thank+you" vs. "thank+you+in_advance", which correspond to distinct Japanese expressions) is found. As for long-distance dependency-related translation patterns, ga-case (nominative) and verb patterns (e.g. "consultations+include") are extracted⁶. Other types of long-distance translation patterns, such as wo-case (accusative) and verb patterns (e.g. "be_held+at X"), are not extracted even though candidate patterns from each corpus are generated. 

⁶ A typical Japanese sentence follows S-O-V structure, 
Generally speaking, acquiring long-distance translation patterns is a hard problem. We still require further investigation examining under what circumstances dependency relations are really effective. So far, we use a relatively "clean" business expression corpus, which is a collection of standard usage. However, in a real-world setting, more repetitions and variations will be observed. Adjuncts can be placed in a less constrained way, and the adjacent model cannot deal with them if they are apart. In such cases, the availability of robust dependency parsers becomes essential: dependency relations play a key role in finding the long-distance translation patterns. 
while the English counterpart follows S-V-O structure. 
6 Related Work 
Smadja et al. (1996) find rigid and flexible collocations. They first identify candidate collocations in English, and subsequently find the corresponding French collocations by gradually expanding the candidate word sequences. Kitamura and Matsumoto (1996) enumerate word sequences of arbitrary length (n-grams of content words) that appear more often than a minimum threshold in English and Japanese, and attempt to find correspondences based on the prepared candidate lists. 

The difference from Smadja et al. (1996) is that our method is bi-directional, and the difference from Kitamura and Matsumoto (1996) is that we use dependency relations, which lead to "structured" phrasal correspondence as opposed to "flat" adjacent correspondence. 
On the other hand, Matsumoto et al. (1993), Kitamura and Matsumoto (1995) and Meyers et al. (1996) use dependency structure for structural matching of sentences to acquire translation rules. Their methods employ grammar-based parsers and only work for declarative sentences. Their objective is complete matching of the dependency trees of the two languages. 

Instead, our method uses statistical dependency parsers and is not restricted to simple sentences as input. Furthermore, we are concerned with partial matching of dependency trees, so that the overall robustness and coverage are improved. 
7 Conclusion 
In this paper, we propose a method to find 
phrase-level bilingual correspondence using de- 
pendency structure from parallel corpora. We 
have conducted a preliminary experiment with 
10000 business sentence pairs of English and 
Japanese and achieved approximately 90% pre- 
cision. 
Though a fuller investigation is still required, our findings show that dependency relations serve as useful linguistic clues in the task of phrase-level bilingual correspondence acquisition. 

References 

P.F. Brown, J.C. Lai, and R.L. Mercer. 1991. Aligning sentences in parallel corpora. In ACL-91: 29th Annual Meeting of the Association for Computational Linguistics, pages 169-176. 

E. Charniak. 2000. A maximum-entropy-inspired parser. In NAACL-2000: 1st Meeting of the North American Chapter of the Association for Computational Linguistics, pages 132-139. 

M.J. Collins. 1997. Three generative, lexicalised models for statistical parsing. In ACL-97: 35th Annual Meeting of the Association for Computational Linguistics, pages 16-23. 

I. Dagan, K. Church, and W. Gale. 1992. Robust bilingual word alignment for machine aided translation. In Proc. of the Workshop on Very Large Corpora, pages 1-8. 

M. Fujio and Y. Matsumoto. 1998. Japanese dependency structure analysis based on lexicalized statistics. In Proc. of 3rd Conf. on Empirical Methods in Natural Language Processing, pages 88-96. 

R. Hudson. 1984. Word Grammar. Blackwell. 

M. Kitamura and Y. Matsumoto. 1995. A machine translation system based on translation rules acquired from parallel corpora. In Proc. of Recent Advances in Natural Language Processing, pages 27-44. 

M. Kitamura and Y. Matsumoto. 1996. Automatic extraction of word sequence correspondences in parallel corpora. In Proc. of 4th Workshop on Very Large Corpora, pages 79-87. 

Y. Matsumoto, H. Ishimoto, and T. Utsuro. 1993. Structural matching of parallel texts. In ACL-93: 31st Annual Meeting of the Association for Computational Linguistics, pages 23-30. 

I.D. Melamed. 1995. Automatic evaluation and uniform filter cascades for inducing n-best translation lexicons. In Proc. of 3rd Workshop on Very Large Corpora, pages 184-198. 

A. Ratnaparkhi. 1997. A linear observed time statistical parser based on maximum entropy models. In Proc. of 2nd Conf. on Empirical Methods in Natural Language Processing, pages 1-10. 

K. Takubo and M. Hashimoto. 1995. A Dictionary of English Business Letter Expressions. Nihon Keizai Shimbun, Inc. 
