Constituent lloundary Parsing for Exanll)lo-lkised Maclhine Tr,'inslation 
Osamu FUI,~I.JSF. and llitoshi \[IDA 
ATR Interpreting Telocomnlunications Research Laboratories 
Abstract 
This paper i)roposes an effective parsing nicthod for 
examlile-based machine transhltiOl~. In this method, an 
input string is parsed by the tOl)-down aplflication of 
linguistic patterns consisting ol variables and 
constituent boundaries. A constituent boundary is 
expressed by either a functional word or a l)art-of..speech 
bigram. When structural ambiguity occurs, the most 
plausible structure is selected usin b, tile total values of 
distance calculations in tile oxanll)le-basod Iraillework. 
Transfer-Driven Machine Translation (TDMT) achieves 
efficient aitd robust translation within the example-based 
framework by adopting this parsing method. Using bi- 
directional translation between Japanese and Vnglish> tile 
effectiveness of this method in TDMT is nlso shown. 
1 Introduction 
I-xample-basod franieworks are increasingly being 
applied to machiilo translatioi/, since th0y c~.ill l)rovido 
efficient and robust processing (Nagao, 1984; Sate, 
1991; Sumita, 1992; Furuse, 1992; Watanabe, 1992). 
However, in order to make tilt best use o1 the a(.lv:.lnlages 
of an example-based fl'amcwork, it is essential to 
effectively integrate an example-based method anti source 
language analysis. Unfortunately, whcll all exainl)le- 
based nletiiod ix combined with a SOUFC0 lnnguago 
analysis inelhod having cOlnl)lox l~r~illilliflr rules, pulling 
a heavy load eli translalion, the advai/lai;os of lhe 
example-based franiowork iilay l)e ruined. To achieve 
efficient and robnst processing by the exanii)lc-basod 
framework, a lot of sttldies have beell nlado for the 
pui\])ose of combining source lal!gtiage analysis with all 
example-based method, lind of efficiently covering the 
analyzed source langilllge strtiCttlro by me;illS of trailsfcr 
knowledge (Grishman, 1992; Jollcs, 1992; McLean, 
1992; Manlyama, 1992, 1993; Nirenburg 1993). 
One wily to reduce tilt load of source langua!,,c 
analysis ix to directly apply trallSl'cr knowledge to all 
input siring, which sinlultaneously executes both 
siruciinal parsing aiM transfer knowlc.dgo al)lHication 
through pattorll-il/atchii/g, l:'allerll-nlalchi~ig does liot rise 
grainillaticaI symbols such as "Notlil Pliraso", but uses 
surfi.ice words an(\] non-granlmalical synlbols. Therefore, 
in patlern-matching, rule coml)otition is reduced, and 
linguistic structure is expressed in a simpler manner 
thall ill gramnmr-based parsing. Thus, pattern-nlatcifing 
achieves efficient 1)arsing. It is also useful in treating 
spoken language, which sometimes deviates from 
convcntion:ll grammar, while grammar-based p,'lrsing has 
difficulty treating ilnreslricle(l spoken I\[ingllll,ge. 
This pal)Or proposes a constituom boundary parsing 
method based on paltorn-niatching, and shows its 
effeclivonoss for spoken langnago translation within the 
exaniple-I)asod framework. In otlr parsing method, aii 
inl)Ut string .is applied linguistic patterns e×pressing 
some linguistic constitticnts and their bonnds-lrios, in a 
top-down f:.tshion. \Vhon structural anlbiguity occurs, 
tile most phlusi/)lo structure is selected rising the total 
vahios of dislanco calculations in tilt example-based 
lrs-Illiowork. Shico the description of a linguistic ps-ittern 
is sinlplo, it is easy to update by adding f0etlback. 
A constiLuonl boundary ixusing method using nuitual 
illfoiillation i~ l)roposed in (M,'lgerlflan 1990). This 
method accouilts for the unrestricted lls-ltLlra\] langtlage and 
is efficient, llowever, it tends to be illacctirate> and 
difficult, to ad(l feedback to, since it completely depends 
on st'ltistical information withoul, resort to a linguistic 
viewpoint. On the cont,ary> in order to achieve accurate 
parsing and Iransb'ition, our conslituent boundary parsing 
method implicitly incorporates grammatical information 
into p'ltterns, e.g. constituent boundary description by a 
i)art-of-sl)eech bigrani, and classification of i)ailerns 
according lo linb, uislic levels such s.ls simple sentence 
,tlrld tlOtHI l)hrase. 
Tlallsfer-Orivell Maehillo TranslatiOll (TI)MT) 
(\[:tlrtiso> 1992, 1994) uses tile COl/Stil.llont botlndary 
1)a~sint ,, liielhod l)l'eSollto(l in this paper, as an alternative 
to glamliiar-based ali:.ilysis, aiKI lliakos the i)ost ilSe of 
the ex:lmplo-based framework. A bidirectional translation 
syslcnl between Jap,'lnesc lind English for dialogue 
sentences concerning international conference 
regislralions has been illlplenlented (Sobashima, 1994). 
l~xperimonts with the systonl have shown ollr parsing 
iiicthod I() t~ effcctive. 
Section 2 defines patterns expressed by variables and 
con.<;liluont boundaries. Section 3 OXl)lains a method for 
derivin{, possible English structures. Soelion 4 explain'4 
structural disanibi,gnaliOti using tlislanco calculations in 
Iho o×anilflo-b,'lsed framework. Section 5 exphlins an 
example of Japanese sent0nee analysis using our 
consliluont boundary parsing method> and Section 6 
705 
reports on the experimental resulLs. 
2 Pattern 
A pattern represents meaningful units for linguistic 
structure and transfer in TDMT, and is defined as a 
sequence that consists of variables and synrbols 
representing constituent boundaries. A variable 
corresponds to some linguistic constituent, and a 
constituent boundary does not allow any two variables 
to be adjacent. A constituent boundary is expressed by 
either a functional word or a part-of-speech bigram 
marker l 
The explanations in this anti the subsequent two 
sections, use English sentence parsing. 
2.1 Part-of-speech 
Table 1 shows tile English parts-of-speech, currently 
used in our English-to-Japanese TDMT system. This 
part-of-speech system does not necessarily agree with 
that of conventional grammar. 
Table 1 English parts-of-speech 
~of-speech abbreviation example 
adjective adj large 
adverb adv exactly 
interjection i nterj oh 
common noun noun bus 
numeral num eleven 
proper noun propn Kyoto 
pronotm pron I 
wh-word wh what 
verb verb go 
be-verb be is 
auxiliary verb aux crm 
preposition prep ca 
conjunction co nj bta 
determiner det the 
suffix suffix a.m. 
In this part-of-st)eech system, a be-verb, auxiliary 
verb, preposition, conjtmction, deterntiner, and suffix, 
are classified into a functional word. 
2.2 Constituent I)()ulldary marke," exl)ressed 
by a functional word 
One problem with pattern descriptions using surface 
1 In this paper, variables, actual words, and part-of- 
speech abbreviations are expressed in calfital letters, 
italics, and gothic, respectively. 
words is the necessity for a large number Of patterns. To 
snppress the nnnecessary patterns, the surface words in 
patterns are in principle restricted to functional words, 
which occur frequently, and which modify or relate 
content words 2. 
Fnr instance, the expression, "go to the station" is 
divided into two constituents "go" and "the station", 
and the l)reposition, "to" can be identified as a 
constituent boundary. Therefore, in parsing "go to the 
station", we use tile l)attem, "X to Y ", which has two 
variables X and Y, and a constituent boundary, "to." 
2.3 Constituent I)oundary marker expressed 
by a pa,'t-nf-sl)eech hig,'anl 
The expression "1 go" can be divided into two 
constituents 'T' and "go." But it has no surface word 
that divides tile expression into two constituents. In this 
case, a part-of-speech bigr,'un is used as a constituent 
boundary. 
Suppose th,qt a constituent X is immediately followed 
by a constituent Y. We express a boundary-marker 
between X and Y by A-B, where A is a part-of-speech 
abbreviation of X's last word, and B is a 1)art-of-speech 
abbreviation of Y's first word. For instance, 'T' and 
"go" are a pronoun and a verb, respectively, so the 
marker "pron-verb" is inserted as abot, ndary marker into 
"1 go". Namely, "I pron-verb go", i.e. with the 
boundary marker inserted into the original input, 
matches tile pattern "X pron-verb Y." 
2.4 Linguistic level 
Patterns are classified into (lffferent linguistic levels 
to limit the explosion of structural ambiguity during 
parsing. Table 2 shows typical linguistic levels in 
F.nglish patterns. 
Table 2 Typical levels in English patterns 
level exan_!p_le 
beginning phrase excuse me but X 
conlpotlnd sentence X when V 
simple sentence I would like to X 
verl) phrase X at Y 
noun phrase XofY, XatY 
c()mpound word X o'clock 
2 Exceptions are canned expressions such as '7 would 
like to" and "in front of', or frdquent content words 
such as "what." 
106 
In Table 2, beginning phrase is the highest level, and 
compound word is the lowest. A variable on a given 
level is instantiated by a string described on that same 
level or on a lower level. For instance, in the noun 
phrase "X of Y ", the variables, X and Y cannot be 
instantiated by a simple sentence. 
3 Derivation of Possible Structures 
The algorithnl for constituertt l)oundary parsing is as 
follows; 
(A) Assignment of morphological inRn'nmtion to each 
woM of an input string 
(B) Insertion of constituent boundary nmrkcrs 
(C) Derivation of possible structures by top-down 
pattern matching 
(D) Structural disambiguation by distance calculation 
Note: we will explain (A), (B) and (C) in this section, 
and (D) in the next section, usirlg die following English 
sentence; 
(1) "The bus leaves Kyoto at eleven a.m," 
3.1 Assignment of nlorphohlgical 
int'ormathtn 
First, each word of the input string is assigned 
morphological information, such as its part-ol'-sl)eech 
and conjugation fc.rm. Through tiffs assignnient, we can 
get the lollowing part-of-speech sequence for (1). 
(2) dot, noun, verb, propn, prop, num, suffix 
hi addition, each word is also assigned a thesaurus 
code for distance calcnhltions ,'lnd ,'ill index for retrieving 
l)atterns. For instance, "bits" has a thesaurus code 
corresponding to tile semantic attribute 'car.' Moreover, 
from the word "(it", we can obtain the index to the 
pattern "X (at Y", whicll is found for both verb phrase 
and nOl.ln phrase. 
.3.2 Marker hiserlic, n 
A constituent boundary marker is inserted in an input 
string for pattern-matching. The marker is extracted \[rein 
the part-of-speech sequence of an input sentence. Since 
such bigrams as dot-noun belong to the same 
constituent, marker insertion by a part-of-sl)eech bigram 
is restricted according to the items below. 
(a) Neither A nor B is a part-of-speech relating two 
constituents, such as a preposition 
(b) A is not a l)art-of-speech nlodifying a latter 
constituent, such :.is a dotorinh/or. 
(c) B is not a l)art-of-sI)eech modifying a previous 
constituent, such as a suffix. 
We mainttlin a list of p:lrt-of-speech bigrams that are 
eligible as marke,'s because they satisfy the above 
conditions. Of the bigrams in (2), "det-noun", "propn- 
prep", "prop-nora", and "nun>suffix", vioklte the above 
conditions, and are of course excluded. Thus, only 
"noun-verb" and "verb-propn" are inserted into sentence 
(1), as shown in (3). 
(3) "The bus noun-verb leaves verb-propn Kyoto 
at eleven a.tn." 
3.3 al)l)liealhm of Ilaltel'ns 
Our pattern-nlatchhlg nlethod parses an inpilt 
sentence in a top-down fashion. The highest level 
patterns of the input sentence :.ire applied first; then 
lmtterns at lower levels are applied. The application 
procedure is as follows. 
(I) Get indices to patterns from each woM of the 
sentence. With these indices, patterns are retrieved 
and chocked to determine if each of them can match 
tile sentence. Then exectlte (II). 
(ll)Try to apply the highest-level patterns first. If 
there is a pattern tlmt can be applied, execute (1II) 
with respect to the variable bindings. Otherwise, 
exectite (IV). 
(Ill)Try to apply surface words (content words 
registered in a dictionary). If lhe al)lflicalion 
succeeds, the application fo, that portion is 
finished successfully. ()thcrwise, execute (I1). 
(IV) If the pattern to be applied is at the lowest level, 
the api)lication fails. Otherwise, lower tile level of 
the patterns and execute (II). 
If pattern al~plication finishes successfully for all 
portions o\[" an input sentence, one or more source 
strttctures are obtained: since there is a possibility that 
more ttmn one pattern can be apl)lied to an expression in 
step (II), structural ambiguity may occur. We seek all 
possible structures by breadth-first application, and 
select the most plausible structure by the total distance 
value (See Section 4.4). 
107 
In step (I), indices to possible patterns :-ire obtained 
from several words and bigrams in the marker-inserted 
sentence (3), as shown in Table 3. 
Table 3 ReUieved patterns from (3) 
word 
the 
noun-verb 
verb-propn 
at 
a. ?l'l. 
retrieved pattern (lilmuistic level)_ 
tt, e X (compound word) 
X noun-verb Y (simple sentence) 
X verb-propn Y (verb pltrasc) 
X at Y (verb phr:~se, noun phrase) 
X a.m. (corot×rand word) 
After step (I) is finished, steps (II)-(IV) are repeated 
recursively. First, the highest level pattern of the input 
sentence is applied. This is "X noun-verb Y ", which is 
defined at the simple sentence level. Next, an attempt is 
made to apply patterns to the variable bindings "the 
bus" and "leaves verb-propn Kyoto at eleven a.m.", 
which are bound to variables X and Y, respectively. To 
"the bus", at compound word level p'tttern "the X " is 
applied first, and the surface word "bus" is applied to 
proso "tile bus." Likewise, patterns and suri'aee words 
are appliecl Io tile remaining part, and tile al~plic:-nion is 
finished successfully. 
The pattern "X at Y " is found for both verb phrase 
and noun phrase. "leaves verb-propn Kyoto at eleven 
a.m." thus has two possible structures, by the 
application of "X at Y." "X verb-propn Y " at the verb 
phrase level and "X a.m." at compotmd word level, are 
also applied. Fig. 1 is tile tree representation derived 
from the structure for sentence (1) where "X at Y " is a 
veal) phrase, while Fig. 2 is a tree representation derived 
from the slrnctllre in which "X at Y " is a noun phrase. 
A boldfilce denotes the head part in each pattent. This 
infer,nation is t, lilizcd for extracting an input for 
distance calculations (See section 4.3). 
X noun-verb Y 
/ k 
the X X at Y 
I / \ 
bus X verb-propn Y X a.m. 
I I I 
loaves Kyoto eleven 
Fig. 1 Structure in wltich "X at Y " is a verb phrase 
X noun-verb Y / \ 
the X X verb-propn Y 
I I 
bus loaves 
\ 
X atY 
I \ 
Kyoto X a.m. 
I 
eleven 
Fig. 2 Struclure in which "X at Y " is a noun phrase 
tile thes:mrus, and varies from 0 to 1. Tim value 0 
indicates that two semantic attributes belong to exactly 
the same category, and 1 indicates that they :-/re 
tmrclated. 
An expression consists of words. The distance 
between expressions is the sum of the (listance between 
words multiplied by each weight. 
The distance is calculated quickly bectutse of the 
simple mechanism employed. (Sumita, 1992) and 
(Furuse, 1992, 1994) give a clctailcd account of tile 
distance calculation mechanism we are aclopting. 
4 Distance Calculatitm 
In this ,ruction, a nlethod for structural 
disaml)iguation utilizing dist,'mce calculation, is 
described. 
4.1 Distance 
The distance between two words is retluced to the 
distance between their respective sem;mtic attributes in a 
thesaurus. Words have associated thesaurus codes, which 
correspond to partietflar semantic attributes. The distance 
between the semantic attributes is determined according 
to the relationship of their positions in the hierarchy of 
4.2 Best-match by distance calcul:ltinn 
The advantages of an example-based framework are 
mainly due to the distance calctdation, which achieves 
the bcst-malch operation between tile input and provided 
examples. 
In TDMT, translation is performed by applying 
stored empirical Iransl'er knowledge. In TDMT transfer 
knowledge, each source pattern has example words of 
variables and possible target patterns. The most 
• qppropriate target pattern is selected according to the 
calculated distance between, the input words and the 
example words. The English pattern "X at Y " at the 
verb phrase level, corresponds to several possible 
108 
Japanese expressions, as shown in the folhlwing 
English-to-Japanese transfer knowledge: 
XatY => Y' de X' ((present, conference)..), 
Y' ni X' ((stay, hotel)..), 
Y' we X' ((look, it)..) 
The first possible target pattern is " Y' de X' ", with 
example set ((present, cotg'erenee)..). We will see that 
this target pattern is likely to be selected to the extent 
that the input variable bindings are semanticqlly similar 
to the example elements "present" and "coati're|Ice." 
Within this pattern, X' is the target word correslx)nding 
to X, tile result of transfer. "preset, l" and "con/~reaee" 
are sample bindings for " X at Y ", where X = 
"present", and Y = "conference". The al)ove transfer 
knowledge is compiled from such translation examples 
as the source-target pair of " presem a paper at the 
conference" and "kaigi de ronbun wo happ),ou-st~ru", 
where "kaigi" means "conference" and "happyou-sltru" 
means "present". 
Tilt semantic distance from the input is calculated for 
all examples. Then lhe example with the least distance 
from the input is chosen, and the target expresskm of 
that example is extracted. If the input is closest to 
(stay, hotel), "Y' ni X' " is chosen as the target 
express ion. 
The enrichment of examples increases tile aCc,lracy Of 
determining the target expression and structure because 
conditions become more dclailed. 
4.3 lnl)ut of' distance calculation 
An input for distance ealcuh.ltion consists of head 
words in variable parts. In "X at Y " for the structure in 
Fig. l, X and Y are substitumd \[or the compound 
expressions, "leaves verb-propn Kyoto" a1~d "eleven 
a.m.", respectively. In such eases, it is necessary to 
extract head words as the input for the disEmce 
calculation about "X at Y ". 
In order to get head words, tile head part is (lcsignawd 
in each pattern (boldface in Figs. 1 and 2). For inslance, 
the t)attern "X vorb-propn Y II e(li)t;lillg the information 
that X is a head part. So the head of "leaves verb-propn 
Kyoto" is "leaves", and tile head or "x a.m." is 
"a.m.". Thus, in "X at Y " for Ihe strncture in Fig. 1, 
the ini)ut of the distance calculation is (leaves, a.m.). 
Table 4 shows tile result of distance cqlculation in "X 
at Y " in Fig. 1. The most plausible target structure 
"Y' ni X' " and its distance value 0.17 are obtained by 
the dislance calculation. 
Head words are passed upward from lower palterns to 
higher 1)atterns. Since the head of the verb phrase 
pattern, "X at Y " is assigned te X, the head of "leaves 
verb-propn Kyoto at eleven a.m." is "leaves", which 
is tile head of "leaves wrb-propn Kyoto". The head of 
"the bus" is "bus" fi'om the head information that the 
Table 4 Result of distance calculation in 
"X a/Y " in lqg. 1 
input:(leave, a.m.) 
AL~J£ELeXxl)ression closest example and |IS value :~ 
Y' de X' (arrive, a.m.) O. 17 
Y' ni X' (serve, reception) 0.67 
Y' we X' (look, it) 1.00 
head of "the X " is X. Thus, rite input of tile distance 
calculation of "X noun-verb Y " is (bits, leave). 
4.4 SI,'uetural dis:mlbignation 
Distance calculqtion selects not only the most 
l)lausible target expression but also the most plausible 
source structure. When .strtlcttlral aml)iguity occttrs, the 
most apllrOl)riate structure is selected by comt)uting tl~o 
totals for all possible combinations of partizfl distance 
values. The structure with the least total distance is 
judged most consistent wilh empirical knowledge, and is 
chosen as Ihe most 1)lausil)le structure (Furuse 1992, 
1994; Sumita 1993). 
Table 5 shows the result of each partial distance 
talc|Ha|ion for tile structure in Fig. 1. l:mm Table 5, we 
V.Ct Ihe total distance value 1.17 for the structure in 
l:it;. 1. 
Table 5 Result of each partial distance calculation 
for tile slructure in I,'ig. 1 
souiee chosen l~lr..~c:\[ distance val,lg 
the X X' 0.33 
X rlotJrl-vorb Y X' wa Y' 0.67 
X verb-propn Y Y' we X' 0.00 
X .t Y Y" ni X' 0.17 
X a.m. gozeJ~ X'ji 0.00 
The difference in total distance value I)etween two 
l)OSsible structures for sentence (1) is due only to the 
distance value of "X at Y ", for the structure in Figs. 1 
and 2. For the strucltne in Fig. 2, the distance valtl0 of 
"X at Y " at tile neun phrase level is given as 0.83, as 
shown in Table 6, and is given a total distance ef 1.83. 
Thus, the structure in Fig. 1 is selected as the 
3 The:.;e vii\]ties were col//pu,ed based on Ihe present 
transfer knowledge of the T1)MT system. 
appropriate restflt because it has the least total distance knowledge for the pattern "X pron-noun Y "; 
value. 
Table 6 Result of distance calcul,ltion in 
"X at Y " in Fig. 2 
input:(Kyoto, a.m.) 
target expression ¢losest exampl0 and its value 
Y' no X' (room, hotel) 0.83 
Y' deno X' (language, conference) 1.00 
In macbine translation, it ix important to 
disambiguate tbe possible structures, l)ecause a difference 
in structure may bring about a translation difference. For 
instance, the structures in Figs.1 and 2 give different 
Japanese translations (4) and (5), respectively. (4) is 
selected because it is generated from the best structure 
with the least total distance value. 
(4) basu wa gozen 11 ji ni Kyoto we de masu 4 
(5) basu wa gozen \] 1 ji ~_ Kyoto we de masu 
5 Constituent Boundary Parsing in 
Japanese 
Since a postposition is quite often used as a case- 
particle in Japanese, tim botmdary markers expressed by 
a part-of-speech bigram may not be used less frequently 
than in English. However, in spoken Japanese, 
postpositions are frequently omitted. The Jqpanese 
sentence "Kochira wa jimukyoku" where kochira 
means this and jimukyoku means "office", is 
translated into the English sentence "77fis is the office" 
by applying transfer knowledge such as the 
following5: 
XwaY => X'be Y' 
But postpositions are often omitted in natural six)ken 
Japanese, e.g. in the sentence "Kochira jimukyoku." 
The sentence can thus be divided into two noun phrases, 
"kochira" and "jimukyoku." "kochira" is a pronotm, 
and "jimukyoku" is a noun. So, using the bigram 
method of marking boundaries, we get "Kochira pron- 
noun jimukyoku", where the bigram "pron-noun" was 
inserted. The English sentence "77fis is the oJfice" can 
then be produced by applying the following transfer 
4"basu", "de", and "masu" mean "bus", "leave", and 
a polite sentence-final form, respectively. 
5 For simplicity, examples and other possible target 
expressions are omined. 
X pron-noun Y => X' be Y' 
In Japanese adnominal expressions, too, constituei~t 
bonndary markers ,'Ire inserted between the modifier and 
the modified. 
6 Results 
We have evaluated tim efficiency of our parsing 
method by utilizing a Japanese-lo-English (Jg) and 
English-to-Japanese (E J) TDMT prototype system 
(Furuse 1994; Sobashima 1994), which ix ,'unning on a 
Symbolics XL120(I, a LISP machine with IOMIPS 
performance. The system's domain is inquiries 
concerning international conference registrations. The 
efficency is evaluated with 154 Japanese sentences and 
138 corresl)onding English sentences, which are 
extracted from 10 dialogues in the domain. The systeln 
has al)out 500 source p,'llterns for JE translation and 
about 35(1 source patterns for EJ transhttion. 
The test sentences mentioned above have already l)een 
tr:tined to investigate the efficiency of the method, and 
can be p-lrse(l correctly by the system. Table 7 outlines 
the 154 Japanese sentences and 138 corresponding 
English sentences. 
Table 7 Outline of test senlences 
_ Japanese E_j1Aj_I i sh 
words per inpnt sentence 9.8 8.7 
average numl)er of ix)ssible structures 1.5 4.8 
An l-nglish sentence tends to have more struclural 
ambiguities than a Japanese sentence, bec,'tnse of PF'- 
altachment, the phenomenon that an English preposition 
f)rodtlCCS \[)()\[h a noun verb p\]lrasc \[Ilia a \[iolln phasc. In 
contrast, tile Jai)aneso l)ostposition does not generally 
produce different-level constituents. 
Table 8 shows how ,nuch time it takes to reach the 
best structure and translation output in our JE and EJ 
TDMT system. The processing time for distance 
calculation includes strnctnral disaml)iguation in addition 
to ktrget pattern selection. 
Tiffs demonstrates that the ot~r parsing method can 
get the best structure and translation output quickly 
wit\]fin the examl)lo-/xlsed framework. 
110 
Table 8 Processing time for the TI)MT system 
6 JF. E,I 
derivation of possible structures 0.25 (scc) 0.l 7 
dislance calculation 1.32 0.14 
whole tr,'lnsl;ition 2.17 1.07 
7 Conclu{lhlg Renllirks 
A constituent boundary parsing method for cxaniplo- 
based in;ichinB translation has been propose{I, l,inguislio 
patterns consisthlg of variables and constituent 
boundaries, are applied to an input string in a top-down 
fashion, and the possible structures can bc 
{lisambigutated using distance calculation by the 
examl}le-based framework. This nlothod is cll'icicut, and 
useful for parsing bolh Japanese and Knglish sentences. 
TIle "\['DMT system, which bidirectionally translates 
between Jal/anese and English within the eXaml)le-b:~sed 
framework, utilizes this parsing method and achieves 
efficient and robust spokel) larlguage translation. 
By introducing linguistic information to more 
patterns, there is a possibility that this method can also 
be utilized for ruled}ased MT, deep soinantic analysis, 
and so on. We will improve our parser by increasing the 
number of lraining sentences, and test its accuracy on 
olvn dala. 
Acknowledgements 
The authors wotlld like to th-lnk the menlbors of ATP, 
Interpreting Telecomnlllilicatioiis P, esoarch Laboratories 
for their colnlrlOnls oi1 variotls p,'irts of lhi~, research. 
Special thanks are due 1o Kohei \[labara and g:lsuhiro 
Yamazaki, for their snl)l)ort of this research. 

Bibliography 
Furuse> O., arid lida, H. (1992). Cool)er:ltion betweon 
Transfer aild Analysis in Example-Based Framework. 
Prec. of COTING-92, pp.645-65 I. 
Fnrnse, O., Sumita, E., and Ii(la, I1. (1994). Transfer- 
Driven Machine Translation Utilizing Iimpirical 
Knowledge. Transactions of hiformation Processing 
Society of Jal)an, Vol.35, No.3, 17t}.414-425 (in 
Japanese). 
Grishman, R., and Kosaka, M. (1992). Combinhlg 
6 The distance calculation time in F.J transhltion is short, 
since the system has llOt yet learned crlough trai/s\]:lliOll 
examples cmlcerning EJ translation. 
Rationalist and liml)iricist Aplnoachos to Machine 
"l'ransNilioin. Prec. of TMI-92, pp.263-274. 
Jones, D. (1992). Non-hybrid l-xample-baso(l Machine 
Translation Architectures. Prec. of TMI-92, i)p.163-171. 
McI.ean, i. J. (1992). F.x,'uni}le-Based Machine 
Translation using Counectionist Matching. Prec. of 
TMI-92, pI).35-43. 
M:u?,ernlan, D. M., and Marcus, M. P. (1990). Parsing 
a Naltlr;ll \],allgtiage Using Mtlttial lnfornialion 
Sl:ltistic~. l'roc, of AAAI 90, I}p.984-989. 
Maruyalna, 11.> and Watanal)e, I1. (1992). Tree Cover 
Search Algorithin for l';x,'lmple-llased "Franslaliom Proc. 
of TMI-92, Pl). 173- 184. 
M:.uuyan/a, 11. (1993). Pattern-Based Translation: 
Conlcxt-Free Transducer and Its Al}t)lication to Practical 
NI.P. Prec. of Natural l,anguago Processing Pacific P, im 
SylnpO.'-;itlln '93, i)P.232-237. 
Nqgao, M. (1984). A franlework of a mechanical 
Iranslalion between Japanese and l-nglish by analogy 
principle, in Artificial and lhunan Intelligence, ods. 
Elithorn, A. and Banerji> P,., North-Ilolland , pp.173- 
180. 
Nirenburg> S., I)omashnov, C., and Grannes, D.J. 
(1993). Two Al}proaches to Matching in l-xample-Base{l 
Machine Translation. Prec. of TIVlI-93, pp.47-57. 
Sale S. (1991). Examl)le-P, asod Machine Translali{)n. 
l)oclorial Thesis, Kyoto University. 
Sobashima, Y., Furuse, O., Akamine, S., Kawai, J., 
and Iida> I1. (1994), A l.lidirectional Trnasfer-Driven 
Machine "l~ransl:ltion Syslein for Spoken Dialogues. 
Prec. of COI.IN(i-94. 
Sumita, E. and lida> 11. (1992). Examl)le-P, ase{l Transfer 
of Japanese Adnoinin,:il Particles into f~2nglish. IEICI~ 
TITANS. INV. & SYST., Vol.li75-D, N(}.4, pi).585- 
59-'1. 
Stunita, F,., \]-'uruse, O.,and lid,q, t\[. (1993). All 
l{xainple-llasod Disaulbiguation of Prepositional Phrase 
Aitachn~ont. Prec. ofTMI-93, pi).80-91. 
Watanal)e, 11. (1992). Similarity-l)riven Transfer 
Systoln. Prec. of COTING-92, pl}.77{1-776. 
