An Interactive Japanese Parser for Machine Translation 
Hiroshi Maruyama 
maruyama@jpntscvm.bitnet 
Hideo Watanabe 
watanabe@jpntscwn.bitnet 
Shiho Ogino 
IBM Research, Tokyo Research Laboratory 
5-19 Sanbancho, Chiyoda-ku, 
Tokyo 102 Japan 
Abstract 
In this paper, we describe a working system for 
interactive Japanese syntactic analysis. A human 
user can intervene during parsing to help the sys- 
tem to produce a correct parse tree. Human in- 
teractions are limited to the very simple task of 
indicating the modifiee (governor) of a phrase, and 
thus a non-expert native speaker can use the sys- 
tem. The user is free to give any information in 
any order, or even to provide no information. The 
system is being used as the source language ana- 
lyzer of a Japanese-to-English machine translation 
system currently under development. 
1 Introduction 
Despite the long history of research and develop- 
ment, perfect or nearly perfect analysis of a fairly 
wide range of natural language sentences is still 
beyond the state of the art. The users of the ex- 
isting batch-style machine translation systems are 
obliged to post-edit the machine-translated text 
even if it contains errors because of an analysis 
failure. 
We have developed an interactive Japanese syn- 
tactic analysis system, JAWB (Japanese Analysis 
WorkBench), for a Japanese-to-English machine 
translation system. It can produce very reliable 
syntactic structures with the help of a human user. 
User interactions are limited to the very simple 
task of specifying the modifiee (governor) of a 
phrase, and thus a non-expert native speaker can 
use the system. The number of user interactions is 
minimized by using constraint propagation (Waltz 
1975) to eliminate inconsistent alternatives. 
One feature of our system not found in previous 
attempts (Kay 1973, Melby 1980, Tomita 1986) is 
that the user is completely free to give the system 
any information in any order. He also has the al- 
ternative of providing no information. In this case, 
the system runs fully automatically, although the 
quality of output may be degraded. 
In the next section, we describe the system 
structure. Then in Section 3 we discuss the in- 
teractive dependency analysis, and show a sample 
session. Section 4 gives the results of evaluation of 
the system. 
2 System Structure 
The system structure of JAWB is shown in Fig- 
ure 1. Japanese syntax analysis is divided into 
two parts: morphological analysis and dependency 
analysis. 
Figure 1: System structure. An input sentence passes through morphological 
analysis and dependency analysis to the transfer component. The dependency 
analyzer combines the grammar rules, the constraint network, the constraint 
propagation engine, and the user. In the example, "(I went to the sea.)" is 
segmented into the phrases (I) (sea+TO) (go+PAST), and both (I) and (sea+TO) 
modify (go+PAST). 
An input sentence is first segmented into a se- 
quence of linguistic units called bunsetsu, which 
can be roughly translated into English as phrases. 
Each bunsetsu, hereafter called a phrase, consists 
of one or more primitive words. The morphological 
analyzer analyzes a consecutive sequence of char- 
acters and identifies word and phrase boundaries. 
Japanese morphological analysis is a relatively well 
established technology (Maruyama et al. 1988) 
and intervention by the user is seldom required, 
although the system does provide a facility for this. 
A Japanese syntactic structure is depicted by 
modifier-modifiee relationships between phrases. 
The dependency analyzer determines the modifiee 
of each phrase. This is the most difficult task and 
normally user interaction takes place at this stage. 
First, the system determines the modifiee candi- 
dates of each phrase by using the grammar rules, 
and builds a data structure called a constraint net- 
work. The grammar rules are based on Constraint 
Dependency Grammar (Maruyama 1990), and are 
essentially constraints between modifications. The 
constraint network holds the modifiee candidates 
of each phrase, and the grammatical constraints 
are posed between the candidates. 
The system then proposes the most plausible 
reading and displays it on the screen along with 
the other possibilities. If the human user is satis- 
fied with the proposal or does not want to make 
any decision, he tells the system to 'go ahead' 
and the proposal is passed through to the transfer 
component as the unique parsing result. Alterna- 
tively, the user can select an arbitrary phrase and 
choose its modifiee from the rest of the candidates. 
The system incorporates this information into the 
constraint network, makes another proposal, and 
shows it to the user. This process is iterated until 
no more ambiguity remains. During analysis, the 
constraint propagation engine keeps the constraint 
network locally consistent by using the constraint 
propagation algorithm (Waltz 1975). 
Before the unique parse tree is submitted to 
the transfer component, JAWB performs some 
'post processing' on the tree. This processing 
includes resolving remaining lexical ambiguities, 
giving grammatical relations such as SUBJ and 
DOBJ, and transforming a passive-voice struc- 
ture into an active-voice structure. Since mak- 
ing such decisions requires expert knowledge about 
Japanese linguistics and/or the system's internal 
structure, it is preferable that this process is car- 
ried out automatically. Since correct modifier- 
modifiee relationships are given at the previous 
stage, this process makes few errors without hu- 
man intervention. 
$sent = { 
  [phrase=1, string="あなたが"(anataga), 
   cat=np, mcat=pred_modifier, 
   modifiee={%2,3,4,5%}, 
   words={ 
     [string="あなた"(anata), 
      syn={% 
        [pos=105, 
         string="貴方"(you), 
         sem=[sf={hum}, caseframe={}]], 
        [pos=105, 
         string="彼方"(far off), 
         sem=[sf={loc,con,abs}, caseframe={}]] 
      %}], 
     [string="が"(ga), 
      syn=[pos=75, string="が"(SUBJ)]] 
   }], 
  [phrase=2, string="昨日"(kinou), 
   cat=advp, mcat=pred_modifier, 
   modifiee={%3,4,5%}, 
Figure 2: Input to the dependency analyzer 
3 Dependency Analysis 
Let us consider sentence (1). 
(1) Anataga   kinou      deatta     otokowo  mita. 
    you-SUBJ  yesterday  meet-PAST  man-OBJ  see-PAST 
Part of the input to the dependency analyzer 
for this sentence is shown in Figure 2. A sentence 
is a sequence of phrases, each of which is repre- 
sented as a feature structure. Some of the values 
are enclosed by special brackets {% and %}, repre- 
senting disjunctions or choice points. Phrase 1 in 
Figure 2, for example, contains two choice points, 
one for structural ambiguity (the modifiee slot) and 
the other for lexical ambiguity (the syn slot of the 
first word). In Japanese, every phrase except the 
last one modifies exactly one phrase on its right.¹ 
Therefore, the modifiee of phrase 1 is one of the 
four succeeding phrases. 
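The feature structure with choice points can be pictured with a small sketch. The following Python fragment is only an illustration, not the system's actual representation: the `Choice` class, the `choice_points` helper, the path notation, and the use of romanized strings and a `gloss` key are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Choice:
    """A disjunction ({% ... %} in Figure 2): exactly one option is correct."""
    options: list


# Phrase 1 of sentence (1), transcribed with romanized strings (assumption).
phrase1 = {
    "phrase": 1,
    "string": "anataga",
    "cat": "np",
    "mcat": "pred_modifier",
    "modifiee": Choice([2, 3, 4, 5]),  # structural ambiguity
    "words": [
        {"string": "anata",
         "syn": Choice([                # lexical ambiguity
             {"pos": 105, "gloss": "you", "sf": {"hum"}},
             {"pos": 105, "gloss": "far off", "sf": {"loc", "con", "abs"}},
         ])},
        {"string": "ga", "syn": {"pos": 75, "gloss": "SUBJ"}},
    ],
}


def choice_points(node, path=""):
    """Yield (path, Choice) for every choice point nested in a structure."""
    if isinstance(node, Choice):
        yield path, node
        for i, option in enumerate(node.options):
            yield from choice_points(option, f"{path}{{{i}}}")
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from choice_points(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from choice_points(item, f"{path}[{i}]")


paths = [path for path, _ in choice_points(phrase1)]
print(paths)
```

Under these assumptions the traversal finds the two choice points mentioned in the text: the modifiee slot of the phrase and the syn slot of its first word.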
The grammatical rules that we need here are as 
follows: 
for X in $sent begin 
  /* G1. pred_modifier modifies a pred */ 
  (X.mcat=pred_modifier => 
     $sent.(X.modifiee).cat in {vp,adjp,adjvp} 
  ) & 
  /* G2. noun_modifier modifies a noun */ 
  (X.mcat=noun_modifier => 
     $sent.(X.modifiee).cat in {np} 
  ) 
end 
for X,Y in $sent begin 
  /* G3. modifications do not cross */ 
  X.phrase<Y.phrase & Y.phrase<X.modifiee => 
     Y.modifiee <= X.modifiee 
end 
According to the above rules, the modifiee (i.e., 
the governor) of phrase 1 (you-SUBJ) is either 
phrase 3 (meet-PAST) or phrase 5 (see-PAST), 
since phrase 1 is a predicate-modifier and phrases 3 
and 5 are predicates. Similarly, phrase 2 can mod- 
ify either phrase 3 or phrase 5. The values of the 
modifiee slot of each phrase thus become as follows: 
phrase 1: modifiee={%3,5%} 
phrase 2: modifiee={%3,5%} 
phrase 3: modifiee={%4%} 
phrase 4: modifiee={%5%} 
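The effect of rules G1 and G2 on the modifiee slots can be reproduced with a short sketch. It is not the grammar formalism itself: the names `PHRASES`, `ALLOWED`, and `modifiee_candidates` are hypothetical, and the categories given for phrases 3 to 5 are assumptions chosen to agree with the values listed above (only phrases 1 and 2 appear in Figure 2).

```python
# Hypothetical encoding of sentence (1): phrase number -> (cat, mcat).
# The entries for phrases 3-5 are assumptions made for illustration.
PHRASES = {
    1: ("np", "pred_modifier"),    # anataga (you-SUBJ)
    2: ("advp", "pred_modifier"),  # kinou   (yesterday)
    3: ("vp", "noun_modifier"),    # deatta  (meet-PAST)
    4: ("np", "pred_modifier"),    # otokowo (man-OBJ)
    5: ("vp", None),               # mita    (see-PAST), the last phrase
}

# Categories that each modifier class may attach to (rules G1 and G2).
ALLOWED = {
    "pred_modifier": {"vp", "adjp", "adjvp"},
    "noun_modifier": {"np"},
}


def modifiee_candidates(phrases):
    """Return {phrase: [candidate modifiees]} for every phrase but the last."""
    last = max(phrases)
    result = {}
    for x, (_cat, mcat) in phrases.items():
        if x == last:
            continue
        # Every phrase except the last modifies one phrase on its right.
        result[x] = [m for m in range(x + 1, last + 1)
                     if phrases[m][0] in ALLOWED[mcat]]
    return result


candidates = modifiee_candidates(PHRASES)
print(candidates)
```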
Because modification links do not cross each 
other (by rule G3), the cases of phrase 1 modifying 
phrase 3 and phrase 2 modifying phrase 5 do not 
co-occur. Therefore, this sentence has three differ- 
ent readings, which correspond to (1-1) to (1-3): 
(1-1) (I) saw the man you met yesterday. 
(1-2) You saw the man (I) met yesterday. 
(1-3) Yesterday, you saw the man (I) met. 
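The count of three readings can be checked by brute force. The sketch below enumerates all combinations of the candidate modifiees and filters them with rule G3. It is for illustration only: the function names are hypothetical, and exhaustive enumeration is an assumption of this example, not a description of how the system stores readings.

```python
from itertools import product

# Candidate modifiees after rules G1 and G2, as listed in the text.
CANDIDATES = {1: [3, 5], 2: [3, 5], 3: [4], 4: [5]}


def satisfies_g3(assignment):
    """Rule G3: a phrase Y between X and X's modifiee must not reach past it."""
    for x, mx in assignment.items():
        for y, my in assignment.items():
            if x < y < mx and my > mx:
                return False
    return True


def readings(candidates):
    """Enumerate every complete modifiee assignment that passes G3."""
    phrases = sorted(candidates)
    found = []
    for choice in product(*(candidates[p] for p in phrases)):
        assignment = dict(zip(phrases, choice))
        if satisfies_g3(assignment):
            found.append(assignment)
    return found


all_readings = readings(CANDIDATES)
for reading in all_readings:
    print(reading)
```

The only combination rejected is the one in which phrase 1 modifies phrase 3 while phrase 2 modifies phrase 5.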
The system maintains these readings implicitly by having constraints between 
choice points. For example, the following constraint matrix is attached 
between the two choice points $sent.1.modifiee and $sent.2.modifiee: 

  $sent.1.modifiee   $sent.2.modifiee   value 
         3                  3             1 
         3                  5             0 
         5                  3             1 
         5                  5             1 

By means of the constraint matrices, the system 
can defer the generation of individual parse trees 
until all structural ambiguities are resolved. The 
number of parse trees may combinatorially explode 
when the sentence becomes long. For example, sen- 
tences with more than 20 phrases are not rare and 
such sentences may have tens of thousands of parse 
trees. 
¹ This is a common view of Japanese syntax, although 
there are different views. 
Figure 3: Screens (a) when the cursor is on phrase 1 and (b) when the cursor 
is on phrase 2. Both screens show the phrases you-SUBJ, yesterday, meet-PAST, 
man-OBJ, and see-PAST, with the candidate counts 2, 2, 1, and 1 under the 
first four phrases. 
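As a rough sketch of how such a matrix could be derived from rule G3, the following fragment computes the entry for each pair of candidate values. The function name and the dictionary representation are assumptions made for this example and are not taken from the system.

```python
def constraint_matrix(x, x_cands, y, y_cands):
    """Matrix for two choice points x < y under rule G3.

    Returns {(x_value, y_value): 1 or 0}; 1 means the pair may co-occur.
    """
    matrix = {}
    for mx in x_cands:
        for my in y_cands:
            crossing = x < y < mx and my > mx
            matrix[(mx, my)] = 0 if crossing else 1
    return matrix


matrix = constraint_matrix(1, [3, 5], 2, [3, 5])
for (mx, my), value in matrix.items():
    print(f"$sent.1.modifiee={mx}  $sent.2.modifiee={my}  value={value}")
```

Only the pair in which phrase 1 modifies phrase 3 and phrase 2 modifies phrase 5 receives the value 0, which agrees with the matrix shown in the text.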
User Interface 
The essential portion of the user interface is shown 
in Figure 3. The system does not display the pro- 
posed modifiees of all the phrases at once. Instead, 
when the user moves the cursor to a phrase by us- 
ing a mouse, the proposed modifiee and the other 
possible candidates are highlighted. In the figures, 
the current phrases pointed to by the cursor are un- 
derscored, the proposed modifiees are in reversed 
video, and the other modifiee candidates are in 
a shaded box.² The number appearing at the 
lower left corner of each phrase shows the num- 
ber of modifiee candidates of that phrase. 
If this number is one, the modifiee is uniquely de- 
termined. Otherwise, the modifiee of the phrase is 
ambiguous. 
² These are in different colors on the real screen. 
Figure 3-a shows the screen when the cursor is 
on phrase 1 (you-SUBJ). Phrase 1 can modify ei- 
ther phrase 3 or phrase 5, and the system's pro- 
posal is phrase 5. Figure 3-b shows the screen when 
the cursor is on phrase 2. By moving the cursor 
over the phrases, the user can check the current sys- 
tem proposal. If the user is satisfied with it, he 
indicates this by clicking a special 'go-ahead' icon. 
Otherwise, he has to select the proper candidates. 
The user selects one of the ambiguous phrases 
by clicking the mouse, moves the cursor to its 
proper modifiee, and clicks the mouse again. The 
second click triggers the constraint propagation en- 
gine, and the updated situation is displayed instan- 
taneously. Figure 4 shows the situation after the 
user has instructed the system that phrase 1 modi- 
fies phrase 3. The reader may notice that the mod- 
ifiee of phrase 2 is also determined automatically 
because of constraint propagation. 
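The automatic determination of phrase 2 can be illustrated with a simple filtering loop in the spirit of constraint propagation. This is a minimal sketch under stated assumptions, not the engine used in JAWB: it considers rule G3 only, and the names `violates_g3`, `propagate`, and `select` are hypothetical.

```python
def violates_g3(x, mx, y, my):
    """True if the links x->mx and y->my cross (rule G3), in either order."""
    if x > y:
        x, mx, y, my = y, my, x, mx
    return x < y < mx and my > mx


def propagate(domains):
    """Drop any candidate that has no compatible value in some other domain."""
    changed = True
    while changed:
        changed = False
        for x in domains:
            for mx in list(domains[x]):
                for y in domains:
                    if y == x:
                        continue
                    supported = any(not violates_g3(x, mx, y, my)
                                    for my in domains[y])
                    if not supported:
                        domains[x].remove(mx)
                        changed = True
                        break
    return domains


def select(domains, phrase, modifiee):
    """Record a user choice, then restore local consistency."""
    if modifiee not in domains[phrase]:
        raise ValueError("not a modifiee candidate of this phrase")
    domains[phrase] = [modifiee]
    return propagate(domains)


domains = {1: [3, 5], 2: [3, 5], 3: [4], 4: [5]}
result = select(domains, 1, 3)
print(result)
```

After the user fixes phrase 1 to modify phrase 3, candidate 5 of phrase 2 loses its only support and is removed, so every phrase is left with a single modifiee, as in Figure 4.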
During parsing, the user always has the initia- 
tive in the interaction. The user knows the exact 
sources of the structural ambiguity, and he can se- 
lect any of them to give information to the sys- 
tem. This is in contrast to the previous systems, 
in which the user must answer system-generated 
queries one by one. The constraint propagation 
engine ensures that the given information is maxi- 
mally used in order to minimize further interaction. 
The user also has the option of saying 'go-ahead' 
at any time, taking the default choices proposed 
by the system. 
Figure 4: Screens after specifying that phrase 1 modifies phrase 3. The 
phrases you-SUBJ, yesterday, meet-PAST, and man-OBJ each now show a 
candidate count of 1. 
4 Evaluation 
One of the claims of JAWB is that it can be used 
by non-expert users. To validate the claim, we con- 
ducted a comparative test with an expert user and 
a non-expert user. Figure 5 shows the results of the 
test. Subject A is one of the authors who actually 
developed the grammar. Subject B is a Japanese 
native speaker with no background in linguistics 
or computer science. Given an initial screen of de- 
pendency analysis, subject A spent 12.9 seconds 
on the average before making a correct parse tree. 
This period includes the time spent specifying the 
proper modifiees (1.1 times on average) and veri- 
fying the system proposals, but does not include 
overheads such as the time spent choosing a new 
sentence to be analyzed and waiting for the sys- 
tem to look up dictionaries from a disk. The same 
task took 18.8 seconds for subject B. The impor- 
tant point here is that although the performance 
is somewhat different, the parse trees generated 
by both subjects were essentially identical.³ This 
means that, with a non-expert human user's help, 
JAWB is capable of producing very reliable parse 
trees fairly efficiently, although the efficiency can 
be increased by about 50% if an expert user uses 
it. 
³ There were differences when the sentence was truly am- 
biguous, in which case even a human user could not resolve 
the ambiguity without the context knowledge. 
Another yardstick for evaluating the system is 
the accuracy of the initial proposals. For 1,089 
test sentences taken from actual newspaper arti- 
cles, JAWB generated correct initial proposals for 
507 sentences (47%), which means that, if it is used 
in a fully automatic mode, its accuracy is 47%. On 
the other hand, the system rejected two sentences 
as ungrammatical, which means that for 99.8% of 
the test sentences, JAWB was capable of producing 
correct parse trees with appropriate user interac- 
tion. 
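The two percentages follow directly from the reported counts, as the short calculation below shows. The variable names are chosen here for illustration only.

```python
total_sentences = 1089   # test sentences from newspaper articles
correct_initial = 507    # sentences whose initial proposal was correct
rejected = 2             # sentences rejected as ungrammatical

# Accuracy in fully automatic mode: share of correct initial proposals.
automatic_accuracy = correct_initial / total_sentences

# Share of sentences for which a correct parse tree is reachable
# with appropriate user interaction.
interactive_coverage = (total_sentences - rejected) / total_sentences

print(f"{automatic_accuracy:.1%}")
print(f"{interactive_coverage:.1%}")
```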
5 Conclusion 
JAWB is currently being used to accumulate cor- 
rect parse trees for a corpus of texts. The accu- 
mulated data are vital for the development of our 
machine translation system for at least two rea- 
sons: 
1. The transfer component, which generates an 
English syntactic structure from a Japanese 
syntactic structure, is difficult to develop 
without having enough error-free input data, 
that is, Japanese parse trees. 
2. The accumulated parse trees are used as reli- 
able linguistic data from which various statis- 
tical data are obtained in order to refine the 
grammar rules. 
We believe that interactive source language 
analysis is a promising approach to practical ma- 
chine translation not only because it may signifi- 
cantly reduce the task of post editing, which should 
be carried out by a professional translator, but also 
because the cost-saving effect is multiplied when 
the same text is translated into several different 
languages. 
Sentence length    Ave. time (sec.)      Ave. # of interactions 
(# of phrases)     Subj. A   Subj. B     Subj. A   Subj. B 
1-3                -         -           -         - 
4-6                3.6       6.3         0.0       0.1 
7-9                8.3       14.1        0.5       0.7 
10-12              14.6      20.6        1.5       2.1 
13-15              21.1      31.9        2.1       2.8 
16-18              27.5      32.5        2.5       3.5 
19-21              48.0      42.0        4.0       4.0 
Ave. 9.8           12.9      18.8        1.1       1.5 
Figure 5: User performance 
Acknowledgements 
The authors are grateful to Masayuki Morohashi, 
Hiroshi Kitamura, and Hiroshi Nomiyama for their 
valuable discussions and suggestions. The authors 
also would like to thank Michael McDonald for his 
help in preparing the manuscript. 

References 

Kay, Martin. 1973, "The MIND system," in 
Rustin, R. (ed.) Natural Language Processing, 
Algorithmics Press. 

Maruyama, Hiroshi. 1990, "Structural disam- 
biguation with constraint propagation," Proc. 
of ACL Annual Meeting. 

Maruyama, Naoko; Morohashi, Masayuki; 
Umeda, Shigeki; Sumita, Eiichiro. 1988, "A 
Japanese sentence analyzer," IBM Journal of 
Research and Development, Vol. 32. 

Melby, Alan. 1980, "ITS: Interactive transla- 
tion system," Proceedings of COLING '80. 

Tomita, Masaru. 1986, "Sentence disam- 
biguation by asking," Computers and Trans- 
lation, Vol. 1. 

Waltz, David. 1975, "Understanding line draw- 
ings of scenes with shadows," in Winston, 
P.H. (ed.) The Psychology of Computer Vi- 
sion, McGraw-Hill. 
