Local Cohesive Knowledge for a Dialogue-Machine Translation System
Ikuo KUDO
ATR Interpreting Telephony Research Laboratories
Sanpeidani, Inuidani, Seika-cho, Souraku-gun, Kyoto 619-02, Japan
E-mail address: kudo%atr-la.atr.co.jp@uunet.uu.net
Abstract 
In a natural dialogue, there are many disturbances at
the context level caused by interruptions and inserted
sentences. Despite such phenomena, cohesion is a very
important concept for understanding the context correctly.
In our approach, cohesive knowledge, which judges
cohesion between sentences, is given to the system, and
the knowledge is then used to find cohesion in a
disarranged context. It is also applied to interpret
anaphora, ellipsis and pro-forms in the context. To do
so, we define the knowledge and use its definition to
extract knowledge from a linguistic database almost
automatically.
1. Introduction 
When we build a machine translation system for
dialogues, we must face many contextual phenomena
such as ellipses, anaphoras and pro-forms. In a dialogue
these phenomena are more complicated because of
disturbances such as interruptions, inserted sentences
and utterance disorder. These disturbances have not been
handled computationally, although they affect
context-dependent problems such as ellipses, anaphoras,
pro-forms and referent-transfers. In this paper, we propose
a context processing mechanism suited to such disarranged
contexts, and describe the linguistic knowledge, called
"local cohesive knowledge", which serves as a constraint for
grasping contextual relationships.
In Section 2 we give examples that are dependent on the
context and describe why they are difficult to process.
In Section 3 we propose "local cohesive knowledge", and in
Section 4 we apply the mechanism to a dialogue-machine
translation system.
2. Contextual Robustness in Dialogues
Context-dependent problems such as ellipses,
anaphoras, pro-forms and referent-transfers present
complications, as shown in Figure 1.
(1) Anaphora: the preceding utterances in Examples 1 and 2
contain the same sentences, yet "it" refers to different terms:
"the registration fee" in Example 1 and "the conference" in
Example 2. The context therefore determines the referent. In
Example 1, the sentences are out of order: the question
precedes the statement "I would like to attend the
conference". In Example 2, the answer negates the sentence
"I would like ...".
(2) Ellipsis: in Example 3, there is an ellipsis in the
Japanese sentence (2), "motte inai no desu ga." The
ellipsis depends on the context and means 'credit card'; it
is both a focus and an object (OBJ).
Example 1: sentence disorder
(1) How much is the registration fee? I would like to attend the
conference.
(2) It's $200.
Example 2: sentential negation
(1) I would like to attend the conference. How much is the
registration fee?
(2) I'm sorry. It's closed.
Example 3: (in Japanese)
(1) kurejitto kaado (credit card) no (of) namae (name) o (OBJ)
oshiete kudasai (Could you tell).
[= Could you tell me the name of your credit card?]
(2) sumimasen (I'm sorry). motte (have) inai (not) no desu ga.
[= I'm sorry. I don't have a credit card.]
Example 4: (in Japanese)
(1) tourokuryou (registration fee) wa (topic) en (yen) de (by)
shiharatte yoroshii deshou ka (can I pay).
[= Can I pay the registration fee in yen?]
(2) doru (U.S. dollars) de (in) onegaishimasu (pro-form = We
would like you to pay).
[= We would like you to pay in U.S. dollars.]
Figure 1 Examples of contextual phenomena.
(3) Pro-form: in Example 4 (2), "onegaishimasu" is a pro-form
meaning 'We would like you to pay' in Japanese.
Its meaning depends on the context.
We call the processing of such disarranged phenomena
"contextual robustness"†1. To process such phenomena,
it is necessary to grasp the cohesion in the context
correctly.
3. Local cohesive knowledge
We define cohesion from the viewpoint of computational
linguistics. Here, cohesion determines whether or not two
sentences are connected; it does not specify what the
relationship between them is. That is, cohesion is a
constraint on a pair of sentences.
[The definition of "local cohesive knowledge"]
In our approach, "cohesion" is grasped in a context
with "local cohesive knowledge". This knowledge includes not
only the constraints for "local cohesion"†2 but also their
results, such as interpretations of ellipses, anaphoras,
pro-forms and referent-transfers: if the constraints are
satisfied, the interpretations are obtained. "Local cohesive
knowledge" therefore has two parts, "constraints for
cohesion" and "interpretations", as follows.
(Constraints for local cohesion)
=> (interpretation)
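As a rough illustration of this two-part structure, the Python sketch below pairs a constraint over two simplified skeletons with the interpretation it licenses. The `Frame` encoding, the class `LocalCohesiveKnowledge` and the example item `send_it` are hypothetical names chosen for this sketch; they are not taken from the paper's Prolog implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Simplified skeleton: the verb plus its case elements (SUBJ, OBJ, OBJ2).
# This encoding is an illustrative assumption, not the paper's data format.
Frame = Tuple[str, Dict[str, str]]


@dataclass
class LocalCohesiveKnowledge:
    """One item of knowledge: (constraints for local cohesion) => (interpretation)."""

    # Left-hand side: a test over a pair of frames (previous, current).
    constraint: Callable[[Frame, Frame], bool]
    # Right-hand side: interpretations obtained when the constraint holds.
    interpretation: Dict[str, str] = field(default_factory=dict)

    def apply(self, prev: Frame, curr: Frame) -> Optional[Dict[str, str]]:
        """Return the interpretation if the pair is cohesive, otherwise None."""
        if self.constraint(prev, curr):
            return dict(self.interpretation)
        return None


# Hypothetical item: send<X1,paper,Z1>, send<X2,it,Z2> => it = paper
send_it = LocalCohesiveKnowledge(
    constraint=lambda p, c: (
        p[0] == "send" and c[0] == "send"
        and p[1].get("OBJ") == "paper" and c[1].get("OBJ") == "it"
    ),
    interpretation={"it": "paper"},
)

prev = ("send", {"SUBJ": "you", "OBJ": "paper", "OBJ2": "me"})
curr = ("send", {"SUBJ": "I", "OBJ": "it", "OBJ2": "you"})
print(send_it.apply(prev, curr))  # {'it': 'paper'}
```

Keeping the constraint as an arbitrary predicate is only a convenience here; the paper states constraints declaratively in the verb<X,Y,Z> notation described next.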
†1. Ordinarily, robustness refers to handling ungrammatical sentences.
Here, however, "contextual robustness" is used at the discourse level.
†2. We treat contextual phenomena that occur locally; hence
we use the term "local cohesion".
[Constraints]
The constraints are described as follows.
verb1<X1,Y1,Z1>, verb2<X2,Y2,Z2>.
In "verb1<X1,Y1,Z1>", "X1", "Y1" and "Z1"
denote the case elements of "verb1": the subjective (SUBJ),
objective (OBJ) and second objective (OBJ2) cases. If two
sentences satisfy these constraints, they are said to be in
"local cohesion". As shown in Figure 2, there are 18 types,
determined by combining three constraints on verbs with
six constraints on nouns.
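The 18 types amount to a three-by-six grid. A minimal sketch, assuming the row and column order of Figure 2 (noun relations down the side, verb relations across the top) and a hypothetical synonym table `SYNONYM_PAIRS`, maps a verb relation and a noun relation to a type number:

```python
# Column order: relation between the two verbs (as in Figure 2).
VERB_RELATIONS = ["same", "synonymic", "different"]
# Row order: relation between the two nouns (as in Figure 2).
NOUN_RELATIONS = [
    "same nouns",
    "synonymic nouns",
    "same nouns with modifier",
    "same nouns with compound noun",
    "synonymic nouns with modifier",
    "synonymic nouns with compound noun",
]

# Hypothetical synonym table; the paper does not specify one.
SYNONYM_PAIRS = {frozenset({"send", "bring"})}


def verb_relation(v1: str, v2: str) -> str:
    """Classify two verbs as the same, synonymic or different."""
    if v1 == v2:
        return "same"
    if frozenset({v1, v2}) in SYNONYM_PAIRS:
        return "synonymic"
    return "different"


def cohesion_type(verb_rel: str, noun_rel: str) -> int:
    """Map (verb relation, noun relation) to a type number from 1 to 18."""
    row = NOUN_RELATIONS.index(noun_rel)
    col = VERB_RELATIONS.index(verb_rel)
    return row * len(VERB_RELATIONS) + col + 1


print(cohesion_type(verb_relation("send", "bring"), "same nouns"))  # 2
print(cohesion_type(verb_relation("tell", "close"), "same nouns with modifier"))  # 9
```

Deciding the noun relation itself (modifier, compound noun, synonym) would need the skeletons and a thesaurus; the sketch only reproduces the numbering.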
Type 1: the same verbs and the same nouns.
For example,
"Could you send me a paper?"
"I sent you the paper yesterday."
The verbs in the question and in the answer are the same
word, "send", and their objects are the same word, "paper".
This constraint is described as follows.
send<X1,paper,Z1>, send<X2,paper,Z2>.
This constraint means that if two sentences both contain
"send" with the object "paper", the sentences are cohesive.
The following sentences are therefore also cohesive, because
they satisfy the same constraint.
For example, 
"May I send you a paper to your office?" 
"Please send me the paper to my home address." 
send<X1,paper,Z1>, send<X2,paper,Z2>.
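The Type 1 constraint can be read as a pattern in which X1, Z1, X2 and Z2 are variables and "paper" is a fixed word. The sketch below, with hypothetical helpers `parse`, `matches` and `cohesive`, checks a pair of skeletons written in the same verb<...> notation. It assumes determiners have already been stripped and, for simplicity, does not bind variables across the two terms.

```python
import re
from typing import List, Tuple

TERM = re.compile(r"\s*(\w+)\s*<(.*)>\s*")
VARIABLE = re.compile(r"[XYZ]\d+")  # case variables such as X1, Y2, Z1


def parse(term: str) -> Tuple[str, List[str]]:
    """Split 'verb<a,b,c>' into the verb and its list of case elements."""
    m = TERM.fullmatch(term)
    if m is None:
        raise ValueError(f"not a constraint term: {term!r}")
    return m.group(1), [arg.strip() for arg in m.group(2).split(",")]


def matches(pattern: str, term: str) -> bool:
    """A variable matches any case element; a word must match exactly."""
    p_verb, p_args = parse(pattern)
    t_verb, t_args = parse(term)
    if p_verb != t_verb or len(p_args) != len(t_args):
        return False
    return all(VARIABLE.fullmatch(p) or p == t for p, t in zip(p_args, t_args))


TYPE1_SEND_PAPER = ("send<X1,paper,Z1>", "send<X2,paper,Z2>")


def cohesive(first: str, second: str, constraint=TYPE1_SEND_PAPER) -> bool:
    """True if the two skeletons satisfy both halves of the constraint."""
    return matches(constraint[0], first) and matches(constraint[1], second)


print(cohesive("send<you,paper,me>", "send<I,paper,you>"))  # True
print(cohesive("read<you,paper>", "send<you,paper,me>"))    # False
```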
nouns \ verbs                       same verbs  synonymic verbs  different verbs
the same nouns                      Type 1      Type 2           Type 3
synonymic nouns                     Type 4      Type 5           Type 6
the same nouns with modifier        Type 7      Type 8           Type 9
the same nouns with compound noun   Type 10     Type 11          Type 12
synonymic nouns with modifier       Type 13     Type 14          Type 15
synonymic nouns with compound noun  Type 16     Type 17          Type 18
Type 2: Synonymic verbs and the same nouns.
"Could you send me a paper?" "I will bring you the paper soon."
send<X1,paper,Z1>, bring<X2,paper,Z2>.
Type 3: Different verbs and the same nouns.
"Did you read the paper?" "Please send me the paper."
read<X1,paper>, send<X2,paper,Z2>.
Type 6: Different verbs and synonymic nouns.
"Did you read the registration?" "Please send me the form."
read<X1,registration>, send<X2,form,Z2>.
Type 9: Different verbs and the same nouns with modifier.
"Could you tell me the limit for application?"
"The application is closed now."
tell<X1,limit(application),Z1>, close<X2,application>.
Type 12: Different verbs and the same nouns with a compound
noun.
"Could you tell me the registration limit?"
"The registration is received till August 10th."
tell<X1,registration limit,Z1>, receive<X2,registration>.
Figure 2 The 18 types of constraints and their examples.
[Interpretation]
This knowledge can be applied to interpretation
problems such as anaphoras, ellipses, pro-forms and
referent-transfers. Local cohesive knowledge includes
interpretations: if the constraints are satisfied, the
corresponding interpretation is obtained. Examples are
shown in Figure 3 (b) and (c).
(b) Interpretation of an anaphora: for example,
"Could you send me a paper?"
"I will send it to you."
(c) Interpretation of an ellipsis: for example,
"Could you send me a paper?"
"I will send φ to you." ; φ marks an ellipsis.
(Such ellipses are frequent in Japanese dialogues.)
(a) send<X1,paper,Z1>, send<X2,paper,Z2>.
(b) send<X1,paper,Z1>, send<X2,it,Z2>,
=> it = paper.
(c) send<X1,paper,Z1>, send<X2,φ,Z2>,
=> φ = paper.
Figure 3 Examples of local cohesive knowledge.
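Patterns (a), (b) and (c) in Figure 3 share one left-hand frame and differ only in the second object. Assuming skeletons reduced to a (verb, OBJ) pair and "φ" as the ellipsis marker, a hypothetical `interpret` function could apply all three:

```python
from typing import Dict, Optional, Tuple

ELLIPSIS = "φ"            # marker for an omitted case element
Frame = Tuple[str, str]   # simplified skeleton: (verb, OBJ); an assumption


def interpret(rule_a: Tuple[Frame, Frame], prev: Frame,
              curr: Frame) -> Optional[Dict[str, str]]:
    """Apply pattern (a), (b) or (c) of Figure 3 to a pair of frames.

    Returns {} for a plain cohesive pair (a), a binding for 'it' (b)
    or for the ellipsis (c), and None when the pair is not cohesive.
    """
    (verb1, noun1), (verb2, noun2) = rule_a
    if prev != (verb1, noun1) or curr[0] != verb2:
        return None
    obj = curr[1]
    if obj == noun2:          # (a) the noun is repeated
        return {}
    if obj == "it":           # (b) anaphora
        return {"it": noun2}
    if obj == ELLIPSIS:       # (c) ellipsis
        return {ELLIPSIS: noun2}
    return None


rule = (("send", "paper"), ("send", "paper"))
print(interpret(rule, ("send", "paper"), ("send", "it")))      # {'it': 'paper'}
print(interpret(rule, ("send", "paper"), ("send", ELLIPSIS)))  # {'φ': 'paper'}
```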
4. Context processing with local cohesive knowledge
I will now explain the mechanism that provides
"contextual robustness" and interprets contextual
phenomena such as anaphoras, ellipses and pro-forms. The
flow of the system is shown in Figure 4. Input
sentences are analyzed with grammar rules and lexicons
based on Lexical-Functional Grammar (LFG) (1), and
intermediate representations (F-structures of LFG) are
obtained. Because an intermediate representation carries
more information than context processing needs, it is
converted into a skeleton, as shown in Figure 5. The
skeleton is unified with "local cohesive knowledge" in the
context processing.
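To make the skeleton idea concrete, the sketch below models an F-structure as a nested Python dict and keeps only the predicate, its governed functions (read off the PRED template) and their modifiers, marking a governed function with no PRED as an ellipsis. The dict layout, the `to_skeleton` name and the sample F-structure are assumptions; real LFG F-structures, and the skeletons in Figure 5, are richer (there a modifier is a separate f-structure rather than a plain string).

```python
import re
from typing import Any, Dict, List, Tuple

ELLIPSIS = "φ"
PRED_FORM = re.compile(r"(\w+)<([A-Z0-9,]*)>")


def governed_functions(pred: str) -> Tuple[str, List[str]]:
    """'tell<SUBJ,OBJ2,OBJ>' -> ('tell', ['SUBJ', 'OBJ2', 'OBJ'])."""
    m = PRED_FORM.fullmatch(pred.replace(" ", ""))
    if m is None:
        return pred, []
    return m.group(1), [f for f in m.group(2).split(",") if f]


def to_skeleton(fs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the verb, its governed case elements and their modifiers only.

    Features such as tense or mood are dropped; a governed function with
    no PRED is marked as an ellipsis.
    """
    verb, functions = governed_functions(fs["PRED"])
    skeleton: Dict[str, Any] = {"PRED": verb}
    for fn in functions:
        sub = fs.get(fn) or {}
        node = {"PRED": sub.get("PRED", ELLIPSIS)}
        if "MOD" in sub:
            node["MOD"] = sub["MOD"].get("PRED", ELLIPSIS)
        skeleton[fn] = node
    return skeleton


# Hypothetical F-structure for "motte inai no desu ga" (I don't have [it]).
f2 = {"PRED": "have<SUBJ,OBJ>", "TENSE": "present", "NEG": "+"}
print(to_skeleton(f2))
# {'PRED': 'have', 'SUBJ': {'PRED': 'φ'}, 'OBJ': {'PRED': 'φ'}}
```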
The algorithm of the context processing mechanism is
as follows.
(1) Make a pair of skeletons: to check local cohesion,
retrieve the skeleton of a previous utterance and pair it
with the skeleton of the current utterance.
(2) Check local cohesion: look up the table of "local
cohesive knowledge" using the pair of skeletons as a key. If
the pair satisfies the constraints of "local cohesive
knowledge", the pair is cohesive, and the interpretations
of ellipses, anaphoras, pro-forms and referent-transfers
are obtained from the "local cohesive knowledge".
[Figure 4 diagram: Input → intermediate representation → skeleton (e.g., the f2 skeleton in Figure 5); context processing compares the skeleton with the history of skeletons (e.g., the f1 skeleton in Figure 5) using local cohesive knowledge, consisting of (1) constraints and (2) interpretations of anaphoras, ellipses, pro-forms and referent-transfers; the interpretations return to the intermediate representation, which is passed to generation → Output.]
Figure 4 A flow of a dialogue-machine translation system.
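The two steps can be sketched as a loop over the history of skeletons. In this hypothetical version the knowledge table is a Python dict keyed by the pair of simplified (verb, OBJ) skeletons, and earlier utterances are tried from the most recent backwards. That search order is an assumption, chosen so that an inserted sentence does not hide an older cohesive partner, as in Example 1.

```python
from typing import Dict, List, Optional, Tuple

Frame = Tuple[str, str]  # simplified skeleton: (verb, OBJ); an assumption
Key = Tuple[Frame, Frame]

# Hypothetical knowledge table keyed by (previous skeleton, current skeleton).
KNOWLEDGE: Dict[Key, Dict[str, str]] = {
    (("send", "paper"), ("send", "paper")): {},
    (("send", "paper"), ("send", "it")): {"it": "paper"},
    (("send", "paper"), ("send", "φ")): {"φ": "paper"},
}


def process(history: List[Frame],
            current: Frame) -> Optional[Tuple[Frame, Dict[str, str]]]:
    """Pair the current skeleton with earlier ones and look each pair up.

    Returns the cohesive partner and its interpretations, or None.
    """
    for prev in reversed(history):                       # step (1): make a pair
        interpretation = KNOWLEDGE.get((prev, current))  # step (2): look it up
        if interpretation is not None:
            return prev, interpretation
    return None


# "attend the conference" is an inserted sentence between the two partners.
history = [("send", "paper"), ("attend", "conference")]
print(process(history, ("send", "it")))
# (('send', 'paper'), {'it': 'paper'})
```

A cohesive pair matched by pattern (a) returns an empty interpretation rather than None, so the caller can still tell that the two utterances are linked.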
5. The experiment 
When we built the system, one of the most important 
problems was how to produce the knowledge. We defined 
the local cohesive knowledge and used its definition to 
extract knowledge from a linguistic database almost
automatically. 
We have a linguistic database that includes 60
keyboard dialogues. The dialogues contain 70,000 words
in total, with more than 3000 different words. These
dialogues are analyzed and managed by a linguistic
database management system (2).
We extracted local cohesive knowledge from the 60
dialogues, which include 350 verbs and 1000 nouns. First,
we made a table listing each verb with its nouns. Then we
extracted the constraints of local cohesive knowledge by
making pairs from the table. Constraint pattern (a),
as shown in Figure 3, was obtained automatically from
the data, and patterns (b) and (c) were generated from
pattern (a). We obtained 24531 assertions of "local
cohesive knowledge" for types 1, 2 and 3, and 651
assertions of "local cohesive knowledge" for types 7, 8 and
9. We found that local cohesive knowledge is very
sparse, so the volume of "local cohesive knowledge" is
not a problem.
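The extraction step can be sketched as follows, assuming a hypothetical verb-object table: pattern (a) pairs any two entries that share an object noun, and patterns (b) and (c) are derived from each (a) by replacing the second object with "it" or "φ". The table contents and the names `extract_pattern_a` and `derive_b_c` are illustrative only; they are not the ATR data or the paper's code.

```python
from itertools import product
from typing import Dict, List, Tuple

Frame = Tuple[str, str]          # (verb, OBJ); an illustrative simplification
Rule = Tuple[Tuple[Frame, Frame], Dict[str, str]]

# Hypothetical verb-object table; the real one came from the ATR database.
TABLE: List[Frame] = [("send", "paper"), ("read", "paper"),
                      ("attend", "conference")]


def extract_pattern_a(table: List[Frame]) -> List[Rule]:
    """Pattern (a): pair every two entries whose object noun is the same.

    The same verb gives Type 1 and a different verb Type 3; Type 2 would
    also need a synonym table, which this sketch does not have.
    """
    return [((f1, f2), {}) for f1, f2 in product(table, repeat=2)
            if f1[1] == f2[1]]


def derive_b_c(rule: Rule) -> List[Rule]:
    """Patterns (b) and (c): replace the second object by 'it' or φ."""
    ((v1, noun), (v2, _)), _ = rule
    return [
        (((v1, noun), (v2, "it")), {"it": noun}),
        (((v1, noun), (v2, "φ")), {"φ": noun}),
    ]


rules_a = extract_pattern_a(TABLE)
rules_all = rules_a + [r for a in rules_a for r in derive_b_c(a)]
print(len(rules_a), len(rules_all))  # 5 15
```

Even in this toy table only 5 of the 9 possible entry pairs share a noun, which hints at why the extracted knowledge stays sparse.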
We have implemented the framework as the context processing
module of a dialogue machine-translation system.
(f1 skeleton) = ; (1) skeleton
{[(f1 PRED) = 'tell<(f1 SUBJ),(f1 OBJ2),(f1 OBJ)>'],
 [(f1 SUBJ) = fe, (fe PRED) = φ], ; (N.B.) (fx PRED) = φ means an ellipsis.
 [(f1 OBJ2) = fa, (fa PRED) = φ],
 [(f1 OBJ) = fb, (fb PRED) = 'number', (fb MOD) = fc, (fc PRED) = 'credit card']}
 ; (N.B.) (fx MOD) = fy means a modifier.
(f2 skeleton) = ; (2) skeleton
{[(f2 PRED) = 'have<(f2 SUBJ),(f2 OBJ)>'],
 [(f2 SUBJ) = fh, (fh PRED) = φ],
 [(f2 OBJ) = fi, (fi PRED) = φ]}
Local cohesive knowledge (1)
(a) tell<X1,number(credit card),Z1>, have<X2,credit card>.
(b) tell<X1,number(credit card),Z1>, have<X2,it>,
=> 'it' = 'credit card'.
(c) tell<X1,number(credit card),Z1>, have<X2,φ>,
=> φ = 'credit card'.
Local cohesive knowledge (2) ; (N.B.) ↑ni is a meta-variable.
(1) Constraints for skeletons
(↑n1 PRED) =c 'tell<(↑n1 SUBJ),(↑n1 OBJ2),(↑n1 OBJ)>',
(↑n1 OBJ) =c ↑n3,
(↑n3 PRED) =c 'number',
(↑n3 MOD) =c ↑n4, (↑n4 PRED) =c 'credit card',
(↑n2 PRED) =c 'have<(↑n2 SUBJ),(↑n2 OBJ)>',
(2) Interpretations for anaphoras and ellipses
(a) (↑n2 OBJ) =c ↑n5, (↑n5 PRED) =c 'credit card'
or (b) (↑n2 OBJ) =c ↑n5, (↑n5 PRED) =c 'it'
=> (↑n5 ANAPHORA) = 'credit card'.
or (c) (↑n2 OBJ) =c ↑n5, (↑n5 PRED) =c φ
=> (↑n5 ELLIPSIS) = 'credit card'.
(N.B.) Local cohesive knowledge (2) is the LFG representation of
local cohesive knowledge (1); the two are equivalent. The LFG
style was used in the implementation.
Figure 5 Examples of a pair of skeletons
and their local cohesive knowledge.
The system is built on an LFG-based machine-translation
system (3), which has 200 grammar rules and more than 3000
words and translates Japanese sentences into English.
It was implemented in Quintus Prolog on a SUN-4 system,
and its program size is 3.4 MB.
An example is shown in Figure 5. 
(1) kurejitto kaado (credit card) no (of) namae 
(name) o (OBJ) oshiete kudasai (Could you tell). 
[= Could you tell me the name of your credit card?]
(2) motte (have) inai (not) no desu ga (copula).
[= I don't have a credit card.]
In sentence (2) there is an ellipsis. It means
"kurejitto kaado (credit card)" and points to the modifier in
the previous sentence, "kurejitto kaado no namae (name
of credit card)". In this approach, the analysis yields
the skeletons of the two sentences shown in
Figure 5. The pair of skeletons satisfies the local
cohesive knowledge (c) in Figure 5, so the ellipsis is
interpreted as 'credit card'.
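In this example the antecedent is the modifier of the first sentence's object, not the object itself, which appears to fall under Type 9 (different verbs, same noun with modifier). A minimal sketch of local cohesive knowledge (1)(b)/(c) from Figure 5, assuming skeletons are dicts with the modifier stored as a plain string under "MOD"; `ModifierRule` is a hypothetical name, and the "ANAPHORA"/"ELLIPSIS" keys mirror the LFG version (2) in Figure 5.

```python
from dataclasses import dataclass
from typing import Any, Dict

ELLIPSIS = "φ"


@dataclass
class ModifierRule:
    """verb1<X1, head(modifier), Z1>, verb2<X2, it or φ> => referent = modifier."""

    verb1: str
    head: str
    modifier: str
    verb2: str

    def apply(self, prev: Dict[str, Any], curr: Dict[str, Any]) -> bool:
        """Check the constraints; on success record the referent in curr's OBJ."""
        obj1 = prev.get("OBJ", {})
        if (prev.get("PRED") != self.verb1 or obj1.get("PRED") != self.head
                or obj1.get("MOD") != self.modifier
                or curr.get("PRED") != self.verb2):
            return False
        obj2 = curr.get("OBJ", {})
        if obj2.get("PRED") == "it":
            obj2["ANAPHORA"] = self.modifier      # pattern (b)
            return True
        if obj2.get("PRED") == ELLIPSIS:
            obj2["ELLIPSIS"] = self.modifier      # pattern (c)
            return True
        return False


rule = ModifierRule("tell", "number", "credit card", "have")
f1 = {"PRED": "tell", "SUBJ": {"PRED": ELLIPSIS}, "OBJ2": {"PRED": ELLIPSIS},
      "OBJ": {"PRED": "number", "MOD": "credit card"}}
f2 = {"PRED": "have", "SUBJ": {"PRED": ELLIPSIS}, "OBJ": {"PRED": ELLIPSIS}}
print(rule.apply(f1, f2), f2["OBJ"])
# True {'PRED': 'φ', 'ELLIPSIS': 'credit card'}
```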
6. CONCLUSIONS
To build a "contextual robustness" system, we
proposed a context-processing mechanism that analyzes
the context with "local cohesive knowledge". To apply
the model to a machine-translation system, the
knowledge must be produced effectively. We therefore
defined 18 types of "local cohesive knowledge" and used
this definition to extract knowledge from a linguistic
database almost automatically. Some of the 18 types were
implemented in a machine translation system. The other
types were not generated because they include
synonyms. In the future, we will construct them with a
thesaurus and also extend the context processing
algorithm to handle more complicated phenomena such
as parallel phrases.
ACKNOWLEDGEMENTS
The author would like to thank Akira Kurematsu, president of
ATR, and Hitoshi Iida for their constant encouragement. Thanks also to
Tsuyoshi Morimoto, Kentarou Ogura, Kazuo Hashimoto, Naomi
Inoue and Naoko Shinozaki for the ATR Linguistics Database.

References

(1) Kaplan, R.M. & Bresnan, J. "Lexical-Functional Grammar: A
Formal System for Grammatical Representation". In: Bresnan, J. (ed.)
"The Mental Representation of Grammatical Relations", The MIT
Press, Cambridge, Massachusetts, pp. 173-281 (1982).

(2) Ogura, K., Hasimoto, K. & Morimoto, T. "An Integrated Linguistic
Database Management System", ATR Technical Report, TR-I-0036
(1988).

(3) Kudo, I. & Nomura, H. "Lexical-Functional Transfer: A Transfer
Framework in a Machine Translation System Based on LFG",
Proceedings of the 11th International Conference on Computational
Linguistics, Bonn, August, pp. 112-114 (1986).
