Automatic Detection of Discourse Structure
by Checking Surface Information in Sentences
Sadao Kurohashi and Makoto Nagao
Dept. of Electrical Engineering, Kyoto University
Yoshida-honmachi, Sakyo, Kyoto, 606, Japan
Abstract 
In this paper, we propose an automatic method for detecting discourse structure using a variety of clues existing in the surface information of sentences. We have considered three types of clue information: clue expressions, occurrence of identical/synonymous words/phrases, and similarity between two sentences. Experimental results have shown that, in the case of scientific and technical texts, a considerable part of the discourse structure can be estimated by incorporating the three types of clue information, without performing sentence understanding processes, which require giving knowledge to computers.
1 Introduction 
To understand a text or dialogue, one must track the discourse structure (DS), specifying how sentences are combined and what kind of relations (coherence relations) they have. Work on DS has mainly focused on such questions as what kind of knowledge should be employed, and how inference may be performed based on such knowledge (e.g., Grosz and Sidner 1986; Hobbs 1985; Zadrozny and Jensen 1991). However, judging from the current status of work both on automatic extraction and on manual coding of knowledge, detailed knowledge with broad coverage that is available to computers is unlikely to be constructed for the present. On the other hand, the recent rapid increase in the amount of on-line texts has forced us to analyze not only isolated sentences but also discourses using presently available knowledge.
We propose here an automatic method for estimating DS in scientific and technical texts by a variety of keys existing in the surface information of sentences. One important key for DS is clue words (e.g., Cohen 1984; Grosz and Sidner 1986; Reichman 1985). Furthermore, we have considered two more important clues. One is the occurrence of identical/synonymous words/phrases for detecting topic chaining or topic-dominant chaining relation (Polanyi and Scha 1984); the other is a certain similarity between two sentences for detecting their coordinate relation. The judgment based on such clue information is not absolute but just probable. Therefore, we have incorporated the above mentioned three factors into one evaluation measure to estimate the most plausible DS.
2 Discourse Structure Model 
and Coherence Relations 
Studies of DS have been reported by a large number of researchers (e.g., Cohen 1984; Dahlgren 1988; Grosz and Sidner 1986; Hobbs 1985; Mann 1984; Polanyi and Scha 1984; Reichman 1985; Zadrozny and Jensen 1991). What has been commonly suggested is that the DS resulting from the recursive embedding and sequencing of discourse units has the form of a tree (discourse history parse tree). However, there has been a variety of definitions for discourse units, constituents of the tree, and coherence relations. In this research we have adopted the simplest model in the interest of focusing on how to detect DS automatically. In our model, each sentence is considered a discourse unit, each node of the discourse history parse tree is a sentence, and each link is a coherence relation. [1]
Coherence relations existing in a text, as Reichman (1985) pointed out, greatly depend on the genre of the text: narrative, argument, news article, conversation, and scientific report. Among a number of the coherence relations suggested so far, we selected the following set of relations, which accounted for intuitions concerning our target texts, namely scientific and technical texts (Si denotes the former sentence and Sj the latter).
List : Si and Sj involve the same or similar events or states, or the same or similar important constituents, like s4-3 and s4-6 in Appendix.
Contrast : Si and Sj involve contrasting events or states, or contrasting important constituents.
Topic chaining : Si and Sj have distinct predications about the same topic, like s1-13 and s1-19.
Topic-dominant chaining : A dominant constituent apart from a given topic in Si becomes a topic in Sj, like s4-4 and s4-5.
Elaboration : Sj gives details about a constituent introduced in Si, like s1-16 and s1-17.
Reason : Sj is the reason for Si, like s1-13 and s1-14.
Cause : Sj occurs as a result of Si, like s1-17 and s1-18.
[1] At present, we regard a sentence marked off by a period as a discourse unit. Coherence relations also exist between clauses in a sentence. We think our approach of examining surface clue information can be adapted to extract their relations, and we intend to extend our system to handle them.
[Figure 1: Examples of discourse structures. Panels (a) and (b) show discourse trees whose links are labeled with coherence relations such as topic chaining, topic-dominant chaining, elaboration and cause.]
Change : The event or state in Si changes in Sj (usually as time passes).
Exemplification-present : An example of the event, state or constituent in Si is introduced in Sj, like s1-13 and s1-16.
Exemplification-explain : An example of the event, state or constituent in Si is explained in Sj.
Question-answer : Sj is the answer to the question in Si, like s4-1 and s4-2.
The DSs for the sample text in Appendix are shown in Figure 1.
As in many previous approaches, we also make the following assumption in the DS model: a new sentence coming in can be connected to a node on the right-most edge in the DS tree (hereafter, we call a new sentence an NS, and a possible connected sentence on the right edge in the DS tree a CS: Figure 2). This means that, after detailed explanations for one topic terminate and a new topic is introduced, details of the old topic are hidden in inner nodes and are no longer referred to.
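
The right-edge assumption can be illustrated with a small sketch. The code is not taken from the paper: the `Node` class and the `right_edge` function are hypothetical names, and the tree is a plain parent/children structure. It lists the candidate CSs for an incoming NS by walking from the starting node down through the newest child at each level.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One sentence in the DS tree (hypothetical structure for illustration)."""
    label: str
    relation: Optional[str] = None  # relation linking this node to its parent
    children: List["Node"] = field(default_factory=list)

    def attach(self, child: "Node", relation: str) -> "Node":
        child.relation = relation
        self.children.append(child)
        return child


def right_edge(root: Node) -> List[Node]:
    """Return the nodes on the right-most edge, from the root downward."""
    edge = [root]
    while edge[-1].children:
        edge.append(edge[-1].children[-1])  # always follow the newest child
    return edge


# Toy tree: once s3 is attached to the starting node, s1 and s2 are hidden.
start = Node("start")
s1 = start.attach(Node("s1"), "start")
s2 = s1.attach(Node("s2"), "elaboration")
before = [n.label for n in right_edge(start)]
s3 = start.attach(Node("s3"), "start")
after = [n.label for n in right_edge(start)]
```

In this toy tree, s1 and s2 are candidate CSs until s3 opens a new segment; afterwards only the starting node and s3 remain on the right edge.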
3 Automatic Detection of Discourse Structure
3.1 Outline 
Considering our DS model, what the DS analysis should do is clear: for each NS, it tries to find the correct CS and the correct relation between them. In order to estimate them, we have directed our attention to three types of clue information: 1) clue expressions indicating some relations, 2) occurrence of identical/synonymous words/phrases in topic chaining or topic-dominant chaining relation, 3) similarity between two sentences in list or contrast relation. By the method described later we can transform such information into reliable scores for some relations. As an NS comes in, for each CS we calculate reliable scores for all relations by examining the above three types of clues. As a final result, we choose the CS and the relation having the maximum reliable score (Figure 2).

[Figure 2: Ranking relations to CSs by three types of clue information.]
As an initial state, a DS has one node, the starting node. We always give a certain score for the special relation, start, between an NS and the starting node. When no other relation to any CS has a larger score for an NS, it is connected to the starting node by the start relation. This means that the NS is the starting sentence of a new large segment, like a paragraph or section, in the DS.
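
The selection step, including the fallback to the start relation, can be sketched as follows. This is an illustration under stated assumptions rather than the system described here: the function name `choose_attachment`, the input layout and the value of `START_SCORE` are all hypothetical (the text only says that "a certain score" is given to the start relation).

```python
from typing import Dict, Tuple

START_SCORE = 10  # assumed value; the paper does not state the actual score


def choose_attachment(
    scores: Dict[str, Dict[str, float]],
    start_score: float = START_SCORE,
) -> Tuple[str, str, float]:
    """Pick the (CS, relation) pair with the maximum reliable score.

    `scores` maps each candidate CS label to its per-relation scores.
    The start relation to the starting node is kept unless some other
    pair has a strictly larger score.
    """
    best = ("starting node", "start", start_score)
    for cs, by_relation in scores.items():
        for relation, score in by_relation.items():
            if score > best[2]:
                best = (cs, relation, score)
    return best


example = {
    "s1": {"topic chaining": 28, "list": 12},
    "s2": {"reason": 20},
}
picked = choose_attachment(example)
fallback = choose_attachment({"s1": {"list": 4}})
```

With these made-up scores, the NS is attached to s1 by topic chaining in the first call, and to the starting node in the second call because no other score exceeds the assumed start score.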
3.2 Detection of Clue Expressions
We prepared heuristic rules for finding clue expressions by pattern matching and relating them to proper relations with reliable scores. A rule consists of the following parts:
• condition for rule application :
    rule applicable range (how far in the sequence of CSs the rule can be applied to)
    relation of CS to its previous DS
    dependency structure pattern for CS
    dependency structure pattern for NS
• corresponding relation and reliable score.
Patterns for CS and NS are matched not against word sequences but against the dependency structures of both sentences. [2] We use a powerful pattern matching facility for dependency structures, in which a wild card matching any partial dependency structure, regular expressions, AND-, OR-, NOT-operators, etc. are available (Murata and Nagao 1993). We apply each rule to the pair of a CS and an NS. If the condition of the rule is satisfied, the specified reliable score is given to the corresponding relation between the CS and the NS.

[2] Input to our system is a sequence of parsed sentences, i.e. dependency structures, produced by our parser (Kurohashi and Nagao 1992a). In Japanese the dependency structure of a sentence consists of head/modifier relations between bunsetsus, each of which is composed of a content word and suffix words.

Table 1: Examples of heuristic rules for clue expressions.

Rule-1
    range : 1
    relation of CS : *
    CS : *
    NS : NAZE-NARA (because) --> *
    relation : reason
    score : 20

Rule-2
    range : *
    relation of CS : *
    CS : * --> X --> *
    NS : * --> X-NO (of) --> REI (example) --> *
    relation : exemplification-present
    score : 30

Rule-3
    range : 1
    relation of CS : exemplification-present
    CS : *
    NS : *
    relation : elaboration
    score : 25

"A --> B" denotes a head/modifier relation, where "A" depends on "B".
"*" denotes a wild card.
For example, Rule-1 in Table 1 gives a score to the reason relation between two adjoining sentences (note the rule applicable range is '1') if the NS starts with the expression "NAZE-NARA (because)". Rule-2 in Table 1 is applied not only to the neighboring CS but also to farther CSs, by specifying the occurrence of identical words ("X") in the condition. We can also specify the relation of CS to its previous DS as a condition, like Rule-3 in Table 1. This rule considers the fact that when some examples are introduced by the exemplification-present relation, detailed explanations for them often follow.
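
How such rules are applied to a CS/NS pair can be sketched as below. This is a deliberately simplified illustration, not the pattern matcher used in the paper: sentences are reduced to word lists instead of dependency structures, patterns are plain Python predicates, and the names `Sentence`, `Rule` and `score_pair` are hypothetical. Adding up the scores of several rules that support the same relation is also an assumption. The relations and scores of the two sample rules follow Rule-1 and Rule-3 of Table 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Sentence:
    words: List[str]                # surface words, standing in for a dependency structure
    relation: Optional[str] = None  # relation of this sentence to its previous DS


@dataclass
class Rule:
    name: str
    max_range: Optional[int]        # None plays the role of "*" (any distance)
    cs_relation: Optional[str]      # None plays the role of "*"
    cs_pattern: Callable[[Sentence], bool]
    ns_pattern: Callable[[Sentence], bool]
    relation: str
    score: int

    def applies(self, cs: Sentence, ns: Sentence, distance: int) -> bool:
        if self.max_range is not None and distance > self.max_range:
            return False
        if self.cs_relation is not None and cs.relation != self.cs_relation:
            return False
        return self.cs_pattern(cs) and self.ns_pattern(ns)


def anything(_: Sentence) -> bool:
    """Wild card: matches every sentence."""
    return True


rule1 = Rule("Rule-1", 1, None, anything,
             lambda ns: ns.words[:1] == ["NAZE-NARA"], "reason", 20)
rule3 = Rule("Rule-3", 1, "exemplification-present", anything, anything,
             "elaboration", 25)


def score_pair(rules: List[Rule], cs: Sentence, ns: Sentence,
               distance: int) -> Dict[str, int]:
    """Collect the scores that the matching rules give to each relation."""
    result: Dict[str, int] = {}
    for rule in rules:
        if rule.applies(cs, ns, distance):
            result[rule.relation] = result.get(rule.relation, 0) + rule.score
    return result


cs = Sentence(["..."], relation="exemplification-present")
ns = Sentence(["NAZE-NARA", "..."])
near = score_pair([rule1, rule3], cs, ns, distance=1)
far = score_pair([rule1, rule3], cs, ns, distance=2)
```

Both sample rules have range 1, so they fire for the adjoining CS (`near`) and give nothing for a CS two sentences back (`far`).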
3.3 Detection of Word/Phrase Chain 
In general a sentence can be divided into two parts: a topic part and a non-topic part. When two sentences are in a topic chaining relation, the same topic is maintained through them. Therefore, the occurrence of identical/synonymous words/phrases (the word/phrase chain) in topic parts of two sentences supports this relation. In the case of topic-dominant chaining relation, a dominant constituent introduced in a non-topic part of a prior sentence becomes a topic in a succeeding sentence. So, the word/phrase chain from a non-topic part of a prior sentence to a topic part of a succeeding sentence supports this relation.

However, since there are many clues for an NS supporting other relations to some CSs, we must not only find such word/phrase chains but also give some reliable score to the topic chaining or topic-dominant chaining relation. In order to do this, we give scores to words/phrases in topic and non-topic parts according to the degree of their importance in sentences; we also give scores to the matching of identical/synonymous words/phrases according to the degree of their agreement. Then we give these relations the sum of the scores of two chained words/phrases and the score of their matching (Figure 3).
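
The summation described above can be sketched as follows. The code is illustrative only: the function names, the dictionaries of word scores and the numeric values are hypothetical, and taking the best chain when several chains exist is an assumption not stated in the text.

```python
from typing import Dict, Tuple

Part = Dict[str, int]  # word/phrase -> importance score within that part


def best_chain(source: Part, target: Part,
               match_score: Dict[Tuple[str, str], int]) -> int:
    """Best sum of two word scores plus their matching score, or 0 if no chain."""
    best = 0
    for w1, s1 in source.items():
        for w2, s2 in target.items():
            m = match_score.get((w1, w2))
            if m is not None:
                best = max(best, s1 + s2 + m)
    return best


def chaining_scores(cs_topic: Part, cs_non_topic: Part, ns_topic: Part,
                    match_score: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Score both chaining relations for one CS/NS pair."""
    return {
        # chain from the CS topic part to the NS topic part
        "topic chaining": best_chain(cs_topic, ns_topic, match_score),
        # chain from the CS non-topic part to the NS topic part
        "topic-dominant chaining": best_chain(cs_non_topic, ns_topic, match_score),
    }


# Made-up scores: "fusion" is new information in the CS and the topic of the NS.
match = {("fusion", "fusion"): 5, ("process", "process"): 5}
result = chaining_scores(
    cs_topic={"process": 10},
    cs_non_topic={"fusion": 11},
    ns_topic={"fusion": 10},
    match_score=match,
)
```

Here only the non-topic word of the CS chains to the topic of the NS, so the topic-dominant chaining relation receives 11 + 10 + 5 = 26 and the topic chaining relation receives nothing.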
All of these are done by applying rules consisting of a pattern for a partial dependency structure and a score. For example, by Rule-a and b in Table 2, words in a phrase whose head word is followed by the topic marking postposition "WA" are given some scores as topic parts. A word in a non-topic part in the sentential style "... GA ARU (there is ...)" is given a large score by Rule-c in Table 2, because this word is important new information in the sentence and a topic-dominant chaining relation involving it often occurs. Matching of phrases like "A of B" is given a larger score than that of a word like "A" alone by Rule-d and e in Table 2. [3]

[Figure 3: Scores for topic/topic-dominant chaining. A chain between the topic parts of the CS and the NS with match score 10 gives 28 points to the topic chaining relation; a chain from the non-topic part of the CS to the topic part of the NS with match score 5 gives 30 points to the topic-dominant chaining relation.]
3.4 Calculation of Similarity between 
Sentences 
When two sentences have a list or contrast relation, they have a certain similarity. However, their similarity cannot be detected by rules like the above, which see relatively small blocks in sentences, because it is not a simple similarity but the similarity in the sequence of words and their grammatical structures as a whole.
In order to measure such a similarity, we extended our dynamic programming method for detecting the scope of a coordination in a sentence (Kurohashi and Nagao 1992b). This method can calculate the overall similarity value between two word-strings of arbitrary lengths. First, the similarity value between two words is calculated according to exact matching, matching of their parts of speech, and their closeness in a thesaurus dictionary. Then, the similarity value between two word-strings is calculated roughly by combining the similarity values between words in the two word-strings.

[3] The difficult problem is that authors often use subtly different expressions, not identical words/phrases, for such chains. While some of them can be caught by using a thesaurus and by rules like Rule-f in Table 2, there is a wide range of variety in their differences. Their complete treatment will be a target of our future work.

Table 2: Examples of rules for topic/non-topic parts and matching.

Topic part
    Rule-a    pattern : [* WA]                         score : 10
    Rule-b    pattern : [*] --> * WA                   score : 8

Non-topic part
    Rule-c    pattern : [* GA] --> ARU (there is)      score : 11

Matching
    Rule-d    pattern : X * <-> x *                    score : 5
    Rule-e    pattern : X * --> Y * <-> x * --> y *    score : 8
    Rule-f    pattern : X --> Y * <-> x {NO | NIYORU} (of | by) --> y *
                                                       score : 6

As for rules for topic/non-topic parts, the score is given to the bunsetsu marked by a square (shown here in square brackets). As for rules for matching, "X" and "x" denote identical words or synonymous words from the Japanese thesaurus "Bunrui Goi Hyou". So do "Y" and "y".
While originally we calculated the similarity value between possible conjuncts in a sentence, here we calculate the similarity value between two sentences, a CS and an NS, by this method. This can be done simply by connecting the two sentences and calculating the similarity value between two imitative conjuncts consisting of the two sentences. We give the normalized similarity score between a CS and an NS (divided by their average length) to their list and contrast relations as a reliable score.
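
The general idea of a word-level similarity combined by dynamic programming and then normalized by average sentence length can be sketched as below. This is not the method of Kurohashi and Nagao (1992b): it is a simplified order-preserving alignment, the thesaurus component is omitted, and the names and the word-level scores (2 for an exact match, 1 for a part-of-speech match) are assumptions made for illustration.

```python
from typing import List, Tuple

Word = Tuple[str, str]  # (surface form, part of speech)


def word_similarity(a: Word, b: Word) -> float:
    """Toy word-level score: exact match beats a part-of-speech match."""
    if a == b:
        return 2.0
    if a[1] == b[1]:
        return 1.0
    return 0.0


def sequence_similarity(xs: List[Word], ys: List[Word]) -> float:
    """Best total score of an order-preserving word alignment (dynamic programming)."""
    table = [[0.0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i, x in enumerate(xs, start=1):
        for j, y in enumerate(ys, start=1):
            table[i][j] = max(
                table[i - 1][j],                              # skip a word of xs
                table[i][j - 1],                              # skip a word of ys
                table[i - 1][j - 1] + word_similarity(x, y),  # align x with y
            )
    return table[len(xs)][len(ys)]


def normalized_similarity(cs: List[Word], ns: List[Word]) -> float:
    """Divide the raw similarity by the average length of the two sentences."""
    average_length = (len(cs) + len(ns)) / 2
    return sequence_similarity(cs, ns) / average_length if average_length else 0.0


cs = [("star", "N"), ("can", "AUX"), ("shrink", "V")]
ns = [("shrinkage", "N"), ("can", "AUX"), ("heat", "V")]
value = normalized_similarity(cs, ns)
```

Normalizing by the average length keeps long sentence pairs from receiving high scores merely because they contain more words to align.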
4 Experiments and Discussion 
Experiments on detecting DS were done for nine sections of an article of the popular science journal "Science", translated into Japanese (Vol.17, No.12 "Advanced Computing for Science"; the original is "Scientific American" Vol.257, No.4). For the first three sections, we wrote rules for clue expressions and word/phrase chains, and adjusted their parameters through experimentation. Then we analyzed the remaining six sections by adding rules only for the clue expressions. The analysis results are shown in Table 3. Here the NSs in the text were classified according to their correct relations in connecting to proper CSs. "Success" means that the correct relation and CS were detected for an NS (correct relations and CSs were judged by the authors).
Table 3 shows that many clues exist in a text, so that much of the DS can be guessed without detailed knowledge. In order to construct rules for clue expressions with broad coverage, we need to consult and analyze a large volume of texts. However, in most cases rules for clue expressions can be written exclusively so that they scarcely interfere with each other. In our experiments, the rules added for the remaining six sections had no influence on the analysis of the first three sections.

Table 3: Analysis results.

                        Training text (3 sections)    Test text (6 sections)
Relation                Success    Failure            Success    Failure
Start                      7          1                  6          2
List                      10          1                 15          2
Contrast                   6          1                  2          2
Topic chaining            13          1                 21          5
Topic-dominant c.         10          4                 37         14
Elaboration                9          1                  9          1
Reason                     3          0                  1          0
Cause                      2          0                  6          0
Change                     3          0                  0          0
Exemp.-present             1          0                  0          0
Exemp.-explain             3          0                  2          0
Question-answer            1          0                  1          0
Total                     68          9                100         26
(Success ratio)              (88%)                         (79%)
The text from s1-13 to s1-19 in Appendix was transformed to the structure in Figure 1-a as follows.
s1-14: the clue expression "-DAKARA-DEARU", which means "this is because".
s1-15: the clue expression "-WAKE-DEARU".
s1-16: the clue expression "example of X".
s1-17: the heuristic rule supporting the elaboration relation after the exemplification-present relation.
s1-18: the clue expression "(SONO)KEKKA(-WA) (the result is)", which corresponds to "lead" in semantics.
s1-19: the chain of "synthetic approach".
The text from s4-1 to s4-7 in Appendix was also transformed to the structure in Figure 1-b as follows.
s4-2: the clue expressions "KA" (a suffix indicating an interrogative sentence) in s4-1 and "(the) answer" in s4-2.
s4-3: the chain of "double star".
s4-4: the chain from "shrink" in s4-3 to "this process" in s4-4 (some expressions like "this process" are regarded as matching any verb in a previous sentence).
s4-5: the chain of "nuclear fusion".
s4-6: the large similarity value between s4-3 and s4-6 and the clue expression "similarly".
s4-7: this NS could not be analyzed correctly. A list relation with s4-6 was detected incorrectly because of their similarity value.
In s4-6 and s4-7, while the same word "heat" is used in English, the prior "heat" was translated into "ONDO (temperature)-GA JOUSHOU-SURU (rise)" in Japanese. In order to detect the chain for their topic-dominant chaining relation, we must infer that the rising of temperature produces heat. Such a problem is ignored in this research.
5 Conclusion 
We have proposed a method of detecting DS automatically using surface information in sentences: clue expressions, word/phrase chains, and similarity between sentences. In the case of scientific and technical texts, a considerable part of the DS can be estimated by incorporating the three types of clue information, without performing sentence understanding processes, which require giving knowledge to computers. This approach can be smoothly integrated with current NLP systems dealing with large amounts of text.
References 
Cohen, R. (1984). "A Computational Theory of the Function of Clue Words in Argument Understanding." In Proceedings of 10th COLING.
Dahlgren, K. (1988). Naive Semantics for Natural Language Understanding. Kluwer Academic Publishers.
Grosz, B. J. and Sidner, C. L. (1986). "Attention, Intentions, and the Structures of Discourse." Computational Linguistics, 12-3.
Hobbs, J. R. (1985). On the Coherence and Structure of Discourse. Technical Report No. CSLI-85-37.
Kurohashi, S. and Nagao, M. (1992a). "A Syntactic Analysis Method of Long Japanese Sentences based on Conjunctive Structures' Detection." (in Japanese), IPSJ-WGNL 88-1.
Kurohashi, S. and Nagao, M. (1992b). "Dynamic Programming Method for Analyzing Conjunctive Structures in Japanese." In Proceedings of 14th COLING.
Mann, W. C. (1984). "Discourse Structures for Text Generation." In Proceedings of 10th COLING.
Murata, M. and Nagao, M. (1993). "Determination of referential property and number of nouns in Japanese sentences for machine translation into English." In Proceedings of TMI '93.
Polanyi, L. and Scha, R. (1984). "A Syntactic Approach to Discourse Semantics." In Proceedings of 10th COLING.
Reichman, R. (1985). Getting Computers to Talk Like You and Me. Cambridge, MA, The MIT Press.
Zadrozny, W. and Jensen, K. (1991). "Semantics of Paragraphs." Computational Linguistics, 17-2.
Appendix: Sample Text 
Title: Advanced Computing for Science
("i" and "j" in si-j denote the section number and the sentence number respectively)
s1-13: (The synthetic approach is called for when the fundamental processes of the interactions among the parts of a system are known, but the detailed configuration of the system is not.)
s1-14: (One can attempt to determine the unknown configuration by synthesis: one can survey the possible configurations and work out the consequences of each.)
s1-15: (By carefully matching the observable details of the experimental situation with those consequences, one can choose the configuration that best accounts for the observations.)
s1-16: (A famous example of the synthetic approach from the 19th century is the attempt that was made to understand the observed but unexplained perturbations in the orbit of Uranus.)
s1-17: (Investigators added a hypothetical planet to the solar system and varied the parameters of its orbit until a satisfactory reconstruction of the perturbation was found.)
s1-18: (The work led directly to the discovery of Neptune, found near the predicted position.)
s1-19: (In the past the synthetic approach was limited to comparatively simple situations.)
s4-1: (Why are astronomers interested in this kind of collision?)
s4-2: (The answer lies in the role of double stars in generating "heat.")
s4-3: (In a collision between a double star and a single star, the double star can shrink, transferring energy to the single star and thereby heating the pool of stars around them.)
s4-4: (This process is analogous to nuclear fusion, wherein atomic nuclei collide and fuse into heavier nuclei, releasing energy.)
s4-5: (Nuclear fusion is the same phenomenon that makes the stars, including the Sun, shine.)
s4-6: (Similarly, orbital shrinkage of double stars induced by encounters can heat the core of dense star clusters.)
s4-7: (This heat can balance the losses at the surface of star clusters, where stars boil off continuously.)
