Full-text processing: improving a practical NLP system 
based on surface information within the context 
Tetsuya Nasukawa
IBM Research, Tokyo Research Laboratory
1623-14, Shimotsuruma, Yamato-shi, Kanagawa-ken 242, Japan
nasukawa@trl.vnet.ibm.com
Abstract
Rich information for resolving ambiguities in sentence analysis, including various context-dependent problems, can be obtained by analyzing a simple set of parsed trees of each sentence in a text without constructing a precise model of the context through deep semantic analysis. Thus, processing a group of sentences together makes it possible to improve the accuracy of a [...] machine translation system. In this paper, we describe a simple context model consisting of parsed trees of each sentence in a text, and its effectiveness in handling various problems in NLP such as the resolution of structural ambiguities, pronoun referents, and the focus of focusing subjuncts (e.g., also and only), as well as for adding supplementary phrases to some elliptical sentences.

1 Introduction
[...] key technology for improving the accuracy of text [...] background knowledge, and deep inference mechanisms. [...] It is true that we can always find examples of problems that require common sense and inference mechanisms, such as the classic problems mentioned in (Charniak, 1973), in which the referents of pronouns are not explicitly [...] in the text. However, in a text within a restricted domain, particularly in [...], we observe many context-dependent problems that are solvable without the use of a deep inference mechanism or carefully hand-coded data such as scripts (Schank and Riesbeck, 1981). We therefore tried to develop a practical method that would solve most context-dependent problems and improve the accuracy of text analysis by using a simple mechanism and existing machine-readable data.

To begin with, we developed a framework for processing all sentences in a text simultaneously, so that each sentence can be disambiguated by using information extracted from other sentences within the same text. Without constructing a precise model of the context through deep semantic analysis, our framework refers to a set of parsed trees (results of syntactic analysis) of each sentence in the text as context information. Thus, our context model consists of parsed trees that are obtained by using an existing general syntactic parser. Except for information on the sequence of sentences, our framework does not consider any discourse structure such as the discourse segments, focus space stack, or dominant hierarchy [...] (Grosz and Sidner, 1986). Therefore, our [...] approaches to context processing, and [...] obtaining a perfect analysis. However, by extending the unit of the processing object from one sentence to multiple sentences in a source text, and by using syntactic information on all the other words in the whole text, such as modifiee-modifier relationships and their positions in the text, our framework improves the overall accuracy of a natural language processing system.

We implemented this framework on an English-to-[...] result of word sense disambiguation in one sentence with all the words in the discourse that share the same lemma. Furthermore, by assuming discourse preference, we can obtain clues for determining the modifiees of structurally ambiguous phrases from structural information on all words with the same lemma within the discourse. Moreover, processing a whole text at a time makes it possible to refer to other information such as word frequency and the position of each word, which can be used for resolving pronoun reference and the focus of focusing subjuncts such as also and only.

In this paper, we describe a robust context-processing method, namely, "full-text processing," focusing on its effects on the output of a machine translation system. In the next section, we briefly describe the framework of our method, which uses a simple context model; then, in the following sections, we illustrate its effectiveness with some actual outputs of our English-to-Japanese machine translation system.
2 Framework 
Full-text processing consists of three steps:
1. Generating a context model that consists of parsed trees of each sentence in a source text
2. Refining the context model by assigning a single unified parse tree to each sentence in the text
3. Resolving the problems in each sentence in the context model and generating a final analysis for each sentence in the text
The respective procedures for these steps are described in the following three subsections.
2.1 Generation of a simple context model 
In order to refer to context information that consists of data on multiple sentences in a text, it is essential to construct some context model; the first step of the full-text processing method is therefore to construct a context model by analyzing each sentence in an input text. To avoid any errors that may occur during transformation into any other representations, such as a logical representation, we stayed with surface structures, and to preserve the robustness of this framework, we used only a set of parsed trees as a context model. Thus, each sentence of an input text is processed by a syntactic parser in the first step, and the position of each instance of every lemma, its morphological information, and its modifiee-modifier relationships with other content words are extracted from the parser output and stored to construct a context model, as shown in Figure 1. In addition, if any on-line knowledge resources are available, information extracted from the resources is also stored in the context model. For example, information on synonyms extracted from an on-line thesaurus dictionary and information on word sense and structural disambiguation extracted from an example base, such as one described in (Uramoto, 1991) and (Nagao, 1990), may be added to the context model.
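
As an illustration of the kind of data the first step stores, the following minimal Python sketch keeps, for each lemma, its positions in the text and its modifier-modifiee relationships. The class and field names (Word, ContextModel, positions, relations) and the hand-written head indices are assumptions made for this example; the paper does not specify the data layout, and a real system would take these values from the parser output.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Word:
    surface: str                 # form as it appears in the text
    pos: str                     # part of speech reported by the parser
    base: str                    # lemma
    head: Optional[int] = None   # index of the modifiee in the same sentence


@dataclass
class ContextModel:
    # One list of Word objects per sentence, in text order.
    sentences: List[List[Word]] = field(default_factory=list)
    # lemma -> list of (sentence index, word index) positions
    positions: Dict[str, List[Tuple[int, int]]] = field(
        default_factory=lambda: defaultdict(list))
    # (modifier lemma, modifiee lemma) -> number of occurrences
    relations: Dict[Tuple[str, str], int] = field(
        default_factory=lambda: defaultdict(int))

    def add_sentence(self, words: List[Word]) -> None:
        s_idx = len(self.sentences)
        self.sentences.append(words)
        for w_idx, word in enumerate(words):
            self.positions[word.base].append((s_idx, w_idx))
            if word.head is not None:
                self.relations[(word.base, words[word.head].base)] += 1


# The first two sentences of Figure 1, with head indices written by hand.
model = ContextModel()
model.add_sentence([Word("John", "N", "John", head=1),
                    Word("likes", "V", "like"),
                    Word("apples", "N", "apple", head=1)])
model.add_sentence([Word("Tom", "N", "Tom", head=2),
                    Word("also", "ADV", "also", head=2),
                    Word("likes", "V", "like"),
                    Word("apples", "N", "apple", head=2)])
```

Indexing by lemma rather than by surface form is what lets later steps look up every occurrence of a word in the text, whatever its inflection.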
2.2 Refinement of the context model 
In the first step, a syntactic parser may not always generate a single unified parse tree for each sentence in the source text. A syntactic parser with general grammar rules is often unable to analyze not only sentences with grammatical errors and ellipses, but also long sentences, owing to their complexity.¹ Thus, it is indispensable to establish a correct analysis for such a sentence. Information extracted from complete parses of well-formed sentences² in a context model can be used to complete incomplete parses, in the form of partially parsed chunks that a bottom-up parser outputs for ill-formed sentences, by using a previously described method (Nasukawa, 1995).

¹ In texts from a restricted domain, such as computer manuals, most sentences are grammatically correct. However, even a well-established syntactic parser usually fails to generate a unified parsed structure for about 10 to 20 percent of all the sentences in such texts, and the failure in syntactic analysis leads to a failure in the final output of an NLP system.

Context = {Sentence 1, Sentence 2, ..., Sentence n}
Sentence i = {Word i-1, Word i-2, ..., Word i-j}

Sentence 1: John likes apples.
  Word1-1 [John]     POS : N    BASE : John ...
  Word1-2 [likes]    POS : V    BASE : like ...
  Word1-3 [apples]   POS : N    BASE : apple ...
Sentence 2: Tom also likes apples.
  Word2-1 [Tom]      POS : N    BASE : Tom ...
  Word2-2 [also]     POS : ADV  BASE : also ...
  Word2-3 [likes]    POS : V    BASE : like ...
  Word2-4 [apples]   POS : N    BASE : apple ...
Sentence 3: He also likes oranges.
  Word3-1 [He]       POS : PN   BASE : he ...
  Word3-2 [also]     POS : ADV  BASE : also ...
  Word3-3 [likes]    POS : V    BASE : like ...
  Word3-4 [oranges]  POS : N    BASE : orange ...
Figure 1: Example of a context model

On the other hand, for some sentences in a text, such as Time flies like an arrow, a syntactic parser may generate more than one parse tree, owing to the presence of words that can be assigned to more than one part of speech, or to the presence of complicated coordinate structures, or for various other reasons. In attempting to select the correct parse of such a sentence, one can use the types of the previous and subsequent sentences or phrases (such as sentence, noun phrase, verb phrase, and so on) and the modifier-modifiee patterns in the context model.
Therefore, in the second step, the context model generated in the first step is refined by referring to information in the context model. First, the most preferable candidate parses are selected for sentences with multiple parses by referring to information on each sentence in the context model for which a parser generated a single unified parse. Then, partial parses of ill-formed sentences are completed by referring to information on well-formed sentences in the context model.

² In this paper, a "well-formed sentence" means one that is parsed as one or more than one unified structure, and an "ill-formed sentence" means one that cannot be parsed as a unified structure.

The algorithm for multiple parse selection based on the context model is as follows:
1. In each candidate parse of a sentence with multiple candidate parses, assign a score for each modifier-modifiee relationship that is found in the context model, and add up the scores to assign a preference value to the candidate parse.
2. Select the parse or parses with the highest preference value. If more than one parse has the highest preference value, go to the next step with those parses; otherwise, leave this procedure.
3. Assign a preference value to each remaining candidate parse that has the same type of root node (such as noun phrase, verb phrase, or sentence) as the parse of the preceding sentence or the next sentence.
4. Select the parse or parses with the highest preference value. If more than one parse has the highest preference value, go to the next step with those parses; otherwise, leave this procedure.
5. Assign a preference value to each remaining candidate parse based on heuristic rules that assign scores to structures according to their grammatical preferability.
6. Select the parse or parses with the highest preference value. If more than one parse has the highest preference value, select the first parse in the list of the remaining candidate parses.
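
The six steps above amount to three successive tie-breaking filters, which can be sketched in Python as follows. This is an illustrative sketch, not the system's implementation: the dictionary layout of a candidate parse ("root", "relations", "heuristic"), the use of relationship counts as scores, and the numeric heuristic values are all assumptions made for the example.

```python
def keep_best(candidates, score):
    """Return the candidates that share the highest score."""
    scores = [score(c) for c in candidates]
    best = max(scores)
    return [c for c, s in zip(candidates, scores) if s == best]


def select_parse(candidates, context_relations, neighbor_roots):
    """Pick one parse from a non-empty list of candidate parses.

    candidates: dicts with "root" (root node type), "relations"
        (list of (modifier lemma, modifiee lemma) pairs) and
        "heuristic" (grammatical preferability score).
    context_relations: (modifier, modifiee) -> count in the context model.
    neighbor_roots: root types of the preceding and next sentences.
    """
    # Steps 1-2: prefer parses whose relationships occur in the context.
    remaining = keep_best(
        candidates,
        lambda c: sum(context_relations.get(r, 0) for r in c["relations"]))
    if len(remaining) == 1:
        return remaining[0]
    # Steps 3-4: prefer the root type of a neighboring sentence.
    remaining = keep_best(
        remaining, lambda c: 1 if c["root"] in neighbor_roots else 0)
    if len(remaining) == 1:
        return remaining[0]
    # Steps 5-6: fall back on heuristic scores, then on list order.
    remaining = keep_best(remaining, lambda c: c["heuristic"])
    return remaining[0]


# Two readings of "Time flies like an arrow" (relations written by hand).
verb_like = {"root": "S", "heuristic": 0.6,
             "relations": [("time", "fly"), ("fly", "like"), ("arrow", "like")]}
verb_fly = {"root": "S", "heuristic": 0.4,
            "relations": [("time", "fly"), ("arrow", "fly")]}

with_context = select_parse([verb_like, verb_fly], {("arrow", "fly"): 1}, {"S"})
without_context = select_parse([verb_like, verb_fly], {}, {"S"})
```

With an empty context model, the first two filters leave both readings tied, so the choice falls to the heuristic score, which corresponds to the no-context baseline reported in Section 3.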
The procedure of completing partial parses of an ill-formed sentence consists of two steps:
1. Inspecting and restructuring of each partial parse
   The part of speech and the modifiee-modifier relationships with other words are inspected for each word in a partial parse. If the part of speech and the modifiee-modifier relationships with other words are different from those in the context model, the partial parse is restructured according to the information in the context model.
2. Joining of partial parses
   If the partial parses were not unified into a single structure in the previous step, they are joined together on the basis of modifier-modifiee relationship patterns in the context model so that a unified parse is obtained.
2.3 Problem resolution for each sentence in 
the context model 
Finally, in the third step, each sentence in the context model is analyzed individually, and its ambiguities and context-dependent problems are resolved by referring to information on other sentences in the context model. The next section describes the procedures for problem resolution, and explains their effectiveness in improving machine translation output.
3 Effectiveness 
The accuracy of syntactic analysis may be improved by refinement of the context model in the second step of the procedure. For example, in an experiment on 244 sentences from a chapter of a computer manual, in which we attempted to select the correct parse of a sentence from multiple candidate parses, correct parses were selected for 89.1% of 110 multiple parsed sentences by using information in the context model, whereas the success rate obtained when the context model contained no information was 74.5%. In our experiment on ill-formed sentences in technical documents, in more than half of the incompletely parsed sentences, the partial parses were joined into a single structure by using information in the context model.
However, after the second step, ambiguities in each sentence are kept unresolved in the context model. Thus, we need to resolve problems in each sentence in the context model individually.
In this section, we describe how the accuracy of sentence analysis in other problems is improved by referring to the simple context model, and how the results are reflected in improved machine translation outputs.
3.1 Resolving the focus of focusing 
subjuncts 
Resolving the focus of focusing subjuncts such as also and only is a typical context-dependent problem that requires information on the previous context. Focusing subjuncts draw attention to a part of a sentence that often represents new information. Consider the second sentence, Tom also likes apples, in Figures 1 and 2. In this sentence, the scope of also can be Tom, likes, the entire predicate (the whole sentence except the subject Tom), or apples, according to the previous context. In this case, the preceding sentence, John likes apples, has the structure A likes B, whereas sentence (2) has the structure X also likes B, where B and the predicate likes are identical. The comparison of these two structures indicates that the new information X (Tom) is the scope of also in sentence (2).
The focus of focusing subjuncts is resolved by means of the following algorithm:
1. Find among the previous sentences in the context model one that contains expressions morphologically identical with those in the sentence containing the focusing subjunct.
2. Compare each candidate focus word or phrase in the sentence containing the focusing subjunct with words or phrases in the sentence extracted in step 1.
3. Drop any morphologically identical words or phrases as candidates for the focus, and select the remainder as the focus of the focusing subjunct. If more than one candidate remains, take the default interpretation that would be used if there were no context information.
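
A minimal Python sketch of these three steps is given below. It rests on several simplifying assumptions that are not stated in the paper: candidates are single lemmas rather than phrases, "morphologically identical" is approximated by lemma equality, the most recent matching sentence is the one chosen in step 1, and pronouns are assumed to have been replaced by their referents beforehand.

```python
def resolve_focus(candidates, previous_sentences, default):
    """Return the focus of a focusing subjunct such as "also".

    candidates: lemmas that could be the focus, in sentence order.
    previous_sentences: one set of lemmas per earlier sentence, oldest first.
    default: the interpretation used when the context does not decide.
    """
    # Step 1: find the most recent sentence sharing an expression.
    for prev in reversed(previous_sentences):
        if not any(c in prev for c in candidates):
            continue
        # Steps 2-3: drop the candidates that already appeared there.
        remaining = [c for c in candidates if c not in prev]
        return remaining[0] if len(remaining) == 1 else default
    return default


s1 = {"John", "like", "apple"}           # John likes apples.
s2 = {"Tom", "also", "like", "apple"}    # Tom also likes apples.

# Sentence (2): Tom also likes apples.
focus_2 = resolve_focus(["Tom", "like", "apple"], [s1], default="like")
# Sentence (3): He also likes oranges, with He already resolved to Tom.
focus_3 = resolve_focus(["Tom", "like", "orange"], [s1, s2], default="like")
```

Under these assumptions the sketch selects Tom for sentence (2) and orange for sentence (3); when no earlier sentence shares an expression, or more than one candidate survives, it returns the default interpretation.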
Figure 2 shows the translation outputs of our system with and without information provided by context processing. As shown in this figure, without the context information, also modifies the predicate like by default in both sentences (2) and (3). In contrast, when context processing is applied, the focus of also is determined to be Tom in sentence (2) and orange in sentence (3).
In our analysis of computer manuals, most nouns were repeated with the same expressions unless they were replaced by pronouns or definite expressions such as this, that, and the. On the other hand, predicates were sometimes repeated with different expressions. For example:
A has B. → A also includes C.
A contains B. → C is also included in A.
In this case, information on synonyms and derivatives extracted from on-line dictionaries can be used to examine the correspondence between two words.

[Figure 2: Example of translation (I). For each of the sentences (1) John likes apples, (2) Tom also likes apples, and (3) He also likes oranges, the figure shows the dependency structure and the Japanese translation; (1) is shown once (with and without context), while (2) and (3) are shown both without and with context.]

3.2 Resolving pronoun referents 
Pronoun resolution is another typical context-dependent problem, since the referent of a pronoun is not always included in the same sentence. Our context model is used to select candidate noun phrases for a pronoun referent. Furthermore, information on word frequency and modifier-modifiee relationships extracted from the context model improves the accuracy with which the correct referent is selected from the candidate noun phrases, as shown in a previous paper (Nasukawa, 1994). By applying heuristic rules according to which a candidate that has been frequently repeated in the preceding sentences and a candidate that modifies the morphologically identical predicates as the pronoun in the same context are preferred, we obtained a success rate of 93.8% in pronoun resolution.
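
The two heuristics can be illustrated with a small Python sketch that ranks candidate referents. The way the two preferences are combined (predicate match first, then frequency) is an assumption for this example, since the text does not give the weighting, and the counts used below are made-up values rather than figures from the experiment.

```python
def rank_referents(candidates, pronoun_predicate, lemma_counts, relations):
    """Order candidate referents from most to least preferred.

    candidates: lemmas of the candidate noun phrases.
    pronoun_predicate: lemma of the predicate that the pronoun modifies.
    lemma_counts: lemma -> occurrences in the preceding sentences.
    relations: (modifier lemma, modifiee lemma) -> count in the context.
    """
    def score(candidate):
        # 1 if the candidate modifies the same predicate elsewhere.
        same_predicate = int(relations.get((candidate, pronoun_predicate), 0) > 0)
        frequency = lemma_counts.get(candidate, 0)
        return (same_predicate, frequency)

    return sorted(candidates, key=score, reverse=True)


# "The system takes the job from the job queue and runs it."
ranking = rank_referents(
    candidates=["system", "queue", "job"],
    pronoun_predicate="run",
    lemma_counts={"job": 5, "system": 3, "queue": 2},   # illustrative counts
    relations={("job", "run"): 1},                      # "a job to be run"
)
```

Here job is ranked first because it both modifies run elsewhere in the context and is the most frequent candidate.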
However, the results of pronoun resolution may not be explicitly reflected in the output of a machine translation system, since most languages have corresponding anaphoric expressions, and use of the corresponding anaphoric expression in the translation output has the advantage of avoiding misinterpretations caused by misresolution of pronoun referents, even if the probability of misinterpretation is less than 10%. Thus, in Figure 2, He in sentence (3) is translated as the Japanese pronoun kare, although its referent is correctly resolved as Tom. Even so, correct resolution of a pronoun referent is important for disambiguating the word sense of a predicate modified by the pronoun.³ In addition, if the positions of a pronoun and its referent noun phrase are reversed in the translation of a complex sentence where an initial main clause in a source-language sentence comes after the subordinate clause in the target language, the referent noun phrase should be replaced with the pronoun, to avoid cataphoric reference. For example, the English sentence

The dog will eat your cake, if you don't have it quickly.

should be translated as

Kimi [you] ga sono keiki [the cake] wo [...] [quickly] tabe nai [don't eat] nara, sono inu [the dog] ga tabete shimaimasu yo [will eat].⁴

Since in the translated Japanese sentence the subordinate clause, if you don't have it quickly, comes before the main clause, The dog will eat your cake, the pronoun it in the subordinate clause must be resolved in order to generate a natural Japanese sentence. Moreover, the word sense of have in the subordinate clause cannot be selected without information on the referent of the pronoun it.

³ In fact, the result of pronoun resolution for sentence (3) of Figure 2, in which Tom is selected as the referent of He, is reflected in the translation of the predicate like. Because of the lack of a semantic feature human for the lexical entries Tom and John in our dictionary at the time of this translation, different word senses for animate subjects and non-animate subjects were selected for the verb like, and the verb like was rendered differently in the translations with and without context.
⁴ This translation was not produced by our system.

3.3 Lexical and Structural disambiguation
In a consistent text, polysemous words within a discourse tend to have the same word sense (Gale et al., 1992; Nasukawa, 1993). Thus, by applying discourse constraint in such a manner that polysemous words with the same lemma within a context have the same word sense, a result of word sense disambiguation applied in one sentence can be shared with all other words in the context that have the same lemma. Furthermore, by assuming discourse preference, namely, a tendency for each word to modify or be modified by similar words within a discourse, structural information on all other words with the same lemma within the discourse provides a clue for determining the modifiees of structurally ambiguous phrases (Nasukawa and Uramoto, 1995). This method can be used to solve context-dependent problems such as the well-known example shown in Figure 3.
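
The discourse constraint on word senses can be sketched in Python as a simple propagation step. This is a hedged illustration only: the input format, the sense labels, and the majority vote used when resolved occurrences disagree are assumptions made for the example, not details given in the paper.

```python
from collections import Counter


def share_senses(occurrences):
    """Copy a resolved word sense to unresolved occurrences of a lemma.

    occurrences: list of (lemma, sense) pairs in text order, where sense
        is None if sentence-level analysis left the word ambiguous.
    Returns a new list in which each unresolved occurrence takes the most
    common resolved sense of its lemma, if there is one.
    """
    votes = {}
    for lemma, sense in occurrences:
        if sense is not None:
            votes.setdefault(lemma, Counter())[sense] += 1

    shared = []
    for lemma, sense in occurrences:
        if sense is None and lemma in votes:
            sense = votes[lemma].most_common(1)[0][0]
        shared.append((lemma, sense))
    return shared


# Hypothetical sense labels for three occurrences in a manual.
resolved = share_senses([("job", "computing-task"), ("job", None), ("queue", None)])
```

The second occurrence of job inherits the sense resolved for the first, while queue stays unresolved because no occurrence of it was disambiguated.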

(1) John saw a girl with a telescope.
    [Without context]
    Translation: John ha bouenkyou niyotte shoujo wo mimashita.
    [With context]
    Translation: John ha bouenkyou wo motsu shoujo wo mimashita.
(2) The girl with a telescope was walking on the street.
    [With and without context]
    Translation: Bouenkyou wo motsu shoujo ha toori de aruite imashita.
[Dependency structures and Japanese-script translations omitted.]
Figure 3: Translation with context (II)

In sentence (1) of the figure, the modifiee of the prepositional phrase with a telescope can be either saw or girl, depending on its context. In this case, information in sentence (2), where the identical prepositional phrase modifies girl, provides a clue that with a telescope in sentence (1) is likely to modify girl. In this way, modifier-modifiee relationships extracted from a context model provide clues for disambiguating structurally ambiguous phrases. Needless to say, the effectiveness of this method is highly dependent on the source text, and it may seem too optimistic to expect such useful information in the same context. However, as shown in Figure 4, which is a translation output of an actual computer manual, we can often find modifier-modifiee relationships that disambiguate structurally ambiguous phrases in the same context, at least in technical documents. In Figure 4, the ambiguous prepositional phrase of a job⁵ in sentence (2) is disambiguated and attached to the flow by using the information provided by the unambiguous prepositional phrase in The flow of a job in sentence (7). Similarly, the information on the unambiguous prepositional phrase in placed on an output queue in sentence (11) disambiguates the ambiguous prepositional phrase on a job queue in sentence (9), allowing it to be attached to places.

⁵ of + noun may modify a verb, as in He robbed a lady of her money.
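
The attachment decisions described above can be sketched in Python as a lookup of unambiguous attachments elsewhere in the text. The key used for a prepositional phrase (preposition plus head noun lemma, such as "on queue") and the fallback to a parser default are assumptions made for this example.

```python
def attach(phrase_key, candidate_heads, relations, default):
    """Choose the modifiee of a structurally ambiguous phrase.

    phrase_key: identifier of the modifier, here preposition + noun lemma.
    candidate_heads: lemmas the phrase could attach to.
    relations: (phrase_key, head lemma) -> count of unambiguous
        attachments observed in the context model.
    default: attachment used when the context gives no single answer.
    """
    if not candidate_heads:
        return default
    counts = {head: relations.get((phrase_key, head), 0)
              for head in candidate_heads}
    best = max(counts.values())
    winners = [head for head in candidate_heads if counts[head] == best]
    if best > 0 and len(winners) == 1:
        return winners[0]
    return default


# Figure 3: sentence (2) attaches "with a telescope" to girl.
telescope = attach("with telescope", ["see", "girl"],
                   {("with telescope", "girl"): 1}, default="see")
# Figure 4: "placed on an output queue" in (11) informs sentence (9).
queue = attach("on queue", ["job", "place"],
               {("on queue", "place"): 1}, default="job")
no_evidence = attach("with telescope", ["see", "girl"], {}, default="see")
```

When the context contains no unambiguous occurrence of the phrase, the sketch leaves the default attachment unchanged, which mirrors the dependence on the source text noted above.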
3.4 Supplementing phrases for elliptical 
sentences 
Supplementation of elliptical phrases is another typical context-dependent problem. In spite of the simplicity of our context model, some elliptical phrases can be supplemented by using information extracted from the context model. For example, if a group of words ending with a colon is not a complete sentence, as in the case of (3) in Figure 4,
This allows you to:
our system adds either do the following or the following by referring to the type of the next sentence or phrase in the context model. If verb phrases follow, do the following is added, and if noun phrases follow, the following is added. Thus, in (3) in Figure 4, do the following is added because a verb phrase follows this sentence.
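
The colon rule can be written as a short Python function. This is a sketch under stated assumptions: the completeness flag and the phrase type labels "VP" and "NP" stand in for information that the system would read from the context model, and the function name is hypothetical.

```python
def supplement(text, is_complete, next_type):
    """Supplement an elliptical phrase that ends with a colon.

    text: the group of words as it appears in the source text.
    is_complete: True if the parser analyzed it as a complete sentence.
    next_type: type of the following sentence or phrase ("VP", "NP", ...).
    """
    stripped = text.rstrip()
    if is_complete or not stripped.endswith(":"):
        return text
    stem = stripped[:-1].rstrip()
    if next_type == "VP":
        return stem + " do the following:"
    if next_type == "NP":
        return stem + " the following:"
    return text


# Sentence (3) of Figure 4 is followed by the verb phrase in (4).
supplemented = supplement("This allows you to:", False, "VP")
```

For sentence (3) the result is "This allows you to do the following:", which gives the translation component a complete sentence to work with.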
3.5 Resolving modality 
The modality of itemized sentences or phrases is often ambiguous as a result of the presence of ellipses. For example, (4), (5), and (6) in Figure 4 could be imperative sentences in certain contexts. In this case, however, they are itemized phrases, and by reference to (3), they can be identified as supplementary verb phrases to be attached to (3). Thus our system analyzes them as verb phrases and nominalizes them in the translation.
4 Discussion 
We have described how a simple context model that consists merely of a set of parsed trees of each sentence in a text provides rich information for resolving ambiguities in sentence analysis and various context-dependent problems. The greatest advantage of our context-processing method is its robustness. Storing information on a large number of sentences requires a relatively large memory space, which has become available as a result of progress in hardware technology. Our framework is highly practical, since it does not require any knowledge resources that have been specially hand-coded for context processing, or a deep inference mechanism, yet it improves the accuracy of sentence analysis and the quality of a practical NLP system. The basic idea of our method is to improve the accuracy of sentence analysis simply by maintaining consistency in word sense and modifiee-modifier relationship among words with the same lemma within the same text, on the basis of the following assumptions:
• Vocabulary is relatively small in a consistent text, and words with the same lemma are repeated in a relatively small area of a text.
• Polysemous words within a discourse tend to have the same word sense.
• Words with the same lemma tend to modify or be modified by similar words.
• Topical words tend to be repeated frequently.
Therefore, the effectiveness of this method is highly dependent on the source text. However, at least in most technical documents such as computer manuals, the above assumptions hold true, and we have had encouraging results.

(1) Tracking Your Job
(2) It is important to know the flow of a job so that you can track it through the system and display or change its status.
(3) This allows you to:
(4) End or hold a batch job.
(5) Answer messages sent by the system.
(6) Control printer output.
(7) The flow of a job can have up to five steps:
(8) 1. A user or program submits a job to be run.
(9) 2. The system places the job on a job queue.
(10) 3. The system takes the job from the job queue and runs it.
(11) 4. If this job creates some information (output) that needs to be printed, the printer output is placed on an output queue.
(12) 5. The system takes printer output from the output queue and sends it to the desired printer to be printed.
[Japanese translation of each item omitted.]
Figure 4: Translation with context (III)

Acknowledgements 
I would like to thank Michael McDonald for his invaluable help in proofreading this paper. I would also like to thank Taijiro Tsutsumi, Masayuki Morohashi, Koichi Takeda, Hiroshi Maruyama, Hiroshi Nomiyama, Hideo Watanabe, Shiho Ogino, Naohiko Uramoto, and the anonymous reviewers for their comments and suggestions.
References 
Eugene Charniak. 1973. Jack and Janet in Search of a Theory of Knowledge. In Proceedings of IJCAI-73, pages 337-343.
William A. Gale, Kenneth W. Church, and David Yarowsky. 1992. One Sense per Discourse. In Proceedings of the 4th DARPA Speech and Natural Language Workshop.
Barbara J. Grosz and Candace L. Sidner. 1986. Attention, Intentions, and the Structure of Discourse. Computational Linguistics, 12(3):175-204.
Daniel Lyons and Graeme Hirst. 1990. A Compositional Semantics for Focusing Subjuncts. In Proceedings of ACL-90, pages 54-61.
Katashi Nagao. 1990. Dependency Analyzer: A Knowledge-Based Approach to Structural Disambiguation. In Proceedings of COLING-90, pages 282-287.
Tetsuya Nasukawa. 1993. Discourse Constraint in Computer Manuals. In Proceedings of TMI-93, pages 183-194.
Tetsuya Nasukawa. 1994. Robust Method of Pronoun Resolution Using Full-Text Information. In Proceedings of COLING-94, pages 1157-1163.
Tetsuya Nasukawa. 1995. Robust Parsing Based on Discourse Information: Completing Partial Parses of Ill-Formed Sentences on the Basis of Discourse Information. In Proceedings of ACL-95.
Tetsuya Nasukawa and Naohiko Uramoto. 1995. Discourse as a Knowledge Resource for Sentence Disambiguation. In Proceedings of IJCAI-95.
Roger C. Schank and Christopher K. Riesbeck. 1981. Inside Computer Understanding: Five Programs plus Miniatures. Lawrence Erlbaum Associates, Hillsdale, New Jersey.
Koichi Takeda, Naohiko Uramoto, Tetsuya Nasukawa, and Taijiro Tsutsumi. 1992. Shalt2: Symmetric Machine Translation System with Conceptual Transfer. In Proceedings of COLING-92, pages 1034-1038.
Naohiko Uramoto. 1992. Lexical and Structural Disambiguation Using an Example-Base. In Proceedings of the 2nd Japan-Australia Joint Symposium on Natural Language Processing, pages 150-160.
