• 1" )" A Met;hod for Ac(:elerati lg CFG-l msing/)y \[Js\]ng \])ependency 
\]nformation 
Hideo \¥atmJalm 
IBM lh'scarch, '.l.'okyo lh~'sc'arch Lal)oral:ory 
\]623-1d Shimotsuruma, Ymnato, \](anagawa 242-8502, Jalmn 
watanabt~((0trl.ilml.(:o.j i) 
Abstract 
'\].'hi.q lmlmr d(,scrib(;s an algorithnl for accc'lerat- 
ing l;h(; CF(~'qm.rsing t)ro(:t;ss by using (lel)(;nd(;ncy 
(or modifier-nlodifie(; relationship) infornmtion given 
by, for insi,an(:e, d('.llcnd('alcy cstimal,ion l)rograms 
Sll(:h as sl;o(:\]ulsl;i(: 1)arsers~ llSCl';,q Jltdic;d;ion in an 
inl;(,ra(;tiv(', al)t)li(:al;il)n ,mM \]inguisl;ic mmotal;i(nm 
;t(hh:(1 in a sour(:(' l;(.'xl;. This is a ml;l;hod for ('.n- 
\]mn(:ing exi,%ing grmnmard/as(',d CF(\]-l)arsing sys- 
Wan by using dc'tmnden(;y informal;ion. 
_1. Introduction 
Th(' parsing sysl;O.lil is ;i, key co111t)o11(111|; \]'or 11at- 
tual , ai)i/lications such as machine trans- 
lal;ion, informal;ion rel;rJ(wal, l;cxt ,'-;unllnariz;ll;ion, 
and its l)(;rfornlml(:(; (\])roct;ssintt; speed and act:u- 
racy) is very inq)orl;ant to l;h(! success of l;lms(' ap- 
pli,::ations. 
Tim umm\] CF(Ltmrsin/~ a lgorithlns \[3, 6\] k(,.(!p all 
interm('.dJat(; l)ossibiliti(~s which may or may not t)c 
tls(xl ill tim tinal pm:se r(;,qults. Tlmr(~for(!, we usu- 
ally reduce 1;hi;s(; illl;(.'l:lllcdial;o, l)ossibiliti('s which 
are unlikely to t)(; used as tinal results in the nlid- 
die ()f l;he process l)y using s(;vt;ral l)rmling t('x:h- 
ni(luCS. One good information ,sour(:('~ for pruning 
is d(;t)end(;n('y information |)c.tw('.cn words. It has 
nol; l)(;(m so easy l;o gel; such d('l)('n(h'.ncy informa- 
tion until a. few years a.go, but, th(; sil;uat;ion has 
ret:(ml;ly chang(,,d. 
Recent intensive studies on statistical alll)roach 
\[7, 1, 2\] a(lvanccxl statistical parsing systems, and 
wc can gel, relatively correct dct)en(h'ncy informa- 
tion using these systems, leurthc'r, if we SUl)t)osc 
an interactive NLP system, then there aa(, sore(, 
types of user intera(:tions which can b(; considered 
to determilm 1;11(; modifice c;mdidatc. I11 addition, 
recent studies on the linguistic infi)rnmtion mmo- 
(;a.l;ion \[10, 4, 12, 1.3\] provid(; tools l/y which a user 
can (;asily annotate linguistic intbrnmtion (si/ecial 
XML markup tags) into source texts, and we can 
OX\[)(X;I; |;0 ,qtX) a,ll increase of tho 11111111)(11 of l;exi;s 
wil;h linguistic information. This linguistic infor- 
nlai;ion usually includes dependtmc,y infornml;ion. 
For instmmc, the following example shows m~ &llllO- 
(;al;ion ('xaml/h' by Linguistic Annol;ation lmnguag(; 
described in \[12, 13\], and the id and rood atl;l'i})ll|;c,q 
inside tal:w (,hmmifl;s spc.ci\[\[y word dependencies. 
IIe (lal:w id=" 1" )saw(/lal:w) a man (lal:w 
,nod="1" )with(/lal:w) a tc'l(',scolm. 
in this (;xanll)h', the word "with" modifi(;s l;he word 
As shown in l;hc, above (~xample,% we can now 
get depc.ndcncy inlbrnmtion more easily than a ti;w 
years ago. This paper describes an algorithni for 
accelcrnting CFG-lmrsing systems by using su(:h 
d(;pcnd(;ncy (or modifier-moditi(;e r(~lationship) in- 
formation. Th(; prot)oscd algorithm does not as- 
sume all words are given dctmndency int'ormation~ 
ratht;r it works in case such that some of words are 
partia.lly given dep(;ndt',ncy infl)rnm.tion. 
2 Ol)timizing Algorithm Using De- 
pendency Infornmtion 
We use a. nornml CFG lmrsing sysi;('m with one' 
(;xl;(;nsion that for (m.t'h ru\]c ( here, must he. o11c righl;- 
h;md sid(' (or \]{ITS) t('rm I mark(,d as a h(,ad, and 
th('. informati(m (if a head term is trmlsJhrr('.(l Lo l;hc. 
lc.ft-hmM side (or HIS) tenn. In this lmtmr, a CIeG 
rule is (hmol;ed as follows: 
{x -~ ~q ... ~.... ~.;,} (,,, > 0) 
In tim above, notation, X is dm left-lmnd side 
(or LHS) term, mM I5- are right-hand side (or llH$) 
terms, mM a RHS term followed by an asterisk '*' is 
a head term. The l;ypical usage of the head is that 
the LHS t(;nn shares many features of the head 
term in the RHS. For instmme, a matching word of 
the the LHS tcnn becomes the same as the one of 
the head term in the RHS. 
For each rule, an arc is constructed over a word 
segment in a.n input sentence. An aa'c is d('alot(,d 
using terms of its base rule as follows: 
Ix-~ ~q ... E-. ~1+,* ... 5,\] 
1A term expresses a non-terminal symbol in IAIS, an(1 ~'~ 
non-terminld or a terminal symbol in l/.IIS. 
913 
The LHS term of an arc nmans the LHS term 
of the base rule of the arc, and RHS terms of an 
arc means RHS terms of the base rule of the arc. 
In the above notation, a single dot indicates thai; 
RHS terms located to the left of a dot are inactive, 
that is, they already match the LHS term of some 
other arcs. Three dots are used to ret)resent zero or 
any number of terms. An arc whose RHS terms are 
all inactive is called an inactive arc, otherwise it is 
called an active arc. An arc covers a segment of 
input words; the start point of an arc is the index 
of the first word in the covering segment, and the 
end point of an arc is 1 plus the index of the last 
word in the covering segment. 
Basically, a standard CFG parsing algorithm such 
as \[3, 6\] consists of the following three operations. 
Initialization: For each word, arcs are generated 
froln rules such that the leftmost RHS term 
matches it. 
Operation A: For each inactive arc A, an arc is 
generated fl'om A and a rule R such that the 
leftmost RHS term of R ,natchcs the LHS 
term of A. 
Operation B: For each inactive arc A, an arc is 
generated from A and another active arc B 
such that the leftinost active RHS term of B 
matches the LHS term of A and the end t)oint 
of B is the stone as tile start point of A. 
We assume that some dependency information 
1)etween words are given, and such det)endency in- 
formation is denoted as follows: 
w.~w, 
The first of the above examples represents that 
a word I/V u modifies another word I~(~, attd W~, pre- 
cedes 14~j, while the second one represents that a 
word Rq, modifies another word H~j and W,, pre- 
cedes 1/1~/. 
Given this kind of dependency information, the 
following conditions are imposed on Operation A 
and Operation B. 
Conditions for Operation A: 
Condition A1 (when the leftmost RHS term of 
a rule is a head term): 
Given an inactive arc Arc1 denoted by 
\[A ~ ...\] and a rule which has two or 
more RHS terms and the leftmost RHS 
term is a head denoted by {X -+ A * 
B ...}, Operation A is executed only if 
there is dependency information 144, 
lYb where 1,14~ is a word matching the 
LHS term A of Arci and lVb is a word 
located anywhere to the right of the end 
I)oint of Arc1. 
{X->A* B ...} 
Wa WD 
-. ............ °--" 
Figure 1: Condition A1 
Figure 1 shows the above condition. In this fig- 
ure, a thick arc ret)resents an inactive arc, a line 
represents a matching to be tried in this ot)eration, 
a dotted line represents a matching betweeu a term 
in an arc and a word, and a dotted arrow represents 
dependency infbrmation. In this case, this type of 
rule implies that a word matching the LHS term 
of the arc to be matched with the leftmost term of 
the rule must be modified by any word which is lo- 
cated after the end t)oint of the arc, since the head 
term is the left;most term of the rule. Therefore, if 
the A1 condition does not hold, Operation A is not 
required to be executed. 
Condition A2 (when the leftmost RHS term of 
a rule is not a head term): 
Given an inactive arc Arc1 denoted by 
\[A --+ ...\] and a rule which has two or 
more RHS terins and the leftmost I{HS 
term is not a head denoted by {X --+ A 
... D* ...}, Operation A is executed 
only if there is a dependency informa- 
tion 14~ ~ 1¥~ where 1/1~ is a word 
Inatching the LHS term A of Arc1 and 
Wv is a word located anywhere after the 
end point of Arc1. 
Figure 2 shows the above condition. In this case, 
this type of rule ilnplies that a word matching tile 
LHS term of the arc to be matched with the left- 
most term of the rule Inust inodify any word which 
is located after the end point of the arc, since the 
head terin is not tile leftmost term of the rule. 
Conditions for Operation B: 
Condition B1 (when the leftmost active RHS 
term of an active arc is the head term): 
Given an active arc Area denoted by 
\[X --+ Ao ... A~ . B....\] and anin- 
active arc Arc1 denoted by \[B -+ ...\] 
914 
{X-> A ... D* ...} 
Wa Wb 
Figure 2: Condition A2 
such that the end point of Area is the 
santo as the start point of Arc r, Ol)era- 
tion B is executed only if, for each l,lz,~ 
(0 < i < n) which is a word match- 
ing the RItS term A~ of AreA, there 
is dependency information I'Vai => I'Vb, 
where Wi, is a word matching the LHS 
term B of Arc,. 
ix->. 
,: : -, V / \ 
Wao ... Wa,, Wb 
""-- ...................... .-" 
l?igure 3: Condition B1 
Figure 3 shows the above condition. Ill this fig- 
ure, ~ dotted thick arc represellts an active m:c. hi 
this case, this type of active arc implies that words 
matching inactive terms before' the head term of 
the active art: must modify a. word matdfing tile 
LHS term of the inactive arc. 
Condition B2 (when the head term is on the left 
side of the leftinost active RHS term of an active 
arc): 
Given an active mc AreA denoted by 
\[X ~ ... A* ... B ...\] and all in- 
active arc Arc1 denoted by \[B --+ ...\] 
such that the end point of Area is the 
same as the start point of Arc1, Oper- 
ation B is executed only if there is de- 
t)endency information W,, ~ Wb where 
144~ is a word matching the RHS term 
A of AreA, and Wv is a word matching 
the LItS terln B of Arc1. 
..... 
IX->, " 
l" 
I 
Wa Wb 
Figure 4: Condition B2 
Figure 4 shows tile above condition. In this case, 
this type of active arc implies that a word lnatching 
the LIIS term of the inactive arc nmst modi~ a 
word matching the head tcrin of the active arc. 
Condition B3 (when the head term is on the 
right side of the leftlnost active RItS term of an 
active arc): 
Given an actiw; arc Area denoted by 
\[X ~ A. B ... C* ...\] and an inactive 
arc Arc1 denoted by \[/3 -+ ...\] such that 
the end point of Area is the same as the 
start point of Arc,, Operation B is exe- 
cuted only if there is dependency infof 
nmtion Wb ~ 14~ where Wf, is a word 
matching the. LItS term B of Arcl, and 
l'14: is a word on the right side of the, 
end point of Arci. 
Wb ,. I," Wc 
Figure 5: Condition B3 
Figure 5 shows the above condition. In this case, 
this type of active arc implies that a word matching 
the. LHS term of tile inactive arc must modify a 
word after the end point of tile inactive arc. 
The dependency information is not necessarily 
given to all words. If there is any source word ex- 
cept for the root word of a sentence such that there 
915 
is no del)endency information originating fl:om it: 
then a set of such del)endeney inibrmation is called 
partial, otherwise, it is called total. If the given de- 
1)endency informatioll is partial, the A\] condition 
can not be used, since, even if there is no det)en- 
dency information targeting I.V,,, we eanllot know if 
such del)endency information does not really (,xist, 
or if such delmndency inlbrmation is llot Sul)plied. 
For other conditions, we check them only when all 
source words for dependency checking have depen- 
dency information. On the other hand, if the given 
dependency information ix total, all conditions are 
checked. 
3 Experiment 
We have imt)lemented the 1)reposed algorithm 
into an existing English CFG-parser we have devel- 
oped for a machine translation t)roduct \[8, 9, 11\] e 
, and conducted an experinmnt to know the effec- 
tiveness of this algorithm. 
We selected 280 test sentences rmxdomly from 
a sentence set created by .\]EIDA :~ for ewfluating 
translation systen L and made the correct dei)en-- 
(lency relation data for these selected test sentences. 
We collected the number of inactive arcs, the num- 
b(;r of active arcs, and the t)rocessing time for cases 
such that C modifiee candidates (one of which is 
the correct modifiee) are given to a word. 4 If C:=I 
then it; corresponds to the best case for a parser 
such that only one correct modifiee is given fin' each 
word, while if C is 3 or 4 then it; corresponds to the 
approximation of using a statistical modifiee esti- 
ination program for getting candidate modifiees. 
The graphs in Figure 6 indicate the reduction 
ratios of active arcs, inactive arcs, and 1)recessing 
time for using conditions for total dependency in- 
formation and conditions tbr partial del)endeney in- 
formation. The de'nominators for calculating these 
r&tios are the numbers of ar(:s and the processing 
time (seconds) in case of the parser without this al- 
gorithm. In these graphs, C=X indicates that X is 
the maxilnunl nulnber of moditlee candidates given 
to a word. 
From these gratlhs, we can so(; that the more 
words in a sentence, the better the 1)erformance. 
In a real domain, most sentences consist of more 
than ten words. Therefore, looking at values for 
around 10 in the X axis, we can see that inactive 
arcs are reduced by about 40% and 25%, active arcs 
2This parser is used in a Web page translation software 
called "lnternet King of 3t'anslation" released from IBM 
.laI)an. 
a.lal)all Electronic Industry \])evcloi)ment Association 
4Modifiee candidates are selected randomly except for 
the correct oi1o. 
are reduced by about 65% and 35%, and t)rocessing 
time is reduced by about ~15% and 15%, for the 
ideal case (C-1) and more practical cases (C=3 
or 4), respectively, in the (:as('. of total del)endency 
information. Please note that, since the 1)arser in 
which this algorithm ix impleumnted has already 
several pruning mechanisms, we can expect more 
reduction (or pertbrmance gain) for generic CFG 
pars(',rs. 
4 Discussion 
As a study for accelerating the parsing tu'ocess 
using dependency information, Imaichi\[5\] reported 
an algoritlnn for Japanese . The condi- 
tions introduced by hnaichi are described by using 
the notation in this paper as ~bllows: 
Condition MI: 
Given an active arc Area denoted by 
\[X -~ A . 13.\] and an inactive arc Arc1 
denoted by \[B -+ ...\] such that the end 
point of Area is the same as the start 
point of Arc j, Operation B is executed 
only if there is dependency infl)rmat.ion 
I<1~, -=> lYb where 1'15~ is a word matching 
the RIIS term A of AreA, and lVt, is 
a word mat(:hing the I~HS term ,r3 of 
mrcl. 
Con(lition M2: 
Given an inactive ar(: Arc1 denoted 1)y 
\[A -+ ...\] and a rule denoted by {X --~ 
A ...}, Operatioil A is execllted only 
if there is no det)endency iifl'ormation 
Wt. => l'l/-,~ where 1.'I~ ix a word match- 
ing the LHS term A of Arc1 and lYt. is 
a word loca.ted before the start point of 
A~rcl . 
The condition M1 correspouds to B 1. Since hnaiehi's 
algorithln considers only .Japmmse in which all words 
other than the last; word modifies one of the suc- 
ceeding words, it does not deal with cases usually 
seen in Eurot)eall s where a word modities 
one of the preceding words. Therefore, it is not 
applicable to any  other than Jat)anese in 
general. Fnrthcr, since a CFG rule is restricted to 
be in Chomsky normal form, hnaichi's algorithm is 
limited in terms of at)plicability. 
Since the algorithm proposed in this pal)er does 
not have any restrictions on the dependency direc- 
tion and the CFG rule format, it can be applicable 
to any CFG-parsers ill any s. 
916 
Reductiof Ratio o\[ Inactive Arcs 
for Total Dependency Inf0. 
60 
85o 
'~\[ 40 
O 
qJ 
10 
0 
4 5 6 -! 8 9 10 11 12 
NUt'rl 01" WOIL~; 
O0 
Reduction I~atio of Active Arcs 
for loLal Dependency Info. 
80 
70 
~ 60 ii 
'~0J ,50 ft- 
=40 o 
~: 20 
10 
4 5 fi 7 8 £ 10 11 1 
\[J\]J G=1 
I~ 0=2 
\[\] O~-3 
I~ 0:-4 
Nl.lm of 'lil'~l'Orl\[}~ 
O0 
\[\]\]J O=l 
.~ 0--2 
\[\] 0=3 
\[\] 0=4 
60 
Reduction Ratio of lime 
for Total Dependency Info. 
~.5O 
'~ 40 ~0 
~30 
,\]3 
~clO 
4 5 6 7 8 g 10 11 12 
Nurl'l of "~l'l,l'Orlj4,. 1: 
\[\] 0=1 
\[\] O--2 
\[\] 0--3 
\[\] O=4 
Reduction Raito of Inactive .Arcs 
\[or Partial Dependency In\[o. 
60 
~5o 
0 -P 40 
0 ~20 
E 10 
4 5 6 1 8 9 10 11 
NLIFr'I Elf 'l,ll,¢Oi"l.~-: 
(t,) 
t21 O=2 
O=3 
F_1 0=4 
12 
Reduction Ratio ot Active Arcs 
for-Partial Dependency Info 
5O 
~40 
o 
o "~ 20 
"U j 
4 5 6 7 8 g 10 11 
NU~'\[\] ,:If ),ll,tClrl~:~: 
(d) 
~ O=l O=2 
O=4 
12 
50 
40 
C I 
o ailO 
Reduction Ratio of lime 
for Partial Dependency Info. 
4 5 6 7 8 9 10 11 12 
Num ,:,f W0rd~. 
(,;) (f) 
~\] 0=1 
U O=2 ~ 
G=3 
6=4 
Figure. 6: l{educl;ion ratios o\[ inaci.ive arcs, acl~ive arcs, a.ml processing Lime 
917 
5 Conclusion 
We developed an algorithm for accelerating the 
performance of the CFG t)arsing process if we are 
given dependency information. From an experi- 
ment, we can show the effectiveness of this algo- 
rithm. 
By using this algorithm, we can enhance exist- 
ing grammar-based parsers using dependency in- 
formation given by stochastic parsers, interactive 
systems, and texts created by linguistic annotation 
systems. 

References 

\[1\] M. Collins. A new statistical parser based on 
bigram lexical dependencies. In Proc. of 3~fl~ 
AC£, pages 184-191, 1996. 

\[2\] M. Collins. Three generative, lexicalized models 
fbr statistical parsing. In P~vc. of 35th A CL, 
pages 16-23, 1997. 

\[3\] J. Earley. An efficient context-free parsing al- 
gorithm. In Readings in Natural Language Pro- 
cessin9. Morgan Kauflnan, 1969. 

\[4\] K. Hashida, K. Nagao, et al., Progress and 
Prospect of Global Document Annotation. (in 
Japanese) In P~vc. of ~th Annual Meeting of 
the Association of Natural Language Process- 
ing, pp. 618-621, 1998. 

\[5\] O. Imaichi, Y. Matsumoto, and M. Fujio. An 
integrated parsing method using stochastic in- 
formation and grammatical constraints. Jour- 
nal of Natural Language Prvcessing, 5(3):67-83, 
1998. 

\[6\] M. Kay. Algorithm schemata and data struc- 
ture in syntactic processing. Technical Report; 
CSL-80-12, Xerox PARC, 1980. 

\[7\] D. M. Magerman. Statistical decision-tree mod- 
els fi)r parsing. In Prvc. of 33rd A CL, pages 
276-283, 1995. 

\[8\] K. Takeda. Pattern-based context-free gram- 
inars for machine translation. In Proc. of 3~th 
ACL, pages 144-151, 1996. 

\[9\] K. Takeda. Pattern-based machine translation. 
In Proc. of 16th Coling, volume 2, pages 1155 
1158, 1996. 

\[10\] Text Encoding Initiative 
(http://www.uic.edu:80/orgs/tei/) 

\[11\] H. Watanabe and K. Takeda. A pattern- 
based machine translation system extended by 
example-based processing. In Proc. of 17th Col- 
ing (Coling-ACL'98), volume 2, pages 1369- 
1373, 1998. 

\[12\] H. Watanabe, Linguistic Annotation Lan- 
guage - The Markup Language for Assist- 
ing NLP Programs -. IBM Research Report 
I/T0334, 1999. 

\[13\] H. Watanabe, K. Nagao, et al., Linguistic An- 
notation System for Improving the Performance 
of Natural Language Processing Programs. In 
Proc. of 6th Annual Meeting of The Association 
for NLP (in Japanese), pp. 171-174, 2000. 
