Dynamic Programming Method 
for Analyzing Conjunctive Structures in Japanese 
Sadao Kurohashi and Makoto Nagao 
Dept. of Electrical Engineering, Kyoto University 
Yoshida-honmachi, Sakyo, Kyoto, 606, Japan 
kuro@kuee.kyoto-u.ac.jp 
Abstract 
Parsing a long sentence is very difficult, since long 
sentences often have conjunctions which result in am- 
biguities. If the conjunctive structures existing in a 
long sentence can be analyzed correctly, ambiguities 
can be reduced greatly and a sentence can be parsed 
in a high successful rate. Since the prior part and 
the posterior part of a conjunctive structure have a 
similar structure very often, finding two similar series 
of words is an essential point in solving this problem. 
Similarities of all pairs of words are calculated and 
then the two series of words which have the great- 
est sum of similarities are found by a technique of 
dynamic programming. We deal with not only con- 
junctive noun phrases, but also conjunctive predica- 
tive clauses created by "Renyoh chuushi-ho". We will 
illustrate the effectiveness of this method by the anal- 
ysis of 180 long Japanese sentences. 
1 Introduction 
Analysis of a long Japanese sentence is one of many 
difficult problems which cannot be solved by the con- 
tinuing efforts of many researchers and remain abau- 
doned. It is difficult to get a proper analysis of a 
sentence whose length is more than fifty Japanese 
characters, and almost all the analyses fail for sen- 
tences composed of more than eighty characters. To 
clarify why it is is also very difficult because there 
are varieties of reasons for the failures. People some- 
times say that there are so many possibilities of modi- 
fier/modifyee relations between phrases in a long sen- 
tence. But no deeper consideration has ever been 
given for the reasons of the analysis failure. Analysis 
failure here means not only that no correct analysis 
is included in the multiple analysis results which are 
caused by the intrinsic ambiguity of a sentence and 
also by inaccurate grammatical rules, but also that 
the analysis fails in the middle of the analysis pro- 
re88, 
We have been claiming that many (more than two) 
linguistic components are to be seen at the same time 
in a sentence for proper parsing, and also that tree to 
tree transformation is necessary for reliable analysis 
of a sentence. Popular grammar rules which merge 
two linguistic components into one are quite insuf- 
ficient to describe the delicate relationships among 
components ill a long sentence. 
Language is complex. There often happens that 
components whicb are far apart in a long sentence co- 
occur, or have certain relationships. Such relations 
may be sometimes purely semantic, but often they 
are grammatical or structural, although they are not 
definite but very subtle. 
A long sentence, particularly of Japanese, con- 
tains parallel structures very often. They are ei- 
ther conjunctive noun phrases, or conjunctive pred- 
icative clauses. The latter is called "Renyoh chuushi- 
ho". They appear in an embedded sentence to mod- 
ify nouns, and also are used to connect two or more 
sentences. This form is very often used in Japanese, 
and is a main cause for structural ambiguity. Many 
major sentential components are omitted in the pos- 
terior part of Renyoh chuushi expressions and this 
makes the analysis more difficult. 
For tbc successful analysis of a long Japanese sen- 
tence, these parallel phrases and clauses, including 
Renyoh chuushi-ho, must be recognized correctly. 
This is a key point, and this must be achieved by 
a completely different method from the ordinary syn- 
tactic analysis methods, because they generally fail 
in the analysis for a long sentence. 
We have introduced au assumption that these par- 
allel phrases/clauses have a certain similarity, and 
have developed an algorithm which finds out a most 
plausible two series of words which can be considered 
parallel by calculating a similarity measure of two ar- 
bitrary series of words. This is realized by using the 
dynamic programming method. The results was ex- 
ceedingly good. We achieved the score of about 80% 
in the detection of various types of parallel series of 
words in long Japanese sentences. 
2 Types of Conjunctive Struc- 
tures and Their Ambiguities 
First, we will explain what kind of conjunctive 
structures (hereafter abbreviated as 'CS') appear in 
Japanese\[l\]N. 
The first type is conjunctive nomi phrases. We 
ACRES DE COLING-92, NAN'r~, 23-28 AOU'r 1992 1 7 0 PROC. OF COLING-92, NANTES, AUO. 23-28, 1992 
'l'alfle 1: Wards indicating conjunctive structures. 
~- (a) Conjunctive noun phrases 
|$,,/ctl *a 6@|c a55~,tt ~a b < ~t 
L. ~)_.~+(~ Conjunctive predic~ttive clauses 
~.la (c) Conjunctive incomplete structures 
~ ~-Tgg~a ~gggK ~ETK 
' + ' means succession of words. Characters in ~( )' may 
or may not aplmar. 
can find these phrases by tile words for conjunction 
listed up in Table l(a). Each conjunctive noun some- 
times has adjectival modifiers (Table 2(il)) or clause 
modifiers (Table 2(iii)). 
The second type is conjunctive predicative 
clauses, ill which two or more itredicates ~ arc in 
a sentence forming a coordination. We call find 
these clauses by the ll,enyoh-lbrnts ~ of predicates 
(Renyoh ehuushi-ho: Table 2(iv)) or by tile predi. 
cares accompanying one of the words in Table l(b) 
('rable 2(v)), 
'\['he. third t.ype is CSs consisl.ing of parts of conjtmc- 
tire predicatiw~ clauses. We call this type eonjune- 
tlve incomplete structures. We can find these 
structures by the correspondence of p(xstpositional 
particles (Table 2(vi)) or by the words in Table l(e) 
which indicate CSs explicitly (Table 2(vii)). 
l,br all of these types, it is relatively easy to tind 
the existence of a CS by detecting a distinctive key 
bmlsetsu a (we call this bunsetsu 'KB') which ac- 
companies these words explained above. KB lies last 
in the prior part of at CS, but it is difficult to deter 
mine which bunsetsu sequences on both side of tile 
KB constitute a CS. That is, it is not easy to deter- 
mine which Imnsetsu to tile hfft of a KII is tile leftmost 
element of the prior part of a CS, and which bunsetsu 
to the. right of a Kil is tile rightmost element of the 
posterior part of a US. The bunsetsus betweeu these 
two extreme elements constitute the scope of the 
CS. Particularly in detecting this scope of a CS, it is 
essential to find out the last Imnsetsn in the posterior 
part of the CS, which corresponds to the KB. q'here 
art'. lnany candidates for it ill a seatence; e.g., ill a 
conjunctive noun i)hras~ all nouns after it KII are the 
candidates. We call snch it candidate bunsetsu '(211'. 
It is almost impossible to solve this problem merely 
by using rules based oil phra.se structure grammar. 
lilt addition to verbs tutti aAjectives~ assertive words 
(kinds of postpositioxm) " /d"(da), "q2ab5 "(dearu), "e-J- 
"(desu) and so on, which follow directly after nouus, cm~ be 
predicate it| d*tl>ltllese. 
~'fhe ending foritls of inflectional words which c;m modify 
vet|>, ~tdjective, or a~ertivc word au~ c-tiled I~e/lyoh-fornl in 
.1 apanese. 
3 \]~utmetuu is tile Slllgtllet~t ineanhlgful block tx|nsisting of *tit 
indelxmdcnt word (lW; tmuns, verbs, adjectives, etc.) and 
aCCOlttpau~yittg word~ (AW; l),xslp~sitio|lal pgu'ticles, &uxiliguy 
verbs, etc.). 
Table 2: Examples of conjunctive structures. 
Conjunctive noun phrases 
(i) ... lMgr (analysis) ~ ( a,,l} *_l:~ + (generation) ... 
(ii) .,./~.~ag (so.,'~e l.avuage text) ~9 (o\]) g~dCc (anal- 
ysis) ~(and) ~1~ ( ta,'gct language tezt) © ( oJ) ~k 
~ ~ (.qeneration) ... 
(iii) ... ~'~g~ ( so,,'ce languaqe text) ~i$~T b ( au- 
alyziag) ~t~t{(processmg) ~(a,d) ~Hl~g~: (target 
language text) _d~r~3 ~ 6 (generating) ~..~" (process- 
ing) . .. 
I Conjunctive predicative clauses 
tag) , ~AI=J2~ ( ta,'uet l.,,9uage text) tl:.~'# ~ (9c,- 
ertatittg) (~ (processi,,g) ...). 
tLl~ (9eneration) :eta (\]o,.) *1Jill L @ ~ (do .ot ~e) ( ~: 
Omljunctive incomplete structures 
~,' ~(the lat~cr) ~:M~ (~ ..... lion) re(/o,')..'7 
(vii)... ~/t~ (a,,alvsis) ~ (10,'), $ ~:t~(and) ~ (9en- 
3 Analysis of Conjunctive 
Structures 
We detect the scope of CSs by using wide range of 
information around it KB. 4 An input sentence is first 
divided into bunsetsus by tile conventional morpho- 
logical analysis. Then we calculate similarities of all 
pairs of ~)unsetsus ill a selltence, and calculate a sum 
of similarities between a series of bunsetsus on the 
left of a KII and a series of bunsetsus on the left of 
a CB. Of all the pairs of the two series of Imnsetsus, 
the pair which has the greatest sum of similarities is 
determined as the scope of the CS. We wilt explain 
tins process in detail in the following. 
3.1 Similarities between Bunsetsus 
An appropriate similarity value between bunsetsus is 
given by the following process. 
• If the parts of speech of IWs (independent words) 
are equal, giw~ 2_j>oints as the similarity values. 
Then go to the next stage and add further the 
following I)oints. 
1. If IWs match exactly (by character level) each 
other, add 10 points and skip the next two 
steps and go to tile step 4. IflWs are inflected, 
infinitives are compared. 
2. If both IWs are nouns and they match par 
tially by character level, ad<l the number of 
matchin~ characters x 2 \]mints. 
4 We (Io not halldle Colljullclive predicatiw~ el*tune* cteatexl 
by the Itcnyoh fc*rtns of predicates (|{enyoh c|nmshi-ho) which 
do ltOt accompany COllllll*t, })e¢llll~: almost all of these prc,ll- 
c,ties iilOdify thc llCXL llt~al¢~st \[)l'edicltte lilld there is 11~) need 
t,~ chc<:k the possibility of conjunct|oil. 
Acn:.s DE COLING-92, NAutilus, 23-28 Aom' 1992 1 7 1 PROC. O1~ COLING-92, NANTES, AU6.23-28, 1992 
^': ~.pmlal maulr. 
P 
~- ..................... r ................... 
. ~ I ~(p n+l) 
\ .",.. I "Ne ~'" " similarity value 
:"... I~----- A = (.(ij)) 
Figure 1: A path. 
3. Add points for semantic similarities by using 
the thesaurus 'Buurui Goi Ityou' (BGH)\[3\]. 
BGH has the six layer abstraction hierarchy 
and more than 60,000 words are assigned to the 
leaves of it. If the most specific common layer 
between two IWs is the k-th layer and if k is 
greater than 2, add (k - 2) × 2 points. If ei- 
ther or both IWs are not contained in BGH, no 
addition is made. Matching of the generic two 
layers are ignored to prevent too vague match- 
ing in broader sense. 
4. If some of AWs (accompanying words) matcb, 
add the number of matchin$ AWs x 3 points. 
Maximum sum of the similarity values which can 
be added by the steps 2 and 3 above is limited to 
10 points. 
• Although the parts of speech oflWs are not equal, 
give 2_.points if both bunsetsus can be predicate 
(see footnote 1). 
For example, the similarity point between "~ 
~Pi~ (low level language) +," and " ~lt¢,~'~ (high 
level language) + ~ (and)" is calculated as 2(match of 
parts of speech) + 8(match of four characters: Y~l/t~ 
~) = 10 points. The point between " ~\]'aq~ (revision) 
+ L (do) +," and "l~U3(deteetion) +'J-~ (do)" is 
2(match of parts of speech) + 2(match by BGII) + 
3(match of one AWs) - 7 points. 
3.2 Similarities between Two Series of 
Bunsetsus 
Our method detects the scope ofa CS by two series of 
bunaetsus which have the greatest similarity. These 
two aeries of bunsetsus are searched for on a triangu- 
lar matrix A = (a(i,j)) (Figure 1), whose diagonal 
element a(i,i) is the i-th bunsetsu in a sentence and 
whose element a(i,j) (i < j) is the similarity value 
between bunsetsu a(i,i) and bunsetsu a(j,j). 
We call the rectangular matrix A' a partial ma- 
trix, where 
A' =(a(i,j)) (O< i< n; n+ l < j <1) 
t,(i, ~3~ .......... i'---~i, j-I )-~i, 9 
"J ..... I i i\ 
,(.. ni i i % 
Figure 2: An ignored element. 
..... ~ ......... ~,~: .......... ! ....... 
i c5~.( ",, i ..... i ........ -2 'v-\ ........ ~ ....... 
..... i ........... :;'i.:':~'~:~ .....~ ....... 
Figure 3: Penalty points. 
is the upper right part of a KB (Figure 1). In tile 
following, 1 indicates the number of bunsetsus and 
a(n, n) is a KB. We define a path as a series of ele- 
ments from a non-zero element in the lowest row to 
an element in the leftmost column of a partial matrix 
(Figure 1). 
path ::= 
(a(pl, m), a(p2 .... 1) ..... a(p ..... + 1)), 
where n + l < m <1, a(pl,m) ¢ O, Pi = n, 
PI>>.PI+I( 1 <i<m-n- 1). 
The starting element of a path shows the correspon- 
dence of a KB to a CB. A path has only one element 
from eacb column and extends towards the upper left. 
We calculate the similarity between tbe series of 
bunsetsus on the left side of the path (sbl in Figure 
1) and the series under the path (sb2 in Figure 1) as 
a path score by the following four criteria: 
1. Basically the score of a path is tile sum of each 
element's points on the path. But if a part of the 
path is horizontal (a(i,j),a(i,j - 1)) as shown in 
Figure 2, which leads the bunsetsu correspondence 
of one element a(i, i) to two elements a(j- 1, j- 1) 
and a(j,j), the element's points a(i,j - 1) is not 
added to the path score. 
2. Since a pair of conjunctive phrases/clanses often 
appear ~s a similar structure, it is likely that 
both cmdunctive phrases/clauses contain nearly 
the same numbers of bunsetsus. Therefore, we 
impose penalty points on the pair of elements 
in the path which causes the one-to-plural bun- 
setsu correspondence so as to give a priority to 
the CS of the same size. Penalty point for 
AcrEs DE COLING-92, NANTES. 23-28 Aou'r 1992 1 7 2 Paoc. oF COLING-92, NANTES, AUO. 23-28. 1992 
'Fable 3: Separating levels (SLs). 
~: ~ co,,aitio, _, ~o ~,,:,~ot~o 
5 
--i 
--5 
2 
--1 
Being the KB of a conjunctive predicative clause, 
or accompanying a topi(:~ntarking postpositional 
particle ~' I~ " all(I comma. 
Accompanying a postpositional particle not (:re- 
sting a conjunctive nolul phrase and conlllla, or 
being an adverb aCColnpanyillg conlnla. 
Being the Renyoh-\[orm of a predicate which does 
1|o~ ~tccolnp~l/y conllna~ or accolnpanyil|g tt topic- 
marking postpositionM particle " t.t ", 
Being the KB of a conjunctive noun phrase ac- 
companyillg COlllnla, 
Accompanying a comma, or being the KB of 
a conjullctive IlOllll phrase not aCcolnparlying 
colnlna. 
(a(pl,j),a(pi+~,j - 1)) is calculated by the for 
mule (Figure 3), 
\[p, - pi+x - 11 X 2. 
Tim penalt,y points are subtracted from the path 
score. 
3. Since each phrase in the CS has a certain cO- 
herency of meaning, speciM words which separate 
the meaning in a sentence often limit the scope of 
a CS. If a path includes such words, we impose 
penalty points on the path so that the fmssihil - 
ity of including those are reduced. We define five 
'separating-levels' (SLs) for hunsetsus, which ex- 
press the strength of separating a sentence mean- 
ing (Table 3, of. Tahle 1). If bunsetsus on the left 
side of the path ~md under it include a bunsetsu 
whose SL is equal to KB's SI, or higher than it, 
we reduce the path score by 
(SL of the hunsetsu - KB's SL + 1) x 7. 
ltowever, two high SL bunsetsus corresponding 
to each other ofteu exist in a CS, and those do 
not limit the scope of the CS. For example, topic- 
marking postpositional particles correspond each 
other in the following sentential style, 
h ~. L'C ~ (As to A) .... -cab 9 (be), 
f~ ~ L~c ~ (~s to ~l) .... -c.~ (~e). 
Therefore, when two high SL bunsetsus corre- 
spond in a CS, that is, the path includes the ele- 
ment which indicates the similarity of them, and 
those are the 'same-type', the penalty points on 
them arc not axlded to tile path score. We define 
thc same-type bunsetsus ~LS two bunsetsus which 
satisfy the following two conditions. 
• IWs of them are of the same part of speech, and 
they have the identical inflection whcn they arc 
inflectional words. 
• AWs of them arc identical. 
Table 4: Words for honuses. 
~ll~t A-tA~ (~Josjultctive noun phrases 
4. Some words frequently become tile AW of the last 
bunsetsu in a CS or the IW following it. These 
words thus signal the end of the CS. Such words 
are shown in Table 4, Bonus points (6 points) are 
given to the path which indicates the CS ending 
with one of the words in Table 4, as that path 
shouhl he preferred. 
3.3 Finding the Conjunctive Struc- 
ture Scope 
As for each non-zero element in the lowest row ill a 
partial matrix A' in Figure l, we search for tile best 
path from it which has the greatest path score by a 
technique of the dynamic programming. Calculation 
is performed cohuun by columu in the left direction 
from a non-zero element. For each elenmnt in a col.. 
umn, the hast partial path including it in found by 
extending the partial paths from the previous cohmm 
and by choosing the path with the greatest score. 
Then among the paths to the leftmost column, the 
path which ha.s the greatest score becomes the best 
Now calculatill 8 
:" ill this column. ,--~vl t .... ~--.~---~---, ~---,.---~.--~---,---~--~..-, 
: :13 : : : . : : t : : : : l ~-+"~,:+-+ i q ~ .... +'- h J~ ~t ~th 
v... ¢.-+-v+---,-.-4 v ...... +--- :+...+..-+..-~ 
dregte~test+ : i:.-"' '" "~ ; ~.~.; "\[ Score path. 
...12:u'~ i~thtg i" i O'j 
e|emt~lt. 
Figure 4: The best path from a element. 
~:~-d ........... ~'.7 ~F "'~'"'~'~J'ne'n,aximtun path. 
<-> i.: >"~ L..L.~ 
"-4~Kn)o 2 0 5 0 4 o 
'llm ~o1~ of the ~-~ ......... l .............. conjunctive ~Lructme. ~ ~-~ ! ! 
~c&7) i 
~E 
Figure 5: The maxinmm path specifying a conjunc- 
tive structure. 
ACqES DE COLING-92, NANTES, 23-28 ^o~r 1992 1 7 3 PROC. OV COLING-92, NAWrEs, AIJo. 23-28, 1992 
path from the non-zero element (Figure 4). 
Of all the best paths from non-zero elements, the 
path which have the maximum path score defines the 
scope of bhe CS; i.e., the series of bunsetsus on tim left 
side of the maximum path and the series of bunsetsus 
under it are conjunctive (Figure 5). 
4 Experiments and Discussion 
We illustrate the effectiveness of our method by the 
analysis of 180 Japanese sentences. 60 sentences 
which are longer aud more complex than the aver- 
age sentences are collected from each of the following 
three sources; Encyclopedic Dictionary of Computer 
Science (EDCS) published by lwanami Publishing 
Co., Abstracts of papers of Japan Information Cen- 
ter of Science and Technology (JICST), and popular 
science journal, "Science", translated into Japanese 
(Vol.17,No.12 "Advanced Computing for Science"). 
Each group of 60 sentences consists of 20 sentences 
from 30 to 50 characters, 20 sentences from 50 to 80 
characters, and 20 sentences over 80 characters. 
As described in the preceding sections, many fac- 
tors have effects on the analysis of CSs, and it is very 
important to adjust the weights for each factor. The 
method of calculating the path score was adjusted 
during the experiments on 30 sentences out of 60 sen- 
tences from EDCS. Then the other 150 sentences are 
analyzed by these parameters. As the analyses were 
successful as shown in the following, this method can 
be regarded as properly representing the balanced 
weights on each factor. 
This method defines where the CS ends, that is, 
which bunsetsu corresponds to the KB. However, as 
for conjunctive noun phrases containing clause mod- 
ifiers or conjunctive predicative clauses, it is almost 
impossible to find out exactly where the CS starts, be- 
cause mm~y bunsetsus which modify right-hand bun- 
setsus exist in each part of the CSs and usnally they 
do not correspond exactly. Thus it is necessary to re- 
vise the starting position of the CS obtained by this 
method. We treat the actual prior part of a CS as 
extending to bunsetsus which modify a bunsetsu in 
the prior part of it obtained by this method, unless 
they contain comma or topic-marking postpositional 
particle " #2 "(ha). 
4.1 Examples of Correct Analysis 
Examples of correct analysis are shown in Figure 6- 
8. The revisions of CS scopes are shown in notes of 
each figure. Chains of alphabet symbols attached to 
matrix elements show the maximum path concerning 
the KB marked by the same alphabet and '>'. 
In the case of example(a) in Figure 6, the conjunc- 
tive noun phrase, in which eight nouns are conjuncted 
(chains of %', 'b', ... 'g'), is analyzed rightly thanks 
to the penalty points by SLs of every comma between 
nouns. Thus, the CS consisting of more than two 
|¢°2 2 2 2 2 2 t 4 0 2 2 0 0:2 0 J 0 2 0 2 (in~one.,e~,~'l) 
~:~. ~ ~ 5 .~ 5 5 2 0 2 ~ 2 0 2 0 2 0 2 2 2 (~clnmr~e) 
~b~C/.~. ~b7 ~ 5 5 2 0 2 2 2 0 2 0 :2 0 2 2 2 (collection) .\ 
~m4e. s, s ~ ~ 2 o 2 2 2 o ~ o 2 o 2 2 2 (r~.,~c~tlo,~) 
\ k~tt~l. ~ S S 2 o 2 ~ 2 o a o 2 o 2 2 2 (~mtd~) 
_... ............... , .... 
~m. 4& o 2 2 2 o ~ o 2 o 2 2 2 (~urWcmtum) 
tzU¢= 0 2 2 0 0 20~ 2 0 2 0 2 (~,.d~.mea~) 
~¢~i'~6 0 0 2 0 0 0 2hO 0 0 2 o (mhlU, d~) ~ 
'~:11, 12o o ~ o o ~o 2 o ~ (~a) ~II ~k 
0 0 2 0 0 2 ffa 3h 0 4 (n~,ne) 
~~t~L. 0 0 2 5 0 2 0 nbo (tail'/=) 
hoo o 0 o o 0 0 O (t,m) 
"~'~ o o 2 0 2 0 ~0 
~t>~l,: 0 0 2 0 0 0 (d~ 
~t~tcO 0 0 5 O0 
ll}tllo o 2 0 2 (athena.) 
~wtt ~ ~cludm,t 
It k a kind of ~c te~ce which analyz~ the e,seats ~uui tmatre related to info~ttation'$ 
occmrence, collection, systematiz~o~, ~afi~ t retrieval, uaderstendia,8, c,. com- 
mtmicmtlon, and application, tad so on, Lad inv~tigat~ social tdaplability of the 
clarified mass*. 
Figure 6: An example of analyzing conjunctive struc- 
tures (a) 
7't~szl!lllH$. 2 2 0 ~ 2 0 2 0 0 s 2 2 0 0 2 
MII~I~9 2 0 2 ~ 0 2 0 0 2 s 2 0 0 
lilll~ o 2 5 am ~1 0 0 2 2 ~ o o 2 
I~ o o 2 o Oml~O o o o 9 o 
~>'~. 2 o 2 o o 15.2 2 0 0 t2 
* M~'ko ~ o 0 2 2 ~ 0 0 2 
M~ o o 2 o obo o ~ o 
I~: o o o o ~b o o 
~'Og~ o o o o ~o 
~b> '~~ , 2 2 o o 12"02 
~$eO 0 2 
+\ +~tz 0 o o 
IKI~x'~ 0 0 
a "MOt ~/" ~" is iteiua~a. 
Pro~)ramming l~a~uises ame de fn~M to have objectives that they c~n d~cribe 
~arious co~Is of p~oblem fields, that they can ~Irictly describe algoritlm~ for *olving • problem, and that they cia drive fuactions of 
m computer tuffickmfly. 
Figure 7: An example of analyzing conjunctive struc- 
tures (b) 
0 (ProemmamlnS Imp) 
0 (d pr~m r~Idf) 
0 (v~ c~pm) 
2 (~ dmefit~) 
O (the) 
0 ( m f4ro~31mm) 
7 (f~ talv~n$) 
0 (~g~at4mm) 
o (tmct~a) 
0 (~ffwit lly) 
2 (c~ cbw~) 
2 (~) 
parts is expressed by tile repetition of the combina- 
tion of CSs consisting of two parts, in this exam- 
ple, also the conjunctive predicative clause is analyzed 
rightly (chains of 'h'). 
In the case of example(b) in Figure 7, the CS which 
consists of three noun phrases containing modifier 
clauses is detected as tile combination of the two con- 
secutive CSs like example(a) (chain of 'a' and 'b'). 
In tile case of example(c) ill Figure 8, the con- 
junctive noun phrase and the conjunctive predicative 
clause containing it is analyzed rightly. In this exam- 
ple, the successful analysis is due to the penalty points 
by SL of the topic-marking postpositional particle " 
" in "~ff~g~l~rl~t (a computational e~:periment)" 
and ",~1~ (in that)" which are the outside of tile 
CS and the bonus points by the AW " ~ v, 5 " in the 
last bunsetsu of the CS . 
AcrEs DE COLING-92, NANTES. 23-28 Aot~'r 1992 1 7 4 PROC. OF COLING-92, NANTES, AUO. 23-28, 1992 
L . - ,Source 
Length .... 
'Fable 5: Results of experiments 
30-,~o \[ 5~:ho I s~-~a9 30-5o I so~so I st~44 \[ .~5o I s~so 
~nTohe cox~junctive 5 7 : : 5 un phrases \[ Success 
Failure \] 3 \[ '~ 
\[ "9 
The conjm~ctive Success 6 
i ..... plete structures }:allure 0 0~~ \] 0 0 ,-9 0 
L,~.~. 0 0 0 0 0 0 0 0 0 0 o 0 0 0 0 0 0 I~ttI~IyS~. ~ 2 0 6 2 2 0 6 ~ 0 2 0 s 0 6 0 
• ~¢:6~ 4 o 4 2 2 o ~ ~ o 2 o s o 4 o 
'~ 042 2 O4202O2040 
• nI~tz o o o ~ o o o o o o o o o 
~l~a) 2 2 0 l/)g 0 ~ 0 4 o 17 0 
~t~b~ o 2 o 2 2.0 2 o 2 o ~ 0 ~ 
i-6 0 0 o 0 2~0 2 0 0 0 2 
C'. ~ ~> 02202102050 
~'~'/)b. 0 o 0 2 o ~o o 0 2 
~@~o o o o o Ol~O o 
b>~ I~" Sb 0 2 0 4 o 1~ 0 
>~¢~StL~, o 2 o o o 2 
incx,)+), o 4 o 
~151 ,t '9 ~o o 
¢ <m~: , 
(9~el~t(nl) 
(^~) 
(, ~apuade~ 0 e~pcamem) 
0 
(~ptrin~ ~s) 
0 (t~ 6ar~) 
0 
o (c~ and) 
o 
(t~ m~) 
0 
0 
0 
And a compulBtioual experiment is bener ill that infeasible cxpaiments can be dolm slid paran~eters il~z~cesuible to ¢xpelltllent or ~?aliOll 
¢il1 be measttred 
Figure 8: All example of analyzing conjunctive struc- 
tures (C). 
4.2 Experinaental Evaluation 
We evaluated tile analysis result of 180 Japanese sen- 
tences by hand. The results of cvaluatlug every sen- 
tence by each CS type are shown in Table 5. If tile 
same typc CSs exist two or more ill a sentence, the 
analysis is regarded as a success only when all of them 
are analyzed rightly. 
There arc 144 conjunctive noun phrases ill 180 sen- 
tences, and ll9 phrases among them are analyzed 
rightly. Tbe success ratio is 83%. There are 118 con- 
junctive predicative clauses ill 180 sentences, and 94 
clauses among them are analyzed rightly. The suc- 
cess ratio is 80%. There are 3 pairs of the conjunctive 
incomplete structures, and all of them are analyzed 
rightly. 
As showu in \]'able 5, the sucecss rate for tile Sell- 
tences from J1CST abstracts arc worse than that of 
the sentences from other sources. The reason for the 
failures is that tile sentences are often very ambiguous 
and confusing even for a lluman because they have too 
many contents in a sentence to satisfy the limitation 
of tile docnment size. 
4.3 Examples of Incorrect Analysis 
and Solutions for Them 
Wc give examples of failure of analysis (Table 6, Fig- 
urc 9), and indicate st)lutions for them. In Table 6, 
underlined parts show the KBs, I- ...d shows tile 
wrongly analyzed scope, and r ... j shows the right 
scope. 
• It is essential ill this method to define the appro- 
priate similarity between words. Thus changing 
the sinlilarity points for more detailed groups of 
parts of speech (e,g. nouns call be divided into 
ilul~lerals~ proper nonns, conlmon nouns, and ac- 
tion nouns which becomc verbs by the combiua- 
tion with " ~-~ (do)") can improve the accuracy 
of the anMysis. For example, the example(i) in 
'Fable 6 may bc analyzed rightly if the similar- 
ity points between action noun "t1~\[~ (extension)" 
and action noun "t~'f (maintenance)" is greater 
than that between action noun " t1~ (extension)" 
and common noun " ~1~ (di~cully)". 
• Semantic similarities between words are currently 
calculated only by using BOIl which do not con- 
tain technical terms. If tile sinfilarity points be- 
tween technical terms can be given by thesaurus, 
tile accuracy of tile analysis will be improved. 
Example(ii) will be analyzed rightly if greater 
points are given to tile similarity between " T P 
"T 4 7". -k 4.-- b ~f~'~ ( Actlve Chart Parsing)" 
and "llPSG( Head-drtve, Phrase Structure Gram- 
i lly the additional usage of relatively simple syn- 
tactic conditions, some sentences which are an- 
alyzed wrongly by this method will be analyzed 
rightly. For example, because Japanese modi- 
fier/modifyee relations, inchnling the relation be- 
tween a verb and its case frame elements, do 
not erc~s each other, the modifier/modifyee re- 
lations in nmm phrases and predicative clauses do 
not spread beyond each phrase or clause, except 
the relation concerning the last bunsetsu of them. 
This condition is not satisfied by the analyzed CS 
in the example(ill) whose prior noun phrase con- 
tains no verb related with the case frame element " 
~,~" (grammar)". By this condition it can be~-~- 
timated that only " 17~1/~\[0 (natural langlage) 
MI~ ~ (analysis and)" or "~i:~: (analysis and)" 
AODdS Dr COLING-92, NANTES, 23-28 hOt~T 1992 17 $ PROC. OF COLING-92, NANTES, AUG. 23-28. 1992 
Table 6: Examples of failure of analysis. 
(i) r ta.b (these) ~© (of analysis methods) ~i~ L ¢c (common) Pdl~ & L T_ (as problems) ~l~lJ~: (grammar 
rules) 9k ~ < ~k-9 ~: (increasing) Jib.a9 (in the case) \[-~111.\[O (of rules) \[~t~-af , (extension and) {~ ,'~a9 (of mainte- 
.ance)l \[\]~A~$~ ( di.Oieultv) 3 3:=Hr b Sb ~5 (can be thought). 
(ii) ... H;tg~Jt~3~ll~to~t~ (Japanese dialogue analytic module), r V~ltyf~© (o~ the analysis procesa)~l~r~ 
(control) I~t ~ ~ (be free) T ~' ~" 4 P" " ~" ~r -- b ~:~£~ )" (Active Chart Parsing and) ffl--~ ~ (on unification) ~"~ ~ ~ 
(based) ~t~W - tl~ (lexicon based) 3~.tt:~tg~"C ~b i5 (being the grammatical framework)3 H P S G ~" ( HPSG)J 
J~ L ~ ~ ~, (be adopted). 
(iii) \[-~--t9 (one) 3~ (grammaO ~1~¢~\]~© (natural language) t~lq~ ~ (analysis and) t5~3~2 (generation)J fIJ~a 
(using) ~gl5\]~3~je~a) (of bi-directional grammar) ~?l (the research), 3 ilf~f~agA:i)~t9 ¢~ (in point of computa- 
tional linguistics), ~ll~ll~t~ (machine translation and) 1~t~4 Y • 7 ~- :x ~. w-9 ?c (such as natural language in- 
terlace) l~Jl~J~¢9 ~ (from the point of view of an application} ~.'Oi15~a (be importaut). (73chs) 
(iv) ~ (in fact), tll~l~ ~ ~ (authors) ~ r U k'b ~" (it) ~E.9"X: (using), ~ll)'J~ff~l~J~r~: (gravitationally interacting) 3F.~. 
"j" ~5 (governing)J ~{tgto (astronomical) ~-gw'~ (about the motion), ?~¢I~PJ£~ (high-precision) ill.a) (high-speed) 
I~1"~ ~: (numerical computation) ~ ~ t5 (can) ~ 4 ~ ~ x. . ~ t. ~3 -- & ~ 5 (called Digital Orrery) ~ = :/~" ~ --$t -- 
t (s~vial-pu,'pase computer)~t.'r ~ ~ (create). J 
(v) ... f \[-~II+3~I~T~ (for illegal sentences) ~3\[:~:'~ (termination and) Ikl~ ?5 (outputted) ~ (o) sen'fences) ~bw 
~ t~ ~ a) (of ambiguities)3 .\]:~t~:'gv~'~ (about the maximum)J ~rk v. (there is no guarantee). (vi) 
... r~ ~ (/or every expression) J~btc (prepared) \[-~aa) ( in a combinative structure) ~__._~ 
~ (combinative elements) 3~r~O (in a sentence) lgh~.~ ~" ¢9 (between case elements)3 J ~.~: (correspondence) ... 
i&~&l= 0 2 2 2 2 2 (tctev~lexp*~*au) i ~L?: O 
0 0 0 0 <pr+~.,tred) 4\XI~'~I~I~*6D g 81 5 2 (ham~mbinltivel~alcture) 
Figure 9: An example of failure of analysis. 
can be the prior part of the CS. We are plan- 
ning to do such a correction in the next stage of 
the syntactic analysis, which analyzes all modi- 
fier/modifyee relations in a sentence using the CS 
scopes detected by this method. 
• in example(iv), the KB in the beginning part of 
a sentence corresponds to the last CB. That is, a 
short part of a sentence corresponds to the follow- 
ing long part. It is very difficult to analyze such 
an extremely unbalanced CS because this method 
gives a priority to similar CSs. In order to ana- 
lyze example(iv) the causal relationship between 
"~1~-9"C (usiug)" and "~tr~'J~z~ (create)" will be 
necessary. 
• Some sentences analyzed incorrectly are too sub- 
tle even for a human to find the right CSs. Exam- 
pie(v) cannot be analyzed rightly without expert 
knowledge. 
• This method cannot handle the CSs in which the 
prior part contains some modifiers and the poste- 
rior part contains nothing corresponding to them 
(example(vi), Figure 9). For these structures we 
must think the path extending upward in a partial 
matrix, but it is impossible by the criteria about 
word similarities alone. 
The CSs such as example(v) and example(vi) can- 
not be analyzed correctly without semantic informs- 
tion. fIowever such expressions are very few in actual 
text. 
5 Concluding Remarks 
We have shown that varieties of parallel structures 
in Japanese sentences can be detected by the method 
explained in this paper. As the result, a long sentence 
can be reduced into a short one, and the success rate 
of syntactic analysis of these long sentences will bc- 
come very high. 
There are still some conjunctive expressions which 
cannot be recognized by the proposed method, and 
we are tempted to rely on semantic information to get 
proper analyses for these remaining cases. Semantic 
information, however, is not so reliable as syntactic 
information, and we have to make further efforts to 
find out syntactic rather than semantic relations in 
these difficult cases. We think that it is possible. One 
thing which is certain is that we have to see many 
more components simultaneously in a wider range of 
word strings of a loug sentence. 
ACRES DE COLING-92, NANTES, 23-28 not~r 1992 1 7 6 PROC. OF COLING-92. NANTES, AUG. 23-28, 1992 

References 

\[1\] M. Nagao, J. Tsujii, N. Tanaka, M. lshikawa 
(1983) Conjunctive Phrases in Scientific 
and Technical Papers and Their Analysis (in 
Japanese). IPSJ- WG, NL-36-4. 

\[2\] K. Shudo, K. Yoshimura, K. Tsuda (1986) Co- 
ordinate Structures in Japanese Tehuical Sen- 
tences (in Japanese). 7~'ans.IPS Japan, Vol.27, 
No.2, pp.183-190. 

\[3\] The National Language Research Institute 
(1964) Bunrui Goi Hyou (in Japanese). Shu- 
uei Publishing. 
