Informed Parsing for Coordination 
with Combinatory Categorial Grammar 
Jong C. Park Hyung Joon Cho 
park@cs.kaist.ac.kr hjcho@nlp.kaist.ac.kr
Computer Science Division and
Advanced Information Technology Research Center
Korea Advanced Institute of Science and Technology
373-1 Gusung-dong, Yusong-gu, Daejon 305-701, KOREA
Abstract 
Coordination in natural language hampers efficient parsing,
especially due to the multiple and mostly unintended candidate
conjuncts/disjuncts in a given sentence that shows structural
ambiguity. The problem gets more serious in a combinatory categorial
grammar framework, which is well known for its competent treatment
of coordination, as the flexibility of syntactic analysis often
strikes back as spurious ambiguity. We propose to address these
ambiguities with predicate argument structures and semantic
co-occurrence similarity information, and present encouraging results.
1 Introduction 
Sentences with coordination contain multiple phrases of like
syntactic type. When the given sentence shows structural ambiguity,
there may be multiple pairs of candidates for possible
conjuncts/disjuncts, usually a single pair of which is identified as
intended by human language understanders. Parsing for coordination
should thus find the exact syntactic boundaries of these "intended"
conjuncts/disjuncts. Previous work employed a preprocessing module
for parsing to work on a constrained range of the candidate
conjuncts/disjuncts (Kurohashi & Nagao, 1994; Yang, 1995;
Okumura & Muraki, 1994).
Combinatory categorial grammars (CCGs) (Steedman, 1990; 2000) are
known to explain a wide range of syntactic phenomena, such as
coordination, extraction and long distance dependency, without
employing the notions of movement and empty categories. While CCGs
offer explanations for various natural language phenomena with
limited combinatory rules such as type raising and function
composition, these rules are also well known to increase the
complexity of parsing, giving rise to quite a few irrelevant
syntactic analyses as well as relevant ones, a phenomenon often
called spurious ambiguity.
In this paper, we propose to address the two types of ambiguity with
predicate argument structures and semantic co-occurrence similarity
information.¹ For a more concrete discussion, we focus in this paper
on coordination in Korean. First, related work is reviewed in §2.
The two types of ambiguity are then discussed in §3. We also examine
the characteristics of coordination in a corpus in §4. In §5, we show
our proposal to enhance parsing efficiency, with encouraging
experimental results.
2 Related Work 
2.1 Conjunct Identification 
First, we review three of the techniques that attempt to narrow down
the candidate conjuncts.
2.1.1 Complex Information 
Approaches that use complex information are based on the assumption
that there are various clues for morphological and semantic
similarity between the pair of matching conjuncts/disjuncts. The
usual measure for such similarity includes the part-of-speech (pos)
feature information. The proposal by Agarwal and Boggess (1992) for
English coordination utilizes a semi-parser for the assignment of
pertinent semantic and morphological information to each lexical
item in a given sentence. The procedure for the conjunct/disjunct
identification with this information is summarized below.
1. Keep pushing the lexical items in a given sentence into the stack
   until a coordination item such as and, or, but, etc. is
   encountered.
2. With the coordination item, treat the immediately following phrase
   as the post-conjunct/disjunct.
3. Pop the lexical items from the stack one by one, comparing their
   morphological and/or semantic features with those of the
   post-conjunct/disjunct.

¹ This work was supported by the Korea Science and Engineering
Foundation (KOSEF) through AITrc. The second author is now with
SK Teletech in Korea.

The reported precision is 81%, but the method identifies only the
starting positions of the conjuncts/disjuncts.
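
The three-step procedure summarized above can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions that are not in Agarwal and Boggess (1992): each lexical item is a (word, pos tag) pair, the post-conjunct is reduced to the single word following the coordination item, and two items "match" when their pos tags are equal. The function name and the tag set are hypothetical.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, pos tag); the tag set is a hypothetical simplification

COORDINATORS = {"and", "or", "but"}


def find_pre_conjunct_start(tokens: List[Token]) -> Optional[int]:
    """Return the index where the pre-conjunct starts, or None if no match."""
    stack: List[int] = []
    for i, (word, _) in enumerate(tokens):
        if word in COORDINATORS:
            if i + 1 >= len(tokens):
                return None
            # Step 2: the item right after the coordinator stands in
            # for the post-conjunct (assumption: one word, not a phrase).
            post_pos = tokens[i + 1][1]
            # Step 3: pop items one by one and compare features.
            while stack:
                j = stack.pop()
                if tokens[j][1] == post_pos:
                    return j
            return None
        # Step 1: push items until a coordinator is encountered.
        stack.append(i)
    return None


sentence = [("the", "DT"), ("old", "JJ"), ("men", "NNS"),
            ("and", "CC"), ("women", "NNS"), ("left", "VBD")]
start = find_pre_conjunct_start(sentence)
```

As in the original method, the sketch returns only a starting position for the pre-conjunct; it says nothing about where either conjunct ends.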
The method proposed by Okumura and Muraki (1994) for English
coordination looks into the symmetric patterns of
conjuncts/disjuncts. These symmetric patterns are classified into
four categories: phrasal/clausal patterns, lexical patterns,
morphological patterns, and complex patterns. The first three
patterns are defined in terms of the respective features in lexical
items. The best match among all the possible word sequences with
these features before and after the coordination item is passed over
to the parser. The reported precision is 75%. This method assigns
various weights to the features for the measure of the symmetric
patterns. It is not clear if the weight assignment method is
principled and if it can also address conjuncts/disjuncts with
ellipsis.
2.1.2 Co-Occurrence Information 
Yang (1995) utilized co-occurrence information for the resolution of
structural ambiguity in Korean noun phrase coordination. The method
looks up the related pair of nouns and verbs from a large corpus and
uses the statistics on the case information of the nouns with
respect to the given verb for the similarity information among
nouns. The co-occurrence similarity DSim(n1, n2) between the nouns
n1 and n2 is defined as follows, which incorporates the prediction
that the similarity of the two nouns is higher when they are more
frequently used with the same verb of the same syntactic category.
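$$\mathrm{DSim}(n_1, n_2) = \frac{2 \cdot \sum_{g \in G} |C_g(n_1, n_2)|}{\sum_{g \in G} |V_g(n_1)| + \sum_{g \in G} |V_g(n_2)|}$$

where²
• $G = \{\mathit{subj}, \mathit{obj}, \mathit{loca}, \mathit{inst}, \mathit{modi}\}$
• $V_g(n) = \{v \mid v \text{ is a verb such that } f_g(n, v) > 1\}$
• $|V_g(n)| = \sum_{v \in V_g(n)} f_g(n, v)$
• $|C_g(n_1, n_2)| = \sum_{v \in CV_g(n_1, n_2)} \min\{f_g(n_1, v), f_g(n_2, v)\}$

² $f_g(n, v)$ is the number of times that noun n of type g occurs
with verb v in the same sentence in a corpus.
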
This method makes it possible to resolve the syntactic ambiguity due
to coordination in Korean as illustrated in (1) and (2).³ DSim is
used to predict correctly that 'suthayk' (stack) is in coordination
with 'khyu' (queue) and not with 'yey' (example). Likewise,
'kyeysan' (computation) and 'thamsayk' (search) are correctly
identified to be in coordination with each other.
(1) suthayk-[ ] khyu-uy yey
    stack-co queue-poss example
    example(s) of stack(s) and queue(s)
    DSim(suthayk, khyu) = 0.319
    DSim(suthayk, yey) = 0.053
(2) haysing hamswu-uy kyeysan-[ ] pekheys-uy thamsayk-ey
    soyotoynun sikan
    hashing function-poss computation-co bucket-poss search-loc
    spending time
    the time spent for the computation of the hashing function and
    the search for the bucket
    DSim(kyeysan, pekheys) = 0.024
    DSim(kyeysan, thamsayk) = 0.303
    DSim(kyeysan, sikan) = 0.067
In addition to the fact that the method may suffer from data
sparseness, it is also not directly applicable when nouns are
polysemous or when the conjunct/disjunct nouns have weak semantic
similarity (cf. (3)).
(3) i notu-nun teyitha yoso-[ ] lisutu-uy taum wenso-lul cisiha-nun
    phointhe-lul phohamha-n-ta
    this node-nom data element-co list-poss next element-acc
    indicate-un pointer-acc contain-pres-decl
    'this node contains the data element and the pointer that
    indicates the next element in the list'
Its reported precision is 85.8%, and the recall 95.9%. We note that
this co-occurrence similarity information is useful when large-scale
linguistic knowledge bases such as WordNet are not available for the
language in question.
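
The DSim measure of this subsection can be sketched as follows. The sketch assumes a particular reading of the definition: twice the shared co-occurrence count, summed over the case types in G, divided by the summed counts of both nouns. It also assumes that the count table already contains only the verbs that belong to V_g(n). The data layout, the function name, and the toy counts are hypothetical and are not taken from Yang (1995).

```python
from typing import Dict, Tuple

# Case types in G (assumed reading: subject, object, locative,
# instrumental, modifier).
CASES = ("subj", "obj", "loca", "inst", "modi")

# counts[(case, noun)][verb] = f_g(noun, verb)
Counts = Dict[Tuple[str, str], Dict[str, int]]


def dsim(counts: Counts, n1: str, n2: str) -> float:
    """Co-occurrence similarity of two nouns under the assumed formula."""
    shared = 0  # sum over g of |C_g(n1, n2)|
    total = 0   # sum over g of |V_g(n1)| + |V_g(n2)|
    for g in CASES:
        f1 = counts.get((g, n1), {})
        f2 = counts.get((g, n2), {})
        # Only verbs seen with both nouns contribute to the shared count.
        shared += sum(min(f1[v], f2[v]) for v in f1.keys() & f2.keys())
        total += sum(f1.values()) + sum(f2.values())
    return 2.0 * shared / total if total else 0.0


# Toy counts, invented for illustration only.
toy: Counts = {
    ("obj", "stack"): {"push": 3, "pop": 2},
    ("obj", "queue"): {"push": 1, "pop": 2, "drain": 1},
    ("obj", "example"): {"show": 4},
}
sim_related = dsim(toy, "stack", "queue")
sim_unrelated = dsim(toy, "stack", "example")
```

Under this reading the measure lies between 0 (no shared verbs) and 1 (identical count profiles), which matches the way the values in (1) and (2) are compared.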
2.1.3 Part-Of-Speech Patterns 
The method with pos patterns (Park, 1998) 
extracts sentences with coordination from a 
pos-tagged corpus, trains the system with 
the pos patterns of the left- and right- 
conjuncts/disjuncts, and resolves the ambiguity 
with trained pos patterns (cf. (4)). 
³ Coordination items are shown in a box. We use the Yale notation
for transcribing Korean alphabets.
(4) na-nun [kemjeng ppang-[ ] huyn ppang]-ul mek-un salam-ul
    poass-ta
    I-nom [black bread-co white bread]-acc eat-un person-acc
    see-past-decl
    'I saw the person who ate black bread and white bread'
The longest pattern is selected when there are multiple candidate
pos-patterns. The reported precision is 71.6%. When there is
structural ambiguity as shown in (5) below, however, it is difficult
to identify the right conjunct if the system considers only pos
information, where the longest match is 'mangko-lul swuipha-nun
nala' (the nation that imports mangos), not the intended 'mangko'
(mango).
(5) phainayphul-[ ] mangko-lul swuipha-nun nala
    pineapple-co mango-acc import-un nation
    'the nation that imports pineapples or mangos'
2.2 Combinatory Categorial Grammar for Korean
Examples of coordination in Korean are shown below.
(6) [chelswu-nun uysa-ka][,] [yenghi-nun sensayngnim-i] toy-ess-ta
    [chelswu-top doctor-comp], [yenghi-top teacher-comp]
    become-past-decl
    'Chelswu became a doctor, and Yenghi a teacher'
(7) [kochwukapsi olu-myen kochwu-lul][,] [twaycikokikapsi olu-myen
    twaycikoki-lul] swuipha-n-ta.
    [pepper-price-nom rise-if pepper-acc], [pork-price-nom rise-if
    pork-acc] import-pres-decl
    '(The government) imports peppers if the price for peppers
    rises, and pork if the price for pork rises'
In CCGs, type raising and function composition rules are typically
utilized to come up with single categories for those fragments above
in square brackets, often called 'non-standard' constituents.
Table 1 shows the reduction rules proposed for Korean (Cho, 2000;
Cho and Park, 2000). Figure 1 shows a sample syntactic and semantic
derivation of part of (6). Space precludes further explanation of
the formalism.
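
To make the role of composition concrete, here is a minimal Python sketch of four of the reduction rules (forward and backward application, forward composition, coordination) acting on category objects. It is an illustration only, not the authors' parser: the `Fn` class, the function names, and the atomic labels `NPn` and `NPc` (nominative and complement noun phrases) are assumptions made for this example, and semantics is left out.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Fn:
    """Complex category: result/arg (forward) or result\\arg (backward)."""
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Cat = Union[str, Fn]  # atomic categories are plain strings


def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    # X/Y  Y  =>  X
    if isinstance(left, Fn) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    # Y  X\Y  =>  X
    if isinstance(right, Fn) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


def forward_compose(left: Cat, right: Cat) -> Optional[Cat]:
    # X/Y  Y/Z  =>  X/Z
    if (isinstance(left, Fn) and left.slash == "/"
            and isinstance(right, Fn) and right.slash == "/"
            and left.arg == right.result):
        return Fn(left.result, "/", right.arg)
    return None


def coordinate(left: Cat, right: Cat) -> Optional[Cat]:
    # X conj X  =>  X
    return left if left == right else None


vp = Fn("S", "\\", "NPn")        # S\NPn
verb = Fn(vp, "\\", "NPc")       # (S\NPn)\NPc, e.g. a copular verb
topic = Fn("S", "/", vp)         # type-raised topic noun phrase
complement = Fn(vp, "/", verb)   # type-raised complement noun phrase

conjunct = forward_compose(topic, complement)   # S/((S\NPn)\NPc)
both = coordinate(conjunct, conjunct)           # two like conjuncts
sentence = forward_apply(both, verb)            # combines with the verb
```

The point of the sketch is that each bracketed fragment in (6) receives a single category by composition, so that two such fragments can be coordinated before either has seen the verb.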
3 Two Types of Ambiguity 
3.1 Spurious Ambiguity 
CCGs provide two syntactic derivations for the semantically
unambiguous sentence (8), as shown in (9) and (10).
(8) chelswu-ka sakwa-lul mek-ess-ta
    Chelswu-nom apple-acc eat-past-decl
    'Chelswu ate an apple/apples'
(9) [[chelswu-ka sakwa-lul] mek-ess-ta]
(10) [chelswu-ka [sakwa-lul mek-ess-ta]]
These distinct syntactic derivations make it possible for a CCG to
correctly analyze sentences with coordination, as shown in (11) and
(12).
(11) [[[chelswu-ka sakwa-lul][ ] [yenghi-ka ttalki-lul]] mek-ess-ta]
    'Chelswu ate apples and Yenghi strawberries'
(12) [chelswu-ka [[son-ul ssis]-[ ] [sakwa-lul mek-ess]-ta]]
    'Chelswu washed his hands and ate apples'
Spurious ambiguity refers to the phenomenon of this kind in which
there are multiple syntactic derivations, as in (9) and (10), for a
fragment that is semantically unambiguous.⁴ While descriptively
justifiable, it nevertheless results in a repeated computation of
the same fragments that are semantically indistinct, adversely
affecting parsing efficiency. This problem gets more serious with
coordination (cf. (13)).
(13) chelswu-nun sakwa-lul mek-[ko] yenghi-nun ttalki-lul
    mek-nun-ta
    'Chelswu eats apples and Yenghi strawberries'
Table 2 shows four syntactic derivations for (13), all with
identical semantics.

1. [[chelswu-nun sakwa-lul] mek]-ko [[yenghi-nun ttalki-lul] mek]-nunta
2. [[chelswu-nun sakwa-lul] mek]-ko [yenghi-nun [ttalki-lul mek]]-nunta
3. [chelswu-nun [sakwa-lul mek]]-ko [[yenghi-nun ttalki-lul] mek]-nunta
4. [chelswu-nun [sakwa-lul mek]]-ko [yenghi-nun [ttalki-lul mek]]-nunta
Table 2: Example Spurious Ambiguity

3.2 Structural Ambiguity
The spurious ambiguity as discussed above does 
not give rise to wrong syntactic analyses, but 
the structural ambiguity may, as shown in (14). 
(14) cengchi-uy canglay-[na] nongmintul-uy saynghwalsang-uy
    mwuncey-wa-nun keli-ka mel-ta
    politics-poss future-co farmers-poss life-poss problem-wa-top
    distance-nom far-decl
    '(It is) far from the problems of the future of politics or the
    lives of farmers'

Reduction Rule       Rule Name                     Rule Symbol
X/Y  Y   → X         Forward Application           >
Y  X\Y   → X         Backward Application          <
X conj X → X         Coordination                  <Φ>
X/Y  Y/Z → X/Z       Forward Composition           >B
Y\Z  X\Y → X\Z       Backward Composition          <B
X/Y  Y\Z → X\Z       Forward Crossed Composition   >Bx
X → T/(T\X)          Forward Type Raising          >T
X → T\(T/X)          Backward Type Raising         <T
Table 1: Reduction Rules in a CCG for Korean

chelswu-nun      S/(S\NP_n) : λf.f chelswu
uysa-ka          (S\NP_n)/(S\NP_n\NP_c) : λf.f doctor
  >B             S/(S\NP_n\NP_c) : λf.f doctor chelswu
,                conj : and'(X, Y)
yenghi-nun       S/(S\NP_n) : λf.f yenghi
sensayngnim-i    (S\NP_n)/(S\NP_n\NP_c) : λf.f teacher
  >B             S/(S\NP_n\NP_c) : λf.f teacher yenghi
  <Φ>            S/(S\NP_n\NP_c) : λf.and'(f doctor chelswu, f teacher yenghi)
Figure 1: Sample CCG Derivation

⁴ Note that there are arguments that some of these syntactic
derivations are associated with distinct pragmatic functions, so
that the ambiguity might not be entirely 'spurious' (Prevost, 1995).

Table 3 shows six of the syntactic derivations. While the semantics
makes it clear that only derivation 2 is the intended one, it is
impossible for a parser with only syntactic information to tell the
difference. Notice that this problem is not unique to a CCG-based
parser. While the general solution would obviously require not only
semantic information but also pragmatic and discourse information,
we examine approaches that take into account only semantic
information in this paper.
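
The six readings listed in Table 3 arise from pairing each possible left conjunct with each possible right conjunct. A small Python sketch of that enumeration follows; it assumes, for illustration only, that the noun chains on either side of the coordination item are already segmented into words, and the function name is hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[List[str], List[str]]


def candidate_pairs(left: List[str], right: List[str]) -> List[Pair]:
    """Pair every suffix of the left chain with every prefix of the right."""
    pairs: List[Pair] = []
    for i in range(len(left)):               # left conjunct = left[i:]
        for j in range(1, len(right) + 1):   # right conjunct = right[:j]
            pairs.append((left[i:], right[:j]))
    return pairs


# The noun chains around the coordination item in (14).
left_chain = ["cengchi-uy", "canglay"]
right_chain = ["nongmintul-uy", "saynghwalsang-uy", "mwuncey"]
pairs = candidate_pairs(left_chain, right_chain)
```

With two candidate left conjuncts and three candidate right conjuncts the sketch yields six pairs, in the same order as Table 3; the second pair corresponds to the intended derivation 2. Syntax alone offers no basis for preferring it over the other five.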
4 Coordination in Corpus 
1. [cengchi-uy canglay]-na [nongmintul]-uy saynghwalsang-uy mwuncey
2. [cengchi-uy canglay]-na [nongmintul-uy saynghwalsang]-uy mwuncey
3. [cengchi-uy canglay]-na [nongmintul-uy saynghwalsang-uy mwuncey]
4. cengchi-uy [canglay]-na [nongmintul]-uy saynghwalsang-uy mwuncey
5. cengchi-uy [canglay]-na [nongmintul-uy saynghwalsang]-uy mwuncey
6. cengchi-uy [canglay]-na [nongmintul-uy saynghwalsang-uy mwuncey]
Table 3: Example Structural Ambiguity

In an attempt to assess the coordination phenomenon in a realistic
manner, we examine a pos-tagged corpus available at KAIST. The
corpus contains newspaper articles (40,428 eojeol), essays (41,666
eojeol), textbooks (50,208 eojeol), technical documents (2,729
eojeol), and novels (40,498 eojeol), with the total of 175,524
eojeol and 17,123 sentences.⁵ Table 4 shows the types and
frequencies of sentences containing coordination in this corpus. We
used PERL scripts to narrow down the sentences meeting certain basic
conditions for coordination and manually identified sentences with
coordination among those chosen.⁶ Table 4 indicates that
coordination is used quite often in Korean.

⁵ Eojeol is a unit in Korean that roughly corresponds to
space-delimited words in English. Each eojeol contains both the stem
and its morphological endings.

5 Parsing for Coordination 
In this section, we present techniques of dealing with coordination
for efficient parsing.
⁶ A pos-tagged corpus does not give sufficient information for the
fully automatic identification of sentences with coordination, since
we also need to take the sentential semantics into account.
Coordination Item   Articles      Essays         Textbooks      Documents    Novels         Total
ending              728 (21.4%)   1092 (33.7%)   1101 (20.1%)   46 (40%)     2174 (44.5%)   5142 (30.0%)
postposition        671 (19.7%)   352 (10.7%)    812 (14.8%)    23 (20%)     320 (6.6%)     2178 (12.7%)
adverb              91 (2.7%)     22 (0.68%)     58 (1.1%)      6 (5.2%)     17 (0.35%)     194 (1%)
comma               154 (4.5%)    126 (3.9%)     216 (3.9%)     21 (18.3%)   43 (0.88%)     605 (3.5%)
Total Sentences     3403          3239           5485           115          4881           17123 (100%)
Table 4: Characteristics of Coordination in Korean
5.1 Predicate Argument Structure 
We used the CKY algorithm to implement a CCG parser. As discussed,
the presence of spurious ambiguity in a given sentence forces
repeated syntactic analyses for fragments with identical semantics.
This can be avoided if we use the same cell to record the syntactic
analyses with the same semantics.⁷ CCG makes this possible, as both
the syntactic and semantic derivations are constructed in tandem.
Table 5 shows part of the parsing table for (13). The following
shows the relevant syntactic categories.
• chelswu-nun, yenghi-nun : s/(s\np_n)
• sakwa-lul, ttalki-lul : (s\np_n)/(s\np_n\np_a)
• mek : s\np_n\np_a
• ko : conj
The cells (C1,R3) and (C5,R3) each contain two syntactic analyses
with the same semantics, resulting in four syntactic analyses with
the same semantics in the cell (C1,R7). We can prevent such multiple
analyses by not writing into the cell a syntactic analysis with the
same recorded semantics. We thus have one syntactic analysis for
each of the cells (C1,R3), (C5,R3) and (C1,R7). For longer
sentences, we expect a significant reduction in the number of
derived syntactic analyses, as also partly verified by our
experiments.
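
The cell-level filter described above can be sketched in Python as follows. This is an illustration rather than the authors' implementation: the representation of a cell as a dictionary and the `record` function are hypothetical, and the sketch assumes that an analysis is a duplicate when both its category and its semantics are already recorded in the cell.

```python
from typing import Dict, Tuple

# A cell maps (category, semantics) to the single analysis kept for it.
Cell = Dict[Tuple[str, str], str]


def record(cell: Cell, category: str, semantics: str, bracketing: str) -> bool:
    """Store an analysis unless the cell already holds the same reading."""
    key = (category, semantics)
    if key in cell:
        return False  # spurious duplicate: nothing new to propagate
    cell[key] = bracketing
    return True


# Cell (C1,R3) of Table 5: two bracketings, one meaning.
c1_r3: Cell = {}
first = record(c1_r3, "s", "mekta' sakwa' chelswu'",
               "[[chelswu-ka sakwa-lul] mek]")
second = record(c1_r3, "s", "mekta' sakwa' chelswu'",
                "[chelswu-ka [sakwa-lul mek]]")
```

Because only one analysis survives in (C1,R3) and likewise in (C5,R3), the cell (C1,R7) receives one combination instead of the four listed in Table 2. Keying on the category as well as the semantics is a cautious choice made here, so that analyses with equal meaning but different combinatory potential are not conflated.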
5.2 Co-Occurrence Similarity 
5.2.1 The Algorithm
Among the pairs of head nouns of candidate conjuncts/disjuncts,
about 88.1% of the pairs are identified as semantically related in
our corpus. It is thus reasonable to consider only those candidates
with some semantic relation. We use the following modification of
Yang's (1995) original proposal.
1. Whenever the coordination reduction rule is invoked, check the
   syntactic categories of the candidate conjuncts.
2. If they are nouns or noun phrases, skip to the next step.
   Otherwise write them to a cell.
3. Locate the head nouns in the candidate conjuncts.
4. Compute the co-occurrence similarity of the head nouns.
5. If the coordination reduction rule has already been applied to
   the head noun of the left candidate conjunct, compare the
   co-occurrence similarity of the recorded and the new. If the
   co-occurrence similarity of the newly identified candidate
   conjuncts is stronger, write them to a cell, and delete the
   existing candidates in the cell. Otherwise, discard the new and
   retain the old.
6. Update the list of conjuncts whenever there is a newly recorded
   candidate conjunct.

⁷ Cf. Karttunen, 1989; Eisner, 1996; Komagata, 1999.

For the co-occurrence similarity information between nouns, we used
another KAIST corpus that is manually tree-tagged (Lee, 1998). It
contains about 31,000 sentences with 352,730 eojeol. The average
sentence length is 11.35 words. The domains include newspaper
editorials, economy, religion, science fiction, expedition, novels,
and history. We have considered only nominative, accusative,
adverbial, complement, and adnominal cases in relation to the verbs
for the extraction of co-occurrence similarity information, which is
then recorded into a dictionary and is looked up by the parser when
it deals with coordination.
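
The comparison at the heart of the selection procedure in this subsection (computing the similarity, comparing it with the recorded one, and updating the record) can be sketched in Python as below. The data structure and function names are hypothetical, the category check and head-noun location are assumed to have happened already, and the similarity function is passed in as a parameter. The similarity values in the usage part are the DSim figures quoted for example (2).

```python
from typing import Callable, Dict, Tuple

# best[left_head] = (right_head, similarity) of the pair currently kept
Best = Dict[str, Tuple[str, float]]


def consider(best: Best, left_head: str, right_head: str,
             similarity: Callable[[str, str], float]) -> bool:
    """Keep the new candidate pair only if it beats the recorded one."""
    score = similarity(left_head, right_head)
    recorded = best.get(left_head)
    if recorded is not None and recorded[1] >= score:
        return False  # discard the new candidate, retain the old
    best[left_head] = (right_head, score)  # replace the existing candidate
    return True


# DSim values reported for (2).
dsim_values = {
    ("kyeysan", "pekheys"): 0.024,
    ("kyeysan", "thamsayk"): 0.303,
    ("kyeysan", "sikan"): 0.067,
}

best: Best = {}
for right in ("pekheys", "thamsayk", "sikan"):
    consider(best, "kyeysan", right, lambda a, b: dsim_values[(a, b)])
```

After the loop only the pair of 'kyeysan' and 'thamsayk' remains, so a single coordination analysis is written to the chart for this left head noun. Ties keep the earlier candidate, which follows the wording "stronger" in the fifth step.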
5.2.2 Thesaurus 
The use of co-occurrence similarity information as in (Yang, 1995)
suffers from data sparseness, especially since we have a relatively
small corpus with fairly unrestricted domains.⁸ For instance, (15)
and (16) below show examples of wrong co-occurrence similarity
information.
(15) kimchi-wa pap(0)-man cwu-nun kes(0.002252)-ita
    kimchee-co steamed rice-only give-top thing-decl.
    '(They) served only kimchee and steamed rice'
    kimchi: 3 verbs, pap: 9 verbs, kes: 2083 verbs
⁸ In contrast, Yang used a corpus with one million eojeol,
restricted to a computer science domain.
Words (row R1): C1 chelswu-ka, C2 sakwa-lul, C3 mek, C4 ko,
                C5 yenghi-ka, C6 ttalki-lul, C7 mek

(C1,R7)  s : and'(mekta' sakwa' chelswu', mekta' ttalki' yenghi')
         [[chelswu-ka sakwa-lul] mek]ko [[yenghi-ka ttalki-lul] mek]
         [[chelswu-ka sakwa-lul] mek]ko [yenghi-ka [ttalki-lul mek]]
         [chelswu-ka [sakwa-lul mek]]ko [[yenghi-ka ttalki-lul] mek]
         [chelswu-ka [sakwa-lul mek]]ko [yenghi-ka [ttalki-lul mek]]
         from (C1,R3)+(C5,R3)
(C1,R3)  s : mekta' sakwa' chelswu'
         [[chelswu-ka sakwa-lul] mek], [chelswu-ka [sakwa-lul mek]]
         from (C1,R2)+(C3,R1), (C1,R1)+(C2,R2)
(C5,R3)  s : mekta' ttalki' yenghi'
         [[yenghi-ka ttalki-lul] mek], [yenghi-ka [ttalki-lul mek]]
         from (C5,R2)+(C7,R1), (C5,R1)+(C6,R2)
(C1,R2)  s/(s\np_n\np_a) : λf.f sakwa' chelswu'
         [chelswu-ka sakwa-lul]
         from (C1,R1)+(C2,R1)
(C2,R2)  s\np_n : λy.mekta' sakwa' y
         [sakwa-lul mek]
         from (C2,R1)+(C3,R1)
(C5,R2)  s/(s\np_n\np_a) : λf.f ttalki' yenghi'
         [yenghi-ka ttalki-lul]
         from (C5,R1)+(C6,R1)
(C6,R2)  s\np_n : λy.mekta' ttalki' y
         [ttalki-lul mek]
         from (C6,R1)+(C7,R1)
Table 5: Sample CKY Parsing Table
(16) os-kwa cangsinkwu(0)-lul yenkwuha-nun
    kes(0.008647)-iess-supnita.
    os: 46 verbs, cangsinkwu: 2 verbs, kes: 2083 verbs
In (15), there are three verbs that occur with 
'kimchi' (kimchee), and nine verbs that occur 
with 'pap' (steamed rice), significantly fewer 
than those verbs that occur with 'kes' (thing). 
And in (16), the number of verbs that occur 
with 'cangsinkwu' (accessory) is smaller than 
that of the verbs that occur with other nouns. 
Both result in a wrong analysis. We can use a 
thesaurus to address this problem. 
In a thesaurus, words in the same class are assumed to have related
meanings. We can use these class-mate words to compensate for data
sparseness. In constructing a lexicon, we consult the thesaurus when
the number of verbs that occur with a given noun falls below a
threshold, and let the noun share the data with those in the same
class. The thesaurus has the 'word-meaning code' format. The present
thesaurus contains slightly more than 1000 nouns and was constructed
manually. The classification follows the NTT hierarchy. We have
assigned meaning codes to only the most frequently used meanings for
polysemous entries. The depth of the hierarchy is 6. The following
shows adjusted results with our thesaurus.
(17) kimchi-wa pap(0.768942)-man cwu-nun kes(0.008380)-ita.
    kimchi: 62 verbs, pap: 64 verbs, kes: 2083 verbs
(18) os-kwa cangsinkwu(0.648276)-lul yenkwuha-nun
    kes(0.008647)-iess-supnita.
    os: 46 verbs, cangsinkwu: 69 verbs, kes: 2083 verbs
Table 6 shows the comparison of the methods with various
co-occurrence similarity dictionaries using 84 sentences containing
noun phrase coordination and structural ambiguity.⁹ It shows that a
thesaurus is indeed useful in dealing with data sparseness. In this
experiment, we have applied sharing to the nouns that are associated
with fewer than 20 verbs. We have also set the maximum number of
shared examples to 70 and tuned the figures for maximum precision.

            M1      M2
Precision   84.1%   88.5%
Recall      95.3%   92.7%
Table 6: Comparison of Different Methods
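
The back-off to class-mates described in this subsection can be sketched in Python as follows. Several points are assumptions made for this illustration and are not stated in the paper: two nouns count as class-mates when their meaning codes are identical (the hierarchy is ignored), the "maximum shared examples" is read as a cap on the number of distinct verbs after sharing, and shared counts are simply added. All names and the toy data are hypothetical.

```python
from typing import Dict

VerbCounts = Dict[str, int]


def share_with_classmates(noun: str,
                          counts: Dict[str, VerbCounts],
                          classes: Dict[str, str],
                          min_verbs: int = 20,
                          max_shared: int = 70) -> VerbCounts:
    """Return verb counts for `noun`, borrowing from class-mates if sparse."""
    own = dict(counts.get(noun, {}))  # copy, so the input stays untouched
    if len(own) >= min_verbs or noun not in classes:
        return own  # enough data, or no thesaurus entry to back off to
    merged = own
    for mate, code in classes.items():
        if mate == noun or code != classes[noun]:
            continue
        for verb, freq in counts.get(mate, {}).items():
            if verb not in merged and len(merged) >= max_shared:
                continue  # assumed cap on distinct verbs after sharing
            merged[verb] = merged.get(verb, 0) + freq
    return merged


# Toy data, invented for illustration only.
toy_counts = {
    "kimchi": {"eat": 2},
    "rice": {"eat": 5, "cook": 1},
    "thing": {"eat": 1},
}
toy_classes = {"kimchi": "food", "rice": "food", "thing": "abstract"}
shared = share_with_classmates("kimchi", toy_counts, toy_classes)
```

In the toy data the sparse noun picks up the verbs of its class-mate, which is the effect visible in (17) and (18), where the verb counts of 'pap' and 'cangsinkwu' rise and their similarity values move away from zero.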
5.3 Results 
For the performance evaluation, we have compared three kinds of
parsers. A employs only the CKY algorithm. B has the additional
module for spurious ambiguity. C utilizes the aforementioned
dictionary, in addition to the module for spurious ambiguity. We
found out that the primary factors for the extra parsing complexity
include the number of right conjunct candidates, the presence of
modifiers in the left conjunct, and the type of sentences (simple or
complex) with noun phrase coordination.

⁹ M1 utilizes only the similarity dictionary as defined by Yang
(1995). M2 has the extra information from the KAIST tree-tagged
corpus and the thesaurus.

In order to check for the influence of structural ambiguity on
parsing efficiency, we have classified 53 sentences into 4 types,
considering modifiers in the left conjunct (cf. Table 7).¹⁰ Table 8
shows the number of semantic structures as derived by each parser.¹¹
The average numbers of semantic structures derived by B and C are
26.2 and 7.3, respectively, resulting in the reduction of 72.1%.
Table 9 shows the parsing time for each method.¹² The reason that B
appears generally faster than C (except for type 4) is that C spends
extra time on consulting the dictionary database for the similarity.
But the average analysis time by B is 298.79 ms, whereas C takes
228.92 ms, making parsing more efficient with a time reduction of
23.39%. Albeit premature, we believe that these results are
encouraging.

Sentences   Modifiers (range)    # of candidates
Type 1      no                   < 3
Type 2      yes (unambiguous)    > 4
Type 3      yes (ambiguous)      < 3
Type 4      yes (ambiguous)      > 4
Table 7: Types of Sentences

          A       B      C
Type 1    89.8    4.3
Type 2    963     32.3
Type 3    245.1   13.4
Type 4    -       83.3
Table 8: Derived Semantic Structures
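
The two reduction figures quoted in this section can be checked from the reported averages with a short calculation. The helper name below is hypothetical; the inputs are the averages stated in the text.

```python
def reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Average number of semantic structures, parser B versus parser C.
structure_reduction = reduction(26.2, 7.3)

# Average analysis time in ms, parser B versus parser C.
time_reduction = reduction(298.79, 228.92)
```

The first value rounds to the 72.1% stated above. The second comes out at roughly 23.4%, in line with the reported 23.39% up to rounding of the inputs.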
¹⁰ The average sentence lengths of types 1 through 4 are 12.4, 16.7,
13.3, and 20.5 morphemes, respectively.
¹¹ A was not able to produce results at all for any of the sentences
of type 4, and failed on half of the sentences of type 2 due to
insufficient memory. The figures in the table reflect only the
successful ones.
¹² We used the statistics package of SICStus Prolog, under Sun
Enterprise 250 with 512MB RAM. There were 23, 10, 10, and 10
sentences of types 1 through 4, resp.

          A      B      C
Type 1    1168   228    277
Type 2    5789   1001   1377
Type 3    1284   305    335
Type 4    -      4504   2504
Table 9: Average Parsing Times (in ms)

6 Concluding Remarks
Through experiments, we have confirmed that we can address spurious
ambiguity in a CCG framework by incorporating predicate argument
structures into the parsing module, and shown that co-occurrence
similarity information augmented with a thesaurus helps the parser
to deal with structural ambiguity. We leave open the problem of
addressing those conjuncts/disjuncts that are neither semantically
related nor anticipated by a corpus.

References 

R. Agarwal and L. Boggess. 1992. A simple but useful approach to
conjunct identification. ACL, pp. 15-21.

H. J. Cho. 2000. Coordinate Constructions in Korean and Parsing
Issues in Combinatory Categorial Grammar. KAIST MS thesis.

H. J. Cho and J. C. Park. 2000. Combinatory Categorial Grammar for
the Syntactic, Semantic, and Discourse Analyses of Coordination in
Korean. Journal of KISS: Software and Applications, 27(4),
pp. 448-462.

J. Eisner. 1996. Efficient normal-form parsing for Combinatory
Categorial Grammar. ACL, pp. 79-86.

L. Karttunen. 1989. Radical Lexicalism. In Alternative Conceptions
of Phrase Structure. U of Chicago Press.

N. Komagata. 1999. Information Structure in Texts: A Computational
Analysis of Contextual Appropriateness in English and Japanese.
U of Pennsylvania PhD thesis.

S. Kurohashi and M. Nagao. 1994. A syntactic analysis method of long
Japanese sentences based on detection of conjunctive structures.
CL 20(4), pp. 507-534.

K. J. Lee. 1998. Stochastic Syntactic Analysis of Korean with
Linguistic Properties. KAIST PhD thesis.

A. Okumura and K. Muraki. 1994. Symmetric pattern matching analysis
for English coordinate structures. ANLP, pp. 41-46.

J. S. Park. 1998. Analysis of Coordinate Constructions in Korean
with Part-of-Speech Patterns. KAIST MS thesis.

S. Prevost. 1995. A Semantics of Contrast and Information Structure
for Specifying Intonation in Spoken Language Generation.
U of Pennsylvania PhD thesis.

M. Steedman. 1990. Gapping as Constituent Coordination. Linguistics
and Philosophy 13, pp. 207-263.

M. Steedman. 2000. The Syntactic Process. MIT Press.

J. H. Yang. 1995. Structural Ambiguity Resolution in Korean Noun
Phrase Conjunctions Using Co-occurrence Similarity. Journal of
KISS (B), 23(3), pp. 311-321.
