Corpus-Based Approach for Nominal Compound Analysis for 
Korean Based on Linguistic and Statistical Information 
Juntae Yoon* 
jtyoon@linc.cis.upenn.edu 
IRCS 
Univ. of Pennsylvania 
Philladelphia, PA 19104~ USA 
Key-Sun Choi 
kschoi@world.kaist.ac.kr 
KORTERM 
Dept. of Computer Science 
KAIST~ Taejon 305-701~ Korea 
Mansuk Song 
mssong@december.yonsei.ac.kr 
Dept. of Computer Science 
Yonsei Univ. 
Seoul 120-749, Korea 
Abstract 
Accurate nominal compound analysis is cru- 
cial for in application of natural language pro- 
cessing such as information retrieval and ex- 
traction as well as nominal compound inter- 
pretation. I,n the nominal compound analysis 
area, some corpus-based approaches have re- 
ported successful results by using statistal co- 
occurrences of nouns. But a nominal compound 
often has the similar structure to a simple sen- 
tence, e.g. the complement-predicate structure, 
as well as representing compound meaning with 
several nouns combined. Due to the grammar- 
ical characteristics of nominal compounds, the 
fi'amework based only on statistcal association 
between nouns often fails to analyze their struc- 
tures accurately, especially in Korean. This pc- 
per presents a new model for Korean nominal 
compound analysis on the basis of linguistic and 
statistical knowledge. The syntactic relations 
often have an effect on determining the struc- 
ture of nominal compounds, and we analyzed 
40 million word corpus in order to acquire syn- 
tactic and s-tatistical knowledge. The structure 
of a nominal compound is analyzed based on 
the linguistic lexical information extracted. By 
experiments, it is shown that our method is ef- 
fective for accurate analysis of Korean nominal 
compounds. 
1 Introduction 
Nominal compound analysis is one of crucial 
issues that have been continuously studied by 
computational and theoretical linguists. Many 
linguists have dealt with nonlinal compounds 
in view of semantic interpretation, and tried to 
explain how nominal compounds are semanti- 
This work was partially supported by a KOSEF's post- 
doctoral fellowship grant. 
cally interpreted (Levi, 1978; Selkirk, 1982). In 
the field of natural language processing, various 
computational models have been established for 
syntactic analysis and semantic interpretation 
of nominal compounds (Finin, 1980; McDon- 
ald, 1982; Arens ct al. , 1987; Pustejovsky 
et al. , 1993; Kobayasi et al. , 1994; Van- 
derwerde, 1994; Lauer, 1995). Recently it has 
been shown that noun phrase analysis is effec- 
rive for the improvement of the application of 
natural language processing such as information 
retrieval (Zhai, 1997). 
Parsing nominal compound is a basic step 
for ~11 problems related to it. From a brack- 
eting point of view, structural ambiguity is also 
a main problem in nominal compomld analysis 
like in other parsing problems. Re(:ent works 
have shown that the corpus-b;~sed approach for 
nominal compound analysis makes a good re- 
sult to resolve the ambiguities (Fustcjovsky et 
al. , 1993; Kobayasi et al. , 1994; Lauer, 1995; 
Zhai, 1997). 
Lauer (1995) has compared two diffbrent 
models of corpus-based approaches fbr nomi- 
nal compound analysis. One was called as the 
adjacency model which was inspired by (Puste- 
jovsky et al. , 1993), and the other was re- 
ferred to as the dependency model which was 
presented by Kobayasi ~t al. (1994) 2 and 
Lauer (t995). Given a nominal compound of 
three nouns n~'-.2'a:~, let A.s. t)e a metric used to 
evaluate the association of two nouns. In the 
adjacency model, if A.~(',,l:',J.2) > A.s(n2,n3), 
then the structure is determined as (('hi 'n2) n3). 
Otherwise, ('nl (',l,~ 'n:,)). On the other hand, in 
2In their work, the structure is determined l)y com- 
paring the multiplication of the ~ssociations between all 
two nOuns, that is, by comImring A,s('..t, 'n2)A.s(n2, n3) 
and AS(nl, n3) As (n2, ',l.:~). It m~tkes similar results to the 
dependency model. 
292 
tim dClmn(h,,ncy model, the decision is det)en- 
dent on the association strength of nt for 'rt2 and 
',,::. That is. the left branching tree ((at 'n2) ha) 
is constructed it" A.s(nt,'u2) > As(at,ha), and 
I:he right branching tree ('nL (n2 'ha)) is made, 
~M,,,rwise. Lauer (1995) has claimed that the 
~h',lmndency model makes intuitive sense and 
i)r~)duces t)(,,tter results. 
In this paper, we propose a new model tbr 
~)minal comt)ound analysis on the basis of 
w()rd (:o-()(:cui'ren(;(?s and grannnatical rela- 
ti(mshil)s ilnmanent in nominal (:ompounds. 
Tim grammatical relation can sometimes 
ma,k(,, the (tisnmbiguation more precise as 
wo, ll as it gives a clue of the nonfinal in- 
l.(Ul)r('Iation. For example, in the nominal 
(:~nnl)ound "KYEONG JAENG (competition) 
YUBALa(bringing about) CHEJE(system)" 
whi(:h meallS system to bring about competition, 
tim nominal conlpound "KYEONGJAENG 
Cl-tEJE((:oml)etition system)" co-occurs much 
more fl'equently titan "KYEONGJAENG 
YUBAL(bringing about competition)". How- 
o.w;r, its structure is selected to be \[\[KYEONG- 
.IAENG YUBAL\] CHEJE\]. Why it is analyzed 
in such a way can be shown easily by trans- 
li)rming the nominal compound to the clause. 
Because "YUBAL(bringing about)" is the 
predicatiw,, noun that derives the verb with the 
1)redicative suffix attached, the modifying noun 
phrase can be transformed to the corresponding 
VP which has the meaning of "to bring about 
competition" (Figure 1). The verb "YUBAL- 
HA-NEUN(to bring about)" in VP takes the 
"KYEONG,lAENG(competition)" as the ob- 
.iect. The predicative noun "YUBAL(bringing 
about)" also subcategorizes a noun phrase 
"KYEONGJAENG(competition)" in the same 
rammer as the verb. In the right syntactic 
tree of Figure 1, it should be noted that the 
object of a verb does not have the dependency 
,elation to the noun outside the maximal 
1)rojection of its head, VP. Likewise, the object 
"KYE()NGJAENG(competition)" does not 
have a,ny dependency with the other noun 
over the predicative noun "YUBAL(bringing 
a,t)out)". 
:WUBAL is a noun in Korean which means to cause 
t,o bring about something 
2 Structure of Nominal Compound 
There is not any adjective derivation in Ko- 
rean. Rather, a noun itself plays an adverbial 
or adjective role ill a nominal compound, or 
modifies other noun with possessive postposi- 
tion attached. Table 1 shows various relations 
occurred in nominal compounds. 
As shown in the example, there is a rela- 
tionship between two nouns which have de- 
pendency relation in a nominal compound. 
For instance, the first nominal compound 
in the example expresses compound mean- 
ing of individual nouns, i.e. the attribute 
that a .file has. On the other hand, in 
the example (c) of the example, the noun 
"GAENYEOM(concept)" is the object of the 
predicative noun "GUBUN(discrimination)". A 
nominal compound, as such, often has the 
similar structure to a simple sentence, e.g. 
complement-predicate structure, as well as 
representing compound meaning with several 
nouns combined. 
Many researchers have tried to explain con- 
straints given in tile process of word combi- 
nation and the principle of semantic compo- 
sition. Levi (1978) has tried to find the se- 
mantic constraints which govern the combina- 
tion of each noun in a nominal compound. 
Sproat (1985) has taken into consideration the 
predicate-argument relation of nominals on the 
basis of generative syntax. He explained that 
the nominalization suffix nominalizes the syn- 
tactic category of a verb, but 0 role of the verb 
is percolated into its parent node. 
We claim that the nominalization is the phe- 
nomenon occurred at the syntactic level, and 
hence the syntactic relations should be reflected 
in nominal parsing. Namely, tbr accurate nomi- 
nal compound parsing, we need syntactic knowl- 
edge about nominal compound in addition to 
lexical information about lexical selection. We 
propose a nominal parsing model based on two 
relations, which can be immediately applied to 
nominal interpretation. We classi(y the syntac- 
tic relations in a nominal compound as tbllows: 
modifier-head relation One noun (adnomi- 
nal, adjective) adds n certain meaning to 
the other noun (head) producing a com- 
pound meaning (1, 2 in Table 1). 
complement-predicate relation One is the 
293 
NP NP 
NP NP ," .... -__. VP NP 
_~_ CHFF.JE ,' ~ CHEJE - . (system) .''-- -. (system) --.. 
," NP(obj) NP .-~ ...... ~4 P(obj\] .... V- " - :'?2z-z.~ t 
/' KYEONGJAENG YUBAL I t q0 KYEONGJAENG YUBAL , 
~ (competition) (bringing about) / '. subj (competition) (bringing about) ,' 
Figure 1: Example shows that syntactic relations have influence on deternfining the structure of a 
nominal compound 
nominal compound meaning 
PA'IL(file) SOGSEONG(attribule) 
GIBON(basis) GAENYEOM(concept) 
GAENYEOM(concept) GUBUN (discrimination) 
DAETONGRYEONG(president) DANGSEON(being elected) 
GONGDONG (working together) BEONYEOG(translation) 
file attribute 
basic concept 
discrimination of concept 
being elected to president 
to translate together 
Table 1: Role of modifying noun in nominal compomM 
complement (subject, object, adverb) of 
the other noun (predicative noun) in a 
nominalcompound (3, 4, 5 in Table 1). 
When considering the complement- 
predicate relation, we can figure out 
some syntactic constraints imposed on 
nonfinal compounds. For example, 
in "PA'.IL(file) SOGSEONG(attribute) 
BYEONKYEONG (change)", 
"SOGSEONG(attribute)" is the object of the 
predicative noun "BYEONKYEONG (change)". 
It can be expanded to a sentence like "X changes 
the .file attribute". In other words, the syntactic 
lewfls of two phrases "PA'IL SOGSEONG(file 
~ttribute)" and "BYEONKYEONG(change)" 
in the compound noun are different, where 
one is NP and the other is VP. That the 
syntactic levels (i.e. syntactic categories) of 
nominal compounds are different means that 
the different method is required for the proper 
a,nalysis of their structures. 
Next, a predicative noun does not subcate- 
gorize more than two nominals with the same 
granunatical cases. For instance, a predicative 
norm in a nominal compound governs either a 
subject or an object at most. The situation is 
w-~ry sinfilar to that occurred in a sentence. In 
this paper, this is called one case per sentence, 
which means that a predicative noun cannot 
subcategorize two nouns of the same grammat- 
ical cases when the relations of nominals can be 
expanded to a sentence. 
3 Acquiring Lexical Knowledge 
We collect lexical co-occurrence instances from 
corpus in order to get knowledge tor nomi- 
nal compound analysis. The text material is 
composed of 40 million (:ojeols of Yonsei Lex- 
icographical Center corpus a.mt KAIST corpus 
(330M bytes). The Korean morphoh)gi(:al ana- 
lyzer, the POS tagger and the partial parser are 
used to obtain co-occurreu(:es. 
In order to construct linguistic lexical 
data tbr nominals, we first, extracted verb- 
noun CO-OCcurrence (|ata fi'onl (;()rpus using 
the partial parser. A noun is c(mnected 
to a verb with a synta(:ti(: relation, and 
the co-occurrences are re,1)rescnted t)y triples 
(verb, nou'n,, syntactic rda, t'io'H,). The postpo- 
sitions are reposited in tit(,, syntactic relation 
feld in order to represent the syntacti(: relations 
which might o(:cur tmtween two nouns. Nom- 
inal pairs with (:omplenmnt-predi(:ate relation 
are derived fl'om the data extracted. 
Predicative nomls l)e(:()me vexbs with 
the verbalization suffix such as '-HA-' at- 
tached. For exampl(,,, the predicative noun 
'KEOMSAEK(retrieva.1)' is verbalized to 
'KEOMSAEK-HA(retrieve)' 1)y adding 
the suffix '-HA-'. Theretbr(~, we (:an get 
294 
c~mq)lement-predicate relations by reducing 
w;rl)s to predicative nouns with cutting, if 
;my, the verbalization suffix. Table 2 shows 
s(Hne llOun-nouIl co-occurrence examples of 
,omplement-predicate relation derived in that 
way. 
Second, co-occurrences colnposed of only two 
1,orals (complete nominal compound) were ob- 
rained. In Korean, complete nominal com- 
IT(rends arc extracted in the tbllowing way. Let 
us suplmse that N, NA, NP be the set of nouns, 
the set of nouns with tile possessive postposi- 
,:ion, and the set of nouns with a postposition 
~xcept the possessive postposition, respectively. 
• For eojeols et,e2,e3, where el ¢ N U 
NA, e2 E NUNA, e3 E NP, count (n2, ha), 
where 'r~,2 and n3 are tile nouns that belong 
to e~ and e:~ respectively. 
The data could contain two relations e.g. 
modifier-head relation and complement-head re- 
lation. Therefbre, we manually divide them into 
two classes by hand according to the relation. 
Many erroneous pairs could be removed by the 
ma,nual process. Furthermore, we manually as- 
sign to each nominal pair syntactic relations 
such as SUB J, OBJ and ADV since the syn- 
ta(:tic relation does not explicitly appear from 
Ira.its obtained in the second (Table 3), Actually, 
there is it() immanent syntactic relation between 
two nouns of modifier-head relation. On the 
other hand, some syntactic relation such as case 
marker and adverbial relation can be given to 
two nouns with complement-predicate relation. 
Some examples are given in Table 3. The data 
of complement-head relation are merged with 
those established with the partial parser, which 
are complement-head co-occurrences. The rest 
of the data have modifier-head co-occurrences. 
Consequently, the complement-predicate co- 
occurrence is represented with a triple {comp- 
',,o'wn,, pred-noun, syn-rel) as shown in Table 2. 
Syntactic relation is described with postposition 
tbr case mark or ADV in Korean. The syntactic 
relation is not given to the modifier-head co- 
occurrence. 
In the corpus based approach for natural lan- 
guage processing, we should take into consider- 
ation the data sparseness problem because the 
data do not contain whole phenomena of the 
language in most cases. Ma~W researchers have 
proposed conceptual asso(:iation to ba(:k off the 
lexical association on the assumption that words 
within a (;lass behave similarly (Resnik, 1993; 
Kobayasi et al. , 1994; Lauer, 1995). Namely, 
word classes were stored instead of word co- 
occurrences. 
Here, we must note that predicates does 
not act according to their semantic category. 
Predicates tend to have wholly different case 
frames ti'om each other. Thus, we stored 
individual predicative nouns and semantic 
classes of their arguments instead of each 
semantic class tor two nouns: In effect, given 
a word co-occurrence pair ('nl,'n2) and, if any, 
a syntactic relation s, it is transfbrmed and 
counted in the fbllowing way. 
1. Let ci be the thesaurus class which ni belongs to. 
2. If (nl,n2) are a pair in eo-occurrences of 
complement-predicate relation 
3. Then 
4. For each ci which nl belongs to, 
5. Increase the \]~'equency of (ci, 'n2, s) with the count 
of (~1, n~). 
(Here, ,s is an immanent syntactic relation) 
6. Else 
7. For" each class ci and c i to which 'n~ and n2 belongs 
respectively, 
8. Increase the .#'equency of (ci, cj) with the count of 
(n~,,~) 
Consequently, we built two knowledge sources 
with different properties, so that we needed to 
make the method to deal with them. In the next 
section, we will explain the effective method of 
analysis based on that different lexical knowl- 
edge. 
4 Nominal Compound Analysis 
In order to make tile process efficient, the ana- 
lyzer identifies the relations in a nominal com- 
pound, if any, which can be the guideline of 
phrase structuring, and then analyzes the struc- 
tures based on the relations. 
Figure 2 shows an example of the phrase 
structure of a nominal compound to include the 
complement-predicate relation. We showed that 
the nominal compound with the complement- 
predicate relation can be expanded to a sim- 
ple sentence which contains NPs and VP. This 
means again that the nonfinal compound with 
295 
argument predicative noun syntactic relation 
GAENYEOM(concept) YEONGU(study) OBJ 
GYEONJEHAG(eeonomics) YEONGU(study) OBJ 
GWAHAGJA(scientist) YEONGU(study) SUBJ 
Table 2: Noun-noun co-occurrence examples derived from lexical data of predicate YEONGU- 
HA(research) 
first noun second noun immanet syntactic relation (meaning) 
DAMBAE(tobacco) GAGE(store) 
CHARYANG (car) GAGYEOG(price) 
GEUMSOG(metal). GAGONG(process) OBJ(process metal) 
WANJEON(wholeness) GADONG(operation) ADV(operate wholly) 
Table 3: Examples 
the complement-predicate relation can be di- 
vided into one or more phrasal units which we 
(:all inside phruse. 
The nonfihal compound in Figure 2 has three 
inside phrases - NPsuBJ, NPoBJ and V. Some 
nonfinal compounds may not have any inside 
phrase. Besides, the structure in each inside 
phrase can be determined by the word co- 
occurrence based method presented by Lauer 
(1995) and. (Kobayasi et al. , 1994), i.e. only 
statistical association. 
4.1 Association between nouns 
Inside phrases can be detected based on the 
association, since two nouns associated with 
the complement-predicate relation indicate exis- 
tence of an inside phrase. We distinguish the as- 
sociation relation by discriminating knowledge 
source. Thus the associations are calculated in 
a different way as follows. Here, ambi(n) is 
the number of thesaurus classes in which n ap- 
pears, and Nc'p and NMH are the total number 
of the complement-predicate and the modifier- 
head co-occurrences- respectively. 
. Complement-Predicate 
The association can be computed based 
(m the complement-predicate relations 
obtained from complement-predicate co- 
occurrence data. It measures the strength 
of statistical association between a noun, 
'At, and a predicative noun, n.2, with a given 
syntactii~ relation s which is the syntactic 
relation like subject, object, adverb. Let ci 
1)e categories to which nl belongs. Then, 
the degree that nl is associated with n2 as 
of two nouns analyzed 
. 
the complement of n2 is defined as tbllows: 
Assoccp (?t,1, n2) -.~ 1 freq(ci, 'n2) (1) × 
i 
Modifier-Head 
The association of two nouns is estimated 
by the co-occurrences wlfich were collected 
for the modifier-head relation. In the sim- 
ilar way to the above, let ci and qj be the 
categories to which 'n, and 'n2 belongs re- 
spectively. Then, the association degree of 
nl and n2 is defined as tbllows: 
ASSOCMH(ni,n2)-- 1 ×Z freq(ci,cj) NMH . a'm, bi(nl )ambi(n2) 
(2) 
The syntactic relation is deternfined by the 
association. If' the association between two 
nouns can be computed by the t'ornnfla 1, 
the complement-t)redicate relation is given to 
the nouns. If not, the relation of two nouns 
is simply concluded with the modifier-head 
relation. We can recognize the syntactic 
relation inside a nominal (:Oml)OmM by the 
association involved. In order to distinguish 
the associations in accordance with the rela- 
tions, the association is expressed by a triple 
(relation, (sy'n-'re, l, v.,l'u,e.)}. Tim relation is 
chosen with CP or MH a~:c:ording to the fi)rmula 
used to estimate the a.ssocia.tion. If 'relation is 
CP, the syn-'rc, l has a,s its va.lue SUB J, OBJ, 
ADV etc., which arc given by co-oc~:urrence 
data acquired. ()therwise, (/) is assigned. Lastly, 
the value is computed by the tbrnnfla. The 
association is estimated in the tbllowing way, 
296 
I ! s 
\] _~- VP 
/- -t f~_-~ ~_ 
i NPsuBJ NPoB J V I 
NP 
NPsuBa NPom NP v 
,,'" SAYONGJA-YI", ,."'FILE SOKSEONG",, ,'" BYEONKYEONG'" 
".. (of user) ./" "-. (file) (attribute).." ".. (change) ..' 
Figure 2: Example of the phrase structure of a nominal compound 
l,h(:r(:fl)re: 
ff A.~,~o(:c.,p(, l,"~,'2) > 0 
As.s,,(:(.,,.,, ,,..2) = (CP,(.W,,n-rel, Assoecp (n,, ,,.2))) 
( './,,'i ( ', 
.4.s.so(:(,,,,, ",'2) = (MH,(¢, ASSOeMH(nl, n2))) 
If no co-occurrence data for a nominal 
(:Oml)ound are fbund in both databases, the 
modifier-head relations is assumed and the left 
association is favored tbr unseen data. The 
lm;ti-wence of left association is reasonable tbr 
I)ra.cketing of nonfinal compounds since the left 
associations occupy the bracketing patterns 
lnuch more than the right associations as shown 
in Ta,l)le 6. 
4.2 Parsing 
Since the head always tbllows its complement in 
Korean, the ith noun in the nominal compound 
consisting of n nouns has head candidates of 
,,,- i that it might be depend on, and the parser 
selects the most probable one from them. The 
parser determines the head of a complement by 
a,n association degree of head candidates for the 
complement. 
The easiest way is to have the head candi- 
date list sorted on the association, and select 
most strongly associative one. In the process of 
selection, the tbllowing constraints are imposed 
if the relation of two nouns is complement- 
predicate(CP). Given a nominal compound of 
three nouns (?~, 1., '//,2, ha), 
• If (n2, ha) are related with CP and the syn- 
tactic relation of (",2, ",:3) is the same as that 
of (nl, ha), then "~,l is not dependent on n3. 
This is called one case per sentence con- 
straint. 
If nl has an association with n2 by CP rela- 
tion, it does not have dependency relation 
with ha. See Figure 1 
If n2 plays an adverbial role tbr ha, then n, 
is not linked with rt,2. 
Cross dependency is not allowed. It means 
that dependent-head relations do not cross 
each other. 
As an example, given the nominal compound 
"iDAEJUNG(public) ~MUNHWA(culture) 
aBIPAN(criticism)", we can get the association 
table as shown in Table 4. According to 
the table, the first and second noun can be 
linked with the modifier-head relation and 
the association degree of 0.00021. The second 
noun can depend on the third noun with 
the complement-predicate relation, and the 
association degree is 0.00018. Furthermore, 
the argument is inihrred to the object of the 
predicate, which can be easily recognized by 
the co-occurrence data extracted. 
The table is sorted on the association so that 
the parser can easily search tbr the probable 
candidate for head. In order to effectively de- 
tect inside phrases and check the constraints, 
the syntactic relation should be checked prior to 
the comparison of the association value. That 
is, the first key is the rdal:ion and the second, 
association value. Thus, CP > MH, and the 
297 
2 3 
(MH, (¢, 0.00021)) (CP, (OBJ, 0.00014)) 
(CP, (OBJ, 0.00018)) 
Table 4: Association table(AT) for the example nominal compound "DAEJUNG MUNHWA BI- 
PAN" 
association values are compared in case of the 
sanle rvlation value. 
As a consequence, the association table is 
actually implemented to the association list as 
follows: 
\[DAEJUNG (public)\]- (3,OBJ, ( CP, O.O0014)) 
(2,¢,(MH,0.00021)) 
\[MUNHWA (culture)\]- (3,OBJ, (CP,0.00018)) 
From the list we know it is probable 
that the noun "DAEJUNG(public)" is depen- 
dent on "BIPAN(criticism)" with OBJ rela- 
tion. On the other hand, two words "DAE- 
JUNG(public)" and "MUNHWA(culture)" are 
tbund in modifier-head co-occurrences and thus 
associated with the modifier-head relation. 
Then, the parsing process can be defined as fol- 
lows: 
h, ead( n,: ) = 'at (3) 
l = index( max (Assoc(ni, nj))) j=i+ l,...,k 
Here index returns the index of noun nl 
whose association with ni is the maximum. 
Namely, the parser tries to find the following 
candidate tbr the head of each noun ni in a nom- 
inal compound consisting of k nouns, and make 
n link between them. If constraints are violated 
while parsing, the next candidate of the list is 
considered by the parser. According to the al- 
gorithm, the given example is parsed as follows: 
. There is only one candidate for 
"MUNHWA". "MUNHWA(culture)" has 
the dependency on "BIPAN(criticism)" 
with object relation. The fact that there 
is tim complement-predicate relation 
lmtween two nouns indicates that those 
are the elements of inside phrases, where 
one belongs to NP and the other has the 
property of VP. The inside phrases are 
detected by the syntactic relation. 
2. The most probable candidate of 
"DAEJUNG(public)" is also "BI- 
PAN(criticism)", but it violates one 
case per sentence since the predicative 
noun already took the object. Thus, 
another candidate is taken. 
3. The next head candidate 
"MUNHWA(culture)" is satisfactory 
to the constraints as the modifier-head 
relation, and "DAEJUNG(public)" is 
linked to "MUNHWA(culture)" with the 
relation. 
5 Experimental Results 
For experiments, we collected 387 nominal com- 
pounds fronl a million word corpus. Nominal 
compounds conlposed of more. than tbur nouns 
(a series of 5 nouns or more) are excluded be- 
cause the number of them is too small to eval- 
uate our system. 
Some examples of analysis are shown in Table 
5. In the table, the modifier-head relation is 
represented with MH, and the complement- 
predicate is described with OBJ and SUBJ 
that means object and subject respectively. No 
depedency between nouns is marked with '-' 
For instance, the modifier-head relation is as- 
signed to "MUSOG SINANG" which have the 
meaning of the religion o.f private society that 
is traditional and s'alJerstitio'as. However, we 
don't know about the semantic relation hidden 
in the results analyzed. In addition, the nom- 
inal compound "JISIK'IN-YI(intellex:tual's) 
CHAEK'IM (responsibility) HOIPI(evasion)" 
means that the intellectual evades h, is responsi- 
bility. Actually, its structure is determined as 
\[JISIK'IN-YI,s,,t~./ \[CHAEK'IMot~./ HOIPIv\]\] 
which can be ext)anded to a, siml)le sentence. 
Bracketed patterns of the example uonfinal 
conlpounds are shown in %tble 6. According to 
the table, the baseline a.ccm'acy of the default 
system is at least 73.6%. As shown in Table 7, 
the precision fi)r nnalysis of nominal comt)ounds 
298 
nominal compounds(n1, n2, ha) structure R.(n,.,'n,2) iI~(v,,,'n:~) /~('n2,na) 
MUSOG SIN'ANG JEONTONG ((nl n2) n3) MH MH 
(private society, religion,tradition) 
DAEJUNG MUNHWA BIPAN ((nl n2) n3) MH OBJ 
(public, culture, criticism) 
FRANCE KEUNDAE MUNHAG (nl (n2 n3)) MH MH 
(France, modern, literature) 
.I\[SIK'IN-YI CHAEK'IM HOIPI (nl (n2 n3)) SUB.I OBJ 
(intelh;(:tual's, responsibility, evasion) 
Ta.lfle 5: Examples of some nominal compound analyses, R(n,z, ',,~) is the, synta.ctic relation between 
",i a,n(1%; identified 
if- of n(mns in NP pattern fl'eq 
(nl-YI (n2 n3)) 
((n>Y~ n2) n3) 
((nl n2) n3) 
(nl (n2 ha)) 
(nl-YI (n2 (n3 n4))) 
((nl-YI (n2 n3)) n4) 
(((nl-YI n2) n3) n4) 
(nl-YI ((n2 n3) n4) 
((nl ng.) (ha n4)) 
(((nl n2) n3)n4) (nl ((n2 n3) n4)) 
((nl (n2 n3)) n4) 
(nl (n2 (n3 n4))) 
54 
31 
189 
41 
2 
10 
4 
6 
9 
32 
Ta,lfle 6: the patterns of nominal compound 
s(;ru(;t;ures 
ot' the length three and four is about 88.3% and 
66.3%. The result is fairly good in that nomi- 
~m.1 compounds of length three occur much more 
t'requently than those of length four. Overall 
I~recision of analysis is about 84.2%. 
In addition, we compared three different mod- 
els to evaluate our system - default model by the 
dominant pattern, dependency model presented 
by Kobayasi et al. (1994) and Lauer (1995), 
a.nd our model. In the default analysis, nomi- 
ha.1 compounds were bracketed by the dominant 
pa,tterns shown in Table 6. For the dependency 
model, we used the method presented by Lauer 
(1995). 
Table 8 shows the comparison of the results 
produced by our algorithm and the other two 
methods. Our system made a better result 
in the disambiguation process. The results 
show that the syntactic information in nomi- 
hal phrases plays an important role in deciding 
their structures. 
However, there are still errors produced. 
Some nouns has the word sense ambiguity, and 
are used as both predict~tive noun and com- 
mon noun. Because of the sense ambiguity, 
some modifier-head relations are misrecognized 
to complement-predicate. Other errors contain 
the same kind of results as (Latter, 1995). To 
overcome the errors, we think that semantic re- 
lations immanent in two nouns are considered. 
6 Conclusion 
Many statistical parsers have not taken care of 
analysis of nominal compounds. Furthermore, 
many researches which dealt with nominal com- 
pound parsing seemed not to have computa- 
tional approaches tbr linguistic phenomenon in 
nominal compounds. 
We proposed Korean nominM compound 
analysis based on linguistic statstical knowl- 
edge. Actually, immanent syntactic relations 
like subject and object as well as structures 
of nominal compounds arc identified using our 
nominal compound analyzer and knowledge ac- 
quisition method. Syntactic relations identi- 
fied can be effectively used in semantic inter- 
pretation of nominal compound. Moreover, the 
parser was more accurate by using linguistic 
knowledge such as structural information and 
syntactic relation immanent in nouns. 
It is expected that our parsing results in- 
cluding identification of syntactic relations are 
useful for the application system such as infor- 
mation extraction because many nominal com- 
pounds are contained in Korean document bod- 
ies and titles, which often represent some events. 
However, the system still has some difficul- 
299 
# of nominal compounds # of success I precision 
3 nouns 315 278 88.3 
4 nouns 72 48 \[ 66.3 
total 387 326 84.2 
Table 7: Overall results of nominal compound analysis 
total 
# of success precision 
3 nouns 
precision 
4 nouns 
precision 
(1) 285 73.6 77.1 58.3 
(2) 315 81.4 85.4 63.9 
(3) 326 84.2 88.3 66.3 
Table 8: Results of nominal compound analysis (1) default analysis by pattern (2) results using the 
dependency model (3) results using our algorithm 
ties, which caused erroneous results. In the fu- 
ture work, we feel it is necessary that lexical 
I)arameters be transformed into conceptual pa- 
rameters with large size of semantic knowledge, 
and filrther studies on linguistic properties of 
nominals be made. 

References 

Arens, Y., Granacki, J. J., and Parker, A. C. 
1987. Phrasal Analysis of Long Noun Se- 
quences In Proceedings o.f the 25th Annual 
Meeting of A CL 

Choi, K. S., Han, Y. S., Han, Y. G., and Kwon, 
O. W. 1994. KAIST Tree Bank Project for 
Korean: Present and Future Development. In 
P'mceedings of the International Workshop on 
Sharable Natural Language Resources. 

Finin, T. W. 1980. The semantic interpreta- 
tion of compound nominals. University of Illi- 
nois at Urbana-Champaign. University Mi- 
crofilms Iilternational. 

Hindle, D., and Rooth, M. 1993. Structural 
Ambiguity and Lexical Relations. In Com- 
putational.Linguistics Vol. 19(1). 

Isabelle, P, 1984. Another Look at Nominal 
Compomlds In Proceedings of COLING 84 

Kobayasi, Y:, Takenobu, T., and Hozumi, T., 
1994. Analysis of Japanese Compound Nouns 
Using Collocational hlformation. In Proceed- 
i'ags of COLING 94 

Lauer, M. 1995. Corpus Statistics Meet the 
Noun Compound: Some Empirical Results. 
In P'mceedings of the 33'rd Annual Meeting (tf 
ACL 

Levi, J. 1978. The Syntax and Semantics of 
Complex Nominals. Academic 

Marcus, M. 1980. A Theory of Synta(:tic Recog- 
nition fbr Natural Language. Cambridge and 
London: MIT Press 

McDonald, D. B. 1982. Understanding Noun 
Compounds. Carnegie-Mellon University. 

Pustejovsky, J. and Anick, P. G. 1988. On the 
Semantic Interpretation of Nominals In Pro- 
ceedings of COLING 88 

Pustejovsky, J., Bergler, S., and Anick, P. 
1993. Lexical Semanti(: Te(:hni(tues fbr Cor- 
pus Analysis. In Computational Linguistics 
Vol. 19(2). 

Resnik, P. 1993. Selection and hdbrmation: A 
Class-Based Al)t)roa(:h to Lexi(:al Relation- 
ships. Ph.D. dissertation. University of Penn- 
sylvania, Philadelphia, PA. 

Selkirk, E. 1982. The Syntax of Words. MIT 
Press 

Sproat, R. W. 1985. ()n Deriving the Lexicon. 
Doctoral Dissertation, MIT. 

Sproat, R. W. and Lil)erman M. Y. 1987. To- 
ward Treating English Nominals C()rrectly. 
In Proceedings of the 25th, An'n, ual Meeting of 
ACL 

Vanderwerde, L. 1994. Algorithm tbr Auto- 
marie Intert)retation of Noun Sequences In 
Proceedings of COLING 94 

Zhai, C. 1997. Fast Statisti(:nl Parsing ()f Noun 
Phrases tbr Documenting Indexing. In PTv- 
ceedings of the 5th, Co'nf, re,nce on Applied 
Nat'a~nl Langv, age P'mcc, s.sing  
