A Local Grammar-based Approach to Recognizing of 
Proper Names in Korean Texts 
Jee-Sun NAM* **, Key-Sun CHOI* 
nam@worldkaist.ac.kr, kschoi@worldkaist.ac.kr 
* CAIR, Deparlment of Computer Science, KAIST, Korea 
** IGM, University of Maroe-la-Vallee, France 
Abstract 
We present an LO-based approach to recognizing of Proper Names in Korean texts. 
Local grammars (LGs) are constructed by examining specific syntactic contexts of lexical 
elements, given that the general syntactic rules, independent from lexical items, cannot 
provide accurate analyses. The LGs will be represented under the form of Finite State 
Automata (FSA) in our system. 
So far as we do not have a dictionary which would provide all proper names, we need 
auxiliary tools to analyze them. We will examine contexts where strings containing proper 
names occur. Our approach consists in building an electronic lexicon of PNs in a way more 
satisfactory than other existing methods, such as their recognition in texts by means of 
statistical approaches or by rule-based methods. 
1. Introduction 
In this paper, we present a description of the typology of nominal phrases containing Proper 
Names (IN) and the local grammars \[GrogT\],\[Moh94\] constructed on the basis of this description. 
The goal is to implement a system which detects automatically PNs in a given text, allowing the 
construction of an electronic lexicon of PNs. 
The definition of Proper Nouns, as opposed to that of Common Nouns, is often a problematic 
issue in linguistic descriptions \[Gar91\]. PNs are understood in general as phonic sequences 
associated with one referent, without any intrinsic meanings, such as Socrates, Bach or Paris. They 
usually are characterized by nominal determination, the upper case marker, prohibition of 
pluralizing procedure, or non-translativity \[Wi191\]. However, semantic or syntactic criteria do not 
allow to distinguish these two categories in an operational way. For example, nouns such as sun, 
earth or moon, semantically appropriate to the definition of proper nouns such as Mars or Jupiter, 
do not have to be written with the upper case initial: hence, they are not considered as proper nouns. 
On the contrary, some proper nouns such as Baudelaire or Napoleon can be used as well as common 
nouns in contexts where they occur in metonymic or metaphorical relations with common nouns 
like: 
I read some (Baudelaire + poems of Baudelaire) 
He is a real (Napoleon + general) 
Moreover, they often allow, like common nouns, the derivation of adjectives: e.g. Socratic, 
Napoleonic or Parisian. These are also written with initial upper case, differently from usual 
adjectives. 
The situation concerning French is similar to that. Let us consider \[Gar91\]: 
273 
J'ai dcoutd (du Bach + ,~e la musique) 
J' ai bu (du Champagne + du vin rouge) 
Derivational productivity is also underlined: socratique, parisien or newyorkais, which however do 
not begin in the upper case. 
In the case of Engli,;h or French, one could delimit formally the category proper nouns by 
means of the upper case, even though this criterion does not correspond entirely to our intuition 
about proper nouns. However, in Korean, there are no typographical markers such as upper case vs. 
lower case, while one assumes that the nouns such as in (1) could be semantically and syntactically 
different from those of(2): 
(1) ~ ~--$-, z-l~. _=~ 
Kim Minu, Seoul, France 
(2) ~}, -~:, q-~ 
namja\[rnan\], sudo\[capital\], nala\[country\] 
This situation makes more difficult the distinction between proper nouns and common nouns than in 
the case of French or English, when the former appears in the same grammatical positions as the 
latter like: 
(I listened to (Masart + classic misic) all day) 
-,-~ (~_~2~ + ~. ~y_~).~ ~ 
(He only drinks (Bordeaux + red wine)) 
The derivation of some other categories from PNs is also observed: 
~;~ ~ - ~ \[in Park JungHee's manner\] 
~z~.. =~- \[France style\] 
~-. ~-o~ \[Chinese (language)\] 
In fact, the distinction between these two categories might be arbitrary. We should perhaps consider 
a continuum of the noun system: a thesaurus of nouns constituted of the most generic nouns to the 
most specific nouns (which we call proper nouns). The following example shows a part of a noun 
hierarchy (Figure 1): 
All 
Physical~ ~ Abstract 
Animate Non-Animate 
",,ffi 
Person Animal (not-human) 
".ffi 
Korean American ... a / 
Kim MinU Park JungHee ... Figure 1 
Therefore, in the automatic analyses of texts written in Korean, we intend to consider the definition 
problem of proper nouns from a different view point: whatever the given definition of proper nouns 
is, once a complete list of them is available, we presumably do not need any longer this particular 
distinction between proper and common nouns. All nouns have some semantic and syntactic 
properties, which lead to group them into several classes, not by binary distinctions. Nevertherless, it 
seems still hard to establish an exhaustive list of what we call proper nouns. Actually, proper nouns, 
important in number and in frequency, are one of the most problematic units for all automatic 
analyzers of natural language texts. 
274 
In this study, we will focus on the problems of recognition of proper names. We do not try to 
characterize them as an axiomatic class, but attach to them a formal definition to determine 
explicitly the object of our study. Here is our formal criterion: 
{X e (Interrogative Pron -~? \[Who?\]) \[ X e (DECOS-NS) } 
Hu\[x~~raon nouns \[DECOS-NS\] 
That is, proper names are determined by the fact that they do not exist in our lexicon of Korean 
common nouns (DECOS-NS/V01) \[Nam94\], and by their correspondence with the interrogative 
pronoun '-~5 z nugu? \[who?\]'. The nouns considered as proper names according to these conditions 
do not always correspond to our semantic intuition. Nevertheless, they usually do not have intrinsic 
meanings; and they do not have explicitly distinct referents. Given that a lexicon of Korean common 
nouns (DECOS-NS) has already been built \['Nam94\], the ambiguity across the category of common 
nouns and that of proper ones will be settled only in one of these two lexicons by looking up 
DECOS-NS: if they already are included in this lexicon, we do not consider them in the lexicon of 
proper nouns, without questioning their linguistic status. Remember that our goal is not to discuss 
the nature of this noun class, but to complete the lexicon of Korean Nouns in NLP systems. In order 
to handle them in an NLP system, given that we do not have yet a dictionary which provides all 
proper nouns, auxiliary methods are required, such as syntactic information or local grammars that 
allow to analyze them. 
In the following sections, we will classify in five types the contexts where Proper Names can 
appear, and describe their characteristics in detail. 
2. Typology of PN Contexts 
2.1. Type I < PN-(Postposition + E) > 
This type of noun phrases is without any particular characteristics inherent to Proper Names 
(PLY). They actually occur in the positions of common nouns, as shown in the following graph 
(Figure 2): 
Figure 2. Type I of Nominal Phrases containing PArs 
Postpositions observed on the right side of nouns (proper or common ones) indicate grammatical 
functions of the attached noun. When they appear in this context, there are no ways to sort out 
proper names, only by analyzing their syntactic structures. Let us consider: 
~ol "~°--1 ~l~1.olr-.1- Kim 3ung II - i bughan-eui tongchija-ida 
PN<Kim Jung II>-Postp North Korea-of president-be 
(Kim Jung II is the President of North Korea) 
275 
t ¢ 
We cannot distingnish this PN <Kim Jung I!> from other nouns that can be found in this position, 
such as in the following: 
(=z ~+z-~).ol ~ro_4 ~\]~o\]r-q- 
(geu namja + josenin)-~i bughan-eui tongchija-ida 
( (This man +A Korean) is the President of North Korea) 
As mentioned above, in English or in French, proper names could be distinguished from common 
nouns, at least by means of the use of the upper ease for the former, even though it is not an absolute 
criterion. Consider: 
Jacques Chirac e.st le President de la France 
Bill Clinton is the President of USA 
Nevertheless, the upper case does not totally satisfy our semantic intuition, since we also observe 
nouns with the upper case, such as President or President, which certainly do not designate one 
particular person (here, we encounter the fundamental problem of the definition of the term 
'proper'). Likewise, in the following sentence, the noun Franqais and American started with the 
upper case cannot be considered as proper names, whatever the definition of proper name is: 
(Cet homme + Un Franqais) e.st le Prdsident de la France 
(This man + An American) is the President of USA 
2.2. Type H < PN (Spec+E) Professional Title-(Postposition+E) > 
This type of sequence is characterized by the presence of nouns of professional title (PT), such 
as- 
Wx~ bagsa \[Doctor\] 
~-~ gyosu \[Professor\] 
~ wenjang \[Director\] 
~0~ sajang \[President\] 
~r rd janggwan \[Minister\] 
For example: 
Kim MinU bagsa-neun migug-eise 5nyengan gongbuha-essda 
PN<Kim MinU> Doctor-Postp U.S.A.-Postp 5 year-during study-Past 
(Dr. Kirn MinU has studied in U.S.A. during 5 years) 
The noun phrase in subject position: 'K/m MinU bagsaneun" is composed of three strings. 
However, in Korean, typographical constraint is not a reliable criterion, since we cannot prohibit 
writing this phrase in other ways like: 
(2a) ~% ~,\]--~ 
(2b) ~\] ~ ~,~ 
KimMinU_ bagsaneun 
KimMinUbagsaneun 
When proper names occur as attached to other elements of noun phrases, their analysis becomes 
more complicated. Therefore, a local grammar recognizing PTs such as (Figure 3): 
Figure 3. Local grammar of Type II 
276 
will reduce numerous mismatchings between the strings like (2b) and the combination of the items 
found in a dictionary. 
Since a family name alone can precede PTs, the grammar above should be refined (Figure 4): 
Figure 4. A more detailed Local grammar of Type II 
Thus we observe (3) instead of(l): 
(3) ~ ~ =1~,~t 5~ o=~r~ 
Kim bagsa-neun migug-eise 5 nyengan gongbuha-essda 
(Dr. Kim has studied in the U.S.A. during 5 years) 
while a given name alone hardly appears with PTs: 
.~.~-?- ~'~,~--b ~1~1~t 5~Z~ ~ 
??MinU bagsa-neun migug-eise 5 nyengan gongbuha-essda 
(Dr. Min U has studied in U.S.A. during 5 years) 
When we list the nouns of professional title, the number of PNs recognized by the local grammar 
presented in Figure 4 will be increased. Nevertheless, listing these nouns up does not guarantee 
automatically to recognize PArs, since we can come across specific nouns (Spec) inside of these 
sequences: 
(4) ,~ ~-- mu'~" u_.~,~\]._~. =1~011~ 5~z~ o~-V~Ir~ 
Kim MinU bebhag bagsa-neun migug-eise 5nyengan gongbuha-essda 
(Dr. of Laws Kim MinU has studied in U.S.A. during 5 years) 
The Specs are appropriate to PTs: we observe nouns designating scientific domains such as 
'physics', "biology', "mathematics', or 'literature' for the PTs like 'doctor', whereas we find another 
set of Specs for the PT'minister': ' education', 'culture ', or 'Wansfortation' for example. 
Notice that PTs can also appear without PNs: 
(bebhag + E) bagsa-ga jeil senmangbad-neun sahoijeg fiwijung-eui hana-i-da 
(A doctor (of Law + E) is one of the most envied social titles) 
-' (~'+E) ~1--~ ~'~ ~-~'-~ ~t~ 
geu ('bebhag + E)bagsa-neun iljfig hangug-eul ddena-ssda 
(This doctor (of Law + E) left Korea early) 
Thus, in order to analyze the strin~ followed by a PT in contexts such as (5), the system should first 
look up a lexicon of Common Nouns (and eventually a lexicon of Determiners), and if the search 
fails, one could suppose that we found a proper name: 
(Sa)ol~ ~,1-71- ~71~ -~r~ 
igonggyei bagsa-ga ingi-ga nop-da (Doctors of Natural Science are highly requested) 
(Sb)o I ~ ~x~7~ ~71~ ~ 
i gonghag bagsa-ga ingi-ga nop-da (This doctor of Science is highly requested) 
277 
(Sc)ol ~ ~'l'~ ~ 21 21- ~r4- 
iminu bagsa-ga ingi-ga nop-da (Doctor Lee MinU is highly requested) 
In (Sa), the string found with 'bagsa \[doctor\]' is a simple noun 'igonggyei \[natural science\]'; the 
sequence that precedes 'bagsa \[doctor\]' in (5b) is a phrase composed of a determiner 'i \[this\]' and a 
common noun 'gonghag \[science\]'; the element followed by "bagsa \[doctor\]' in (5c) will not be 
matched with any entries of the lexicon of common nouns: only this string will then be recognized 
as a proper name. 
The local grammar proposed so far should be completed by the description of the following 
transformation. Let us compare (4) with (6): 
bebhag bagsa Kim MinU-neun migug-eise 5 nyengan gongbuha-essda 
(Kim MinU, Dr. of Laws, has studied in U.S.A during 5 years) 
The sentence (6) can still be transformed into: 
~ ~l~oll,~-I ~ ~ ~ ~,q-olr--l- 
Kim MinU-neun migug-eise 5 nyengan gongbuha-n bebhag bagsa-ida 
(Kim MinU is a doctor of Laws who has studied in U.S.A. during 5 years) 
In fact, the sequence containing PTs corresponds to a simple sentence: 
PN W-Professional Title 
W-Professional Title PN 
S: PN be a W-Professional Title 
(7a) ~ ~q- ~ ~x~ 
(Tb)ffi ~ed ~x~ ~ ~- 
(7¢)= z~ ~-~ ~ ~1-x\].olr_. t. 
Kdm MinU bebhag bagsa \[Dr. of Law Kim MinU\] 
bebhag bagsa Kim MinU \[KJm MinU, Dr. of Law,\] 
Kim MinU-neun bebhag bagsa-ida 
\[KimMinU is a Dr. of Law\] 
2.3. Type HI <PN-(Gen+E) Family Relation-(Postposition+E) > 
This type of phrases contains nouns designating a family relation (FR) such as: 
o)~ adeul \[son\] 
o~x\] ~eji \[father\] 
~ che \[wife\] 
;~ sonja \[~andchild\] 
~ ~ myenegli \[daughter-in-law\] 
These nouns have a strong possibility to occur with a proper name, as shown in the following 
sentence: 
~o_.\] o~ .~ 20~olr-l- 
Kim Min U-eui adeul-eun olhai 20sal-ida 
PN<Kim Min U>-Gen son-Postp this year 20 years old-be 
(Kim Min U's son is 20 years-old this year) 
The Genitive Postposition '°-4 eui \['s/of\]' can be omitted: 
• ~ ~ o\].~.~- .~.~ 20 ~olr-4" 
Kim MinU adeul--eun olhai 20 sal-ida 
(Kim Min U son \[--'s son\] is 20 years-old this year) 
278 
I 
I 
I 
i 
I 
!1 
I 
I 
I 
I 
I 
I 
I 
I 
i 
I 
i 
I 
I 
The structure can be formalized in the following graph (Figure 5): 
Figure 5. Type III of norm phrases containing PNs 
The strings "N-(Gen+E) FR" do not automatically guarantee existence of proper names, since 
common nouns that have a human feather can also appear with a FR like: 
(~'~ ~gd + ~:~ ~xl-)- (~ +E) oI-'~'~ -~'~ 20~°1~ 
(bbang/ib juin+yepfib namja)-(eui+ E) adeul-eun olhai 20 sal-ida 
((The baker + The neighbor)'s son is 20 years-old this year) 
'In fact, strings containing FRs are necessarily based upon human nouns, proper names being only 
one class of human nouns. This context helps to f'md proper names, but is not a sufficient condition 
to recognize them automatically. 
2.4. Type IV <PN Vocative Term-(Postposition+E) > 
We call Vocative Terms ( FT) the following utterances: 
,~ z~ ! yenggam ! \[Sir !\] 
e~ ~d ! senbainim ! \[Senior !\] 
~ ! nuna ! \[Elder sister ! (for a boy)\] 
,L1 q ! enni ! \[Elder sister ! (for a girl)\] 
! hyeng ! \[Elder brother ! (for a boy)\] 
.9.~. ! obba ! \[Elder brother ! (for a girl)\] 
The nouns above can all be used as FTs, that is, a term one can use to indicate some social or 
familial relations between himself (i.e. the speaker) and his interlocutor(s), or to call on somebody 
paYing due respect to his social status (honorific terms). In addition, with proper names, they can 
also occur in assertive sentences, like: 
Kim yenggara-i wa-ssda PN<Kixn> sir-Postp come-Past (Sir. Kim came) 
Ina nuna-ga ddena-ssda PN<In A> elder sister-Postp leave-Past (Elder Sister InA left) 
These FTs should be compared with the nouns of professional title (PT) that we examined in section 
2.2. and those of family relation (FR) mentioned in 2.3., since some of them (PTs and FRs) can also 
be used in calling someone, like in: 
Kim Gyosu! (Come here !) 
Kim MinU Wenjangnim ! 
\[PT: Professor Kim !\] 
\[PT: Director Kim MinU !\] 
o~ z\] ! abeji ! \[FR: Father I\] 
o~ ~ q ! emeni ! \[FR: Mother !\] 
Let us examine differences among them: 
• r. Difference between FTand PT 
Nouns of Professional Title (P~ are different from Vocative terms (FT), not only in syntactic, 
but also in semantic ways. As PTs do not have inherently vocative functions, they can hardly be 
used alone in the vocative case: 
279 
?*~@ l ?*gyosu ! \[Professor !\] 
?*~d'-vd - ! ?*/anggwan ! \[Minister !\] 
Then, one should either attach to them a vocative suffix such as '~d him', or adjoin them to proper 
names: 
gyosu-nira!/Kim MinU gyosu! 
janggwan-niml/Kim janggwan! 
Semantically, PTs designate professions, the list of which we can determine a priori, while ITs are 
more vague and non-predictable without examining pragmatic situations: the latter are closer to the 
nouns of Family Relation (FR), since, as mentioned above, they imply familial or social relations 
between a speaker and his interlocutor(s). 
• 4. Difference between FT and FR 
What we call nouns of Family Relation (FR) cannot appear with a proper name when they are 
used in the vocative case. Thus, is not allowed the internal structure: 
*ProperName FamilyName ! 
such as: 
(1)*~ ~--?- o'l-~lal ! 
• Wo.\]~q q ! 
*KimMinU abefi! \[FR:Father Kim MinU\[\] 
*Park emeni / \[FR: Mother Park !\] 
Remember that FRs are formally defined as occurring in the structure 'PN-Gen FR \[Proper Name's 
FR\]', thus, when we encounter them in the contexts 'IN FR' (e.g. :~& o\].~qa\] Kim MinU abefi 
\[Kim MinU\]), such strings have a meaning corresponding to 'PN-Gen FR" (e.g. ~ ~-q-al o\]-Mx\] Kim 
MinU-eui abefi \[Kim MinU's father\]): PNs are not appositions to FR, like in sequences composed of 
"PN IT" such as (2). ITs, by definition, should be able to appear directly associated with proper 
names. Compare (I) with: 
(2)zd~@~ ! KimMinUhyeng! \[VT:BrotherKimMinU!\] 
~ ! Park hyeng ! \[VT: Colleague Park !\] 
~o~ ~ ! Park yenggam \[VT: Sir Park !\] 
@ ~. ~ ! Minu obba ! \[VT: Brother MinU \[\] 
Let us underline that some ITs do not accept family names alone, whereas some others allow them, 
as well as given names alone or full names. Here are some cases (Figure 6): 
:.name alone G.name alone Full name 
, ~1 hyengl \[ - ÷ "4- ; 
2 hyeng2 + - i 
.p..m\]- obba \[ + + 
o~ :~ yenggam \[ + - + I 
÷ .4- + s 
Figure 6. Some FTs with their associated PNtypes 
~d~ senbai \[ 
2.5. Type V <PN Incomplete Noun-(Postposition+E)> 
This type of Noun Phrase is similar to the preceding one: what we call Incomplete Nouns (IN) is 
also used for social appellation. However, they are different from the preceding ones by the fact that 
they do not have syntactic autonomy, and therefore they never can appear alone in any positions of a 
sentence. Here is their list: 
280 
II 
I 
I 
I 
I 
I 
I 
l 
I 
I 
II 
! 
l 
II 
i 
.11 
I 
I! 
! 
I 
I 
! 
I 
l 
I 
~\] ssi 
°o~ yang 
7\]-ga 
~d nim 
-~ gun 
ong 
\[Mr. / Miss. / Mrs.\] 
\[Miss.\] 
\[Mr. <pejorative>\] 
\[Mr. / Miss. / Mrs. <respectful>\] 
\[Mr.<young boy>\] 
\[Mr. <old man>\] 
Let us consider: 
~7\]- ~rff KimMinU-ssi-ga wa-ssda 
PN<Kim MinU> - IN\[Mr.\] - Postp come-Past (Mr. Kim MinU came) 
~°~o1 lz'l~r--1- Kim-yang-i ddena-ssda 
PN<Kim> - IN\[Miss.\] - Postp leave-Past (Miss. Kim left) 
Notice that PNs vary according to/Ns. The following table represents different types (Figure 7): 
:. Name G. Name Full Nam, 
' ~1 ssi f 
" °o~ yang 
+ 
-~ gun 
! @ong 
+ .4- 
+ + ÷ 
7~ ga + - 
, ~nim - - + 
+ + + 
+ ° + 
Figure 7. Types of PNs according to INs 
The table above can be represented by a Finite State Automaton (FSA) \[Lap96\], \[Gro87\] as shown 
in the following graph (Figure 8): 
Figure 8. FSA of PN-IN 
These nouns (/Ns), syntactically and semantically incomplete, always require proper names to their 
left side. In this sense, this type of contexts is appropriate to PNs: if an 1N is recognized, we can be 
assured to fred a PN near to it. In spite of this strong constraint, since all/Ns are mono-syllabic, 
ambiguity problems are often hard to handle. For example, the 1N '71. ga \[Mr.\]' is an homograph of 
several items. Let us consider some of them (Figure 9): 
Fype Part of Speech Meaning " Examph 
Incomplete N. Mr. 
Simple Noun grade --~@=\] °~7} 
Prefix 1 dance 7}-~ 
Prefix2 provisory 7}4 ~o ~ 
Suffixl \[ letter ~ 71. 
281 
Suffix2 
Suffix3 
value 
family 71. 
Suffix4 music -8- ~ ~1" 
Suff'tx5 person ~& T\]- 
Suffix6 boundary ~T1- 
Suffix7 area ~ ~- 71. 
Verb go ,~ q.71. ~ ~ 
Pos~position Nominative ~ ~ 71. 
Terminal Sfx Interrogation ~ ~ ~ 7\]- ? 
Figure 9. Homograph types of'ga' 
The following sentence illustrates this ambiguity problem: 
(\]) ~71- ~3~-~-~ -?-~ 71- ~ 71- ~71- ~-~tl ~- -?-~7l-,:,.II x.~ ~ 71-~ ~_ _v_ ~ ~, ~ oo~71- ~,~ 
bag-ga chingu-deul-gwa uli-ga muhega jutaig-ga geunche han umul-ga.eise yuhaing-ga-leul buleu-go 
isseul ddai, yengyang-ga ebs-neun bbangbuseulegi juwi.-ei myech mali sai-ga anja iss-ess-den-ga ? 
(When we were singing popular songs with Mr. Park's friends at the edge of a well near the area of 
unlicensed buildings, how many birds were there sitting around bread crumbs without any taste ? ) 
We observe the morpheme ga 9 times. But only the first occurrence of ga is an Incomplete Noun 
which accompanies a PN. In the 8 other strings, we should not expect occurrences of PNs: in order 
to recognize an 1Nga, first, dictionaries of all common nouns (i.e. simple nouns, derived nouns, and 
compound nouns) must be available. If the string containing ga is not found in these dictionaries, 
then the f'mal syllable ga might be a verb, a nominative postposition attached to a noun, or an 
inflectional suffix attached to a verb; or else, it is an IN ga. 
In the case of (1), strings containing ga, such as the following ones, are detected as common 
nouns (simple or derived ones): 
~ ~ 71- muhega unlicensed 
~ 71. jutaig-ga area of buildings 
~-7\]- umul-ga edge of a well 
~ 71. yuhaing-ga popular songs 
e~ o~ 7~ yen~,ang-ga any taste 
and the following ones are either nouns followed by a postposition ga or a verb including the 
inflectional suffix (IS) ga: 
-~ ~ 71. uli-ga we-Postp 
x~ 71. sai-ga bird-Postp 
.~.~ ~. 7}. iss-ess-den-ga be-IS \[Past-Past-Interrogation\] 
The string '~71. bag-ga' will not be recognized as one of these cases, even though there exists a 
simple noun "bag \[pumpkin\]' in the dictionary of common nouns, since the postposition required by 
this noun is not '71-ga', but 'ol r. Therefore, bag-ga will be analyzed as a proper name bag (family 
name alone) followed by an INga. 
3. Building Local Grammars of PNs 
Let us summarize the formal definition of the five contexts where a Proper Name (PN) can occur: 
I 
I 
I 
I 
I 
I 
I 
i 
I 
I 
I 
I 
I 
i 
I 
I 
I 
282 
'-,'. I 
i 
! 
I 
! 
I 
! 
! 
! 
| 
! 
! 
! 
Type I. Noun Position : 
• ~ Type II. With Professional Title(PT) : 
,¢. Type III. With Family Relation (FR) : 
Type IV. With Vocative Term (VT) : 
• ~. Type V. With Incomplete Noun (IN) : 
<PN-Postp> 
<PN-(Spec)-PT-Postp> 
<PN-Gen-FR-Postp> 
< PN- VT-P ostp> 
<PN-1N-Postp> 
These five contexts are represented in Figure 10: 
Figure 10. Local grammar of PNs 
Notice that when we recognize Incomplete Nouns (i.e. ~ ssi, ~ yang, 7} ga, ~d nim, :~ gun, 
ong), the occurrence of proper names is guaranteed, since _/Ns cannot occur without PNs. 
Nevertheless, as mentioned above, serious ambiguity problems appear in the distinction of/Ns from 
their homographs, we here propose two complex local grammars in order to increase the ratio of 
identification of/Ns. 
3.1. Use of PostHN appropriate to Human Nouns 
There are specific items appropriate to human nouns: we name them .PostHN. They do not 
constitute autonomous units, but are attached to human nouns at the syntactic level. Thus, they 
appear even after the plural marker ~ deul \[/s/\]. For example, in the following sentences, a PostHN 
%il nei \['s family/house\]' appears with a PN alone, or with a PN followed by an/N (here, ~1 ssi 
\[Mr.\]): 
~417} el =\]--~*tlxt ~ql~ ~xl~'rq" MinU-nei-ga imaeul-eise jeiil bujilenha-da 
PN<MinU>-PostHN\[farnily\]-Postp this village-Postp most diligent-St 
(MinU's family is most diligent in this village) 
7,~..9_a\]~lo~lx t -~o1 ~rq- GangGinO-ssi-nei-eise 
PN<Kang GinO>-IS\[Mr.\]-PostHN\[house\]-Postp fire-Postp occur-Past 
(There was a fire in Mr. Kang GinO's house) 
bul-i na-ssda 
In French, we observe a preposition similar to this PostHN: ehez ('s family/house), a locative 
preposition, as at one ~ in English, which selects only human nouns: 
283 
fly a eu un feu chez M. Pierre Piton 
There was afire at M. Pierre Picon 
Therefore, when we encounter a sequence that ends with an IN-PostHN-Poalp, the possibility to find 
a PN is increased. For example, the following string: 
~o~ 7~ !=~\]-~. jang-ga-nei-neun 
can be analyzed in 510 ways (i.e. (7 x 7 x 5 x 2) + (2 x 5 x 2) = 510) after a simple matching of the 
words of this string with their lexicon entries (Figure 11): 
fPN , me> IN:Mr. /<Poyn_ :, mily> 
/ <Nl:soysauce> <N:grade> / <PRON:you> 
J <N2:intestines> <Pfl::2 typ~ <Num.four> 
<N3:marlket> <Sfi:7 ty~s> <FU:yes!> 
<N4:chaper> <V:go>/ <IS:declarative> 
~5.'wardrobe> <P os~b: nominative> 
interrogation> 
< N1 :marriage> 
< N2 :long poem> 
<Postp:Sub> 
<IS:Det> 
Figure 11.510 analyses of" ~'7\]-~-'1\] ~ jang-ga-nei-neun" 
According to the local grammars we have constructed, we get the following result for this string 
(Figure 12): 
<PN.f.name> <IN:Mr.> <PostHN:family> <PosT:subject> 
Figure 12. Accurate analysis of the string in Figure 11 
3.2. Snperposition of contexts for PNs 
Let us examine the following sentences: 
Kim-ssi-dong~ing-nei-jib-wun aju keu..da 
PN<Kim>-IN-brother-PostHN-appartment-Postp very large-St 
(The appartment of Mr.Kim's brother's family is very large) 
Kim MinU bagsa-nim-adeul-nei-eise janchi-ga yelli-essda 
PN<Kim MinU>-.doctor-Sfx-son-PostHN-Postp party-Postp occur-Past 
(There was a party in Dr. Kiln MinU's son's family's house) 
GinO hyeng-nim-nei-lo modu ga-ja ! 
PN<GinO>-brother-Sfx-PostHN-Postp together go-St 
(Let's go together to Brother GinO's house !) 
!1 
I 
II 
II 
I 
I 
I 
284 ! 
II 
I 
I 
I 
I 
I 
I 
i 
Here, several of the noun phrases we have examined so far occur piled together. The internal 
structures of the examples above are respectively: 
(2a) PN- <Type I/> - <Type 111> - PostHN- Noun - Postp 
(2b) PN- <Type 11> - <Type III> - PostHN- Postp 
(2c) PN- <Type IF> - PostHN- Postp 
Hence, by providing information about the combinations of these strings, we could rise the accuracy 
in recognizing PNs. For example, the string that includes the sequence ~\]..~ z~ 41 ~ ssi-dongsaing-nei- 
jib in (la) can hardly be anything else than a noun phrase containing a PN. Thus, even though we 
:find several entries :~ lkim in the lexicon of nouns, such as: 
kira 1. Noun = steam \[e.g. ~ ol ~- \] 
2. Noun = dried laver \[e.g. ~ g\]'\] 
3. Noun = hope \[e.g. ~o\] ~r.Jc \] 
4. Completive Noun = chance \[e.g. -~ ~ ~... \] 
we can eliminate these interpretations, since these forms precede the complex sequence that requires 
necessarily a PN. 
4. Experimental results 
i So far, we have examined contexts where we expect to encounter Proper Names (PAr). In order to 
recognize automatically PNs on a large scale in texts in the absence of a complete lexicon of PNs, 
the description of noun phrases containing PNs should be necessary. We constructed local grammars 
based upon our description of the types of nominal phrases containing proper names. 
I Notice that implementing such a system requires the use of the relation between Recall and 
Precision. In general, it is understood that Recall is the ratio of relevant documents retrieved for a 
given query over the number of relevant documents for that query in a database, and Precision is the 
I ratio of the number of relevant documents retrieved over the total number of documents retrieved 
\[Fra92\]. 
However, Recall-Precision plots show that Recall and Precision arc inversely related. That is, 
I when Precision goes up, Recall typically goes down and vice-versa. If we want to recognize 
automatically PNs in a given text in order to construct an electronic lexicon of PNs, Recall, that is 
the ratio of PNs retrieved for a given grammar over the number of PNs in the text, should certainly 
be higher than Precision. 
I Let consider results of In the contexts of i.e. <PN us some experimental our study. Type V, 
Incomplete Noun-(Postposition + E)>, the Incomplete Noun (/N) '~\] ssi \[Mr./Miss./Mrs.\]' can 
appear with a family name alone, a given name alone, or a full name (of. 2.5. Figure 7). Remember 
I that, in Korean, a typographical unit delimited by blanks cannot directly be taken as a basic element 
for morphological analysis \[Nam97\]: we should then analyze the strings occurring with a blank on 
the leit side of/Ns as well as the strings stuck to 1Ns in order to examine the context Type V. Thus, 
I the local grammar of Type V for '~q ssr is the following graph (Figure 13): 
! 
Figure 13 
II 
II 285 
Our first text was composed of 29373 characters \[Cho96\], we located 22 sequences containing ssi, 
20 of which are PNs (Figure 14): 
Figure 14 
In the second text, composed of 30869 characters, 69 occmrences of "X-ssi" are observed. All "Full 
name-ssi" sequences here appear attached, whereas, in the preceding text, they all appear with a 
blank (i.e. 7(-#-ssi'). Here is the result (Figure 15): 
:.Name -ssi G.Name -ssi Full Name - ssi nonPN Tota 
35 I 1 t 24 ' 9 i 69 ' 
Figure 15 
The 9 sequences 'non-PNs" are as followings: 
\[1 \] ~ ~q 7\] eg-ssi-gi \[6\] o\]-7\].~\], aga-ssi, 
\[2\] °~\]---~--a" ib-ssi-leumeul \[7\] :z\]\]~\]y~ jei-ssi-doineun 
\[3\] o\]-7}~ ~ aga-ssi-lan \[8\] ~ ~ ~ nal-ssi-yessda 
\[4\] o\].y\].~\]~ aga-ssi-du \[9\] ~Y~ nal-ssi-ga 
\[5\] o\].y~ ~ aga-ssi-la 
Looking up our dictionaries of Korean Simple Nouns (DECOS-NSN01) \[Nam94\], and of Korean 
Postpositions (DECOS-POST/V01) \[Nam96b\] eliminate \[2\], \[3\], \[4\], \[5\], \[6\], \[8\], \[9\], which are the 
sequences composed of a common noun and a postposition (or a typographical separator such as a 
comma). Because \[1\] is a dialectal adverb, and \[7\] a 'Noun-Verb' string, they were not detected in 
our system. 
The third text composed of 33982 characters contains 10 occurrences of ~'-ssi" one of which is a 
nonPN ('.~-01 peulo ssi-leumi \[N/N/Postp\]') (Figure 16): 
Figure 16 
This nonPNwas eliminated after looking up the dictionary of Postpositions: there is no postposition 
'~o\] leumi'. The analysis of the text above on the basis of the local grammar presented in Figure 3 
(Type II <PN Spec-PT-(Posq~ + E)> ) in 2.2. allows to recognize PNs in a more satisfactory way. 
Besides ~-ssi" strings, with two PTs: '~-~-~ daitonglyeng \[the President\]' and '~ susa~g \[the 
prime minister\]', we could recognize 73 % of PNs, that is, 49 occurrences of 67 (i.e. Recall is 0.73). 
However, use of the local grammars of Figure 13 and Figure 3 (only with these two PTs above) 
leaves some nonPNs: Precision is 0.7 (49 strings of 70 which occurred with these/N and PTs are 
PNs). Since our goal is to recognize most contexts where PNs can occur, in order to consn'uct a 
lexicon of FNs as complete as possible, Recall should be more important than Precision in our 
system. By adding a few more PTs (cf. Type II) such as '~--~ janggun \[general\]', '~ sensu 
\[player\]', FRs (cf. Type III) such as '~l.~ bunye \[father-daughter\]', or/Ns (cf. Type V) such as '°o z 
yang \[Miss.\]' in the lexicon on the basis of which our local grammars are constructed, we could 
obtain a more reliable result as shown in the following table (Figure 17): 
i 89 ' 59 
Figure 17 
286 
l 
I 
I 
i 
I 
I 
I 
i 
I 
I 
I 
i 
I 
i 
i 
I 
i 
I 
I 
l 
I 
I, 
I 
I 
i 
I " i 
,-,) 
., I 
I 
I 
I 
I 
I 
I 
I 
I, 
I 
I 
I 
Thus, Recall increases: 0.88, whereas Precision goes down: 0.66. The 8 PNs that are not retrieved by 
our local grammars are given below. Actually, their contexts are hard to determine, since they are 
syntactically identical with contexts where common •nouns can appear: 
\[2\] "¢ ot-~"l'°A 
\[3\] '~s~l'¢ 
\[4\] ~q~ ~x~\]e.\]--~71- 
saddoo 
maigadeeui 
doyoddomieui 
kimdaijung munjeiladeunga 
\[5\] ':~ ~.~ xl-~', 
\[6\] ~,,,.l\] :~ ~,\]-~ 
\[7\] ~ ;q ~'l-u\] - 
\[8\] :~ q¢- ~\]-~, 
kimdaijung sagen 
munseigwang sagen 
kimjihana 
kimdaijtmg sagen 
To guarantee that all occurrences of PNs are covered by local grammars, it would be necessary to 
consider a great part of the contexts where common nouns appear. 
In this paper, we have described the contexts where proper names can occur, but the complete 
lists of the nouns requiring PNs have not been done. We are sure that these lists are not illimited 
• ones, they will be presented in further studies. Notice that these studies are deeply related to the 
syntax of nouns, especially that of human nouns. In this sense, human noun, a semantic concept, can 
nonetheless become an operational term in the formal description of natural languages, 
indispensable many procedures of Natural Language Processing CNLP) systems. 

References 
\[Cha93\]Chang, Chao-Huang, 1993, Corpus-based Adaptation Mechanisms for Chinese Homophone 
Disambiguation, Proceedings of the Workshop on Very Large Corpora, Ohio State University, 
USA. 
\[Cho94\]Choi, Key-Sun et al, 1994, A Two-Level Morphological Analysis of Korean, Proceedings of 
the 15th International Conf. on Computational Linguistics (COLING '94), Kyoto, Japan. 
\[Cho96\]Choi, Key-Sun ~ al, 1996, Korean Information Base Corpus, KAIST. 
\[Cou87\]Courtois, Blandine, 1987, Dictionnaire 61ectronique du LADL pour les mots simples du 
fran~ais (DELAS), Rapport Technique du LADL, N ° 17, University Paris 7. 
Dictionnaire universel des noms propres (Petit Robert 2), 1974, ed. Le Robert, Paris, 1st edition. 
\[Fra92\]Frakes, William B.; Ricardo Baeza-Yates, 1992, Information Retrieval: Data Structures and 
Algorithms, Prentice Hall, Englewood Cliffs, New Jersey 07632. 
\[Gar91\]Gary-Prieur, Marie-Noelle, 1991, Le nora propre constitue-t-il ,me cat4gorie linguistique?, 
Langue fran~aise N-92, Paris: Larousse. 
\[Gro87\]Gross, Maurice, 1987, The use of finite automata in the lexical representation of natural 
language, Lecture Notes in Computer Science 377, Springer-Verlag. 
\[Gro89\]Gross, Maurice, 1989, La construction de dictiounaires 61ectroniques, Annales des 
T%l~ommunications, tome 44 N ° 1:2, Issy-les-Moulineaux-Lannion: CNET. 
\[Gro93\]Gross, Maurice, 1993, Lexicon Based Algorithms for the Automatic Analysis of Natural 
Language, in Theorie und Praxis des Lexikons, Walter de Gruyter: Berlin. 
\[Hee93\]Heemskerk, Josee S., 1993, A probabilistic Context-free Grammar for Disambiguation in 
Morphological Parsing, Proceedings of the 6th Conference of the European Chapter of the 
Association for Computational Linguistics, Utrecht, The Netherlands. 
\[Lap96a\]Laporte, Eric, 1996, Context-free parsing with finite-state transducers, RT-IGM 96-13, 
University of Marne-la-Vall6e. 
\[Moh94\]Mohri, Mehryar, 1994, Application of Local Grammars Automata: an Efficient Algorithm, 
RT-IGM 94-16, University of Marne-la-Vall6e. 
\[Nam94\]Nam, Jee-Sun, 1994, Dictionnaire des noms simples du cor~en, RT N ° 46, Laboratoire 
d'Automatique Documentalre et Linguistique, University Paris 7. 
\[Nam95\]Nam, Jee-Sun, 1995, Constitution d'un lexique 61ectronique des noms simples en cor6en, 
Acres du LGC-1995 : Lexique-grammaires compar6s et traitements automatiques, University of 
Qu4bec a Monu'6al, Canada. 
\[Nam96a\]Nam, Jee-Sun, 1996a, Dictionary of Korean simple verbs: DECOS-VS/01, RT N-49, 
LADL, University Paris 7. 
\[Nam96b\]Nam, Jee-Sun, 1996b, Dictionary of Noun-Postpositions and Predicate-Postpositions in 
Korean: DECOS-PostN / DECOS-PostA / DECOS-PostV, RT N- 5 l, LADL, University Paris 7. 
\[Nam96c\]Nam, Jee-Sun, 1996c, Construction of Korean electronic lexical system DECO, Papers in 
Computational Lexicography Complex '96, ed. by Ferenc Kiefer, Gabor Kiss et Julia Pajzs, 
Budapest : Linguistics Institute, Hungarian Academy of Sciences. 
\[Nam96d\]Nam, Jee-Sun, 1996d, Classification syntaxique des constructions adjectivales en cor~en, 
Amsterdam-Philadelphia: John Benjarnins Publishing Company. 
\[Nam97\]Nam, Jee-Sun, 1997, Lexique-Grammaire des adjecfifs cor~ens et analyses syntaxiques 
automatiques, Langages N°124, Paris : Larousse. 
~,Tar93\]Narayanan, Ajit; Lama Hashem, 1993, On Abstract Finite-State Morphology, Proceedings of 
the 6th Conference of the European Chapter of the Association for Computational Linguistics, 
Utrecht, The Netherlands. 
\[Oga931Ogawa, Yasushi; A.Bessho; M.Hirose, 1993, Simple Word Strings as Compound Keywords: 
An Indexing and Ranking Method for Japanese Texts, Proceedings of the 16th Annual 
International ACM SIGIR, Pittsburgh, USA. 
\[Par94\]Park, Se-Young et al., 1994, An Implementation of an Automatic Keyword Extraction 
System, Proceedings of Pacfic Rim International Conference on Artificial Intelligence '94, 
Beijing, Chine. 
\[Par96\]Park, Se-Young et al, 1996, Korean Corpus- based on News papers, ETRI. 
\[Per95\]Perrin, Dominique, 1989, Automates et algorithmes sur les mo.ts, Annales des T~l~communi- 
cations, tome 4.4 N 1:2, Issy-les-Moulineaux-Lannion: CNET. 
\[Rey74\]Rey, Alain, 1974, Pr6sentation du Petit Robert 2. 
\[Si193\]Silberztein, Max, 1993, Dictionnaires ~lectro-niques et analyse automatique de textes, Le 
syst~me INTEX, Paris: Masson. 
\[Ton93\]Tong, Xiang; Chang-ning Huang; Cheng-ming Guo, 1993, Example-Based Sense Tagging 
of Running Chinese Text, Proceedings of the Workshop on Very Large Corpora, Ohio State 
University, USA. 
\[Wil91\]Wilmet, Marc, 1991, Nom propre et ambiguit~, Langue fran~aise N ° 92, Paris: Larousse. 
