Identifying Temporal Expression and its Syntactic Role Using 
FST and Lexical Data from Corpus 
Juntae Yoon 
jtyoon@daumcorp.eom 
Daum Communications Corp. 
Kangnam-gu~ Smnsung-dong~ 154-8 
Seoul 135-090~ Korea 
Yoonkwan Kim Mansuk Song 
{general,mssong} @december.yonsei.ac.kr 
Dept. of Computer Selene% Engineering College 
Yonsel Univ. 
Seoul 120-749, Korea 
Abstract 
Accurate analysis of the temporal expression is cru- 
cial for Korean text processing applications such 
as information extraction and clmnking for efficient 
syntactic analysis. It is a complicated problem since 
temporal expressions often have the ambiguity of 
syntactic roles. This t)al)er discusses two problenm: 
(1) representing and identiflying the temporal expres- 
sion (2) distinguishing the syntactic tim(lion of the 
temporal exI)ression in case it has a dual syntac- 
tic role. In this paper, temporal expressions and 
the context for disambiguation which is called local 
context are represented using lexical data extracted 
fiom corlms and the finite state transducer. By ex- 
periments, it; turns out that the method is eflimtive 
for temporal expression analysis. In particular, our 
al)t)roach shows the corI)us-based work could make 
a promising result for the t)roblem in a restricted 
domain in t, hat we can eflbctievely deal with a, large 
size of lexical data. 
1 Introduction 
Accurate analysis of the temporal expression is cru- 
cial tbr text processing aplflications such as informa- 
tion extraction and for chunking for efficient syntac- 
tic analysis. In information extraction, a user might 
want to get a piece of information about an event. 
Typically, the event is related with (late or time,, 
which is represented by temporal expression. 
Chunking is helpflfl for efficient syntactic analy- 
sis by removing irrelevant intermediate constituents 
generated through parsing. It involves the task to 
divide sentences into non-overlatli)ing segments. As 
a result of chunking, parsing would be a problem of 
analysis inside chunks and between chunks (Yoon, et 
al., 1999). Chunking prevents the parser fl'om pro- 
ducing intermediate structures irrelevant to a final 
output, which makes the parser etticient without los- 
ing accuracy. Thus, it turns out that chunking is an 
essential stage tbr the application system like MT 
that should pursue both efficiency and precision. 
Korean, an agglutinative , has well- 
developed flmctional words such as postposition 
and ending by which the grammatical fimction of 
a phrase is decisively determined. Besides, because 
it is a head final  and so the head always 
follows its complement, the chunking is relatively 
easy. However, we are also faced with an mnbiguity 
problem in chunking, which is often due to the tem- 
poral expression. This is because inany temporal 
nouns are used as the modifier of noun and vert) in 
a sentence. Let us consider the tbllowing examI)les: 
\[Example\] 
la jinan(last) :l\]('oFd'll,'llt(SlllillIler) 
-+ 
lb 
uvi-tteun 
(we/NOM) hamgge(together) san-c(to moun- 
tain) .qassda( went ) 
\;Ve went to the mountain together last sum- 
Incr. 
jinan(last ) yeorr;um(summer) banghag-c(in va 
-9 
2a 
cation) 'uri-neun(we/NOM) hamgge(together) 
san-c( to mountain) gassda( went ) 
We, went to the mountain together in the last; 
SUllllller Va(;atioIl. 
i0 weol(October) 0 il(9th)jeo'nyeo.q(evening) 
-+ 
2b 
"7 sit7 o'clock) daetong'ryeong-yi (presi- 
dent/OEN) damh,'wa-ga(talk/NOM) issda(be) 
The president will give a talk at 7:00pro in 
Oct. 7th. 
10 wool(October) 9 il(9th)jconyeog(evening) 
7 sit7 o'clock) bih, aenggipyo-reul(flight ticket;/ 
ACC) yeyaghal su isssev, bnigga(can reserve) 
-+ Can I reserve the flight ticket tor 7:00pro in 
Oct;. 77 
Ill the examples, each temporal expression plays a 
syntactically different role used as noun phrase or 
adverbial phrase (The undeJ'lined is a phrase) al- 
though they comprise the same phrasal fornls. Tile 
temporal expressions in la and 2a of the example 
serve as tile temporal adverb to modify predicates. 
On the other hand, the temporal expressions iu lb 
and 2b are used as the modifier of other nouns. That 
is, as a temporal noun either contributes to construc- 
tion of a noun compound or modifies a predicate, it; 
causes a structural aml)iguity. 
One sohltion might be that the POS tagger as- 
signs a different tag to each temporal noun e.g. NN 
954 
mid ADV'. However, since (let/en(len(:ies of teml)oral 
llOllllS .~l, re lexically (lecided, it does not seem that 
their synta('t;ic ta.gs could tie ac(:urately t)redi(:ted 
with a relatively small size of PeG tagged (:orl)uS. 
Also, the siml)le rule based a.1)l/roach Callliol; litake 
sa|;isfa.ctory resuli;s without lexi('al information. As 
such, identification of temi)oral expression is a coin- 
plicate(t l)roI)lem in Korean text mmlysis. 
This 1)aper disc, usses identiticatiol~ of temi)oral ex- 
t)ressions ;m(1 their synta(:tic roles, in this t)aper, 
we wotfld deal with two 1)robleins: (1) re, I)resent- 
ing mM idenl;ifying |;he teinl)oral exl)ression (2) dis- 
tinguishing its syntactic time|ion in case it; has a 
drml syntacti(: role. Acl;ually, tile two t)rol)lems are 
(:h)sely related since the identifi(:ation and (lisam- 
biguation proc(~,ss WOll\]d lie done un(ler l;\]w, r(',l)re- 
seul;a|;ion s(:henm of |;emporal exI)ression. Tim pro- 
('ess bases (in lexi(:al data exl;ra('l;e(t \['rOlll (:orl)llS ;Ill(t 
the finite state trans(lu(',(u' (FST). A(:(:(n'(ting to our 
ol,servation of texts, we (:ould see that a fe, w wor(ls 
following a. teml)oral llOllll ha.ve great ef\['ect on the 
syntactic funet, ion of the temt)oral noun. Theretbre, 
we note that the stru(:tura.1 amlliguity (:ould lie re- 
solved in lo(:al (:ont(~xts, mM so obtain lexical in- 
tbcmation for the lo(:a\] (:ontexts from (:orlms. The 
lexi(:al da.ta which (:onl;ain (;onl;exl;s for disambigua- 
|ion are ref)resenl;e(t with |:emt)()ral wor(t transition 
()ver the 1EST. 
lkMly (l(~scrilling our methodology, we tirsl ex- 
l;r:,mI; (',Oll(;or(lail(:(~ dal,a of each I;emt)oral w(/r(t using 
~/ COllCOrd,:t,llC(~ l)rogr;nn. '\['he CO-OCCllrr(}IlC(~S r(~l)l'e- 
SQIlI; relations ll(d;ween l;e\]nl)ora\] wor(ls and also ex- 
plain how I;(~,llll)Ol'al ll()llllS ;~tll(\] COllllllOll ll()llllS }/17o 
combined to gCllCritl;C .:1 (:Oml)OmM noun. It wouhl 
be the like, lihood of woM (:oral)|nail(m, whMl helps 
disambiguate tim syntacti(: role if a teml)oral word 
have a synta(:|;i(: duality. \]n particular, we (:lassi\[y 
t(mq)oral llOllllS into 26 classes in ac(:ordan(:e with 
their meaning a.nd fmwtion. 'l'hus, the w(n'd (:o- 
OC(;llrr(~,llC0,S l)ecollle t;\]lose ~llll()ll~ |;ellli)oral (:\]asses 
or |;elnl)oral (;lasses and el;her ll(/lillS~ whi(:\]l resuli;s 
in re(lu(:ing the 1)m-mn(d;er spa(y_ Se, con(1, l;emi)o- 
ral expl'essiolls ('Ollt}tillillg |;he c()-occ/lrrellces ()\[" fera- 
l)oral (:lasses an(1 other 1\]OllllS are rel)r(;sented with 
the FST to identit~ temporal ext)ressions and assign 
their syntactic tags in a Sell|;(}llC0,. it has t/een shown 
|;lint; the FST t)resents a very etlMent way for repre- 
senting 1)hrases with locality. The inlmt of the FST 
is the result from morphological analysis a.n(1 POS 
tagging (here, the teml)oral noun is tagged only as 
nora 0. Its ou|;I)Ul; is the syntaci;i(: tag for each word 
in the senl;ence and temi)oral words are al;l;ached tags 
such as noun and adverl), l:igm'e 1 shows tim over- 
all system fl'om the morI)h(/logical analyzer l;() l;he 
chunker. 
Therefore, the, t)ro(;(~,ss atl;at;ll(~s sylll;a(:l,i(; labels 
to the 1)revious exmnl)les so thai; (:hunking w(ml(1 lie 
safely executed from the results as follows: 
\[Exmnple\] 
la.' \[j/na'n.(last) y(dol'(;ltfl,.( Sllllllller ) \]T A 'ltf'i-'ll.('lt~l. 
(we/SeMI mou,,- 
rain) gassda(went) 
--> \Ve went to the mountain together last smn- 
lller. 
b' ) sum,, ,,r ) \].,, x 
e(in vacation) u'ri-neun(we/NOM) hamg.qe 
to we,ltO 
-+ We. went to tile momltain together in the last 
SllIlllllOr vo~cal;iOll. 
2 1' \[.to o il( o ;h ) eve.i.  ) 
7 si(7 o'clock)\]TA dactongr'ycong-yi (presi- 
dent/GEE) damh,'wa-ga(tall¢/NOM)'i,s',s'da(be) 
-~ The president will Give a talk at 7:0()pro ill 
O(:t. 7th. 
21) ' \[10 ',,,,.ol( ( ),'|~ol ier ) :) ',:l(9a) .#o',.~,,,,9( ev~;,ti,lg) 
7 s/(7 o'clock)\]:rN bih, a(:nq.qipyo-re'u,l(flight 
ticket/ACe) ycyagh, al su issscubni99a(can re- 
Se, I'V(~,) 
--~ Can I reserw~ the tlight ticket for 7:0()lml in 
()(:t. 7? 
2 Related Works 
Almey (\] 991) has proposed texL chunldng as a t)l'e - 
liminary st;e l) to tmrsing on the basis of psycho- 
logical (wi(lence. In his work, tim chunk was (le- 
fine(1 as a. t)artitione(l segnmnt wlfi(:h corl:(~,Sl)Oll(ts in 
some way to ln'osodic lmtt(!rns, l\[n addition, con> 
1)lex }/I;|;tlchlllellt (\](?cisions ;Is occurring in NP or 
VP analysis are 1)OStl)one,(l wit;hour \]icing (leci(led 
in (:hunldng. Rmnshaw and Marcus (1995) intro- 
du(:e(l a 1)aseNl' whi(:h is a non-re(:ursive NIL They 
used trmlsfornmtion-1)ase(l learning to i(lentif~y n(/n- 
recto'sire l)aseNPs in a s(mtence. Also, V-typ(~ (:hullk 
was iifl;roduce(l in their system, and so I;lw, y (aied 
to t)artition sentences into non-overlal)l)ing N-type 
;til(t V-tyl)e ('hunks. Y(/on, et al. (\].099) have, de- 
fined ('lmnking in various ways for efficient analysis 
of Korean texts mM shown that the, mt:tho(t is very 
eff(;(:tive for practical al)l)li(:ation. 
l leside, s, th(',r(~ have })een many w(/rks based on the 
finite state ma(:hine. The finite state machine i,~ of 
ten used for systems su(:h as speech t)r(l('essing, liar- 
tern mat('hing, P()S tagging and so forth becmlse 
of its ei\[i('ien(;y of sl)ee(l mid si)aee and its ('onve- 
nience of rei)resenl;ai;ion. As for parsing, it is not 
suitable \['or flfll parsing based on the grammar that 
has recurrent property, but for partial parsing re- 
quiring simple, sl;&l;e l;rallsitioll, l~.o('he all(1 S(:hal)es 
(1995) have i;ransformed l;he Brilt's rule based tagger 
to the (ll)timized deterministic FST and imI)roved 
the sl)eed mM sl)a.ce of the tagger. A. nora.tile one re- 
lated t(/this work is about local grammar 1)resented 
in Gross (1993), which is suital)le for rel)resenting 
955 
SOlltelleO 
jinan yeoreum banghag-e uri-neun hamgge san-e gassda 
Morp Anal and POS Tagger 
.LJ.I 
jinan/A yeoreum/N banghag/N-c/P uri/PN-ncun/P hamgge/AI) san/N-c/P ga/V-ss/TE.-da/E 
FST Ibr idefifying temporal expression 
jinan/A_m ycorcum/N_m banghag/N-e/P uri/PN-neun/P hamgge/AD san/N-e/P ga/V-ss/TE-da/E 
Text Clmnkcr 
\[jinan/A_ln ycoreum/N_tn banghag/N-c/P\] \[uri/PN-neun/P\] \[hamgge/Al)\] \[san/N-e/P\] \[ga/V-ss/TE-da/E\] 
Figure 1: System overview fl'om the inort)hological analyzer from the chunker 
rigid phrases, collocations and idioms unlike global 
grammar for describing sentences of a  in 
a formal level. The temporal expression was repre- 
sented with loc~l grammar in his work, where it, was 
claimed that the formalism of finite automata could 
be easily used to represent them. 
3 Acquiring Co-occurrence of 
Temporal Expression 
3.1 Categorizing Temporal Nouns 
Since many words have in common a similar mean- 
ing and flmction, they can be categorized by their 
features. So do temporal nouns. That is, we say that 
'Sunday' and 'Monday' have the same features and 
so would take the similar behavior patterns such as 
co-occurring with the similar words in a sentence or 
phrase. Hence, in the frst place we categorize tem- 
poral nouns according to their meaning and func- 
tion. We first select 259 temporal nouns and divide 
them into 26 classes as shown in Table 1. Among 
them, some temporal words have syntactic duality 
and others play one syntactic role. Thus, the dis- 
ambiguation process would be applied only to the 
words with dual syntactic functions. 
3.2 Acquisition of Temporal Expressions 
from Corpus 
Temporal words would be combined with each other 
in order to be made reference to time, which is called 
temporal expression. Since a temt)oral expression is 
typically composed of one or a few temt)oral words, 
it seems to be possible to describe a grammar of 
modifying noun jeonlsilll-eUll nlasisseossda 
(hmch/NOM) (was delicious) 
oneul(today) ~@ ..... @ 
~"X hagayo-c(lo school) gassda(wenl) 
modifying predicate 
Figure 2: Syntactic flmctional ainbiguity of tcnlpo- 
ral expression 
the temporal expression with a simple model like fi- 
nite automata. Ill tile practical system, however, we 
are confronted with a complicated problenl in treat- 
ing teml)oral expressions since many temporal words 
have a functional ambiguity used as both a nominal 
and predicate modifier. For instance, a temporal 
noun oneul(today) could play a different role in the 
similar situation as shown in Figure 2. In the first 
and the second path, the words to follow oneul are all 
noun, but the roles (dependency rela;ions) of oncul 
are different. 
Accurate classitication of their syntactic flmctions 
is crucial for the application system since great dif- 
ference would be made according to accuracy of the 
dependency result. Practically, we therefore should 
take into consideration the structural ambiguity res- 
olution as well as their representation itself in identi- 
956 
word cat(;gory class # l;eml)oral words 
modifier 1 ol(l;his), jinan(lasl;), ... gemt)oral 1)refixes 
lllllll\])er 2 
3 -10 
llHlllbel" . • . 
era~ age 
l;emporal unit .Sg(li(Cell~l'y), 7~ygoT~(ye~l.l.'),... Lellll)oral llOllllS 
years 
lno~|l, hs 
19 
20 21 
weeks 
----~ day of week 
(lays I (lay\] 
day2 
time1 
1;line ~ 1;|me2 
~111(\[ \[ seasoll 
(lura- I Sl)eeiti e (hlration 
1;lOIl _I edge 
11 ftosaenfldae(1)aleoz(71c), ... 
12 9eumnyeon(this year)-, saehae(new year),... 
13 ~ month), jeon.qwol(January), ... 
14 Oeum:#;,(this week), naeju(nexl; week),... 
15 |lye'//(Sunday), wolyo' il, ... 
16 17 h, aru(one day), ch,'ascog(Thanksgiving day), ... 
18 o,~Tia, lgo,lay), ,,,:' il(*;omorrow), . . . 
.saebyeoo( (lawn ) , achim( morning) , ... 
yeonmal(year-end~, ... 
22 ~st)ring), yeoreum(smmner),... 
23 hwar~:~eolgi(time of season changing), ... 
24-25 ch, ogi(early t, ime), j'm~,gban(mid), ... 
l;emporal suffixes ~eml)oral suffixes 26 dongan(durillg), .~,aenae(l;hrough), ... 
Table 1: Categorizal.ion of l;eml)oral words 
t~ying l;eml)oral exl)ressions. The poinl; I;hal; we note 
here is thai; we eould pre(liel; the synt;a(:l;ic fllncl;ion of 
I;emt)oral words 1)y looking ahead one or l;wo words. 
Namely, looldng at; a Dw words thai; follows a 1;em- 
poral word we can figure oul; which word the tempo- 
ral expression modifies, and call l;he following words 
local conl, e:rt. 
Unfortunal;ely, it is not easy t() define t, he, local 
conl;exI; for del;ermilfing 1;he synt, ael;i(: flm('l;ion of 
eae.h temporal word 1)eeause l;hey are lexieally re- 
lai;ed. Thai; is, il; is wholly ditl('.renl; fl:om each wor(t 
wlw,|;her a I;(nnl)oral noun would modit)- ()l;h(w i1()1111 
(:o form ~ (:Oml)Om~d noun or mo(lii~y a 1)re(li(:al;e as 
m). adverbial l)hrase. ()ur al)t)roa/:h is l;o use col 
pus to acquire informal;ion atmut, l;he local ('oni;exl;. 
Since we could obtain fi'om eorl)uS as many exam- 
ples as needed, rules for comt)ound word generation 
can be (:onstructe(1 from l;\]le examI)les. In l;his 1)al)el', 
we llSe CO-OCClllTe, II(;O rela|;ions of l;emlmral llO/lllS ex- 
t,r,acted froll\] large corpus I;o represent and consi;ru(:t; 
rules for idenl;ifieal;ion of l:emporal expressions. 
As lnenl;ioned before, we would 1)a.y a|;ten(,ion (;o 
two t)oinl;s here,: (l.) In whal; order a tenq)oral ex- 
1)ression would 1)e represenl;ed with temt)oral words, 
i.e. descrit)|;ion of the temporal exi)ression nel~work. 
(2) how the local context would 1)e described to re- 
solve tile ambiguity of the syntactic tim(lion of tem- 
1)oral ext)ressions. For this tmrpose, we tirs|; extract 
examl)le sentences containing each of 259 l;eml)oral 
words from eorlms using l;he KAIST concordance 
progrmn :l (KAIST, 1998). The numl)er of t, elnporal 
words is small and so we could mmmally manii)u- 
late lexieal dal;a ext;racted frOlll corl)us. Figure 3 
1KAIST corlms consists of about 50 million cojeols, l';ojeol 
is a sl)acing unit; comi)oscd of a content word and functional 
words. 
shows exanll)le sen|;enees at)out, ye.o'reum(sununer) 
ext;ra(:t;ed l)y the. coneor(lanee l)rogram. 
Second, we s(;leet only l, he t)hrases related wit;h 
temporal words fl'om the examples (Table 4). As 
shown in Table 4, yeoreum is associated wii;h va.ry- 
ing words. Temporal words like temporal pre.tixes 
can come before it and coIlllIlOil llOl_lllS C}lll follow ig. 
In (,his stage we describe con(;exts of each temporal 
word and (;he olll;1)U(; (syn(;ae(;ic tag of (;he tOtal)oral 
word) under the given eonl;ex(;. In l)artieular, each 
l;eml)oral word is assigned a (.emporat class. Be.sides, 
or;her nouns serve as local cent;exits for disaml)igua- 
(;ion of syntac(;ic flmc(;ion of t, emporal words. 
lS:om (;he examl)les , we can see t, lmC if ha're(night), 
byeo(jau, g(villa), ba, ngh, ag(vacal, ion) and so on follows 
it, yeoreum serves as a (:olnponent of a ('ompomld 
noun with the following word. On t, he other hand, 
the word naenac wtfich means all the time is a tem- 
poral noun and forms a teml)oral adverl)ial l)hl'}/se 
wil;h ()Idler 1)receding temporal nomL Moreover, yeo- 
7'eum(sllnuner) might represent time-relal, ed expres- 
sion with t)receding l;eml)oral prefixes. 
4 Identifying Temporal Expressions 
and Chunking 
4.1 Representing Temporal Expression 
Using FST 
The co-occurrence data extracted by tile way de- 
scribed ill the previous section can be represented 
with a finite state machine (Figure 5). For synt;ae- 
tie :\[inlet;ion disambigual;ion an(t chunking, the au- 
tomata should produce an out, lm|;, which leads to a 
fiifite st;ate t;ranslhleer. In fact, individual deserip- 
l;ion for each data could be integral,ed into one large 
FST and represented as the right-hand side in Fig- 
ure 5. A finite sta.te transducer is defined with a nix- 
957 
left context word right context 
~ ;._l_~otl = ~1~,~11 ~ .~o,~ ~_ ~,~.. \[ 4~- 
-~1~- A~I ~ ~-lo~l,g ~- ~1ol ~t-. \[o:1-~- 
4~-~ ~-~-~-~- ~1 &~l~ll ~1~,~-. \[d~ 
0 
~k. ~1~11 °' ° 
~,,t~xl ~ ~-~@~1~11 ~ ¢4'q ~. \[xl~ °4~\] 
~d-~dol~l-q ~t~- ~o1-7-~I-!" ~ \[~ll o:t~-\] 
,'~ ~q-~l 4 o~_,£. \[o I.d oq ~\] 
~\]'d~ °lq ~<-ol- d ~ ;q "d 
'gb~\] ~-<>11 ~. o_~ -, ~ ~-1~1-~- ~ 
~o~*. ~\] '~ <gl~- -V-~°II -~}~ 
.j-q-\] _~ol~x- I ,lu&~ o,o~ ,~ 2,2, 2~...J~. 
,~Xor\]Ol~tq, :z~} "J.-*~ls1-~'li ol ~,\] 
-~-~\]~ ~x~lq- ~1~,4 ~. ~'t-l-b 
~'l~xl'4 ~ ~-~t ~1;'I ~ ~1~~ ~1~ 
Figure 3: Example concordance data of yeorcum(summer) 
befbre temporal noml after outlmt freq 
yeoreum( suulmer ) 
x\] ~.!-/ t~ (ji.,.,,~, ~ant) 
~/tlO (hae,yeal') 
o\] ~/ld (ibcon,thin) 
o:t ~-/t22 
o:t-~-/t2.,_ 
o-1 ~- / t.2.2 
oq ~-/t,2., 
o~ ~-, /~ 
o~ ~-/t._,., 
r~\[(bam,night) TN 2 
vo~,(banghagyacation) TN 7 
'~ ~-( bycoljang,villa ) TN 1 
~(jumal, weekends) TN 1 
J,~ 71 (.qam,.qi,flu ) TN 1 
q\] q\]/t2~ (nacnae,all the time) TA 1 
~-F~-(na,,eunJ/TOP) TA \] 
6.25, \]- \] , ('m,a.?rmag, the last) TA 2 
~ (.leo'ntuncun,1)attle/.\[ 0I ) TA 1 
Figure 4: Temporal expression phrases selected fronl examl)les 
tuple (Ej, E2, Q, i, F, E) where: E1 is a finite input 
alphabet; E2 in a finite output alphabet; Q in a ti- 
nite net of states or vertices; i E Q in the initial state; 
F C_ (2 is the set of final staten; E C Q × E~: ~ E.; x (2 
is the set of transitions or edges. 
Although the syntactic function of a temporal ex- 
pression would be nondeterministically selected fl'om 
the context, temporal expressions and the lexical 
data of local context can be represented in a de- 
terministic way due to their finite length. For the 
deterministic FST, we define the partial functions ® 
and • where q®a = q' iffd(q, a) = {q'} and q,a = w' 
iff ?q' E Q such that q®a = q' and 5(q, a, q') = {w'} 
(R.oche and Schabes, 1995). Then, a nubsequential 
FST is a eight-tui)le (El, E2, Q, i, F, ®, *, p) where: 
E1,E2,Q,i and F are the smnc as the FST; ® is 
the deterministic state transition fimetion that maps 
() x E1 on Q; • is the deterministic emission fimction 
Figure 6: 
T = (~,, r,.~, O,i, F, o,.,p}) 
)2~ = {tl, t~2, t~6, wi, wj} 
E2 = {TN, TA, NT} 
0 = {o, 1,2,3} 
i = 0, F = {3} 
0c4tl =1, O,t1 =TN, 
1®t22 =2, l*t~=G 
2®t~< =3, O*twi =TN-NT, 
2 @ t.~6 =3, O * t.2s = TA, 
2 ® twj =3, O * t, u = TA_NT, 
p(3) = 
Deternfinistic FST resulted from Figure 5 
that maps Q x E1 on E~; p : F --) 22~ is the final 
outtmt fluiction. 
Our teniporal co-occurrence data can lie relive- 
sented with a deterministic finite state transducer 
958 
} ¢~g~'tl\[1111 N , bdli#~\[ 
}~¢~CUUltl N I~) colJanglNI 
i ,*. ! N I~mylHL'IN I 
/ )c, nc ~I'A ;., ) :"" .( &, 
jin;ulY\[ N/ 
~¢,w,,,,,n X , j.,,,,,l,x I 
~ c(~etll,l/I N ila ¢11 a,'\[ \['A 
jln;m/I N ¢,l¢lbrlltl A ilajlclmiN I 
k,e/IN ),,,reumflA ¢,25/NI 
kw/IN ~¢,liclnllJ\[ a ilk ij~tll Ik~\]N I 
\ d ................................................. \, ............. ('/ 
Figure 5: Fiifite sl;al;e nta('hin(~ (:onsl;rucl;ed wil;h l;he (bd;a in Figure d 
O t'/:"H-( ) 
0 I 
wl /!l'N, NT' 
t2'-'(' ~ ( ) !~';IT A 
u,j /'I'A, NT 
ll" :: {barn, jumol, t, altgha 9 .... } 
mi (i lI" 
wj ~ 1V 
Figure 7: A d(;1;erministic tinil, e sl, aW, (,ransduc('a' 1;o 
i)roce, s,s temi)oral ex\])re,,qsion 
in a similar way. The, sut)s('qu(',ntial FST f()r our 
sysl,em is (l(,,fined as in Figure (i and Figur(~ 7 ilhLs- 
l.ral;es I;he tral~sdu(:(!r in l?igur(~ 6. In L\]m tiI~me, ti 
is a c\]ass 1:o whi(:h lhe (:eml)oral w()rd 1)elongs in l;he 
lx',mporat (:las,qiti('at;ion. wi is a word ol;her l;han l,em- 
1)oral ones 1;hal; has l;he pr(',(:(~(ting t eml)()ral wor(l 1)(', 
il:s modiiier, and wj is not; such a word 1;() make a 
compound noun. TN, TA and NT are synt, aciic 
tags. A word t;agged with 5/'N would modify a su('- 
ceeding llOllll like, barn(night), bangh.ag(vacati(m). A 
word al:t, ached with TA would lnodify a predica.lx~' 
aim one with NT nmans ii; is not; a 1;emporal word. 
A(:mally, individual FSTs are coml)in(;d into one aim 
rules for tagging of temporal words are pul; over l;h(; 
.,J. The rule is applied according to the prioriW 
by freqllellcy ill case lllOrl2 t;hall ()lie ()ll{;l)ill; are \])os- 
sible for a (:Oilt;ex|;. Nmnely, it; is a rule-l)ased system 
W\]I()I'(I ~\]le rllles al'e, (~xl;ra(;|;(?(l frolll (;ort)llS. 
4.2 Chunldng 
Afl;er the FST of l;enlt)oral (',xt)ressions adds I;o woMs 
syntactic tags such as TN and TA, chunking is con- 
ducted with l'eSlllI;s frolll OllI;l)llI;S 1)y t;h(' FST. As we 
said earlier, (:hunldng in Korean is relal;iv(;ly easy 
only if t;h0, t;eml)oral exi)l'essioll wou\](t be success- 
fully recognized. Act;ua.lly, our ('hunker is also based 
on the, finil;e s(;a,l;e machine. The following is an ex- 
mnl)le for (:hunldng rules. 
iN1,) ~ (NF (NP) I (2V)* (2VIO* (UN) 
('rNI,) ~ {TN)* (N)* (XP) 
\]\](!re: j\r is a noun wil;h(ml, rely \])ost.t)osit.ion , NP is 
a noun wit.h a. 1)oSl;l)OSil,ion, TN is a t;enll)oral noml 
recogniz(~,d as modifying a suc(:ox'ding n(mn, NU is 
a number and UN is a uni(; n(mn. Afl;e,r t('mporal 
l.a.gging, 1;he ('hunker l;ransforms 'NT' into N, NP, 
(d,(',. according I,o morl)hologi(:al consi;itueid;s and 
their I)OS. I h'io, tty, t;he, rule says thai; an NP ctmnk is 
mad(', from eil3mr NI' or l;emporal NIL An NP would 
\])(! (:()llsLrll('l,(',(1 wii,h on(; or lll()l'(} llOllll.q ;ill(1 \[;boil" 
modilie(~ or with a noun (lUanl;ified. A TN\] ), whi(:h 
is r(',lal:ed with lime, is made from n(mns moditied 
by t('ml)oral words wlfi(:h would 1)(', i(t(;nl;itied by the 
FST. By i(l(mtifi(:ation of lx!mporal (',xpressi()n and 
chunking, tlm following (',xmnl)k', senl~elu:e, is chunked 
as |)(;low. 
• jinan(lasQ ycor~'um(summer) bau, ghag-e(in 
• s * " v,,,:,,.(;io,,) ,,,.,.,,-,,.,;,,,,4,,,~./s un.~) k~o,,,,VV,,- 
t(.~o(,.o,m,,,~;(,F) .,,~,(th~.,,o~)d(.~'-,~',,l(.,,iqOl~.J) 
sassda(bought;) 
-+ \,Vo, bought l;hrce c()mlmt(~'rs in the lasl; 
81111111101" V~IC~I(;i()ll. 
• jin(t?Vl,N ',~l(:OTC'tt'lllq'N balzgha(j-CNl, '~tri-nc'ltlZNl, 
kco'vnl)yUl, eON SCNU dae-reUINl, sassdav 
• \[jinawl'N yCOVCUmTN ban!lh, ag-CNl,\]Nc 
\['uri-ne, wnNP\]NC: \[kcompyuteoN SeN¢: dac- 
?'C?tlNI,\] N(; sassdav 
5 Experimental Results 
For l;hc ext)erinmnl; a.bouL l;eint)oral expre, ssion, we 
e, xla'aci;ed 300 senl;enc('~s (:onl;a.ining temporal expres- 
sions from E\]?I{I I)()S cortms. Table 2 shows the r(',- 
959 
~c'*¢unnrL ~ ( 
y~,~cl,,,,ItX { 
ya~cu,,,fl N ( 
) ¢,~c,,,n/I N~ ( 
) b'un*N'l ~: : 
) t') ¢"II:~'U'/N.I i 
g;,,,,~,lXl~ 
,.¢aadD~ 
. xy¢l}l¢onlfI'A /- ) 
!+,,+,,,aN )++,.+,,,,,,,A< 2 '':''L"''~'L ,2 
,~,.,ax -( y?,,~,,,,~,L~ /,,~j,,.,,,.xr(; ,~ 
Figure 5: Finite state machine constructed with the data ill Figure, 4 
wi/TN, NT 
.. ti/TN C~ 1.22/{ F,/" t2o/'l'A " "'' 
0 2 \ --/" 
wj/TA, NT 
IV = {barn, jumal, ba~dhao,...} 
Wl G IV 
wj ~ II / 
Figul'e 7: A {lel;('~rministi{: finil;{; stat(; trans{lucer t{} 
process temporal expression 
in a similar way. The subs(xlU{'.ntial FST for our 
sy:stem is detined as in Figure {i and l?igu\]{~ 7 illus- 
trates th{. ~ trans(hl{:er in Figure. (i. In the tigurc, ti 
is a {'lass to which the tc.mporal w{}rd 1}elongs in the. 
temporal classification. "wi is a word {}the.r than tem- 
poral ones that has |;11{; prex:e.ding temporal word be 
its moditier, and 'wj is not such a word to make a 
COml)Oui:d noun. TN, TA and NT are synta{'ti{: 
tags. A word tagged with TN would mo(lit~y a suc- 
(:ceding noun like barn(night), ban.qh, ag(vacation). A 
word attached with TA wouhl mo{lii2y a predicate 
and oi1{; with NT means it is not a temporal word. 
Actually, individual FSTs arc {:oml}ined int{) one. an(l 
rules for tagging of teml}oral wor{ts are put over the. 
FST. The rule is al}plied according to the priority 
by fro(tllOll{;y ill case m(}re than o11o (}uttmt are pos- 
sible for a context. Namely, it is a rule-based system 
where the rules are extracted fi'om corl}us. 
4.2 Clmnking 
After the FST of temporal exi}ressiolls adds to words 
syntactic tags such as TN and TA, chunking is {:on- 
ducted with results tiom outl)uts 1)y the Fsr\] '. As we 
said earlier, {:lmnking in Kore.an is relatively easy 
only if the teml}oral ext}ression would l)e suc{:e.ss- 
fltlly recognized. Actually, our clmnker is also 1)ased 
on the finite, state lnachine. The tbllowing is an ex- 
ample tbr chunking rule.s. 
(Nl~h,.,~) -÷ (NP) I (TNP) 
(NP) -~ (N)* (NP) I (N)* (Nu)* (uN) 
('rNu) -~ ('rN)* (N)* (NP) 
Here, N is a noun without any 1)ostl)osition, N/? is 
a noun with a postposition, TN is a temporal noun 
recoglfized as modii~ying a succeeding 1101111, NU is 
a numbe.r and UN is a unit noun. Aft, er tcmI)oral 
tagging, the chunker transforms 'NT' into N, NP, 
(~tc. according to morphological constituents and 
their POS. Brietly: the rule. says that an NP clmnk is 
made fl'om either NP or temporal NIL An NP would 
1)e (:onst, rll(;te.(1 with one. or lnOl'O llOlllIS and their 
m{}{lifie{~ {)r with a noun quantified. A TNP, whi{:h 
is re, lated with time, is made fr{)m nouns mo(liticd 
by teml}oral words which w{mld be ide, ntitied by the, 
FST. By identification {)f t('mt){}ral ex\]}ression and 
chunldng, the following exami)le sentence is ctmnked 
as below. 
• jinan(last) ycorcum(summer) ban.qhag-c(in 
vacation) 'ari-ncun(we/SUB3) kco'm, pyu- .<thr,,{,) 
sa.ssda(bought) 
-+ \VC bOllght 1;t117o.o comi)uters in the last 
Sllllllller vacal0io\]l. 
• jinanTN ycoreumTN banghag-cN1, uvi-ncunNp 
kcom, pyuteoN SeNU dac-vculNp sassdav 
• \[jinanTN yeorelt~lZ,l,N ban.qha.q-cNP\]N(~' 
\[Iwi-TtC'lt*tNP\]N C \[kcompyutcoN SCNU dac- 
rculN P\]N(; sassdav 
5 Experimental Results 
bbr the {;xi}erinmnt al}out temporal exI}ressi{m, we 
extracted 300 senten{:es containing teml}oral expres- 
si(ms from ETRI POS corlms. Tal}le 2 shows the r{'.- 
959 
t precision J_£gcall 
rate (%) 97.5 90.56 
Table 2: Results of identifying temporal expression 
no chunking-\[ using chuifldng 
4.8 -\[ 3.3 avg. # of cand \[ 
Table 3: Reduction of candidates resulted from 
chunking 
sults from identit:ying temporal expressions and dis- 
aml/iguatil~g their syntactic functions. From the re- 
sult in the table we see that the method is very effec- 
t, ive in that it very accurately identifies all tile tem- 
poral expressions aim assigns them syntactic tags. 
And, Table 2 shows the reduction resulted from 
chunking after temporal expression identification. 
We take into consideration the average numl)cr of 
head candidates for each word since our parser is 
dependency based one. The test was conducted on 
the tirst file (about 800 sentences) of KAIST tree- 
bank (Choict al. , 1994). The number was reduced 
by 51% in candidates compared to the system with 
no chunking, whMl makes pa.rsing efficient. 
Most of errors were caused by tile case where tein- 
poral words have different syntactic roles under the 
same context. In this case, the global context such as 
the whole sentence or intcrscnt;cntial infornlation or 
sometimes very soi)histicated processing ix needed to 
resolve the prol)lem, l~br instance, '82 ~tycoTt(year) 
h, yco'njac-yi(now/Gl;N)' could be used two-.way. If 
the speech time is the year 1982, then h, yeou(fl, e-yi are 
conlbined with 82 nycon to represent time. Other- 
wise, 82 does not ino(til~y hycordac-yi, wlfich cannot 
be recognized only with the local context. Neverthe- 
less, the system is promising in that generally it can 
ilnprove e\[flciency without losing accuracy which is 
crucial for the pracl;ical system. 
6 Conclusions 
hi this paper, we presented a method for identifi- 
cation of temporal exi)ressions and their syntactic 
functions based on FST aim lexical data extracted 
fl:om corpus. Since tenlporal words have the syntac- 
tic ambiguity when used in a sentence, it; is impo> 
tant to identify the syntactic functioll as well as the 
temporal expression itself. 
For the purpose, we manually extracted lexical c(> 
occurrences t¥om large corpus aim it was possible as 
the number of temporal nouns is tractable enough 
to manipulate lexical data t)y hand. As shown in 
tile result, lexical co-occurrences are crucial for dis- 
mnbiguating the syntactic flmction of the tenlporal 
expression. Besides, the finite state approach pro- 
vide(l an eftieient model for temporal expression pro- 
cessing. Combined with the clmnker, it helped re- 
nmrkably lessen, by 1)runing irrelewmt candidates, 
intermediate structures generated while parsing. 

References 

Almey, S. 199t. Parsing By Chunks. ill Berwick, 
Abney, and Tenny, editors, Principlc-\]3ascd Par.s- 
ing. Boston: Klnwer Acadenlic Publishers. 

Choi, K. S., tIan, Y. S., Han, Y. G., and Kwon, O. 
W. 1994. KAIST Tree Bank Project for Korean: 
Present and Future Develot)ment. In Proceedings 
of the l~ttcrnational Workshop on ,%ara, blc Natu- 
ral Language Resources. 

Ciravegna, F. and Lavelli, A. 1997. Controlling 
Bottom-Ut) Chart Parsers through Text Clmnk- 
ing. In Proceedings of the 5th International Work- 
shop on Parsing ~);chnology. 

Collins, M. J. 1996. A New Statistical Parser Based 
on Bigram Lexical l)et)endencies. Ill Proceedings 
of the 3~th Annual Meeting of the ACL. 

Elgot, C. C. and Mezei, J. E. 1965. On relations de- 
fined by generalized finite alltonlata. \[B~d r Journal 
of Rcscarc.h and Development, 9, 47-65. 

Gross, M. 1993. Local Grmnmars and their I{ep- 
resen~ation by Finite Automata. Data, Descrip- 
tion, Discourse: l'apcrs on l;hc English  
i77, \[tor'lto,ur" of John Mc\[\[ Sinclair, Michael Hoey 
(ed). London: HarperCollins Publishers. 

KAIST. KAIST Concordance Program. URL 
htt;p://(:sfive.kaist.ac.kr/kcp/. 

Mohri, M. 1997. Finite-state ~\]5:ans(luccrs in lan- 
guage and Spe(~ch Processing. 6*omp'ul;ational 
Li'n.gui.stics , Vol 23, No (2). 

Ramstmw, L. A. and Marcus, M. \]'. 1995. Text 
Chunking Using Transtbrmation-Based l,earning. 
In Proceedings of the ACL Workshop o'n l/cry 
Large Corpora. 

l{oche, E. and Schabes, Y. 1995. Deterministic 
Part-of-Speech Tagging with Finite-State Trans- 
ducers. Computatiou, al Li'n, guistics, Vol 21, No 
(2). 

Roche, E. and Schabes, Y. 1997. Finite-State Lan- 
guage Processing. The MIT Press. 

Skut, W. and Brants, T. 1999. Chunk Tagger. 
llIl Proceedings of ESSLLI-98 Workshop on Au- 
tomated Acquisition of Syntax and Parsing. 

Sproat, R. W., Shih, W., Gale, W. and Chang, 
N. 1994. A Stochastic Finite-State Word- 
segmentation Algorithm for Chinese. In Pwceed- 
ings of th, c 32rid Annual Meeting of ACL 

Yoon, .J., Choi, K. S. and Song, M. 1999. Three 
Types of Ctmnking in Korean and l)ependency 
Analysis Based on Lexical Association. In l'm- 
cccdings of ICCPOL '99. 
