An Abstraction Method Using a Semantic Engine 
Based on Language Information Structure 
Hirohito 1NAGAK1 and Tohru NAKAGAWA 
NTI" lluman Interface Laboratories 
1-2356 Take Yokosuka-Shi Kanagawa 238-03 Japan 
E-tnail in agaki@ ntthli.ntt.jp 
Abstract 
This paperdescribes the framework for a new ab- 
straction method that utilizes event-tmits written 
in sentences. Event-units are expressed in Lan- 
guage Information Structnrc (LIS) form and the 
projection of LIS from a sentence is f)cffonned 
by a semantic engine. ABEX (ABstraction EX- 
traction system) utilizes the LIS outpttt of the 
semantic engine. ABI';X can extract events from 
sentences and classify them. Since ABEX and 
the LIS form nsc only limited knowledge , the 
system need not construct or nlaintain a large 
amount of knowledge. 
1 Introduction 
Automatic abstraction, a lnajor natural lauguagc \]n'uecss- 
ing application, is difficult to achieve. Luhn\[5\] developed 
a very simple extraction method that searches for the 
keywords ill each sentence. This type nf abstraction is 
easy to accomplish, but its quality is poor. Other abstrac- 
tion methods\[6, 31 utilize natural language unders~ding 
(NLU). However NLU is still in the development stage. 
For achieving good practical performance, it is necessary 
to treat the thformation expressed in a tlocument uni- 
formly so that it can be analyzed with only a small fixed 
amount of knowledge. 
We propose tile L.IS form which allows inlormatinn 
about events tO Lvd unilollnly treated, t;urt\[lcl~nore, the 
semantE engine only uses abstract words, this reduces thc 
size of the knowledge. So the semuntic engine projects 
a sentence to a LIS form within a lmdted anloant of 
knowledge. 
In abstraction, classification is the first step. Classifica- 
tion is performed using unilorln approach, the LIS lkmn. 
LIS event representation ullow us to select and classify 
sentences, 
2 A semantic engine 
2.1 What is the l,anguage Information 
Strncture(LIS) form? 
The LIS form expresses the inlormation structure that 
pernlits commnnication between individuals. If two ino 
dividuals communicate about one that happened (will 
happen) m the real world, the core inlormation is the 
event. Sometm~es a spe,aker will atulche an attitude to the 
event. So information al~ml real world is expressed by 
the event and the attitude of tile speaker. 
2.2 L1S form 
In the LIS form, there are two types of feature-structure, 
word feature-strncture and event feature-structure. Al- 
most all slots of tile word feature-stxucture are filled with 
appropriate values and lew slots of the event feature- 
structure are empty. The semantic engine tries to fill all 
slots of lhe feature-structure. 
2.2.1 Event feuture-str ucture 
Event has one leature-structure, the role lkkqture. The sen- 
tence conlilins one or more events and die event feature- 
structure indicates the role of words or phrases in the 
event. The role feature is either essential or extensional. 
Seven essential roles have been created: as AGENT, OB- 
JECT, ACTION, LOCATION, TIME, FROM, and TO. 
These roles are defined not for verbs but for events. This 
is quite diflerent li'om Fillmore's cases \[2\]. Therefore, the 
action ill the event is represented by the ACTION slot, 
which c~ln be lilled by verbs, nouns, gerunds, and so oil. 
It is not necessary to fill the ACTION slot by a verb. 
l:or exulnple, tile phra.sc " a laud tmrchase agreement" 
is dealt with as one event in the LIS, and the ACTION 
slot-value is "agreement". Other slots, such as AGENT, 
OBJECT, LOCAI'ION, TIME, FROM and TO slots are 
ahoost the same as in Fillmore's cases of 'agent', 'object', 
'location', 'time', 'source', and 'goal (or experiencer)'. 
It is important that our role model deals with the roles of 
words (or pltrases) in an event, not word meaning. 
Using just seven essential roles, it is difficult to assign 
a talc to a word (or a phrase). To overcome this problem, 
AcrEs DECOL1NG-92, NANTES, 23-28 AOl~f 1992 8 7 5 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
we introduce extensional roles which allow to be moditied 
by the addition of "/constraint". 
2,2.2 Word feature-structure 
Word has six features. These are semantic lizature (DDF) 
slot, numerical-value slot, date slot, constmtint slot, modal- 
ity slot, and word string slot. 
Using the semantic feature, the event feature-structure 
will be determined during semantic interpretation process. 
Six classes of semantic features are defined, such as INDI- 
VIDUAL, ELEMENT, THING, ACTION, LOCATION, 
and TIME These classes are instantiated to the Domain 
Dependent semantic Features (DDF) when tile domain is 
decided. 
The constraint fcmure restricts the feature-su'ucturc of 
brother words or t)hrases. FurthcrlllOre, tile constraint 
feature determines the relations betweell word feature- 
structure and event feature-structure, ht Japanese lan- 
guage, a word which have a ACTION DDF usually has 
the constraint feature that determines tile slots of event 
feature-structure. 
The numerical-value slot expresses numerical value of 
a word ; 0, 1 , 2, -- (one) , ¢i (hundred) , :1"- (thousand) 
, and so on. The calculation of countthg up and down 
is necessary, so all figures are separated. The nmnerical- 
value feature will be expressed as folh)ws.(Onr notation 
of a feature-structure is ~eature-name =feature-value\].) 
\[numerical-value= \[s~icr')~!!dTidg~it .... \] \] 
The date slot expresses event occurrence time and is 
expressed by the Christian era. In the Christian era, days 
are counted by numbers, so that date slots arc calculated 
nsing the numerieal-value feature. The date slot has a 
minute slot, a second slot, a hour slot, a day slot, a month 
slot, and a year slot. Eacll slot is expressed in nmnerical 
value. 
The modality slot is classilied into three c,'ttcgories; 
tense, aspect, and uqood. Since tile tense and aspect are 
linguistically Iixed, we use an ordinary categorization. 
However, mood is needed to be categorized differently, 
because the information unit used this system is an event. 
So we categorized mood as a combination of Bratman's 
Belief-Desire-lnten lion model\[ 1 \] and modal logic. That is 
the skate of event is expressed hy modal logic (necessary 
operator, possible operator, and negation sign) and the 
attitude of speakers cart tx~ cthssiliod into belief, desire, 
and intention. For example, a seurence I think it ts 
possible to construct a plant there will be expressed as 
Belief(Possibleli', where E means an event;construct 
a plant there, that is, the individual believes that E is 
possible. 
Furthermore, it is necessary to consider u situation in 
which information is transferred. In tfie newspaper, it 
is created by journalists who get information from other 
services (person or company information bureau), ht this 
situation, the event aml the attitude of file inforumtion 
possessor (IP) is transported to a speaker (SP);journalist. 
The journalist then reexpresses the information to reflect 
his attitude. If the modality of IP and SP are expressed as 
M odalit ~,/se (), M odalit Yl v (), respectively, information 
in newspaper is expressed as, 
M odalitys p ( M odality, e( E V E N T) ) ). 
If the target document is newspaper, the LIS form 
includes the modality of speaker (Modalitysp) and the 
modality of information possessor (Modalitylp). 
2.3 Projection Mechanism 
Parsing is done using Morphological analysis and De- 
pendency zmalysis\[4\] attd yields a syntactic tree for a 
sentence, 
After the parsing, we search a feature-structure dictio- 
nary to extract feature-structures of all words related to 
the domain. To perform semantic analysis with limited 
knowledge, word feature-structures are prepared only for 
abstract words. The registration of proper nouns are left 
to the user. The semantic engine infers the semantic 
meaning of words or phrases from the system default and 
user registrations held in the dictionary. This means the 
semantic engine do not need all knowledge of words for 
semantic interpretation. Thus only a small amount of 
words need to be maintained. 
After attaching the appropriate word feature-structure to 
all important words, semantic interpretation can proceed. 
From the type of propagation for the feature-structure in 
a parsing tree,there are two types of features. One is the 
synthesized type whose value is calculated from sons to 
fatber relationship of the parsing tree. The other is the 
inherited type that are calculated from father or brothers. 
Word string, DDF, numerical-value, date, and modality 
features are synthesized type and other features, such as 
consmdnt and role are inherited type. The propagation era 
feature-structure is accomplished by unification calculus, 
but the grammar is different. 
For DDF features, the grammar is as follows. 
Ii'.I)DF ::= iml~ fini~e 
N.DDI'" ::= NI .l)l)Fq~ N2.DDFq,... N,.DDF 
N.I)DF ::= individaal(company)l 
element(corapm~y) lmoneylmanlproductl 
action(company)llocation\[time 
Note: Uncapitalized words mean terminate and capitalized words 
mean nonterminate. E means EVENT node structure and 
N means other node structure and N.DDF means DDF 
feature-value in node 'N'. Symbol 'n' is the number of 
nodes. Operator ,~ means unification operator. 
For the constraint feature, 
E,constraint ::= ~ (NI.DDF@ E) 
i=l,...,n 
N ...... t,'aint ::= ~ (N).DDF@N) 
j=l,..,n 
N.conslrainl ::= feature-structure of brother nodes 
ACRES DE COLING-92, NANTES, 23-28 AOL'T 1992 8 7 6 PREC. OF COLING-92, NANTES, AUG. 23-28, 1992 
Tile dale and nunierical-value features are ralJlOr COlU- 
plicated because we have to 0col with the semantic mo,'tn- 
ing of time. 
The grammar of l~)r the date feature is, 
N.date ::= Ni.dale ~ h'2.dal~ .\] .... q, N,,.da& 
Sff'olld : • • • 
\]lollr ~ , ' • N.dale ::= 
da.q ..... 
l~lOl~l/t :. - • 
ijeal. =: , •. 
Tile calculation of number aud date features is done 
like a stack. The nmnerical-value feature has one stack 
and date feature has six stacks. 
For example, lbr the number 1992, all die numbers, 1. 
2. and 9, are expressed as follows, 
1 ~ \[numerical-value = (oval (push-number-stack 111\] 
2 ~ \[numerical-value =(eval (push-nuulber-sluck 2))\] 
9 ~ \[numerical-vulue = (oval (pusll-numbur-stack 9))} 
Note: Symbol 'oval' i/lealis ltlal next \[oi112 will be evaluated by 
Common-lisp, Symbol 'push-stuck' is the lunction Ihat 
puts the argumem wdue ou the top el tile stuck, 
The equation lot tile nun/eric/l-value of 1992 is, 
1992.numerical-value = 
{\[numerical-value = (ewtl (push-number-stack 1))\] 
~\[numerical-vuluc =(ovid (push-number-stack 9))\] 
~\[numerical-value = (ewil (push-number-stack 9))\] 
(9\[numerical-value =(eval (push-number-stack 2))\]J 
which, after evaluation, gives as the value of 1992 as 
follwing expressions,touting right to left, tirst digit being 
2. second digit begin 9, and so on, 
first-digit =: 2 \] 
second-digit 9 
1992.numerieal-value = third-digit 9 
fourth-digit 1 
If we process the phrase, 1992 ~q". (year 19921, the 
equation becomes, 
1992 ~.date : 
{ 1992.numerical- value ,\], 
date : (eval(ifSELFnumerical-value ) 
(push-year-stack SELF.numerical.value))) j 
Note: Nonteriifinate 'SELF' refers to the sell leaturc-structure 
Symbol 'push-year-stuck' is the functiou that places the 
argument value to tile year stack ill lhe date feature. 
Then and we get the time feature-structure as, 
\[ 1992t, F..date = \[ year = 1992\] \] 
The grammar for modality is quite simple, 
E.modality ::= Nl.modality ~l, N~.modality it, 
• .. N,,.modality 
N.modality ::: NI mo l q .~ A"~.tnodulity q, 
• .. N,,.modolitq 
N.modalit¢l ::= t~ n,~c , ti ,+p~ (t. u t~d mood 
2.4 All example of the projection process in 
tile semantic engine 
This passage comes from Tile Nikkan-kougyou shiu- 
bun(Daily Industrial Newspaper). The headline is "il~ 
Co.. Ltd. is constructing a new plant to assemble large- 
scale steel bridges in W~tkayama-ken.). 
amtounced a land purchase agreement in Shimotsu-machi, 
Wahayeana-ken, where they will construct a new plant 
giving ttuem additional ,wace and capabilities to fabricate 
larger scale steel bridge structures.) 
-~" "~ o " (771e 170,S56-sq.-ttwter construction site, previ- 
ously occupied by part eta Maruzen Oil Co., Ltd. refinery, 
was purchased from Maruzen Shinwtsu Kosan for an esti- 
mated x\[ 10 billion. Construction on the new plant facility 
is slated to begin this coming spring.) 
S3: " ~t.tl~,j :iG" :: I'{,~:.liJo " (hlvestment capital is 
about ¥ 22 billion.) 
$4: " "~l~ t(fil~\[Lq)~o) ~'~o " (Operations will begin 
in April, 1993,) 
el'lie Nikkan-koagyou shinbun, November 17, 19901 
At lirst, tile domain is decided. Ill this example, we use 
"~-.~l~-J/h'~ "(company act) domain. 
Tile system rougldy separates events and extracts all 
events which related to the dolnaill: corupany act. From 
tile featnre-structure dictionary, file sentence was quickly 
reviewed to determine whether there is a word which has 
an ACTION DDF or not. If there is no such word, the 
system thea stops analyzing the sentence. In the example, 
the Iirst sentence (S 1) is made of two events. One is" J~ 
(construction)" and the other is " D,~\]~\[l (agreement)". The 
two events are connected by" $- -5 ~ ~ "~" (surukoto-de)". 
So seutence ($1) is separated to two events. Senteuce $3 
does not have an ACTION DDF, so further analysis is not 
couducted. 
Therefore, we call obk'lin live events from five sen- 
tences; S I to $5. 
L- ~ ~Clli~11 Ii),/ \[q'ltlilf i<c ~\[t~ L, " (lakada-kiko Co., 
Ltd. will coastruct a new plant giving tl~em additional 
space arid capabilities to fabricale larger scale steel bridge 
structures.) 
~EO L hz ~ ~j~ L ~o " (lakada Kiko announced a 
land purchase agreement ) 
Event3inS2: " ~t7~.,~|~'.l:l\[:l~L~;/S~lll~~l~2E~l~'~:l~tl©-- 
Jj'J- i-)~-¢~s'Jl'if,~;lXj~"£lt~, " ('1"he 170,856-sq.- 
p,u~ter construction site, previously occupied by part of 
A~s DE COLING-92. NANTFA~. 23-28 AOl~ 1992 8 7 7 PRec. OF COLING-92, NANTES, AUG. 23-28. 1992 
a Maruzen Oil Co., Ltd. refinery, was purchased front 
Maruzen Shimotsu Kosan for an estimated -"#~10 billion.) 
Event4inS2: " ~:~h~Cg~32~K-~AL"~-~o " (Con 
struetion on the new plant facility is slated to begin this 
coming spring.) 
EventSinS4: "' ~\[zl~l-~qJ:\]~e) ~"~.'o "' (Operations 
will begin in April 1993.) 
After the event separation process concludes, semtmtic 
interpretation is commenced. The first stage is attaching 
a feature structure to each word. 
Let's consider the Event4 in $2; " ~75~¢9~3Lg~,~ 
~f'c~t~IL'J- ~ o " (Construction on the new plant facility is 
slatedtobeginthiscomingspring.), lit this passage, there arc 
three Bunsetsu, five independem words, three delxmdent 
words. We need only five leamre-structures as shown 
below. 
~l~;this coming spring-- 
"~ (raisyun)"; thl,~ comlcting .~pr~n!l 
time 
(push-year-stack (1 + *article-year*)) 
string :: 
ul DDF = 
date : 
!.~;plant~ 
string 
u2 DDF 
= "A2~ (koujyou)";plaT~l 1 
= den enl(companlt) J 
t\]~;construction-- 
strtt~9 = "~.~t~ (ket~etsu)"; con.~trucliol* 
DDF = .ctw~d~cm~p.n~j) 
\[ d~gent.l)l)l'" = ~ndivld.alleomlm71y)\] 
,3 con- I_action.1)l)F= actiott(coTnpany) I 
straint= \[_ob3ect,DDF= elemet~ticomp...V) \[ 
\[ _timc.DDF = ti,... .... \] 
~;is slated to ...~ 
\[ st,,ing = ",,,AL(chakkou)"',i.sslaledl .... \] 
u4 modali(y \[.sp¢ ('l = ju.~l_bcforc\] 
6 ;wiil~ 
\[ string = "~-Zo(suru)";.'ill \] 
u5 modoliltj \[t~ ~,sc = J'ul.r~\] 
Note: Symbol '_agent.Dl)F' means IhlU if the DDF feature value 
is unified to the one node, then variable '_agent' is tx)unded 
to that node's feature-structure. Variable *article-year* is 
bounded to the date of year when the article is publishezl. 
The example is parsed as shown in ligure 1. 
Once the parsing is finished, the semantic thterpre- 
tution process begins. Node n\] wilt have tile feature- 
structure timt is the result of calculation between the 
feature-structure of" ~ (raisynn)" and " 7~, b (kara)", 
but the word" 2,~ C9 (kava)" has no leature-structurc so thc 
feature-structure of" ~ (raisyun)" ul, is propagated to 
node nl. The feature-structures of all nodes are calculated 
same way. 
For the constraint lcature, uniiication was done 
to all brothers. For example, \[~(tgenl.Dl)t" = 
(raisyun) (kara) (shin) (koujyou)(ke~etsu)(ni) (chakkou)(suru) ¢ ¢ ¢ ¢ ¢ 
ul u2 u3 u4 u5 
Figure 1 : First stage of syntactic tree 
iJ.lividual(comprmv)\] means that one brother node is 
needed which have the DDF value of agent(company). 
If there is a node which satisfies the constraint, then the 
variable _agent is \['rounded to that node feature-structure. 
If there is no node which satisfies tile constraint, then 
variable _agent is unbounded. 
Try to think about the constraint feature in "Jd~P. 
(kensetsu)". There is no node that has agent(company) 
in DDF, but there are nodes which satisfy the constraint, 
such as the ~action,_object, and _time which are bounded 
to nodes n3, n2, hi,respectively. 
Finally we get the event feature-structure of top node 
n-top,shown in figure 2. 
3 An abstraction using the LIS form 
3.1 The basic method of the abstraction 
In tile abstraction, we utilize classification of the LIS 
ouqmt. First, a sentence is put into the LIS form by the 
semantic engine. 
TIle LIS output is used to commence the abstraction 
procedurc. To extract information from sentence, we 
think classilicaUon is tile best way. The semantic engine 
analyzes sentence in fixed domain, after the semantic 
attalysis. Sentences tire classilied whether an event or not, 
artd tile system extracts the events which are related to the 
domain. 
Finally, ABEX provides a abstraction. One abstraction 
proposed here is the classification of event occurrence 
time and similarity of event. This classification reveals the 
relationships of each event. Individual event occurrence 
times will be determnined from value of the time feature 
and the similarity of events is calculated by comparing 
event feature-structure slots. 
Tile other method is classification by the modality of 
information. Front the view point of Modalitytp, we can 
classify an event according to the modality of information 
possessor (1P). If there is no modality in the event, we 
classify it as 'fact'. Others are classified using modality 
feature. This classification of tile event's modality reveals 
the attitude of the information possessor. 
ACRES DE COLING-92, NANTF~S, 23-28 Ao~r 1992 8 7 8 Pate. oF COLING-92, NANTES, Auo. 23-28, 1992 
_attic. 
_object 
..llllt( 
t ~,,,~/ \[ " ,~lq~ (kensetsu) IC (:u)" 1)1)t' ~ :l on(COmlm~O ) 
= ,,~ con au:tlon.l) l)F = act~o~l(CUmlmlLq ) 
smdnt _objert. I)l)l'" : : chm. nl(conumny ) 
_tlme. Dl)l'" :-: tune 
\[ sl,,,,9 = "*,i(shit,j lj~t(koujyou) '' \] 
ua l)l)t.1 elt:mcnl(~l~mlmng 1 
a: ut nto~lh = 4 
d,tc = yea, 1991 
,..,,s ,,,o,z.l,,v = L te~se :: future \] 
Figure 2: All exam 
\[~ZCs c k-e 
EVENTT:SELL' 
EVEN'F2:SIGNNING 
E VF2-rI'6: ~:'~ ~ "~OSIN~ 
Figure 3: Tile classification result by tile event occurrence 
time and the similarity 
4e of all event node \[catttre-stl'ucture 
~: ~ 9:ge;5~ (Fact) ' II 
I Takada Kikou Co. Ltd. "will construct... I -~--2~.0)~{,-£~ ( O f fi~t al' bulletin) I 
I .To,~:ada Kikou announced a land... I I ~_::~o97~:1~ (Intention) Ill 
l+~m~. III 
l Operations will begin in April, 1993, I 
Figure 4: An ahsn'action result according to modality 
3.2 An example oftlne abstraction 
Figure 2 shows a typical abstraction result of ABEX. 
The events me classified by tile eveiit occurrence title 
and the simimlity of each event. In this Iigure, x- 
axis indicates tthsohlte event (}cctllfence tittle and y axis 
indicates relative sinlilarity of events alld cilcled icQIIS 
indicate single events. 
A typical classification restllt using tile modality of 
information is shown in ligure 3. 
The Event 2 lilts tile modality of an official bulletin and 
Event 5 has rite modality of company imention, so we get 
tile abs~tction result shown in figure 3. 
4 Conclusion 
We have described a I'ranlework I()r a now ahsIraciitnl 
method that utilizes classilication. Classilication is per 
formed using tile outpnt of u senlantic engine that is based 
on LIS form. Since the LIS Ionn takes into account the 
incompleteness of knowledge, the system requires curly 
a small amount of knowledge to i)erfoma the setnaatic 
analysis. 
First, ABEX utilizes the selectivity of the semantic en- 
gine according to the domain and the event. Furthermore, 
ABEX classify accorcling to tile LIS constituents such as, 
TIME modality and so on. The generation mechanism is 
poor, bttt ahstraction by classification is an easy way mak- 
illg all ahsffact. Furthermore, the chtssification nletilod 
descrihed here well supports human abstract tasks. 
References 
\[\] \] M. E. 13ratman. Phms and resource-bouulded practical rea- 
sonirlg. Computational huelligence, (4), 1988. 
\[2\] C. J. Fillmore. Toward a au}dern theory of case. Ibentice. 
1lull, 1969, 
13\] U. Hahn. Topic essentials. Coling-86, 1986. 
141 11. hlagakh S, Miyalutra, and F. Obashi. Sentence disam- 
biguatiorl by docutxlent oriented preference sets. Coling-90. 
1990. 
15\] 11, 1:'. Luhn. The automatic creation of literature abstract. 
IBM Journal. Vo12, 1958. 
\[61 L. F, I~.au. Inlormation extraction and text summariza- 
tion using linguistic knowledge acquisition. Processing & 
Mammgment, ilages ,119~28, 1989. 
ACRES DE COLING-92, NANTES. 23-28 AOm- 1992 8 7 9 PRec. OV COI.ING-92, NANTES, AUG. 23-28, 1992 
