A METRIC FOR COMPUTATIONAL ANALYSIS OF MEANING: TOWARD AN APPLIED THEORY OF LINGUISTIC SEMANTICS 
Sergei Nirenburg
Department of Computer Science
Colgate University
Hamilton, New York 13346
U.S.A.
SERGEI@COLGATE

Victor Raskin
Department of English
Purdue University
West Lafayette, Indiana 47907
U.S.A.
JHZ@PURDUE-ASC.CSNET
ABSTRACT
A metric for assessing the complexity of semantic
(and pragmatic) analysis in natural language
processing is proposed as part of a general applied
theory of linguistic semantics for NLP. The theory
is intended as a complete projection of linguistic
semantics onto NLP and is designed as an exhaustive
list of possible choices among strategies of
semantic analysis at each level, from the word to
the entire text. The alternatives are summarized in
a chart, which can be completed for each existing
or projected NLP system. The remaining components
of the applied theory are also outlined.
1. Goal
The immediate goal of the paper is to explore the
alternative choices in the analysis of meaning in natural
language processing (NLP). Throughout the paper, semantics
subsumes pragmatics. The more ambitious goal of the paper,
however, is to lay the ground for an applied theory of
linguistic semantics for NLP (ASLT/NLP).
2. Applied Theory of Linguistic Semantics for Natural Language Processing
ASLT/NLP is a part of an applied linguistic theory for
natural language processing (ALT/NLP). The latter obviously
includes other components, most prominently syntax and
morphology. The applied theory is the result of a projection
of linguistic theory onto the NLP plane or, in other terms,
an adaptation of general linguistic theory specifically for
NLP purposes.
2.1. Linguistic Theory, Semantic Theory. The modern
concept of linguistic theory, developed primarily by Chomsky
(1965), is that of a set of statements which 1)
characterizes language as a complex structure and describes
that structure top down, 2) underlies each description of a
particular language and determines the format of such a
description. Semantic theory as part of linguistic theory
determines semantic descriptions. Semantic descriptions
assign meanings to sentences, and each meaning is a formula
logically deduced from the rules provided by semantic theory
and utilized in the description. A valid semantic
description assigns each sentence the same meaning that the
native speaker does.
The theoretical inadequacy of much of contemporary
linguistics may stem from Chomsky's assumption that the theory is
one. An alternative view of theory as the exhaustive list of
alternatives, complete with the issues on which the
alternatives differ and the consequences of each choice, is
simply indispensable for applications.
2.2. Linguistic Applications and NLP. A meaningful
application of linguistics always deals with a problem which
comes entirely from the area of application and not from
linguistics. Every NLP system requires the description of a
natural language fragment, often of a sublanguage. On the
one hand, modern linguistics, with its emphasis on
formality, would seem to be uniquely and unprecedentedly
qualified to supply such a description. On the other hand,
while every single fact about language the NLP expert needs
is out there in linguistics, much of it is not easily
accessible. Descriptions posing as theories or theories
posing as descriptions tend not to list all the necessary
facts in any way facilitating computer implementation (see
below). The only solution to the problem is to develop a
[...]
knowledge onto NLP, which is what ALT/NLP is all about.
2.3. Applied Theory, I: ALT/NLP. ALT/NLP deals with
pretty much the same facts and phenomena of language as
linguistics per se. There are, however, crucial differences.
First, while both "pure" and "applied" theories are formal,
the nature of the formalism is different. Second, pure
linguistic theory deals with a language as a whole while
ALT/NLP deals with limited and relatively closed
sublanguages or language fragments (see Raskin 1971, 1974,
1985b; Kittredge and Lehrberger 1982).
Third, pure linguistic theory must ensure a complete and
even coverage of everything in the texture of language;
ALT/NLP analyzes only as much as is needed for the purposes of
NLP and ignores all the linguistic information that is
superfluous for it. Fourth, the ultimate criterion of
validity for pure linguistic theory is the elusive
explanatory adequacy; the ultimate criterion for ALT/NLP is
whether the NLP systems resulting from its application work.
Fifth, pure linguistic theory can afford not to pursue
the issue once a method or a principle is established. In
ALT/NLP, everything should be done explicitly to the very
end, and no extrapolation is possible. And finally, pure
linguistic theory has to be concerned about the boundary
between linguistic and encyclopedic knowledge, i.e., between
our knowledge of language and our knowledge of the world
(cf. Raskin 1985a). There may be no particular need to
maintain this distinction in an NLP system (cf. Schank et
al. 1985) because the computer needs all the kinds of
available information for processing the data.
2.4. Applied Theory, II: ASLT/NLP. ASLT/NLP, a projection
of linguistic semantics onto NLP, is designed to serve all
the various NLP systems. Therefore, it is viewed and set up
as the exhaustive list of possibilities for semantic
analysis and description available in linguistic semantics.
The intended use of ASLT/NLP is to bring to the NLP
customer, not necessarily knowledgeable in linguistics, the
totality of what linguistics knows about meaning by 1)
listing all the choices available at each level of semantic
analysis, 2) determining causal connections among choices
and the propagation of constraints through the choice space,
3) assessing any existing NLP system as to the complexity of
its semantic equipment and the possibilities of expanding it
in the desired direction if necessary, and 4) relating each
chain of compatible choices to the practical needs and
resources. This paper deals almost exclusively with the
first item on this agenda.
3. The Proposed Scale of Semantic Analysis.
The scale proposed in this section is a list of choices
available at each of the five levels of semantic analysis
corresponding to the five meaningful linguistic entities
pertinent to NLP - the word, the clause, the sentence, the
paragraph, and the text, or discourse. At each level,
attention is paid to such dimensions as the completeness and
relative depth of analysis.
All the examples are taken from one paragraph (1) in
Ullman (1982:1-2). The paragraph does not stand out in any
sense except that it clearly belongs to the computer
sublanguage of English.
(1) (i) Data, such as the above, that is stored more or less
permanently in a computer we term a database.
(ii) The software that allows one or many persons to use
and/or modify this data is a database management
system (DBMS).
(iii) A major role of the DBMS is to allow the user to
deal with the data in abstract terms, rather than as
the computer stores the data.
(iv) In this sense, the DBMS acts as an interpreter for a
high-level programming language, ideally allowing
the user to specify what must be done, with little
or no attention on the user's part to the detailed
algorithms or data representation used by the
system.
(v) However, in the case of a DBMS, there may be far
less relationship between the data as seen by the
user and as stored in the computer, than between,
say, arrays as defined in a typical programming
language and the representation of those arrays in
memory.
3.1. The Word. The semantic descriptions of the words are
usually stored in the dictionary of an NLP system. The
analysis at the word level may be full or partial. The
analysis is full if every word of the analyzed text is
supposed to have a non-empty (i.e., distinct from just the
spelling) entry in the dictionary. The analysis is partial
if only some words must have an entry. Thus, an analysis of
(1i) as a sequence of three key words (for instance, in
automatic abstracting), as shown in (2), is definitely
partial.
(2) DATA COMPUTER DATABASE
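A partial key-word analysis of this kind can be sketched in a few lines. The sketch below is only an illustration, not the procedure of any particular system: the dictionary `KEYWORD_ENTRIES` and the function `keyword_analysis` are hypothetical names, and the choice of data, computer and database as the three key words of (1i) is an assumption.

```python
import re

# Hypothetical partial dictionary: only some words have a non-empty entry.
# The three descriptors are an assumed reading of the key words of (1i).
KEYWORD_ENTRIES = {
    "data": "DATA",
    "computer": "COMPUTER",
    "database": "DATABASE",
}


def keyword_analysis(sentence):
    """Return the descriptors of the words that have an entry, in text order."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [KEYWORD_ENTRIES[w] for w in words if w in KEYWORD_ENTRIES]


SENTENCE_1I = ("Data, such as the above, that is stored more or less "
               "permanently in a computer we term a database.")

print(keyword_analysis(SENTENCE_1I))
```

Every word without an entry is simply skipped, which is what makes the analysis partial rather than full.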
The analysis may be limited or unlimited. The analysis is
unlimited if the meaning of the word needs to be utilized in
its entirety. The analysis is limited if, for the purposes
of a given NLP system, it would suffice, for instance, to describe
the words in (3i) as physical objects and the words in (3ii)
as mental objects and omit all the other elements of their
meanings.
(3) (i) person, operator, computer
(ii) data, database, algorithm 
Another version of limited analysis would be to analyze
the meanings of the words to the point of distinguishing
each word from any other word and no further. Thus, operator
and computer can be distinguished in terms of semantic
description as shown in (4).
(4) (i) operator: Physical Object, Animate
(ii) computer: Physical Object, Inanimate
It is worth noting that while person and operator can be
similarly distinguished along the lines of (5), they cannot
be distinguished in the computer sublanguage and are,
therefore, complete synonyms. In other words, person is the
parent of operator in English as a whole but not in this
sublanguage.
(5) (i) person: Human
(ii) operator: Human, Using Gadget
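The effect of limiting the analysis to a sublanguage can be illustrated with feature sets. The sketch below is a minimal illustration under stated assumptions: the entries merge the features of (4) and (5), the features assigned to person beyond Human are assumed, and the inventory `SUBLANGUAGE_FEATURES` and the helper names are hypothetical.

```python
# Illustrative entries merging (4) and (5); the extra features on "person"
# are an assumption made for this sketch.
ENGLISH = {
    "person": {"Physical Object", "Animate", "Human"},
    "operator": {"Physical Object", "Animate", "Human", "Using Gadget"},
    "computer": {"Physical Object", "Inanimate"},
}

# Hypothetical inventory of the features a limited analysis retains.
SUBLANGUAGE_FEATURES = {"Physical Object", "Animate", "Inanimate"}


def restrict(lexicon, features):
    """Keep only the listed features in every entry (limited analysis)."""
    return {word: entry & features for word, entry in lexicon.items()}


def synonyms(lexicon, a, b):
    """Two words are complete synonyms if their descriptions coincide."""
    return lexicon[a] == lexicon[b]


sublanguage = restrict(ENGLISH, SUBLANGUAGE_FEATURES)
print(synonyms(ENGLISH, "person", "operator"))        # distinct in English
print(synonyms(sublanguage, "person", "operator"))    # merged in the sublanguage
print(synonyms(sublanguage, "operator", "computer"))  # still distinct
```

The retained features are just enough to keep operator apart from computer, which is the point at which this version of limited analysis stops.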
The analysis can use a number of methods. The first and
minimal one seems to be the set-membership approach, e.g.,
key-word analysis. Within this approach, words are assigned
to certain semantic classes, represented by what is often
called key words or descriptors, and this remains their only
characteristic. In more sophisticated versions, descriptors
may be further subcategorized, i.e., parent-child relations
among them can be set up, and dictionary entries will then
contain hierarchies of them, e.g., (6).
(6) data MENTAL OBJECT COMPUTER-RELATED
Second, a form of feature (or componential) analysis can
be used. The main distinction between feature analysis and
set membership is that, in the former, the features come
from different hierarchies. Thus, for (6) to be an example
of feature analysis rather than of descriptor analysis,
COMPUTER-RELATED should not be a child of MENTAL OBJECT in
the system.
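The difference between the two readings of (6) can be made concrete. The following sketch is an illustration only: the parent tables and the functions `ancestors` and `same_hierarchy` are hypothetical, and each table maps a descriptor to its parent (or to None for a root).

```python
# Descriptor (set-membership) reading of (6): one hierarchy, in which
# COMPUTER-RELATED is a child of MENTAL OBJECT.
DESCRIPTOR_PARENT = {"MENTAL OBJECT": None, "COMPUTER-RELATED": "MENTAL OBJECT"}

# Feature reading of (6): the two labels are roots of different hierarchies.
FEATURE_PARENT = {"MENTAL OBJECT": None, "COMPUTER-RELATED": None}


def ancestors(label, parent):
    """Return the chain of parents of a label, nearest first."""
    chain = []
    node = parent.get(label)
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    return chain


def same_hierarchy(a, b, parent):
    """True if one label dominates the other in the given parent table."""
    return a in ancestors(b, parent) or b in ancestors(a, parent)


entry = ["MENTAL OBJECT", "COMPUTER-RELATED"]  # the entry for "data" in (6)
print(same_hierarchy(*entry, DESCRIPTOR_PARENT))  # descriptor analysis
print(same_hierarchy(*entry, FEATURE_PARENT))     # feature analysis
```

The dictionary entry itself is identical in both cases; what differs is the system of labels behind it.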
Third, the dictionary entries may be set up as networks.
In linguistic semantics, the concept of semantic field (see,
for instance, Raskin 1983:31-2) corresponds to a primitive
network. In a pure network-based approach, only actual words
serve as the nodes - there are no metawords or categorial
markers (unlike in syntactical trees) and no primes (unlike
in feature analysis). The networks may have weighted or
unweighted links (edges); they may also, or alternatively,
be labeled or unlabeled. The number of labels may vary. The
labels can also be set up as the other kind of nodes.
Generally, the nodes can be equal (flat) or unequal
(hierarchical). Thus, redness may be set up as a node while
[...] is a slot of a physical object, connected with the
redness node by the link color.
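A network-based dictionary with optional labels and weights on the links might be sketched as follows. This is a minimal sketch, not a description of an existing system: the class `WordNetwork`, its methods, and the sample links and labels are all hypothetical.

```python
from collections import defaultdict


class WordNetwork:
    """Words as nodes; links may carry an optional label and weight."""

    def __init__(self):
        self._links = defaultdict(list)

    def add_link(self, source, target, label=None, weight=None):
        """Add a directed link; omit label/weight for unlabeled/unweighted links."""
        self._links[source].append((target, label, weight))

    def neighbors(self, word, label=None):
        """Return linked words, optionally only those reached by a given label."""
        return [target for target, link_label, _weight in self._links.get(word, [])
                if label is None or link_label == label]


# Hypothetical fragment of a network for the computer sublanguage.
net = WordNetwork()
net.add_link("data", "computer", label="stored-in", weight=0.9)
net.add_link("data", "database", label="part-of")
net.add_link("database", "computer")  # unlabeled, unweighted link

print(net.neighbors("data"))
print(net.neighbors("data", label="stored-in"))
```

Leaving `label` and `weight` optional lets the same structure cover the labeled/unlabeled and weighted/unweighted variants mentioned above.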
3.2. The Clause. The clause boundaries are obtained
through the application of a syntactic parser. The
full/partial dimension at this level deals with whether
every clause of the sentence is analyzed or some are
omitted, and the latter is not impossible. The
unlimited/limited dimension deals with the degree of detail of
the analysis along the various parameters (see below).
Decisions on both of the dimensions may be predetermined by
those taken at the word level. In general, the full/partial
and unlimited/limited dimensions become the more trivial and
obvious the higher the level. Accordingly, while fully
reflected at each level in the chart in (10), they will be
hardly mentioned in the subsequent subsections.
The most important decision to make at the clause level
is whether the output is structured or not. The unstructured
output will simply list the semantic characteristics of all
the words in the clause which have them, in the order of
their appearance. The only clause-related information in
such a case will be the clause boundaries.
The structured output may be dependent on the
natural-language syntax of the clause or not. The accepted
terms are: semantic interpretation for
syntactically dependent outputs, and semantic
representation otherwise. In a typical semantic
representation, a tree-like structure, such as (10) (cf.
Nirenburg et al. 1985:233), may be set up for clauses
instead of their regular syntactic structures, with the
nodes and/or link labels being of a different nature.
An event with its actants as in (7ii) should be an obvious
possible choice for the analysis of the clause. The
structures may be more or less distant from the syntactic
structure (in any guise) but the presence of just one
semantic node or - more often - link label would render them
non-syntactic.
(7) (i)  [data] is stored more or less permanently in
         the computer
    (ii)                     store
         agent     object   time     space      goal
         operator  data     always   computer   maintain-database
In (7ii), the deviations from syntactic structure abound
and include most prominently 1) different link labels, e.g.,
goal; 2) substitution of sublanguage-determined paraphrases,
e.g., always for more or less permanently; 3) information
not contained in the clause and supplied from the
sublanguage knowledge base, e.g., goal -- maintain-database.
Whether information for the semantic analysis of the
clause is supplied from outside of the clause as well as
from inside it, or only from inside, determines
whether the analysis is supracompositional or compositional.
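The event-with-actants structure of (7ii) and the compositional/supracompositional distinction can be sketched together by recording where each slot filler comes from. The sketch is illustrative only: the classes `Slot` and `ClauseRepresentation` and the `source` tags are hypothetical, and the source assigned to the agent slot is an assumption, since only the goal slot is explicitly said to come from the knowledge base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    label: str    # link label, e.g. "goal"
    filler: str   # node reached by the link
    source: str   # "clause" or "knowledge_base" (hypothetical tags)


@dataclass(frozen=True)
class ClauseRepresentation:
    event: str
    slots: tuple

    def is_compositional(self):
        """Compositional: every filler is supplied from inside the clause."""
        return all(slot.source == "clause" for slot in self.slots)


# (7ii); the source of "operator" is assumed, not stated in the text.
store = ClauseRepresentation(
    event="store",
    slots=(
        Slot("agent", "operator", "knowledge_base"),
        Slot("object", "data", "clause"),
        Slot("time", "always", "clause"),
        Slot("space", "computer", "clause"),
        Slot("goal", "maintain-database", "knowledge_base"),
    ),
)

print(store.is_compositional())
```

Dropping the two knowledge-base slots from the tuple would yield a compositional analysis of the same clause.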
Finally, the clause analysis may include or exclude
suprapropositional information. Purely propositional
analysis will basically analyze the clause as a sentence.
Thus, (7i) will be analyzed without the square brackets
around data, which signify that the word is the supplied
antecedent for a pronominal entity (that).
Suprapropositional analysis typically subsumes propositional
analysis and adds to it the information on the links of the
clause with the other clauses of its own and/or the adjacent
sentences. Thus, in the case of (7i), that should be related
to data two clauses earlier and the nature of the link
should be described: syntactically, it is a relative clause;
however, a semantic label, such as EXPANSION, would be much
more informative (see also below).
3.3. The Sentence. The first important phenomenon to
consider at the sentence level is whether the sentence is
represented as a clausal discourse structure or not. If the
sentence is not represented as such a structure, it becomes
simply a sequence of clauses united by syntactical
dependency information. Such a sequence will not be much
distinct from a sequence of monoclausal sentences, except
that some of them will be clustered together. If the clausal
discourse structure is there, it will probably be presented
as a graph with the clauses for nodes and relations between
them for link labels. Again, as in the case of the clause,
the link labels may range from syntactic terms to
semantic relations. A more semantically informative
structure, with semantic link labels, is illustrated in (8)
for (1i):
(8) Data ... we term a database
      such as the above
      that is stored more or less permanently in a computer
Semantic link labels are often associated with
non-syntactic clauses being distinguished - thus, such as
the above is not a full-fledged syntactic clause.
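A clausal discourse structure of this kind is naturally stored as a labeled directed graph. The sketch below is a minimal illustration: the class `DiscourseGraph` and its methods are hypothetical, and the use of EXPANSION on both links is an assumption (the label is suggested in 3.2 for the relative clause only).

```python
class DiscourseGraph:
    """Directed graph whose nodes are clauses and whose edges carry link labels."""

    def __init__(self):
        self.nodes = []
        self.edges = []  # (source index, label, target index)

    def add_node(self, text):
        """Add a clause and return its index."""
        self.nodes.append(text)
        return len(self.nodes) - 1

    def link(self, source, label, target):
        self.edges.append((source, label, target))

    def dependents(self, source, label=None):
        """Return the clauses linked from `source`, optionally by one label."""
        return [self.nodes[tgt] for src, lab, tgt in self.edges
                if src == source and (label is None or lab == label)]


# Clauses of (1i); both link labels are assumed for illustration.
graph = DiscourseGraph()
main = graph.add_node("Data ... we term a database")
above = graph.add_node("such as the above")
stored = graph.add_node("that is stored more or less permanently in a computer")
graph.link(main, "EXPANSION", above)
graph.link(main, "EXPANSION", stored)

print(graph.dependents(main, "EXPANSION"))
```

The same structure can hold sentences instead of clauses as nodes, so it would also serve for a sentential discourse structure at the paragraph level.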
Like clause analysis, sentence analysis may be
compositional or supracompositional. There is much more
supracompositional information available at this level than
at the clause level. The supracompositional information is,
of course, knowledge-based. It can include 1) semantic field
information for words (paradigmatic semantic information),
i.e., that computer in (1) is a machine or a mechanical
device and that certain other words, probably not in the
sublanguage, are fellow members of the field; 2) information
on the relations of the sentence with the world or subworld
(for a sublanguage), e.g., for (1), the meaning of each
sentence is clarified if semantic analysis utilizes a rule
about the subworld, namely that every mental object in the
subworld is located in the computer memory; 3) speech act
information, i.e., whether the sentence is an assertion, a
question, a command or any other possible value of the
illocutionary-force variable (see Nirenburg et al.
1985:234); 4) information on the links of the sentence with
other sentences (see the next paragraph); 5) given/new
information, e.g., that this data is given in (1ii); 6) main
clause information.
Information on the links of the sentence with other
sentences includes connectives, both explicit as, for
instance, however in (1v), and implicit. This information is
crucial for establishing the discourse structure of the
paragraph (see 3.4). Such information is used only in
systems which accommodate extrasentential information and
ignored by systems with exclusively sentential information.
Finally, each sentence can be characterized as to the
goal it expresses. In a textbook exposition like (1), the
goal tends to be monotonous - it is to convey information or
to teach, but in a narrative text with protagonists or in a
dialogue, goals can vary with each cue (see Schank and
Abelson 1977; Reichman 1985).
3.4. The Paragraph. The semantic analysis of the
paragraph may include its representation as a sentential
discourse structure or not include it. If there is no such
representation, then similarly to sentence analysis, the
paragraph will be treated simply as a linear sequence of
sentences. Otherwise, the paragraph may be represented as a
graph with sentences for nodes and with relations between
the sentences for link labels. No standard syntactical
nomenclature is available for this level. Using one simple
semantic link label, (1) may be represented as (9):
(9) (1i) Expansion
(1ii) Expansion Expansion Expansion
(1iii) (1iv) (1v)
Because of the nature of (1) and of its sublanguage, the
links between the sentences are much less diverse than in
casual discourse - and this is good for NLP. It is possible,
and often advisable, to combine the clausal structures of the
sentences and the sentential structures of the paragraph in
one graph, because frequently a clause in one sentence is
linked to a clause in another rather than the whole sentence
to the other, and the resulting graph is more informative.
It is also important to decide at this level whether to
develop paragraph topic extraction or not. For the former
option, the paragraph can be summarized by creating a new
sentence or, alternatively, one of the existing sentences is
selected to "represent" the whole paragraph.
3.5. The Text. The questions of paragraph discourse
structure and of textual topic extraction arise here
similarly to paragraph analysis.
4. A Semantic Metric for NLP.
(10) summarizes all the main options for semantic
analysis in NLP ([...] level).
(10) Semantic Metric for NLP:

WORD                     CLAUSE     SENTENCE     PARAGRAPH      TEXT
±Full                    ±Full      ±Full        ±Full          ±Full
±Limited                 ±Limited   ±Limited     ±Limited       ±Limited
Method: set/feature/net  ±Comp.     ±Cl.Bound.   ±Sen.Bound.    ±Para.Bound.
                         ±Prop.     ±Disc.Str.   ±Disc.Str.     ±Disc.Str.
                                    ±Comp.       ±Topic Extr.   ±Topic Extr.
Each NLP system can use (10) to chart out its own
method of semantic analysis, both before and after its
formulation, and to compare itself with any other system
(the actual metric is derived from (10) by adding an obvious
measure of distance). Naturally, there are fewer possible
basic types of semantic analysis in NLP than
$3 \times 2^{21} > 5 \times 10^{6}$,
simply because many values in (10) determine others and
render many combinations incompatible. On the other hand,
there are variations within the basic types.
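The upper bound and a possible distance measure can be checked with a short sketch. Everything below rests on stated assumptions: the feature inventory is one reading of chart (10) (three word-level methods plus 21 binary values), the names `BINARY_FEATURES`, `make_profile` and `distance` are hypothetical, and Hamming distance is only one candidate for the "obvious measure of distance", which the text does not specify.

```python
# Assumed reading of chart (10): binary (+/-) values per level.
BINARY_FEATURES = {
    "word": ["full", "limited"],
    "clause": ["full", "limited", "comp", "prop"],
    "sentence": ["full", "limited", "cl_bound", "disc_str", "comp"],
    "paragraph": ["full", "limited", "sen_bound", "disc_str", "topic_extr"],
    "text": ["full", "limited", "para_bound", "disc_str", "topic_extr"],
}
WORD_METHODS = ("set", "feature", "net")


def upper_bound():
    """Number of value combinations if all choices were independent."""
    n_binary = sum(len(features) for features in BINARY_FEATURES.values())
    return len(WORD_METHODS) * 2 ** n_binary


def make_profile(method, on=()):
    """Chart a system: `on` lists the (level, feature) pairs valued '+'."""
    if method not in WORD_METHODS:
        raise ValueError(f"unknown word-level method: {method}")
    profile = {(level, feature): False
               for level, features in BINARY_FEATURES.items()
               for feature in features}
    for key in on:
        if key not in profile:
            raise KeyError(key)
        profile[key] = True
    profile[("word", "method")] = method
    return profile


def distance(a, b):
    """Hamming distance: the number of chart cells on which two systems differ."""
    return sum(1 for key in a if a[key] != b[key])


system_a = make_profile("set", on=[("word", "full")])
system_b = make_profile("net", on=[("word", "full"), ("clause", "comp")])
print(upper_bound())
print(distance(system_a, system_b))
```

Under this reading the bound evaluates to 3 x 2^21 = 6,291,456, which indeed exceeds 5 x 10^6; the two sample systems differ in the word-level method and in one clause-level value.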
The proposed metric is just one part of ASLT/NLP. The
complete ASLT/NLP adds the following parts to the metric: 1)
mutual determination and exclusion of values in (10); 2)
choices for execution of each value; 3) relations between
NLP needs and values and combinations of values.
It should be noted that besides ensuring the total
modularity of semantic analysis in NLP by providing the
full/partial and unlimited/limited values for each level,
this part of the theory is itself modular in the sense that
any value or option, which may have been left out
inadvertently or which may emerge in the future, can be
added to (10) without any problem.

References

Chomsky, N. 1965. Aspects of the Theory of Syntax.
Cambridge, MA: M.I.T. Press.

Kittredge, R. and J. Lehrberger 1982. Sublanguage: Studies
of Language in Restricted Semantic Domains. Berlin
- New York: de Gruyter.

Nirenburg, S. (ed.) 1985. Proceedings of the Conference on
Theoretical and Methodological Issues in Machine
Translation of Natural Languages. Hamilton, N.Y.:
Colgate University.

Nirenburg, S., V. Raskin, and A. B. Tucker 1985.
"Interlingua design for TRANSLATOR." In: Nirenburg
(1985), pp. [...].

Raskin, V. (V.) 1971. K teorii jazykovyx podsistem /Toward a
Theory of Linguistic Subsystems/. Moscow: Moscow
University Press.

Raskin, V. 1974. "A restricted sublanguage approach to high
quality translation." American Journal of
Computational Linguistics 11:3, Microfiche 9.

Raskin, V. 1983. A Concise History of Linguistic Semantics.
W. Lafayette, IN: Purdue University, 3rd ed.

Raskin, V. 1985a. "Linguistic and encyclopedic information
in text processing." Quaderni di Semantica VI:1,
pp. 92-102.

Raskin, V. 1985b. "Linguistics and natural language
processing." In: Nirenburg (1985), pp. 268-82.

Reichman, R. 1985. Getting the Computer to Talk Like You and
Me. Cambridge, MA: M.I.T. Press.

Schank, R. and R. Abelson 1977. Scripts, Plans, Goals, and
Understanding. Hillsdale, N.J.: L. Erlbaum.

Schank, R., L. Birnbaum, and J. Mey 1985. "Integrating
semantics and pragmatics." Quaderni di Semantica
VI:2, pp. 313-24.

Ullman, J. D. 1982. Principles of Database Systems.
Rockville, MD: Computer Science Press, 2nd ed.
