AN INTEGRATED ENVIRONMENT
FOR LEXICAL ANALYSES¹
C. CALIGARIS, A. CAPPELLI, M. N. CATARSI, L. MORETTI
Istituto di Linguistica Computazionale - CNR
Pisa, Italy
0. Introduction
This article describes a project whose aim is to specify tools to be integrated in an environment for lexical analysis. As a result, a prototype workbench has been created which provides the user with several modules with different functions, so that a text can be approached from different viewpoints.
The prototype has been implemented on the Macintosh. Every module can be used autonomously; once integrated in the environment, the modules form a sort of network of tools interacting with one another.
Let us take a look at the individual components of the system.
Firstly, the user has at his disposal tools for processing a text in order to obtain indexes, concordances, lemmatizations and various types of statistical analyses.
The prototype also supplies representational tools for structuring knowledge.
A module containing an ontological reference scheme may be used to show a network of relationships between concepts or to suggest the description of single concepts.
The user is also given a further possibility: access, starting from any node in the ontological network, to a lexical archive indicating all the terms that describe a specific conceptual field, together with their definitions.
In this way, the system helps in the interactive treatment of texts and makes it possible to analyze and organize various types of information about a text.
The front-end and certain modules have been implemented using HyperCard™. This has certain consequences on the interface to the global system, and on the structure and function of each single component.
In a hypermedia framework, a text is no longer a sequence of words or sentences, as it phenomenologically appears to a user, but a virtual network of the associations implicit in it.
In this way, the substance of a text coincides with the set of its possible readings: its informative content is a magma of fragments whose sense is re-created in the path of each reading.
From a theoretical viewpoint, a hypertext denotes a non-linear writing whose structure is a set of nodes linked by arcs. Nodes contain informative contents, while arcs represent the possible associations between different informative contents, in accordance with the logic of the hypertext itself.
To sum up, the organization of the different knowledge sources within the system supports a human operator working on a text from different viewpoints by using the computational metaphor of hypertexts as a means of presentation of data: he can
consult a library of electronic books, generate and consult 
lexical archives and indexes of frequencies, and 
contextualize words representing the knowledge of a text, 
while using knowledge sources of different types as a 
control and a guide. The global architecture of the system
is shown in figure 1.
Figure 1: Global architecture of the system.
1. Lexical Treatment of Texts 
The user has at his disposal certain tools by which he can build and consult several sources, each of which constitutes a sub-environment with its own specific tools. In particular, a library offers a set of texts to be treated with a set of lexicographic tools (ELT, Elaborazione Lessicale Testi) (Moretti, 1991).
¹ This work was partially supported by Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo of C.N.R.
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 935 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
The environment 'text' is composed of electronic books, 
and it allows the user to perform all classical operations of 
text processing with the text 'on line'. In particular, 
concordances can be obtained by choosing the length of 
the context, lists of frequencies or variants can be shown, 
and lemmatization can be performed interactively by using 
the lexical archive as a guide. 
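The concordance, frequency and lemmatization operations just described can be illustrated with a short Python sketch. It is not the ELT implementation: the tokenizer, the function names and the form-to-lemma dictionary standing in for the lexical archive are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercased word tokens, internal apostrophes kept.
    return re.findall(r"\w+(?:'\w+)?", text.lower())

def concordance(tokens: list[str], keyword: str, width: int) -> list[str]:
    """Keyword-in-context lines with `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def frequency_list(tokens: list[str]) -> list[tuple[str, int]]:
    """Word forms by decreasing frequency; ties are broken alphabetically."""
    return sorted(Counter(tokens).items(), key=lambda kv: (-kv[1], kv[0]))

def lemma_candidates(form: str, archive: dict[str, list[str]]) -> list[str]:
    """Candidate lemmas for a form; the user chooses when several are offered.
    `archive` is a hypothetical form -> lemmas dictionary, not the real archive."""
    return archive.get(form, [form])

tokens = tokenize("The cat sat on the mat. The dog sat too.")
print(concordance(tokens, "sat", 2))   # ['the cat [sat] on the', 'the dog [sat] too']
print(frequency_list(tokens)[:2])      # [('the', 3), ('sat', 2)]
print(lemma_candidates("saw", {"saw": ["see", "saw"], "sat": ["sit"]}))  # ['see', 'saw']
```

Returning all candidate lemmas, rather than picking one, reflects the interactive character of lemmatization: the archive only guides the choice.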
Hypermedia technology makes it possible to approach the text in several ways, since the fragments of a text can be linked in accordance with a possible reading criterion. In this sense, it is possible to compare different critical editions or to follow the text according to linguistic or stylistic features.
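A hypertext of this kind can be pictured as fragments (nodes) joined by arcs, each arc belonging to one reading criterion. The following is a minimal Python sketch under simplifying assumptions: the class and method names are hypothetical, and each fragment has at most one successor per criterion.

```python
class Hypertext:
    """Text fragments as nodes; arcs are grouped by the reading criterion they encode."""

    def __init__(self) -> None:
        self.fragments: dict[str, str] = {}
        # criterion -> {source fragment id -> next fragment id}
        self.arcs: dict[str, dict[str, str]] = {}

    def add_fragment(self, fid: str, text: str) -> None:
        self.fragments[fid] = text

    def link(self, criterion: str, src: str, dst: str) -> None:
        self.arcs.setdefault(criterion, {})[src] = dst

    def reading(self, criterion: str, start: str) -> list[str]:
        """Follow one criterion's arcs from `start` until a dead end or a cycle."""
        arcs = self.arcs.get(criterion, {})
        path, seen = [], set()
        node = start
        while node is not None and node not in seen:
            seen.add(node)
            path.append(self.fragments[node])
            node = arcs.get(node)
        return path

h = Hypertext()
h.add_fragment("a1", "verse 1, edition A")
h.add_fragment("a2", "verse 2, edition A")
h.add_fragment("b1", "verse 1, edition B")
h.link("edition A", "a1", "a2")          # linear reading of one edition
h.link("compare editions", "a1", "b1")   # cross-edition comparison
print(h.reading("edition A", "a1"))         # ['verse 1, edition A', 'verse 2, edition A']
print(h.reading("compare editions", "a1"))  # ['verse 1, edition A', 'verse 1, edition B']
```

The same fragment thus takes part in several readings, one per criterion.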
2. Knowledge Representation Language 
The knowledge representation language belongs to the family of hybrid systems, and is made up of a terminological component and an assertional one, although certain characteristics make it more similar to classical KL-ONE (Cappelli et al., 1983; Brachman & Schmolze, 1985; Nebel, 1989).
The terminological part may be used for the definition of 
generic concepts, representing classes of objects, while the 
assertional part is used for the definition of individual 
concepts, representing single objects. 
The structures of the terminological part serve to specify the properties of the generic concept being defined.
The principle of inheritance applies among the concepts of the network: a sub-concept inherits the properties of its superconcept, even if these are not explicitly declared.
Furthermore, it is possible to indicate, by means of other generic concepts, the relationships that hold between the properties of the generic concept being defined: these relationships are known as structural descriptions.
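Property inheritance can be sketched as follows. This is a hypothetical Python model, not the system's data structure: a role carries a value restriction and number restrictions, and a generic concept collects the roles of its superconcepts before adding its own. The properties given to team are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Role:
    value_restriction: str        # generic concept the fillers must belong to
    min: int = 0                  # number restriction: at least
    max: Optional[int] = None     # number restriction: at most (None = no bound)

@dataclass
class Generic:
    name: str
    supers: list["Generic"] = field(default_factory=list)
    roles: dict[str, Role] = field(default_factory=dict)

    def properties(self) -> dict[str, Role]:
        """Roles inherited from superconcepts, overridden by locally declared ones."""
        props: dict[str, Role] = {}
        for sup in self.supers:
            props.update(sup.properties())
        props.update(self.roles)
        return props

team = Generic("team", roles={"member": Role("person", 1), "coach": Role("person", 0, 1)})
football_team = Generic("football-team", [team], {"member": Role("football-player", 11)})
print(sorted(football_team.properties()))   # ['coach', 'member']
```

In a real classifier the inherited and the local restrictions on member would be conjoined. Here the local declaration simply replaces the inherited one, which is only adequate when the local restriction is the stronger one, as in this example.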
The syntax of the terminology is as follows:
<terminology> ::= <generic declarations> ;
    <role declarations> ;
    <paraindividual declarations> ;
<generic declarations> ::= (<generic identifier> = <generic>)*
<role declarations> ::= (<role identifier> = <role>)*
<paraindividual declarations> ::= (<paraindividual identifier> = <paraindividual>)*
<generic> ::= <generic identifier> |
    thing |
    (primC <index>) |
    (and <generic> <generic>) |
    (or <generic> <generic>) |
    (all <role> <generic>) |
    (atleast <number> <role>) |
    (atmost <number> <role>) |
    (sd <paraindividual> <generic>)
<role> ::= <role identifier> |
    (primR <generic> <name>)
<paraindividual> ::= <paraindividual identifier> |
    (paraindividual <generic> <name>+)
<generic identifier> ::= character string
<role identifier> ::= character string
<paraindividual identifier> ::= character string
<name> ::= character string
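To make the grammar concrete, here is a small Python sketch that reads a parenthesised expression and checks it against the <generic>, <role> and <paraindividual> productions. The parser and checker are illustrative assumptions, not part of the system. Note that the grammar gives and and or two arguments, while the football-team example in this section applies and to three, so the sketch accepts two or more.

```python
import re

def parse(src: str):
    """Parse one parenthesised expression into nested lists of strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", src)

    def read(pos: int):
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        if tok != "(":
            return tok, pos + 1
        items, pos = [], pos + 1
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = read(pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, pos + 1

    expr, end = read(0)
    if end != len(tokens):
        raise SyntaxError("trailing input")
    return expr

def is_generic(e) -> bool:
    if isinstance(e, str):
        return True                       # thing, or a <generic identifier>
    if not e or not isinstance(e[0], str):
        return False
    head, args = e[0], e[1:]
    if head in ("and", "or"):             # two or more arguments (see lead-in)
        return len(args) >= 2 and all(is_generic(a) for a in args)
    if head == "primC":
        return len(args) == 1 and isinstance(args[0], str)
    if head == "all":
        return len(args) == 2 and is_role(args[0]) and is_generic(args[1])
    if head in ("atleast", "atmost"):
        return (len(args) == 2 and isinstance(args[0], str)
                and args[0].isdigit() and is_role(args[1]))
    if head == "sd":
        return len(args) == 2 and is_paraindividual(args[0]) and is_generic(args[1])
    return False

def is_role(e) -> bool:
    return isinstance(e, str) or (
        len(e) == 3 and e[0] == "primR" and is_generic(e[1]) and isinstance(e[2], str))

def is_paraindividual(e) -> bool:
    return isinstance(e, str) or (
        len(e) >= 3 and e[0] == "paraindividual" and is_generic(e[1])
        and all(isinstance(n, str) for n in e[2:]))

print(is_generic(parse("(and team (all member football-player) (atleast 11 member))")))  # True
print(is_generic(parse("(atleast many member)")))  # False
```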
The structures of the assertional part serve to define 
individual concepts by specifying the values assumed by 
the properties of the corresponding generic concept. 
The language is based on an intensional semantics, 
formally specified in Mazzeranghi (1991), and its 
constructors are interpreted on a universe of structured 
objects. In other words, the denotation of a generic 
concept is represented by its properties. 
It is thus suitable to account for complex processes 
involving properties of objects which are specific to the 
linguistic analysis of a text and, in particular, to the 
structuring of lexical knowledge. 
The expressive power of the language has been further increased in order to account for other conceptual facts, such as recursive definitions (father/mother) or definitions expressed by procedures (length, addition, subtraction) (Mazzeranghi, 1991).
As an example, the partial definition of the concept football-team is the following:
    football-team = (and team
        (all member football-player)
        (atleast 11 member))
that is to say, a football-team is a type of team whose members are football players and number at least 11. The denotation of football-team is the following:
$$
I(\text{football-team}) = \mathrm{PROD}\Big(t_1\big([T_1]_{min_1}^{max_1}\big), \ldots, t_n\big([T_n]_{min_n}^{max_n}\big), \underline{member}\big([P]_{11}^{nil}\big)\Big)
$$
where:
- PROD denotes the Cartesian product;
- $[A]_{min}^{max}$ denotes the lists of elements belonging to A whose length is between min and max (if max = nil, there is no upper bound on the length of the lists);
- $\underline{member}$, the name of the role member, acts as a type constructor;
- $t_1, \ldots, t_n$ are the names of the properties inherited by team;
- $T_1, \ldots, T_n$ are the value restrictions of the properties inherited by team;
- $min_1, max_1, \ldots, min_n, max_n$ are the number restrictions of the properties inherited by team;
- P is the denotation of football-player.
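The two type constructors in this formula, the bounded list $[A]_{min}^{max}$ and the labelled product PROD, can be mimicked with membership predicates. This Python sketch is only an extensional approximation of the intensional semantics in Mazzeranghi (1991). The single inherited property of team (a one-element name list) and the test for football-player are hypothetical stand-ins.

```python
from typing import Any, Callable, Optional

Pred = Callable[[Any], bool]   # a set, represented by its membership test

def bounded_lists(elem: Pred, lo: int, hi: Optional[int]) -> Pred:
    """[A]_lo^hi: lists of elements of A with length between lo and hi (hi=None is nil)."""
    def member_of(value: Any) -> bool:
        return (isinstance(value, list)
                and len(value) >= lo
                and (hi is None or len(value) <= hi)
                and all(elem(v) for v in value))
    return member_of

def prod(**components: Pred) -> Pred:
    """PROD over labelled components: records with exactly these labels, each in its set."""
    def member_of(value: Any) -> bool:
        return (isinstance(value, dict)
                and value.keys() == components.keys()
                and all(test(value[label]) for label, test in components.items()))
    return member_of

def is_player(x: Any) -> bool:
    # Hypothetical stand-in for P, the denotation of football-player.
    return isinstance(x, str) and x.startswith("player-")

football_team = prod(
    name=bounded_lists(lambda s: isinstance(s, str), 1, 1),  # t_1([T_1]_1^1), invented
    member=bounded_lists(is_player, 11, None),               # member([P]_11^nil)
)
eleven = [f"player-{i}" for i in range(11)]
print(football_team({"name": ["Pisa"], "member": eleven}))       # True
print(football_team({"name": ["Pisa"], "member": eleven[:10]}))  # False
```

The label member plays the part of the type constructor: it names the component of the product whose values must be lists of at least 11 football players.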
The denotation of football-team is graphically represented 
in figure 2 (where circles represent denotations of generic 
concepts and squares represent denotations of roles). 
The language can be used to interrogate the ontological module, which can give information about both the syntax and the semantics of the definition of a concept; this information can in turn be transferred into the body of a programme specified in terms of the language itself.
3. Ontological module 
The ontological module serves to guide the user in the 
acquisition and structuring of knowledge by suggesting 
hypotheses about the description of concepts and their possible relationships.
Figure 2: Denotation of football-team: the roles T1 (min1, max1), ..., Tn (minn, maxn) inherited from team, and the role member linked to football-player.
At present, it contains a collection of two hundred concepts organised in the form of a semantic network, with which it is possible to classify a vast portion of reality.
This leads to a taxonomy which serves as an ontological reference guide, suggesting the map of possible relationships between concepts and the most plausible elements of their structure.
3.1. Ontological Theories 
Many theories have been proposed about ontological descriptions of concepts (Smith & Medin, 1981).
In the classical model, concepts are described by using necessary and sufficient conditions. In other models, proposed by psychologists, descriptive elements are partitioned into properties and dimensions, the former being labels assuming binary truth values, the latter assuming only numerical values. In certain cases, descriptive elements are related to their definiendum on the basis of probabilistic parameters or fuzzy logic.
A taxonomy of part-whole relations has also been proposed (Winston et al., 1987; Frederking & Gehrke, 1988), where properties are classified into six types (component/integral object, member/collection, portion/mass, stuff/object, feature/activity, place/area) and deductions can be performed according to certain principles which govern the relation between the definiendum and its descriptive parts, such as, for instance, transitivity.
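A transitivity principle of this kind can be sketched as a closure over (part, relation type, whole) facts. The sketch assumes, following the usual reading of Winston et al. (1987), that chaining is licensed only when both links are of the same part-whole type; the facts themselves are invented for the example.

```python
PART_WHOLE_TYPES = {
    "component/integral object", "member/collection", "portion/mass",
    "stuff/object", "feature/activity", "place/area",
}

def infer(facts: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Close (part, relation type, whole) facts under same-type transitivity."""
    for _, rel, _ in facts:
        if rel not in PART_WHOLE_TYPES:
            raise ValueError(f"unknown part-whole type: {rel}")
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for a, r1, b in list(closed):
            for c, r2, d in list(closed):
                # Chain a-b and b-d only when both links are of the same type.
                if b == c and r1 == r2 and (a, r1, d) not in closed:
                    closed.add((a, r1, d))
                    changed = True
    return closed

facts = {
    ("handle", "component/integral object", "door"),
    ("door", "component/integral object", "house"),
    ("arm", "component/integral object", "Simpson"),
    ("Simpson", "member/collection", "department"),
}
closed = infer(facts)
print(("handle", "component/integral object", "house") in closed)  # True
print(any(p == "arm" and w == "department" for p, _, w in closed))  # False
```

The second query shows why the type matters: an arm is part of Simpson and Simpson is a member of a department, yet the arm is not part of the department.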
Ontologists have proposed global models on the basis of types of concepts and of their properties. The world is then partitioned into substances and accidents, and certain classical notions are defined, such as genus, eidos, etc. (Simons, 1983).
Körner (1970) defines a categorial framework as a whole in which epistemological, logical and ontological aspects are intertwined.
Keil (1989) introduces a division into three general types of concepts (natural, nominal and artifact) and describes criteria for their individuation and description.
Knowledge-based systems using large knowledge bases organized on the basis of ontological principles have been proposed in Artificial Intelligence (Nirenburg & Monarch, 1987; Lenat & Guha, 1990; Onyshkevych & Nirenburg, 1991).
Briefly, efforts have been devoted to finding criteria for structuring the world by individuating both general types of concepts, which impose general constraints on their subtypes, and types of properties which are pertinent to specific types of concepts. In particular, the logic which governs the relationship between a definiendum and its definiens has been investigated, even if so far the results are far from definitive.
3.2. Ontological classification
To be epistemologically adequate, an ontology must include i) a taxonomy of concepts with their descriptions, and ii) classification and individuation principles associated with concepts.
3.2.1. Taxonomy
As regards the construction of the taxonomy, certain options have been adopted, with the aim of accounting for aspects of the inner nature of concepts and guaranteeing a consistent method of knowledge acquisition and, consequently, a plausible level of inferential power.
At the top of the taxonomy, as "pure ontological" summa genera, a distinction has been drawn between natural kinds (apple, lion), nominal kinds (mayor) and artifact kinds (car, chair).
Natural kinds are those existing in nature and are described by the natural sciences; they "refer to classes of things that occur in the world independently of human activities" (Keil, 1989, p. 25). Artifacts are elements intentionally built to perform a specific function. Nominal kinds are more abstract entities which consist of a description (mayor) which can be applied to instances belonging to different kinds.
This distinction between ontological kinds is relevant in order to structure the universe into chunks of knowledge which are homogeneous from an inferential point of view.
Let us introduce an example in order to clarify the structure of the map.
The nominal kind "mayor" can be applied to a person who is a human being (a natural kind), and it denotes a temporary status of such a human being. To be no longer a mayor does not imply the negation of the existence of an individual, while to negate its essence as a human being does. This classification obviously has effects on the ontological existence of objects (Wiggins, 1980; Keil, 1989). From the point of view of the topological structure of the map, this phenomenon creates a complex chunk of knowledge, as shown in figure 3.
Only a correct arrangement of the concepts involved guarantees the right instantiation of individuals, thus allowing true inferences.
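The mayor example can be sketched as a small Python model in which an individual's natural kind is fixed, while nominal kinds are statuses it may take and drop. All names here, including the table of which nominal kind applies to which natural kind, are hypothetical and serve only to illustrate the distinction.

```python
from dataclasses import dataclass, field

# Hypothetical table: each nominal kind and the natural kind it can be applied to.
APPLIES_TO = {"mayor": "human being"}

@dataclass
class Individual:
    name: str
    natural_kind: str                                 # essential: what the individual is
    statuses: set[str] = field(default_factory=set)   # nominal kinds currently held
    exists: bool = True

    def take_status(self, nominal: str) -> None:
        if APPLIES_TO.get(nominal) != self.natural_kind:
            raise ValueError(f"{nominal!r} cannot be applied to a {self.natural_kind}")
        self.statuses.add(nominal)

    def drop_status(self, nominal: str) -> None:
        self.statuses.discard(nominal)   # no longer a mayor, but still exists

    def negate_natural_kind(self) -> None:
        self.exists = False              # denying the essence denies the individual
        self.statuses.clear()

rossi = Individual("Rossi", "human being")
rossi.take_status("mayor")
rossi.drop_status("mayor")
print(rossi.exists, rossi.statuses)   # True set()
```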
3.2.2. Descriptions of concepts
In describing a concept, certain inherent properties are expressed. To be something means sharing certain types of descriptive parts with a set of other concepts. The description of a single concept has to express the properties on the basis of which it can be differentiated and individuated.
In the ontological map, a set of property types is associated with each concept; as a whole, this set constitutes a guiding reference scheme for the description of all the concept's dependent subconcepts.
Figure 3 
As an example, the concept "container" is associated with a set containing the following types of properties: content, stuff, shape, function, and component.
It is worth noting that these are types of properties, to which specific values can be associated in the description of each single subconcept or individual.
On the basis of these types, a set of constraints can be specified, for instance:
- the property 'Stuff' follows the part-whole taxonomic model described in Winston et al. (1987) and Frederking & Gehrke (1988);
- the property 'Content' is organized on the basis of the "place/area" model, where the following transitivity principle holds: if in(x,y) and in(z,x) then in(z,y);
- 'Shape' in certain cases refers to the shape of one of the components of a container, which may coincide with the shape of the whole;
- 'Component' also follows the part-whole model;
- 'Contextual use' is to be understood as a social and not a functional use, the latter being the specific use of containing something.
To sum up, every type of property is interpreted through a 
specific set of rules. In this way, a sort of infinite lattice 
structure is realized where different axiomatic systems of 
knowledge coexist (see figure 4), each of which has its 
own interpreter and interacts with the others (Woods, 
1990). 
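The idea that each property type has its own interpreter can be sketched as a dispatch table. In this hypothetical Python version, 'content' and 'component' are answered by a transitive closure (for 'content' this is exactly the rule if in(x,y) and in(z,x) then in(z,y)), while 'shape' and 'contextual use' return only the stated values. The assignment of interpreters to types is an assumption for illustration.

```python
from typing import Callable

Facts = dict[str, set[str]]   # e.g. container -> things directly in it

def transitive(facts: Facts, x: str) -> set[str]:
    """Every z reachable from x, i.e. all z with in(z, x) under the transitivity rule."""
    found: set[str] = set()
    stack = [x]
    while stack:
        for y in facts.get(stack.pop(), set()):
            if y not in found:
                found.add(y)
                stack.append(y)
    return found

def direct(facts: Facts, x: str) -> set[str]:
    """No inference: only the values stated for x."""
    return set(facts.get(x, set()))

# Hypothetical assignment of an interpreter to some property types of "container".
INTERPRETERS: dict[str, Callable[[Facts, str], set[str]]] = {
    "content": transitive,        # place/area model
    "component": transitive,      # part-whole model
    "shape": direct,
    "contextual use": direct,
}

def interpret(prop_type: str, facts: Facts, x: str) -> set[str]:
    return INTERPRETERS[prop_type](facts, x)

content = {"box": {"bag"}, "bag": {"coin"}}
print(sorted(interpret("content", content, "box")))   # ['bag', 'coin']
```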
3.2.3. Individuation principles
The map has been created by using the knowledge representation language described above, which supports classification and individuation principles.
The calculus of the properties of a concept makes it possible to build concepts using constructs such as and, or and not applied to roles of concepts, to compare concepts, or to classify concepts on the basis of their whole structures.
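Comparing and classifying concepts by their whole structure is the job of a subsumption test. The following Python sketch is a simplified structural subsumption check for conjunctions of primitives, all-restrictions and number restrictions. It is sound but incomplete (it ignores, for example, incoherent concepts), and it is not the system's own classifier. The concept representation and names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    prims: frozenset = frozenset()                # primitive concept names
    atleast: dict = field(default_factory=dict)   # role -> minimum number of fillers
    atmost: dict = field(default_factory=dict)    # role -> maximum number of fillers
    values: dict = field(default_factory=dict)    # role -> Concept (all-restriction)

def subsumes(c: Concept, d: Concept) -> bool:
    """Structural test: every restriction of c is matched by an equal or stronger one in d."""
    if not c.prims <= d.prims:
        return False
    if any(d.atleast.get(r, 0) < n for r, n in c.atleast.items()):
        return False
    if any(r not in d.atmost or d.atmost[r] > n for r, n in c.atmost.items()):
        return False
    return all(r in d.values and subsumes(v, d.values[r]) for r, v in c.values.items())

def classify(new: Concept, taxonomy: dict[str, Concept]) -> list[str]:
    """Names of the concepts in `taxonomy` that subsume `new`."""
    return sorted(name for name, c in taxonomy.items() if subsumes(c, new))

player = Concept(frozenset({"player"}))
team = Concept(frozenset({"team"}), values={"member": player})
football_team = Concept(frozenset({"team"}), atleast={"member": 11},
                        values={"member": Concept(frozenset({"player", "football-player"}))})
print(subsumes(team, football_team))   # True
print(subsumes(football_team, team))   # False
```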
Furthermore, the knowledge representation language has acquired more "ontological" adequacy through the insertion of global ontological rules concerning the number of properties a concept can possess, for instance:
- if two concepts each have only one property and the properties belong to the same type, then the properties cannot have the same value;
- no value can appear more than once in the description of a concept, etc.
These rules act as integrity constraints in the creation of concepts and control both the syntax and the semantics of the knowledge base being created. In other words, we have specified a sort of "style checker" that guides the manipulation of knowledge.
Figure 4: Fragment of the ontological map showing, among others, Living Things, Functional Artifacts, Sentient and Nonsentient Beings, and Containers (property types stuff, component, shape) together with the Part-Whole Taxonomy.
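The two global rules quoted above can be sketched as integrity checks over a knowledge base. In this hypothetical Python version a concept description is a list of (property type, value) pairs. The concepts and values in the example are invented.

```python
Description = list[tuple[str, str]]   # (property type, value) pairs

def check_rules(kb: dict[str, Description]) -> list[str]:
    """Report violations of the two global rules stated in the text."""
    violations = []
    # Rule: no value can appear more than once in the description of a concept.
    for name, props in kb.items():
        values = [value for _, value in props]
        repeated = sorted({v for v in values if values.count(v) > 1})
        if repeated:
            violations.append(f"{name}: repeated value(s) {repeated}")
    # Rule: two concepts with one property each, of the same type, need different values.
    singles = sorted((name, props[0]) for name, props in kb.items() if len(props) == 1)
    for i, (n1, (t1, v1)) in enumerate(singles):
        for n2, (t2, v2) in singles[i + 1:]:
            if t1 == t2 and v1 == v2:
                violations.append(f"{n1}, {n2}: same single property {t1}={v1}")
    return violations

kb = {
    "jar": [("stuff", "glass"), ("component", "glass"), ("content", "jam")],
    "cup": [("function", "drinking")],
    "mug": [("function", "drinking")],
    "vase": [("function", "holding flowers")],
}
for v in check_rules(kb):
    print(v)
# jar: repeated value(s) ['glass']
# cup, mug: same single property function=drinking
```

The second rule, read this way, flags pairs of concepts that the knowledge base could not tell apart.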
Furthermore, procedures of any kind can be associated with concepts for their interpretation (hooks).
In this way the knowledge representation system realizes, de facto, an object-oriented system.
In our system it is possible to specify an assertional 
language which makes it possible to introduce an individual concept into a programming language, like any other data type. For instance, an individual concept can be passed to a function as a parameter; once it has been verified that this individual is an instance of a generic concept, or of one of its subconcepts, the function is executed.
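The type check described here resembles a guard on a function's argument. In this Python sketch, a hypothetical decorator runs a function only when the individual concept passed to it instantiates a given generic concept or one of its subconcepts. The small taxonomy and all names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from functools import wraps
from typing import Optional

# Hypothetical fragment of the taxonomy: generic concept -> direct superconcept.
SUPER: dict[str, Optional[str]] = {
    "football-team": "team", "team": "thing",
    "football-player": "person", "person": "thing", "thing": None,
}

def is_a(generic: Optional[str], target: str) -> bool:
    """True if `generic` is `target` or one of its subconcepts."""
    while generic is not None:
        if generic == target:
            return True
        generic = SUPER.get(generic)
    return False

@dataclass
class IndividualConcept:
    name: str
    generic: str                                 # generic concept it instantiates
    values: dict = field(default_factory=dict)   # role -> filler(s)

def requires(target: str):
    """Run the decorated function only on instances of `target` or its subconcepts."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(ind: IndividualConcept, *args, **kwargs):
            if not is_a(ind.generic, target):
                raise TypeError(f"{ind.name} is not an instance of {target}")
            return fn(ind, *args, **kwargs)
        return wrapper
    return decorate

@requires("team")
def squad_size(t: IndividualConcept) -> int:
    return len(t.values.get("member", []))

pisa = IndividualConcept("Pisa", "football-team", {"member": [f"p{i}" for i in range(11)]})
print(squad_size(pisa))   # 11
```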
4. Lexical archive 
The lexical archive contains a set of lemmas, each associated with the following information: i) a set of forms with morphological categories; ii) etymology; iii) phonological transcription; iv) definitions in the form of text.
Every type of information can be used for retrieving data inside the lexical archive. Many possibilities are given for retrieving the conceptual knowledge that can be extracted from definitions. By applying the ELT tools, which make it possible to contextualize portions of texts, one can visualise the definition of a word, retrieve the immediate superordinates of the word or the entire conceptual hierarchy implicit in the whole archive, or compare parts of definitions in order to find differences or commonalities.
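One way to picture the retrieval of superordinates from definitions is the classic genus-and-differentia heuristic: take the first content word of a definition as its genus, provided it is itself a lemma of the archive, and follow the chain upwards. This Python sketch is an assumption about how such retrieval could work, not a description of the ELT tools. The skip list and the sample archive are invented.

```python
import re
from typing import Optional

# Assumed list of words skipped before the genus term of a definition.
SKIP = {"a", "an", "the", "any", "kind", "type", "sort", "of"}

def genus(definition: str, lemmas: set[str]) -> Optional[str]:
    """First content word of a definition, if it is itself a lemma of the archive."""
    for word in re.findall(r"[a-z]+", definition.lower()):
        if word not in SKIP:
            return word if word in lemmas else None
    return None

def hierarchy(word: str, archive: dict[str, list[str]]) -> list[str]:
    """Chain of superordinates obtained by following the genus of each first definition."""
    chain = [word]
    while archive.get(chain[-1]):
        parent = genus(archive[chain[-1]][0], set(archive))
        if parent is None or parent in chain:
            break
        chain.append(parent)
    return chain

archive = {
    "jar": ["a container of glass or clay with a wide mouth"],
    "container": ["an object used to hold something"],
    "object": ["a thing that can be seen or touched"],
    "thing": [],
}
print(hierarchy("jar", archive))   # ['jar', 'container', 'object', 'thing']
```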
4.1. Linking ontology and lexical items 
Concepts in the ontology are linked to lexical terms of the lexical archive and, vice versa, the ontological module can be accessed from any lexical entry in the archive. This is done by using a set of entry points which correspond to specific elements in a definition.
Certain concepts of the ontological network are associated with a list of operators which map the concept onto significant words inside definitions. As an example, the concept of "human being" can be mapped onto the operators 'person' and 'who', which realize the concept of "human being" in the lexical archive. When the lexical archive is accessed from the ontological module, lexical tools are triggered which use the list of operators as search criteria. In this way the explicit organization of knowledge in the ontological module is virtually linked to the organization which is implicit in the lexical archive.
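The use of operators as search criteria can be sketched as follows. The operators for "human being" are those given above; the function name and the tiny archive of definitions are hypothetical.

```python
import re

# Operators for "human being" as given in the text; the archive below is made up.
OPERATORS: dict[str, list[str]] = {"human being": ["person", "who"]}

def entries_for(concept: str, archive: dict[str, str]) -> list[str]:
    """Lemmas whose definition contains at least one operator of `concept`."""
    ops = set(OPERATORS.get(concept, []))
    return sorted(lemma for lemma, definition in archive.items()
                  if ops & set(re.findall(r"[a-z]+", definition.lower())))

archive = {
    "mayor": "the person who heads a town council",
    "baker": "one who bakes bread",
    "chair": "a seat with a back and four legs",
}
print(entries_for("human being", archive))   # ['baker', 'mayor']
```

Matching whole words, rather than substrings, avoids hits such as "personal" for 'person'.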
5. Conclusions 
To sum up, we are trying to create an environment composed of various integrated tools which allows a human operator to treat a text, and which facilitates the construction and use of knowledge bases created from the text itself.
The construction of each single module and its integration within the global system have been carried out taking into account the philosophy of knowledge-based systems and hypertexts.
The latter represent a good tool for the presentation of data, allowing 'personal' readings of them: once they are integrated with knowledge-based tools, the global expressive power of the system substantially increases, since data can be manipulated abstractly.
Knowledge representation tools make it possible to build 
specific theories of the world; by using these tools with 
the control of an ontological reference schema, any user 
can realize his own theory of the world in a continuous 
comparison with a 'standard' organization of knowledge. 
The specific theory is then able to increase the ways of searching through data stored in different modules, since it acts as an intelligent interface to data. For instance, it can be used as a filter when searching the lexical archive, thus overcoming the low degree of expressiveness of its stored information. In this way, a more flexible interaction with any module can be obtained.
References 
Brachman R. J., Schmolze J. G., An Overview of the KL-ONE Knowledge Representation System, Cognitive Science 9 (1985).
Cappelli A., Moretti L., Vinchesi C., KL-Conc: a Language for Interacting with SI-Nets, in Proceedings of the 8th IJCAI Conference, Los Altos: Kaufmann, 1983.
Frederking R. E., Gehrke M., Resolving Anaphoric References in a DRT-based Dialogue System, in H. Trost (ed.), 4. Österreichische Artificial-Intelligence-Tagung, Springer, 1988, 94-103.
Keil F. C., Semantic and Conceptual Development, Cambridge (Ma.): Harvard University Press, 1979.
Keil F. C., Concepts, Kinds, and Cognitive Development, Cambridge (Ma.): MIT Press, 1989.
Körner S., Categorial Frameworks, Oxford: Blackwell, 1970.
Lenat D. B., Guha R. V., Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project, Reading (Ma.): Addison-Wesley, 1990.
Mazzeranghi D., Una Semantica Intensionale per un Linguaggio di Rappresentazione della Conoscenza, ILC-KRS-1991-3, Pisa: Ist. di Linguistica Computazionale, 1991.
Moretti L., Text Processing in un Ambiente Ipertestuale, in Atti del Corso Seminariale "Nuove Tecnologie e Beni Culturali", a cura dell'Accademia di Studi Mediterranei di Agrigento, 1991.
Nebel B., Reasoning and Revision in Hybrid Representation Systems, Berlin, 1990.
Nirenburg S., Monarch I., The Role of Ontology in Concept Acquisition for Knowledge-Based Systems, Carnegie Mellon University, Pittsburgh, PA, 1987.
Onyshkevych B. A., Nirenburg S., Lexicon, Ontology and Text Meaning, in J. Pustejovsky & S. Bergler (eds.), Lexical Semantics and Knowledge Representation, Berkeley (Ca.), 1991.
Simons P., A Lesniewskian Language for the Nominalistic Theory of Substance and Accident, Topoi 2 (1983), 99-109.
Smith E. E., Medin D. L., Categories and Concepts, Cambridge (Ma.): Harvard Univ. Press, 1981.
Wiggins D., Sameness and Substance, Oxford: Basil Blackwell, 1980.
Winston M. E., Chaffin R., Herrmann D., A Taxonomy of Part-Whole Relations, Cognitive Science 11 (1987), 417-444.
Woods W. A., Understanding Subsumption and Taxonomy: A Framework for Progress, TR-19-90, Harvard Univ. Center for Research in Computing Technology, Aiken Computation Laboratory, Cambridge (Ma.), 1990.
