CONCEPT-ORIENTED PARSING OF DEFINITIONS 
WILLY MARTIN 
Free University of Amsterdam 
The Netherlands 
0. Introduction 
Computational lexicology/lexicography 
currently favours issues related to the 
~, the reoresentation and the 
application of iexical knowledge functioning 
within a NLP-environment. Within the first 
domain, that of acquisition, one finds e.g. such 
topics as the extraction of information from 
corpora and machine readable dictionaries 
(MRD's). This paper - which is the result of a 
project in which, next to the author, also 
M.Reedijk, E.ten Pas and L.Willekens 
participated -, falls within this first domain as 
it will explicitly deal with the extraction of 
(lexicai) knowledge from (dictionary) 
definitions. However, it will be evident that 
acquisition without a representational 
framework does not make (much) sense. 
Furthermore we will also indicate how to use 
the knowledge obtained. 
1. Definitions and Meaning Typ¢~ 
Our starting-point is the fact that words, with 
regard to their meaning, can be classified into 
meaning _t~zpes. Words can have meanings that 
are predominantly conceptual, collocational, 
grammatical, figurative, connotative, stylistic 
and contextual/discursive. A word such as the 
geological term magma typically has a 
conceptual meaning only, another one such as 
bloody (as in 'you bloody fool') typically 
combines collocational meaning (intensification) 
with stylistic meaning aspects (very informal), 
whereas the same word, bloody, e.g. in a 
sentence like 'I got my bloody foot caught in 
the bloody chair' (example taken from 
LDOCE) mainly gets a discursive, a 
contextual, meaning (functioning as an 
(emotional) stopword). Different kinds of 
lexical meaning types require different 
descriptive treatments. So e.g. terms, showing 
'par excellence' conceptual meaning, will 
require first and foremost conceptual meaning 
descriptions i.e. concept-oriented definitions. In 
what follows then we will concentrate on terms 
and their meaning as expressed in definitions, 
the typical locus tbr conceptual meaning 
information. Accordingly the parser we will 
present will be concept-oriented. 
2. Conceot-oriented oarsim, of terms 
The parser under discussion is set up to analyze 
definitions of medical terms in En~,lish. As 
such it is but one of the components of a 
system which at the moment consists of a 
preprocessor, a segmentor, a lexicon, a set of 
conceptual relations and a parser proper. In 
order to better understand the approach under 
discussion we will 
first give a general overview of the 
overall algorithm 
thereafter globally comment upon those 
aspects which are most relevant from a 
computational linguistic point-of-view 
(as it is impossible, given the amount of 
ACTES DE COLING-92. NANTES, 23-28 AOt3"r 1992 9 g g Prtoc. OF COLING-92. NArCrES, AUG. 23-28. 1992 
space and time, to give a full and 
detailed picture of the whole project). 
2.1 Overall al~,orithm 
The basic algorithm can be roughly 
characterized as consisting of the following 
steps: 
a. read definition 
b. segment definition 
c. look for head of definition 
d. check clues 
e. look for subhead(s) of definition 
f. fill frame subhead(s) taking into account 
(checks on) 
coordination 
clues 
postmodification 
g. fill flame head 
h. write sense frame 
A typical input reads like this: 
"rheumatoid "arthritis: a chronic disease of the 
musculo-skeletal system, characterized by 
inflammation and swelling of the joints, muscle 
weakness, and fatigue" (taken from Collins 
Dictionary of the English Language 19862 ) 
The corresponding output looks like 
rheumatoid arthritis: 
\[disease gaffects musc_skel_syst\] 
\[disease has_qual chronic\] 
\[disease hassymptom fatigue\] 
\[disease has_symptom weaknessl 
\[disease has_symptom inflammationl 
\[disease hassymptom swellingl 
Iweakness - gaffects musclel 
lswelling gaffects jointsl 
linflammation g_affects joints\] 
In what tbllows we will try to make clear the 
main features (a system leading to) such a 
result implies. 
2.2 Basic features 
2,2,1 Input soecifications 
Up till now we have only dealt with definitions 
for ~lhS.C,~Cdi (terms for nosoiogy concepts). 
These definitions can be taken from all kinds of 
sources, e.g. from termbanks or from 
(terminological) dictionaries. The example 
given above should make clear that we work 
with analytical definitions exhibiting all kinds 
of difficulties in both lexis and ~ (such as 
structural ambiguities cf. 'inflammation' vs. 
'inflammation of the joints'). 
2,2,2 Lcmmatizer-tagger as Front-end 
It goes without ,saying that a lemmatizer-tagger 
is a basic requirement for the efficient 
operation of the parser. This way text words 
(: word forms occurring in the definitional 
text) can be linked up with the items occurring 
in the lexicon (see below). For that purpose we 
use an adapted version of Dilemma (see Martin 
e.a. 1988 and Paulussen-Martin 1992). 
2.2.3 Minimal syntax 
After having been lemmatized and tagged, the 
definition gets ,split up into smaller parts 
(segments) by the Ee.g!llgaI~. This module is a 
minimal syntactic processor which, on the basis 
of categorial information (such as Boolean 
values for NP compatibility and NP 
delimitation), delimits word groups in the input 
string. Unlike other approaches (such as 
Alshawi 1989) which make use of syntactic 
pattern matching techniques, syntax is kept to 
a strict minimum as one of our claims is that 
ACRES DE COL1NG-92, NANTES, 23-28 AOOT 1992 9 8 9 PROC. OF COLING-92, NANTEs, AUG. 23-28. 1992 
much of what is done (by others) syntactically, 
can be left out when one disposes of more 
powerful, i.c. conceptual, knowledge. As a 
result our input definition now looks as follows 
( \[ indicating delimiters, \[ \[ indicating 
boundaries): 
a chronic disease l of the musculo-skeletal 
system I , (characterized) I by inflammation I 
and swelling I of the joint(s) I, muscle 
weakness l, and fatigue I I. 
2.2.4 Conceptual knowledge and calculation 
The knowledge banks which form the core of 
the system are the ilg_xJgg.a and the set of 
conceotual relations. 
A lexical entry, e.g. aids, is a three-place 
predicate consisting of the actual lexeme, its 
concept type and its word category. So: 
(aids, concept (nosology-concept, aids, lu, u, 
u, u, u, ul), n). 
As one will observe, the second argument, the 
concept type, consists of a sixtuple, i.c. six 
unspecificied slots. The parsing of definitions is 
precisely aimed at fillinf, or snecifving these 
slots. 
It is the set of conceptual relations that a 
concept type may have that determines this 
specification. At the moment such a 
I~ for diseases (nosology concepts), 
somewhat simplified, looks as follows: 
nos-concept 
g affects (nos, {macro, 
embryo}) 
o_affects (nos, organism) 
caused_by (nos, etiology) 
has_symptom (nos, finding) 
transmitted by (nos, trans) 
has qual (nos, qual) 
micro, funct, 
In our approach the universe of discourse is 
split up into 22 interrelated ooncepttypes, 
which, as a rule, form homogeneous subsets. 
At the center of it one finds nosology concepts 
which show relations with other concepttypes 
which in their turn may show relations with 
other concepttypes, which in their turn are 
related to other concepttypes, etc. 
At this point it is important to see that implicit 
concepts such as nosology concepts, (and so the 
conceptual meaning of the iexeme aids e.g.) 
can a.o. be defined/specified by concepts taken 
from the domain of macro- and micro-anatomy 
and that, in the given case, the relation between 
both arguments will be established. In this 
respect it is crucial for the parser to find the 
head conceot of the definitional phrase. It does 
so by setting up a syntax-based hypothesis 
(taking the rightmost noun occurring in front of 
the first delimiter) and checking it with 
conceptual knowledge. In case of a definition 
of aids as 
"a group of diseases secondary to a defect in 
cell-mediated immunity associated with a 
single newly discovered virus" (taken from 
Eurodicautom) 
in a first instance group will be taken up as 
head. Afterwards it will be rejected on 
conceotual grounds, a.o. because of the fact 
that group is not considered a medical concept. 
In other cases head shiftin~ will take place 
because of the fact that the head candidate can 
not be conceptually specified by its subheads 
(conceptual incompatibility between the 
assumed head and its subhead(s)). In the same 
vein, when being confronted with "classes of 
phenomena that present great difficulties for all 
syntactic formalisms (...) \[One ofl, the most 
important of these being conjunction (...)" 
(Winograd 1983, 257-258), the parser again 
will solve (or try to solve) these cases by 
making use of conceptual information. That, in 
the case of rheumatoid arthritis, it does not 
yield parses such as 'swelling of muscle 
AcrEs DE COLING-92, NANTES. 23-28 AOt~' 1992 9 9 0 PROC. OF COLING-92, NANTES. AUO. 23-28, 1992 
(weakness)' and that it manages to combine 
'joints' both with 'swelling' and 'inflammation' 
proves it to be fairly successful in this respect. 
Other examples of conceptual calculation imply 
the establishment of new concept types out of 
old ones. 'Throat' e.g., being a macro- 
anatomical concept, becomes a finding concept 
when in combination with a qual concept such 
as in 'sore', this way 'sore throat' can 'fill' a 
symptom relation with a nosoiogy concept. 
This example also shows that the ~ is 
conceived as an ~ one: concepts are 
thought of as atoms from which more complex 
structures can be derived; if the latter are 
compositional and can be computed however, 
they are not taken up as such. Other examples 
of conceptual par~ing include the application of 
rules for PP-attachment. Compare: "a disease 
characterized by a sense of constriction 
chest" vs. "a disease characterized by a sense 
of constriction in children". In the tbrmer case 
the PP 'in the chest' will be attached to the 
preceding concept 'constriction', in the latter 
the PP 'in children' will be attached to the head 
concept 'disease'. Local attachment of PP's 
(other than those introduced by 'ot ~) only 
prevails on global attachment (to the head) if 
certain conceptual conditions are met, such as 
the nature of the concept types in the PP 
tollowing a finding concept such as 
'constriction'. 
2.2.5 Frames 
Given a definition of which the head or 
conceptual type has been established, the parser 
tries to fill its conceptual template or frame as 
much as possible. It does so by looking 
recursively for pre- and postmodifiers (the 
latter are called subheads), which 'fit' the head 
(or its modifiers). Fitting here means that the 
concept type of the governed lexeme 
corresponds with the concept type of one of the 
arguments of the template of the governing 
iexeme. In the 'rheumatoid arthritis' example 
above e.g. the functional concept type of which 
'musculo-skeletal system' is an instantiation, 
'fits' or 'fills' the first argument or slot of the 
concept type rheumatoid arthritis belongs to. 
M.m. the .came can be ,said tor all the other 
slot-fillers. 
Front the above it will have become clear that 
for the representation of conceptual meaning 
we have chosen tbr a frame-based system (see 
e.g. Habel 1985): concept types are defined by 
frames, i.e. sets of conceptual slots, attributes 
or features. 
3. Usefulness 
The parser described here tries to serve a 
twofold aim. In the first place its aim is 
p_~aJ_Citl. By making definitions conceptually 
explicit it is first of all possible to enhance the 
access to data bases (by making search items 
available in a systematic way). Secondly 
because of the fact that definitional knowledge 
becomes available in a systematic way, it also 
becomes possible now to generate from pilrtial 
conceptual knowledge (answering such 
questions as: what is the term for the disease 
caused by HIV?, how is the disease affecting 
the immune system called ? etc.). Finally, by 
yielding 'semantically relational knowledge', 
syntactically ambinuous structures can be more 
readily solved (think of PP-attachment, cf. he 
treated the children with epilepsy). 
In the second place the system-cure-parser was 
set up as a pilot project in order to shed some 
light Oll such notions as lexicon ~tructure and 
(power of) conceat-oriented parsing. Judging 
from the results obtained up till now, we dare 
say that, with regard to the former, a 
relational-conceptual model of the lexicon 
offers interesting perspectives (although we still 
have to tackle in more detail such problems as 
concept disjunction and non-monotonic default 
reasoning), and that, with regard to the 
parsing, in as far as analytical definitions 
reflect a conceptual structure, syntactic 
problems in parsing become by far more 
feasible to overcome. 

References
M.Reedijk, Een conceptuele oarser voor 
definities van medische termen, ms., 
Amsterdam, 1991. 
Winograd, T., l~lneuage as a cognitiv¢ 
EtO¢_¢~. Vol 1: Svntax, Addison-Wesley, 
1983. 
Alshawi, H., Analysing the dictionary 
definitions, in: B.Boguraev and T.Briscoe, 
(Eds.), Comoutational Lexicography for 
Natural Languaze Processing, london/New 
York, 1989, 153-169. 
- Burkert G. and P.Forster, Representation of 
semantic knowledge with term subsumption 
languages, ms., Stuttgart, 1991. 
- Habel, C., Das Lexikon in der Forschung der 
kiinstlichen Intelligenz, in: C.Schwarze and 
D.Wunderlich, (Eds.), Handbu¢ta ¢ler 
Lexikologie, K/3nigstein, 1985, 441-474. 
Mars, V. e.a., Eindrapportage Saoiens- 
P_r_oj¢~, UT-gedeelte, Twente, 1991. 
Martin, W. e.a., Over Aflex. Relset. 
Conceotor e.a., VU-bijdrage tot bet Sapiens- 
prototype, Amsterdam, 1991. 
- Martin, W. e.a., Dilemma, an automatic 
lemmatizer, in: .CA/lJllg~, 1988, 5-62. 
- Martin W. and E.ten Pas, Metatools for 
Terminology, in: Corpusgebaseerde 
Woordanalvse. Jaarboek 1991, Amsterdam, 
1991, 83-99. 
- H.Paulussen and W.Martin, Dilemma-2: a 
lemmatizer-tagger for medical abstracts, 
conference on Aoolied Natural Lan~,uaee 
Processing. Trento. 1-3 Aoril 1992, ACL 
Morristown 1992, 141-146. 
