Natural Language Concept Analysis

Vera Kamphuis
Dept. of Language and Speech
University of Nijmegen
The Netherlands
v.kamphuis@let.kun.nl

Janos Sarbo
Computing Science Institute
University of Nijmegen
The Netherlands
janos@cs.kun.nl
Abstract 
Can we do text analysis without phrase struc- 
ture? We think the answer could be positive. 
In this paper we outline the basics of an under- 
lying theory which yields hierarchical structure 
as the result of more abstract principles relat- 
ing to the combinatorial properties of linguistic 
units. We discuss the nature of these proper- 
ties and illustrate our model with examples. 
Keywords Language processing, relational 
model. 
1 Introduction 
In most mainstream approaches to natural language 
modelling and parsing, some form of hierarchical 
structure plays a central role. The most obvious 
case in point is phrase structure. However, while 
the latter notion has shown its theoretical relevance 
in many ways, practical applications based on phrase 
structure description are not without problems. The 
main reason for this is the high flexibility of natural 
language. In performance data (i.e. actual language 
use), many disruptions of, and variations on stan- 
dard phrase structure patterns occur. As a result, 
application of phrase structure-based parsers in nat- 
ural language processing shows only limited success. 
This has inspired a search for alternative methods, 
such as statistically based or lexicon-driven parsing.
In our search for a solution to the problems men- 
tioned we decided to take one step back, and exam- 
ine the underlying nature of hierarchical structure in 
general, and phrase structure in particular. Our aim 
in doing so was to find a more principled solution 
to the problems of linguistic analysis and parsing. 
We looked for ways to derive structural information 
from input, and to incorporate this in a mathemat- 
ically well-founded theory of knowledge representa- 
tion. As a result we found a level of abstractness 
that, in principle, allows language-independent mod- 
elling and analysis. 
In our approach we capitalise on the property that 
the information carriers, the lexical items, are 'will- 
ing' to combine. These combinatorial properties 
are determined by inherent characteristics of lexical
items. Hierarchical structure follows naturally
from the interaction of these properties, while leav- 
ing room for variation and flexibility in structural 
patterning. 
In our model of natural language (NL) the input 
is represented as a binary relation. This is due to 
the dichotomy of language, meaning that a classifica- 
tion of lexical items as objects and attributes can be 
made (we use the term "dichotomy" in a restricted 
sense: a division into two mutually exclusive parts). 
The two classes are interrelated, and their relation 
can be determined merely on the basis of lexical
information and some general principles, like word
order (e.g. SVO). The relation between the classes
is due to the principle of relatedness. This princi- 
ple entails that any non-empty set of objects implies 
the existence of a non-empty set of attributes (prop- 
erties) it is related to, and vice versa. Minimally, 
an observable entity (object) has the property of ex- 
istence (attribute). This principle gives rise to a 
relation representing the semantics of the 'thought' 
described by the sentence in terms of a set of related 
items, called observations. An observation captures 
a set of objects and properties that are mutually 
characteristic of each other. Such a notion corre-
sponds to a formal concept in lattice theory. We will
show that the above relation is supported by linguis- 
tic considerations. 
Our approach to language, Natural Language 
Concept Analysis (NLCA), constitutes a linguisti- 
cally and mathematically based theory. This is re- 
flected by the different readings of the acronym, as 
follows. NLC(A): the analysis of concepts that play
a role in natural language; (NL)CA: the lattice-
theoretical model of formal concept analysis applied
to natural language; N(LCA): a natural transformation
on language (in concrete, on functor-argument rela-
tions).

Kamphuis and Sarbo 205 Natural Language Concept Analysis
V. Kamphuis and J.J. Sarbo (1998) Natural Language Concept Analysis. In D.M.W. Powers (ed.) NeMLaP3/CoNLL98: New
Methods in Language Processing and Computational Natural Language Learning, ACL, pp 205-214.
1.1 Related research 
Our theory goes together with a movement of mod- 
ern formalisms in computational linguistics which 
can be characterised by a shift of emphasis from a 
large, detailed syntax and simple lexicon, to a com- 
pact syntax and rich lexicon. Amongst other works, 
one can cite HPSG (Pollard and Sag, 1994), and 
most recently, a proposal by Berwick and Epstein 
(Berwick and Epstein, 1995). 
Berwick and Epstein outline a model that, in ac- 
cordance with Minimalist principles, does not posit 
"any syntactic entities at all beyond what [is] ab-
solutely necessary for linguistic description and ex-
planation." The necessary machinery, as they point
out, is one based on categorial grammar (Lambek,
1988). Their argument follows from the fundamen- 
tal idea that natural languages are limited to rules 
specifying how constituents can be concatenated to 
form larger constituents. Berwick and Epstein intro- 
duce a single syntactic operation, Hierarchical Com- 
position (HC), for the realization of such syntactic 
constraints. 
With respect to the above mentioned movement 
in natural language processing, we note that the en- 
deavour to move (almost) all information to the lex- 
icon can be theoretically justified. Intuitively, prac- 
tical NL formalisms like HPSG can be seen as vari- 
ants of two-level grammars, e.g. attribute grammars. The-
oretically, for such a grammar, a weakly equivalent 
grammar using only a single nonterminal symbol ex- 
ists (Franzen, 1983). In such a grammar all struc- 
tural information is specified by attribute functions. 
These functions can be defined by the lexicon. 
2 A supporting theory 
Natural language modelling usually assumes some 
form of hierarchical structure as given. Experience 
shows that practical application of such an approach 
to a non-trivial subset of the language can be a 
highly complex task (Aarts, 1991). In our search for 
a more flexible basis we arrived at the question: How 
does phrase and clause structure emerge in natural 
language? It appeared that this question is related 
to a more general one: How can knowledge about 
real world be structured? 
We found a philosophical background in C.S. 
Peirce's pragmatism (Peirce, 1931) and a mathe- 
matical formalisation of Peirce's ideas in R. Wille's 
theory on Formal Concept Analysis (FCA) (Wille,
1982). Relatedness, for example, relies on Peirce's
epistemological argument saying that "... there is
no judgment of pure observation without reasoning"
(Houser and Kloesel, 1992). This means that an 
observation is always tied to "judgment"; in other 
words, in our case, observation of an object always 
implies the presence of an attribute, and some inter- 
pretation of their relation. 
In the FCA framework, observable world is de- 
scribed by a binary relation between the sets of ob- 
jects and attributes. These sets give a dichotomous 
characterisation of observable entities, and together 
with their relation formalise Peirce's universal cat- 
egories: firstness, secondness and thirdness. These 
are defined as follows: "The first is that whose be-
ing is simply in itself, not referring to anything nor
lying behind anything. The second is that which is 
what it is by force of something to which it is second. 
The third is that which is what it is owing to things 
between which it mediates and which it brings into 
relation to each other" (Houser and Kloesel, 1992). 
For the time being we adopt the interpretation 
of Lehmann and Wille (Lehmann and Wille, 1995) 
who state that "the object g is a [f]irst ... to which
the attribute m is a [s]econd ...". According to
Lehmann and Wille, this interpretation is compati- 
ble with Peirce's general understanding of firstness 
and secondness. 
In FCA, observations, or concepts, are mathemat- 
ically formalised. Traditionally, the philosophical 
notion of a concept is determined by its extension 
and its intension. The extension consists of all ele- 
ments (set of objects) belonging to the concept while 
the intension covers all properties (set of attributes) 
valid for all those elements. 
In the mathematical model, the triple consisting
of the sets, objects (G; Gegenstände) and attributes
(M; Merkmale), and the relation between them (R),
is called the context (we assume that G and M are
finite sets). We say, for g ∈ G, m ∈ M, (g, m) ∈
R or equivalently, gRm, iff the object g has the
attribute m.
For a context the following mappings are defined: 
A′ = {m ∈ M | gRm for all g ∈ A} for A ⊆ G; and
B′ = {g ∈ G | gRm for all m ∈ B} for B ⊆ M.
A (formal) concept of a context (G, M, R) is a pair
(A, B) with A ⊆ G, B ⊆ M, which satisfies the
conditions (i) A′ = B and (ii) A = B′.
Informally, A′ is calculated from A by considering
the elements of A and accumulating the properties 
common to them all. B' is calculated dually. We 
say (A, B) is a concept if, by the above calculation, 
A and B mutually determine each other. 
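In computational terms, the two derivation mappings and the concept condition follow directly from these definitions. The Python fragment below is only an illustrative sketch (the toy context is invented; NLCA is not tied to this encoding):

```python
# Derivation operators of a finite context (G, M, R), as defined above.
def prime_objects(A, M, R):
    """A' = the set of attributes shared by every object in A."""
    return {m for m in M if all((g, m) in R for g in A)}

def prime_attributes(B, G, R):
    """B' = the set of objects having every attribute in B."""
    return {g for g in G if all((g, m) in R for m in B)}

def is_concept(A, B, G, M, R):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return prime_objects(A, M, R) == B and prime_attributes(B, G, R) == A

# Toy context: two objects and two attributes.
G = {"door", "moon"}
M = {"squeaks", "rises"}
R = {("door", "squeaks"), ("moon", "rises")}

print(is_concept({"door"}, {"squeaks"}, G, M, R))  # True
print(is_concept({"door"}, {"rises"}, G, M, R))    # False
```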
For any concepts (A1, B1) and (A2, B2) of a con-
text, the hierarchy of concepts is captured by the def-
inition: (A1, B1) ≤ (A2, B2) iff A1 ⊆ A2 (or equiva-
lently, iff B1 ⊇ B2). When the above order relation
holds, (A1, B1) is called the subconcept of (A2, B2),
and (A2, B2) the superconcept of (A1, B1). The set
of all concepts of a context with this order relation
is called the concept lattice.
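Since G and M are assumed finite, the full set of concepts can be enumerated by brute force from the definitions above. The sketch below (again with an invented toy context) also encodes the subconcept order; it is illustrative only:

```python
from itertools import chain, combinations

def all_concepts(G, M, R):
    """Enumerate every formal concept of a finite context (G, M, R):
    close each subset A of G to A'' and pair it with A'."""
    def prime_obj(A):
        return frozenset(m for m in M if all((g, m) in R for g in A))
    def prime_att(B):
        return frozenset(g for g in G if all((g, m) in R for m in B))
    concepts = set()
    for A in chain.from_iterable(combinations(G, r) for r in range(len(G) + 1)):
        B = prime_obj(frozenset(A))
        concepts.add((prime_att(B), B))   # (A'', A') is always a concept
    return concepts

def leq(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]

# Toy context with two objects and two attributes.
G = {"g1", "g2"}
M = {"m1", "m2"}
R = {("g1", "m1"), ("g2", "m1"), ("g2", "m2")}
print(len(all_concepts(G, M, R)))  # 2
```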
3 Linguistic relations 
NLCA applies Wille's theory to natural language by 
the equivalence: attributes are functors, and ob- 
jects are arguments. Functor-argument relations, 
the manifestations of the (combinatorial) properties 
of lexical items, have various realizations on the lin- 
guistic level. For example, the verb-complement re- 
lation is not the same as the relation of modifica- 
tion. This becomes clear when we look at the op- 
tionality of modifiers. In English, we cannot say, on 
the basis of encountering a noun, that it needs an 
adjectival modifier; however, when we encounter an 
adjective, we do know that at some level it needs 
a noun because it is a semantic predicate (functor) 
taking an argument of which it is predicated. In this 
case, then, there is an asymmetrical relation between 
functor and argument. On the other hand, the re- 
lation between a verb and its complementation is a 
symmetrical one. 
In NLCA, we distinguish between two kinds of re- 
lations: major and minor. These types of relations 
can be recursively nested, and their sum uniquely 
characterises the input. The first type of relation, 
the major relation, or predication, is a pair (p,a), 
where a functions as an argument to the predicate 
p. A major relation may involve the distinction 
between an action/state and its participants (sym- 
metrical relation: each requires the presence of the 
other) and between an action/state or participant on 
the one hand and its properties on the other (asym- 
metrical relation, or modification: the predicate re- 
quires the argument of which it is predicated, but the 
reverse does not hold). We call predicates of the first 
type major predicates, and predicates of the second 
type minor predicates. 
It is interesting to note that this distinction re- 
flects the difference between constituency on the one 
hand, and dependency on the other. In linguistics, 
these two relations are often treated as (formally)
equivalent alternatives.1 In the current view, they
entail a difference in status of the units that are in- 
volved in the relation. The nature of the relation 
in both cases is that of predication; however, in the 
1However, see (Fraser, 1996) for some qualifying re- 
marks on this topic. 
constituency case each part assumes the presence of 
the other, whereas in the dependency case, the pred- 
icate is optional. 
There are various distinguishing factors between
major and minor predicates. In English, major pred- 
icates (usually) relate to the noun-verb division; mi- 
nor predicates do not. Major predicates are typically 
realized by verbs; minor predicates by adjectives and 
adverbs. There is never more than one major predi- 
cate associated with an argument; there may be sev- 
eral minor predicates related to the same argument. 
(This reflects the possibility of having zero or more 
modifiers of an action or participant.) Both ma- 
jor and minor predicates can provide semantic roles, 
but major predicates introduce participant roles for 
their arguments; minor predicates can introduce ad- 
ditional roles (such as location or manner) or prop- 
erties of their arguments. 
The second type of relation, the minor relation, or 
qualification, distinguishes between the core content 
of a linguistic expression and some qualification of 
it. At the level of an action and its participants, for 
example, this qualification may relate to referential 
status of NPs (e.g. definite vs. indefinite article), 
or to tense and aspect information at clause level. 
Intensifying adverbs (e.g. very, extremely, deeply) 
and comparative adverbs (e.g. more and most) also 
belong to the class of qualifiers. These examples sug- 
gest that qualification may also have a symmetrical 
and asymmetrical variant: article, tense, aspect etc. 
being of the first type, and intensifying and compar- 
ative adverbs of the second. However, this is still 
an object of further study. In this paper we will re- 
strict ourselves to the distinction between qualifier 
and core in general. 
The difference between a minor predicate and a 
qualifier is that the latter does not introduce a mean- 
ing that is independent of the element it qualifies. 2 
The presence of a qualifier of a specific type, there- 
fore, also signals the presence of its counterpart. 
Furthermore, there can be several modifiers associ- 
ated with an argument or predicate; typically, how- 
ever, there will only be a single (possibly compos-
ite) qualifier. In the case of referential information,
for instance, the qualifier situates the argument or 
predicate in its referential context of which there will 
only be one. In some cases different aspects of the 
2By contrast, a minor predicate has some aspect of 
meaning that is independent of the element it combines 
with. This is illustrated by the fact that minor predicates 
can be used in different contexts. For example, a prepo- 
sitional phrase can modify an argument (e.g. noun) but 
also a predicate (e.g. verb). An adjective phrase can be 
used as a modifier of a noun, but also in the complemen- 
ration of a verb. 
Kamphuis and Sarbo 207 Natural Language Concept Analysis 
qualifier can be expressed separately (such as tense 
and aspect); in that case these different aspects must 
be unifiable but there cannot be more than a single 
qualifier relating to the same domain. 
The qualifier evokes its counterpart; nevertheless,
the semantic 'core' is also complete in itself, in that 
it forms a full account of semantic relationships. 
Therefore it does not require realization of the quali- 
fier as such: cf. the use of such bare relations in cap-
tions or telegram style speech (e.g. "Lion attacked
woman!").
Summing up, we distinguish between the following 
relations: 
• major predication 
• minor predication 
• qualification. 
A schematic representation of these possibilities is 
given in Fig. 1. 
Linguistic relations
  Minor relation: Qualification
    Qualifier
    Core
  Major relation: Predication
    Major predication (symmetrical)
      Major predicate
      Argument(s)
    Minor predication (asymmetrical)
      Minor predicate
      Argument

Figure 1: Inventory of linguistic relations in NLCA
It is important to note that this diagram does not 
represent the hierarchical structure of sentences, or 
the organisation of conceptual content within the 
sentence (we will come back to this below); it merely 
shows the different types of relations that our ap- 
proach identifies. These relations lie at the heart 
of structure formation in NL. How phrase structure 
emerges as a result of their interaction is explained 
in Sect. 6. 
As mentioned, the different relations can be recur- 
sively nested. For example, at the level of argument, 
a modifying predicate may be added in the form of 
an adjective phrase, or a qualifier may be present 
in the form of a determiner. Each element that is 
added stands in a certain relation to its counterpart, 
based on the type of relation that was applied. 
There is a potential mapping between the linguis- 
tic relations displayed in Fig. 1 and the hierarchical 
organisation of information structure. For example, 
it is likely that the major predication relation is the 
most important information carrier with respect to 
the semantic content of the sentence, and that the 
minor predication relation reflects additional infor- 
mation of less importance. This illustrates the rela- 
tive contribution of the different linguistic relations 
to information content. In information retrieval this 
could help to generate a concise representation of re- 
trieved text. It would also be in line with the use, 
already mentioned, of the major predication relation 
in captions or telegram style speech; furthermore, 
it could be a possible explanation for the ability of 
speed-reading that readers may develop. 
The qualification gives concrete reference to all 
the items involved in the predication relation, and 
as such is relevant for all levels. The presence of the 
qualifier at all levels of representation is a matter 
of some importance: word order, for instance, may 
also be classified as part of the qualification relation 
(e.g. in English, word order is relevant for identify-
ing questions, and also in assigning thematic roles to
participants). The relationship between qualifiers in 
NLCA, and operators in the semantically based hi- 
erarchy of Role and Reference Grammar (Van Valin,
1993) would be a potentially useful area to investi-
gate.
4 A first sketch of the model 
The distinctions made above have been incorporated 
into the NLCA-model on the basis of the abstraction 
of FCA: the context. In the dyadic model of FCA, a 
context allows only two kinds of entities: object and 
attribute. Therefore, each lexical item has to be clas-
sified as one of these, based on its lexical type. Typical
objects are nouns; typical attributes are verbs (ma- 
jor predicates), and adjectives and adverbs (minor 
predicates). We refer to these attributes uniformly 
as major attributes (involving predication). Qual- 
ifiers are classified as attributes, as well. We call 
them minor attributes (involving qualification). 
A comment with respect to the classification of
lexical items and its relation to Peirce's universal
categories is in order. We mentioned that objects
and attributes formalise the categories firstness and
secondness. Each item of these classes may evoke a
different relation (called interpretant) depending on 
the item's syntactic and semantic properties, and in 
general, the properties of the item as a sign (Liszka,
1996). In NLCA these interpretants are instantia-
tions of the linguistic relations formalising the cat- 
egory of thirdness. For example, the interpretant 
created by a verb, an instance of a major predica-
tion, may 'explain' how that verb binds its argu-
ments together "in a bundle of interlocking relation-
ships" (Sowa, 1996).
The surjective mapping from lexical types to the 
sets of the dyadic model can be defined without caus- 
ing confusion. The set of lexical types defines a par- 
tition of L, the set of lexical items and semantic roles 
involved in the analysis, which is further partitioned 
according to the dyadic model, yielding the sets G 
and M. From G ⊆ L and M ⊆ L it follows that there
is an embedding of the relation R ⊆ G × M in L × L.
This means that any pair (g, m) ∈ R can be defined
as the unique yield of l1 and l2 (l1, l2 ∈ L) by the
assignments l1 ↦ g and l2 ↦ m, where ↦ respects
the mapping of lexical types.
As said above, each lexical item is classified ac- 
cording to its type. Furthermore, with each lexical 
item is associated a number of positions for inter-
nal and external arguments, denoted by the suffixes
_int and _ext, respectively.3 Internal arguments contain
information regarding the item itself. External ar- 
guments relate to combinatorial demands to make a 
complex linguistic unit, according to the linguistic 
relations described above. We say the input is well- 
formed if the combinatorial demands of each lexical 
item are satisfied. 
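The classification and argument positions just described can be pictured as plain records. The sketch below is our illustration, not the authors' implementation, and every field name in it is invented:

```python
# Illustrative record for a lexical item: classified as object or
# attribute, with internal (_int) and external (_ext) argument positions.
from dataclasses import dataclass, field

@dataclass
class LexicalItem:
    form: str
    kind: str                                   # "object" or "attribute"
    int_qualifiers: list = field(default_factory=list)
    int_modifiers: list = field(default_factory=list)
    ext_demands: int = 0                        # external positions required
    ext_filled: list = field(default_factory=list)

    def satisfied(self):
        """Combinatorial demands met: every external position is filled."""
        return len(self.ext_filled) == self.ext_demands

def well_formed(items):
    """The input is well-formed iff every item's demands are satisfied."""
    return all(item.satisfied() for item in items)

# "The door squeaks": the article and the verb each demand one external
# argument; both are filled by 'door'.
the = LexicalItem("the", "attribute", ext_demands=1, ext_filled=["door"])
door = LexicalItem("door", "object", int_qualifiers=["the"],
                   ext_demands=1, ext_filled=["squeak"])
squeak = LexicalItem("squeak", "attribute", ext_demands=1, ext_filled=["door"])
print(well_formed([the, door, squeak]))  # True
```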
The internal argument positions are filled (i.e. as- 
signed) by modifiers and qualifiers, which refer to 
distinct domains of analysis. For each domain holds 
that when an argument position is filled by more 
than a single element, these different elements have 
to be compatible (possibly depending on the con- 
text). For example, with multiple modifiers, e.g. two 
or more adjectives modifying a noun, the modifiers 
have to be semantically compatible in order to make 
a sensible construct: cf. the tall happy girl vs. ?the 
tall short girl. 
The external arguments of a verb (major predi- 
cate) are determined by the verb's valency: the sub- 
ject is also an external argument. These external 
arguments are involved in a symmetrical relation: 
an object fills the external argument position of an 
attribute, and vice versa, the attribute fills the ex- 
ternal argument position of that object. 
The external argument of a modifier (minor pred- 
icate) is involved in an asymmetrical relation: an 
object fills the external argument position of an at- 
tribute, and the attribute fills the (modifier) internal 
argument position of that object. 
The qualifier-domain of the internal argument 
3In procedural terms, argument position and ar-
gument correspond to formal and actual parameter, 
respectively. 
contains specific information that relates to the type 
of lexical item. For nouns, it is information re- 
garding reference: specific/generic/unique reference; 
number. For verbs, it is information regarding finite- 
ness/tense/aspect, etc. Thus, when the qualifier- 
domain of the internal argument of both the object 
and the major attribute is filled, there is explicit 
reference with respect to the action and the partic- 
ipants involved. Since qualifiers contain a specific 
type of information, they can be regarded as a syn- 
tactic pointer to the qualified element itself: if this 
domain of the internal argument is filled, there must 
also be an object/attribute of the type that the inter- 
nal argument belongs to. In case the qualifier pre- 
cedes its argument, this feature is reflected in the 
computational model by introducing placeholders, 
called Proto-items (cf. Sect. 6). Proto-items can 
only be introduced by qualifiers. When the argu- 
ment of the qualifier is found, it replaces the Proto- 
item and fills the external argument position of that 
qualifier. The relations that the Proto-item is in- 
volved in are inherited by that argument. 
Besides these relations, NLCA applies a set of 
general principles, like word order (e.g. SVO), in- 
heritance of relation and 'greedy' binding of lexical 
items. By the latter principle, the input string, form- 
ing the context of each lexical item, is evaluated from 
the perspective of that item and its needs: functors 
take the textually nearest arguments available, and 
vice versa. In NLCA input is analysed from left to 
right. 
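The 'greedy' binding principle can be pictured as a toy left-to-right pass, under the simplifying assumptions that every item is either a functor or an argument, and that an argument, once found, remains available to later functors (an object may fill the external position of several attributes, cf. Sect. 6). This sketch is ours, not the authors' procedure:

```python
def greedy_bind(items):
    """Left-to-right pass: each functor binds the textually nearest
    argument available; arguments stay available for later functors."""
    args_seen = []      # arguments encountered so far
    waiting = []        # functors still lacking an argument
    bindings = []
    for kind, form in items:
        if kind == "argument":
            args_seen.append(form)
            bindings.extend((f, form) for f in waiting)  # nearest to the right
            waiting.clear()
        else:  # functor
            if args_seen:
                bindings.append((form, args_seen[-1]))   # nearest to the left
            else:
                waiting.append(form)
    return bindings

# "The door squeaks": 'the' waits for 'door'; 'squeaks' takes 'door'.
print(greedy_bind([("functor", "the"), ("argument", "door"),
                   ("functor", "squeaks")]))
# [('the', 'door'), ('squeaks', 'door')]
```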
Summarising thus far, we have incorporated the 
following aspects in our model: 
• each linguistic unit is classified according to the 
object/attribute dichotomy 
• each linguistic unit has positions for internal 
and external argument(s) 
• the internal argument positions are filled by 
qualifiers and modifiers 
• the external argument positions are filled by el-
ements that are involved in the predication re-
lation.
5 Relation Matrix 
The analysis of examples in NLCA is represented in 
terms of a so-called Relation Matrix (RM). A Rela- 
tion Matrix shows the relation between objects (rep- 
resented in rows) and attributes (columns). 
In conformity with our definition in Sect. 4, we say a Rela-
tion Matrix is well-formed if each external argument 
position of each attribute is filled (meaning that the 
semantic roles of major predicates are realized and 
that all other combinatorial demands of attributes 
have been fulfilled) and each object is the external 
argument of some attribute. This implies that the 
input corresponding to a well-formed RM must con- 
sist of one or more clauses. In this paper we focus 
on the case where there is only a single clause.
We represent a symmetrical relation by a pair of 
asymmetrical relations, and an asymmetrical rela- 
tion by a directed relation, called a pointer (PTR). 
Technically, the value of a matrix element, RM[i,j],
is a tuple encoding a Boolean variable and a set of
PTRs.
In the graphical representation the value of a 
Boolean variable is represented by a '+' (true) or 
the empty string (false). We may also use the nota- 
tion '+i' referring to the ith true-value assignment. 
A PTR is depicted as a directed edge. Internal ar-
gument positions of objects and attributes are dis- 
played to their left-hand side (there is one argument 
position for the qualifiers, and one for the modifiers); 
external argument positions to their right. Empty 
argument positions are omitted. 
If the external argument position of an object (at- 
tribute) is filled by an attribute (object), we assign 
true to the Boolean variable of the corresponding cell 
in the RM. These variables will be used for the rep- 
resentation of linguistic structure. The assignments 
can take place after the analysis is completed, or, in 
most cases, during the analysis. 
Attributes may have more than one external argu- 
ment position, and each of these may be involved in 
a different relation. Therefore, we use the conven- 
tion that external argument positions of verbs are 
displayed in separate columns. The relation of at- 
tributes and their external argument positions can 
be traced back in the Relation Matrix, however, in 
the examples, we do not graphically represent it. 
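As a data structure, a Relation Matrix cell can be sketched as a Boolean paired with a set of PTRs, as described above. The class below is an invented illustration of that encoding, not the authors' implementation:

```python
# Sketch of the Relation Matrix: RM[i,j] holds a Boolean ('+' when the
# relation is realized) and a set of pointers (asymmetrical relations).
class RelationMatrix:
    def __init__(self):
        self.cells = {}   # (object, attribute) -> [bool, set of PTRs]

    def _cell(self, obj, attr):
        return self.cells.setdefault((obj, attr), [False, set()])

    def add_ptr(self, obj, attr, ptr):
        """Record a directed pointer in cell (obj, attr)."""
        self._cell(obj, attr)[1].add(ptr)

    def mark(self, obj, attr):
        """Set '+' when an external argument position is filled."""
        self._cell(obj, attr)[0] = True

    def is_true(self, obj, attr):
        return self.cells.get((obj, attr), [False, set()])[0]

# "The door squeaks": 'door' fills the external argument position of
# both 'the' and 'squeak'.
rm = RelationMatrix()
rm.mark("door", "the")
rm.mark("door", "squeak")
print(rm.is_true("door", "squeak"))  # True
```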
6 Examples 
It is now possible to illustrate the model by dis- 
cussing some examples in detail. The language of 
illustration will be English. 
Example 1 The door squeaks. 
the Minor attribute; it generates a column. Articles 
belong to the class of qualifiers, and thereby re- 
quire the presence of their counterpart. They 
create a Proto-object that needs to be filled, 
of which they themselves are the internal argu- 
ment. The Proto-object allocates a row in the 
RaM; 'the' points at its internal argument. 
• the → Proto-object_int
door Object; fills the Proto-object slot created by 
'the' and thus finds 'the' pointing at its inter- 
nal argument position. 'door' itself points to 
the external argument position of 'the' (on the 
basis of the combinatorial demands of the lat- 
ter), leading to the linguistic unit 'the door' and 
yielding a '+' in the RM. 
This leads to the following postulate: when- 
ever the external argument position of an at- 
tribute (except for a verb) is filled, the elements 
transitively involved in the relation constitute a 
phrase. 
• 'door' replaces Proto-object
• door → the_ext
• '+' in RM in cell door/the
squeaks Major attribute; its internal argument is 
the present tense marker 's'; its external argu- 
ment is the θ-role of THEME (but see below),
filled by 'door'; hence there is a pointer from 
'door' to this role, 4 and a '+' is put in the 
RM. 'squeaks' itself is the external argument 
of 'door'; again a '+' in the RM. This time 
the relation involves the external argument po- 
sitions of both attribute and object (as opposed 
to before, when the internal argument position 
of the object was involved). There are no exter- 
nal argument positions left unfilled; this signals 
a clause. 
Again we can formulate a postulate: there is 
a well-formed clause when all external argu- 
ment positions of a major predicate (attribute) 
are filled by objects, and the attribute fills the 
external argument positions of those objects, 
and no external argument positions of items in- 
volved in the major predication are left unfilled. 
(N.B. a VP can be found as a subset of a clause.)
• squeak → door_ext
• door → squeak_ext (THEME)
• '+' in RM in cell door/squeak
The precise labelling of thematic roles varies
across different models. The current role is
suggested by Haegeman's THEME2 (Haegeman,
1991); in Role and Reference Grammar (RRG)
(Van Valin, 1993) it would probably be called
an EFFECTOR. Where Van Valin and Haege-
man are in conflict, we shall at this point choose
the more general role of the two. However, a
more detailed analysis of predicates into differ-
ent types (accomplishment, activity, state and
achievement) with associated logical structure
(as in RRG) is desirable for the develop-
ment of the NLCA-lexicon.

4There could in fact also be a pointer from the third
person present tense marker to the object or THEME-role:
this can be relevant especially in head-marking languages
where the verb carries morphemes indicating the person
and number of its arguments. Note, incidentally, that
the latter situation is totally unproblematic for NLCA,
since it is based on the abstract linguistic relations rather
than on concrete syntactic realizations.
The Relation Matrix for this sentence is displayed 
in Fig. 2 below. 
[Relation Matrix diagram not recoverable from source]
Figure 2: The door squeaks
Note that objects may have pointers to several ex- 
ternal argument positions of attributes. This corre- 
sponds to saying that the object may fulfil the argu- 
ment role of more than one attribute. This is in fact 
the case: cf. the occurrence of multiple modifiers 
of a single head, but also, in the current example, 
'door' functioning as argument to both the article 
'the' and the verb 'squeaks'. That the two have dif- 
ferent status is not a problem and is in fact relevant: 
when 'door' fulfils the external argument role of a 
verb it is involved in the major predication, because 
it itself requires a verb, but when it fulfils the ex- 
ternal argument role of adjectives or of articles it is 
not; it just completes their demands. 
There is another aspect of the model that can be 
illustrated on the basis of a sentence of this kind. 
For this purpose, let us use the sentence The moon 
rose. The lexical item 'rose' is ambiguous: it can 
either be the past tense of the verb 'rise', or it can 
be a noun referring to a flower.5 In the former case,
the analysis will take place in the same way as in 
the example above. But let us look at what would 
happen in the latter case. 
[Relation Matrix diagram not recoverable from source]
Figure 3: The moon rose
5We disregard for the moment the possibility of 
analysing the two nouns as a compound. 
In this example, we can see that assigning 'rose' 
to the object class leaves the analysis incomplete: 
the object is not connected to any attribute. Hence, 
under this reading, the sentence is ungrammatical 
(cf. Fig. 3). 
Example 2 The happy girl bought some flowers. 
the cf. Ex. 1.
• the → Proto-object_int
happy Attribute involved in the predication rela- 
tion (minor predicate); generates a column. Its 
internal argument position is not filled. Its ex- 
ternal argument position needs to be filled with 
an element of type Object (as an adjective, it 
is predicated of nominal elements). There is a 
Proto-object present, hence there is a pointer 
from the Proto-object to the external argument 
position of the attribute (greedy binding), so we 
can put a '+' in the RM (under 'happy'). The 
attribute itself points to the internal argument 
of the Proto-object. Note that this leads to a 
chain of PTRs from 'the' via the Proto-object 
to an external argument that has been filled; 
such a chain gives rise to inheritance of the '+' 
to all relations involved in it. Therefore, there 
will also be a '+' under 'the'. This, in fact, cre- 
ates a nominal adjective phrase with an implied 
head (the Proto-object). 
• Proto-object → happy_ext
• happy → Proto-object_int
• '+' in RM in cell Proto-object/happy
• '+' in RM in cell Proto-object/the
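The inheritance of '+' along a chain of PTRs can be sketched as a simple fixed-point propagation. This is a sketch under our own naming assumptions; `ptrs`, `plus` and `inherit_plus` do not appear in the paper:

```python
# Pointer chain for "the happy ...": 'the' points to the Proto-object,
# which in turn points to the (filled) external argument of 'happy'.
ptrs = {
    "the": "Proto-object",
    "Proto-object": "happy_ext",
}

# Relations already marked '+' directly (greedy binding filled happy_ext).
plus = {"happy_ext"}

def inherit_plus(ptrs, plus):
    """Propagate '+' backwards to every relation in a chain that
    ends in a '+'-marked relation."""
    marked = set(plus)
    changed = True
    while changed:
        changed = False
        for src, dst in ptrs.items():
            if dst in marked and src not in marked:
                marked.add(src)
                changed = True
    return marked
```

Running this on the chain above marks both the Proto-object and 'the', which is exactly why a '+' also appears under 'the' in the Relation Matrix.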
There is an important difference here between 
the role and treatment of the article and the ad- 
jective. Note that the nominal adjective phrase 
would not be created without the article: it is 
the article that supplies the Proto-object that 
functions in the nominal adjective phrase. It 
can do so because it belongs to the class of qual- 
ifiers; they require the presence of their counter- 
part and therefore can be said to create a Proto- 
object. The adjective, on the other hand, be- 
longs to the class of modifiers that are involved 
in the relation of minor predication. For this 
reason, they can be said to have an implicit ob- 
ject required to fill the place of their external 
argument position. The Proto-object generated
by the qualifier can fill this role. Both qualifier 
and modifier belong to the class of internal argu- 
ments; however, they do not have the same sta- 
tus and are treated differently. The qualifier can 
generate a Proto-object (or a Proto-attribute if 
Kamphuis and Sarbo 211 Natural Language Concept Analysis 
it is the qualifier of an attribute) but this Proto- 
item does not fulfil its external argument need: 
otherwise, it would be wrongly assumed that 
a string consisting of the qualifier only would 
be grammatical. Hence there is no pointer from 
the Proto-object to the external argument of the 
qualifier. With the modifying adjective, exactly 
the reverse situation holds. As adjectives can be 
used in different types of contexts (e.g. attribu- 
tively or predicatively), they do not create a 
Proto-object. However, since they are involved 
in the relation of predication, their external ar- 
gument position can be filled by a Proto-object 
generated by a qualifier. They themselves, then,
may also point to the internal argument of that
Proto-object.
girl Object; replaces the Proto-object. The ob- 
ject points at the external argument position 
of 'the'. There is still a phrase, but now it is a 
full noun phrase rather than a nominal adjec- 
tive phrase. Since there is not yet a pointer to 
the external argument of the noun, we still do 
not have a clause, only a phrase. 
• 'girl' replaces Proto-object 
• girl → the_ext
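The replacement step can be sketched as a substitution in the pointer structure (the naming here is our own; none of these identifiers come from the paper):

```python
# Hypothetical sketch: when the full object arrives, it takes over
# the Proto-object's place on both sides of every pointer relation.
def replace_proto(ptrs, proto, obj):
    """Substitute obj for proto in a pointer mapping (keys and values)."""
    return {(obj if src == proto else src): (obj if dst == proto else dst)
            for src, dst in ptrs.items()}

# After "the happy", the Proto-object fills happy's external argument;
# when 'girl' appears it replaces the Proto-object in that relation.
before = {"Proto-object": "happy_ext"}
after = replace_proto(before, "Proto-object", "girl")
# after == {"girl": "happy_ext"}
```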
bought Major attribute; generates a column. Its 
internal argument is filled by the feature PAST; 
its external arguments are AGENT and THEME. 
Since 'buy' is a major predicate, it is the ex- 
ternal argument of the object 'girl', and 'girl' 
points to the AGENT role. As a result, there is a 
'+' in the Relation Matrix in cells girl/AGENT
and girl/buy. However, since only one of the ex-
ternal argument positions of the transitive verb
is filled, the clause is not yet complete.
• buy → girl_ext
• girl → buy_ext (AGENT)
• '+' in RM in cell girl/AGENT
• '+' in RM in cell girl/buy
some A quantifying pronoun which may function 
as a determiner or as an independent pronoun. 
We can make a unified account if we treat it as 
an attribute that, like the article, introduces a 
Proto-object; however, unlike with articles the 
Proto-object now also points to the external 
argument of the attribute. As a result, there
is a '+' in the Relation Matrix in cell Proto- 
object/some. This explains the possibility of 
e.g. She bought some, which, indeed, is com- 
plete but has an implicit object. In this view, 
then, quantifying pronouns are treated as an in- 
termediate type between the pure qualifier-class 
of articles and the class of adjectives, which do 
allow a Proto-object as their external argument. 
The Proto-object also realizes the external ar- 
gument THEME, causing a '+' to be placed in 
the appropriate cell of the Relation Matrix. 
• some → Proto-object_int
• Proto-object → some_ext
• Proto-object → buy_ext (THEME)
• '+' in RM in cell Proto-object/some
• '+' in RM in cell Proto-object/THEME
• '+' in RM in cell Proto-object/buy
flowers Object; replaces the Proto-object. 's' can 
be regarded as part of the internal argument 
(qualifier). Note that this does not conflict with 
the fact that 'some' also is an internal argument: 
they are unifiable within the same domain (both 
can signify plural; together they are plural in-
definite).
• 'flowers' replaces Proto-object
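The claim that 'some' and the plural qualifier '-s' are unifiable within the same domain can be illustrated with a standard feature-unification check. The feature names (`number`, `definiteness`) and values are our own illustrative choices:

```python
def unify(f1, f2):
    """Unify two flat feature dictionaries; return None on a value clash."""
    out = dict(f1)
    for key, value in f2.items():
        if key in out and out[key] != value:
            return None  # conflicting values: not unifiable
        out[key] = value
    return out

some = {"number": "PLURAL", "definiteness": "INDEF"}
plural_s = {"number": "PLURAL"}

# 'some' and '-s' agree on number, so together they signal
# plural indefinite, as in 'some flowers'.
combined = unify(some, plural_s)
```

A clash, such as a singular qualifier combined with 'some', would make `unify` return `None`, i.e. the two internal arguments would not be unifiable within the same domain.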
The Relation Matrix for this sentence is displayed 
in Fig. 4. 
[Relation Matrix diagram omitted: entries for 'the', 'happy', 'girl', 'buy(PAST)', 'some' and 'flowers', with '+' marks +1 to +7]
Figure 4: The happy girl bought some flowers 
This treatment of quantifying pronouns has two 
important advantages. First, it does not require am- 
biguous lexical entries. The same can be said for 
demonstrative pronouns, numerals and other func- 
tion words that are ambiguous between indepen- 
dent and adjectival use. Second, the use of Proto- 
objects makes it unnecessary to have a rule defining 
noun phrase heads as realized either by nouns, or by 
numerals, quantifying pronouns, demonstrative pro- 
nouns etc. In fact, this also applies to nominal ad- 
jective phrases: there is no need to define adjectives 
as possible realizations of noun phrase heads. The 
nominal adjective phrase follows naturally from the 
presence of the article (creating the Proto-object) 
and the adjective (combining with the Proto-object). 
Furthermore, this approach also accounts for the po- 
tential structural ambiguity of a quantifying pro- 
noun or a nominal adjective phrase followed by a 
plural noun phrase, as in apposition. (Example: 'On 
Monday she got a big bunch of flowers. The white,
lilies, wilted after a mere few days.').
Going through the sentence from left to right, we 
see the following structure emerge: 
• At word 'happy' we obtain the nominal adjec- 
tive phrase (+1 and, through inheritance, +2); 
• At word 'girl' we obtain the noun phrase (PTR
from 'girl' to the_ext);
• At word 'some' we obtain the clause with an
independent pronoun (+5, +6 and +7);
• At word 'flowers' we obtain the clause with 
'some' as determiner. 
As shown in these examples, the phrases and 
clauses can be found in the Relation Matrix. The 
concept lattice representation is especially valuable
for such observations, which are essential for informa-
tion retrieval (Sarbo and Farkas, 1995). Information
present in the Relation Matrix is accessible from the 
concept lattice and can be used in the explanation 
of the concepts and sublattices. We note that in our 
application of FCA a concept containing the empty 
set is meaningless, because it is in conflict with re- 
latedness. The concept lattice of Fig. 4 is shown in 
Fig. 5. 
C3 = ({girl, flower(s)}, {buy(PAST)})
C1 = ({girl}, {the, happy, buy(PAST), AGENT})
C2 = ({flower(s)}, {buy(PAST), THEME, some})
C0 = ({}, {the, happy, buy(PAST), AGENT, THEME, some})
Figure 5: Concept lattice of Fig. 4
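The concepts of Fig. 5 can be reproduced with the standard Formal Concept Analysis construction: a pair (extent, intent) is a concept iff each set is exactly the derivation of the other. Below is a minimal sketch over the context read off Fig. 4; the dictionary encoding is ours, not the paper's:

```python
from itertools import combinations

# Formal context from the Relation Matrix of Fig. 4:
# objects and the attributes they are related to.
context = {
    "girl": {"the", "happy", "buy(PAST)", "AGENT"},
    "flower(s)": {"buy(PAST)", "THEME", "some"},
}
all_attrs = set().union(*context.values())

def intent(objs):
    # attributes shared by every object (all attributes for the empty set)
    return set.intersection(*(context[o] for o in objs)) if objs else set(all_attrs)

def extent(attrs):
    # objects possessing every attribute in the set
    return {o for o, a in context.items() if attrs <= a}

# A concept is a pair whose extent and intent determine each other;
# closing every object subset enumerates all of them.
concepts = set()
for r in range(len(context) + 1):
    for objs in combinations(context, r):
        i = frozenset(intent(set(objs)))
        concepts.add((frozenset(extent(i)), i))
```

This yields exactly the four concepts C0-C3 of Fig. 5, including the empty-extent concept C0 that the text notes is meaningless under relatedness.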
For example, the concept C1 denotes the observa- 
tion: a particular ('the') object ('girl') has a prop- 
erty ('happy') and is the AGENT participant of the 
action 'buy(PAST)'. The concept C3 denotes the 
observation that 'girl' and 'flower(s)' are related to 
the action 'buy(PAST)'. It is interesting that these 
concepts correspond to the focus of WH-questions: 
(C1) Who bought (something)? (C3) What hap- 
pened? This suggests a potential correspondence 
between the linguistic device of question-formation 
and the information reflected in the lattice. More- 
over, it exemplifies the postulate (Sarbo, 1997) that 
a concept lattice is an appropriate representation for 
what is referred to in artificial intelligence as 'coop- 
erative communication' (Grice, 1975). 
In addition to the individual concepts, a 
(sub)lattice also has information content, compara- 
ble to that of a clause. In this example, C0-C1-C2- 
C3 represents the clause. The use of sublattices is 
potentially relevant for interpreting discourse rela- 
tions. 
7 Summary and conclusions 
In this paper we have focused on the underlying 
principles of hierarchical structure in language. We 
have discussed the theoretical foundations of Nat- 
ural Language Concept Analysis. We have shown 
that hierarchical structure, which is commonly taken 
as given in linguistics, emerges as the result of 
more abstract principles relating to the combinato- 
rial properties of linguistic units of different types. 
These properties derive from the inherent charac- 
teristics of lexical items and the different linguistic 
relations that they can take part in. The major 
relation, predication, has a symmetrical instantia- 
tion (predicate-participants) and an asymmetrical 
instantiation (predicate/participant-modifier). The 
minor relation distinguishes between semantic core 
and a qualification of that core. The application 
of NLCA was illustrated on the basis of examples. 
Current research on NLCA focuses on an elabora- 
tion of the linguistic and philosophical foundations 
on the one hand and algorithmic implementation on 
the other. 
Acknowledgement 
We are grateful to J6zsef Farkas for his pioneer- 
ing work and inspiration in the initial stages of this 
project. 

References 
Jan Aarts. 1991. Intuition-based and observation- 
based grammars. In K. Aijmer and B. Altenberg, 
editors, English Corpus Linguistics, pages 44-62.
Longman, London and New York. 
Robert C. Berwick and Samuel D. Epstein. 1995. 
On the convergence of 'minimalist' syntax and 
categorial grammar. In A. Nijholt, G. Scollo, 
and R. Steetskamp, editors, Algebraic Methods in 
Language Processing (TWLT 10), pages 143-148,
Universiteit Twente, Enschede. 
Helmut Franzen. 1983. Compiler generation: 
From compiler descriptions to efficient compilers. 
Bericht nr. 83-20, Technische Universität Berlin,
May. 
Norman M. Fraser. 1996. Dependency Grammar. In 
K. Brown and J. Miller, editors, Concise Encyclo- 
pedia of Syntactic Theories, pages 71-75, Perga- 
mon, Oxford etc. 
Herbert P. Grice. 1975. Logic and conversation. In 
P. Cole and J. Morgan, editors, Syntax and Se-
mantics: Speech Acts, volume 3, pages 41-58, Aca-
demic Press, New York. 
Liliane Haegeman. 1991. Introduction to Govern- 
ment and Binding Theory. Basil Blackwell, Inc., 
Cambridge, MA. 
Nathan Houser and Christian Kloesel, editors. 1992. 
The Essential Peirce: Selected Philosophical Writ- 
ings (1867-1893). Indiana University Press, 
Bloomington. 
Joachim Lambek. 1988. Categorial and Categori- 
cal Grammars. In R.T. Oehrle, E. Bach, and 
D. Wheeler, editors, Categorial Grammars and 
Natural Language Structures, D. Reidel Publish- 
ing Company, Dordrecht-Boston. 
Fritz Lehmann and Rudolf Wille. 1995. A triadic 
approach to formal concept analysis. In G. El- 
lis, R. Levinson, W. Rich, and J.F. Sowa, edi- 
tors, Third Int. Conf. on Conceptual Structures,
ICCS'95. Springer-Verlag.
James J. Liszka. 1996. A General Introduction to
the Semeiotic of Charles Sanders Peirce. Indiana
University Press, Bloomington and Indianapolis. 
Charles S. Peirce. 1931-35. Collected Papers. Har- 
vard University Press, Cambridge. 
Carl Pollard and Ivan A. Sag. 1994. Head-driven 
Phrase Structure Grammar. The University of 
Chicago Press, Chicago.
Janos J. Sarbo. 1997. Building Sub-Knowledge 
Bases Using Concept Lattices. The Computer 
Journal, volume 39, no. 10, pages 868-875, Ox- 
ford University Press. 
Janos J. Sarbo and Jozsef I. Farkas. 1995. Know- 
ledge representation and acquisition by concept 
lattices. In Shaul Markovitch, editor, Proc. of the
11th Israeli Symposium on Artificial Intelligence
(ISAI'95), Hebrew University of Jerusalem, Israel.
John F. Sowa. 1996. Processes and participants. In 
P.W. Eklund, Gerard Ellis, and Graham Mann, 
editors, Conceptual Structures: Knowledge Repre- 
sentation as Interlingua (ICCS'96), volume 1115, 
pages 1-22, Springer-Verlag. 
Robert D. Van Valin, Jr. 1993. A synopsis of
Role and Reference Grammar. In Van Valin, ed-
itor, Advances in Role and Reference Grammar,
pages 1-164, John Benjamins Publishing Com- 
pany, Amsterdam-Philadelphia. 
Rudolf Wille. 1982. Restructuring lattice theory:
an approach based on hierarchies of concepts. In 
I. Rival, editor, Ordered sets, pages 445-470. D. 
Reidel Publishing Company, Dordrecht-Boston. 
