IMPLEMENTING SYSTEMIC CLASSIFICATION BY UNIFICATION 
C. S. Mellish 
School of Cognitive Sciences, University of Sussex 1 
The "system networks" of Systemic Grammar provide 
a notation for declaring how combinations of properties 
may imply or be inconsistent with other combinations. 
Partial information about a linguistic entity can be 
recorded as a set of known properties, and a system 
network then enables one to infer which other proper- 
ties follow from this and which other properties are 
incompatible with this. The possible descriptions al- 
lowed by a system network are partially ordered by the 
relationship of subsumption, where a description sub- 
sumes any description that is more specific than it, 
given the background constraints declared by the net- 
work. Given this partial ordering, the set of descriptions 
can be seen as forming a lattice with least upper bound 
and greatest lower bound operations. In a class of 
applications (such as parsing and generation) that re- 
quire incremental description refinement, we are only 
really interested in forming new conjunctions (greatest 
lower bounds) and testing subsumption relationships. 
If one factors out the complexity of variable renam- 
ing and introduces special top and bottom elements, the 
set of logical terms also forms a lattice (the lattice of 
Generalised Atomic Formulae -"GAF lattice") under 
the partial ordering relation "is equally or more instan- 
tiated than" (Reynolds (1970)). In this lattice, the great- 
est lower bound operation is unification (Robinson 
(1965)). Unification is a primitive operation in most 
logic programming systems and is also the basis of 
various grammatical formalisms. It is therefore a rela- 
tively well understood operation and can be efficiently 
implemented. 
In this paper, we investigate to what extent it is 
possible to find structure-preserving mappings from the 
description spaces defined by system networks to su- 
blattices of the GAF lattice. Where this is possible, we 
can use a fixed mapping from property names to logical 
terms to create terms that represent conjunctive de- 
scriptions (by unification) and to test subsumption (by 
testing "less instantiated than"). Incompatibility of 
descriptions is also indicated by unification failure. 
There are a number of reasons why it is interesting to 
investigate these possibilities: 
(I) They may result in more efficient or flexible ways of 
using system networks for inferencing. 
(2) They may enable language analysers or generators 
which involve unification for other reasons (e.g., 
analysers for GPSG (Gazdar et al. (1985)) or Func- 
tional Unification Grammar (Kay (1984)) to build in 
feature co-occurrence tests using the same mecha- 
nism. 
(3) They may enable us to make formal sense of various 
ad-hoc mechanisms used by logic programmers in 
natural language processing. 
(4) By exposing the nature of the relevant description 
spaces, they may open various possibilities for the 
implementation of other classification tasks, e.g., 
concept learning (Mellish forthcoming). 
(5) They may give us more insight into the semantics of 
system networks and the potential of unification. 
1 HALLIDAY'S SYSTEM NETWORKS 
System networks, as used in Halliday's Systemic Gram- 
mar (Hudson (1971), Kress (1976), Winograd (1983)) are 
a way of encoding the choices that must be made in the 
generation of a complex linguistic object and the inter- 
dependencies between them. There is actually nothing 
that makes such networks specific to linguistic applica- 
tions, and so there is no reason why they cannot be 
applied to describing the choices involved in other 
complex situations. 
A system network can be viewed as a graph, some of 
whose nodes are annotated with symbols representing 
properties. The nodes are tied together by the use of 
four different "connectives", which we shall designate 
by "1", "{", "}" and "\]". In order to be precise about 
Copyright 1988 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided 
that the copies are not made for direct commercial advantage and the CL reference and this copyright notice are included on the first page. To 
copy otherwise, or to republish, requires a fee and/or specific permission. 
0362-613X/88/010040-51503.00 
40 Computational Linguistics Volume 14, Number 1, Winter 1988 
C.S. Mellish Implementing Systemic Classification by Unification 
gender 
mA~uline 
feminine 
neuter 
exactly what system networks mean, we will present a 
logical interpretation, where each appearance of a 
"connective" in a network gives rise to a set of logical 
axioms relating the property symbols (interpreted as 
unary predicates) appearing with it. 
A fundamental concept in system networks is that of 
the choice system. A choice system indicates that, if a 
certain "entry condition" holds, then the object de- 
scribed must have exactly one of the properties men- 
tioned in the system. Choice systems are denoted by 
use of the "1 .... connective". Thus, Figure 1 indicates 
that masculine, feminine and neuter are mutually exclu- 
sive and whenever an object has a gender it has one of 
these. In logic, 
V x: AMO {feminine(x) masculine(x) neuter(x)} 
V x: gender(x) =- feminine(x) v masculine(x) v 
neuter(x) 
where AMO ("at most one of") is defined by: 
AM0 S = =/~ 7(s I & $2) 
sl,s 2 E S 
SI ~ $2 
Incidentally, an alternative reading that might suggest 
itself, namely: 
V x: gender(x) =- (AMO {feminine(x) masculine(x) 
neuter(x)} & 
feminine(x) v masculine(x) v 
neuter(x)) 
is not adequate, because it allows spurious models, for 
instance where there is an object "a" which satisfies 
"feminine(a)" and "masculine(a)" but not "gender 
(a)". 
Sometimes more than one choice will be relevant, 
given the same entry conditions. This is indicated by the 
"{ .... connective". For instance, as indicated in Figure 
2, in some languages a noun may be either singular or 
plural, and also either masculine or feminine. Instances 
of the "{" connective can be translated into logic by 
noull 
f 
ma~uline 
feminine 
singular 
plural 
simply treating the entry condition of the "{" as that of 
all the networks introduced on the right hand side. 
Thus: 
V x: AMO {feminine(x) masculine(x)} 
V x: noun(x) -= feminine(x) v masculine(x) 
V x: AMO {singular(x) plural(x)} 
Vx: noun(x) -= singular(x) v plural(x) 
The final two connectives concern complex entry con- 
ditions into networks. A conjunctive entry condition is 
denoted by "}", as shown in Figure 3. This means 
sing 
geadex 
simply: 
V x: third(x) & sing(x) -- gender(x) 
Finally, "\]" introduces a disjunctive entry condition, so 
that the example provided in Figure 4 
persomd 
demonstr 
number 
means: 
V x: personal(x) v demonstr(x) ~ number(x) 
By convention, uses of the four "connectives" can be 
connected together in any way, as long as "loops" are 
not created. That is, if one regards each connective as a 
set of arcs going from properties on its "left" to 
properties on its "right", the resulting directed graph 
must be acyclic. 
As a larger example, Figure 5 depicts the system 
network for English pronouns presented in Winograd 
(1983)). Here is a logical translation of selected parts. 
V x: AMO {first(x) second(x) third(x)} 
V x: person(x) --- first(x) v second(x) v third(x) 
V x: AMO {singular(x) plural(x)} 
V x: numb(x) = singular(x) v plural(x) 
V x: third(x) & singular(x) -= gender(x) 
It is important to note that in this paper we consider 
system networks as a self-contained notation for de- 
scribing certain types of choices ("systemic choices") 
that are available in the construction of a complex 
(linguistic) object. We will be completely ignoring the 
philosophical differences between Systemic Grammar 
and other forms of generative grammar, and we will also 
completely ignore the other components that are re- 
quired in a full Systemic Grammar, such as realisation 
rules. 
Computational Linguistics Volume 14, Number 1, Winter 1988 
t 
41 
C.S. Mellish Implementing Systemic Classification by Unification 
question 
personal 
f 
f 
f 
persorl 
ease 
animate 
inanimate 
subjective 
objective 
reflexive 
possessive 
possdet 
numb 
first 
second 
third 
singular 
plural 
~ gender feminine masculine 
neuter 
demonsWativ, 
nes.T 
fax 
2 SUBSUMPTION AND THE LATTICE OF DESCRIPTIONS 
The property symbols in a system network provide a 
basic vocabulary out of which descriptions can be built. 
The most obvious way to produce more complex de- 
scriptions is by conjunction and disjunction. The logical 
interpretation of such complex descriptions is straight- 
forward, and we will often blur the distinction between 
a description and its interpretation. Thus: 
masculine & singular corresponds to 
Ax. masculine(x) & singular(x) 
masculine v feminine corresponds to 
Ax. masculine(x) v feminine(x) 
A fundamental relationship between descriptions is 
subsumption: 
d I subsumes d 2 iff ~ ~ V x: d2(x) D dl(x) 
where E is the set of axioms derived from the network. 
Note that our notion of subsumption depends vitally on 
~. This is a special case of what Buntine (1986) calls 
"generalised subsumption". Intuitively d~ subsumes d2 
if, given the axioms E, d~ is a more general description 
than d2. That is, if dl describes all the objects accurately 
described by d2 and maybe more. Subsumption is a 
partial ordering on descriptions, and the set of possible 
descriptions (properties and all possible finite conjunc- 
42 
tions and disjunctions of descriptions made from them j, 
ordered by subsumption, forms a lattice. In this lattice, 
the least upper bound of two descriptions is their 
disjunction and the greatest lower bound is their con- 
junction. Figure 6 is a picture of a portion of the lattice 
consisting of the descriptions derived from the pronoun 
network. In the picture, if there is a line going upwards 
from description d2 to description dl then dl subsumes 
d 2. Two descriptions that are logically equivalent (e.g., 
"personal&singular" is the same as "singular&case") 
give rise to a single node in the diagram (technically, we 
are interested in the quotient lattice of the free lattice 
generated by the property symbols, with respect to the 
congruence relation defined by E). To find the node for 
the conjunction of two descriptions, one finds the 
highest node that is "below" both, i.e., the greatest 
lower bound. Similarly, to find the node for the disjunc- 
tion of two descriptions, one finds the lowest node 
which is "above" both. 
3 INCREMENTAL DESCRIPTION REFINEMENT 
The previous two sections introduced a simple language 
of descriptions and the use of system networks to 
express extra background information about (con- 
straints on) the terms appearing in those descriptions. A 
notion of subsumption was defined which allowed this 
Computational Linguistics Volume 14, Number 1, Winter 1988 
, ( 
C.S. Mellish Implementing Systemic Classification by Unification 
personal singular 
third perso 1 & 
gender 
J 
neuter 
background information to be taken into account. But 
what are the operations that we need to carry out on 
descriptions in practical natural language processing 
systems, and does the structure we have described 
support these? 
In this paper, we will concentrate especially on a 
process that seems to arise in a number of contexts in 
natural language processing--incremental description 
refinement. Incremental description refinement (IDR) 
takes place when. a target description is gradually being 
built of some individual, and information about this 
individual appears as a sequence of self-contained, 
independent data descriptions. For instance, the target 
could be the description of an English sentence and the 
data descriptions partial descriptions of this sentence 
like: 
the sentence is passive 
the sentence is declarative 
the agent of the sentence is the speaker 
At any point in an IDR process, the information that has 
accumulated so far may allow certain properties of the 
individual to be inferred, and so one would like to be 
able to interrogate the partial description that has been 
built. In particular, one would like to be able to answer 
questions about which descriptions are compatible and 
incompatible with the target description. To build an 
effective IDR system, one must have a way to represent 
the conjunction of an arbitrary set of pieces of informa- 
tion so that inconsistency and subsumption relation- 
ships with other descriptions can be easily detected. 
The term "incremental description refinement" was, 
we believe, originally coined by Bobrow and Webber 
(1980), but the notion of incrementally building descrip- 
tions has been influential in a number of AI projects. 
In natural language processing, IDR is relevant to 
both natural language parsing and generation. In pars- 
ing, it is natural to accumulate information about the 
structure of a phrase gradually as words are read. For 
instance, in the sentence 
The hairy sheep was... 
Computational'Linguistics Volume 14, Number 1, Winter 1988 
we know after reading the first three words that the 
gender of the subject noun phrase is "neuter" and after 
the next word we know that the number of that phrase 
is "singular". It is important in parsing that we be able 
to accumulate pieces of information of this kind and 
detect inconsistencies if they arise. In generation, it is 
natural to want to allow different semantic and prag- 
matic factors to provide separate constraints on a 
sentence to be generated. For instance, one pragmatic 
goal may force a sentence to be passive; another forces 
it to have a given surface subject. This conjunction of 
constraints may be inconsistent with certain choices of 
the main verb (e.g., "buy" vs. "sell"). Again there is a 
need to reason about partial descriptions that are built 
incrementally. 
In formal terms, the operations involved in IDR are 
simple. At any point, the information known about the 
target description can be represented without loss by a 
single "partial description--the least upper bound of all 
the descriptions the target could be. Initially this is 
simply the most general description of all ("true"). 
When a new data description appears, the partial de- 
scription is replaced by the greatest lower bound of it 
and the data description. This "algorithm" for IDR is in 
fact a special case of a more general classification 
algorithm given in Mellish (forthcoming). At any point, 
a contradiction is signaled by the partial description 
becoming the most specific description of all ("false"). 
Moreover, one can validly infer that the target is 
subsumed by a given description if the partial descrip- 
tion is. The only operations that we need for IDR are 
subsumption checking and the computation of greatest 
lower bounds. This means that, in fact, we do not need 
the full lattice structure developed abovenall we need 
is the meet semi-lattice (Birkhoff (1963)) that contains 
the possible data descriptions and all possible conjunc- 
tions of them. 
The above description of IDR is not dependent on 
descriptions being related to system networks, and 
indeed IDR has been used in quite different contexts. In 
this paper, however, we will confine ourselves to this 
case, and consider IDR where the data descriptions are 
precisely the properties mentioned in a system network. 
We are thus concerned with ways of computing and 
testing subsumption between conjunctions of proper- 
ties, given the background information provided by the 
network axioms. 
4 USING LOGICAL TERMS TO ENCODE SUBSUMPTION 
RELATIONSHIPS 
The set of terms used in logic is partially ordered by the 
relation ("at least as instantiated as") where: 
t 1 --< t 2 iff 
t 1 = t20 for some substitution 0 
Thus for instance, the following statements are true: 
43 
C.S. MeHish Implementing Systemic Classification by Unification 
fly,z) -< x 
f(a,x) --  ,z) 
f(f(x)) -< fly) 
Just as we formed a lattice from descriptions, collapsing 
together two descriptions that were logically equivalent, 
so we can form a lattice from logical terms, collapsing 
together terms which are "variants" (i.e., terms tl and 
t 2 such that t 1 -< t 2 and t 2 -< t 0. This lattice, with a 
necessarily slightly altered version of -< defined on it 
and with special "top" and "bottom" terms added, is 
the lattice of Generalised Atomic Formulae ("GAF 
lattice") discussed by Reynolds (1970). In this lattice, 
the greatest lower bound operation is unification Rob- 
inson (1965), although for various sublattices the actual 
operation may be simpler. 
The main point of this paper is to investigate when a 
structure-preserving mapping (l-l, 0-preserving meet 
homomorphism) can be found from the description 
lattice arising from a system network to the GAF lattice. 
Where this is possible, we can use the mapping from 
property names to logical terms to make terms that 
correspond to conjunctive descriptions (by unification) 
and to test subsumption (by testing "less instantiated 
than"). Incompatibility of descriptions is also indicated 
by unification failure. 
We will initially illustrate the idea of encoding prop- 
erties as logical terms by an example. Let us assume 
that we are given the above system network for English 
pronouns. We might come up with a mapping z from 
properties mentioned in the network to logical terms 
which includes the following assignments (variables 
whose names are of no importance in the logical terms 
are denoted here by the symbol "_"): 
~(animate) = pr(q(an),c(_),no) 
•(case) = pr(_,c(_),_) 
•(far) = pr(d(far),no,n(_, _)) 
"r(gender) = pr(p(3),c(_),n(s,_)) 
r(neuter) = pr(p(3),c(_),n(s,n)) 
~(numb) = pr( .... n(_,_)) 
r(personal) = pr(p(_),c(_),n(_,_)) 
"r(reflexive) = pr(_,c(refl),_) 
r(singular) = pr(_,_)) 
•(third) = pr(p(3),c(_),n(_,_)) 
This mapping is not purely random, but has been chosen 
so that the logical relationship of subsumption is 
"echoed" in the "degree of instantiation" of the terms. 
So the idea of the mapping r is to have: 
Pl subsumes P2 iff rfp 1) --- "r(p 2) 
Thus, for instance, "gender" subsumes "neuter" and is 
translated into a term which is almost identical, but 
which is slightly less instantiated. The situation is 
similar with "case" and "reflexive". On the other 
hand, the terms from "reflexive" and "third" do not 
disagree on any non-variable component but neither is 
more instantiated than the other. This reflects the fact 
that the two properties are compatible but neither is 
44 
more general than the other. In the case of the terms 
from "singular", "third" and "gender", the "gender" 
term is the least instantiated term that is more instanti- 
ated than each of the other two (it is the result of 
unifying the other two). This reflects the fact that 
"gender" is equivalent to the logical conjunction of 
"third" and "singular". 
Figure 7 shows a portion of the GAF lattice whose 
structure mirrors the part of the pronoun description 
space shown in section 2. In the GAF lattice, one object 
is below another if it is more instantiated (less free), and 
the greatest lower bound is computed by unification. If 
pr(pO,c(...),n(..,,_)) pr(...,_.n(s,_.)) 
pr(p(3),c(._),n(_,_)) pr(p~,c(_.),n(s,_.)) 
pr(p(3).¢O.n(s._)) 
pr(p(3 ),c(...),n(sza)) 
we can find a mapping such as r, we can use operations 
in the GAF lattice to solve problems in the original 
description lattice. For instance, if we have an object 
which is "neuter" we might be interested to see 
whether it can also be "far". This is not possible, and is 
indicated by the fact that the two terms 
pr(p(3),c(_),n(s,n)) 
pr(d(far),no,n(_,_)) 
(from "neuter") 
(from "far") 
do not unify (the conjunction is the "bottom", or 
"false" property). Similarly, we can establish that if an 
object is "singular" and "far" then it cannot be "an- 
imate". For, unifying the terms for "singular" and 
"far" we get: 
pr(d(far),no,n(s,_)) 
which does not unify with the term for "animate". 
Using r, we can also verify that "reflexive&gender" 
definitely implies "personal", that "singular" is com- 
patible with "reflexive", and so on. It is worth noting 
that these inferences involving compatibility and prop- 
erties that are not "maximally delicate" go beyond 
those allowed by previous systems such as Patten (1986). 
This seems to work well, but is there a principled 
way to produce such mappings into the GAF lattice that 
is guaranteed to yield correct results? For correctness, 
Computational Linguistics Volume 14, Number 1, Winter 1988 
C.S. Mellish Implementing Systemic Classification by Unification 
we need the following to be true for all possible prop- 
erties p~, P2 .... Pn+l named in the system network: 
~VX: Pl(X) & P2(X) &...& pn(X) ~ Pn+l(X) 
iff 74pl) II r(p2) II . . 7(p n) -< ~'(Pn+l) 
E D Vx: p~(x) &p2(x) &...& p~(x) = F 
iff {'r(pl)rfp2 ) . . ~(p~)} is not unifiable 
where ~ is the set of logical axioms derived from the 
network and II is unification (greatest lower bound in 
the GAF lattice). A sufficient condition for this would 
be for ~- to be a 1-1, 0-preserving meet-homomorphism 
(Birkhoff (1963)). 
In fact, the above mapping does not echo in the GAF 
lattice the result: 
D V x: case(x) & numb(x) ~ personal(x) 
and cannot be straightforwardly extended to do so. We 
thus need to investigate under what conditions such 
mappings exist and what algorithms might enable us to 
discover them. 
5 A "BRUTE FORCE ~ TRANSLATION 
That such mappings always exist, and that inefficient 
algorithms exist for discovering them, is demonstrated 
by a more general result. If we have a finite set of 
propositional symbols and an arbitrary set of axioms 
mentioning these symbols then there is a method of 
encoding those symbols as logical terms so that con- 
junctions are computed by unification and subsumption 
between conjunctive descriptions (relative to the axi- 
oms) is reflected in "degree of instantiation". We can 
apply this result to the system networks domain be- 
cause the logical axioms, although not strictly proposi- 
tional, are equivalent to propositional axioms if we only 
consider one object being described at a time. 
The way to construct this encoding is to consider all 
possible models of the axioms and the truth assignments 
made to the symbols in them. In the case of the 
pronouns network, there are 54 models of the corre- 
sponding logical axioms. The following example shows 
the truth assignments made by one of them (this corre- 
sponds to the pronoun "he"). 
animate = F, case = T, demonstrative = F, far = F, 
feminine = F, first = F, gender = T, inanimate = F, 
masculine = T, near = F, neuter = F, numb = T, 
objective = F, person = T, personal = T, plural = F, 
possessive = F, possdet = F, pronoun = T, question = F, 
reflexive = F, second = F, singular = T, subjective = T, 
third = T 
If there are n possible models, each property is repre- 
sented by a logical term of arity n + 1. Each such term 
also has the same function symbol, the constant "0" as 
its first argument and the constant "1" as its last 
argument. The other arguments, for a given property, 
are to be derived as follows. Each argument starts off as 
a distinct variable, and then, for each model i which 
assigns "F" to the property, the ith argument is unified 
with the i + 1 st argument. In a situation where there are 
five possible models, the following example includes 
some of the terms that might result (again we use "_" 
for variables with uninteresting names). 
1 2 3 4 5 
I I I I I 
f(0 ..... X, X, 
fro, 0, _, x, x, 
f(0, X, X, X, 1, 
1) (property excludes only model 4) 
1) (property excludes models 1 and 4) 
1) (property excludes models 2, 3, 5) 
Each such term represents the models that the property 
excludes by the instantiation of its arguments. When 
two such terms are unified, the result is a term that 
encodes exclusion of the union of the models excluded 
by the terms individually. If the two terms together 
exclude all possible models, the unification fails (this 
occurs if we attempt to unify the last two of the above 
terms). This is as desired, since excluding all models 
amounts to incompatibility of descriptions. The tech- 
nique used here, which amounts to encoding sets as 
terms, was, we believe, first developed by Alain Col- 
merauer in another context. In fact, we can make do 
with only n arguments. 
In general, given a set of n property symbols, there 
can be as many as 2 n different models, and so this 
"brute force" approach is unattractive. In a situation 
where there are very few models of the logical axioms, 
however, the encoding scheme might be quite practical. 
For instance, for the pronoun network the number of 
models (54) is much smaller than the worst case for the 
same number of properties (22s). The theoretical maxi- 
mum number of models is reduced by a factor of 
roughly 2r/r for every "1" connective with r symbols on 
its right appearing in the system network. Moreover, 
the theoretical maximum is reduced by a factor of 
roughly 2 for each property appearing on the right hand 
side of a "{". The encoding technique is analogous to 
using bit strings to encode sets, the only advantage 
being that a contradictory conjunction is flagged imme- 
diately by a unification failure. On the other hand, the 
terms make use of repeated variables, and so the 
potential for exploiting parallelism in the unifications is 
restricted. 
System networks do not, however, correspond to 
arbitrary sets of logical axioms. Only certain kinds of 
axioms can come out of a system network (although a 
useful characterisation eludes us). We might therefore 
hope that, given the extra restrictions, there are better 
encoding techniques available, in particular encoding 
techniques that do not require the use of repeated 
variables. We will therefore investigate whether struc- 
ture-preserving mappings can be found from systemic 
descriptions to elements of the lattice of generalised 
atomic formulae without repeated variables, which we 
call GAF o. 
Computational Linguistics Volume 14, Number I, Winter 1988 45 
C.S. MeUish Implementing Systemic Classification by Unification 
6 ENCODING FOR THE CONNECTIVES "1~ AND "{~ 
If the only connective used in a system network is "1", 
a particularly simple encoding scheme is feasible. This 
technique has been used in an ad hoc way in a number 
of language processing systems written in Prolog, and 
probably originates from Dahl (1977). Consider the 
example network depicted in Figure 8, expressing the 
top levels of classification normally used for the plant 
kingdom. For such a simple network, we can use 
function symbols more than one argument. Such a 
system has been used by McCord (1986). Figure 9 
depicts a version of the "verb" network used by 
Winograd (1972), simplified to use only "1" and "{". 
One possible translation from the above property sym- 
bols into terms includes the following: 
,(vb) = vb(_,_) 
,(vprt) = vb(vprt,_) 
,(aux) = vb(aux( .... _),_.) 
plant 
v~u~ 
no~v~u~ 
psilopsidan 
sphenopsidan 
lycopsidan 
pteropsidan 
bacterial 
fungal 
algal 
bryophyte 
filicinal 
gymnos~rmA1 
angiospermal 
nesting as a means of capturing the subsumption rela- 
tionships in logical terms. Thus each property, apart 
from properties at the "leaves" of the tree, is associated 
with a unary function symbol, and the argument of a 
function symbol is used for a more "fine grained" 
description if that is available: 
,(plant) = plant(_) 
,(vascular) = plant(vascular(_)) 
,(pteropsidan) = plant(vascular(pteropsidan(_))) 
,(angiospermal) = plant(vascular(pteropsidan(angio- 
spermal))) 
Where there are alternatives at a given level in the 
classification, these are indicated simply by different 
function symbols appearing in the relevant argument 
positions. Thus, for instance: 
*(non_vascular) = plant(non_vascular(_)) 
*(sphenopsidan) = plant(vascular(sphenopsidan)) 
*(filicinal) = plant(vascular(pteropsidan(fdicinal))) 
*(fungal) = plant(non_vascular(fungal)) 
If the "{" connective is also allowed in system net- 
works, this means that a given property may have 
refinements along several independent dimensions. In 
the logical terms, this can be allowed for by giving the 
46 
*(neg) = vb(aux(neg,_,_),_) 
*(be) = vb(aux( .... be),_) 
*(pres) = vb(_,pres) 
Notice how the "aux" function symbol has arity 3; this 
corresponds to the three independent sub-classifica- 
tions introduced by the "{" connective. Similarly, "vb" 
has arity 2. The translation from symbols to terms is 
again fairly straightforward. 
Instead of going into detail about the relatively sim- 
ple problems of dealing with system networks contain- 
ing only "L" and "{" connectives, we shall concentrate 
on an algorithm which also allows "}" connectives. 
This algorithm was used in essence by Bundy et al. 
(1982), although it has not been described before. 
7 INTRODUCING "}~ 
In this section we describe a generalised translation 
algorithm which associates elements of GAF o with 
properties displayed in a system network. The algo- 
rithm works for networks using the connectives "l", 
"{" and "}", but does not handle networks that use it\],,, 
For this algorithm, we require in advance a function 
A which associates with each node n of the system 
Computational Linguistics Volume 14, Number 1, Winter 1988 
C.S. Mellish Implementing Systemic Classification by Unification 
vb 
J 
aux 
vprt 
no__vprt 
pres 
past 
en 
inf 
f 
neg 
pos 
qaux 
no.._qaux 
be 
do 
have 
will 
modal 
network (a possibly labeled location corresponding to 
one or more "ends" of connectives) a function symbol 
A (n). A is required to be I-1, except that it should map 
all the nodes appearing around a "{" connective to the 
same function symbol. Such a function is easy to define 
for a given network; for instance, most nodes can be 
simply mapped to the names associated with them in the 
network. The translation algorithm associates with each 
node of the network a pair iC, P). The values for C and 
P are: 
C = a set of "constraints" describing the term 
P = a "path" indicating a (possibly embedded) com- 
ponent of the term which can become further instan- 
tiated for terms representing properties subsumed 
by the current property 
A path is simply a sequence of alternating function 
symbols and numbers, either empty or starting with a 
function symbol, indicating a specific position in a term 
and the function symbols that appear on the route from 
the outside of the term to this position. A symbol in the 
sequence indicates a function symbol that is present, 
whereas a number selects one of the argument positions 
of the last function symbol. For instance, in 
f(a, g(X,h(b),i(d,e j(c)))) 
the symbol "c" appears at the place indicated by the 
path (f,2,g,3,i,3,j,1) and the symbol "b" at the place 
indicated by (f,2,g,2,h, 1). We will make use of two basic 
types of extensions of a path P: 
an extension of P to f, for some function symbol f: 
If P is 0 then {f) 
Otherwise if P is (p~, P2 .... i) for some 
number i 
then (Pl, P2 .... i,f) 
Otherwise if P is (Pl, P2 .... s) for some 
symbol s 
then iP~, P2 .... s,i,f) for some number i 
an extension of P beyond f, for some function 
symbol f: 
If P is i) then if, i) for some number i 
Otherwise if P is iPl, P2 .... i) for some 
number i 
then iPl, P2 .... i,f,j) for some number j 
Otherwise if P is (Pl, P2 .... s) for some 
symbol s 
then (Pl, P2 .... s,i) for some number i 
For instance, for the path if,2,g), two possible exten- 
sions to h are if,2,g,33,h) and if,2,g, 1,h). For the path 
if, l,g,2) two possible extensions beyond h are 
if, 1,g,2,h,4) and if, 1 ,g,2,h,6). Finally, we will have two 
ways of describing paths that differ, according to the 
type of the first component where they disagree: 
PI and P2 are independent iff their first 
disagreement is on a number 
P1 and P2 are inconsistent iff their first 
disagreement is on a function symbol 
Thus (f,2,g,3,h) is independent of if,2,g,4,d,3), and 
if,2,g,5,a) is inconsistent with if,2,h,4). 
A constraint is a path used to specify that particular 
function symbols must appear at particular places in a 
term. A term is correctly described by a constraint if the 
path makes sense (i.e., all the relevant components 
exist) and the given function symbols do indeed appear 
at the relevant places. A term is correctly described by 
a set of constraints if it is correctly described by all of 
them. Thus, for instance, the set of constraints: 
{ ig, 1 ,f,2,a), (g, 1,f,3,b), (g, 1 ,i'), ig) } 
correctly describes any of the terms: 
g(f(z,a,b(c)),h(1)) 
g(f(w(x),a(d,f,g),b,x)) 
g(f(w(x),a(g),b),x) 
The term that is the translation of a node in a system 
network is obtained from the set of constraints as 
follows. First of all, each function symbol (element of 
the range of A) is taken to have the same arity (number 
of arguments) in all the terms derived from the network. 
This is the minimal arity such that all the constraints 
attached to all the nodes are satisfiable. Secondly, with 
Computational Linguistics Volume 14, Number 1, Winter 1988 47 
C.S. Mellish Implementing Systemic Classification by Unification 
arities fixed in this way, the translation of a property is 
the most general (least instantiated) term that is cor- 
rectly described by the constraints at the property's 
node. Thus, for example, if the arities of a, b, f and g 
were 1, 0, 3 and 2 respectively (note that the arity of f 
must be at least 3), the following term would be the 
"solution" of the above set of constraints: 
g(f(_,a(_),b),_) 
The following defines a space of possible translation 
algorithms. For each place where alternatives are al- 
lowed, it does not matter which the algorithm specifies. 
(1) Record ({(f)},(f)) as the value for the leftmost 
node, where f is the value of A for that node 
(2) Until there is no connective all of whose left nodes 
have translations but none of whose right nodes 
do, do the following: 
(2.1) Select one such connective 
(2.2) If the connective is "l" and the left hand node has 
translation (C,P), assign to the ith node ni on the 
right hand side the translation: 
(C U to,}, p,~ 
where, for each i, Pi is an extension of P to A (ni), 
and where Pi is inconsistent with pj if i ~k j. 
If the connective is "{" and the left hand node L 
has translation (C, P), assign to the ith node on the 
right hand side the translation: 
(c, pi) 
where, for each i, Pi is an extension of P beyond A 
(L) and where Pi and pj are independent if i ~ j. 
If the connective is "}" and the two left hand 
nodes have translations (C 1, P1) and (C 2, P2), 
assign to the node on the right hand side the 
translation: 
(C 1 U Cz, Pt ) 
with either i = 1 or i =.2. 
The basic idea is that as one goes to the right in the 
network, the constraints on a node are the constraints 
on the nodes to its immediate left, together with possi- 
bly extra constraints to differentiate it from them (if 
they properly subsume it). At each point, the path 
component of a node's translation indicates where 
nodes further to the right can be further instantiated. If 
the connective is "l", the extra constraints added for 
each node on the right amount to forcing the place 
indicated by the path to have the function symbol 
associated with that node. Since the paths used for the 
right hand nodes are inconsistent, the terms generated 
for the nodes will be incompatible. If the connective is 
"{", the nodes on the right hand side must have the 
same constraints as the node on the left. On the other 
hand, each is given a different, slightly longer, path so 
that the term can be further instantiated in several 
independent ways. 
The above algorithm, although not optimal in the 
48 
sense of generating the smallest possible terms, is the 
basis of one that has generated correct results in prac- 
tice. In addition, we have a proof of correctness for it. 
The proof hinges on the reduction of the correctness 
criterion given in section 4 to the following simpler one. 
The translation z is correct if for all properties P l, P2 and 
P3 named in the network: 
~ V x: pl(x) D P2(X) 
iff ~Pl) -< "r(P2) 
~ V X: pl(x) D P2(X) 
iff "Z ~ P2 = P4&P5 for some P4,P5 in the network 
with P2, P4 and P5 all logically distinct 
or A(p2) appears in -r(p 0 
~ V X: pl(x) t~ P2(X) = F 
iff {'r(pl)~P2) } is not Unifiable 
~ V x: pl(x) & pE(x) = p3(x) 
iff ~(Pl) II r(p2) = ~'(P3) 
for some function A is introduced above. The reduction 
can be made because of various special properties that 
hold of system networks that only use "l", "{" and "}": 
If ~ ~ V x: p1(x) = p2(x)&P3(X), for properties Pl, 
P2 and P3 mentioned in the network and logically 
distinct, then p~ is equivalent to a property on the 
right hand side of a "}" connective, and Pz and P3 
are equivalent to the properties appearing on the 
left hand side. 
If ~ ~ V x: pl(x)&p2(x)&...pn(X) D p(x), for 
properties pl .... Pn, P mentioned in the network, 
then either Z ~ V x: p/(x) D p(x) for some i, or p 
is equivalent to a property appearing on the right of a ~'}". 
If P is a set of properties mentioned in the network 
and for every "t" connective either no elements of 
P lie to the right of that connective or all elements 
lie to the right of the same right hand branch, then 
the set P is compatible. 
None of the schemes in this section or the previous 
section require repeated variables in the logical terms, 
and so there is nothing to prevent unification tackling 
different components in parallel. Of course, if one had a 
great deal of parallelism available, one could use it to 
implement a naive network searcher for testing compat- 
ibility of two properties (e.g., by searching for the 
rightmost node that appears to the left of both and 
seeing whether the connective at that point is "\[". The 
implementation via unification would then be unneces- 
sary (at least for simple cases). Thus to a certain extent 
the "compilation" of tasks like compatibility- and sub- 
sumption-checking into unification tasks can be viewed 
as a special case of compiling OR-parallel programs into 
AND-parallel programs (Codish and Shapiro (1986), 
Ueda (1986)). The checking can be even faster if the 
unifications are further compiled (Warren (1977)). For 
instance, in the verb network, checking that a descrip- 
Computational Linguistics Volume 14, Number 1, Winter 1988 
C.S. Mellish Implementing Systemic Classification by Unification 
tion of a verb is compatible with "pres" simply amounts 
to seeing whether the third component of the term is not 
a function symbol different from "pres". 
8 INTRODUCING "\]" 
We have intentionally left consideration of the "\]" 
connective to its own section. This is because we have 
no general algorithm (apart from the above "brute 
force" algorithm) for translating system networks with 
"\]" connectives into logical terms in a way that will 
correctly mirror the subsumption relationships. Indeed, 
there is no reason to believe that there are straightfor- 
ward algorithms for this. 
First of all, we can easily demonstrate through an 
example that networks containing "\]"s cannot be im- 
plemented using only GAF o. Thus, something like the 
full power of unification is necessary, rather than the 
simple subcases we have considered up to now. Con- 
sider the system network depicted in Figure 10. The 
entity 
f 
al 
a2 
bx 
b2 
subsumption relationships between properties and con- 
junctions of properties are shown in Figure I1. This 
entity 
al b2 ¢ 
al&b2 at&c bl&a2 b~a~ 
F 
diagram must be read with some care, as we have only 
shown the elements of the meet-semilattice consisting 
of the data elements and their conjunctions. This is all 
that we need for an IDR application, but it means that, 
although we can read off from the diagram information 
about greatest lower bounds, we cannot obtain from it 
information about least upper bounds. Now a~&c and 
b~&a 2 are incompatible, i.e., their greatest lower bound 
is "F". Therefore the terms representing these two 
descriptions cannot unify. If we are not allowing terms 
with repeated variables, the only way that two terms 
can fail to unify is by there being some component 
where the two. terms take on different non-variable 
values. Let at&c take the value "y" in this component 
and bl&a 2 take the value "n" (we do not have to 
assume that "y" and "n" are atomic, merely that their 
"top level" function symbols are different). We can 
then make various inferences about what other terms 
should have in this component: 
al&c - y (hypothesis) 
b1&a 2 - n (hypothesis) 
b~ -- since it is above al c and bltYga 2 
c -_ since it is above bl 
a 2 - n so as to make bl~:r.a 2 when unified with b 1 
b2tY£a 2 - n since it is below a 2 
a~ - y so to make a~&c when unified with c 
a~&b2 - y since it is below a~ 
b E -_ since it is above bEtY£a 2 and altY£b 2 
Now bE a2, as well as being the greatest lower bound 
of b E and a2, is also the greatest lower bound of b 2 and 
c. However, unifying the terms for b z and c, the above 
specifies that the particular component will have the 
value "_". This conflicts with the value already inferred 
for b2&a 2 -"n". Therefore, under the constraint of 
avoiding repeated variables, it is not possible to produce 
a valid mapping from descriptions to terms. 
Figure 12 shows the subsumption relationships again, 
with the descriptions substituted by logical terms ob- 
tained by the "brute force" method. Of course, re- 
peated variables have to be used several times. 
f(O ....... 1) 
f(0.0.0._.l) f(0.0.X.X.1) f(0 ..... 1.1) 
f(o,o.o.o.1) f(o.o.o.1,1) f(o.1.1.1.1) f(o.o.l.l.x) 
It is natural to ask what it is about descriptions 
arising from networks with "\]" that cannot be modelled 
Computational Linguistics Volume 14, Number 1, Winter 1988 49 
C.S. Mellish Implementing Systemic Classification by Unification 
in GAFo. There is at least one property of GAFo (and 
any sub-semilattice of it that contains the bottom ele- 
ment) which is not echoed in description spaces like our 
examples. Since descriptions from "\]" networks can 
violate this property, it follows that GAFo cannot be 
used accurately to model the description spaces. The 
particular property is that compatibility is pairwise 
determined: 
For all sets of elements {tl, t2 .... tn} 
{tl, t 2 .... tn} compatible 
iff t i is compatible with tj for each i, j. 
(we say that a set of objects is compatible if their 
greatest lower bound is not the "bottom" element). In 
our example description space, the set {a~, b2, c} is not 
compatible, and yet each pair of elements taken from it 
is compatible. Thus compatibility is not determined 
pairwise in the description space. 
It seems unlikely in this example that one could come 
up with an encoding in the full GAF that would improve 
on the "brute force" approach. Indeed, this may often 
be the case, as networks containing "\]"s are very 
complex to process. Consider, for instance, Figure 13 
which is an extension to the above network. 
entity 
f 
al 
a2 
am 
bx 
bz 
b,. 
In choosing logical terms for al, a2, • .am, b~, b 2, 
• .bn, one has to reflect the fact that 
c & a i implies bl (i = 1..m-l) 
c & b; implies am (i = 2..n) 
ai&bj&c is contradictory (i= 1..m-l,j= 2..n) 
This certainly prevents the sub-classifications repre- 
sented by the "a"s and the "b"s fi'om being translated 
independently, in the way that could be done in our 
portion of Winograd's verb network. 
9 RELATED WORK 
A piece of work that is closely related in many ways is 
Kasper's work (Kasper (1986)) on translating Systemic 
Grammars into functional unification grammars. Kasper 
describes a way of mapping a system network and 
feature choices into a functional description of FUG 
(Kay (1984)). Such a mapping induces a mapping from 
systemic descriptions to functional descriptions in such 
a way that conjunction of descriptions is computed by 
(functional) unification of the functional descriptions. 
Kasper's work represents an extension of ours in that it 
treats more than the systemic choices of a Systemic 
Grammar, and we are currently investigating how our 
framework can best be extended to deal with these 
other aspects. Unfortunately, the functional description 
that Kasper builds for anything but a complete (maxi- 
mally specific) systemic description will contain dis- 
junctions, and the complexity of functional unification 
is seriously affected by the presence of disjunctions. 
Moreover, Kasper actually needs to extend the notion 
of functional description in order to produce transla- 
tions for networks with complex entry conditions. Since 
logical unification corresponds closely to functional 
unification when there are no disjunctive descriptions or 
ordering patterns, our methods can be viewed as tech- 
niques for mapping systemic descriptions into disjunc- 
tion-free, conventional functional descriptions. 
Whereas Kasper's primary aim is for a mapping be- 
tween a systemic Grammar and a Functional Unifica- 
tion Grammar (with a mapping between individual 
descriptions induced by this), in our case there is only a 
description mapping. It is hard to believe that a unifi- 
cation grammar allowing exactly those functional de- 
scriptions corresponding to the terms generated by our 
"brute force" method would be of much interest. 
Another related piece of work is that of Gazdar et al., 
(forthcoming), which seeks to produce a uniform for- 
malism for describing legal categories in grammatical 
formalisms. In the case of Systemic Grammar, a "cat- 
egory" is a set of feature specifications. The work of 
Gazdar and his colleagues envisages an implementation 
of category conjunction by a kind of unification which is 
similar to logical unification, except that there are no 
repeated variables. As with Kasper's work, the ap- 
proach is to provide a translation for the grammar of 
legal descriptions. In this case, however, one product of 
the translation is a set of global constraints on legal 
descriptions. Unfortunately, unification does not pre- 
serve the validity of these constraints and is not there- 
fore a correct implementation of greatest lower bound in 
the subsumption lattice. Our approach can be roughly 
characterised as an attempt to find alternative transla- 
tions where unification is correct because the global 
constraints have been "built in". To do this, unfortu- 
nately, we need to allow repeated variables. 
ACKNOWLEDGEMENTS 
This work was carried out as part of a project on natural 
language generation financed by SERC grant GR/D/ 
50 ComputationaR Linguistics Volume 14, Number 1, Winter 1988 
C.S. Meilish Implementing Systemic Classification by Unification 
08876. I am currently supported by an SERC Advanced 
Fellowship. 

REFERENCES 
Birkhoff, Garrett. 1963 Lattice Theory. American Mathematical So- 
ciety. 
Bundy, Alan; Byrd, Lawrence and Mellish, Chris. 1982 Special 
Purpose, but Domain Independent, Inference Mechanisms. In 
Steels, Luc and Campbell, John, Eds., Progress in Artificial 
Intelligence. John Wiley, Chichester, UK: 93-111. 
Buntine, Wray. 1986 Generalised Subsumption and its Applications to 
Induction and Redundancy. In Steels, Luc and DuBoulay, Ben, 
Eds., Procs of the Seventh European Conference on Artificial 
Intelligence. Brighton, 1986. 
Codish, Michael and Shapiro, Ehud. 1986 Compiling OR-Parallelism 
into AND-Parallelism. In Shapiro, Ehud, Ed., Procs of the Third 
International Conference on Logic Programming. Springer Ver- 
lag. 
Dahl, Veronica. 1977 Un Systeme Deductif d'Interrogation de Ban- 
ques de Donnes en Espagnol. PhD dissertation, Groupe d'Intelli- 
gence Artificielle, University of Marseille-Luminy. 
Gazdar, Gerald; Klein, Ewan;Pullum, Geoffrey and Sag, Ivan. 1985 
Generalised Phrase Structure Grammar. Blackwell. 
Gazdar, Gerald; Pullum, Geoffrey; Carpenter, Robert; Klein, Ewan; 
Hukari, Thomas and Levine, Robert. (to appear) Category Struc- 
tures. Computational Linguistics. 
Hudson, Richard A. 1971 English Complex Sentences: An Introduc- 
tion to Systemic Grammar. North Holland. 
Kasper, Robert. 1986 Systemic Grammar and Functional Unification 
Grammar. In Benson, J. and Greaves, W., Eds., Selected papers 
from the Twelfth International Systemics Workshop. Ablex, Nor- 
wood, NJ, USA. 
Kay, Martin. 1984 Functional Unification Grammar: A Formalism for 
Machine Translation. In Procs of COLING-84. Stanford, July 
1984. 
Kress, Gunther, Ed. 1976 Halliday: System and Function in Lan- 
guage. Oxford University Press. 
McCord, Michael C. 1986 Design of a Prolog-based Machine Trans- 
lation System. In Shapiro, Ehud, Ed., Procs of the Third Interna- 
tional Conference on Logic Programming. Springer Verlag. 
Mellish, Chris S. (forthcoming) Version Spaces, Description Spaces 
and Unification. Research paper, Dept of Artificial Intelligence, 
University of Edinburgh. 
Patten, Terry A. 1986 Interpreting Systemic Grammar as a Computa- 
tional Representation: A Problem Solving Approach to Text 
Generation. PhD thesis, University of Edinburgh. 
Reynolds, John C. 1970 Transformational Systems and the Algebraic 
Structure of Atomic Formulas. In Meltzer, Bernard and Michie, 
Donald, Eds., Machine Intelligence 5. Edinburgh University 
Press. 
Robinson, J. Alan. 1965 A Machine-Oriented Logic Based on the 
Resolution Principle. JACM 12, 23-41. 
Ueda, Kazunori. 1986 Making Exhaustive Search Programs Deter- 
ministic. In Shapiro, Ehud, Ed., Procs of the Third International 
Conference on Logic Programming. Springer Verlag. 
Warren, David H. D. 1977 Implementing Prolog: Compiling Predicate 
Logic Programs: Research Reports 39 and 40, Department of 
Artificial Intelligence, University of Edinburgh. 
Winograd, Terry. 1972 Understanding Natural Language. Academic 
Press. 
Winograd, Terry. 1983 Language as a Cognitive Process. Volume 1: 
Syntax. Addison Wesley. 
