PRIORITY UNION AND GENERALIZATION 
IN DISCOURSE GRAMMARS 
Claire Grover, Chris Brew, Suresh Manandhar, Marc Moens 
HCRC Language Technology Group 
The University of Edinburgh 
2 Buccleuch Place 
Edinburgh EH8 9LW, UK 
Internet: C. Grover©ed. ac. uk 
Abstract 
We describe an implementation in Carpenter's ty- 
ped feature formalism, ALE, of a discourse gram- 
mar of the kind proposed by Scha, Polanyi, 
et al. We examine their method for resolving 
parallelism-dependent anaphora and show that 
there is a coherent feature-structural rendition of 
this type of grammar which uses the operations 
of prwrity union and generalization. We describe 
an augmentation of the ALE system to encompass 
these operations and we show that an appropriate 
choice of definition for priority union gives the de- 
sired multiple output for examples of vP-ellipsis 
which exhibit a strict/sloppy ambiguity. 
1 Discourse Grammar 
Working broadly within the sign-based paradigm 
exemplified by HPSG (Pollard and Sag in press) 
we have been exploring computational issues for 
a discourse level grammar by using the ALE sy- 
stem (Carpenter 1993) to implement a discourse 
grammar. Our central model of a discourse gram- 
mar is the Linguistic Discourse Model (LDM) most 
often associated with Scha, Polanyi, and their co- 
workers (Polanyi and Scha 1984, Scha and Polanyi 
1988, Priist 1992, and most recently in Priist, Scha 
and van den Berg 1994). In LDM rules are defi- 
ned which are, in a broad sense, unification gram- 
mar rules and which combine discourse constitu- 
ent units (DCUS). These are simple clauses whose 
syntax and underresolved semantics have been de- 
termined by a sentence grammar but whose fully 
resolved final form can only be calculated by their 
integration into the current discourse and its con- 
text. The rules of the discourse grammar act to 
establish the rhetorical relations between constitu- 
ents and to perform resolution of those anaphors 
whose interpretation can be seen as a function of 
discourse coherence (as opposed to those whose 
interpretation relies on general knowledge). 
For illustrative purposes, we focus here on Prfist's 
rules for building one particular type of rhetorical 
relation, labelled "list" (Priist 1992). His central 
thesis is that for DCUs to be combined into a list 
they must exhibit a degree of syntactic-semantic 
parallelism and that this parallelism will strongly 
determine the way in which some kinds of anaphor 
are resolved. The clearest example of this is vP- 
ellipsis as in (la) but Priist also claims that the 
subject and object pronouns in (lb) and (lc) are 
parallelism-dependent anaphors when they occur 
in list structures and must therefore be resolved to 
the corresponding fully referential subject/object 
in the first member of the list. 
(1) a. Hannah likes beetles. So does Thomas. 
b. Hannah likes beetles. She also likes 
caterpillars. 
c. Hannah likes beetles. Thomas hates 
them. 
(2) is Priist's list construction rule. It is intended 
to capture the idea that a list can be constructed 
out of two DCUs, combined by means of connec- 
tives such as and and or. The categories in Priist's 
rules have features associated with them. In (2) 
these features are sere (the unresolved semantic 
interpretation of the category), consem (the con- 
textually resolved semantic interpretation), and 
schema (the semantic information that is com- 
mon between the daughter categories). 
(2) list \[sere: el T~ ((Cl ¢'$2) RS2), 
schema : C1 ¢ $2\] ----4 
DCUI \[ sem : Sl,consem : C1\] + 
DCU2 \[sere : ~S2,consem : ((Cl ~'$2) ~$2)\] 
Conditions: 
C1   $2 is a characteristic generalization of C1 
and S~; R E {and, or .... }. 
Priist calls the operation used to calculate the va- 
lue for schema the most specific common deno- 
minator (MSCD, indicated by the symbol ¢). The 
MSCD of C1 and $2 is defined as the most specific 
generalization of C1 that can unify with 5'2. It is 
essential that the result should be contentful to a 
degree that confirms that the list structure is an 
appropriate analysis, and to this end Pr/ist impo- 
ses the condition that the value of schema should 
17 
be a characteristic generalization of the informa- 
tion contributed by the two daughters. There is 
no formal definition of this notion; it would re- 
quire knowledge from many sources to determine 
whether sufficient informativeness had been achie- 
ved. However, assuming that this condition is met, 
Priist uses the common information as a source for 
resolution of underspecified elements in the second 
daughter by encoding as the value of the second 
daughter's consem the unification of the result of 
MSCD with its pre-resolved semantics (the formula 
((Ca / $2) Iq $2)). So in Priist's rule the MSCD 
operation plays two distinct roles, first as a test for 
parallelism (as the value of the mother's schema) 
and second as a basis for resolution (in the com- 
posite operation which is the value of the second 
daughter's consem). There are certain problems 
with MSCD which we claim stem from this attempt 
to use one operation for two purposes, and our pri- 
mary concern is to find alternative means of achie- 
ving Prfist's intended analysis. 
2 An ALE Discourse Grammar 
For our initial exploration into using ALE for dis- 
course grammars we have developed a small dis- 
course grammar whose lexical items are complete 
sentences (to circumvent the need for a sentence 
grammar) and which represents the semantic con- 
tent of sentences using feature structures of type event 
whose sub-types are indicated in the follo- 
wing part of the type hierarchy: 
event (3) 
agentive 
plus_patient prop-art 
emot-att action believe assume 
like hate kick catch 
In addition we have a very simplified semantics of 
noun phrases where we encode them as of type entity 
with the subtypes indicated below: 
(4) entity 
animate 
human animal 
female male insect ..4"-., 
hannah jessy thomas sam brother beetle bee cater- pillar 
Specifications of which features are appropriate for 
which type give us the following representations of 
the semantic content of the discourse units in (1): 
(5) a. Hannah likes beetles 
\[ AGENT hannah \] 
PATIENT beetle 
like 
b. So does Thomas 
\[ AGENT thomas \] 
agentive 
c. She also likes caterpillars 
\[ AGENT female \] 
PATIENT caterpillar 
like 
d. Thomas hates them 
\[ AGENT thomas \] 
PATIENT entity 
hate 
2.1 Calculating Common Ground 
The SCHEMA feature encodes the information that 
is common between daughter Dcus and Prtist uses 
MSCD to calculate this information. A feature- 
structural definition of MSCD would return as a 
result the most specific feature structure which is 
at least as general as its first argument but which 
is also unifiable with its second argument. For 
the example in (lc), the MSCD operation would be 
given the two arguments in (5a) and (5d), and (6) 
would be the result. 
(6) \[ AGENT human \] 
PATIENT beetle 
emot_att 
We can contrast the MSCD operation with an 
operation which is more commonly discussed in 
the context of feature-based unification systems, 
namely generalization. This takes two feature- 
structures as input and returns a feature struc- 
ture which represents the common information in 
them. Unlike MSCD, generalization is not asym- 
metric, i.e. the order in which the arguments are 
presented does not affect the result. The genera- 
lization of (5a) and (5d) is shown in (7). 
(7) \[ AGENT human \] 
PATIENT entity 
emot_att 
It can be seen from this example that the MSCD 
result contains more information than the genera- 
lization result. Informally we can say that it seems 
to reflect the common information between the 
two inputs after the parallelism-dependent ana- phor in the second sentence has been resolved. 
The 
reason it is safe to use MSCD in this context is pre- 
cisely because its use in a list structure guarantees 
18 
that the pronoun in the second sentence will be 
resolved to beetle. In fact the result of MSCD in 
this case is exactly the result we would get if we 
were to perform the generalization of the resolved 
sentences and, as a representation of what the two 
have in common, it does seem that this is more de- 
sirable than the generalization of the pre-resolved 
forms. 
If we turn to other examples, however, we discover 
that MSCD does not always give the best results. 
The discourse in (8) must receive a constituent 
structure where the second and third clauses are 
combined to form a contrast pair and then this contrast 
pair combines with the first sentence to 
form a list. (Prfist has a separate rule to build 
contrast pairs but the use of MSCD is the same as 
in the list rule.) 
(8) Hannah likes ants. Thomas likes bees but 
Jessy hates them. 
(9) fAGENT hanna~ 
\[._PATIENT insect_.J 
like 
AGENT hannah~ \[AGENT human~ 
PATIENT ant _\] PATIENT bee _J 
like e~ 
ATIENT bee \[ \[-PATIENT entity I 
like hate 
The tree in (9) demonstrates the required struc- 
ture and also shows on the mother and interme- 
diate nodes what the results of MSCD would be. As 
we can see, where elements of the first argument 
of MSCD are more specific than the corresponding 
elements in the second, then the more specific one 
occurs in the result. Here, this has the effect that 
the structure \[like, AGENT hannah, PATIENT ins- 
ect \] is somehow claimed to be common ground 
between all three constituents even though this is 
clearly not the case. 
Our solution to this problem is to dispense with 
the MSCD operation and to use generalization in- 
stead. However, we do propose that generalization 
should take inputs whose parallelism dependent 
anaphors have already been resolved. 1 In the case 
of the combination of (5a) and (5d), this will give 
1As described in the next section, we use priority 
union to resolve these anaphors in both lists and con- trasts. 
The use of generalization as a step towards 
checking that there is sufficient common ground is sub- 
sequent to the use of priority ration as the resolution 
mechanism. 
exactly the same result as MSCD gave (i.e. (6)), 
but for the example in (8) we will get different re- 
sults, as the tree in (10) shows. (Notice that the 
representation of the third sentence is one where 
the anaphor is resolved.) The resulting generaliza- 
tion, \[emot_att, AGENT human, PATIENT insect\], is 
a much more plausible representation of the com- 
mon information between the three DCUs than the 
results of MSCD. 
(10) fAGENT huma~ 
\[_PATIENT insect~ 
ATIENT ant ..J \[_PATIENT bee _J 
like e~ 
LPATIENT bee _\] \[_PATIENT bee ._\] 
like hate 
2.2 Resolution of Parallel Anaphors 
We have said that MSCD plays two roles in Pr/ist's 
rules and we have shown how its function in cal- 
culating the value of SCHEMA can be better served 
by using the generalization operation instead. We 
turn now to the composite operation indicated in 
(2) by the formula ((C, /S~)NS2). This com- 
posite operation calculates MSCD and then unifies 
it back in with the second of its arguments in or- 
der to resolve any parallelism-dependent anaphors 
that might occur in the second DCU. In the discus- 
sion that follows, we will refer to the first DcU in 
the list rule as the source and to the second DCU 
as the target (because it contains a parallelism- 
dependent anaphor which is the target of our at- 
tempt to resolve that anaphor). 
In our ALE implementation we replace Pr/ist's 
composite operation by an operation which has oc- 
casionally been proposed as an addition to feature- 
based unification systems and which is usually re- 
ferred to either as default unification or as priority union. 2 
Assumptions about the exact definition of 
this operation vary but an intuitive description of 
it is that it is an operation which takes two feature 
structures and produces a result which is a merge 
of the information in the two inputs. However, 
the information in one of the feature structures is 
"strict" and cannot be lost or overridden while the 
information in the other is defensible. The opera- 
tion is a kind of union where the information in 
the strict structure takes priority over that in the 
~See, for example, Bouma (1990), Calder (1990), 
Carpenter (1994), Kaplan (1987). 
19 
default structure, hence our preference to refer to 
it by the name priority union. Below we demon- 
strate the results of priority union for the exam- 
ples in (la)-(lc). Note that the target is the strict 
structure and the source is the defeasible one. 
(11) Hannah likes beetles. So does Thomas. 
Source: 5a Target: 
5b Priority\[ AGENT th°mas \] 
Union: PATIENT beetle like 
(12) Hannah likes beetles. She also likes caterpillars. 
Source: 5a 
Target: 5c \[ AGENT hannah 1 
Priority PATIENT caterpillar Union: like 
(13) Hannah likes beetles. Thomas hates them. 
Source: 5a 
Target: 5d 
AGENT thomas \] Priority PATIENT beetle 
Union: hate 
For these examples priority union gives us exactly 
the same results as Priist's composite operation. 
We use a definition of priority union provided by 
Carpenter (1994) (although note that his name for 
the operation is "credulous default unification"). 
It is discussed in more detail in Section 3. The pri- 
ority union of a target T and a source S is defined 
as a two step process: first calculate a maximal 
feature structure S' such that S' E S, and then 
unify the new feature structure with T. 
This is very similar to PriJst's composite opera- 
tion but there is a significant difference, however. 
For Priist there is a requirement that there should 
always be a unique MSCD since he also uses MSCD 
to calculate the common ground as a test for par- 
allelism and there must only be one result for that 
purpose. By contrast, we have taken Carpenter's 
definition of credulous default unification and this 
can return more than one result. We have strong 
reasons for choosing this definition even though 
Carpenter does define a "skeptical default unifi- 
cation" operation which returns only one result. 
Our reasons for preferring the credulous version 
arise from examples of vP-ellipsis which exhibit an 
ambiguity whereby both a "strict" and a "sloppy" 
reading are possible. For example, the second sen- 
tence in (14) has two possible readings which can 
be glossed as "Hannah likes Jessy's brother" (the 
strict reading) and "Hannah likes her own bro- 
ther" (the sloppy reading). 
(14) Jessy likes her brother. So does Hannah. 
The situations where the credulous version of the 
operation will return more than one result arise 
from structure sharing in the defeasible feature 
structure and it turns out that these are exactly 
the places where we would need to get more than 
one result in order to get the strict/sloppy ambi- 
guities. We illustrate below: 
(15) Jessy likes her brother. So does Hannah. 
Source: AGENT 
PATIENT 
like 
~\]jessy \] \[\] \] 
brother 
Target: \[ AGENT hannah \] 
agentive 
Priority 
Union: " AGENT 
PATIENT 
like 
\[\] hannah 1 \[ nl\] 
brother 
AGENT 
PATIENT 
like 
hannah \] \[ 
brother 
Here priority union returns two results, one where 
the structure-sharing information in the source has 
been preserved and one where it has not. As the 
example demonstrates, this gives the two readings 
required. By contrast, Carpenter's skeptical de- 
fault unification operation and Priist's composite 
operation return only one result. 
2.3 Higher Order Unification 
There are similarities between our implementa- 
tion of Prfist's grammar and the account of vP- 
ellipsis described by Dalrymple, Shieber and Pe- 
reira (1991) (henceforth DSP). DSP gives an 
equational characterization of the problem of vp- 
ellipsis where the interpretation of the target 
phrase follows from an initial step of solving an 
equation with respect to the source phrase. If a 
function can be found such that applying that fun- 
ction to the source subject results in the source in- 
terpretation, then an application of that function 
to the target subject will yield the resolved inter- 
pretation for the target. The method for solving 
such equations is "higher order unification". (16) 
shows all the components of the interpretation of 
the example in (11). 
20 
(16) Hannah likes beetles. So does Thomas. 
Source: 
Target (T): 
Equation: 
Solution: 
Apply to T: 
like(hannah, beetle) 
P ( thomas ) 
P ( hannah ) = like(hannah, beetle) 
P = ~x.like(x, beetle) 
like(thomas, beetle) 
A prerequisite to the DSP procedure is the esta- 
blishment of parallelism between source and target 
and the identification of parallel subparts. For ex- 
ample, for (16) it is necessary both that the two 
clauses Hannah likes beetles and So does Thomas 
should be parallel and that the element hannah 
should be identified as a parallel element. DSP 
indicate parallel elements in the source by means 
of underlines as shown in (16). An underlined ele- 
ment in the source is termed a 'primary occur- 
rence' and DSP place a constraint on solutions to 
equations requiring that primary occurrences be 
abstracted. Without the identification of hannah 
as a primary occurrence in (16), other equations 
deriving from the source might be possible, for ex- 
ample (17) : 
(17) a. P(beetle) = like(hannah, beetle) 
b. P(like) = like(hannah, beetle) 
The DSP analysis of our strict/sloppy example in 
(14) is shown in (18). The ambiguity follows from 
the fact that there are two possible solutions to the 
equation on the source: the first solution involves 
abstraction of just the primary occurrence ofjessy, 
while the second solution involves abstraction of 
both the primary and the secondary occurrences. 
When applied to the target these solutions yield 
the two different interpretations: 
(18) Jessy 
Source: 
Target: 
Equation: 
Sol.1 ($1): 
Sol.2 (S2): 
Apply SI: 
Apply $2: 
likes her brother. So does Hannah. 
like(jessy, brother-of (jessy) ) 
P( hannah ) 
P(jessy) = like(jessy, brother-of (jessy) ) 
P = ~x.like(x, brother-of(jessy)) 
e = Ax.like(x, brother-of(x)) 
like(hannah, brother-of (jessy) ) 
like(hannah, brother-of(hannah)) 
DSP claim that a significant attribute of their ac- 
count is that they can provide the two readings in 
strict/sloppy ambiguities without having to postu- 
late ambiguity in the source. They claim this as 
a virtue which is matched by few other accounts 
of vP-ellipsis. We have shown here, however, that 
an account which uses priority union also has no 
need to treat the source as ambiguous. 
Our results and DSP's also converge where the 
treatment of cascaded ellipsis is concerned. For 
the example in (19) both accounts find six rea- 
dings although two of these are either extremely 
implausible or even impossible. 
(19) John revised his paper before the teacher 
did, and Bill did too. 
DSP consider ways of reducing the number of 
readings and, similarly, we are currently explo- 
ring a potential solution whereby some of the re- 
entrancies in the source are required to be trans- 
mitted to the result of priority union. 
There are also similarities between our account 
and the DSP account with respect to the esta- 
blishment of parallelism. In the DSP analysis the 
determination of parallelism is separate from and 
a prerequisite to the resolution of ellipsis. Howe- 
ver, they do not actually formulate how paralle- 
lism is to be determined. In our modification of 
Prfist's account we have taken the same step as 
DSP in that we separate out the part of the fea- 
ture structure used to determine parallelism from 
the part used to resolve ellipsis. In the general 
spirit of Priist's analysis, however, we have taken 
one step further down the line towards determi- 
ning parallelism by postulating that calculating 
the generalization of the source and target is a 
first step towards showing that parallelism exists. 
The further condition that Prfist imposes, that the 
common ground should be a characteristic genera- 
lization, would conclude the establishment of par- 
allelism. We are currently not able to define the 
notion of characteristic generalization, so like DSP 
we do not have enough in our theory to fully imple- 
ment the parallelism requirement. In contrast to 
the DSP account, however, our feature structural 
approach does not involve us having to explicitly 
pair up the component parts of source and target, 
nor does it require us to distinguish primary from 
secondary occurrences. 
2.4 Parallelism 
In the DSP approach to vP-ellipsis and in our ap- 
proach too, the emphasis has been on semantic 
parallelism. It has often been pointed out, howe- 
ver, that there can be an additional requirement of 
syntactic parallelism (see for example, Kehler 1993 
and Asher 1993). Kehler (1993) provides a use- 
ful discussion of the issue and argues convincingly 
that whether syntactic parallelism is required de- 
pends on the coherence relation involved. As the 
examples in (20) and (21) demonstrate, semantic 
parallelism is sufficient to establish a relation like 
contrast but it is not sufficient for building a co- 
herent list. 
(20) The problem was looked into by John, but 
no-one else did. 
(21) *This problem was looked into by John, 
and Bill did too. 
For a list to be well-formed both syntactic and 
semantic parallelism are required: 
21 
(22) John looked into this problem, and Bill did 
too. 
In the light of Kehler's claims, it would seem that 
a more far-reaching implementation of our prio- 
rity union account would need to specify how the 
constraint of syntactic parallelism might be imple- 
mented for those constructions which require it. 
An nPSG-style sign, containing as it does all types 
of linguistic information within the same feature 
structure, would lend itself well to an account of 
syntactic parallelism. If we consider that the DTRS 
feature in the sign for the source clause contains 
the entire parse tree including the node for the 
vP which is the syntactic antecedent, then ways 
to bring together the source vP and the target be- 
gin to suggest themselves. We have at our disposal 
both unification to achieve re-entrancy and the op- 
tion to use priority union over syntactic subparts 
of the sign. In the light of this, we are confident 
that it would be possible to articulate a more ela- 
borate account of vp-ellipis within our framework 
and that priority union would remain the opera- 
tion of choice to achieve the resolution. 
3 Extensions to ALE 
In the previous sections we showed that Prfist's 
MSCD operation would more appropriately be re- 
placed by the related operations of generalization 
and priority union. We have added generalization 
and priority union to the ALE system and in this 
section we discuss our implementation. We have 
provided the new operations as a complement to 
the definite clause component of ALE. We chose 
this route because we wanted to give the gram- 
mar writer explicit control of the point at which 
the operations were invoked. ALE adopts a sim- 
ple eROLOG-like execution strategy rather than 
the more sophisticated control schemes of systems 
like CUF and TFS (Manandhar 1993). In princi- 
ple it might be preferable to allow the very gene- 
ral deduction strategies which these other systems 
support, since they have the potential to support a 
more declarative style of grammar-writing. Unfor- 
tunately, priority union is a non-monotonic ope- 
ration, and the consequences of embedding such 
operations in a system providing for flexible exe- 
cution strategies are largely unexplored. At least 
at the outset it seems preferable to work within a 
framework in which the grammar writer is requi- 
red to take some of the responsibility for the order 
in which operations are carried out. Ultimately we 
would hope that much of this load could be taken 
by the system, but as a tool for exploration ALE 
certainly suffices. 
3.1 Priority Union in ALE 
We use the following definition of priority union, 
based on Carpenter's definition of credulous de- 
fault unification: 
(23) punion(T,S) = {unify(T,S') IS' K S 
is maximal such that unify(T,S') is defined} 
punion(T,S) computes the priority union oft (tar- 
get; the strict feature structure) with S (source; 
the defeasible feature structure). This definition 
relies on Moshier's (1988) definition of atomic fea- 
ture structures, and on the technical result that 
any feature structure can be decomposed into a 
unification of a unique set of atomic feature struc- 
tures. Our implementation is a simple procedura- 
lization of Carpenter's declarative definition. First 
we decompose the default feature structure into a 
set of atomic feature structures, then we search for 
the maximal subsets required by the definition. 
We illustrate our implementation of priority union 
in ALE with the example in (15): Source is the de- 
fault input, and Target is the strict input. The 
hierarchy we assume is the same as shown in (3) 
and (4). Information about how features are asso- 
ciated with types is as follows: 
• The type agentive introduces the feature AGENT 
with range type human. 
• The type plus-patient introduces the feature PA- 
TIENT with range type human. 
• The type brother introduces the feature 
BROTHER-OF with range type human. 
• The types jessy and hannah introduce no fea- 
tures. 
In order to show the decomposition into ato- 
mic feature structures we need a notation to re- 
present paths and types. We show paths like 
this: PATIENTIBROTHER-OF and in order to sti- 
pulate that the PATIENT feature leads to a struc- 
ture of type brother, we include type informa- 
tion in this way: (PATIENW/brother)\[(BROTHER- 
of~human). We introduce a special feature (*) 
to allow specification of the top level type of the 
structure. The structures in (15) decompose into 
the following atomic components. 
(24) Default input: 
( AGENT / jessy) ( D 1 ) 
(PATIENT/brother)I(BROTHER-OF/jessy) (D2) 
AGENT ---~ PATIENTIBROTHER-OF (D3) 
(*/like) (D4) 
Strict input: 
(AGENT~hannah) (S 1 ) 
( * / agentive) ($2) 
Given the type hierarchy the expressions above ex- 
pand to the following typed feature structures: 
22 
(25) 
Default input: 
\[ AGENT jessy \] 
agentive 
AGENT 
PATIENT 
plus-patient 
AGENT 
PATIENT 
plus-patient 
AGENT human \] 
PATIENT entity like 
human 1 
brother 
human \] \] 
brother 
(D1) 
(D2) 
(D3) 
(D4) 
Strict input: 
\[ AGENT hannah \] 
agentive (s1,s2) 
We can now carry out the following steps in order 
to generate the priority union. 
1. Add (94) to the strict input. It cannot conflict. 
2. Note that it is impossible to add (D1) to the 
strict input. 
3. Non-deterministically add either (92) or (93) 
to the strict input. 
4. Note that the results are maximal in each case 
because it is impossible to add both (D2) and 
(D3) without causing a clash between the dis- 
joint atomic types hannah and jessy. 
5. Assemble the results into feature structures. If 
we have added (D3) the result will be (26) and 
if we have added (D2) the result will be (27). 
(26) Result 1: 
" AGENT \[\] hannah \] 
PATIENT \[BROTHER-OF \[\] \] \] brother 
like 
(27) Result 2: 
AGENT 
PATIENT 
like 
hannah \] 
\[BROTHER-OFjessy\] 
brother 
In order to make this step-by-step description into 
an algorithm we have used a breadth-first search 
routine with the property that the largest sets are 
generated first. We collect answers in the order in 
which the search comes upon them and carry out 
subsumption checks to ensure that all the answers 
which will be returned are maximal. These checks 
reduce to checks on subset inclusion, which can be 
reasonably efficient with suitable set representati- 
ons. Consistency checking is straightforward be- 
cause the ALE system manages type information 
in a manner which is largely transparent to the 
user. Unification of ALE terms is defined in such a 
way that if adding a feature to a term results in a 
term of a new type, then the representation of the 
structure is specialized to reflect this. Since prio- 
rity union is non-deterministic we will finish with 
a set of maximal consistent subsets. Each of these 
subsets can be converted directly into ALE terms 
using ALE's built-in predicate add_to/5. The re- 
sulting set of ALE terms is the (disjunctive) result 
of priority union. 
In general we expect priority union to be a com- 
putationally expensive operation, since we cannot 
exclude pathological cases in which the system has 
to search an exponential number of subsets in the 
search for the maximal consistent elements which 
are required. In the light of this it is fortunate 
that our current discourse grammars do not re- 
quire frequent use of priority union. Because of 
the inherent complexity of the task we have fa- 
voured correctness and clarity at the possible ex- 
pense of efficiency. Once it becomes established 
that priority union is a useful operation we can 
begin to explore the possibilities for faster imple- 
mentations. 
3.2 Generalization in ALE 
The abstract definition of generalization stipulates 
that the generalization of two categories is the lar- 
gest category which subsumes both of them. Mos- 
hier (1988) has shown that generalization can be 
defined as the intersection of sets of atomic fea- 
ture structures. In the previous section we outli- 
ned how an ALE term can be broken up into atomic 
feature structures. All that is now required is the 
set intersection operation with the addition that 
we also need to cater for the possibility that ato- 
mic types may have a consistent generalization. 
1. For P and Q complex feature structures Gen(P,Q) =~! {Path: C I Path: A E P 
and Path : B E Q } where C is the most 
specific type which subsumes both A and B. 
2. For A and B atomic types Gen(A, B) =dr C 
where C is the most specific type which subsu- 
mes both A and B. 
In ALE there is always a unique type for the gene- 
ralization. We have made a small extension to the 
ALE compiler to generate a table of type genera- 
lizations to assist in the (relatively) efficient com- 
putation of generalization. To illustrate, we show 
how the generalization of the two feature structu- 
res in (28) and (29) is calculated. 
23 
(28) 
(29) 
Hannah likes ants. 
AGENT hannah \] 
PATIENT ant 
like 
Jessy laughs. 
\[AGENT jessy \] 
laugh 
These decompose into the atomic components 
shown in (30) and (31) respectively. 
(30) (*/like) 
(AGENT/hannah) 
(PATIENT/ant) 
(31) (*/Za.gh) (AGENT/jessy) 
These have only the AGENT path in common alt- 
hough with different values and therefore the ge- 
neralization is the feature structure corresponding 
to this path but with the generalization of the ato- 
mic types hannah and jessy as value: 
(32) \[ AGENT female \] 
agentive 
4 Conclusion 
In this paper we have reported on an implemen- 
tation of a discourse grammar in a sign-based for- 
malism, using Carpenter's Attribute Logic Engine 
(aLE). We extended the discourse grammar and 
ALE to incorporate the operations of priority union 
and generalization, operations which we use for 
resolving parallelism dependent anaphoric expres- 
sions. We also reported on a resolution mecha- 
nism for verb phrase ellipsis which yields sloppy 
and strict readings through priority union, and we 
claimed some advantages of this approach over the 
use of higher-order unification. 
The outstanding unsolved problem is that of esta- 
blishing parallelism. While we believe that gene- 
ralization is an appropriate formal operation to 
assist in this, we still stand in dire need of a con- 
vincing criterion for judging whether the genera- 
lization of two categories is sufficiently informative 
to successfully establish parMlelism. 
Acknowledgements 
This work was supported by the EC-funded project 
LRE-61-062 "Towards a Declarative Theory of Dis- 
course" and a longer version of the paper is available 
in Brew et al (1994). We have profited from discus- 
sions with Jo Calder, Dick Crouch, Joke Dorrepaal, 
Claire Gardent, Janet Hitzeman, David Millward and 
Hub Prfist. Andreas Schhter helped with the imple- 
mentation work. The Human Communication Rese- 
arch Centre (HCRC) is supported by the Economic 
and Social Research Council (UK). 

References 
Asher, N. (1993) Reference to Abstract Objects in Di- 
scourse. Dordrecht: Kluwer. 
Bouma, G. (1990) Defaults in Unification Grammar. 
In Proceedings of the 28th ACL, pp. 165-172, Uni- 
versity of Pittsburgh. 
Brew, C. et al (1994) Discourse Representation. De- 
liverable B+ of LRE-61-062: Toward a Declarative 
Theory of Discourse. 
Calder, J. H. R. (1990) An Interpretation of Paradig- 
matic Morphology. PhD thesis, Centre for Cognitive 
Science, University of Edinburgh. 
Carpenter, B. (1993) ALE. The Attribute Logic En- 
gine user's guide, version ~. Laboratory for Com- 
putational Linguistics, Carnegie Mellon University, 
Pittsburgh, Pa. 
Carpenter, B. (1994) Skeptical and credulous default 
unification with applications to templates and inhe- 
ritance. In T. Briscoe et al, eds., Inheritance, De- 
faults, and the Lexicon, pp. 13-37. Cambridge: Cam- 
bridge University Press. 
Dalrymple, M., S. Shieber and F. Pereira (1991) El- 
lipsis and higher-order unification. Linguistics and 
Philosophy 14(4), 399-452. 
Kaplan, R. M. (1987) Three seductions of computa- 
tional psycholinguistics. In P. J. Whitelock et al, 
eds., Linguistic Theory and Computer Applications, 
pp. 149-188. London: Academic Press. 
Kehler, A. (1993) The effect of establishing coherence 
in ellipsis and anaphora resolution. In Proceedings 
of the 31st ACL, pp. 62-69, Ohio State University. 
Manandhar, S. (1993) CUF in context. In J. Dbrre, ed., 
Computational Aspects of Constraint-Based Lingui- 
stics Description. DYANA-2 Deliverable. 
Moshier, D. (1988) Extensions to Unification Gram- 
mar for the Description of Programming Languages. 
PhD thesis, Department of Mathematics, University 
of California, Los Angeles. 
Polanyi, L. and R. Scha (1984) A syntactic approach 
to discourse semantics. In Proceedings of the tOth 
Coling and the 22nd ACL, pp. 413-419, Stanford 
University. 
Pollard, C. and I. A. Sag (in press) Head-Driven 
Phrase Structure Grammar. Chicago, Ill.: Univer- 
sity of Chicago Press and CSLI Publications. 
Priist, H. (1992) On Discourse Structuring, VP Ana- 
phora and Gapping. PhD thesis, Universiteit van 
Amsterdam, Amsterdam. 
Pr/Jst, H., R. Scha and M. van den Berg (1994} Dis- 
course grammar and verb phrase anaphora. Lingui- 
stics and Philosophy. To appear. 
Scha, R. and L. Polanyi (1988) An augmented context 
free grammar for discourse. In Proceedings of the 
12th Coling, pp. 573-577, Budapest. 
