<?xml version="1.0" standalone="yes"?>
<Paper uid="J96-3001">
  <Title>Unification Encodings of Grammatical Notations</Title>
  <Section position="3" start_page="0" end_page="298" type="metho">
    <SectionTitle>
2. A Unification Formalism
</SectionTitle>
    <Paragraph position="0"> To begin with, we will define a basic unification grammar formalism. For convenience, we will use many of the notational conventions of Prolog.</Paragraph>
    <Paragraph position="1"> A category consists of a set of feature equations, written: {f1=v1, f2=v2, ..., fN=vN} Feature names are atoms; feature values can be variables (beginning with an upper-case character), atoms (beginning with a number or a lowercase character) or categories. For example: {f1=X, f2=yes, f3={f4=1, f5=X}} Coreference is indicated by shared variables: in the preceding example, f1 and f5 are constrained to have the same value. We often use underscore (_) as a variable if we are not interested in its value.</Paragraph>
    <Section position="1" start_page="296" end_page="298" type="sub_section">
      <SectionTitle>
Pulman Unification Encodings
</SectionTitle>
      <Paragraph position="0"> For convenience and readability, we shall also allow as feature values lists of values, n-tuples of values, and Prolog-like terms: {f1=[{f2=a},{f3=b}], f4=(c,d,e), f5=foo(X,Y,Z)} These constructs can be regarded as &quot;syntactic sugar&quot; for categories. For example, a term foo(X,Y,Z) could be represented as a category {functor=foo, arg1=X, arg2=Y, arg3=Z}. Tuples can be thought of as fixed-length lists, and lists can be defined as categories with features head and tail, as in Shieber (1986). We will use the Prolog notation for lists: thus [bar | X] stands for the list whose head is bar and whose tail (a list) is X.</Paragraph>
      <Paragraph position="1"> A lexical item can be represented by a category. For example: {cat=n, count=y, number=sing, lex=dog} {cat=det, number=sing, lex=a} {cat=verb, number=sing, person=3, subcat=[], lex=snores} A rule consists of a mother category and a list of zero or more daughter categories. For example: {cat=s} ==&gt; [{cat=np, number=N, person=P}, {cat=vp, number=N, person=P}] A rule could equivalently be represented as a category, with distinguished features mother and daughters: {mother={cat=s}, daughters=[{cat=np, number=N, person=P}, {cat=vp, number=N, person=P}]} However, we will stay with the more traditional notation here. Various simple kinds of typing can be superimposed on this formalism. We can distinguish a particular feature (say cat) as individuating different types and associate with each different value of the cat feature a set of other dependent features. This will only be a sensible thing to do if we know that the value of the cat feature will always be instantiated when types are checked. We will write such declarations as: category(np, {person, number}).</Paragraph>
      <Paragraph position="2"> category(verb, {person, number, subcat}).</Paragraph>
      <Paragraph position="3"> The intent of declarations like this is to ensure that an NP or a verb always has these and only these feature specifications. One of the practical advantages of such a regime is that different categories can now be compiled into terms whose functor is the value of the cat feature, and whose other feature values can be identified positionally: for example, {cat=np, number=sing, person=3} would compile to np(3, sing). And in turn the advantage of this is that ordinary first order term unification (i.e., of the type (almost) provided by Prolog implementations) can be used in processing, guaranteeing almost linear performance in category matching.</Paragraph>
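      <Paragraph>As a rough illustration of the compilation step just described, the following Python sketch turns a category whose cat value is known into a positional tuple standing in for a Prolog term. The names DECLARATIONS and compile_category are hypothetical, and ordering the arguments as the features are listed in the declaration is an assumption made only so that the np example comes out as np(3, sing); None stands in for an unbound variable.

```python
# Hypothetical declarations: cat value mapped to its ordered dependent features.
DECLARATIONS = {
    "np": ["person", "number"],
    "verb": ["person", "number", "subcat"],
}


def compile_category(category):
    """Compile a feature dict into a positional tuple (functor, arg1, ...)."""
    functor = category["cat"]  # cat must be instantiated when types are checked
    features = DECLARATIONS[functor]
    extra = set(category) - set(features) - {"cat"}
    if extra:
        raise ValueError("undeclared features: " + ", ".join(sorted(extra)))
    # None stands in for an unbound variable in this sketch.
    return (functor,) + tuple(category.get(name) for name in features)


print(compile_category({"cat": "np", "number": "sing", "person": 3}))
```

Under these assumptions the call prints ('np', 3, 'sing'), the tuple counterpart of np(3, sing).</Paragraph>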
      <Paragraph position="4"> It is often convenient to use a slightly different notation when adopting such a regime, to make clear that one particular feature value has a privileged status. Thus we will frequently write: np:{person=3,number=sing} to mean: {cat=np,person=3,number=sing}.</Paragraph>
      <Paragraph position="5">  Computational Linguistics Volume 22, Number 3 We can also provide type declarations for features. We will assume a set of primitive types like atom or category, and allow for complex types also: feature(person, atom({1,2,3})). % value must be an atom in declared set feature(lex, atom). % value must be any atom feature(subcat, list(category)). % value must be a list of categories We can also, if required, use a simple type of feature default to make the categories written by a grammarian more succinct: default(person, 3).</Paragraph>
      <Paragraph position="6"> default(number, noun, sing).</Paragraph>
      <Paragraph position="7"> The effect of the first statement would be to ensure that at compile time, the feature person will be instantiated to 3 if it does not already have a value (of any kind). The second statement restricts the application of the default to members of the category noun. We will often assume that such defaults have been declared to make the various example rules and entries more succinct.</Paragraph>
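      <Paragraph>The effect of such defaults can be sketched in Python as follows. This is only an illustration: the table DEFAULTS and the function apply_defaults are hypothetical names, a feature counts as having no value when it is absent from the dictionary, and the category restriction is checked against the cat feature.

```python
# Hypothetical default table: (feature, category or None, value).
# None means the default applies to every category.
DEFAULTS = [
    ("person", None, 3),
    ("number", "noun", "sing"),
]


def apply_defaults(category):
    """Return a copy of the category with missing features defaulted."""
    result = dict(category)
    for feature, restricted_to, value in DEFAULTS:
        if restricted_to is not None and result.get("cat") != restricted_to:
            continue
        # Only fill in features that have no value of any kind.
        result.setdefault(feature, value)
    return result


print(apply_defaults({"cat": "noun", "lex": "dog"}))
print(apply_defaults({"cat": "verb", "person": 1}))
```

In the first call both defaults apply; in the second, person keeps its explicit value and the number default is skipped because the category is not noun.</Paragraph>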
      <Paragraph position="8"> It is also very often convenient to allow for macros, expanded at compile time, to represent in a readable form commonly occurring combinations of features and values. We will assume that such macros are defined in ways suggested by the following examples, and that at compile time, the arguments (if any) of the defined macro are unified with the arguments of the instance of it in a rule or lexical item. In some cases, the results of macro evaluation may need to be spliced into a category: for example, when the result is a set of feature specifications.</Paragraph>
      <Paragraph position="9"> macro(transitive_verb(Stem), v:{lex=Stem, subcat=[np:{}]}).</Paragraph>
      <Paragraph position="11"> Thus the grammarian might now write: transitive_verb(kick).</Paragraph>
      <Paragraph position="12"> phrasal_verb(switch, off).</Paragraph>
      <Paragraph position="13"> s:{A, f1=v1, ...} ==&gt; [np:{B, f2=v2, ...}, vp:{C, f3=v3, ...}] where thread_gaps(A,B,C).</Paragraph>
      <Paragraph position="14"> and these will be expanded to: v:{lex=kick, subcat=[np:{}]} v:{lex=switch, subcat=[np:{}, p:{lex=off}]} s:{gapin=I, gapout=0, f1=v1, ...} ==&gt; [np:{gapin=I, gapout=N, f2=v2, ...}, vp:{gapin=N, gapout=0, f3=v3, ...}] Notice that the values of variables in categories like s:{A, ...} need to be spliced in when the macro is evaluated at compile time.</Paragraph>
      <Paragraph position="15"> Finally we will point out that multiple equations for the same feature on a category are permitted (where they are consistent). Thus a rule like: a:{f=V} ==&gt; [h:{}, c:{f=V, f=d:{f1=a}}]</Paragraph>
    </Section>
    <Section position="2" start_page="298" end_page="298" type="sub_section">
      <Paragraph position="0"> is valid, and means that the value on c of f, which may be only partly specified, will be the same on category a.</Paragraph>
      <Paragraph position="1"> This completes our definition of a basic unification grammar formalism. While the notational details vary, the basic properties of such formalisms will be very familiar. We turn now to descriptive devices not present in the formalism as defined so far, and to ways of making them available.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="298" end_page="301" type="metho">
    <SectionTitle>
3. Kleene Operators
</SectionTitle>
    <Paragraph position="0"> Kleene operators like * (0 or more) or + (1 or more) are frequently used in semi-formal linguistic descriptions. In a context-free-based formalism they must actually be interpreted as a notation for a rule schema, rather than as part of the formalism itself: something like A -&gt; B C* D is a shorthand for the infinite set of rules: A -&gt; B D, A -&gt; B C D, A -&gt; B C C D, etc.</Paragraph>
    <Paragraph position="1"> While not essentially changing the weak generative capacity of a CFG, the use of Kleene operators does change the set of trees that can be assigned to sentences: N-ary branching trees can be generated directly.</Paragraph>
    <Paragraph position="2"> In some unification-based formalisms (e.g. Briscoe et al. 1987; Arnold et al. 1986) Kleene operators have been included. However, in the context of a typed unification formalism like ours, the exact interpretation of Kleene operators is not completely straightforward. Some examples will illustrate the problem. In a formalism like that in Arnold et al. (1986), grammarians write rules like the following, with the intent of capturing the fact that an Nbar can be preceded by an indefinite number of Adjective Phrases provided that (in French, for example) they agree in gender, etc., with the Nbar: np:{agr=A} ==&gt; [..., adjp:{agr=A}*, nbar:{agr=A}] This is presumably intended to mean that if an AdjP is present, with agr instantiated to some value, then succeeding instances of AdjP must have agr bound to the same value, as must the Nbar. But a rule like this does not make clear what is intended for the values of any features on an AdjP not mentioned on the rule. Presumably it is not intended that all such values are shared, for otherwise such a rule would parse the first two of the following combinations, but not the third, which simply contains the concatenation of the adjectives appearing in the first two: adjp:{agr=m, foo=a} nbar:{agr=m}; adjp:{agr=m, foo=b} nbar:{agr=m}; adjp:{agr=m, foo=a} adjp:{agr=m, foo=b} nbar:{agr=m}. Alternatively, the intention might be that only features explicitly mentioned on the rule are to be taken account of when &quot;copying&quot; the Kleene constituent. But this is still not an interpretation that is likely to be of much practical use. Unification formalisms like ours are intended to be capable of encoding semantic as well as syntactic descriptions. In order to properly combine the meaning of the AdjP* with that of the Nbar (as a conjunction, say), to give the meaning of the mother NP, some feature on the AdjP* like sem=..., will at least have to be mentioned in building the NP meaning.
But this very fact will mean that the interpretation of all the AdjPs encountered will be constrained to have the same value for sem as the first one processed. This is clearly not what the grammarian would have intended. The grammarian presumably wanted the value of the sem feature to depend on the AdjP actually present, while wanting the value of the agr feature to be set ultimately by the Nbar. Unfortunately, it is not possible to combine these conflicting requirements.</Paragraph>
    <Paragraph position="3"> At this point the reader might well wonder why Kleene operators were wanted in the first place. In most grammars, Kleene * is used for two different reasons. In the first type of case, like that just illustrated, it is used when it is not known how many instances of a category will be encountered. (PP or adverbial modification of VP is a similar case.) Under these circumstances, it is in fact very often the case that a recursive analysis is empirically superior. For example, an English NP rule like: np:{} ==&gt; [det:{}, adjp:{}*, nbar:{}] actually makes it impossible to capture Nbar co-ordination (unless it is treated as ellipsis). In a phrase like &quot;there is no alternative analysis or clever trick,&quot; in order to get the correct syntax and interpretation, &quot;alternative analysis or clever trick&quot; has to be treated as a conjunction of premodified Nbars. On an analysis that treats the construction recursively, this is no problem.</Paragraph>
    <Paragraph position="4"> The second reason for which Kleene * is used is to get a flat structure, where there is no evidence for recursion. Examples of this might be, on some analyses, the German &quot;middle field&quot;; and some types of coordination. For these cases, it is genuinely important to have some way of achieving the effect of Kleene operators.</Paragraph>
    <Paragraph position="5"> In our formalism, there are several ways of achieving an equivalent effect. The easiest and most obvious way is to turn the iteration into recursion, with the necessary flat structure being built up as the value of a feature on the highest instance of the recursive expansion. The following schematic rules show how this can be done: kleene:{kcat=C, kval=[]} ==&gt; [] % terminate the recursion kleene:{kcat=C, kval=[C|T]} ==&gt; [C, kleene:{kcat=C, kval=T}] % find a C, followed by C* For the other Kleene operators (+, +2, etc.), instead of the first Kleene rule terminating the recursion with an empty category, it terminates with one, two, or however many instances of the category are required. With a suitable macro definition for *, a grammarian can now write rule 1 in the form of rule 2, which will be expanded to 3: 1. a:{} ==&gt; [b:{}, c:{}*, d:{}] 2. a:{} ==&gt; [b:{}, *(c:{},C), d:{}] 3. a:{} ==&gt; [b:{}, kleene:{kcat=c:{},kval=C}, d:{}] A sequence of three cs will be parsed with a structure:</Paragraph>
    <Paragraph position="7"/>
    <Section position="1" start_page="300" end_page="301" type="sub_section">
      <Paragraph position="0"> This structure is, of course, recursive. However, a flat list of the occurrences of c is built up as the value of kval on the topmost Kleene category. Anything that the flat constituent structure was originally needed for can be done with this list, the extra levels introduced by the recursion being ignored.</Paragraph>
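      <Paragraph>The way the flat list is accumulated alongside the recursive structure can be sketched in Python. This is a simplified, deterministic stand-in for the two schematic kleene rules, not a unification parser: parse_kleene and its matches argument are hypothetical names, and the function greedily consumes every matching token.

```python
def parse_kleene(tokens, matches):
    """Mirror the two schematic rules: return (tree, kval, remaining tokens)."""
    if tokens and matches(tokens[0]):
        # kleene:{kval=[C|T]} rewrites as a C followed by kleene:{kval=T}
        subtree, tail, rest = parse_kleene(tokens[1:], matches)
        return ("kleene", tokens[0], subtree), [tokens[0]] + tail, rest
    # kleene:{kval=[]} rewrites as the empty sequence
    return ("kleene",), [], tokens


tree, kval, rest = parse_kleene(["c1", "c2", "c3", "d"], lambda t: t.startswith("c"))
print(tree)  # nested, recursive structure
print(kval)  # flat list of the c occurrences
```

Here tree is nested three levels deep, while kval is the flat list of the three c tokens and rest holds the remaining d.</Paragraph>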
      <Paragraph position="1"> It is, however, possible to get a flatter tree structure more directly, and also to overcome the problem with features used for semantic composition. In order to do this we take advantage of the fact that our formalism allows us to write rules with variables over lists of daughters (footnote 1). We assume a category kleene with three category-valued features: finish, kcat (kleene category), and next. We enrich the grammatical notation with a * which can appear as a suffix on a daughter category in a rule. Thus our grammarian might write something like: np:{agr=A} ==&gt; [det:{agr=A}, adj:{agr=A}*, n:{agr=A}] This is then compiled into a set of rules as follows:</Paragraph>
      <Paragraph position="3"> The original category appears as the value of the kcat feature, and the categories that followed this one in the original rule appear as the value of the finish feature.</Paragraph>
      <Paragraph position="4"> The value of the feature next is a variable over the tail of the daughters list, in a way reminiscent of many treatments of subcategorisation.</Paragraph>
      <Paragraph position="6"> In rule 2, the kleene category is rewritten as an adj, which will share all its features with the value of kcat. The value of next is another instance of the kleene category, which shares the value of the finish feature, and where the value of the kcat feature is the adj category as it appeared on the original rule. This ensures that only the features mentioned on the kleene category will be identically instantiated across all occurrences, enabling the semantic problem mentioned earlier to be solved (at least in principle: the current illustration does not do so). Clearly when the mother of this rule is unified with the corresponding daughter of rule 1, the effect will be to extend the list of daughters of rule 1 by adding the value of next. Since this value is itself a list, now consisting of a kleene category and a variable tail, the resulting structure can again be combined with a following kleene category having the appropriate values.</Paragraph>
      <Paragraph position="7"> This process can continue ad infinitum.</Paragraph>
      <Paragraph position="8"> 3. kleene:{finish=F,next=F} ==&gt; [] The third rule (which is general and so only need occur once in the compiled grammar) terminates the iteration by extending the daughters of rule 1 by the sequence of categories that appeared in the original rule.</Paragraph>
      <Paragraph position="9"> 1 A referee has pointed out that this is akin to the metavariable facility of some Prolog systems (Clark and McCabe 1984), and that a somewhat similar technique, in the context of DCGs, is described by Abramson (1988).</Paragraph>
      <Paragraph position="10"> Now a sequence det adj1 adj2 adj3 n will be parsed as having the following structure:</Paragraph>
      <Paragraph position="12"> The values of the adj daughters to kleene will be present as the value of kcat, and so for all practical purposes this tree captures the kind of iterative structure that was wanted.</Paragraph>
      <Paragraph position="13"> In some cases, the extra level of embedding that this method gives might actually be linguistically motivated. In this case, the idea behind the compilation just described can be incorporated into the analysis directly. To give an illustration, the following grammar generates indefinitely long, flat, NP conjunctions of the &quot;John, Mary, Bill, ..., and Fred&quot; type.</Paragraph>
      <Paragraph position="15"> np:{flatconj=n, next=[]} ==&gt; [conj:{}, np:{}] These rules will give a structure: [NP [NP ,] [NP ,] [NP ,] ... [and/or NP]] The trick is again in the unification of the value of the feature next on the daughter of rule 1 and the mother of rule 2. This unification extends the number of daughters that rule 1 is looking for. Rule 3 terminates the recursion. The feature flatconj stops spurious nestings, if they are not wanted.</Paragraph>
      <Paragraph position="16"> In English, at least, this type of conjunction is the only construction for which a Kleene analysis is convincing, and all instances of it can be described satisfactorily in this manner.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="301" end_page="304" type="metho">
    <SectionTitle>
4. Boolean Combinations of Feature Values
</SectionTitle>
    <Paragraph position="0"> Our formalism does not so far include Boolean combinations of feature values. The full range of such combinations, as is well known, can lead to very bad time and space behavior in processing. Ramsay (1990) shows how some instances of disjunction can be avoided, but there are nevertheless many occasions on which the natural analysis of some phenomenon is in terms of Boolean combinations of values.</Paragraph>
    <Paragraph position="1"> One extremely useful technique, although restricted to Boolean combinations of atomic values, is described by Mellish (1988). He gives an encoding of Boolean combinations of feature values (originally attributed to Colmerauer) in such a way that satisfiability is checked via unification. This technique is used in several systems (e.g. Alshawi 1992; the European Community's ALEP (Advanced Linguistic Engineering Platform) system; Alshawi et al. 1991). We describe it again here because we will need to know how it works in detail later on.</Paragraph>
    <Section position="1" start_page="302" end_page="304" type="sub_section">
      <Paragraph position="0"> Given a feature with values in some set of atoms, or product of sets of atoms, any Boolean combination of these can be represented by a term. The encoding proceeds as follows, for a feature f with values in {1,2} * {a,b,c}. We want to write feature equations like:</Paragraph>
      <Paragraph position="2"> To encode these values we build a term with a functor, say bv (for Boolean vector) with N+1 variable arguments, where N is the size of the product of the sets from which f takes its values. In the example above, N=6, so bv will have seven arguments.</Paragraph>
      <Paragraph position="3"> Intuitively, we identify each possible value for f with the position between arguments in bv: bv(_, _, _, _, _, _, _), where the six positions between adjacent arguments stand, from left to right, for the values 1a, 1b, 1c, 2a, 2b, 2c.</Paragraph>
      <Paragraph position="4"> In building the term representing a particular Boolean combination of values, what we do is work out, for each of these positions, whether or not it is excluded by the Boolean expression. The simple way to do this is to build the models as sets of atoms, and then test the expression to see if it holds of each one. For example, take f=(a;b)&amp;2. The models are {{1,a},{1,b},{1,c},{2,a},{2,b},{2,c}}.</Paragraph>
      <Paragraph position="5"> An atomic expression like a holds of a model if it is a member, and fails otherwise: a here only therefore holds of the two models containing a. Truth functions of atoms can be interpreted in the obvious way. The feature value of f above holds only of {2, a} and {2,b}. Thus all other combinations are excluded.</Paragraph>
      <Paragraph position="6"> For each position representing an excluded combination we unify the variable arguments on each side of it. In our example this gives us: bv(A, A, A, A, B, C, C), with the positions between arguments again standing for 1a, 1b, 1c, 2a, 2b, 2c.</Paragraph>
      <Paragraph position="7"> Finally, we instantiate the first and last argument to different constants, say 0 and 1. Because of the shared variables, this will give us: bv(0, 0, 0, 0, B, 1, 1).</Paragraph>
      <Paragraph position="8"> The reasoning behind this last step is that if all the possibilities are excluded, then all the variables will be linked. But if all the possibilities are excluded, then we have an impossible structure and we want this to be reflected by a unification failure. If we know that the first and last arguments are always incompatible, then an attempt to link up all the positions will result in something that will be trying to unify 0 and 1, and this will fail, as required.</Paragraph>
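      <Paragraph>The construction of the vector for the example f=(a;b)&amp;2 can be sketched in Python. This is a sketch under assumptions rather than a definitive implementation: the function encode is a hypothetical name, variables are represented by generated strings such as V1 (playing the role of B in the text), linked arguments simply share a value, and the Boolean expression is written directly as a Python predicate over each model.

```python
from itertools import product


def encode(allowed):
    """Build a Boolean vector from one flag per model (True = not excluded).

    Arguments are 0, 1, or a variable name; linked arguments share a value.
    Returns None when every model is excluded (0 and 1 would have to unify).
    """
    # Give each of the N+1 arguments a group number: a new group starts
    # only after a position that is NOT excluded.
    groups = [0]
    for flag in allowed:
        groups.append(groups[-1] + 1 if flag else groups[-1])
    last = groups[-1]
    if last == 0:
        return None  # all arguments linked: first (0) and last (1) clash

    def name(group):
        if group == 0:
            return 0
        if group == last:
            return 1
        return "V%d" % group

    return [name(group) for group in groups]


# Models for a feature with values in {1,2} * {a,b,c}, in the order 1a ... 2c.
models = [set(pair) for pair in product([1, 2], ["a", "b", "c"])]
# The expression "(a or b) and 2", evaluated in each model.
allowed = [("a" in m or "b" in m) and 2 in m for m in models]
print(encode(allowed))
```

For this example the sketch prints [0, 0, 0, 0, 'V1', 1, 1], matching bv(0, 0, 0, 0, B, 1, 1) up to the name of the variable.</Paragraph>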
      <Paragraph position="9"> Notice that the number of arguments in the term that we build for one of these Boolean expressions depends on the size of the sets of atomic values involved. This can grow rather big, of course.</Paragraph>
      <Paragraph position="10"> Sometimes, it happens that although the set of possible values for a feature is very large, we only want to write Boolean conditions on small subsets of those values. A typical case might be a feature encoding the identifier of a particular lexical item: in English, for example, the various forms of be often require extra constraints (or relaxation of constraints) which do not apply to other verbs. However, we would not want to build a term with N+1 arguments where N is the number of verbs in English. Under these circumstances there is a simple extension of this encoding. Assume the feature is called stem. We encode the set of values as something like: {be, have, do, anon}, where anon is some distinguished atomic value standing for any other verb. Then we can write things like: stem=be stem=~(be;have) stem=have;do etc.</Paragraph>
      <Paragraph position="11"> However, to express the constraints we need to express, the encoding has to be a little more complex. We could build a term of N+1 arguments, as before, where N=4. But now all the items that fall under anon will be encoded as the same term. This means that we are losing information: we cannot now use the stem feature to distinguish these verbs. What we have to do is to give the bv functor another argument, whose values are those of the original feature: in our example, all the different verb stems of English. In other respects we encode the values of the stem feature as before, but with the extra argument the encodings now look like: be: stem=bv(be, 0, 1, 1, 1, 1) have: stem=bv(have, 0, 0, 1, 1, 1) expect: stem=bv(expect, 0, 0, 0, 0, 1) decide: stem=bv(decide, 0, 0, 0, 0, 1) where the positions between the last five arguments stand, from left to right, for be, have, do, and anon. The extra argument can now distinguish between the anon verbs. Everything else works just as before.</Paragraph>
      <Paragraph position="12"> This extension can also be generalized to products of large sets. For example, we might want a feature whose value was in the product of the set of letters of the alphabet and positive whole numbers. And let us suppose that we want to exclude some particular combinations of these. The particular constraints we need to write might figure in the grammar as: id=~(c&amp;(12;13)) That is, everything except c&amp;12 and c&amp;13. At compile time, when we have examined the whole grammar and lexicon, we know which values are actually mentioned, and we can represent the value space of this feature as: {c, anon1} * {12, 13, anon2}, where anon1 and anon2 are again atoms standing in for all the other values. We need two extra arguments this time, and then expressions like g&amp;444, c&amp;13, and (c&amp;(12;13)) will be coded as:</Paragraph>
      <Paragraph position="14"/>
      <Paragraph position="16"> Notice that for the original Boolean expressions, we may not be able to fill in all the extra argument places.</Paragraph>
    </Section>
    <Section position="2" start_page="304" end_page="304" type="sub_section">
      <SectionTitle>
4.1 Implementation
</SectionTitle>
      <Paragraph position="0"> Implementation of this technique requires the grammar writer to declare a particular feature as being able to take values in some Boolean combination of atoms, for example, something like: bool_comb_feature(agr, [[1,2,3],[sing,plur]]).</Paragraph>
      <Paragraph position="1"> Lists of lists of atoms represent the subsets whose product forms the space of values. To compile the value of a particular bool_comb_feature when it occurs in the grammar, first, using the declarations, precompute the set of models (i.e., the space of values). Assume this set has N members. Then, for each feature=value equation, construct for the value an N+1 vector whose first member is 0 and whose last is 1, and where all the other members are initially distinct variables. Now encode the feature value into this vector as follows: for i = 1 to N, if the feature value does not hold of the i'th model in the set, then unify vector positions i and i+1.</Paragraph>
      <Paragraph position="2"> If the models in the set are represented as lists of atoms, then a single atom as feature value holds of (is true in) a model if it is a member of the list representing the model, a conjunction of atoms holds if both conjuncts hold, etc.</Paragraph>
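      <Paragraph>The model computation and the truth test described here can be sketched in Python as follows. The function names models_for and holds are hypothetical, and the representation of Boolean feature values as nested tuples is an assumption of this sketch; the paper itself writes them with operators such as ; for disjunction.

```python
from itertools import product


def models_for(value_sets):
    """All models (one atom from each declared set), as lists of atoms."""
    return [list(combo) for combo in product(*value_sets)]


def holds(expr, model):
    """Evaluate a Boolean feature value in a model.

    An expression is an atom, or a tuple ("and", e1, e2), ("or", e1, e2)
    or ("not", e). This tuple syntax is an assumption of the sketch.
    """
    if isinstance(expr, tuple):
        op = expr[0]
        if op == "not":
            return not holds(expr[1], model)
        if op == "and":
            return holds(expr[1], model) and holds(expr[2], model)
        if op == "or":
            return holds(expr[1], model) or holds(expr[2], model)
        raise ValueError("unknown operator: %r" % (op,))
    # A single atom holds of a model if it is a member of it.
    return expr in model


# Mirrors bool_comb_feature(agr, [[1,2,3],[sing,plur]]).
agr_models = models_for([[1, 2, 3], ["sing", "plur"]])
expr = ("and", 3, ("not", "plur"))
print([m for m in agr_models if holds(expr, m)])
```

With this declaration there are six models, and the example expression holds only of the model [3, 'sing']; the flags computed by holds are what decides which adjacent vector positions get unified.</Paragraph>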
      <Paragraph position="3"> To implement the extensions just described requires only the addition of the right number of extra argument places to hold the original atoms, where relevant.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="304" end_page="310" type="metho">
    <SectionTitle>
5. Type Hierarchies and Inheritance
</SectionTitle>
    <Paragraph position="0"> Type hierarchies are becoming as ubiquitous in computational linguistics as they have been in knowledge representation. There are several techniques for compiling certain kinds of hierarchy into terms checkable by unification: Mellish (1988) describes them.</Paragraph>
    <Paragraph position="1"> The version presented here derives from a more general approach to the implementation of lattice operations by Ait-Kaci et al. (1989), who show how to implement efficiently not only unification of terms (greatest lower bound, &quot;glb&quot;) in a type lattice but also generalization (least upper bound, &quot;lub&quot;) and complement.</Paragraph>
    <Paragraph position="2"> We will restrict our attention to hierarchies of the type described by Carpenter (1992, Chapter 1), i.e., bounded complete partial orders (but using the terminology of Ait-Kaci (1986); Carpenter's lattices are upside down, and so for him unification is &quot;least upper bound&quot; and so on). We further restrict our attention to hierarchies of atomic types. (While in principle the encoding below would extend to non-atomic (but still finite) types, in practice the resulting structures are likely to be unmanageably large.) In our presentation, we make these hierarchies into lattices: they always have a top and bottom element and every pair of types has a glb and lub. Having a glb of btm is read as failure of unification. Having a lub of top means that the two types do not share any information.</Paragraph>
    <Paragraph position="3"> One example of such a lattice is given by Ait-Kaci (1986, 223):</Paragraph>
    <Paragraph position="5"> A teenager is both an adult and a child; a queen is a monarch, etc. The glb of adult and child is teenager; the lub is person.</Paragraph>
    <Paragraph position="6"> The lattice that we will use for illustration is:</Paragraph>
    <Paragraph position="8"> Notice how easy it is to get a lattice that does not obey our constraints. By adding a line from exports to person (either the slave trade or the brain drain) we get a situation where exports and living no longer have a greatest lower bound, although this would be a perfectly natural inheritance link to want to add.</Paragraph>
    <Paragraph position="9"> To encode the information in this lattice in a form where glb and lub can be computed via unification we first make an array representing the reflexive transitive closure of the &quot;immediately dominates&quot; relation, which is pictured in the diagram above by lines.</Paragraph>
    <Paragraph position="11"> In each row we put a 1 if the row element dominates the column element (i.e., column is a subtype of row), and a 0 otherwise. Since everything is a subtype of itself, and btm is a subtype of everything, there is a 1 in each of the diagonal cells, and in the cell for btm on each row. Taking the agent row, we also have a 1 for the institution column</Paragraph>
    <Section position="1" start_page="306" end_page="308" type="sub_section">
      <Paragraph position="0"> and a 1 for the person column. We will refer to such a row as a &quot;bitstring,&quot; although as we have represented it, it is a list rather than a string. (The sharp-eyed reader will see various other list and term representations of things that are logically bitstrings in what follows. I apologize for this abuse of terminology, but have got into the habit of calling them bitstrings.) This is the first step of the encoding technique described by Ait-Kaci et al. (1989).</Paragraph>
      <Paragraph position="1"> They point out that what the rows of this array represent is the set of lower bounds of the row element, via a bitstring encoding of sets. Thus the AND of two rows will represent the set of lower bounds they have in common. This will in fact be the case whether or not the lattice has the properties we are assuming. If it does not, then it will be possible for the bitstring representing the lower bounds of the two types to be distinct from any row. In our case, however, the bitstring will always coincide with one row exactly. This row will represent the glb of the two types.</Paragraph>
      <Paragraph position="2"> Unfortunately, however, ANDing of bitstrings is not the kind of operation that is directly available within the unification formalism we are compiling into. So we have to encode it into a unification operation. For this we can turn again to the Colmerauer encoding of Boolean combinations of values.</Paragraph>
      <Paragraph position="3"> Informally, we regard a bitstring like those in the rows of the array above as a representation of the disjunction of the members of the set of lower bounds of the type. So the row for agent, with columns [b,t,a,i,l,n,p,p,c,e], is [1,0,1,1,0,0,1,0,0,0], which is regarded as meaning &quot;btm or agent or institution or person.&quot; Then we can encode the bitstring directly into a Boolean vector term of the kind we discussed earlier. The term will have N+1 arguments, where N is the length of the bitstring, and adjacent arguments will be linked if their corresponding bitstring position is zero, and otherwise not linked. The term corresponding to the bitstring for agent will then be bv(A,B,B,C,D,D,D,E,E,E,E) before, and bv(0,B,B,C,D,D,D,1,1,1,1) after, instantiation of the first and final arguments to 0 and 1, respectively.</Paragraph>
      <Paragraph position="4"> The term corresponding to the living bitstring will be: bv(0,E,E,E,E,F,F,G,1,1,1)</Paragraph>
      <Paragraph position="6"> Unifying the two terms together: bv(0,B,B,C,D,D,D,1,1,1,1) and bv(0,E,E,E,E,F,F,G,1,1,1) gives bv(0,B,B,B,B,B,B,1,1,1,1). When we decode this, by the reverse translation (identical adjacent arguments mean 0), we get: bv(0,B,B,B,B,B,B,1,1,1,1) = [1,0,0,0,0,0,1,0,0,0] which is the bitstring for person, the greatest lower bound of the two types agent and living, as required.</Paragraph>
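      <Paragraph> The encoding, unification, and decoding steps just described can be sketched in Python. This is only an illustrative model, not the compiler discussed in the paper: shared variables are simulated by a union-find structure over argument positions, and the function names encode, unify, and decode are assumptions introduced here. The two bitstrings are the agent and living rows used in the text.

```python
def encode(bits):
    """Label the N+1 arguments of a bv term for an N-bit bitstring.

    Adjacent arguments share a label (are linked) exactly when the
    corresponding bit is 0.
    """
    labels = [0]
    for bit in bits:
        labels.append(labels[-1] if bit == 0 else labels[-1] + 1)
    return labels


def unify(term_a, term_b):
    """Unify two label vectors; return None if 0 and 1 would clash."""
    n = len(term_a)
    parent = list(range(n))  # union-find over argument positions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every link present in either term survives unification.
    for term in (term_a, term_b):
        for i in range(1, n):
            if term[i] == term[i - 1]:
                parent[find(i)] = find(i - 1)
    if find(0) == find(n - 1):
        return None  # first argument is 0, last is 1: they cannot be linked
    return [find(i) for i in range(n)]


def decode(term):
    """Reverse translation: identical adjacent arguments mean 0."""
    return [0 if term[i] == term[i - 1] else 1 for i in range(1, len(term))]


AGENT = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
LIVING = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
PERSON = decode(unify(encode(AGENT), encode(LIVING)))
```

Under these assumptions, PERSON comes out as the bitwise AND of the two rows, which is the point of the encoding: unification of the linked-argument terms does the work of ANDing the bitstrings.</Paragraph>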
      <Paragraph position="7">  Computational Linguistics Volume 22, Number 3 With this encoding, unification will never fail, since every pair of types has a glb, even if this is btm. However, since having a glb of btm is usually meant to signal that two types are incomparable and thus do not have a glb, it would be more useful if we could contrive that unification would fail for such cases. In the usual Colmerauer encoding, an impossible Boolean combination is signaled by all the arguments being shared. This will cause an attempt to unify the first and last arguments of the term, which, being 0 and 1, will cause the unification to fail. Such a failure will never happen in our encoding thus far: since the entry for btm in each bitstring is 1, there will always be one adjacent argument pair unlinked, and so unification will always succeed. If, on the other hand, we simply omit btm from the list of types, then when two types have no lower bound, the result of ANDing together their corresponding bitstrings will be a bitstring consisting entirely of zeros. Thus, unifying any two Boolean vector terms that results in the term encoding such a bitstring will fail: if all the elements are zero, then all the arguments will be linked, and we will be trying to unify 0 and 1. Everything else will work just as before.</Paragraph>
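      <Paragraph> A small sketch of the effect of omitting btm, working directly on bitstrings rather than on terms. The rows for agent and living are those of the text with the btm column dropped; the row for plant is an assumption (a type with no lower bound other than itself), and the helper name glb_row is introduced here for illustration only.

```python
def glb_row(row_a, row_b):
    """AND two 0/1 rows; None models unification failure (all zeros)."""
    meet = [a * b for a, b in zip(row_a, row_b)]
    return meet if any(meet) else None


# Columns with btm omitted: t, a, i, l, n, p, p, c, e.
AGENT = [0, 1, 1, 0, 0, 1, 0, 0, 0]
LIVING = [0, 0, 0, 1, 0, 1, 1, 0, 0]
PLANT = [0, 0, 0, 0, 0, 0, 1, 0, 0]  # assumed row: plant has only itself below it

person_row = glb_row(AGENT, LIVING)   # a common lower bound exists
no_glb = glb_row(AGENT, PLANT)        # all zeros, so "unification" fails
```

With btm present both calls would return a row, since every row would share the btm bit.</Paragraph>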
      <Paragraph position="8"> We have been dealing with type hierarchies that have the property of being bounded complete partial orders, except that we have added a btm element to ensure that every pair of types has a glb. Hierarchies of this sort, when they have a top element, have the defining property of lattices that every pair of types has both a glb and lub. Being complete lattices, they also have the property that they can be inverted, by taking &amp;quot;immediately dominates&amp;quot; to be &amp;quot;immediately dominated by.&amp;quot; Furthermore, what in the original lattice was the glb of two types is now the lub and vice versa. Hence, by computing an array based on the inverse relation one can use exactly the same technique for computing least upper bounds, or the generalization of two types. The array generated for the inverted lattice is:  which corresponds to the lub in the original lattice.</Paragraph>
      <Paragraph position="9"> However, the notion of generalization captured in this way is not distributive (because the lattice is not). If it were, then we should expect the following combinations to yield the same result, where g and u represent generalization and unification:</Paragraph>
      <Paragraph position="11"> Whereas in the lattice we are using for illustration, some choices for A, B and C, do have this property (e.g., A=agent, B=person, C=living), other choices (e.g., A=person, B=plant, C=computer) do not.</Paragraph>
      <Paragraph position="12">  We would do well to require distributivity, for otherwise, operations on lattices will become order dependent. In order to do this we have to make our original lattice a distributive one, making new disjunctive types. We can achieve this effect by instead taking our original lattice (the right way up) and using bitwise disjunction of elements to represent generalizations.</Paragraph>
      <Paragraph position="14"> However, notice that this bitstring does not correspond to any existing row in the original array. It corresponds instead to the disjunctive object {person;plant}. This object is extensionally identical to the type living: in decoding we can recover this fact by finding a bitstring which has a 1 in (at least) every position that the bitstring describing the disjunctive object has a 1, and as few as possible 1s other than this.</Paragraph>
      <Paragraph position="15"> This bitstring will be the description of living. In general, to identify the equivalent object for some &amp;quot;virtual&amp;quot; type we take the type description X and find the least object Y such that the generalization of X and Y equals Y.</Paragraph>
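      <Paragraph> The decoding of a disjunctive object can be sketched as follows. The table is deliberately partial: the rows for agent, living, and person are taken from the worked example above, while the row for plant is an assumption, and the function name generalize is hypothetical. The sketch ORs the two rows and then looks for the row that covers the result with the fewest additional 1s.

```python
ROWS = {
    "agent":  [1, 0, 1, 1, 0, 0, 1, 0, 0, 0],
    "living": [1, 0, 0, 0, 1, 0, 1, 1, 0, 0],
    "person": [1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "plant":  [1, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # assumed row
}


def generalize(name_a, name_b):
    """Bitwise OR of two rows, decoded to the smallest covering type."""
    disjunction = [max(a, b) for a, b in zip(ROWS[name_a], ROWS[name_b])]
    covering = [
        (sum(row), name)
        for name, row in ROWS.items()
        if all(r >= d for r, d in zip(row, disjunction))
    ]
    # Fewest 1s among the covering rows; None if the table has no such row.
    return min(covering)[1] if covering else None
```

On this partial table, generalize("person", "plant") picks out living, matching the text; with the full array the search would range over every row.</Paragraph>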
      <Paragraph position="16"> Unfortunately, for these lattices I have not been able to find a way of encoding generalization as disjunction of bitstrings in such a way that the resulting encoding will interact with the previous encoding of unification as conjunction of bitstrings. So it is possible to have either generalization or unification, but not both within the same feature system, at least with this encoding.</Paragraph>
    </Section>
    <Section position="2" start_page="308" end_page="310" type="sub_section">
      <SectionTitle>
5.1 Implementation
</SectionTitle>
      <Paragraph position="0"> In the context of linguistic descriptions the types concerned are often categories, i.e., non-atomic entities. The compilation technique given here assumes that the types are atomic. Of course, where the ranges of feature values are finite, hierarchies of non-atomic types can always be expanded into hierarchies of atoms. It is likely that the resulting encodings would be rather large (although Ait-Kaci et al. (1989) describe some compaction techniques). It is thus unlikely that the compilation technique would be able to completely compile away the complex non-atomic type hierarchies used in, say, HPSG.</Paragraph>
      <Paragraph position="1"> However, a useful compromise is to add to our formalism a new type of feature, whose values are members of an implicit lattice of atomic types. We will illustrate with a partial analysis along these lines of agreement in NPs in English. Traditionally, agreement in NPs is taken to be governed by at least three features: person and number (often combined in a feature &amp;quot;agr&amp;quot;) and something like &amp;quot;mass/count.&amp;quot; The person feature is only relevant for subject-verb agreement, but at least number and mass/count are necessary to get the right combinations of determiner (or no determiner) and noun in the following:  We can express the appropriate generalizations quite succinctly by defining a feature whose values are arranged in a hierarchy:</Paragraph>
      <Paragraph position="3"> From the basic traditional types of count (sg and pl) and mass nouns we construct two supertypes: sing(ular) and opt(ional)det(erminer).</Paragraph>
      <Paragraph position="4"> The grammarian needs to add a declaration describing the set of types and the partial order, expressed as immediate dominance, on them.</Paragraph>
      <Paragraph position="5"> partial_order_feature(agr,</Paragraph>
      <Paragraph position="7"> sing:[mass,sg]]).</Paragraph>
      <Paragraph position="8"> From this declaration it is easy to compute the array representing the reflexive transitive closure of immediately dominates: Now it is easy to precompute for each atomic type, represented by a row of the array, a vector like that for a bool_comb_value feature. In this case, each vector will have nine elements, and adjacent positions will be linked if the corresponding column element is 0.</Paragraph>
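      <Paragraph> As a rough illustration of computing the closure from an immediate-dominance declaration, here is a Python sketch. Only the entry sing:[mass,sg] survives in the declaration above; the entries for any and optdet are assumptions made for this example (optdet is taken to cover mass and pl, following the prose), so the hierarchy here may be smaller than the one in the original array. The names DOMINATES, lower_bounds, and glb are hypothetical.

```python
DOMINATES = {
    "any": ["sing", "optdet"],   # assumed: not recoverable from the source
    "sing": ["mass", "sg"],      # as in the surviving part of the declaration
    "optdet": ["mass", "pl"],    # assumed from the prose
}


def lower_bounds(t):
    """Reflexive transitive closure of 'immediately dominates' from t."""
    seen = {t}
    stack = [t]
    while stack:
        for child in DOMINATES.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def glb(t1, t2):
    """The type whose lower bounds are exactly the shared lower bounds."""
    common = lower_bounds(t1).intersection(lower_bounds(t2))
    for t in common:
        if lower_bounds(t) == common:
            return t
    return None  # no common lower bound: unification would fail
```

Each set returned by lower_bounds corresponds to one row of the array; glb plays the role that unification of the precomputed vectors plays in the compiled grammar.</Paragraph>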
      <Paragraph position="9"> Given such a feature, the following rules and lexical entries are sufficient to account for the data above, where in a more traditional feature-based approach we would have had multiple entries for the determiners, and two rules for the determiner-less NPs: one for the case of a mass noun, the other for the plurals.</Paragraph>
      <Paragraph position="10"> np:{agr=A} ==&gt; [n:{agr=optdet,agr=A}] np:{agr=A} ==&gt; [det:{agr=A},n:{agr=A}] the: det:{agr=any} a: det:{agr=sg} some: det:{agr=sing} Of course, we could achieve a similar result by using the Boolean feature combinations described earlier. We could define a feature with values in {sing,plur}*{mass,count} and provide rules and entries with the appropriate Boolean combinations of these. This will always be possible, so, strictly speaking, the encodings we have described are not necessary. However, there are two reasons for maintaining the type inheritance encoding separately from the Boolean feature combination. Firstly, although in many cases the Boolean encoding might, as here, seem to have a size advantage, in general this need not be the case, especially when the techniques for compaction of bit arrays described by Ait-Kaci et al. (1989) are used. Secondly, and perhaps more importantly for the grammarian, in many cases using the Boolean combination would be a linguistically inaccurate solution. Having a definition like that just given implies that it is just an accident that there are no mass&amp;plur NPs, since they are a linguistically valid combination of features, according to the declaration. In this case, and similar ones, the description in terms of type inheritance would be regarded as capturing the facts in a more natural and linguistically motivated way.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="310" end_page="315" type="metho">
    <SectionTitle>
6. Threading and Defaults
</SectionTitle>
    <Paragraph position="0"> The technique of gap threading is by now well known in the unification grammar literature. It originates with Pereira (1981) and has been used to implement wh-movement and other unbounded dependencies in several large grammars of English (Bobrow, Ingria, and Stallard 1991; Pulman 1992).</Paragraph>
    <Paragraph position="1"> The purpose of this section is to point to another use of the threading technique, which is to implement a rather simple, but very useful, notion of default: a notion that is, however, completely monotonic! Consider the following problem as an illustration. In an analysis of the English passive, we might want to treat the semantics in something like the following way: Joe was seen = exists(e,see(e,something,joe)) Joe was seen by Fred = exists(e,see(e,fred,joe)) Joe was seen on Friday by Fred = exists(e,see(e,fred,joe) &amp; on(e,friday)) Joe was seen on Friday in Cambridge by Fred</Paragraph>
    <Paragraph position="3"> Whether we assume a flat structure for the VP modifiers: [vp pp pp ...] or a recursive one: [[[vp pp] pp] pp] the problems for the grammarian writing a compositional semantic rule for sentences like this are: (i) where the verb meaning is encountered, we do not know whether there is an explicit agent phrase; (ii) if there is no agent, we need to put in an existential quantifier of some sort (something);</Paragraph>
    <Paragraph position="4"> (iii) if there is an agent phrase, we need to put in the meaning of the NP concerned in subject position; (iv) if there is an agent phrase, it may come at any distance from the verb: in particular, we cannot be guaranteed that it will be either the lowest or the highest of the VP modifiers in such sentences. (If we could, then the problem could be solved either at the lowest VP level, or at the point where the VP is incorporated into the S.)</Paragraph>
    <Paragraph position="5"> We can formulate this generalization in terms of defaults: the default subject for a passive is something, unless an explicit agent is present, in which case it is that agent. We can encode this analysis literally in terms of threading. The basic idea is that we thread an agent=(In,Out) feature throughout the VP. On the passive verb itself, the features will look like this: v:{lf=see(Subj,Obj),agent=(Subj,something),subj=Obj,...} The surface subject is the semantic object (passed down via the feature subj). The semantic subject is the value of the In part of the agent feature. This feature sends the default value something as the Out value. We arrange things so that this value is threaded throughout the VP and returns as the value of the In part if no agent phrase is present. In other words, any PP that is not an agent phrase simply threads the value of this feature unchanged. If an agent phrase is present, the logical form of the NP in it replaces the something and becomes the Out value. The topmost VP, which is a constituent of the S, will unify the In and Out values so that either the default agent meaning or an actual agent meaning is eventually transmitted back to the verb. (If we require the In value of the agent phrase to be the default value for an agent, i.e., a meaning like something, then this threading analysis has the incidental advantage that only one agent phrase can be present.) Some schematic rules to achieve this might be: s:{lf=Vsem} ==&gt; [np:{lf=S}, vp:{lf=Vsem,subj=S,agent=(A,A)}] The sentence semantics is taken from the VP. The NP subj meaning is sent down to the head V to be put in its first argument position. The agent thread is passed on by unifying In and Out values.</Paragraph>
    <Paragraph position="7"> The mother VP meaning is taken from the PP, which will simply conjoin its own meaning to that of the daughter VP if the PP is a modifier. If the PP is an agent, it will pass up the daughter VP meaning. PP meanings are functions from VP meanings to VP meanings (or more generally, from predicates to predicates).</Paragraph>
    <Paragraph position="8"> pp:{agent=(A,A),lf=and(Vsem,...),vlf=Vsem} ==&gt; [p:{}, np:{}] A nonagentive PP conjoins its meaning to that of the VP and passes the agent thread unchanged.</Paragraph>
    <Paragraph position="9"> pp:{agent=(something,NPsem),lf=Vsem,vlf=Vsem} ==&gt; [p:{}, np:{lf=NPsem}] An agentive PP replaces the default agent value with that of the agentive NP and passes up the daughter VP meaning.</Paragraph>
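    <Paragraph> The behaviour of the agent thread can be imitated procedurally. The following Python sketch is not a unification grammar: it simply walks the PP modifiers, starting from the default value something, and lets an agentive PP replace the default, refusing a second agent phrase. The function name passive_lf and the tuple format for PPs are assumptions made for this illustration; the output uses the and(...) form of the schematic rules.

```python
DEFAULT = "something"


def passive_lf(verb, surface_subject, pps):
    """Build a logical form for a passive VP.

    pps is a list of ("agent", np_sem) or ("mod", pred, arg) tuples,
    innermost modifier first.
    """
    agent = DEFAULT          # the Out value sent by the verb
    modifiers = []
    for pp in pps:
        if pp[0] == "agent":
            if agent != DEFAULT:
                raise ValueError("only one agent phrase can be present")
            agent = pp[1]    # an agentive PP replaces the default
        else:
            modifiers.append("%s(e,%s)" % (pp[1], pp[2]))  # thread unchanged
    lf = "%s(e,%s,%s)" % (verb, agent, surface_subject)
    for mod in modifiers:
        lf = "and(%s,%s)" % (lf, mod)
    return "exists(e,%s)" % lf
```

For example, passive_lf("see", "joe", []) yields the logical form given above for "Joe was seen", with something in agent position.</Paragraph>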
    <Paragraph position="11"/>
    <Section position="1" start_page="312" end_page="312" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> This rule introduces a passive form of V as complement to be, and sends up the default agent meaning.</Paragraph>
      <Paragraph position="1"> I described this technique as expressing a limited notion of default. There are several linguistic concepts which seem amenable to an analysis in terms of such a technique. A notable example might be the LFG concepts of functional completeness and coherence. In all implementations of LFG parsers that I am aware of, such checks are built into the parsing algorithm. However, it should instead be possible to achieve the same effect by compiling the unification part of an LFG grammar in such a way that completeness and coherence are checked via unifiability of two features: one going up, saying what a verb is looking for by way of arguments, and one coming down, saying what has actually been found.</Paragraph>
    </Section>
    <Section position="2" start_page="312" end_page="313" type="sub_section">
      <SectionTitle>
6.1 Implementation
</SectionTitle>
      <Paragraph position="0"> The easiest way to implement this use of threading is by defining and using macros such as those given earlier for illustration. Some implementations (e.g., Karttunen 1986) build threading into the grammar compiler directly, but this can lead to inefficiency if features are threaded where they are never used.</Paragraph>
      <Paragraph position="1"> 7. Threading and Linear Precedence Threading can also be used as an efficient way of encoding linear precedence constraints. This has been most recently illustrated within the HPSG formalism by Engelkamp, Erbach, and Uszkoreit (1992). Given a set of some partial ordering constraints and a domain within which they are to be enforced, the implementation of this as threading proceeds as follows.</Paragraph>
      <Paragraph position="2"> Firstly, given a set of constraints of the form a &lt; c, b &lt; d, etc., where each of a, b, c, d is some kind of category description, then add to each instance of the category that can appear within the relevant domains some extra features encoding what is not permitted to appear to the left or right of each category within that domain. How this is done depends entirely on what features and categories are involved: we could use Boolean combinations of atomic values, category valued features, or, as in the example below, a pair of term-valued features, left and right.</Paragraph>
      <Paragraph position="3"> Secondly, for each relevant rule introducing these categories in the given domains, we need to identify among the daughters some kind of head-complement or governor-governed relation. Exactly what this is does not matter: if there is no intuitively available notion, it can be decided on an ad hoc basis. The purpose of this division is simply to identify one daughter whose responsibility it is to enforce ordering relations among its sisters and to transmit constraints on ordering elsewhere within the domain, both downwards to relevant subconstituents, and upwards to constituents containing this one but still within the domain within which the ordering must be enforced. On each relevant rule we need a feature on the mother and the distinguished daughter, here called store, following the terminology of Engelkamp, Erbach, and Uszkoreit (1992), and threading features, in and out or their equivalent, on the relevant sister constituents.</Paragraph>
      <Paragraph position="4"> To illustrate the technique in the simplest possible form, here is a small grammar for an artificial language. The language consists of any sequence of four ys from the set {a,b,c,d} within a constituent labeled x, provided that the LP constraints a &lt; c, b &lt; d are observed.</Paragraph>
      <Paragraph position="5"> First we encode the categories in question with the LP constraints in terms of what can precede and follow them. We represent this as a tuple, with a position for each of the relevant categories: (a,b,c,d). The feature left encodes what can precede, and right what may follow a category. If a member of the tuple cannot precede or follow the current category we put a no in that position of the tuple, otherwise we leave it uninstantiated.</Paragraph>
      <Paragraph position="6"> Next we thread a similar tuple through each category to record which category it is. Thus the position in the tuple for a b must have a b in that position in the out value. All the other positions are simply linked by shared variables.</Paragraph>
      <Paragraph position="7"> /* lexical entries: LP = a &lt; c, b &lt; d */</Paragraph>
      <Paragraph position="9"> This rule just says that an x is a valid parse.</Paragraph>
      <Paragraph position="10"> x:{store=S} ==&gt; [y:{out=S}] An x can consist of just a y. The store of the x is the out value of the y. In the other rules, x acts as the distinguished daughter, and y as the subsidiary daughter. x:{store=A} ==&gt; [y:{out=A,in=B,right=B}, x:{store=B}] When the distinguished daughter follows the subsidiary, the right value of the subsidiary must be unified with its in value and the store of the distinguished daughter. This means that any y categories following this one will be recorded in the store of the x daughter, and will have to be consistent with the constraints recorded on this y daughter's right feature.</Paragraph>
      <Paragraph position="11"> The out value of the subsidiary daughter is passed to the mother category's store. Thus the mother contains a record both of the distinguished daughter's store, and what has been added to it by the subsidiary daughter.</Paragraph>
      <Paragraph position="13"> This rule illustrates what to do when the distinguished daughter precedes the subsidiary one. Otherwise, things are exactly analogous. Of course, if both of these rules are used, there will be a lot of ambiguity in these &amp;quot;sentences&amp;quot;: they are just to illustrate the different possibilities.</Paragraph>
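      <Paragraph> The right-branching rule above can be simulated with plain tuples. In this Python sketch, None stands for an uninstantiated position, the store is built from right to left, and each y unifies its right tuple with the store of what follows before recording itself. The names CATS, LP, right_tuple, unify_tuples, and store_of are assumptions introduced for the example; only the rule with the subsidiary daughter first is modelled, for the constraints a &lt; c and b &lt; d.

```python
CATS = ("a", "b", "c", "d")
LP = (("a", "c"), ("b", "d"))  # a precedes c, b precedes d


def right_tuple(cat):
    """'no' marks the categories that may not follow cat."""
    return tuple("no" if (other, cat) in LP else None for other in CATS)


def unify_tuples(t1, t2):
    """Positionwise unification; None is an unbound variable."""
    result = []
    for v1, v2 in zip(t1, t2):
        if v1 is None:
            result.append(v2)
        elif v2 is None or v1 == v2:
            result.append(v1)
        else:
            return None  # a category meets a 'no': the LP constraint is violated
    return tuple(result)


def store_of(seq):
    """Store of the x spanning seq, or None if an LP constraint fails."""
    store = (None,) * len(CATS)
    for cat in reversed(seq):
        incoming = unify_tuples(store, right_tuple(cat))  # in = right = store
        if incoming is None:
            return None
        pos = CATS.index(cat)
        # out: as in, except that this category is recorded in its own position
        store = incoming[:pos] + (cat,) + incoming[pos + 1:]
    return store
```

A sequence such as "cabd" is rejected because c records no in the position for a, and an a is already present in the store of the material to its right.</Paragraph>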
    </Section>
    <Section position="3" start_page="313" end_page="314" type="sub_section">
      <SectionTitle>
7.1 Implementation
</SectionTitle>
      <Paragraph position="0"> This approach to partial ordering can be implemented by requiring the grammarian to make linear precedence declarations encoding the partial orderings. (If grammars obey the &quot;Exhaustive Constant Partial Ordering&quot; property (Gazdar et al. 1985, 49), one global statement will be sufficient.) Then, for each domain, the relevant rules have to be annotated with an indication of the daughter that is to be treated as the distinguished one.</Paragraph>
      <Paragraph position="1"> We define (for each domain) five features (earlier called store, left, right, in, and out) whose values will be tuples of length N, where there are N different categories figuring in the partial order declaration. The members of the tuple will be categories, each associated with a fixed position, or a negative element (here represented as no)</Paragraph>
    </Section>
    <Section position="4" start_page="314" end_page="315" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> which will not unify with any of these categories. Intuitively, left and right encode what can precede or follow the category they appear on; in and out encode what actually does precede or follow; and store encodes the information to be passed up the tree.</Paragraph>
      <Paragraph position="1"> Now when compiling the grammar (and lexicon), for each category figuring in a linear precedence statement Ca &lt; Cb, do the following:</Paragraph>
      <Paragraph position="3"> where no is in the position associated with Cb and all other positions have an anonymous variable; add to Cb the feature specification right=(...,no,...) where no is in the position associated with Ca and all other positions have an anonymous variable; add to Ca/b the feature specifications</Paragraph>
      <Paragraph position="5"> associated with Ca/b and the other positions in these two features are linked by shared variables X1 ... Xn as indicated.</Paragraph>
      <Paragraph position="6"> Finally, for each annotated rule with distinguished daughter D, mother M, and subsidiary daughter S: 1. put store=X on M and out=X on S; 2. put store=Y on D; 3. if D &lt; S put in=Y, left=Y on S; if S &lt; D put in=Y, right=Y on S. Macros, perhaps automatically generated by the compiler in response to the declaration, can be used to effect these feature constraints economically. 8. Threading and Set Valued Features The threading technique can also be used to implement some of the effect of set valued features. We represent a set as a tuple of values, e.g. (a,b,c,d). Each member of the set encodes its presence by changes to this tuple on an in and out feature: thus a would have (no,B,C,D) as its in value and (a,B,C,D) as its out value. Then on the category representing the domain within which all the members of the set are to be found, we give (no,no,no,no) as the value of in, and (a,b,c,d) as the value of out. These values will be satisfied if and only if all the members of the set have been encountered, in any order.</Paragraph>
      <Paragraph position="7"> Here is a small grammar which implements a kind of set-valued subcategorization analysis. The language consists of sequences of a verb (vabcd, vbcd, or vbd) followed by the things it is subcategorized for, in any order: e.g.</Paragraph>
      <Paragraph position="8">  vabcd a b c d vabcd b a d c, etc.</Paragraph>
      <Paragraph position="9"> *vabcd a b c     % d missing
 *vabcd a b d c d % too many ds
 Here are the categories (for simplicity regarded as lexical) that can appear on subcat lists:</Paragraph>
      <Paragraph position="11"> Here are the verbs: x:{lex=vabcd,in=(a,b,c,d),out=(no,no,no,no)} x:{lex=vbcd,in=(no,b,c,d),out=(no,no,no,no)} x:{lex=vbd,in=(no,b,no,d),out=(no,no,no,no)} Notice that by putting the negative element in the relevant position on both the in and out tuple we require that that member of the set should not be found at all. And now the rules:</Paragraph>
      <Paragraph position="13"> This rule unifies the in and out values to make sure that what was found was what was being sought; x might typically be a VP, for example, and this identification of feature values would take place on the s==&gt; \[np, vp\] rule.</Paragraph>
      <Paragraph position="14"> x: {in=In,out=Out} ==&gt; \[x : {in=In, out=Nxt }, y : {in=Nxt, out=Out}\] This is like a subcat schema which combines an x-projection with a y-complement, threading the appropriate information.</Paragraph>
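      <Paragraph> The effect of this small grammar can be checked with a procedural sketch. Here the verb's in tuple says what is sought, the thread starts from the verb's out tuple (no,no,no,no), each complement replaces the no in its own position, and the topmost rule's unification of in and out becomes an equality test. The names VERBS and accepts are assumptions for this illustration.

```python
CATS = ("a", "b", "c", "d")

# The in tuples of the three verbs given in the text.
VERBS = {
    "vabcd": ("a", "b", "c", "d"),
    "vbcd": ("no", "b", "c", "d"),
    "vbd": ("no", "b", "no", "d"),
}


def accepts(verb, complements):
    """True if the complements are exactly what the verb subcategorizes for."""
    wanted = VERBS[verb]
    found = ("no",) * len(CATS)      # the verb's out value
    for cat in complements:
        pos = CATS.index(cat)
        if found[pos] != "no":       # the complement's in value requires 'no' here
            return False
        found = found[:pos] + (cat,) + found[pos + 1:]
    return found == wanted           # topmost rule: in and out must unify
```

Order is irrelevant because each complement only touches its own position of the tuple.</Paragraph>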
      <Paragraph position="15"> This simple technique can be used to implement many of the kinds of analysis that might be thought to require set valued features, although at a small cost of adding some extra features and values to a grammar. It can also be combined with the preceding treatment of linear precedence to enforce a partial ordering on members of the set.</Paragraph>
      <Paragraph position="16"> Set valued features are often used in conjunction with a membership test. It is usually possible to achieve the same effect by inventing a new bool_comb_value feature and using disjunction. For example, if our original features involved sets whose possible members were {a b c d e f} and had feature specifications of the form f=X, where member(X,{b c d e}), then the same effect can be achieved by declaring f to be a bool_comb_value feature with values in {a,b,c,d,e,f} and writing f=(b;c;d;e). In the case that the members of the set in question are categories, then some new atomic feature values have to be invented to represent these, as is often necessary in other contexts also (Engelkamp, Erbach, and Uszkoreit 1992).</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="315" end_page="324" type="metho">
    <SectionTitle>
9. Reducing Lexical Disjunction
</SectionTitle>
    <Paragraph position="0"> This section describes two techniques for eliminating multiple lexical entries for the same word. Having multiple lexical entries for the same word is a form of disjunction, and all forms of disjunction entail increased nondeterminism leading to inefficiency in analysis. It is therefore a good idea to eliminate multiple entries as far as is possible.</Paragraph>
    <Section position="1" start_page="315" end_page="316" type="sub_section">
      <SectionTitle>
9.1 Selectors
</SectionTitle>
      <Paragraph position="0"> A frequently occurring case is the following: a particular word, W, has multiple possible realizations of some property, P1 ... Pn. Which particular realization is found will</Paragraph>
    </Section>
    <Section position="2" start_page="316" end_page="317" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> depend on the context: in context C1 we find P1, and, more generally, in Ci we will find Pi.</Paragraph>
      <Paragraph position="1"> A simple though rather artificial illustration of this phenomenon might be a treatment of the semantics of prepositions that regarded them as ambiguous between different senses, depending on which type of NP they combined with. For example, we might regard for as having these meanings: 'for_benefactive' with animate NP: The book is for John 'for_time_period' with temporal NPs: He stayed for an hour 'for_directional' with locative NPs: They changed direction for the coast Here the Pi are the different meanings, and the Ci are the different types of NP. The simplest way to achieve the desired result is to have multiple entries for the preposition, one for each sense. We then treat the correlation of the properties with the contexts as a kind of agreement, between some feature on the preposition and one on the NP. Some sample lexical entries, and a rule for combining a P and an NP to make a PP might look like this: p:{lex=for, sem=for_benefactive, type=animate} p: {lex=for, sem=for_time_period, type=temporal} p: {lex=for, sem=for_directional, type=locative} pp:{sem=lambda(X,\[S,X,NP\])} ==&gt; \[p:{sem=S,type=T},np:{type=T,sem=NP}\] Unfortunately, such a treatment can lead to large numbers of lexical entries, which, especially if they are phrasal heads, as in this case, can each generate a separate parsing hypothesis for any occurrence of for in the input.</Paragraph>
      <Paragraph position="2"> A better treatment can be obtained by using the fact that it is the NP that determines the P semantics, and encoding this dependency directly. What we need to do is to make the NP select the appropriate prepositional semantics, representing all the choices within a single lexical entry. We can do this in the following way: 1. Encode the set of possible semantic values for the preposition as a list or a tuple, where each position in the tuple is going to correspond systematically to a particular type of NP.</Paragraph>
      <Paragraph position="3"> p:{lex=for,semvalues=(for_benefactive,for_time_period,...),...} 2. Use the original sem feature to represent the semantic value that will be chosen when the P is combined with an NP: p:{lex=for,sem=Chosen,semvalues=(for_benefactive,for_time_period,for_directional),...} 3. Associate with each different type of NP and other relevant categories a selector feature whose value will pick the appropriate member of the tuple on the P. Some illustrative rules and entries are: np:{type=T,selector=S,...} ==&gt; [det:{...},n:{type=T,selector=S,...}] n:{lex=john,type=animate,selector=(X,(X,_,_))} n:{lex=week,type=temporal,selector=(X,(_,X,_))} n:{lex=coast,type=locative,selector=(X,(_,_,X))} 4. Now on the rule that combines a P and an NP to form a PP, use the selector feature to choose the appropriate semantics for the P: pp:{sem=lambda(X,[Chosen,X,NP])} ==&gt; [p:{sem=Chosen,semvalues=Tuple}, np:{sem=NP,selector=(Chosen,Tuple)}] The value of the sem feature on the P will be the first, second, or third member of the tuple, depending on the type of the NP. This will arise because the selector on the NP will unify the Chosen variable with the position on the tuple identified by its shared variable, X.</Paragraph>
      <Paragraph position="4"> This simple device enables us to have a single entry for each preposition, while still allowing for it to have multiple senses conditional upon the type of NP it combines with. The technique has a wide variety of applications and can be astonishingly effective in reducing the number of explicit alternative entries or rules that need to be written, at the cost of a few extra features that cost nothing in terms of processing time.</Paragraph>
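      <Paragraph> What the selector does at the PP rule can be mimicked in a few lines of Python. In this sketch the selector pattern is a tuple in which "X" marks the shared variable and "_" an anonymous one; matching it against the preposition's tuple of semantic values returns whatever sits in the X position. The dictionaries and the function pp_sem are hypothetical stand-ins for the lexical entries and the PP rule.

```python
# Semantic values of the preposition, in a fixed order of NP types.
SEMVALUES = {
    "for": ("for_benefactive", "for_time_period", "for_directional"),
}

# Second component of each noun's selector feature.
SELECTOR = {
    "john": ("X", "_", "_"),    # animate
    "week": ("_", "X", "_"),    # temporal
    "coast": ("_", "_", "X"),   # locative
}


def pp_sem(prep, noun):
    """Pick the preposition sense selected by the noun's pattern."""
    pattern = SELECTOR[noun]
    values = SEMVALUES[prep]
    chosen = [value for mark, value in zip(pattern, values) if mark == "X"]
    return chosen[0]
```

In the grammar itself no search takes place: a single unification of the selector with (Chosen,Tuple) binds Chosen, which is why the device costs essentially nothing at parse time.</Paragraph>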
    </Section>
    <Section position="3" start_page="317" end_page="318" type="sub_section">
      <SectionTitle>
9.2 Implementation
</SectionTitle>
      <Paragraph position="0"> As with many of the techniques described here, implementation by way of a compiled out notation can be complex if the features involved interact with other aspects of linguistic description. If we assume that they do not (which can usually be enforced by defining a new &amp;quot;shadow&amp;quot; feature that simply duplicates the information where it is needed) then an attractive and clean way of implementing this technique is as a conditional constraint on feature values.</Paragraph>
      <Paragraph position="1"> There are various notations one could employ: one possibility for the above example is the following, where psem and type are assumed not to figure in any other such statement, and where their total range of values is given by the conditionals (such restrictions could be relaxed to some extent given agreed conventions or extra declarations): pp : {sem=lambda (X, \[PSem, X, NP\] ) } ==&gt; \[p : {psem=PSem}, np: {sem=NP, type=Type) }\] where if Type = animate then PSem = for_benefactive if Type = temporal then PSem = for_time_period if Type = locative then PSem = for_directional Now the compiler has enough information to be able to proceed automatically: .</Paragraph>
      <Paragraph position="3"> Construct a values feature whose value will be a tuple of the values of psem in a canonical order. Put this feature specification on the P category. More generally, put this specification on that category of the rule introducing the conditional constraints which contains the feature specification figuring in the consequents of the conditional constraints.
Construct a selector feature whose values will be of the form (X,(...,X,...)), where the second member of the tuple is a tuple of the same length as that in the values feature. On each category where a type feature specification is present, add the selector feature also. If the type feature is instantiated, then the selector feature will be of the form indicated by the conditional constraint: that is, the X in the second component of the tuple will be in the position corresponding to the value</Paragraph>
    </Section>
    <Section position="4" start_page="318" end_page="318" type="sub_section">
      <SectionTitle>
Pulman Unification Encodings
</SectionTitle>
      <Paragraph position="1"> of the psem feature given by the conditional, and all other positions will be anonymous variables. If the type feature is simply passed from one category to another, as it is for example on the NP rule given earlier, then the selector feature must likewise be coindexed on the two categories. On the categories of the rule introducing the constraint, coindex the feature specifications as follows:</Paragraph>
      <Paragraph position="3"> Again, macros can be used to make it possible to express all this economically.</Paragraph>
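      <Paragraph> The compilation steps just listed can be sketched roughly as follows. This is a hedged illustration, not the compiler described in the text: the function names compile_conditionals and resolve are hypothetical, alphabetical sorting is assumed as the canonical order, and selectors are shown as patterns in which the string "X" marks the shared variable and "_" an anonymous one.

```python
def compile_conditionals(conditionals):
    """`conditionals` maps each value of the antecedent feature (type)
    to the value of the consequent feature (psem) that it selects."""
    # values feature: the consequent values in a canonical (here: sorted) order
    values = tuple(sorted(set(conditionals.values())))
    selectors = {}
    for type_value, psem_value in conditionals.items():
        # selector pattern: X sits in the slot of the value this type selects
        slots = tuple("X" if v == psem_value else "_" for v in values)
        selectors[type_value] = ("X", slots)
    return values, selectors

def resolve(selector, values):
    # Mimics unifying selector=(X, slots) with (PSem, values):
    # X is bound to the member of `values` found in X's slot.
    _, slots = selector
    return values[slots.index("X")]

VALUES, SELECTORS = compile_conditionals({
    "animate": "for_benefactive",
    "temporal": "for_time_period",
    "locative": "for_directional",
})
```

Here VALUES would be placed on the P category and SELECTORS[t] on every category whose type feature is instantiated to t; the rule introducing the constraint then only needs to share the two variables.</Paragraph>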
    </Section>
    <Section position="5" start_page="318" end_page="320" type="sub_section">
      <SectionTitle>
9.3 Subcategorization
</SectionTitle>
      <Paragraph position="0"> Perhaps the most obvious source of lexical disjunction is subcategorization. Most verbs can appear with several different types of complement, and some verbs appear with many. For example, the verb send in English can occur in at least the following configurations (there is some dialect variation here, but please bear with me for the sake of the example): John sent a letter.</Paragraph>
      <Paragraph position="1"> John sent a letter to Mary.</Paragraph>
      <Paragraph position="2"> John sent Mary a letter.</Paragraph>
      <Paragraph position="3"> John sent out a letter.</Paragraph>
      <Paragraph position="4"> John sent a letter out.</Paragraph>
      <Paragraph position="5"> John sent out a letter to Mary.</Paragraph>
      <Paragraph position="6"> John sent a letter out to Mary. John sent Mary out a letter.</Paragraph>
      <Paragraph position="7"> John sent out Mary a letter.</Paragraph>
      <Paragraph position="8"> There are nine distinct configurations here. Let us ignore the fact that some alternations might be capturable by rule, and let us also ignore the fact that different semantic properties might be involved. Given this, it would be nice to be able to have a single entry for the verb send that encapsulated all these alternatives, rather than listing them all as separate lexical entries, as is done in all grammatical formalisms I am familiar with (except of course those that allow explicit disjunction). In a GPSG-like approach to subcategorization (Gazdar et al. 1985), each distinct type of complement has a separate rule. Thus we will have rules, schematically, like:</Paragraph>
      <Paragraph position="10"> etc.</Paragraph>
      <Paragraph position="11"> Using the technique described earlier for encoding Boolean combinations of feature values, we could achieve the desired single entry for send very simply. Rather than use numbers to represent the different subcategorization possibilities, we will have an atom with some mnemonic content: {np, np_pp, np_np, np_pnp, ...}. Then we define a feature, subcat, that can take as values Boolean combinations from this set, and write: v:{lex=send,subcat=(np;np_pp;np_np;...)...} The various VP rules are recast using the mnemonic symbols:</Paragraph>
      <Paragraph position="13"> etc.</Paragraph>
      <Paragraph position="14"> Now one entry for each verb will subsume all the possible subcategorization combinations for it.</Paragraph>
      <Paragraph position="15"> This technique certainly reduces the number of items in the lexicon. Unfortunately, it does not necessarily reduce the amount of nondeterminism during analysis. Although there is only a single entry for send, it will, on either a left-corner or head-driven approach to parsing, initiate parsing hypotheses for each distinct VP rule whose head unifies with it. That will be exactly the same number of parsing hypotheses as we would have had with the original GPSG treatment, and so there is no obvious advantage here.</Paragraph>
      <Paragraph position="16"> Nevertheless, this technique should not be scorned, for in other cases, there will be some advantage gained. For example, in derivational morphology the presence of multiple entries for verbs like send can cause unwanted ambiguity. The word sender, for example, would be nine ways ambiguous, given a rule like: n:{} ==&gt; \[v:{}, affix:{lex=er}\] With just one entry for send this problem goes away. (Note that one cannot get round the ambiguity problem by just restricting the agent nominalization rule to one or two types of subcategorization: many different types of verbal complement may be involved: sleeper, designer, thinker, etc.) As we have seen, the GPSG treatment of subcategorization involves many VP rules. A currently more favored approach is to use a single VP rule or schema or subcat principle, and a list of categories subcategorized for by a verb:</Paragraph>
      <Paragraph position="18"> etc.</Paragraph>
      <Paragraph position="19"> Multiple applications of this schema use up subcategorized elements one at a time, with a requirement that when the VP is combined with a subject to form a sentence the subcat list is empty (or contains just one category unifiable with the subject, depending on the approach taken). The tree for a VP will look like:</Paragraph>
      <Paragraph position="21"> This approach requires multiple entries for verbs, but has the advantage that it eliminates the need for different VP rules for each type of complement.</Paragraph>
      <Paragraph position="22"> It would be nice to find some way of combining this single-schema approach with a single subcategorization entry subsuming multiple possibilities. This would eliminate</Paragraph>
    </Section>
    <Section position="6" start_page="320" end_page="322" type="sub_section">
      <SectionTitle>
Pulman Unification Encodings
</SectionTitle>
      <Paragraph position="0"> nondeterminism completely, even for verbs capable of appearing with many different types of complement. Although the details are rather complex, it turns out that it is possible to achieve this by combining the Boolean encoding technique with the use of selectors as previously described. Unfortunately, there are some limitations on the amount of subcategorization information that can be expressed by the resulting technique: in particular, categories have to be represented by atoms, which is an inconvenient limitation. Nevertheless, for many purposes where efficiency of processing is at a premium, it could be worth living with this limitation.</Paragraph>
      <Paragraph position="1"> First of all, consider how to represent the various subcategorization possibilities of a verb like send, using Boolean combinations of atoms. (I have omitted as many parentheses as possible in the interests of readability. Assume that ; takes precedence over &amp; unless parentheses indicate otherwise.) It might seem that something like: (np; np &amp; pp; np &amp; np; ... p &amp; np &amp; np) would accurately describe the possibilities. (This Boolean expression can, of course, be written more compactly by using a few more disjunctions.)</Paragraph>
      <Paragraph position="2"> The VP schema that we need will then have to be of the following form. Note that since we need to be able to generalize over categories, we are reverting to the basic (untyped) category notation.</Paragraph>
      <Paragraph position="3"> schema: {cat=vp,subcat=S} ==&gt; \[{cat=vp,subcat=S},{cat=S}\]
sample entry: {cat=vp,lex=send,subcat=(np; np &amp; pp; np &amp; np;...)}
(Note that this makes cat a Boolean combination feature. Given the importance of the cat feature for efficient indexing and lookup, this might be unwise for practical purposes. A better implementation would use a new feature.)</Paragraph>
      <Paragraph position="4"> A moment's reflection should reveal that this first attempt will not give the correct results, for two reasons. Firstly, the various different orderings are not properly encoded here (because p &amp; q is logically equivalent to q &amp; p). Secondly, there is no encoding of the requirement to find the correct number of subcategorized entities (because p &amp; p is logically equivalent to p). Thus nothing would prevent us from successfully analyzing a sentence like John sent Mary out Mary a letter to Mary a letter, with too many complements, or John sent out, with too few.</Paragraph>
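      <Paragraph> Both failures can be reproduced with a deliberately simplified stand-in for the Boolean encoding. In the Python sketch below (hypothetical names; not the term encoding used in this paper), a conjunction of atoms is modeled as a set, so that order and repetition vanish exactly as they do under logical equivalence, and a complement sequence counts as compatible if the atoms seen so far fit inside at least one disjunct. Only some of the disjuncts for send are listed.

```python
def conj(*atoms):
    # A conjunction of atoms up to logical equivalence:
    # neither order nor repetition survives.
    return frozenset(atoms)

# Some of the disjuncts of the first-attempt subcat value for "send".
SEND_FIRST_ATTEMPT = [
    conj("np"),
    conj("np", "pp"),
    conj("np", "np"),
    conj("p", "np"),
    conj("p", "np", "pp"),
]

def compatible(complements):
    # True if the complements found so far conflict with no disjunct at all.
    seen = conj(*complements)
    return any(seen.issubset(disjunct) for disjunct in SEND_FIRST_ATTEMPT)

# "John sent Mary out Mary a letter to Mary a letter"
TOO_MANY = ["np", "p", "np", "np", "pp", "np"]
# "John sent out"
TOO_FEW = ["p"]
```

Under these assumptions both TOO_MANY and TOO_FEW come out as compatible, which is the behavior the positional symbols and finish markers introduced next are meant to rule out.</Paragraph>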
      <Paragraph position="5"> Let us tackle the ordering problem first. We can solve this by adding new symbols representing the product of the set of relevant categories np, p, pp and the set of positions 1, 2, 3 after the verb in which they occur. We then define a Boolean feature value type for the feature subcat as follows: {np1,p1,pp1}*{np2,p2,pp2}*{np3,p3,pp3} We encode the subcategorization possibilities in the obvious way, using these new symbols. (This time I have used disjunction to give a more compact encoding.) {lex=send,cat=vp,subcat=(np1;</Paragraph>
      <Paragraph position="7"> We will define a new feature that appears on every category that can be subcategorized for, say scat, whose values are tuples. An NP, for example, will have scat=(np1,np2,np3). Notice that the components of the tuple are values that can appear in Boolean combinations, for they must be of the same type as the subcat feature. In order to pick the correct value for the position in question, we associate with the verb a feature whose value is a list of the constructs called selectors that we used earlier. Each selector picks out a position in the complement corresponding to the position of the selector in the list: the first selector on the list will pick out np1 for an NP, pp1 for a PP; the second will pick out np2 for an NP, pp2 for a PP, and so on. The feature and value will be of the form: selectors=\[(A,(A,_,_)), (B,(_,B,_)), (C,(_,_,C))\].</Paragraph>
      <Paragraph position="8"> The VP rule schema now uses the current selector to choose the appropriate symbol from the complement it is combining with. It pops selectors off the list each time it applies so that the correct positional encoding is available for the next application.</Paragraph>
      <Paragraph position="9"> {cat=vp,subcat=S,selectors=Rest} ==&gt; \[{cat=vp,subcat=S,selectors=\[(S,X)|Rest\]}, {cat=_,scat=X}\]
Since the selector list guarantees that the symbols np1 etc. are only found in the correct position after the verb, this solves the ordering problem. Although np1 &amp; np2 is logically equivalent to np2 &amp; np1, the selector list will not allow the second ordering to be found, because this would involve an attempt to unify np1 with np2. The use of selectors to encode position also solves some cases of the problem that our first attempt suffered from, of allowing more than the correct number of complements. The selector list will not allow more than three complements of send to be found. Unfortunately, the treatment so far will still allow fewer than three complements to be found even where another is needed for the sequence to be grammatical. For example, the sentence John sent out will be parsed successfully (as indeed will John sent) because no conflict with any of the subcategorization possibilities has been encountered. The way the Boolean encoding works has to allow for elements to be conjoined one at a time, but it cannot require that all the elements are present simultaneously, for this very reason. The way to solve this problem is to expand our Boolean combination of subcat feature values to include some special finish symbols.</Paragraph>
      <Paragraph position="10"> {np1,p1,pp1}*{np2,p2,pp2}*{np3,p3,pp3}*{f1,f2,f3,f4} There is one symbol for each possible subcat position, plus an extra one to mark the end of the list. We have to extend our various selectors and the lists they appear in to accommodate this fourth position.</Paragraph>
    </Section>
    <Section position="7" start_page="322" end_page="322" type="sub_section">
      <SectionTitle>
Pulman Unification Encodings
</SectionTitle>
      <Paragraph position="0"> The intuitive motivation behind this move is to regard the completion of a subcat requirement as being signaled by a special finish category. However, that category need not actually be present: its marker can instead be introduced by the rule that combines a completed VP with a subject to make a sentence. (This is analogous with the treatment described earlier in which this rule required a subcat list to be empty at this point).</Paragraph>
      <Paragraph position="1"> To implement this analysis, we enter into the various subcategorization possibilities the information about which position marks their completion:
{lex=send, cat=vp,
selectors=\[(A,(A,_,_,_)), % each selector now has 4 positions
(B,(_,B,_,_)), (C,(_,_,C,_)), (D,(_,_,_,D))\], % and there are 4 selectors in the list</Paragraph>
      <Paragraph position="3"> An intransitive verb would of course just be subcat=f1.</Paragraph>
      <Paragraph position="4"> Our VP rule schema is exactly as before. For the right results to be obtained, however, we now need to assume the presence of some rule like the following to close off the subcategorization: {cat=s} ==&gt; \[{cat=np}, {cat=vp,subcat=S,selectors=\[(S,(f1,f2,f3,f4))|_\]}\] This will add the finish marker of the appropriate position to the subcat value of the VP. This unification will only succeed if the verb is subcategorized to finish at that point, and we will not have reached this point unless all the other elements subcategorized for have been found in the correct order. So, using selectors and Boolean combinations of feature values together, we have developed an analysis that completely eliminates disjunction and hence nondeterminism. It will, of course, generalize to any other area having the same structural properties.</Paragraph>
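      <Paragraph> The combined behavior of positional symbols, selectors, and finish markers can be simulated extensionally. The Python sketch below is an illustration under stated assumptions, not the encoding itself: a Boolean value over the product type is represented as the set of its models (one symbol chosen from each of the four component sets), conjoining an atom filters that set, and the names frame, subcat, conjoin, and accepts are hypothetical. The selector list is modeled simply by the loop position.

```python
from itertools import product

CATS = ("np", "p", "pp")
MAX = 3  # complement positions; a fourth slot holds the finish marker

# Component sets {np1,p1,pp1}, {np2,p2,pp2}, {np3,p3,pp3}, {f1,f2,f3,f4}
SLOTS = [tuple(f"{c}{i}" for c in CATS) for i in range(1, MAX + 1)]
SLOTS.append(tuple(f"f{i}" for i in range(1, MAX + 2)))
ALL_MODELS = set(product(*SLOTS))

def frame(cats):
    # [np, pp] becomes "np1 and pp2 and f3": every model containing those atoms.
    atoms = [f"{c}{i}" for i, c in enumerate(cats, start=1)]
    atoms.append(f"f{len(cats) + 1}")
    return {m for m in ALL_MODELS if all(a in m for a in atoms)}

def subcat(frames):
    # A disjunction of frames is the union of their model sets.
    value = set()
    for cats in frames:
        value |= frame(cats)
    return value

def conjoin(value, atom):
    # Conjoining an atom keeps only the models that contain it.
    return {m for m in value if atom in m}

def accepts(value, complements):
    """Apply the VP schema once per complement, then the closing rule."""
    if len(complements) > MAX:
        return False  # assumed: no positional symbol exists for a fourth complement
    for position, cat in enumerate(complements, start=1):
        value = conjoin(value, f"{cat}{position}")  # current selector picks this symbol
        if not value:
            return False
    # Closing rule: add the finish marker for the next position.
    return bool(conjoin(value, f"f{len(complements) + 1}"))

SEND = subcat([["np"], ["np", "pp"], ["np", "np"], ["p", "np"], ["np", "p"],
               ["p", "np", "pp"], ["np", "p", "pp"], ["np", "p", "np"],
               ["p", "np", "np"]])
```

With this single value for send, the nine attested complement sequences are accepted, while wrongly ordered sequences fail at the offending position and sequences that are too short or too long fail at the closing rule.</Paragraph>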
    </Section>
    <Section position="8" start_page="322" end_page="323" type="sub_section">
      <SectionTitle>
9.4 Implementation
</SectionTitle>
      <Paragraph position="0"> The general features of the implementation of this technique are as follows.</Paragraph>
      <Paragraph position="1"> We need to know the subcategorized-for categories, the symbols used to identify them, and the maximum number that can occur in a single verb-complement construction. This might conveniently be stated by a declaration something like: subcategorization_feature(Name,Categories,Mnemonics,MaximumLength).</Paragraph>
      <Paragraph position="2"> This will allow us to automatically construct the selector list. For a maximum number of four, this will take the form: \[(A,(A,_,_,_)), (B,(_,B,_,_)), (C,(_,_,C,_)), (D,(_,_,_,D))\] The declaration will also allow us to work out the mnemonic values (np1, f1, etc.) needed for the scat and subcat features (types of bool_comb_value feature). Rules or lexical entries that build a member of Categories must have the scat feature added with tuple values like (np1,np2,np3) etc.</Paragraph>
      <Paragraph position="3"> On lexical entries for subcategorizers, the subcat value can be stated as a simple list of possibilities: subcat=\[\[np\], \[np,pp\], etc. or some convenient abbreviatory notation could be devised. These values should then be compiled to Boolean combinations of the corresponding mnemonic atoms. The tuple-valued selectors feature needs to be added to these entries with the value already illustrated: this can be done automatically, given the declaration.</Paragraph>
      <Paragraph position="6"> The rule that encodes the combination of subcategorizer and subcategorized has to be identified, and feature specifications of the following form added: {...subcat=S, selectors=Rest,...} ==&gt;</Paragraph>
      <Paragraph position="8"> The rule that closes off the subcategorization needs to have the relevant selectors value added, as in the example above.</Paragraph>
      <Paragraph position="9"> With suitable generation of macros by the compiler, our example might then be written by the grammarian as:
declaration: subcategorization_feature(subcat,\[{cat=np},{cat=p},{cat=pp}\],\[np,p,pp\],4).
subcat schema rule:  where subcat_close_macro(CloseSubcat).</Paragraph>
      <Paragraph position="10"> sample entries: {lex=send,cat=vp,subcat=\[\[np\], \[np,pp\], \[np,np\] etc.\]} {lex=give,cat=vp,subcat=\[\[np,np\], \[np,pp\], etc.\]}</Paragraph>
    </Section>
    <Section position="9" start_page="323" end_page="324" type="sub_section">
      <SectionTitle>
9.5 Limitations
</SectionTitle>
      <Paragraph position="0"> As mentioned earlier, there are some limitations associated with this technique. Because of the type of Boolean mechanism we are using, we are restricted to atomic symbols to represent the subcategorized-for elements. Putting into lexical entries the kind of refined subcategorization information that we often do using features is not possible, or at least not possible without expanding the vocabulary of symbols like np1, pp2, etc. to induce a finer partition among instances of the categories in question. This is, of course, a serious limitation, especially for theories of grammar that are largely lexically based. However, where all the categories that figure in subcategorization will</Paragraph>
    </Section>
    <Section position="10" start_page="324" end_page="324" type="sub_section">
      <SectionTitle>
Pulman Unification Encodings
</SectionTitle>
      <Paragraph position="0"> have some of the same features (e.g., those used for threading gaps) then these can be incorporated directly into, in our case, the VP subcategorization schema rule.</Paragraph>
      <Paragraph position="1"> Another problem is that since we need to be able to generalize over whole categories, we cannot, as things stand, use compilation into terms for feature structures.</Paragraph>
      <Paragraph position="2"> One way round this is to change the VP schema so that complements are characterized not just by a variable but by an explicit new category, say xcomp, with a bool_comb_value feature on it that can serve to identify categories. We then introduce rules expanding xcomp as the &amp;quot;real&amp;quot; category corresponding to that feature. This may in turn re-introduce some inefficiency, since there will be an extra level of structure that is not linguistically motivated.</Paragraph>
      <Paragraph position="3"> A final limitation, which is perhaps more theoretically defensible, is that we are forced to be absolutely and strictly compositional in assembling the semantics of verb phrases grouped under the same subcategorization treatment. Since we have only one entry for a verb, then any semantic differences that are associated with variant subcategorizations will have to be built from the complement constituents in a completely compositional way.</Paragraph>
      <Paragraph position="4"> Alternatively, as is done in many wide-coverage systems for efficiency reasons, syntactic and semantic analysis can be separated into consecutive stages. This can have a further advantage in that now the same technique can be used to eliminate disjunction for words where there is sense ambiguity but no syntactic ambiguity. If different lexical entries are assigned to the content words in the following sentence because they differ semantically but not syntactically, then the sentence will have 16 parses (8 * 2 for the attachment ambiguity) to be disambiguated.</Paragraph>
      <Paragraph position="5"> They saw the ball near the bank.</Paragraph>
      <Paragraph position="6"> (saw = see, or cut wood; ball = round thing, or dance; bank = edge of river, or financial institution).</Paragraph>
      <Paragraph position="7"> If sense selection is instead performed when syntactic processing is completed, on the assumption that the words involved do not differ syntactically, then there will only be two parses and three lexical disambiguation decisions. In general we will only be dealing with the sum and not the product of the syntactic and semantic ambiguity.</Paragraph>
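      <Paragraph> The arithmetic behind this comparison is small enough to spell out. The Python sketch below simply recomputes the figures given for the example sentence; the variable names are illustrative only.

```python
from math import prod

# Number of senses per content word in "They saw the ball near the bank."
senses = {"saw": 2, "ball": 2, "bank": 2}
attachments = 2  # the PP attachment ambiguity

# Separate lexical entries per sense: ambiguities multiply.
parses_with_sense_entries = prod(senses.values()) * attachments

# Sense selection after syntax: parses and sense decisions are counted separately.
parses_syntax_only = attachments
sense_decisions_afterwards = len(senses)
```

The first quantity is the product (8 * 2 parses), whereas the staged regime leaves two parses followed by three independent lexical choices.</Paragraph>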
      <Paragraph position="8"> Under such a processing regime the appropriate sense entry for a verb on a particular subcategorization can be simply and cheaply selected (since the complete complement will be there), and the benefits of the preceding analysis for syntactic processing will be retained.</Paragraph>
    </Section>
  </Section>
</Paper>