<?xml version="1.0" standalone="yes"?> <Paper uid="J92-2001"> <Title>Inheritance in Word Grammar</Title>
<Section position="5" start_page="143" end_page="148" type="metho"> <SectionTitle> 3. Syntax </SectionTitle>
<Paragraph position="0"> 3 The last decade has seen increased interest in dependency grammar among computational linguists. Dependency grammar has been applied in the experimental parsing systems of Hellwig (1986), Sigurd (1989), and Covington (1991); in the 'Kielikone' natural language interface of Jäppinen, Lassila, and Lahtola (1988); in the machine translation systems of EUROTRA (Johnson, King, and des Tombe 1985), DLT (Schubert 1987), Charles University (Sgall and Panevová 1987), and IBM Tokyo (Maruyama 1990); and in the speech recognition system of Giachin and Rullent (1989). Parsers based on the theories of Lexicase (Starosta and Nomura 1986) and Word Grammar (Fraser 1989; Hudson 1989) have also been developed.</Paragraph>
<Paragraph position="1"> ... depend. Here, then, we already have a simple example of default inheritance: (30a) word has 1 head.</Paragraph>
<Paragraph position="2"> (30b) w1 isa word.</Paragraph>
<Paragraph position="3"> (30c) so: w1 has 1 head.</Paragraph>
<Paragraph position="4"> On the other hand, for w2, the finite verb didn't, this general rule is blocked to allow it to occur without a head (i.e. to make the head optional, '[0-1] head'). This analysis assumes that obligatory ('1') and optional ('[0-1]') conflict, so the former must be suppressed by (31d). 4 (31a) finite verb has [0-1] head.</Paragraph>
<Paragraph position="5"> (31b) w2 isa finite verb.</Paragraph>
<Paragraph position="6"> (31c) so: w2 has [0-1] head.</Paragraph>
<Paragraph position="7"> (31d) NOT: finite verb has 1 head.</Paragraph>
<Paragraph position="8"> (31e) so: NOT: w2 has 1 head.</Paragraph>
<Paragraph position="9"> If the rule about having one head per word allows exceptions in one direction, we may expect exceptions in the other direction as well: words that have more than one head. This is not allowed in other versions of dependency grammar, 5 but in WG it is the basis for our analysis of a range of important constructions: raising, control, extraction, and passives (not to mention coordination, which is often allowed as an exception by other theories). For example, in Mary didn't jump, we recognize Mary as the subject not only of didn't but also of jump, so Mary has two heads, contrary to the general rule.</Paragraph>
<Paragraph position="10"> (32) [dependency diagram: Mary is the subject of both didn't and jump, and jump is the xcomplement of didn't] 4 This analysis may in fact be more complicated than it needs to be. We could allow finite verbs to inherit the regular '1 head' simply by not blocking it, and allow for '0 head' by an extra rule, which provides the other alternative.</Paragraph>
<Paragraph position="11"> 5 The notion of a word with two heads is meaningless in theories based on phrase structure, because 'head' is used there in relation to phrases, not words. The X-bar 'head' corresponds to our 'root,' the word in a phrase that has no head inside that phrase. It is true that some linguists have suggested that a phrase might have more than one head (e.g.
Warner 1987), and this has been a standard analysis of coordinate structures since Bloomfield (1933); but this is very different from a single word having more than one head.</Paragraph>
<Paragraph position="12"> (In a dependency diagram, the arrow points towards the dependent.) This is permitted by a proposition which, at least by implication, overrides the general rule, and which refers to the grammatical function 'xcomplement': 6 (33) subject of xcomplement of word = subject of it.</Paragraph>
<Paragraph position="13"> In other words, a word may have two heads provided that one of them is the xcomplement of the other. (We return below to the relations among the grammatical functions such as 'subject' and 'xcomplement'.)</Paragraph>
<Paragraph position="14"> The possibility of having more than one head is related to another important generalization, namely that heads and dependents are usually adjacent. If we think of each word as defining a 'phrase,' made up of that word plus any words subordinate to it, this is equivalent to the PSG ban on discontinuous phrases. In the simple cases, then, the following generalization is true: (34) position of word = adjacent-to head of it.</Paragraph>
<Paragraph position="15"> An operational definition of 'adjacent-to' checks that no word between the words concerned has a head outside the phrase: (35a) A is adjacent-to B iff every word between A and B is a subordinate of B.</Paragraph>
<Paragraph position="16"> (35b) A is a subordinate of B iff A is B or A is a dependent of a subordinate of B.</Paragraph>
<Paragraph position="17"> But what if a word has more than one head? This normally leads to a discontinuity; e.g. in Mary didn't jump, the phrase rooted in jump consists of Mary jump, but does not include didn't. Saying that Mary jump is discontinuous is the same as saying that Mary is not adjacent to one of its heads, jump. Interestingly, though, Mary does have one head to which it is adjacent (viz. didn't), and more generally the same is true of all discontinuities: even if a word has some nonadjacent heads, it also has at least one to which it is adjacent. We can therefore keep our generalization (34) in a slightly revised form, with 'a head' (one head) rather than 'head' (every head): (36) position of word = adjacent-to a head of it.</Paragraph>
<Paragraph position="18"> This generalization is inherited by every word, so every word has to be adjacent to at least one of its heads. This treatment of discontinuity has many important ramifications that cannot be explored fully here.</Paragraph>
<Paragraph position="19"> The generalizations discussed in this section have referred crucially to grammatical functions. 7 In some cases these were the functions 'dependent' and 'head,' but we also mentioned 'subject' and 'xcomplement.' The functional categories are arranged in an inheritance hierarchy, and the one for English is shown (in part) in Figure 2. This hierarchy allows generalizations to be made about different types of dependent at the most appropriate level.</Paragraph>
<Paragraph position="20"> 6 The name 'xcomplement' is borrowed from Lexical Functional Grammar. The term used in earlier WG literature is 'incomplement.' 7 As in LFG, the term 'function' is used here in both its mathematical and grammatical senses, but (unlike LFG) with a single word as the argument; so in expressions such as 'head of X' or 'subject of X,' X is always some word or word type rather than a phrase.</Paragraph>
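The operational definition in (35) and the revised requirement in (36) translate directly into a checking procedure. The following sketch is ours rather than the paper's: a minimal Python rendering in which a sentence is a list of word tokens and 'heads' maps each word to the set of its heads; all names are illustrative.

    def subordinates(word, heads):
        # (35b): A is a subordinate of B iff A is B or A is a dependent
        # of a subordinate of B. Computed as a fixpoint.
        subs = {word}
        changed = True
        while changed:
            changed = False
            for w, hs in heads.items():
                if w not in subs and hs & subs:
                    subs.add(w)
                    changed = True
        return subs

    def adjacent_to(a, b, sentence, heads):
        # (35a): A is adjacent-to B iff every word between A and B
        # is a subordinate of B.
        i, j = sorted((sentence.index(a), sentence.index(b)))
        subs = subordinates(b, heads)
        return all(w in subs for w in sentence[i + 1:j])

    def satisfies_36(word, sentence, heads):
        # (36): every word must be adjacent to at least one of its heads.
        return any(adjacent_to(word, h, sentence, heads) for h in heads[word])

    # Mary didn't jump: Mary depends on both didn't and jump (two heads),
    # jump depends on didn't, and didn't, the root, has no head.
    sentence = ["Mary", "didn't", "jump"]
    heads = {"Mary": {"didn't", "jump"}, "didn't": set(), "jump": {"didn't"}}
    print(adjacent_to("Mary", "jump", sentence, heads))  # False: didn't intervenes
    print(satisfies_36("Mary", sentence, heads))         # True: adjacent to didn't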
<Paragraph position="21"> As with the hierarchy of word classes, we are sure that some of these categories are specific to languages like English, and not universal, but others seem to be very widespread or universal.</Paragraph>
<Paragraph position="22"> Generalizations about word order are perhaps the clearest examples of generalizations that take advantage of the hierarchical organization of grammatical functions in WG. Proposition (37) states the default word order of English (i.e. English is a head-first language).</Paragraph>
<Paragraph position="23"> (37) position of dependent of word = after it.</Paragraph>
<Paragraph position="24"> Although this generalization has important exceptions, it is clearly true of 'typical' dependencies in English; for example, in a running text we find that between 60% and 70% of dependencies are head-first.</Paragraph>
<Paragraph position="25"> The exceptional order of those dependent types that typically precede their heads is handled by the propositions shown in (38), referring to the super-category 'predependent.' (38a) position of predependent of word = before it.</Paragraph>
<Paragraph position="26"> (38b) NOT: position of predependent of word = after it.</Paragraph>
<Paragraph position="27"> The usual machinery of default inheritance applies, so that (38b) blocks the normal head-first rule, and (38a) replaces it by the exceptional one. There are just a few constructions that allow a dependent to precede its head, one of which is the subject-verb pair. 8 8 As one of our readers commented, if pressure toward consistency were the strongest pressure on language development, we should expect VSO languages to outnumber SVO, but of course they do not (about 40% of the world's languages are said to be SVO, compared with only 10% VSO). One explanation for this is presumably the strong tendency for subjects to be more topical than verbs, but it remains as a challenging area for research.</Paragraph>
<Paragraph position="28"> One of the most important applications of default inheritance in WG syntax is in the distinction of 'derived' from 'underlying' or 'basic' patterns. The general point is that underlying patterns are allowed by the most general rules, and are therefore most typical; whereas derived patterns involve rules that override these, so they are exceptional. In this way we can capture the different statuses of these patterns in a completely monostratal analysis and without the use of special devices such as transformations, lexical rules, or metarules.</Paragraph>
<Paragraph position="29"> Take for instance the rules given in the Appendix for inverted subjects.</Paragraph>
<Paragraph position="30"> (39a) tensed polarity-verb has 1 sv-order.</Paragraph>
<Paragraph position="31"> (39b) sv-order of verb = {s+v, v+s}.</Paragraph>
<Paragraph position="32"> (39c) position of dependent of word = after it.</Paragraph>
<Paragraph position="33"> (39d) position of predependent of word = before it.</Paragraph>
<Paragraph position="34"> (39e) NOT: position of predependent of word = after it.</Paragraph>
<Paragraph position="35"> (39f) NOT: position of subject of v+s verb = before it.</Paragraph>
<Paragraph position="36"> (39g) NOT: NOT: position of subject of v+s verb = after it.</Paragraph>
<Paragraph position="37"> The first two rules allow us to distinguish tensed polarity-verbs according to whether their subject precedes ('s+v') or follows ('v+s') them. 9</Paragraph>
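Because the same blocking pattern recurs throughout this section, it may help to see the machinery spelled out. The sketch below is our own simplification, not WG notation (Python; the data structures and names are illustrative). Facts are inherited from every category on the isa chain, and only a stipulated 'NOT' proposition removes the default; mere specificity does not.

    # Word-order propositions (37)-(38) as data, plus a fragment of the
    # function hierarchy of Figure 2.
    ISA = {"subject": "predependent", "visitor": "predependent",
           "object": "postdependent",
           "predependent": "dependent", "postdependent": "dependent"}

    FACTS = {("dependent", "position"): "after",      # (37)
             ("predependent", "position"): "before"}  # (38a)
    NOT = {("predependent", "position", "after")}     # (38b)

    def supercats(cat):
        # The category itself, followed by everything it isa.
        while cat is not None:
            yield cat
            cat = ISA.get(cat)

    def inherit(cat, attr):
        # Collect every value supplied anywhere on the isa chain, then
        # discard those suppressed by a stipulated 'NOT' proposition.
        values = {FACTS[(c, attr)] for c in supercats(cat) if (c, attr) in FACTS}
        blocked = {v for v in values
                   if any((c, attr, v) in NOT for c in supercats(cat))}
        return values - blocked

    print(inherit("object", "position"))   # {'after'}: the default survives
    print(inherit("subject", "position"))  # {'before'}: (38b) blocks 'after'

Note that deleting (38b) from NOT would leave 'subject' with the conflicting set {'before', 'after'}: in this toy model, as in WG, an exception must be stipulated rather than resolved silently by specificity.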
<Paragraph position="38"> Distinguishing these two subtypes allows us to treat 'v+s verb' as an exception to the general rule that subjects precede their head, which is in turn an exception to the generalization that words follow their heads. This system allows us to generate a sentence such as Did Mary jump? with just one syntactic structure, free of empty positions, while still showing 10 that it is a less normal construction than a sentence such as Mary did jump. In parsing terms, the only problem is to find and apply the necessary propositions; there is no need to reconstruct any kind of abstract structure for the sentence itself.</Paragraph>
<Paragraph position="39"> The use of 'NOT' rules for overriding defaults finds support in the fact that the 'NOT' rule in (39e) is also crucial for solving at least two other major problems, namely passives and extraction. In a passive sentence like (40), WG handles object-promotion by analyzing the subject as also being the object. This is achieved by means of proposition (41a) (which is slightly simplified).</Paragraph>
<Paragraph position="40"> (40) Mary was kissed by John.</Paragraph>
<Paragraph position="41"> (41a) subject of passive verb = object of it.</Paragraph>
<Paragraph position="42"> (41b) NOT: position of predependent of word = after it.</Paragraph>
<Paragraph position="43"> The problem is that Mary, as the object of kissed, ought to follow it, but since Mary is also the subject this requirement is overridden by proposition (39e=41b), so Mary never inherits the need to follow kissed. 11 9 See Hudson (1990: 215-6) for a discussion of the apparent lack of morphological consequences of the subject-position feature.</Paragraph>
<Paragraph position="44"> 10 The exceptionality of an inverted subject is shown by the negative proposition 'NOT: NOT: position of subject of v+s verb = after it.' This proposition is inherited by the word tokens concerned, i.e. it is part of the analysis of the sentence itself, and not just available in the grammar. 11 It may not be obvious exactly how this works. How does (41b) stop the object of a passive from following the verb, given that it refers to 'predependent,' which does not subsume 'object'? The answer lies in (41a): the object of a passive verb is also its subject, so any rule (such as (41b)) that applies to the subject also, ipso facto, applies to its object.</Paragraph>
<Paragraph position="45"> A similar approach is used to handle extraction. For example, consider sentence (42). (42) Salesmen I distrust.</Paragraph>
<Paragraph position="46"> Here salesmen must be an object of distrust, so the order should be I distrust salesmen, but in this case we also recognize a special kind of predependent relation ('visitor,' roughly equivalent to 'Comp') between distrust and salesmen, so once again the word order conflict between these two relations can be resolved. The rules are given in (43).
(43a) finite verb has [0-1] visitor.</Paragraph>
<Paragraph position="47"> (43b) visitor isa predependent.</Paragraph>
<Paragraph position="48"> (43c) position of predependent of word = before it.</Paragraph>
<Paragraph position="49"> (43d) NOT: position of predependent of word = after it.</Paragraph>
<Paragraph position="50"> (43e) visitor of word = a postdependent of it.</Paragraph>
<Paragraph position="51"> (43f) visitor of word = a visitor of complement of it.</Paragraph>
<Paragraph position="52"> Proposition (43a) allows distrust to have a visitor, which according to (43b) is a kind of predependent and therefore, by inheritance from (43c,d), must precede it. The visitor is also some kind of postdependent, according to (43e), so it may be the verb's object as in our example. But equally it may 'hop' down the dependency chain thanks to (43f), thereby providing for the analysis of sentences such as (44).</Paragraph>
<Paragraph position="53"> (44) Salesmen I don't think many people say they trust.</Paragraph>
<Paragraph position="54"> Further details of the WG analysis of passives and extraction can be found in Hudson (1990).</Paragraph>
<Paragraph position="55"> Both passivization and extraction are standard examples of syntactic problems that need special machinery. According to WG--and more recently Flickinger, Pollard, and Wasow (1985) and Flickinger (1987)--all that is needed is default inheritance, which is available (though generally not explicitly recognized) in every linguistic theory; so any theory capable of accommodating exceptional morphology already has the power to deal with subject-inversion, passives, and extraction.</Paragraph> </Section>
<Section position="6" start_page="148" end_page="152" type="metho"> <SectionTitle> 4. Semantics </SectionTitle>
<Paragraph position="0"> Default inheritance also plays a crucial part in the WG treatment of semantics. It is probably obvious how it applies in the familiar examples of inheritance in semantic networks, e.g. how one can infer that Clyde has a head from more general propositions about elephants or animals. Rather than discussing this familiar territory we shall show how default inheritance helps us to answer a recurrent objection to dependency analysis: the syntactic dependency structure is completely flat, so it does not provide any units between the individual word and the complete phrase (where 'phrase' means a word and the complete set of all its dependents and their respective phrases).</Paragraph>
<Paragraph position="1"> For example, the semantic structure for the phrase typical French house has to mention the concept 'French house' (a typical French house is a house that is typical as a French house, and not just as a house); but a flat dependency structure such as (45) provides no syntactic unit larger than the individual words (Dahl 1980). (45) typical French house Similarly, how can we handle 'VP-anaphora' without a VP node?
For example, we need the concept 'adore peanuts' as part of the semantic structure of Fred doesn't in (46a), but adores peanuts is not a phrase in the syntactic structure of the first clause: (46a) Mary adores peanuts but Fred doesn't.</Paragraph>
<Paragraph position="2"> (46b) {[Mary adores peanuts] [but Fred doesn't]} Inheritance is relevant to these questions because notions such as 'French-house' 12 and 'adoring-peanuts' can be located in an inheritance hierarchy between the more general notions denoted by their heads ('house' or 'adoring') and the more specific ones denoted by the complete phrase ('typical-French-house,' 'Mary-adoring-peanuts'): (47) [hierarchy: 'typical-French-house' isa 'French-house' isa 'house'; 'Mary-adoring-peanuts' isa 'adoring-peanuts' isa 'adoring'] As usual, each of these concepts inherits all the properties of the concepts above it in the hierarchy except where these are overridden. 13 It is easy to see how a dependency grammar can generate concepts at the top and bottom of these hierarchies. The top concept comes straight from the lexical entry for the root word (e.g. HOUSE, ADORE), and the bottom one belongs to the word token, the word in the sentence concerned (e.g. the word house in the phrase typical French house). The problem is how to generate concepts in between the two, and the WG solution is to recognize the notion head-sense: the head-sense of some word W is the concept that results from combining W with its head. Thus the head-sense of French in our example is the result of combining French with house (as adjunct and head respectively); and that of peanuts in Mary adores peanuts is the result of combining peanuts with adores. This is how we generate semantic structures that contain 'French-house' and 'adoring-peanuts' without recognizing French house or adores peanuts as units in the syntax.</Paragraph>
<Paragraph position="3"> 12 The hyphen in 'French-house' is needed because this is an atomic name whose internal structure is irrelevant. 13 Overriding is found in well-known examples such as fake diamond, not to mention ordinary negative sentences such as Mary didn't jump. A fake diamond is an object that inherits some of the properties of diamonds, especially the visible ones, but not all, and in particular not those that are criterial in the trade. If the sense of Mary jumped (i.e. the kind of situation to which it can refer) is P, then the referent of Mary didn't jump (i.e. the actual situation to which it refers) is NOT: P, in which we know nothing about the situation except that it is not one in which Mary jumped. (The relevant rule is given in the Appendix.)</Paragraph>
<Paragraph position="4"> The rules that allow head-senses include these: (48a) dependent of word has 1 head-sense.</Paragraph>
<Paragraph position="5"> (48b) referent of word = a dependent of head-sense of it.</Paragraph>
<Paragraph position="6"> By the first rule, every dependent of a word has a head-sense, i.e. makes a distinct contribution, in combination with its head, to the sentence's meaning. The notion 'head-sense' is thus a functor that maps the senses of the dependent and the head onto a third concept, applying equally to complements such as peanuts in adores peanuts and to adjuncts such as French in French houses. There is nothing quite like it in standard semantic systems such as Montague Grammar, but it applies in conjunction with rather more standard functors which each pick out one particular (semantic) role as the one filled by the dependent's referent (by rule (48b)).</Paragraph>
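As a concrete gloss on (48), here is a small sketch of ours; the Concept representation and the role names are illustrative assumptions rather than WG machinery, and the adjunct/complement asymmetry it encodes is taken up in the next paragraph.

    from dataclasses import dataclass, field

    @dataclass
    class Concept:
        name: str
        roles: dict = field(default_factory=dict)  # semantic role -> filler

    def head_sense(head, dep_name, role, filler):
        # (48a): each dependent of a word has one head-sense, a new concept
        # combining dependent and head; (48b): the dependent's referent
        # fills one semantic role in that concept.
        return Concept(f"{dep_name}-{head.name}", dict(head.roles, **{role: filler}))

    house = Concept("house")
    adoring = Concept("adoring")

    # An adjunct supplies its own role; a complement gets its role from
    # the head (here 'adoree', as the paper tentatively calls it).
    french_house = head_sense(house, "French", "nationality", Concept("France"))
    adoring_peanuts = head_sense(adoring, "peanuts", "adoree", Concept("peanuts"))

    print(french_house.name)     # French-house
    print(adoring_peanuts.name)  # adoring-peanuts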
<Paragraph position="7"> Generally speaking, the role picked out by rule (48b) is defined by just one of the words concerned, according to whether the dependent is an adjunct or a complement. If it is an adjunct, it defines its own semantic role (e.g. French defines its own semantic role as 'nationality' or 'location'), but if it is a complement then it leaves the head to define its role (e.g. adores provides the role 'adoree,' or whatever it should be called, for its complement).</Paragraph>
<Paragraph position="8"> The relevance to inheritance is that all the different head-senses are held together by inheritance. These are the basic rules: (49a) head-sense of dependent of word isa sense of it.</Paragraph>
<Paragraph position="9"> (49b) referent of word isa head-sense of dependent of it.</Paragraph>
<Paragraph position="10"> That is, the combination of a word and one of its dependents yields a concept that is normally a particular case of the concept defined by the word on its own (a French house is a kind of house; adoring peanuts is a kind of adoring), and whatever a word refers to must be a particular case of all the concepts defined by it in combination with its dependents (e.g., the particular situation referred to by Mary adores peanuts must be one that is both an example of adoring peanuts and of Mary adoring).</Paragraph>
<Paragraph position="11"> We can impose further structure on the semantics; for example, by requiring all other head-senses to build on that of the object: (50) head-sense of dependent of word isa head-sense of object of it.</Paragraph>
<Paragraph position="12"> This is equivalent to an ordered procedure that combines the verb with its object before it takes account of any other dependents. This has the attraction of a Categorial Grammar approach, in which dependents are added one at a time and subtle distinctions of grouping can be made, e.g. among the complements within VP; but it has the advantage of not requiring a similar binary bracketing in the syntax. Why is this an advantage? Some of the drawbacks of binary syntactic structures are obvious, e.g. the need for far more phrasal nodes each carrying much the same information. However, the advantage of our system that we should like to draw attention to here is the possibility of reducing the amount of ambiguity.</Paragraph>
<Paragraph position="13"> For example, in WG the sequence cheap book about linguistics has just one possible syntactic analysis, the usual flat analysis with both cheap and about as dependents of book, but its semantics contains the concepts 'cheap book' and 'book about linguistics' respectively. These interpretations can be seen from examples such as the following, where the sense of ONE is based on the semantic structure of the antecedent phrase. (51a) I wanted a cheap book about linguistics but I could only find one about cricket.</Paragraph>
<Paragraph position="14"> (51b) I wanted a cheap book about linguistics but I could only find a dear one.</Paragraph>
<Paragraph position="15"> In a standard approach, the first clause has to be given two distinct syntactic structures, one containing the unit cheap book and the other book about linguistics; but this ambiguity cannot be resolved until the end of the second clause.
Our judgment is that the first clause is not in fact semantically ambiguous in this way; and according to our approach there is no need to postulate such ambiguity since both concepts 'cheap book' and 'book about linguistics' are available there, in addition to the unifying concept 'cheap book about linguistics' (which could be identified in relation to book as its supersense, the concept that is an instance of the head-sense of every dependent). Here is the relevant part of the structure of both (51a) and (51b): [diagram: 'cheap-book-about-linguistics' isa both 'cheap-book' and 'book-about-linguistics,' each of which isa 'book'] The isa relations in this structure provide the basis for ordinary default inheritance; so if a book has pages, then so do a cheap book, a book about linguistics, and a cheap book about linguistics. They also allow defaults to be overridden in the usual way; this is precisely the function of ordinary adjectives and adverbs like CHEAP or QUICKLY, which mean 'more cheap/quickly than the default,' where the default value is the average for the category concerned. The theoretical apparatus that allows these meanings is precisely the same as the one we apply in morphology and syntax.</Paragraph> </Section>
<Section position="7" start_page="152" end_page="152" type="metho"> <SectionTitle> 5. Conclusion </SectionTitle>
<Paragraph position="0"> We have shown that it is possible, and fruitful, to develop a general theory of default inheritance that is equally relevant, and equally important, for all levels of linguistic analysis. We have demonstrated this in relation to syntax and semantics, and by occasional allusions to morphology, but we assume it is equally true of phonology and of pragmatics (in every sense).</Paragraph>
<Paragraph position="1"> This conclusion, if true, is important for linguists and psychologists, because it shows that the structures found within language are formally similar in at least some respects to those found in general knowledge, and that both kinds of knowledge are processed in ways that are similar in at least some important respects. And our conclusion is also important for computational linguists because it indicates the possibility of a single very general inheritance mechanism that can be applied at every stage in the parsing process, and also in the manipulation of knowledge structures. A second conclusion, however, is that default inheritance should be based on the principle of stipulated overriding (by means of 'NOT:...' propositions), rather than on automatic overriding. This conclusion conflicts directly with standard theory and practice.</Paragraph> </Section> </Paper>