<?xml version="1.0" standalone="yes"?> <Paper uid="J91-2003"> <Title>Semantics of Paragraphs Wlodek Zadrozny *</Title> <Section position="3" start_page="175" end_page="186" type="metho"> <SectionTitle> 3. The Logic of Reference </SectionTitle> <Paragraph position="0"> The goal of this section is to introduce a formalism describing how background knowledge is used in understanding text. The term &quot;logic of reference&quot; denotes a formal description of this process of consulting various sources of information in order to produce an interpretation of a text. The formalism will be presented in a number of steps in which we will elaborate one simple example: Example 1 Entering the port, a ship brought a disease.</Paragraph> <Paragraph position="1"> This sentence can be translated into the logical formula (ignoring only the past tense of &quot;bring&quot; and the progressive of &quot;enter&quot;):</Paragraph> <Paragraph position="2"> ship(s) & port(m) & enter(s, m) & bring(s, d) & disease(d)</Paragraph> <Paragraph position="3"> where s, m, d are constants.</Paragraph> <Paragraph position="4"> We adopt the three-level semantics as a formal tool for the analysis of paragraphs. This semantics was constructed (Zadrozny 1987a, 1987b) as a formal framework for default and commonsense reasoning. It should not come as a surprise that we can now use this apparatus for text/discourse analysis; after all, many natural language inferences are based on defaults, and quite often they can be reduced to choosing the most plausible interpretations of predicates. For instance, relating &quot;they&quot; to &quot;apples&quot; in the sentence (cf. Haugeland 1985 p. 195; Zadrozny 1987a): We bought the boys apples because they were so cheap can be an example of such a most plausible choice.</Paragraph> <Paragraph position="5"> The main ideas of the three-level semantics can be stated as follows: 1. 
Reasoning takes place in a three-level structure consisting of an object level, a referential level, and a metalevel.</Paragraph> <Paragraph position="6"> 2. The object level is used to describe the current situation, and in our case is reserved for the formal representation of paragraph sentences. For the sake of simplicity, the object level will consist of a first order theory. 3. The referential level, denoted by R, consists of theories representing background knowledge, from which information relevant to the understanding of a given piece of text has to be extracted. It constrains interpretations of the predicates of an object theory. Its structure and the extraction methods will be discussed below.</Paragraph> <Paragraph position="7"> 4. Understanding has as its goal construction of an interpretation of the text, i.e. building some kind of model. Since not all logically permissible models are linguistically appropriate, one needs a place, namely the metalevel, to put constraints on types of models. Gricean maxims belong there; Section 6 will be devoted to a presentation of the metalevel rules corresponding to them.</Paragraph> <Paragraph position="8"> We have shown elsewhere (Jensen and Binot 1988; Zadrozny 1987a, 1987b) that natural language programs, such as on-line grammars and dictionaries, can be used as referential levels for commonsense reasoning--for example, to disambiguate PP attachment. This means that information contained in grammars and dictionaries can be used to constrain possible interpretations of the logical predicates of an object-level theory. The referential structures we are going to use are collections of logical theories, but the concept of reference is more general. Some of the intuitions we associate with this notion have been very well expressed by Turner (1987, pp. 7-8): ... Semantics is constrained by our models of ourselves and our worlds. We have models of up and down that are based by the way our bodies actually function. 
Once the word &quot;up&quot; is given its meaning relative to our experience with gravity, it is not free to &quot;slip&quot; into its opposite. &quot;Up&quot; means up and not down .... We have a model that men and women couple to produce offspring who are similar to their parents, and this model is grounded in genetics, and the semantics of kinship metaphor is grounded in this model. Mothers have a different role than fathers in this model, and thus there is a reason why &quot;Death is the father of beauty&quot; fails poetically while &quot;Death is the mother of beauty&quot; succeeds .... It is precisely this &quot;grounding&quot; of logical predicates in other conceptual structures that we would like to capture. We investigate here only the &quot;grounding&quot; in logical theories. However, it is possible to think about constraining linguistic or logical predicates by simulating physical experiences (cf. Woods 1987).</Paragraph> <Paragraph position="9"> We assume here that a translation of the surface forms of sentences into a logical formalism is possible. Its details are not important for our aim of giving a semantic interpretation of paragraphs; the main theses of our theory do not depend on a logical notation. So we will use a very simple formalism, like the one above, resembling the standard first order language. But, obviously, there are other possibilities--for instance, the discourse representation structures (DRS's) of Kamp (1981), which have been used to translate a subset of English into logical formulas, to model text (identified with a list of sentences), to analyze a fragment of English, and to deal with anaphora. The logical notation of Montague (1970) is more sophisticated, and may be considered another possibility. Jackendoff's (1983) formalism is richer and resembles more closely an English grammar. Jackendoff (1983, p. 
14) writes &quot;it would be perverse not to take as a working assumption that language is a relatively efficient and accurate encoding of the information it conveys.&quot; Therefore a formalism of the kind he advocates would probably be most suitable for an implementation of our semantics. It will also be a model for our simplified logical notation (cf. Section 5). We can envision a system that uses data structures produced by a computational grammar to obtain the logical form of sentences.</Paragraph> <Section position="1" start_page="177" end_page="177" type="sub_section"> <SectionTitle> 3.1 Finite Representations, Finite Theories </SectionTitle> <Paragraph position="0"> Unless explicitly stated otherwise, we assume that formulas are expressed in a certain (formal) language L without equality; the extension L(=) of L is going to be used only in Section 5 for dealing with noun phrase references. This means that natural language expressions such as &quot;A is B,&quot; &quot;A is the same as B,&quot; etc. are not directly represented by logical equality; similarly, &quot;not&quot; is often not treated as logical negation; cf. Hintikka (1985).</Paragraph> <Paragraph position="1"> All logical notions that we are going to consider, such as theory or model, will be finitary. For example, a model would typically contain fewer than a hundred elements of different logical sorts. Therefore these notions, and all other constructs we are going to define (axioms, metarules, definitions etc.) are computational, although usually we will not provide explicit algorithms for computing them. The issues of control are not so important for us at this point; we restrict ourselves to describing the logic. This Principle of Finitism is also assumed by Johnson-Laird (1983), Jackendoff (1983), Kamp (1981), and implicitly or explicitly by almost all researchers in computational linguistics. 
As a logical postulate it is not very radical; it is possible within a finitary framework to develop that part of mathematics that is used or has potential applications in natural science, such as mathematical analysis (cf. Mycielski 1981).</Paragraph> <Paragraph position="2"> On the other hand, a possible obstacle to our strategy of using only finite objects is the fact that the deductive closure of any set of formulas is not finite in standard logic, while, clearly, we will have to deduce new facts from formal representations of text and background knowledge. But there are several ways to avoid this obstruction. For example, consider theories consisting of universal formulas without function symbols.</Paragraph> <Paragraph position="3"> Let Th(T) of such a theory T be defined as T plus ground clauses/sets of literals provable from T in standard logic. It is easily seen that it is a closure, i.e. Th(Th(T)) = Th(T); and obviously, it is finite, for finite T. It makes sense then to require that logical consequences of paragraph sentences have similar finite representations. However, in order not to limit the expressive power of the formal language, we should proceed in a slightly different manner. The easiest way to achieve the above requirement is by postulating that all universes of discourse are always finite, and therefore all quantifiers actually range over finite domains. In practice, we would use those two and other tricks: we could forbid more than three quantifier changes, because even in mathematics more than three are rare; we could restrict the size of universes of discourse to some large number such as 1001; we could allow only a fixed finite nesting of function symbols (or operators) in formulas; etc. 
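To make the finitism concrete, the closure operator Th described above can be sketched in a few lines. The following fragment is our illustration only (it does not appear in the paper, and all predicate names in it are invented): facts are ground atoms, rules are implications with ground premises, and modus ponens is the only inference rule, so the closure of a finite theory is finite and idempotent.

```python
# A minimal sketch (not from the paper) of a finite deductive closure
# operator Th for ground theories: facts are atoms, rules are
# (premises, conclusion) pairs, inference is modus ponens only.

def th(facts, rules):
    """Return the set of ground atoms derivable from `facts` via `rules`."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= closed and conclusion not in closed:
                closed.add(conclusion)
                changed = True
    return closed

# Hypothetical mini-theory in the spirit of the paper's ship example.
rules = [(frozenset({"ship(s)"}), "vessel(s)"),
         (frozenset({"vessel(s)", "port(m)"}), "docked_or_sailing(s)")]
closure = th({"ship(s)", "port(m)"}, rules)
# Th is a closure operator: applying it again adds nothing new,
# and for a finite theory the result is finite.
assert th(closure, rules) == closure
```

For theories consisting of universal formulas without function symbols, instantiating the rules over the finite universe first reduces the general case to this ground one.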
The intention of this illustration was to convince the reader that we can now introduce the following set of definitions.</Paragraph> </Section> <Section position="2" start_page="177" end_page="178" type="sub_section"> <SectionTitle> Definitions </SectionTitle> <Paragraph position="0"> * A theory is a finite set of sentences from Sent (the formulas without free variables in some formal language).</Paragraph> <Paragraph position="1"> * A deductive closure operator is a function Th : P(Sent) → P(Sent) such that (a) T ⊆ Th(T), for any T, (b) Th(Th(T)) = Th(T), (c) Th(T) is finite, for finite T; additionally, we require it to be ground, for ground T.</Paragraph> <Paragraph position="2"> * A theory T is consistent if there is no formula φ such that both φ and ¬φ belong to Th(T) (and inconsistent otherwise).</Paragraph> <Paragraph position="3"> * A model of a theory T is defined, as usual, as an interpretation defined on a certain domain, which satisfies all formulas of T. The collection of all (finite) models of a theory T will be denoted by Mods(T). * The set of all subformulas of a collection of formulas F is denoted by Form(F). φ is a ground instance of a formula G if φ contains no variables, and φ = Gθ, for some substitution θ.</Paragraph> <Paragraph position="4"> Thus, we do not require Th(T) to be closed under substitution instances of tautologies. Although in this paper we take modus ponens as the main rule of inference, in general one can consider deductive closures with respect to weaker, nonstandard logics (cf. Levesque 1984; Frisch 1987; Patel-Schneider 1985). But we won't pursue this topic further here.</Paragraph> </Section> <Section position="3" start_page="178" end_page="180" type="sub_section"> <SectionTitle> 3.2 The Structure of Background Knowledge </SectionTitle> <Paragraph position="0"> Background knowledge is not a simple list of meaning postulates--it has a structure and it may contain contradictions and ambiguities. 
These actualities have to be taken into account in any realistic model of natural language understanding. For instance, the verb &quot;enter&quot; is polysemous. But, unless context specifies otherwise, &quot;to come in&quot; is a more plausible meaning than &quot;to join a group.&quot; Assuming some logical representation of this knowledge, we can write that</Paragraph> <Paragraph position="2"> (e1) enter(x, y) → Te1 /* to come in */ (e2) enter(x, y) → Te2 /* to join a group */ and e2 <enter e1.</Paragraph> <Paragraph position="3"> Two things should be explained now about this notation: Meanings of predicates/words are represented on the right-hand sides of the arrows as collections of formulas--i.e., theories. The main idea is that these mini-theories of predicates appearing in a paragraph will jointly provide enough constraints to exclude implausible interpretations. (One can think of meanings as regions in space, and of constraints as sets of linear inequalities approximating these regions). How this can be done, we will show in a moment.</Paragraph> <Paragraph position="4"> These theories are partially ordered; and their partial orders are written as <enter(x,y), or <enter, or <i, or simply <, depending on context. This is our way of making formal the asymmetry of plausibility of different meanings of a predicate. Again, a way of exploiting it will be shown below.</Paragraph> <Paragraph position="5"> Definition A referential level R is a structure R = {(φ, <φ) : φ ∈ Formulae} where--for each φ-- <φ is a partially ordered (by a relation of plausibility) collection of implications φ → Tφ.</Paragraph> <Paragraph position="6"> The term φ → T stands for the theory {φ → τ : τ ∈ T}, and φ → {φ1, φ2, ...} abbreviates {φ → φ1, φ → φ2, ...}. 
It is convenient to assume also that all formulas, except &quot;laws&quot;--which are supposed to be always true--have the least preferred empty interpretation ∅.</Paragraph> <Paragraph position="7"> We suppose also that interpretations are additionally ranked according to the canonical partial ordering on subformulas. The ranking provides a natural method of dealing with exceptions, as in the case of finding an interpretation of α & β with R containing (α → γ), (α & β → ¬γ), where ¬γ would be preferred to γ if both are consistent, and both defaults are equally preferred. This means that preference is given to more specific information. For instance, the sentence The officer went out and struck the flag will get the reading &quot;lowered the flag,&quot; if the appropriate theory of strike(x, y) & flag(y) is part of background knowledge; if not, it will be understood as &quot;hit the flag.&quot; The referential level (R, <) may contain the theories listed below. Since we view a dictionary as an (imperfect) embodiment of a referential level, we have derived the formulas in every theory Tφ from a dictionary definition of the term φ. We believe that even such a crude model can be useful in practice, but a refinement of this model will be needed to have a sophisticated theory of a working natural language understanding system.</Paragraph> <Paragraph position="9"> [The theories (e1), (e2), (sh1), (sh2), (p1), (b1), (b2), (d1), and (dr1), with their dictionary-style glosses--e.g., for &quot;disease&quot;: an illness caused by an infection--are listed here.] Note: We leave undefined the semantics of adverbs such as typically in (e2). This adverb appears in the formula as an operator; our purpose in choosing this representation is to call the reader's attention to the fact that for any real applications the theories will have to be more complex and very likely written in a higher order language (cf. Section 4).</Paragraph> <Paragraph position="10"> The theories, which we describe here only partially, restricting ourselves to their relevant parts, represent the meanings of concepts. 
We assume as before that (e1) is more plausible than (e2), i.e. e2 <enter e1; similarly for (sh1), (sh2) and <ship, etc. This particular ordering of theories is based on the ordering of meanings of the corresponding words in dictionaries (derived and less frequent meanings have lower priority). But one can imagine deriving such orderings by other means, such as statistics. The partial order <enter has the theories {e1, e2, ∅} as its domain; ∅ is the least preferred empty interpretation corresponding to our lack of knowledge about the predicate; it is used when both (e1) and (e2) are inconsistent with a current object theory. The domain is ordered by the relation of preference ∅ <enter e2 <enter e1. The theory (e1) will always be used in constructing theories and models of paragraphs in which the expression &quot;enter&quot; (in any grammatical form) appears, unless assuming it would lead to an inconsistency. In such a case, the meaning of &quot;to enter&quot; would change to (e2), or some other theory belonging to R.</Paragraph> <Paragraph position="11"> We would like to stress three points now: (1) the above implications are based on the definitions that actually occur in dictionaries; (2) the ordering can actually be found in some dictionaries--it is not our own arbitrary invention; (3) it is natural to treat a dictionary definition as a theory, since it expresses &quot;the analysis of a set of facts in their relation to one another,&quot; different definitions corresponding to possible different analyses. (Encyclopedia articles are even more theory-like.) In this sense, the notion of a referential level is a formalization of a real phenomenon.</Paragraph> <Paragraph position="12"> Obviously, dictionaries or encyclopedias do not include all knowledge an agent must have to function in the world, or that a program should possess in order to understand any kind of discourse. 
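The role of such an ordering can be sketched very simply. The following fragment is our illustration only (the toy consistency test and all formula strings are invented, not the paper's): candidate mini-theories of a predicate are listed from most to least plausible, ending in the empty interpretation, and the most preferred theory consistent with the object theory is selected.

```python
# Sketch (ours, not from the paper) of selecting the most plausible
# consistent mini-theory for a predicate, given the preference ordering
# "empty" < e2 < e1 discussed in the text.

def consistent(formulas):
    """Toy consistency test: no atom occurs together with its negation."""
    return not any(("not " + f) in formulas for f in formulas)

def choose(object_theory, ordered_theories):
    """Return the most preferred candidate theory consistent with the text."""
    for theory in ordered_theories:
        if consistent(object_theory | theory):
            return theory
    return frozenset()              # least preferred empty interpretation

enter = [frozenset({"come_in(s, m)"}),     # (e1) "to come in"
         frozenset({"join_group(s, m)"}),  # (e2) "to join a group"
         frozenset()]                      # empty interpretation
S = {"enter(s, m)", "ship(s)", "port(m)"}
# (e1) is consistent with S, so it is chosen; (e2) is never even tried.
assert choose(S, enter) == enter[0]
# Only if (e1) is contradicted does the meaning fall back to (e2).
assert choose({"not come_in(s, m)"} | S, enter) == enter[1]
```

A real system would of course need a genuine theorem prover in place of the toy consistency check, but the control structure is the same: fall down the ordering only when forced by inconsistency.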
Although the exact boundary between world knowledge and lexical knowledge is impossible to draw, we do know that lexicons usually contain very little information about human behavior or temporal relations among objects of the world. Despite all these differences, we may assume that world knowledge and lexical knowledge (its proper subset) have a similar formal structure. And in the examples that we present, it is the structure that matters.</Paragraph> </Section> <Section position="4" start_page="180" end_page="184" type="sub_section"> <SectionTitle> 3.3 How to Use Background Knowledge </SectionTitle> <Paragraph position="0"> The next few pages will be devoted to an analysis of the interaction of background knowledge with a logical representation of a text. We will describe two modes of such an interaction; both seem to be present in our understanding of language. One exploits differences in plausibility of the meanings of words and phrases, in the absence of context (e.g., the difference between a central and a peripheral sense, or between a frequent and a rare meaning). The other one takes advantage of connections between those meanings. We do not claim that this is the only possible such analysis; rather, we present a formal model which can perhaps be eventually disproved and replaced by a better one. As far as we know, this is the first such formal proposal.</Paragraph> <Paragraph position="1"> 3.3.1 Dominance. In Figure 1 the theories of &quot;enter,&quot; &quot;ship,&quot; etc. and the partial orders are represented graphically; more plausible theories are positioned higher. A path through this graph chooses an interpretation of the sentence S. 
For instance, the path fint = {e1, sh1, p1, b1, d1} and S say together that A large boat (ship) that carries people or goods came into the harbor and carried a disease (illness).</Paragraph> <Paragraph position="2"> Since it is the &quot;highest&quot; path, fint is the most plausible (relative to R) interpretation of the words that appear in the sentence. Because it is also consistent, it will be chosen as a best interpretation of S (cf. Zadrozny 1987a, 1987b). Another theory, consisting of f' = {e1, sh2, p1, b2, d1} and S, saying that A space vehicle came into the harbor and caused a disease/illness is less plausible according to that ordering. As it turns out, f' is never constructed in the process of building an interpretation of a paragraph containing the sentence S, unless assuming fint would lead to a contradiction, for instance within the higher level context of a science fiction story.</Paragraph> <Paragraph position="3"> The collection of these most plausible consistent interpretations of a given theory T is denoted by PT<(T). Then fint belongs to PT<(Th({S})), but this is not true for f'. Note: One should remember that, in general, because all our orderings are partial, there can be more than one most plausible interpretation of a sentence or a paragraph, and more than one &quot;next best&quot; interpretation. Moreover, to try to impose a total order on all the paths (i.e. the cartesian product defined in Section 3.3.2) would be a mistake; it would mean that ambiguities, represented in our formalism by existence of more than one (&quot;best&quot;) interpretation of a text, are outlawed.</Paragraph> <Paragraph position="4"> 3.3.2 Cartesian Products. The formal construction is presented below. Any path through the graph of Figure 1 is an element of the cartesian product Πφ∈Subformulas(S) <φ of the partial orderings.</Paragraph> <Paragraph position="5"> Figure 2 explains the geometric intuitions we associate with the product and the ordering. 
The product itself is given by the following definition: Definition Let F be a collection of formulas φe, e ∈ m, for some natural number m; and let, for each e, <e be a partial ordering on the collection {φe → Te1, ..., φe → Tene} of theories of φe. Define: Π(F) = {f : (∀e ∈ m)(∃l ≤ ne) f(e) = (φe → Tel)} We denote by < the partial order induced on Π(F) by the orderings <e and the canonical ordering of subformulas (a formula is &quot;greater&quot; than its subformulas). The geometrical meaning of this ordering can be expressed as &quot;higher paths are more important provided they pass through the most specific nodes.&quot; [Figure 1 here.] Figure 1 The partial ordering of theories of the referential level R and the ordering of interpretations. Since (sh1) and (b1) dominate (respectively) (sh2) and (b2), the path fint represents a more plausible interpretation than f'.</Paragraph> <Paragraph position="6"> [Figure 2 here: the orderings <1 (&quot;flag&quot;: &quot;cloth,&quot; ...), <2 (&quot;strike&quot;: &quot;hit/anger,&quot; ...), and <3 (&quot;strike & flag&quot;: &quot;lower&quot;), with the paths (1)-(4).]</Paragraph> <Paragraph position="8"> The cartesian product Π <i = <1 × <2 × <3 can be depicted as a collection of all paths through the graphs representing the partial orderings; a path chooses only one element from each ordering--thus (1) and (2) are &quot;legal&quot; paths, while (4) is not. Also, more plausible theories appear higher: &quot;cloth&quot; > &quot;music&quot; > ∅. More specific paths are preferred: Assuming that all higher paths, like path (1), are excluded by inconsistency, path (2) is the most plausible interpretation of α & β, and it is preferred to (3). 
(More explanations in the text.)</Paragraph> <Paragraph position="9"> To make Figure 2 more intuitive we assigned some meanings to the partial orders.</Paragraph> <Paragraph position="10"> Thus, <1 represents some possible meanings of &quot;flag,&quot; shown with the help of the &quot;key words,&quot; the meaning &quot;piece of cloth&quot; preferred to &quot;deer's tail.&quot; The word &quot;strike&quot; has dozens of meanings, and we can imagine the meaning of the transitive verb being represented by <2, with &quot;hit in anger&quot; at the top, then &quot;hit, e.g. a ball&quot; and &quot;discover&quot; equally preferred, and then all other possible meanings. The trivial <3 representing &quot;strike a flag&quot; should remind us that we already know all that from Section 3.2. Notice that path (2) does not give us the correct interpretation of &quot;strike the flag,&quot; which is created from &quot;cloth&quot; and &quot;lower.&quot; Each element of the cartesian product Π <i represents a set of possible meanings.</Paragraph> <Paragraph position="11"> These meanings can be combined in various ways, the simplest of which consists of taking their union as we did in 3.3.1. But a paragraph isn't just a sum of its sentences, as a sentence isn't simply a concatenation of its phrases. The cohesion devices--such as &quot;but,&quot; &quot;unless,&quot; &quot;since&quot;--arrange sentences together, and they also have semantic functions. This is reflected, for instance, in the way various pieces of background knowledge are pasted together. Fortunately, at this point we can abstract from this by introducing an operator variable ⊕ whose meaning will be, as a default, that of a set theoretic union, ∪; but, as we describe it in Section 6.2, it can sometimes be changed to a more sophisticated join operator. 
There, when considering the semantics of &quot;but,&quot; we'll see that referential level theories can be combined in slightly more complicated ways. In other words, a partial theory corresponding to a paragraph cannot be just a sum of the theories of its sentences--the arrangement of those theories should obey the metalevel composition rules, which give the semantics of connectives. However, from a purely formal point of view, ⊕ can be any function producing a theory from a collection of theories.</Paragraph> <Paragraph position="12"> The cartesian product represents all possible amalgamations of these elementary theories. In other words, this product is the space of possible combinations of meanings, some of which will be inconsistent with the object level theory T. We can immediately exclude the inconsistent combinations, eliminating at least some nonsense: Π̂(F) = {f ∈ Π(F) : ⊕f is consistent with T} It remains now to fill in the details of the construction of PT<. We assume that a text P can be translated into a (ground) theory P̂ (a set of logical sentences); T = Th(P̂) is the set of logical consequences of P̂. We denote by F the set Form(Th(P̂))--the set of all subformulas of Th(P̂), about which we shall seek information at the referential level R. If F = {φ1(c̄1), ..., φn(c̄n)} (c̄i is a collection of constants that are arguments of φi) is this theory, we have to describe a method of augmenting it with the background knowledge. We can assume without loss of generality that each φi(c̄i) in F has, in R, a corresponding partial order <i of theories of φi(x̄i). We now substitute the constants c̄i for the variables x̄i inside the theories of <i. With a slight abuse of notation, we will use the same symbol <i for the new ordering. The product spaces Π(F) and Π̂(F) can then be defined as before, with the new orderings in place of the ones with variables. Notice that if only some of the variables of φi(x̄i) were bound by c̄i, the same construction would work. 
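The whole pipeline--form the product of the orderings, discard the combinations inconsistent with the object theory, and keep the maximal surviving paths--can be sketched as follows. This is our illustration only (the toy consistency test, the combination operator as plain union, and all formula strings are invented, not the paper's notation):

```python
# Sketch (ours, not the paper's) of the product space of interpretations:
# a path picks one candidate mini-theory per predicate; inconsistent
# combinations are dropped and the maximal consistent paths are kept.
from itertools import product

def consistent(formulas):
    """Toy consistency test: no atom occurs together with its negation."""
    return not any(("not " + f) in formulas for f in formulas)

def pt(object_theory, orderings):
    """orderings: {predicate: [theories, most plausible first]}."""
    preds = sorted(orderings)
    paths = [dict(zip(preds, combo))
             for combo in product(*(orderings[p] for p in preds))]
    # keep only the paths whose union with the text is consistent
    ok = [f for f in paths
          if consistent(object_theory | set().union(*f.values()))]
    # rank of a path = tuple of positions in each ordering (0 is best);
    # keep the paths not dominated coordinatewise by another consistent path
    def rank(f):
        return tuple(orderings[p].index(f[p]) for p in preds)
    return [f for f in ok
            if not any(rank(g) != rank(f)
                       and all(a <= b for a, b in zip(rank(g), rank(f)))
                       for g in ok)]

orderings = {
    "enter": [frozenset({"come_in(s, m)"}), frozenset({"join(s, m)"})],
    "bring": [frozenset({"carry(s, d)"}), frozenset({"cause(s, d)"})],
}
T = {"ship(s)", "not carry(s, d)"}   # suppose context rules out "carry"
best = pt(T, orderings)
assert len(best) == 1
assert best[0]["bring"] == frozenset({"cause(s, d)"})
```

Because only non-dominated paths are returned, the result can contain several incomparable interpretations, which is exactly how ambiguity shows up in the formalism.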
We have arrived then at a general method of linking object level formulas with their theories from R.</Paragraph> <Paragraph position="13"> Now we can define PT<(T) of the theory T as the set of most likely consistent theories of T given by (Π(F), <), where F = Form(T): PT<(T) = {T ∪ T' : T' = ⊕f and f is a maximal element of (Π̂(F), <)} Notice that PT<(T) can contain more than one theory, meaning that T is ambiguous.</Paragraph> <Paragraph position="14"> This is a consequence of the fact that the cartesian product is only partially ordered by <. The main reason for using ground instances φi(c̄i) in modifying the orderings is the need to deal with multiple occurrences of the same predicate, as in John went to the bank by the bank.</Paragraph> <Paragraph position="15"> The above construction is also very close in spirit to Poole's (1988) method for default reasoning, where object theories are augmented by ground instances of defaults.</Paragraph> <Paragraph position="16"> 3.3.3 Coherence Links. The reasoning that led to the intended interpretation fint in our discussion of dominance was based on the partial ordering of the theories of R. We want to exploit now another property of the theories of R--their coherence. Finding an interpretation for a natural language text or sentence typically involves an appeal to coherence. Consider S2: Entering the port, a ship brought a disaster.</Paragraph> <Paragraph position="17"> Using the coherence link between (b2) and (dr1) (cf. Section 3.2)--the presence of cause(*, *) in the theories of &quot;bring&quot; and &quot;disaster&quot;--we can find a partial coherent interpretation T ∈ PTc(Th({S2})) of S2. In this interpretation, theories explaining the meanings of terms are chosen on the basis of shared terms. This makes (b2) (&quot;to bring&quot; means &quot;to cause&quot;) plausible and therefore it would be included in T. 
The formalization of all this is given below:</Paragraph> </Section> <Section position="5" start_page="184" end_page="185" type="sub_section"> <SectionTitle> Definitions </SectionTitle> <Paragraph position="0"> * The set of all theories about the formulas of T is defined as: G(T) = {t : t is a theory in R of some φ ∈ Form(T)}. Here, we ignore the ordering, because we are interested only in connections between concepts (represented by words).</Paragraph> <Paragraph position="1"> * If t, t' ∈ G(T), t ≠ t', share a predicate, we say that there is a c-link between t and t'. A c-path is defined as a chain of c-links; i.e. a sequence t1, ..., tk of elements of G(T) in which each ti is c-linked with ti+1, and no two elements are theories of the same predicate.</Paragraph> <Paragraph position="3"> Under this condition, for any predicate, only one of its theories will belong to a c-path. A c-path therefore chooses only one meaning for each term.</Paragraph> <Paragraph position="4"> * C(T) will denote the set of all c-paths in G(T) consistent with T, i.e. for each p ∈ C(T), ⊕p ∪ T is consistent.</Paragraph> <Paragraph position="5"> This construction is like the one we have encountered when defining Π̂(T). The details should be filled out exactly as before; we leave this to the reader.</Paragraph> <Paragraph position="6"> * We define PTc(T) of a theory T as the set of most coherent consistent theories of T given by C(T): PTc(T) = {T ∪ ⊕p : p is a maximal c-path in C(T)}</Paragraph> <Paragraph position="8"> Going back to S2, PTc(Th(S2)) contains also the interpretation based on the coherence link between &quot;ship&quot; and &quot;bring,&quot; which involves &quot;carry.&quot; Based on the just-described coherence relations, we conclude that sentence S2 is ambiguous; it has two interpretations, based on the two senses of &quot;bring.&quot; Resolution of ambiguities involves factors beyond the scope of this section--for instance, Gricean maxims and topic (Section 6), or various notions of context (cf. Small et al. 1988). 
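The c-link test itself is just shared predicate symbols, and can be sketched directly. The fragment below is our illustration only (the formula strings and the regular-expression extraction of predicate names are invented for the example):

```python
# Sketch (ours, not from the paper) of coherence links: two mini-theories
# are c-linked when they share a predicate symbol.
import re

def predicates(theory):
    """Extract predicate symbols from formula strings like 'cause(x, y)'."""
    return {name for formula in theory
            for name in re.findall(r"(\w+)\(", formula)}

def c_linked(t1, t2):
    return t1 != t2 and bool(predicates(t1) & predicates(t2))

bring1   = frozenset({"carry(x, y)"})           # (b1): to bring = to carry
bring2   = frozenset({"cause(x, y)"})           # (b2): to bring = to cause
disaster = frozenset({"cause(z, harm(z))"})     # (dr1): a disaster causes harm
ship1    = frozenset({"boat(x)", "carry(x, y)"})  # (sh1): a boat that carries

assert c_linked(bring2, disaster)     # shared predicate: cause
assert c_linked(ship1, bring1)        # shared predicate: carry
assert not c_linked(ship1, disaster)  # no shared predicate
```

The two assertions on `bring` mirror the two coherent readings of S2: one c-path goes through cause ("bring" as "cause"), the other through carry ("bring" as "carry"), which is why the sentence comes out ambiguous.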
We will continue the topic of the interaction of object level theories with background knowledge by showing how the two methods of using background knowledge can be combined.</Paragraph> <Paragraph position="9"> 3.3.4 Partial Theories. A finer interpretation of an object level theory T--its partial theory--is obtained by the iteration: PT(T) = PT<(PTc(T))</Paragraph> <Paragraph position="11"> Notice that coherence does not decide between (e1) and (e2) given the above R, but the iteration produces two theories of S2, both of which assert that the meaning of &quot;ship entered&quot; is &quot;ship came.&quot; A ship/boat came into the harbor/port and caused/brought a disaster.</Paragraph> <Paragraph position="12"> A ship/boat came into the harbor/port and carried/brought a disaster.</Paragraph> <Paragraph position="13"> PT({S1}) contains only one interpretation, based on fint: A ship/boat came into the harbor/port and carried/brought a disease.</Paragraph> <Paragraph position="14"> Partial theories will be the main syntactic constructs in the subsequent sections. In particular, the p-models will be defined as some special models of partial theories of paragraphs.</Paragraph> </Section> <Section position="6" start_page="185" end_page="186" type="sub_section"> <SectionTitle> 3.4 Summary and Discussion </SectionTitle> <Paragraph position="0"> We have shown that finding an interpretation of a sentence depends on two graph-theoretical properties--coherence and dominance. Coherence is a purely &quot;associative&quot; property; we are interested only in the existence of links between represented concepts/theories. Dominance uses the directionality of the partial orders.</Paragraph> <Paragraph position="1"> A partial theory PT(T) of an object theory T corresponding to a paragraph is obtained by joining most plausible theories of sentences, collocations, and words of the paragraph. 
However, this simple picture must be slightly retouched to account for semantic roles of inter- and intra-sentential connectives such as &quot;but,&quot; and to assure consistency of the partial theory. These modifications have complicated the definitions a little bit. The above definitions capture the fact that even if, in principle, any consistent combination of the mini-theories about predicates can be extended to an interpretation, we are really interested only in the most plausible ones. The theory PT(T) is called &quot;partial&quot; because it does not contain all knowledge about predicates--less plausible properties are excluded from consideration, although they are accessible should an inconsistency appear. Moreover, the partiality is related to the unutilized possibility of iterating the operator PT (cf. Section 4).</Paragraph> <Paragraph position="2"> How can we now summarize what we have learned about the three logical levels? To begin with, one should notice that they are syntactically distinct. If object level theories are expressed by collections of first order formulas, metalevel definitions--e.g., to express as a default that (r) is a set theoretical union--require another language, such as higher order logic or set theory, where one can define predicates dealing with models, consistency, and provability. Even if all background knowledge were described, as in our examples, by sets of first order theories, because of the preferences and inconsistencies of meanings, we could not treat R as a flat database of facts--such a model simply would not be realistic.
Rather, R must be treated as a separate logical level for these syntactic reasons, and because of its function--being a pool of possibly conflicting semantic constraints.</Paragraph> <Paragraph position="3"> The last point may be seen better if we look at some differences between our system and KRYPTON, which also distinguishes between an object theory and background knowledge (cf. Brachman et al. 1985). KRYPTON's A-box, encoding the object theory as a set of assertions, uses standard first order logic; the T-box contains information expressed in a frame-based language equivalent to a fragment of FOL. However, the distinction between the two parts is purely functional--that is, characterized in terms of the system's behavior. From the logical point of view, the knowledge base is the union of the two boxes, i.e. a theory, and the entailment is standard. In our system, we also distinguish between the &quot;definitional&quot; and factual information, but the &quot;definitional&quot; part contains collections of mutually excluding theories, not just of formulas describing a semantic network. Moreover, in addition to proposing this structure of R, we have described the two mechanisms for exploiting it, &quot;coherence&quot; and &quot;dominance,&quot; which are not variants of the standard first order entailment, but abduction.</Paragraph> <Paragraph position="4"> The idea of using preferences among theories is new, hence it was described in more detail. 
&quot;Coherence,&quot; as outlined above, can be understood as a declarative (or static) version of marker passing (Hirst 1987; Charniak 1983), with one difference: the activation spreads to theories that share a predicate, not through the IS-A hierarchy, and is limited to elementary facts about predicates appearing in the text.</Paragraph> <Paragraph position="5"> The metalevel rules we are going to discuss in Section 6, and that deal with the Gricean maxims and the meaning of &quot;but,&quot; can be easily expressed in the languages of set theory or higher order logic, but not everything expressible in those languages makes sense in natural language. Hence, putting limitations on the expressive power of the language of the metalevel will remain as one of many open problems.</Paragraph> </Section> </Section> <Section position="4" start_page="186" end_page="197" type="metho"> <SectionTitle> 4. Coherence of Paragraphs </SectionTitle> <Paragraph position="0"> We are now in a position to use the notion of the referential level in a formal definition of coherence and topic. Having done that, we will turn our attention to the resolution of anaphora, linking it with the provability relation (abduction) t-R+M and a metarule postulating that a most plausible model of a paragraph is one in which anaphors have references. Since the example paragraph we analyze has only one connective (&quot;and&quot;), we can postpone a discussion of connectives until Section 6.</Paragraph> <Paragraph position="1"> Building an interpretation of a paragraph does not mean finding all of its possible meanings; the implausible ones should not be computed at all. This viewpoint has been reflected in the definition of a partial theory as a most plausible interpretation of a sequence of predicates. Now we want to restrict the notion of a partial theory by introducing the formal notions of topic and coherence. 
We can then later (Section 5.2) define p-models--a category of models corresponding to paragraphs--as models of coherent theories that satisfy all metalevel conditions.</Paragraph> <Paragraph position="2"> The partial theories pick up from the referential level the most obvious or the most important information about a formula. This immediate information may be insufficient to decide the truth of certain predicates. It would seem therefore that the iteration of the PT operation to form a closure is needed (cf. Zadrozny 1987b).</Paragraph> <Paragraph position="3"> However, there are at least three arguments against iterating PT. First of all, iteration would increase the complexity of building a model of a paragraph; infinite iteration would almost certainly make such a construction in real time impossible. Secondly, the cooperative principle of Grice (1975, 1978), under the assumption that referential levels of a writer and a reader are quite similar, implies that the writer should structure the text in a way that makes the construction of his intended model easy for the reader; and this seems to imply that he should appeal only to the most direct knowledge of the reader. Finally, it has been shown by Graesser (1981) that the ratio of derived to explicit information necessary for understanding a piece of text is about 8:1; furthermore, our reading of the analysis of five paragraphs by Crothers (1979) strongly suggests that only the most direct or obvious inferences are being made in the process of building a model or constructing a theory of a paragraph. Thus, for example, we can expect that in the worst case only one or two steps of such an iteration would be needed to find answers to wh-questions.</Paragraph> <Paragraph position="4"> Let P be a paragraph, and let β = (S1,..., Sn) be its translation into a sequence of logical formulas.
The set of all predicates appearing in X will be denoted by Pred(X).</Paragraph> <Paragraph position="5"> Definition Let T be a partial theory of a paragraph P. A sequence of predicates appearing in β, denoted by Tp, is called a topic of the paragraph P, if it is a longest sequence satisfying the conditions (1) and (2) below: 1. For all &quot;sentences&quot; Si, (a) or (b) or (c) holds: (a) Direct reference to the topic: Tp ⊆ Pred(Si) (b) Indirect reference to the topic: If φ ∈ Pred(Si) & (φ → Tφ) ∈ T, then Tp ⊆ Pred(Tφ) (c) Direct reference to a previous sentence: If φ ∈ Pred(Si) & (φ → Tφ) ∈ T, then Pred(Si−1) ∩ Pred(φ → Tφ) ≠ ∅ 2. Either (i) or (ii) is satisfied: (i) Existence of a topic sentence: Tp ⊆ Pred(Si), for some sentence Si; (ii) Existence of a topic theory: a theory of Tp belongs to R, i.e. if θ is the conjunction of predicates of Tp then θ → Tθ ∈ R, for some Tθ.</Paragraph> <Paragraph position="6"> The last two conditions say that either the discussed concept (topic) already exists in the background knowledge or it must be introduced in a sentence. For instance, we can see that the sentence The effect of the Black Death was appalling can be assumed to be a topic sentence.</Paragraph> <Paragraph position="7"> The first three conditions make the requirements for a collection of sentences to have a topic. Either every sentence talks about the topic (as, for instance, the first two sentences of the paragraph about the Black Death), or a sentence refers to the topic through background knowledge--the topic appears in a theory about an entity or a relation of the sentence (in the case of Within twenty-four hours of infection..., &quot;infection&quot; can be linked to &quot;disease&quot;--cf. Sections 2 and 4.2), or else a sentence elaborates a fragment of the previous sentence (the theme The effect of... being developed in In less than...
).</Paragraph> <Paragraph position="8"> The definition allows a paragraph to have more than one topic. For instance, a paragraph consisting of John thinks Mary is pretty. John thinks Mary is intelligent. John wants to marry her.</Paragraph> <Paragraph position="9"> can be either about {John(j), Mary(m), think(j, m, pretty(m))}, or about John, Mary, and marrying. (Notice that the condition 2 (i) forbids us from merging the two topics into a larger one). Thus paragraphs can be ambiguous about what constitutes their topics. The point is that they should have one.</Paragraph> <Paragraph position="10"> It is also clear that what constitutes a topic depends on the way the content of paragraph sentences is represented. In the last case, if &quot;pretty&quot; were translated into a predicate, and not into a modifier of m (i.e. an operator), &quot;John thinking about Mary&quot; could not be a topic, for it wouldn't be the longest sequence of predicates satisfying the conditions (1) and (2).</Paragraph> <Paragraph position="11"> We'd like to put forward a hypothesis that this relationship between topics and representations can actually be useful: Because the requirement that a well-formed paragraph should have a topic is a very natural one (and we can judge pretty well what can be a topic and what can't), we can obtain a new method for judging semantic representations. Thus, if the naive first order representation containing pretty(m) as one of the formulas gives a wrong answer as to what is the topic of the above, or another, paragraph, we can reject it in favor of a (higher order) representation in which adjectives and adverbs are operators, not predicates, and which provides us with an intuitively correct topic.
Such a method can be used in addition to the standard criteria for judging representations, such as elegance and ability to express semantic generalizations.</Paragraph> <Paragraph position="12"> Definition A partial theory T ∈ PT(P) of the paragraph P is coherent iff the paragraph P has a topic.</Paragraph> <Paragraph position="13"> A random permutation of just any sentences about a disease wouldn't be coherent. But it would be premature to jump to the conclusion that we need more than just existence of a topic as a condition for coherence. Although it may be the case that it will be necessary in the future to introduce notions like &quot;temporal coherence,&quot; &quot;deictic coherence,&quot; or &quot;causal coherence,&quot; there is no need to start multiplying beings now. We can surmise that the random permutations we talk about would produce an inconsistent theory; hence, the temporal, causal, and other aspects would be dealt with by consistency. But of course at this point it is just a hypothesis.</Paragraph> <Paragraph position="14"> An important aspect of the definition is that coherence has been defined as a property of representation--in our case, it is a property of a formal theory. The existence of the topic, the direct or indirect allusion to it, and anaphora (which will be addressed below) take up the issue of formal criteria for a paragraph definition, which was raised by Bond and Hayes (1983) (cf. also Section 2.1).
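The topic-existence test underlying this definition of coherence can be sketched as follows. This is an illustrative reconstruction, not the authors' algorithm: sentences are predicate sets, the background R maps a predicate to the predicates of its mini-theory, clauses (b) and (c) of the definition are simplified, and candidate topics are searched from largest to smallest.

```python
# Illustrative reconstruction (not the authors' algorithm) of the
# topic-existence test: a paragraph is coherent iff some predicate set
# is a topic under simplified versions of clauses 1(a)-(c) and 2.
from itertools import combinations

def refers(sent, topic, prev, R):
    if topic <= sent:                       # 1(a) direct reference
        return True
    for p in sent:                          # 1(b) via a background theory
        if topic <= R.get(p, set()):
            return True
    for p in sent:                          # 1(c) elaborates previous sentence
        if prev & R.get(p, set()):
            return True
    return False

def has_topic(sents, R):
    """Return a largest predicate set serving as a topic, or None."""
    all_preds = set().union(*sents)
    for size in range(len(all_preds), 0, -1):
        for cand in combinations(sorted(all_preds), size):
            topic = set(cand)
            topical = all(
                refers(s, topic, sents[i - 1] if i else set(), R)
                for i, s in enumerate(sents))
            anchored = (any(topic <= s for s in sents) or       # 2(i)
                        any(topic <= th for th in R.values()))  # 2(ii)
            if topical and anchored:
                return topic
    return None

# Toy run on the Black Death fragment:
R = {"infection": {"infection", "disease"}}
sents = [{"disease", "strike"}, {"disease", "appalling"}, {"infection", "death"}]
print(has_topic(sents, R))
```

With these toy sentences the third one reaches the topic only through the background link from &quot;infection&quot; to &quot;disease,&quot; while a pair of unrelated sentences yields no topic and hence no coherence.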
The question of paragraph length can probably be attended to by limiting the size of p-models, perhaps after introducing some kind of metric on logical data structures.</Paragraph> <Paragraph position="15"> Still, our definition of coherence may not be restrictive enough: two collections of sentences, one referring to &quot;black&quot; (about black pencils, black pullovers, and black poodles), the other one about &quot;death&quot; (war, cancer, etc.), connected by a sentence referring to both of these, could be interpreted as one paragraph about the new, broader topic &quot;black + death.&quot; This problem may be similar to the situation in which current formal grammars allow nonsensical but parsable collections of words (e.g., &quot;colorless green ideas...&quot;), while before the advent of Chomskyan formalisms, a sentence was defined as the smallest meaningful collection of words; Fowler (1965, p. 546) gives 10 definitions of a sentence.</Paragraph> <Paragraph position="16"> It then seems worth differentiating between the creation of a new concept like &quot;black + death,&quot; with a meaning given by a paraphrase of the example collection of sentences, and the acceptance of the new concept--storing it in R. In our case the concept &quot;black + death,&quot; which does not refer to any normal experiences, would be discarded as useless, although the collection of sentences would be recognized as a strange, even if coherent, paragraph.</Paragraph> <Paragraph position="17"> We can also hope for some fine-tuning of the notion of topic, which would prevent many offensive examples. This approach is taken in computational syntactic grammars (e.g.
Jensen 1986); the number of unlikely parses is severely reduced whenever possible, but no attempt is made to define only the so-called grammatical strings of a language.</Paragraph> <Paragraph position="18"> Finally, just as the paragraph is a natural domain in which word senses can be reliably assigned to words or sentences can be syntactically disambiguated, larger chunks of discourse may be needed for precise assignment of topics, which we view as another type of disambiguation. Notice also that for coherence, as defined above, it does not matter whether the topic is defined as a longest, a shortest, or--simply--a sequence of predicates satisfying the conditions (1) and (2); the existence of a sequence is equivalent to the existence of a shortest and a longest sequence. The reason for choosing a longest sequence as the topic is our belief that the topic should contain more information about a paragraph rather than less.</Paragraph> </Section> <Section position="1" start_page="189" end_page="191" type="sub_section"> <SectionTitle> 4.1 Comparison with Other Approaches </SectionTitle> <Paragraph position="0"> At this point it may be proper to comment on the relationship between our theory of coherence and theories advocated by others. We are going to make such a comparison with the theories proposed by J. Hobbs (1979, 1982), which represent a more computationally oriented approach to coherence, and those of T.A. van Dijk and W. Kintch (1983), who are more interested in addressing psychological and cognitive aspects of discourse coherence. The quoted works seem to be good representatives for each of the directions; they also point to related literature.</Paragraph> <Paragraph position="1"> The approach we advocate is compatible with the work of these researchers, we believe.
There are, however, some interesting differences: first of all, we emphasize the role of paragraphs; second, we talk about formal principles regulating the organization and use of knowledge in language understanding; and third, we realize that natural language text (such as an on-line dictionary) can, in many cases, provide the type of commonsense background information that Hobbs (for example) advocated but didn't know how to access. (There are also some other, minor, differences. For instance, our three-level semantics does not appeal to possible worlds, as van Dijk and Kintch do; neither is it objectivist, as Hobbs' semantics seems to be.) We shall discuss only the first two points, since the third one has already been explained.</Paragraph> <Paragraph position="2"> The chief difference between our approach and the other two lies in identifying the paragraph as a domain of coherence. Hobbs, van Dijk, and Kintch distinguish between &quot;local&quot; coherence--a property of subsequent sentences--and &quot;global&quot; coherence--a property of discourse as a whole. Hobbs explains coherence in terms of an inventory of &quot;local,&quot; possibly computable, coherence relations, like &quot;elaboration,&quot; &quot;occasion,&quot; etc. (Mann and Thompson 1983 give an even more detailed list of coherence relations than Hobbs.) Van Dijk and Kintch do this too, but they also describe &quot;macrostructures&quot; representing the global content of discourse, and they emphasize psychological and cognitive strategies used by people in establishing discourse coherence. Since we have linked coherence to models of paragraphs, we can talk simply about &quot;coherence&quot;--without adjectives--as a property of these models. To us the first &quot;local&quot; domain seems to be too small, and the second &quot;global&quot; one too large, for constructing meaningful computational models.
To be sure, we believe relations between pairs of sentences are worth investigating, especially in dialogs. However, in written discourse, the smallest domain of coherence is a paragraph, very much as the sentence is the basic domain of grammaticality (although one can also judge the correctness of phrases).</Paragraph> <Paragraph position="3"> To see the advantage of assuming that coherence is a property of a fragment of a text/discourse, and not a relation between subsequent sentences, let us consider for instance the text John took a train from Paris to Istanbul. He likes spinach.</Paragraph> <Paragraph position="4"> According to Hobbs (1979, p. 67), these two sentences are incoherent. However, the same fragment, augmented with the third sentence Mary told him yesterday that the French spinach crop failed and Turkey is the only country... (ibid.) suddenly (for Hobbs) becomes coherent. It seems that any analysis of coherence in terms of the relation between subsequent sentences cannot explain this sudden change; after all, the first two sentences didn't change when the third one was added. On the other hand, this change is easily explained when we treat the first two sentences as a paragraph: if the third sentence is not a part of the background knowledge, the paragraph is incoherent. And the paragraph obtained by adding the third sentence is coherent.</Paragraph> <Paragraph position="5"> Moreover, coherence here is clearly the result of the existence of the topic &quot;John likes spinach.&quot; We derive coherence from formal principles regulating the organization and use of knowledge in language understanding. Although, like the authors discussed above, we stress the importance of inferencing and background knowledge in determining coherence, we also address the problem of knowledge organization; for us the central problem is how a model emerges from such an organization. 
Hobbs sets forth hypotheses about the interaction of background knowledge with sentences that are examined at a given moment; van Dijk and Kintch provide a wealth of psychological information on that topic. But their analyses of how such knowledge could be used are quasi-formal. Our point of departure is different: we assume a certain simple structure of the referential level (partial orders) and a natural way of using the knowledge contained there (&quot;coherence links&quot; + &quot;most plausible = first&quot;). Then we examine what corresponds to &quot;topic&quot; and &quot;coherence&quot;---they become mathematical concepts. In this sense our work refines these concepts, changes the way of looking at them by linking them to the notion of paragraph, and puts the findings of the other researchers into a new context.</Paragraph> 5. Models of Paragraphs We argue below that paragraphs can be mapped into models with small, finite universes. We could have chosen another, more abstract semantics, with infinite models, but in this and all cases below we have in mind computational reasons for this enterprise. Thus, as in the case of Kamp's (1981) DRS, we shall construct a kind of Herbrand model of texts, with common and proper names translated into unary predicates, intransitive verbs into unary predicates, and transitive verbs into binary predicates. In building the logical model M of a collection of formulas S corresponding to the sentences of a paragraph, we assume that the universe of M contains constants introduced by elements of S, usually by ones corresponding to NPs, and possibly by some formulas picked by the construction from the referential level. However, we are interested not in the relationship between truth conditions and representations of a sentence, but in a formalization of the way knowledge is used to produce a representation of a section of text.
Therefore we need not only a logical description of the truth conditions of sentences, as presented by Kamp, but also a formal analysis of how background knowledge and metalevel operations are used in the construction of models. This extension is important and nontrivial; we doubt that one can deal effectively with coherence, anaphora, presuppositions, or the semantics of connectives without it. We have begun presenting such an analysis in Section 3, and we continue now.</Paragraph> </Section> <Section position="2" start_page="191" end_page="195" type="sub_section"> <SectionTitle> 5.1 The Example Revisited: Preparation for Building a Model </SectionTitle> <Paragraph position="0"> We return now to the example paragraph, to illustrate how the interaction between an object theory and a referential level produces a coherent interpretation of the text (i.e., a p-model) and resolves the anaphoric references. The method will be similar to, but more formal than, what was presented in Section 2. In order not to bore the reader with the same details all over again, we will use a shorter version of the same text.</Paragraph> <Paragraph position="1"> Example 2 P1: In 1347 a ship entered the port of Messina bringing with it the disease that came to be known as the Black Death.</Paragraph> <Paragraph position="2"> P2: It struck rapidly.</Paragraph> <Paragraph position="3"> P3: Within twenty-four hours of infection came an agonizing death. 5.1.1 Translation to Logic. The text concerns events happening in time. Naturally, we will use a logical notation in which formulas may have temporal and event components. We assume that any formal interpretation of time will agree with the intuitive one, so it is not necessary to present a formal semantics here. The reader may consult recent papers on this subject (e.g. Moens and Steedman 1987; Webber 1987) to see what a formal interpretation of events in time might look like.
Since sentences can refer to events described by other sentences, we may also need a quotation operator; Perlis (1985) describes how first order logic can be augmented with such an operator. Extending and revising Jackendoff's (1983) formalism seems to us a correct method to achieve the correspondence between syntax and semantics expressed in the grammatical constraint (&quot;that one should prefer a semantic theory that explains otherwise arbitrary generalizations about the syntax and the lexicon&quot;---ibid.).</Paragraph> <Paragraph position="4"> However, as noted before, we will use a simplified version of such a logical notation; we will have only time, event, result, and property as primitives. After these remarks we can begin constructing the model of the example paragraph. We assume that constants are introduced by NPs. We have then (i) Constants s, m, d, i, b, 1347 satisfying: ship(s), Messina(m), disease(d), infection(i), death(b), year(1347).</Paragraph> <Paragraph position="5"> (ii) Formulae S1: [time: year(1347); event : enter(s,m) & ship(s) & port(m) &</Paragraph> <Paragraph position="7"/> <Paragraph position="9"> The notation time : α(t); event : β should be understood as meaning that the event described by the formula β took place in (or during) the time period described by the formula α(t); t ranges over instants of time (not intervals).</Paragraph> <Paragraph position="8"> Note. We assume that &quot;strike&quot; is used intransitively. But our construction of the p-models of the paragraph would look exactly the same for the transitive meaning, except that we would be expected to infer that people were harmed by the illness. The content of the background information below is of secondary importance--we want to stress the formal, logical side of the interaction between the referential level and the object theory. Therefore we represent both in our simplified logical notation, and not in English.
All formulas at the referential level below have been obtained by a direct translation of appropriate entries in Webster's and Longman. The translation in this case was manual, but could be automated.</Paragraph> <Paragraph position="10"/> <Paragraph position="12"> /* enter--to come into a place */ We have shown, in Section 3, the role of preferences in building the model of a paragraph. Therefore, to make our exposition clearer, we assume that all the above theories are equally preferred. Still, some interesting things will happen before we arrive at our intended model.</Paragraph> <Paragraph position="13"> In order to find antecedents of anaphors, we have to introduce a new logical notion--the relation of weak R + M-abduction. This relation would hold, for instance, between the object theory of our example paragraph and a formula expressing the equality of two constants, i and i', denoting (respectively) the &quot;infection&quot; in the sentence Within twenty-four hours of infection..., and the &quot;infection&quot; of the theory (d1)--a disease is an illness caused by an infection. This equality i = i' cannot be proven, but it may be reasonably assumed--we know that in this case the infection i' caused the illness, which, in turn, caused the death.</Paragraph> <Paragraph position="14"> The necessity of this kind of merging of arguments has been recognized before: Charniak and McDermott (1985) call it abductive unification/matching; Hobbs (1978, 1979) refers to such operations using the terms knitting or petty conversational implicature. Neither Hobbs nor Charniak and McDermott tried then to make this notion precise, but the paper by Hobbs et al. (1988) moves in that direction.
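The abductive merging of constants discussed here (the assumption i = i') can be illustrated with a toy matcher. This is a sketch under our own assumptions, not the authors' mechanism: each constant carries a sort predicate, and an equality is proposed only between distinct constants of the same sort. The name i2 is a stand-in for the theory's constant i'.

```python
# Sketch (our own illustration, not the authors' mechanism) of abductive
# matching of constants, in the spirit of the "knitting" / "abductive
# unification" cited above: an equality between two constants may be
# assumed when both carry the same sort predicate.

def candidate_equalities(constants):
    """constants: mapping constant -> sort predicate, e.g. {"i": "infection"}."""
    names = sorted(constants)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if constants[a] == constants[b]]

# The "infection" i of the text and the i2 introduced by theory (d1)
# are candidates for merging; the ship s is not.
print(candidate_equalities({"i": "infection", "i2": "infection", "s": "ship"}))
```

Such candidates are only proposals; whether an equality actually holds is settled by the choice of a preferred model, as formalized next.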
The purpose of this subsection is to formalize and explain how assumptions like that one above can be made.</Paragraph> <Paragraph position="15"> Definition A formula φ is weakly provable from an object theory T, expressed as T ⊢r φ, iff there exists a partial theory T′ ∈ PT(T) such that T′ ⊢ φ, i.e. T′ proves φ in logic. (We call ⊢r &quot;weak&quot; because it is enough to find one partial theory proving a given formula.) As an example, in the case of the three-sentence paragraph, we have a partial theory T1 based on (s1b) saying that &quot;'it' hits rapidly,&quot; and T2 saying that &quot;an illness ('it') harms rapidly&quot; (s2_ex). Thus both statements are weakly provable.</Paragraph> <Paragraph position="16"> Since we view the metalevel constraints M rather as rules for choosing models than as special inference rules, the definition of the R+M-abduction is model-theoretic, not proof-theoretic: Definition A preferred model of a theory T is an element of Mods(T) that satisfies metalevel constraints contained in M. The set of all preferred models of T is denoted by PM(T). A formula φ of L(=), the language with equality, is weakly R + M-abductible from an object theory T, denoted by T ⊢R+M φ, iff there exists a partial theory T′ ∈ PT(T) and a preferred model M ∈ PM(T′) such that M ⊨ φ, i.e. φ is true in at least one preferred model of the partial theory T′.</Paragraph> <Paragraph position="17"> Note: The notions of strong provability and strong R + M-abduction can be introduced by replacing &quot;there exists&quot; by &quot;all&quot; in the above definitions (cf. Zadrozny 1987b). We will have, however, no need for &quot;strong&quot; notions in this paper. Also, in a practical system, &quot;satisfies&quot; should probably be replaced by &quot;violates fewest.&quot; Obviously, it is better to have references of pronouns resolved than not. After all, we assume that texts make sense, and that authors know these references.
That applies to references of noun phrases too. On the other hand, there must be some restrictions on possible references; we would rather assume that &quot;spinach&quot; ≠ &quot;train&quot; (i.e. (∀x,y)(spinach(x) & train(y) → x ≠ y)), or &quot;ship&quot; ≠ &quot;disease.&quot; Two elementary conditions limiting the number of equalities are: an equality N1 = N2 may be assumed only if either N1 and N2 are listed as synonyms (or paraphrases) or their equality is explicitly asserted by the partial theory T. Of course there are other conditions, like &quot;typically, the determiner 'a' introduces a new entity, while 'the' refers to an already introduced constant.&quot; (But notice that in our example paragraph &quot;infection&quot; appears without an article.) All these, and other, guidelines can be articulated in the form of metarules.</Paragraph> <Paragraph position="18"> We define another partial order, this time on models Mods(T) of a partial theory T of a paragraph: M1 >= M2 if M1 satisfies more R + M-abductible equalities than M2. The principle articulating preference for having the references resolved can now be expressed as Metarule 1 Assume that T ∈ PT(P) is a partial theory of a paragraph P. Every preferred model M ∈ PM(T) is a maximal element of the ordering >= of Mods(T).</Paragraph> <Paragraph position="19"> To explain the meaning of the metarule, let us analyze the paragraph (P1, P2, P3) and the background knowledge needed for some kind of rudimentary understanding of that text. The rule (i_1) (infection is a result of being infected by a disease... ), dealing with the infection i, introduces a disease d1; we also know about the existence of the disease d in 1347. Now, notice that there may be many models satisfying the object theory of the paragraph P augmented by the background knowledge. But we can find two among them: in one, call it M1, d and d1 are identical; in the other one, M2, they are distinct.
The rule says that only the first one has a chance to be a preferred model of the paragraph; it has more noun phrase references resolved than the other model, or--formally--it satisfies more R + M-abductible equalities, and therefore M1 >= M2. This reasoning, as the reader surely has noticed, resembles the example about infections from the beginning of this section. The difference between the cases lies in the equality d = d1 being the result of a formal choice of a model, while i = i′ wasn't proved, just &quot;reasonably&quot; assumed.</Paragraph> <Paragraph position="20"> In interpreting texts, knowledge of typical subjects and typical objects of verbs helps in anaphora resolution (cf. Braden-Harder & Zadrozny 1990). Thus if we know that A farmer grows vegetables, either having obtained this information directly from a text, or from R, we can reasonably assume that He also grows some cotton refers to the farmer, and not to a policeman mentioned in the same paragraph. Of course, this should be only a defeasible assumption, if nothing indicates otherwise. We now want to express this strategy as a metarule: Metarule 2 Let us assume that it is known that P(a, b) & Q(a) & R(b), and it is not known that P(a', X), for any X. Then models in which P(a, c) & R'(c) holds are preferred to models in which P(a', c) & R'(c) is true.</Paragraph> <Paragraph position="21"> One can think of this rule as a model-theoretic version of Ockham's razor or abduction; it says &quot;minimize the number of things that have the property P(·, ·),&quot; and it allows us to draw certain conclusions on the basis of partial information. We shall see it in action in Section 5.2.</Paragraph> <Paragraph position="22"> We have no doubts that various other metarules will be necessary; clearly, our two metarules cannot constitute the whole theory of anaphora resolution.
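Metarule 2 can be sketched as a model-comparison function. The representation here is our own assumption, not the paper's: models are sets of ground atoms of the form (relation, subject, object), and the rule prefers models that minimize the set of individuals standing in the relation P, the Ockham's-razor bias just described. All names are illustrative.

```python
# Sketch of Metarule 2 under an assumed representation: models are sets
# of ground atoms ("grow", subject, object); the rule prefers models
# that minimize the set of individuals standing in the relation P,
# an Ockham's-razor bias. All names are illustrative.

def prefer(models, relation):
    def subjects(m):
        return {atom[1] for atom in m if atom[0] == relation}
    fewest = min(len(subjects(m)) for m in models)
    return [m for m in models if len(subjects(m)) == fewest]

known = ("grow", "farmer1", "vegetables")            # "A farmer grows vegetables"
m_farmer = {known, ("grow", "farmer1", "cotton")}    # "he" = the farmer
m_police = {known, ("grow", "policeman1", "cotton")} # "he" = the policeman
best = prefer([m_farmer, m_police], "grow")
print(best == [m_farmer])
```

In the toy run the model attributing the cotton-growing to the farmer has a single grow-subject and is preferred, matching the defeasible reading of He also grows some cotton.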
They are intended as an illustration of the power of abduction, which in this framework helps determine the universe of the model (that is the set of entities that appear in it). Other factors, such as the role of focus (Grosz 1977, 1978; Sidner 1983) or quantifier scoping (Webber 1983) must play a role, too. Determining the relative importance of those factors, the above metarules, and syntactic clues, appears to be an interesting topic in itself.</Paragraph> <Paragraph position="23"> Note: In our translation from English to logic we are assuming that &quot;it&quot; is anaphoric (with the pronoun following the element that it refers to), not cataphoric (the other way around). This means that the &quot;it&quot; that brought the disease in P1 will not be considered to refer to the infection &quot;i&quot; or the death &quot;d&quot; in P3. This strategy is certainly the right one to start out with, since anaphora is always the more typical direction of reference in English prose (Halliday and Hasan 1976, p. 329).</Paragraph> <Paragraph position="24"> Since techniques developed elsewhere may prove useful, at least for comparison, it is worth mentioning at this point that the proposed metarules are distant cousins of &quot;unique-name assumption&quot; (Genesereth and Nilsson 1987), &quot;domain closure assumption&quot; (ibid.), &quot;domain circumscription&quot; (cf. Etherington and Mercer 1987), and their kin. Similarly, the notion of R + M-abduction is spiritually related to the &quot;abductive inference&quot; of Reggia (1985), the &quot;diagnosis from first principles&quot; of Reiter (1987), &quot;explainability&quot; of Poole (1988), and the subset principle of Berwick (1986). But, obviously, trying to establish precise connections for the metarules or the provability and the R + M-abduction would go much beyond the scope of an argument for the correspondence of paragraphs and models. 
These connections are being examined elsewhere (Zadrozny forthcoming).</Paragraph> </Section> <Section position="3" start_page="195" end_page="197" type="sub_section"> <SectionTitle> 5.2 p-Models </SectionTitle> <Paragraph position="0"> The construction of a model of a paragraph, a p-model, must be based on the information contained in the paragraph itself (the object theory) and in the referential level, while the metalevel restricts the ways that the model can be constructed, or, in other words, provides criteria for choosing the most plausible model(s) if a partial theory is ambiguous. This role of the metarules will be clearly visible in finding references of pronouns in a simple case requiring only a rule postulating that these references be searched for, and in a more complex case (in Section 6) when they can be found only by an interplay of background knowledge and (a formalization of) Gricean maxims. Definition: M is a p-model of a paragraph P iff there exists a coherent partial theory T ∈ PT(P) such that M ∈ PM(T).</Paragraph> <Paragraph position="1"> Having defined the notion of a p-model, we can now mimic, in logic, the reasoning presented in Section 2.2. Using background information and the translation of sentences, we build a p-model of the paragraph. This involves determining the references of the pronoun &quot;it,&quot; and deciding whether &quot;struck&quot; in the sentence It struck rapidly means &quot;hit&quot; (s1b) or &quot;harmed&quot; (s2_ex). We have then two meanings of &quot;strike&quot; and a number of possibilities for the pronouns.</Paragraph> <Paragraph position="2"> We begin by constructing the two classes of preferred models given by (s1b) and (s2_ex), respectively.
It is easily seen that the models of the first class, based on {S2, s1b} (that is, {rapidly:strike(y0) and strike(x) → hit(x)...}), together with all other available information, do not let us R + M-abduct anything about y0, i.e., the referent for the subject pronoun &quot;it&quot; in P2 (it struck rapidly). On the other hand, from {S2, s2_ex, d1} we R + M-abduct that y0 = d, i.e. the disease struck rapidly. That is the case because s2_ex implies that the agent that &quot;struck rapidly&quot; is actually an illness. From rapidly:strike(y0), strike(x) → illness(x) & ..., disease(y) → illness(y) & ..., and disease(d) we can infer illness(y0) and illness(d); by the Metarule (1) we conclude that y0 = d. In other words, the referent for the subject &quot;it&quot; is &quot;disease.&quot; Thus the Metarule (1) immediately eliminates all the models from the first class given by (s1b), in which &quot;struck&quot; means &quot;hit.&quot; Notice that we cannot prove in classical logic that the ship has brought the disease.</Paragraph> <Paragraph position="3"> But we are allowed to assume it by the above formal rule as the most plausible state of affairs, or--in other words--we prove it in our three-level logic.</Paragraph> <Paragraph position="4"> We are left then with models of the three sentences (S1, S2, S3) that contain {S2, s2_ex, d1}; they all satisfy y0 = d. We now use {S1, sh1, b1} (enter(s,m) & ship(s) & bring(x0, d) & ...; ship(s) → (∃y)carry(s,y) & ...; ∀z[bring(x0, z) → carry(x0, z)]). From these facts we can conclude by Metarule (1) that x0 = s: a &quot;ship&quot; is an agent that carries goods; to &quot;bring&quot; means to &quot;carry&quot;; and the disease has been brought by something--we obtain carry(x0, d) and carry(s, y); and then by Metarule (2), carry(s, d). That is, the referent for the pronoun &quot;it&quot; in P1 (... bringing with it the disease...
) should be &quot;ship.&quot; Observe that we do not assert about the disease that it is a kind of goods or people; the line of reasoning goes as follows: since ships are known to carry people or goods, and ports are not known to carry anything, we may assume that the ship carried the disease along with its standard cargo.</Paragraph> <Paragraph position="5"> Having resolved all pronoun references, with no ambiguity left, we conclude that the class PM(P) consists of only one model, based on the partial theory {S1, S2, S3, sh1, b1, e_1, s2_ex, d1, de1, i1, a1, ct1}.</Paragraph> <Paragraph position="6"> The model describes a situation in which the ship came into the port/harbor; the ship brought the disease; the disease was caused by an infection; the disease harmed rapidly, causing a painful death; and so on.</Paragraph> <Paragraph position="7"> The topic Tp of (P1, P2, P3) is disease(x). The first sentence talks about it; the second one refers to it using the pronoun &quot;it,&quot; and the third one extends our knowledge about the topic, since &quot;disease&quot; is linked to &quot;infection&quot; through d1. But notice that our definition of topic licenses also other analyses, for example, one in which all the predicates of the first sentence constitute the topic of the paragraph, S2 elaborates S1 (in the sense of condition 1(c) of the definition of topic), and S3 elaborates S2. Based on the larger context, we prefer the first analysis; however, a computational criterion for such a preference remains an open problem.</Paragraph> </Section> </Section> <Section position="5" start_page="197" end_page="204" type="metho"> <SectionTitle> 6. On the Role of the Metalevel </SectionTitle> <Paragraph position="0"> We have already seen examples of the application of metalevel rules. In the analysis of the paragraph, we applied one such rule expressing our commonsense knowledge about the usage of pronouns.
In this section we discuss two other sources of metalevel axioms: Gricean cooperative principles, which reduce the number of possible interpretations of a text or an utterance; and connectives and modalities--such as &quot;but,&quot; &quot;unless,&quot; or &quot;maybe&quot;--which refer to the process of constructing the models or partial theories, and to some operations on them (see Figure 4).</Paragraph> <Paragraph position="1"> We can see then two applications of metarules: in constructing models of a text from representations of sentences, and in reducing, or constraining, the ambiguity of the obtained structure. We begin by showing how to formalize the latter. In the next subsection (6.1), assuming the Gricean maxims to be constraints on language communication, either spoken or written, we use their formal versions in building partial theories. A specific instance of the rule of &quot;quantity&quot; turns out to be applicable to anaphora resolution. That example will end our discussion of anaphora in this article.</Paragraph> <Paragraph position="2"> The last topic we intend to tackle is the semantic role of conjunctions. In subsection 6.2 we present a metalevel axiom dealing with the semantic role of the adversative conjunction &quot;but&quot;; then we talk about some of its consequences for constructing models of text. This will complete our investigation of the most important issues concerning paragraph structure: coherence (how one can determine that a paragraph expresses a &quot;thought&quot;), anaphora (how one can compute &quot;links&quot; between entities that a paragraph talks about), and cohesion (what makes a paragraph more than just a sum of sentences).
Of course, we will not have final answers to any of these problems, but we do believe that a direction of search for computational models of text will be visible at that point.</Paragraph> <Paragraph position="3"> We assume a flat structure of the metalevel, envisioning it as a collection of (closed) formulas written in the language of set theory or higher order logic. In either of the two theories it is possible to define the notions of a model, satisfiability, provability, etc. for any first order language (cf. e.g. Shoenfield 1967); therefore the metalevel formulas can say how partial theories should be constructed and what kinds of models are admissible. The metarules thus form a logical theory in a special language, such as the language of ZF-set theory. However, for the sake of readability, we express all of them in English.</Paragraph> <Section position="1" start_page="198" end_page="199" type="sub_section"> <SectionTitle> 6.1 A Formalization of Gricean Maxims </SectionTitle> <Paragraph position="0"> A Gricean Cooperative Principle applies to text, too. For instance, in normal writing people do not express common knowledge about typical functions of objects. In fact, as the reader may check for himself, there is nothing in Gricean maxims that does not apply to written language. That the maxims play a semantic role is hardly surprising.</Paragraph> <Paragraph position="1"> But that they can be axiomatized and used in building formal models of texts is new.</Paragraph> <Paragraph position="2"> We present in the next couple of paragraphs our formalization of the first maxim, and sketch axiomatizations of the others. Then we will apply the formal rule in an example.</Paragraph> <Paragraph position="3"> Gricean maxims, after formalization, belong to the metalevel.
This can be seen from our formalization of the rule &quot;don't say too much.&quot; To this end we define redundancy of a partial theory T of a paragraph as the situation in which some sentences can be logically derived from other sentences and from the theory T in a direct manner: (∃S ∈ P̂)(∃σ ∈ R)[σ ∈ T & σ: φ → ψ & P̂ ⊢ φ & {ψ} ∪ (P̂ - {S}) ⊢ S] The meaning of this formula can be explained as follows: a paragraph P has been translated into its formal version P̂ and is to be examined for redundancy. Its partial theory PT(P̂) has also been computed. The test will turn positive if, for some sentence S, we can find a rule/theorem σ = φ → ψ in PT(P̂) such that the sentence S is implied (in a classical logic) by the other sentences and ψ. For example, if the paragraph about Black Death were to contain also the sentence The ship carried people or goods, or both, which (in its logical form) belongs to R, it would be redundant: σ = (sh1), there. Similarly, the definition takes care of the redundancy resulting from a simple repetition.</Paragraph> </Section> <Section position="2" start_page="199" end_page="201" type="sub_section"> <SectionTitle> Metarule G1a </SectionTitle> <Paragraph position="0"> (nonredundancy) If T1, T2 ∈ PT(P) and T1 is less redundant than T2, then the theory T1 is preferred to T2. (Where &quot;less redundant&quot; means that the number of redundant sentences in T1 is smaller than in T2.) The relevant half of the Maxim of Quantity has been expressed by G1a. How would we express the other maxims? The &quot;too little&quot; part of the first maxim might be represented as a preference for unambiguous partial theories. The second maxim has been assumed all the time--when constructing partial theories or models, the sentences of a paragraph are assumed to be true.
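The redundancy test and Metarule G1a admit a direct, if simplistic, procedural rendering. Below is a toy propositional sketch under our own assumptions (background rules as premises/conclusion pairs, one-step derivation only); it is not the paper's set-theoretic metalanguage formulation.

```python
# A toy rendering of the redundancy test and Metarule G1a; the
# representation (rules as (premises, conclusion) pairs) is our
# assumption, not the paper's set-theoretic metalanguage.

def redundant_sentences(paragraph, rules):
    """A sentence S is redundant if some background rule plus the
    remaining sentences derive S (the direct, one-step case)."""
    out = []
    for s in paragraph:
        rest = set(paragraph) - {s}
        for premises, conclusion in rules:
            if conclusion == s and set(premises) <= rest:
                out.append(s)
                break
    return out

def prefer_less_redundant(t1, t2, rules):
    """Metarule G1a: prefer the theory with fewer redundant sentences."""
    return min((t1, t2), key=lambda t: len(redundant_sentences(t, rules)))

# Background rule sh1, roughly: a ship carries people or goods.
rules = [(["ship(s)"], "carry(s, people_or_goods)")]
T1 = ["ship(s)", "bring(s, d)"]
T2 = ["ship(s)", "bring(s, d)", "carry(s, people_or_goods)"]  # says too much
print(prefer_less_redundant(T1, T2, rules))  # -> ['ship(s)', 'bring(s, d)']
```

T2 restates what the background rule already yields, so G1a prefers T1.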
The Maxim of Manner seems to us to be more relevant for critiquing the style of a written passage or for natural language generation; in the case of text generation, it can be construed as a requirement that the produced text be coherent and cohesive.</Paragraph> <Paragraph position="1"> We do not claim that G1a is the best or unique way of expressing the rule &quot;assume that the writer did not say too much.&quot; Rather, we stress the possibility that one can axiomatize and productively use such a rule. We shall see this in the next example: two sentences, regarded as a fragment of a paragraph, are a variation on a theme by Hobbs (1979).</Paragraph> <Paragraph position="2"> Example 3 The captain is worried because the third officer can open his safe. He knows the combination. The above metarule postulating &quot;nonredundancy&quot; implies that &quot;he&quot; = &quot;the third officer&quot; and &quot;his&quot; = &quot;the captain's&quot; are the referents of the pronouns. This is because the formula safe(x) → (owns(y, x) & cmbntn(z, x) → knows(y, z) & can_open(y, x)) ∈ Tsafe belongs to R, since it is common knowledge about safes that they have owners, and also combinations that are known to the owners. Therefore &quot;his&quot; = &quot;the third officer's&quot; would produce a redundant formula, corresponding to the sentence The third officer can open the third officer's safe. By the same token, The captain knows the combination would be redundant too.</Paragraph> <Paragraph position="3"> We now explain the details of this reasoning. One first proves that &quot;his&quot; = &quot;the captain's.&quot; Indeed, if &quot;his&quot; = &quot;the third officer's,&quot; then our example sentence would mean ?
The captain is worried because the third officer can open the third officer's safe; in logic: captain(x) & worry(x, s) & sentence(s) & s = &quot;the third officer can open the third officer's safe.&quot; We assume also, based on common knowledge about worrying, that worry(x', s') → ◊S. That is, one worries about things that might possibly be or become true (S denotes the logical formula corresponding to the sentence s, cf. Section 3); but one doesn't worry about things that are accepted as (almost) always true (such as the law of gravity), so that worry(x', s') → ¬(S ∈ Tf), where f ranges over subformulas of S. In our case, S immediately follows from Tsafe and X, where X = safe(sf) & third_officer(o) & owns(o, sf)--the fact that &quot;the third officer can open the third officer's safe&quot; is a consequence of general knowledge about the ownership of safes. And therefore the interpretation with &quot;his&quot; = &quot;the captain's&quot; is preferred as less redundant by the rule G1a. This theory contains representations of the two sentences, the theory of safes, a theory of worrying, and the equality &quot;his&quot; = &quot;the captain's.&quot; It remains to prove that &quot;he&quot; = &quot;the third officer.&quot; Otherwise we have P1. The captain is worried because the third officer can open the captain's safe. P2. ? The captain knows the combination.</Paragraph> <Paragraph position="4"> Clearly, the last sentence is true but redundant--the theory of &quot;safe&quot; and P1 entail P2: {P1} ∪ Tsafe ⊢ P2 We are left with the combination Q1. The captain is worried because the third officer can open the captain's safe. Q2. ? The third officer knows the combination.</Paragraph> <Paragraph position="5"> In this case, Q2 does not follow from {Q1} ∪ Tsafe and therefore Q1, Q2 is preferred to P1, P2 (by G1a). We obtain then The captain is worried because the third officer can open the captain's safe.
The third officer knows the combination as the most plausible interpretation of our example sentences.</Paragraph> <Paragraph position="6"> Note: The reader must have noticed that we did not bother to distinguish the sentences P1, P2, Q1, and Q2 from their logical forms. Representing &quot;because&quot; and &quot;know&quot; adequately should be considered a separate topic; representing the rest (in the first order convention of this paper) is trivial.</Paragraph> <Paragraph position="7"> 6.1.1 Was the Use of a Gricean Maxim Necessary? Can one deal effectively with the problem of reference without axiomatized Gricean maxims, for instance by using only &quot;petty conversational implicature&quot; (Hobbs 1979), or the metarules of Section 5.2? It seems to us that the answer is no.</Paragraph> <Paragraph position="8"> As a case in point, consider the process of finding the antecedent of the anaphor &quot;he&quot; in the sentences John can open Bill's safe. He knows the combination.</Paragraph> <Paragraph position="9"> Hobbs (1979, 1982) proves &quot;he&quot; = &quot;John&quot; by assuming the relation of &quot;elaboration&quot; between the sentences. (Elaboration is a relation between two segments of a text. It intuitively means &quot;expressing the same thought from a different perspective,&quot; but has been defined formally as the existence of a proposition implied by both segments--here the proposition is &quot;John can open the safe&quot;.) However, if we change the pair to the triple Bill has a safe under the painting of his yacht. John can open Bill's safe. He knows the combination the relation of elaboration holds between the segment consisting of the first two sentences of the triple and each of the two possible readings: John knows the combination and Bill knows the combination. In this case, elaboration cannot choose the correct referent, but the rule G1a can and does.
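The way G1a discriminates between the two readings of the safe example can be illustrated with the same one-step redundancy test. This is a toy propositional encoding of our own (the literals and the single rule about safe owners are our assumptions, not the paper's full Tsafe).

```python
# A toy illustration of the nonredundancy rule G1a picking referents
# in the safe example; the propositional encoding is our own.

def is_redundant(sentence, others, rules):
    """True if `sentence` follows in one step from the other
    sentences plus a background rule (premises, conclusion)."""
    for premises, conclusion in rules:
        if conclusion == sentence and set(premises) <= set(others):
            return True
    return False

def choose_reading(readings, context, rules):
    """Prefer (G1a) the readings whose added sentence is not redundant."""
    return [r for r, s in readings.items()
            if not is_redundant(s, context, rules)]

# Common knowledge about safes: the owner knows the combination.
rules = [(["owns(captain, safe)"], "knows(captain, combination)")]
context = ["owns(captain, safe)", "can_open(officer, safe)"]
readings = {"he = the captain": "knows(captain, combination)",
            "he = the third officer": "knows(officer, combination)"}
print(choose_reading(readings, context, rules))  # -> ['he = the third officer']
```

The captain reading adds only what the theory of safes already entails, so it is rejected as redundant; the third-officer reading adds new information and survives.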
Clearly, an elaboration should not degenerate into redundancy; the Gricean maxims are to keep it fresh.</Paragraph> <Paragraph position="10"> As we have observed, correct interpretations cannot be chosen by an interaction of an object level theory and a referential level alone, because coherence, plausibility and consistency are too weak to weed out wrong partial theories. Metarules are necessary. True, the captain knew the combination, but it was consistent that &quot;his&quot; might have referred to &quot;the third officer's.&quot;</Paragraph> </Section> <Section position="3" start_page="201" end_page="204" type="sub_section"> <SectionTitle> 6.2 Semantics of the Conjunction &quot;But&quot; </SectionTitle> <Paragraph position="0"> Any analysis of natural language text, to be useful for a computational system, will have to deal with coherence, anaphora, and connectives. We have examined so far the first two concepts; we shall present now our view of connectives to complete the argument about paragraphs being counterparts of models. We present a metalevel rule that governs the behavior of the conjunction &quot;but;&quot; we formalize the manner in which &quot;but&quot; carries out the contradiction. Then we derive from it two rules that prevent infelicitous uses of &quot;but.&quot; Connectives are function words--like conjunctions and some adverbs--that are responsible simultaneously for maintaining cohesiveness within the text and for signaling the nature of the relationships that hold between and among various text units. &quot;And,&quot; &quot;or,&quot; and &quot;but&quot; are the three main coordinating connectives in English. 
However, &quot;but&quot; does not behave quite like the other two--semantically, &quot;but&quot; signals a contradiction, and in this role it seems to have three subfunctions:</Paragraph> <Paragraph position="2"> 1. Opposition (called &quot;adversative&quot; or &quot;contrary-to-expectation&quot; by Halliday and Hasan 1976; cf. also Quirk et al. 1972, p. 672).</Paragraph> <Paragraph position="3"> The ship arrived but the passengers could not get off.</Paragraph> <Paragraph position="4"> The yacht is cheap but elegant.</Paragraph> <Paragraph position="5"> 2. Comparison. In this function, the first conjunct is not so directly contradicted by the second. A contradiction exists, but we may have to go through additional levels of implication to find it. Consider the sentence:</Paragraph> <Paragraph position="6"> That basketball player is short, but he's very quick.</Paragraph> <Paragraph position="7"> 3. Affirmation. This use of &quot;but&quot; always follows a negative clause, and actually augments the meaning of the preceding clause by adding supporting information: The disease not only killed thousands of people, but also ended a period of economic welfare.</Paragraph> <Paragraph position="8"> In this section we consider only the first, or adversative, function of the coordinating conjunction &quot;but&quot; in discourse. Because it expresses some kind of contradiction, &quot;but&quot; has no role in the propositional calculus equivalent to the roles filled by &quot;and&quot; and &quot;or.&quot; Although there are logical formation rules using the conjunction operator (&quot;and&quot;) and the disjunction operator (&quot;or&quot;), there is no &quot;but&quot; operator. What, then, is the semantic role of &quot;but&quot;? We believe that its function should be described at the metalevel as one of many rules guiding the construction of partial theories. This is expressed below.
Metarule (BUT) The formulas Φ but Ψ, Φ' but Ψ', ... of a (formal representation of) paragraph P are to be interpreted as follows: In the construction of any T ∈ PT(P), instead of taking T to be the union ∪ of the theories σ → T_σ, take the union of σ → T_σ/{Ψ, Ψ', ...}.</Paragraph> <Paragraph position="9"> The symbol σ → T_σ/{Ψ, Ψ', ...} denotes a maximal consistent with {Φ, Ψ, Ψ', ...} subtheory of σ → T_σ, and in general T/T' will be a maximal consistent with T' subtheory of T.</Paragraph> <Paragraph position="10"> &quot;But&quot; is then an order to delete from background information everything contradicting Ψ, but to use what remains. Notice that &quot;and&quot; does not have this meaning; a model for Φ and Ψ will not contain any part of a theory that contradicts either of the clauses Φ or Ψ.</Paragraph> <Paragraph position="11"> Typically this rule will be used to override defaults, to say that the expected consequences of the first conjunct hold except for the fact expressed by the second conjunct; for instance: We were coming to see you, but it rained (so we didn't). The rule BUT is supposed to capture the &quot;contrary-to-expectation&quot; function of &quot;but.&quot; We present now a simple example of building a model of a one-sentence paragraph containing &quot;but.&quot; We will use this example to explain how the rule BUT can be used. Using background information presented below, we will construct a partial model for this one-sentence paragraph.</Paragraph> <Paragraph position="13"> Note: Compare (y1) with (c1); in (y1) smallness is a property of a ship; this would be more precisely expressed as yacht(x) → [ship(x); property: small(x)]. This trick allows us not to conclude that &quot;a big ant is big,&quot; or &quot;a small elephant is small.&quot; We ignore the problem of multiple meanings (theories) of predicates, and assume the trivial ordering in which all formulas are equally preferred.
(But note that (e_y1) is still preferred to (e1) as a more specific theory of &quot;elegant;&quot; cf. Section 3.) Construction of the model: In our case Φ ≡ yacht(y0) & cheap(y0) and Ψ ≡ elegant(y0).</Paragraph> <Paragraph position="15"> We can now use the Metarule (BUT) and construct the partial theories of the sentence.</Paragraph> <Paragraph position="16"> In this case, there is only one:</Paragraph> <Paragraph position="18"> {..., status_symbol(y0), poor_quality(y0)} In other words, the yacht in question is a poor quality small and elegant ship serving as an inexpensive status symbol.</Paragraph> <Paragraph position="19"> The partial model of the theory T is quite trivial: it consists of one entity representing the yacht and of a bunch of its attributes. However, the size of the model is not important here; what counts is the method of derivation of the partial theory. 6.2.2 Confirming the Analysis. The Metarule (BUT) is supposed to capture the &quot;contrary-to-expectation&quot; function of &quot;but.&quot; The next two rules follow from our formalization; their validity indirectly supports the plausibility of our analysis of &quot;but.&quot; BUT_C1: Φ but ¬Ψ is incorrect, if Φ → Ψ is a &quot;law.&quot; e.g. Henry was murdered but not killed.</Paragraph> <Paragraph position="20"> Our referential level is a collection of partially ordered theories; we have expressed the fact that a theory of some Φ is a &quot;law&quot; by deleting the empty interpretation of Φ from the partial order. If we accept the definition of a concept as given by necessary and sufficient conditions, the theories would all appear as laws.
If we subscribe to a more realistic view where definitions are given by a collection of central/prototypical and peripheral conditions, only the peripheral ones can be contradicted by &quot;but.&quot; In either formalization we get BUT_C1 as a consequence: since &quot;laws&quot; cannot be deleted, BUT can't be applied, and hence its use in those kinds of sentences would be incorrect. W. Labov (1973) discussed sentences of the form *This is a chair but you can sit on it.</Paragraph> <Paragraph position="21"> The sentence is incorrect, since the function &quot;one can sit on it&quot; belongs to the core of the concept &quot;chair&quot;; so--contrary to the role of &quot;but&quot;--the sentence does not contain any surprising new elements. Using the Metarule (BUT) and the cooperative principle of Grice, we get BUT_C2: Φ but Ψ is incorrect, if Φ → Ψ is a &quot;law.&quot; The Metarule (BUT) gives the semantics of &quot;but&quot;; the rules BUT_C1 and BUT_C2 follow from it (after formalization in a sufficiently strong metalanguage such as type theory or set theory). We can link all of them to procedures for constructing and evaluating models of text. Are they sufficient? Certainly not. We have not dealt with the other usages of &quot;but&quot;; neither have we shown how to deal with the apparent asymmetry of conclusions: cheap but elegant seems to imply &quot;worth buying,&quot; but elegant but cheap doesn't; we have ignored possible prototypical effects in our semantics. However, we do believe that other rules, dealing with &quot;but&quot; or with other connectives, can be conveniently expressed in our framework. (The main idea is that one should talk explicitly and formally about relations between text and background knowledge, and that this knowledge is more than just a list of facts--it has structure, and it is ambiguous.) Furthermore, the semantics of &quot;but&quot; as described above is computationally tractable.
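The claimed tractability can be made concrete with a small sketch of the Metarule (BUT) as default pruning. The flat propositional encoding and the literal names below are our own assumptions, not the paper's theory-ordering machinery; "maximal consistent subtheory" degenerates here to dropping the individual defaults that clash with the second conjunct.

```python
# A small sketch of the Metarule (BUT) as default pruning; the
# propositional encoding and literal names are ours, not the paper's.

def negate(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def but(first, second, defaults):
    """Interpret `first but second`: keep both conjuncts, and keep
    every background default except those contradicting the second
    conjunct (a maximal consistent subtheory, in this flat setting)."""
    asserted = set(first) | set(second)
    kept = [d for d in defaults if negate(d) not in asserted]
    return sorted(asserted | set(kept))

# "The yacht is cheap but elegant": cheapness suggests, by default,
# poor quality and low prestige; "but elegant" overrides the latter.
defaults = ["poor_quality", "-status_symbol"]  # expected consequences of "cheap"
theory = but(["yacht", "cheap"], ["elegant", "status_symbol"], defaults)
print(theory)  # -> ['cheap', 'elegant', 'poor_quality', 'status_symbol', 'yacht']
```

The surviving theory matches the paper's reading of the yacht example: poor quality is retained from the defaults of "cheap," while the default against being a status symbol is deleted because it contradicts the second conjunct.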
We also believe that one could not present a similarly natural account of the semantics of &quot;but&quot; in traditional logics, because classical logics withstand contradictions with great difficulty. Contradiction, however, is precisely what &quot;but&quot; expresses. Notice that certain types of coordinating conjunctions often have their counterparts in classical logic: copulative (and, or, neither-nor, besides, sometimes etc.), disjunctive (like either-or), illative (hence, for that reason). Others, such as explanatory (namely or viz.) or causal (for) conjunctions can probably be expressed somehow, for better or worse, within a classical framework. Thus the class of adversative conjunctions (but, still, and yet, nevertheless) is in that sense unique.</Paragraph> </Section> </Section> </Paper>