<?xml version="1.0" standalone="yes"?>
<Paper uid="J91-2003">
  <Title>Semantics of Paragraphs Wlodek Zadrozny *</Title>
  <Section position="2" start_page="0" end_page="175" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Logic and knowledge have been often discussed by linguists. Anaphora is another prominent subject in linguistic analyses. Not so frequently examined are different types of cohesion. And it is quite rare to find the word &amp;quot;paragraph&amp;quot; in articles or books about natural language understanding, although paragraphs are grammatical units and units of discourse. But it is possible to speak formally about the role of background knowledge, cohesion, coherence and anaphora--all within one, flexible and natural, logical system--if one examines the semantic role of the linguistic construct called a paragraph.</Paragraph>
    <Paragraph position="1"> Paragraphs have been sometimes described, rather loosely, as &amp;quot;units of thought.&amp;quot; We establish a correspondence between them and certain types of logical models, thereby making the characterization of paragraphs more precise. The correspondence gives us also an opportunity to identify and attack--with some success, we believe-three interesting and important problems: (1) how to define formally coherence and topic, (2) how to resolve anaphora, and (3) what is the formal meaning of linkages (connectives) such as but, however, and, certainly, usually, because, then, etc. These questions are central from our point of view because: (1) the &amp;quot;unity&amp;quot; of a paragraph stems from its coherence, while the &amp;quot;aboutness&amp;quot; of thought can be, at least to some extent, described as existence of a topic; (2) without determining reference of pronouns and phrases, the universes of the models are undefined; and (3) the linkages, which make sentences into paragraphs, have semantic roles that must be accounted for. We can explain then the process of building a computational model of a paragraph (a p-model) as an interaction between its sentences, background knowledge to which these sentences refer, and metatheoretical operators that indicate types of permitted models.</Paragraph>
    <Paragraph position="2"> * P.O. Box 704, Yorktown Heights, NY 10598 (~ 1991 Association for Computational Linguistics Computational Linguistics Volume 17, Number 2 At this point the reader may ask: what is so special about paragraphs; does all this mean that a chapter, a book or a letter do not have any formal counterparts? We believe they do. But we simply do not yet know how corresponding formal structures would be created from models of paragraphs. ~Ib answer this question we may need more advanced theories of categorization and learning than exist today. On the other hand, the paragraph is the right place to begin: it is the next grammatical unit after the sentence; connectives providing cohesion operate here, not at the level of an entire discourse; and it is the smallest reasonable domain of anaphora resolution. Furthermore, we will argue, it is the smallest domain in which topic and coherence can be defined.</Paragraph>
    <Paragraph position="3"> The formalization of paragraph structure requires the introduction of a new type of logical theory and a corresponding class of models. As we know, the usual logical structures consist of an object level theory T and provability relation F-; within the context of the semantics of natural language, the object theory contains a logical translation of the surface form of sentences, and F- is the standard provability relation (logical consequence). In mathematical logic, this scheme is sometimes extended by adding a metalevel assumption, for instance postulating the standardness of natural numbers. In artificial intelligence, a metarule typically, the closed world assumption of circumscription---can be used in dealing with theoretical questions, like the frame problem. But a formal account of natural language understanding requires more. It requires at least a description (a) of how background knowledge about objects and relations that the sentences describe is used in the process of understanding, and (b) of general constraints on linguistic communications, as expressed for instance in Gricean maxims. It is well known that without the former it is impossible to find references of pronouns or attachments of prepositional phrases; background knowledge, as it turns out, is also indispensable in establishing coherence. We have then reasons for introducing a new logical level--a referential level R, which codes the background knowledge. As for Gricean maxims, we show that they can be expressed formally and can be used in a computational model of communication. We include them in a metalevel M, which contains global constraints on models of a text and definitions of meta-operators such as the conjunction but. We end up with three-level logical theories (M, T, R, ~-R + M), where a provability relation ~-~ + M, based on R and M, can be used, for example, to establish the reference of pronouns.</Paragraph>
    <Paragraph position="4"> This work is addressed primarily to our colleagues working on computational models of natural language; but it should be also of interest to linguists, logicians, and philosophers. It should be of interest to linguists because the notion that a paragraph is equivalent to a model is something concrete to discuss; because p-models are as formal as formal languages (and therefore something satisfyingly theoretical to argue about); and because new directions for analysis are opened beyond the sentence. The work should be of interest to logicians because it introduces a new type of three-level theory, and corresponding models. The theory of these structures, which are based on linguistic constructs, will differ from classical model theory--for instance, by the fact that names of predicates of an object theory matter, because they connect the object theory with the referential level. This work should be of interest to philosophers for many of the same reasons: it makes more sense to talk about the meaning of a paragraph than about the meaning of a sentence. The following parallel can be drawn: a sentence is meaningful only with respect to a model of a paragraph, exactly as the truth value of a formula can be computed only with respect to a given model.</Paragraph>
    <Paragraph position="5"> Moreover, it is possible in this framework to talk about meaning without mentioning the idea of possible worlds. However, we do not identify meaning with truth conditions; in this paper, the meaning of a sentence is its role in the model of the paragraph  Zadrozny and Jensen Semantics of Paragraphs in which this sentence occurs. Our intuitive concept of meaning is similar to Lakoff's (1987) Idealized Cognitive Model (ICM). Needless to say, we believe in the possibility of formalizing ICMs, although in this paper we will not try to express, in logic, prototype effects, metaphors, or formal links with vision.</Paragraph>
    <Paragraph position="6"> The paper is presented in six sections. In Section 2, we discuss the grammatical function of the paragraph and we show, informally, how a formal model of a paragraph might actually be built. In Section 3 we give the logical preliminaries to our analysis. We discuss a three-part logical structure that includes a referential level, and we introduce a model for plausible meaning. Section 4 discusses paragraph coherence, and Section 5 constructs a model of a paragraph, a p-model, based on the information contained in the paragraph itself and background information contained in the referential level R. Section 5 further motivates the use of the referential level, showing how it contributes to the resolution of anaphoric reference. In Section 6, we broaden our presentation of the metalevel, introducing some metalevel axioms, and sketching ways by which they can be used to reduce ambiguity and construct new models. We  also show metalevel rules for interpreting &amp;quot;but.&amp;quot; 2. The Paragraph as a Discourse Unit</Paragraph>
    <Section position="1" start_page="172" end_page="173" type="sub_section">
      <SectionTitle>
2.1 Approaches to Paragraph Analysis
</SectionTitle>
      <Paragraph position="0"> Recent syntactic theory--that is, in the last 30 years--has been preoccupied with sentence-level analysis. Within discourse theory, however, some significant work has been done on the analysis of written paragraphs. We can identify four different linguistic approaches to paragraphs: prescriptivist, psycholinguist, textualist, and discourseoriented. null The prescriptivist approach is typified in standard English grammar textbooks, such as Warriner (1963). In these sources, a paragraph is notionally defined as something like a series of sentences that develop one single topic, and rules are laid down for the construction of an ideal (or at least an acceptable) paragraph. Although these dictates are fairly clear, the underlying notion of topic is not.</Paragraph>
      <Paragraph position="1"> An example of psycholinguistically oriented research work can be found in Bond and Hayes (1983). These authors take the position that a paragraph is a psychologically real unit of discourse, and, in fact, a formal grammatical unit. Bond and Hayes found three major formal devices that are used, by readers, to identify a paragraph: (1) the repetition of content words (nouns, verbs, adjectives, adverbs); (2) pronoun reference; and (3) paragraph length, as determined by spatial and/or sentence-count information.</Paragraph>
      <Paragraph position="2"> Other psycholinguistic studies that confirm the validity of paragraph units can be found in Black and Bower (1979) and Haberlandt et al. (1980).</Paragraph>
      <Paragraph position="3"> The textualist approach to paragraph analysis is exemplified by E. J. Crothers. His work is taxonomic, in that he performs detailed descriptive analyses of paragraphs. He lists, classifies, and discusses various types of inference, by which he means, generally, &amp;quot;the linguistic-logical notions of consequent and presupposition&amp;quot; Crothers (1979:112) have collected convincing evidence of the existence of language chunks--real structures, not just orthographic conventions--that are smaller than a discourse, larger than a sentence, generally composed of sentences, and recursive in nature (like sentences).</Paragraph>
      <Paragraph position="4"> These chunks are sometimes called &amp;quot;episodes,&amp;quot; and sometimes &amp;quot;paragraphs.&amp;quot; According to Hinds (1979), paragraphs are made up of segments, which in turn are made up of sentences or clauses, which in turn are made up of phrases. Paragraphs therefore give hierarchical structure to sentences. Hinds discusses three major types of paragraphs, and their corresponding segment types. The three types are procedural (how-to), expository (essay), and narrative (in this case, spontaneous conversation). For each type,  Computational Linguistics Volume 17, Number 2 its segments are distinguished by bearing distinct relationships to the paragraph topic (which is central, but nowhere clearly defined). Segments themselves are composed of clauses and regulated by &amp;quot;switching&amp;quot; patterns, such as the question-answer pattern and the remark-reply pattern.</Paragraph>
    </Section>
    <Section position="2" start_page="173" end_page="175" type="sub_section">
      <SectionTitle>
2.2 Our View of Paragraphs: An Informal Sketch
</SectionTitle>
      <Paragraph position="0"> Although there are other discussions of the paragraph as a central element of discourse (e.g. Chafe 1979, Halliday and Hasan 1976, Longacre 1979, Haberlandt et al. 1980), all of them share a certain limitation in their formal techniques for analyzing paragraph structure. Discourse linguists show little interest in making the structural descriptions precise enough so that a computational grammar of text could adapt them and use them. Our interest, however, lies precisely in that area.</Paragraph>
      <Paragraph position="1"> We suggest that the paragraph is a grammatical and logical unit. It is the smallest linguistic representation of what, in logic, is called a &amp;quot;model,&amp;quot; and it is the first reasonable domain of anaphora resolution, and of coherent thought about a central topic.</Paragraph>
      <Paragraph position="2"> A paragraph can be thought of as a grammatical unit in the following sense: it is the discourse unit in which a functional (or a predicate-argument) structure can be definitely assigned to sentences/strings. For instance, Sells (1985, p. 8) says that the sentence &amp;quot;Reagan thinks bananas,&amp;quot; which is otherwise strange, is in fact acceptable if it occurs as an answer to the question &amp;quot;What is Kissinger's favorite fruit?&amp;quot; The pairing of these two sentences may be said to create a small paragraph. Our point is that an acceptable structure can be assigned to the utterance &amp;quot;Reagan thinks bananas&amp;quot; only within the paragraph in which this utterance occurs. We believe that, in general, no unit larger than a paragraph is necessary to assign a functional structure to a sentence, and that no smaller discourse fragment, such as two (or one) neighboring sentences, will be sufficient for this task. That is, we can ask in the first sentence of a paragraph about Kissinger's favorite fruit, elaborate the question and the circumstances in the next few sentences, and give the above answer at the end. We do not claim that a paragraph is necessarily described by a set of grammar rules in some grammar formalism (although it may be); rather, it has the grammatical role of providing functional structures that can be assigned to strings.</Paragraph>
      <Paragraph position="3"> The logical structure of paragraphs will be analyzed in the next sections. At this point we would like to present some intuitions that led to this analysis. But first we want to identify our point of departure. In order to resolve anaphora and to establish the coherence or incoherence of a text, one must appeal to the necessary background knowledge. Hence, a formal analysis of paragraphs must include a formal description of background knowledge and its usage. Furthermore, this background knowledge cannot be treated as a collection of facts or formulas in some formal language, because that would preclude dealing with contradictory word senses, or multiple meanings.</Paragraph>
      <Paragraph position="4"> Secondly, this background knowledge is not infinite and esoteric. In fact, to a large extent it can be found in standard reference works such as dictionaries and encyclopedias. To argue for these points, we can consider the following paragraph: 1 In the summer of 1347 a merchant ship returning from the Black Sea entered the Sicilian port of Messina bringing with it the horrifying disease that came to be known as the Black Death. It struck rapidly. Within twenty-four hours of infection and the appearance of the first small black pustule came an agonizing death. The effect of the Black Death was appalling. In less than twenty years half  Zadrozny and Jensen Semantics of Paragraphs the population of Europe had been killed, the countryside devastated, and a period of optimism and growing economic welfare had been brought to a sudden and catastrophic end.</Paragraph>
      <Paragraph position="5"> The sentences that compose a paragraph must stick together; to put it more technically, they must cohere. This means very often that they show cohesion in the sense of Halliday (1976)--semantic links between elements. Crucially, also, the sentences of a paragraph must all be related to a topic.</Paragraph>
      <Paragraph position="6"> However, in the example paragraph, very few instances can be found here of the formal grammatical devices for paragraph cohesion. There are no connectives, and there are only two anaphoric pronouns (both occurrences of &amp;quot;it'0. In each case, there are multiple possible referents for the pronoun. The paragraph is coherent because it has a topic: &amp;quot;Black Death&amp;quot;; all sentences mention it, explicitly or implicitly. Notice that resolving anaphora precedes the discovery of a topic. A few words about this will illustrate the usage of background knowledge. By parsing with syntactic information alone, we show that resolution of the first &amp;quot;it&amp;quot; reference hinges on the proper attachment of the participial clause &amp;quot;bringing within it... &amp;quot;. If the &amp;quot;bringing&amp;quot; clause modifies &amp;quot;Messina,&amp;quot; then &amp;quot;Messina&amp;quot; is the subject of '`bringing&amp;quot; and must be the referent for &amp;quot;it.&amp;quot; If the clause modifies &amp;quot;port,&amp;quot; then &amp;quot;port&amp;quot; is the desired referent; if the clause is attached at the level of the main verb of the sentence, then &amp;quot;ship&amp;quot; is the referent.</Paragraph>
      <Paragraph position="7"> But syntactic relations do not suffice to resolve anaphora: Hobbs' (1976) algorithm for resolving the reference of pronouns, depending only on the surface syntax of sentences in the text, when applied to &amp;quot;it&amp;quot; in the example paragraph, fails in both cases to identify the most likely referent NP.</Paragraph>
      <Paragraph position="8"> Adding selectional restrictions (semantic feature information, Hobbs 1977) does not solve the problem, because isolated features offer only part of the background knowledge necessary for reference disambiguation. Later, Hobbs (1979, 1982) proposed a knowledge base in which information about language and the world would be encoded, and he emphasized the need for using &amp;quot;salience&amp;quot; in choosing facts from this knowledge base.</Paragraph>
      <Paragraph position="9"> We will investigate the possibility that the structure of this knowledge base can actually resemble the structure of, for example, natural language dictionaries. The process of finding referents could then be automated.</Paragraph>
      <Paragraph position="10"> Determining that the most likely subject for &amp;quot;bringing,&amp;quot; in the first sentence, is the noun &amp;quot;ship&amp;quot; is done in the following fashion. The first definition for &amp;quot;bring&amp;quot; in W7 (Webster's Seventh Collegiate Dictionary) is &amp;quot;to convey, lead, carry, or cause to come along with one...&amp;quot; The available possible subjects for &amp;quot;bringing&amp;quot; are &amp;quot;Messina,&amp;quot; &amp;quot;port,&amp;quot; and &amp;quot;ship.&amp;quot; &amp;quot;Messina&amp;quot; is listed in the Pronouncing Gazetteer of W7, which means that it is a place (and is so identified in the subtitle of the Gazetteer). So we can substitute the word &amp;quot;place&amp;quot; for the word &amp;quot;Messina.&amp;quot; Then we check the given definitions for the words &amp;quot;place,&amp;quot; &amp;quot;port,&amp;quot; and &amp;quot;ship&amp;quot; in both dictionaries. LDOCE (Longman Dictionary of Contemporary English) proves particularly useful at this point. Definitions for &amp;quot;place&amp;quot; begin: &amp;quot;a particular part of space... &amp;quot;. Definitions for &amp;quot;port&amp;quot; include: &amp;quot;harbour... &amp;quot;; &amp;quot;an opening in the side of a ship... &amp;quot;. But the first entry for &amp;quot;ship&amp;quot; in LDOCE reads &amp;quot;a large boat for carrying people or goods... &amp;quot;. This demonstrates a very quick connection with the definition for the verb &amp;quot;bring,&amp;quot; since the word &amp;quot;carry&amp;quot; occurs in both definitions. It requires much more time and effort to find a connection between &amp;quot;bring&amp;quot; and either of the other two candidate subject words &amp;quot;place&amp;quot; or &amp;quot;port.&amp;quot; Similar techniques can be used to assign &amp;quot;disease&amp;quot; as the most probable referent for the second &amp;quot;it&amp;quot; anaphor in our example paragraph.</Paragraph>
      <Paragraph position="11">  Computational Linguistics Volume 17, Number 2 Equally significant in this instance is the realization that a dictionary points to synonym and paraphrase relations, and thereby verifies the cohesiveness of the passage. Through the dictionary (LDOCE again), we establish shared-word relationships between and among the words &amp;quot;disease,&amp;quot; &amp;quot;Black death,&amp;quot; &amp;quot;infection,&amp;quot; &amp;quot;death,&amp;quot; &amp;quot;killed,&amp;quot; and &amp;quot;end.&amp;quot; Note that there is no other means, short of appealing to human understanding or to some hand-coded body of predicate assertions, for making these relationships. null This demonstrates that information needed to identify and resolve anaphoric references can be found, to an interesting extent, in standard dictionaries and thesauri. (Other reference works could be treated as additional sources of world knowledge.) This type of consultation uses existing natural language texts as a referential level for processing purposes. It is the lack of exactly this notion of referential level that has stood in the way of other linguists who have been interested in the paragraph as a unit. Crothers (1979, p. 112), for example, bemoans the fact that his &amp;quot;theory lacks a world knowledge component, a mental 'encyclopedia,' which could be invoked to generate inferences... &amp;quot;. With respect to that independent source of knowledge, our main contributions are two. First, we identify its possible structure (a collection of partially ordered theories) and make formal the choice of a most plausible interpretation. In other words, we recognize it as a separate logical level--the referential level. Second, we suggest that natural language reference works, like dictionaries and thesauri, can quite often fill the role of the referential level.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>