<?xml version="1.0" standalone="yes"?>
<Paper uid="W91-0223">
  <Title>The Autonomy of Shallow Lexical Knowledge</Title>
  <Section position="3" start_page="0" end_page="267" type="metho">
    <SectionTitle>
2 Theoretical Considerations
</SectionTitle>
    <Paragraph position="0"> The first question to consider is where and how the line is drawn between linguistic knowledge and knowledge in general. Arguments from both psychology and theoretical linguistics are relevant here. Cognitive psychologist Fodor (1987) argues that linguistic knowledge is contained in a module which automatically (deterministically) processes the speech input and produces a logical form without any access to other cognitive components.</Paragraph>
    <Paragraph position="1"> In this modularity hypothesis, other cognitive components have no access to the intermediate representations produced in the linguistic module. Its formal properties are thought to be the result of innate features of human intelligence. It operates in a mandatory way without access to "central processes." It does not "grade off insensibly into inference and appreciation of context" (1987). One of the chief arguments for the autonomy of the linguistic module is the speed with which it must operate. In contrast, inference is assumed to be open-ended and time-consuming.</Paragraph>
    <Paragraph position="2"> While many agree that syntax has special formal properties which indicate the existence of a distinct module which operates under the constraints reflected in these formal properties, its autonomy is controversial. In particular, Marslen-Wilson and Tyler (1987) have conducted a number of experiments to show that the production of a discourse model which may require pragmatic knowledge is just as fast as the production of logical form, and that there is influence from the discourse level on syntactic choice. The experiment showing the latter point involved a contrast between adjective and gerund readings of expressions like "visiting relatives". They had subjects read texts with either an adjectival or gerund bias to establish a context, as in: Adjectival Bias: If you want a cheap holiday, visiting relatives ...</Paragraph>
    <Paragraph position="3"> Gerund Bias: If you have a spare bedroom, visiting relatives...</Paragraph>
    <Paragraph position="4"> They gave subjects a probe of "is" or "are" in these contexts. Response times for "is" were significantly slower in the gerund bias context, and response times for "are" were significantly slower in the adjectival bias case. Thus there are context effects at the earliest point that can be measured. Since central processes are already demonstrably active during processing of the third word in these ambiguous clauses, Marslen-Wilson and Tyler argue that the empirical facts are just as consistent with a model which "allows continuous or even predictive interactions between levels" as they are with a hypothesis in which an encapsulated module produces all possible readings for a higher-level module which sorts them out using inference. They suggest a model in which an independent syntactic processor and an inferencing mechanism contribute in parallel to the construction of a discourse model. Where syntax fails to determine an element unambiguously, the inferencing mechanism fills the gap or makes the choice.</Paragraph>
    <Paragraph position="5"> Let us assume a Marslen-Wilson type model in which pragmatics affects sentence interpretation and disambiguation. The pragmatics they consider ranges from selectional restrictions to knowledge that visitors stay in spare bedrooms. Whether or not this knowledge is linguistic knowledge is a question for theoretical linguistics. Is knowledge linguistic only if it is an element of the machinery required for syntactic processing? Another possibility is that the world knowledge required for interpretation is restricted in principled, predictable ways. Such constraints would indicate a level or module responsible for providing just the information required for linguistic interpretation. This could be called lexical semantic knowledge, as distinguishable from knowledge in general. A separate issue is whether such knowledge has a form which is different from the cognitive models which are the output of linguistic interpretation or from memory. It is possible that lexical semantics could provide the world knowledge necessary for interpretation via a specialized formal language, or via representations which are employed in other types of inference. My claim is that there is an identifiable, constrained layer of linguistic semantic knowledge, but that its form does not differ from the form of general conceptual knowledge.</Paragraph>
    <Paragraph position="6"> Evidence for constraints upon the world knowledge used during sentence processing comes from both psycholinguistic research and my own work in computational linguistics.</Paragraph>
    <Paragraph position="7"> The protracted debate over the existence of semantic primitives resulted in their ultimate rejection and provided evidence that lexical knowledge does not differ from other knowledge in form of representation. Addressing first the question of form, I will sketch the debate and its outcome.</Paragraph>
    <Paragraph position="8"> The classical theory of word meaning decomposed words into semantic primitives which had the force of truth conditions. The word water meant "clear, tasteless, liquid". A sentence such as "That is water" was true only of a substance for which the predicates "clear", "tasteless" and "liquid" were true. The implication was that word meanings were required to have the force of scientific theories. True sentences couldn't be uttered unless the speakers had knowledge of the true properties of objects. This led Putnam (1975) and others to separate the theory of word meaning from the theory of reference. The relationship between sentences and states of the world was given by a theory of reference. The theory of word meaning became a theory of the knowledge required to be competent in a language, and this knowledge was of prototypes of objects.</Paragraph>
    <Paragraph position="9"> Convergent with this development in the philosophy of language, Rosch (1976) and other cognitive psychologists questioned the assumption that conceptual knowledge took the form of semantic primitives. They found that categories have gradient properties, rather than the all-or-none membership predicted by the classical theory. As a result of these findings, the assumption that word meanings are decomposable into a small fixed set of primitives has been rejected (Smith and Medin, 1981). Where does that leave lexical semantics? Fillmore (1985) has proposed that word meanings are frames of default knowledge. Recent studies in cognitive psychology show that some concepts involve theories of the world (Murphy and Medin, 1985). Building on these findings, Dahlgren (1988) suggests that word meanings are naive theories of objects and events.</Paragraph>
    <Paragraph position="10"> The theory of lexical semantics proposed in Dahlgren (1988), naive semantics, takes lexical representations as concepts. A word names a concept, and also plays a role in a discourse model which can be subjected to formal reasoning for purposes of determining the truth conditions of a discourse. However, the concept the word names constitutes a naive theory of the objects of reference, so that reasoning with word meanings must be non-monotonic. Furthermore, the naive theory has much more information in it than would be included in a representation formed from a stock of semantic primitives. Thus the representation of water includes information such as: Water is typically a clear liquid, you find it in rivers, you find it in streams, you drink it, you wash with it.</Paragraph>
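    <Paragraph> For concreteness, such a naive-semantic entry might be sketched as a set of Prolog feature-value facts, in the spirit of the description above. This is only an illustrative sketch; the predicate name ns/3 and the feature labels are hypothetical, not the actual lexicon format.</Paragraph>
    <Paragraph>
% Hypothetical naive-semantic entry for the noun "water".
% Feature labels are illustrative stand-ins, not Interpretext's actual format.
ns(water, ontology,      [real, natural, inanimate, liquid]).
ns(water, typical_trait, clear).
ns(water, location,      river).
ns(water, location,      stream).
ns(water, function,      drinking).
ns(water, function,      washing).
    </Paragraph>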
    <Paragraph position="11"> Furthermore, the knowledge places objects in a classification scheme (or ontology) which is intended to correspond to English speakers' conceptions of distinctions such as real versus abstract, and animate versus inanimate. The scheme is based upon psycholinguistic evidence, the classes required to represent verb selection restrictions, and the philosophical arguments concerning the distinction between sentients and all other objects. Study of protocols from experiments in the prototype theory reveals patterns of properties in naive theories. For example, artifacts have function, parts and operation features, animals have habitat and behavior features, while roles have function and status features. These patterns, called kind types, form constraints upon the feature types which are evident at nodes in the ontology.</Paragraph>
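    <Paragraph> The notion of a kind type can be given a similarly concrete sketch: each ontological kind licenses a set of feature types, and a feature is admissible for a word only if the word's kind licenses it. The predicates below (kind_type/2, ontology/2, admissible_feature/2) are hypothetical illustrations, not the system's actual encoding.</Paragraph>
    <Paragraph>
% Kind types: constraints on which feature types appear at ontology nodes.
kind_type(artifact, [function, parts, operation]).
kind_type(animal,   [habitat, behavior]).
kind_type(role,     [function, status]).

% Illustrative ontological placements.
ontology(hammer, artifact).
ontology(dog,    animal).
ontology(doctor, role).

% A feature type is admissible for a word if its kind licenses it.
admissible_feature(Word, Feature) :-
    ontology(Word, Kind),
    kind_type(Kind, Features),
    member(Feature, Features).

% ?- admissible_feature(hammer, function).   succeeds
% ?- admissible_feature(dog, status).        fails
    </Paragraph>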
    <Paragraph position="12"> Knowledge of verb meanings consists of the implications of events. Cognitive psychological studies show that verbs are not conceived in terms of classes such as motion or exchange, but rather in terms of the other events surrounding the event the verb denotes (Graesser and Clark, 1985; Trabasso and Sperry, 1985). The typical causes, consequences, goals, instruments, and locations of events are the main components of conceptual knowledge for verbs.</Paragraph>
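    <Paragraph> A verb entry can be sketched in the same illustrative style, using the feature types just named (cause, consequence, goal, instrument, location). Again, the encoding below is a hypothetical sketch rather than the actual lexical representation.</Paragraph>
    <Paragraph>
% Hypothetical naive-semantic entry for the verb "milk":
% the meaning is given by typical surrounding events, not decompositional primitives.
ns(milk, cause,       wants(agent, milk)).
ns(milk, goal,        obtain(agent, milk)).
ns(milk, consequence, has(agent, milk)).
ns(milk, instrument,  hands).
ns(milk, location,    farm).
    </Paragraph>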
    <Paragraph position="13"> When word meanings are identified with conceptual knowledge, a proliferation of mental representational types in the semantic lexicon is predicted. Color concepts have been shown to relate directly to the organization of color perception. Thus this theory predicts that words naming colors have meanings which include mappings to color perceptors.</Paragraph>
    <Paragraph position="14"> Words naming foods have meanings which include taste representations, along with some  verbal elements. Some words are fully represented in terms of other words. (At this stage of computational linguistics, of course, we are in a position to represent only the verbal elements of word meanings.) Thus the main assumptions of naive semantics are that words name concepts which are naive theories of objects and events. The content of these theories is not limited to a set of primitive features. Elements of meaning representations belong to a variety of sensory types. There is no difference in form between word meanings and cognitive representations.</Paragraph>
    <Paragraph position="15"> So far, naive semantics seems most consistent with a model in which there is no distinction between lexical semantics and world knowledge. This is the view of Hobbs et al. (1986), as well as all of the computational linguistic theories which use frames and scripts to encode domain knowledge. In the Hobbs method all of the commonsense knowledge associated with a word is encoded. For example, extremely detailed levels of naive physics are represented with the expression "wear down". However, experience with naive semantics in the development of a computational text understanding system indicates that an extremely shallow layer of knowledge is sufficient to provide the information for successful disambiguation and anaphor resolution. A theory which identifies word meanings as just the knowledge needed for linguistic interpretation flows from this experience.</Paragraph>
    <Paragraph position="16"> Furthermore, a theory which constrains lexical semantic knowledge to a very shallow layer would explain the real-time speed with which the discourse model is constructed.</Paragraph>
    <Paragraph position="17"> Fodor's fear of a universe of fridgeons is groundless. Interpretation does not involve an endless chain of inferences, but instead employs a short sequence of predictable inferences. This must be the case because cognitive psychologists have repeatedly demonstrated that inferences are drawn during discourse interpretation, and we know that many of these inferences are drawn in real time while hearing the utterance, rather than later. McKoon and Ratcliff (1990) have conducted a series of experiments to tease out the differences between the effects of discourse context and test questions in recall experiments, and to trace the time course of interpretive inferences. The experiments separate the variables of time and degree of familiarity of semantic information. They have found that well-known information contributes to interpretive inference within 250 ms of reading a sentence, while less-well-known information contributes only after 650 ms. One experiment involved the following context sentence: The old man loved his granddaughter and she liked to help him with his animals; she volunteered to do the milking whenever she visited the farm.</Paragraph>
    <Paragraph position="18"> When subjects were asked to recognize the word "cow" as having occurred in the sentence, the effect of the typically highly familiar association between cows and milking was evident within 250 ms. However, when the association between the context and test word was not highly familiar, the effect was not observed. Given the following sentence, The director and the cameraman were ready to shoot close-ups when suddenly the actress fell from the 14th story.</Paragraph>
    <Paragraph position="19"> When subjects were asked to recognize the word "dead," the effect of context was not evident after 250 ms (though it was when subjects were given 650 ms). This experiment shows that highly typical information with strong association to words is employed during the construction of the interpretation of a sentence (milking and cows). Information which requires more inferencing is not employed during the interpretation, but can be called upon later (falling and death). McKoon and Ratcliff conclude that "inferences mainly establish local coherence among immediately available pieces of information and there is only minimal encoding of other kinds of inferences." These findings support a theory which isolates the highly associated, highly typical knowledge as linguistic semantic knowledge. Part of the explanation for the shallowness of interpretive inference lies in the cooperative principles of communication. If a speaker doesn't believe a hearer shares a piece of knowledge required for the interpretation, then the speaker will include that information in the utterance. Another part of the explanation for shallowness lies in the intuition that knowledge of one's language includes knowledge of the naive theories of other speakers of the language. We know the culture-wide theory of certain objects, even when we are experts on those objects and have a completely different personal theory. An example would be the word "computer", which is a keyboard, monitor and printer to the non-technician, and is a central processing unit plus peripherals to the technician. The point is that technicians either know the naive theory, or they fail to communicate with non-technicians.</Paragraph>
    <Paragraph position="20"> Thus my claim is that a shallow layer of commonsense knowledge is sufficient to disambiguate and build a discourse model in real time. Furthermore, this shallow layer has a constrained range of feature types, if not of feature values.</Paragraph>
  </Section>
  <Section position="4" start_page="267" end_page="271" type="metho">
    <SectionTitle>
3 Experience with Naive Semantics in Computational Text Understanding
</SectionTitle>
      <Paragraph position="0"> In computational linguistic research, I have been involved in the development of a system which reads and "understands" text sufficiently to answer questions about what the text says and to extract data from the text. The system contains interpretive algorithms which disambiguate the text, assign anaphors to antecedents, and connect events with coherence inferences. Each of these algorithms draws upon lexical semantic information.</Paragraph>
      <Paragraph position="1"> In the course of building the algorithms and testing them against corpora in three domains (geography, finance and terrorism), two results have emerged:  1. All of the algorithms use the same knowledge base of lexical information.</Paragraph>
      <Paragraph position="2"> 2. The algorithms succeed with only a very shallow layer of knowledge.</Paragraph>
      <Paragraph position="3">  In other words, the highly typical, strongly associated knowledge is the knowledge that is used to build an interpretation of just what the text says, as McKoon and Ratcliff found, and this is confirmed in the application of a shallow layer of naive semantics to disambiguation and discourse reasoning tasks. This layer is sufficient for the interpretive components of the system. Three of the components (word sense disambiguation, PP attachment and anaphor resolution) have been tested against large corpora of text, and have been found to prefer the correct interpretation in over 95% of the cases. This statistical result is important because it is always possible to find examples which require more knowledge than is included in the naive semantics. The result shows that a shallow layer of knowledge is sufficient in all but a few real cases. Again, the explanation for this sufficiency is that if the speaker believes that the hearer lacks an element of a naive theory, and that element is necessary for interpretation, the speaker is obligated to express it. Extrapolating to the naive semantic representations for English, the conceptual information subjects produce when asked to rapidly volunteer characteristics of objects and implications of events tends to be shared across a subculture. If information is not widely shared, speakers tend to state it explicitly. The use of naive semantic information containing only the shared knowledge has resulted in broad success statistically in the disambiguation and discourse algorithms; this would be the expected outcome if we assume that the writers of the test corpora followed the cooperative principle.</Paragraph>
      <Paragraph position="4"> The text understanding system Interpretext has been under development for over five years. The early system parsed English text, producing one parse per sentence. This parse was then subjected to disambiguation algorithms which reformed the parse to correctly attach prepositional phrases and disambiguate word senses. (At present we are building a new wide coverage parser which will use naive semantic information to disambiguate structure during the parse, reflecting adherence to a model in which the parser has access to and uses lexical semantics). The formal semantic component of the system translates the disambiguated parse into a Discourse Representation Structure (DRS) (Kamp, 1981). Each new sentence adds new predicates to the DRS. A discourse reasoning module finds the antecedents of anaphors (Dahlgren, Lord and McDowell, 1990) and assigns coherence relations between discourse events (Dahlgren, 1989). The resulting representation is a shallow cognitive model of the text content. It represents only the inferences which must be drawn in order to ensure that one syntactic structure is selected, that word senses are disambiguated, that the individuals or events which are the same are given the same reference markers, and that each discourse event is connected to some other in the discourse. The cognitive model is translated to first order logic, and thence to Prolog. Text retrieval is accomplished with a standard Prolog problem-solver.</Paragraph>
      <Paragraph position="5"> To illustrate the functioning of Interpretext, consider the following short text and the cognitive model it produces (Figure 1): Guatemala was charged by the US with terrorist attacks. They cited treatment of suspected guerrillas.</Paragraph>
      <Paragraph position="6"> The parser produces a labelled bracketing for the first sentence which has the prepositional phrase "with terrorist attacks" attached to the noun phrase dominating "Guatemala". The disambiguation step finds that the prepositional phrase "with terrorist attacks" modifies the verb "charge", rather than the object noun phrase. In addition, word senses are chosen: the legal sense of "charge" rather than the monetary or physical senses, the social sense of "treatment" rather than the medical sense, and the social sense of "attack" over the physical or medical sense. The formal semantic module translates the disambiguated parse into a DRS. The DRS has a set of reference markers, which stand for each of the entities and events or other abstract types which have been introduced into the discourse (e1, a1, etc., in Figure 1), and a set of conditions, which stand for the relations and properties of these entities asserted by the discourse. The DRS provides a framework for interpretation of discourse semantics, such as pronoun resolution. After parsing and semantic translation of the second sentence, the anaphor resolution module identifies "they" with the US rather than Guatemala or "attacks". The coherence relation module assigns the coherence relation of "constituency" between the events in the two sentences, so that "citing" is seen as part of "charging". Temporal equations place the charging and citing within the same time interval, r1. The resulting representation is a cognitive model. It is a collection of predicates derived from the text itself expressing the properties of the entities introduced in the text, relations between them, and added inferred coherence relations between the segments of the text. All of the components of this analysis are presently prototyped and running in Prolog. A number of implemented formal semantic treatments such as the handling of plurals, modal contexts, questions, and negation are not shown in the example.</Paragraph>
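      <Paragraph> Since Figure 1 is not reproduced here, the following Prolog facts are offered as a hypothetical rendering of what such a cognitive model might look like once translated for the problem-solver; the predicate names and exact conditions are illustrative assumptions, not the system's actual output.</Paragraph>
      <Paragraph>
% Hypothetical cognitive model for the Guatemala text.
% e1, e2 are event markers; g, u are entity markers; r1 is a time interval.
entity(g, guatemala).
entity(u, us).
event(e1, charge(u, g, terrorist_attacks)).
event(e2, cite(u, treatment(suspected_guerrillas))).
coherence(constituency, e2, e1).   % the citing is part of the charging
time(e1, r1).
time(e2, r1).                      % same time interval

% Example retrieval query: who charged Guatemala?
% ?- event(_, charge(Agent, g, _)).
%    Agent = u.
      </Paragraph>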
      <Paragraph position="9"> The naive semantics which is needed for the algorithms is limited to certain feature types. In general, ontological knowledge is used everywhere, especially the sentient/nonsentient distinction. This is because many verbs have selectional restrictions involving sentients, and verb selectional restrictions are frequently used in the disambiguation algorithms as well as in anaphor resolution. As for the generic knowledge for verbs, the "cause", "goal", "consequence", and "instrument" features are used by all of the algorithms. For nouns, the features "function", "rolein", "partof", "haspart", "sex", "tool" and some others are used by the algorithms, but others, like "exemplar" and "internal trait", are not.</Paragraph>
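      <Paragraph> As a small illustration of how the sentient/nonsentient distinction can drive a selectional-restriction check, consider the sketch below; the predicates are hypothetical and only indicate the kind of test the algorithms rely on.</Paragraph>
      <Paragraph>
% Illustrative selectional-restriction check using the ontology.
sentience(doctor, sentient).
sentience(guerrilla, sentient).
sentience(rifle, nonsentient).

% Hypothetical restriction: the agent of "cite" must be sentient.
selectional_restriction(cite, agent, sentient).

satisfies_restriction(Verb, Role, Noun) :-
    selectional_restriction(Verb, Role, Type),
    sentience(Noun, Type).

% ?- satisfies_restriction(cite, agent, doctor).   succeeds
% ?- satisfies_restriction(cite, agent, rifle).    fails
      </Paragraph>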
      <Paragraph position="10"> Interpretext contains algorithms for structural and word sense disambiguation which use naive semantics. In this section two algorithms are cited to illustrate the power of shallow naive semantics in a computational text understanding task. As explained above, all language understanding occurs in the context of some knowledge. Within a subculture there is a body of common knowledge that is shared among participants. There is a relatively shallow layer of that common knowledge ("lexical semantic knowledge") which the hearer/reader employs in discourse interpretation; this shallow knowledge is accessed as "default values" in the absence of relevant contextual information to the contrary. Two processes central to discourse interpretation are anaphor resolution and the structural interpretation of prepositional phrases. The following examples illustrate the use of lexical semantic knowledge as a default in prepositional phrase disambiguation and anaphor resolution. Sentences with prepositional phrases are well-known for their multiple possibilities for syntactic interpretation. Consider: (1) Radio Marti reports that guerrillas are shooting villagers with Chinese rifles.</Paragraph>
      <Paragraph position="11"> The complement clause is syntactically ambiguous. Plausible interpretations for the prepositional phrase "with Chinese rifles" include: (1a) Guerrillas are using rifles to shoot villagers.</Paragraph>
      <Paragraph position="12"> (1b) Villagers who have Chinese rifles are being shot by guerrillas.</Paragraph>
      <Paragraph position="13"> If (1) is the first line of a news story, the most likely interpretation is (1a). People know that shooting is typically done with guns, and that guerrillas are probably more likely to have guns than villagers are. However, suppose the same clause occurs in another news story but in a different immediate linguistic context: (2) Radio Marti reports that Chinese rifles have been given to villagers cooperating with the government. In retaliation, guerrillas are shooting villagers with Chinese rifles.</Paragraph>
      <Paragraph position="14"> Here the text tells the reader that villagers have rifles. The immediate salience of this fact overrides the general knowledge expectation about who is more likely to have guns, making it more likely that the reader will choose interpretation (1b). The default interpretation favors VP attachment for the prepositional phrase, but the context in (2) favors NP attachment. If a speaker/writer suspects that the hearer/reader might have difficulty interpreting a message, the speaker/writer usually provides clarifying information according to a principle of cooperation in discourse. Consequently, where a correct interpretation goes against the expected default interpretation, there are usually contextual cues. In (2), the first sentence "sets the stage" so that the VP attachment default is overridden. These assumptions about lexical semantic knowledge and sentence interpretation are built into the Interpretext system. The idea that shooting is typically done with guns is part of the naive semantic knowledge encoded in the lexical entry for the verb "shoot". A rifle is identified as a gun, and a feature in the entry for "rifle" indicates that a typical operation performed with a rifle is shooting. The knowledge that guerrillas typically use guns is part of the naive semantic knowledge about guerrillas; the lexical entry for "villager" does not mention guns. The representation of this shallow level of knowledge is sufficient for the Interpretext system to choose interpretation (1a), VP attachment, for (1). This knowledge would also favor an incorrect VP attachment interpretation for (2), unless the system recognizes that, as a result of having been given rifles, the villagers now have them, and a discourse entity of "villagers having Chinese rifles" is established and available for access in the next sentence.</Paragraph>
      <Paragraph position="15"> In the Interpretext system, the shallow knowledge in the lexical entry for the verb "give" includes the fact that, as a consequence of the event of giving, the Recipient has the Object--i.e., the villagers have rifles. Thus, the shallow knowledge about the consequences of "giving" in one sentence can be used to override the knowledge about rifles and shooting in the next sentence (reasoning of this sort has not yet been implemented in the Interpretext system, but it is entirely feasible). It follows from the principle of cooperation that inferences established from the interpretation of the previous linguistic context will be favored by the system if they conflict with default inferences.</Paragraph>
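      <Paragraph> The default side of this behavior can be sketched as a simple preference rule over naive-semantic facts. The sketch below illustrates the kind of knowledge and test involved; it is not the actual attachment algorithm, the predicate names are hypothetical, and the contextual override discussed above is not modelled here.</Paragraph>
      <Paragraph>
% Shallow facts of the sort described in the text (names illustrative).
ns(shoot, instrument, gun).                        % shooting is typically done with guns
isa(rifle, gun).
ns(rifle, operation, shoot).                       % a typical operation with a rifle
ns(guerrilla, tool, gun).                          % guerrillas typically use guns
ns(give, consequence, has(recipient, object)).     % consequence feature of "give"

% Prefer VP attachment of a "with NP" phrase when the NP names a typical
% instrument of the verb; otherwise fall back to NP attachment.
prefer_attachment(Verb, _ObjNoun, PrepNoun, vp) :-
    ns(Verb, instrument, Instr),
    ( PrepNoun = Instr ; isa(PrepNoun, Instr) ),
    !.
prefer_attachment(_Verb, _ObjNoun, _PrepNoun, np).

% ?- prefer_attachment(shoot, villager, rifle, A).   A = vp, as in (1a).
      </Paragraph>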
      <Paragraph position="16"> The principle of cooperation also accounts for contextual information overriding default knowledge in anaphora resolution, as in the following examples: (3) The doctor looked up and recognized the nurse. She smiled.</Paragraph>
      <Paragraph position="17"> Plausible interpretations of (3) include: (3a) The nurse smiled--i.e., "she" = the nurse.</Paragraph>
      <Paragraph position="18"> (3b) The doctor smiled--i.e., "she" = the doctor.</Paragraph>
      <Paragraph position="19"> Although doctors can be men or women, and nurses can be men or women, the current typical default for these roles is to expect doctors to be men and nurses to be women, favoring interpretation (3a). However, these expectations can be altered by previous discourse, as in (4).</Paragraph>
      <Paragraph position="20"> (4) Nurse Roger Smith was nervous as he entered Dr. Mary Brown's office.</Paragraph>
      <Paragraph position="21"> The doctor looked up and recognized the nurse. She smiled.</Paragraph>
      <Paragraph position="22"> In (4) the default interpretation (3a) is overridden by the information in the previous sentence.</Paragraph>
      <Paragraph position="23"> Shallow information encoded in the Interpretext lexical entries makes possible the correct default interpretation for (3): the entry for "doctor" includes the information that doctors are typically (but not inherently) male, and the entry for "nurse" specifies that nurses are typically (but not inherently) female. For discourse (4), the (3a) interpretation needs to be overridden by identifying the definite noun phrase anaphors "doctor" and "nurse" with their respective antecedents, and by accessing shallow lexical knowledge about names indicating that a "Roger" is typically male and a "Mary" is typically female, so that "she" can be only Dr. Mary Brown. A shallow level of lexical semantic knowledge provides enough information to correctly interpret (1) and (3), but in (2) and (4) this information is overridden by inferences from the shallow information in the immediately preceding context.</Paragraph>
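      <Paragraph> The interaction of gender defaults and contextual overrides can likewise be sketched with a few Prolog clauses. As before, the predicates are hypothetical illustrations of the reasoning described above, not the actual anaphor resolution module.</Paragraph>
      <Paragraph>
% Lexical gender defaults: typical, not inherent.
ns(doctor, sex, male).
ns(nurse,  sex, female).

% Gender of a discourse entity: a contextually established value, if any,
% overrides the lexical default.
gender(Entity, Context, Gender) :-
    member(sex(Entity, G0), Context), !, Gender = G0.
gender(Entity, _Context, Gender) :-
    ns(Entity, sex, Gender).

% "She" prefers an antecedent whose gender resolves to female.
resolve_she(Candidates, Context, Antecedent) :-
    member(Antecedent, Candidates),
    gender(Antecedent, Context, female).

% Discourse (3): ?- resolve_she([doctor, nurse], [], A).
%                A = nurse (lexical defaults only).
% Discourse (4): ?- resolve_she([doctor, nurse],
%                     [sex(doctor, female), sex(nurse, male)], A).
%                A = doctor (context from the names overrides the defaults).
      </Paragraph>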
      <Paragraph position="24"> Lexical-level shallow knowledge is sufficient for correct interpretation of most instances of sentence structure and anaphor resolution--the immediate representation of text meaning. It is less likely to be adequate for remote bridging inferences. In example (5), shallow naive semantics can bridge from "go" to transportation as an instrument of going, and from transportation to "car" because the inherent function of a car is transportation. However, in (6), shallow knowledge would not be sufficient to bridge from "pregnant" to "surprise" to "swallow gum".</Paragraph>
      <Paragraph position="25">  (5) Ed decided to go to the movies. He couldn't find his car keys.</Paragraph>
      <Paragraph position="26"> (6) Susan told Ralph that she was pregnant. He swallowed his gum.</Paragraph>
  </Section>
</Paper>