<?xml version="1.0" standalone="yes"?>
<Paper uid="P85-1039">
  <Title>THE USE OF SYNTACTIC CLUES IN DISCOURSE PROCESSING</Title>
  <Section position="4" start_page="0" end_page="320" type="metho">
    <SectionTitle>
THE DISCOURSE STRUCTURE OF NEWS REPORTS
</SectionTitle>
    <Paragraph position="0"> The syntactic rules used by DUMP work because of the predictable, almost formulaic discourse structure of hard news reports.* Two journalistic devices above all else characterize hard news: the inverted pyramid and the block paragraph (Green, 1979). The inverted pyramid refers to the convention of relating the most important facts of a news story in the first paragraph, followed by less important information given in descending order (or, it may be argued, random order) of importance. Thus, the news differs markedly from canonical story form, in which material is given in chronological order. * Features, sports reports, and so forth have their own discourse structure.</Paragraph>
    <Paragraph position="1"> The block paragraph, the second device, is one which stands independent of the paragraphs adjacent to it. This unit contains no logical connectives (however, in addition, moreover) which link it to preceding or following paragraphs. The avoidance of such connectives allows the newspaper editor to quickly delete paragraphs from a story in the morning edition to fit it into the evening edition without rewriting. The block paragraph is short: over sixty percent of the paragraphs in the corpus are only one sentence long; about one-half have two sentences, and less than one percent have three sentences. The effect is that most sentences of the report are presented at the same level of importance: there is no orthographic unit larger than the sentence which reliably indicates that a group of sentences is related topically or episodically. In place of the normal paragraph, we shall see, is a highly reliable level of syntactic coding which links sentences into episodes.</Paragraph>
    <Paragraph position="2"> At a lower level of organization than the inverted pyramid and block paragraph are the two discourse units which DUMP relies on: the episode, and within the episode, the information field as found in the detached clause.</Paragraph>
    <Paragraph position="3"> News reports may contain more than one episode.</Paragraph>
    <Paragraph position="4"> A new episode begins when the set of characters and/or the setting (temporal or geographical) changes. The detached clause is defined intonationally: it is bounded by pauses, has falling intonation at the end, or is preceded by a clause with falling intonation (Thompson, 1983). This clause is almost always set off in text with commas. So, for example, the following sentence from the ninth story in the corpus (&quot;Arafat Forces Lose Key Position,&quot; Boston Globe, November 7, 1983) consists of four detached clauses, or information fields: (9:3)* Arafat's soldiers, who resisted the assault, fell back six miles to Beddawi, the remaining PLO stronghold in the area, and Nahr el Bared is now surrounded by Syrian soldiers .... * The first number indicates the story in the corpus, the second the number of the sentence within that story.</Paragraph>
    <Paragraph position="5"> The information fields here are: a nonrestrictive relative clause (&quot;who resisted the assault&quot;), an appositive (&quot;the remaining PLO stronghold in the area&quot;), and two main clauses (&quot;Arafat's soldiers fell back...&quot; and &quot;Nahr el Bared is now surrounded...&quot;).</Paragraph>
    <Paragraph position="6"> There are a small number of syntactic forms which reliably indicate the beginning of new episodes. Likewise, there is a strong correlation between the category of information the journalist conveys in each detached clause and the syntactic structures used for its expression.</Paragraph>
    <Paragraph position="7"> For example, the nonrestrictive relative clause in 9:3 expresses background events, the appositive expresses an identification of place, and the two main clauses express a main event and a current state, respectively. The next two sections will look at the syntactic correlates of the information field and the episode boundary in detail.</Paragraph>
    <Paragraph position="8"> Syntactic Correlates of the Information Field The syntactic rules used by DUMP reflect grounding principles found universally in discourse (Grimes, 1975). Certain assertional structures in text deliver foreground information, which tells the events of the narrative and moves the story forward. These events comprise a summary of the story. Less assertional structures are used to express background, supportive information which fleshes out the skeleton provided in the foreground but does not move the action forward. There is a strong correlation between the syntactic form and information type of this supportive material which allows DUMP to subcategorize it into the following classes: past events and processes leading up to the most recent development in the story; plans for the future; current state of the world; information of secondary importance; identifications; import of the story; effects of actions; comments made by participants in the story; and collateral (things which did not happen).</Paragraph>
    <Paragraph position="9"> This division of material into foreground vs. background gives text its texture.</Paragraph>
    <Paragraph position="10"> A narrative in which everything is presented at the same level of prominence tends to be monotonous. One of the chief means of distinguishing foreground from background is tense and aspect, which has been called a sort of flow-of-control mechanism, allowing the reader to pick out the most important parts of a discourse (Hopper, 1979). Sentences with simple past verbs in the active voice are the chief conveyors of foreground material in news.</Paragraph>
    <Paragraph position="11"> This fact recalls the broader concept of transitivity put forth by Hopper and Thompson (1980), whereby certain properties of the verb and its arguments transfer the action from agent to patient more effectively than others. Foregrounded clauses have high transitivity, backgrounded clauses low transitivity.</Paragraph>
    <Paragraph position="12"> High transitivity verbs are kinetic, telic, punctual, volitional, affirmative, and realis.</Paragraph>
    <Paragraph position="13"> Kinetic verbs allow easy transfer of action from subject to object. Throw is therefore kinetic, while the copular to be is not. Telic verbs are those which express an action with a natural endpoint. The verb make in &quot;John is making a chair&quot; is telic, while the verb sing in &quot;John is singing&quot; is not. Telic and atelic verbs can be distinguished by their entailments: if John is interrupted while making a chair, it is not true that he has made a chair, but if he is interrupted while singing, it is still true that he has sung (Comrie, 1976). Punctual verbs (sneeze, kick) refer to actions with no obvious internal structure.</Paragraph>
    <Paragraph position="14"> Study and carry are examples of non-punctual verbs.</Paragraph>
    <Paragraph position="15"> Volitional verbs (&quot;I wrote his name&quot;) have greater transitivity than non-volitional verbs (&quot;I forgot his name&quot;) (Hopper and Thompson, 1980, p. 252). Affirmation distinguishes collateral information from all other types. And finally, the realis mode distinguishes events which have existed from those which only might have or would have. Main event clauses therefore never contain modals. The differential behavior of verbs from these semantic classes has been described by a number of taxonomers (Comrie, 1976; Mourelatos, 1981; Ota, 1963; Vendler, 1967).</Paragraph>
    <Paragraph position="16"> Arguments high in transitivity are those which are strong agents, totally affected and highly individuated. Strong agents are human rather than non-human: &quot;George startled me&quot; has more transitivity than &quot;The picture startled me&quot; (Hopper and Thompson, 1980, p. 252). Objects which are wholly affected lend greater transitivity than those which are only partially affected (&quot;I drank the milk&quot; vs. &quot;I drank some milk&quot;). Likewise, more highly individuated objects, defined as proper, human or animate, concrete, singular, count and definite, add more transitivity than less individuated ones.</Paragraph>
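    <Paragraph> As a rough illustration of how these parameters combine, the Python sketch below counts how many of the nine properties named above take their high-transitivity value. This is not something DUMP computes; the class name, field names, scoring by simple count, and the hand-assigned feature values are all assumptions made for the example.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ClauseFeatures:
    """One flag per property; True marks the high-transitivity value."""
    kinetic: bool
    telic: bool
    punctual: bool
    volitional: bool
    affirmative: bool
    realis: bool
    strong_agent: bool
    object_totally_affected: bool
    object_individuated: bool


def transitivity_score(clause: ClauseFeatures) -> int:
    """Count the properties set to their high-transitivity value."""
    return sum(1 for f in fields(clause) if getattr(clause, f.name))


# Hand-assigned values for illustration only (assumptions, not corpus data).
whole_object = ClauseFeatures(
    kinetic=True, telic=True, punctual=False, volitional=True,
    affirmative=True, realis=True, strong_agent=True,
    object_totally_affected=True, object_individuated=False,
)
# The same clause, except that the object is only partially affected.
partial_object = replace(whole_object, object_totally_affected=False)

print(transitivity_score(whole_object), transitivity_score(partial_object))
```

The two instances differ in a single flag, mirroring the contrast between a wholly and a partially affected object, so the second score is one lower than the first.</Paragraph>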
    <Paragraph position="17"> These transitivity parameters assume a good deal of semantic knowledge about verbs and their arguments. In fact, the affirmative and realis features are the only ones reflected in DUMP's rules. But in another respect, Hopper and Thompson's notion of transitivity must be extended. An examination of tense and aspect alone is not sufficient to distinguish foreground from background in the DUMP corpus. The type of clause in which the verb appears is also crucial. So, for example, the simple past may be used to convey both foreground and background material, depending on the type of clause in which it occurs: in main clauses, it will always convey the most recent events in a story, while in relative clauses, it will always convey past events. The first two sentences of story 6 (&quot;Stone Meets with Salvador Rebel Official,&quot; Boston Globe, August 1, 1983) illustrate the distinct uses of the two clause types.</Paragraph>
    <Paragraph position="18"> (6:1) After weeks of maneuvering and frustration, presidential envoy Richard B. Stone met face-to-face yesterday for the first time with a key leader of the Salvadoran guerrilla movement.</Paragraph>
    <Paragraph position="19"> Here, the simple past is used in a main clause to foreground information.</Paragraph>
    <Paragraph position="20"> (6:2) &quot;The ice has been broken,&quot; proclaimed President Belisario Betancur of Colombia, who engineered the meeting.</Paragraph>
    <Paragraph position="21"> The simple past engineered in a relative clause indicates background material.</Paragraph>
    <Paragraph position="22"> The information-bearing capacities of these two clause types, when they occur with the simple, active past, are in complementary distribution in newswriting. The main clause is more assertional than the relative clause; it is used to give information which the writer assumes the reader is seeing for the first time. The relative clause, on the other hand, is more presuppositional. The writer uses it to convey old information which is of lesser importance or which the reader may already have knowledge of.</Paragraph>
    <Paragraph position="23"> Sentences 6:1 and 6:2 illustrate the way in which syntactic forms provide information which might otherwise need to be culled from world knowledge. We know that the planning of a meeting precedes its occurrence, but no such knowledge is necessary here, since the past verb form in a relative clause signals an event which occurred before the main event.</Paragraph>
    <Paragraph position="24"> The so-called &quot;hot news&quot; present perfect in a main clause (&quot;The president has resigned&quot;) signals a main event if it occurs in the first sentence of a story. Its appearance further down or in a nonmain clause signals information about past events or states. Two sentences from story 16 (&quot;Peronists Suffer Stunning Defeat in Argentine Vote,&quot; New York Times, November 1, 1983) illustrate this. (16:1) The leader of a middle-class party has swept to victory in Argentina's presidential elections ....</Paragraph>
    <Paragraph position="25"> (16:4) The election, called by the ruling military, was a stunning defeat for the Peronists, who have dominated Argentina's political life since their party was founded in 1945 by Juan Domingo Peron.</Paragraph>
    <Paragraph position="26"> In 16:1, the present perfect has swept is used in the hot news sense. In 16:4, the present perfect have dominated is used in a relative clause with an adverbial phrase (&quot;since their party was founded in 1945...&quot;) to describe a state that has existed for decades. Note also that the verb dominate is atelic and non-punctual, and therefore low in transitivity. However, knowledge of the verb's semantic class is not necessary to identify the relative clause as supportive. The mere fact that the verb is in a relative clause or the fact that the present perfect appears after the first sentence suffices.</Paragraph>
    <Paragraph position="27"> Syntactic clues may be used to avoid the need for time programs which determine the relative timing of events by interpreting adverbials. The following main clauses use the present perfect, but since they are non-initial, the states and events referred to in them must have occurred before the main event in the story (&quot;O'Neill Now Calls Grenada Invasion 'Justified' Action,&quot; New York Times, November 9, 1983).</Paragraph>
    <Paragraph position="28"> (19:5) Pressures to pass a strict 60-day legal limit [to the stay of U.S. troops in Grenada] have eased in the past week.</Paragraph>
    <Paragraph position="29"> (19:6) Both houses have passed such measures, but the Senate version has been bottled up because it was attached to a debt-ceiling bill.</Paragraph>
    <Paragraph position="30"> (19:7) Other versions of the 60-day War Powers Resolution have been introduced but not acted upon.</Paragraph>
    <Paragraph position="31"> The appearance of the present perfect this far  into the story means that the time phrase in the past week does not have to be interpreted by a time program.</Paragraph>
    <Paragraph position="32"> Likewise, the use of the passive simple past in a main clause indicates that the event is supportive material: main events, it turns out, are never expressed with passive voice in the corpus. In story 14 (&quot;U.S. Says Moscow Threatens to Quit Talks on Missiles,&quot; New York Times, October 12, 1983), there is no need to interpret the adverbials in 1980 and in 1979 with a time program, unless relative ordering of background events is desired. The mere presence of the passive marks these events as occurring before the time of the main events in the story.</Paragraph>
    <Paragraph position="33"> (14:8) Talks on a comprehensive test ban of nuclear devices were suspended in Geneva in 1980, and the Geneva negotiations were suspended in 1979.</Paragraph>
    <Paragraph position="34"> Main events then are expressed in main clauses with simple past verbs. Events and states which existed before these main events are expressed with a greater variety of syntactic forms, from main clauses, to relative and subordinate clauses, down to noun phrases (which are not analyzed by DUMP). Nominalizations are perhaps the most frequent conveyors of background information in the news. The nominalization rule transforms a sentence into a noun phrase which can then be inserted into another sentence. It is a highly presuppositional structure, since the subject and object of the original verb are often deleted during the transformation and the reader must then supply these arguments from world knowledge. An example from the second story in the corpus (&quot;Lebanon Needs Israeli Troops, Shultz Told,&quot; Boston Globe, March 14, 1983) shows the heavy use of nominalizations to create a very long prepositional phrase which contains not a single verb: (2:2) In the first high-level contacts between the two governments since the start early this year of US-Israeli-Lebanese negotiations on the withdrawal of Israel's forces from Lebanon, ....</Paragraph>
    <Paragraph position="35"> We will see other uses of nominalization to express other information categories and to refer to episodes with a single word.</Paragraph>
    <Paragraph position="36"> The following incomplete list gives a cursory look at the strong correlation between the remaining information categories in news reports and the syntactic forms used to express them. Most of the examples are from story 6, about envoy Stone's meeting with a Salvadoran guerrilla leader, and story 16, about the defeat of the Peronists in Argentina's elections. The next two categories, Current States and Plans, also locate events or states in time, and therefore must occur in finite clauses. Current States: This category describes the state of the world at the time the report is written. Current states are expressed with simple present or present progressive verbs used in main clauses and in subordinate and relative clauses.</Paragraph>
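    <Paragraph> The grounding rules stated so far can be summarized in a short Python sketch. It is not DUMP's code, which the paper does not list; the function name, argument names, and label strings are assumptions made for illustration, and only the conditions themselves are taken from the text. The negation and modal checks stand in for the affirmative and realis features.

```python
def classify_grounding(clause_type, tense, voice="active",
                       sentence_index=1, has_modal=False, negated=False):
    """Return an information-category label for one finite clause.

    clause_type: "main", "relative", or "subordinate"
    tense: "simple_past" or "present_perfect"
    sentence_index: 1-based position of the sentence within the story
    """
    if negated:
        return "collateral"        # non-affirmative: something that did not happen
    if has_modal:
        return "plan"              # irrealis: main events never contain modals
    if clause_type == "main" and voice == "active":
        if tense == "simple_past":
            return "main event"
        if tense == "present_perfect" and sentence_index == 1:
            return "main event"    # the "hot news" present perfect
    if tense in ("simple_past", "present_perfect"):
        return "past event or state"
    return "unclassified"


# Clauses from the examples above, with hand-assigned features.
examples = {
    "6:1 met": classify_grounding("main", "simple_past"),
    "6:2 engineered": classify_grounding("relative", "simple_past", sentence_index=2),
    "16:1 has swept": classify_grounding("main", "present_perfect", sentence_index=1),
    "19:5 have eased": classify_grounding("main", "present_perfect", sentence_index=5),
    "14:8 were suspended": classify_grounding("main", "simple_past", voice="passive",
                                              sentence_index=8),
}
for clause, label in examples.items():
    print(clause, "->", label)
```

Only the first and third clauses come out as main events; the relative clause, the non-initial present perfect, and the passive all fall through to the background label, without any interpretation of time adverbials.</Paragraph>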
    <Paragraph position="37"> (6:10) Stone has repeatedly sought to meet with political leaders of the Salvadoran left, all of whom live in exile, ....</Paragraph>
    <Paragraph position="38"> (16:11) The country Mr. Alfonsin is due to govern is racked by a deep economic crisis.</Paragraph>
    <Paragraph position="39"> Plans: These may be expressed with appropriate modals (will, ~, would) in the same structures used for Current States.</Paragraph>
    <Paragraph position="40"> (6:10) His mission is to encourage participation by the left in Salvadoran elections, which will probably be held in March 1984.</Paragraph>
    <Paragraph position="41"> (16:10) Military officials said the ruling  junta would consider it in a meeting Tuesday.</Paragraph>
    <Paragraph position="42"> Certain verbs which express present planning (come, go, leave, start) can be used to indicate future time with the present tense: &quot;Fiscal year 1983, which begins Oct. 1 ....&quot;</Paragraph>
    <Paragraph position="43"> It seems to be a discourse principle of journalese that while non-main events may be &quot;promoted&quot; to expression by the most assertive clause type, they may also be expressed with less assertional forms: subordinate and relative clauses, nominalizations, etc. The converse, however, is not true. Main events may never be &quot;demoted&quot; to expression by any other than the most assertive form.</Paragraph>
    <Paragraph position="44"> The remaining information types do not locate actions in time, and therefore are free to appear in constructions without finite verbs.</Paragraph>
    <Paragraph position="45"> Import: This category is occasionally expressed with equative sentences of the form: NP V-be NP. The subject and predicate NPs tend to be nominalizations, with the former referring to the main episode.</Paragraph>
    <Paragraph position="46"> (16:4) The election...was a stunning defeat for the Peronists ....</Paragraph>
    <Paragraph position="47"> Election refers to the main event introduced in 16:1. 16:4 tells why that event is newsworthy.</Paragraph>
    <Paragraph position="48"> Nonrestrictive PPs with nominalizations as heads may also express Import: (4:1) The...Budget Committee, in a major blow to President Ronald Reagan, voted yesterday to hold the real growth in defense spending to 5 percent next year .... (&quot;Senate Panel Trims Reagan Arms Budget,&quot; Boston Globe, April 8, 1983) Identifications: With only one exception, all identifications in the corpus are made with pre-nominal modifiers (&quot;Prime Minister Smith&quot;) or with appositives, which may be embedded recursively: (6:3) ...Stone...talked with Ruben Zamora, the No. 2 leader of the Revolutionary Democratic Front, the political arm of the five Marxist-led guerrilla bands fighting government forces here.</Paragraph>
    <Paragraph position="49"> Effects: Detached participial phrases are used to tell the effects of the actions described in main clauses.</Paragraph>
    <Paragraph position="50"> (16:1) The leader of a middle-class party has swept to victory in Argentina's presidential elections, handing the union-based Peronists their first election defeat in nearly four decades.</Paragraph>
    <Paragraph position="51"> Comments: Comments are simply quotations from people involved in an event. While in other narratives, dialogue is often the chief means of telling a story and moving the action forward, this is not the case in newswriting. Here, quotes from participants add flavor and give supplementary information, but they are never the sole vehicle for informing readers of an event. This is a lucky fact, since the syntactic forms used in quoted speech are usually much less constrained than those in non-quoted portions.</Paragraph>
    <Paragraph position="52"> (16:5) &quot;We are entering a new stage,&quot; the 56-year old Mr. Alfonsin, whose politics are left of center, said in a television interview early today.</Paragraph>
    <Paragraph position="53"> Collateral: News reports tell what did not happen in a story, what events and processes never were, with surprising frequency. This information category is expressed by negations of clauses, including negative existentials, negative subordinate clauses, and various negative prefixes and prenominal modifiers.</Paragraph>
    <Paragraph position="54">  (6:7) Salvadoran officials had no immediate comment on what they heard from Stone ....</Paragraph>
    <Paragraph position="55"> (6:9) Stone had been unable to arrange a meeting with the Salvadoran rebel leaders...earlier this month.</Paragraph>
    <Paragraph position="57"> If it were the case that the correspondence between a syntactic form and the information types it expresses was one-to-many, this relation would not be of much help in automatic processing. In fact, the correspondence is closer to one-to-one, so that, for example, equatives only express import and not identifications, as would be natural in conversational English (&quot;Smith is mayor of the city&quot;).</Paragraph>
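    <Paragraph> Because each form points to a single category, the correspondences listed above for the non-temporal categories can be held in a plain lookup table. The Python sketch below is an illustration under assumptions: the key strings are invented names for the syntactic labels a parser might supply, and the table covers only the forms mentioned in this section.

```python
from collections import defaultdict

# Each syntactic form maps to exactly one information category.
FORM_TO_CATEGORY = {
    "equative_sentence": "import",
    "nonrestrictive_pp_with_nominalization": "import",
    "prenominal_modifier": "identification",
    "appositive": "identification",
    "detached_participial_phrase": "effect",
    "quotation": "comment",
    "negated_clause": "collateral",
}


def label_field(form):
    """Look up the category for a syntactic form, or report no match."""
    return FORM_TO_CATEGORY.get(form, "unclassified")


def forms_by_category(table):
    """Invert the table to list the forms that express each category."""
    inverted = defaultdict(list)
    for form, category in table.items():
        inverted[category].append(form)
    return dict(inverted)


print(label_field("equative_sentence"))
print(forms_by_category(FORM_TO_CATEGORY)["identification"])
```

A category may be reachable from more than one form, but no form is ambiguous, which is what makes a table of this kind usable without further context.</Paragraph>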
    <Paragraph position="58"> DUMP was successful in creating good summaries and labeling the information types for all but two of the twenty-three stories in the corpus. These two exceptions were highly eventful, chronological accounts and DUMP had difficulty distinguishing minor events from major ones. In addition, after the completion of the program, it performed well with a final story not from the corpus.</Paragraph>
    <Paragraph position="59"> Syntactic Correlates of Episode Boundaries About one-third of the stories in the DUMP corpus consist of more than one episode. Story 17, given here with its DUMP-derived analysis of information, contains three minor episodes in addition to the major one introduced in the first sentence of the report. The discussion below of syntactic forms used to indicate episode boundaries will call upon this story for examples.</Paragraph>
    <Paragraph position="60"> Special to the New York Times Washington, Nov. 3 - 1. The Senate today approved by voice vote continued aid for covert operations in Nicaragua. 2. The approval was made contingent upon notification to the intelligence committee of the goals and risks of specific covert projects.</Paragraph>
    <Paragraph position="61"> 3. The action would provide only $19 million  of the $50 million that the Administration sought for covert operations in Central America, mostly in Nicaragua. 4. Those funds are expected to run out in less than six months, when the Central Intelligence Agency would have to give an account of its activities as it sought the rest of the funds.</Paragraph>
    <Paragraph position="62"> ... House and Senate conferees will now seek to resolve differences in the two measures, and the Nicaraguan dispute is expected to be a stumbling block in the negotiations.</Paragraph>
    <Paragraph position="63"> Judge Orders Investigation 8. In San Francisco, a Federal district judge ordered Attorney General William French Smith to conduct a preliminary investigation of charges that President Reagan and other Government officials violated the Neutrality Act by supporting the activities of paramilitary groups seeking to overthrow the Nicaraguan government. 9. The ruling came in a lawsuit filed by Representative Ronald V. Dellums, Democrat of California [Page A9].</Paragraph>
    <Paragraph position="64"> 10. Senator Daniel Patrick Moynihan, the New York Democrat who is vice chairman of the Intelligence Committee, told the Senate that the Administration had modified its covert policy last summer, and was not supporting the insurgents seeking to overthrow the Sandinista government.</Paragraph>
    <Paragraph position="65"> Summary of Main Events: The Senate today approved by voice vote continued aid for covert operations in Nicaragua. Senator Daniel Patrick Moynihan told the Senate that the Administration had modified its covert policy last summer and was not supporting the insurgents seeking to overthrow the Sandinista government.</Paragraph>
    <Paragraph position="66"> * DUMP does not analyze either subtitles, which not all newspapers use, or titles.</Paragraph>
    <Paragraph position="67"> Past Events: ...which [covert US activity in Nicaragua] was banned in a House-passed bill.</Paragraph>
    <Paragraph position="68"> Current State: Those funds are expected to run out in less than six months.</Paragraph>
    <Paragraph position="69"> ...the Nicaragua dispute is expected to be a stumbling block in the negotiations.</Paragraph>
    <Paragraph position="70"> Plans: Sentence 3.</Paragraph>
    <Paragraph position="71"> ...when [in less than six months] the Central Intelligence Agency would have to give an accounting of its activities as it sought the rest of the funds.</Paragraph>
    <Paragraph position="72"> Sentence 6.</Paragraph>
    <Paragraph position="73"> House and Senate conferees will now seek to resolve differences in the two measures.</Paragraph>
    <Paragraph position="74"> Secondary:* The approval was made contingent upon notification to the intelligence committee of the goals and risks of specific covert projects.</Paragraph>
    <Paragraph position="75"> Identifications: ...Moynihan, the New York Democrat who is vice chairman of the Intelligence Committee. The remaining uncategorized sentences are episode markers and will be discussed below.</Paragraph>
    <Paragraph position="77"> As noted earlier, orthographic paragraphs are not used in newswriting to indicate episode boundaries. In their place are a small number of constructions which regularly introduce new episodes, relating them temporally to previous episodes. These structures include the double container sentence, the sentence introduced with a non-restrictive location PP, the LinkS, and the detached time adverbial with a nominalization in it.</Paragraph>
    <Paragraph position="78"> The first four sentences of story 17 concern the main episode. A new, minor episode is introduced by the double container in sentence 5. This kind of structure has a verb from the small class (e.g. precede, follow, result in) which may take a nominalization in both subject and object position. The subject refers to an old episode and the object to a new one.</Paragraph>
    <Paragraph position="79"> (17:5) The vote followed an hourlong debate that focused on covert United States activity in Nicaragua ....</Paragraph>
    <Paragraph position="80"> The subject vote refers back to the story's main event, the Senate vote in the first sentence. The object, or new episode, is the nominalization debate. The object also tells of another episode concerning passage of a House bill. This bill episode is developed in 17:6 and 17:7.</Paragraph>
    <Paragraph position="81"> * This category is not a very reliable one. It includes clauses with passives and copulas.</Paragraph>
    <Paragraph position="82"> The second minor episode is introduced with a simple detached PP of location in 17:8. This structure is used to shift the setting from the dateline location to a new place. In this case, the action moves from Washington to San Francisco: (17:8) In San Francisco, a Federal district judge ordered Attorney General William French Smith to conduct a preliminary investigation of charges that President Reagan and other Government officials violated the Neutrality Act ....</Paragraph>
    <Paragraph position="83"> This episode is not developed any further in this report, but is interrupted in the next sentence, a LinkS, by the third minor episode. The LinkS is of the form: The nominalized subject refers back to a previous episode and the object of came refers to a new episode. The conjunct or preposition shows the new episode's temporal relation to the old.</Paragraph>
    <Paragraph position="84"> (17:9) The ruling came in a lawsuit filed by Representative Ronald V. Dellums, Democrat of California. [Page A9] The lawsuit episode is developed elsewhere in the paper. The page reference closes this episode, and therefore, since 17:10 contains no reference to a new place or time, and has a simple past main verb (told), it must by default be part of the original, main episode. This decision is supported by the eleventh sentence in the story (not included in the corpus): After this policy change, Mr. Moynihan said, the committee approved additional funds.</Paragraph>
    <Paragraph position="85"> There is no example of the final episode marker in story 17: the sentence introduced by a detached time adverbial with a nominalization in a time phrase (&quot;Two hours before the vote&quot;; &quot;During the Pope's visit&quot;). The nominalization refers to a previous episode and the main sentence to which the whole adverbial phrase is attached introduces the new episode. Story 10 (&quot;French Jets Retaliate, Hit Shiite Positions,&quot; Boston Globe, November 18, 1983) begins with French planes bombing Iranian-backed militia in Lebanon. A related episode starts in sentence 5: (10:5) Six hours after the French air attacks, gunmen fired rocket-propelled grenades and automatic weapons at a French peacekeeping post in the Shiite Moslem neighborhood of Khandik Ghamik in West Beirut.</Paragraph>
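    <Paragraph> The four episode markers described above can be sketched as a single check over the features of a parsed sentence. The Python below is a hedged illustration: the dictionary keys are hypothetical parser features, the verb set contains only the three examples the text gives, and the order of the tests is an arbitrary choice.

```python
# Verbs named in the text as taking a nominalization in both positions.
DOUBLE_CONTAINER_VERBS = {"precede", "follow", "result in"}


def episode_marker(sentence):
    """Return the name of the episode marker a sentence shows, or None.

    sentence: dict of (hypothetical) features produced by a parser.
    """
    verb = sentence.get("main_verb")
    subject_nominal = sentence.get("subject_is_nominalization", False)
    if (verb in DOUBLE_CONTAINER_VERBS and subject_nominal
            and sentence.get("object_is_nominalization", False)):
        return "double container"
    if verb == "come" and subject_nominal:
        return "LinkS"
    initial = sentence.get("initial_detached_phrase")
    if initial == "location_pp":
        return "detached location PP"
    if initial == "time_adverbial" and sentence.get("adverbial_has_nominalization", False):
        return "detached time adverbial"
    return None


# Hand-assigned features for sentences quoted in this section.
s17_5 = {"main_verb": "follow", "subject_is_nominalization": True,
         "object_is_nominalization": True}
s17_8 = {"main_verb": "order", "initial_detached_phrase": "location_pp"}
s17_9 = {"main_verb": "come", "subject_is_nominalization": True}
s17_10 = {"main_verb": "tell"}
s10_5 = {"main_verb": "fire", "initial_detached_phrase": "time_adverbial",
         "adverbial_has_nominalization": True}

for name, features in [("17:5", s17_5), ("17:8", s17_8), ("17:9", s17_9),
                       ("17:10", s17_10), ("10:5", s10_5)]:
    print(name, episode_marker(features))
```

Sentence 17:10 shows none of the markers, which matches its treatment in the text as a default return to the main episode.</Paragraph>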
    <Paragraph position="86"> Each episode in a report has the potential to contain its own main events, background events, plans, current states, identifications, and so forth. An extension of DUMP's labeling ability would be the creation of a discourse tree for each news report, with a root node dominating episode nodes, which in turn dominate relevant information categories.</Paragraph>
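    <Paragraph> One possible shape for such a discourse tree is a nested mapping from a root, to episodes, to information categories, to sentence references. The Python sketch below assumes that each field has already been assigned an episode name and a category; the episode names and the label given to 17:8 are invented for the example, and this extension is not part of DUMP as described.

```python
def build_discourse_tree(labeled_fields):
    """Group (sentence_id, episode, category) triples into a nested tree.

    The root dominates episodes, and each episode dominates the
    information categories that occur in it.
    """
    episodes = {}
    for sentence_id, episode, category in labeled_fields:
        categories = episodes.setdefault(episode, {})
        categories.setdefault(category, []).append(sentence_id)
    return {"story": episodes}


# A few fields from story 17; episode names are illustrative assumptions.
story_17 = [
    ("17:1", "Senate vote", "main event"),
    ("17:2", "Senate vote", "secondary"),
    ("17:4", "Senate vote", "current state"),
    ("17:8", "court order", "main event"),
    ("17:10", "Senate vote", "main event"),
]

tree = build_discourse_tree(story_17)
for episode, categories in tree["story"].items():
    print(episode, categories)
```

Keying on an explicit episode name rather than on position lets a late sentence such as 17:10 rejoin the main episode after the minor episodes have intervened.</Paragraph>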
  </Section>
  <Section position="5" start_page="320" end_page="320" type="metho">
    <SectionTitle>
THE DUMP PROGRAM
</SectionTitle>
    <Paragraph position="0"> DUMP works very simply. It takes as input parsed sentences of a story and searches through them for the kinds of syntactic labels described above (declarative sentence, detached PP, etc.).</Paragraph>
    <Paragraph position="1"> These labels introduce information fields, each of which is stored on a stack. A set of rules is then applied to each entry on the stack, and each entry is assigned to one of the information categories on the basis of the structural label and optional tense/aspect marker.</Paragraph>
    <Paragraph position="2"> DUMP does not need a full parse of a sentence to assign syntactic structures to a particular information category. For example, it does not need to know anything about the attachment of clause-internal PPs, a difficult problem for parsing programs. Furthermore, newswriting (with the exception of quoted portions, which DUMP does not need parsed) does not reflect the use of a full grammar of English. The corpus contains no question forms and a number of the &quot;stylistic&quot; transformations (pseudo-cleft and topicalization are examples) do not appear. The question of whether some kind of &quot;fuzzy&quot; parser with a limited number of rules could provide adequate output for DUMP is one for further research.</Paragraph>
    <Paragraph position="3"> On the other hand, whatever parser is used to prepare input for DUMP will need certain labels not ordinarily found in parse trees: sentences are not usually distinguished as equative or double container in type. Furthermore, DUMP requires some non-standard features on words. For example, we have seen in a number of instances how crucial it is to mark nouns as nominalizations.</Paragraph>
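    <Paragraph> The control flow just described (search the parse for labeled structures, stack them, then apply rules to each entry) might look like the following Python sketch. The paper gives no listing, so everything here is an assumption made for illustration: the parse is represented as nested (label, tense, children) tuples, the label names are invented, and the rule table holds only a handful of the correspondences discussed earlier.

```python
# Structural labels that introduce an information field (illustrative subset).
FIELD_LABELS = {"main_clause", "relative_clause", "appositive",
                "detached_participial_phrase"}

# Rules keyed on (structural label, optional tense/aspect marker).
RULES = {
    ("main_clause", "simple_past"): "main event",
    ("relative_clause", "simple_past"): "past event",
    ("appositive", None): "identification",
    ("detached_participial_phrase", None): "effect",
}


def collect_fields(node, stack):
    """Walk one parse depth-first, pushing every information field found."""
    label, tense, children = node
    if label in FIELD_LABELS:
        stack.append((label, tense))
    for child in children:
        collect_fields(child, stack)


def dump_sketch(parsed_sentences):
    """Stack the fields of a story, then assign a category to each entry."""
    stack = []
    for tree in parsed_sentences:
        collect_fields(tree, stack)
    labeled = []
    while stack:
        label, tense = stack.pop()
        labeled.append((label, tense, RULES.get((label, tense), "unclassified")))
    labeled.reverse()  # popping reverses story order; restore it
    return labeled


# A hand-built partial parse of sentence 6:2; no PP attachment is needed.
sentence_6_2 = ("main_clause", "simple_past", [
    ("noun_phrase", None, [
        ("relative_clause", "simple_past", []),
    ]),
])

print(dump_sketch([sentence_6_2]))
```

The plain noun phrase node is skipped because its label is not one that introduces a field, which reflects the point that only a partial parse is required.</Paragraph>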
  </Section>
  <Section position="6" start_page="320" end_page="321" type="metho">
    <SectionTitle>
RELATION TO OTHER WORK
</SectionTitle>
    <Paragraph position="0"> The DUMP program embodies principles useful both to the processing of sublanguages and to AI research. In the former case, these principles allow preliminary automatic processing of texts within the same genre, regardless of the breadth of the semantic field. As noted earlier, current work with sublanguages relies on word co-occurrence classes which result from their very constrained subject matter. Newswriting covers a wide range of topics, and therefore word co-occurrence classes are not an efficient method of automatic processing. However, these reports do show predictable constraints in the use of syntactic constructions to express particular kinds of information, and it is this regularity that DUMP depends upon.</Paragraph>
    <Paragraph position="1"> In the case of AI research, DUMP can serve as a support program to knowledge-based processors.</Paragraph>
    <Paragraph position="2"> The FRUMP program (DeJong, 1979), for example, creates summaries from sketchy scripts by looking for key requests, or main events, in the text.</Paragraph>
    <Paragraph position="3"> So, the script for an earthquake story might contain key requests for information about the quake's rating on the Richter Scale, the amount of property damage it did, where the epicenter was located, and how far shock waves were felt.</Paragraph>
    <Paragraph position="4"> FRUMP would then look to the newspaper text for evidence of each of the key requests in the script.</Paragraph>
    <Paragraph position="5"> The scripts are written by the programmer, based on his or her assumption of the most important information likely to be found in all stories about a particular topic. DUMP is freed from reliance on such scripts because the news reporter, however unconsciously, encodes key requests syntactically. DUMP can locate these key requests easily and also signal the beginning of new episodes, thus facilitating one of the tasks which FRUMP finds most difficult--that of script selection. (Imagine the confusion that could result in story 17 when the Congressional script is interrupted in the eighth sentence by an episode requiring a judicial script.) Once all of the detached clauses and episodes in a report have been correctly labelled by DUMP, a knowledge-based processor could then go about building conceptual representations for each unit.</Paragraph>
    <Paragraph position="6"> It is expected that DUMP's approach could be extended to other genres of writing, since most texts achieve texture by distinguishing foreground from background. However, texts vary in the proportion of foregrounded to backgrounded material and in their preference for certain forms to convey grounding. The literary style of a discourse will therefore influence the design of automatic text processing programs. The style of news reports is relatively subordinated, non-redundant, and predicationally dense. The sentences in the DUMP corpus average 2.88 predications per sentence, as compared to a high of 2.78 in the informative sections of the Brown corpus and 2.64 across all genres (Francis and Kucera, 1982). The term predication refers to both the finite and non-finite types, and therefore the 2.88 figure indicates that the news corpus is characterized by a great deal of embedding of both types: finite clauses (relative clauses, adverbial clauses) as well as non-finites (infinitive complements, reduced relatives, participials). It can be hypothesized that a highly predicated writing style such as journalese will show greater variety in its syntactic structures than a style with few predications per sentence. This syntactic diversity will reflect a text with less foregrounded material--in short, a text with greater texture. A further hypothesis is that in a predicationally dense style there will be a stronger correlation between syntactic forms and the particular information types expressed by these forms. It seems likely that a genre which uses few predications per sentence would consist chiefly of main clauses used as the workhorse to express all kinds of information: background, main events, plans, import, and so forth. Some of these information categories will be distinguishable by verb tense, aspect, mood and voice, as in the news. But others will have to rely on world knowledge for categorization.
As an example, consider a revised version of the opening of story 6, rewritten so that embedded clauses in the original are expressed as main clauses: Richard B. Stone met face-to-face today with a key leader of the Salvadoran guerrilla movement. He spent several frustrating weeks maneuvering the meeting.</Paragraph>
    <Paragraph position="7"> &amp;quot;The ice has been broken,&amp;quot; proclaimed President Belisario Betancur of Colombia.</Paragraph>
    <Paragraph position="8"> He engineered the meeting.</Paragraph>
    <Paragraph position="9"> Knowledge about the way plans are made would be needed to distinguish foreground from background in these sentences.</Paragraph>
    <Paragraph position="10"> One further metric can be hypothesized for determining discourse genres suitable for syntactic analysis. In syntactic theory there is a well-known correlation between the flexibility of word order in a language and its use of morphosyntactic inflections. Languages like English which have lost most of their inflectional markers rely on rigid word order to establish syntactic relations. On the other hand, highly inflected languages like Latin can afford greater flexibility in word order since inflections on the ends of words indicate their function in the sentence.</Paragraph>
    <Paragraph position="11"> An analogy might be drawn in which syntactic structures correspond to morphosyntactic inflections and information order in discourse corresponds to word order. The discourse structure of news reports violates canonical story form. The writer does not start at the beginning and relate events through to the end. The potential confusion introduced by this unpredictability is compounded by the density of new information in news reports.</Paragraph>
    <Paragraph position="12"> Perhaps the great regularity in the use of distinct syntactic forms to express the types of information conveyed in the news serves to compensate for the flexibility in discourse structure. It is as though the strong correlation between syntactic form and information type frees the reader to process the large amount of new information being delivered. Just as inflectional endings allow the listener to assign words to their functional slots regardless of the order in which they appear, so the syntactic correlates to information types allow the news reader to quickly assign phrases their function in the discourse. Stories which adhere to a standard story grammar do not need such syntactic regularity, since the position of the material in the text indicates its function.</Paragraph>
    <Paragraph position="13"> The extension of a program like DUMP to other discourse genres would require, first, the identification of the information categories expressed by the kind of text. Cookbooks, for example, convey instructions and descriptions, not main events, effects and identifications.</Paragraph>
    <Paragraph position="14"> Secondly, correlations between syntactic form and information type and the syntactic means for indicating episode boundaries must be determined.</Paragraph>
    <Paragraph position="15"> The degree of correlation between syntactic form and information type in non-news genres is a matter for further investigation.</Paragraph>
  </Section>
</Paper>