File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/98/p98-2199_metho.xml
Size: 18,010 bytes
Last Modified: 2025-10-06 14:15:00
<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2199"> <Title>Segregatory Coordination and Ellipsis in Text Generation</Title> <Section position="5" start_page="1220" end_page="1220" type="metho"> <SectionTitle> 4 The Semantic Representation </SectionTitle> <Paragraph position="0"> CASPER uses a representation influenced by Lexical-Functional Grammar (Kaplan and Bresnan, 1982) and Semantic Structures (Jackendoff, 1990). While it would have been natural to use the thematic roles proposed in Functional Grammar, because our realization component, FUF/SURGE, uses them, these roles would add complexity to the coordination process. One major task in generating coordinated expressions is identifying identical elements in the propositions being combined. In Functional Grammar, different processes have different names for their thematic roles (e.g., a MENTAL process has the role SENSER for its agent, while an INTENSIVE process has the role IDENTIFIED).</Paragraph> <Paragraph position="1"> As a result, identifying identical elements under various thematic roles requires looking at the process first in order to determine which thematic roles should be checked for redundancy. Compared to Lexical-Functional Grammar, which uses the same feature names throughout, the thematic roles of Functional Grammar make the identification task more complicated.</Paragraph> <Paragraph position="2"> In our representation, the roles for each event or state are PRED, ARG1, ARG2, ARG3, and MOD. The slot PRED stores the verb concept.</Paragraph> <Paragraph position="3"> Depending on the concept in PRED, ARG1, ARG2, and ARG3 can take on different thematic roles, such as Actor, Beneficiary, and Goal, respectively, in &quot;John gave Mary a red book yesterday.&quot; The optional slot MOD stores modifiers of the PRED. It can hold one or more circumstantial elements, including MANNER, PLACE, or TIME. 
Each argument slot also has its own MOD slot, which stores information such as POSSESSOR or ATTRIBUTE.</Paragraph> <Paragraph position="4"> An example of the semantic representation is provided in Figure 1.</Paragraph> </Section> <Section position="6" start_page="1220" end_page="1223" type="metho"> <SectionTitle> 5 Coordination Algorithm </SectionTitle> <Paragraph position="0"> We have divided the algorithm into four stages, where the first three stages take place in the sentence planner and the last stage takes place in the lexical chooser: Stage 1: group propositions and order them according to their similarities while satisfying pragmatic and contextual constraints. Stage 2: determine recurring elements in the ordered propositions that will be combined.</Paragraph> <Paragraph position="1"> Stage 3: create a sentence boundary when the combined clause reaches pre-determined thresholds.</Paragraph> <Paragraph position="2"> Stage 4: determine which recurring elements are redundant and should be deleted.</Paragraph> <Paragraph position="3"> In the following sections, we describe each stage in detail. To illustrate, we use an imaginary employee report generation system for the human resources department of a supermarket.</Paragraph> <Section position="1" start_page="1220" end_page="1222" type="sub_section"> <SectionTitle> 5.1 Group and Order Propositions </SectionTitle> <Paragraph position="0"> It is desirable to group together propositions with similar elements because these elements are likely to be inferable, and thus redundant at the surface level and deleted. There are many ways to group and order propositions based on similarities. For the propositions in Figure 2, the semantic representations have the following slots: PRED, ARG1, ARG2, MOD-PLACE, and MOD-TIME. To identify which slot has the most similarity among its elements, we calculate the number of distinct elements in each slot across the propositions, which we call NDE (number of distinct elements). 
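The NDE computation can be sketched in a few lines of Python. This is a minimal illustration, not CASPER's implementation: the four propositions are reconstructed from the example sentences quoted in this paper and are an assumption about the content of Figure 2, and the names `SLOTS`, `nde`, and `nde_table` are hypothetical.

```python
# Hypothetical reconstruction of the Figure 2 propositions (an assumption,
# pieced together from the example sentences quoted in the text).
SLOTS = ['PRED', 'ARG1', 'ARG2', 'MOD-PLACE', 'MOD-TIME']

propositions = [
    {'PRED': 're-stock', 'ARG1': 'A1', 'ARG2': 'milk',
     'MOD-PLACE': 'Aisle 5', 'MOD-TIME': 'on Monday'},
    {'PRED': 're-stock', 'ARG1': 'A1', 'ARG2': 'coffee',
     'MOD-PLACE': 'Aisle 2', 'MOD-TIME': 'on Monday'},
    {'PRED': 're-stock', 'ARG1': 'A1', 'ARG2': 'tea',
     'MOD-PLACE': 'Aisle 2', 'MOD-TIME': 'on Monday'},
    {'PRED': 're-stock', 'ARG1': 'A1', 'ARG2': 'bread',
     'MOD-PLACE': 'Aisle 3', 'MOD-TIME': 'on Friday'},
]


def nde(props, slot):
    """Number of distinct elements found in one slot across all propositions."""
    return len({p[slot] for p in props})


def nde_table(props, slots=SLOTS):
    """NDE for every slot, keyed by slot name."""
    return {slot: nde(props, slot) for slot in slots}


print(nde_table(propositions))
```

Slots whose NDE is 1 hold the same element in every proposition, so they are the best candidates for being shared by a combined sentence.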
For the purpose of generating concise text, the system prefers groupings of propositions that yield as many slots with NDE = 1 as possible. For the propositions in Figure 2, the NDEs of PRED and ARG1 are both 1 because all the actions are &quot;re-stock&quot; and all the agents are &quot;A1&quot;; the NDE of ARG2 is 4 because it contains four distinct elements: &quot;milk&quot;, &quot;coffee&quot;, &quot;tea&quot;, and &quot;bread&quot;; similarly, the NDE of MOD-PLACE is 3, and the NDE of MOD-TIME is 2 (&quot;on Monday&quot; and &quot;on Friday&quot;). The algorithm re-orders the propositions by sorting the elements in each slot using comparison operators which can determine that Monday is smaller than Tuesday, or that Aisle 2 is smaller than Aisle 4. Starting from the slot with the largest NDE and proceeding to the lowest, the algorithm re-</Paragraph> </Section> <Section position="2" start_page="1222" end_page="1222" type="sub_section"> <SectionTitle> 5.2 Identify Recurring Elements </SectionTitle> <Paragraph position="0"> The current algorithm makes its decisions sequentially and combines only two propositions at a time. The result proposition is a semantic representation of the propositions combined so far.</Paragraph> <Paragraph position="1"> One task of the sentence planner is to find a way to combine the next proposition in the ordered list into the result proposition. In Stage 2, the planner is concerned with how many slots have distinct values and which slots they are.</Paragraph> <Paragraph position="2"> When multiple adjacent propositions have only one slot with distinct elements, these propositions are 1-distinct. A special optimization can be carried out between 1-distinct propositions by conjoining their distinct elements into a coordinate structure, such as conjoined verbs, nouns, or adjectives. 
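The 1-distinct test and the conjoining step can be sketched as follows. This is a simplified illustration under stated assumptions, not the system's actual code: propositions are flat Python dictionaries, a tuple tagged `'and'` stands in for a coordinate structure, only one pair is combined at a time, and the function names are hypothetical.

```python
def distinct_slots(a, b):
    """Slots whose values differ between two propositions with the same keys."""
    return [slot for slot in a if a[slot] != b[slot]]


def combine_1_distinct(result, nxt):
    """Conjoin the single differing slot; return None if the pair is not 1-distinct."""
    differing = distinct_slots(result, nxt)
    if len(differing) != 1:
        return None
    slot = differing[0]
    combined = dict(result)
    # The tagged tuple is a stand-in for a coordinate structure in that slot.
    combined[slot] = ('and', result[slot], nxt[slot])
    return combined


coffee = {'PRED': 're-stock', 'ARG1': 'A1', 'ARG2': 'coffee',
          'MOD-PLACE': 'Aisle 2', 'MOD-TIME': 'on Monday'}
tea = dict(coffee, ARG2='tea')

print(combine_1_distinct(coffee, tea))
```

When two or more slots differ, the function returns None, which signals that the pair would have to be handled by some other combining strategy.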
McCawley (McCawley, 1981) described this phenomenon as Conjunction Reduction, &quot;whereby conjoined clauses that differ only in one item can be replaced by a simple clause that involves conjoining that item.&quot; In our example, the first and second propositions are 1-distinct at ARG2, and they are combined into a semantic structure representing &quot;A1 re-stocked coffee and tea in Aisle 2 on Monday.&quot; If the third proposition were also 1-distinct at ARG2 with respect to the result proposition, its ARG2 element &quot;milk&quot; would be combined in the same way.</Paragraph> <Paragraph position="3"> In the example, it is not. As a result, we cannot combine the third proposition using only conjunction within a syntactic structure.</Paragraph> <Paragraph position="4"> When the next proposition and the result proposition have more than one distinct slot, or their 1-distinct slot is different from the previous 1-distinct slot, the two propositions are said to be multiple-distinct. Our approach to combining multiple-distinct propositions differs from previous linguistic analyses. Instead of removing recurring entities right away based on transformation or substitution, the current system generates every conjoined multiple-distinct proposition. During the generation of each conjoined clause, the recurring elements may be prevented from appearing at the surface level because the lexical chooser prevents the realization component from generating any string for such redundant elements. Our multiple-distinct coordination produces what linguists describe as ellipsis and gapping.</Paragraph> <Paragraph position="5"> Figure 4 shows the result of combining two propositions, which will be realized as &quot;A1 re-stocked tea on Monday and milk on Friday.&quot; Some readers might notice that PRED and ARG1 in both propositions are marked as RECURRING but only the subsequent recurring elements are deleted at the surface level. 
The reason will be explained in Section 5.4.</Paragraph> </Section> <Section position="3" start_page="1222" end_page="1223" type="sub_section"> <SectionTitle> 5.3 Determine Sentence Boundary </SectionTitle> <Paragraph position="0"> Unless combining the next proposition into the result proposition would exceed the pre-determined parameters for the complexity of a sentence, the algorithm will keep combining more propositions into the result proposition using 1-distinct or multiple-distinct coordination. In normal cases, the predefined parameter limits the number of propositions conjoined by multiple-distinct coordination to two.</Paragraph> <Paragraph position="1"> In special cases where the same slots across multiple propositions are multiple-distinct, the pre-determined limit is ignored. By taking advantage of parallel structures, these propositions can be combined using multiple-distinct procedures without making the coordinate structure more difficult to understand. For example, the sentence &quot;John took aspirin on Monday, penicillin on Tuesday, and Tylenol on Wednesday.&quot; is long but quite understandable. Similarly, conjoining a long list of 3-distinct propositions also produces understandable sentences: &quot;John played tennis on Monday, drove to school on Tuesday, and won the lottery on Wednesday.&quot; These constraints allow CASPER to produce sentences that are complex and contain a lot of information, yet remain reasonably easy to understand.</Paragraph> </Section> <Section position="4" start_page="1223" end_page="1223" type="sub_section"> <SectionTitle> 5.4 Delete Redundant Elements </SectionTitle> <Paragraph position="0"> Stage 4 handles ellipsis, one of the most difficult phenomena to handle in syntax. In the previous stages, elements that occur more than once among the propositions are marked as RECURRING, but the actual deletion decisions have not been made because CASPER does not yet have the necessary information at that point. 
The importance of the surface sequential order can be demonstrated by the following example. In the sentence &quot;On Monday, A1 re-stocked coffee and [on Monday,] [A1] removed rotten milk.&quot;, the elements in MOD-TIME delete forward (i.e., the subsequent occurrence of the identical constituent disappears). When MOD-TIME elements are realized at the end of the clause, the same elements in MOD-TIME delete backward (i.e., the antecedent occurrence of the identical constituent disappears): &quot;A1 re-stocked coffee [on Monday,] and [A1] removed rotten milk on Monday.&quot; Our deletion algorithm extends the Directionality Constraint (Tai, 1969), which is based on syntactic structure; our algorithm instead uses the surface sequential order of the recurring elements to make deletion decisions.</Paragraph> <Paragraph position="1"> In general, if a slot is realized at the front or in the middle of a clause, the recurring elements in that slot delete forward. In the first example, MOD-TIME is realized as the front adverbial while ARG1, &quot;A1&quot;, appears in the middle of the clause, so elements in both slots delete forward.</Paragraph> <Paragraph position="2"> On the other hand, if a slot is realized at the end position of a clause, the recurring elements in that slot delete backward, as with MOD-TIME in the second example. 
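The positional rule just stated can be written down compactly. The following is a minimal sketch rather than the lexical chooser's actual logic: the position labels `'front'`, `'medial'`, and `'end'` and both function names are hypothetical, and extending backward deletion to more than two clauses by keeping only the last occurrence is an assumption.

```python
def deletion_direction(position):
    """Map a slot's surface position to the direction in which repeats delete."""
    if position in ('front', 'medial'):
        return 'forward'
    if position == 'end':
        return 'backward'
    raise ValueError('unknown position: ' + position)


def surviving_occurrence(position, clause_count):
    """Index of the clause in which a recurring element is still realized."""
    if deletion_direction(position) == 'forward':
        # Forward deletion removes later occurrences, so the first one stays.
        return 0
    # Backward deletion removes earlier occurrences, so the last one stays.
    return clause_count - 1


# MOD-TIME realized as a front adverbial: the first occurrence survives.
print(surviving_occurrence('front', 2))
# MOD-TIME realized clause-finally: the last occurrence survives.
print(surviving_occurrence('end', 2))
```

Keeping the rule as a function of surface position alone reflects the point made above: the decision depends on where the slot is realized, not on its syntactic category.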
The extended directionality constraint also applies to conjoined premodifiers and postmodifiers, as demonstrated by &quot;in Aisle 3 and [in Aisle] 4&quot;, and &quot;at 3 [PM] and [at] 9 PM&quot;.</Paragraph> <Paragraph position="3"> Using the algorithm just described, the result of the supermarket example is concise and easily understandable: &quot;A1 re-stocked coffee and tea in Aisle 2 and milk in Aisle 5 on Monday.</Paragraph> <Paragraph position="4"> A1 re-stocked bread in Aisle 3 on Friday.&quot; Further discourse processing will replace the second &quot;A1&quot; with the pronoun &quot;he&quot;, and the adverbial &quot;also&quot; may be inserted as well.</Paragraph> <Paragraph position="5"> CASPER has been used in an upgraded version of PLANDoc (McKeown et al., 1994), a robust, deployed system which generates reports justifying costs to management in the telecommunications domain. Some of the current output is shown in Figure 5. In the figure, &quot;CSA&quot; is a location; &quot;Q1&quot; stands for first quarter; &quot;multiplexor&quot; and &quot;working-pair transfer&quot; are telecommunications equipment. The first example is a typical simple proposition in the domain, which consists of PRED, ARG1, ARG2, MOD-PLACE, and MOD-TIME. The second example shows 1-distinct coordination at MOD-PLACE, where the second CSA has been deleted.</Paragraph> <Paragraph position="6"> The third example demonstrates coordination of two propositions that are multiple-distinct in MOD-PLACE and MOD-TIME. The fourth example shows several things: the ARG1 became plural in the first proposition because multiple placements occurred, as indicated by simple conjunction in MOD-PLACE; the gapping of the PRED &quot;was projected&quot; in the second clause was based on multiple-distinct coordination. 
The last example demonstrates the deletion of MOD-PLACE in the second proposition: because it is located at the front of the clause at the surface level, MOD-PLACE deletes forward.</Paragraph> </Section> </Section> <Section position="7" start_page="1223" end_page="1224" type="metho"> <SectionTitle> 6 Linguistic Phenomenon </SectionTitle> <Paragraph position="0"> In this section, we take examples from the linguistic literature (Quirk et al., 1985; van Oirsouw, 1987) and show how the algorithm developed in Section 5 generates them. We also show how the algorithm can generate sentences with non-constituent coordination, which pose difficulties for most syntactic theories.</Paragraph> <Paragraph position="1"> Coordination involves elements of equal syntactic status. Linguists have categorized coordination into simple and complex. Simple coordination conjoins single clauses or clause constituents while complex coordination involves multiple constituents. For example, the coordinate construction in &quot;John finished his work and [John] went home.&quot; could be viewed as a single proposition containing two coordinate VPs. Under our algorithm, this construction would be classified as a multiple-distinct coordination between two clauses, with ARG1, &quot;John&quot;, deleted in the second clause. In our algorithm, the 1-distinct procedure can generate many simple coordinations, including coordinate verbs, nouns, adjectives, PPs, etc. With simple extensions to the algorithm, clauses with relative clauses could be combined and coordinated too.</Paragraph> <Paragraph position="2"> Complex coordinations involving ellipsis and gapping are much more challenging. In multiple-distinct coordination, each conjoined clause is generated, but recurring elements among the propositions are deleted according to the extended directionality constraints described in Subsection 5.4. 
It works because it takes advantage of the parallel structure at the surface level.</Paragraph> <Paragraph position="3"> Van Oirsouw (van Oirsouw, 1987), based on the literature on coordinate deletion, identified a number of rules which result in deletion under identity: Gapping, which deletes medial material; Right-Node-Raising (RNR), which deletes identical rightmost constituents in a syntactic tree; VP-deletion (VPD), which deletes identical verbs and handles post-auxiliary deletion (Sag, 1976); and Conjunction Reduction (CR), which deletes identical rightmost or leftmost material. He pointed out that these four rules reduce the length of a coordination by deleting identical material, and they serve no other purpose. We will describe how our algorithm handles the examples van Oirsouw used in Figure 6.</Paragraph> <Paragraph position="4"> Gapping: John ate fish and Bill ∅ rice.</Paragraph> <Paragraph position="5"> RNR: John caught ∅, and Mary killed the rabid dog.</Paragraph> <Paragraph position="6"> VPD: John sleeps, and Peter does ∅, too.</Paragraph> <Paragraph position="7"> CR1: John gave ∅ ∅, and Peter sold a record to Sue.</Paragraph> <Paragraph position="8"> CR2: John gave a book to Mary and ∅ ∅ a deletion described by van Oirsouw.</Paragraph> <Paragraph position="9"> The algorithm described in Section 5 can use the multiple-distinct procedure to handle all the cases except VPD. In the gapping example, the PRED deletes forward. In RNR, ARG2 deletes backward because it is positioned at the end of the clause. In CR1, even though the medial slot ARG2 should delete forward, it deletes backward because it is considered to be at the end position of a clause. In this case, once ARG3 (the BENEFICIARY &quot;to Sue&quot;) deletes backward, ARG2 is at the end position of a clause. This process does require more intelligent processing in the lexical chooser, but it is not difficult. 
In CR2, it is straightforward to delete forward because both ARG1 and PRED are medial. The current algorithm does not address VPD. For such a sentence, the system would have generated &quot;John and Peter slept&quot; using the 1-distinct procedure. Non-constituent coordination phenomena, the coordination of elements that are not of equal syntactic status, are challenging for syntactic theories. The following non-constituent coordination can be explained nicely with the multiple-distinct procedure. In the sentence &quot;The spy was in his forties, of average build, and spoke with a slightly foreign accent.&quot;, the coordinated constituents are VP, PP, and VP. Based on our analysis, the sentence could be generated by combining the first two clauses using the 1-distinct procedure, and then combining the third clause using the multiple-distinct procedure, with ARG1 (&quot;the spy&quot;) deleted forward.</Paragraph> <Paragraph position="10"> The spy was in his forties, [the spy] [was] of average build, and [the spy] spoke with a slightly foreign accent.</Paragraph> </Section> </Paper>