<?xml version="1.0" standalone="yes"?> <Paper uid="P95-1053"> <Title>Conciseness through Aggregation in Text Generation</Title> <Section position="4" start_page="0" end_page="330" type="metho"> <SectionTitle> 3 Combining Strategy </SectionTitle> <Paragraph position="0"> Because PLANDoc can produce many paraphrases for a single message, aggregation during the syntactic phase of generation would be difficult; semantically similar messages would already have different surface forms. As a result, aggregation in PLANDoc is carried out at the content planning level using semantic FDs. Three main criteria were used to design the combining strategy: 1. domain independence: the algorithm should be applicable in other domains.</Paragraph> <Paragraph position="1"> 2. generating the most concise text: it should avoid repetition of phrases to generate shortest</Paragraph> <Paragraph position="3"> This refinement activated ALL-DLC for CSA 3134 in 1994 Q3.</Paragraph> <Paragraph position="4"> This refinement activated DLC for CSA 3130 in 1994 Q1.</Paragraph> <Paragraph position="5"> This refinement activated DSS-DLC for CSA 3208 in 1994 Q3.</Paragraph> <Paragraph position="6"> This refinement activated DLC for CSA 3122 in 1994 Q1.</Paragraph> <Paragraph position="7"> Equipment: E1 = ALL-DLC, E2 = DLC, E3 = DSS-DLC; Site: S1 = CSA 3122, S2 = CSA 3130, S3 = CSA 3134, S4 = CSA 3208; Date: D1 = 1994 Q1, D2 = 1994 Q3</Paragraph> <Paragraph position="9"> should not generate sentences that are too complex or ambiguous for readers.</Paragraph> <Paragraph position="10"> The first aggregation step is to identify semantically related messages. This is done by grouping messages with the same action attribute. Then the system attempts to generate concise and unambiguous text for each action group separately. This reduces the problem from tens of messages to much smaller groups. 
Though this heuristic disallows the combination of messages with different actions, the messages in each action group already contain enough information to produce quite complex sentences.</Paragraph> <Paragraph position="11"> The system combines the maximum number of related messages to meet the second design criterion: generating the most concise text. But such combination is blocked when a sentence becomes too complex. A bottom-up 4-step algorithm was developed: 1. Sorting: putting similar messages right next to each other.</Paragraph> <Paragraph position="12"> 2. Merging Same Attribute: combining adjacent messages that only have one distinct attribute. 3. Identity Deletion: deletion of identical components across messages.</Paragraph> <Paragraph position="13"> 4. Sentence Breaking: determining sentence breaks.</Paragraph> <Section position="1" start_page="329" end_page="329" type="sub_section"> <SectionTitle> 3.1 Step 1: Sorting </SectionTitle> <Paragraph position="0"> The system first ranks the attributes to determine which are most similar across messages with the same action. For each potential distinct attribute, the system calculates its rank using the formula m - d, where m is the number of messages and d is the number of distinct values for that particular attribute. The rank is an indicator of how similar an attribute is across the messages. Combining messages according to the highest-ranking attribute ensures that minimum text will be generated for these messages. Based on the ranking, the system reorders the messages by sorting, which</Paragraph> <Paragraph position="2"> puts the messages that have the same attribute right next to each other. In Fig. 2, equipment has rank 1 because it has three distinct equipment values - ALL-DLC, DLC, and DSS-DLC; date has rank 2 because it has two distinct date values - 1994 Q1 and 1994 Q3; site has rank 0 because all four site values are distinct. 
Attribute class and action (Fig.</Paragraph> <Paragraph position="3"> 1) are ignored because they are always the same at this stage. When two attributes have the same rank, the system breaks the tie based on a priority hierarchy determined by the domain experts. Because the final sorting operation dominates the order of the resulting messages, PLANDoc sorts the message list from the lowest-ranked attribute to the highest. In this case, the ordering for sorting is site, equipment, and then date. The resulting message list after sorting each attribute is shown in Fig. 4.</Paragraph> </Section> <Section position="2" start_page="329" end_page="330" type="sub_section"> <SectionTitle> 3.2 Step 2: Merging Same Attribute </SectionTitle> <Paragraph position="0"> The list of sorted messages is traversed. Whenever there is only one distinct attribute between two adjacent messages, they are merged into one message with a conjoined attribute, which is a list of the distinct attributes from both messages.</Paragraph> <Paragraph position="1"> What about messages with two or more distinct attributes? Merging two messages with two or more distinct attributes will result in a syntactically valid sentence but with an undesirable meaning: &quot;*This refinement activated ALL-DLC and DSS-DLC for CSAs 3122 and 3130 in the third quarter of 1993.&quot; By tracking which attribute is compound, a third message can be merged into the aggregate message if it also has the same distinct attribute. Continuing from Step 1, (E2 S1 D1) and (E2 S2 D1) are merged because they have only one distinct attribute, site.</Paragraph> <Paragraph position="2"> A new FD, (E2 (S1 S2) D1), is assembled to replace those two messages. 
Note that although (E1 S3 D2) and (E3 S4 D2) have the date in common, they are not combined because they have more than one distinct attribute, site and equipment.</Paragraph> <Paragraph position="3"> Step 2 is applied to the message list recursively to generate possible crossing conjunctions, as in the following output which merges four messages: &quot;This refinement activated ALL-DLC and DSS-DLC for CSAs 3122 and 3130 in the third quarter of 1993.&quot; Though at the outset this phenomenon seems unlikely, it does happen in our domain.</Paragraph> </Section> <Section position="3" start_page="330" end_page="330" type="sub_section"> <SectionTitle> 3.3 Step 3: Identity Deletion </SectionTitle> <Paragraph position="0"> After merging at Step 2, the message list left in an action group either has only one message, or it has more than one message with at least two distinct attributes between them. Instead of generating two separate sentences for (E2 (S1 S2) D1) and (E1 S3 D2), the system realizes that both the subject and verb are the same, so it uses deletion on identity to generate &quot;This refinement activated DLC for CSAs 3122 and 3130 in 1994 Q1 and [this refinement activated] ALL-DLC for CSA 3134 in 1994 Q3.&quot; For identical attributes across two messages (as shown in the bracketed phrase), a &quot;deletion&quot; feature is inserted into the semantic FD, so that SURGE will suppress the output.</Paragraph> </Section> <Section position="4" start_page="330" end_page="330" type="sub_section"> <SectionTitle> 3.4 Step 4: Sentence Break </SectionTitle> <Paragraph position="0"> Applying deletion on identity blindly to the whole message list might make the generated text incomprehensible because readers might have to recover too much implicit information from the sentence.</Paragraph> <Paragraph position="1"> As a result, the combining algorithm must have a way to determine when to break the messages into separate sentences that are easy to understand and 
unambiguous.</Paragraph> <Paragraph position="2"> How much information to pack into a sentence does not depend on grammaticality, but on coherence, comprehensibility, and aesthetics, which are hard to formalize. PLANDoc uses a heuristic that always joins the first and second messages, and continues to do so for the third and subsequent messages if the distinct attributes between the messages are the same. This heuristic results in parallel syntactic structure, and the underlying semantics can be easily recovered.</Paragraph> <Paragraph position="3"> Once the distinct attributes differ from those of the combined messages, the system starts a new sentence. Using the same example, (E2 (S1 S2) D1) and (E1 S3 D2) have three distinct attributes. They are combined because they are the first two messages.</Paragraph> <Paragraph position="4"> The third message (E3 S4 D2) differs from (E1 S3 D2) in equipment and site, but not in date, so a sentence break will take place between them. Aggregating all three messages together will result in questionable output. Because of the parallel structure created between the first two messages, readers expect a different date when reading the third clause. The second occurrence of &quot;1994 Q3&quot; in the same sentence does not match readers' expectations and is thus potentially confusing.</Paragraph> </Section> </Section> <Section position="5" start_page="330" end_page="330" type="metho"> <SectionTitle> 4 Future Directions </SectionTitle> <Paragraph position="0"> In this paper, I have described a general algorithm which not only reduces the amount of text produced, but also increases the fluency of the text.</Paragraph> <Paragraph position="1"> While other systems do generate conjunctions, they deal with restricted cases such as conjunction of subjects and predicates (Dalianis &amp; Hovy 93). There are other interesting problems in aggregation. 
Generating marker words to indicate relationships in conjoined structures, such as &quot;respectively&quot;, is another short-term goal. Extending the current aggregation algorithm to be more general, such as by combining related messages with different actions, is currently being investigated.</Paragraph> </Section> </Paper>