<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1077"> <Title>Semantic Rule Based Text Generation</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> XCALIBUR EXPERTS </SectionTitle> <Paragraph position="0"/> </Section> <Section position="4" start_page="0" end_page="376" type="metho"> <SectionTitle> 3. Declarative Knowledge Representation </SectionTitle> <Paragraph position="0"> The XCALIBUR system uses a case-frame based interlingua for communication between the components. To provide as nearly canonical a representation as possible, the semantic information in each case-frame is used to determine the major structure of the tree, and any syntactic information is stored at the leaves of the semantic tree. The resulting case-frame can be converted into a canonical representation by merely deleting the leaves of the tree.</Paragraph> <Paragraph position="1"> The canonical representation is very useful for handling ellipsis, since phrases such as &quot;dual port disk&quot; and &quot;disk with two ports&quot; are represented identically. Figure 3-1 shows a sample query and its representation with purely syntactic information removed.</Paragraph> <Paragraph position="2"> + What is the largest 11780 fixed disk under $40,000? The rp07-aa is a 516 MB fixed pack disk that costs 38000 dollars.</Paragraph> <Paragraph position="3"> + Tell me about the lxy11.</Paragraph> <Paragraph position="4"> The lxy11 is a 240 l/m line printer with plotting capabilities.</Paragraph> <Paragraph position="5"> + Tell me all about the largest dual port disk with removable pack.</Paragraph> <Paragraph position="6"> The rm05-ba is a 39140 dollar 256 MB dual port disk with removable pack, 1200 KB peak transfer rate and 38 ms access time.</Paragraph> <Paragraph position="7"> + What is the price of the largest single port disk? 
The 176 MB single port rp06-aa costs 34000 dollars.</Paragraph> <Paragraph position="9"/> </Section> <Section position="5" start_page="376" end_page="377" type="metho"> <SectionTitle> 4. The Kafka Generator </SectionTitle> <Paragraph position="0"> Kafka is used to build replies to user queries, to paraphrase the user's input for clarificational dialogs, and to generate the system's queries for the user. Figure 4-1 shows the major components and data flow of the Kafka generator. Up to two inputs are provided: (1) a case frame in the XCALIBUR format, and (2) a set of tuples from the information broker (such as might be returned from a relational database); either of these may be omitted. Four of the seven major components of Kafka use the transformational formalism, and are shown in bold outlines.</Paragraph> <Paragraph position="1"> Kafka is a direct descendant of an earlier natural language generator described in \[2\], which in turn had many components either derived from or inspired by Goldman's BABEL generator \[7\]. The case frame knowledge representation used in XCALIBUR has much in common with Schank's Conceptual Dependency graphs \[8\]. The earlier XCALIBUR generator was very much ad hoc, and Kafka is an effort to formalize the processes used in that generator. The main similarity between Kafka and BABEL is in the verb selection process (described in section 5).</Paragraph> <Paragraph position="2"> The embedded transformational language used by Kafka was inspired by the OPS5 programming language developed by Forgy at CMU \[3\]. OPS5 was in fact an early candidate for the implementation of Kafka, but OPS5 supports only flat data structures. Since the case frame knowledge representation used in XCALIBUR is highly recursive, an embedded language supporting case frame matches was developed. 
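The kind of recursive case-frame match such a language supports can be sketched as follows. This is a hypothetical Python rendering for illustration only (Kafka itself was implemented in Franz Lisp), and the slot names are invented; it shows how a nested pattern is matched slot by slot, descending into sub-frames, which a flat OPS5-style working-memory match cannot do:

```python
def match(pattern, frame, bindings=None):
    """Recursively match a nested case-frame pattern against a case frame.

    Pattern leaves beginning with "=" are variables that bind to the
    corresponding sub-expression; a repeated variable must match the
    value bound by its first occurrence.
    """
    bindings = dict(bindings or {})
    if isinstance(pattern, str) and pattern.startswith("="):
        name = pattern[1:]
        if name in bindings:                 # repeated variable: values must agree
            return bindings if bindings[name] == frame else None
        bindings[name] = frame
        return bindings
    if isinstance(pattern, dict) and isinstance(frame, dict):
        for slot, sub_pattern in pattern.items():
            if slot not in frame:            # required slot missing: match fails
                return None
            bindings = match(sub_pattern, frame[slot], bindings)
            if bindings is None:
                return None
        return bindings                      # extra slots in the frame are ignored
    return bindings if pattern == frame else None
```

A successful match returns a binding list (here a dict) mapping variable names to matched sub-expressions; a failed match returns None, playing the role of Lisp's nil.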
The Kafka programming language can be viewed as a production system with only a single working memory element and a case frame match rather than the flat match used in OPS5.</Paragraph> <Section position="1" start_page="376" end_page="377" type="sub_section"> <SectionTitle> 4.2. Transformational Rules </SectionTitle> <Paragraph position="0"> Some of the transformation rules in Kafka were derived from the verb selection method of BABEL, and others were taken from TGG rules given in \[10\]. Although Kafka has been based mainly on the generative grammar theory of Chomsky, the rule syntax allows much more powerful rules than those allowed in either the standard or extended standard theory. We have tried to provide a sufficiently powerful formalism to encode more than one grammatical tradition, and have not restricted our rules to any particular linguistic convention. Our goal has not been to validate any linguistic theory but rather to demonstrate the feasibility of using a single computational mechanism for text generation.</Paragraph> <Paragraph position="1"> The basic unit of knowledge in XCALIBUR is the case frame. Kafka repeatedly transforms case frames into other case frames until either an error is found, no pattern matches, or a surface level case frame is generated. The surface case frame is converted into English by render, which traverses the case frame according to the sentence plan, printing out lexical items. A transformation is defined by an ordered set of rules.</Paragraph> <Paragraph position="2"> Each rule has up to four parts: * A pattern, which is matched against the current node. This match, if successful, usually binds several local variables to the sub-expressions matched.</Paragraph> <Paragraph position="3"> * A result, which is another case frame with variables at some leaves. These variables are replaced with the values found during the match. 
This process is called instantiation.</Paragraph> <Paragraph position="4"> * An optional variable check, the name of a Lisp function which takes a binding list as input and returns either nil, which causes the rule to fail, or a new binding list to be used in the instantiation phase. This feature allows representation of functional constraints.</Paragraph> <Paragraph position="5"> * An optional final flag, indicating that the output from this rule should be returned as the value of the rule set's transformation.</Paragraph> <Paragraph position="6"> A transformation is applied to a case frame by first recursively matching and instantiating the sub-cases of the expression and then transforming the parent node.</Paragraph> <Paragraph position="7"> Variables match either a single s-expression or a list of them. For example, =HEAD would match either an atom or a list, =*REST would match zero or more s-expressions, and =+OTHER would match one or more s-expressions. If a variable occurs more than once in a pattern, the first occurrence binds a value to that variable, and the second and subsequent occurrences must match that binding exactly.</Paragraph> <Paragraph position="8"> This organization is very similar to that of the ILIAD program developed by Bates at BBN \[1\]. The pattern, result, and variable check correspond to the structural description, structural change, and condition of Bates' transformational rules, with only a minor variation in the semantics of each operation. The ILIAD system, though, is very highly syntax oriented (since the application is teaching English grammar to deaf children) and uses semantic information only in lexical insertion. The rules in Kafka, by contrast, are semantically oriented; one such rule matches structures of the form The price of X is FOO and converts them to X costs FOO. More sophisticated rules for verb selection check for semantic agreement between various slot fillers, but this rule merely encodes knowledge about the relationship between the PRICE attribute and the COST verb. 
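A rule of this shape, with its pattern, result template, and optional variable check, might be sketched as follows. This is a hypothetical Python rendering of the PRICE/COST example, not Kafka's actual Lisp code; the slot names and the simplified one-level match are invented for illustration:

```python
def apply_rule(rule, frame):
    """Apply one Kafka-style transformation rule to a case frame.

    Returns the instantiated result frame on success, or None (the role
    of Lisp's nil) if the pattern or the variable check fails.
    """
    bindings = {}
    for slot, value in rule["pattern"].items():
        if slot not in frame:
            return None                      # required slot missing: no match
        if isinstance(value, str) and value.startswith("="):
            bindings[value[1:]] = frame[slot]   # bind a variable to the filler
        elif frame[slot] != value:
            return None                      # literal slot filler must agree
    check = rule.get("check")
    if check is not None:
        bindings = check(bindings)           # may veto (None) or amend bindings
        if bindings is None:
            return None
    # Instantiation: replace variables at the leaves of the result template.
    return {slot: bindings[v[1:]] if isinstance(v, str) and v.startswith("=") else v
            for slot, v in rule["result"].items()}

# "The price of X is FOO"  ->  "X costs FOO"
price_to_cost = {
    "pattern": {"attribute": "*price", "of": "=X", "value": "=FOO"},
    "result":  {"verb": "*cost", "agent": "=X", "object": "=FOO"},
}
```

The optional check function is where functional constraints would live; a verb-selection rule could use it to test semantic agreement between slot fillers before committing to the result.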
Figure 4-3 shows an input structure that this rule would match; Figure 4-4 shows the structure which would be returned.</Paragraph> <Paragraph position="10"/> </Section> </Section> <Section position="6" start_page="377" end_page="379" type="metho"> <SectionTitle> 5. The Generation Process </SectionTitle> <Paragraph position="0"> The first step in the generation process is preprocessing, which removes many unnecessary fields from each case frame. These are mostly syntactic annotations left by the parser that are not used during the semantic processing of the query. Some complex input forms are converted into simpler forms. This step provides a high degree of insulation between the XCALIBUR system and the text generator, since changes in the XCALIBUR representation can be caught and converted here before any of Kafka's internal rules are affected.</Paragraph> <Paragraph position="1"> In the second phase (not used when paraphrasing user input) the case frame is converted from a query into a declarative response by filling some slots with (*UNKNOWN) place-holders. Next the re~at module replaces these place-holders with information from the back-end (usually data from the XCON static database). The result is a CD graph representing a reply for each item in the user's query, with all the available data filled in.</Paragraph> <Paragraph position="2"> In the third phase of operation, the verb transform selects an English verb for each underlying CD primitive. Verb selection is very similar to that used by Goldman in his BABEL generator \[7\], except that BABEL uses hand-coded discrimination nets to select verbs, while Kafka keeps the rules separate. A rule compiler is being written which builds these discrimination nets automatically. The D-nets are used to weed out rules which cannot possibly apply to the current structure. 
Since the compiler is not yet installed, Kafka uses an interpreter which tries each rule in turn.</Paragraph> <Paragraph position="3"> After verb selection, the np-instantiation transform provides lexical items for each of the adjectives and nouns present in each CD graph. Finally, the order module linearizes the parse tree by choosing an order for the cases and deciding whether they need case markers. The final sentence is produced by the render module, which traverses the parse tree according to the sentence plan produced by order, printing the lexical items from each leaf node.</Paragraph> <Section position="1" start_page="378" end_page="378" type="sub_section"> <SectionTitle> 5.1. A Sample Run </SectionTitle> <Paragraph position="0"> The following is a transcript of an actual generation run, which required 30 seconds of CPU time on a VAX 11/780 running under Franz Lisp. Most of the system's time is wasted by attempting futile matches during the first part of the match/instantiate loop. The variable parse1 has been set by the parser to the case frame shown in Figure 3-1. The variable data1 is the response from the information broker to the user's query. This transcript shows the Kafka system combining these two inputs to produce a reply for the user including (1) the answer to his direct query and (2) the information used by the information broker to determine that answer.</Paragraph> <Paragraph position="1"> (subject: (agent)) (plan: ((unmarked agent) *verb (unmarked object))) (verb-conj: (costs))) And the resulting surface string is: The 516 MB single port rp07-aa costs 38000 dollars.</Paragraph> </Section> <Section position="2" start_page="378" end_page="378" type="sub_section"> <SectionTitle> 5.2. Generating Anaphora </SectionTitle> <Paragraph position="0"> Kafka has minimal capability to generate anaphora. 
A discourse history is kept by the dialog manager, which maps each nominal case frame to a surface noun phrase.</Paragraph> <Paragraph position="1"> Anaphoric references are generated by choosing the shortest noun phrase representing the nominal that is not already bound to another nominal. Thus the pronoun it could only refer to a single item. Each noun phrase is generated in the output order, so the discourse history can be used to make decisions about anaphoric reference based on what the user has read up to that point. This technique is similar to but less sophisticated than that used by McDonald \[6\].</Paragraph> <Paragraph position="2"> Generation of anaphora is inhibited when new information must be displayed to the user, or when confirmational text is to be included.</Paragraph> </Section> <Section position="3" start_page="378" end_page="379" type="sub_section"> <SectionTitle> 5.3. Confirmational Information </SectionTitle> <Paragraph position="0"> Speakers in a conversation use redundancy to ensure that all parties understand one another. This redundancy can be incorporated into natural language interfaces by &quot;echoing,&quot; or including additional information such as paraphrases in the generated text to confirm that the computer has chosen the correct meaning of the user's input. For example, if the user asks: + What is the price of the largest single port disk? The following reply, while strictly correct, is likely to be unhelpful, and does not reassure the user that the meaning of the query has been properly understood: 34000 dollars.</Paragraph> <Paragraph position="1"> The XCALIBUR system would answer with the following sentence, which not only answers the user's question but includes evidence that the system has correctly determined the user's request: The 176 MB single port rp06-aa costs 34000 dollars.</Paragraph> <Paragraph position="2"> XCALIBUR uses focus information to provide echoing. 
As each part of the user's query is processed, all the attributes of the object in question which are needed to answer the query are recorded. Then the generator ensures that the value of each of these attributes is presented in the final output.</Paragraph> </Section> </Section> </Paper>