<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1409"> <Title>An Extended Architecture for Robust Generation*</Title> <Section position="4" start_page="64" end_page="65" type="metho"> <SectionTitle> 3 Architecture </SectionTitle> <Paragraph position="0"> As described in section 1, the deep processing in VERBMOBIL is based on a pipeline of modules which use a unique interface language (VIT) that incorporates a semantic representation. Since this semantic representation is obviously grammar-independent and could reflect the effects of spoken, spontaneous language, we have no guarantee that the grammar covers the semantic representation given by the transfer module. Consequently, we have chosen to extend the classical generation architecture with a new module dedicated to robust preprocessing. We first present our classical generator architecture (see also (Becker et al., 1998; Becker et al., 2000)) in terms of the RAGS architecture and then discuss its extension to the task-inherent problems.</Paragraph> <Paragraph position="1"> The RAGS architecture (Cahill et al., 1999) is a reference architecture for natural language generation systems. Reflecting the common parts of natural language generation systems, this proposal aims to provide a standard architecture allowing the identification of important generation subtasks and resources. By presenting our system in the light of the RAGS specifications, our goal is to propose general solutions that could be used by other researchers who need to extend their own generation architecture to similar tasks.</Paragraph> <Paragraph position="2"> While the macro-planning task is important and mandatory in text generation, it is limited in dialogue translation. Most of the related problems, for instance sentence segmentation and pronoun choice, have already been solved by the user in the source language. 
Considering the RAGS architecture, the conceptual and rhetorical levels of representation are also outside the scope of our system. Our architecture consists of four main modules (see figure 2). For an easy adaptation to other domains and languages, we have emphasized an organization based on a general kernel system and the declarativity of knowledge sources (Becker et al., 1998). All but the first module are captured by the RAGS architecture. However, the first module is dedicated solely to robustness in the specific speech-to-speech translation task and will be presented and discussed last in this section. It can easily be added to a RAGS-like system whose whiteboard is perfectly suited for the transformations that the robustness preprocessing module performs (VIT: Verbmobil Interface Term; Bos et al., 1996; Dorna, 1996).</Paragraph> <Paragraph position="3"> Microplanning Module At the level of sentence generation, the quality of the planning process depends on the interdependencies between conceptual semantics, predicative semantics, and syntax. A particular lexical choice can imply constraints on other lexical items. The role of the microplanner is to realize lexical choices in such a way that syntactic realization is possible and costly backtracking is prevented. The microplanning task can be viewed as a constraint solving problem and implemented using an adapted constraint solving mechanism in order to achieve efficiency, flexibility, and declarativity of knowledge. The microplanner produces a dependency tree representation indicating for each node a set of syntactic constraints to be fulfilled by the corresponding lexical syntactic units (predicate, tense, aspect, mood, etc.).</Paragraph> <Paragraph position="4"> Syntactic Realization Module This module is in charge of the concrete syntax generation. 
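The constraint-based view of microplanning described in this section can be sketched roughly as follows. The toy lexicon, feature names, and first-fit search are illustrative assumptions, not the actual VERBMOBIL microplanner (which uses an adapted constraint solver over dependency trees):

```python
# Toy lexicon: each semantic predicate has lexical variants, and a head
# variant imposes a syntactic constraint (here: case) on its dependent.
LEXICON = {
    "MEET": [{"lemma": "treffen", "arg_case": "acc"},
             {"lemma": "begegnen", "arg_case": "dat"}],
    "COLLEAGUE": [{"lemma": "Kollege", "cases": {"nom", "acc", "dat"}},
                  {"lemma": "Kollegin", "cases": {"nom", "acc", "dat"}}],
}

def microplan(head, dependent):
    """Search the variant combinations for a head and its dependent and
    return the first lexicalization whose constraints are consistent."""
    for h in LEXICON[head]:
        for d in LEXICON[dependent]:
            if h["arg_case"] in d["cases"]:   # constraint check
                return {"head": h["lemma"], "dep": d["lemma"],
                        "dep_constraints": {"case": h["arg_case"]}}
    return None  # no consistent lexicalization: constraint relaxation needed
```

In this sketch, failure (a `None` result) is what would trigger the relaxation and robustness machinery discussed later in the paper.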
The processing is based on a fully lexicalized Tree-Adjoining Grammar derived from the HPSG grammar used in the deep-processing parser module (Kasper et al., 1995; Becker, 1998).</Paragraph> <Paragraph position="5"> Surface Realization Module The syntactic realization module produces a derived tree from which the output string is extracted. The morphological features in preterminal nodes are used for inflection.</Paragraph> <Paragraph position="6"> The surface string is also annotated with syntactic information (phrase boundaries, aspect, sentence mood) that is exploited by the speech synthesis module.</Paragraph> <Paragraph position="7"> Robustness Preprocessing Module We have described three modules corresponding to classical tasks of generation systems and pointed out at the beginning of this section the necessity for robustness. Where can we integrate the required robustness in such a generation architecture? One approach could be the relaxation of constraints during syntactic realization (relaxing word order and/or dependency relations). One can argue against this approach that:</Paragraph> <Section position="1" start_page="65" end_page="65" type="sub_section"> <SectionTitle> 4.2 Conforming to the Interface Language Definition </SectionTitle> <Paragraph position="0"> The definition of the interface language comprises only its syntax and some semantic constraints.</Paragraph> <Paragraph position="1"> There is an implementation of expressions in the interface language as an abstract data type which can at least check syntactic conformity (Dorna, 1996).</Paragraph> <Paragraph position="2"> But we also have to deal with semantic faults.</Paragraph> <Paragraph position="3"> * 
There is no straightforward way to limit the relaxation of syntactic constraints only to the problematic structures.</Paragraph> <Paragraph position="4"> * We must be sure that the microplanning module can deal with problematic semantic input.</Paragraph> <Paragraph position="5"> These points suggest checking and repairing the inconsistencies of the semantic representation as early as possible, i.e., before sentence microplanning.</Paragraph> <Paragraph position="6"> Moreover, we show in the next section that most of the problems presented in section 2 can be identified based on the microplanner input.</Paragraph> <Paragraph position="7"> We now present the robust preprocessing module more concretely.</Paragraph> </Section> </Section> <Section position="5" start_page="65" end_page="66" type="metho"> <SectionTitle> 4 Robustness </SectionTitle> <Paragraph position="0"> In this section we describe the types of problems defined in section 2 using examples from our system and discuss how our module is made robust enough to handle many of these problems.</Paragraph> <Paragraph position="1"> Before the semantic representation is handed to microplanning, the robustness preprocessing module of the generator checks the input, inspecting its parts for known problems. For each problem found, the preprocessor lowers a confidence value for the generation output, which measures the reliability of our result. In a number of cases, we use heuristics to fix problems, aiming at improved output.</Paragraph> <Paragraph position="2"> As discussed in section 2, problems in the input to the generator can be technical or task-inherent.</Paragraph> <Paragraph position="3"> Technical problems manifest themselves as faults with respect to the interface language definition, whereas the task-inherent problems concern mismatches between a specific semantic expression and the coverage of the natural language grammar used in the generator. 
These mismatches are known as the generation gap (Meteer, 1990).</Paragraph> <Section position="1" start_page="65" end_page="65" type="sub_section"> <SectionTitle> 4.1 Declarativity </SectionTitle> <Paragraph position="0"> In our implementation, the robustness module is partly integrated into the constraint solving approach of the microplanning module. Using the constraint solving approach allows for a strict separation of algorithms (i.e., some constraint solving algorithm) and declarative knowledge sources. On this level, the rules (constraints) for robustness can be clearly separated from the microplanning rules, justifying our presentation of robustness as a separate module. The first example illustrating robust preprocessing concerns the connectedness of the semantic input graph. Our interface language requires an interface term to contain a connected semantic graph plus an index pointing to the root of the graph. Two types of problems can occur according to this definition: Disconnectedness of the Graph: The robustness preprocessor checks whether the input graph is in fact connected. If there are several disconnected parts, a distinct generation call is made for each subgraph. In the end, all sub-results are connected to produce a global result. We are currently working on a better heuristic to order the sub-results, taking into account information about the word order in the source language.</Paragraph> <Paragraph position="1"> Wrong Index: The robustness preprocessor tests whether the index points to the root of the graph or one of the subgraphs. For each subgraph without an adequate index, we compute a local root pointer which is used for further processing. This turned out to be an easy and reliable heuristic, leading to good results.</Paragraph> <Paragraph position="2"> There are several types of technical problems which cannot be repaired well. Minimally, these cases are detected, warning messages are produced, and the confidence value is lowered. We apply heuristics where possible. 
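The connectedness and index checks just described can be sketched as follows. The graph encoding (a dict from predicate labels to argument labels), the function names, and the concrete confidence penalties are assumptions made for illustration, not the actual VERBMOBIL code:

```python
def connected_components(graph):
    """Split a semantic graph (node -> list of argument nodes) into
    connected components, treating edges as undirected."""
    adj = {n: set() for n in graph}
    for node, args in graph.items():
        for a in args:
            if a in adj:
                adj[node].add(a)
                adj[a].add(node)
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        components.append(comp)
    return components

def local_root(graph, component):
    """Pick a node of the component that is no other node's argument."""
    args = {a for n in component for a in graph[n]}
    roots = [n for n in component if n not in args]
    return roots[0] if roots else next(iter(component))

def preprocess(graph, index, confidence=1.0):
    """One generation call per subgraph; repair a wrong or missing index
    with a local root pointer, lowering the confidence value each time."""
    calls = []
    comps = connected_components(graph)
    if len(comps) > 1:
        confidence *= 0.8          # input was disconnected
    for comp in comps:
        root = index if index in comp else local_root(graph, comp)
        if root != index:
            confidence *= 0.9      # index did not point into this subgraph
        calls.append((comp, root))
    return calls, confidence
```

For a disconnected input such as `{'a': ['b'], 'b': [], 'c': []}` with index `'a'`, this yields one generation call per subgraph and a lowered confidence value, mirroring the per-subgraph generation calls described above.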
Examples are uniqueness of labels (every semantic predicate must have a unique identifier), the use of undefined predicate names, and contradicting information (e.g., the use of a DEFINITE and an INDEFINITE quantifier for the same object). In the case of incorrect predicate classes, i.e., where a predicate is used with an undefined argument frame, only those parts of the input that are analyzed as correct are handled.</Paragraph> </Section> <Section position="2" start_page="65" end_page="66" type="sub_section"> <SectionTitle> 4.3 Falling into the Generation Gap </SectionTitle> <Paragraph position="0"> The robustness preprocessor does more than check for structural contradictions between input and interface language. Based on analyses of a large number of test suites, it is equipped with heuristics that help to bridge the generation gap, i.e., the unpredictability of whether a specific semantic structure can be mapped to an acceptable utterance in the target language. Some examples of heuristics used in our system are as follows: Conflicting Information: It is often inconsistent for several predicates to include the same dependent structure in their argument frames, e.g., two predicates describing different prepositions should not point to the same entity. We have to pick one possibility heuristically. Gaps in Generation Knowledge: There are input configurations that have no reflection within the generator's knowledge bases, e.g., the DISCOURSE predicate defining a sequence of otherwise unrelated parts of the input. 
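Checks of the kind listed above (unique labels, known predicate names, non-contradicting quantifiers) can be sketched as follows. The predication format, the toy predicate inventory, and the warning strings are invented for this illustration:

```python
# Toy inventory of defined predicate names (assumption for this sketch).
KNOWN_PREDICATES = {"DEFINITE", "INDEFINITE", "TEMPORAL_OR_LOCAL", "DISCOURSE"}

def check_input(predications):
    """Each predication is a (label, predicate_name, object_id) triple.
    Return a list of warnings; each warning would also lower confidence."""
    warnings = []
    seen_labels = set()
    quantifiers = {}  # object_id -> quantifier already seen for it
    for label, pred, obj in predications:
        if label in seen_labels:                      # labels must be unique
            warnings.append(f"duplicate label {label}")
        seen_labels.add(label)
        if pred not in KNOWN_PREDICATES:              # undefined predicate name
            warnings.append(f"undefined predicate {pred}")
        if pred in ("DEFINITE", "INDEFINITE"):        # contradicting quantifiers
            if quantifiers.get(obj, pred) != pred:
                warnings.append(f"contradicting quantifiers for {obj}")
            quantifiers[obj] = pred
    return warnings
```

Each warning corresponds to a detected technical problem; as in the paper, detection lowers the confidence value even when no repair is possible.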
The robustness preprocessor removes this predicate, thereby subdividing the connected graph into several unconnected ones, and continues as described above for disconnected graphs.</Paragraph> <Paragraph position="1"> Other examples of generation constraints that can conflict with the input are the occurrence of some specific cyclic subparts of graphs, self-referring predicates, and chains of predicates which are not realizable in generation.</Paragraph> <Paragraph position="2"> Robustness inside the Microplanner and the Syntactic Generator additionally helps to get rid of some generation gap problems: Contradictions to Generation Constraints: The knowledge bases of the generator (microplanning rules, grammar, and lexicon) describe constraints on the structure of the output utterance that might conflict with the input. A common problem in our system is the occurrence of subordinating predicates with empty obligatory arguments.</Paragraph> <Paragraph position="3"> Here the microplanner relaxes the constraint for argument completeness and hands over a structure to the syntactic generator that does not fulfill all syntactic constraints or contains elliptical arguments. In these cases, the grammar constraints for obligatory arguments are relaxed in the syntactic generator and elliptical arguments are allowed beyond the constraints of the grammar. The result is often output that reflects the spontaneous speech input, which we accept for the sake of robustness.</Paragraph> <Paragraph position="4"> Missing Attributes: Often, obligatory attributes of the semantic predicates are missing in the input, e.g., statements about the directionality of prepositions, agreement information, etc. The generator uses heuristics to choose a value on its own.</Paragraph> <Paragraph position="5"> Contradictions on the Semantic Level: Some attributes may lead to conflicts during generation, e.g., if a pronoun is given as SORT=HUMAN and TYPE=SPEAKER. 
The generator uses a heuristic to set the value of SORT in this case. Solving Part of the Analysis Task: Sometimes the input to the generator is underspecified in a way that it can be improved easily by using simple heuristics to &quot;continue analysis.&quot; A common example in our system is an input expression like &quot;on the third,&quot; which is often analyzed as ABSTR_NOM A ORD(3), i.e., an elliptical noun with ordinal number 3. We add the sort TIME_DOFM to the attributes of ABSTR_NOM so that, e.g., a semantic relation TEMPORAL_OR_LOCAL is correctly mapped to the German preposition &quot;an.&quot;</Paragraph> </Section> <Section position="3" start_page="66" end_page="66" type="sub_section"> <SectionTitle> 4.4 How much robustness? </SectionTitle> <Paragraph position="0"> There is a limit to the power of heuristics that we have determined using a large corpus of test data.</Paragraph> <Paragraph position="1"> Some examples of possible pitfalls: * When realizing &quot;empty nominals&quot; ABSTR_NOM as elliptical nouns, guessing the gender can cause problems: &quot;Thursday is my free day&quot; as FREE A DAY A ABSTR_NOM (with a reading as in &quot;day job&quot;) might result in &quot;*Donnerstag ist mein freies Tag.&quot; * Conflicts between sort and gender of a pronoun might be resolved incorrectly: &quot;Es (English: 'it') trifft sich ganz hervorragend&quot; with PRON(GENDER=NTR, SORT=HUMAN) should not be translated as &quot;#He is really great.&quot; Although the boundary beyond which deep translation cannot be achieved even with heuristics is somewhat arbitrary, the big advantage of deep processing lies in the fact that the system 'knows' its boundaries and actually fails when a certain level of quality cannot be guaranteed. 
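The behaviour described here, failing (and deferring to another module) rather than guessing once quality can no longer be guaranteed, can be pictured as a simple confidence gate. The function signature and the threshold value are invented for illustration; the actual module selection in VERBMOBIL is more involved:

```python
def choose_translation(deep_result, shallow_result, confidence, threshold=0.5):
    """Prefer the deep translation, but only when its confidence value
    survived robustness preprocessing; otherwise fall back to the output
    of a shallow translation module."""
    if deep_result is not None and confidence >= threshold:
        return deep_result
    return shallow_result
```

The point of the gate is exactly the paper's argument: deep processing knows when it has failed, so a fallback can be selected deliberately instead of emitting a low-quality guess.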
As discussed in section 1, in a dialogue system a bad translation might still be better than none at all, so one of the shallow modules can be selected when deep processing fails.</Paragraph> </Section> </Section> </Paper>