<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2173"> <Title>Multilingual authoring using feedback texts</Title> <Section position="2" start_page="0" end_page="1056" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The production of multilingual documentation has an obvious practical importance. Companies seeking global markets for their products must provide instructions or other reference materials in a variety of languages. Large political organizations like the European Union are under pressure to provide multilingual versions of official documents, especially when communicating with the public. This need is met mostly by human translation: an author produces a source document which is passed to a number of other people for translation into other languages. Human translation has several well-known disadvantages. It is not only costly but time-consuming, often delaying the release of the product in some markets; also the quality is uneven and hard to control (Hartley and Paris, 1997). For all these reasons, the production of multilingual documentation is an obvious candidate for automation, at least for some classes of document. Nobody expects that automation will be applied in the foreseeable future to literary texts ranging over wide domains (e.g. novels). However, there is a mass of non-literary material in restricted domains for which automation is already a realistic aim: instructions for using equipment are a good example.</Paragraph> <Paragraph position="1"> The most direct attempt to automate multilingual document production is to replace the human translator by a machine. The source is still a natural language document written by a human author; a program takes this source as input, and produces an equivalent text in another language as output. 
Machine translation has proved useful as a way of conveying roughly the information expressed by the source, but the output texts are typically poor and over-literal.</Paragraph> <Paragraph position="2"> The basic problem lies in the analysis phase: the program cannot extract from the source all the information that it needs in order to produce a good output text. This may happen either because the source is itself poor (e.g. ambiguous or incomplete), or because the source uses constructions and concepts that lie outside the program's range. Such problems can be alleviated to some extent by constraining the source document, e.g. through use of a 'Controlled Language' such as AECMA (1995).</Paragraph> <Paragraph position="3"> An alternative approach to translation is that of generating the multilingual documents from a non-linguistic source. In the case of automatic Multilingual Natural Language Generation (M-NLG), the source will be a knowledge base expressed in a formal language. By eliminating the analysis phase of MT, M-NLG can yield high-quality output texts, free from the 'literal' quality that so often arises from structural imitation of an input text. Unfortunately, this benefit is gained at the cost of a huge increase in the difficulty of obtaining the source. No longer can the domain expert author the document directly by writing a text in natural language. Defining the source becomes a task akin to building an expert system, requiring collaboration between a domain expert (who understands the subject-matter of the document) and a knowledge engineer (who understands the knowledge representation formalism). 
Owing to this cost, M-NLG has been applied mainly in contexts where the knowledge base is already available, having been created for another purpose (Iordanskaja et al., 1992; Goldberg et al., 1994); for discussion see Reiter and Mellish (1993).</Paragraph> <Paragraph position="4"> Is there any way in which a domain expert might author a knowledge base without going through this time-consuming and costly collaboration with a knowledge engineer? Assuming that some kind of mediation is needed between domain expert and knowledge formalism, the only alternative is to provide easier tools for editing knowledge bases. Some knowledge management projects have experimented with graphical presentations which allow editing by direct manipulation, so that there is no need to learn the syntax of a programming language; see for example Skuce and Lethbridge (1995).</Paragraph> <Paragraph position="5"> This approach has also been adopted in two M-NLG systems: GIST (Power and Cavallotto, 1996), which generates social security forms in English, Italian and German; and DRAFTER (Paris et al., 1995), which generates instructions for software applications in English and French.</Paragraph> <Paragraph position="6"> These projects were the first attempts to produce symbolic authoring systems - that is, systems allowing a domain expert with no training in knowledge engineering to author a knowledge base (or symbolic source) from which texts in many languages can be generated.</Paragraph> <Paragraph position="7"> Although helpful, graphical tools for managing knowledge bases remain at best a compromise solution. Diagrams may be easier to understand than logical formalisms, but they still lack the flexibility and familiarity of natural language text, as empirical studies on editing diagrammatic representations have shown (Kim, 1990; Petre, 1995); for discussion see Power et al. (1998). 
This observation has led us to explore a new possibility, at first sight paradoxical: that of a symbolic authoring system in which the current knowledge base is presented through a natural language text generated by the system.</Paragraph> <Paragraph position="8"> This kills two birds with one stone: the source is still a knowledge base, not a text, so no problem of analysis arises; but this source is presented to the author in natural language, through what we will call a feedback text. As we shall see, the feedback text has some special features which allow the author to edit the knowledge base as well as viewing its contents. We have called this editing method 'WYSIWYM', or 'What You See Is What You Meant': a natural language text ('what you see') presents a knowledge base that the author has built by purely semantic decisions ('what you meant').</Paragraph> <Paragraph position="9"> A basic WYSIWYM system has three components: * A module for building and maintaining knowledge bases. This includes a 'T-Box' (or 'terminology'), which defines the concepts and relations from which assertions in the knowledge base (or 'A-Box') will be formed.</Paragraph> <Paragraph position="10"> * Natural language generators for the languages supported by the system. As well as producing output texts from complete knowledge bases, these generators will produce feedback texts from knowledge bases in any state of completion.</Paragraph> <Paragraph position="11"> * A user interface which presents output or feedback texts to the author. The feedback texts will include mouse-sensitive 'anchors' allowing the author to make semantic decisions, e.g. by selecting options from pop-up menus.</Paragraph> <Paragraph position="12"> The WYSIWYM system allows a domain expert speaking any one of the supported languages to produce good output texts in all of them. A more detailed description of the architecture is given in Scott et al. 
(1998).</Paragraph> <Paragraph position="13"> 2 Example of a WYSIWYM system The first application of WYSIWYM was DRAFTER-II, a system which generates instructions for using word processors and diary managers. At present three languages are supported: English, French and Italian. As an example, we will follow a session in which the author encodes instructions for scheduling an appointment with the OpenWindows Calendar Manager. The desired content is shown by the following output text, which the system will generate when the knowledge base is complete: To schedule the appointment: Before starting, open the Appointment Editor window by choosing the Appointment option from the Edit menu.</Paragraph> <Paragraph position="14"> Then proceed as follows: 1 Choose the start time of the appointment. 2 Enter the description of the appointment in the What field.</Paragraph> <Paragraph position="15"> 3 Click on the Insert button.</Paragraph> <Paragraph position="16"> In outline, the knowledge base underlying this text is as follows. The whole instruction is represented by a procedure instance with two attributes: a goal (scheduling the appointment) and a method. The method instance also has two attributes: a precondition (expressed by the sentence beginning 'Before starting') and a sequence of steps (presented by the enumerated list). Preconditions and steps are procedures in their turn, so they may have methods as well as goals. Eventually we arrive at sub-procedures for which no method is specified: it is assumed that the reader of the manual will be able to click on the Insert button without being told how.</Paragraph> <Paragraph position="17"> Since in DRAFTER-II every output text is based on a procedure, a newly initialised knowledge base is seeded with a single procedure instance for which the goal and method are undefined. 
In Prolog notation, we can represent such a knowledge base by the following assertions: procedure(proc1).</Paragraph> <Paragraph position="18"> goal(proc1, A).</Paragraph> <Paragraph position="19"> method(proc1, B).</Paragraph> <Paragraph position="20"> Here proc1 is an identifier for the procedure instance; the assertion procedure(proc1) means that this is an instance of type procedure; and the assertion goal(proc1, A) means that proc1 has a goal attribute for which the value is currently undefined (hence the variable A).</Paragraph> <Paragraph position="21"> When a new knowledge base is created, DRAFTER-II presents it to the author by generating a feedback text in the currently selected language. Assuming that this language is English, the instruction to the generator will be generate(proc1, english, feedback) and the feedback text displayed to the author will be Achieve this goal by applying this method.</Paragraph> <Paragraph position="22"> This text has several special features.</Paragraph> <Paragraph position="23"> * Undefined attributes are shown through anchors in bold face or italics. (The system actually uses a colour code: red instead of bold face, and green instead of italics.) * A red anchor (bold face) indicates that the attribute is obligatory: its value must be specified. A green anchor (italics) indicates that the attribute is optional.</Paragraph> <Paragraph position="24"> * All anchors are mouse-sensitive. By clicking on an anchor, the author obtains a pop-up menu listing the permissible values of the attribute; by selecting one of these options, the author updates the knowledge base.</Paragraph> <Paragraph position="25"> Although the anchors may be tackled in any order, we will assume that the author proceeds from left to right. Clicking on this goal yields a pop-up menu (to save space, the figure omits some options), from which the author selects 'schedule'. 
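The anchor conventions listed above can be sketched in code. The following Python fragment is an illustration only, not part of DRAFTER-II (whose knowledge base is expressed as Prolog assertions): the dictionary layout, the names TBOX and anchors, and the assumption that goal is obligatory while method is optional are ours.

```python
# Hypothetical T-Box fragment: for each concept, its attributes and a flag
# saying whether the attribute is obligatory. The flags are assumptions.
TBOX = {"procedure": {"goal": True, "method": False}}

# A newly seeded knowledge base: one procedure instance whose attributes
# are both undefined (None stands for an unbound Prolog variable).
kb = {"proc1": {"type": "procedure", "goal": None, "method": None}}


def anchors(kb, tbox):
    """Return (instance, attribute, colour) for every undefined attribute."""
    found = []
    for name, inst in kb.items():
        for attr, obligatory in tbox[inst["type"]].items():
            if inst[attr] is None:
                # Red marks an obligatory attribute, green an optional one.
                found.append((name, attr, "red" if obligatory else "green"))
    return found


print(anchors(kb, TBOX))
```

Each tuple corresponds to one anchor that a feedback text would have to render; choosing the wording ('this goal', 'this method') is left to the generator and is not modelled here.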
Each option in the menu is associated with an 'updater', a Prolog term (not shown to the author) that specifies how the knowledge base should be updated if the option is selected. In this case the updater is insert(proc1, goal, schedule) meaning that an instance of type schedule should become the value of the goal attribute on proc1. Running the updater yields an extended knowledge base, including a new instance sched1 with an undefined attribute actee. (Assertions describing attribute values are indented to make the knowledge base easier to read.) procedure(proc1).</Paragraph> <Paragraph position="26"> goal(proc1, sched1).</Paragraph> <Paragraph position="27"> schedule(sched1).</Paragraph> <Paragraph position="28"> actee(sched1, C).</Paragraph> <Paragraph position="29"> method(proc1, B).</Paragraph> <Paragraph position="30"> From the updated knowledge base, the generator produces a new feedback text.</Paragraph> <Paragraph position="31"> Schedule this event by applying this method.</Paragraph> <Paragraph position="32"> Note that this text has been completely regenerated. It was not produced from the previous text merely by replacing the anchor this goal by a longer string.</Paragraph> <Paragraph position="33"> Continuing to specify the goal, the author now clicks on this event.</Paragraph> <Paragraph position="34"> [Pop-up menu options: appointment, meeting] This time the intended selection is 'appointment', but let us assume that by mistake the author drags the mouse too far and selects 'meeting'. The feedback text Schedule the meeting by applying this method.</Paragraph> <Paragraph position="35"> immediately shows that an error has been made, but how can it be corrected? This problem is solved in WYSIWYM by allowing the author to select any span of the feedback text that represents an attribute with a specified value, and to cut it, so that the attribute becomes undefined, while its previous value is held in a buffer. 
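The insert updater and the cut operation just described can be sketched as operations on a small attribute store. This is a minimal Python illustration under stated assumptions, not the system's Prolog implementation: the class KnowledgeBase, the T-Box fragment and the instance-naming scheme (which yields names such as schedule2 rather than sched1) are hypothetical.

```python
# Hypothetical T-Box fragment: the attributes that each concept carries.
TBOX = {"procedure": ["goal", "method"], "schedule": ["actee"], "meeting": []}


class KnowledgeBase:
    def __init__(self):
        self.instances = {}  # instance name -> {"type": ..., attribute: value or None}
        self.buffer = None   # name of the instance most recently cut

    def new_instance(self, concept):
        # Naming by concept plus running number is an assumption of this sketch.
        name = f"{concept}{len(self.instances) + 1}"
        self.instances[name] = {"type": concept, **{a: None for a in TBOX[concept]}}
        return name

    def insert(self, owner, attr, concept):
        """Updater: a fresh instance of `concept` becomes the value of `attr`."""
        self.instances[owner][attr] = self.new_instance(concept)

    def cut(self, owner, attr):
        """Updater: make `attr` undefined and hold its old value in the buffer."""
        self.buffer = self.instances[owner][attr]
        self.instances[owner][attr] = None


kb = KnowledgeBase()
proc = kb.new_instance("procedure")   # seeded knowledge base
kb.insert(proc, "goal", "schedule")   # the author selects 'schedule'
sched = kb.instances[proc]["goal"]
kb.insert(sched, "actee", "meeting")  # the author selects 'meeting' by mistake
kb.cut(sched, "actee")                # the author cuts 'the meeting'
print(kb.instances[sched]["actee"], kb.buffer)
```

In this sketch the cut instance stays in the store and is merely detached from its attribute; whether the real system deletes or retains it is not something the sketch takes a position on.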
Even large spans, representing complex attribute values, can be treated in this way, so that complex chunks of knowledge can be copied across from one knowledge base to another. When the author selects the phrase 'the meeting', the system displays a pop-up menu with two options. By selecting 'Cut', the author activates the updater cut(sched1, actee) which updates the knowledge base by removing the instance meet1, currently the value of the actee attribute on sched1, and holding it in a buffer. With this attribute now undefined, the feedback text reverts to Schedule this event by applying this method.</Paragraph> <Paragraph position="36"> whereupon the author can once again expand this event. This time, however, the pop-up menu that opens on this anchor will include an extra option: that of pasting back the material that has just been cut. Of course this option is only provided if the instance currently held in the buffer is a suitable value for the attribute represented by the anchor.</Paragraph> <Section position="1" start_page="1055" end_page="1056" type="sub_section"> <SectionTitle> Paste appointment meeting </SectionTitle> <Paragraph position="0"> The 'Paste' option here will be associated with the updater paste(sched1, actee) which would assign the instance currently in the buffer, in this case meet1, as the value of the actee attribute on sched1. Fortunately the author avoids reinstating this error, and selects 'appointment', yielding the following reassuring feedback text: Schedule the appointment by applying this method.</Paragraph> <Paragraph position="1"> Note incidentally that this text presents a knowledge base that is potentially complete, since all obligatory attributes have been specified. This can be immediately seen from the absence of any red (bold) anchors.</Paragraph> <Paragraph position="2"> Intending to add a method, the author now clicks on this method. 
In this case, the pop-up menu shows only one option: [ method ] Running the associated updater yields the following knowledge base: procedure(proc1).</Paragraph> <Paragraph position="3"> goal(proc1, sched1).</Paragraph> <Paragraph position="4"> schedule(sched1).</Paragraph> <Paragraph position="5"> actee(sched1, appt1).</Paragraph> <Paragraph position="6"> appointment(appt1).</Paragraph> <Paragraph position="7"> method(proc1, method1).</Paragraph> <Paragraph position="8"> method(method1).</Paragraph> <Paragraph position="9"> precondition(method1, D).</Paragraph> <Paragraph position="10"> steps(method1, steps1).</Paragraph> <Paragraph position="11"> steps(steps1).</Paragraph> <Paragraph position="12"> first(steps1, proc2).</Paragraph> <Paragraph position="13"> procedure(proc2).</Paragraph> <Paragraph position="14"> goal(proc2, F).</Paragraph> <Paragraph position="15"> method(proc2, G).</Paragraph> <Paragraph position="16"> rest(steps1, E).</Paragraph> <Paragraph position="17"> meeting(meet1).</Paragraph> <Paragraph position="18"> A considerable expansion has taken place here because the system has been configured to automatically instantiate obligatory attributes that have only one permissible type of value. (In other words, it never presents red anchors with pop-up menus having only one option.) Since the steps attribute on method1 is obligatory, and must have a value of type steps, the instance steps1 is immediately created. In its turn, this instance has the attributes first and rest (it is a list), where first is obligatory and must be filled by a procedure. A second procedure instance proc2 is therefore created, with its own goal and method. 
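The expansion rule described above (obligatory attributes with a single permissible type are filled in automatically) lends itself to a short recursive sketch. The Python below is illustrative only and rests on assumptions: the T-Box table, the optional status given to precondition and rest, the list of goal types and the instance-naming scheme are hypothetical rather than taken from DRAFTER-II.

```python
# Hypothetical T-Box: concept -> {attribute: (obligatory, permissible value types)}.
TBOX = {
    "procedure": {"goal": (True, ["schedule", "choose", "click"]),
                  "method": (False, ["method"])},
    "method": {"precondition": (False, ["procedure"]),
               "steps": (True, ["steps"])},
    "steps": {"first": (True, ["procedure"]),
              "rest": (False, ["steps"])},
}


def instantiate(kb, concept):
    """Create an instance of `concept`, auto-filling forced attributes."""
    count = sum(1 for inst in kb.values() if inst["type"] == concept)
    name = f"{concept}{count + 1}"  # naming scheme is an assumption
    kb[name] = {"type": concept}
    for attr, (obligatory, types) in TBOX.get(concept, {}).items():
        if obligatory and len(types) == 1:
            # Only one possible choice: make it on the author's behalf.
            kb[name][attr] = instantiate(kb, types[0])
        else:
            kb[name][attr] = None  # left as an anchor for the author
    return name


kb = {}
proc = instantiate(kb, "procedure")           # seeded knowledge base
kb[proc]["method"] = instantiate(kb, "method")  # the author selects 'method'
for name, inst in kb.items():
    print(name, inst)
```

Under these assumptions the single selection creates a method, a steps list and a second procedure, while goal stays undefined on both procedures because more than one type of value is permissible for it.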
To incorporate all this new material, the feedback text is recast in a new pattern, the main goal being expressed by an infinitive construction instead of an imperative: To schedule the appointment: First, achieve this precondition.</Paragraph> <Paragraph position="19"> Then follow these steps.</Paragraph> <Paragraph position="20"> Note that at any stage the author can switch to one of the other supported languages, e.g. French. This will result in a new call to the generator generate(proc1, french, feedback) and hence in a new feedback text expressing the procedure proc1.</Paragraph> <Paragraph position="21"> Insertion du rendez-vous: Avant de commencer, accomplir cette tâche.</Paragraph> <Paragraph position="22"> Exécuter les actions suivantes.</Paragraph> <Paragraph position="23"> 1 Exécuter cette action en appliquant cette méthode.</Paragraph> </Section> </Section> </Paper>