<?xml version="1.0" standalone="yes"?> <Paper uid="H90-1009"> <Title>Interactive Multimedia Explanation for Equipment Maintenance and Repair</Title> <Section position="1" start_page="0" end_page="46" type="abstr"> <SectionTitle> Introduction </SectionTitle> <Paragraph position="0"> COMET (COordinated Multimedia Explanation Testbed) is an experimental system that generates interactive multimedia explanations of how to operate, maintain, and repair equipment. Our research stresses the dynamic generation of the content and form of all material presented, addressing issues in the generation of text and graphics, and in coordinating text and graphics in an integrated presentation.</Paragraph> <Paragraph position="1"> COMET contains a static knowledge base describing objects and plans for maintenance and repair, and a dynamic knowledge source for diagnosing failures. A menu interface allows users to request explanations of specific procedures and to specify failure symptoms that will invoke a diagnostic component. The diagnostic component can ask the user to carry out procedures that COMET will explain if requested. In contrast to hypermedia systems that present previously authored material, COMET has underlying models of the user and context that allow each aspect of the explanation generated to be based on the current situation.</Paragraph> <Paragraph position="2"> In this paper we discuss recent progress on COMET, including the development of an interface for user input, the integration of its individual modules into a working system, and further results in our work on the media coordinator, the text generator, and the graphics generator.</Paragraph> <Section position="1" start_page="0" end_page="42" type="sub_section"> <SectionTitle> System Overview </SectionTitle> <Paragraph position="0"> COMET consists of the major components illustrated in Fig. 1. On receiving a request for an explanation, the content planner uses text plans, or schemas, to determine which information from the underlying knowledge sources should be included in the explanation. COMET uses three different knowledge sources: a static representation of the domain encoded in LOOM [11], a rule base learned over time [2], and a detailed geometric knowledge base necessary for the generation of graphics [12]. The content planner produces the full content for the explanation, represented as a hierarchy of logical forms (LFs) [1], which are passed to the media coordinator. The media coordinator refines the LFs by adding directives indicating which portions are to be produced by each of a set of media-specific generation systems. The annotated LFs are passed to the text and graphics generators, whose output is assembled by the media layout component, which formats the final presentation for the low-level rendering and typesetting software. Much of our work on COMET has been done in a maintenance and repair domain for the US Army AN/PRC-119 portable radio receiver-transmitter [3].</Paragraph> <Paragraph position="1"> Currently, the system runs in parallel on five Sun and HP machines, one for each of COMET's most computation-intensive modules, which communicate through pipes. A user interacts with COMET through an X11 menu interface, using menus that are created on the fly by the system. At the highest level, the user can choose to request an explanation of an explicit repair procedure directly, or can specify that troubleshooting help is needed. When help is requested, the underlying diagnostic system is invoked and the user is asked to specify symptoms of the failure from the menu shown in Fig. 2.</Paragraph>
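To make the data flow above concrete, the following is a minimal Python sketch of the pipeline. All class and function names are invented for illustration; the actual system is a set of cooperating modules communicating through pipes, not a single program.

```python
from dataclasses import dataclass, field

@dataclass
class LogicalForm:
    """One node in the hierarchy of logical forms (LFs)."""
    content: dict                                # content selected from the knowledge sources
    media: set = field(default_factory=set)      # media directives added by the coordinator
    children: list = field(default_factory=list)

def generate_explanation(request, content_planner, media_coordinator,
                         text_generator, graphics_generator, layout):
    """Hypothetical end-to-end flow, mirroring the components in Fig. 1."""
    lf_root = content_planner.plan(request)          # schemas select the content
    media_coordinator.assign_media(lf_root)          # annotate LFs with media directives
    text = text_generator.realize(lf_root)           # portions marked for text
    pictures = graphics_generator.realize(lf_root)   # portions marked for graphics
    return layout.compose(text, pictures)            # format the final presentation
```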
<Paragraph position="2"> In the course of diagnosing the failure, COMET will ask the user to carry out certain troubleshooting procedures. For example, if the user indicates a loss of memory in the radio, COMET will generate a multiple-step test procedure. Each step is shown sequentially on the display. The user can request an explanation of any step, or can move forward or backward in the generated explanation, by using the menu interface.</Paragraph> <Paragraph position="3"> Figure 3 shows one display from an explanation generated by COMET.</Paragraph> </Section> <Section position="2" start_page="42" end_page="44" type="sub_section"> <SectionTitle> Media Coordination </SectionTitle> <Paragraph position="0"> In previous work [6] we focused on three features of our media coordinator: the use of a common content description language by each media-specific generator, allowing goals and information to be mapped to media-specific resources; the ability to make a fine-grained division of information between media; and the ability for information expressed in one medium only to influence the realization of information in the other. In this paper, we describe our recent advances in coordinating sentence breaks with picture breaks.</Paragraph> <Paragraph position="1"> Informal experiments that we carried out when designing the media coordinator indicated that our subjects strongly prefer sentence breaks to coincide with picture breaks [10]. While more than one sentence may appear with a single picture, there was a strong objection to sentences that run across picture boundaries. For example, in Fig. 3, users would prefer a sentence break to correspond to the two pictures: &quot;Loosen the captive screws.&quot; and &quot;Pull the holding battery cover plate off of the radio.&quot; Coordinating sentence and picture breaks requires bidirectional interaction between COMET's text and graphics generators: graphical constraints on picture size may sometimes force delimitation of sentences, while grammatical constraints on sentence construction may sometimes control picture size.</Paragraph> <Paragraph position="2"> Our implementation of sentence-picture coordination involves three stages of processing.</Paragraph> <Paragraph position="3"> In the first stage of processing, the text and graphics generators separately annotate their own copies of the LF to indicate minimal sentence and picture break locations.</Paragraph> <Paragraph position="4"> In our current implementation, when the verb for the sentence is selected, the text generator annotates the LF to indicate the grammatical sentence with the smallest number of constituents that can be formed. The lexicon contains the required set of inherent roles for each verb; these are the case roles that must be present to form a grammatical sentence. For example, when the verb &quot;reinstall&quot; is selected, there are two required inherent case roles, the agent and the medium (note that the agent can be omitted in imperative sentences). Thus, the sentence &quot;Reinstall the primary battery.&quot; is perfectly grammatical. However, if the verb &quot;return&quot; is selected, there are three required inherent case roles: agent, medium, and to-location. Thus, while the sentence &quot;Return the primary battery to the radio.&quot; is acceptable, &quot;Return the primary battery.&quot; is not in this context. The text generator will annotate the LF corresponding to these two sentences differently when the verb is selected. If &quot;reinstall&quot; is selected, the agent and medium roles are each annotated with an attribute indicating that they are required. If &quot;return&quot; is selected, the to-loc role is also annotated.</Paragraph>
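A small Python sketch of this first stage follows. The lexicon fragment and data structures are illustrative assumptions, not COMET's actual representations.

```python
from dataclasses import dataclass, field

# Hypothetical lexicon fragment: each verb lists the inherent case roles it
# requires to form a grammatical sentence (the agent is left out here
# because it can be omitted in imperatives).
REQUIRED_ROLES = {
    "reinstall": {"medium"},             # "Reinstall the primary battery."
    "return":    {"medium", "to-loc"},   # "Return the primary battery to the radio."
}

@dataclass
class LFNode:
    roles: dict                                    # role name -> sub-LF or concept
    annotations: dict = field(default_factory=dict)

def annotate_minimal_sentence(lf: LFNode, verb: str) -> LFNode:
    """Mark the roles that must be realized in the same minimal sentence as the verb."""
    for role in REQUIRED_ROLES[verb]:
        lf.annotations[role] = "required"
    return lf

lf = LFNode(roles={"medium": "c-primary-battery", "to-loc": "c-radio"})
annotate_minimal_sentence(lf, "return")
assert lf.annotations == {"medium": "required", "to-loc": "required"}
```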
<Paragraph position="5"> In the second stage of processing, if the text generator has a choice of verbs, it will check the graphics generator's placement of picture breaks. For example, if there is no reason to select &quot;return&quot; over &quot;reinstall&quot;, then the text generator will read the graphics generator's copy of the LF by unifying it with its own. This has the effect of adding the graphics annotations to the text generator's copy. If two pictures were used to express the action (e.g., one for the installing action and a second to indicate the location), text would select &quot;reinstall&quot; and would generate a second sentence to accompany the second picture that conveys the location (e.g., &quot;Place it on the radio socket.&quot;). However, if a single picture expressing both the installation action and location were generated, then the verb &quot;return&quot; would be selected and a single sentence would be generated to accompany the picture. We are in the process of implementing this second stage.</Paragraph> <Paragraph position="6"> In the third and final stage, the text generator will check if there are conflicts between minimal sentence size and the graphics generator's assignment of picture breaks. If graphics generates more than one picture for the information required for a minimal grammatical sentence, text will attempt to select two basic verbs that together convey the meaning of the verb originally selected, and that individually correspond to the information in the two pictures. For example, reinstalling the battery consists of first placing it on the radio and then snapping some latches. If each of these steps is portrayed in a separate picture, then text can select the verbs &quot;place&quot; and &quot;snap&quot; to convey the compositional meaning of &quot;reinstall&quot; and generate two separate sentences. This stage is also currently under development.</Paragraph>
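The decision logic of the second and third stages might be sketched as follows. Reducing the unified LF annotations to a simple picture count, and all names, are simplifying assumptions of this sketch.

```python
def stage2_choose_verb(n_pictures: int) -> str:
    """Prefer the verb whose required roles match graphics' picture breaks."""
    # One picture expressing both action and location -> "return", one sentence;
    # two pictures -> "reinstall", with a second sentence conveying the location.
    return "return" if n_pictures == 1 else "reinstall"

def stage3_decompose(verb: str) -> list[str]:
    """Split a verb across pictures when one picture cannot cover a minimal sentence."""
    # e.g. reinstalling the battery = placing it on the radio, then snapping the latches,
    # so each picture gets its own sentence built around a simpler verb.
    decompositions = {"reinstall": ["place", "snap"]}
    return decompositions.get(verb, [verb])

assert stage2_choose_verb(1) == "return"
assert stage3_decompose("reinstall") == ["place", "snap"]
```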
<Paragraph position="7"> Note that after text and graphics are generated with coordinated breaks, it will be necessary to lay them out so that relationships between corresponding material in different media are clearly visible. Although COMET's current media layout component does not take these relationships into account, we have begun to design a new one that will, building on our previous work on automated layout [7].</Paragraph> <Paragraph position="8"> Text Generation
One focus in the text generation component has been the selection of appropriate vocabulary for the explanation. We have developed a framework for lexical choice using the Functional Unification Formalism (FUF) [9, 4, 5]. In addition, we have identified how previous discourse and the underlying knowledge sources influence lexical choice, and have implemented these influences as part of the lexical chooser.</Paragraph> <Paragraph position="9"> The lexical chooser is part of the text generator. It receives its input from the media coordinator and passes its output to the surface generator, which contains COMET's grammar and constructs the grammatical structure of the sentence. As output, the lexical chooser produces a list of partially specified functional descriptions (PSFDs) that are passed as input to the surface generator.</Paragraph> <Paragraph position="10"> Thus, a PSFD is basically a lexicalized LF (using the special feature lex) that, in addition, specifies the overall grammatical form of the utterance (e.g., declarative). COMET's grammar will enrich the PSFD with syntactic features to form a complete syntactic structure that is then linearized to produce a sentence.</Paragraph> <Paragraph position="11"> In the general case, the mapping between an LF and a PSFD is done as follows: each simple action in the LF is mapped onto a clause of the PSFD, and each object description in the LF onto a nominal (i.e., a noun phrase, pronoun, or proper noun) of the clause. The process of the action is mapped onto a verb of the clause. Both mappings are made by unifying the description with a Functional Unification Lexicon (FUL). However, while unification in FUF is normally performed top-down, unification with a FUL is performed bottom-up, starting with the most embedded sub-LFs. This is because the lexicalizations of the process roles sometimes constrain the possible lexicalizations of the process itself (i.e., the verb). As an example, consider a case where semantic features in the knowledge base are used to select the verb of the sentence. Fig. 4 presents two LFs of the concept c-turn, with c-channel-knob and c-radio-transmitter as the respective mediums. In this example, the input LF contains a process that is a c-turn. In Fig. 4(a), the verb &quot;to set&quot; is selected because the medium (the object being turned) has discrete settings, as is the case for the c-channel-knob. In Fig. 4(b), the medium does not have discrete settings, as is the case for the c-radio-transmitter, and the verb &quot;to turn&quot; is selected.</Paragraph> <Paragraph position="12"> For each example, the lexicon is first accessed to lexicalize the object concepts embedded in the roles of the top-level LF: c-channel-knob by &quot;channel knob&quot;, c-radio-transmitter by &quot;radio&quot;, c-position-1 by &quot;position 1&quot;, and c-front-panel by &quot;front panel&quot;. It is then accessed again to lexicalize the process concept of the top-level LF: c-turn by &quot;to set&quot; in Fig. 4(a), where c-channel-knob is the medium, and by &quot;to turn&quot; in Fig. 4(b), where c-radio-transmitter is the medium. In selecting the verb, the lexical chooser invokes a function that accesses the knowledge base to check whether the medium is an instance of a discrete knob or not. If it is, the verb &quot;to set&quot; is chosen. Otherwise, &quot;to turn&quot; is chosen.</Paragraph> <Paragraph position="13"> As illustrated in Fig. 5, this lexical choice is implemented by using a special feature of FUF termed CONTROL in the FUL entry for the concept c-turn. It allows invocation of an arbitrary LISP predicate during the unification process. Only if this predicate is satisfied will unification of the FD containing the CONTROL pair succeed. In this example, CONTROL is used to have the FUL directly query the knowledge base for additional information about the medium of c-turn.</Paragraph>
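The discrete-settings test behind the &quot;set&quot;/&quot;turn&quot; choice can be sketched in Python. In COMET itself this is a LISP predicate attached to the FUL entry for c-turn via CONTROL; the knowledge-base contents and function names below are invented for illustration.

```python
# Stand-in for the knowledge base queried during unification.
KNOWLEDGE_BASE = {
    "c-channel-knob":      {"discrete-settings": True},
    "c-radio-transmitter": {"discrete-settings": False},
}

def has_discrete_settings(concept: str) -> bool:
    """The kind of predicate a CONTROL pair would invoke during unification."""
    return KNOWLEDGE_BASE[concept]["discrete-settings"]

def lexicalize_turn(medium: str) -> str:
    """Lexicalize the process c-turn according to its medium's semantic features."""
    return "set" if has_discrete_settings(medium) else "turn"

assert lexicalize_turn("c-channel-knob") == "set"        # "Set the channel knob to position 1."
assert lexicalize_turn("c-radio-transmitter") == "turn"  # "Turn the radio ..."
```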
<Paragraph position="14"> COMET's lexical chooser can also choose between words based on context. For example, it will choose the verb &quot;reinstall&quot; or &quot;return&quot; in place of &quot;install&quot; when it instructs the user to install an object that it has previously instructed the user to remove. For each action that has an inverse action, COMET checks whether it has already instructed the user to perform the inverse action in the current explanation. If so, it will select a verb reflecting the inverse. Consider the partial set of instructions for troubleshooting loss of memory in Fig. 6. With no previous discourse, COMET selects the verb &quot;install&quot; to describe the installation of the holding battery. However, after it has instructed the user to &quot;remove&quot; the primary battery and &quot;pull&quot; the battery box away from the radio, COMET selects the verbs &quot;reinstall&quot; and &quot;return&quot; to lexicalize the same installation process.</Paragraph> <Paragraph position="15"> [Fig. 6. Partial instructions for troubleshooting loss of memory: &quot;Install the new holding battery.&quot; ... &quot;Remove the primary battery.&quot; ... &quot;Pull the battery box away from the radio.&quot; ... &quot;Reinstall the primary battery: Return the primary battery to the radio, reinstall the battery box, and snap the latches.&quot;]</Paragraph> <Paragraph position="16"> The use of the unification algorithm for lexical choice is a novel approach that allows for the integration of various types of constraints in a uniform formalism. For example, in COMET, the choice of verb for a process has been constrained simultaneously by its location in the domain hierarchy, by the semantic features of its roles, and by the contextual features of the previous discourse. FUF is also extensible, and ultimately will allow for extensive interaction between the lexical chooser and grammar through a uniform formalism.</Paragraph> </Section> <Section position="3" start_page="44" end_page="46" type="sub_section"> <SectionTitle> Graphics Generation </SectionTitle> <Paragraph position="0"> Work on graphics generation in COMET has concentrated on the development of an approach for generating technical illustrations of 3D objects, embodied in the rule-based graphics generator IBIS (Intent-Based Illustration System) [12]. As in COMET's text generation component, all material is created on the fly, making it possible for the explanation to be customized to the individual user and situation.</Paragraph> <Paragraph position="1"> Each of IBIS's illustrations is created by an illustrator, which designs its illustration to fulfill a set of communicative goals derived from the LF that is presented to it. The illustrator realizes these goals by creating an illustration that includes a set of objects to be depicted and their attributes, a lighting specification that determines how the objects are lit, and a viewing specification that indicates how the 3D objects are to be projected onto the 2D display. In designing an illustration, the illustrator relies on a set of rules that form an illustration style.</Paragraph>
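Schematically, an IBIS-style illustrator might look like the following Python sketch. The goal and rule types are invented stand-ins; the real system is rule-based and operates on 3D geometric models.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Illustration:
    objects: list = field(default_factory=list)   # objects to depict, with attributes
    lighting: dict = field(default_factory=dict)  # how the objects are lit
    view: dict = field(default_factory=dict)      # 3D-to-2D projection parameters

@dataclass
class StyleRule:
    """One rule of an illustration style (both members are invented callables)."""
    applies: Callable[[str, Illustration], bool]
    apply: Callable[[str, Illustration], None]

def design_illustration(goals: list[str], style: list[StyleRule]) -> Illustration:
    """Fulfill each communicative goal using the rules of an illustration style."""
    design = Illustration()
    for goal in goals:
        for rule in style:
            if rule.applies(goal, design):
                rule.apply(goal, design)  # e.g. include an object, adjust the view
    return design
```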
<Paragraph position="2"> By default, IBIS attempts to express the contents of a logical form in a single illustration. There are many situations, however, in which this cannot be accomplished. For example, an illustration may need to show two objects that are not simultaneously visible from the same viewpoint. Alternatively, two objects to be included may be visible, but may be of sufficiently different size or distance from the viewpoint that showing one in its entirety may necessitate showing the other at too small a size for it to be legible. The objects to be depicted may even include the same object at different points in time. In all of these cases, IBIS can generate composite illustrations, much as COMET's text generator can create compound sentences.</Paragraph> <Paragraph position="3"> A composite illustration contains nested subpictures whose objects, lighting specification, or viewing specification may differ. Each subpicture is generated by an illustrator that is spawned by the parent picture's illustrator, and that is given a subset of the parent illustrator's goals to fulfill [12].</Paragraph> <Paragraph position="4"> IBIS's rules have recently been expanded to deal with certain cases in which an illustration's design should be influenced by previously generated illustrations. One example of this is the incorporation of constraints from previously selected viewing specifications. When two pictures are displayed in spatial or temporal sequence, small changes in viewing specification can be disconcerting, and may appear to be the result of accidental, rather than intentional, camera movement. For example, cinematographers often use a rule of thumb that a change of viewing specification corresponding to less than a 30-degree rotation about the object of interest is too small [8]. When generating an illustration, IBIS takes into account the viewing specification used in previous illustrations to avoid small changes. Otherwise, attempts to optimize each viewing specification for the individual illustration's goals would result in a picture whose &quot;locally optimal&quot; viewing specification would not be as effective in the context of those pictures already generated.</Paragraph> <Paragraph position="5"> IBIS designed the illustrations in Fig. 3, taking into account the viewing specification of the left illustration when generating the right illustration. In contrast, Fig. 7 includes an earlier version of the right illustration, created with a rule base that does not incorporate these constraints on the viewing specification. Note how the locally optimized subpictures of Fig. 7 look somewhat inconsistent when viewed next to each other.</Paragraph> <Paragraph position="6"> [Fig. 7. Older version of Fig. 3 without constraints from previous picture generation.]</Paragraph> <Paragraph position="7"> Although IBIS currently completes the processing of each LF before starting on the next, it could use lookahead, as well as lookbehind, to delay making certain decisions until additional information about succeeding illustrations is known. For example, this would allow an illustration's viewing specification to be based on the contents of those illustrations that follow it, as well as those that precede it, maximizing the number of illustrations for which the same viewing specification could be used effectively.</Paragraph> <Paragraph position="8"> IBIS currently generates each illustration from scratch. We are currently redesigning its picture generation approach so that it can incrementally modify a design when small changes are made to the goals that an illustration must satisfy. For example, if the viewing specification is partially specified as an input communicative goal, two illustrations' sets of communicative goals may differ only in their viewing specifications. Since IBIS runs on a machine that can render a 3D shaded image in a fraction of a second, if an illustration's specification can be incrementally regenerated fast enough, we can make possible simple user-controlled animation. For example, the user could move the camera around a set of objects to view them from different positions, while IBIS maintained constraints such as legibility and visibility of designated objects.</Paragraph>
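The viewing-specification constraint described earlier in this section can be sketched as a simple angular test. Representing view directions as unit vectors is an assumption of this sketch, not a claim about IBIS's internals.

```python
import math

MIN_ROTATION_DEG = 30.0  # cinematographers' rule of thumb cited above (after [8])

def rotation_deg(view_a, view_b):
    """Angle between two unit view-direction vectors, in degrees."""
    dot = sum(a * b for a, b in zip(view_a, view_b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def acceptable_view_change(prev_view, new_view):
    """Reuse the old viewpoint unless the change is large enough to read as intentional."""
    return rotation_deg(prev_view, new_view) >= MIN_ROTATION_DEG

# A 10-degree nudge would look accidental, so the previous view would be kept:
nudged = (math.cos(math.radians(10)), math.sin(math.radians(10)), 0.0)
assert not acceptable_view_change((1.0, 0.0, 0.0), nudged)
```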
<Paragraph position="9"> Summary
In this paper, we described our most recent advances in COMET. These included the integration of individual components and the addition of a menu-based user interface, yielding a fully operational testbed. In the media coordinator, we have made progress towards the coordination of picture and sentence breaks. In the text generator, we focused on the problem of lexical choice, developing a framework for lexical choice using the Functional Unification Formalism and implementing influences from previous discourse and the underlying knowledge sources on lexical choice. In the graphics generator, we implemented constraints from previous (pictorial) discourse, and began work on incremental regeneration of illustrations.</Paragraph> </Section> </Section> </Paper>