File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/97/w97-1409_metho.xml
Size: 10,274 bytes
Last Modified: 2025-10-06 14:14:52
<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1409"> <Title>Planning Referential Acts for Animated Presentation Agents</Title>
<Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Highlevel Planning of Referential Acts </SectionTitle>
<Paragraph position="0"> Following a speech-act theoretic perspective, we consider referring as a goal-directed activity (cf. (AK87)). The goal underlying a referring expression is to make the user activate appropriate mental representations in the sense of picking them out of a set of representations which are already available or which have to be built up (e.g., by localizing an object in a user's visual field). To plan referential acts which accomplish such goals, we build upon our previous work on multimedia presentation design (cf. (AR96)). The main idea behind this approach was to formalize action sequences for designing presentation scripts as operators of a planning system. Starting from a complex communicative goal, the planner tries to find a presentation strategy which matches this goal and generates a refinement-style plan in the form of a directed acyclic graph (DAG). This plan reflects not only the rhetorical structure, but also the temporal behavior of a presentation by means of qualitative and metric constraints. Qualitative constraints are represented in an &quot;Allen-style&quot; fashion (cf. (All83)), which allows for the specification of thirteen temporal relationships between two named intervals, e.g., (Speak1 (During) PointP). Quantitative constraints appear as metric (in)equalities, e.g., (5 < Duration Point2). While the top of the presentation plan is a more or less complex presentation goal (e.g., instructing the user in switching on a device), the lowest level is formed by elementary production acts (e.g., to create an illustration or to encode a referring expression) and presentation acts (e.g., to display an illustration, to utter a verbal reference or to point to an object).</Paragraph>
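<Paragraph> To make the constraint format concrete, the following is a minimal sketch in Python, assuming a simple list-based representation; it is not the authors' implementation, and all class, function and variable names are illustrative.

# Minimal sketch: named intervals related by one of the thirteen Allen
# relations, plus metric (in)equalities over interval durations.
from dataclasses import dataclass, field

ALLEN_RELATIONS = {
    "Before", "After", "Meets", "MetBy", "Overlaps", "OverlappedBy",
    "Starts", "StartedBy", "During", "Contains", "Finishes", "FinishedBy",
    "Equals",
}  # the thirteen relations of (All83)

@dataclass
class TemporalConstraintNet:
    qualitative: list = field(default_factory=list)  # (interval1, relation, interval2)
    metric: list = field(default_factory=list)       # (quantity, interval, op, value)

    def add_qualitative(self, i1, relation, i2):
        assert relation in ALLEN_RELATIONS
        self.qualitative.append((i1, relation, i2))

    def add_metric(self, quantity, interval, op, value):
        self.metric.append((quantity, interval, op, value))

# The two example constraints mentioned in the text:
net = TemporalConstraintNet()
net.add_qualitative("Speak1", "During", "PointP")   # (Speak1 (During) PointP)
net.add_metric("Duration", "Point2", ">", 5)        # (5 < Duration Point2)
</Paragraph>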
<Paragraph position="3"> If the presentation planner decides that a reference to an object should be made, it selects a strategy for activating a mental representation of this object.</Paragraph>
<Paragraph position="4"> These strategies incorporate knowledge concerning: * the attributes to be selected for referent disambiguation: To discriminate objects from alternatives, the system may refer not only to features of an object in a scene, but also to features of the presentation model, their interpretation and to the position of objects within a presentation, see also (Waz92).</Paragraph>
<Paragraph position="5"> * the determination of an appropriate media combination: To discriminate an object against its alternatives through visual attributes, such as shape or surface, or its location, illustrations are used.</Paragraph>
<Paragraph position="6"> Pointing gestures are planned to disambiguate or simplify a referring expression or to establish a coreferential relationship to other document parts.</Paragraph>
<Paragraph position="7"> * the temporal coordination of the constituents of a referential act: If a referring expression is composed of several constituents of different media, they have to be synchronized in an appropriate manner. For instance, a pointing gesture should be executed while the corresponding verbal part of the referring expression is uttered.</Paragraph>
<Paragraph position="8"> After the planning process is completed, the system builds up a schedule for the presentation which specifies the temporal behavior of all production and presentation acts. To accomplish this task, the system first builds up a temporal constraint network by collecting all temporal constraints on and between the actions. Some of these constraints are given by the applied plan operators. Others result from linearization constraints of the natural-language generator. For illustration, let's assume the presentation planner has built up the following speech and pointing acts: A1: (S-Speak Persona User (type pushto modus (def imp tense pres number sg))) At this time decisions concerning word orderings are not yet made. The only temporal constraints which have been set up by the planner are: (A4 (During) A3). That is, the Persona has to point to an object while the object's name and type are uttered verbally.</Paragraph>
<Paragraph position="9"> The act specifications A1 to A4 are forwarded to the natural-language generation component where grammatical encoding, linearization and inflection take place. This component generates: &quot;Push the on/off switch to the right&quot;. That is, during text generation we get the following additional constraints:</Paragraph>
<Paragraph position="11"> After collecting all constraints, the system determines the transitive closure over all qualitative constraints and computes numeric ranges over interval endpoints and their difference. Finally, a schedule is built up by resolving all disjunctions and computing a total temporal order (see (AR96)).</Paragraph>
<Paragraph position="12"> Among other things, disjunctions may result from different correct word orderings, such as &quot;Press the on/off switch now.&quot; versus &quot;Now, press the on/off switch.&quot; In this case, the temporal constraint network would contain the following constraints: (Or (S-Speak-Now (Meets) S-Speak-Press) (S-Speak-Switch (Meets) S-Speak-Now)), (S-Speak-Press (Meets) S-Speak-Switch), (S-Point (During) S-Speak-Switch). For these constraints, the system would build up the following schedules:
Schedule 1
1: Start S-Speak-Now
2: Start S-Speak-Press, End S-Speak-Now
3: Start S-Speak-Switch, End S-Speak-Press
4: Start S-Point
5: End S-Point
6: End S-Speak-Switch
Schedule 2
1: Start S-Speak-Press
2: Start S-Speak-Switch, End S-Speak-Press
3: Start S-Point
4: End S-Point
5: Start S-Speak-Now, End S-Speak-Switch
6: End S-Speak-Now
Since it is usually difficult to anticipate at design time the exact durations of speech acts, the system just builds up a partial schedule which reflects the ordering of the acts. This schedule is refined at presentation display time by adding new metric constraints concerning the duration of speech acts to the temporal constraint network.</Paragraph>
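<Paragraph> The derivation of schedules from such a network can be illustrated with a deliberately simplified sketch in Python; it is not the solver described in (AR96). It treats the Meets and During relations from the word-order example as precedences between interval endpoints, resolves the disjunction by enumeration, and derives one admissible ordering of start/end events per disjunct via a topological sort (using the standard-library graphlib module); the real system additionally computes the transitive closure over all qualitative constraints and numeric ranges over endpoints. All function and variable names are illustrative.

from graphlib import TopologicalSorter

def endpoint_precedences(constraints):
    """Translate Meets/During constraints into 'event a before event b' pairs."""
    prec = set()
    for rel, i, j in constraints:
        prec.add((f"Start {i}", f"End {i}"))   # every interval starts before it ends
        prec.add((f"Start {j}", f"End {j}"))
        if rel == "Meets":                     # i ends exactly when j starts
            prec.add((f"End {i}", f"Start {j}"))   # equality treated as precedence here
        elif rel == "During":                  # i lies inside j
            prec.add((f"Start {j}", f"Start {i}"))
            prec.add((f"End {i}", f"End {j}"))
    return prec

base = [("Meets", "S-Speak-Press", "S-Speak-Switch"),
        ("During", "S-Point", "S-Speak-Switch")]
disjunction = [("Meets", "S-Speak-Now", "S-Speak-Press"),    # "Now, press ..."
               ("Meets", "S-Speak-Switch", "S-Speak-Now")]   # "... switch now."

for disjunct in disjunction:                   # one event ordering per resolved disjunct
    graph = {}
    for a, b in endpoint_precedences(base + [disjunct]):
        graph.setdefault(b, set()).add(a)      # TopologicalSorter maps node -> predecessors
    print(list(TopologicalSorter(graph).static_order()))
</Paragraph>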
</Section>
<Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Context-sensitive Refinement of Referential Acts </SectionTitle>
<Paragraph position="0"> The presentation scripts generated by the presentation planner are forwarded to the Persona Server, which converts them into fine-grained animations. (Note that we don't get any temporal constraints for A2 since it is not realized on the surface level.)</Paragraph>
<Paragraph position="2"> Since the basic actions the Persona has to perform depend on its current state, complex dependencies have to be considered when creating animation sequences. To choose among different start positions and courses of pointing gestures (see Fig. 3), we consider the following criteria (a schematic combination of them is sketched after the list): - the position of the Persona relative to the target object; If the Persona is too far away from the target object, it has to walk to it or use a telescope pointing stick. In case the target object is located behind the Persona, the Persona has to turn around. To determine the direction of the pointing gesture, the system considers the orientation of the vector from the Persona to the target object. For example, if the target object is located on the right of the Persona's right foot, the Persona has to point down and to the right.</Paragraph>
<Paragraph position="3"> - the set of adjacent objects and the size of the target object; To avoid ambiguities and occlusions, the Persona may have to use a pointing stick. On the other hand, it may point to isolated and large objects just with a hand.</Paragraph>
<Paragraph position="4"> - the current screen layout; If there are regions which must not be occluded by the Persona, the Persona might not be able to move closer to the target object and may have to use a pointing stick instead.</Paragraph>
<Paragraph position="5"> - the expected length of a verbal explanation that accompanies the pointing gesture; If the Persona intends to provide a longer verbal explanation, it should move to the target object and turn to the user (as in the upper row in Fig. 3). In case the verbal explanation is very short, the Persona should remain stationary if possible.</Paragraph>
<Paragraph position="6"> - the remaining overall presentation time; While the default strategy is to move the Persona towards the target object, time shortage will make the Persona use a pointing stick instead.</Paragraph>
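<Paragraph> As a rough illustration of how the criteria above might be combined, the following Python sketch encodes them as a simple decision procedure; the class, field names and threshold are hypothetical assumptions and do not reproduce the Persona Server's actual behavior.

from dataclasses import dataclass

@dataclass
class PointingContext:
    distance_to_target: float        # screen distance between Persona and target
    target_behind_persona: bool
    target_is_small_or_crowded: bool # small target or many adjacent objects
    protected_regions_in_path: bool  # layout regions that must not be occluded
    verbal_explanation_is_long: bool
    time_is_short: bool              # remaining overall presentation time

def choose_pointing_strategy(ctx, max_hand_reach=80.0):
    actions = []
    if ctx.target_behind_persona:
        actions.append("turn-around")
    near_enough = ctx.distance_to_target <= max_hand_reach
    may_walk = not (ctx.protected_regions_in_path or ctx.time_is_short)
    if not near_enough and may_walk and ctx.verbal_explanation_is_long:
        actions.append("move-to-target")          # default: walk over and face the user
        near_enough = True
    if near_enough and not ctx.target_is_small_or_crowded:
        actions.append("point-with-hand")
    else:
        actions.append("point-with-stick")        # avoids occlusion/ambiguity, saves time
    return actions

# e.g. a distant, small target under time pressure:
print(choose_pointing_strategy(PointingContext(120.0, False, True, False, True, True)))
# -> ['point-with-stick']
</Paragraph>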
<Paragraph position="7"> To support the definition of Persona actions, we have defined a declarative specification language and implemented a multi-pass compiler that enables the automated generation of finite-state automata from these declarations. These finite-state automata in turn are translated into efficient machine code (cf. (RAM97)).</Paragraph>
<Paragraph position="8"> Fig. 4 shows a context-sensitive decomposition of a pointing act delivered by the presentation planner into an animation sequence. Since in our case the object the Persona has to point to is too far away, the Persona first has to perform a navigation act before the pointing gesture may start. We associate with each action a time interval in which the action takes place. For example, the act take-position has to be executed during (t1 t2). The same applies to the move-to act, which is a specialization of take-position. The intervals associated with the subactions of move-to are subintervals of (t1 t2) and form a sequence. That is, the Persona first has to turn to the right during (t1 t21), then take some steps during (t21 t22) and finally turn to the front during (t22 t2). Note that the exact length of all time intervals can only be determined at runtime.</Paragraph>
</Section> </Paper>