<?xml version="1.0" standalone="yes"?> <Paper uid="E87-1030"> <Title>NATURAL AND SIMULATED POINTING</Title> <Section position="3" start_page="180" end_page="183" type="intro"> <SectionTitle> 4. Form deixis </SectionTitle> <Paragraph position="0"> Pointing at two-dimensional objects (forms, diagrams, maps, pictures etc.) differs in various aspects from pointing at objects within the entire visual field.</Paragraph> <Paragraph position="1"> This offers a definite advantage from a linguistic point of view: Some problems of local deixis are reduced in complexity without the communicative setting having to become unnatural (Schmauks 1986a). Furthermore, this domain is interesting from an artificial intelligence point of view, since some of the pointing actions with regard to forms can now be simulated on a terminal screen.</Paragraph> <Section position="1" start_page="181" end_page="181" type="sub_section"> <SectionTitle> 4.1 Reduction of problems </SectionTitle> <Paragraph position="0"> Following Bfihler's terminology (1982), form deixis belongs to the kind of deixis called 'demonstratio ad oculos', because all objects pointed at are visible. Furthermore, it represents an example of the 'canonical situation of utterance' (Lyons 1977): All the participants are co-present and can thus mutually perceive their (pointing) gestures etc. Form deixis is relatively precise, because tactile pointing is always possible. Precise pointing at small objects (e.g. single words) is frequently performed by using a pencil etc., larger areas by encircling them. The ambiguity with regard to objects behind each other does not occur, because the deictic space is only two-dimensional. If speaker and hearer are situated side by side, their deictic fields are co-oriented.</Paragraph> <Paragraph position="1"> Therefore, this position makes cooperation easier, and thus is the most advantageous one.</Paragraph> </Section> <Section position="2" start_page="181" end_page="181" type="sub_section"> <SectionTitle> 4.2 Remaining problems </SectionTitle> <Paragraph position="0"> Although form deixis implies a reduction of problems, referent identification has not at all become a trivial task. It cannot be taken for granted that demonstratum and referent are identical. This might be due to the fact that the speaker has mistakenly pointed at a wrong place because s/he doesn't know the referent's actual location or misses the target by accident. Other divergencies emerge intentionally: The speaker doesn't want to cover the referent and therefore points a bit lower.</Paragraph> <Paragraph position="1"> Other essential problems arise because there exist subset relations among form regions. For example, the demonstratum can be a part of the referent - this is referred to as 'pars-pro-toto deixis'. In those cases, one must take into account the verbal description to resolve the ambiguity.</Paragraph> <Paragraph position="2"> Furthermore, pointing at one form region can (depending on linguistic context) refer to three different entities: 1. The form region itself: 'What is to be entered here? 2. The actual entry: 'I want to increase this sun'/. ' 3. Correlated concepts: 'Are these expenses to be verified?' 5. Simulated pointing This section investigates the extent to which some features of natural pointing can already be simulated in dialog systems developed to date. In section 6, some steps towards more accurate simulation are briefly suggested. 
<Section position="3" start_page="181" end_page="181" type="sub_section"> <SectionTitle> 5.1 Different ways of simulating pointing gestures </SectionTitle> <Paragraph position="0"> Face-to-face interaction is performed by gestures and speech in parallel. In many domains (e.g. form deixis), objects are often and efficiently referred to by pointing gestures. Thus, dialog systems will become more natural if the user has the possibility of 'pointing' at the objects which are visible on the screen.</Paragraph> <Paragraph position="1"> The goal 'reference by pointing' can be achieved by various strategies. One fundamental decision must be made first: whether one wants to simulate natural pointing (as is the aim of TACTILUS) or to offer functional equivalents. In the former case, there is the presupposed but questionable demand that man-machine communication should be performed by the same means as interhuman communication.</Paragraph> <Paragraph position="2"> If the main emphasis lies on simulation, then the pointing device and its use must correspond to natural pointing as accurately as possible. In this case, the most adequate simulation will be pointing at a touch-sensitive screen (see section 6). But other devices (e.g. input via mouse-clicks) can also partially simulate natural pointing (see section 5.3).</Paragraph> <Paragraph position="3"> Functional equivalents to natural pointing include the following devices: framing the referent or zooming in on it, highlighting it in different colours etc. (see Fähnrich et al. 1984). On the one hand, the system can 'point' by these means. On the other hand, the user gets immediate feedback as to whether the system has recognized the intended referent. This advantage is paid for by the loss of 'naturalness'.</Paragraph> </Section> <Section position="4" start_page="181" end_page="182" type="sub_section"> <SectionTitle> 5.2 Historical remarks </SectionTitle> <Paragraph position="0"> Multimodal input, especially the possibility of pointing at visible objects, offers certain crucial advantages. For example, the use of simple pointing actions was already possible in the following systems: SCHOLAR (Carbonell 1970) allows pointing gestures in order to specify regions of geographic maps. Pointing in Woods' (1979) system, combined with simple descriptions, refers to substructures of a parse tree displayed on the screen. In NLG (Brown et al. 1979), the user can draw simple geometric objects through descriptive NL-commands and simultaneous tactile touches on the screen. SDMS (Bolt 1980) enables the user to create and manipulate geometric objects on a screen arrangement called 'MEDIA ROOM'. In all those systems, there exist predefined relations between the pointing gesture and its demonstratum; referent identification does not depend on context etc. (see the sketch below).</Paragraph> <Paragraph position="1"> Currently, several projects are investigating problems concerning the integration of pointing actions and NL input, e.g.: In NLMENU (Thompson 1986), the user can select parts of a street map by means of a mouse-controlled rubber-band technique. Hayes (1986) outlines the integration of a deictic component into the Language Craft System, which should allow the user to click on items on the screen, e.g. the machines on a blueprint of a factory floor. ACORD investigates pointing actions with respect to various two-dimensional objects, e.g. a map of the planetary system (Hanne, Hoepelmann, and Fähnrich 1986) and a form for university registration (Wetzel, Hanne, and Hoepelmann 1987).</Paragraph> </Section>
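To make the contrast with the context-sensitive approach of TACTILUS (section 5.3) concrete, here is a hypothetical reconstruction, not code from any of the systems cited above, of the 'predefined relation' style: the click is resolved by hit-testing alone, so demonstratum and referent necessarily coincide, whatever is said. All names are illustrative.

```python
# Context-free pointing as in the early systems sketched above (assumed
# reconstruction): each screen region owns a fixed rectangle, and the
# clicked region simply *is* the referent.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Region:
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

def resolve_click(regions: list[Region], x: float, y: float) -> Optional[str]:
    """Hit-testing only: no verbal description or dialog memory consulted."""
    for region in regions:
        if region.contains(x, y):
            return region.name
    return None  # the click landed outside every region

screen = [Region("income", 0, 0, 100, 20), Region("expenses", 0, 30, 100, 50)]
print(resolve_click(screen, 50.0, 40.0))  # -> 'expenses', whatever was said
```

TACTILUS, described next, deliberately gives up this one-to-one mapping: the click becomes only one piece of evidence, to be weighed against verbal descriptions and the dialog memory.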
<Section position="5" start_page="182" end_page="182" type="sub_section"> <SectionTitle> 5.3 Pointing actions in TACTILUS </SectionTitle> <Paragraph position="0"> One aim of XTRA is the integration of (typed) verbal descriptions and pointing gestures (currently realized by mouse-clicks) for referent identification (Kobsa et al. 1986). The user should be able to efficiently refer to objects on the screen, even when s/he uses underspecified descriptions and/or imprecise pointing gestures (Allgayer, Reddig 1986). Hence the process of specifying referents is speeded up and requires less knowledge of specialist terms.</Paragraph> <Paragraph position="1"> The deictic component of XTRA (called TACTILUS) is completely implemented on a Symbolics Lisp Machine (Allgayer 1986). It offers four types of pointing gestures which differ in accuracy. They correspond to three modes of punctual pointing (with pencil, index finger, or hand) and to the possibility of encircling the demonstratum. Thus, pointing becomes a two-step process: First, one has to select the intended degree of preciseness and then to 'point'.</Paragraph> <Paragraph position="2"> These pointing actions are natural because of their ambiguity: There is no predefined relation between the spot where the mouse is activated and the object which is thereby referred to. Therefore, the system has to take into account additional knowledge sources for referent identification, e.g. verbal descriptions and dialog memory. From the user's point of view, the essential indication of this naturalness is the lack of visual feedback. In analogy to natural pointing, the identified referent is not highlighted.</Paragraph> </Section> <Section position="6" start_page="182" end_page="182" type="sub_section"> <SectionTitle> 5.4 Problems in processing mixed input </SectionTitle> <Paragraph position="0"> One essential problem is to assign a mouse-click to its corresponding verbal constituent. This task is not trivial, since there is no guarantee that the user 'points' within the range of the deictic expression. Possibly, the click occurs too late because the user is inattentive, not familiar with the system etc. One example is: 'What is this sum above the last entry /?' Here, the pointing action occurs next to 'the last entry'. But this is an anaphor and doesn't need to be amplified. On the other hand, there is the deictic expression 'this sum' without its correlated obligatory pointing action. Therefore, the system has to recognize that '/' belongs to 'this sum'. This problem is aggravated by the fact that the words 'here'/'there' and 'this'/'that' are not only the most frequent deictic expressions but have anaphoric and text-deictic readings as well.</Paragraph> <Paragraph position="1"> Matching mouse-clicks and phrases becomes even more difficult if a single utterance requires more than one pointing action. This case is called 'multiple pointing'. Examples include: 'This sum / I would prefer to enter here /.' Hayes (1986) assumes that pointing actions are performed in the same order as their corresponding phrases. But until this hypothesis is confirmed empirically, it can only serve as a heuristic rule.</Paragraph>
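Hayes' ordering assumption can be read as a simple alignment procedure: pair the i-th pointing action with the i-th deictic expression. The following sketch is illustrative only (it is neither the Language Craft nor the XTRA implementation); the deictic word list and all names are assumptions.

```python
# Order-based matching of pointing actions to deictic expressions,
# following the heuristic attributed to Hayes (1986) above.
DEICTIC_WORDS = {"this", "that", "here", "there"}

def align_clicks(tokens: list[str], clicks: list[tuple[float, float]]):
    """Pair deictic expressions with clicks in left-to-right order.

    Returns (pairs, unmatched_deictics, unmatched_clicks). A real system
    must treat the pairing as defeasible, since 'this'/'here' etc. also
    have anaphoric and text-deictic readings.
    """
    deictics = [i for i, tok in enumerate(tokens) if tok.lower() in DEICTIC_WORDS]
    pairs = list(zip(deictics, clicks))  # zip stops at the shorter list
    return pairs, deictics[len(clicks):], clicks[len(deictics):]

tokens = "This sum I would prefer to enter here".split()
clicks = [(210.0, 95.0), (40.0, 310.0)]  # two pointing actions
pairs, extra_words, extra_clicks = align_clicks(tokens, clicks)
# pairs == [(0, (210.0, 95.0)), (7, (40.0, 310.0))]:
# 'This' gets the first click, 'here' the second.
```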
<Paragraph position="2"> As soon as reference by pointing is possible, the use of incomplete expressions will increase. In these cases, additional knowledge sources are needed for referent identification, like descriptor analysis and case frame analysis (Kobsa et al. 1986). For example, the expression 'this' in the sentence 'I want to add this /' surely refers to a number in the present domain, because 'add' is categorized as an action to be performed with numbers.</Paragraph> </Section> <Section position="7" start_page="182" end_page="183" type="sub_section"> <SectionTitle> 5.5 Problems in generating mixed output </SectionTitle> <Paragraph position="0"> If the pointing actions of the system are also conceived as a simulation of natural pointing, the user is confronted with the same problems that have already been identified in the last subsection (Reithinger 1987).</Paragraph> <Paragraph position="1"> But whereas multiple pointing can be simulated during input, there seems to be no adequate mode for simulating it during output as well: In normal communication, the hearer doesn't need to watch the speaker in order to understand him/her, unless the occurrence of a deictic expression (or the sound of touching during tactile pointing) demands his/her visual attentiveness. Likewise, during typed dialog, there is no need to observe the output sentences permanently. In the case of multiple pointing, the possibility cannot be ruled out that the user might fail to notice one of the pointing actions.</Paragraph> <Paragraph position="2">
6. Prospects of more natural simulation
Up to now, only certain kinds of tactile pointing gestures can be simulated on a screen. Negroponte (1981) outlines some future plans, e.g. the consideration of non-tactile actions such as eye tracking and body movements.</Paragraph> <Paragraph position="3"> Simulation of tactile pointing gestures by mouse-clicks has some serious limitations with regard to its 'naturalness'. Empirical investigations are needed to determine the extent to which mouse-clicks can be regarded as an equivalent of natural pointing. These investigations are currently being carried out in the XTRA project.</Paragraph> <Paragraph position="4"> In the case of natural pointing, the choice of a more or less precise pointing gesture is made automatically rather than consciously. But in TACTILUS, the user has to select explicitly the intended degree of accuracy. Empirical investigations must examine whether the user regards this as a disadvantage.</Paragraph> <Paragraph position="5"> Furthermore, pointing via mouse-clicks differs from natural tactile pointing, because there is no physical contact between finger and demonstratum. A better solution would be the use of a touch-sensitive screen on which 'real-world gestures' (see Minsky 1984) are possible. Touch-sensitive screens allow highly natural pointing gestures (see Pickering 1986), but have some shortcomings, e.g. a restricted degree of resolution.</Paragraph> <Paragraph position="6"> A problem just as serious as the aforementioned is the temporal dissociation of a pointing gesture and its corresponding phrase. This problem would be soluble if the system accepted input via voice. But this alone wouldn't be sufficient: There is no guarantee that spoken phrases and correlated mouse-clicks occur simultaneously. Furthermore, current voice-input systems have too small a vocabulary and cannot process fluent speech.</Paragraph>
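The simultaneity problem just described suggests matching by temporal proximity rather than by strict co-occurrence or pure ordering. The sketch below is a hedged illustration under assumed names; in particular, the tolerance value is an arbitrary assumption, not an empirically grounded figure.

```python
# Assumed sketch: pair each click with the nearest deictic word in time,
# accepting the pair only if the lag stays within a tolerance window.
TOLERANCE = 1.5  # seconds a click may lead or lag its word (assumption)

def match_by_time(deictic_times: list[tuple[str, float]],
                  click_times: list[float]) -> list[tuple[str, float]]:
    """Greedy temporal matching of clicks to deictic word onsets."""
    matches = []
    for click in click_times:
        word, onset = min(deictic_times, key=lambda wt: abs(wt[1] - click))
        if abs(onset - click) <= TOLERANCE:
            matches.append((word, click))
    return matches

deictics = [("this", 2.0), ("here", 6.0)]  # deictic words with onset times
print(match_by_time(deictics, [3.1]))      # [('this', 3.1)] despite the lag
```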
<Paragraph position="7"> Therefore, the most adequate simulation would be the combination of voice input/output and gestures on a touch-sensitive screen. However, the state of the art with respect to the required devices is not yet sufficient.</Paragraph> </Section> </Section> </Paper>