<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1412">
  <Title>A Syndetic Approach to Referring Phenomena in Multimodal Interaction</Title>
  <Section position="5" start_page="7" end_page="7" type="evalu">
    <SectionTitle>
7 Analysis
</SectionTitle>
    <Paragraph position="0"> We will examine the above specified model informally, since there is no space to conduct a full formal analysis. The interested reader may consult the referenced papers on syndesis for a deeper understanding. Here we present the result of the analysis directly and comment on it.</Paragraph>
    <Paragraph position="1"> To satisfy the first sub-goal (item, read), the object subsystem receives coherent representations from :prop-obj: and :vis-obj:, which are among its sources.</Paragraph>
    <Paragraph position="2"> These representations must also be stable and coherent so that they can be blended. The enriched representation is transformed by the object system into propositional, morphonolexical and limb representations. Since the goal is to read, the psychological subject becomes an entry in the list. The morphonolexical system can operate on this representation in order to find its related sound structure. Similarly, the propositional system revives it through its buffer to both the morphonolexical and object systems, enriching their representations.</Paragraph>
    <Paragraph position="3"> In the case of the MDisplay system, which uses the mouse, the information transmitted by the object system is of little use to the limb system. In fact, the cursor is far from the psychological object in the representation structure. Consequently, the information from the body-state system, which 'feels' the mouse through the 'operate' action, and that from the object system cannot be blended, leading to buffering. However, the buffer is already allocated, and consequently the stream is disengaged, leading to a change of configuration.</Paragraph>
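The blend-or-buffer rule described above can be summarised in a minimal sketch. This is an illustrative reconstruction, not the authors' formal model: the class and method names are hypothetical, and the single-slot buffer with disengagement on overflow is our reading of the behaviour described in the paragraph.

```python
class Subsystem:
    """Illustrative ICS-style subsystem: blends coherent streams,
    buffers a single incoherent one, and disengages any further
    incoherent stream (forcing a change of configuration)."""

    def __init__(self, name):
        self.name = name
        self.buffer = None  # at most one stream may occupy the buffer

    def receive(self, stream, coherent):
        if coherent:
            return "blended"        # stable, coherent streams blend
        if self.buffer is None:
            self.buffer = stream
            return "buffered"       # incoherent stream is buffered
        return "disengaged"         # buffer already allocated


limb = Subsystem("limb")
print(limb.receive("obj->lim", coherent=False))    # buffered
print(limb.receive("body-state", coherent=False))  # disengaged
```

In the MDisplay case the :obj-lim: stream and the body-state stream cannot blend, so one is buffered and the other, finding the buffer allocated, is disengaged, which is the reconfiguration the paragraph describes.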
    <Paragraph position="4"> In the case of the TDisplay model, which makes use of the touch screen, the same stream resulting from the :obj-lim: transformation is relevant to the limb system, since it blends with the information arriving from the body-state. It is interesting to note that in this second case the movement of pointing to an item starts before the same information is processed by the articulatory system for speaking. This is confirmed by experiments in the field of cognitive psychology.</Paragraph>
    <Paragraph position="5"> After one cycle of processing of the goal by all the involved subsystems, the propositional system removes the first part of the goal and starts satisfying the two parallel tasks of speaking and gesturing by sending representations again to the morphonolexical and object subsystems. At the morphonolexical level this representation blends naturally, since all the information was already available for speaking, and it can be passed directly to the articulatory subsystem. At the object level the new representation blends with the information stream from the visual subsystem.</Paragraph>
    <Paragraph position="6"> In the case of the MDisplay system, the ghost node of figure 5 is built and sent to the propositional system for semantic checking. Only after a further loop between the propositional and object systems is this information sent to the limb system, where it can now be blended with the body-state information to perform the pointing gesture. However, by this time the articulatory system has already produced the speech of the referred word. Consequently, in this case the speech and locate actions cannot occur in parallel but are performed in sequence.</Paragraph>
    <Paragraph position="7"> In the case of the TDisplay model, the limb system has already started to locate the item within the screen so that the operation can continue in parallel with the articulatory system and synchronize through the body state.</Paragraph>
    <Paragraph position="8"> The result is extremely interesting when related to previous work on the fusion of information within multimedia systems.</Paragraph>
    <Paragraph position="9"> At CLIPS, University of Grenoble, an original algorithm known as the 'melting pot' has been developed to support deixis within the Matis system. Matis is a Multimodal Airline Travel Information System supporting several combinations of modalities to formulate queries against a flight database. The melting pot algorithm is built around the intrinsic uncertainty in relating mouse events and spoken words, which the authors directly experienced in building the system. The practical consequence is that the algorithm is non-deterministic. Our work clearly gives a motivation for this.</Paragraph>
    <Paragraph position="10"> In (Faconti et al., 1996) the fusion process is described at a high level of abstraction. It defines a system architecture for fusion and a class of algorithms of which the melting pot is one instance. The work is in line with the findings of this paper, suggesting that a non-deterministic fusion algorithm can be developed based on exact temporal windows within which pointing events may occur. These temporal windows are defined by the limb and articulatory subsystem processes within ICS and can be captured by the system's speech recognizer.</Paragraph>
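The temporal-window style of fusion can be illustrated with a short sketch. This is an assumed toy version, not the melting pot itself or the Faconti et al. algorithm: the function name, event tuples and window width are all hypothetical. It shows the source of non-determinism, namely that several pointing events may fall within the window of the same spoken word.

```python
def fuse(speech_events, pointing_events, window):
    """Pair each timestamped spoken deictic word with every pointing
    event whose timestamp lies within `window` seconds of it.
    Several candidate pairs may survive, hence the non-determinism."""
    candidates = []
    for w_time, word in speech_events:
        for p_time, target in pointing_events:
            if abs(w_time - p_time) <= window:
                candidates.append((word, target))
    return candidates


speech = [(1.0, "this"), (3.5, "that")]     # (time, spoken word)
points = [(1.2, "item-3"), (3.1, "item-7")] # (time, pointed item)
print(fuse(speech, points, window=0.5))
# [('this', 'item-3'), ('that', 'item-7')]
```

On the paper's account, the width of such a window would not be arbitrary but derived from the limb and articulatory subsystem processes within ICS, which is what grounds the class of algorithms cognitively.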
  </Section>
</Paper>