<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1411">
  <Title>Referring to Displays in Multimodal Interfaces</Title>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 World Model and Display Model
</SectionTitle>
    <Paragraph position="0"> It is common in systems which present visual information on the screen (e.g. GISs) for there to be a display model. This is an explicit representation of what items are currently on the screen and what their characteristics are. This is distinct from the world model which represents the facts about the world that the system has, which may not be displayed on the screen. In such systems, the main role of the display model is to maintain the visual display in an orderly fashion, and to connect screen objects to world (or database) objects. It must be updated systematically as items appear, disappear or move on the screen. Very often, the display model is quite a low-level structure, as it performs basic housekeeping for the display.</Paragraph>
    <Paragraph position="1"> Our proposal is that, for NL querying of the visual display to be possible, the display model must contain suitable high level information in a form which is accessible to an NL front-end; preferably, this form would be similar to, or related to, the representation the NL front-end uses to access the world model.</Paragraph>
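    <Paragraph> As a minimal sketch of this proposal (not the authors' implementation), the display model and the world model could share one attribute-value representation and one query interface, with an explicit link from each screen object to the world object it depicts. The names Entity, Model and depicts, and the road example, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An object in either model, described by attribute-value pairs."""
    ident: str
    attributes: dict


@dataclass
class Model:
    """A queryable collection of entities (world model or display model)."""
    name: str
    entities: dict = field(default_factory=dict)

    def add(self, ident, **attributes):
        # Called whenever an item appears (or a fact is recorded).
        self.entities[ident] = Entity(ident, attributes)

    def remove(self, ident):
        # Called whenever an item disappears from the screen.
        self.entities.pop(ident, None)

    def find(self, **constraints):
        """Return identifiers of entities matching every constraint."""
        return sorted(
            ident
            for ident, entity in self.entities.items()
            if all(entity.attributes.get(k) == v for k, v in constraints.items())
        )


world = Model("world")
display = Model("display")
# Hypothetical link from each screen object to the world object it depicts.
depicts = {}

world.add("road-17", kind="road", surface="gravel")
display.add("line-3", shape="line", colour="red")
depicts["line-3"] = "road-17"
```

Because both models answer the same find call, a front-end written against one of them could in principle query the other without a separate access mechanism; whether that suffices for a real system is left open here.</Paragraph>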
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Illustrative Examples
</SectionTitle>
    <Paragraph position="0"> A non-spatial domain. It might seem that queries about the visual display would make sense only in a domain where spatial information is directly relevant, such as a street map or room plan. (Footnote 1: We shall discuss natural language, but with the assumption that working systems in a few years' time would operate with speech input.)</Paragraph>
    <!-- Running page header: 80, D. He, G. Ritchie and J. Lee -->
    <Paragraph position="1"> However, if an iconic display is being used to represent some non-spatial set of objects, it might still be desirable to use visual attributes to refer to these abstract icons. To make these remarks slightly more concrete, let us consider a (fictitious) example system. This system does not handle spatial information, but it uses iconic representations on the screen to convey database facts to the user.</Paragraph>
    <Paragraph position="2"> The application is a car-sales catalogue, in which a number of (presumably used/second-hand/preowned) cars are available for the user to browse through. Icons on the screen represent individual cars, and various characteristics of the icons convey attributes of the corresponding cars (Figure 1). The size of an icon conveys the price band, the colour conveys the year of production, and the letter on each icon indicates the initial of the manufacturer.</Paragraph>
    <Paragraph position="3"> The user can point to icons, move icons round, or ask questions about them, such as What is the insurance group of the car in the top right hand corner?, Is the green car a hatchback?. Notice that spatial phrases (in the top right hand corner) can be used, even though this is a non-spatial domain. Also, the colour adjective green would (given the coding in Figure 1) probably refer to the colour of the icon on the screen rather than the colour of the actual car, but a similar scenario can be imagined where a colour term would be used to denote the colour of the actual world object. In some cases, both might be possible, leading to ambiguity (Wilson and Conway, 1991).</Paragraph>
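    <Paragraph> The colour ambiguity can be made concrete with a small sketch. It assumes a toy catalogue in which icon colour encodes year of production and the icon letter is the manufacturer's initial, as in the coding described above; the specific cars, years, colours and function names are invented for illustration and do not come from the paper.

```python
# World model: hypothetical catalogue records (values invented for illustration).
WORLD = {
    "car1": {"maker": "Ford", "year": 1994, "body_colour": "green"},
    "car2": {"maker": "Rover", "year": 1995, "body_colour": "red"},
}

# Assumed coding: the icon colour conveys the year of production.
YEAR_TO_COLOUR = {1994: "blue", 1995: "green"}


def build_display(world):
    """Derive one icon per car by applying the coding to the world model."""
    return {
        "icon-" + car_id: {
            "icon_colour": YEAR_TO_COLOUR[car["year"]],
            "icon_letter": car["maker"][0],
            "represents": car_id,
        }
        for car_id, car in world.items()
    }


def readings(colour, world, display):
    """List every (model, object) pair that a colour adjective could describe."""
    found = [
        ("display", icon_id)
        for icon_id, icon in display.items()
        if icon["icon_colour"] == colour
    ]
    found += [
        ("world", car_id)
        for car_id, car in world.items()
        if car["body_colour"] == colour
    ]
    return found


DISPLAY = build_display(WORLD)
print(readings("green", WORLD, DISPLAY))
```

Here "green" has two readings that pick out different cars: the green icon stands for car2, while the car whose bodywork is green is car1. An interpreter therefore needs some basis, such as the rest of the sentence or the context, for choosing between them.</Paragraph>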
    <Paragraph position="4"> What is clear from this is that the mapping from the world model (database) to the display model is centrally important. In particular, if we wish to be able to handle questions which are explicitly about the visual representation, such as What does green represent?, the mapping itself must be accessible to some form of symbol querying by the NL/MM interface.</Paragraph>
    <Paragraph position="5"> A spatial domain. Let us now consider a (fictitious) spatial domain. In this domain, a 2D graphic display is being used to help the user plan the layout of a room. The display represents the overall plan, and icons are stylised images of furnishings and fittings. In such a situation, the user might pose queries such as What kind of chair is to the right of the table?, Would a cupboard fit above the table?. Here, spatial relations are again used, but there is potential ambiguity as to whether they refer to relations in the world being modelled, or on the screen. An object might be "above" the table in the image, but "to the left" of it in reality.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Levels of ambiguity
</SectionTitle>
    <Paragraph position="0"> As argued above, certain forms of reference (e.g. colour, spatial relations) can be ambiguous between visual characteristics of the display and actual characteristics of the world being modelled. For referring expressions, there are two levels to this ambiguity: Described referent. When the query interpreter is processing a referring expression, it has to determine in which model - the display model or the world model - the features of the object (e.g. colour, size) are being described, and hence used to indicate the referent. During this process, the objects in the world model and those in the display model should be counted as different even in cases where a representation relation exists between them.</Paragraph>
    <Paragraph position="1"> There may, as noted above, be ambiguity here, between the two models.</Paragraph>
    <Paragraph position="2"> Intended referent. Even if a unique object is determined (a display object such as an icon or a world object such as a database item), it is conceivable that this object is being used as a surrogate to refer to the corresponding object under the mapping relation. This can be illustrated using the "car" domain introduced earlier. In a query such as What is the price of the blue one?, the colour blue may be (unambiguously) a display feature, indicating a blue icon, but the intended referent (for use in the price predicate) is the corresponding world object, not the icon (cars have prices, icons do not). Conversely, in a command Move the 1.5 litre car to the top of the screen, the noun phrase uses domain attributes to indicate a domain object, but the action of move is to operate on the corresponding display object. The third, and simplest, possibility is that there is no intervening use of the mapping relation - the described referent is itself the intended referent.</Paragraph>
    <Paragraph position="3"> This level of indirection can lead to ambiguity when the noun phrase is viewed in isolation, since the choice of intended referent often needs information from the rest of the sentence, or from the context, to disambiguate it.</Paragraph>
    <Paragraph position="4"> The consequence of this added level of ambiguity is that the normal way of considering the "sense" and "reference" of an NL phrase has to be reconsidered. Instead of the usual two-level approach in which a symbolic description (the sense) is evaluated, matched, or otherwise processed to produce a particular set of objects (the reference), we need a three-level approach allowing for sense, described referent, and intended referent. All of these have to be managed systematically, so that the correct relationships are maintained, and utilised, between the various objects.</Paragraph>
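    <Paragraph> The three levels can be sketched as follows. This is an illustrative simplification, not the authors' mechanism: it assumes that the sense already names the model it is to be evaluated in, and that each predicate declares which model its argument must come from, whereas in general the choice may need wider context. All data values and names (PREDICATE_MODEL, described_referents, intended_referent) are hypothetical.

```python
# Hypothetical data: two cars and the icons that represent them.
WORLD = {
    "car1": {"engine": "1.5 litre", "price": 4500},
    "car2": {"engine": "2.0 litre", "price": 7000},
}
DISPLAY = {"icon1": {"colour": "blue"}, "icon2": {"colour": "green"}}
ICON_TO_CAR = {"icon1": "car1", "icon2": "car2"}
CAR_TO_ICON = {car: icon for icon, car in ICON_TO_CAR.items()}

# Assumption: each predicate states which model its argument belongs to.
PREDICATE_MODEL = {"price": "world", "move": "display"}


def described_referents(sense):
    """Level 2: evaluate the sense (model, attribute, value) in its own model."""
    model_name, attribute, value = sense
    model = WORLD if model_name == "world" else DISPLAY
    return [obj for obj, attrs in model.items() if attrs.get(attribute) == value]


def intended_referent(sense, predicate):
    """Level 3: follow the mapping only if the predicate needs the other model."""
    matches = described_referents(sense)
    if len(matches) != 1:
        raise ValueError("description does not pick out a unique object")
    described = matches[0]
    if PREDICATE_MODEL[predicate] == sense[0]:
        return described  # no intervening use of the mapping
    mapping = ICON_TO_CAR if sense[0] == "display" else CAR_TO_ICON
    return mapping[described]


# "What is the price of the blue one?": icon described, car intended.
print(intended_referent(("display", "colour", "blue"), "price"))
# "Move the 1.5 litre car ...": car described, icon intended.
print(intended_referent(("world", "engine", "1.5 litre"), "move"))
```

Keeping the described referent and the intended referent as separate results, rather than collapsing them into one reference set, is what lets the same noun phrase serve both the price query and the move command.</Paragraph>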
    <Paragraph position="5"> [Figure caption fragment: "... to display, world and mapping"]</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Our aim
</SectionTitle>
    <Paragraph position="0"> The aim of our project is to devise a uniform, general and flexible architecture and representation mechanism by which an NL/MM query system can process queries about objects displayed on the screen, about objects in the database or world model, and about the relationship between these two. By "general", we mean that the mechanisms should not be hard-wired or domain-specific. We intend to produce a method whereby a given database and a formally specified visual representation scheme for the database entities can be used to interface directly with a domain-independent NL front-end system, thus building a working multi-media system.</Paragraph>
  </Section>
</Paper>