<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1410"> <Title>Exploiting Image Descriptions for the Generation of Referring Expressions</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Generation of referring expressions </SectionTitle> <Paragraph position="0"> Dale and Reiter (1995, p. 259) assume that &quot;a referring expression contains two kinds of information: navigation and discrimination.</Paragraph> <Paragraph position="1"> Each descriptor used in a referring expression plays one of these two roles. Navigational, or attention-directing information, is intended to bring the referent in the hearer's focus of attention [while] discriminating information is intended to distinguish the intended referent from other objects in the hearer's focus of attention&quot;. In the following, we show how we compute navigational and discriminating descriptions of a given intended referent, especially a component of a complex object, using the results of our characteristic component algorithm.</Paragraph> <Paragraph position="2"> As shown in example 4, the characteristic component algorithm computes sets of characteristic components for the intrinsic sides of a given complex object. Assuming that the system wants to refer to a component of the complex object, the intended referent can be an element of a unary set, an element of a non-unary set, or not an element of any set of characteristic components at all. We analyse these cases in turn. Where the intended referent belongs to several characteristic component sets, the system selects one, preferring the smallest set, in order to generate referring expressions which employ a minimal number of descriptors.</Paragraph> <Paragraph position="3"> Case 1: The intended referent is a unique characteristic component. Figure 1 shows the front side, the top side and the right side of a toaster.
The elevating pushbutton and the roast intensity selector each form a unary set of characteristic components for the front side.</Paragraph> <Paragraph position="4"> Hence, one can refer unambiguously to these components in an associated text: the addressee can distinguish them from all components located on the other sides of the depicted toaster, so no navigational description is necessary.</Paragraph> <Paragraph position="5"> Press the spray button.</Paragraph> <Paragraph position="6"> [Figure: coordination between text and graphics (André, 1995, page 80)] However, the characteristic component algorithm considers only the components located on other sides, not the components located on the same side. For the generation of referring expressions, the intended referent must also be distinguished from the other components on the same side of the complex object. Figure 5, for instance, shows a detail of an iron with two buttons on the top side. According to the characteristic component algorithm, both buttons represent unique characteristic components for the top side of the depicted electric iron, and hence no navigational description is generated. Nevertheless, we still have to provide discriminating descriptions for the intended referent with respect to the set of components of the same type on that side. As the colour and the shape of both buttons in example 5 do not differ, we have to exploit information on their relative location, which enables us to generate a sentence like &quot;Press the left button, which is the spray button&quot;. This establishes a co-referential connection between the referent of the nominal phrase &quot;the spray button&quot; and the left button on the top side, which can be exploited in the subsequent dialogue.
In contrast, André (1995, page 81) proposes augmenting the depicted graphics with an arrow in order to establish this co-reference.</Paragraph> <Paragraph position="7"> Case 2: The intended referent is not a unique characteristic component, but an element of a set of characteristic components.</Paragraph> <Paragraph position="8"> Since the set of characteristic components enables the hearer to infer on which side these components are located, no further navigational information is needed if all components of that set are mentioned in the referring expression. For the construction of the referring expression, we compute a set of discriminating descriptions for the intended referent with respect to the other components in the set of characteristic components, C' (formally, C' is the set difference of the set of characteristic components C and the intended referent {r}). These discriminating descriptions should be perceptually recognisable attributes of the intended referent, such as its colour, its shape or its relative location with respect to the other components in C', and can be retrieved from the illumination model or the geometric model.</Paragraph> <Paragraph position="9"> If we use the relative location of the intended referent with respect to all the components in C' for generating the referring expression, no further navigational information needs to be included, as the intended referent together with C' specifies a set of characteristic components and all the components of this characteristic component set are mentioned in the referring expression.</Paragraph> <Paragraph position="10"> In example 4, the component a6 on side s8 is included in the set {a6, b6, c6} of characteristic components. To enable the addressee to distinguish the intended referent a6 from b6 and c6, we have to provide further descriptors.
Thus, we have to search for perceptually recognisable attributes of a6, such as its colour, its shape or its relative location with respect to b6 and c6.</Paragraph> <Paragraph position="11"> Case 3: The intended referent is not an element of a characteristic component set at all. Navigational information indicating on which side the intended referent is located has to be included. In addition, we have to provide discriminating descriptions for the intended referent that distinguish it from all the other components located on this side. This set of discriminating descriptions can be computed by a traditional reference algorithm. If the system intends to refer to the component a1 of side s1 in example 4, it would insert the name of the side s1 as navigational information and the set of attributes which distinguishes a1 from b1.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Discussion </SectionTitle> <Paragraph position="0"> Previous work has proposed several algorithms for generating referring expressions (Dale and Reiter, 1995; Horacek, 1996). The main goal of these algorithms is to compute a referring expression for a given referent which enables the hearer to distinguish it from all other objects in the hearer's focus of attention, the contrast set. Dale and Reiter proposed a number of algorithms that differ in their computational complexity. Since the task of finding the minimal set of descriptors is NP-hard (footnote 3), a number of heuristics are used which approximate the minimal set.</Paragraph> <Paragraph position="1"> The computation of the referring expressions in our approach is a two-stage process. First, we use only the type information to find the characteristic components of the sides which can be used for the generation of navigational descriptors.
In a second step, classical reference algorithms compute the discriminating information for the intended referent with a reduced contrast set, using perceptually recognisable attributes like colour, shape and relative location of components with respect to other components.</Paragraph> <Paragraph position="2"> Footnote 3: The problem can be transformed into the problem of finding a minimum-size set cover, which is proven to be NP-hard (Garey and Johnson, 1979).</Paragraph> <Paragraph position="3"> The proposed characteristic component algorithm computes a set of descriptors which enable the addressee to identify a side of a given complex object in contrast to the set of the other sides of the given object. For the characteristic component algorithm, the intended referent is the given side of the object, while the other sides of the object can be considered as the contrast set in Dale and Reiter's terms. In contrast to (Dale and Reiter, 1995), where at most one descriptor set is computed which distinguishes the referent from all other objects in the contrast set, our algorithm computes all minimal descriptor sets. The algorithm is far more expensive than classical reference algorithms, because we calculate all minimal distinguishing descriptions of the given side using only the type attribute. On the other hand, this enables us to use sources other than the part-whole relation (IDAS (Reiter et al., 1995)) or the spatial inclusion relation (KAMP (Appelt, 1985)) for the generation of the navigational part of the referring expression.</Paragraph> <Paragraph position="4"> The set of characteristic components contains no negative expressions. Negative expressions would enable us to compute characteristic components of sides for which the proposed algorithm computes an empty set of characteristic components. On the other hand, that would force us to generate referring expressions which contain statements about components that are not located on the same side as the intended component.
We think that statements of this kind would confuse the addressee.</Paragraph> <Paragraph position="5"> The proposed work incorporates propositional and analogue representations, as suggested by (Habel et al., 1995). Within the VisDok project (visualization in technical documentation), we decided to combine geometric information and information gained from the illumination model with a propositional representation of the type of the objects in a knowledge base.</Paragraph> <Paragraph position="6"> A first prototypical system for the generation of multimodal multilingual documentation for technical devices within an interactive setting has been realised. We employ separate processes for the rendering of predefined pictures and animations, and for text generation. Our algorithm enables us to minimise the time-consuming communication between these separate processes when generating referring expressions, as the procedure described in section 3 relies only partly on perceptually recognisable attributes of objects like colour, shape and relative location, while employing the type attribute, which is explicitly represented in the knowledge base.</Paragraph> <Paragraph position="7"> K. Hartmann and J. Schöpp</Paragraph> </Section></Paper>