<?xml version="1.0" standalone="yes"?> <Paper uid="W05-1607"> <Title>A context-dependent algorithm for generating locative expressions in physically situated environments</Title> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Approach </SectionTitle> <Paragraph position="0"> Our approach to generating locative expressions extends the incremental algorithm [Dale and Reiter, 1995]. The motivation for this choice is the polynomial complexity of the incremental algorithm. The incremental algorithm iterates through the properties of the target and, for each property, computes the set of distractor objects for which both (a) the conjunction of the properties selected so far and (b) the current property hold. A property is added to the list of selected properties if it reduces the size of the distractor set. The algorithm succeeds when all the distractors have been ruled out; it fails if all the properties have been processed and some distractor objects remain. The algorithm can be refined by ordering the checking of properties according to fixed preferences, e.g. first a taxonomic description of the target, second an absolute property such as colour, third a relative property such as size. [Dale and Reiter, 1995] also stipulate that the type description of the target should be included in the description even if its inclusion does not distinguish the target from any of the distractors (see Algorithm 1).</Paragraph> <Paragraph position="1"> However, before applying the incremental algorithm we must construct a context model within which we can check whether or not the generated description distinguishes the target object. In order to constrain the combinatorial explosion inherent in relational scene model construction, we construct a series of reduced scene models rather than one complex exhaustive model. 
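A minimal Python sketch of the basic incremental algorithm described above (Algorithm 1), offered as an illustration rather than as the authors' implementation. It assumes each object is a dictionary of attribute values, checks properties in the order type, colour, size from the Initialise step, and treats a distractor as ruled out when its value for an attribute differs from the target's (a missing value counts as different). The function name `basic_incremental` and the `FIGURE6` scene are hypothetical; the scene encodes the objects W1, W2, T1, B1 and B2 of the Figure 6 example in §3.2, and the types and colours given to B1 and B2 are assumptions.

```python
from typing import Dict, Set, Tuple

Scene = Dict[str, Dict[str, str]]  # object id -> {attribute: value}


def basic_incremental(target: str, distractors: Set[str], scene: Scene,
                      preferred: Tuple[str, ...] = ("type", "colour", "size")
                      ) -> Tuple[Dict[str, str], bool]:
    """Return (description, distinguishing?) for `target`."""
    remaining = set(distractors)
    desc: Dict[str, str] = {}
    for attr in preferred:
        value = scene[target].get(attr)
        if value is not None:
            # Distractors for which this property does not hold are ruled out.
            ruled_out = {d for d in remaining if scene[d].get(attr) != value}
            if ruled_out:  # only keep properties that shrink the distractor set
                desc[attr] = value
                remaining -= ruled_out
        if not remaining:
            # Dale and Reiter: the type is always part of the description.
            desc.setdefault("type", scene[target]["type"])
            return desc, True
    return desc, False  # distractors remain: no distinguishing description


# Hypothetical encoding of the Figure 6 scene (B1/B2 attributes assumed).
FIGURE6: Scene = {
    "W1": {"type": "block", "colour": "white"},
    "W2": {"type": "block", "colour": "white"},
    "T1": {"type": "triangle"},
    "B1": {"type": "block", "colour": "black"},
    "B2": {"type": "block", "colour": "black"},
}

desc, ok = basic_incremental("W1", {"T1", "B1", "W2", "B2"}, FIGURE6)
print(desc, ok)  # {'type': 'block', 'colour': 'white'} False
```

Run on W1, the sketch reproduces the outcome discussed for Figure 6: "white block" rules out T1, B1 and B2 but not W2, so no distinguishing description is found.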
This construction process is driven by a hierarchy of spatial relations and by the partitioning of the context model into objects that may and may not function as landmarks. These two components are developed in the next two sections: in §3.1 we develop the hierarchy of spatial relations, and in §3.2 we develop a classification of landmarks and use these groupings to define a distinguishing locative description. In §3.3 we present the generation algorithm that integrates these components. Algorithm 1 The Basic Incremental Algorithm. Require: T = target object; D = set of distractor objects.</Paragraph> <Paragraph position="2"> Initialise: P = {type, colour, size}; DESC = {}</Paragraph> <Paragraph position="4"> Distinguishing description generated; if type(x) ∉ DESC then</Paragraph> <Paragraph position="6"> Failed to generate distinguishing description; return DESC</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Cognitive Ordering of Contexts </SectionTitle> <Paragraph position="0"> Psychological research indicates that spatial relations are not preattentively perceptually available [Treisman and Gormican, 1988]. Rather, their perception requires attention [Logan, 1994; 1995]. These findings suggest that subjects construct contextually dependent, reduced relational scene models rather than an exhaustive, context-free model. Mimicking this, we have developed an approach to context model construction that attempts to constrain the combinatorial explosion inherent in the construction of relational context models by incrementally constructing a series of reduced context models. Each context model focuses on a different spatial relation.</Paragraph> <Paragraph position="1"> The ordering of the spatial relations is based on the cognitive load of interpreting each relation. 
In this section, we motivate and develop the ordering of relations used.</Paragraph> <Paragraph position="2"> It seems reasonable to assume that it takes less effort to describe one object than two. Consequently, following the Principle of Minimal Cooperative Effort [Clark and Wilkes-Gibbs, 1986], a speaker should only use a locative expression when they cannot create a distinguishing description of the target object using a simple feature-based approach. Moreover, the Principle of Sensitivity [Dale and Reiter, 1995] states that when producing a referring expression, the speaker should prefer features that the hearer is known to be able to interpret and perceive. This points to a preference, due to cognitive load, for descriptions that distinguish an object using purely physical and easily perceivable features over descriptions that use spatial expressions. Psycholinguistic results support this preference [van der Sluis and Krahmer, 2004].</Paragraph> <Paragraph position="3"> Similarly, we can distinguish between the cognitive loads of processing different forms of spatial relations. In comparing the cognitive load associated with different spatial relations, it is important to recognize that they are represented and processed at several levels of abstraction. These include the geometric level, where metric properties are dealt with; the functional level, where the specific properties of spatial entities deriving from their functions in space are considered; and the pragmatic level, which gathers the underlying principles that people use in order to discard wrong relations or to deduce more information [Edwards and Moulin, 1998]. 
Our discussion is grounded at the geometric level of representation and processing.</Paragraph> <Paragraph position="4"> Focusing on static prepositions, it is reasonable to propose that topological prepositions have a lower perceptual load than projective prepositions, due to the relative ease of perceiving two objects that are close to each other and the complex processing required to handle frame of reference ambiguity [Carlson-Radvansky and Irwin, 1994; Carlson-Radvansky and Logan, 1997]. Figure 4 lists these preferences, with further distinctions among features: object type is the easiest to process, followed by absolute gradable predicates (e.g. colour), which are in turn easier than relative gradable predicates (e.g. size) [Dale and Reiter, 1995]. This topological versus projective preference can be further refined if we consider the contrastive and relative uses of these relations noted in §2: perceiving and interpreting a contrastive use of a spatial relation is computationally easier than judging a relative use. Finally, within the set of projective prepositions, psycholinguistic data indicates a perceptually based ordering of the relations: above/below are easier to perceive and interpret than in front of/behind, which in turn are easier than to the right of/to the left of [Bryant et al., 1992; Gapp, 1995].</Paragraph> <Paragraph position="5"> In sum, we propose the following ordering of spatial relations: 1. topological contrastive; 2. topological relative; 3. projective contrastive [above/below, front/back, right/left]; 4. projective relative [above/below, front/back, right/left]. For each level of this hierarchy we require a computational model of the semantics of the relation at that level that accommodates both contrastive and relative representations. 
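The four-level ordering above can be written out as a flat list of relations to try, easiest first. The Python sketch below is illustrative only: it assumes a single proximity relation (`near`) stands for every topological preposition, orders the two members of each projective pair arbitrarily, and classifies a use as contrastive when exactly one trajector-or-distractor object falls inside a landmark's region, following the counting criterion stated in the next paragraph. The names `relation_hierarchy` and `use_of` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

TOPOLOGICAL = ["near"]  # assumption: one proximity relation for the whole level
# Perceptual order: above/below, then front/back, then right/left.
PROJECTIVE = ["above", "below", "in front of", "behind",
              "to the right of", "to the left of"]
LEVELS = [("topological", "contrastive"), ("topological", "relative"),
          ("projective", "contrastive"), ("projective", "relative")]


def relation_hierarchy() -> List[Tuple[str, str, str]]:
    """(category, use, relation) triples in order of increasing cognitive load."""
    ordered = []
    for category, use in LEVELS:
        relations = TOPOLOGICAL if category == "topological" else PROJECTIVE
        ordered.extend((category, use, rel) for rel in relations)
    return ordered


def use_of(region: Set[str], trajector: str, distractors: Set[str]) -> Optional[str]:
    """Contrastive if exactly one trajector/distractor object lies in the region."""
    hits = region.intersection({trajector} | distractors)
    if not hits:
        return None  # the relation does not apply to any relevant object
    return "contrastive" if len(hits) == 1 else "relative"


print(relation_hierarchy()[:3])
print(use_of({"W1", "T1"}, "W1", {"W2"}))  # contrastive
print(use_of({"W1", "W2"}, "W1", {"W2"}))  # relative
```

Keeping the hierarchy as plain data means the generator can walk it in order and stop at the first level that yields a distinguishing description, which is how the reduced context models are meant to be built one relation at a time.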
In §2 we noted that the distinctions between the semantics of the different topological prepositions are often based on functional and pragmatic issues.³ Currently, however, more psycholinguistic data is required to distinguish the cognitive load associated with the different topological prepositions. We use the model of topological proximity developed in [Kelleher and Kruijff, 2005] to model all the relations at this level. Using this model we can define the extent of a region proximal to an object. If the trajector or one of the distractor objects is the only object within the region of proximity around a given landmark, this is taken to model a contrastive use of a topological relation relative to that landmark. If the landmark's region of proximity contains more than one object from the trajector and distractor object set, then it is a relative use of a topological relation. We handle the issue of frame of reference ambiguity and model the semantics of projective prepositions using the framework developed in [Kelleher and van Genabith, 2005]. Here again, the contrastive-relative distinction depends on the number of objects within the region of space defined by the preposition. [Footnote 3: See inter alia [Talmy, 1983; Herskovits, 1986; Vandeloise, 1991; Fillmore, 1997; Garrod et al., 1999] for more discussion of these differences.]</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Landmarks and Distinguishing Descriptions </SectionTitle> <Paragraph position="0"> In order to use a locative expression, an object in the context must be selected to function as the landmark. An implicit assumption in selecting an object to function as a landmark is that the hearer can easily identify and locate the object within the context. A landmark can be the speaker (3a), the hearer (3b), the scene (3c), an object in the scene (3d), or a group of objects in the scene (3e).⁴ (3) a. the ball on my right [speaker] b. 
the ball to your left [hearer] c. the ball on the right [scene] d. the ball to the left of the box [an object in the scene] e. the ball in the middle [group of objects]. Currently, new empirical research is required to determine whether there is a preference order among these landmark categories. Intuitively, in most situations either interlocutor is an ideal landmark, because the speaker can naturally assume that the hearer is aware of both the speaker's location and the hearer's own. Focusing on instances where an object in the scene is used as a landmark, several authors [Talmy, 1983; Landau, 1996; Gapp, 1995] have noted a trajector-landmark asymmetry: generally, the landmark object is more permanently located, larger, and taken to have greater geometric complexity. These characteristics are indicative of salient objects, and empirical results support this correlation between object salience and landmark selection [Beun and Cremers, 1998]. However, the salience of an object is intrinsically linked to the context in which it is embedded. For example, in the context provided by Figure 5 the ball has a relatively high salience because it is a singleton, despite the fact that it is smaller and geometrically less complex than the other figures. Moreover, in this context the ball is the only object in the scene that can function as a landmark without recourse to the scene itself or to a grouping of objects in the scene. Clearly, deciding which objects in a given context are suitable to function as landmarks is a complex and contextually dependent process. Some of the factors affecting this decision are object salience and the functional relationships between objects.</Paragraph> <Paragraph position="1"> However, one basic constraint on landmark selection is that the landmark should be distinguishable from the trajector. 
For example, given the context in Figure 5 and all other factors being equal, a locative such as "the man to the left of the man" would be much less helpful than "the man to the right of the ball". Following this observation, we treat an object as a candidate landmark if the trajector object can be distinguished from it using the basic incremental algorithm (Algorithm 1).⁵ Furthermore, for the relation being considered, a trajector landmark is a member of the candidate landmark set that stands in that relation to the trajector, and a distractor landmark is a member of the candidate landmark set that stands in that relation to a distractor object. Using these categories of landmark, we can define a distinguishing locative description as a locative description in which there is a trajector landmark that can be distinguished from all the members of the set of distractor landmarks under the relation used in the locative.</Paragraph> <Paragraph position="2"> We can illustrate these different categories of landmark using Figure 6 as the visual context. In this context, if W1 is taken as the target object, the distractor set equals {T1, B1, W2, B2}. Running the basic incremental algorithm would generate the description "white block". This distinguishes W1 from T1, B1 and B2 but not from W2. Consequently, the set of candidate landmarks equals {T1, B1, B2}. If we now create a context model for the relation near, the set of trajector landmarks is {T1, B1} and the set of distractor landmarks is {B1, B2}. B1 cannot be distinguished from all the distractor landmarks, because it cannot be distinguished from itself. As a result, B1 cannot function as the landmark for a distinguishing locative description for W1 using the relation near. 
However, T1 can be distinguished from the distractor landmarks B1 and B2 by its type, triangle.</Paragraph> <Paragraph position="3"> So "the white block near the triangle" would be considered a distinguishing description.</Paragraph> <Paragraph position="4"> [Figure caption fragment: ... categories of landmark.]</Paragraph> <Paragraph position="5"> [Footnote 5: As noted by one of our reviewers, one unwanted effect of this definition of a landmark is that it precludes the generation of descriptions whose landmarks are themselves distinguished using a locative expression, for example "the block to the right of the block which has a ball on it".]</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Algorithm </SectionTitle> <Paragraph position="0"> The basic approach is to try to generate a distinguishing description using the standard incremental algorithm. If this fails, we divide the context into three components: the trajector (the target object); the distractor objects (the objects that match the description generated for the target object by the standard incremental algorithm); and the set of candidate landmarks (the objects that do not match the description generated for the target object by the standard incremental algorithm).</Paragraph> <Paragraph position="1"> We then iterate through the hierarchy of relations and, for each relation, create a context model that defines the sets of trajector and distractor landmarks. Once a context model has been created, we iterate through the trajector landmarks (using a salience ordering if there is more than one)⁶ and try to create a distinguishing locative description. A distinguishing locative description is created by using the basic incremental algorithm to distinguish the trajector landmark from the distractor landmarks. If we succeed in generating a distinguishing locative description, we return the description and stop processing. Algorithm 2 lists the steps in the algorithm. 
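A minimal Python sketch of the locative algorithm (Algorithm 2), under explicit assumptions: each relation's context model is supplied as precomputed (object, landmark) pairs in a hypothetical `FACTS` table rather than computed from scene geometry; trajector landmarks are tried in descending order of combined salience, taken as the average of visual and discourse salience as footnote 6 describes (ties broken by name); and objects are attribute dictionaries. A compact version of Algorithm 1 is included so the block runs on its own. The scene and `near` facts encode the Figure 6 example from §3.2, with hypothetical attributes for B1 and B2; `locative` and `matches` are illustrative names.

```python
from typing import Dict, List, Optional, Set, Tuple

Scene = Dict[str, Dict[str, str]]
Facts = Dict[str, Set[Tuple[str, str]]]  # relation -> {(object, landmark)}


def basic_incremental(target: str, distractors: Set[str], scene: Scene):
    """Algorithm 1: returns (description, distinguishing?)."""
    remaining, desc = set(distractors), {}
    for attr in ("type", "colour", "size"):
        value = scene[target].get(attr)
        if value is not None:
            ruled_out = {d for d in remaining if scene[d].get(attr) != value}
            if ruled_out:
                desc[attr] = value
                remaining -= ruled_out
        if not remaining:
            desc.setdefault("type", scene[target]["type"])
            return desc, True
    return desc, False


def matches(obj: Dict[str, str], desc: Dict[str, str]) -> bool:
    return all(obj.get(attr) == value for attr, value in desc.items())


def locative(target: str, distractors: Set[str], scene: Scene,
             hierarchy: List[str], facts: Facts,
             salience: Optional[Dict[str, Tuple[float, float]]] = None):
    """Return (DESC, relation, LANDDESC), (DESC, None, None), or None on FAIL."""
    desc, ok = basic_incremental(target, distractors, scene)
    if ok:
        return desc, None, None
    salience = salience or {}
    # Partition: objects still matching DESC stay distractors; the rest
    # form the candidate landmark set CL.
    remaining = {d for d in distractors if matches(scene[d], desc)}
    candidates = set(distractors) - remaining

    def combined(name: str) -> float:
        visual, discourse = salience.get(name, (0.0, 0.0))
        return (visual + discourse) / 2  # sum the two and divide by 2

    for rel in hierarchy:  # easiest relation first
        pairs = facts.get(rel, set())
        tl = {lm for obj, lm in pairs if obj == target and lm in candidates}
        dl = {lm for obj, lm in pairs if obj in remaining and lm in candidates}
        for lm in sorted(tl, key=lambda n: (-combined(n), n)):
            land_desc, land_ok = basic_incremental(lm, dl, scene)
            if land_ok:
                return desc, rel, land_desc
    return None  # FAIL


FIGURE6: Scene = {
    "W1": {"type": "block", "colour": "white"},
    "W2": {"type": "block", "colour": "white"},
    "T1": {"type": "triangle"},
    "B1": {"type": "block", "colour": "black"},
    "B2": {"type": "block", "colour": "black"},
}
FACTS: Facts = {"near": {("W1", "T1"), ("W1", "B1"), ("W2", "B1"), ("W2", "B2")}}

print(locative("W1", {"T1", "B1", "W2", "B2"}, FIGURE6, ["near"], FACTS))
# ({'type': 'block', 'colour': 'white'}, 'near', {'type': 'triangle'})
```

B1 is tried first here (both landmarks have zero salience, so the name breaks the tie) and fails because B1 is itself a distractor landmark; T1 then succeeds by type, giving the "white block near the triangle" reading from the worked example.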
Algorithm 2 The Locative Algorithm. Require: T = target object; D = set of distractor objects; R = hierarchy of relations.</Paragraph> <Paragraph position="2"> DESC = Basic-Incremental-Algorithm(T, D); if DESC ≠ Distinguishing then create CL, the set of candidate landmarks</Paragraph> <Paragraph position="4"> create a context model for relation R_i consisting of TL, the set of trajector landmarks, and DL, the set of distractor landmarks</Paragraph> <Paragraph position="6"> If we cannot create a distinguishing locative description, we are faced with a choice: (1) iterate on to the next relation in the hierarchy, or (2) create an embedded locative description that distinguishes the landmark. [Footnote 6: We model both visual and linguistic (discourse) salience. Visual salience is computed using a modified version of the visual saliency algorithm described in [Kelleher and van Genabith, 2004]. Discourse salience is computed based on recency of mention as defined in [Hajičová, 1993], except that we represent the maximum overall salience in the scene as 1 and use 0 to indicate that an object is not salient. We integrate these two components by summing them and dividing the result by 2.]</Paragraph> <Paragraph position="7"> Currently, we prefer option (1) over (2), preferring "the dog to the right of the car" over "the dog near the car to the right of the house". However, the algorithm can generate these longer embedded descriptions if needed. This is done by replacing the call to the basic incremental algorithm for the trajector landmark object with a call to the whole locative expression generation algorithm, with the trajector landmark as the target object and the set of distractor landmarks as the distractor objects. 
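The recursive variant changes a single step: the trajector landmark is described by a recursive call, with the landmark as target and the distractor landmarks as distractors, instead of by Algorithm 1. The Python sketch below is a hedged illustration: it assumes objects are attribute dictionaries and that each relation's context model is given as precomputed (object, landmark) pairs, drops salience ordering for brevity (landmarks are tried in name order), and uses an invented dog/car/house scene built to reproduce the embedded description "the dog near the car to the right of the house". `recursive_locative` and `render` are hypothetical names. In this sketch each recursive call receives only the caller's distractor landmarks, which are drawn from candidate landmarks that exclude the caller's target and every object still matching its description, so the distractor set shrinks at every level and the recursion terminates, in line with the infinite-regress discussion that follows.

```python
from typing import Dict, List, Optional, Set, Tuple

Scene = Dict[str, Dict[str, str]]
Facts = Dict[str, Set[Tuple[str, str]]]  # relation -> {(object, landmark)}


def basic_incremental(target: str, distractors: Set[str], scene: Scene):
    remaining, desc = set(distractors), {}
    for attr in ("type", "colour", "size"):
        value = scene[target].get(attr)
        if value is not None:
            ruled_out = {d for d in remaining if scene[d].get(attr) != value}
            if ruled_out:
                desc[attr] = value
                remaining -= ruled_out
        if not remaining:
            desc.setdefault("type", scene[target]["type"])
            return desc, True
    return desc, False


def recursive_locative(target: str, distractors: Set[str], scene: Scene,
                       hierarchy: List[str], facts: Facts) -> Optional[tuple]:
    """Return (DESC, relation, landmark result) or (DESC, None, None); None = FAIL."""
    desc, ok = basic_incremental(target, distractors, scene)
    if ok:
        return desc, None, None
    remaining = {d for d in distractors
                 if all(scene[d].get(a) == v for a, v in desc.items())}
    candidates = set(distractors) - remaining  # strictly smaller than distractors
    for rel in hierarchy:
        pairs = facts.get(rel, set())
        tl = {lm for obj, lm in pairs if obj == target and lm in candidates}
        dl = {lm for obj, lm in pairs if obj in remaining and lm in candidates}
        for lm in sorted(tl):
            # The recursive call replaces Algorithm 1 for the landmark.
            land = recursive_locative(lm, dl, scene, hierarchy, facts)
            if land is not None:
                return desc, rel, land
    return None


def render(result) -> str:
    desc, rel, land = result
    words = [v for a, v in desc.items() if a != "type"] + [desc["type"]]
    phrase = "the " + " ".join(words)
    return phrase if rel is None else f"{phrase} {rel} {render(land)}"


# Invented scene: two dogs, two cars, one house; relations given as pairs.
SCENE: Scene = {name: {"type": kind} for name, kind in
                [("D1", "dog"), ("D2", "dog"), ("C1", "car"), ("C2", "car"), ("H1", "house")]}
FACTS: Facts = {
    "near": {("D1", "C1"), ("D2", "C2"), ("D2", "H1")},
    "to the right of": {("C1", "H1")},
}

result = recursive_locative("D1", {"D2", "C1", "C2", "H1"}, SCENE,
                            ["near", "to the right of"], FACTS)
print(render(result))  # the dog near the car to the right of the house
```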
Algorithm 3 lists the steps in the recursive version of the algorithm.</Paragraph> <Paragraph position="8"> Algorithm 3 The Recursive Locative Algorithm. Require: T = target object; D = set of distractor objects; R = hierarchy of relations.</Paragraph> <Paragraph position="9"> DESC = Basic-Incremental-Algorithm(T, D); if DESC ≠ Distinguishing then create CL, the set of candidate landmarks</Paragraph> <Paragraph position="11"> create a context model for relation R_i consisting of TL, the set of trajector landmarks, and DL, the set of distractor landmarks</Paragraph> <Paragraph position="13"> if LANDDESC = Distinguishing then Distinguishing locative generated; return {DESC, R_i, LANDDESC} end if end for end for end if</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> FAIL </SectionTitle> <Paragraph position="0"> For both versions of the locative algorithm, an important consideration is the issue of infinite regress. As noted by [Dale and Haddock, 1991], a compositional GRE system may, in certain contexts, generate an infinite description by trying to distinguish the landmark in terms of the trajector and the trajector in terms of the landmark (see (4)). However, this infinite recursion can only occur if the context is not modified between calls to the algorithm. This issue does not affect Algorithm 2, because each call to the algorithm results in the domain being partitioned into those objects that can and cannot be used as landmarks. One effect of this partitioning is a reduction in the number of object pairs for which relations must be computed. More importantly for this discussion, another consequence of this partitioning is that the process of creating a distinguishing description for a landmark is carried out in a context that is a subset of the context in which the trajector description was generated. 
The distractor set used during the generation of a landmark description is the set of distractor landmarks. This set excludes, at a minimum, the trajector object, since by definition the landmark objects cannot fulfill the description of the trajector generated by the basic incremental algorithm. This naturally removes the possibility of the algorithm distinguishing a landmark by means of its trajector.</Paragraph> <Paragraph position="1"> (4) the bowl on the table supporting the bowl on the table supporting the bowl ...</Paragraph> </Section> </Paper>