<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1041"> <Title>Structuring Knowledge for Reference Generation: A Clustering Algorithm</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. Perspective-taking </SectionTitle> <Paragraph position="0"> The contextual appropriateness of a description depends on the perspective being taken in context. For instance, if it is known of a referent that it is a teacher, and a sportsman, it is better to talk of the teacher in a context where another referent has been introduced as the student. This is discussed further in §3.</Paragraph> <Paragraph position="1"> Our aim is to motivate an approach to GRE where these problems are solved by pre-processing the information in the knowledge base, prior to content determination. To this end, §4 describes a clustering algorithm and shows how it can be applied to these different problems to structure the KB prior to GRE.</Paragraph> </Section> <Section position="4" start_page="0" end_page="321" type="metho"> <SectionTitle> 2 Numeric values: The case of location </SectionTitle> <Paragraph position="0"> Several types of information about domain entities, such as gradable properties (van Deemter, 2000) and physical location, are best captured by real-valued attributes. Here, we focus on the example of location as an attribute taking a tuple of values which jointly determine the position of an entity.</Paragraph> <Paragraph position="1"> The ability to distinguish groups is a well-established feature of the human perceptual apparatus (Wertheimer, 1938; Treisman, 1982). Representing salient groups can facilitate the task of excluding distractors in the search for a referent. For instance, the set of referents marked as the intended referential target in Figure 1 is easily distinguishable as a group and warrants the use of a spatial description such as the objects in the top left corner, possibly with a collective predicate, such as clustered or gathered. 
In case of reference to a subset of the marked set, although location would be insufficient to distinguish the targets, it would reduce the distractor set and facilitate reference resolution.2</Paragraph> <Paragraph position="2"> In GRE, an approach to spatial reference based on grouping has been proposed by Funakoshi et al.</Paragraph> <Paragraph position="3"> 2 Location has been found to significantly facilitate resolution, even when it is logically redundant (Arts, 2004)</Paragraph> <Paragraph position="5"> (2004). Given a domain and a target referent, a sequence of groups is constructed, starting from the largest group containing the referent, and recursively narrowing down the group until only the referent is identified. The entire sequence is then rendered linguistically. The algorithm used for identifying perceptual groups is the one proposed by Thorisson (1994), the core of which is a procedure which takes as input a list of pairs of objects, ordered by the distance between the entities in the pairs. The procedure loops through the list, finding the greatest difference in distance between two adjacent pairs. This is determined as a cutoff point for group formation. Two problems are raised by this approach: P1 Ambiguous clusters A domain entity can be placed in more than one group. If, say, the input list is ⟨{a,b},{c,e},{a,f}⟩ and the greatest difference after the first iteration is between {c,e} and {a,f}, then the first group to be formed will be {a,b,c,e}, with {a,f} likely to be placed in a different group after further iterations. This may be confusing from a referential point of view.</Paragraph> <Paragraph position="6"> The problem arises because grouping or clustering takes place on the basis of pairwise proximity or distance. This problem can be partially circumvented by identifying groups on several perceptual dimensions (e.g. 
spatial distance, colour, and shape) and then seeking to merge identical groups determined on the basis of these different qualities (see Thorisson (1994)). However, the grouping strategy can still return groups which do not conform to human perceptual principles. A better strategy is to base clustering on the Nearest Neighbour Principle, familiar from computational geometry (Preparata and Shamos, 1985), whereby elements are clustered with their nearest neighbours, given a distance function. The solution offered below is based on this principle.</Paragraph> <Paragraph position="7"> P2 Perceptual proximity Absolute distance is not sufficient for cluster identification. In Figure 1, for example, the pairs {e1,e2} and {e5,e6} could easily be consecutively ranked, since the distance between e1 and e2 is roughly equal to that between e5 and e6. However, they would not naturally be clustered together by a human observer, because grouping of objects also needs to take into account the position of the surrounding elements. Thus, while e1 is as far away from e2 as e5 is from e6, there are elements which are closer to {e1,e2} than to {e5,e6}.</Paragraph> <Paragraph position="8"> The proposal in §4 represents a way of getting around these problems, which are expected to arise in any kind of domain where the information given is the pairwise distance between elements. Before turning to the framework, we consider another situation in GRE where the need for clustering could arise.</Paragraph> </Section> <Section position="5" start_page="321" end_page="322" type="metho"> <SectionTitle> 3 Perspectives and semantic similarity </SectionTitle> <Paragraph position="0"> In real-world discourse, entities can often be talked about from different points of view, with speakers bringing to bear world and domain-specific knowledge to select information that is relevant to the current topic. 
In order to generate coherent discourse, a generator should ideally keep track of how entities have been referred to, and maintain consistency as far as possible.</Paragraph> <Paragraph position="1"> Suppose e1 in Table 1 has been introduced into the discourse via the description the student and the next utterance requires a reference to e2. Any one of the three available attributes would suffice to distinguish the latter. However, a description such as the woman or the italian would describe this entity from a different point of view relative to e1. By hypothesis, the teacher is more appropriate, because the property ascribed to e2 is more similar to that ascribed to e1.</Paragraph> <Paragraph position="2"> A similar case arises with plural disjunctive descriptions of the form λx[p(x) ∨ q(x)], which are usually realised as coordinate constructions of the form the N'1 and the N'2. For instance, a reference to {e1,e2} such as the woman and the student, or the englishman and the teacher, would be odd, compared to the alternative the student and the teacher. The latter describes these entities under the same perspective. Note that 'consistency' or 'similarity' is not guaranteed simply by attempting to use values of the same attribute(s) for a given set of referents. The description the student and the chef for {e1,e3} is relatively odd compared to the alternative the englishman and the greek. In both kinds of scenarios, a GRE algorithm that relied on a rigid preference order could not guarantee that a coherent description would be generated every time it was available.</Paragraph> <Paragraph position="3"> The issues raised here have never been systematically addressed in the GRE literature, although support for the underlying intuitions can be found in various quarters. Kronfeld (1989) distinguishes between functionally and conversationally relevant descriptions. 
A description is functionally relevant if it succeeds in distinguishing the intended referent(s), but conversational relevance arises in part from implicatures carried by the use of attributes in context. For example, describing e1 as the student carries the (Gricean) implicature that the entity's academic role or profession is somehow relevant to the current discourse. When two entities are described using contrasting properties, say the student and the italian, the listener may find it harder to work out the relevance of the contrast. In a related vein, Aloni (2002) formalises the appropriateness of an answer to a question of the form Wh x? with reference to the 'conceptual covers' or perspectives under which x can be conceptualised, not all of which are equally relevant given the hearer's information state and the discourse context.</Paragraph> <Paragraph position="4"> With respect to plurals, Eschenbach et al. (1989) argue that the generation of a plural anaphor with a split antecedent is more felicitous when the antecedents have something in common, such as their ontological category. This constraint has been shown to hold psycholinguistically (Kaup et al., 2002; Koh and Clifton, 2002; Moxey et al., 2004). Gatt and van Deemter (2005a) have shown that people's perception of the adequacy of plural descriptions of the form the N1 and (the) N2 is significantly correlated with the semantic similarity of N1 and N2, while singular descriptions are more likely to be aggregated into a plural if semantically similar attributes are available (Gatt and van Deemter, 2005b).</Paragraph> <Paragraph position="5"> The two kinds of problems discussed here could be resolved by pre-processing the KB in order to identify available perspectives. One way of doing this is to group available properties into clusters of semantically similar ones. This requires a well-defined notion of 'similarity' which determines the 'distance' between properties in semantic space. 
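To make this concrete, a similarity measure over properties (for instance a corpus-based one such as Lin's, used in §4.3) can be inverted into the pairwise distance function that clustering requires. The following is a minimal sketch: the similarity scores and the simple 1 − sim inversion are illustrative assumptions, not the paper's actual measure.

```python
# Toy similarity scores between properties; the numbers are invented
# for illustration (a real system would use a corpus-based measure).
SIM = {
    frozenset({"student", "teacher"}): 0.80,
    frozenset({"student", "woman"}): 0.20,
    frozenset({"teacher", "woman"}): 0.25,
}

def sem_dist(p1, p2):
    """Semantic distance as 1 - similarity; identity yields distance 0."""
    if p1 == p2:
        return 0.0
    return 1.0 - SIM[frozenset({p1, p2})]
```

A distance of this shape satisfies minimality and symmetry by construction, which is what the clustering framework of §4 assumes of its input.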
As with spatial clustering, the problem is then one of how to get from pairwise distance to well-formed clusters or groups, while respecting the principles underlying human perceptual/conceptual organisation. The next section describes an algorithm that aims to achieve this.</Paragraph> </Section> <Section position="6" start_page="322" end_page="325" type="metho"> <SectionTitle> 4 A framework for clustering </SectionTitle> <Paragraph position="0"> In what follows, we assume the existence of a set of clusters C in a domain S of objects (entities or properties), to be 'discovered' by the algorithm. We further assume the existence of a dimension, which is characterised by a function d that returns the pairwise distance d(a,b), where ⟨a,b⟩ ∈ S × S. In case an attribute is characterised by more than one dimension, say ⟨x,y⟩ coordinates in a 2D plane, as in Figure 1, then d is defined as the Euclidean distance between pairs:</Paragraph> <Paragraph position="2"> (1) d(a,b) = √(Σx∈D xab²) where D is a tuple of dimensions, xab = d(a,b) on dimension x. d satisfies the axioms of minimality (2a), symmetry (2b), and the triangle inequality (2c), by which it determines a metric space on S: (2a) d(a,b) = 0 iff a = b; (2b) d(a,b) = d(b,a); (2c) d(a,b) + d(b,c) ≥ d(a,c).</Paragraph> <Paragraph position="4"> We now turn to the problems raised in §2. P1 would be avoided by a clustering algorithm that satisfies (3), the requirement that distinct clusters be disjoint: (3) ∀Ci, Cj ∈ C: Ci ≠ Cj → Ci ∩ Cj = ∅.</Paragraph> <Paragraph position="6"> It was also suggested above that a potential solution to P1 is to cluster using the Nearest Neighbour Principle. Before considering a solution to P2, i.e. the problem of discovering clusters that approximate human intuitions, it is useful to recapitulate the classic principles of perceptual grouping proposed by Wertheimer (1938), of which the following two are the most relevant: 1. Proximity The smaller the distance between objects in the cluster, the more easily perceived it is. 2. 
Similarity Similar entities will tend to be more easily perceived as a coherent group.</Paragraph> <Paragraph position="7"> Arguably, once a numeric definition of (semantic) similarity is available, the Similarity Principle boils down to the Proximity Principle, where proximity is defined via a semantic distance function. This view is adopted here. How well our interpretation of these principles can be ported to the semantic clustering problem of §3 will be seen in the following subsections. To resolve P2, we will propose an algorithm that uses a context-sensitive definition of 'nearest neighbour'. Recall that P2 arises because, while d is a measure of 'objective' distance on some scale, perceived proximity (resp. distance) of a pair ⟨a,b⟩ is contingent not only on d(a,b), but also on the distance of a and b from all other elements in S. A first step towards meeting this requirement is to consider, for a given pair of objects, not only the absolute distance (proximity) between them, but also the extent to which they are equidistant from other objects in S. Formally, a measure of perceived proximity prox(a,b) can be approximated by the following function. Let the two sets Pab, Dab be defined as follows:</Paragraph> <Paragraph position="9"> (4) Pab = {x ∈ S − {a,b} : |rank(x,a) − rank(x,b)| ≤ k}; Dab = (S − {a,b}) − Pab; prox(a,b) = p(a,b) × |Pab|/|Dab|; that is, prox(a,b) is a function of the absolute distance d(a,b), the number of elements in S − {a,b} which are roughly equidistant from a and b, and the number of elements which are not equidistant. One way of conceptualising this is to consider, for a given object a, the list of all other elements of S, ranked by their distance (proximity) to a. Suppose there exists an object b whose ranked list is similar to that of a, while another object c's list is very different. 
Then, all other things being equal (in particular, the pairwise absolute distance), a clusters closer to b than does c.</Paragraph> <Paragraph position="10"> This takes us from a metric, distance-based conception, to a broader notion of the 'similarity' between two objects in a metric space. Our definition is inspired by Tversky's feature-based Contrast Model (1977), in which the similarity of a,b with feature sets A,B is a linear function of the features they have in common and the features that pertain only to A or B, i.e.: sim(a,b) = f(A ∩ B) − f(A △ B), where △ denotes symmetric difference. In (4), the distance of a and b from every other object is the relevant feature.</Paragraph> <Section position="1" start_page="323" end_page="323" type="sub_section"> <SectionTitle> 4.1 Computing perceived proximity </SectionTitle> <Paragraph position="0"> The computation of pairwise perceived proximity prox(a,b), shown in Algorithm 1, is the first step towards finding clusters in the domain.</Paragraph> <Paragraph position="1"> Following Thorisson (1994), the procedure uses the absolute distance d to calculate 'absolute proximity' (1.7), a value in (0,1), with 1 corresponding to d(a,b) = 0, i.e. identity (cf. axiom (2a)). The procedure then visits each element of the domain, and compares its rank with respect to a and b (1.9-1.13),3 incrementing a proximity score s (1.10) if the ranks are approximately equal, or a distance score d otherwise (1.12).
3 We simplify the presentation by assuming the function rank(x,a) that returns the rank of x with respect to a. In practice, this is achieved by creating, for each element of the input pair, a totally ordered list La such that La[r] holds the set of elements ranked at r with respect to d(x,a).
Algorithm 1 prox(a,b):
2: if a = b then
3:   return 1
4: end if
5: s ← 0
6: d ← 0
7: p(a,b) ← 1 − d(a,b)/maxD
8: for all x ∈ S − {a,b} do
9:   if |rank(x,a) − rank(x,b)| ≤ k then
10:    s ← s + 1
11:  else
12:    d ← d + 1
13:  end if
14: end for
15: return p(a,b) × s/d
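A runnable Python rendering of this listing may help. The concrete rank computation and the guard for the case where no element differs in rank (d = 0, which the listing leaves implicit) are assumptions of this sketch:

```python
import math

def dist(p, q):
    # Euclidean distance over a tuple of dimensions (eq. 1)
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def rank(x, a, S):
    # Position of x in the list of objects ordered by distance from a
    ordered = sorted((o for o in S if o != a), key=lambda o: dist(o, a))
    return ordered.index(x)

def prox(a, b, S, k, max_d):
    # Perceived proximity of a and b (Algorithm 1, sketched)
    if a == b:
        return 1.0
    s = d = 0
    p = 1 - dist(a, b) / max_d              # absolute proximity (1.7)
    for x in S:
        if x in (a, b):
            continue
        if abs(rank(x, a, S) - rank(x, b, S)) <= k:
            s += 1                          # x roughly equidistant from a and b
        else:
            d += 1
    return p * s / d if d else p * s        # assumed guard for d = 0
```

On a toy domain of four points forming two tight pairs, prox ranks the within-pair proximity above the cross-pair one, as intended.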
Approximate equality is determined via a constant k (1.1), which, based on our experiments, is set to a tenth of the size of S. The procedure returns the ratio of proximity and distance scores, weighted by the absolute proximity p(a,b) (1.15). Algorithm 1 is called for all pairs in S × S, yielding, for each element a ∈ S, a list of elements ordered by their perceived proximity to a. The entity with the highest proximity to a is called its anchor. Note that any domain object has one, and only one, anchor.</Paragraph> </Section> <Section position="2" start_page="323" end_page="324" type="sub_section"> <SectionTitle> 4.2 Creating clusters </SectionTitle> <Paragraph position="0"> The procedure makeClusters(S,Anchors), shown in its basic form in Algorithm 2, uses the notion of an anchor introduced above. The rationale behind the algorithm is captured by the following declarative principle, where C ∈ C is any cluster, and anchor(a,b) means 'b is the anchor of a':</Paragraph> <Paragraph position="2"> (5) a ∈ C ∧ anchor(a,b) → b ∈ C. A cluster is defined as the transitive closure of the anchor relation, that is, if it holds that anchor(a,b) and anchor(b,c), then {a,b,c} will be clustered together. Apart from satisfying (5), the procedure also induces a partition on S, satisfying (3). Given these primary aims, no attempt is made, once clusters are generated, to further sub-divide them, although we briefly return to this issue in §5. The algorithm initialises a set Clusters to empty (2.1), and iterates through the list of objects S (2.5). For each object a and its anchor b (2.6), it first checks whether they have already been clustered (e.g. if either of them was the anchor of an object visited earlier) (2.7, 2.12). 
If this is not the case, then a provisional cluster is initialised for each element; the procedure then merges the cluster containing a with that of its anchor b (2.18), having removed the latter from the cluster set (2.14).</Paragraph> <Paragraph position="3"> This algorithm is guaranteed to induce a partition, since no element will end up in more than one group.</Paragraph> <Paragraph position="4"> It does not depend on an ordering of pairs à la Thorisson. However, problems arise when elements and anchors are clustered naïvely. For instance, if an element is very distant from every other element in the domain, prox(a,b) will still find an anchor for it, and makeClusters(S,Anchors) will place it in the same cluster as its anchor, although it is an outlier. Before describing how this problem is rectified, we introduce the notion of a family (F) of elements. Informally, this is the set of elements of S that have the same anchor, that is:</Paragraph> <Paragraph position="6"> (6) Fb = {a ∈ S : anchor(a,b)}. The solution to the outlier problem is to calculate a centroid value for each family found after prox(a,b).</Paragraph> <Paragraph position="7"> This is the average proximity between the common anchor and all members of its family, minus one standard deviation. Prior to merging, at line (2.18), the algorithm now checks whether the proximity value between an element and its anchor falls below the centroid value. If it does, the cluster containing an object and that containing its anchor are not merged.</Paragraph> </Section> <Section position="3" start_page="324" end_page="325" type="sub_section"> <SectionTitle> 4.3 Two applications </SectionTitle> <Paragraph position="0"> The algorithm was applied to the two scenarios described in §2 and §3. In the spatial domain, the algorithm returns groups or clusters of entities, based on their spatial proximity. This was tested on domains like Figure 1, in which the input is a set of entities whose position is defined as a pair of x/y coordinates. Figure 1 illustrates a potential problem with the procedure. 
In that figure, it holds that anchor(e8,e9) and anchor(e9,e8), making e8 and e9 a reciprocal pair.</Paragraph> <Paragraph position="1"> In such cases, the algorithm inevitably groups the two elements, whatever their proximity/distance. This may be problematic when elements of a reciprocal pair are very distant from each other, in which case they are unlikely to be perceived as a group. We return to this problem briefly in §5.</Paragraph> <Paragraph position="2"> The second domain of application is the clustering of properties into 'perspectives'. Here, we use the information-theoretic definition of similarity developed by Lin (1998) and applied to corpus data by Kilgarriff and Tugwell (Kilgarriff and Tugwell, 2001).</Paragraph> <Paragraph position="3"> This measure defines the similarity of two words as a function of the likelihood of their occurring in the same grammatical environments in a corpus. This measure was shown experimentally to correlate highly with human acceptability judgments of disjunctive plural descriptions (Gatt and van Deemter, 2005a), when compared with a number of measures that calculate the similarity of word senses in WordNet. Using this as the measure of semantic distance between words, the algorithm returns clusters such as those in Figure 2.</Paragraph> <Paragraph position="4"> input: {waiter, essay, footballer, article, servant, cricketer, novel, cook, book, maid, player, striker, goalkeeper} output: (the clusters shown in Figure 2) If the words in Figure 2 represented properties of different entities in the domain of discourse, then the clusters would represent perspectives or 'covers', whose extension is a set of entities that can be talked about from the same point of view. For example, if some entity were specified as having the property footballer, and the property striker, while another entity had the property cricketer, then according to the output of the algorithm, the description the footballer and the cricketer is the most conceptually coherent one available. 
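The grouping behaviour just described (each object merged with its anchor, clusters closed under the anchor relation, i.e. Algorithm 2 in its basic form) can be sketched as follows. The dictionary-based anchor representation is an assumption of this sketch, and the centroid-based outlier check is omitted:

```python
def make_clusters(S, anchors):
    # Merge each object with its anchor, so that the resulting clusters
    # are the transitive closure of the anchor relation (principle (5)).
    clusters = []
    for a in S:
        b = anchors[a]
        ca = next((c for c in clusters if a in c), None)
        cb = next((c for c in clusters if b in c), None)
        if ca is None and cb is None:
            clusters.append({a, b})          # fresh provisional cluster
        elif ca is None:
            cb.add(a)                        # a joins its anchor's cluster
        elif cb is None:
            ca.add(b)                        # anchor joins a's cluster
        elif ca is not cb:
            ca |= cb                         # merge the two clusters
            clusters.remove(cb)
    return clusters
```

Because every object ends up in exactly one cluster, the result is a partition of S; note that a reciprocal pair such as e8/e9 is always grouped, which is precisely the limitation discussed above.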
It could be argued that the units of representation in GRE are not words but 'properties' (e.g. values of attributes) which can be realised in a number of different ways (if, for instance, there are a number of synonyms corresponding roughly to the same intension).</Paragraph> <Paragraph position="5"> This could be remedied by defining similarity as 'distance in an ontology'; conversely, properties could be viewed as a set of potential (word) realisations.</Paragraph> </Section> </Section> </Paper>