File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/96/w96-0417_metho.xml
Size: 26,376 bytes
Last Modified: 2025-10-06 14:14:24
<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0417"> <Title>Strategies for Comparison in Encyclopaedia Descriptions</Title> <Section position="3" start_page="161" end_page="162" type="metho"> <SectionTitle> 2 An Overview of PEBA-II </SectionTitle> <Paragraph position="0"> The architecture of the PEBA-II system is shown in Figure 1; the components are as follows. The knowledge base that currently underlies the system has been hand-constructed from an analysis of encyclopaedia descriptions of animals and constitutes a taxonomy of the Linnaean animal classes with their associated properties. Particular properties may also be labeled as distinguishing for a specific class. The plan library consists of discourse plans which are used by the text planning component. Currently, the system makes use of two high level discourse plans, which we name identify and compare-and-contrast. The identify discourse plan is used to describe an entity and the compare-and-contrast discourse plan is used to compare two entities. These discourse plans are similar in spirit but rather different in content to the similarly-named schemas used by McKeown \[1985\], with a number of the differences arising from the fact that we are generating hypertext pages.</Paragraph> <Paragraph position="1"> A new discourse goal is generated by the user clicking on a hypertext link in the current document being viewed. Given this new goal, the text planning component selects any relevant information from the knowledge base and organises the information according to the current discourse plan. The leaves of the instantiated discourse plan are then realised via a simple template mechanism. 1 The output from the PEBA-II system is a document marked up using a subset of HTML commands. This document may be displayed using any www document renderer such as Mosaic or Netscape. The user poses new discourse goals to the system by clicking on any of the hypertext tags, and the cycle continues. 
2 The combination of text generation and hypertext has been explored by others, most notably in Moore's \[1989, 1995\] PEA and in Reiter et al's \[1992, 1995\] IDAS. Footnote 1: Although we have experimented with using Elhadad's \[1992\] FUF realisation engine, for the texts we currently generate a template-based mechanism is faster and seems quite adequate. Speed is important in the context of Web-based generation: see Tulloch and Dale \[1995\] for some ideas on addressing the problems here.</Paragraph> <Paragraph position="2"> PEBA-II is closest in concept to the IDAS system; a more detailed description of PEBA-II can be found in \[Milosavljevic, Tulloch and Dale 1996\]. Knott et al \[1996\] discuss some further issues involved in combining hypertext with natural language generation.</Paragraph> </Section> <Section position="4" start_page="162" end_page="162" type="metho"> <SectionTitle> 3 Defining Comparisons </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="162" end_page="162" type="sub_section"> <SectionTitle> 3.1 Data analysis </SectionTitle> <Paragraph position="0"> A corpus analysis has been conducted to identify how comparisons are used in encyclopaedia articles, so that these techniques may be built into the PEBA-II system. In the first instance, we have concentrated on the domain of animal descriptions; we intend to widen the scope of this analysis to other domains in order to provide a more domain-independent theory of comparative forms. The two encyclopaedias analysed were Microsoft Encarta \[Microsoft 1995\] and</Paragraph> </Section> <Section position="2" start_page="162" end_page="162" type="sub_section"> <SectionTitle> Groliers Multimedia Encyclopaedia \[Groliers </SectionTitle> <Paragraph position="0"> 1992\]; each encyclopaedia yielded around 1200 animal entries, and from these we collated a subcorpus of sentences involving comparison. 
This subcorpus contains 1722 sentences from the Encarta corpus, and 1557 from the Groliers corpus.</Paragraph> <Paragraph position="1"> The aim of the corpus analysis was to reverse-engineer the comparisons found in animal descriptions in order to answer the following questions: What entities are compared in descriptive texts and how do they relate to each other? What properties of these entities are used in comparisons? Why are particular entities compared? Why are some entities better comparators than others? What techniques do we need to build into a text generation system to be able to produce similar comparisons?</Paragraph> </Section> <Section position="3" start_page="162" end_page="162" type="sub_section"> <SectionTitle> 3.2 Some Definitions 3.2.1 Comparison </SectionTitle> <Paragraph position="0"> We will adopt the following definitions: A comparative proposition is a proposition whose purpose is to draw the hearer's attention to a difference or a similarity that two entities have for the value of a shared attribute. 
3 A comparison is the linguistic realisation of a set of one or more comparative propositions, where the purpose of the set of propositions is to draw the hearer's attention to one or more differences or similarities between two entities.</Paragraph> <Paragraph position="1"> We have identified three different types of comparative forms that appear in descriptive texts, which we refer to here as</Paragraph> </Section> </Section> <Section position="5" start_page="162" end_page="165" type="metho"> <SectionTitle> DIRECT COMPARISONS, CLARIFICATORY COMPARISONS, </SectionTitle> <Paragraph position="0"> and ILLUSTRATIVE COMPARISONS.</Paragraph> <Paragraph position="1"> Of these three types, only the first has been explored to any great degree in the context of natural language generation: both McKeown \[1985\] and Maybury \[1995\] have looked at various aspects of direct comparisons.</Paragraph> <Paragraph position="2"> A DIRECT COMPARISON is a comparison whose purpose is to compare two entities where neither entity is more central to the discourse than the other. In the context of a language generation system like PEBA-II, direct comparisons arise when the user enters a request such as: What is the difference between the Echidna and the African Porcupine? PEBA-II generates the text shown in Figure 2 in response to such a query.</Paragraph> <Paragraph position="3"> This text is essentially 'bi-focal': the echidna and the porcupine are equally important, and the purpose of the text is to determine their similarities and differences based on both their relationship within a taxonomy of animals (their lowest common ancestor) and their attributes. This is, of course, the same notion of comparison that is used in McKeown's \[1985\] TEXT system. Footnote 3: In the terminology we adopt here, a PROPERTY is a tuple consisting of an ATTRIBUTE and a VALUE; for example, (colour, red).</Paragraph> <Paragraph position="4"> \[Figure 2, generated page as displayed in a Web browser: The Echidna and the African Porcupine. The Echidna, also known as the spiny Anteater, is a type of Monotreme. The Monotreme is a type of Mammal that lays eggs with leathery shells similar to reptiles. The African Porcupine is a type of Placental Mammal. The Placental Mammal is a type of Mammal that carries its developing young inside the mother's womb.\]</Paragraph> <Paragraph position="5"> The key point here is that direct comparisons are generally user-initiated. More interesting from the point of view of text generation are clarificatory and illustrative comparisons: here, the entity being described by the system is described in relation to some other entity chosen by the system.</Paragraph> <Paragraph position="6"> A CLARIFICATORY COMPARISON is a comparison whose purpose is to describe an entity by distinguishing it clearly from another entity with which it might be confused or with which it shares a number of salient properties.</Paragraph> <Paragraph position="7"> In such cases we will refer to the first entity as the FOCUSED ENTITY, and to the second entity as the POTENTIAL CONFUSOR.</Paragraph> <Paragraph position="8"> The main difference between a clarificatory comparison and a direct comparison is that a clarificatory comparison is made within a text whose purpose is to describe one entity and not purely to provide a comparison between two entities. A clarificatory comparison serves to describe the focused entity; thus, it corresponds to the user entering a request such as What is the echidna? In such a case, instead of describing the echidna in isolation, the system may choose to describe it using a clarificatory comparison with the porcupine.</Paragraph> <Paragraph position="9"> There are two reasons why a clarificatory comparison might be used: * The focused entity might be extremely similar to another entity, and therefore often confused with that entity. 
In this case, it is important that the description of the focused entity distinguishes it sufficiently from the potential confusor. * Alternatively, an entity sharing a number of salient features with the focused entity might already be known to the user; in such a case, a clarificatory comparison between these entities may aid the user's understanding of the focused entity.</Paragraph> <Paragraph position="10"> For example, consider the following text extracted from the animal corpus: Sheep are hollow-horned ruminants belonging to the genus Ovis, suborder Ruminata, family Bovidae. Similar to goats, sheep differ in their stockier bodies, the presence of scent glands in face and hind feet, and the absence of beards in the males.</Paragraph> <Paragraph position="11"> Domesticated sheep are also more timid and prefer to flock and follow a leader. \[Groliers 1992\].</Paragraph> <Paragraph position="12"> In this text, the focused entity (the sheep) is very similar to, and might often be confused with, the comparator entity (the goat); this is particularly true of some wild sheep. A reader who is familiar with the comparator entity will also more easily form a mental picture of what the focused entity is like.</Paragraph> <Paragraph position="13"> There are a number of interesting research issues here: * How is a comparator entity selected? For example, a very appropriate comparator for the echidna is the porcupine, but the two entities are not closely related within the Linnaean taxonomy of animal classes.</Paragraph> <Paragraph position="14"> The reason for the choice of comparator entity here lies in the fact that both animals possess sharp spines--this is the only salient property the animals share.</Paragraph> <Paragraph position="15"> * How do we make clarificatory comparisons which do not cause the user to make incorrect inferences? 
For example, if a user who is not familiar with sheep requests a description of the sheep and the system describes the sheep by informing the user of its similarities with the goat and not their differences, then the user could be led to believe that the two animals are more similar than they are in reality. The text shown above very carefully describes both similarities and differences for only the most salient features which clearly distinguish the animals.</Paragraph> <Paragraph position="16"> A user model is advantageous here since the importance of different attribute types will vary from person to person. For example, if external appearance is the most important attribute, then we would want to compare the echidna to the porcupine. If, on the other hand, reproduction is considered a more important feature, then we might compare the echidna to the platypus. The geographical location of the user can also play an important role: for example, in the texts that we have examined, squirrels are often used as comparators; but Australians are not necessarily familiar with the features of squirrels, and some North Americans might only know of the existence of black squirrels.</Paragraph> <Paragraph position="17"> An ILLUSTRATIVE COMPARISON is a comparison whose purpose is to describe one or more attributes of an entity by referring to the same attribute(s) of another entity with which the user is familiar. 
The difference between an illustrative comparison and a clarificatory comparison is that in an illustrative comparison, the comparator entity, although usually of a similar type (in this case, an animal), may only share one attribute with the focused entity, and is not necessarily similar in any other way to the focused entity.</Paragraph> <Paragraph position="18"> Here are some illustrative comparisons from our corpus: * Powerful and aggressive animals about the size of a large dog, baboons have strong, elongated jaws, large cheek pouches in which they store food, and eyes close together. \[Microsoft 1995\] * \[Aye-aye\] are about the size of a large cat and have long, bushy tails, a shaggy brown coat, and large ears. \[Microsoft 1995\] * About the size of a small fox, \[the Aye-aye\] has a long, bushy tail, moderately large eyes, thick fur, and a pair of enlarged front teeth resembling those of rodents. \[Groliers 1992\] * This echolocation system, similar to that of the bat, enables the dolphin to navigate among its companions and larger objects and to detect fish, squid, and even small shrimp. \[Microsoft 1995\] * Slightly larger than chinchillas, the mountain viscachas have long, rabbitlike ears and a long squirrel-like tail. \[Microsoft 1995\] In each of these sentences, an illustrative comparison is made so that the reader can more easily grasp the concept being described. Instead of describing the size and proportion of the viscacha's ears in absolute terms, a reference to the rabbit's ears makes it easier for the reader to understand what the ears really look like.</Paragraph> <Paragraph position="19"> There is a great deal of scope for tailoring descriptions to a user's knowledge here: for example, illustrating the size of the aye-aye with the fox might be appropriate for a user who is familiar with the fox; however, this illustration might not be appropriate for someone located in Australia, since the fox is not found in Australia. 
The features of a particular animal (the sheep, for example) might also vary geographically.</Paragraph> </Section> <Section position="6" start_page="165" end_page="168" type="metho"> <SectionTitle> 4 Implementing Comparison </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="165" end_page="165" type="sub_section"> <SectionTitle> Strategies </SectionTitle> <Paragraph position="0"> Above, we identified three particular types of comparisons that are present in our corpus.</Paragraph> <Paragraph position="1"> In PEBA-II, each corresponds to a particular discourse strategy for generating a hypertext page. In this section, we describe how these strategies are implemented within PEBA-II.</Paragraph> </Section> <Section position="2" start_page="165" end_page="165" type="sub_section"> <SectionTitle> 4.1 Choosing Amongst the Strategies </SectionTitle> <Paragraph position="0"> We are faced with two interdependent questions: when do we decide to describe an entity by comparing it to another entity, and how do we decide which type of comparison to use? Recall from earlier that PEBA-II can address two different discourse goals: requests to describe some specified entity, and requests to compare two specified entities. The latter discourse goal corresponds, of course, to the category of direct comparisons we identified above. As we noted earlier, direct comparisons are thus user-initiated. We are more interested here in how PEBA-II decides when it is appropriate to use either a clarificatory comparison or an illustrative comparison. Each becomes an option when PEBA-II has been asked to describe some specified entity. A clarificatory comparison is generated whenever the entity to be described is known to have a POTENTIAL CONFUSOR: our implementation of this strategy is currently very simple, and is described in Section 4.3. 
Illustrative comparisons are the focus of the current work, and we describe our approach to these in Section 4.4.</Paragraph> </Section> <Section position="3" start_page="165" end_page="166" type="sub_section"> <SectionTitle> 4.2 Direct Comparisons </SectionTitle> <Paragraph position="0"> As mentioned earlier, the PEBA-II system allows the user to request one of two actions: to describe a single entity or to compare two entities. A direct comparison is generated by PEBA-II whenever the user requests a comparison between two entities. Using a</Paragraph> </Section> <Section position="4" start_page="166" end_page="167" type="sub_section"> <SectionTitle> 4.3 Clarificatory Comparisons </SectionTitle> <Paragraph position="0"> The purpose of a clarificatory comparison is to ensure that the reader does not confuse the entity being described with some other entity.</Paragraph> <Paragraph position="1"> Such confusions are possible when the entity being described is similar in relevant respects to some other entity.</Paragraph> <Paragraph position="2"> We could try to generate such clarificatory comparisons from first principles: when we have to describe some entity e, we could search the knowledge base for entities which share properties with e, and then use some mechanism to determine whether there is any chance that the two entities might be confused. We could then phrase our description of e to make sure that we distinguish e from such potential confusors. For example, in describing the rabbit, it may be important to distinguish it from the very similar hare in order to avoid confusion. 
4 There are problems with such an approach: searching the knowledge base in this way would be a very costly process; it assumes a rather more complete knowledge base than we may be able to rely on; and, most important of all, it assumes that we can determine likelihood of confusability on the basis of some metric--but it is not at all clear what such a metric might be.</Paragraph> <Paragraph position="3"> Our current solution to these problems is to sidestep them entirely: for each entity that has a potential confusor--for example, sheep and goats--we specify this explicitly in the knowledge base by means of a clause of the following form: (hasprop sheep (potential-confusor goat)) Footnote 4: There are clearly ideas we might use here in McCoy's \[1988\] work on correcting a user's misconceptions; however, the real issue here lies in determining whether such a misconception might arise from a generated comparison (see Zukerman and McConachy \[1993, 1995\] for some work in this area).</Paragraph> <Paragraph position="4"> Then, whenever we have to describe the sheep, we know immediately that it has a potential confusor in the goat, and invoke a discourse strategy that makes an explicit comparison between the two entities. The resulting text includes a comparison with the goat but is aimed at describing the sheep and hence goes further than a direct comparison between the sheep and goat.</Paragraph> <Paragraph position="5"> Hard-coding potential confusors might be considered an 'easy way out', although it is our view that this is one of many places in NLG where there is benefit in adopting solutions that make use of precomputed information in preference to working things out from first principles. For example, singling out potential comparator entities in this way is no different in principle to explicitly marking in the knowledge base those properties which are distinguishing characteristics, a tactic that both McKeown \[1985\] and we ourselves use. 
5 We have adopted this philosophy for various design decisions made in the development of PEBA-II, so that, for example, we also make use of a phrasal lexicon as a repository of precomputed mappings from semantic units to multi-word lexico-syntactic resources (see \[Becker 1979\] for an early justification for this approach). Again, a similar philosophy underpins the use of precomputed lists of preferred attributes in the work on the generation of referring expressions reported in \[Reiter and Dale, 1992\]. Our position is that such methods can be a virtue rather than a vice, since they allow broad-coverage systems to be built more quickly.</Paragraph> <Paragraph position="6"> Footnote 5: Note that Maybury \[1995\], on the other hand, outlines an algorithm for determining the distinguishing characteristics for an entity from first principles.</Paragraph> </Section> <Section position="5" start_page="167" end_page="168" type="sub_section"> <SectionTitle> 4.4 Illustrative Comparisons </SectionTitle> <Paragraph position="0"> Currently, most of our attention is focused on the third category of comparisons, those we have termed illustrative comparisons. These are cases where one or more attributes of an entity being described are compared to those of a common object with which the reader is assumed to be familiar. For the present discussion, we will concentrate on the attributes of size and weight, and the mechanisms used to produce illustrative comparisons that indicate these attributes of the entity being described. These are probably two of the easiest properties to deal with; it remains to be seen to what extent the mechanisms we propose will generalise to other attributes.</Paragraph> <Paragraph position="1"> For illustrative comparisons, there are two questions to be answered: * How do we decide whether an illustrative comparator should be introduced? * How do we decide which comparator to choose when there are multiple candidates? 
We could perform these comparisons using a similar approach to that which we adopted for clarificatory comparisons: for each entity-attribute pair we could specify some entity that can be used as a comparator. Thus, we might have clauses in the knowledge base that look like the following: (hasprop baboon (illustrative-comparator size dog)) However, this would be unwieldy: part of the justification for taking this approach in the case of clarificatory comparisons is that we would expect a relatively small subset of the entities in the knowledge base to have potential confusors, and so the cost of explicitly encoding a representation of these potential confusors is not too great. However, virtually any entity-attribute pair might be described using an illustrative comparison, and so we need some way of generalising the processing here.</Paragraph> <Paragraph position="2"> We do this by making use of the notion of a COMMON COMPARATOR SET. This is a set of entity types that can be compared against for illustrative purposes. For the moment, a common comparator set is defined for each attribute we might wish to describe; there may be some scope for interesting generalisations later. 
We focus here on the size and weight attributes: for both of these, our common comparator set is the set (human, dog, cat). Note that the common comparator set for any given attribute is domain specific: different comparator sets for size and weight will be appropriate in different domains; user specific: it is likely that different comparator sets will be appropriate for different users; and in principle extensible, both directly and indirectly: we can imagine the user explicitly being allowed to specify a set of comparator objects, or we could dynamically extend the set used on the basis of the ongoing discourse history.</Paragraph> <Paragraph position="3"> There may be ways of building or precompiling a common comparator set automatically using the knowledge base and information from a user model, but for the moment we assume that it has been preconstructed.</Paragraph> <Paragraph position="4"> Given an entity e we want to describe and some attribute a of the entity we want to communicate, we use the algorithm in Figure 3. \[Figure 3: Choosing a comparator object. To describe attribute a of entity e (the focused entity): - choose ei whose median value for a is closest to Val - if this doesn't select uniquely from amongst the comparator set then choose ei whose range for a is closest to Val\] The procedure used here for finding the best match is one that in our current experiments looks acceptable, although it is likely to be applicable only for a relatively narrow range of attributes. There are a number of obvious deficiencies, all of which we are currently exploring: * Properties are not independent: for example, we have found that, when dealing with size, we also need to take account of similarity of body-form in determining which entity makes the best comparator, and so our current mechanism distinguishes three different size measurements: height, length and shoulder height. 
* Similarity and difference are not completely distinct: the similarity of two values for a particular attribute should be viewed as a scale of similarity rather than as a binary distinction.</Paragraph> <Paragraph position="5"> * The user's degree of familiarity with the potential comparators can help in making a choice.</Paragraph> <Paragraph position="6"> * The degree of relatedness between the two entities can also play a role in choosing the best comparator.</Paragraph> <Paragraph position="7"> So far, however, the results of the simple method we have outlined seem promising. For example, PEBA-II currently generates the following sentences: * The platypus is about the same length as a domestic cat.</Paragraph> <Paragraph position="8"> * The baboon has about the same shoulder height as a domestic dog.</Paragraph> <Paragraph position="9"> Note that the use of a common comparator set in conjunction with the algorithm specified here means that we can separate the domain-specific aspects of the computation from the domain-independent aspects; in principle, the aim is that the comparator set specifies domain-specific information, but the algorithm itself is domain independent.</Paragraph> <Paragraph position="10"> As always, our methodology is to pursue solutions that first assume a considerable amount of precompiled knowledge and then introduce generalisability and flexibility through subsequent parameterisation, rather than beginning with a very limited coverage solution that works from first principles. It is our view that this methodology is the only one that is likely to be successful for broad coverage, practical NLG systems.</Paragraph> </Section> </Section> </Paper>
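The comparator-selection procedure of Figure 3 might be sketched in Python roughly as follows. This is only an illustrative reading, not the PEBA-II implementation: the names `Comparator`, `choose_comparator` and `COMMON_SET` are hypothetical, the numeric ranges are invented placeholders rather than measurements from the knowledge base, the median is approximated by the midpoint of a (low, high) range, and "range closest to Val" is interpreted as the distance from the value to the nearest point of the range.

```python
from dataclasses import dataclass


@dataclass
class Comparator:
    """A member of a common comparator set (hypothetical structure)."""
    name: str
    ranges: dict  # attribute name -> (low, high); placeholder values only

    def median(self, attribute):
        low, high = self.ranges[attribute]
        # Assumption: the midpoint of the range stands in for the median.
        return (low + high) / 2

    def range_distance(self, attribute, value):
        low, high = self.ranges[attribute]
        # 0 when the value lies inside the range, else distance to the nearer bound.
        return max(low - value, 0, value - high)


def choose_comparator(value, attribute, comparator_set):
    """Pick the comparator whose median for `attribute` is closest to `value`;
    if several tie, fall back to the one whose range is closest to `value`."""
    candidates = [c for c in comparator_set if attribute in c.ranges]
    if not candidates:
        return None
    best = min(abs(c.median(attribute) - value) for c in candidates)
    closest = [c for c in candidates if abs(c.median(attribute) - value) == best]
    if len(closest) == 1:
        return closest[0]
    return min(closest, key=lambda c: c.range_distance(attribute, value))


# Invented figures (centimetres) for the (human, dog, cat) set, for illustration only.
COMMON_SET = [
    Comparator("human", {"height": (150, 190)}),
    Comparator("domestic dog", {"length": (60, 110), "shoulder height": (40, 70)}),
    Comparator("domestic cat", {"length": (40, 50), "shoulder height": (20, 30)}),
]

print(choose_comparator(45, "length", COMMON_SET).name)           # domestic cat
print(choose_comparator(60, "shoulder height", COMMON_SET).name)  # domestic dog
```

Keeping the comparator set as plain data and the selection as a separate function mirrors the separation noted above between domain-specific information and a domain-independent algorithm; a different domain or user would, under this sketch, only swap `COMMON_SET`.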