<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1052">
  <Title>Generating Referring Expressions in Open Domains</Title>
  <Section position="3" start_page="0" end_page="5" type="intro">
    <SectionTitle>
2 Overview of Prior Approaches
</SectionTitle>
    <Paragraph position="0"> The incremental algorithm (Reiter and Dale, 1992) is the most widely discussed attribute selection algorithm. It takes as input the intended referent and a contrast set of distractors (other entities that could be confused with the intended referent). Entities are represented as attribute value matrices (AVMs). The algorithm also takes as input a *preferred-attributes* list that contains, in order of preference, the attributes that human writers use to reference objects. For example, the preference might be {colour, size, shape, ...}. The algorithm then repeatedly selects attributes from *preferred-attributes* that rule out at least one entity in the contrast set until all distractors have been ruled out.</Paragraph>
    <Paragraph position="1"> It is instructive to look at how the incremental algorithm works. Consider an example where a large brown dog needs to be referred to. The contrast set contains a large black dog. These are represented by the AVMs [type: dog, size: large, colour: brown] for the intended referent and [type: dog, size: large, colour: black] for the distractor.</Paragraph>
    <Paragraph position="2">  Assuming that the *preferred-attributes* list is [size, colour, ...], the algorithm would first compare the values of the size attribute (both large), disregard that attribute as not being discriminating, compare the values of the colour attribute and return the brown dog.</Paragraph>
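The walk-through above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published algorithm: AVMs are modelled as plain dictionaries, the function name `incremental_algorithm` is hypothetical, and refinements such as choosing among more or less specific values are left out.

```python
def incremental_algorithm(referent, distractors, preferred_attributes):
    """Pick attributes of `referent`, in preference order, until no distractor remains.

    Assumption: each entity is a dict mapping attribute names to values.
    Returns the selected attributes and a flag saying whether all
    distractors were ruled out.
    """
    remaining = list(distractors)
    selected = {}
    for attribute in preferred_attributes:
        if not remaining:
            break  # every distractor has already been ruled out
        value = referent.get(attribute)
        if value is None:
            continue  # the referent has no value for this attribute
        # Distractors sharing the referent's value are still confusable.
        still_confusable = [d for d in remaining if d.get(attribute) == value]
        if len(still_confusable) != len(remaining):
            # The attribute rules out at least one distractor, so keep it.
            selected[attribute] = value
            remaining = still_confusable
    return selected, not remaining


referent = {"type": "dog", "size": "large", "colour": "brown"}
distractors = [{"type": "dog", "size": "large", "colour": "black"}]

selected, success = incremental_algorithm(referent, distractors, ["size", "colour"])
description = " ".join(["the", *selected.values(), referent["type"]])
print(description)  # the brown dog
```

The size attribute is skipped because both dogs are large, and colour is kept because it removes the only distractor, which matches the example in the text.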
    <Paragraph position="3"> Subsequent work on referring expression generation has expanded the logical framework to allow reference by negation (the dog that is not black) and references to multiple entities (the brown or black dogs) (van Deemter, 2002), explored different search algorithms for finding the minimal description (e.g., Horacek (2003)) and offered different representation frameworks like graph theory (Krahmer et al., 2003) as alternatives to AVMs. However, all these approaches are based on very similar formalisations of the problem, and all make the following assumptions:  1. A semantic representation exists.</Paragraph>
    <Paragraph position="4"> 2. A classification scheme for attributes exists.</Paragraph>
    <Paragraph position="5"> 3. The linguistic realisations are unambiguous.</Paragraph>
    <Paragraph position="6"> 4. Attributes cannot be reference modifying.</Paragraph>
    <Paragraph position="7">  All these assumptions are violated when we move from generation in a very restricted domain to regeneration in an open domain. In regeneration tasks such as summarisation, open-ended question answering and text simplification, AVMs for entities are typically constructed from noun phrases, with the head noun as the type and pre-modifiers as attributes. Converting words into semantic labels would involve sense disambiguation, adding to the cost and complexity of the analysis module.</Paragraph>
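As a rough illustration of building an AVM from a noun phrase, the sketch below treats the last token as the head noun and the preceding tokens as pre-modifiers. The function name `avm_from_noun_phrase`, the small determiner list, and whitespace tokenisation are all assumptions made for this example; a real system would rely on a parser or chunker.

```python
# Hypothetical, deliberately tiny determiner list for the example.
DETERMINERS = {"the", "a", "an"}


def avm_from_noun_phrase(noun_phrase):
    """Build a word-level AVM: head noun as type, pre-modifiers as attributes.

    Assumption: the phrase is a simple noun phrase whose last token is the head.
    """
    tokens = [t for t in noun_phrase.lower().split() if t not in DETERMINERS]
    if not tokens:
        raise ValueError("noun phrase contains no content words")
    *premodifiers, head = tokens
    # Attributes stay as plain words: no attribute classification is
    # available, so "large" is not labelled as a size, for instance.
    return {"type": head, "attributes": premodifiers}


avm = avm_from_noun_phrase("the large brown dog")
print(avm)  # {'type': 'dog', 'attributes': ['large', 'brown']}
```

Note that the attributes remain unlabelled and unordered by kind, which is exactly why an algorithm that expects classified attributes such as size or colour cannot be applied directly.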
    <Paragraph position="8"> Also, attribute classification is a hard problem and there is no existing classification scheme that can be used for open domains like newswire; for example, WordNet (Miller et al., 1993) organises adjectives as concepts that are related by the non-hierarchical relations of synonymy and antonymy (unlike nouns that are related through hierarchical links such as hyponymy, hypernymy and meronymy). In addition, selecting attributes at the semantic level is risky because their linguistic realisation might be ambiguous and many common adjectives are polysemous (cf. example 1 in §3.1). Reference modification, which has not been considered in the referring expression generation literature, raises further issues; for example, referring to an alleged murderer as the murderer is potentially libellous.</Paragraph>
    <Paragraph position="9"> In addition to the above, there is the issue of overlap between values of attributes. The case of subsumption (for example, that the colour red subsumes crimson and the type dog subsumes chihuahua) has received formal treatment in the literature; Dale and Reiter (1995) provide a find-best-value function that evaluates tree-like hierarchies of values. As mentioned earlier, such hierarchical knowledge bases do not exist for open domains. Further, a treatment of subsumption is insufficient, and degrees of intersection between attribute values also require consideration. van Deemter (2000) discusses the generation of vague descriptions when entities have gradable attributes like size; for example, in a domain with four mice sized 2, 5, 7 and 10cm, it is possible to refer to the large mouse (the mouse sized 10cm) or the two small mice (the mice sized 2 and 5cm). However, when applying referring expression generation to regeneration tasks where the representation of entities is derived from text rather than a knowledge base, we have to consider the case where the grading of attributes is not explicit. For example, we might need to compare the attribute dark with black, light or white.</Paragraph>
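The mice example can be read as picking entities from the ends of an explicit numeric scale. The sketch below follows that reading only; it is not van Deemter's algorithm, and the function name `vague_size_reference` and its parameters are hypothetical.

```python
def vague_size_reference(sizes_cm, count, adjective):
    """Return the sizes picked out by 'the COUNT large/small' entities.

    Assumption: sizes are explicit numbers, so the grading is known.
    """
    ordered = sorted(sizes_cm)
    if count not in range(1, len(ordered) + 1):
        raise ValueError("count must be between 1 and the number of entities")
    if adjective == "large":
        return ordered[-count:]  # the `count` largest entities
    if adjective == "small":
        return ordered[:count]  # the `count` smallest entities
    raise ValueError("only 'large' and 'small' are handled in this sketch")


mice = [2, 5, 7, 10]
print(vague_size_reference(mice, 1, "large"))  # [10]
print(vague_size_reference(mice, 2, "small"))  # [2, 5]
```

This only works because each mouse carries a number. Words such as dark, black, light and white come with no such scale, so the ordering has to be recovered some other way.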
    <Paragraph position="10"> In contrast to previous approaches, our algorithm works at the level of words, not semantic labels, and measures the relatedness of adjectives (lexicalised attributes) using the lexical knowledge base WordNet rather than a semantic classification. Our approach also addresses the issue of comparing intersective attributes that are not explicitly graded, by making novel use of the synonymy and antonymy links in WordNet. Further, it treats discriminating power as only one criterion for selecting attributes and allows for the easy incorporation of other considerations such as reference modification (§5).</Paragraph>
  </Section>
</Paper>