<?xml version="1.0" standalone="yes"?>
<Paper uid="J02-1003">
  <Title>Generating Referring Expressions: Boolean Extensions of the Incremental Algorithm</Title>
  <Section position="2" start_page="0" end_page="38" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Generation of referring expressions (GRE) is a key task of most natural language generation (NLG) systems (e.g., Reiter and Dale 2000, Section 5.4). Regardless of the type of knowledge base (KB) forming the input to the generator, many objects will not be designated in it via an ordinary proper name. A person like Mr. Jones, for example, may be designated using an artificial name like #Jones083, if the name Jones is not uniquely distinguishing. The same is true for a piece of furniture, a tree, or an atomic particle, for instance, for which no proper name is in common use at all, or (in most cases) if the generator tries to refer to an entire set of objects. In all such cases, the generator has to &quot;invent&quot; a description that enables the hearer to identify the intended referent.</Paragraph>
    <Paragraph position="1"> In the case of Mr. Jones, for example, the program could identify him by providing his full name and address; in the case of a tree, some longer description may be necessary.</Paragraph>
    <Paragraph position="2"> Henceforth, we will call the intended referent the target of the GRE algorithm.</Paragraph>
    <Paragraph position="3"> The question that we set out to answer is whether existing GRE algorithms produce adequate descriptions whenever such descriptions exist: in short, whether these algorithms are, as we shall say, complete. The paper brings a degree of formal precision to this issue and reveals a number of reasons why current GRE algorithms are incomplete; we sketch remedies and discuss their consequences in terms of linguistic coverage and computational tractability. We take the Incremental Algorithm (Dale and Reiter 1995) to represent the state of the art in this area, and we minimize the deviations from this algorithm. As a result, this paper might be read as an investigation into how widely the ideas underlying the Incremental Algorithm can be used, and the extent to which they may be generalized. The main generalization that we will investigate involves complex Boolean combinations of properties, that is, descriptions that involve more than a merely intersective (i.e., logically conjunctive) combination of properties. Such generalizations are natural because the properties involved are implicitly present in the KB, as we will explain; they become especially relevant when algorithms are also generalized to generate references to sets, rather than individual objects.</Paragraph>
    <Paragraph position="4"> But, before we arrive at these generalizations, we will identify and confront a number of cases in which current GRE algorithms are incomplete even with respect to merely intersective descriptions. [Information Technology Research Institute (ITRI), University of Brighton, Lewes Road, Brighton BN2 4GJ, UK. E-mail: Kees.van.Deemter@itri.brighton.ac.uk. © 2002 Association for Computational Linguistics. Computational Linguistics, Volume 28, Number 1.]</Paragraph>
    <Paragraph position="5"> In this paper, we will deal with &quot;first mention&quot; descriptions only (unlike Dale 1992, Chapter 5; Mittal et al. 1998; Kibble 1999), assuming that the information used for generating the description is limited to a KB containing complete information about which properties are true of each object. Also, we focus on &quot;one shot&quot; descriptions, disregarding cases where an object is described through its relations with other objects (Dale and Haddock 1991; Horacek 1997; Krahmer, van Erk, and Verleg 2001).</Paragraph>
    <Paragraph position="6"> More crucially, we follow Dale and Reiter (1995) in focusing on the semantic content of a description (the problem of content determination, for short), assuming that any combination of properties can be expressed by the NLG module responsible for linguistic realization. This modular approach allows us to separate logical aspects of generation (which are largely language independent) from purely linguistic aspects, and it allows the realization module to base its decisions on complete information about which combination of properties is to be realized. Accordingly, when we write Generation of Referring Expressions or GRE, we will refer specifically to determination of the semantic content of a description. Analogously, the word description will refer to the semantic content of a linguistic expression only. Note that our modular approach makes it unnatural to assume that a description is always expressed by a single noun phrase: if several sentences are needed, then so be it.</Paragraph>
    <Paragraph position="7"> After summarizing the Incremental Algorithm in Section 2, in Section 3 we take a closer look at the algorithm in its standard, &quot;intersective&quot; form, in which it identifies an object by intersecting a number of atomic properties. We discuss cases in which this algorithm fails to find an adequate description even though such a description exists, and we propose a number of possible remedies. Having established a completeness result for a version of the intersective Incremental Algorithm, we turn to questions of completeness that involve more complex Boolean combinations in Section 4. In Section 5, we summarize the main results of our exploration and put them in perspective.</Paragraph>
    <Paragraph position="8"> 2. Dale and Reiter (1995): The Incremental Algorithm
The Incremental Algorithm of Dale and Reiter (1995) singles out a target object from among some larger domain of entities. It does this by logically conjoining a number of properties found in a part of the KB that represents information shared between speaker and hearer. The authors observed that the problem of finding a (&quot;Full Brevity&quot;) description that contains the minimum number of properties is computationally intractable (i.e., NP-hard). They combined this with the known fact that speakers often produce nonminimal descriptions anyway (e.g., Pechmann 1989). Accordingly, they proposed an algorithm that only approximates Full Brevity, while being of only linear complexity. Our summary of the algorithm glosses over many details, yet still allows us to discuss completeness. In particular, we disregard any special provisions that might be made for the selection of head nouns because, arguably, this has to involve realizational issues.</Paragraph>
    <Paragraph position="9">  determination. Head nouns can also be selected during linguistic realization or by interleaving of content determination and realization (e.g., Horacek 1997; Stone and Webber 1998; Krahmer and Theune 1999).</Paragraph>
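    <Paragraph position="10"> The intersective idea summarized in Section 2 can be illustrated with a minimal sketch. It is not Dale and Reiter's implementation: it assumes that each property is given directly by its extension (the set of domain objects it is true of) and that properties arrive in a fixed order of preference. The function name, the toy domain, and the property names are all hypothetical. <![CDATA[
```python
from typing import List, Optional, Sequence, Set, Tuple

# A property is modelled as (name, extension); the extension is the set of
# objects the property is true of. This representation is an assumption
# made for illustration only.
Property = Tuple[str, Set[str]]


def incremental_description(
    target: str, domain: Set[str], properties: Sequence[Property]
) -> Optional[List[str]]:
    """Return property names whose intersection is {target}, or None on failure."""
    candidates = set(domain)  # objects still compatible with the description
    description: List[str] = []
    if candidates == {target}:
        return description  # nothing to rule out
    for name, extension in properties:  # single pass, in preference order
        # Keep a property only if it is true of the target and removes
        # at least one remaining candidate.
        if target in extension and not candidates <= extension:
            description.append(name)
            candidates &= extension
            if candidates == {target}:
                return description
    return None  # no conjunction of the listed properties singles out the target


# Hypothetical toy KB: two dogs and a cat.
domain = {"dog1", "dog2", "cat1"}
properties = [
    ("dog", {"dog1", "dog2"}),
    ("black", {"dog1", "dog2"}),
    ("small", {"dog1", "cat1"}),
]

small_dog = incremental_description("dog1", domain, properties)
large_dog = incremental_description("dog2", domain, properties)

# Adding the complement of "small" as an explicit property (an assumption of
# this sketch) lets the same procedure describe dog2.
with_negation = properties + [("not small", domain - {"dog1", "cat1"})]
large_dog_negated = incremental_description("dog2", domain, with_negation)
```
]]> In this toy KB, dog1 is identified by conjoining "dog" and "small"; "black" is skipped because it removes no remaining candidate. For dog2 the purely intersective search over the listed properties fails, although dog2 is the only dog that is not small. This is the kind of gap that motivates looking beyond conjunctions of atomic properties. Each property is inspected once and never retracted, which is what keeps the procedure cheap compared with a search for a minimal description.</Paragraph>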
  </Section>
</Paper>