<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1420">
<Title>Building a semantically transparent corpus for the generation of referring expressions</Title>
<Section position="2" start_page="0" end_page="130" type="abstr">
<SectionTitle> Abstract </SectionTitle>
<Paragraph position="0"> This paper discusses the construction of a corpus for the evaluation of algorithms that generate referring expressions. It is argued that such an evaluation task requires a semantically transparent corpus, and that controlled experiments are the best way to create such a resource. We address a number of issues that have arisen in an ongoing evaluation study, among which is the problem of judging the output of GRE algorithms against a human gold standard.</Paragraph>
<Paragraph position="1"> 1 Creating and using a corpus for GRE
A decade ago, Dale and Reiter (1995) published a seminal paper in which they compared a number of GRE algorithms. These algorithms included a Full Brevity (FB) algorithm, which generates descriptions of minimal length, a Greedy Algorithm (GA), and an Incremental Algorithm (IA). The authors argued that the latter was the best model of human referential behaviour, and versions of the IA have since come to represent the state of the art in GRE. Dale and Reiter's hypothesis was motivated by psycholinguistic findings, notably that speakers tend to initiate references before they have completely scanned a domain.</Paragraph>
<Paragraph position="2"> However, this finding affords different algorithmic interpretations. Similarly, the finding that basic-level terms in referring expressions allow hearers to form a psychological gestalt could be incorporated into practically any GRE algorithm.1 We decided to put Dale and Reiter's hypothesis to the test by evaluating the output of different GRE algorithms against human production.</Paragraph>
<Paragraph position="3"> However, it is notoriously difficult to obtain suitable corpora for a task as semantically intensive as Content Determination (for GRE). Although existing corpora are valuable resources, NLG often requires information that is not available in text. Suppose, for example, that a corpus contained articles about politics: how would the output of a GRE algorithm be evaluated against it? It would be difficult to infer from an article exactly which representatives in the British House of Commons are Liberal Democrats or Scottish. Combining multiple texts is hazardous, since facts may differ across sources and over time.</Paragraph>
<Paragraph position="4"> Moreover, the conditions under which such texts were produced (e.g. fault-critical or not, as explained below) are hard to determine.</Paragraph>
<Paragraph position="5"> A recent GRE evaluation by Gupta and Stent (2005) focused on dialogue corpora, using MAPTASK and COCONUT, both of which have an associated domain. Their results show that referent identification in MAPTASK often requires no more than a TYPE attribute, so that none of the algorithms performed better than a baseline. In contrast to MAPTASK, COCONUT has a more elaborate domain, but it is characterised by a collaborative task, and references frequently go beyond the identification criterion that is typically invoked in GRE.2 Mindful of the limitations of existing corpora, and of the extent to which evaluation depends on the corpus under study, we are using controlled experiments to create a corpus whose construction will ensure that existing algorithms can be adequately differentiated on an identification task.</Paragraph>
<Paragraph position="6"> 1 A separate argument for the IA involves tractability: although some alternatives (such as FB) are intractable, others (such as GA) run in polynomial time and therefore cannot easily be dismissed on purely computational grounds.
2 Jordan and Walker (2000) have demonstrated a significantly better match to the human data when task-related constraints are taken into account.</Paragraph>
</Section>
</Paper>
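Editor's note: as background for the Incremental Algorithm (IA) named in Section 1, the following is a minimal sketch of the IA as commonly described after Dale and Reiter (1995). The toy domain, attribute names, preference order, and the function name incremental_algorithm are illustrative assumptions, not material from the paper; the sketch also omits refinements such as always including the head noun (TYPE) and choosing basic-level terms.

```python
# Minimal, illustrative sketch of the Incremental Algorithm (IA).
# All names and the toy domain are hypothetical examples; the full IA
# also always adds the head noun (TYPE), which this sketch omits.

def incremental_algorithm(referent, distractors, preference_order):
    """Return attribute-value pairs intended to single out `referent`
    from `distractors`, scanning attributes in a fixed preference order."""
    description = []
    remaining = list(distractors)
    for attribute in preference_order:
        value = referent.get(attribute)
        if value is None:
            continue
        # Include the attribute only if it rules out at least one distractor.
        if any(d.get(attribute) != value for d in remaining):
            description.append((attribute, value))
            remaining = [d for d in remaining if d.get(attribute) == value]
        if not remaining:          # referent uniquely identified
            break
    return description, remaining  # non-empty `remaining` => no distinguishing description


if __name__ == "__main__":
    # Hypothetical domain: one target object and two distractors.
    target = {"type": "chair", "colour": "red", "size": "large"}
    others = [
        {"type": "chair", "colour": "blue", "size": "large"},
        {"type": "table", "colour": "red", "size": "small"},
    ]
    print(incremental_algorithm(target, others, ["type", "colour", "size"]))
    # -> ([('type', 'chair'), ('colour', 'red')], [])
```

Because attributes are considered in a fixed preference order and are never retracted, the IA can produce overspecified descriptions (e.g. mentioning colour when size alone would suffice), which is precisely the human-like behaviour the section attributes to Dale and Reiter's hypothesis.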