<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1056">
  <Title>Evaluating a Trainable Sentence Planner for a Spoken Dialogue System</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The past several years have seen a large increase in commercial dialog systems. These systems typically use system-initiative dialog strategies, with system utterances highly scripted for style and register and recorded by voice talent. However several factors argue against the continued use of these simple techniques for producing the system side of the conversation. First, text-to-speech has improved to the point of being a viable alternative to pre-recorded prompts. Second, there is a perceived need for spoken dialog systems to be more flexible and support user initiative, but this requires greater flexibility in utterance generation. Finally, systems to support complex planning are being developed, which will require more sophisticated output.</Paragraph>
    <Paragraph position="1"> As we move away from systems with pre-recorded prompts, there are two possible approaches to producing system utterances. The first is template-based generation, where utterances are produced from hand-crafted string templates. Most current research systems use template-based generation because it is conceptually straightforward. However, while little or no linguistic training is needed to write templates, it is a tedious and time-consuming task: one or more templates must be written for each combination of goals and discourse contexts, and linguistic issues such as subject-verb agreement and determiner-noun agreement must be repeatedly encoded for each template. Furthermore, maintenance of the collection of templates becomes a software engineering problem as the complexity of the dialog system increases.1 The second approach is natural language generation (NLG), which customarily divides the generation process into three modules (Rambow and Korelsky, 1992): (1) Text Planning, (2) Sentence Planning, and (3) Surface Realization. In this paper, we discuss only sentence planning; the role of the sentence planner is to choose abstract lexico-structural resources for a text plan, where a text plan encodes the communicative goals for an utterance (and, sometimes, their rhetorical structure). In general, NLG promises portability across application domains and dialog situations by focusing on the development of rules for each generation module that are general and domain1Although we are not aware of any software engineering studies of template development and maintenance, this claim is supported by abundant anecdotal evidence.</Paragraph>
    <Paragraph position="2"> independent. However, the quality of the output for a particular domain, or a particular situation in a dialog, may be inferior to that of a template-based system without considerable investment in domain-specific rules or domain-tuning of general rules. Furthermore, since rule-based systems use sophisticated linguistic representations, this handcrafting requires linguistic knowledge.</Paragraph>
    <Paragraph position="3"> Recently, several approaches for automatically training modules of an NLG system have been proposed (Langkilde and Knight, 1998; Mellish et al., 1998; Walker, 2000). These hold the promise that the complex step of customizing NLG systems by hand can be automated, while avoiding the need for tedious hand-crafting of templates. While the engineering benefits of trainable approaches appear obvious, it is unclear whether the utterance quality is high enough.</Paragraph>
    <Paragraph position="4"> In (Walker et al., 2001) we propose a new model of sentence planning called SPOT. In SPOT, the sentence planner is automatically trained, using feedback from two human judges, to choose the best from among different options for realizing a set of communicative goals. In (Walker et al., 2001), we evaluate the performance of the learning component of SPOT, and show that SPOT learns to select sentence plans that are highly rated by the two human judges.</Paragraph>
    <Paragraph position="5"> While this evaluation shows that SPOT has indeed learned from the human judges, it does not show that using only two human judgments is sufficient to produce more broadly acceptable results, nor does it show that SPOT performs as well as optimized hand-crafted template or rule-based systems. In this paper we address these questions.</Paragraph>
    <Paragraph position="6"> Because SPOT is trained on data from a working system, we can directly compare SPOT to the hand-crafted, template-based generation component of the current system. In order to perform an exhaustive comparison, we also implemented two rule-based and two baseline sentence-planners.</Paragraph>
    <Paragraph position="7"> One baseline simply produces a single sentence for each communicative goal. Another baseline randomly makes decisions about how to combine communicative goals into sentences. We directly compare these different approaches in an evaluation experiment in which 60 human subjects rate each system's output on a scale of 1 to 5.</Paragraph>
    <Paragraph position="8"> The experimental design is described in section  2. The sentence planners used in the evaluation are described in section 3. In section 4, we present our results. We show that the trainable sentence planner performs better than both rule-based systems and as well as the hand-crafted template-based system. These four systems outperform the baseline sentence planners. Section 5 summarizes our results and discusses related and future work.</Paragraph>
  </Section>
class="xml-element"></Paper>