<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2302">
  <Title>Stochastic Language Generation in a Dialogue System: Toward a Domain Independent Generator</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"> Template-based approaches have been widely used for surface generation. This has traditionally been the case because the many other areas of NLP research within a dialogue system (speech recognition, parsing, knowledge representation, etc.) need some form of output to show that their algorithms are functional. Templates are very cheap to create, but they produce rigid, inflexible output and poor text quality. See (Reiter, 1995) for a full discussion of templates. Dialogue systems suffer in particular, because understanding depends heavily on the naturalness of the output.</Paragraph>
    <Paragraph position="1"> Rule-based generation has developed as an alternative to templates. Publicly available packages for this type of generation take strides toward independent generation.</Paragraph>
    <Paragraph position="2"> However, a significant amount of linguistic information is usually needed in order to generate a modest utterance.</Paragraph>
    <Paragraph position="3"> This kind of detail is not available to most domain-independent dialogue systems. A smaller, domain-specific rule-based approach is difficult to port to new domains.</Paragraph>
    <Paragraph position="4"> The corpus-based approach to surface generation does not use large linguistic databases but instead depends on language modeling of corpora to predict correct and natural utterances. The approach is attractive in comparison to templates and rule-based approaches because the language models implicitly encode the natural ordering of English. Recent results from corpus-based surface generation in dialogue systems have been confined to specific domains, the vast majority using the air travel domain with air travel corpora.</Paragraph>
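As a rough illustration of how a language model can "implicitly encode the natural ordering of English", the sketch below trains a bigram model on a tiny corpus and uses it to rank candidate word orderings. It is only a minimal sketch: the toy corpus, the add-one smoothing, the BOS/EOS boundary markers, and all function names are assumptions made for illustration and are not taken from any of the systems discussed in this section.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only (not from any cited system).
CORPUS = [
    "i want a flight to boston",
    "i want a flight to denver",
    "show me a flight to boston",
]


def train_bigrams(sentences):
    """Count unigrams and bigrams, padding each sentence with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["BOS"] + sentence.split() + ["EOS"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a candidate utterance."""
    vocab_size = len(unigrams)
    tokens = ["BOS"] + sentence.split() + ["EOS"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


def best_candidate(candidates, unigrams, bigrams):
    """Return the candidate that the language model scores highest."""
    return max(candidates, key=lambda s: log_prob(s, unigrams, bigrams))


UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)

# Three orderings of the same six words; only the model decides between them.
CANDIDATES = [
    "a flight to boston i want",
    "i want a flight to boston",
    "flight a to want i boston",
]
BEST = best_candidate(CANDIDATES, UNIGRAMS, BIGRAMS)
print(BEST)
```

No grammar rules are written down here; the preference for one ordering over another comes entirely from the counts in the training corpus, which is also why such a model tends to favor the domain it was trained on.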
    <Paragraph position="5"> Ratnaparkhi (Ratnaparkhi, 2000; Ratnaparkhi, 2002) and Oh and Rudnicky (Oh and Rudnicky, 2000) both studied surface generators for the air travel domain. Their input semantic form is a set of attribute-value pairs that are specific to the airline reservation task. The language models were standard n-gram approaches that depended on a tagged air travel corpus for the attribute types. Both groups ran human evaluations: Ratnaparkhi ran a two-subject evaluation (with marks of OK, Good, Bad), and Oh and Rudnicky had 12 subjects compare the output of a template generator with that of the corpus-based approach. The latter showed no significant difference.</Paragraph>
    <Paragraph position="6"> Most recently, Chen et al. (Chen et al., 2002) used FERGUS (Bangalore and Rambow, 2000) and attempted to make it more domain-independent. There are two stochastic processes in FERGUS: a tree chooser that maps an input syntactic tree to a TAG tree, and a trigram language model that chooses the best sentence in the lattice. They found that a domain-specific corpus performs better than a Wall Street Journal (WSJ) corpus for the trigram LM. An attempt was made to use an independent LM, but (Rambow et al., 2001) found interrogatives to be unrepresented by a WSJ model and fell back on air travel models. This problem was not discussed in (Chen et al., 2002). Perhaps trees extracted automatically from the corpora are able to create many good and few bad possibilities for the LM to choose from.</Paragraph>
    <Paragraph position="7"> (Chen et al., 2002) is, to this author's knowledge, the first paper that attempts to create a stochastic domain-independent generator for dialogue systems. One of the main differences between FERGUS and this paper's approach is that the input to FERGUS is a deep syntactic tree. Our approach integrates semantic input, reducing the need for large linguistic databases and allowing the LM to choose the correct forms. We are also unique in that we intentionally use two out-of-domain language models. Most of the work on FERGUS and the previous surface generation evaluations in dialogue systems are dependent on English syntax and word choice within the air travel domain. The final generation system cannot be ported to a new domain without further effort. By creating grammar rules that convert a semantic form, some of these restrictions can be removed. The next section describes our stochastic approach and how it was modified from machine translation to spoken dialogue.</Paragraph>
  </Section>
</Paper>