<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1404">
  <Title>Overgeneration and ranking for spoken dialogue systems</Title>
  <Section position="4" start_page="0" end_page="21" type="metho">
    <SectionTitle>
2 Overgeneration for spoken dialogue
</SectionTitle>
    <Paragraph position="0"> Table 1 shows some example outputs of the system. The wording of the realizations is informed by a wizard-of-oz data collection. The task of the generator is to produce these verbalizations given dialogue strategy, constraints and further discourse context, i.e. the input to the generator is non-linguistic. We perform mild overgeneration of candidate moves, followed by ranking. The highest-ranked candidate is selected for output.</Paragraph>
    <Section position="1" start_page="0" end_page="20" type="sub_section">
      <SectionTitle>
2.1 Chart generation
</SectionTitle>
      <Paragraph position="0"> We follow a bottom-up chart generation approach (Kay, 1996) for production systems similar to (Varges, 2005). The rule-based core of the generator is a set of productions written in a production system. Productions map individual database constraints to phrases such as &amp;quot;open for lunch&amp;quot;, &amp;quot;within 3 miles&amp;quot;, &amp;quot;a formal dress code&amp;quot;, and recursively combine them into NPs. This includes the use of coordination to produce &amp;quot;restaurants with a 5-star rating and a formal dress code&amp;quot;, for example. The NPs are integrated into sentence templates, several of which can be combined</Paragraph>
      <Paragraph position="2"> Last column: frequency in user study (180 tasks, 596 constraint inputs to generator) to form an output candidate turn. For example, a constraint realizing template &amp;quot;I found no [NPoriginal] but there are [NUM] [NP-optimized] in my database&amp;quot; can be combined with a follow-up sentence template such as &amp;quot;You could try to look for [NP-constraint-suggestion]&amp;quot;. 'NP-original' realizes constraints directly constructed from the user utterance; 'NP-optimized' realizes potentially modified constraints used to obtain the actual query result. To avoid generating separate sets of NPs independently for these two - often largely overlapping - constraint sets, we assign unique indices to the input constraints, overgenerate NPs and check their indices.</Paragraph>
      <Paragraph position="3"> The generator maintains state across dialogue turns, allowing it to track its previous decisions (see 'variation' below). Both input constraints and chart edges are indexed by turn numbers to avoid confusing edges of different turns.</Paragraph>
      <Paragraph position="4"> We currently use 102 productions overall in the restaurant and MP3 domains, 38 of them to generate NPs that realize 19 input constraints.</Paragraph>
    </Section>
    <Section position="2" start_page="20" end_page="21" type="sub_section">
      <SectionTitle>
2.2 Ranking: alignment &amp; variation
</SectionTitle>
      <Paragraph position="0"> Alignment Alignment is a key to successful natural language dialogue (Brockmann et al., 2005).</Paragraph>
      <Paragraph position="1"> We perform alignment of system utterances with user utterances by computing an ngram-based overlap score. For example, a user utterance &amp;quot;I want to find a Chinese restaurant&amp;quot; is presented by the bag-of-words {'I', 'want', 'to', 'find', ...} and the bag-of-bigrams {'I want', 'want to', 'to find', ...}. We compute the overlap with candidate system utterances represented in the same way and combine the unigram and bigram match scores.</Paragraph>
      <Paragraph position="2"> Words are lemmatized and proper nouns of example items removed from the utterances.</Paragraph>
      <Paragraph position="3"> Alignment allows us to prefer &amp;quot;restaurants that serve Chinese food&amp;quot; over &amp;quot;Chinese restaurants&amp;quot; if the user used a wording more similar to the first. The Gricean Maxim of Brevity, applied to NLG in (Dale and Reiter, 1995), suggests a preference for the second, shorter realization. However, if the user thought it necessary to use &amp;quot;serves&amp;quot;, maybe to correct an earlier mislabeling by the classifier/parse-matching patterns, then the system should make it clear that it understood the user correctly by using those same words. On the other hand, a general preference for brevity is desirable in spoken dialogue systems: users are generally not willing to listen to lengthy synthesized speech.</Paragraph>
      <Paragraph position="4"> Variation We use a variation score to 'cycle' over sentence-level paraphrases. Alternative candidates for realizing a certain input move are given a unique alternation ('alt') number in increasing order. For example, for the simple move continuation query we may assign the following alt values: &amp;quot;Do you want more?&amp;quot; (alt=1) and &amp;quot;Do you want me to continue?&amp;quot; (alt=2). The system cycles over these alternatives in turn. Once we reach alt=2, it starts over from alt=1. The actual alt 'score' is inversely related to recency and normalized to [0...1].</Paragraph>
      <Paragraph position="5"> Score combination The final candidate score is a linear combination of alignment and variation scores:</Paragraph>
      <Paragraph position="7"> where l1,l2 [?] {0...1}. A high value of l1 places more emphasis on alignment, a low value yields candidates that are more different from previously chosen ones. In our experience, alignment should be given a higher weight than variation, and, within alignment, bigrams should be  weighted higher than unigrams, i.e. l1 &gt; 0.5 and l2 &lt; 0.5. Deriving weights empirically from corpus data is an avenue for future research.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="21" end_page="21" type="metho">
    <SectionTitle>
3 User study
</SectionTitle>
    <Paragraph position="0"> Each of 20 subjects in a restaurant selection task was given 9 scenario descriptions involving 3 constraints. We use a back-end database of 2500 restaurants containing the 13 attributes/constraints for each restaurant.</Paragraph>
    <Paragraph position="1"> On average, the generator produced 16 output candidates for inputs of two constraints, 160 candidates for typical inputs of 3 constraints and 320 candidates for 4 constraints. For larger constraint sets, we currently reduce the level of overgeneration but in the future intend to interleave overgeneration with ranking similar to (Varges, 2002). Task completion in the experiments was high: the subjects met all target constraints in 170 out of 180 tasks, i.e. completion rate was 94.44%. To the question &amp;quot;The responses of the system were appropriate, helpful, and clear.&amp;quot; (on a scale where 1 = 'strongly agree', 5 = 'strongly disagree'), the subjects gave the following ratings: 1: 7, 2: 9, 3: 2, 4: 2 and 5: 0, i.e. the mean user rating is 1.95.</Paragraph>
  </Section>
class="xml-element"></Paper>