<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1304">
  <Title>Interactive Question Answering and Constraint Relaxation in Spoken Dialogue Systems</Title>
  <Section position="7" start_page="31" end_page="33" type="evalu">
    <SectionTitle>
6 Evaluation
</SectionTitle>
    <Paragraph position="0"> We conducted experimental studies involving 20 subjects in a MP3 player task and 20 subjects in a restaurant selection task. In the following, we concentrate on the restaurant selection task because it is more challenging for constraint handling and information presentation.</Paragraph>
    <Paragraph position="1"> Each subject in the restaurant selection task was given 9 scenario descriptions involving 3 constraints. Subjects were instructed to use their own words to find a fitting restaurant. We use a back-end database of 2500 restaurants containing the following information for each restaurant: restaurant name, cuisine type, city and street names, service level, rating, whether they accept credit cards or reservations, price level, open hours, dress code, additional information (e.g. vegetarian food) and distance from current location. Some of these constraints have a fixed number of values (e.g.</Paragraph>
    <Paragraph position="2"> service level and dress code), whereas those taking named entities as values are obviously openended. null Figures 1 and 2 show two of the dialogues from the experiments. To exemplify the challenges the system is faced with, user turns are shown in the form of the output of the speech recognizer (we currently use the best ASR hypothesis as input to the parser). The task of the first dialogue (figure 1) was to find a restaurant that has the following constraints: [Reservations=yes, ServiceLevel:excellent, Rating=5star]. Despite some minor speech recognition errors, the system correctly builds a query for the two constraints provided by the user in (1.1.) and verbalizes these in (1.2) to show its understanding of the user input.</Paragraph>
    <Paragraph position="3"> It also gives some example items and makes a suggestion. This suggestion is not taken up by the user  turn speaker utterance</Paragraph>
    <Section position="1" start_page="32" end_page="32" type="sub_section">
      <SectionTitle>
2.1 User breakfast
2.2 System Are you looking for a restaurant that is open for breakfast ?
2.3 User yes located on bower street
2.4 System I found 71 restaurants on bower street that serve breakfast , for example ,
</SectionTitle>
      <Paragraph position="0"> SOUPLANTATION-TO-GO , RAMAYANI WESTWOOD , and , PHILLY WEST .</Paragraph>
      <Paragraph position="1"> You could try to look for cuisine type .</Paragraph>
    </Section>
    <Section position="2" start_page="32" end_page="33" type="sub_section">
      <SectionTitle>
2.5 User which
2.6 System I'm sorry, I do not understand what you mean by: which
2.7 System Please try rephrasing that.
2.8 User casual dress
2.9 System There are 51 restaurants on bower street that serve breakfast and have a casual
</SectionTitle>
      <Paragraph position="0"> dress code in my database , for example , BERTY'S , LABOTETEGA , and ,  - in fact, we find that suggestions are generally not taken up by the user. We believe this is due to the nature of the tasks, which specified exactly which criteria to match. On the other hand, in more open application scenarios, where users may not know what questions can be asked, suggestions may be useful. In (1.3) the user issues a sub-query that further constrains the result set. By again summarizing the constraints used, the system confirms in (1.4) that it has interpreted the new constraint as a revision of the previous query. The alternative is to start a new query, which would be wrong in this context.</Paragraph>
      <Paragraph position="1"> The task of the second dialogue, figure 2, was to find a restaurant that meets the constraints [BusinessHours:breakfast, StreetName='bower street', DressCode=casual]. This user tends to give shorter, keyword-style input to the system (2.1, 2.8). In (2.3), the user reacts to a clarification question and adds another constraint which the system summarizes in (2.4). (2.5) is an ASR error which the system cannot handle (2.6, 2.7). The user constraint of (2.8) is correctly used to revise the query (2.9), but &amp;quot;british&amp;quot; (2.10) is another ASR error that leads to a cuisine constraint not intended in the scenario/by the user. This additional constraint yields an empty result set, from which the system recovers automatically by relaxing the hierarchically organized cuisine constraint to &amp;quot;European food&amp;quot;. In (2.11) the system uses dialogue strategy s3b for medium-sized result sets with constraint modifications (section 4). The result of both dialogues is that all task constraints are met.</Paragraph>
      <Paragraph position="2"> We conducted 20 experiments in the restaurant domain, 2 of which were restarted in the middle.</Paragraph>
      <Paragraph position="3"> Overall, 180 tasks were performed involving 1144 user turns and 1818 system turns. Two factors contributing to the higher number of system turns are a) some system turns are counted as two turns, such as 2.6, 2.7 in figure 2, and b) restaurants in longer enumerations of result items are counted as individual turns. On average, user utterances are significantly shorter than system utterances (4.9 words, standard deviation s = 3.82 vs 15,4 words, s = 13.53). This is a result of the 'constraint summaries' produced by the generator. The high standard deviation of the system utterances can be explained by the above-mentioned listing of individual result items (e.g. utterance (2.12) in figure 2). We collected usage frequencies for the dialogue strategies presented in section 4: there was no occurrence of empty final result sets (strategy s1a/b) because the system successfully relaxed constraints if it initially obtained no results. Strategy s2a (small result sets without modifications) was used for 61 inputs, i.e. constraint sets constructed from user utterances. Strategy s3a/b (medium-sized result sets) was used for 217 times and required constraint relaxations in 5 cases.</Paragraph>
      <Paragraph position="4"> Strategy s4a/b (large result sets) was used for  316 inputs and required constraint relaxations in 16 cases. Thus, the system performed constraint modifications in 21 cases overall. All of these yielded non-empty final result sets. For 573 inputs, no modification was required. There were no empty final result set despite modifications.</Paragraph>
      <Paragraph position="5"> On average, the generator produced 16 output candidates for inputs of two constraints, 160 candidates for typical inputs of 3 constraints and 320 candidates for 4 constraints. Such numbers can easily be handled by simply enumerating candidates and selecting the 'best' one.</Paragraph>
      <Paragraph position="6"> Task completion in the experiments was high: the subjects met all target constraints in 170 out of 180 tasks, i.e. completion rate was 94.44%. An error analysis revealed that the reasons for only partially meeting the task constraints were varied.</Paragraph>
      <Paragraph position="7"> For example, in one case a rating constraint (&amp;quot;five stars&amp;quot;) was interpreted as a service constraint by the system, which led to an empty result set. The system recovered from this error by means of constraint relaxation but the user seems to have been left with the impression that there are no restaurants of the desired kind with a five star rating.</Paragraph>
  </Section>
  <Section position="8" start_page="33" end_page="33" type="evalu">
    <SectionTitle>
7 Discussion
</SectionTitle>
    <Paragraph position="0"> Based on wizard-of-oz data, the system alternates specific and unspecific refinement suggestions (&amp;quot;You could search by cuisines type&amp;quot; vs &amp;quot;Can you refine your query?&amp;quot;). Furthermore, many of the phrases used by the generator are taken from wizard-of-oz data too. In other words, the system, including the generator, is informed by empirical data but does not use this data directly (Reiter and Dale, 2000). This is in contrast to generation systems such as the ones described in (Langkilde, 2000) and (Varges and Mellish, 2001).</Paragraph>
    <Paragraph position="1"> Considering the fact that the domain ontology and database schema are known in advance, it is tempting to make a closed world assumption in the generator (which could also help system development and testing). However, this seems too restrictive: assume, for example, that the user has asked for Montenegrin food, which is an unknown cuisine type, and that the statistical parser combined with the parse-matching patterns in the dialogue manager has labeled this correctly. The content optimization module will remove this constraint since there is no Montenegrin restaurant in the database. If we now want to generate &amp;quot;I did not find any restaurants that serve Montenegrin food ...&amp;quot;, we do need to be able to use generation input that uses unseen attribute-value pairs. The price one has to pay for this increased robustness and flexibility is, of course, potentially bad output if NLU mislabels input words. More precisely, we find that if any one of the interpretation modules makes an open-world assumption, the generator has to do as well, at least as long as we want to verbalize the output of that module.</Paragraph>
    <Section position="1" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
7.1 Future work
</SectionTitle>
      <Paragraph position="0"> Our next application domain will be in-car navigation dialogues. This will involve dialogues that define target destinations and additional route planning constraints. It will allow us to explore the effects of cognitive constraints due to changing driving situations on dialogue behavior. The navigation domain may also affect the point of interaction between dialogue system and external devices: we may query a database to disambiguate proper names such as street names as soon as these are mentioned by the user, but start route planning only when all planning constraints are collected.</Paragraph>
      <Paragraph position="1"> An option for addressing the current lack of a user model is to extend the work in (Cheng et al., 2004). They select the level of detail to be communicated to the user by representing the driver's route knowledge to avoid repeating known information. null Another avenue of future research is to automatically learn constraint relaxation strategies from (appropriately annotated) evaluation data. User modeling could be used to influence the order in which refinement suggestions are given and determine the thresholds for the information presentation moves described in section 4.</Paragraph>
      <Paragraph position="2"> One could handle much larger numbers of generation candidates either by using packing (Langkilde, 2000) or by interleaving rule-based generation with corpus-based pruning (Varges and Mellish, 2001) if complexity should become an issue when doing overgeneration.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>