File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/98/p98-2129_intro.xml
Size: 7,356 bytes
Last Modified: 2025-10-06 14:06:32
<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2129"> <Title>Evaluating Response Strategies in a Web-Based Spoken Dialogue Agent</Title> <Section position="3" start_page="0" end_page="781" type="intro"> <SectionTitle> 2 TOOT </SectionTitle> <Paragraph position="0"> TOOT allows users to access online AMTRAK train schedules via a telephone dialogue, as in Figure 1. (All examples are from the experiment in Section 3.) We have built two versions of TOOT: literal TOOT (LT) and cooperative TOOT (CT). LT and CT have equivalent functionality, but use different response strategies to present tabular results of web queries in a displayless environment. LT and CT incorpo- and Gerbino, 1995) strategies are equivalent in LT and CT, and were chosen to reduce speech recognition errors.</Paragraph> <Paragraph position="1"> T: Hi, this is the Amtrak train schedule system. Which city do you want to go to? U: Philadelphia.</Paragraph> <Paragraph position="2"> T: What city do you want to go to Philadelphia from? U: New York City.</Paragraph> <Paragraph position="3"> T: On what day of the week do you want to go to Philadelphia from New York City? U: On the weekend.</Paragraph> <Paragraph position="4"> T: What time do you want to leave on the weekend? U: 4 o'clock pm.</Paragraph> <Paragraph position="5"> T: Do you want me to find the trains going to Philadelphia from New York City on the weekend around 4 o'clock pm now?</Paragraph> <Paragraph position="6"> U: Yes.</Paragraph> <Paragraph position="7"> T: I am going to query the Amtrak website for you. This may take some time if the internet traffic is heavy. Please wait patiently. [30 second pause] There are 7 trains available around 4 o'clock pm. Here are the first 3 trains. The 1st train leaves at 3:00 pm on Saturday and Sunday, and it takes 1 hour 12 mins. The 2nd train leaves at 3:20 pm every day, and it takes 1 hour 22 mins. The 3rd train leaves at 4:00 pm on Sunday, and it takes 1 hour 12 mins. 
Do you want to hear the rest of the trains? U: No.</Paragraph> <Paragraph position="8"> T: Do you want to continue and find a new set of trains? U: No.</Paragraph> <Paragraph position="9"> T: Thank you for using the Amtrak schedule system. See you next time.</Paragraph> <Paragraph position="10"> Figure 1: Example dialogue with (literal) TOOT. literature into relatively literal and cooperative response strategies. (More sophisticated cooperative strategies could be imagined.) When there is too much information to present in a single utterance, LT groups the information into units of 3 trains, then presents each unit, as in the italicized portion of Figure 1. In contrast, CT summarizes the range of trains available, then tells the user to either list the trains or further constrain the query. In CT, the italicized portion of Figure 1 would be replaced with the following response: (1) There are 7 trains available around 4 o'clock pm. Here is the earliest train we have. The first train leaves at 3:00 pm on Saturday and Sunday, and it takes 1 hour 12 mins. Here is the latest train we have. The seventh train leaves at 5:00 pm on Saturday, and it takes 1 hour 12 mins. Please say &quot;list&quot; to hear trains 3 at a time, or say &quot;add constraint&quot; to constrain your departure time or travel day, or say &quot;continue&quot; if my answer was sufficient, or say &quot;repeat&quot; to hear this message again.</Paragraph> <Paragraph position="11"> LT's response incrementally presents the set of trains that match the query, until the user tells LT to stop. Enumerating large lists, even incrementally, can lead to information overload. CT's response is more cooperative because it better respects the resource limitations of the listener. 
CT presents a subset of the matching trains using a summary response (Pao and Wilpon, 1992), followed by an option to reduce the information to be retrieved (Pieraccini et al., 1997; Goddeau et al., 1996; Seneff et al., 1995; Pao and Wilpon, 1992).</Paragraph> <Paragraph position="12"> If there is no information that matches a query, LT reports only the lack of an answer to the query, as in the following dialogue excerpt: (2) There are no trains going to Chicago from Philadelphia on Sunday around 10:30 am. Do you want to continue and find a new set of trains? CT automatically relaxes the user's time constraint and allows the user to perform other relaxations: (3) There are no trains going to Chicago from Philadelphia on Sunday around 10:30 am. The closest earlier train leaves at 9:28 am every day, and it takes 1 day 3 hours 36 mins. The closest later train leaves at 11:45 am on Saturday and Sunday, and it takes 22 hours 5 mins. Please say &quot;relax&quot; to change your departure time or travel day, or say &quot;continue&quot; if my answer was sufficient, or say &quot;repeat&quot; to hear this message again.</Paragraph> <Paragraph position="13"> CT's response is more cooperative since identifying the source of a query failure can help block incorrect user inferences (Pieraccini et al., 1997; Pao and Wilpon, 1992; Joshi et al., 1984; Kaplan, 1981; Mays, 1980). LT's response could lead the user to believe that there are no trains on Sunday.</Paragraph> <Paragraph position="14"> When there are 1-3 trains that match a query, both LT and CT list the trains: (4) There are 2 trains available around 6 pm. The first train leaves at 6:05 pm every day and it takes 5 hours 10 mins. The second train leaves at 6:30 pm every day, and it takes 2 days 11 hours 30 mins. Do you want to continue and find a new set of trains? 
TOOT is implemented using a platform for spoken dialogue agents (Kamm et al., 1997) that combines automatic speech recognition (ASR), text-to-speech (TTS), a phone interface, and modules for specifying a dialogue manager and application functions. ASR in our platform supports barge-in, an advanced functionality which allows users to interrupt an agent when it is speaking.</Paragraph> <Paragraph position="15"> The dialogue manager uses a finite state machine to implement dialogue strategies. Each state specifies 1) an initial prompt (or response) which the agent says upon entering the state (such prompts often elicit parameter values); 2) a help prompt which the agent says if the user says help; 3) rejection prompts which the agent says if the confidence level of ASR is too low (rejection prompts typically ask the user to repeat or paraphrase their utterance); and 4) timeout prompts which the agent says if the user doesn't say anything within a specified time frame (timeout prompts are often suggestions about what to say). A context-free grammar specifies what ASR can recognize in each state. Transitions between states are driven by semantic interpretation.</Paragraph> <Paragraph position="16"> TOOT's application functions access and process information on AMTRAK's web site. Given a set of constraints, the functions return a table listing all matching trains in a specified temporal interval, or within an hour of a specified timepoint. This table is converted to a natural language response which can be realized by TTS through the use of templates for either the LT or the CT response type; values in the table instantiate template variables.</Paragraph> </Section> </Paper>
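The template instantiation described in the final paragraph can be illustrated with a minimal Python sketch. It is not the TOOT implementation: the names `Train`, `describe`, `ordinal`, and `respond` are hypothetical, the table is assumed to be already sorted by departure time, and only the cases with at least one matching train (Figure 1 and examples 1 and 4) are covered. The template wording is copied from those examples, except that numeric ordinals ("1st", "7th") are used throughout for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Train:
    """One row of the (hypothetical) table returned by the application functions."""
    departs: str   # e.g. "3:00 pm"
    days: str      # e.g. "on Saturday and Sunday" or "every day"
    duration: str  # e.g. "1 hour 12 mins"


def ordinal(i: int) -> str:
    """Return a numeric ordinal such as 1st, 2nd, 3rd, 7th."""
    suffix = "th" if 10 <= i % 100 <= 20 else {1: "st", 2: "nd", 3: "rd"}.get(i % 10, "th")
    return f"{i}{suffix}"


def describe(position: int, train: Train) -> str:
    """Instantiate the per-train template with the values of one table row."""
    return (f"The {ordinal(position)} train leaves at {train.departs} "
            f"{train.days}, and it takes {train.duration}.")


def respond(trains: list[Train], time: str, strategy: str) -> str:
    """Render matching trains as an LT (literal) or CT (cooperative) response.

    Assumes `trains` is sorted by departure time. The no-match case
    (examples 2 and 3) is deliberately left out of this sketch.
    """
    if strategy not in ("LT", "CT"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    n = len(trains)
    if n == 0:
        raise ValueError("the no-match case is not covered by this sketch")

    parts = [f"There are {n} trains available around {time}."]
    if n <= 3:
        # With 1-3 matches, both strategies simply list the trains.
        parts += [describe(i, t) for i, t in enumerate(trains, start=1)]
        parts.append("Do you want to continue and find a new set of trains?")
    elif strategy == "LT":
        # LT presents the first unit of 3 trains and offers the rest.
        parts.append("Here are the first 3 trains.")
        parts += [describe(i, t) for i, t in enumerate(trains[:3], start=1)]
        parts.append("Do you want to hear the rest of the trains?")
    else:
        # CT summarizes the range (earliest and latest), then offers options.
        parts.append("Here is the earliest train we have.")
        parts.append(describe(1, trains[0]))
        parts.append("Here is the latest train we have.")
        parts.append(describe(n, trains[-1]))
        parts.append('Please say "list" to hear trains 3 at a time, '
                     'or say "add constraint" to constrain your departure time or travel day, '
                     'or say "continue" if my answer was sufficient, '
                     'or say "repeat" to hear this message again.')
    return " ".join(parts)
```

Calling `respond(trains, "4 o'clock pm", "LT")` and `respond(trains, "4 o'clock pm", "CT")` on the same table of seven trains yields the two contrasting presentations: the first unit of three trains versus the earliest/latest summary with follow-up options.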