<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1614">
  <Title>Empirical Methods for Evaluating Dialog Systems</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In evaluating the performance of dialog systems, designers face a number of complicated issues.</Paragraph>
    <Paragraph position="1"> On the one hand, dialog systems are ultimately created for the user, so usability factors such as satisfaction or likelihood of future use should be the final criteria. On the other hand, because usability factors are subjective, they can be erratic and highly dependent on features of the user interface (Kamm et al., 1999). So, designers have turned to &quot;objective&quot; metrics such as dialog success rate or completion time.</Paragraph>
    <Paragraph position="2"> Unfortunately, due to the interactive nature of dialog, these metrics do not always correspond to the most effective user experience (Lamel et al., 2000). Furthermore, several different metrics may contradict one another (Kamm et al., 1999), leaving designers with the tricky task of untangling the interactions or correlations between metrics.</Paragraph>
    <Paragraph position="3"> Instead of developing a new metric that circumvents the problems above, we maintain that designers need to make better use of the ones that already exist. Toward that end, we first examine what purpose a dialog metric serves and then propose empirical methods for evaluating systems that meet that purpose. The methods include a protocol for conducting a Wizard-of-Oz experiment and a basic set of descriptive statistics for substantiating performance claims, using the data collected from the experiment as an ideal benchmark or &quot;gold standard&quot; for comparative judgments. The methods also provide a practical means of optimizing the system through component analysis and cost valuation.</Paragraph>
  </Section>
</Paper>