<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0902">
  <Title>Empirical Methods for Evaluating Dialog Systems</Title>
  <Section position="5" start_page="2" end_page="2" type="concl">
    <SectionTitle>4 Discussion</SectionTitle>
    <Paragraph position="0">Instead of focusing on developing new dialog metrics that allow for comparative judgments across different systems and domain tasks, we proposed empirical methods that accomplish the same purpose while taking advantage of dialog metrics that already exist. In particular, we outlined an experimental protocol for conducting a Wizard-of-Oz (WOZ) study to collect human performance data that can serve as a gold standard. We then described how to substantiate performance claims using both a benchmark graph and a gold impurity graph. Finally, we explained how to optimize a dialog system using component analysis and value optimization.</Paragraph>
    <Paragraph position="1">Without a doubt, the greatest drawback of the proposed empirical methods is the tremendous cost of conducting WOZ studies, in terms of both time and money. In special circumstances, such as the Communicator Project, where participants all work within the same domain task, DARPA itself might finance WOZ studies for evaluation on behalf of the participants. Nonparticipants may resort to average marginal cost analysis to optimize their own expenditure.</Paragraph>
  </Section>
</Paper>
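The benchmark and gold impurity graphs are defined in the body of the paper, not in this section; purely as an illustrative sketch, the following Python/matplotlib snippet shows one plausible way to render a benchmark-style comparison of a system's score on an existing dialog metric against a WOZ gold standard across domain tasks. The task names, metric choice, and all values below are hypothetical, and this is an assumption about what such a graph plots, not the paper's own procedure.

```python
# Hypothetical benchmark-graph sketch: compare a dialog system's performance
# on an existing metric against the human-wizard (WOZ) gold standard.
# All task names and scores below are invented for illustration.
import matplotlib.pyplot as plt

tasks = ["task1", "task2", "task3", "task4"]      # hypothetical domain tasks
system_scores = [48.0, 62.5, 55.0, 70.0]          # system metric per task (hypothetical)
gold_scores = [30.0, 41.0, 38.5, 52.0]            # WOZ wizard metric per task (hypothetical)

fig, ax = plt.subplots()
ax.plot(tasks, system_scores, marker="o", label="Dialog system")
ax.plot(tasks, gold_scores, marker="s", label="WOZ gold standard")
ax.set_xlabel("Domain task")
ax.set_ylabel("Task completion time (s)")  # any existing dialog metric could stand in here
ax.set_title("Benchmark comparison: system vs. WOZ gold standard")
ax.legend()
plt.show()
```

The gap between the two curves gives a rough visual sense of how far the system falls short of human-level performance on each task, which is the kind of performance claim the gold standard is meant to substantiate.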