<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1614">
<Title>Empirical Methods for Evaluating Dialog Systems</Title>
<Section position="6" start_page="2" end_page="2" type="evalu">
<SectionTitle>
4 Discussion
</SectionTitle>
<Paragraph position="0"> Instead of focusing on developing new dialog metrics that allow for comparative judgments across different systems and domain tasks, we proposed empirical methods that accomplish the same purpose while taking advantage of dialog metrics that already exist. In particular, we outlined a protocol for conducting a WOZ experiment to collect human performance data that can be used as a gold standard. We then described how to substantiate performance claims using both a benchmark graph and a gold impurity graph. Finally, we explained how to optimize a dialog system using component analysis and value optimization.</Paragraph>
<Paragraph position="1"> Without a doubt, the greatest drawback of the empirical methods we propose is the tremendous cost of running WOZ studies, both in terms of time and money. In special cases, such as the</Paragraph>
</Section>
</Paper>