<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1073">
  <Title>[6] K. Hacioglu, W. Ward, &quot;Dialog-Context Dependent Language Modeling Using N-Grams and Stochastic Context-Free Grammars&quot;,</Title>
  <Section position="3" start_page="1" end_page="2" type="intro">
    <SectionTitle>
2. DATA COLLECTION &amp; EVALUATION
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.1 Data Collection Efforts
Local Collection Effort
</SectionTitle>
      <Paragraph position="0"> The Center for Spoken Language Research maintains a dialup Communicator system for data collection  . Users wishing to use the dialogue system can register at our web site [1] and receive a PIN code and system telephone number. To date, our system has fielded over 1750 calls totaling over 25,000 utterances from nearly 400 registered users.</Paragraph>
      <Paragraph position="1"> NIST Multi-Site Data Collection  During the months of June and July of 2000, The National Institute of Standards (NIST) conducted a multi-site data collection effort for the nine DARPA Communicator participants. Participating sites included: AT&amp;T, IBM, BBN, SRI, CMU, Colorado, MIT, Lucent, and MITRE. In this data collection, a pool of potential users was selected from various parts of the United States by a market research firm. The selected subjects were native speakers of American English who were possible frequent travelers. Users were asked to perform</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="2" type="sub_section">
      <SectionTitle>
2.2 System Evaluation
Task Completion
</SectionTitle>
      <Paragraph position="0"> A total of 72 calls from NIST participants were received by the CU Communicator system. Of these, 44 callers were female and 28 were male. Each scenario was inspected by hand and compared against the scenario provided by NIST to the subject.</Paragraph>
      <Paragraph position="1"> For the two open-ended tasks, judgment was made based on what the user asked for with that of the data provided to the user. In total, 53/72 (73.6%) of the tasks were completed successfully.</Paragraph>
      <Paragraph position="2"> A detailed error analysis can be found in [11].</Paragraph>
      <Paragraph position="3"> Word Error Rate Analysis A total of 1327 utterances were recorded from the 72 NIST calls. Of these, 1264 contained user speech. At the time of the June 2000 NIST evaluation, the CU Communicator system did not implement voice-based barge-in. We noticed that one source of error was due to users who spoke before the recording process was started. Even though a tone was presented to the user to signify the time to speak, 6.9% of the utterances contained instances in which the user spoke before the tone. Since all users were exposed to several other Communicator systems that  The system can be accessed toll-free at 1-866-735-5189 employed voice barge-in, there may be some effect from exposure to those systems. Table 3 summarizes the word error rates for the system utilizing the June 2000 NIST data as the test set. Overall, the system had a word error rate (WER) of 26.0% when parallel gender-dependent decoders were utilized. Since June of 2000, we have collected an additional 15,000 task-dependent utterances. With the extra data, we were able to remove our dependence on the CMU Communicator training data [12]. When the language model was reestimated and language model weights reoptimized using only CU Communicator data, the WER dropped from 26.0% to 22.5%.</Paragraph>
      <Paragraph position="4"> This amounts to a 13.5% relative reduction in WER.</Paragraph>
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> Table 2 summarizes metrics derived automatically from the logged timing markers for the calls in which the user completed the assigned task. The average time to task completion was 260 seconds. During this period there were an average of 19 user turns and 19 computer turns (37.6 average total turns). The average response latency was 1.86 seconds.
Table 2. Dialogue metrics for task-completed calls (min / mean / max):
  User Words to Task End    19 /  39.4 / 105
  System Words to End      173 / 331.9 / 914
  Number of Reprompts        0 /   2.4 /  15
</Paragraph>
      <Paragraph position="1"> The response latency also includes the time required to access the data live from the Internet travel information provider.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>