<?xml version="1.0" standalone="yes"?> <Paper uid="N04-3002"> <Title>ITSPOKE: An Intelligent Tutoring Spoken Dialogue System</Title> <Section position="5" start_page="0" end_page="0" type="evalu"> <SectionTitle> 4 Performance Analysis </SectionTitle> <Paragraph position="0"> A formal evaluation comparing ITSPOKE with other tutoring methods began in November 2003 and is still ongoing. Subjects are University of Pittsburgh students who have taken no college physics and are native speakers of American English. Our experimental procedure, which takes roughly 4 hours per student, is as follows: students 1) read a short document of background material, 2) take a pretest measuring their physics knowledge, 3) use ITSPOKE to work through 5 physics problems, and 4) take a post-test similar to the pretest.</Paragraph> <Paragraph position="1"> As of March 2004, we have collected 80 dialogues from 16 students (21 total hours of speech, mean dialogue time of 17 minutes). An average dialogue contains 21.3 student turns and 26.3 tutor turns. The mean student turn length is 2.8 words (max=28, min=1). ITSPOKE uses 56 dialogue-state-dependent language models for speech recognition; 43 of these 56 models have been used to process the data collected to date. These stochastic language models were initially trained on 4551 typed student utterances from a 2002 evaluation of Why2-Atlas, then later enhanced with spoken utterances obtained during ITSPOKE's pilot testing. For the 1600 student turns that we have collected, ITSPOKE's current Word Error Rate is 31.2%. While Word Error Rate is the traditional measure of speech recognition performance, semantic accuracy is more useful than transcription accuracy for dialogue evaluation, because it does not penalize word errors that are unimportant to overall utterance interpretation. Semantic analysis based on speech recognition output is the same as that based on perfect transcription 92% of the time. 
An average dialogue contains 1.4 rejection prompts (when ITSPOKE is not confident of the speech recognition output, it asks the user to repeat the utterance) and 0.8 timeout prompts (when the student doesn't say anything within a specified time frame, ITSPOKE repeats its previous question).</Paragraph> </Section> </Paper>