<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-2034">
  <Title>A speech interface for open-domain question-answering</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 System overview
</SectionTitle>
    <Paragraph position="0"> Our demonstration system has three components: a commercial speaker-dependent dictation system, a predictive interface for typing or correcting natural-language questions, and a Web-based open-domain question-answering engine. We describe these in turn.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Speech recognizer
</SectionTitle>
      <Paragraph position="0"> The dictation system is Dragon NaturallySpeaking 6.1, whose language models we have customized to a large corpus of questions. We performed tests with a head-mounted microphone in a relatively quiet acoustic environment. (The Dragon Audio Setup Wizard identified the signal-to-noise ratio as 22 dB.) We tested a male native speaker of English and a female non-native speaker, asking each first to train the acoustic models with 5-10 minutes of software-prompted dictation.</Paragraph>
      <Paragraph position="1"> We also trained the language models by presenting to the Vocabulary Wizard the corpus of 280,000 questions described in (Schofield, 2003), of which Table 1 contains a random sample. The primary function of this training feature in NaturallySpeaking is to add new words to the lexicon; the nature of the other adaptations is not clearly documented.</Paragraph>
      <Paragraph position="2"> New 2-grams and 3-grams also appear to be identified, which one would expect to reduce the word-error rate by increasing the 'hit rate' over the 30-50% of 3-grams in a new text for which a language model typically has explicit frequency estimates.</Paragraph>
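The 'hit rate' above can be illustrated by measuring what fraction of a new text's 3-grams a trained model has explicit counts for. A minimal sketch in Python; the training and test strings are made-up placeholders, not the paper's corpus:

```python
# Sketch: fraction of 3-grams in new text that a trained trigram
# model has seen (its explicit-estimate "hit rate"). Strings below
# are illustrative placeholders.

def trigrams(tokens):
    return set(zip(tokens, tokens[1:], tokens[2:]))

def hit_rate(train_text, new_text):
    seen = trigrams(train_text.lower().split())
    new = trigrams(new_text.lower().split())
    if not new:
        return 0.0
    return len(new.intersection(seen)) / len(new)

rate = hit_rate(
    "who wrote the origin of species . who wrote war and peace .",
    "who wrote the origin of species .",
)
```

Adding the question corpus's 2-grams and 3-grams to the model raises this fraction, which is the mechanism the paragraph above conjectures.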
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Predictive typing interface
</SectionTitle>
      <Paragraph position="0"> We have designed a predictive typing interface whose purpose is to save keystrokes and time in editing misrecognitions. Such an interface is particularly applicable in a mobile context, in which text entry is slow and circumstances may prohibit speech altogether.</Paragraph>
      <Paragraph position="1"> We fitted a 3-gram language model to the same corpus as above using the CMU-Cambridge SLM Toolkit (Clarkson and Rosenfeld, 1997). The interface in our demo is a thin JavaScript client accessible from a Web browser that intercepts each keystroke and performs a CGI request for an updated list of predictions. The predictions themselves appear as hyperlinks that modify the question when clicked.</Paragraph>
      <Paragraph position="2"> Figure 1 shows a screen-shot.</Paragraph>
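The prediction step behind such an interface can be sketched as a trigram ranker with a bigram fallback. The class name, tiny corpus, and unsmoothed fallback below are our illustrative assumptions, not the demo's actual CGI implementation:

```python
# Sketch: rank candidate next words by trigram counts, falling back
# to bigram counts for unseen contexts. The three questions are a
# made-up stand-in for the 280,000-question corpus.
from collections import Counter, defaultdict

class Predictor:
    def __init__(self, questions):
        self.tri = defaultdict(Counter)  # (w1, w2) maps to next-word counts
        self.bi = defaultdict(Counter)   # (w2,)    maps to next-word counts
        for q in questions:
            toks = ["BOS", "BOS"] + q.lower().split()  # BOS pads the context
            for a, b, c in zip(toks, toks[1:], toks[2:]):
                self.tri[(a, b)][c] += 1
                self.bi[(b,)][c] += 1

    def predict(self, prefix, k=3):
        toks = ["BOS", "BOS"] + prefix.lower().split()
        ctx = (toks[-2], toks[-1])
        counts = self.tri.get(ctx) or self.bi.get(ctx[1:]) or Counter()
        return [w for w, _ in counts.most_common(k)]

p = Predictor([
    "who wrote war and peace",
    "who invented the telephone",
    "who wrote the iliad",
])
```

A client like the one described above would call `predict` on each keystroke and render the returned words as clickable completions.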
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Question-answering system
</SectionTitle>
      <Paragraph position="0"> The AnswerBus system (Zheng, 2002) has been running on the Web since November 2001. It serves thousands of users every day. The original engine was not designed for a spoken interface, and we have recently made modifications in two respects. We describe these in turn. Later we propose other modifications that we believe would increase robustness to a speech interface.</Paragraph>
      <Paragraph position="1"> Speed The original engine took several seconds to answer each question, which may be too slow in a spoken interface or on a mobile device after factoring in the additional computational overhead of decoding the speech and the longer latency in mobile data networks. We have now implemented a multi-level caching system to increase speed.</Paragraph>
      <Paragraph position="2"> Our cache system currently contains two levels.</Paragraph>
      <Paragraph position="3"> The first is a cache of recently asked questions. If a question has been asked within a certain period of time the system will fetch the answers directly from the cache. The second level is a cache of semi-structured Web documents. If a Web document is in the cache and has not expired the system will use it instead of connecting to the remote site. By 'semi-structured' we mean that we cache semi-parsed sentences rather than the original HTML document. We will discuss some technical issues, such as how and how often to update the cache and how to use hash tables for fast access, in another paper.</Paragraph>
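The first cache level can be sketched as a hash-table lookup with a time-to-live. The TTL value, key normalization, and class name below are illustrative assumptions, not the engine's actual code:

```python
# Sketch: recently asked questions are served from memory until a
# time-to-live expires. TTL and normalization are assumptions.
import time

class QuestionCache:
    def __init__(self, ttl_seconds=3600.0):
        self.ttl = ttl_seconds
        self.store = {}  # normalized question maps to (timestamp, answers)

    def _key(self, question):
        # collapse whitespace and case so near-identical questions hit
        return " ".join(question.lower().split())

    def get(self, question):
        entry = self.store.get(self._key(question))
        if entry is None:
            return None
        stamp, answers = entry
        if time.time() - stamp > self.ttl:  # entry has expired
            del self.store[self._key(question)]
            return None
        return answers

    def put(self, question, answers):
        self.store[self._key(question)] = (time.time(), answers)
```

The document-level cache would follow the same pattern, keyed by URL and storing semi-parsed sentences rather than answers.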
      <Paragraph position="4"> Output The original engine provided a list of sentences as hyperlinks to the source documents. This is convenient for Web users but should be transformed for spoken output. It now offers plain text as an alternative to HTML for output. We have also made some cosmetic modifications for small-screen devices, such as shrinking the large logo.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> We evaluated the accuracy of the system subject to spoken input using 200 test questions from the TREC 2002 QA track (Voorhees, 2002). AnswerBus returns snippets from Web pages containing possible answers; we compared these with the reference answers used in the TREC competition, overriding about 5 negative judgments when we felt the answers were satisfactory but absent from the TREC scorecard. For each of these 200 questions we passed two strings to the AnswerBus engine, one typed verbatim, the other transcribed from the speech of one of the people described above. The results are in Tables 2 and 3.</Paragraph>
    <Paragraph position="1"> Table 2 (accuracy from perfect text versus misrecognized speech): Misrecognized speech: Speaker 1 39%, Speaker 2 26%. Verbatim typing: Speaker 1 58%, Speaker 2 60%.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> We currently perform no automatic checking or correction of spelling and no morphological stemming of words in the questions. Table 3 indicates that these features would improve robustness to errors in speech recognition. We now make some specific points regarding homographs, which are typically troublesome for speech recognizers. QA systems could relatively easily compensate for confusion in two common classes of homograph:
* plural nouns ending -s versus possessive nouns ending -'s or -s'. Our system answered Q39 Where is Devil's tower?, but not the transcribed question Where is Devils tower?
* written numbers versus numerals. Our system could not answer What is slang for a 5 dollar bill? although it could answer Q92 What is slang for a five dollar bill?.</Paragraph>
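The two compensations above can be sketched as simple variant generation over the question string; the word lists and helper name are our illustrative assumptions, and over-generated junk variants are harmless since the engine simply finds no answers for them:

```python
# Sketch: generate query variants that swap spelled-out numbers for
# numerals (and back) and plural endings for possessives, then try
# each variant against the QA engine. Word lists are tiny samples.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4",
                "five": "5", "six": "6", "seven": "7", "eight": "8",
                "nine": "9", "ten": "10"}
NUMERALS = {v: k for k, v in NUMBER_WORDS.items()}

def variants(question):
    out = {question}
    for tok in question.split():
        word = tok.strip("?.,").lower()
        if word in NUMBER_WORDS:
            out.add(question.replace(tok, NUMBER_WORDS[word]))
        if word in NUMERALS:
            out.add(question.replace(tok, NUMERALS[word]))
        if word.endswith("s") and not word.endswith("'s"):
            # plural "Devils" becomes possessive "Devil's"; some junk
            # variants are also produced, which simply get no answers
            out.add(question.replace(tok, tok[:-1] + "'s"))
    return out
```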
    <Paragraph position="1"> More extensive 'query expansion' using synonyms or other orthographic forms would be trickier to implement but could also improve recall. For example, our system answered Q245 What city in Australia has rain forests? correctly, but the transcription What city in Australia has rainforests (without a space) got no answers. Another example: Q35 Who won the Nobel Peace Prize in 1992? got no answers, whereas Who was the winner . . . ? would have found the right answer.</Paragraph>
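Orthographic expansion of the 'rainforests' kind could be approximated by trying every split point of an unknown compound against a vocabulary; the vocabulary and function name below are made-up placeholders, not part of our system:

```python
# Sketch: split an unknown compound at each internal position and
# keep the first split whose halves are both known words. The
# vocabulary here is a tiny illustrative stand-in.
VOCAB = {"rain", "forests", "data", "base", "news", "paper"}

def split_compound(word):
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in VOCAB and tail in VOCAB:
            return head + " " + tail
    return None
```

Synonym-based expansion (e.g. winner for won) would need a lexical resource such as WordNet and care to avoid drowning the engine in low-precision variants.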
  </Section>
</Paper>