<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-1017">
  <Title>Lattice-Based Search for Spoken Utterance Retrieval</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Automatic systems for indexing, archiving, searching and browsing of large amounts of spoken communications have become a reality in the last decade. Most such systems use an automatic speech recognition (ASR) component to convert speech to text which is then used as an input to a standard text based information retrieval (IR) component. This strategy works reasonably well when speech recognition output is mostly correct or the documents are long enough so that some occurrences of the query terms are recognized correctly.</Paragraph>
    <Paragraph position="1"> Most of the research has concentrated on retrieval of Broadcast News type of spoken documents where speech is relatively clean and the documents are relatively long.</Paragraph>
    <Paragraph position="2"> In addition it is possible to find large amounts of text with similar content in order to build better language models and enhance retrieval through use of similar documents.</Paragraph>
    <Paragraph position="3"> We are interested in extending this to telephone conversations and teleconferences. Our task is locating occurrences of a query in spoken communications to aid browsing. This is not exactly spoken document retrieval.</Paragraph>
    <Paragraph position="4"> In fact, it is more similar to word spotting. Each document is a short segment of audio.</Paragraph>
    <Paragraph position="5"> Although reasonable retrieval performance can be obtained using the best ASR hypothesis for tasks with moderate ([?] 20%) word error rates, tasks with higher (40[?]50%) word error rates require use of multiple ASR hypotheses. Use of ASR lattices makes the system more robust to recognition errors.</Paragraph>
    <Paragraph position="6"> Almost all ASR systems have a closed vocabulary.</Paragraph>
    <Paragraph position="7"> This restriction comes from run-time requirements as well as the finite amount of data used for training the language models of the ASR systems. Typically the recognition vocabulary is taken to be the words appearing in the language model training corpus. Sometimes the vocabulary is further reduced to only include the most frequent words in the corpus. The words that are not in this closed vocabulary - the out of vocabulary (OOV) words - will not be recognized by the ASR system, contributing to recognition errors. The effects of OOV words in spoken document retrieval are discussed by Woodland et al. (2000). Using phonetic search helps retrieve OOV words.</Paragraph>
    <Paragraph position="8"> This paper is organized as follows. In Section 2 we give an overview of related work, focusing on methods dealing with speech recognition errors and OOV queries.</Paragraph>
    <Paragraph position="9"> We present the methods used in this study in Section 3.</Paragraph>
    <Paragraph position="10"> Experimental setup and results are given in Section 4. Finally, our conclusions are presented in Section 5.</Paragraph>
  </Section>
class="xml-element"></Paper>