<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1025">
  <Title>A Method for Open-Vocabulary Speech-Driven Text Retrieval</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Problem Statement
</SectionTitle>
    <Paragraph position="0"> A major problem in speech-driven retrieval concerns out-of-vocabulary (OOV) words, that is, words not contained in the dictionary of the speech recognizer.</Paragraph>
    <Paragraph position="1"> On the one hand, recent IR systems do not limit the vocabulary size (i.e., the number of index terms) and can therefore be regarded as open-vocabulary systems, which allow users to input any keyword contained in the target collection. A single IR system often indexes a few million terms.</Paragraph>
    <Paragraph position="2"> On the other hand, state-of-the-art speech recognition systems still need to limit the vocabulary size (i.e., the number of words in the dictionary) because of difficulties in estimating statistical language models (Young, 1996) and hardware constraints such as memory. In addition, computation time is crucial for real-time applications, including speech-driven retrieval. For these reasons, the vocabulary size for many languages is limited to a few tens of thousands of words (Itou et al., 1999; Paul and Baker, 1992; Steeneken and van Leeuwen, 1995), which is far smaller than the index size of practical IR systems.</Paragraph>
    <Paragraph position="3"> In addition, high-frequency words, such as function words and common nouns, are usually included in dictionaries and recognized with high accuracy. However, these words are not necessarily useful for retrieval, because they occur in many documents and therefore discriminate poorly among them. By contrast, low-frequency words, which appear in only a few specific documents, are often effective query terms.</Paragraph>
    <Paragraph position="4"> In summary, the OOV problem is inherent in speech-driven retrieval, and the gap in vocabulary size between speech recognition and text retrieval needs to be bridged. In this paper, we propose a method that resolves this problem, aiming at open-vocabulary speech-driven retrieval.</Paragraph>
  </Section>
</Paper>