<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1073"> <Title>Assessing the Retrieval Effectiveness of a Speech Retrieval System by Simulating Recognition Errors</Title> <Section position="2" start_page="0" end_page="370" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> We show how the recognition performance of the speech recognition component in a speech retrieval system affects retrieval effectiveness. A speech retrieval system facilitates content-based retrieval of speech documents, i.e. audio recordings containing spoken text \[5\]. The speech retrieval process receives queries from users and, for every query, ranks the speech documents in decreasing order of the probability that they are relevant to the query. These probabilities are derived from the occurrences of indexing features that were identified in the speech documents by a speech recognition component \[4\]. Because the recognition of indexing features in continuous speech is error-prone, the question arises of how much error-prone recognition of indexing features affects retrieval effectiveness.</Paragraph> <Paragraph position="1"> The indexing features used in our speech retrieval system are phonetically motivated subword units of intermediate specificity. The general pattern of an indexing feature is a maximal sequence of consonants enclosed by maximal sequences of vowels at both ends. We call these indexing features VCV-features, where C stands for the maximal sequence of consonants and V stands for the maximal sequences of vowels. As an example, the word INTERNATIONAL contains the VCV-features INTE, ERNA, ATIO, and IONA. The indexing vocabulary is defined to be the set of those VCV-features φi whose inverse document frequencies idf(φi) lie between a lower bound idf_min and an upper bound idf_max, such that the indexing features are neither very specific nor very broad.
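The VCV-feature extraction just described can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes a plain a/e/i/o/u vowel set over letters and ignores the phonetic treatment of characters such as Y.

```python
import re

VOWELS = set("aeiou")  # simplifying assumption; the paper works with phonetic units

def vcv_features(word):
    """Return the VCV-features of a word: each maximal consonant run
    together with the maximal vowel runs enclosing it on both sides."""
    # Split the word into maximal runs of vowels and non-vowels.
    runs = re.findall(r"[aeiou]+|[^aeiou]+", word.lower())
    features = []
    for i in range(len(runs) - 2):
        v1, c, v2 = runs[i], runs[i + 1], runs[i + 2]
        # Keep only vowel-consonant-vowel windows.
        if v1[0] in VOWELS and c[0] not in VOWELS and v2[0] in VOWELS:
            features.append((v1 + c + v2).upper())
    return features
```

For the paper's example, vcv_features("INTERNATIONAL") yields INTE, ERNA, ATIO, and IONA; leading or trailing consonant runs (such as the final L) produce no feature of their own.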
The lower bound guarantees the suitability for indexing and the upper bound guarantees the trainability, i.e. there are enough examples to train the HMMs. Experiments on standard information retrieval test collections showed that, using an appropriate subset of only 1000 VCV-features, we can achieve a retrieval effectiveness comparable to that of standard weighted retrieval, which is based on a much larger indexing vocabulary \[4\]. In addition to the VCV-features, we have also studied indexing vocabularies that have been extended by CV- and VC-features at the word boundaries. The recognition of speech documents is carried out with standard speech recognition technology, i.e. a wordspotter \[6\], \[11\], \[14\] locates the occurrences of indexing features in documents. For each document in the collection we create a description vector based on the number of occurrences of each feature and use a conventional retrieval function \[12\] to estimate the similarity between a document and a query description.</Paragraph> <Paragraph position="2"> Our indexing features, consisting of VCV-features, can be identified in both text and speech documents. As a consequence, the document collection may contain a mixture of text and speech documents. Furthermore, the query may also be entered as either text or speech. The controlled indexing vocabulary consisting of selected VCV-features has the advantage that the document descriptions can be computed before query evaluation. In particular, an access structure (e.g. an inverted file) can be constructed to allow fast query evaluation. Another important advantage of our indexing vocabulary for both text and speech retrieval is that speech retrieval can be simulated by using text collections, as described in the subsequent sections.</Paragraph> <Paragraph position="3"> Information Retrieval on audio documents has so far been investigated very little.
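The indexing pipeline described above, i.e. selecting the vocabulary by idf bounds and matching description vectors, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the standard formula idf(φ) = log(N/df(φ)) and plain cosine similarity over feature counts are assumed, since the exact conventional retrieval function \[12\] is not spelled out here.

```python
import math
from collections import Counter

def select_vocabulary(doc_feature_lists, idf_min, idf_max):
    """Keep only the features whose inverse document frequency lies
    between the two bounds (neither very specific nor very broad)."""
    n = len(doc_feature_lists)
    # Document frequency: in how many documents each feature occurs.
    df = Counter(f for feats in doc_feature_lists for f in set(feats))
    vocab = set()
    for feat, d in df.items():
        idf = math.log(n / d)  # assumed idf definition
        if idf >= idf_min and idf_max >= idf:
            vocab.add(feat)
    return vocab

def description_vector(features):
    """Document (or query) description: one count per indexing feature."""
    return Counter(features)

def similarity(doc_vec, query_vec):
    """Cosine similarity between description vectors (an assumption;
    any conventional retrieval function could be used instead)."""
    dot = sum(doc_vec[f] * query_vec[f] for f in query_vec)
    norm_d = math.sqrt(sum(v * v for v in doc_vec.values()))
    norm_q = math.sqrt(sum(v * v for v in query_vec.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)
```

Ranking a collection then amounts to sorting the documents by their similarity to the query description, in decreasing order; because the vocabulary is fixed in advance, the description vectors can be precomputed and stored in an inverted file.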
A wordspotting system for voice indexing was developed by Wilcox and Bush \[14\], and an information retrieval system that classifies speech messages was presented by Rose, Chang, and Lippmann \[10\]. Recently, a project for video mail retrieval using voice was proposed by Olivetti Research Limited, the Cambridge University Engineering Department, and the Cambridge University Computer Laboratory \[7\].</Paragraph> <Paragraph position="4"> The effect of recognition errors on retrieval effectiveness has been studied in the context of OCR-based information retrieval \[1\]. These results are not directly comparable because speech retrieval performance may be considerably affected by false alarms, in contrast to OCR-based retrieval, where false alarms can be ignored because they occur infrequently. [Table omitted. Caption: MEDLARS: average precision of the reference method: 0.534 (100%).]</Paragraph> <Paragraph position="6"> Results are given per key word per hour within the range of 0 to 140. The numbers in brackets represent the percentage of the average precision of the reference method, i.e. a standard text retrieval method.</Paragraph> <Paragraph position="7"> The main contribution of this paper is the conclusion that speech retrieval is feasible to some extent even when the recognition performance is poor. A closer look reveals that recognition errors and occurrences of query features in the documents follow different distributions, and standard retrieval methods are quite good at distinguishing these two distributions. The next two sections describe the test setting and the results, respectively. Then, some conclusions are drawn.</Paragraph> </Section> </Paper>