<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2016">
  <Title>Investigating Cross-Language Speech Retrieval for a Spontaneous Conversational Speech Collection</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The emergence of large collections of digitized spoken data has encouraged research in speech retrieval. Previous studies, notably those at TREC (Garofolo et al., 2000), have focused mainly on well-structured news documents. In this paper we report on work carried out for the Cross-Language Evaluation Forum (CLEF) 2005 Cross-Language Speech Retrieval (CL-SR) track (White et al., 2005).</Paragraph>
    <Paragraph position="1"> The document collection for the CL-SR task is part of the oral testimonies collected by the USC Shoah Foundation Institute for Visual History and Education (VHI), for which some Automatic Speech Recognition (ASR) transcriptions are available (Oard et al., 2004). The data is spontaneous conversational speech lacking clear topic boundaries; retrieval from it is thus a more challenging task than those explored previously. The CLEF data is also annotated with a range of automatically and manually generated metadata. While the complete VHI dataset contains interviews in many languages, the CLEF 2005 CL-SR task focuses on English speech.</Paragraph>
    <Paragraph position="2"> Cross-language searching is evaluated by making the topic statements (from which queries are automatically formed) available in several languages.</Paragraph>
    <Paragraph position="3"> This task raises many interesting research questions; in this paper we explore alternative term weighting methods and content indexing strategies.</Paragraph>
    <Paragraph position="4"> The remainder of this paper is structured as follows: Section 2 briefly reviews the CLEF 2005 CL-SR task; Section 3 describes the system we used to investigate this task; Section 4 reports our experimental results; and Section 5 presents conclusions and outlines our ongoing work.</Paragraph>
  </Section>
</Paper>