<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2901"> <Title>A System for Searching and Browsing Spoken Communications</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Archiving and organizing multimedia communications for easy user access is becoming more important as such information sources are becoming available in amounts that can easily overwhelm a user. As storage and access become cheaper, the types of multimedia communications are also becoming more diverse. Therefore, it is necessary for multimedia content analysis and navigation systems to handle various forms of data.</Paragraph> <Paragraph position="1"> In this paper we present SpeechLogger, a research system for searching and browsing spoken communications, or the spoken component of multimedia communications.</Paragraph> <Paragraph position="2"> In general, the information contained in a spoken communication consists of more than just words. Our goal is to make use of all the information within a spoken communication. Our system uses automatic speech recognition (ASR) to convert speech into a format which makes word and phonetic searching of the material possible. It also uses speaker segmentation to aid navigation.</Paragraph> <Paragraph position="3"> We are interested in a wide range of spoken communications with different characteristics, including broadcast material, lectures, meetings, interviews, telephone conversations, call center recordings, and teleconferences.</Paragraph> <Paragraph position="4"> Each of these communication types presents interesting opportunities, requirements and challenges. For example, lectures might have accompanying material that can aid ASR and navigation. Prior knowledge about the speakers and the topic may be available for meetings. 
Call center recordings may be analyzed to create aggregate reports.</Paragraph> <Paragraph position="5"> Spoken document retrieval (SDR) for Broadcast News-type content has been well studied, and there are many research and commercial systems. There has also been some interest in the Voicemail domain (Hirschberg et al., 2001), which typically consists of short-duration human-to-machine messages. Our focus here is on telephone conversations and teleconferences, with comparisons to broadcast news.</Paragraph> <Paragraph position="6"> The paper is organized as follows. In Section 2, we motivate our approach by describing user needs under various conditions. We then describe our system in Section 3, giving details of its components. Experimental results for some components are given in Section 4. Finally, Section 5 presents a summary.</Paragraph> </Section> </Paper>