<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1023"> <Title>Topic and Speaker Identification via Large Vocabulary Continuous Speech Recognition</Title> <Section position="2" start_page="0" end_page="119" type="intro"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> The task of topic identification is to select from a set of possibilities the topic that is most likely to represent the subject matter covered by a sample of speech. Similarly, speaker identification requires selecting from a list of possibilities the speaker most likely to have produced the speech. In this paper, we present a novel approach to the problems of topic and speaker identification that uses a large vocabulary continuous speech recognizer as a preprocessor of the speech messages.</Paragraph> <Paragraph position="1"> The motivation for developing improved message identification systems derives in part from the increasing reliance on audio databases, such as those arising from voice mail, and the consequent need to extract information from them. Technology that is capable of searching such a database of recorded speech and classifying material by subject matter or by speaker would have substantial value, much as text-based information retrieval technology has for textual corpora. Several approaches to the problems of topic and speaker identification have already appeared in the literature. For example, an approach to topic identification using wordspotting is described in [1], and approaches to the speaker identification problem are reported in [2] and [3].</Paragraph> <Paragraph position="2"> Dragon Systems' approach to the message identification tasks depends crucially on the existence of a large vocabulary continuous speech recognition system.
We view the tasks of topic and speaker identification as complementary problems: for topic identification, the speaker is irrelevant and only the subject matter is of interest; for speaker identification, the reverse is true. For computational efficiency, in either case we first use a speaker-independent, topic-independent recognizer to transcribe the speech messages. The resulting output is then scored using topic-sensitive or speaker-sensitive models.</Paragraph> <Paragraph position="3"> This approach to the problem of message identification is based on the belief that the contextual information used in full-scale recognition is invaluable in extracting reliable data from difficult speech channels. For example, unlike standard approaches to topic identification that spot a small collection of topic-specific words, the approach via continuous speech recognition should detect keywords more reliably because of the acoustic and language model context available to the recognizer. Moreover, with large vocabulary recognition, the list of keywords is no longer limited to a small set of highly topic-specific (but generally infrequent) words, and instead can grow to include much (or even all) of the recognition vocabulary. The use of contextual information makes the message identification systems sufficiently robust that they are able to operate even with vocabulary sizes and noise environments that would make speech recognition extremely difficult for other applications.</Paragraph> <Paragraph position="4"> To test our message identification systems, we have been using the &quot;Switchboard&quot; corpus of recorded telephone messages [4] collected by Texas Instruments and now available through the Linguistic Data Consortium.</Paragraph> <Paragraph position="5"> This collection of roughly 2500 messages includes conversations involving several hundred speakers.
People who volunteered to participate in this program were prompted with a subject to discuss (chosen from a set that they had previously specified as acceptable) and were expected to talk for at least five minutes. We report results of topic identification tests involving messages on ten different topics using four and a half minutes of speech, and of speaker identification tests involving 24 speakers with test intervals containing as little as 10 seconds of speech.</Paragraph> <Paragraph position="6"> In the next section, we describe the theoretical framework on which our message identification systems are based and discuss the dual nature of the two problems.</Paragraph> <Paragraph position="7"> We then describe how this theory is implemented in the current message processing systems. Preliminary tests of the systems using the Switchboard corpus are reported in Section 4. We close with a discussion of the test results and plans for further research.</Paragraph> </Section> </Paper>