<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1039">
  <Title>Information Extraction From Voicemail</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In recent years, the task of automatically extracting information from data has grown in importance, as a result of an increase in the number of publicly available archives and a realization of the commercial value of the available data. One aspect of information extraction (IE) is the retrieval of documents. Another aspect is that of identifying words from a stream of text that belong in pre-defined categories, for instance, &quot;named entities&quot; such as proper names, organizations, or numerics.</Paragraph>
    <Paragraph position="1"> Though most of the earlier IE work was done in the context of text sources, recently a great deal of work has also focused on extracting information from speech sources. Examples include the Spoken Document Retrieval (SDR) task (NIST, 1999) and named entity (NE) extraction (DARPA, 1999; Miller et al., 2000; Kim and Woodland, 2000). The SDR task focused on Broadcast News, and the NE task focused on both Broadcast News and telephone conversations.</Paragraph>
    <Paragraph position="2"> In this paper, we focus on voicemail, a source of conversational speech data that is found in relatively large volumes in the real world and that could benefit greatly from the use of IE techniques. The goal here is to query one's personal voicemail for items of information without having to listen to the entire message, for instance, &quot;who called today?&quot; or &quot;what is X's phone number?&quot;. Because of the importance of these key pieces of information, we focus in this paper precisely on extracting the identity and the phone number of the caller. Other attempts at summarizing voicemail have been made in the past (Koumpis and Renals, 2000); however, the goal there was to compress a voicemail message by summarizing it, not to extract the answers to specific questions.</Paragraph>
    <Paragraph position="3"> An interesting aspect of this research is that, because a transcription of the voicemail is not available, speech recognition algorithms have to be used to convert the speech to text, and the subsequent IE algorithms must operate on the resulting transcription. One of the complications that we have to deal with is that the state-of-the-art accuracy of speech recognition algorithms on this type of data (see footnote 1) is only in the neighborhood of 60-70% (Huang et al., 2000).</Paragraph>
    <Paragraph position="4"> The task that is most similar to our work is named entity extraction from speech data (DARPA, 1999). Although the goal of the named entity task is similar - to identify the names of persons, locations, organizations, and temporal and numeric expressions - our task is different, and in some ways more difficult. There are two main reasons for this: first, caller and number information constitute a small fraction of all named entities. Not all person-names belong to callers, and not all digit strings specify phone-numbers. In this sense, the algorithms we use must be more precise than those for named entity detection.</Paragraph>
    <Paragraph position="5"> Second, the caller's identity may include information that is not typically found in a named entity, for example, &quot;Joe on the third floor&quot;, rather than simply &quot;Joe&quot;. We discuss our definitions of &quot;caller&quot; and &quot;number&quot; in Section 2. To extract caller information from transcribed speech text, we implemented three different systems, spanning both statistical and non-statistical approaches. We evaluate these systems on manual voicemail transcriptions as well as the output of a speech recognizer. The first system is a simple rule-based system that uses trigger phrases to identify the information-bearing words. The second system is a maximum entropy model that tags the words in the transcription as belonging to one of the categories &quot;caller's identity&quot;, &quot;phone number&quot; or &quot;other&quot;. The third system is a novel technique based on automatic stochastic transducer induction. It aims to learn rules automatically from training data instead of requiring hand-crafted rules from experts. Although the results with this system are not yet as good as those of the other two, we consider it highly interesting because the technology is new and still open to significant advances.</Paragraph>
    <Paragraph position="6"> The rest of the paper is organized as follows: Section 2 describes the database we are using; Section 3 contains a description of the baseline system; Section 4 describes the maximum entropy model and the associated features; Section 5 discusses the transducer induction technique; Section 6 contains our experimental results and Section 7 concludes our discussions.</Paragraph>
    <Paragraph position="7"> Footnote 1: The large word error rate is due to the fact that the speech is spontaneous, and characterized by poor grammar, false starts, pauses, hesitations, etc. While this does not pose a problem for a human listener, it causes significant problems for speech recognition algorithms.</Paragraph>
  </Section>
</Paper>