<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1041"> <Title>Information Extraction from Voicemail Transcripts</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 5 Conclusion and Outlook </SectionTitle> <Paragraph position="0"> The novel contributions of this paper can be summarized as follows: * We demonstrated empirically that positional cues can be an important source of information for locating caller names and phrases.</Paragraph> <Paragraph position="1"> * We showed that good performance on the task of extracting caller information can be achieved using a very small inventory of lexical and positional features.</Paragraph> <Paragraph position="2"> * We argued that, when extracting telephone numbers, it is extremely useful to take the length of their numeric representation into account.</Paragraph> <Paragraph position="3"> Our grammar-based extractor translates spoken numbers into such a numeric representation.</Paragraph> <Paragraph position="4"> * Our two-phase approach allows us to efficiently develop a simple extraction grammar whose only requirement is high recall. This places less of a burden on grammar developers than writing an accurate set of rules, as required by the baseline of Huang et al. (2001).</Paragraph> <Paragraph position="5"> * The combined performance of our simple extraction grammar and the second-phase classifier exceeded the performance of all other methods we compared, including the current state of the art (Huang et al., 2001).</Paragraph> <Paragraph position="6"> Our results point towards approaches that use a small inventory of features tailored to the specific task. Generic methods such as the named entity tagger used by Huang et al. (2001) may not be the best tools for particular tasks; in fact, we do not expect the bigram and trigram features used by such taggers to be sufficient for accurately extracting phone numbers. 
We also believe that using all available lexical information to extract caller information can easily lead to over-fitting, which can be partly avoided by not relying on an ASR component to transcribe names correctly.</Paragraph> <Paragraph position="7"> In practice, determining the identity of a caller may require taking many diverse sources of information into account. The self-identification of a caller and the phone numbers mentioned in the same message are not uncorrelated, since there is usually only a small number of ways to reach any particular caller. In an application, we might therefore combine speaker identification (Rosenberg et al., 2001), caller name extraction, and recognized phone numbers to establish the identity of the caller. How best to combine these sources of information is left for future research.</Paragraph> </Section> </Paper>