<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1078">
  <Title>Discriminative Named Entity Recognition of Speech Data</Title>
  <Section position="3" start_page="0" end_page="617" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> As network bandwidths and storage capacities continue to grow, a large volume of speech data, including broadcast news and podcasts, is becoming available. These data are important information sources, as are text data such as newspaper articles and WWW pages. Speech data as information sources are attracting a great deal of interest, as exemplified by DARPA's Global Autonomous Language Exploitation (GALE) program. We also aim to use them for information extraction (IE), question answering, and indexing.</Paragraph>
    <Paragraph position="1"> Named entity recognition (NER) is a key technique for IE and other natural language processing tasks. Named entities (NEs) are proper expressions such as people's names, location names, and dates, and NER identifies these expressions and their categories. Unlike text data, speech data introduce automatic speech recognition (ASR) errors into NER. Although improvements to ASR are needed, it is also important to develop NER that is robust to noisy word sequences. In this paper, we focus on NER of ASR results and discuss how to suppress ASR error problems in NER.</Paragraph>
    <Paragraph position="2"> Most previous studies of the NER of speech data used generative models such as hidden Markov models (HMMs) (Miller et al., 1999; Palmer and Ostendorf, 2001; Horlock and King, 2003b; Béchet et al., 2004; Favre et al., 2005).</Paragraph>
    <Paragraph position="3"> On the other hand, in text-based NER, better results are obtained with discriminative schemes such as maximum entropy (ME) models (Borthwick, 1999; Chieu and Ng, 2003), support vector machines (SVMs) (Isozaki and Kazawa, 2002), and conditional random fields (CRFs) (McCallum and Li, 2003). Zhai et al. (2004) applied a text-level ME-based NER to ASR results. These models have the advantage of exploiting various features, such as part-of-speech information, character types, and surrounding words, which may overlap, whereas overlapping features are hard to use in HMM-based models.</Paragraph>
    <Paragraph position="4"> To deal with ASR error problems in NER, Palmer and Ostendorf (2001) proposed an HMM-based NER method that explicitly models ASR errors using ASR confidence and rejects erroneous word hypotheses in the ASR results. Such rejection is especially effective when ASR accuracy is relatively low because many misrecognized words may be extracted as NEs, which would decrease NER precision.</Paragraph>
    <Paragraph position="5"> Motivated by these issues, we extended their approach to discriminative models and propose an NER method that deals with ASR errors as features. For training, we use NE-labeled ASR results, together with the corresponding transcriptions with NE labels, to incorporate these features into the NER model. In testing, ASR errors are identified by ASR confidence scores and used as features for NER.</Paragraph>
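The idea of treating ASR errors as features in a discriminative NER model can be illustrated with a minimal sketch. The feature names, the confidence threshold, and the function below are illustrative assumptions, not the paper's exact design; a real system would feed such feature dicts to an SVM or CRF sequence labeler.

```python
# Illustrative sketch (not the paper's implementation): per-token features
# for discriminative NER, augmented with an ASR-error flag derived from
# an ASR confidence score.

ERROR_THRESHOLD = 0.5  # assumed cutoff; a low-confidence token is flagged as a likely ASR error


def token_features(tokens, confidences, i):
    """Return a feature dict for token i: surface form, neighbors, and ASR-error flag."""
    return {
        "word": tokens[i],
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
        # overlapping features like these are easy to use in discriminative models
        "is_digit": tokens[i].isdigit(),
        # the key addition: mark tokens whose ASR confidence falls below the threshold
        "asr_error": confidences[i] < ERROR_THRESHOLD,
    }


if __name__ == "__main__":
    tokens = ["tokyo", "stocks", "rose"]
    confidences = [0.92, 0.31, 0.88]
    for i in range(len(tokens)):
        print(token_features(tokens, confidences, i))
```

At training time, the same features would be extracted from NE-labeled ASR output (and from the clean transcriptions), so the model can learn, for example, to lower its confidence in proposing an NE over a token flagged as a likely ASR error.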
    <Paragraph position="6"> In experiments using SVM-based NER and speech data from Japanese newspaper articles, the proposed method improved the NER F-measure, especially precision, compared with simply applying text-based NER to the ASR results.</Paragraph>
  </Section>
</Paper>