<?xml version="1.0" standalone="yes"?> <Paper uid="N04-4010"> <Title>Using N-best Lists for Named Entity Recognition from Chinese Speech</Title> <Section position="5" start_page="1" end_page="6" type="evalu"> <SectionTitle> 2.4 Results and discussion </SectionTitle> <Paragraph position="0"> From text to speech Table 1 compares the NER performance of the same MaxEnt model on the Chinese textual PFR test data and on the one-best BBN ASR hypotheses. We can see a significant drop in performance in the latter. These results support the claim that transferring NER approaches from text to spoken language is a significantly more difficult task for Chinese than for English. We argue that this is due to a combination of factors specific to spoken Chinese. First, Chinese has a large number of homonyms, which degrade speech recognition accuracy and in turn lower NER accuracy. Second, the vocabulary used in Chinese person names is an open set, so many characters/words are unseen in the training data.</Paragraph> <Paragraph position="1"> Comparison to IBM baseline Table 2 compares results on IEER data from our baseline word-based MaxEnt model with those of IBM's word-based HMM model. The two models achieve almost the same results, which shows that our MaxEnt-based NER system is state-of-the-art.</Paragraph> <Paragraph position="2"> Re-segmentation effect Table 3 shows that by discarding word boundaries from the ASR hypothesis and re-segmenting with our MaxEnt segmenter, we obtain better performance in most cases. 
We believe this recovers some of the segmentation errors caused by recognition errors; for example, in the ASR output, the two words &quot;Hao&quot; and &quot;Ling&quot; in &quot;Qian Shu Liao Di Si Shi Er Hao Ling&quot; are misrecognized as one word &quot;Hao Ling&quot;, an error that re-segmentation can correct.</Paragraph> <Paragraph position="3"> Post-classification effect Table 3 also shows that the one-pass identification/classification method yields better results than the two-pass method. However, there are still errors in the one-pass output where the bracketing is correct but the NE classification is wrong. In particular, the type ORG is easily confused with LOC in Chinese, and both types of NEs tend to be rather long. We propose a hybrid approach: first use the one-pass method to extract NEs, then remove all type information, combine the words of each NE into a single NE-word, and post-classify all the NE-words. Our results in Figure 1 show that post-classification combined with the one-pass approach performs much better on all types.</Paragraph> <Paragraph position="4"> model, and one-pass NER is better than two-pass.</Paragraph> <Paragraph position="5"> 3. Using N-Best Lists to Improve NER Miller et al. (1999) performed NER on the one-best hypothesis of English Broadcast News data. Palmer &amp; Ostendorf (2001) and Horlock &amp; King (2003) carried out English NER on word lattices. We are interested in how best to utilize the n-best hypotheses from the ASR system to improve NER performance.</Paragraph> <Paragraph position="6"> From Figure 1, we can see that recall increases as the number of hypotheses increases. Thus it appears possible to exploit the n-best ASR output to improve NER performance.</Paragraph> <Paragraph position="7"> However, we can expect a significant improvement to be difficult to obtain, since the same figure (Figure 1) shows that precision drops much more quickly than recall. 
This is because the nth hypothesis tends to have more character errors than the (n-1)th hypothesis, which may lead to more NER errors. The question, therefore, is: given n NE-tagged hypotheses, what is the best way to use them to obtain better overall NER performance than using the one-best hypothesis alone? One simple approach is to let all the hypotheses vote on a possible NE output. In simple voting, a recognized named entity is considered correct only when it appears in more than 30 percent of the hypotheses for an utterance. The result of this simple voting is shown in Table 4. Next, we propose a weighted voting mechanism that uses a confidence measure for each hypothesis. In one experiment, we use the MaxEnt NER score as the confidence measure. In another experiment, we use all six scores (acoustic score, language model score, number of words, number of phones, number of silences, and NER score) provided by the BBN ASR system as the confidence measure. During implementation, an optimizer based on Powell's algorithm is used to find the six weights for combining these scores.</Paragraph> <Section position="1" start_page="6" end_page="6" type="sub_section"> <SectionTitle> 3.1 Experimental setup </SectionTitle> <Paragraph position="0"> We use the n-best hypotheses of 1,046 Chinese Broadcast News utterances from the BBN LVCSR system. n ranges from 1 to 300, with an average of 68. Each utterance has a reference transcription containing no recognition errors.</Paragraph> </Section> <Section position="2" start_page="6" end_page="6" type="sub_section"> <SectionTitle> 3.2 Results and discussion </SectionTitle> <Paragraph position="0"> Table 4 presents the NER results for the reference sentences, the one-best hypothesis, and the different n-best voting methods. Results for the reference sentences show the upper-bound performance (68% F-measure) of applying a MaxEnt NER system trained on a Chinese text corpus (PFR) to Chinese speech output (Broadcast News). 
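The two voting schemes can be sketched as below. This is an illustrative reconstruction, not the authors' code: the hypothesis representation (a set of entity/type pairs per hypothesis), the function names, and the toy entity strings are all assumptions, and the Powell optimization of the six score weights is omitted (in practice it could be run with an off-the-shelf optimizer such as SciPy's Powell method).

```python
from collections import Counter

def simple_vote(tagged_hypotheses, threshold=0.3):
    # Simple voting over NE-tagged n-best hypotheses: keep an entity
    # only if it appears in more than `threshold` (30% in the paper)
    # of the hypotheses for the utterance.
    # Each hypothesis is modeled as a set of (entity, type) pairs.
    n = len(tagged_hypotheses)
    counts = Counter()
    for hyp in tagged_hypotheses:
        counts.update(set(hyp))  # count each NE at most once per hypothesis
    return {ne for ne, c in counts.items() if c / n > threshold}

def weighted_vote(tagged_hypotheses, confidences, threshold=0.3):
    # Weighted voting: each hypothesis votes with a confidence weight,
    # e.g. a tuned combination of acoustic, LM, and NER scores.
    total = sum(confidences)
    scores = Counter()
    for hyp, conf in zip(tagged_hypotheses, confidences):
        for ne in set(hyp):
            scores[ne] += conf
    return {ne for ne, s in scores.items() if s / total > threshold}
```

With equal confidences, weighted voting reduces to simple voting; skewed confidences let a single high-confidence hypothesis outvote the rest.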
From Table 4, we can conclude that it is possible to improve NER precision by using the n-best hypotheses and finding an optimized combination of the acoustic, language model, NER, and other scores. In particular, since most errors in Chinese ASR seem to involve person names, using the NER score on the n-best hypotheses can improve recognition results by a relative 6.7% in precision and</Paragraph> </Section> </Section> </Paper>