<?xml version="1.0" standalone="yes"?> <Paper uid="N06-2021"> <Title>Initial Study on Automatic Identification of Speaker Role in Broadcast News Speech</Title> <Section position="7" start_page="82" end_page="83" type="evalu"> <SectionTitle> 4.2 Results </SectionTitle> <Paragraph position="0"> A. HMM and Maxent: Table 1 shows the role identification results using the HMM and the Maxent model, including the overall classification accuracy and the precision/recall rates (%) for each role. These results are averaged over the 10 test sets.</Paragraph> <Paragraph position="1"> From Table 1 we find that the overall classification performance is similar for the HMM and the Maxent model; however, their error patterns are quite different. For example, the Maxent model is better than the HMM at identifying the &quot;reporter&quot; role, but worse at identifying &quot;other&quot; speakers (see the recall rates in the table). In the HMM, we used only the first and the last sentence of a speaker's turn, which are more indicative of the speaker's role. Using all the sentences for LM training and perplexity calculation degraded accuracy significantly, to 74.68%, compared to the 77.18% shown in the table for this subset of a speaker's speech. Note that the sentences used in the HMM and Maxent models are the same; however, the Maxent model does not use any contextual role tags (which we examine next), although it does include some words from the previous and the following speaker segments in its feature set.</Paragraph> <Paragraph position="2"> B. Contextual role information: To investigate how important the role sequence is, we conducted two experiments with the Maxent model. In the first experiment, for each segment, the reference role tags of the previous and the following segments, as well as their combination, are included as features for model training and testing (a &quot;cheating&quot; experiment). 
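As an illustrative sketch only (not the authors' code), the contextual-role features of this &quot;cheating&quot; experiment could be built as follows in Python; the feature-name strings and the role labels are assumptions:

```python
def contextual_role_features(prev_role, next_role):
    """Features from the "cheating" experiment: the reference role tag of
    the previous segment, of the following segment, and their combination.
    Feature names are illustrative, not the paper's actual feature set."""
    return {
        "prev_role=" + prev_role: 1,
        "next_role=" + next_role: 1,
        "prev+next=" + prev_role + "_" + next_role: 1,
    }

# e.g., a segment preceded by an anchor turn and followed by a reporter turn
feats = contextual_role_features("anchor", "reporter")
```
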
In the second experiment, a two-step approach is employed. Following the HMM and Maxent experiments (i.e., the results shown in Table 1), Viterbi decoding is performed using the posterior probabilities from the Maxent model and the transition probabilities from the role LM as in the HMM (with weight 0.3). The average performance over the ten test sets for these two experiments is shown in Table 2. For comparison, we also present the decoding results of the HMM with and without sequence information (i.e., the transition probabilities in the HMM). Additionally, the system combination results of the HMM and Maxent are presented in the table, with more discussion below. We observe from Table 2 that adding contextual role information improves performance. Including the two reference role tags yields a significant gain for the Maxent model, even though some sentences from the previous and the following segments are already included as features. The HMM suffers more than the Maxent classifier when role sequence information is not used during decoding, since that is the only contextual information used in the HMM, unlike the Maxent model, which uses features extracted from the neighboring speaker turns.</Paragraph> <Paragraph position="3"> [Table 2 caption fragment: ... HMM and Maxent classifiers. The combination results of the HMM and Maxent are also provided.]</Paragraph> <Paragraph position="4"> C. System combination: For system combination, we used two different Maxent results, with and without the Viterbi sequence decoding, corresponding to experiments (0) and (2) in Table 2, respectively. When combining the HMM and Maxent, i.e., the last two rows in Table 2, their posterior probabilities are linearly weighted (weight 0.6 on the Maxent in the upper row, and 0.7 on the Maxent in the bottom row). In both cases, the combination of the two approaches yields better performance than either single model. We also investigated other system combination approaches. 
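The two-step Viterbi decoding and the linear system combination described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: only the 0.3 transition weight and the 0.6 interpolation weight come from the text, while the role labels, data structures, and function names are assumptions.

```python
import math

ROLES = ["anchor", "reporter", "other"]  # assumed 3-way role set

def viterbi_rescore(posteriors, trans, weight=0.3):
    """Second-step decoding: Viterbi over per-segment Maxent posteriors,
    with role-LM transition probabilities scaled (in the log domain) by
    `weight` (0.3 in the setup above). Returns the best role sequence."""
    k = len(ROLES)
    # log-domain path scores for the first segment
    score = [math.log(posteriors[0][j]) for j in range(k)]
    back = []
    for post in posteriors[1:]:
        new_score, ptr = [], []
        for j in range(k):
            best = max(range(k),
                       key=lambda i: score[i] + weight * math.log(trans[i][j]))
            new_score.append(score[best] + weight * math.log(trans[best][j])
                             + math.log(post[j]))
            ptr.append(best)
        score, back = new_score, back + [ptr]
    # trace back the best-scoring path
    j = max(range(k), key=lambda i: score[i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return [ROLES[j] for j in reversed(path)]

def combine_posteriors(hmm_post, maxent_post, w=0.6):
    """Linear interpolation of HMM and Maxent posteriors (weight w on Maxent)."""
    return [[w * m + (1.0 - w) * h for h, m in zip(hp, mp)]
            for hp, mp in zip(hmm_post, maxent_post)]
```

With uniform transition probabilities the decoder reduces to a per-segment argmax; a peaked role LM pulls the output toward likely role transitions.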
For example, we used a decision tree and an SVM to build a 3-way superclassifier over the posterior probabilities from the HMM and the Maxent model. However, so far we have not found any gain from system combination methods more complicated than simple linear interpolation. We will study this further in future work.</Paragraph> </Section> </Paper>