<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1083">
  <Title>Modeling Filled Pauses in Medical Dictations</Title>
  <Section position="7" start_page="621" end_page="623" type="evalu">
    <SectionTitle>
7. Results and discussion
</SectionTitle>
    <Paragraph position="0"> Although a perplexity test provides a good theoretical measure of a language model, it is not always accurate in predicting the model's performance in a recognizer (Chen 1998); therefore, both perplexity and recognition accuracy were used in this study. Both were calculated using ECRL's LM Transcriber tools.</Paragraph>
    <Section position="1" start_page="621" end_page="621" type="sub_section">
      <SectionTitle>
7.1 Perplexity
</SectionTitle>
      <Paragraph position="0"> Perplexity tests were conducted with ECRL's LPlex tool based on the same text corpus (BFP-CORPUS) that was used to build the BIGRAM-FP-LM. Three conditions were used. Condition A used the whole corpus. Condition B used a subset of the corpus that contained high frequency FP users (FPs/Words ratio above 1.0).</Paragraph>
      <Paragraph position="1"> Condition C used the remaining subset containing data from lower frequency FP users (FPs/Words ratio below 1.0). Table 1 summarizes the results of perplexity tests at 3-gram level for the models under the three conditions.</Paragraph>
      <Paragraph position="2"> [Table 1: Perplexity (Lplex) and OOV at the 3-gram level for the three conditions. Legible values, in source order: NOFP-LM 617.59, 6.35, 1618.35, 6.08, 287.46; ADAPTFP-LM 132.74, 6.35, 6.08.] The perplexity measures in Condition A show a difference of over 400 points between the ADAPTFP-LM and NOFP-LM language models. The 363.08 increase in perplexity for the ALLFP-LM model corroborates the results discussed in Section 6. Another interesting result is contained in the highlighted fields of Table 1. ADAPTFP-LM, based on CONTROLLED-FP-CORPUS, has lower perplexity in general. When tested on Conditions B and C, ADAPTFP-LM does better on frequent FP users, whereas RANDOMFP-LM-A does better on infrequent FP users, which is consistent with the recognition accuracy results for the two models (see Table 2).</Paragraph>
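As a rough illustration of the measure reported in Table 1, perplexity can be computed from the probabilities a language model assigns to each token of a test text. The sketch below is not the LPlex implementation; the function name `perplexity` and the toy probabilities are assumptions made only for this example.

```python
import math


def perplexity(token_probs):
    """Perplexity of a test sequence, given the model probability of each token.

    Computed as exp(-(1/N) * sum(log p_i)), where N is the number of tokens.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


# Toy data (assumed): a model that spreads probability evenly over 4 choices
# has a perplexity of about 4 on any text.
uniform = [0.25] * 8
# A model that assigns low probability to some tokens scores worse (higher).
with_surprises = [0.25, 0.25, 0.01, 0.25, 0.25, 0.01, 0.25, 0.25]

print(perplexity(uniform))
print(perplexity(with_surprises))
```

Lower values mean that the model finds the test text less surprising, which is the sense in which a lower-perplexity model is preferred above.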
    </Section>
    <Section position="2" start_page="621" end_page="623" type="sub_section">
      <SectionTitle>
7.2 Recognition accuracy
</SectionTitle>
      <Paragraph position="0"> Recognition accuracy was obtained with ECRL's HResults tool and is summarized in Table 2. The results demonstrate two things. First, an FP model performs better than a clean model that has no FP representation. Second, an FP model based on populating a no-FP training corpus with FP's whose distribution was derived from a small sample of speech data performs better than one populated with FP's at random based solely on the frequency of FP's. The results also show that ADAPTFP-LM performs slightly better than RANDOMFP-LM-1 on high FP users. The gain becomes more pronounced towards the higher end of the FP use continuum. For example, the scores for the top four high FP users are 62.07% with RANDOMFP-LM-1 and 63.51% with ADAPTFP-LM. This difference cannot be attributed to the fact that RANDOMFP-LM-1 contains fewer FP's than ADAPTFP-LM. The word accuracy rates for RANDOMFP-LM-2 indicate that the frequency of FP's in the training corpus is not responsible for the difference in performance between RANDOMFP-LM-1 and ADAPTFP-LM. The frequency is roughly the same for both RANDOMFP-CORPUS-2 and CONTROLLED-FP-CORPUS, but the RANDOMFP-LM-2 scores are lower than those of RANDOMFP-LM-1. In the absence of further evidence, this allows the difference in scores to be attributed to the pattern of FP distribution, not to FP frequency.</Paragraph>
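For readers unfamiliar with the metric, word accuracy is conventionally defined as 100 * (N - S - D - I) / N, where N is the number of reference words and S, D and I are the substitutions, deletions and insertions in the best alignment of the recognizer output against the reference. The sketch below is a minimal illustration under the assumption of unit edit costs; it is not the scoring tool's own code, and the function name `word_accuracy` and the example strings are hypothetical.

```python
def word_accuracy(reference, hypothesis):
    """Percent word accuracy: 100 * (N - S - D - I) / N, with N reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum number of edits that turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(1, len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub_cost,  # match or substitution
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
            )
    errors = dist[len(ref)][len(hyp)]  # S + D + I under unit costs
    return 100.0 * (len(ref) - errors) / len(ref)


# Hypothetical example: one reference word is missing from the hypothesis.
print(word_accuracy("the patient denies any chest pain",
                    "the patient denies chest pain"))
```

Because insertions are penalized, this score can fall below the simple percentage of correctly recognized words, and it can be negative when the output contains many spurious words.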
      <Paragraph position="1"> Conclusion. Based on the results so far, several conclusions about FP modeling can be made: 1. Representing FP's in the training data improves both the language model's perplexity and recognition accuracy.</Paragraph>
      <Paragraph position="2"> 2. It is not absolutely necessary to have a corpus that contains naturally occurring FP's for successful recognition. FP distribution can be extrapolated from a relatively small corpus containing naturally occurring FP's to a larger clean corpus. This becomes vital in situations where the language model has to be built from &quot;clean&quot; text such as finished transcriptions, newspaper articles, web documents, etc.</Paragraph>
      <Paragraph position="3"> 3. If one is hard-pressed for hand-transcribed data with natural FP's, a random population can be used with relatively good results.</Paragraph>
      <Paragraph position="5"> FP's are quite common to both quasi-spontaneous monologue and spontaneous dialogue (medical dictation).</Paragraph>
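The two ways of populating a clean corpus that are compared in these conclusions can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the study: conditioning the insertion on the preceding word, the placeholder token `FP`, and all function names are hypothetical choices made for the example.

```python
import random
from collections import Counter

FP = "FP"  # placeholder token for a filled pause (assumed notation)


def fp_rate_by_left_word(sample_tokens):
    """Estimate, from a small FP-annotated sample, how often an FP follows each word."""
    seen = Counter()
    followed = Counter()
    for i, word in enumerate(sample_tokens):
        if word == FP:
            continue
        seen[word] += 1
        if i + 1 != len(sample_tokens) and sample_tokens[i + 1] == FP:
            followed[word] += 1
    return {w: followed[w] / seen[w] for w in seen}


def populate_adapted(clean_tokens, rates, rng):
    """Insert FPs into clean text according to the per-word rates from the sample."""
    out = []
    for word in clean_tokens:
        out.append(word)
        if rates.get(word, 0.0) > rng.random():
            out.append(FP)
    return out


def populate_random(clean_tokens, overall_rate, rng):
    """Insert FPs at random, using only the overall FP frequency."""
    out = []
    for word in clean_tokens:
        out.append(word)
        if overall_rate > rng.random():
            out.append(FP)
    return out


# Toy data (assumed): in the sample, an FP always follows "and".
sample = ["and", FP, "the", "patient", "and", FP, "denies"]
rates = fp_rate_by_left_word(sample)
clean = ["pain", "and", "fever"]
rng = random.Random(0)
print(populate_adapted(clean, rates, rng))
print(populate_random(clean, 0.3, rng))
```

Both functions can be set to produce the same overall number of FPs; they differ only in where the FPs land, which is the distinction between distribution pattern and frequency drawn in the discussion of Table 2.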
      <Paragraph position="6"> Research in progress. The present study leaves a number of issues to be investigated further: 1. The results for RANDOMFP-LM-1 are very close to those of ADAPTFP-LM. A statistical test is needed in order to determine if the difference is significant.</Paragraph>
      <Paragraph position="7"> 2. A systematic study of the syntactic as well as discursive contexts in which FP's are used in medical dictations. This will involve tagging a corpus of literal transcriptions for various kinds of syntactic and discourse boundaries such as clause, phrase and theme/rheme boundaries. The results of the analysis of the tagged corpus may lead to investigating which lexical items may be helpful in identifying syntactic and discourse boundaries. Although FP's may not always be lexically conditioned, lexical information may be useful in modeling FP's that occur at discourse boundaries due to co-occurrence of such boundaries and certain lexical items.</Paragraph>
      <Paragraph position="8"> 3. The present study roughly categorizes talkers, according to the frequency of FP's in their speech, into high FP users and low FP users. A more finely tuned categorization of talkers with respect to FP use, as well as its usefulness, remains to be investigated.</Paragraph>
      <Paragraph position="9"> 4. Another area of investigation will focus on the SOAP structure of medical dictations. I plan to look at the relative frequency of FP use in the four parts of a medical dictation. Informal observation of the data collected so far indicates that FP use during the Subjective part of a dictation is more frequent than, and different from, FP use in the other parts. This is when the doctor uses fewer frozen expressions and the discourse is closest to a natural conversation.</Paragraph>
    </Section>
  </Section>
</Paper>