<?xml version="1.0" standalone="yes"?>
<Paper uid="N01-1017">
  <Title>Generating Training Data for Medical Dictations</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4.1.3 Results
</SectionTitle>
    <Paragraph position="0"> Table 1 shows the test results. As expected, both recognition accuracy and correctness increase with any of the three kinds of adaptation. Adaptation using Literal transcriptions yields an overall 10.84% absolute gain in correctness and 11.49% in accuracy over the baseline.</Paragraph>
    <Paragraph position="1"> Adaptation using Non-literal transcriptions yields an overall 6.36 % absolute gain in correctness and 5.23 % in accuracy over the baseline. Adaptation with Semi-literal transcriptions yields an overall 11.39 % absolute gain in correctness and 11.05 % in accuracy over the baseline. No statistical significance tests were performed on this data.</Paragraph>
    <Paragraph position="2">  The results of this experiment provide additional support for using automatically generated semi-literal transcriptions as a viable (and possibly superior) substitute for literal data. The fact that three SEMI-LITERAL adapted AM's out of 5 performed better than their LITERAL counterparts seems to indicate that there may be undesirable noise either in the literal transcriptions or in the corresponding audio. It may also be due to the relatively small amount of training data used for SI modeling thus providing a baseline that can be improved with little effort. However, the results still indicate that generating semi-literal transcriptions may help eliminate the undesirable noise and, at the same time, get the benefits of broader coverage that semi-literal transcripts can afford over NON-LITERAL transcriptions.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Language Model Evaluation
</SectionTitle>
      <Paragraph position="0"> For ASR applications where there are significant discrepancies between an utterance and its formal transcription, the inclusion of literal data in the language model can reduce language model perplexity and improve recognition accuracy. In medical transcription, the non-literal texts typically depart from what has actually been said. Hence if the talker says &amp;quot;lungs are clear&amp;quot; or &amp;quot;lungs sound pretty clear&amp;quot;, the typed transcription is likely to have &amp;quot;Lungs - clear&amp;quot;. In addition, as we noted earlier, the non-literal transcription will omit disfluencies and asides and will correct grammatical errors.</Paragraph>
      <Paragraph position="1"> Literal and semi-literal texts can be added onto language model training data or interpolated into an existing language model. Below we will present results of a language modeling experiment that compares language models built from literal, semi-literal and non-literal versions of the same training set. The results substantiate our claim that automatically generated semi-literal transcription can lead to a significant improvement in language model quality.</Paragraph>
      <Paragraph position="2"> In order to test the proposed method's suitability for language modeling, we constructed three trigram language models and used perplexity as the measure of the models' goodness.</Paragraph>
      <Paragraph position="3"> Setup The following models were trained on three versions of a 270,000-word corpus. The size of the training corpus is dictated by availability of literal transcriptions. The vocabulary was derived from a combination of all three corpora to keep the OOV rate constant.</Paragraph>
      <Paragraph position="4"> LLM - language model built from a corpus of literal transcriptions NLM - language model built from non-literal transcriptions SLM - language model built from semi-literal transcriptions Approximately 5,000-word literal transcriptions corpus consisting of 24 dictations was set aside for testing</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Results
</SectionTitle>
      <Paragraph position="0"> The results of perplexity tests of the three models on the held-out data at 3-gram level are summarized in Table 2. The tests were carried out using the Entropic Transcriber Toolkit It is apparent that SLM yields considerably better perplexity than NLM, which indicates that although semi-literal transcriptions are not as good as actual literal transcriptions, they are more suitable for  language modeling than non-literal transcriptions. These results are obtained with 270,000 words of training data; however, the typical amount is dozens of million. We would expect the differences in perplexity to become smaller with larger amounts of training data.</Paragraph>
      <Paragraph position="1"> Conclusions and future work We have described ATRS, a system for reconstructing semi-literal transcriptions automatically. ATRS texts can be used as a substitute for literal transcriptions when the cost and time required for generating literal transcriptions are infeasible, e.g. in a telephony based transcription operation that processes thousands of acoustic and language models. Texts produced with ATRS were used in training speaker adapted acoustic models, speaker independent acoustic models and language models.</Paragraph>
      <Paragraph position="2"> Experimental results show that models built from ATRS training data yield performance results that are equivalent to those obtained with models trained on literal transcriptions. In the future, we will address the issue of the amount of training data for the SI model. Also, current ATRS system does not take advantage of various confidence scores available in leading recognition engines. We believe that using such confidence measures can improve the generation of semi-literal transcriptions considerably. We would also like to investigate the point at which the size of the various kinds of data  used for adaptation stops making improvements in recognition accuracy.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>