<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0305"> <Title>MPLUS: A Probabilistic Medical Language Understanding System</Title> <Section position="4" start_page="5" end_page="5" type="evalu"> <SectionTitle> 6 Evaluation </SectionTitle> <Paragraph position="0"> M+ was evaluated for the extraction of American College of Radiology (ACR) utilization review codes from head CT reports (Fiszman, 2002). The ACR codes compare the outcome in a report with the suspected diagnosis provided by emergency department physicians.</Paragraph> <Paragraph position="1"> If the outcome relates to the suspected diagnosis, then the report should be encoded as positive (P). If the outcome is negative and does not relate to the suspected diagnosis, then the report should be encoded as negative (N). To extract these ACR codes, we trained M+ to extract eleven broad disease concepts, then inferred the ACR codes by applying a rule to the M+ output: if any of the concepts was present, the report was considered positive; otherwise, the report was considered negative.</Paragraph> <Paragraph position="2"> Twenty-six hundred head CT scan reports were used for this evaluation. Six hundred reports were randomly selected for testing, and the rest were used to train M+ in this domain. The performance of M+ on this task was measured against that of four board-certified physicians, using a gold standard based on majority vote, as described in Fiszman (2002). For each subject we calculated recall, precision and specificity with their respective 95% confidence intervals for the capture of ACR utilization codes.</Paragraph> <Paragraph position="3"> From 600 head CT reports, 67 were judged to be positive (P) by the gold standard physicians and 534 were judged to be negative (N). Therefore the positive rate for head CT in this sample was 11%.
Recall, precision and specificity for every subject are reported with their respective 95% confidence intervals. The physicians had an average recall of 88% (CI, 84% to 92%), an average precision of 86% (CI, 81% to 90%), and an average specificity of 98% (CI, 97% to 99%). M+ had a recall of 87% (CI, 78% to 95%), a precision of 85% (CI, 77% to 94%) and a specificity of 98% (CI, 97% to 99%).</Paragraph> <Paragraph position="4"> The results on head CT reports are encouraging, but there are limitations. We evaluated only 600 reports, because it is very hard to get physicians to produce gold standard data for medical reports. The prevalence of positive reports is only 11%, which reflects the fact that the individual brain conditions have very low prevalence.</Paragraph> </Section> </Paper>