<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1140"> <Title>High-Performance Tagging on Medical Texts</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We ran both Brill's rule-based tagger and TNT, a statistical tagger, with a default German newspaper-language model on a medical text corpus. Supplied with limited lexicon resources, TNT outperforms the Brill tagger with state-of-the-art performance figures (close to 97% accuracy). We then trained TNT on a large annotated medical text corpus, with a slightly extended tagset that captures certain medical language particularities, and achieved 98% tagging accuracy. Hence, statistical off-the-shelf POS taggers can not only be immediately reused for medical NLP but, when trained on medical corpora, also achieve a higher performance level than for the newspaper genre.</Paragraph> </Section> </Paper>