<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2209">
  <Title>A tagger/lemmatiser for Dutch medical language</Title>
  <Section position="6" start_page="1148" end_page="1148" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> Several data sets were used to assess the performance of the T/L. A learning set of 1314 tokens (5 reports) from the cardiology department (cardio) was intended to eliminate, as far as possible, errors due to unknown vocabulary. A new, larger test set of 3167 tokens from 35 neurosurgical reports (neuro) was fed to the T/L to see how robust it is when confronted with the vocabulary of a completely new domain. The problem with an application of this type is the trade-off between overkill (a good analysis is unjustly discarded) and undershoot (an invalid analysis is kept). The extensive tagset (tagset1) provides all the morphosyntactic information required by the DMLP parser for sentence analysis, while the reduced tagset (tagset2) consists of 15 categories and 25 specifiers (which gives 43 meaningful combinations). This simplification of the syntactic information greatly improves the results. Table 1: results of contextual tagging with an extensive tagset (tagset1) versus a reduced one (tagset2) on the cardio and neuro sets.</Paragraph>
    <Paragraph position="1"> All the results were manually examined and synthesised (cf. table 1). As soon as even one feature of the complete feature bundle with linguistic information is wrong, the analysis as a whole is considered incorrect. All words with wrong, missing, doubtful, or more than two competing analyses are counted as bad. Sometimes, two competing readings could not be disambiguated without semantico-pragmatic knowledge. In addition, we deliberately left some ambiguities pending for the syntactic parser to avoid the danger of overkill (cf. also (Jacobs and Rau, 1993, pp. 166-167) on this matter). These cases of &quot;double analysis&quot; are grouped in &quot;class 2&quot;. The question whether these cases should be considered bad or correct is left open. The difference between the results is mainly due to the amount of unknown vocabulary (around 9 % for the cardio set vs. around 18 % for the neuro set, which results in a difference of 82.42 % vs. 73.63 % with tagset1 and 91.32 % vs. 83.04 % with tagset2) and the nature of the tagsets (82.42 % vs. 91.32 % on the cardio set and 73.63 % vs. 83.04 % on the neuro set).</Paragraph>
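    <Paragraph position="2"> The scoring criteria described above can be illustrated with a minimal sketch. It is not the procedure actually used (the results were examined manually); the function names, the dictionary representation of a feature bundle, and the requirement that a class 2 token contains the reference reading are all assumptions made for illustration, and the "doubtful" category is not modelled.

```python
from collections import Counter


def classify(analyses, gold):
    """Label one token as 'correct', 'class2' or 'bad'.

    analyses: list of feature bundles (dicts) kept for the token (hypothetical format).
    gold: the reference feature bundle (dict).
    """
    # No analysis at all, or more than two competing analyses, counts as bad.
    if len(analyses) == 0 or len(analyses) > 2:
        return "bad"
    # Dict equality compares every feature, so one wrong feature
    # makes the whole bundle differ from the reference.
    if gold not in analyses:
        return "bad"
    # Two readings left pending for the parser (assumption: one of them is the reference).
    if len(analyses) == 2:
        return "class2"
    return "correct"


def summarise(tokens):
    """Return the percentage of tokens per label for (analyses, gold) pairs."""
    counts = Counter(classify(analyses, gold) for analyses, gold in tokens)
    total = sum(counts.values())
    if total == 0:
        return {"correct": 0.0, "class2": 0.0, "bad": 0.0}
    return {label: 100.0 * counts[label] / total
            for label in ("correct", "class2", "bad")}
```

Reporting class 2 separately, rather than folding it into either of the other labels, mirrors the choice to leave open whether double analyses count as bad or correct.
</Paragraph>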
  </Section>
</Paper>