<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1080"> <Title>Tagging Inflective Languages: Prediction of Morphological Categories for a Rich, Structured Tagset</Title> <Section position="7" start_page="487" end_page="488" type="evalu"> <SectionTitle> 5 Results </SectionTitle>
<Paragraph position="0"> We used a training set of 130,000 words and a test set of 1,000 words. There were 378 different ambiguity classes (of subtags) across all categories.</Paragraph>
<Paragraph position="1"> We used two evaluation metrics: one which evaluates each category separately, and one &quot;flat-list&quot; error rate, which is used for comparison with other methods that do not predict the morphological categories separately. We compare the new method with results obtained on Czech previously, as reported in (Hladká, 1994) and (Hajič, Hladká, 1997). The apparently high baseline, compared to previously reported experiments, is undoubtedly due to the introduction of multiple models based on ambiguity classes.</Paragraph>
<Paragraph position="2"> In all cases, since about 55% of text tokens are at least two-way ambiguous, the error rate should be almost doubled if one wants the error rate computed over ambiguous words only.</Paragraph>
<Paragraph position="3"> The baseline, or &quot;smoothing-only&quot;, error rate was 20.7% on the test data and 22.18% on the training data.</Paragraph>
<Paragraph position="4"> Table 2 presents the initial error rates for the individual categories, computed using only the smoothing part of the model (n = 0 in equation 3).</Paragraph>
<Paragraph position="5"> Training took slightly under 20 hours on a Linux-powered Pentium 90, with the feature-adding threshold set to 4 (meaning that a feature batch was not added unless it reduced the absolute number of errors on the training data by more than 4). 840 feature batches (corresponding to about 2,000 fully specified features) were learned.
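The &quot;almost doubled&quot; remark above is simple arithmetic, sketched below for concreteness. This is not code from the paper; the function name is illustrative, and the only inputs are the 55% ambiguity fraction and the 20.7% baseline error rate quoted in the text.

```python
# Sketch (illustrative names, not from the paper): restate the overall
# token error rate as an error rate over ambiguous tokens only.
# Unambiguous tokens are tagged correctly by definition, so all errors
# fall on the ~55% of tokens that are at least two-way ambiguous.
def ambiguous_error_rate(overall_rate, ambiguous_fraction=0.55):
    """Error rate measured on ambiguous tokens alone."""
    return overall_rate / ambiguous_fraction

# The 20.7% baseline test-set error thus corresponds to roughly 37.6%
# on ambiguous tokens, i.e., nearly double the overall figure.
print(round(ambiguous_error_rate(0.207), 3))
```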
The tagging itself, in contrast to training, is very fast: the average speed is about 300 words/sec on morphologically prepared data on the same machine. The results are summarized in Table 3.</Paragraph>
<Paragraph position="6"> There is no apparent overtraining yet. It does appear, however, when the threshold is lowered: we tested this on a smaller training set of 35,000 words, where overtraining started to occur once the threshold was down to 2-3.</Paragraph>
<Paragraph position="7"> Table 4 compares the results achieved here with the previous experiments on Czech tagging (Hajič, Hladká, 1997). It shows a more than 50% improvement over the best error rate achieved so far. Moreover, the amount of training data used was smaller than that needed for the HMM experiments. We also performed an experiment using 35,000 training words, which yielded results about 4% worse (88% combined tag accuracy).</Paragraph>
<Paragraph position="8"> Finally, Table 5 compares results (for different training thresholds) obtained on the larger training data using the &quot;separate&quot; prediction method discussed so far with results obtained through a modification whose key point is that it considers only &quot;Valid (sub)Tag Combinations&quot; (VTC). The probability of a tag is computed as a simple (normalized) product of subtag probabilities, thus assuming subtag independence. The &quot;winner&quot; is presented in boldface. As expected, the overall error rate is always better using the VTC method, but some of the subtags are (sometimes) better predicted using the &quot;separate&quot; prediction method. This could have important practical consequences if, for example, only the POS or SUBPOS is of interest.</Paragraph> </Section></Paper>