<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1042">
  <Title>Statistical Morphological Disambiguation for Agglutinative Languages</Title>
  <Section position="6" start_page="287" end_page="290" type="evalu">
    <SectionTitle>
5 Experiments and Results
</SectionTitle>
    <Paragraph position="0"> To evaluate our models, we first trained them and then tried to morphologically disambiguate our test data. For statistical modeling we used SRILM, the SRI language modeling toolkit (Stolcke, 1999).</Paragraph>
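A toolkit such as SRILM estimates n-gram statistics from the training text before applying smoothing. As a minimal illustration of the underlying counting step (the function names are ours, and no smoothing is applied, unlike SRILM's defaults):

```python
from collections import Counter

def train_ngram_counts(sentences):
    """Collect unigram/bigram/trigram counts over token sequences,
    padding each sentence with boundary markers."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        for i, w in enumerate(padded):
            uni[w] += 1
            if i >= 1:
                bi[(padded[i - 1], w)] += 1
            if i >= 2:
                tri[(padded[i - 2], padded[i - 1], w)] += 1
    return uni, bi, tri

def trigram_mle(tri, bi, w1, w2, w3):
    """Unsmoothed maximum-likelihood estimate of P(w3 | w1, w2)."""
    context = bi[(w1, w2)]
    return tri[(w1, w2, w3)] / context if context else 0.0
```

In practice the tokens here would be morphological parses (or their components) rather than surface words, and a real toolkit would add discounting to handle unseen n-grams.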
    <Paragraph position="1"> Both the test data and the training data were collected from the web resources of a Turkish daily newspaper. The tokens were analyzed using the morphological analyzer developed by Oflazer (1994). The ambiguity of the training data was then reduced from 1.75 to 1.55 using a preprocessor that disambiguates lexicalized and non-lexicalized collocations, removes certain obviously impossible parses, and tries to analyze unknown words with an unknown word processor. The training data consists of the unambiguous sequences (US), comprising about 650K tokens in a corpus of 1 million tokens, and two sets of manually disambiguated corpora of 12,000 and 20,000 tokens. The idea of using unambiguous sequences is similar to Brill's work on unsupervised learning of disambiguation rules for POS tagging (1995b).</Paragraph>
    <Paragraph position="2"> The test data consists of 2763 tokens, 935 (~34%) of which have more than one morphological analysis after preprocessing. The ambiguity of the test data was reduced from 1.74 to 1.53 after preprocessing.</Paragraph>
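The ambiguity figures quoted here (e.g., 1.74 parses per token before preprocessing) are averages over the number of analyses each token receives; a sketch with our own function names:

```python
def average_ambiguity(parse_counts):
    """Average number of morphological parses per token.
    parse_counts: one integer per token (number of analyses)."""
    return sum(parse_counts) / len(parse_counts)

def ambiguous_fraction(parse_counts):
    """Fraction of tokens with more than one analysis."""
    return sum(1 for n in parse_counts if n > 1) / len(parse_counts)
```

For example, four tokens with 1, 2, 1, and 3 analyses give an average ambiguity of 1.75 and an ambiguous fraction of 0.5.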
    <Paragraph position="3"> As our evaluation metric, we used accuracy, defined as follows: accuracy = (# of correct parses / # of tokens) x 100. The accuracy results are given in Table 4. For all cases, our models performed better than the baseline tag model. As expected, the tag model suffered considerably from data sparseness. Using all of our training data, we achieved an accuracy of 93.95%, which is 2.57 percentage points better than the tag model trained using the same amount of data. In all three models we assume that the roots and the IGs are independent. Model 1 assumes that an IG in a word depends on the last IGs of the two previous words: P(IG_{i,k} | (r_{i-2}, IG_{i-2,1}, ..., IG_{i-2,n_{i-2}}), (r_{i-1}, IG_{i-1,1}, ..., IG_{i-1,n_{i-1}}), r_i, IG_{i,1}, ..., IG_{i,k-1})</Paragraph>
    <Paragraph position="5"> In order to simplify the notation, we have defined the following:</Paragraph>
    <Paragraph position="7"> Models 2 and 3 gave similar results; Model 2 suffered from data sparseness slightly more than Model 3, as expected.</Paragraph>
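The accuracy metric defined above is a straightforward token-level score; a minimal sketch (the function name is ours):

```python
def accuracy(predicted_parses, gold_parses):
    """Accuracy = (# of correct parses / # of tokens) * 100,
    comparing one predicted parse per token against the gold standard."""
    if len(predicted_parses) != len(gold_parses):
        raise ValueError("prediction/gold length mismatch")
    correct = sum(p == g for p, g in zip(predicted_parses, gold_parses))
    return 100.0 * correct / len(gold_parses)
```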
    <Paragraph position="8"> Surprisingly, the bigram version of Model 1 (i.e., Equation (7), but with bigrams in the root and IG models) also performs quite well. If we consider just the syntactically relevant morphological features and ignore any semantic features that we mark in morphology, the accuracy increases a bit further. These stem from two properties of Turkish. Most Turkish root words also have a proper noun reading when written with the first letter capitalized. We count it as an error if the tagger does not get the correct proper noun marking for a proper noun. But this is usually impossible, especially at the beginning of sentences, where the tagger cannot exploit capitalization and has to back off to a lower-order model. In almost all such cases, all syntactically relevant morphosyntactic features except the proper noun marking are actually correct. Another important case is the pronoun o, which has both personal pronoun (s/he) and demonstrative pronoun (it) readings (in addition to a syntactically distinct determiner reading (that)).</Paragraph>
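Backing off to a lower-order model when the higher-order context is unseen can be sketched as follows. This uses a simple multiplicative ("stupid") backoff purely for illustration; it is not the smoothing scheme the paper's toolkit actually applies:

```python
def backoff_score(tri, bi, uni, total, w1, w2, w3, alpha=0.4):
    """Score P(w3 | w1, w2) from raw count dictionaries, falling back
    to bigram and then unigram estimates (scaled by alpha) whenever
    the higher-order context was never observed in training."""
    if bi.get((w1, w2), 0) > 0 and tri.get((w1, w2, w3), 0) > 0:
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if uni.get(w2, 0) > 0 and bi.get((w2, w3), 0) > 0:
        return alpha * bi[(w2, w3)] / uni[w2]
    return alpha * alpha * uni.get(w3, 0) / total
```

At the start of a sentence, where no two-word history exists, a tagger using such a scheme is effectively scoring candidates with the lower-order estimates, which is why capitalization cues are unavailable there.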
    <Paragraph position="9"> Resolution of this is always by semantic considerations. When we count as correct any errors involving such semantic marker cases, we get an accuracy of 95.07% in the best case (cf. 93.95% for Model 1). This is slightly better than the precision figures reported earlier on morphological disambiguation of Turkish using constraint-based techniques (Oflazer and Tür, 1997). Our results are also slightly better than the results on Czech of Hajič and Hladká (1998). Megyesi (1999) reports a 95.53% accuracy on Hungarian (a language whose features relevant to this task are very close to those of Turkish), with just the POS tags being correct. In our model this corresponds to the root and the POS tag of the last IG being correct, and the accuracy of our best model under this assumption is 96.07%.</Paragraph>
    <Paragraph position="10"> When POS tags and subtags are considered, the reported accuracy for Hungarian is 91.94%, while the corresponding accuracy in our case is 95.07%.</Paragraph>
    <Paragraph position="11"> We can also note that the results presented by Ezeiza et al. (1998) for Basque are better than ours. The main reason for this is that they employ a much more sophisticated (compared to our preprocessor) constraint-grammar based system, which improves precision without reducing recall. (Note that any word form is a potential first name or a last name.)</Paragraph>
    <Paragraph position="12"> Statistical techniques applied after this disambiguation yield a better accuracy compared to starting from a more ambiguous initial state.</Paragraph>
    <Paragraph position="13"> Since our models assumed that we have independent models for disambiguating the root words and the IGs, we ran experiments to see the contribution of the individual models. Table 5 summarizes the accuracy results of the individual models for the best case.</Paragraph>
    <Paragraph position="14"> There are quite a number of classes of words which are always ambiguous, and the preprocessing that we have employed in creating the unambiguous sequences can never resolve these cases. Thus statistical models trained using only the unambiguous sequences as the training data do not handle these ambiguous cases at all. This is why the accuracy results with only unambiguous sequences are significantly lower (row 1 in Table 4). The manually disambiguated training sets have such ambiguities resolved, so those models perform much better.</Paragraph>
    <Paragraph position="15"> An analysis of the errors indicates the following: In 15% of the errors, the last IG of the word is incorrect but the root and the rest of the IGs, if any, are correct. In 3% of the errors, the last IG of the word is correct but either the root or some of the previous IGs are incorrect. In 82% of the errors, neither the last IG nor any of the previous IGs are correct. Along a different dimension, in about 51% of the errors, the root and its part-of-speech are not determined correctly, while in 84% of the errors, the root and the first IG combination is not correctly determined.</Paragraph>
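The three-way breakdown above can be reproduced mechanically by comparing each mis-tagged token's parse against the gold standard. A sketch, using our own representation of a parse as a (root, IG-list) pair; the category labels and percentages in the comments are those reported in the text:

```python
def classify_error(gold, pred):
    """gold, pred: (root, [ig_1, ..., ig_n]) pairs for one mis-tagged token.
    Returns which part of the parse went wrong."""
    g_root, g_igs = gold
    p_root, p_igs = pred
    last_ok = bool(g_igs) and bool(p_igs) and g_igs[-1] == p_igs[-1]
    rest_ok = g_root == p_root and g_igs[:-1] == p_igs[:-1]
    if not last_ok and rest_ok:
        return "last IG wrong"             # 15% of errors
    if last_ok and not rest_ok:
        return "root or earlier IG wrong"  # 3% of errors
    return "last IG and more wrong"        # 82% of errors
```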
  </Section>
</Paper>