<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1042">
  <Title>Learning Morphological Disambiguation Rules for Turkish</Title>
  <Section position="2" start_page="0" end_page="328" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Morphological disambiguation is the task of selecting the correct morphological parse for a given word in a given context. The possible parses of a word are generated by a morphological analyzer. In Turkish, close to half the words in running text are morphologically ambiguous. Below is a typical word, masalı, with three possible parses.</Paragraph>
    <Paragraph position="2"> The first two parses start with the same root, masal (= story, fable), but the interpretation of the following +ı suffix is the Accusative marker in one case, and third person possessive agreement in the other. The third parse starts with a different root, masa (= table), followed by a derivational suffix +lı (= with) which turns the noun into an adjective. The symbol ^DB represents a derivational boundary and splits the parse into chunks called inflectional groups (IGs).1 We will use the term feature to refer to individual morphological features like +Acc and +With; the term IG to refer to groups of features split by derivational boundaries (^DB); and the term tag to refer to the sequence of IGs following the root.</Paragraph>
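The parse structure described above can be illustrated with a minimal sketch (a hypothetical helper, not code from the paper), splitting an Oflazer-style parse string into its root and inflectional groups, assuming "^DB+" marks each derivational boundary:

```python
def split_parse(parse):
    """Return (root, list of IGs), each IG being a list of features."""
    root, _, tag = parse.partition("+")          # root precedes the first "+"
    igs = [ig.split("+") for ig in tag.split("^DB+")]  # IGs are "^DB"-separated
    return root, igs

root, igs = split_parse("masa+Noun+A3sg+Pnon+Nom^DB+Adj+With")
# root is "masa"; igs holds a nominal IG and a derived adjectival IG
```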
    <Paragraph position="3"> Morphological disambiguation is a useful first step for higher level analysis of any language, but it is especially critical for agglutinative languages like Turkish, Czech, Hungarian, and Finnish. These languages have a relatively free constituent order, and syntactic relations are partly determined by morphological features. Many applications including syntactic parsing, word sense disambiguation, text-to-speech synthesis and spelling correction depend on accurate analyses of words.</Paragraph>
    <Paragraph position="4"> An important qualitative difference between part of speech tagging in English and morphological disambiguation in an agglutinative language like Turkish is the number of possible tags that can be assigned to a word. Typical English tag sets include fewer than a hundred tag types representing syntactic and morphological information. The number of potential morphological tags in Turkish is theoretically unlimited. We have observed more than ten thousand tag types in our training corpus of a million words. The high number of possible tags poses a data sparseness challenge for the typical machine learning approach, somewhat akin to what we observe in word sense disambiguation.</Paragraph>
    <Paragraph position="5"> One way out of this dilemma could be to ignore the detailed morphological structure of the word and focus on determining only the major and minor parts of speech. However, (Oflazer et al., 1999) observes that modifier words in Turkish can have dependencies on any one of the inflectional groups of a derived word. For example, in mavi masalı oda (= the room with a blue table) the adjective mavi (= blue) modifies the noun root masa (= table) even though the final part of speech of masalı is an adjective. Therefore, the final part of speech and inflection of a word do not carry sufficient information for the identification of the syntactic dependencies it is involved in. One needs the full morphological analysis.</Paragraph>
    <Paragraph position="6"> Our approach to the data sparseness problem is to consider each morphological feature separately.</Paragraph>
    <Paragraph position="7"> Even though the number of potential tags is unlimited, the number of morphological features is small: the Turkish morphological analyzer we use (Oflazer, 1994) produces tags that consist of 126 unique features. For each unique feature f, we take the subset of the training data in which one of the parses for each instance contains f. We then split this subset into positive and negative examples depending on whether the correct parse contains the feature f. These examples are used to learn rules using the Greedy Prepend Algorithm (GPA), a novel decision list learner.</Paragraph>
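The per-feature training-set construction described above might be sketched as follows (the instance format and names are illustrative assumptions, not the authors' code); each candidate parse is represented as a set of features:

```python
def build_examples(feature, instances):
    """instances: list of (candidate_parses, correct_parse, context) triples,
    where parses are sets of morphological features.
    Returns labeled (context, label) pairs for the decision list learner."""
    examples = []
    for parses, correct, context in instances:
        if any(feature in p for p in parses):  # keep only instances where f is in play
            label = feature in correct          # positive iff the correct parse has f
            examples.append((context, label))
    return examples
```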
    <Paragraph position="8"> To predict the tag of an unknown word, the morphological analyzer is first used to generate all its possible parses. The decision lists are then used to predict the presence or absence of each of the features contained in the candidate parses. The results are probabilistically combined, taking into account the accuracy of each decision list, to select the best parse. The resulting tagging accuracy is 96% on a hand tagged test set.</Paragraph>
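One plausible reading of this combination step (the exact formula is not given in this section, so this is a sketch under stated assumptions) is to score each candidate parse by multiplying, over its features, the probability that the feature is present, weighting each decision list's vote by its estimated accuracy:

```python
def best_parse(candidates, predict, accuracy):
    """candidates: list of candidate parses, each a set of features.
    predict(f): decision list vote, True if feature f is predicted present.
    accuracy[f]: estimated accuracy of the decision list for feature f."""
    def score(parse):
        s = 1.0
        for f in parse:
            # a "present" vote supports the parse with weight accuracy[f];
            # an "absent" vote leaves only the residual 1 - accuracy[f]
            s *= accuracy[f] if predict(f) else 1.0 - accuracy[f]
        return s
    return max(candidates, key=score)
```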
    <Paragraph position="9"> A more direct approach would be to train a single decision list using the full tags as the target classification. Given a word in context, such a decision list assigns a complete morphological tag instead of predicting individual morphological features. As such, it does not need the output of a morphological analyzer and should be considered a tagger rather than a disambiguator. For comparison, such a decision list was built, and its accuracy was determined to be 91% on the same test set.</Paragraph>
    <Paragraph position="10"> The main reason we chose to work with decision lists and the GPA algorithm is their robustness to irrelevant or redundant attributes. The input to the decision lists includes the suffixes of all possible lengths and character-type information within a five-word window. Each instance ends up with 40 attributes on average, which are highly redundant and mostly irrelevant. GPA is able to sort out the relevant attributes automatically and build a fairly accurate model. Our experiments with Naive Bayes resulted in significantly worse performance. Typical statistical approaches include the tags of the previous words as inputs to the model. GPA was able to deliver good performance without using the previous tags as inputs, because it was able to extract equivalent information implicit in the surface attributes. Finally, unlike most statistical approaches, the resulting models of GPA are human readable and open to interpretation, as Section 3.1 illustrates.</Paragraph>
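The surface attributes mentioned above can be sketched roughly as follows (attribute names and encoding are invented for illustration; the paper's actual attribute set may differ): all suffixes of each word plus coarse character-type flags, collected over a five-word window and prefixed with their position relative to the target word.

```python
def word_attributes(word):
    """Suffixes of every length plus simple character-type flags."""
    attrs = {f"suffix={word[i:]}" for i in range(1, len(word))}
    attrs.add(f"capitalized={word[:1].isupper()}")
    attrs.add(f"has_digit={any(c.isdigit() for c in word)}")
    return attrs

def window_attributes(words, i, size=5):
    """Attributes of all words within a size-word window centered at i,
    tagged with their offset from the target word."""
    half = size // 2
    attrs = set()
    for j in range(max(0, i - half), min(len(words), i + half + 1)):
        attrs |= {f"{j - i}:{a}" for a in word_attributes(words[j])}
    return attrs
```

This redundancy (every suffix length, every window position) is exactly what the text says GPA tolerates well.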
    <Paragraph position="11"> The next section will review related work. Section 3 introduces decision lists and the GPA training algorithm. Section 4 presents the experiments and the results.</Paragraph>
  </Section>
</Paper>