<?xml version="1.0" standalone="yes"?>
<Paper uid="E87-1005">
  <Title>A MORPHOLOGICAL PROCESSOR FOR MODERN GREEK</Title>
  <Section position="2" start_page="0" end_page="26" type="metho">
    <SectionTitle>
4. A list of phonemes described as sets of features
</SectionTitle>
    <Paragraph position="0"> The same file also contains a set of phonological rules that generate lexical phonological phenomena. These rules govern the permissible correspondences between the form of entries as listed in the dictionary and the form they take when combined in sequences of morphemes.</Paragraph>
    <Paragraph position="1"> These files are used for both analysis and generation. Morphological analysis consists of parsing inflected input words with respect to the word grammar. The output of the parse is a set of stems associated with the appropriate morpho-syntactic information.</Paragraph>
    <Paragraph position="2"> The generation of a given inflected word consists of: a. determining its stem by morphological analysis;</Paragraph>
    <Paragraph position="3"> b. generating all or a subset of the permissible word forms.</Paragraph>
    <Paragraph position="4"> For the needs of this presentation, lexical items have been transcribed in a semi-phonological manner. According to this transcription, all Greek vowels written as double characters are kept as such:</Paragraph>
    <Paragraph position="6"> Moreover, the sounds [i] and [o], written in Greek as η and ω respectively, are transcribed as i: and o:. The transcription of these two vowels recalls their Ancient Greek status as long vowels.</Paragraph>
    <Paragraph position="7"> As far as accent is concerned, we decided to exclude this aspect from the present form of the processor. Accentuation in Greek is a linguistic problem which has not been solved as yet. We are working on this matter and we hope to implement accent in the near future.</Paragraph>
    <Paragraph position="8"> The morphological processing is controlled by a finite automaton (1) with the help of the dictionary and the word grammar, which controls word formation and carries out the transmission of the linguistic information needed for the processing. (Footnote 1: For a detailed discussion of the control automaton, cf. Courtin et al. 1969.)</Paragraph>
    <Paragraph position="9"> In certain cases, the grammar makes use of phonological rules in order to capture lexical phonological phenomena such as insertion, deletion and change.</Paragraph>
    <Paragraph position="10"> The processor is implemented in TURBO-PROLOG (version 1.0) running under MS-DOS (version 3.10) on an IBM-XT with 640 kB of main memory. It consists of an analysis sub-module and a generation sub-module.</Paragraph>
  </Section>
  <Section position="3" start_page="26" end_page="27" type="metho">
    <SectionTitle>
2. Linguistic assumptions
</SectionTitle>
    <Paragraph position="0"> The theoretical framework underlying the linguistic aspects of the project is that of Generative Morphology, in particular the recent work by Lieber 1980, Selkirk 1982, Kiparsky 1982 and others.</Paragraph>
    <Paragraph position="1"> In developing our system, we have adopted the proposals made in Ralli's study on Greek morphology (Ph.D. diss., 1987). We therefore assume that the Greek lexicon contains a list of entries (the dictionary) and a grammar which combines morphology with phonology. The dictionary is morpheme-based. It contains stems and affixes which are associated with the following information fields:  a. The string in its basic phonological form.</Paragraph>
    <Paragraph position="2"> b. Reference to possible allomorphic variations of the string which are not productively generated by rule.</Paragraph>
    <Paragraph position="3"> c. Specifications of grammatical category and other morpho-syntactic features that characterize the particular entries.</Paragraph>
    <Paragraph position="4"> d. The meaning.</Paragraph>
    <Paragraph position="5"> e. Diacritic marks, which are integers permitting the correct mapping between the stem and the affix where this cannot be done by rule.</Paragraph>
    <Paragraph position="6"> (1) Stem Affix</Paragraph>
    <Paragraph position="8"> In our work, diacritic marks replace the traditional use of declensions and conjugations, which fail to divide nouns and verbs into inflectional classes.</Paragraph>
    <Paragraph position="9"> The inflectional structure of words is handled by a grammar which assigns a binary tree structure to the words in question. The rules are of the form  (2) Word → stem Infl, where Word and stem are lexical categories and Infl indicates the inflectional ending. For nominal stems, Infl corresponds to a single affix marked for number and case.</Paragraph>
    <Paragraph position="10"> (3) Infl → affix. Example: δromos → δrom-os (nom, sg) &quot;street&quot;. For verbs, the constituent Infl refers either to one or to two affixes. In the latter case, the two affixes belong to the endings of verbal types that are aspectually marked.</Paragraph>
    <Paragraph position="11"> (4) Infl → affix Infl. Example: γrapsame &quot;we wrote&quot; → γrap &quot;write&quot; + s [perf] + ame</Paragraph>
    <Paragraph position="13"> Note that the stem γrap is listed in the dictionary as γraf. The consonant [f] is changed to [p] because of the [s] that follows. The phonological rule in question is lexical and it applies at the morpheme boundary. As such, the rule is morphologically conditioned and it allows exceptions. When verbal types do not contain an aspectual marker, Infl refers to a single affix.</Paragraph>
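    <Paragraph> The boundary rule just described can be illustrated with a minimal sketch. It is written in Python rather than the Turbo Prolog of the actual system, the function name apply_boundary_rule is hypothetical, and the exceptions that the rule allows are not modelled. <![CDATA[
```python
def apply_boundary_rule(stem, affix):
    """Concatenate stem and affix across a morpheme boundary.

    Assumption: the only change modelled is stem-final f becoming p
    when the following affix begins with s (as in the text's example).
    """
    if stem.endswith("f") and affix.startswith("s"):
        # The rule looks only at the two segments adjacent to the boundary.
        stem = stem[:-1] + "p"
    return stem + affix


def build_aspectual_form(stem, aspect_marker, ending):
    """Build a form with two affixes: stem + aspectual marker + ending."""
    return apply_boundary_rule(stem, aspect_marker) + ending
```
]]> Under these assumptions, build_aspectual_form applied to the dictionary stem γraf, the marker s and the ending ame gives γrapsame, while joining γraf directly with the ending o: leaves the stem unchanged.</Paragraph>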
    <Section position="1" start_page="26" end_page="26" type="sub_section">
      <SectionTitle>
3.1 The dictionary structure
</SectionTitle>
      <Paragraph position="0"> In our system, the dictionary consists of a sequence of entries, each in the form of a Prolog term.</Paragraph>
      <Paragraph position="1"> It should be noted that no significant semantic information is present in our entries because that field is not yet exploited. Similarly, the syntactic information concerning subcategorization properties of lexical entries is not taken into account.</Paragraph>
      <Paragraph position="2"> The dictionary also contains information that permits the &quot;linking&quot; with the grammar. So, apart from the linguistic information mentioned in section 2, every entry of the dictionary also contains: a. a list of rules that permit the use of a particular entry (rules that have the entry as their terminal symbol);</Paragraph>
      <Paragraph position="3"> b. a list of validation rules (rules that can be applied after each use of that entry).</Paragraph>
      <Paragraph position="4"> As far as morphology is concerned, forms can be arranged into classes. We arbitrarily choose one element of each class, called a &quot;model&quot;, and every stem in the dictionary refers to a model. Morphological information is found at the model level. In this way, the size of the dictionary is significantly reduced.</Paragraph>
      <Paragraph position="5"> The model file also consists of a sequence of entries, each in the form of a Prolog term. Each model includes information concerning: a. the form of the string, b. the &quot;basic initial rule&quot; which identifies the string, c. the possible diacritic mark, d. the set of morpho-syntactic features, e. the validation rules which substitute for word formation rules.</Paragraph>
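      <Paragraph> The split between dictionary entries and models can be pictured with the following sketch. It uses Python dataclasses instead of Prolog terms; the class names, the field names, the reading of the dictionary entry's arguments, and the sample feature and validation values are all assumptions made for illustration, not the system's actual layout. <![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class DictEntry:
    """A stem as listed in the dictionary (field meanings are assumed)."""
    string: str                                     # basic phonological form
    model: str                                      # model the stem refers to
    allomorphs: list = field(default_factory=list)  # unpredictable variants


@dataclass
class Model:
    """Morphological information shared by every stem of one class."""
    string: str          # form of the string
    initial_rule: str    # "basic initial rule" identifying the string
    diacritics: list     # possible diacritic marks (integers)
    features: list       # morpho-syntactic features
    validations: list    # validation rules usable after this entry


def morphology_of(entry, models):
    """Follow the entry's model reference to its morphological information."""
    return models[entry.model]


# Hypothetical data: the feature and validation values are illustrative only.
MODELS = {"avl": Model("avl", "init", [1], ["noun"], ["n21", "n23"])}
ENTRY = DictEntry("avl", "avl")
```
]]> Because the morphological information is stored once per model, adding a further stem of the same class would only require one more short DictEntry that names the existing model.</Paragraph>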
    </Section>
    <Section position="2" start_page="26" end_page="27" type="sub_section">
      <SectionTitle>
3.2 Examples from the dictionary
</SectionTitle>
      <Paragraph position="0"> We did not write separate dictionary entries for affixes because each affix is a model on its own.</Paragraph>
      <Paragraph position="1"> Therefore, information associated with an affix model must cover all unpredictable information listed within the corresponding dictionary entry.</Paragraph>
      <Paragraph position="2"> Instead of a &quot;basic initial rule&quot;, every affix model refers to a set of rules that govern the combination of the affix with a particular stem. An affix that terminates a word is identified by an empty set of validation rules.</Paragraph>
      <Paragraph position="3"> Example of an affix model (fields: Entry, Rules, Diac., Feat., Val.):</Paragraph>
      <Paragraph position="4"> af(&quot;o&quot;, [n12, a4], [3], [nom, sg], [])</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="27" end_page="28" type="metho">
    <SectionTitle>
4. The grammar
</SectionTitle>
    <Paragraph position="0"> In order to carry out the processing we use a &quot;validation grammar&quot; as defined in Courtin 1977.</Paragraph>
    <Section position="1" start_page="27" end_page="27" type="sub_section">
      <SectionTitle>
4.1 Review of validation grammars
</SectionTitle>
      <Paragraph position="0"> A validation grammar $G_V$ is a 4-tuple $G_V = (V_{TV}, S_V, P_V, E)$, where $V_{TV}$ is a vocabulary of terminal symbols and $E$ is a subset of the set of integers.</Paragraph>
      <Paragraph position="1"> $S_V \in \mathcal{P}(E)$ (the power set of $E$) is called the axiom, and $P_V$ is a finite set of production rules.</Paragraph>
      <Paragraph position="2"> A production is an element of the application</Paragraph>
      <Paragraph position="4"> A validation grammar is equivalent to a regular grammar since they generate the same language.</Paragraph>
      <Paragraph position="5"> Consequently, there is a finite automaton that recognizes the strings generated by a validation grammar.</Paragraph>
      <Paragraph position="6"> Property 2: The number of production rules of a validation grammar is less than or equal to the number of production rules of its equivalent regular grammar.</Paragraph>
    </Section>
    <Section position="2" start_page="27" end_page="28" type="sub_section">
      <SectionTitle>
4.2 Control, Transmission and phonological changes
</SectionTitle>
      <Paragraph position="0"> Control is carried out with the help of validations which are redefined after the application of each rule. In our system, validation rules consist of a list of Prolog clauses.</Paragraph>
      <Paragraph position="1"> Transmission concerns the grammatical category and other morpho-syntactic features.</Paragraph>
      <Paragraph position="2"> Linguistically, we regard stems as the heads of inflected words. As such, they contribute the categorial specifications of the words. Moreover, all morpho-syntactic features of inflectional affixes are also copied to the word. In word structures built in the form of a tree, features are percolated to the mother node according to the Percolation Principle as formulated by Selkirk.</Paragraph>
      <Paragraph position="3">  (1) Percolation Principle (Selkirk 1982) a. If a head has a feature specification [αF_i], α ≠ u, its mother node must be specified [αF_i], and vice versa.</Paragraph>
      <Paragraph position="4"> b. If a non-head has a feature specification [βF_j] and the head has the feature specification [uF_j], then the mother node must have the feature specification [βF_j]. (page 76).</Paragraph>
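      <Paragraph> The two clauses of the principle can be read as a small merging procedure over feature sets. The sketch below is in Python rather than Prolog and makes several assumptions: features are held in dictionaries, the string u stands for an unspecified value, a feature absent from the head counts as unspecified, and the function name percolate is hypothetical. <![CDATA[
```python
UNSPECIFIED = "u"


def percolate(head, non_head):
    """Compute the mother node's features from the head and the non-head."""
    mother = {}
    # Clause (a): every specified feature of the head goes to the mother.
    for name, value in head.items():
        if value != UNSPECIFIED:
            mother[name] = value
    # Clause (b): a non-head feature percolates only where the head
    # leaves that feature unspecified.
    for name, value in non_head.items():
        head_value = head.get(name, UNSPECIFIED)
        if value != UNSPECIFIED and head_value == UNSPECIFIED:
            mother[name] = value
    return mother


# Illustrative feature sets for a verbal stem and a person/number ending.
STEM = {"category": "verb", "person": UNSPECIFIED, "number": UNSPECIFIED}
AFFIX = {"person": 1, "number": "sg"}
```
]]> With these illustrative feature sets, percolate(STEM, AFFIX) yields a word node carrying the category of the stem together with the person and number of the affix; a feature specified on both would keep the head's value.</Paragraph>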
      <Paragraph position="5"> The principle in question is incorporated in our validation rules where, for each inflected word, it is determined which features are taken from the stem and which come from the affix.</Paragraph>
      <Paragraph position="6">  where &quot;concat&quot; is a Prolog predicate performing the concatenation of two strings and &quot;append list&quot; is a Prolog predicate performing the concatenation of two lists.</Paragraph>
      <Paragraph position="7"> However, according to Ralli's study, features are not only percolated to words from stems and affixes. Feature values may also be inserted in certain underspecified environments. For instance, when an inflected word fails to take certain features from both the stem and the ending, the rule takes over the role of adding them. Consider the verbal form γrafo: &quot;I write&quot;. It takes the category value from the stem (γraf-) and the features of person and number from the affix (-o:). It is clear that at this point γrafo: is underspecified because, besides the values of person and number, Greek verbal forms must be characterized by aspect, tense and voice. We therefore assume that specific values of these last three attributes are inserted by the rule governing the combination of the stem γraf- with the ending -o:.</Paragraph>
      <Paragraph position="8">  It is worth noting that a validation rule can also take into account instances of morpho-phonological phenomena.</Paragraph>
      <Paragraph position="9"> 4.2.1 Morpho-phonological insertion. In Greek, in several cases, transition elements appear at a morpheme boundary between two constituents (cf. Ralli 1987). Both the insertion and the phonological form of the elements are always conditioned by the morphological environment.</Paragraph>
      <Paragraph position="10"> Nominal as well as verbal inflection undergo morpho-phonological insertion depending on the kind of stem that is involved in the process. An example of morpho-phonological insertion is the verbal thematic vowel.</Paragraph>
      <Paragraph position="11">  (1) Stem + Th.V. + Af: γraf-o-mai &quot;I am written&quot;; γraf-e-tai &quot;it is written&quot;. Similarly, in certain nouns and adjectives, a vowel appears in the singular between the stem and the inflection.</Paragraph>
      <Paragraph position="12"> (2) Stem + Th.V. + Af: tami-a-s &quot;cashier&quot;; foiti:t-i:-s &quot;univ. student&quot;.  Insertion is not the only morpho-phonological phenomenon.</Paragraph>
      <Paragraph position="13">  As already mentioned in section 2, verbal inflection undergoes morpho-phonological changes on the stem and/or the affix during the construction of aspectually marked verbal types. Rules performing phonological changes are applied cyclically each time the appropriate lexical string is formed. Phonological rules take into account a list of phonemes described as sets of distinctive features. In our system, phonemes are listed as Prolog terms. Phonological rules are listed as Prolog clauses. Take for example the form δe-s-ame &quot;we tied&quot;. The stem δe- is listed in the dictionary as δen-. The validation rule authorizing the concatenation of δen- and -s- demands the application of a lexical phonological rule responsible for the deletion of the final /n/.</Paragraph>
      <Paragraph position="14"> 4.2.3 The augment rule. It is generally accepted that the augment in Modern Greek must be considered a phonological element introduced in the appropriate morphological environment. That is, an e- is prefixed to forms marked for past, in which it is always accentuated. Given that accentuation is not treated here, we decided to divide verbal stems into those marked and those unmarked for augment. Once a verbal item is built, the e- is added at the beginning of the form in the singular and the third person plural only if the stem carries the feature [aug].</Paragraph>
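      <Paragraph> The condition on the augment stated above can be sketched as a single test. The sketch is in Python rather than Prolog; the function name apply_augment, the representation of stem features as a set, and the encoding of person and number are assumptions, and the caller is assumed to invoke it only for forms marked for past. <![CDATA[
```python
def apply_augment(form, stem_features, person, number):
    """Prefix e- to an already built past form when the stated conditions hold.

    form          -- the concatenated verbal form, without augment
    stem_features -- set of stem features; "aug" marks an augmenting stem
    person        -- 1, 2 or 3
    number        -- "sg" or "pl"
    """
    in_augment_slot = number == "sg" or (number == "pl" and person == 3)
    if "aug" in stem_features and in_augment_slot:
        return "e" + form
    return form
```
]]> For a stem assumed to carry [aug], a first person singular past form such as δesa would come out as eδesa, whereas the first person plural δesame would be left without the prefix, as in the example form δe-s-ame.</Paragraph>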
      <Paragraph position="15"> In our system, the augment rule, also listed as a Prolog clause, is activated by validation rules authorizing the concatenation of a verbal stem and a verbal affix marked for past. The same rules insert the feature value &quot;active&quot;.</Paragraph>
      <Paragraph position="16"> In this way, we obtain:</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="28" end_page="29" type="metho">
    <SectionTitle>
5. The Process
</SectionTitle>
    <Paragraph position="0"> The analysis of a word form is carried out independently of its syntactic environment. Consequently, the analyzer will provide the set of all possible analyses.</Paragraph>
    <Paragraph position="1"> In order to program and store the automaton, we perform a splitting of its transitions and each transition is represented by a rule.</Paragraph>
    <Paragraph position="2"> (1) avli: &quot;yard&quot; (nom/acc singular). Dictionary entries: dict(&quot;avl&quot;, &quot;avl&quot;, []). Model entries: stem(&quot;avl&quot;, [init], [1], ...  The rule init starts the analysis by taking all the information from the dictionary level. The stem &quot;avl&quot; is validated by rules n21 and n23, among others, which also authorize the use of a zero affix. Moreover, they perform morpho-phonological insertion of the transition element -i: during the concatenation of &quot;avl&quot; and &quot;&quot;. The resulting string is avli: in both cases. These rules also perform feature insertions. Rule n21 inserts the feature values [nominative] and [singular], while n23 inserts the feature values [accusative] and [singular].</Paragraph>
    <Paragraph position="3"> The analysis of the form avli: is completed in 27 hundredths of a second (cpu time).</Paragraph>
    <Paragraph position="4"> As already mentioned, the system is reversible. In order to generate all possible forms of avli:, we apply all validation rules of the stem &quot;avl&quot; and thus we obtain:  The generation of all possible forms of avl-(i:) is completed in 43 hundredths of a second (cpu time).</Paragraph>
    <Paragraph position="5"> As an example of the processing of a verbal form, we mention the analysis of δe-s-ame &quot;we tied&quot;, discussed in section 4.2.2, which is completed in 50 hundredths of a second (cpu time), while the generation of all possible forms of δen-(o:) &quot;to tie&quot; is completed in 1 second and 59 hundredths (cpu time).</Paragraph>
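    <Paragraph> The analysis and generation of avli: described above can be mimicked with a toy sketch. It is written in Python, not Prolog, and it replaces the control automaton with a brute-force match over a two-rule table; only the rule names n21 and n23, the transition element i:, the zero affix and the inserted feature values come from the text, while the table layout and the function names are assumptions. <![CDATA[
```python
# stem -> validation rules (the text says "among others"; only two are shown)
STEMS = {"avl": ["n21", "n23"]}

# rule -> (inserted transition element, affix, inserted feature values)
RULES = {
    "n21": ("i:", "", ["nominative", "singular"]),
    "n23": ("i:", "", ["accusative", "singular"]),
}


def analyse(word):
    """Return every (stem, rule, features) analysis of an isolated word."""
    analyses = []
    for stem, rule_names in STEMS.items():
        if not word.startswith(stem):
            continue
        for name in rule_names:
            element, affix, features = RULES[name]
            # A rule applies if stem + inserted element + affix rebuilds the word.
            if stem + element + affix == word:
                analyses.append((stem, name, features))
    return analyses


def generate(stem):
    """Apply every validation rule of the stem and return (form, features)."""
    forms = []
    for name in STEMS[stem]:
        element, affix, features = RULES[name]
        forms.append((stem + element + affix, features))
    return forms
```
]]> Because the word is analysed independently of its syntactic environment, analyse returns both the nominative and the accusative reading of avli:, and generate reuses the same rule table in the opposite direction, which is the sense in which the sketch is reversible.</Paragraph>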
  </Section>
</Paper>