File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/84/p84-1038_metho.xml
Size: 10,945 bytes
Last Modified: 2025-10-06 14:11:38
<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1038"> <Title>A GENERAL COMPUTATIONAL MODEL FOR WORD-FORM RECOGNITION AND PRODUCTION</Title> <Section position="2" start_page="178" end_page="180" type="metho"> <SectionTitle> 4. Two-level rules </SectionTitle> <Paragraph position="0"> There are only two representations in the two-level model: the lexical representation and the surface representation. No intermediate stages &quot;exist&quot;, even in principle. To demonstrate this, we take an example from Finnish morphology. The noun lasi 'glass' represents the productive and most common type of nouns ending in i. The lexical representation of the partitive plural form consists of the stem lasi, the plural morpheme I, and the partitive ending A. In the two-level framework we write the lexical representation lasiIA above the surface form laseja: Lexical representation: l a s i I A Surface representation: l a s e j a This configuration exhibits three morpho-phonological variations: a) Stem-final i is realized as e in front of typical plural forms, i.e. when I follows on the lexical level, schematically: i-e before I (1) b) The plural I itself is realized as j if it occurs between vowels on the surface, schematically:</Paragraph> <Paragraph position="2"> c) The partitive ending, like other endings, agrees with the stem with respect to vowel harmony. An archiphoneme A is used instead of two distinct partitive endings.</Paragraph> <Paragraph position="3"> It is realized as ä or a according to the harmonic value of the stem, schematically: back-V ... A-a (3) The task of the two-level rules is to specify how lexical and surface representations may correspond to each other. For each lexical segment one must define the various possible surface realizations. The rule component should state the necessary and sufficient conditions for each alternative. 
A rule formalism has been designed for expressing such statements.</Paragraph> <Paragraph position="4"> A typical two-level rule states that a lexical segment may be realized in a certain way if and only if a context condition is met. The alternation (1) in the above example can be expressed as the following two-level rule:</Paragraph> <Paragraph position="6"> This rule states that a lexical i may be realized as an e only if it is followed by a plural I, and if we have a lexical i in such an environment, it must be realized as e (and as nothing else). Both statements are needed: the former to exclude i-e correspondences occurring elsewhere, and the latter to prevent the default i-i correspondence in this context.</Paragraph> <Paragraph position="7"> Rule (1') referred to a lexical segment I, and it did not matter which surface character corresponded to it (thus the pair I-=). The following rule governs the realization of I: I-j &lt;=&gt; V ___ V This rule requires that the plural I must be between vowels on the surface. Because certain stem-final vowels are realized as zero in front of plural I, generative phonology orders the rule for plural I to be applied after the rules for stem-final vowels. In the two-level framework there is no such ordering. The rules only state a static correspondence relation, and they are nondirectional and parallel.</Paragraph> <Paragraph position="8"> 5. Rules as automata In the following we construct an automaton which performs the checking needed for the i-e alternation discussed above. Instead of single characters, the automaton accepts character pairs. This automaton (and the automata for other rules) must accept the following sequence of pairs: l-l, a-a, s-s, i-e, I-j, A-a The task of the rule-automaton is to permit the pair i-e if and only if the plural I follows. The following automaton with three states (1, 2, 3) performs this: (1&quot;) State 1 is the initial state of the automaton. 
If the automaton receives pairs without lexical i it will remain in state 1 (the symbol =-= denotes &quot;any other pair&quot;). Receiving a pair i-e causes a transition to state 3. States 1 and 2 are final states (denoted by double circles), i.e. if the automaton is in one of them at the end of the input, the automaton accepts the input. State 3 is, however, a nonfinal state, and the automaton should leave it before the input ends (or else the input is rejected). If the next character pair has plural I as its lexical character (which is denoted by I-=), the automaton returns to state 1. Any other pair will cause the input to be rejected because there is no appropriate transition arc. This part of the automaton accomplishes the &quot;only if&quot; part of the correspondence: the pair i-e is allowed only if it is followed by the plural I.</Paragraph> <Paragraph position="9"> State 2 is needed for the &quot;if&quot; part. If a lexical i is followed by plural I, we must have the correspondence i-e.</Paragraph> <Paragraph position="10"> Thus, if we encounter a correspondence of lexical i other than i-e (i-=), it must not be followed by the plural I. Anything else (=-=) will return the automaton to state 1.</Paragraph> <Paragraph position="11"> Each rule of a two-level description model corresponds to a finite state automaton as in the model of Kay and Kaplan. In the two-level model the rules or the automata operate, however, in parallel instead of being cascaded:</Paragraph> <Section position="1" start_page="179" end_page="180" type="sub_section"> <SectionTitle> Lexical </SectionTitle> <Paragraph position="0"> 
representation</Paragraph> <Paragraph position="1"> [Figure: the rule automata placed in parallel between the lexical and the surface representation.] The rule-automata compare the two representations, and a configuration must be accepted by each of them in order to be valid.</Paragraph> <Paragraph position="2"> The two-level model (and the program) operates in both directions: the same description is utilized as such for producing surface word-forms from lexical representations, and for analyzing surface forms.</Paragraph> <Paragraph position="3"> As it stands now, two-level programs read the rules as tabular automata, e.g. the automaton (1&quot;) is coded as:</Paragraph> <Paragraph position="5"> This entry format is, in fact, more practical than the state transition diagrams.</Paragraph> <Paragraph position="6"> The tabular representation remains more readable even when there are half a dozen states or more. It has also proven to be quite feasible even for those who are linguists rather than computer professionals.</Paragraph> <Paragraph position="7"> Although it is feasible to write morphological descriptions directly as automata, this is far from ideal. The two-level rule formalism is a much more readable way of documenting two-level descriptions, even if hand-compiled automata are used in the actual implementation. A compiler which would accept rules directly in some two-level rule formalism would be of great value. The compiler could automatically transform the rules into finite state automata, and thus facilitate the creation of new descriptions and further development of existing ones.</Paragraph> <Paragraph position="8"> 5. Two-level lexicon system Single two-level rules are at least as powerful as single rules of generative phonology. The two-level rule component as a whole (at least in practical descriptions) appears to be less powerful, because of the lack of extrinsic rule ordering. 
Variations affecting longer sequences of phonemes, or where the relation between the alternatives is phonologically otherwise nonnatural, are described by giving distinct lexical representations. Generalizations are not lost: insofar as the variation pertains to many lexemes, the alternatives are given as a minilexicon referred to by all entries possessing the same alternation.</Paragraph> <Paragraph position="9"> The alternations in words of the following types are described using the minilexicon method: hevonen - hevosen 'horse' vapaus - vapautena - vapauksia 'freedom' The lexical entries of such words give only the nonvarying part of the stem and refer to a common alternation pattern nen/S or s-t-ks/S: hevo nen/S &quot;Horse S&quot;; vapau s-t-ks/S &quot;Freedom S&quot;; The minilexicons for the alternation patterns list the alternative lexical representations and associate them with the appropriate sets of endings:</Paragraph> </Section> </Section> <Section position="3" start_page="180" end_page="180" type="metho"> <SectionTitle> LEXICON nen/S LEXICON s-t-ks/S </SectionTitle> <Paragraph position="0"/> <Paragraph position="2"/> </Section> <Section position="4" start_page="180" end_page="180" type="metho"> <SectionTitle> 6. Current status </SectionTitle> <Paragraph position="0"> The two-level program was first implemented in PASCAL and is running at least on the Burroughs B7800, DEC-20, and large IBM systems. The program is fully operational and reasonably fast (about 0.05 CPU seconds per word, although hardly any effort has been spent to optimize the execution speed). It could be run on 128 kB microcomputers as well. Lauri Karttunen and his students at the University of Texas have implemented the model in INTERLISP (Karttunen 1983, Gajek &amp; al. 1983, Khan &amp; al. 1983). The execution speed of their version is comparable to that of the PASCAL version. 
The two-level model has also been rewritten in Zetalisp (Ken Church at Bell) and in NIL (Hank Bromley in Helsinki and Umeå).</Paragraph> <Paragraph position="1"> The model has been tested by writing a comprehensive description of Finnish morphology covering all types of nominal and verbal inflection including compounding (Koskenniemi, 1983a,b). Karttunen and his students have made two-level descriptions of Japanese, Rumanian, English and French (see articles in TLF 22). At the University of Helsinki, two comprehensive descriptions have been completed: one of Swedish by Olli Blåberg (1984) and one of Old Church Slavonic by Jouko Lindstedt (forthcoming). Further work is in progress in Helsinki for making descriptions for Arabic (Jaakko Hämeen-Anttila) and for Modern Greek (Martti Nyman). The system is also used at the University of Oulu, where a description for Lappish is in progress (Pekka Sammallahti), in Uppsala, where a more comprehensive French description is in progress (Anette Ostling), and in Gothenburg. The two-level model could be part of any natural language processing system.</Paragraph> <Paragraph position="2"> The ability both to analyze and to generate is especially useful. Systems dealing with many languages, such as machine translation systems, could benefit from the uniform language-independent formalism. The accuracy of information retrieval systems can be enhanced by using the two-level model for discarding hits which are not true inflected forms of the search key. The algorithm could also be used for detecting spelling errors.</Paragraph> </Section> </Paper>
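The rule automaton for the i-e alternation, and the requirement that every rule automaton accept a configuration in parallel, can be sketched as a small transition table in Python. This is only an illustrative sketch, not the PASCAL or INTERLISP implementation mentioned above. The names I_E_RULE, FINAL_STATES, step, accepts and accepted_by_all are hypothetical, and so is the dictionary layout; the original tabular coding is not preserved in this text. The arcs leaving state 2 on i-e and on i-= are an assumption, since the description leaves them implicit.

```python
# Transition table for the i-e rule automaton described in the text.
# Keys are (lexical, surface) pairs; "=" is a wildcard, so ("I", "=") means
# "lexical I with any surface character" and ("=", "=") means "any other pair".
# Assumption: the arcs from state 2 on i-e and i-= are not spelled out in the text.
I_E_RULE = {
    1: {("i", "e"): 3, ("i", "="): 2, ("=", "="): 1},
    2: {("i", "e"): 3, ("i", "="): 2, ("I", "="): None, ("=", "="): 1},
    3: {("I", "="): 1},  # any other pair has no arc, so the input is rejected
}
FINAL_STATES = {1, 2}  # state 3 is nonfinal


def step(table, state, pair):
    """Return the next state for one character pair, or None if there is no arc."""
    lexical, surface = pair
    arcs = table[state]
    # Try the most specific label first, then the wildcard labels.
    for key in ((lexical, surface), (lexical, "="), ("=", "=")):
        if key in arcs:
            return arcs[key]
    return None


def accepts(table, finals, pairs, start=1):
    """Run one rule automaton over a sequence of lexical-surface pairs."""
    state = start
    for pair in pairs:
        state = step(table, state, pair)
        if state is None:
            return False
    return state in finals


def accepted_by_all(automata, pairs):
    """A configuration is valid only if every rule automaton accepts it."""
    return all(accepts(table, finals, pairs) for table, finals in automata)


laseja = [("l", "l"), ("a", "a"), ("s", "s"), ("i", "e"), ("I", "j"), ("A", "a")]
print(accepted_by_all([(I_E_RULE, FINAL_STATES)], laseja))
```

With only this one rule loaded, the pair sequence for lasiIA and laseja is accepted, while the same sequence with i-i in place of i-e is rejected in state 2 when the plural I arrives. Further rule automata would be added to the list passed to accepted_by_all rather than chained one after another.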