<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-2074">
  <Title>Morphological Analysis and Synthesis by Automated Discovery and Acquisition of Linguistic Rules</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 An Overview of XMAS
XMAS (Xpert in Morphological Analysis and
</SectionTitle>
    <Paragraph position="0"> Synthesis) is a learning system \[12,13,14\] which consists of a learning element (Meta-XMAS), a knowledge base (KB), and two inference engines: a morphological analyzer (MOA) and a morphological synthesizer (MOS). Figure 1 shows an overview of XMAS and its operational environment (the parser and generator of natural language systems).</Paragraph>
    <Paragraph position="2"> The knowledge base contains a lexicon and a rule base. The lexicon has entries only for word stems. The rule base contains the morphological rules which are learned from training examples. TEACHER, a human trainer, traces the execution of XMAS and provides training examples and critiques for Meta-XMAS. Meta-XMAS, a machine learner, acquires linguistic knowledge by discovering, formulating, generalizing and specializing grammatical rules. MOA solves morphological analysis problems, and MOS morphological synthesis problems, by applying these rules.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Description Languages
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Example Description Language
</SectionTitle>
      <Paragraph position="0"> The rule discovery procedure is initiated by training examples. Two kinds of training examples are used: generalization examples and specialization examples. The generalization examples are used to maintain the completeness of the rule base and the specialization examples are used to maintain the consistency of the rule base \[3\]. A training example consists of a class name, the two strings before and after, and a critique, an optional tag indicating whether the example is a specialization example. An instance of a generalization example is (PLURAL &amp;quot;baby&amp;quot; &amp;quot;babies&amp;quot;).</Paragraph>
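      <Paragraph> The example description language above can be mirrored in a small data structure. The sketch below is ours, not the paper's: the field names and the use of None for an absent critique are assumptions.

```python
from collections import namedtuple

# A training example: a class name, the strings before and after the
# transformation, and an optional critique tag that marks the example
# as a specialization example (field names are our own).
Example = namedtuple("Example", "classname before after critique")

# The paper's generalization example (PLURAL "baby" "babies"):
gen_ex = Example("PLURAL", "baby", "babies", critique=None)
```
</Paragraph>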
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Rule Description Language
</SectionTitle>
      <Paragraph position="0"> The formalism for the representation of morphological rules is string productions, which are similar to if-then production rules \[14\]. A string production is a self-contained operator which consists of a left-hand side (LHS) and a right-hand side (RHS). It is symmetric and therefore applicable bidirectionally.</Paragraph>
      <Paragraph position="1">  The string production can accommodate not only graphemic patterns but also syntactic and phonological features \[1\]. A string production can have one or more names, which are useful for direct access and maintenance of the rules.</Paragraph>
      <Paragraph position="2"> For example, the following rule is a string production:</Paragraph>
      <Paragraph position="4"> where =y/ies% is the name assigned to the rule, whose class name is PLURAL.</Paragraph>
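      <Paragraph> Because a string production is symmetric, the same rule can drive both synthesis and analysis. A minimal sketch, assuming a suffix-only rule format; the dictionary fields and function names are our own, not the paper's notation:

```python
# A string production for PLURAL, named "=y/ies%": '=' stands for the
# unchanged context and '%' for the word boundary.
RULE = {"name": "=y/ies%", "class": "PLURAL", "lhs": "y", "rhs": "ies"}

def synthesize(stem, rule):
    """Apply the rule in the LHS-to-RHS direction (e.g. baby -> babies)."""
    if stem.endswith(rule["lhs"]):
        return stem[: -len(rule["lhs"])] + rule["rhs"]
    return None

def analyze(surface, rule):
    """Apply the same rule in the RHS-to-LHS direction (babies -> baby)."""
    if surface.endswith(rule["rhs"]):
        return surface[: -len(rule["rhs"])] + rule["lhs"]
    return None
```

Applying one and the same rule object in both directions is what lets a single rule base serve both the analyzer and the synthesizer.
</Paragraph>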
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Automated Learning of
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Morphological Rules
4.1 Learning Procedure
</SectionTitle>
      <Paragraph position="0"> The procedure LEARNING (Figure 2) describes the top-level algorithm for rule learning in Meta-XMAS.</Paragraph>
      <Paragraph position="1"> procedure LEARNING(classname, before, after, critique)
  features ← LEXICON(before)
  (mop, rulename) ← DISCOVER(before, after)
  H ← FORMULATE(classname, rulename, features, mop)
  matchset ← MATCH(rulename)
  if (matchset = {})</Paragraph>
      <Paragraph position="3"> Given a training example, Meta-XMAS searches the lexicon for the features of the word stem before. By comparing before with after, a micro-operator (mop) and its name are constructed (DISCOVER). The mop is a description of the transformation procedure from before to after. From these, a hypothetical rule H is generated (FORMULATE). Then it checks whether other rules with the same name already exist (MATCH). If so, the most special rule R is selected (RESOLVE) and H and R are generalized or specialized according to the critique (GENERALIZE or SPECIALIZE). If not, the rule H is appended to the present rule base under the new rule name (CREATE).</Paragraph>
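      <Paragraph> The MATCH/CREATE/RESOLVE skeleton can be sketched in a few lines. Everything below is an assumption made for illustration: a rule is a plain dictionary, specificity is approximated by the number of feature conditions, and generalization/specialization are reduced to intersecting and uniting feature sets.

```python
def learning(classname, mop, rulename, features, critique, rulebase):
    """One LEARNING step, after DISCOVER has produced mop and rulename."""
    h = {"class": classname, "name": rulename,
         "features": list(features), "mop": mop}
    matchset = [r for r in rulebase if r["name"] == rulename]   # MATCH
    if not matchset:
        rulebase.append(h)                                      # CREATE
        return h
    # RESOLVE: the most special rule (toy criterion: most feature conditions)
    r = max(matchset, key=lambda rule: len(rule["features"]))
    if critique == "specialize":        # SPECIALIZE: add conditions
        r["features"] = sorted(set(r["features"]) | set(features))
    else:                               # GENERALIZE: keep shared conditions
        r["features"] = sorted(set(r["features"]) & set(features))
    return r
```
</Paragraph>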
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Discovery of Rules
</SectionTitle>
      <Paragraph position="0"> The rule discovery procedure works as follows.</Paragraph>
      <Paragraph position="1"> First, the input strings before and after of the training example are compared and split using four pointers: lb, rb, la and ra. The pointers lb and la move one grapheme at a time from the left end to the right, while the pointers rb and ra move from the right end to the left, until lb and la (rb and ra, respectively) point to different graphemes.</Paragraph>
      <Paragraph position="2">  This results in dividing each string into three regions (lwindow, main and rwindow), which represent the left context, main string, and right context, respectively. The corresponding contexts of both strings are generalized to the '=' symbol, which means "remains unchanged". The result is a micro-operator (mop). Then a rule name is given to the mop by creating a symbol representing the transformation form. For the training example</Paragraph>
      <Paragraph position="4"> whose name is =y/ies%.</Paragraph>
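      <Paragraph> The four-pointer split described above can be sketched with common prefix/suffix computations. The function name and the mop encoding below are our assumptions; '%' marks the word boundary as in the rule name.

```python
import os

def discover(before, after):
    """Split the strings at the first differing graphemes from each end
    (the lb/la and rb/ra pointer sweeps), generalize the unchanged
    contexts to '=', and derive the rule name from the main strings."""
    lwindow = os.path.commonprefix([before, after])           # lb/la sweep
    b, a = before[len(lwindow):], after[len(lwindow):]
    rwindow = os.path.commonprefix([b[::-1], a[::-1]])[::-1]  # rb/ra sweep
    n = len(rwindow)
    main_b, main_a = (b[:-n], a[:-n]) if n else (b, a)
    mop = {"lhs": "=" + main_b + "%", "rhs": "=" + main_a + "%"}
    rulename = "=" + main_b + "/" + main_a + "%"
    return mop, rulename
```

For the training example (PLURAL "baby" "babies") this yields the rule name =y/ies% from the text.
</Paragraph>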
      <Paragraph position="5"> Finally, a hypothetical rule for the given training example is generated by integrating the class name, the rule name, the features, and the mop.</Paragraph>
      <Paragraph position="6"> In our example, the following rule is generated (in a simplified form):  The acquisition process of rules is an iteration of the creation of new rules and the modification of existing rules. Modification is carried out by generalization on the one hand and specialization on the other. Meta-XMAS uses four kinds of induction rules in generalization \[10\]: 1. Variablization of constant: replace a grapheme by =.</Paragraph>
      <Paragraph position="7"> 2. Dropping AND conditions: decrement the contextual window size.</Paragraph>
      <Paragraph position="8"> 3. Adding OR conditions: union the two grapheme sets.</Paragraph>
      <Paragraph position="9"> 4. Climbing generalization trees: climb the  path to the root of the tree.</Paragraph>
      <Paragraph position="10"> A concept is represented as a set of graphemes or features which can have an explicit name. The generalization tree \[11\] is a tree which describes the generalization relations between the concepts in the domain. Figure 3 shows a generalization tree for Korean graphemes.</Paragraph>
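      <Paragraph> Climbing a generalization tree (induction rule 4) amounts to finding the least common ancestor of two concepts. A toy sketch with an invented tree; the paper's Figure 3 gives such a tree for Korean graphemes.

```python
# Toy generalization tree, child -> parent (invented for illustration).
PARENT = {"a": "vowel", "e": "vowel", "b": "consonant",
          "vowel": "grapheme", "consonant": "grapheme"}

def ancestors(concept):
    """Path from a concept up to the root of the generalization tree."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain

def climb(x, y):
    """Generalize two concepts to their least common ancestor."""
    up_x = ancestors(x)
    for node in ancestors(y):
        if node in up_x:
            return node
    return None
```
</Paragraph>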
      <Paragraph position="11">  Generalization using the tree is a process of following the path to the root on the basis of training examples. This is useful for concept learning in a well-established domain. In general, however, the generalization tree is not given a priori. In this case Meta-XMAS builds the trees for itself (see Appendix A). The two generalization strategies have their strengths and limitations. The first strategy needs more a priori linguistic knowledge, but it is better in learning efficiency. The second strategy is guaranteed to find detailed rules, but is less efficient in learning speed.</Paragraph>
      <Paragraph position="12"> If Meta-XMAS applies only the generalization schemes, the learned rules can be overgeneralized. So at some point (e.g. on false output) the rules should be specialized to avoid the overgeneralization or to handle the exceptions. Meta-XMAS accomplishes this automatically with the aid of specialization examples. The specialization process is a kind of rule creation procedure in which overgeneralization is checked and the overly generalized rule is recovered, by going down the generalization tree to the leaves or eliminating a grapheme from a concept, if necessary.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Morphological Analysis and
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Synthesis
</SectionTitle>
      <Paragraph position="0"> XMAS has two inference engines (or rule interpreters): MOA and MOS. MOA, the morphological analyzer, solves morphological analysis problems by applying the rules in an RHS-driven forward chaining manner \[14\].</Paragraph>
      <Paragraph position="1"> MOS, the morphological synthesizer, solves morphological synthesis problems by applying the rules in an LHS-driven forward chaining manner (see Appendix B). More than one morpheme can be synthesized or analyzed by a single instruction. If more than one rule is applicable during rule selection, MOA/MOS selects the most special rule. So the rules in the rule base need not be ordered, making the maintenance of the rule base very easy.</Paragraph>
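      <Paragraph> The conflict-resolution step, picking the most special applicable rule, can be sketched as follows. The rule set and the specificity measure (total pattern length) are our assumptions, and the '=/s%' default rule is hypothetical.

```python
RULES = [
    {"name": "=/s%",    "lhs": "",  "rhs": "s"},    # hypothetical default plural
    {"name": "=y/ies%", "lhs": "y", "rhs": "ies"},  # the rule from Section 3.2
]

def select(stem, rules):
    """Of all rules whose LHS pattern matches the stem, take the most
    special one, so the rule base itself needs no ordering."""
    applicable = [r for r in rules if stem.endswith(r["lhs"])]
    return max(applicable, key=lambda r: len(r["lhs"]) + len(r["rhs"]))
```
</Paragraph>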
      <Paragraph position="2"> XMAS is implemented on a SUN workstation as a Common-LISP program. A variety of experiments were carried out with XMAS on derivational and inflectional phenomena in almost all of the Korean language and parts of English and German \[17\]. The results were successful in the sense that Meta-XMAS learns the grammatical rules and MOA/MOS correctly solves the morphological analysis and synthesis problems by applying them. Some of the rules learned by XMAS are shown in Appendix C. A more efficient Korean version of XMAS, called MASK \[19\], was applied to the generation of Korean in the English-Korean machine translation system KSHALT \[4\], and MASK is now working effectively.</Paragraph>
      <Paragraph position="3"> improved by the capabilities (1), (2), (4) and (6). The advantage (8) comes especially from (6) and (7). The property (9) is an indirect demonstration that the XMAS approach is a practical one. As with all rule-based systems, XMAS is less efficient at run time in comparison with procedural systems \[6\]. There is a way to improve run-time efficiency: one can construct a rule compiler, like the transducer in KIMMO systems \[5,7\].</Paragraph>
      <Paragraph position="4"> But XMAS is more effective in development and maintenance. In addition to run-time efficiency, these factors should also be considered because, as was mentioned at the beginning, a natural language processing system is a complex knowledge-based system which requires a long period of design, implementation, testing, debugging, and extension.</Paragraph>
      <Paragraph position="5"> In addition to being a practical knowledge acquisition tool, Meta-XMAS in isolation can also be used as a linguists' tool for scientific discovery \[8\], which aids linguists in the discovery and testing of grammatical rules in morphology.</Paragraph>
    </Section>
  </Section>
</Paper>