<?xml version="1.0" standalone="yes"?>
<Paper uid="E87-1012">
  <Title>A TOOL FOR THE AUTOMATIC CREATION, EXTENSION AND UPDATING OF LEXICAL KNOWLEDGE BASES</Title>
  <Section position="3" start_page="70" end_page="70" type="metho">
    <SectionTitle>
THE KNOWLEDGE LEVEL
</SectionTitle>
    <Paragraph position="0"> We used the knowledge representation system KRS (Steels, 1986) to implement the linguistic and lexicographic knowledge. KRS can best be viewed as a glue for connecting and integrating different formalisms (functional, network, rules, frames, predicate logic etc.). New formalisms can also be defined on top of KRS. Its kernel is a frame-based object-oriented language embedded in Lisp, with several useful features. In KRS objects are called concepts. A concept has a name and a concept structure.</Paragraph>
    <Paragraph position="1"> A concept structure is a list of subjects (slots), used to associate declarative and procedural knowledge with a concept. Subjects are also implemented as concepts, which leads to a uniform representation of objects and their associated information.</Paragraph>
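The uniform treatment of concepts and subjects can be illustrated with a minimal sketch (in Python rather than KRS's Lisp; class and method names are our own invention, not KRS's actual API): a concept holds a name and a concept structure of subjects, and each subject is itself a concept.

```python
# Illustrative sketch of the KRS object model described above: a concept
# has a name and a concept structure (a list of subjects/slots), and
# subjects are themselves concepts, giving objects and their associated
# information one uniform representation. Names are hypothetical.

class Concept:
    def __init__(self, name, subjects=None):
        self.name = name
        # Each subject is itself a Concept, keyed by its name.
        self.subjects = {s.name: s for s in (subjects or [])}

    def subject(self, name):
        return self.subjects.get(name)

# A subject carrying a filler is just another Concept whose own
# structure holds the associated knowledge.
spelling = Concept("spelling", [Concept("werkte")])
werkte_form = Concept("werkte-form", [spelling])
```

Because subjects are concepts, the same access machinery applies at every level of the representation.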
    <Paragraph position="2"> KRS has an explicit notion of meaning: each concept has a referent (comparable to the notion of extension) and may have a definition, which is a Lisp form that can be used to compute the referent of the concept within a particular Lisp environment (comparable to the notion of intension). This explicit notion of meaning makes possible a clean interface between KRS and Lisp and between different formalisms.</Paragraph>
    <Paragraph position="3"> Evaluation in KRS is lazy, which means that new objects can always be defined, but are only evaluated when they are accessed. Caching assures that slot fillers are computed only once, after which the result is stored.</Paragraph>
    <Paragraph position="4"> The built-in consistency maintenance system provides the automatic undoing of these stored results when changes which have an effect on them are made. Different inheritance strategies can be specified by the user.</Paragraph>
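The interplay of lazy evaluation, caching and consistency maintenance can be sketched as follows (a toy model, not KRS's actual mechanism; all names are illustrative):

```python
# Hedged sketch of a lazily evaluated, cached slot with a consistency
# maintenance hook: the definition (a thunk) runs only on first access,
# the result is cached, and invalidate() undoes the stored result so a
# changed input takes effect on the next access.

class LazySlot:
    def __init__(self, definition):
        self.definition = definition   # thunk computing the referent
        self._cached = False
        self._value = None

    def referent(self):
        if not self._cached:           # compute only when accessed
            self._value = self.definition()
            self._cached = True
        return self._value

    def invalidate(self):              # consistency maintenance hook
        self._cached = False
        self._value = None

calls = []
stem = ["hoop"]
past = LazySlot(lambda: (calls.append(1), stem[0] + "te")[1])

assert past.referent() == "hoopte"
past.referent()
assert len(calls) == 1                 # cached: computed once
stem[0] = "werk"
past.invalidate()                      # a change undoes the stored result
assert past.referent() == "werkte"
```

In KRS the undoing is automatic for dependent results; here it is triggered by hand to keep the sketch short.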
    <Paragraph position="5"> At present, the linguistic knowledge pertains to aspects of Dutch morphology and phonology. Our word formation component consists of a number of morphological rules for affixation and compounding. These rules work on lexical representations (containing graphemes, phonemes, morphophonemes, boundary symbols, stress symbols etc.). A set of spelling rules transforms lexical representations into spelling representations; a set of phonological rules transforms lexical representations into phonetic transcriptions. We have implemented object hierarchies and procedures to compute inflections, internal word boundaries, morpheme boundaries, syllable boundaries and phonetic representations (our linguistic model is fully described in Daelemans, 1987).</Paragraph>
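The rule pipeline just described can be sketched schematically: lexical representations carry boundary symbols, and separate rule sets map them to spelling and to phonetic transcriptions. The rules below are toy stand-ins for illustration, not the paper's actual Dutch rule sets.

```python
import re

# Toy sketch of the two rule sets described above. '+' marks a morpheme
# boundary in the lexical representation; spelling rules drop it, while
# the (invented) phonological rule also rewrites a grapheme as a phoneme.

SPELLING_RULES = [(r"\+", "")]                  # drop morpheme boundaries
PHONO_RULES = [(r"\+", ""), (r"w", "ʋ")]        # toy grapheme-to-phoneme rule

def apply_rules(lexical, rules):
    """Apply an ordered rule set to a lexical representation."""
    for pattern, replacement in rules:
        lexical = re.sub(pattern, replacement, lexical)
    return lexical

assert apply_rules("werk+te", SPELLING_RULES) == "werkte"
assert apply_rules("werk+te", PHONO_RULES) == "ʋerkte"
```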
    <Paragraph position="6"> Lexicographic knowledge consists of a number of sorting routines and storage strategies. At present, the definition of filters can be based on the following primitive procedures: sequential organisation, (single-key) indexed-sequential organisation, letter tree organisation, alphabetic sorting (taking into account the alphabetic position of non-standard letters like phonetic symbols) and frequency sorting.</Paragraph>
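Two of these primitive procedures can be sketched directly (the alphabet and word list below are invented for illustration; note how the custom alphabet lets a phonetic symbol sort after the standard letters):

```python
# Sketch of two primitive lexicographic procedures named above:
# alphabetic sorting that respects the position of non-standard letters,
# and frequency sorting. The alphabet and data are illustrative only.

ALPHABET = "abcdefghijklmnopqrstuvwxyzə"        # phonetic symbol sorts last

def alphabetic_sort(entries):
    return sorted(entries, key=lambda w: [ALPHABET.index(c) for c in w])

def frequency_sort(entries, freq):
    # Most frequent first; unknown words default to frequency 0.
    return sorted(entries, key=lambda w: -freq.get(w, 0))

words = ["zee", "ədel", "appel"]
assert alphabetic_sort(words) == ["appel", "zee", "ədel"]
assert frequency_sort(words, {"zee": 5, "appel": 2}) == ["zee", "appel", "ədel"]
```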
    <Paragraph position="7"> Constructors can be defined using primitive procedures attached to linguistic objects. E.g. when a new citation form of a verb is entered at the knowledge level, constructors exist to compute the inflected forms of this verb, the phonetic transcription, syllable and morphological boundaries of the citation form and the inflected forms, and of the forms derived from these inflected forms, and so on recursively. Our present understanding of Dutch morphophonology has not yet advanced to such a level of sophistication that fully automatic extension of this kind is possible. Therefore, the output of the constructors should be checked by the user. To this end, a cooperative user interface was built. After checking by the user, newly created or modified lexical objects can be transformed again into 'frozen' records at the storage level. This happens through a translation function which transforms concepts into records. Another translation function creates a KRS object on the basis of a record.</Paragraph>
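The pair of translation functions between the knowledge level and the storage level can be sketched as a round trip (field names follow the record of Figure 2, but the flat record format and function names are our own invention):

```python
# Sketch of the two translation functions described above: freeze()
# turns a checked lexical object into a flat storage-level record,
# thaw() rebuilds an object from a record. The tab-separated format
# is a hypothetical stand-in for the paper's record layout.

FIELDS = ("spelling", "lexical", "pronunciation", "lexeme", "codes")

def freeze(concept):
    """Concept (dict of subjects) -> storage-level record."""
    return "\t".join(str(concept[f]) for f in FIELDS)

def thaw(record):
    """Storage-level record -> concept (dict of subjects)."""
    return dict(zip(FIELDS, record.split("\t")))

entry = {"spelling": "werkte", "lexical": "werk+te",
         "pronunciation": "ʋerktə", "lexeme": "werken-lexeme",
         "codes": "11210"}
assert thaw(freeze(entry)) == entry   # freezing then thawing is lossless
```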
    <Paragraph position="8"> Figure 2 shows a KRS object and its corresponding record. This record contains the spelling, the lexical representation, the pronunciation, the citation form (lexeme) and some morpho-syntactic codes of the verb form werkte (worked). (Records for citation forms contain pointers to the different forms belonging to their paradigm, and information relevant to all forms of a paradigm: e.g. case frames and semantic information). The corresponding concept contains exactly the same information in its subjects, but through inheritance from concepts like verb-form and werken-lexeme, a large amount of additional information becomes accessible.</Paragraph>
    <Paragraph position="9"> [Record shown in Figure 2 for the verb form werkte: spelling, lexical representation, pronunciation, lexeme pointer werken-lexeme, morpho-syntactic codes 11210]</Paragraph>
  </Section>
  <Section position="4" start_page="70" end_page="72" type="metho">
    <SectionTitle>
THE USER INTERFACE
</SectionTitle>
    <Paragraph position="0"> We envision two categories of users of our architecture: linguists, who program the linguistic knowledge and provide primitive procedures that can serve as basic building blocks in constructors, and lexicographers, who use predefined filters and constructors, create new ones on the basis of existing ones and of primitive linguistic and lexicographic procedures, and check the output of the constructors before it is added to the dictionary. The aim of the user interface is to reduce user intervention in this checking phase to a minimum. It makes full use of the mouse, menu and window system of the Symbolics Lisp Machine.</Paragraph>
    <Paragraph position="1"> When due to the incompleteness of the linguistic knowledge new information cannot be computed with full certainty, the system nevertheless goes ahead, using heuristics to present an 'educated guess' and notifying the user of this. These heuristics are based on linguistic as well as probabilistic data. A user monitoring the output of the constructor only needs to click on incorrect items or parts of items in the output (which is mouse-sensitive).</Paragraph>
    <Paragraph position="2"> This activates diagnostic procedures associated with the relevant linguistic objects. These procedures can delete erroneous objects already created, recompute them or transfer control to other objects. If the system can diagnose its error, a correction is presented. Otherwise, a menu of possible corrections (again constrained by heuristics) is presented from which the user may choose, or in the worst case, the user has to enter the correct information himself.</Paragraph>
    <Paragraph position="3"> Consider for example the conjugation of Dutch verbs. At some point, the citation form of an irregular verb (blijven, to stay) is added to the system, and we want to add all inflected forms (the paradigm of the verb) to the dictionary with their pronunciation. As a first hypothesis, the system assumes that the inflection is regular. It presents the computed forms to the user, who can indicate erroneous forms with a simple mouse click.</Paragraph>
    <Paragraph position="4"> Information about which and how many forms were objected to is returned to the diagnosis procedure associated with the object responsible for computing the regular paradigm, which analyses this information and transfers control to an object computing forms of verbs belonging to a particular category of irregular verbs. Again the forms are presented to the user. If this time no forms are refused, the pronunciation of each form is computed and presented to the user for correction, and so on. This sequence of events is illustrated in Figure 3.</Paragraph>
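The correction loop for verb conjugation can be sketched as follows. The inflection rules and verb classes below are drastically simplified stand-ins for the real Dutch morphology; only the control flow (regular hypothesis first, diagnosis and transfer of control on user objections) follows the description above.

```python
# Sketch of the interactive correction loop described above: the system
# first assumes regular inflection, presents the paradigm, and on user
# objections transfers control to an irregular-verb object. Rules and
# the irregular class are toy stand-ins.

def regular_paradigm(stem):
    # Toy regular paradigm: present sg/3sg, past sg/pl.
    return [stem, stem + "t", stem + "te", stem + "ten"]

IRREGULAR = {"blijv": ["blijf", "blijft", "bleef", "bleven"]}

def conjugate(stem, user_rejects):
    forms = regular_paradigm(stem)      # first hypothesis: regular
    if user_rejects(forms):             # mouse clicks on erroneous forms
        forms = IRREGULAR[stem]         # diagnosis: try irregular class
    return forms

# Simulated user: objects to any paradigm containing *blijvte.
assert conjugate("blijv", lambda fs: "blijvte" in fs) == \
    ["blijf", "blijft", "bleef", "bleven"]
assert conjugate("werk", lambda fs: False) == \
    ["werk", "werkt", "werkte", "werkten"]
```

In the real system the diagnosis procedure analyses which and how many forms were rejected before choosing among several irregular classes; here a single fallback class keeps the sketch short.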
    <Paragraph position="5"> Diagnostic procedures were developed for objects involved in morphological synthesis, morphological analysis, syllabification and phonemisation. At least for the linguistic procedures implemented so far, a maximum of two corrective feedback steps by the user is needed to compute the correct representations.</Paragraph>
    <Paragraph position="6"> First try by the system, in which erroneous forms are indicated (top left); second (and correct) try by the system (top right); presentation of the pronunciations of the accepted paradigm for checking by the user (bottom).</Paragraph>
  </Section>
  <Section position="5" start_page="72" end_page="72" type="metho">
    <SectionTitle>
CONSTRUCTING A RHYME DICTIONARY
</SectionTitle>
    <Paragraph position="0"> Automatic dictionary construction can be easily done by using a particular filter (e.g., a citation form dictionary can be filtered out from a word form dictionary).</Paragraph>
    <Paragraph position="1"> Other more complex constructions can be achieved by combining a particular constructor or set of constructors with a filter. For example, to generate a word form lexicon on the basis of a citation form lexicon, we first have to apply a constructor to it (morphological synthesis), and afterwards filter the result into a suitable format. In this section, we will describe how a rhyme dictionary can be constructed on the basis of a spelling word form lexicon in an attempt to point out how our architecture can be applied advantageously in lexicography.</Paragraph>
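The constructor-then-filter composition described above can be sketched schematically (the synthesis rule is a toy one, not the paper's morphology; function names are illustrative):

```python
# Sketch of composing a constructor with a filter: morphological
# synthesis expands citation forms into word forms, after which a
# filter reduces the result to a sorted, duplicate-free word form list.

def synthesis_constructor(citation_forms):
    """Toy morphological synthesis: citation form -> a few word forms."""
    out = []
    for verb in citation_forms:
        stem = verb[:-2]                # strip the infinitive ending -en
        out.extend([stem, stem + "t", stem + "te"])
    return out

def sorted_filter(word_forms):
    """Filter: remove duplicates and impose alphabetic order."""
    return sorted(set(word_forms))

assert sorted_filter(synthesis_constructor(["werken"])) == \
    ["werk", "werkt", "werkte"]
```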
    <Paragraph position="2"> First, a constructor must be defined for the computation of a broad phonetic transcription of the spelling forms if this information is not already present in the MD. Otherwise, it can be simply retrieved from the MD.</Paragraph>
    <Paragraph position="3"> Such a constructor can be defined by means of the primitive linguistic procedures syllabification, phonemisation and stress assignment. The phonemisation algorithm should be adapted in this case by removing a number of irrelevant phonological rules (e.g. assimilation rules).</Paragraph>
    <Paragraph position="4"> This, too, can be done interactively (each rule in the linguistic knowledge base can be easily turned on or off by the user). The result of applying this constructor to the MD is the extension of each entry in it with an additional field (or slot at the knowledge level) for the transcription. Next, a filter object is defined working in three steps: (i) Take the broad phonetic transcription of each dictionary entry and reverse it (reverse is a primitive procedure available to the lexicographer).</Paragraph>
    <Paragraph position="5"> (ii) Sort the reversed transcriptions first according to their rhyme determining part and then alphabetically. The rhyme determining part consists of the nucleus and coda of the last stressed syllable and the following weak syllables if any. For example, the rhyme determining part of wérvelen (to whirl) is er-ve-len, of versnéllen (to accelerate) el-len, and of óverwérk (overwork) erk.</Paragraph>
    <Paragraph position="6"> (iii) Print the spelling associated with each transcription in the output file. The result is a spelling rhyme dictionary. If desirable, the spelling forms can be accompanied by their phonetic transcription.</Paragraph>
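The three-step filter can be sketched end to end. Extracting the true rhyme-determining part requires stress and syllable structure; in this sketch a fixed trailing slice of the reversed transcription stands in for it, and the transcriptions are invented.

```python
# Sketch of the three-step rhyme filter described above:
# (i) reverse each broad phonetic transcription,
# (ii) sort by rhyme-determining part, then alphabetically,
# (iii) emit the spelling associated with each transcription.
# The 3-character slice is a crude stand-in for nucleus + coda.

def rhyme_filter(entries):
    # entries: list of (spelling, broad phonetic transcription)
    def key(entry):
        spelling, transcription = entry
        reversed_t = transcription[::-1]        # step (i): reverse
        rhyme_part = reversed_t[:3]             # stand-in for rhyme part
        return (rhyme_part, reversed_t)         # step (ii): two-level sort
    return [spelling for spelling, _ in sorted(entries, key=key)]  # step (iii)

entries = [("overwerk", "o:vərʋɛrk"), ("werk", "ʋɛrk"), ("snel", "snɛl")]
assert rhyme_filter(entries) == ["werk", "overwerk", "snel"]
```

Note how the two rhyming words (werk, overwerk) end up adjacent, which is the point of sorting on the reversed transcription.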
    <Paragraph position="7"> Using the same information, we can easily develop an alternative filter which takes into account the metre of the words as well. Although two words rhyme even when their rhythm (defined as the succession of stressed and unstressed syllables) is different, it is common poetic practice to look for rhyme words with the same metre.</Paragraph>
    <Paragraph position="8"> The metre frame can be derived from the phonetic transcription. In this variant, step (ii) must be preceded by a step in which the (reversed) phonetic transcriptions are sorted according to their metre frame.</Paragraph>
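The metre-sensitive variant can be sketched by prefixing the sort key with a metre frame derived from the transcription. The stress mark, syllable separator and frame encoding below are invented for illustration.

```python
# Sketch of the metre-sensitive variant described above: a metre frame
# (pattern of stressed/unstressed syllables) is derived from the
# transcription and sorted on before the rhyme key. "'" marks a
# stressed syllable; "-" separates syllables (both invented notation).

def metre_frame(transcription):
    return "".join("S" if syl.startswith("'") else "w"
                   for syl in transcription.split("-"))

def metre_rhyme_key(entry):
    spelling, transcription = entry
    plain = transcription.replace("'", "")
    return (metre_frame(transcription), plain[::-1])   # metre first, rhyme second

entries = [("wervelen", "'ʋɛr-və-lən"), ("versnellen", "vər-'snɛ-lən")]
out = [s for s, _ in sorted(entries, key=metre_rhyme_key)]
# wervelen has frame Sww, versnellen wSw: different metres sort apart.
assert out == ["wervelen", "versnellen"]
```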
  </Section>
  <Section position="6" start_page="72" end_page="73" type="metho">
    <SectionTitle>
RELATED RESEARCH
</SectionTitle>
    <Paragraph position="0"> The presence of both static information (morphemes and features) and dynamic information (morphological rules) in LKBs is also advocated by Domenig and Shann (1986). Their prototype includes a morphological 'shell' making real-time word analysis possible when only stems are stored. This morphological knowledge is not used, however, to extend the dictionary, and their system is committed to a particular formalism, while ours is notation-neutral and unrestrictedly extensible due to the object-oriented implementation.</Paragraph>
    <Paragraph position="1"> The LKB model outlined in Isoda, Aiso, Kamibayashi and Matsunaga (1986) shows some similarity to our filter concept. Virtual dictionaries can be created using base dictionaries (physically existing dictionaries) and user-defined Association Interpreters (KIPs). The latter are programs which combine primitive procedures (pattern matching, parsing, string manipulation) to modify the fields of the base dictionary and transfer control to other dictionaries. This way, for example, a virtual English-Japanese synonym dictionary can be created from English-English and English-Japanese base dictionaries. In our own approach, all information available is present in the same MD, and filters are used to create base dictionaries (physical, not virtual). Constructors are absent in the architecture of Isoda et al. (1986).</Paragraph>
    <Paragraph position="2"> Johnson (1985) describes a program computing a reconstructed form on the basis of surface forms in different languages by undoing regular sound changes. The program, which is part of a system compiling a comparative dictionary (semi-)automatically, may be interpreted as related to the concept of a constructor in our own system, with construction limited to simple string manipulations and, unlike our own system, not extensible.</Paragraph>
  </Section>
</Paper>