<?xml version="1.0" standalone="yes"?>
<Paper uid="C86-1020">
  <Title>Towards a Dedicated Database Management System for Dictionaries</Title>
  <Section position="3" start_page="0" end_page="93" type="metho">
    <SectionTitle>
3. The inadequacy of general purpose DBMS
</SectionTitle>
    <Paragraph position="0"> General purpose DBMS - be they relational or whatever - do not live up to the formulated requirements for a real-time dictionary database. On the one hand, they are in many areas much too powerful for the task at hand, i.e. they can be adapted to a wealth of problems which have nothing to do with dictionaries. This flexibility entails both a relatively low level of abstraction and a massive overhead. On the other hand, general purpose DBMS are not powerful enough; for example, a relational data definition language (DDL) provides no transparent means to express morphological processes.</Paragraph>
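The redundancy problem hinted at above can be made concrete. The following is a minimal sketch (all table, attribute and paradigm names are invented, not taken from the paper): a plain relational schema must store one row per inflected form, whereas a rule-based schema states the paradigm once per class.

```python
# Hypothetical sketch: in a flat relational schema every inflected form
# becomes its own row, duplicating what a morphological rule could say once.

# Relational-style storage: one row per surface form (invented layout).
lexical_units_table = [
    # (stem, surface_form, case, number)
    ("Fenster", "Fenster",  "NOM", "SG"),
    ("Fenster", "Fenster",  "AKK", "SG"),
    ("Fenster", "Fensters", "GEN", "SG"),
    ("Fenster", "Fenstern", "DAT", "PL"),
]

# Rule-based alternative: the paradigm is stated once for the whole class.
paradigm = {"":  [("NOM", "SG"), ("AKK", "SG")],
            "s": [("GEN", "SG")],
            "n": [("DAT", "PL")]}

def inflect(stem):
    """Derive all rows of the relational table from one shared paradigm."""
    return [(stem, stem + suffix, case, number)
            for suffix, feats in paradigm.items()
            for case, number in feats]

assert inflect("Fenster") == lexical_units_table
```

The point is not the particular paradigm but the factoring: the rule-based form keeps one copy of the class-wide information, which a relational DDL gives no transparent means to express.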
    <Paragraph position="1"> 4. The design of the dedicated DBMS

The design of the dedicated DBMS put forward in \[Domenig 1986\] follows the ANSI/SPARC 3-Schema-Model. As shown in Fig. 1, it assumes that three different interfaces are needed: * A linguist interface with which the conceptual schema is defined, i.e. the structure and the consistency rules of the database.</Paragraph>
    <Paragraph position="2">  * A lexicographer interface for the editing of entries.
* A process interface for the real-time question-answering service in the message-switching environment.</Paragraph>
    <Paragraph position="3">  From the point of view of software design, the most complex part of this conception is the linguist interface with the DDL and its compiler. All the other parts of the system depend very much on it because of its far-reaching dedication. We will therefore concentrate on the linguist interface and the DDL in this paper. The principal guidelines for their definition have been the following: * The syntax of the DDL should be intelligible to linguists.</Paragraph>
    <Paragraph position="4"> * The linguist interface should be interactive and give some leeway for experiments in order to test different morphological strategies.</Paragraph>
    <Paragraph position="5"> The proposed solution foresees the implementation of the system on a high-performance workstation. It includes multiple-window technology with pop-up menus for monitor- and manipulation-functions as well as incremental compilation. Some brief examples: The top-level window of the interface looks as follows (if we assume that we have seven dictionaries):  If the linguist wants to define the conceptual schema of the Danish dictionary, he selects - with a mouse - the corresponding string on the screen, whereupon a second window is pasted on top of the existing one:  Like the top-level window, this window is unalterable, i.e. all the dictionary schemas consist of four different definition parts: an alphabet-, a type-, a grammar- and a structure-definition (the structure-definition is represented by the keyword root). If the linguist wants to edit one of the definition parts, he again selects the corresponding string:</Paragraph>
    <Paragraph position="7"> In contrast to the two top levels, this window can be edited. We will not go into the function or the syntax of the alphabet-definition as both are quite trivial. As might be inferred from the name, this is the place where character sets and the like are defined (because the system distinguishes a lexical and a surface level, some metacharacters denoting morphological classes etc., the character set is not quite as trivial as might be imagined at first glance). If something is entered into this window, the corresponding string in the window above henceforth appears with an icon behind it:  In a similar fashion the other three definition parts of the conceptual schema can be defined: The type definition comprises name- and domain-specifications of all but the string-typed features allowed in the database. We will not go into its syntax here either.</Paragraph>
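Why the alphabet-definition is "not quite as trivial as might be imagined" can be illustrated with a small sketch. All symbol choices below are invented for illustration; the paper does not show the actual alphabet syntax. The point is that the alphabet must cover two levels - lexical and surface - plus metacharacters such as morpheme boundaries:

```python
# Invented illustration of a two-level alphabet: the lexical level carries
# metacharacters ('+' morpheme boundary, '#' word boundary, '0' null symbol)
# that never appear on the surface.

SURFACE = set("abcdefghijklmnopqrstuvwxyz\u00e6\u00f8\u00e5")  # surface characters
LEXICAL = SURFACE | {"+", "#", "0"}                             # plus metacharacters

def valid_pair(lexical_char, surface_char):
    """A lexical:surface correspondence pair is admissible if both symbols
    belong to their level's alphabet; boundary metacharacters may only
    correspond to the surface null symbol."""
    if lexical_char in {"+", "#"}:
        return surface_char == "0"   # boundaries are deleted on the surface
    return lexical_char in LEXICAL and surface_char in SURFACE

assert valid_pair("+", "0")
assert valid_pair("a", "a")
assert not valid_pair("+", "a")
```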
    <Paragraph position="8"> The grammar definition contains morphonological rules which mediate between the lexical and the surface level. We have adapted their concept from Koskenniemi (\[Koskenniemi 1983, 1984\]), whose formalism has been widely acknowledged by now, especially in the US (at SRI \[Shieber 1984\], University of Texas \[Karttunen 1983\], by M. Kay of Xerox etc.). A few examples:
rule: +/&lt;CI&gt; &lt;--&gt; \['I#\]C*V&lt;CI&gt; V
where &lt;CI&gt; = {b, d, f, g, l, m, n, p, r, s, t}
example 3: surface 'i' for lexical 'y'
rule: y/i&lt;--&gt; C +/=^\[il a\]</Paragraph>
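To make the flavor of such two-level rules concrete, here is a minimal sketch of example 3 ("surface 'i' for lexical 'y'"), read loosely as: lexical 'y' corresponds to surface 'i' if and only if it is preceded by a consonant. This is a deliberate simplification of the Koskenniemi formalism (no finite-state compilation, single rule only), and the reading of the garbled rule context is an assumption:

```python
# Simplified sketch of one two-level correspondence check, not the full
# Koskenniemi machinery. Assumed reading: y:i <--> after a consonant.

CONSONANTS = set("bcdfghjklmnpqrstvwxz")

def check_y_i(lexical, surface):
    """Accept an equal-length lexical/surface pair iff every y:i
    correspondence, and only those, occurs after a consonant."""
    assert len(lexical) == len(surface)
    for i, (l, s) in enumerate(zip(lexical, surface)):
        after_consonant = i > 0 and lexical[i - 1] in CONSONANTS
        if l == "y":
            if s not in {"i", "y"}:
                return False
            # obligatory: 'y' surfaces as 'i' exactly in the consonant context
            if (s == "i") != after_consonant:
                return False
        elif l != s:
            return False
    return True

# lexical 'try' -> surface 'tri' (as before a plural ending in 'tries')
assert check_y_i("try", "tri")
assert not check_y_i("try", "try")   # the rule is obligatory here
assert check_y_i("bay", "bay")       # after a vowel, 'y' stays
```

A real two-level implementation compiles such rules into finite-state transducers that run in parallel over the pair of levels; the function above only checks one rule in isolation.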
  </Section>
  <Section position="4" start_page="93" end_page="94" type="metho">
    <SectionTitle>
AVV +/Oe,
</SectionTitle>
    <Paragraph position="0"> where &lt;C2&gt; = {CP; CP in AV &amp; CP in A{c, g} }
example 5: surface 'y' for lexical 'i'
rule: i/y &lt;--&gt; _ elO +/0 i
The structure definition is at least syntactically the most complex part of the conceptual schema. It contains an arbitrary number of hierarchical levels which define a collection of so-called lexical unit classes (luclasses) on the one hand, and irregular entries (luentries) on the other. The fundamental ideas behind it are: * Entries which obey the same morphological rules should be grouped into classes so that those rules have to be specified only once.</Paragraph>
    <Paragraph position="1"> * Entries which are too irregular to fit into such a class should be defined as irregular. The boundary between regularity/irregularity should be defined by the database manager (linguist) and hence be unalterable by lexicographers. Irregular entries are therefore defined in the conceptual schema (the interactivity of the interface, the powerful editing functions and the incremental compilation provide for the feasibility of this approach).</Paragraph>
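The regular/irregular split above can be sketched as follows. All names and the data layout are invented; the paper's actual luclass/luentry syntax is shown only in window fragments. The essential design point survives the simplification: class-wide rules are stated once, and irregular entries are spelled out individually in the conceptual schema, out of reach of lexicographers.

```python
# Invented sketch of the luclass/luentry split.

LUCLASSES = {
    # class name -> paradigm: suffix -> feature set it realizes
    "N_REG": {"":  {"Case": "NOM", "Number": "SG"},
              "s": {"Case": "GEN", "Number": "SG"}},
}

LUENTRIES = {
    # fully irregular entries are listed form by form in the schema itself
    "sein": {"bin": {"Person": 1, "Number": "SG"},
             "ist": {"Person": 3, "Number": "SG"}},
}

def forms(entry, luclass="N_REG"):
    """All surface forms of an entry: listed forms if irregular,
    stem + class suffixes if regular."""
    if entry in LUENTRIES:
        return sorted(LUENTRIES[entry])
    return sorted(entry + suffix for suffix in LUCLASSES[luclass])

assert forms("sein") == ["bin", "ist"]
assert forms("Hund") == ["Hund", "Hunds"]
```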
    <Paragraph position="2"> The consequence of this approach is that the structure definition consists of a set of luclass-definitions on the one hand, a set of luentry-definitions on the other. In order to facilitate the management of the members of these sets, they are organized in a hierarchical structure, where the criteria for the hierarchy are some of the features which qualify the sets. Syntactically, this looks e.g. as follows:  This window defines one hierarchical level (the top) of the organization of the luclasses and luentries respectively. The meaning of it should be quite obvious if we leave out del \[\] and gcase \[\] and concentrate on the case-distinction enclosed in the square brackets: The features Cat:N, Cat:V, ... are defined to be distinctive for certain subsets out of the collection of luclasses and luentries. Note that the names of the attributes and values are entirely arbitrary (they must be defined in the type-definition, of course). Subordinate levels of the definition are again abstracted by icons (node \[\]), i.e. they are defined and viewed in separate windows:  \[{VCat:REG, node } \[ {VCat:IRREG, node \[\]}\]\] end dcase end node {Cat:INT, node \[\]}\] end dcase end root  In the leaves of this tree the keyword node is replaced by either luclass or luentry. Their syntax is almost identical, so let us just give an example of an luclass-definition:  Apart from the strings transL7 and gcaseC?, the meaning of it should again be quite obvious. In prose we might summarize it as follows: All entries of this class are nouns of a certain subclass - the features Cat:N, ... denoting this qualification are specified on the path from the root to this leaf - and within this subclass a zero-morpheme attached to the stem is interpreted as one of the following alternatives of feature sets:  {Case:AKK, Number:PL}. The string Fenster acts in this definition mainly as an illustrative example, i.e.
it has no conceptual function and may be replaced by all noun stems belonging to this class. Conceptually speaking, the definition therefore specifies all the inflectional forms of this noun class. The consequence of this is that lexicographers have to enter only the stems of words; the inflections are defined in the system. Together with some additional language constructs, the regularities of morphology can thus be quite thoroughly grasped. The additional constructs are: * a formalism with approximately the power of a context-free grammar for compounding and derivation which allows the combination of different luclasses and luentries.</Paragraph>
    <Paragraph position="3"> * a formalism for the specification of stem-alterations (e.g. German Umlaut).</Paragraph>
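The compounding formalism is only characterized as having roughly context-free power; the grammar below is therefore an invented illustration, not the paper's construct. A naive CFG that combines noun classes freely also shows why such a formalism can overgenerate:

```python
# Invented toy grammar:  Compound -> Noun | Compound Noun
# (right-hand member lowercased, German-style). Free combination of
# luclasses is expressive but overgenerates.

NOUNS = ["Haus", "T\u00fcr"]

def expand(depth):
    """All strings derivable with at most `depth` applications of the
    recursive Compound rule."""
    results = list(NOUNS)
    if depth > 0:
        for left in expand(depth - 1):
            for right in NOUNS:
                results.append(left + right.lower())
    return results

compounds = expand(1)
assert "Haust\u00fcr" in compounds   # well-formed compound ('front door')
assert "Haushaus" in compounds      # overgeneration: licensed but nonsensical
```

The alternative mentioned later in the paper - fully listing compounds - avoids the overgeneration at the price of redundancy.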
  </Section>
  <Section position="5" start_page="94" end_page="95" type="metho">
    <SectionTitle>
5. Conclusion
</SectionTitle>
    <Paragraph position="0"> The important difference of this approach compared to other systems is the definition of morphological phenomena in the conceptual schema of the DBMS itself. This conceptual schema can easily be compiled into a redundancy-optimized internal schema. This in turn provides for two things: first, efficient real-time access to the lexical units etc.; second, very comfortable monitor- and manipulation-functions for the linguist interface. For example, it is trivial to implement functions which generate all forms associated with certain features or combinations thereof.</Paragraph>
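Such a monitor function amounts to filtering the compiled paradigm by features. The sketch below uses an invented paradigm representation; only the idea - generation by feature filtering over the internal schema - comes from the text:

```python
# Invented sketch of a "generate all forms for given features" monitor
# function over a compiled paradigm.

PARADIGM = {
    # suffix -> feature set it realizes (one hypothetical noun class)
    "":   {"Case": "NOM", "Number": "SG"},
    "s":  {"Case": "GEN", "Number": "SG"},
    "e":  {"Case": "NOM", "Number": "PL"},
    "en": {"Case": "DAT", "Number": "PL"},
}

def generate(stem, **features):
    """All forms of `stem` whose feature set matches the given features."""
    return sorted(stem + suffix for suffix, feats in PARADIGM.items()
                  if all(feats.get(k) == v for k, v in features.items()))

assert generate("Hund", Number="PL") == ["Hunde", "Hunden"]
assert generate("Hund", Case="GEN", Number="SG") == ["Hunds"]
```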
    <Paragraph position="1"> It is equally easy to test the impact of complex rules, be they grammar rules in the Koskenniemi style or hard-to-handle compounding rules (implemented by the formalism which is similar to a context-free grammar). The most intriguing quality of the internal schema, however, is probably that it enables the database manager (linguist) to alter the morphological strategies dynamically, i.e. to experiment with them.</Paragraph>
    <Paragraph position="2"> This is possible because the system always knows which syntactico-semantic features and which morphological rules have to be associated with the different classes of entries; whenever those associations - you could also call them consistency rules - are altered, the system can determine whether the entries belonging to the corresponding classes lose or gain information, and whether the alteration is legal etc. We do not want to go further into those consistency problems as we have not really explained them in this summary. We would like to stress, however, that we consider their integration in the DBMS a major advantage and necessity as they autonomize the whole system. Apart from the possibilities for experiments, they facilitate the integration of existing machine-readable dictionaries, again because the system always knows which kind of information is distinctive and which is mandatory for which class of entries.</Paragraph>
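The loss/gain determination described above reduces, in the simplest reading, to comparing the feature sets a class requires before and after an alteration. The representation below is invented (the paper does not specify it); it only illustrates the kind of check the system can perform automatically:

```python
# Invented sketch of a consistency check on a schema alteration: the
# difference between old and new required feature sets tells the system
# what every entry of the class loses or gains.

def schema_change(old_features, new_features):
    """Return (lost, gained) feature names for a class whose required
    feature set changes from old_features to new_features."""
    lost = old_features - new_features
    gained = new_features - old_features
    return lost, gained

old = {"Case", "Number"}
new = {"Case", "Number", "Gender"}
lost, gained = schema_change(old, new)
assert lost == set()
assert gained == {"Gender"}   # every entry must now be given a Gender value
```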
    <Paragraph position="3"> Summarizing, we could say that the kind of morphology supported by the DBMS is a rather traditional one, i.e. the biggest effort has been spent on truly regular phenomena like inflection. For compounding and derivation the offered choice is either a full implementation (--&gt; redundancy) or the rather dangerous - potentially overgenerating - formalism resembling a context-free grammar. It has to be stressed that we conceive this system as a prototype which will probably be subject to some alterations in the future. The proposed software design is accordingly tuned, i.e. it relies on the availability of powerful software tools (EMACS, LEX, YACC, LISP etc.) running in a UNIX environment.</Paragraph>
  </Section>
</Paper>