<?xml version="1.0" standalone="yes"?>
<Paper uid="C82-2013">
  <Title>TOWARDS THE ORGANIZATION OF LEXICAL DEFINITIONS ON A DATABASE STRUCTURE</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
TOWARDS THE ORGANIZATION OF LEXICAL DEFINITIONS ON A DATABASE
STRUCTURE
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Nicoletta Calzolari
</SectionTitle>
      <Paragraph position="0"> Istituto di Glottologia - Università di Pisa, Italy. Printed dictionaries are great repositories of information, and it is important that they can be exploited as fully as possible, with regard to all the different types of data they contain. This was one of the aims when organizing the Machine Dictionary of the Italian language on a database structure.</Paragraph>
      <Paragraph position="1"> The design and organization of the lexical database for the first two relations implemented, i.e. the set of Lemmas (106,091) and the set of Word-forms (1,016,320), has been described in other papers (see for example Calzolari and Ceccotti, 1980).</Paragraph>
      <Paragraph position="2"> These two very large archives are maintained continuously on-line and are interactively invoked through a query language which permits the user to access the data in transparent mode, and to have his particular "view" of the data. The database concept and methodology give rise, in fact, to a radical change in perspective when compared with a sequential organization of data. We have a dynamic rather than a static object, which is flexible and easy to query, update, and extend.</Paragraph>
      <Paragraph position="3"> This lexical database is now being extended by the insertion of lexical definitions (185,899) and semantic data.</Paragraph>
      <Paragraph position="4"> The guiding principle behind this project is the conviction that the study of the defining vocabulary of an actual dictionary can provide a valuable tool in the semantic analysis of a language (see Noel, 1981).</Paragraph>
      <Paragraph position="5"> The logical organization of this definitional information is not a trivial task, and must be performed bearing in mind the goals to be achieved. It must in fact be possible to have direct access to each and every piece of information contained in the definitions. The significance of "piece of information" in this context is in direct relationship to the eventual use to be made of it. By "piece of information" inside the definitions, we mean not only the single word-forms, as they are written in the definitions, but also the lemma to which every word-form is connected; moreover, at a further stage of analysis, the specific sense of every polysemic lemma in the particular context (context:definition) must be considered. The logical organization of the definitional part of the database must, therefore, be structured to provide, for each word in every definition, direct access to: a) the word-form itself, with the associated information (morphological, usage level, etc.); b) the lemma to which the word-form pertains, with the associated information (part of speech, variants, usage level, other word-forms, i.e. paradigm); c) the specific sense of the lemma. The implementation of a definitional archive thus requires an enormous task of disambiguation at all three levels: word-forms, lemmas and senses, in order to produce material which can be used effectively to extract semantic information from the dictionary.
 The first step in this direction is the lemmatization of the definitions themselves. For this task, the other two archives of the database (the word-form and lemma archives) are being used, together with ad hoc procedures, to produce an automatic lemmatization of a large percentage of the words contained in the definitions. For the other words, those for which automatic lemmatization has not yet been achieved, a disambiguation strategy has been developed in which the human operator works interactively with the computer, and the computer can memorize choices on homographic forms as they are made.</Paragraph>
      <Paragraph position="6"> After lemmatization, each word is associated in the computer memory with the addresses of its word-form and of its lemma. Therefore, the definitions are organized in the memory not as actual strings of words, but as lists of addresses of word-forms and lemmas. In this way, a number of important results are achieved: a) a great reduction in storage size; b) data types (addresses, i.e. binary numbers) which are easily handled by the computer; c) data which are strictly associated with the first two archives, aiming at the eventual construction of an integrated system; d) much more rapid data processing and direct access to each kind of data, in each position of the definition itself; e) the possibility of immediately retranslating addresses into character strings, and lists of addresses into phrases, i.e. definitions; f) the possibility of correcting, updating and inserting within the definitions.</Paragraph>
      <Paragraph position="7"> Only once this preliminary stage has been completed is it possible to extract many kinds of semantic information from the dictionary. The memorized definitions have an internal logical structure which permits the construction of semantic chains (to show taxonomic relationships) and also of other types of semantic links (to show other types of semantic relationships, such as "part of", "set of", "in the form of", "apt to", etc.) between words in the lexicon. These chains and links, which can be not only displayed but also handled by computer procedures in many different ways, provide a good starting point for the study of the semantic structure of the lexicon. In fact, it is hoped that the computerized dictionary will offer a model of the Italian lexical system in the various aspects which can be associated with a lexicon (phonology, morphology, syntax, i.e. verbal frames, lexical semantics). This approach is included in the general theoretical view which considers the lexicon as a central reference point both for language analysis and for many linguistic applications.</Paragraph>
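      <Paragraph position="8"> The address-list organization of definitions described earlier (each word stored as the address of its word-form and of its lemma, instead of as a character string) can be illustrated with a minimal Python sketch. It is not the original implementation: the two toy archives, the use of list indices as "addresses", and the function names encode, decode and lemmas_of are all assumptions made for illustration.

```python
# Hypothetical toy archives standing in for the real lemma and word-form
# archives. An "address" here is simply an index into the list (an assumption).
LEMMAS = ["cane", "animale", "domestico"]

# Each word-form record: (character string, address of its lemma).
WORD_FORMS = [
    ("animale", 1),
    ("animali", 1),
    ("domestico", 2),
    ("domestici", 2),
    ("cane", 0),
    ("cani", 0),
]

# Reverse index from a character string to its word-form address.
# Homographs are ignored in this sketch; the abstract resolves them
# through interactive disambiguation.
FORM_ADDRESS = {form: addr for addr, (form, _lemma) in enumerate(WORD_FORMS)}


def encode(definition):
    """Store a definition as a list of (word-form address, lemma address) pairs."""
    pairs = []
    for token in definition.split():
        form_addr = FORM_ADDRESS[token]
        lemma_addr = WORD_FORMS[form_addr][1]
        pairs.append((form_addr, lemma_addr))
    return pairs


def decode(pairs):
    """Retranslate a list of addresses back into a character string."""
    return " ".join(WORD_FORMS[form_addr][0] for form_addr, _lemma_addr in pairs)


def lemmas_of(pairs):
    """Direct access to the lemma behind each position of the definition."""
    return [LEMMAS[lemma_addr] for _form_addr, lemma_addr in pairs]


encoded = encode("animali domestici")
print(encoded)             # [(1, 1), (3, 2)]
print(decode(encoded))     # animali domestici
print(lemmas_of(encoded))  # ['animale', 'domestico']
```

 In this sketch a correction made to a word-form record in the archive is reflected in every definition that points to it, which is one way to read points c) and f) of the list of results.</Paragraph>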
    </Section>
  </Section>
</Paper>