<?xml version="1.0" standalone="yes"?>
<Paper uid="P84-1036">
  <Title>DETECTING PATTERNS IN A LEXICAL DATA BASE</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
I INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> In previous papers it has been pointed out that in a well-structured Lexical Data Base it becomes possible to detect automatically, and to bring out through interactive queries, a number of morphological, syntactic, or semantic</Paragraph>
    <Paragraph position="1"> relationships between lexical entries, such as synonymy, hyponymy, hyperonymy, derivation, case-argument, lexical field, etc.</Paragraph>
    <Paragraph position="2"> The present article examines hyponymy, as an example of a paradigmatic relation, and what can be called the &quot;restriction or modification&quot; relation, as an example of a syntagmatic relation. By restriction or modification relation I mean that part of a so-called &quot;Aristotelian&quot; definition which has the function of linking the &quot;genus&quot; and the &quot;differentia specifica&quot;.</Paragraph>
    <Paragraph position="3"> When brought out in a lexicon, the hyponymy relation produces hierarchical trees partitioning the lexicon into many semantically coherent subsets. These trees are not created once and for all; it is important that they are activated procedurally at query time.</Paragraph>
    <Paragraph position="4"> While examining the second relation, one can investigate whether there is any correlation between lexical or grammatical features in definitions and particular kinds of &quot;definienda&quot;, and thus try to answer questions such as: &quot;Are there any connections between these restriction relations and the fundamental ways of defining, i.e. the criterial parameters by which people define things?&quot; For both relations, the paper presents the procedures by which they are automatically recognized and extracted from the natural language definitions, the reliability of their automatic labeling, the use of these labels in interactive queries on the lexical data base, and finally the theoretical results of their implementation in a Machine-Dictionary.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="170" type="metho">
    <SectionTitle>
II THE LANGUAGE OF DEFINITIONS AS A SUBLANGUAGE
</SectionTitle>
    <Paragraph position="0"> I am trying to develop and exploit the idea of treating the language of dictionary definitions as a particular sublanguage within natural language. This perspective obviously cannot rest on subject-matter restrictions in definitions, but only on the purpose of the text, i.e. its specific communicative goal. This restriction on purpose gives rise to certain lexico-grammatical restrictions, which prove to be very useful.</Paragraph>
    <Paragraph position="1"> As to the restrictions on the lexical richness of definitions, these are not due to their relating to a specific domain of discourse, but only to the property of closure (although it is not satisfied 100%): the defining vocabulary should in principle be simpler and more restricted than the defined set of lemmas, i.e. the former should be a proper subset of the latter.</Paragraph>
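The closure property can be made measurable. The following Python sketch, written under the assumption that each definition has already been lemmatized into a list of lemma strings, computes the share of defining word types that are themselves defined lemmas. The function name `closure_report` and the toy lexicon are invented for illustration.

```python
def closure_report(entries: dict[str, list[str]]) -> tuple[float, set[str]]:
    """Return (coverage, outside) for a lexicon mapping lemma -> defining words.

    coverage is the share of defining word types that are themselves defined
    lemmas (1.0 would mean full closure); outside lists the exceptions.
    """
    defined = set(entries)
    defining = {word for words in entries.values() for word in words}
    outside = defining - defined
    coverage = 1.0 - len(outside) / len(defining) if defining else 1.0
    return coverage, outside


# Invented toy lexicon; definitions are assumed to be lemmatized already.
toy = {
    "mandria": ["branco", "di", "bestiame"],
    "branco": ["insieme", "di", "animale"],
    "insieme": ["gruppo", "di", "cosa"],
    "gruppo": ["insieme", "di", "cosa"],
    "animale": ["essere", "vivente"],
}

coverage, outside = closure_report(toy)
print(f"coverage={coverage:.2f}", sorted(outside))
```

On the toy data this prints coverage=0.44 together with the five uncovered types. In a real dictionary a function word such as di is itself an entry and would not count as uncovered; the toy simply leaves such entries out.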
    <Paragraph position="2"> This kind of quantitative restriction on the vocabulary of definitions would be of no interest in itself if it were not accompanied by other kinds of constraints, on both a) the lexical and b) the grammatical side.</Paragraph>
    <Paragraph position="3"> a) The frequency list of the words used in definitions (about 800,000 word occurrences and 75,000 word types) shows that some words are much more important than in ordinary language, as a comparison with the data of the Lessico di Frequenza della Lingua Italiana Contemporanea (Bortolini et al., 1971) makes evident. These are the generic defining terms traditionally used by lexicographers, such as ACT, EFFECT, PERSON, OBJECT, WHO, PROCESS, CAUSE, etc. It is no accident that these same concepts are relevant in many Artificial Intelligence systems.</Paragraph>
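The comparison with a reference frequency list amounts to ranking words by the ratio between their relative frequency in definitions and their relative frequency in general usage. A minimal Python sketch, assuming per-million reference frequencies of the kind a frequency lexicon provides. All counts below are invented, and the threshold `min_ratio` is an arbitrary parameter, not a value from the paper.

```python
from collections import Counter


def overrepresented(def_counts: Counter, ref_per_million: dict[str, float],
                    min_ratio: float = 5.0) -> list[tuple[str, float]]:
    """Rank words whose relative frequency in definitions exceeds their
    reference frequency by at least min_ratio."""
    total = sum(def_counts.values())
    ranked = []
    for word, count in def_counts.items():
        ref = ref_per_million.get(word)
        if not ref:
            continue  # no reference frequency: the ratio is undefined
        ratio = (count * 1_000_000 / total) / ref
        if ratio >= min_ratio:
            ranked.append((word, round(ratio, 1)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Invented counts, not figures from the paper or from Bortolini et al.
def_counts = Counter({"di": 900, "atto": 50, "persona": 30, "casa": 20})
ref_per_million = {"di": 300_000.0, "atto": 500.0, "persona": 1_000.0,
                   "casa": 20_000.0}

print(overrepresented(def_counts, ref_per_million))
```

Here atto ('act') and persona ('person') surface as typical generic defining terms, while a very frequent function word such as di falls below the threshold because it is frequent everywhere.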
    <Paragraph position="4"> b) Not only single words, or classes of words, are particularly relevant in the defining sublanguage. There are also lexical patterns and syntactic patterns which occur with great frequency, and which play a very special role in defining sentences.</Paragraph>
    <Paragraph position="5"> The combination of these constraints can be, and actually is, very useful when trying to exploit the information contained in definitions, and when transforming an archive of natural language definitions into a knowledge base structured as a network.</Paragraph>
    <Paragraph position="6"> Some important parts of this knowledge are in fact already retrievable in interactive mode from the Italian Lexical Data Base, which has recently been restructured.</Paragraph>
    <Paragraph position="7"> Analyses of large corpora of definitions, carried out on many dictionaries (Amsler, 1980; Calzolari, 1983a, 1983b; Michiels and Noel, 1982), have in fact shown that the sublanguage of definitions displays several regularities in lexical and syntactic occurrences and patterns. These general lexical classes and classes of recurrent patterns can be captured more or less easily, for instance by pattern-matching rules, and where possible characterized with formal rules.</Paragraph>
  </Section>
  <Section position="4" start_page="170" end_page="171" type="metho">
    <SectionTitle>
III HYPONYMY RELATION
</SectionTitle>
    <Paragraph position="0"> Hyponymy is the most important relation to be brought out in a lexicon. Owing to its taxonomic nature, it gives the lexicon, once implemented, a particular hierarchical structure: the result is obviously not a single tree, but many tangled hierarchies (Amsler, 1980).</Paragraph>
    <Paragraph position="1"> Instead of detecting and labelling this relation by hand, I have tried to characterize it procedurally. The procedure that automatically coded true superordinates in all the definitions (approx. 185,000 for 103,000 lemmas), with a precision of more than 90% calculated on a random sample of 2,000 definitions, was based almost exclusively on the position of the &quot;genus&quot; term at the beginning of the definitional phrase, giving Nouns, Verbs, and Adjectives as superordinates of defined entries of the same lexical category.</Paragraph>
    <Paragraph position="2"> Ad hoc subroutines solved exceptional cases where a) quantifiers or other modifiers preceded the genus term (e.g. aletta ---&gt; piccolo gruppo di penne dietro l'angolo dell'ala, 'small group of feathers behind the angle of the wing'), b) more than one genus was present in the definition (e.g. assordare ---&gt; attutire, smorzarsi detto di suono, 'to muffle, to die down, said of a sound'), or c) a prepositional phrase, usually of locative type, opened the definition (e.g. piazzato ---&gt; nel rugby, calcio al pallone collocato sul terreno, 'in rugby, a kick at the ball placed on the ground').</Paragraph>
    <Paragraph position="3"> Although the immediate purpose of this procedure is classificatory, the ultimate goal is the extraction and formalization of the most relevant relationship between lexical items, which is implicitly stored in any standard printed dictionary.</Paragraph>
    <Paragraph position="4"> It is in fact now possible to retrieve from the lexical data base not only all the definitions in which any given word-form appears, together with the defined lemmas (e.g. SUONO 'sound' appears in 328 definitions), but also, if desired, only the definitions in which that word-form is used as a superordinate, and therefore the list of its hyponyms (e.g. the same word SUONO is the superordinate of only 65 words, a subset of the preceding set containing MUSICA, RUMORE, SQUILLO, SUSSURRO, etc.).</Paragraph>
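The two kinds of retrieval correspond to two inverted indexes: one over every word-form occurring in a definition, and one over the genus term only. A minimal Python sketch with invented definitions and genus labels; the function name `build_indexes` is hypothetical.

```python
from collections import defaultdict


def build_indexes(definitions: dict[str, str], genus: dict[str, str]):
    """Return two inverted indexes over the lexicon:
    occurs_in maps a word-form to the lemmas whose definition contains it;
    hyponyms maps a word-form to the lemmas that have it as genus term."""
    occurs_in: dict[str, set[str]] = defaultdict(set)
    hyponyms: dict[str, set[str]] = defaultdict(set)
    for lemma, text in definitions.items():
        for form in text.lower().split():
            occurs_in[form].add(lemma)
        if lemma in genus:
            hyponyms[genus[lemma]].add(lemma)
    return occurs_in, hyponyms


# Invented definitions and genus labels (e.g. as produced by a genus extractor).
DEFINITIONS = {
    "squillo": "suono acuto e breve",
    "sussurro": "suono lieve di voce",
    "eco": "ripetizione di un suono",
    "campanello": "dispositivo che emette un suono",
}
GENUS = {"squillo": "suono", "sussurro": "suono",
         "eco": "ripetizione", "campanello": "dispositivo"}

occurs_in, hyponyms = build_indexes(DEFINITIONS, GENUS)
print(sorted(occurs_in["suono"]))  # every definition mentioning "suono"
print(sorted(hyponyms["suono"]))   # only those using it as superordinate
```

Because the genus term is taken from the definition itself, each hyponym list is a subset of the corresponding occurrence list, as in the SUONO example.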
    <Paragraph position="5"> The query language implemented so far for the lexical data base therefore makes it possible to retrieve information on this hierarchical relation, identifying on-line the allowable interconnections within the entire lexicon.</Paragraph>
    <Paragraph position="6"> The links produced can be analyzed, evaluated, and, if necessary, corrected interactively.</Paragraph>
    <Paragraph position="7"> From explorations of the trees thus obtained, we can also try to set up classes and subclasses of superordinates, on the basis of the upper nodes to which many other nodes are connected as descendants.</Paragraph>
    <Paragraph position="8"> As an example, the criterion for identifying the noun class &quot;SET-OF&quot;, containing INSIEME, GRUPPO, COLLEZIONE, COMPLESSO, AGGREGATO, etc., among the noun superordinates is the fact that they are linked to one another in the tree that results from querying the data base. Their hyponyms will, for the most part, be collective nouns.</Paragraph>
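One possible reading of this criterion, sketched in Python: invert the genus links, collect everything below a chosen seed node such as insieme, and keep as class members the seed and those descendants that are themselves superordinates. How "linked to one another" should be operationalized is an assumption here, and the genus links are invented. The traversal keeps a set of visited nodes because the result is a set of tangled hierarchies rather than a tree, and circular definitions can occur.

```python
from collections import defaultdict


def children_map(genus: dict[str, str]) -> dict[str, set[str]]:
    """Invert lemma -> genus links into genus -> hyponyms."""
    children: dict[str, set[str]] = defaultdict(set)
    for lemma, superordinate in genus.items():
        children[superordinate].add(lemma)
    return children


def descendants(node: str, children: dict[str, set[str]]) -> set[str]:
    """All nodes below node; the seen-set keeps definitional cycles finite."""
    seen: set[str] = set()
    stack = [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    seen.discard(node)  # a cycle may lead back to the starting node
    return seen


def superordinate_class(seed: str, genus: dict[str, str]) -> set[str]:
    """The seed plus every node below it that itself has hyponyms."""
    children = children_map(genus)
    return {n for n in descendants(seed, children) if children.get(n)} | {seed}


# Invented genus links for illustration.
GENUS = {
    "gruppo": "insieme", "collezione": "insieme", "complesso": "insieme",
    "aggregato": "gruppo", "aletta": "gruppo", "pinacoteca": "collezione",
    "arcipelago": "complesso", "conglomerato": "aggregato", "squillo": "suono",
}

members = superordinate_class("insieme", GENUS)
leaves = descendants("insieme", children_map(GENUS)) - members
print(sorted(members))
print(sorted(leaves))
```

The leaves left over after removing the class members are the hyponyms that, on the paper's account, should mostly be collective nouns.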
    <Paragraph position="9"> The identification of word classes like this one leads to the next step in the formalization of the hyponymy relation, which will consist in attaching a label indicating a semantic class to these sets of superordinates. It will thus be possible to retrieve, for example, all the nouns generically definable as &quot;SET-OF&quot;, independently of the particular word denoting a set that is used in their definitions. Since it is already possible to trace these chains of hyponyms upwards or downwards for more than one level, one can immediately query whether, for example, MASSERIA belongs to the set of collectives even though it is defined as MANDRIA ('herd'). It does, because MANDRIA is defined as BRANCO ('flock'), which is in turn defined as INSIEME ('set'), one of the nouns belonging to the class &quot;SET-OF&quot;.</Paragraph>
  </Section>
  <Section position="5" start_page="171" end_page="171" type="metho">
    <SectionTitle>
IV RESTRICTION RELATION
</SectionTitle>
    <Paragraph position="0"> Even though some refinements are still required to improve the reliability of the automatic recovery of chains of ISA-related terms, this kind of structural relation within the lexicon, namely hyponymy, is at a good stage of implementation in the Italian lexical data base.</Paragraph>
    <Paragraph position="1"> Much still remains to be done as far as other very interesting relationships between the entries are concerned. I am now considering what could be called the &quot;restriction or modification&quot; relation, since its purpose is to restrict or modify the meaning of the genus term. It is exemplified in the following definitions, where the words following the genus term restrict its meaning: stannite ---&gt; calcopirite contenente stagno ('chalcopyrite containing tin'); arricciolare ---&gt; modellare a forma di ricciolo ('to shape in the form of a curl'); risonatore ---&gt; dispositivo atto a generare risonanza ('device suited to generating resonance'). I wish to evaluate what could be done with respect to this kind of relation, starting from the available definitional data. One of the first aims of this lexicological research is to analyze, by means of computational tools, and to use the information contained in the different definitional formats and structures. The implementation of a number of procedures which convert the natural language information conveyed by definitions into processable formats, made up of structured relational links between lexical items or classes of lexical items, is now being taken into consideration.</Paragraph>
    <Paragraph position="2"> These formats can be made traceable, e.g. in an Information Retrieval system on definitions such as the one currently implemented on the entire corpus for the taxonomic part of the lexical structure. But these formatted relational structures can also be used as starting points for a computationally exploitable reorganization of the definitional content. One of the characteristics of the definitional sublanguage, namely the presence of recurrent patterns (such as proprio di 'characteristic of', relativo a 'relating to', prodotto da 'produced by', originario di 'native to', etc.), makes it possible, at least in certain cases, to produce a constant mapping from certain variable types of frequently detected definitional phrases to constant underlying relational structures.</Paragraph>
    <Paragraph position="3"> Using rather simple pattern-matching procedures, some classes and subclasses of definitions can be separated, and a small number of simpler types of definitions have already been converted into a formalized, coded format with regard to this restriction relation as well. A new virtual Relation is thus added to the original data base. The distinguished elements of a number of simple natural language patterns are mapped into some general structured information formats. Up to now, some of the definitions displaying the following restriction relations have been treated, and the corresponding relational links generated: REL.FORM (e.g. a forma di, 'in the shape of'), REL.PROV (e.g. provvisto di, 'provided with'), and REL.APT (e.g. atto a, 'suited to'). Among the lexical variants of REL.PROV are fornito di, dotato di, munito di, pieno di, ricco di, etc., while REL.FORM groups variants of a different type: in forma di, che ha (la) forma (di), di forma, di forma simile a (quella di), sotto forma di, avente forma di, etc. It is thus possible, for example, to retrieve, among the 1,271 definitions in which the word FORMA ('shape') appears, only those defining something as &quot;having the shape of something else&quot;. The implementation of these links makes it possible to produce another kind of partitioning within the lexical system and to investigate the internal structure of words more closely.</Paragraph>
    <Paragraph position="4"> A procedure of the kind exemplified above, based on pattern matching, is possible for a good number of definition types; for example, with a different format, for many adjectives: $\mathrm{Adj} \xrightarrow{\mathrm{def}} \mathrm{REL.X}\;\{\mathrm{NP} \mid \mathrm{VP}\}$</Paragraph>
  </Section>
  <Section position="6" start_page="171" end_page="171" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> where several groups of definitions are found to share a common underlying structure in terms of the restriction relation involved, in spite of other lexical and syntactic differences.</Paragraph>
  </Section>
</Paper>