File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/92/c92-3151_abstr.xml
Size: 19,719 bytes
Last Modified: 2025-10-06 13:47:28
<?xml version="1.0" standalone="yes"?> <Paper uid="C92-3151"> <Title>A SET-THEORETIC APPROACH TO LEXICAL SEMANTICS</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> RESUME UNE APPROCHE ENSEMBLISTE DU ~ ~ </SectionTitle> <Paragraph position="0"> r,bus pr~ntms ~ ~a~ts d'm travan mm~ depuis ltt~mrs ~ par la sooi(~ ~DATA stria ~ s~mmtique du lexique ~ Ce travail a fait l'objet d'tm lmm~ ~ & ~ av~ le ~& h ~a delaT~ et ab(x~ attjourd'hui ~ un diaimrtm k~id de #us de ~mmo Le fitrC/ &quot;dk~tm~ amiog~&quot; lcgro~ des r6ali~s tr~s ~ l~tr nnee part, mus dvitms le mvail d~ Sisy~ parm ~ h ~ st~t2p~ poar mus omm~ sat h mummkm des dm~ s~anfio~ Aini, nnm dassm~ i~ l'aide de rat~mr, et selm reapmm hrmi~ ~ rrmm de fairs ~ qu~ ~ 6amines & langu~ s'amemt ~ rmam et ~ f~r dins rus~ um par~ invam~ de l'a~ d~ait d'tme fagon fmmi~ 1~ d/akmaire des dmCs l~aux: il est une stma~im & mots a d'w~ de n~ pm~t l'aspoct d'un gral~ t~ ~ Ixly~. A h brae de C/e gr41,~, nous tmu~ms des etmrbtes &quot;pfimitifs ~ : des etmri~ qui n'ont pas & ammam at~ que 1~ ~iC/timnaim ~out ml~. A 1'~ exI~ni~ se * umt les mo~s ciffmis par les emm/~ aaxqu~s ils al~afiena~ ~ir~ que par r~ de leurs ~o~ians d'~ Dam DIOOLOGIQUE, ks mx~s~ d'~ des ena~fl~ rfivemx d'tm n,~ae &quot;qmsi-d~ni~&quot; d'un mot. 
I1 exisle 9 types d'eram'~s: - Ome ~ &quot;Liste&quot; mmtx~ de nms a~t une ap,rala~ de sins et (~ ca~C/~ ~ (nora, v,~oe, atje~, ahcrbe).</Paragraph> <Paragraph position="1"> -Un ~ &quot;Classe&quot; des~ ~t ~ des am'~ims en ~n 0a mologie ~ar ex.).</Paragraph> <Paragraph position="2"> -Un msml~e &quot;Tma'es lids&quot;, au eontmu assez ~ de trams n'ay~nt pu dmmr lieu ~ h ca~on & lims dam un 'ql't~ dmr,6 -Un msmt~ q'hi~&quot; cap~ d'~ t~t b champ 1odin1 d'um raix Enae ~e, ilpm mmmir desthM~ - Lrn mmn~ &quot;Descdp~&quot; emli~ m cas de rt~ces~ amnitt~ - Un msm~ &quot;~&quot; qui ~ des mots dont les signifi6s ~mtxma un ~ trot sailk~ Les 6nonc6s math~matiques con'estxmd~t it des fonctionnalit6s du dictionnaim dlecU'otfique que nous avis iUustrfi Par des exemples empnmtfs l~ celui-ci : -~ de an:errs (veeoes ex#nmt &quot;faire mrlzr&quot; et &quot;o~ze', stlm,~ af~mt une &quot;c~e&quot; du &quot;Pape&quot;, symiymes de &quot;voler&quot; Imur des &quot;alzilles&quot;...). - Editim de lis~ ou de lt-,~res ( la lisle &quot;Pem~ atilera</Paragraph> <Paragraph position="4"> ~ le diakmmite des chan~ ~xicatx peut are consul~ ttitm-~ par on tmqimtmr htmmt Nous proms al~ le gdlam de sm ex#/ta~ par h n-mhine dbca~.</Paragraph> <Paragraph position="5"> mots, crtk~ par un expert humain, ddveloppmt un gtaphe de 4 000 000 de successions d'h6ri-tage que nous aa~omns sam cesse. Les outils de base que nous ~nsmfisons, tel le SEMIOGRAPHE pour la recherche decumentaire, nous Izrme~t d'~valuer la progression de la qualit~ des interpr6mtions que nous obtemms.</Paragraph> <Paragraph position="6"> Nous aaxitms mtre article par norm smfl~ de rmm~, ias du 00Llt~ des pamm~ fmOis on arang~ qui ~xtraim a~ nous aamger des tmC/~ st~ oes question~ AcrEs DE COLING-92, NANTES. 23-28 ^ol~'r 1992 9 8 2 PROC. OF COL1NG-92, NANTES, Aua. 
23-28, 1992 A SET-THEORETIC APPROACH TO LEXICAL SEMANTICS D. DUTOIT, MEMODATA.</Paragraph> <Paragraph position="7"> ABSTRACT We present the results of work carried out over several years by the Memodata company on the structure of the French lexicon. This work was made possible by a first research contract with the Ministry of Research and Technology and has today led to a dictionary of more than 100 000 words and phrases grouped by analogy and synonymy.</Paragraph> <Paragraph position="8"> While it is well understood how humans can use a dictionary like this with ease, we address the problem of the identification of meaning by a computer. We evaluate how Dicologique adds information complementary to that contained in semantic nets. Thanks to a somewhat unusual construction method and the systematic classification of words according to their meaning, we are progressing towards a continuous system for locating meaning itself. On the map we have created, it is possible to compute the meaning due to lexical semantics for any sentence written in natural language...</Paragraph> <Paragraph position="10"> The purpose of dictionaries grouped under the name &quot;analogical&quot; is always to ease the passage from a word to an idea and the inverse passage from an idea to a word. This aim is reached by compiling lists of stereotyped associations and of semantic fields. The first approach is less likely to end in a satisfactory result than the second.</Paragraph> <Paragraph position="11"> Stereotyped associations depend on the time, the background and the experience of each individual. 
Their record can only be a trace of the ~ve memory of the individual.</Paragraph> <Paragraph position="12"> On the other hand, the dictionary of semantic fields is perfectly workable at any time; it is based on the linguistic conventions that language dictionaries have tried for centuries to record and to normalize.</Paragraph> <Paragraph position="13"> It is not possible, by definition, to construct a dictionary made up of stereotyped associations, whereas it is possible to work on the complexity of the hundreds and thousands of linguistic facts which we have classified.</Paragraph> <Paragraph position="14"> We will give a mathematical description of the dictionary of semantic fields. This description is accompanied by concrete examples drawn from the database.</Paragraph> <Paragraph position="15"> 2) ~ Area ~Tm~ or Tr~ The dictionary of analogies and synonyms that has been set up is a structure of sets and words, shown in the conceptual figure (1). The objects &quot;words&quot; (denoted Wu) are represented in the rectangles and the objects &quot;sets&quot; (denoted Ci) in the parallelograms.</Paragraph> <Paragraph position="16"> [Fig. 1: G, the conceptual model of the dictionary.] [Fig. 2: Example; arc labels: contains, belongs to, is included in.] 2.1) MOVING FROM THE LEFT TO THE RIGHT When moving through Dicologique from the left to the right, we move from the general to the specific. a) Definitions related to the motion from the left to the right * Suppose we take Cj, a set contained in Ci,</Paragraph> <Paragraph position="18"> In our example, ~ A produces the synonyms for boat. Because of their position on the graph we will also consider them as lexical ~ * Suppose we take a function M(Ci). M(Ci) counts the number of elements in a set Ci (M(Ci) = ω). * Suppose we take U(Ci), the otrmm of the lexical field of Ci, i.e. the set of words contained in Ci. 
U(Ci) = {Wu, with 1 &lt;= u &lt;= ω and such that there exists a set Cj containing Wu and such that P(Ci, Cj) >= 0} U(bateau) = {bateau, navire, ~, baleinier, ~, ~ (day boat), U(ba~ t~ de paso), U(bateau de pêche de plaisance)}.</Paragraph> <Paragraph position="19"> This function makes it possible to edit, with or without their structure, the 1444 verbs currently contained in the set &quot;changer&quot; (to change, to alter).</Paragraph> <Paragraph position="20"> Caution: according to our example, the function U(bateau) would provide a result quite different from the actual dictionary Dicologique. In fact &quot;bateau&quot; is a structure with several thousand words and several levels of sets which we have not shown in figure (2).</Paragraph> <Paragraph position="21"> b) Property of the graph derived from these definitions The existence of the function M(Ci) for all sets Ci implies that the structures of inclusion are without loops, i.e. there are sets which are not contained in any other set but the set of the graph G (root node) itself.</Paragraph> <Paragraph position="22"> sas on/y o~d ~ ~ ~ ofa~ ~-a# G are c) Semantic and grammatical characteristics of the sets and words In Dicologique there are 9 types of sets : - four types of sets named &quot;lists&quot;. They give the quasi-synonyms, i.e. lists of words which are equivalent in meaning and identical in grammar. We have the following types of grammatical sets : noun, verb, adjective, adverb.</Paragraph> <Paragraph position="23"> - the set named &quot;class&quot; A set of this type contains nouns which can be subsumed under the same concept.</Paragraph> <Paragraph position="24"> In our example in figure (2), &quot;bateau&quot; is a set containing on the one hand words which represent its ~ values (&quot;bateau&quot;, &quot;navire&quot;, &quot;embarcation&quot;) and on the other hand &quot;classes&quot; of specific boats. - the set named &quot;related words&quot; ~y, the contents and uses of this type of set are rather varied. 
We need it, for example, to represent the link between &quot;baleinier&quot; (whaling ship) and &quot;baleine&quot; (whale), which is not shown in figure 2 so as not to overload the graph.</Paragraph> <Paragraph position="25"> - the set named &quot;theme&quot; This set contains all the concepts and words associated in a particular semantic field. It may also contain other sets such as &quot;related words&quot; or smaller &quot;themes&quot;.</Paragraph> <Paragraph position="26"> - the set named &quot;description&quot; It contains the constituents organically linked to a concept. It is only used when absolutely necessary for a definition.</Paragraph> <Paragraph position="27"> - the set named &quot;~&quot; It subsumes words having the same outstanding feature. For example, our set of class &quot;bateau léger de pêche&quot; could be found under a set characterized by the feature &quot;small&quot;, which differentiates this class from other classes of boats.</Paragraph> <Paragraph position="28"> As for the words, we have provided them with the usual characteristics, i.e. their morphological classes (grammar) and their usage labels (colloquial, archaic, literary ...), which carry the usual information associated with each word in every dictionary.</Paragraph> <Paragraph position="29"> d) Use of the previous definitions in Dicologique Moving through the dictionary from the general to the particular is a process widely used by users, who may either search for a precise term, to be discovered by intersecting associated concepts, or wish to edit a lexical field or a classified list. Search for analogies. The logic of the sets takes into account the logical &quot;and&quot;, &quot;or&quot; and negation. Here are some examples of searches, which are always based on the intersection of sets edited by the function U(Cj) : - Search for a precise term Search for the name of the &quot;coiffure du Pape&quot; (the Pope's headdress). 
The intersection of &quot;Pape&quot; (theme of 162 words) and &quot;chapeau&quot; (hat; list of 180 words) or &quot;couronne&quot; (crown; theme of 33 words) produces the words &quot;tiare&quot; (tiara), &quot;calotte&quot; (skullcap) etc. in 10 seconds on the microcomputer.</Paragraph> <Paragraph position="30"> - Search for words to express an idea Search for the verb expressing the idea of &quot;faire tomber&quot; (to bring down) and &quot;couper&quot; (to cut). The intersection of the two lists of corresponding verbs converges on about 20 words (abattre, décapiter, étêter, ébrancher ...).</Paragraph> <Paragraph position="31"> - Search for synonyms according to a context The synonym of &quot;voler&quot; (to fly) whose meaning is more suitable for a bee. The words &quot;butiner&quot; (to gather nectar) and &quot;voltiger&quot; (to flutter) are immediately produced. Edition of organized words. It principally has two aims : - to search among very wide lists for the terms which help to get a precise idea. For example, the list of verbs &quot;penser&quot; (to think) contains about .500 ~ verbs that allow one to move continuously through the whole field concerned.</Paragraph> <Paragraph position="33"> This is especially interesting for the sets containing predetermined taxonomies.</Paragraph> <Paragraph position="34"> The edition of the set of class &quot;animals&quot; presents the scientific taxonomy of the animal world. About 4100 indexed animals can be visualized in a structure of 500 classes.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 FROM THE RIGHT TO THE LEFT </SectionTitle> <Paragraph position="0"> This is the opposite of the previous motion. It corresponds to moving from the particular to the general.</Paragraph> <Paragraph position="1"> a) Definitions related to the motion from the right to the left A set named Cj may be included in 1 to f~ sets Ci (i going from 1 to f~, f~ being the number of &quot;parents&quot; (main sets) of Cj). 
This function therefore makes it possible to move upward (towards the root node) in the structure of the sets. In figure (2), the set &quot;bateau de pêche de plaisance&quot; is defined by the set of sets h = {{bateau de pêche}, {bateau de plaisance}}.</Paragraph> <Paragraph position="2"> A terminal word Wu can belong to 1 to I sets Ci (i going from 1 to I, I being the number of &quot;parents&quot; (main sets) of Wu).</Paragraph> <Paragraph position="3"> In Dicologique the direct questioning of a word gives, as in all dictionaries, the &quot;(quasi)-definition&quot; of the word.</Paragraph> <Paragraph position="4"> tm-aW~ &quot;bateau de pêche&quot; and the set of linked words &quot;baleine&quot;. b) Properties of the graph G derived from these definitions For every non-primitive object of G there exist 1 to E series of connections which link it to one of these primitives.</Paragraph> <Paragraph position="5"> All the series of connections of an object, taken together, are called the inheritance H of this object. c) Use of the previous definitions in Dicologique Table (3) shows part of the result of the search for the polysemous word &quot;abattre&quot;. We have limited the reproduction of the result to the polysemous zone only. The left column shows the sets directly containing &quot;abattre&quot;. The middle column shows the type of set concerned. The right column shows the number of elements in the set concerned.</Paragraph> <Paragraph position="6"> In the first place, the elements of the above table lead to the following searches, which correspond to moving from the left to the right: * Search for synonyms of &quot;abattre&quot; with the meaning of &quot;détruire&quot; The edition of the list set &quot;détruire&quot; (L) produces, in alphabetical order, the 262 verbs which constitute the set &quot;détruire&quot;.</Paragraph> <Paragraph position="7"> * Search for synonyms of &quot;abattre&quot; with the meaning of &quot;a,4~ am'f~e ~-a,eC/.</Paragraph> <Paragraph position="8"> We apply the logical function &quot;AND&quot; to these sets of verbs and about 20 verbs are produced. 
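The AND search just described can be sketched in Python as an intersection of lexical fields. This is only an illustration under assumptions: the toy graph, the verb memberships and the helper names lexical_field and search_and are hypothetical and are not taken from Dicologique, and U(C) is assumed here to collect every word held by a set or by any of its sub-sets.

```python
# Toy inclusion graph: each set lists its direct sub-sets and its direct words.
# All names and memberships below are hypothetical illustrations.
SUBSETS = {
    "detruire": ["tuer"],
    "faire tomber": [],
    "couper": [],
    "tuer": [],
}
WORDS = {
    "detruire": {"abattre", "demolir"},
    "faire tomber": {"abattre", "renverser", "ebrancher"},
    "couper": {"abattre", "ebrancher", "trancher"},
    "tuer": {"abattre", "assassiner"},
}


def lexical_field(name):
    """Return U(C): the words held by set `name` or by any of its sub-sets."""
    field = set(WORDS.get(name, ()))
    for child in SUBSETS.get(name, ()):
        field.update(lexical_field(child))
    return field


def search_and(*names):
    """Logical AND: the words common to the lexical fields of all given sets."""
    fields = [lexical_field(n) for n in names]
    return set.intersection(*fields)


candidates = search_and("faire tomber", "couper")
narrowed = search_and("faire tomber", "couper", "tuer")
print(sorted(candidates))
print(sorted(narrowed))
```

Passing a further set to search_and can only shrink the result, which is the behaviour a supplementary constraint is expected to have.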
7 to 8 verbs will be left if we add the list &quot;tuer&quot; (to kill) as a supplementary constraint. The processing takes 4 to 5 seconds.</Paragraph> <Paragraph position="9"> M~ r,m~ aema~.a~ a, ~ ~,ra~ it is possible to enlarge the idea of &quot;danm&quot; to the sets which include it. The structured edition of the 1400 words contained in the &quot;changer&quot; list takes less than a minute on the microcomputer.</Paragraph> <Paragraph position="10"> The motion from the particular to the general offers far fewer functions than the inverse motion. At the very most it is used to locate. Often the consultation of Dicologique is motivated by the search for synonyms. The edition of the contents of the terminal nodes of figure 3 appears to be largely sufficient for human users.</Paragraph> <Paragraph position="12"> But the position of the computer is very different : while humans possess the structures necessary for interpreting the terms, i.e. the linguistic heritage and the knowledge of the world, the computer for its part possesses neither of them. This is why we try to supply it, in a coherent system, with the lexical knowledge that is absolutely necessary.</Paragraph> <Paragraph position="13"> Obviously this knowledge is situated in the inheritance from the &quot;primitives&quot; to the words. Let us consider an example in which the motion from the right to the left is applied to a problem of automatic indexing in information retrieval.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 An application to information retrieval </SectionTitle> <Paragraph position="0"> We want the system to retrieve the lexical key elements of the following small piece of text : &quot;The accident, on Friday, took place in foggy weather. 
The two cars that crashed into each other caused a pile-up of about 50 vehicles on the congested national dual carriageway.&quot; Strategy for resolution : The strictly lexical analysis of this short passage will be based on the calculation of the surface corresponding to each concept (set) covering one or more words of this document : the more abstract a concept, the bigger its surface.</Paragraph> <Paragraph position="1"> For each set Cj we ~ aerrea ~ earar'~ M(Cj) and i~ a~ in comparison with the primitive, P(Ci, Cj), with Ci as a primitive and considering a specific tms~ from the general to the particular.</Paragraph> <Paragraph position="2"> Suppose we take Max P(Ci, Cj), the maximum ck~ of the graph whatever j might be. We know from experience that this maximum a~a is attained in detailed enumerations of words which have very precise meanings and designate concrete things. We define the function A(Cj) to measure the dimme surface accounting for its h series of connections. 2) In reality A(Cj) takes into account a complementary piece of information : the semantic characteristics of sets. A set of the type &quot;list&quot; introduces a more stretched arc than a set of the types &quot;theme&quot; or &quot;related words&quot;. 3) While Dicologique is a general dictionary capable of resolving problems of information retrieval referring to general language, it is very easy to adjust it to a precise problem (for example, the thesaurus of a specific company). One only has to stretch the arc situated on the path of the series of connections of each term of the thesaurus. 
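Remarks 2) and 3) can be illustrated by a minimal Python sketch. Everything numeric here is an assumption: the text only states that a "list" arc is more stretched than a "theme" or "related words" arc, so the arc lengths, the "class" value, the stretch factor, the example chain and the helper name chain_length are hypothetical and do not reproduce the actual function A(Cj).

```python
# Hypothetical arc lengths per set type. The only constraint taken from the
# text is that a "list" arc is longer than a "theme" or "related words" arc.
ARC_LENGTH = {"list": 3.0, "class": 2.0, "theme": 1.0, "related words": 1.0}


def chain_length(chain, stretch=None):
    """Sum the arc lengths along one series of connections.

    `chain` is a sequence of (set_name, set_type) pairs leading from a
    primitive down to a word; `stretch` maps set names to a multiplier.
    """
    stretch = stretch or {}
    total = 0.0
    for name, set_type in chain:
        total += ARC_LENGTH[set_type] * stretch.get(name, 1.0)
    return total


# One hypothetical series of connections, from the general to the particular.
chain = [("vehicule", "theme"), ("voiture", "class"), ("automobile", "list")]
general = chain_length(chain)
# Adapting to a thesaurus: stretch the arc lying on the path of one of its terms.
adapted = chain_length(chain, stretch={"voiture": 2.0})
print(general, adapted)
```

Keeping the stretch factors in a separate mapping leaves the general dictionary untouched, so one structure could serve several thesauri.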
The surfaces S, which are situated in a normed reference, describe a map of meaning on which all the continuous calculations of Euclidean geometry are made possible.</Paragraph> <Paragraph position="3"> To resolve our problem, it is possible to keep the mathematical expressions very simple : a simple Each word of the document is recognized in the dictionary, as it activates all the sets containing it according to their specific weights S(Cj), which depend on their series of connections.</Paragraph> <Paragraph position="4"> Finally, each set will have been activated k times. The analysis will take into account as the most relevant set the one which presents the smallest ratio S(Cj)/k. S(Cj)/k measures the weight of the concept in the text. Our ~ ~a~x~ ~ t~h-~ng of a hm~ in ~ wi~ the concepts ~ wh~ ~ the text given in the example : 1° : car ; 2° : accident ; 3° : road.</Paragraph> <Paragraph position="5"> The other sets have negligible weights.</Paragraph> <Paragraph position="6"> The complete (but useless) analysis takes 5 minutes on a compatible PC.</Paragraph> <Paragraph position="7"> We tree ~ with the ~ of the results of our work on the semantic structure of the lexicon. We think it might be ~ if we add a ~ of the t~txxt we use for ~ the map of meaning ~e Ime</Paragraph> </Section> </Section> </Paper>