<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1032">
  <Title>Morphological Rule Induction for Terminology Acquisition Béatrice Daille</Title>
  <Section position="3" start_page="0" end_page="215" type="metho">
    <SectionTitle>
2 Linguistic properties of relational
adjectives
</SectionTitle>
    <Paragraph position="0"> According to linguistic and grammatical tradition, there are two main categories among adjectives: epithetic adjectives such as important (significant) and relational adjectives such as laitier (dairy).</Paragraph>
    <Paragraph position="1"> The first ones cannot have an agentive interpretation, in contrast to the second: the adjective laitier (dairy) within the noun phrase production laitière (dairy production) is an argument to the predicative noun production (production), and this is not the case for the adjective important (significant) within the phrase production importante (significant production). Relational adjectives (RAdj) possess the following well-known linguistic properties: * they are either denominal adjectives, morphologically derived from a noun thanks to a suffix, or adjectives having a noun usage such as mathématique (mathematical/mathematics). For the former, not all the adjective-forming suffixes lead to relational adjectives. The following suffixes are considered by (Dubois, 1962) as appropriate: -ain, -aire, -al, -el, -estre, -ien, -ier, -il(e), -in, -ique. However, (Guyon, 1993) remarks that a suffix, even the most appropriate, is never necessary nor sufficient. Several adjectives carrying a favorable suffix are not relational: this is the case with the adjectives ending with -ique (-ic), which characterize chemistry and which are not derived from a noun, such as désoxyribonucléique (deoxyribonucleic), dodécanoïque (dodecanoic), etc. Other, inappropriate suffixes are sometimes used, such as the suffixes -é and -eux: car-</Paragraph>
    <Paragraph position="3"> etc.</Paragraph>
    <Paragraph position="4"> * they can, in special conditions, replace the attributive use of a corresponding prepositional phrase. The preposition employed, as well as the presence or not of a determiner, depends on the head noun of the noun phrase: acidité sanguine (blood acidity) ≃ acidité du sang (acidity of the blood); conquête spatiale (space conquest) ≃ conquête de l'espace (conquest of space); débit horaire (hourly rate) ≃ débit par heure (rate per hour); expérimentations animales (animal experimentation) ≃ expérimentations sur les animaux (experimentation on animals) * and several other properties such as the impossibility of a predicative position, the incompatibility with a degree modification, etc.</Paragraph>
  </Section>
  <Section position="4" start_page="215" end_page="216" type="metho">
    <SectionTitle>
3 Morphological Rule Induction
</SectionTitle>
    <Paragraph position="0"> To identify RAdj through a term extractor, we use their paraphrastic property, which includes the morphological property, the morphological property being insufficient alone. We need rules to recover the lemma of the noun from which the lemma of the RAdj has been derived.</Paragraph>
    <Paragraph position="1"> These rules follow the following schema:</Paragraph>
    <Paragraph position="3"> S is the relational suffix to be deleted from the end of an adjective. The result of this deletion is the stem R; M is the mutative segment to be concatenated to R in order to form a noun; exceptions lists the adjectives that should not be submitted to this rule.</Paragraph>
    <Paragraph position="4"> For example, the rule [-é +e]{âgé} says that if there is an adjective which ends with é, we should strip this ending from it and append the string e to the stem, except if this adjective belongs to the list of exceptions, namely âgé. We extract these morphological rules from the corpora following the method presented in (Mikheev, 1997), with the difference that we don't limit the length of the mutative segment. The relational suffixes are known; only the mutative segments have to be guessed. For the lemma Adj_i of an adjective ending with a relational suffix in the corpus, we strip this suffix from Adj_i and store the resulting stem in R. Then, we try to match this stem R against each noun Noun_j appearing in the corpus. If the subtraction results in a non-empty string, the system creates a morphological rule where the mutative segment is the result of the subtraction of R from Noun_j. We thus obtain couples (Adj_i, Noun_j) associated with a morphological rule. For example: (gazeux, gaz) [-eux +""].</Paragraph>
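    <Paragraph> The rule format and the induction loop described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the names Rule and induce_rules are hypothetical, the corpus is reduced to plain lists of lemmas, and the sketch accepts an empty mutative segment because the (gazeux, gaz) example requires one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """A rule [-S +M]{exceptions}: strip suffix S, append mutative segment M."""
    suffix: str
    mutative: str
    exceptions: frozenset = field(default_factory=frozenset)

    def apply(self, adjective):
        """Return the candidate noun lemma, or None if the rule does not apply."""
        if adjective in self.exceptions or not adjective.endswith(self.suffix):
            return None
        stem = adjective[: len(adjective) - len(self.suffix)]
        return stem + self.mutative


def induce_rules(adjectives, nouns, suffixes):
    """Pair each adjective with every corpus noun that extends its stem R."""
    couples = []
    for adj in adjectives:
        for suffix in suffixes:
            if not adj.endswith(suffix):
                continue
            stem = adj[: len(adj) - len(suffix)]
            if not stem:
                continue
            for noun in nouns:
                if noun.startswith(stem):
                    # The mutative segment is what remains of the noun after R
                    # (assumption: it may be empty, as in gazeux / gaz)
                    mutative = noun[len(stem):]
                    couples.append((adj, noun, Rule(suffix, mutative)))
    return couples
```

Because the length of the mutative segment is unbounded, a loop of this kind overgenerates, which is consistent with the manual check of the rules reported later in this section.</Paragraph>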
    <Paragraph position="5"> This schema doesn't take into account stem alternants such as: e/é alphabet/alphabét-ique; è/é hygiène/hygién-ique; e/i pollen/pollin-ique; x/c thorax/thorac-ique. In order to handle this allomorphy, we use Levenshtein's weighted distance (Levenshtein, 1966), which determines the minimum number of insertions or deletions of characters to transform one word into another. (Wagner and Fisher, 1974) presents a recursive algorithm to calculate this distance.</Paragraph>
    <Paragraph position="6"> $$\mathrm{dist}(w_{1,i}, w'_{1,j}) = \min\big(\mathrm{dist}(w_{1,i-1}, w'_{1,j}) + q,\ \mathrm{dist}(w_{1,i}, w'_{1,j-1}) + q,\ \mathrm{dist}(w_{1,i-1}, w'_{1,j-1}) + p \cdot \mathrm{dist}(w_{i,i}, w'_{j,j})\big)$$ with $w_{n,m}$ being the substring beginning at the $n$th character and finishing after the $m$th character of the word $w$; $\mathrm{dist}(x, y) = 1$ if $x \neq y$, $= 0$ if $x = y$; $q$ the cost of the insertion/deletion of one character; and $p$ the cost of the substitution of one character by another.</Paragraph>
    <Paragraph position="7"> Generally, a substitution is considered as a deletion followed by an insertion, thus p = 2q. We apply this algorithm to each stem R, obtained after the deletion of the relational suffix, that had not been found as a stem of a noun. But we add the constraint that R and the noun must share the same two first characters, i.e. the substring computed begins at character 3. We only retain couples composed of an adjective and a noun with a Levenshtein's weighted distance equal to 3 (i.e. one substitution + one insertion). From these couples, we deduce new relational suffixes to be added to the list of allowed suffixes. More precisely, we consider that such suffixes are allomorphic variants of the relational suffixes. We also add new morphological rules. For example, for the couple (hygiène, hygiénique), we add the suffix -énique, which is considered as an allomorph of the suffix -ique, and create the rule: [-énique +ène]. However, this method doesn't retrieve RAdj built from non-autonomous bases nor from Latin noun bases such as père/pater (father/pater), ville/urb (town/urb).</Paragraph>
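    <Paragraph> A minimal sketch of this filter, assuming q = 1 and p = 2 (so that p = 2q). The function names are hypothetical, and the distance is computed with an iterative dynamic-programming table rather than the recursive formulation cited above; both yield the same value.

```python
def weighted_levenshtein(a, b, q=1, p=2):
    """Weighted edit distance: q per insertion/deletion, p per substitution."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = [j * q for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * q]
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else p)
            curr.append(min(prev[j] + q, curr[j - 1] + q, substitution))
        prev = curr
    return prev[-1]


def is_allomorphic_couple(stem, noun, prefix_len=2, target=3):
    """Check the two constraints: shared first characters, then distance of 3."""
    if stem[:prefix_len] != noun[:prefix_len]:
        return False
    # The distance is computed from the third character onwards
    return weighted_levenshtein(stem[prefix_len:], noun[prefix_len:]) == target
```

For the couple (hygiène, hygiénique), the stem left after removing -ique is hygién; comparing gién with giène costs one substitution (2) plus one insertion (1), which gives the required distance of 3.</Paragraph>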
    <Paragraph position="8"> We manually check the rules obtained and add the wrong derivations obtained to the list of exceptions. Table 1 presents the number of rules retained and the number of variants for each suffix.
      Relational suffix    Number of allomorphs    Number of rules
      -al                  3                       5
      -aire                4                       8
      -é                   2                       2
      -el                  1                       2
      -er                  1                       2
      -eux                 1                       3
      -ien                 1                       2
      -ier                 1                       2
      -if                  2                       6
      -in                  1                       2
      -ique                8                       18
      -iste                1                       1
      -cite                1                       1
      Total                25                      54
      Figure 1: Number of variants and rules by relational suffix</Paragraph>
  </Section>
  <Section position="5" start_page="216" end_page="218" type="metho">
    <SectionTitle>
4 Term Extractor
</SectionTitle>
    <Paragraph position="0"> First, we present the term extractor chosen, then the modifications performed to enable the application of the derivational rules.</Paragraph>
    <Section position="1" start_page="216" end_page="216" type="sub_section">
      <SectionTitle>
4.1 Initial Term Extractor
</SectionTitle>
      <Paragraph position="0"> ACABIT (Daille, 1996), the term extractor used for this experiment, eases the task of the terminologist by proposing, for a given corpus, a list of candidate terms ranked from the most representative of the domain to the least, using a statistical score. Candidate terms which are extracted from the corpus belong to a special type of cooccurrences: * the cooccurrence is oriented and follows the linear order of the text; * it is composed of two lexical units which do not belong to the class of functional words such as prepositions, articles, etc.;  and do accept several variations. Those which are taken into account are: 1. Inflexional and internal morphosyntactic variants: * graphic and orthographic variants which gather together predictable inflexional variants: conservation de produit (product preservation), conservations de produit (product preservations), or not: conservation de produits (products preservation), and case differences.</Paragraph>
      <Paragraph position="1"> * variations of the preposition: chromatographie en colonne (column chromatography), chromatographie sur colonne (chromatography on column); * optional character of the preposition and of the article: fixation azote (nitrogen fixation), fixation d'azote (fixation of nitrogen), fixation de l'azote (fixation of the nitrogen); 2. Internal modification variants: insertion inside the base-term structure of a modifier such as the adjective inside the Noun1 (Prep (Det)) Noun2 structure: lait de brebis (goat's milk), lait cru de brebis (milk straight from the goat); 3. Coordinational variants: coordination of base term structures: alimentation humaine (human diet), alimentation animale et humaine (human and animal diet); 4. Predicative variants: the predicative role of the adjective: pectine méthylée (methylate pectin), ces pectines sont méthylées (these pectins are methylated).</Paragraph>
      <Paragraph position="2"> The corpus is tagged and lemmatized. The program scans the corpus, counts and extracts collocations whose syntax characterizes base-terms or one of their variants. This is done with shallow parsing using local grammars based on regular expressions (Basili et al., 1993). These grammars use the morphosyntactic information associated with the words of the corpus by the tagger. The different occurrences are grouped as pairs formed by lemmas of the candidate term and sorted following an association measure which takes into account the frequency of the cooccurrences.</Paragraph>
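      <Paragraph> As a rough illustration of shallow parsing with regular expressions over tagged text, the sketch below counts lemma pairs for two base-term patterns. Everything in it is an assumption made for the example: the lemma/TAG encoding, the tag names NOUN, ADJ, PREP and DET, the function name, and the use of raw counts in place of the association measure, which the text does not specify.

```python
import re
from collections import Counter

# Hypothetical encoding: each token is written lemma/TAG, separated by spaces
NOUN_ADJ = re.compile(r"(\S+)/NOUN (\S+)/ADJ")
NOUN_PREP_NOUN = re.compile(r"(\S+)/NOUN \S+/PREP (?:\S+/DET )?(\S+)/NOUN")


def extract_candidates(tagged_sentences):
    """Count lemma pairs matching the Noun Adj and Noun Prep (Det) Noun patterns."""
    counts = Counter()
    for sentence in tagged_sentences:
        for head, adj in NOUN_ADJ.findall(sentence):
            counts[("Noun Adj", head, adj)] += 1
        for head, noun in NOUN_PREP_NOUN.findall(sentence):
            counts[("Noun Prep Noun", head, noun)] += 1
    return counts
```

Grouping by lemma pair rather than by surface form is what lets inflexional variants and preposition variants of the same candidate term accumulate under a single entry.</Paragraph>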
    </Section>
    <Section position="2" start_page="216" end_page="218" type="sub_section">
      <SectionTitle>
4.2 Term Extractor modifications
</SectionTitle>
      <Paragraph position="0"> The identification of relational adjectives takes place after extraction of the occurrences of the candidate terms and their syntactic variations. The algorithm below summarizes the successive steps for identifying relational adjectives:  1. Examine each candidate of Noun Adj structure; 2. Apply a transformational rule in order to generate all the possible corresponding base nouns. We added morphosyntactic constraints for some suffixes, such as, for the suffix -er, that the identified adjective is not a past participle; 3. Search the set of candidate terms for a pair  step 2.</Paragraph>
      <Paragraph position="1"> 4. If step 3 succeeds, group the two base structures under a new candidate term. Take out all the Noun Adj structures owning this adjective from the set of Noun Adj candidates and rename them as a Noun RAdj structure.</Paragraph>
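      <Paragraph> The four steps can be sketched as follows. This is an illustration under stated assumptions, not the extractor's code: the function name and data layout are hypothetical, generate_nouns stands for the application of the morphological rules, and step 3 is read as looking for a prepositional candidate that shares the head noun and uses one of the nouns generated in step 2.

```python
def identify_relational_adjectives(noun_adj, noun_prep_noun, generate_nouns):
    """Return the grouped base structures and the Noun RAdj candidates."""
    grouped = []
    relational = set()
    for head, adj in sorted(noun_adj):                       # step 1
        for base in generate_nouns(adj):                     # step 2
            if (head, base) in noun_prep_noun:               # step 3 (assumed reading)
                grouped.append(((head, adj), (head, base)))  # step 4: group
                relational.add(adj)
    # Step 4, second half: every Noun Adj candidate using such an adjective
    # is relabelled as a Noun RAdj structure
    noun_radj = {(h, a) for (h, a) in noun_adj if a in relational}
    return grouped, noun_radj
```

Requiring the generated noun to be attested with the same head noun is what keeps the overgeneration of step 2 from producing noise.</Paragraph>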
      <Paragraph position="2"> In Step 2, morphological rules generate one or several nouns for a given adjective. We generate a noun for each relational suffix class. A class of suffixes includes the allomorphic variants. This overgeneration method, used in information retrieval by (Jacquemin and Tzoukermann, 1999), gives low noise because the base noun must not only be attested in the corpus, but must also appear as an extension of a head noun. For example, with the adjective ionique (ionic), we generate both ionie (ionia) and ion (ion), but only ion (ion) is an attested form; with the adjective gazeux (gaseous), the noun forms gaz (gas) and gaze (gauze) are generated and the two of them are attested; but the adjective gazeux (gaseous) appears with the noun échange (exchange), which is paraphrased in the corpus by échange de gaz (gas exchange) and not by échange de gaze (gauze exchange). For adjectives with a noun function, as for example problème technique (technical problem) and problème de techniques (problem of technics), we have accepted that a candidate term could share several base structures: one of type</Paragraph>
    </Section>
  </Section>
</Paper>